Index MySQL Data into Elasticsearch 5.X

This article is a step-by-step guide to indexing MySQL data into Elasticsearch. Up to version 1.5, Elasticsearch offered rivers to index MySQL data, but rivers have been deprecated since Elasticsearch 1.5.

Logstash now supports importing data from any database that provides a JDBC interface. So instead of relying on an external client, we have native support in the Elastic stack to index MySQL data into Elasticsearch.

Why Did Elasticsearch Deprecate Rivers?
The Elasticsearch team explained the decision as follows:

“By their nature, rivers deal with external systems, and those external systems require external libraries to work with. Those are great to use, but they come with an overhead. Part of it is built in overhead, things like additional memory usage, more sockets, file descriptors and so on. Others, sadly, are bugs.”

To import MySQL data into Elasticsearch, we first need to install Logstash. If you are installing it on a Linux machine, you can refer to my earlier post on installing Logstash on Linux.

https://techieroop.com/how-to-install-logstash-in-linux/#.VpvMlFJWLO8

Once you are done with the Logstash installation, the next step is to install the Logstash JDBC input plugin.

Prerequisites:

1) Drivers: The Logstash JDBC plugin does not ship with JDBC driver libraries. We need to pass the appropriate JDBC driver library explicitly via the jdbc_driver_library setting. You can download it from the official MySQL website or from the Maven repository.

2) Obviously, a working MySQL database 🙂

For demonstration purposes I will use the MySQL database “ecommerce“, MySQL user “roopendra“, password “roopendra“, and table “products“.
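For reference, here is a minimal sketch of what such a products table might look like. Only the database, table name, and the pid primary key (used later as the Elasticsearch document id) come from this setup; the other columns (name, category, price) are assumptions for illustration:

```
-- Hypothetical schema for the demo table; adjust columns to your own data.
CREATE TABLE products (
  pid      INT          NOT NULL AUTO_INCREMENT,  -- primary key, used as the document id later
  name     VARCHAR(255) NOT NULL,                 -- assumed column
  category VARCHAR(64)  NOT NULL,                 -- assumed column, filtered on by the jdbc statement
  price    DECIMAL(10,2),                         -- assumed column
  PRIMARY KEY (pid)
);
```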

Here are the steps to install the Logstash JDBC plugin.

Step 1: Install the Logstash JDBC input plugin
Check your Logstash installation path. On my CentOS machine it is /opt/logstash. The command to install the Logstash plugin is:

sudo /opt/logstash/bin/plugin install logstash-input-jdbc

Before moving on to the next step, I am assuming you have downloaded the JDBC driver and have all the MySQL details that the JDBC plugin needs:

MySQL Database: ecommerce
MySQL Table: products
MySQL User: roopendra
MySQL Password: roopendra
JDBC Library Path: /home/roop/logstash/mysql-connector-java-5.1.36.jar

Step 2: Prepare the Logstash JDBC configuration file

I have created logstash-jdbc.conf in the /home/roop/logstash directory:

/home/roop/logstash/logstash-jdbc.conf

input {
  jdbc {
    jdbc_driver_library => "/home/roop/logstash/mysql-connector-java-5.1.36.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ecommerce"
    jdbc_user => "roopendra"
    jdbc_password => "roopendra"
    parameters => { "product_category" => "electronics" }
    schedule => "* * * * *"
    statement => "select * from products where category = :product_category"
  }
}
output {
    elasticsearch {
        index => "products"
        document_type => "product"
        document_id => "%{pid}"
        hosts => ["localhost:9200"]
    }
}
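Note that the configuration above re-runs the full query every minute and re-indexes every matching row. If you only want to pick up new rows on each run, the jdbc input also supports remembering a column value between runs via use_column_value, tracking_column, and the built-in :sql_last_value parameter. A sketch, assuming pid is an ever-increasing primary key:

```
input {
  jdbc {
    jdbc_driver_library => "/home/roop/logstash/mysql-connector-java-5.1.36.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ecommerce"
    jdbc_user => "roopendra"
    jdbc_password => "roopendra"
    schedule => "* * * * *"
    use_column_value => true    # substitute the tracking column's last value for :sql_last_value
    tracking_column => "pid"    # remember the highest pid seen so far between runs
    statement => "select * from products where pid > :sql_last_value"
  }
}
```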
Input plugin parameters:

jdbc_driver_library: Full path of the JDBC driver library.
jdbc_driver_class: JDBC driver class to load; for MySQL it is com.mysql.jdbc.Driver.
jdbc_connection_string: JDBC MySQL connection string.
jdbc_user: MySQL user.
jdbc_password: MySQL user password.
schedule: How often to run the input statement; here, every minute (cron-like syntax).
statement: MySQL statement to execute.
parameters: Parameters to pass into the MySQL statement.

Output plugin parameters:

index: Elasticsearch index name.
document_type: Elasticsearch index type.
document_id: pid is the primary key column of my table, which stores the product id; change this to your own id column.
hosts: Elasticsearch host and port. If you have multiple hosts, you can pass comma-separated host:port values here.

Step 3: Test and run the Logstash JDBC configuration

Test configuration:

sudo /opt/logstash/bin/logstash --configtest -f /home/roop/logstash/logstash-jdbc.conf

Output:

/home/roop/logstash/logstash-jdbc.conf
Configuration OK

Run Configuration:

sudo /opt/logstash/bin/logstash -f /home/roop/logstash/logstash-jdbc.conf

Output:

Settings: Default filter workers: 1
Logstash startup completed

Step 4: Verify Records in Elasticsearch

curl -XGET 'localhost:9200/products/product/_search?pretty=true'
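If the search above returns no documents, a handy debugging aid is to print each event Logstash emits to the console. Adding a stdout output with the rubydebug codec alongside the elasticsearch output in logstash-jdbc.conf shows exactly what is being indexed:

```
output {
    stdout { codec => rubydebug }   # print each indexed event to the console
}
```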

In this article I have covered how to index MySQL data into Elasticsearch. We have stored MySQL table records in the Elasticsearch index products under the type product. Similarly, you can replace the MySQL statement with your own complex query and parameters. Another benefit of the Logstash JDBC plugin is that we do not need to restart the Elasticsearch node after installing it, whereas with rivers we had to restart the Elasticsearch node after installation.

Source: https://www.elastic.co/blog/deprecating-rivers
