Index MySQL Data into Elasticsearch 5.X
This article is a step-by-step guide to indexing MySQL data into Elasticsearch. Up to version 1.5, Elasticsearch offered various rivers to index MySQL data, but rivers have been deprecated since Elasticsearch 1.5.
Elasticsearch now provides support, through Logstash, to import data from any database that exposes a JDBC interface. So instead of using an external client, we have native support to index MySQL data into Elasticsearch.
Why Did Elasticsearch Deprecate Rivers?
The Elasticsearch team said about this:
“By their nature, rivers deal with external systems, and those external systems require external libraries to work with. Those are great to use, but they come with an overhead. Part of it is built in overhead, things like additional memory usage, more sockets, file descriptors and so on. Others, sadly, are bugs.”
In order to import MySQL data into Elasticsearch, we need to install Logstash first. If you are installing it on a Linux machine, you can refer to my earlier post on installing Logstash on Linux:
How to install Logstash in Linux
Once you are done with the Logstash installation, the next step is to install the Logstash JDBC input plugin.
Prerequisites:
1) Drivers: The Logstash JDBC plugin does not bundle JDBC driver libraries. We need to pass the appropriate JDBC library explicitly via the jdbc_driver_library configuration option. You can download the MySQL Connector/J driver from the official MySQL website or from the Maven repository.
2) Obviously, a working MySQL database 🙂
For demonstration purposes I will use the MySQL database "ecommerce", MySQL user "roopendra", password "roopendra", and table "products".
Here are the steps to install the Logstash JDBC plugin.
Step 1:
Check your Logstash installation path. On my CentOS machine it is /opt/logstash, so the command to install the Logstash plugin is:
sudo /opt/logstash/bin/plugin install logstash-input-jdbc
Before moving to the next step, I am assuming you have downloaded the JDBC driver and are ready with all the MySQL details we need for the JDBC plugin:
MySQL Database: ecommerce
MySQL Table: products
MySQL User: roopendra
MySQL Password: roopendra
JDBC Library Path: /home/roop/logstash/mysql-connector-java-5.1.36.jar
Step 2: Prepare the Logstash JDBC configuration file
I have created logstash-jdbc.conf in the /home/roop/logstash directory:
/home/roop/logstash/logstash-jdbc.conf
input {
  jdbc {
    jdbc_driver_library => "/home/roop/logstash/mysql-connector-java-5.1.36.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ecommerce"
    jdbc_user => "roopendra"
    jdbc_password => "roopendra"
    parameters => { "product_category" => "electronics" }
    schedule => "* * * * *"
    statement => "SELECT * FROM products WHERE category = :product_category"
  }
}
output {
  elasticsearch {
    index => "products"
    document_type => "product"
    document_id => "%{pid}"   # pid is my table's primary key column; change this to your id column
    hosts => ["localhost:9200"]
  }
}
| Parameter | Description |
|---|---|
| jdbc_driver_library | Full path of the JDBC library |
| jdbc_driver_class | JDBC driver class to load; for MySQL it is com.mysql.jdbc.Driver |
| jdbc_connection_string | JDBC MySQL connection string |
| jdbc_user | MySQL user |
| jdbc_password | MySQL user password |
| schedule | Run the input statement on a cron-like schedule (here, every minute); see the available scheduling options |
| statement | MySQL statement to execute |
| parameters | Parameters to pass into the MySQL statement |
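As a side note, the schedule option follows a cron-like (rufus-scheduler) syntax, so you are not limited to polling every minute. A small illustrative tweak: to run the statement every five minutes instead, the schedule would be:

```
schedule => "*/5 * * * *"
```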
| Parameter | Description |
|---|---|
| index | Elasticsearch index name |
| document_type | Elasticsearch index type |
| document_id | pid is the primary key column of my table, which stores the product id; change this to your own id column |
| hosts | Elasticsearch host and port. If you have multiple hosts, you can pass comma-separated host:port values here |
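For example, with a two-node cluster (the hostnames here are purely illustrative), the hosts directive in the output section would look like:

```
output {
  elasticsearch {
    hosts => ["es-node1:9200", "es-node2:9200"]
    index => "products"
  }
}
```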
Step 3: Test and run the Logstash JDBC configuration
Test configuration:
sudo /opt/logstash/bin/logstash --configtest -f /home/roop/logstash/logstash-jdbc.conf
Output:
/home/roop/logstash/logstash-jdbc.conf Configuration OK
Run Configuration:
sudo /opt/logstash/bin/logstash -f /home/roop/logstash/logstash-jdbc.conf
Output:
Settings: Default filter workers: 1
Logstash startup completed
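While testing, it can help to also print each indexed event to the console. A minimal sketch: add a stdout output with the rubydebug codec alongside the elasticsearch output in the same configuration file, then remove it once the pipeline is working:

```
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "products"
    document_type => "product"
    document_id => "%{pid}"
    hosts => ["localhost:9200"]
  }
}
```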
Step 4: Verify Records in Elasticsearch
curl -XGET 'localhost:9200/products/product/_search?pretty=true'
In this article I have covered how to index MySQL data into Elasticsearch. We stored MySQL table records in the Elasticsearch index products under the type product. Similarly, you can replace the MySQL statement with your own complex query and parameters. Another benefit of the Logstash JDBC plugin is that we don't need to restart the Elasticsearch node after installing the plugin, whereas with a river we had to restart the Elasticsearch node after installation.
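Building on that, the JDBC input plugin also supports incremental imports: with use_column_value and tracking_column it remembers the last value it saw and exposes it as :sql_last_value in the next scheduled run. A sketch, assuming the products table has a timestamp column named modified_at (adjust the column name to your schema):

```
input {
  jdbc {
    jdbc_driver_library => "/home/roop/logstash/mysql-connector-java-5.1.36.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/ecommerce"
    jdbc_user => "roopendra"
    jdbc_password => "roopendra"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "modified_at"
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM products WHERE modified_at > :sql_last_value"
  }
}
```

With this, each run only fetches rows modified since the previous run instead of re-importing the whole table.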
Source: https://www.elastic.co/blog/deprecating-rivers
Does anyone know how to specify a parent-child relationship using Logstash?
Hello Anish, I think you are talking about the parent-child implementation in Elasticsearch.
Please refer to this documentation: https://qbox.io/blog/parent-child-relationships-in-elasticsearch
For Logstash version 2.3, it's worth noting that the syntax for the 'hosts' directive requires square brackets:
hosts => “localhost:9200”
needs to be:
hosts => [“localhost:9200”]
Details are here:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-hosts
Mark
The current value in the blog is still incorrect. I've tried this on Logstash 5.1.2, and there is a difference between:
“[localhost:9200]” and [“localhost:9200”]
Good catch. I corrected it. Thanks.
How do you configure multiple tables in Logstash?
How do you add more input JDBC parameters, like a 'modified' date which can have a dynamic value such as today's date?