README.md in fluent-plugin-sql-0.2.1 vs README.md in fluent-plugin-sql-0.2.2

- old
+ new

@@ -1,21 +1,21 @@
 # SQL input plugin for Fluentd event collector

 ## Overview

-This sql input plugin reads records from a RDBMS periodically. Thus you can replicate tables to other storages through Fluentd.
+This sql input plugin reads records from a RDBMS periodically. Thus you can copy tables to other storages through Fluentd.

 ## How does it work?

-This plugin runs following SQL repeatedly every 60 seconds to *tail* a table like `tail` command of UNIX.
+This plugin runs following SQL periodically:

 SELECT * FROM *table* WHERE *update\_column* > *last\_update\_column\_value* ORDER BY *update_column* ASC LIMIT 500

-What you need to configure is *update\_column*. The column needs to be updated every time when you update the row so that this plugin detects newly updated rows. Generally, the column is a timestamp such as `updated_at`.
-If you omit to set the column, it uses primary key. And this plugin can't detect updated but it only reads newly inserted rows.
+What you need to configure is *update\_column*. The column should be an incremental column (such as AUTO\_INCREMENT primary key) so that this plugin reads newly INSERTed rows. Alternatively, you can use a column incremented every time when you update the row (such as `last_updated_at` column) so that this plugin reads the UPDATEd rows as well.
+If you omit to set *update\_column* parameter, it uses primary key.

-It stores last selected rows to a file named state\_file to not forget the last row when fluentd restarted.
+It stores last selected rows to a file (named *state\_file*) to not forget the last row when Fluentd restarts.

 ## Configuration

 <source>
   type sql

@@ -24,29 +24,29 @@
   database rdb_database
   adapter mysql2_or_postgresql_etc
   user myusername
   password mypassword
-  tag_prefix my.rdb
+  tag_prefix my.rdb # optional, but recommended

-  select_interval 60s
-  select_limit 500
+  select_interval 60s # optional
+  select_limit 500 # optional

   state_file /var/run/fluentd/sql_state

   <table>
-    tag table1
     table table1
+    tag table1 # optional
     update_column update_col1
-    time_column time_col2
+    time_column time_col2 # optional
   </table>

   <table>
-    tag table2
     table table2
+    tag table2 # optional
     update_column updated_at
-    time_column updated_at
+    time_column updated_at # optional
   </table>

   # detects all tables instead of <table> sections
   #all_tables

 </source>

@@ -65,8 +65,13 @@
 \<table\> sections:

 * **tag** tag name of events (optional; default value is table name)
 * **table** RDBM table name
-* **update_column**
-* **time_column** (optional)
+* **update_column**: see above description
+* **time_column** (optional): if this option is set, this plugin uses this column's value as the the event's time. Otherwise it uses current time.
+## Limitation
+
+You should make sure target tables have index (and/or partitions) on the *update\_column*. Otherwise SELECT causes full table scan and serious performance problem.
+
+You can't replicate DELETEd rows.
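
Editor's note: the polling behaviour described under "How does it work?" and the two points under "Limitation" can be illustrated with a small sketch. This is not the plugin's code (the plugin is written in Ruby); it is a minimal Python approximation using the standard `sqlite3` module. The function names `load_state` and `poll_once`, the JSON state-file format, the index name, and the choice to read from the start of the table when no state exists are all assumptions made for the example.

```python
import json
import os
import sqlite3
import tempfile


def load_state(path):
    """Return the saved {table: last_value} mapping, or {} on first run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def poll_once(conn, table, update_column, state_file, limit=500):
    """Run one incremental SELECT and remember the last value seen.

    Hypothetical helper: mirrors the query shape quoted in the README,
    not the plugin's real implementation.
    """
    state = load_state(state_file)
    last = state.get(table)

    # Identifiers cannot be bound as parameters, so they are interpolated.
    # Real code would have to validate them against the schema first.
    sql = f"SELECT * FROM {table}"
    params = []
    if last is not None:
        sql += f" WHERE {update_column} > ?"
        params.append(last)
    sql += f" ORDER BY {update_column} ASC LIMIT ?"
    params.append(limit)

    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    rows = [dict(zip(names, r)) for r in cur.fetchall()]

    if rows:
        # Persist the highest value so a restart resumes from here.
        state[table] = rows[-1][update_column]
        with open(state_file, "w") as f:
            json.dump(state, f)
    return rows


# Demo data: updated_at is a plain integer counter (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, body TEXT, updated_at INTEGER)"
)
# Index the update column so each poll avoids a full table scan.
conn.execute("CREATE INDEX idx_table1_updated_at ON table1 (updated_at)")
conn.executemany(
    "INSERT INTO table1 (body, updated_at) VALUES (?, ?)", [("a", 1), ("b", 2)]
)
state_file = os.path.join(tempfile.mkdtemp(), "sql_state.json")

first = poll_once(conn, "table1", "updated_at", state_file)   # both rows
conn.execute("UPDATE table1 SET body = 'a2', updated_at = 3 WHERE id = 1")
second = poll_once(conn, "table1", "updated_at", state_file)  # the updated row
conn.execute("DELETE FROM table1 WHERE id = 2")
third = poll_once(conn, "table1", "updated_at", state_file)   # nothing new
```

In this sketch the second poll returns only the row whose `updated_at` was bumped, and the third poll returns nothing even though a row was deleted, which is the behaviour the "Limitation" section warns about: a query of this shape can only see rows that still exist and whose update column has advanced.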