= Three Ruby usages : subtitle Inside Droonga : author Kouhei Sutou : institution ClearCode Inc. : content-source RubyKaigi 2014 : date 2014/09/20 : allotted-time 25m : theme . = Silver sponsor # image # src = images/clear-code-is-silver-sponsor.png # relative_width = 100 == slide property : enable-title-on-image false = Goal * You know three Ruby usages * High-level interface * Glue * Embed * You can remember them later = Targets * High-level interface * Pure Rubyists * Glue * Rubyists who can write C/C++ * Embed * Rubyists who also write C/C++ = Case study # blockquote Implement distributed full-text search engine in Ruby (('note:Abbreviation: DFTSE = Distributed Full-Text Search Engine')) = DFTSE? # image # src = images/distributed-full-text-search-engine.svg # relative_width = 80 == slide property : enable-title-on-image false = Why do we use DFTSE? I'm developing Droonga\n (('note:(A DFTSE implementation in Ruby)'))\n 😃 = High-level interface (('tag:center'))Three Ruby usages * ((*High-level interface*)) * Target: Pure Rubyists * Glue * Embed = High-level interface * Provides\n lower layer feature to\n higher layer * With simpler/convenience API = High-level interface # image # src = images/high-level-interface.svg # relative_height = 95 == slide property : enable-title-on-image false = Example # image # src = images/high-level-interface-examples.svg # relative_width = 100 == slide property : enable-title-on-image false = Droonga: High-level IF (('tag:center'))DFTSE components * Full-text search engine * ((*Messaging system*)) * Cluster management * Process management = Messaging system # image # src = images/droonga-messaging-system.svg # relative_width = 80 == slide property : enable-title-on-image false = Messaging system * Provides\n distributed search feature * Plan how to search * Distribute requests * Merge responses * Users don't know details = Characteristic * Plan how to search * May speed up/down over 100 times * Distribute requests * Network bound operation * Merge responses * CPU and network bound operation = Point * Algorithm is important * Need to find new/existing better algorithm * "Rapid prototype and measure" feedback loop is helpful * Ruby is good at rapid dev. = Glue (('tag:center'))Three Ruby usages * High-level interface * ((*Glue*)) * Target:\n Rubyists who can write C/C++ * Embed = Glue # image # src = images/glue.svg # relative_width = 80 == slide property : enable-title-on-image false = Example # image # src = images/glue-examples.svg # relative_width = 80 == slide property : enable-title-on-image false = Why do we glue? * Reuse existing features = How to glue * Use external library * (('wait'))Implement bindings (('note:(mysql2 gem)')) * Use external command * (('wait'))Spawn command (('note:(Vagrant)')) * Use external service * (('wait'))Implement client = Glue in Droonga * ((*Rroonga*)): Groonga bindings * Groonga: FTSE C library (('note:(and server)')) * Cool.io: libev bindings * libev: Event loop C library\n (('note:(Based on I/O multiplexing and non-blocking I/O)')) * Serf: Clustering tool (('note:(in Droonga)')) = Rroonga in Droonga # image # src = images/droonga-rroonga.svg # relative_width = 80 == slide property : enable-title-on-image false = FTSE in Droonga * Must be fast! * CPU bound processing = For fast Rroonga * ((*Do heavy processing in C*)) * Nice to have Ruby-ish API * (('wait'))Less memory allocation * Cache internal buffer * (('wait'))Multiprocessing * Groonga supports multiprocessing = Search # coderay ruby Groonga::Database.open(ARGV[0]) entries = Groonga["Entries"] entries.select do |record| record.description =~ "Ruby" end = Search - Pure Ruby (ref) # coderay ruby Groonga::Database.open(ARGV[0]) entries = Groonga["Entries"] entries.find_all do |record| # This block is evaluated for each record /Ruby/ =~ record.description end = Search impl. # coderay ruby # (2) Evaluate expression in C entries.select do |record| # (1) Build expression in Ruby # This block is evaluated only once record.description =~ "Ruby" end = Search impl. - Fig. # image # src = images/rroonga-search.svg # relative_width = 80 == slide property : enable-title-on-image false = Search - Benchmark * Ruby (('note:(It's already showed)')) * C = Search - C # coderay c grn_obj *expr; grn_obj *variable; const gchar *filter = "description @ \"Ruby\""; grn_obj *result; GRN_EXPR_CREATE_FOR_QUERY(&ctx, table, expr, variable); grn_expr_parse(&ctx, expr, filter, strlen(filter), NULL, GRN_OP_MATCH, GRN_OP_AND, GRN_EXPR_SYNTAX_SCRIPT); result = grn_table_select(&ctx, table, expr, NULL, GRN_OP_OR); grn_obj_unlink(&ctx, expr); grn_obj_unlink(&ctx, result); = Search - Benchmark (('tag:center'))Ruby impl. is fast enough 😃 # RT Impl., Elapsed time C, 0.6ms Ruby, 0.8ms (('tag:center'))\n (('note:(Full-text search with "Ruby" against 72632 records)')) = Embed (('tag:center'))Three Ruby usages * High-level interface * Glue * ((*Embed*)) * Target:\n Rubyists who also write C/C++ = Embed # image # src = images/embed.svg # relative_width = 90 == slide property : enable-title-on-image false = Examples # image # src = images/embed-examples.svg # relative_width = 90 == slide property : enable-title-on-image false = Embed in Droonga # image # src = images/droonga-mruby.svg # relative_height = 100 == slide property : enable-title-on-image false = CRuby vs. mruby * CRuby * Full featured! * Signal handler isn't needed 😞 * mruby * Multi-interpreters in a process! * You may miss some features 😞 = mruby in Groonga * Query optimizer * Command interface (plan) * High-level interface! * Plugin API (plan) = Query optimizer * Plan how to search * It's a bother 😞 * Light operation than FTS * Depends on data\n (('note:(Choose effective index, use table scan and so on)')) = Example rank < 200 && rank > 100 = Simple impl. # image # src = images/range-search-simple.svg # relative_height = 90 == slide property : enable-title-on-image false = Simple impl. * Slow against\n many out of range data = Optimized impl. # image # src = images/range-search-optimized.svg # relative_width = 90 == slide property : enable-title-on-image false = Is embedding reasonable? Measure = Measure * mruby overhead * Speed-up by optimization = Overhead (('tag:center'))Small overhead: Reasonable😃 # RT (('# conds')), mruby, Elapsed 1, ○, 0.24ms 1, ×, 0.16ms 4, ○, 0.45ms 4, ×, 0.19ms = Speed-up (('tag:center'))Fast for many data:Reasonable😃 # RT (('# records')), mruby, nom mruby 1000, 0.29ms, 0.31ms 10000, 0.31ms, 2.3ms 100000, 0.26ms, 21.1ms 1000000, 0.26ms, 210.2ms = Note * Embedding needs many works * Write bindings, import mruby your build system and ... * How to test your mruby part? * And how to debug? = Conclusion = Conclusion 1 * Describe three Ruby usages * (('wait'))High-level interface * (('wait'))Glue * (('wait'))Embed = Conclusion 2 * High-level interface * Target: Pure Rubyists * Provides lower layer feature to higher layer w/ usable interface * Ruby's flexibility is useful = Conclusion 3 * Glue * Target:\n Rubyists who can write C/C++ * Why: Reuse existing feature * To be fast, do the process in C = Conclusion 4 * Embed * Target:\n Rubyists who also write C/C++ * Why:\n Avoid bother programming by Ruby = Conclusion 5 * Embed * Is it reasonable for your case? * You need many works * Very powerful\n if your case is reasonable😃 = Announcement * ClearCode Inc. * A silver sponsor * Is recruiting * Will do readable code workshop * The next Groonga conference * It's held at 11/29