= PGroonga\n&\nZulip : author Kouhei Sutou : institution ClearCode Inc. : content-source Zulip & PGroonga Night : date 2017-09-06 : allotted-time 30m : theme . = PGroonga * Pronunciation: píːzí:lúnɡά\n (('note:読み方:ぴーじーるんが')) * PostgreSQL extension\n (('note:PostgreSQLの拡張機能')) * Fast full text search\n (('note:高速全文検索機能')) * All languages are supported!\n (('note:全言語対応!')) = Fast?(('note:(高速?)')) * Need to measure to confirm\n (('note:確認するには測定しないと')) * Targets(('note:(測定対象)')) * textsearch (built-in)(('note:(組み込み)')) * pg_bigm (third party)(('note:(外部プロダクト)')) = PGroona and textsearch # image # src = images/search-pgroonga-textsearch.pdf # relative_height = 100 = As fast as textsearch\n(('note:textsearchと同じくらいの速さ')) * textsearch uses word based full text search\n (('note:textsearchは単語ベースの全文検索実装')) * PostgreSQL has enough performance for the approach\n (('note:PostgreSQLはこの方法では十分な性能を出せる')) = textsearch and Japanese\n(('note:textsearchと日本語')) * Asian languages including Japanese aren't supported\n (('note:日本語を含むアジア圏の言語は非サポート')) * Need plugin(('note:(プラグインが必要)')) * Plugin exists but isn't maintained\n (('note:プラグインはあるがメンテナンスされていない')) = Japanese support\n(('note:日本語対応')) * Need one of them(('note:(どちらかが必要)')) * N-gram approach support\n (('note:N-gramというやり方のサポート')) * Japanese specific word based approach support\n (('note:日本語を考慮した単語ベースのやり方のサポート')) (('tag:center')) PGroonga supports both of them\n 🕺 = PostgreSQL and N-gram\n(('note:PostgreSQLとN-gram')) * PostgreSQL is slow with N-gram approach\n (('note:PostgreSQLでN-gramというやり方を使うと遅い')) * N-gram approach: * pg_trgm (contrib) * Japanese isn't supported by default\n (('note:デフォルトでは日本語に対応していない')) * pg_bigm (third-party) = PGroona and pg_bigm # image # src = images/search-pgroonga-pg-bigm.pdf # relative_height = 100 = PGroonga is fast stably\n(('note:PGroongaは安定して速い')) * PostgreSQL needs "recheck" for N-gram approach\n (('note:PostgreSQLはN-gramのときは「recheck」が必要')) * Seq search after index search\n (('note:インデックスサーチのあとにシーケンシャルサーチ')) * PGroonga doesn't need\n(('note:PGroongaでは必要ない')) * Only index search\n (('note:インデックスサーチだけでOK')) = Wrap up\n(('note:まとめ')) * textsearch is fast but Asian langs aren't supported\n (('note:textsearchは速いけどアジア圏の言語を未サポート')) * pg_bigm supports Japanese but is slow for large hits\n (('note:pg_bigmは日本語対応だがヒット数が多くなると遅い')) * PGroonga is fast and supports all languages\n (('note:PGroongaは速くて全言語対応')) = FYI: textsearch, PGroonga and Groonga # image # src = images/search-pgroonga-groonga-textsearch.pdf # relative_height = 100 = Zulip and PGroonga * Zulip uses textsearch by default\n (('note:Zulipはデフォルトでtextsearchを使用')) * Japanese isn't supported\n (('note:日本語非対応')) * Zulip supports PGroonga as option\n (('note:ZulipでPGroongaも使うこともできる')) * Implemented by me\n (('note:私が実装')) = Zulip: full text search\n(('note:Zulipと全文検索')) * Zulip is chat tool\n (('note:Zulipはチャットツール')) * Latency is important for UX\n (('note:UX的にレイテンシーは大事')) * Index update is heavy\n (('note:インデックスの更新は重い')) * Delay index update\n (('note:インデックスの更新を後回しにしている')) = Delay index update\n(('note:インデックス更新を後回し')) # coderay sql CREATE TABLE zerver_message ( rendered_content text, -- ... ↓Column for full text search search_tsvector tsvector ); -- ↓Index for full text search CREATE INDEX zerver_message_search_tsvector ON zerver_message USING gin (search_tsvector); = Delay index update\n(('note:インデックス更新を後回し')) # coderay sql -- Execute append_to_fts_update_log() on change CREATE TRIGGER zerver_message_update_search_tsvector_async BEFORE INSERT OR UPDATE OF rendered_content ON zerver_message FOR EACH ROW EXECUTE PROCEDURE append_to_fts_update_log(); = Delay index update\n(('note:インデックス更新を後回し')) # coderay sql -- Insert ID to fts_update_log table CREATE FUNCTION append_to_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN INSERT INTO fts_update_log (message_id) VALUES (NEW.id); RETURN NEW; END $$; = Delay index update\n(('note:インデックス更新を後回し')) # coderay sql -- Keep ID to be updated CREATE TABLE fts_update_log ( id SERIAL PRIMARY KEY, message_id INTEGER NOT NULL ); = Delay index update\n(('note:インデックス更新を後回し')) # coderay sql -- Execute do_notify_fts_update_log() -- on INSERT CREATE TRIGGER fts_update_log_notify AFTER INSERT ON fts_update_log FOR EACH STATEMENT EXECUTE PROCEDURE do_notify_fts_update_log(); = Delay index update\n(('note:インデックス更新を後回し')) # coderay sql -- NOTIFY to fts_update_log channel! CREATE FUNCTION do_notify_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN NOTIFY fts_update_log; RETURN NEW; END $$; = Delay index update\n(('note:インデックス更新を後回し')) # coderay python cursor.execute("LISTEN ftp_update_log") # Wait cursor.execute("SELECT id, message_id FROM fts_update_log") ids = [] for (id, message_id) in cursor.fetchall(): cursor.execute("UPDATE zerver_message SET search_tsvector = " "to_tsvector('zulip.english_us_search', " "rendered_content) " "WHERE id = %s", (message_id,)) ids.append(id) cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)", (ids,)) = PGroonga: index update\n(('note:PGroongaとインデックス更新')) * PGroonga's index update is fast too\n (('note:PGroongaはインデックス更新も速い')) * PGroonga's search while index update is still fast\n (('note:PGroongaはインデックス更新中の検索も速い')) = Perf characteristics\n(('note:性能の傾向')) # image # src = images/performance-charcteristic-for-constant-read-and-write.svg # relative_height = 100 = Update and lock\n(('note:更新とロック')) * Update without ((*read*)) locks\n (('note:((*参照*))ロックなしで更新')) * ((*Write*)) locks are required\n (('note:((*書き込み*))ロックは必要')) = GIN: Read/Write\n(('note:GIN:読み書き')) # image # src = images/read-while-write-gin.svg # relative_height = 100 = PGroonga: Read/Write\n(('note:PGroonga:読み書き')) # image # src = images/read-while-write-pgroonga.svg # relative_height = 100 = Wrap up\n(('note:まとめ')) * Zulip: Low latency for UX\n (('note:ZulipはUXのために低レイテンシーをがんばっている')) * Delay index update\n (('note:インデックスの更新は後回し')) * PGroonga: Keeps fast search with update\n (('note:PGroongaは更新しながらでも高速検索を維持')) * Chat friendly characteristics\n (('note:チャット向きの特性')) = More PGroonga features\n(('note:PGroongaの機能いろいろ')) * Query expansion(('note:(クエリー展開)')) * Support synonyms(('note:(同義語検索をサポート)')) * Similar search(('note:(類似文書検索)')) * Find similar messages\n (('note:類似メッセージ検索')) * Fuzzy search(('note:(あいまい検索)')) * Stemming(('note:(ステミング)'))