= PGroonga\n&\nZulip

: author
   Kouhei Sutou
: institution
   ClearCode Inc.
: content-source
   Zulip & PGroonga Night
: date
   2017-09-06
: allotted-time
   30m
: theme
   .

= PGroonga

  * Pronunciation: píːzí:lúnɡά\n
    (('note:読み方：ぴーじーるんが'))

  * PostgreSQL extension\n
    (('note:PostgreSQLの拡張機能'))
    * Fast full text search\n
      (('note:高速全文検索機能'))
    * All languages are supported!\n
      (('note:全言語対応！'))

= Fast?(('note:（高速？）'))

  * Need to measure to confirm\n
    (('note:確認するには測定しないと'))
  * Targets(('note:（測定対象）'))
    * textsearch (built-in)(('note:（組み込み）'))
    * pg_bigm (third party)(('note:（外部プロダクト）'))

= PGroona and textsearch

  # image
  # src = images/search-pgroonga-textsearch.pdf
  # relative_height = 100

= As fast as textsearch\n(('note:textsearchと同じくらいの速さ'))

  * textsearch uses word based full text search\n
    (('note:textsearchは単語ベースの全文検索実装'))
  * PostgreSQL has enough performance for the approach\n
    (('note:PostgreSQLはこの方法では十分な性能を出せる'))

= textsearch and Japanese\n(('note:textsearchと日本語'))

  * Asian languages including Japanese aren't supported\n
    (('note:日本語を含むアジア圏の言語は非サポート'))
    * Need plugin(('note:（プラグインが必要）'))
    * Plugin exists but isn't maintained\n
      (('note:プラグインはあるがメンテナンスされていない'))

= Japanese support\n(('note:日本語対応'))

  * Need one of them(('note:（どちらかが必要）'))
    * N-gram approach support\n
      (('note:N-gramというやり方のサポート'))
    * Japanese specific word based approach support\n
      (('note:日本語を考慮した単語ベースのやり方のサポート'))

(('tag:center'))
PGroonga supports both of them\n
🕺

= PostgreSQL and N-gram\n(('note:PostgreSQLとN-gram'))

  * PostgreSQL is slow with N-gram approach\n
    (('note:PostgreSQLでN-gramというやり方を使うと遅い'))
  * N-gram approach:
    * pg_trgm (contrib)
      * Japanese isn't supported by default\n
        (('note:デフォルトでは日本語に対応していない'))
    * pg_bigm (third-party)

= PGroona and pg_bigm

  # image
  # src = images/search-pgroonga-pg-bigm.pdf
  # relative_height = 100

= PGroonga is fast stably\n(('note:PGroongaは安定して速い'))

  * PostgreSQL needs "recheck" for N-gram approach\n
    (('note:PostgreSQLはN-gramのときは「recheck」が必要'))
    * Seq search after index search\n
      (('note:インデックスサーチのあとにシーケンシャルサーチ'))
  * PGroonga doesn't need\n(('note:PGroongaでは必要ない'))
    * Only index search\n
      (('note:インデックスサーチだけでOK'))

= Wrap up\n(('note:まとめ'))

  * textsearch is fast but Asian langs aren't supported\n
    (('note:textsearchは速いけどアジア圏の言語を未サポート'))
  * pg_bigm supports Japanese but is slow for large hits\n
    (('note:pg_bigmは日本語対応だがヒット数が多くなると遅い'))
  * PGroonga is fast and supports all languages\n
    (('note:PGroongaは速くて全言語対応'))

= FYI: textsearch, PGroonga and Groonga

  # image
  # src = images/search-pgroonga-groonga-textsearch.pdf
  # relative_height = 100

= Zulip and PGroonga

  * Zulip uses textsearch by default\n
    (('note:Zulipはデフォルトでtextsearchを使用'))
    * Japanese isn't supported\n
      (('note:日本語非対応'))
  * Zulip supports PGroonga as option\n
    (('note:ZulipでPGroongaも使うこともできる'))
    * Implemented by me\n
      (('note:私が実装'))

= Zulip: full text search\n(('note:Zulipと全文検索'))

  * Zulip is chat tool\n
    (('note:Zulipはチャットツール'))
    * Latency is important for UX\n
      (('note:UX的にレイテンシーは大事'))
  * Index update is heavy\n
    (('note:インデックスの更新は重い'))
    * Delay index update\n
      (('note:インデックスの更新を後回しにしている'))

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay sql
  CREATE TABLE zerver_message (
    rendered_content text,
    -- ... ↓Column for full text search
    search_tsvector tsvector
  ); -- ↓Index for full text search
  CREATE INDEX zerver_message_search_tsvector
    ON zerver_message
    USING gin (search_tsvector);

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay sql

  -- Execute append_to_fts_update_log() on change
  CREATE TRIGGER
    zerver_message_update_search_tsvector_async
    BEFORE INSERT OR UPDATE OF rendered_content
      ON zerver_message
    FOR EACH ROW
      EXECUTE PROCEDURE append_to_fts_update_log();

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay sql

  -- Insert ID to fts_update_log table
  CREATE FUNCTION append_to_fts_update_log()
    RETURNS trigger
    LANGUAGE plpgsql AS $$
      BEGIN
        INSERT INTO fts_update_log (message_id)
          VALUES (NEW.id);
        RETURN NEW;
      END
  $$;

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay sql

  -- Keep ID to be updated
  CREATE TABLE fts_update_log (
    id SERIAL PRIMARY KEY,
    message_id INTEGER NOT NULL
  );

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay sql

  -- Execute do_notify_fts_update_log()
  -- on INSERT
  CREATE TRIGGER fts_update_log_notify
    AFTER INSERT ON fts_update_log
    FOR EACH STATEMENT
      EXECUTE PROCEDURE
        do_notify_fts_update_log();

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay sql

  -- NOTIFY to fts_update_log channel!
  CREATE FUNCTION do_notify_fts_update_log()
    RETURNS trigger
    LANGUAGE plpgsql AS $$
      BEGIN
        NOTIFY fts_update_log;
        RETURN NEW;
      END
    $$;

= Delay index update\n(('note:インデックス更新を後回し'))

  # coderay python

  cursor.execute("LISTEN ftp_update_log") # Wait
  cursor.execute("SELECT id, message_id FROM fts_update_log")
  ids = []
  for (id, message_id) in cursor.fetchall():
    cursor.execute("UPDATE zerver_message SET search_tsvector = "
                     "to_tsvector('zulip.english_us_search', "
                                 "rendered_content) "
                   "WHERE id = %s", (message_id,))
    ids.append(id)
  cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
                 (ids,))

= PGroonga: index update\n(('note:PGroongaとインデックス更新'))

  * PGroonga's index update is fast too\n
    (('note:PGroongaはインデックス更新も速い'))
  * PGroonga's search while index update is still fast\n
    (('note:PGroongaはインデックス更新中の検索も速い'))

= Perf characteristics\n(('note:性能の傾向'))

  # image
  # src = images/performance-charcteristic-for-constant-read-and-write.svg
  # relative_height = 100

= Update and lock\n(('note:更新とロック'))

  * Update without ((*read*)) locks\n
    (('note:((*参照*))ロックなしで更新'))
    * ((*Write*)) locks are required\n
      (('note:((*書き込み*))ロックは必要'))

= GIN: Read/Write\n(('note:GIN：読み書き'))

  # image
  # src = images/read-while-write-gin.svg
  # relative_height = 100

= PGroonga: Read/Write\n(('note:PGroonga：読み書き'))

  # image
  # src = images/read-while-write-pgroonga.svg
  # relative_height = 100

= Wrap up\n(('note:まとめ'))

  * Zulip: Low latency for UX\n
    (('note:ZulipはUXのために低レイテンシーをがんばっている'))
    * Delay index update\n
      (('note:インデックスの更新は後回し'))
  * PGroonga: Keeps fast search with update\n
    (('note:PGroongaは更新しながらでも高速検索を維持'))
    * Chat friendly characteristics\n
      (('note:チャット向きの特性'))

= More PGroonga features\n(('note:PGroongaの機能いろいろ'))

  * Query expansion(('note:（クエリー展開）'))
    * Support synonyms(('note:（同義語検索をサポート）'))
  * Similar search(('note:（類似文書検索）'))
    * Find similar messages\n
      (('note:類似メッセージ検索'))
  * Fuzzy search(('note:（あいまい検索）'))
  * Stemming(('note:（ステミング）'))