= PGroonga : subtitle Make PostgreSQL\nfast full text search platform\nfor all languages! : author Kouhei Sutou : institution ClearCode Inc. : content-source PGConf.ASIA 2016 : date 2016-12-03 : allotted-time 45m : theme . = PostgreSQL and me\n(('note:PostgreSQLと私')) Some my patches are merged\n (('note:いくつかパッチがマージされている')) = Patches\n(('note:パッチ')) * (('wait'))BUG #13840: pg_dump generates unloadable SQL\n (('note:pg_dumpがリストアできないSQLを出力する')) * (('wait'))BUG #14160: DROP ACCESS METHOD IF EXISTS isn't impl.\n (('note:DROP ACCESS METHOD IF EXISTSが実装されていない')) (('wait')) (('tag:center')) They are found while developing PGroonga\n (('note:どちらもPGroonga開発中に見つけた問題')) = PGroonga dev style\n(('note:PGroongaの開発スタイル')) * When there are problems in related projects including PostgreSQL\n (('note:PostgreSQLを含む関連プロジェクトに問題があった場合')) * (('wait'))We fix these problems in these projects instead of choosing workaround in PGroonga\n (('note:PGroonga側で回避するのではなく'))\n (('note:関連プロジェクトの方で問題を直す')) = PostgreSQL and FTS\n(('note:PostgreSQLと全文検索')) * PostgreSQL has out-of-the-box full text search feature\n (('note:PostgreSQLには組込の全文検索機能がある'))\n * (('wait'))It has some problems...\n (('note:ただ、いくつか問題がある')) * (('wait'))We fixed them by PGroonga\n (('note:PGroongaを開発することでそれらの問題を修正した')) * (('wait'))instead of fixing PostgreSQL 😓\n (('note:PostgreSQLを修正するのではなくて…')) = Because...\n(('note:理由は…')) (1) (('wait'))Our approach is different from PostgreSQL's approach\n (('note:PGroongaのやり方はPostgreSQLのやり方と違う')) (2) (('wait'))PG provides plugin system\n (('note:PostgreSQLはプラグインの仕組みを提供している')) * Implementing as a plugin is PostgreSQL way!\n (('note:プラグインでの実装はPostgreSQLらしいやり方!')) = PG FTS problem\n(('note:PostgreSQLの全文検索の問題')) (('tag:center')) (('tag:margin-bottom * 4')) Many langs aren't supported\n (('note:サポートしていない言語がたくさんある')) * (('wait'))e.g.: Asian languages\n (('note:例:アジアの言語')) * Japanese, Chinese and more\n (('note:日本語や中国語など')) = FTS for Japanese1\n(('note:日本語の全文検索1')) # coderay sql SELECT to_tsvector('japanese', 'こんにちは'); -- ERROR: text search configuration -- "japanese" does not exist -- LINE 2: to_tsvector('japanese', -- ^ = FTS for Japanese2\n(('note:日本語の全文検索2')) # coderay sql CREATE EXTENSION pg_trgm; SELECT 'こんにちは' % 'にちは'; -- ↑substring -- ?column? -- ---------- -- f ← Must be "t"! -- (1 row) = Existing solution\n(('note:既存の解決策')) (('tag:center')) pg_bigm = pg_bigm * An extension\n (('note:拡張機能')) * (('wait'))Similar to pg_trgm\n (('note:pg_trgmと似ている')) * Operator class for GIN\n (('note:GIN用の演算子クラス')) = pg_bigm: Usage\n(('note:pg_bigm:使い方')) # coderay sql CREATE INDEX index ON table USING gin (column gin_bigm_ops); -- ↑Use GIN ↑Specify op class = pg_bigm: Demerit\n(('note:pg_bigm:デメリット')) * Slow for large document\n (('note:文書が長いと遅い'))\n (('note:(Normally, we want to use FTS for large document)'))\n (('note:(普通は長い文書に対して全文検索したい)')) * (('wait'))Because it needs "recheck"\n (('note:「recheck」が必要だから')) = "recheck" * "Exact" seq. search after\n "loose" index search\n (('note:「ゆるい」インデックス検索の後に実行する'))\n (('note:「正確な」シーケンシャルサーチ')) * (('wait'))The larger text, the slower\n (('note:対象テキストが大きければ大きいほど遅くなる'))\n * (('wait'))text = doc size * N docs\n (('note:対象テキスト = 文書サイズ * 文書数')) = Benchmark\n(('note:ベンチマーク')) # image # src = images/search-pg-bigm.pdf # relative_height = 100 = New solution\n(('note:新しい解決策')) # image # src = images/pgroonga-logo.svg # relative_width = 100 = PGroonga * Pronunciation: píːzí:lúnɡά\n (('note:読み方:ぴーじーるんが')) * (('wait'))An extension\n (('note:拡張機能')) * (('wait'))Index and operator classes\n (('note:インデックスと演算子クラス')) * Not operator classes for GIN\n (('note:GINの演算子クラスではない')) = PGroonga layer # image # src = images/pgroonga-layer.svg # relative_width = 100 = Benchmark\n(('note:ベンチマーク')) # image # src = images/search-pgroonga-pg-bigm.pdf # relative_height = 100 = Wrap up1\n(('note:まとめ1')) * PostgreSQL doesn't support Asian languages\n (('note:PostgreSQLはアジアの言語をサポートしていない')) * pg_bigm and PGroonga support all languages\n (('note:pg_bigmとPGroongaはすべての言語をサポートしている')) = Wrap up2\n(('note:まとめ2')) * Many hits case:\n (('note:ヒット数が多い場合')) * pg_bigm is slow\n (('note:pg_bigmは遅い')) * PGroonga is fast\n (('note:PGroongaは速い')) = Why is PGroonga fast?\n(('note:PGroongaはどうして速いのか')) * Doesn't need "recheck"\n (('note:「recheck」が必要ないから')) * (('wait'))Is "recheck" really slow?\n (('note:本当に「recheck」が遅いの?')) * See one more benchmark result\n (('note:もう一つベンチマーク結果を見てみましょう')) = Benchmark\n(('note:ベンチマーク')) # image # src = images/search-pgroonga-pg-bigm-all.pdf # relative_height = 100 = Why is pg_bigm fast?\n(('note:pg_bigmはどうして速いのか')) * Query is "日本"\n (('note:クエリーは「日本」')) * Point: 2 characters\n (('note:ポイント:2文字')) * (('wait'))pg_bigm doesn't need "recheck" for 2 chars query\n (('note:pg_bigmは2文字のクエリーに「recheck」の必要がない')) * It means that "recheck" is slow\n (('note:つまり「recheck」が遅いということ')) = N-gram and "recheck"\n(('note:N-gramと「recheck」')) * N-gram approach needs "phrase search" when query has N or more characters\n (('note:N+1文字以上のクエリーには「フレーズ検索」が必要')) * N=2 for pg_bigm, N=3 for pg_trgm\n (('note:pg_bigmはN=2でpg_trgmはN=3')) * (('wait'))GIN needs "recheck" for "phrase search"\n (('note:GINは「フレーズ検索」には「recheck」が必要')) = Phrase search\n(('note:フレーズ検索')) * Phrase search is "token search" and "position check"\n (('note:フレーズ検索は「トークン検索」と「位置チェック」')) * Tokens must exist and be ordered\n (('note:トークンは同じ順序で出現していないといけない')) * OK: "car at" for "car at" query * NG: "at car" for "car at" query = N-gram and phrase search (1) Split text to tokens\n (('note:テキストをトークンに分割')) * "cat"→"ca","at" (2) ((*Search all tokens*))\n (('note:すべてのトークンを検索')) * "ca" and "at" exist: Candidate! (3) ((*Check appearance pos.*))\n (('note:出現位置をチェック')) * "ca" then "at": Found! = N-gram and GIN: Create\n(('note:N-gramとGIN:作成')) # image # src = images/gin-n-gram-index.svg # relative_height = 100 = N-gram and GIN: Search\n(('note:N-gramとGIN:検索')) # image # src = images/gin-n-gram-search.svg # relative_height = 100 = GIN and phrase search\n(('note:GINとフレーズ検索')) * Phrase search needs position check\n (('note:フレーズ検索では出現位置チェックが必要')) * GIN doesn't support position check\n (('note:GINは出現位置チェックをサポートしていない')) * →GIN needs "recheck"→Slow!\n (('note:GINでは「recheck」が必要だから遅い')) = Why is PGroonga fast?\n(('note:PGroongaはどうして速いのか')) * PGroonga uses N-gram (('note:by default'))\n (('note:PGroongaはデフォルトでN-gramを使っている')) * But doesn't need "recheck"\n (('note:PGroongaは「recheck」の必要がない')) = Why no "recheck"?\n(('note:どうして「recheck」が必要ないのか')) PGroonga uses\n ((*full*))\n inverted index\n (('note:PGroongaは((*完全*))転置インデックスを使っているから')) = Full inverted index\n(('note:完全転置インデックス')) Including position\n (('note:位置情報を含む')) = Inverted index diff\n(('note:転置インデックスの違い')) # image # src = images/inverted-index.svg # relative_height = 100 = N-gram/PGroonga: Search\n(('note:N-gramとPGroonga:検索')) # image # src = images/pgroonga-n-gram-search.svg # relative_height = 100 = Wrap up\n(('note:まとめ')) * N-gram needs phrase search\n (('note:N-gramの場合はフレーズ検索が必要')) * Full inverted index provides fast phrase search\n (('note:完全転置インデックスを使うと高速にフレーズ検索できる')) * GIN isn't full inverted index\n (('note:GINは完全転置インデックスではない')) * PGroonga uses full inverted index\n (('note:PGroongaは完全転置インデックスを使っている')) = FTS and English(*)\n(('note:全文検索と英語')) * Normally, N-gram isn't used for English FTS\n (('note:普通は英語の全文検索にN-gramを使わない')) * (('wait'))N-gram is slower than word based approach (textsearch approach)\n (('note:N-gramは単語ベースのやり方(textsearchのやり方)より遅め')) * (('wait'))Stemming/stop word can't be used\n (('note:N-gramではステミングとストップワードを使えない')) (('note:(*) English≒Alphabet based languages')) = PGroonga and English\n(('note:PGroongaと英語')) * PGroonga uses N-gram (('note:by default'))\n (('note:PGroongaはデフォルトではN-gramを使う')) * Is PGroonga slow for English?\n (('note:ではPGroongaは英語では遅いのか?')) * (('wait'))No. Similar to textsearch\n (('note:遅くない。textsearchと同じくらい')) = PGroonga: Search\n(('note:PGroonga:検索')) # image # src = images/search-pgroonga-textsearch.pdf # relative_height = 100 = PGroonga's N-gram * Variable size N-gram\n (('note:可変長サイズのN-gram')) * Continuous alphabets are 1 token\n (= word based approach)\n (('note:連続したアルファベットは1トークン(=単語ベース)'))\n Hello→"Hello" not "He","el",… * No alphabet is 2-gram\n (('note:非アルファベットは2-gram'))\n こんにちは→"こん","んに",… = Wrap up1\n(('note:まとめ1')) * PGroonga's search is fast for all languages\n (('note:PGroongaの検索はすべての言語で速い')) * Including alphabet based languages and Asian languages mixed case\n (('note:アルファベットベースの言語とアジアの言語が混ざった場合でも速い'))\n (('note:(textsearch doesn't support mixed case)'))\n (('note:(textsearchは言語を混ぜることはできない)')) = Wrap up2\n(('note:まとめ2')) (('tag:center')) PGroonga makes PostgreSQL\n fast full text search platform\n for all languages! (('tag:center')) (('note:PGroongaでPostgreSQLが'))\n (('note:全言語対応高速全文検索プラットフォームになる!')) = More about PGroonga\n(('note:PGroongaについてもっと')) * Performance\n (('note:性能')) * JSON support\n (('note:JSONサポート')) * Replication\n (('note:レプリケーション')) = Performance\n(('note:性能')) * Search and update\n(('note:検索と更新')) * Index only scan\n(('note:インデックスオンリースキャン')) * Direct Groonga search\n(('note:直接Groongaで検索')) * Index creation\n(('note:インデックス作成')) = Search and update\n(('note:検索と更新')) * Doesn't decrease search performance while updating\n (('note:更新中も検索性能が落ちない')) = Characteristics\n(('note:傾向')) # image # src = images/performance-charcteristic-for-constant-read-and-write.svg # relative_height = 100 = Update and lock\n(('note:更新とロック')) * Update without ((*read*)) locks\n (('note:((*参照*))ロックなしで更新')) * ((*Write*)) locks are required\n (('note:((*書き込み*))ロックは必要')) = GIN: Read/Write\n(('note:GIN:読み書き')) # image # src = images/read-while-write-gin.svg # relative_height = 100 = PGroonga: Read/Write\n(('note:PGroonga:読み書き')) # image # src = images/read-while-write-pgroonga.svg # relative_height = 100 = Fast stably\n(('note:安定して速い')) * GIN has intermittent performance decrements\n (('note:GINは間欠的な性能劣化がある')) * For details:🔎"GIN pending list"\n (('note:詳細は「GIN pending list」で検索')) * PGroonga keeps fast search\n (('note:PGroongaは高速な検索を維持')) * PGroonga keeps index latest\n (('note:PGroongaのインデックスは常に最新状態')) = Index only scan\n(('note:インデックスオンリースキャン')) * GIN: Not supported\n (('note:GIN:未サポート')) * PGroonga: Supported\n (('note:PGroonga:サポート')) = More faster search\n(('note:より高速な検索')) * Direct Groonga search is more faster\n (('note:直接Groongaで検索するとさらに高速')) * Groonga: Full text search engine PGroonga uses\n (('note:Groonga:PGroongaが使っている全文検索エンジン')) = Direct Groonga search\n(('note:直接Groongaで検索')) # image # src = images/search-pgroonga-groonga-textsearch.pdf # relative_height = 100 = Index creation time\n(('note:インデックス作成時間')) # image # src = images/index-creation-en.pdf # relative_height = 100 = Performance: Wrap up\n(('note:性能:まとめ')) * Keep fast search with update\n (('note:更新しながらでも高速検索を維持')) * Support index only scan\n (('note:インデックスオンリースキャンをサポート')) * Direct Groonga search is more faster\n (('note:Groonga直接検索はもっと速い')) * Fast index creation\n (('note:インデックス作成も速い')) = JSON support\n(('note:JSONサポート')) * Support full text search\n (('note:全文検索対応')) * Target: All texts in JSON\n (('note:JSON内のすべてのテキスト')) * Not only a text in a path\n (('note:特定のパスのテキストだけではない'))\n (('note:(GIN supports only this style)'))\n (('note:(GINはこのやり方だけサポート)')) = JSON: FTS\n(('note:JSON:全文検索')) # coderay sql INSERT INTO logs (record) VALUES ('{"host": "app1"}'), ('{"message": "app is down"}'); SELECT * FROM logs WHERE record @@ 'string @@ "app"' -- record -- -------------------------- -- {"host": "app1"} -- {"message": "app is down"} = JSON: Wrap up\n(('note:JSON:まとめ')) * Support full text search against all texts in JSON\n (('note:JSON内の全テキスト対象の全文検索をサポート')) = Replication\n(('note:レプリケーション')) * Support with PostgreSQL 9.6!\n (('note:PostgreSQL 9.6で使う場合はサポート!')) * PostgreSQL 9.6 ships "generic WAL"\n (('note:PostgreSQL 9.6で「generic WAL」機能が追加')) * Third party index can support WAL generation\n (('note:サードパーティーのインデックスもWALを生成できる')) = Implementation\n(('note:実装')) (1) Master: Encode action logs as MessagePack\n (('note:マスター:アクションログをMessagePackでエンコード')) (2) Master: Write the action logs to WAL\n (('note:マスター:アクションログをWALに書き込み')) (3) Slaves: Read the action logs and apply them\n (('note:スレーブ:アクションログを読み込んで適用')) = Overview\n(('note:概要')) # image # src = images/replication-overview.svg # relative_width = 100 = Action log: "action"\n(('note:アクションログ:「アクション」')) { "_action": ACTION_ID } # ACTION_ID: 0: INSERT # ACTION_ID: 1: CREATE_TABLE # ACTION_ID: 2: CREATE_COLUMN # ACTION_ID: 3: SET_SOURCES = Action log: INSERT\n(('note:アクションログ:INSERT')) { "_action": 0, "_table": "TABLE_NAME", "ctid": PACKED_CTID_VALUE, "column1": COLUMN1_VALUE, ... } = Action log: Logs\n(('note:アクションログ:複数ログ')) {"_action": ACTION_ID, ...} {"_action": ACTION_ID, ...} {"_action": ACTION_ID, ...} ... = Write action logs\n(('note:アクションログの書き込み')) # image # src = images/replication-write-action-logs.svg # relative_height = 100 = Apply action logs\n(('note:アクションログの適用')) # image # src = images/replication-apply-action-logs.svg # relative_height = 100 = Action log: Why msgpack?\n(('note:アクションログ:どうしてmsgpack?')) * Because MessagePack supports streaming unpack\n (('note:MessagePackはストリーミングで展開できるから')) * It's useful to stop applying action logs when WAL is applied partially on slaves\n (('note:スレーブでWALが途中までしか書き込まれていないときにアクションログの適用を中断できるので便利')) = Replication: Wrap up\n(('note:レプリケーション:まとめ')) * Support with PostgreSQL 9.6!\n (('note:PostgreSQL 9.6で使う場合はサポート!')) * Concept: Action logs on WAL\n (('note:コンセプト:WAL上にアクションログを書く')) * It'll be an useful pattern for out of PostgreSQL storage index\n (('note:PostgreSQL管理外のストレージを使うインデックスではこのパターンが使えるはず')) = Wrap up1\n(('note:まとめ1')) * PostgreSQL doesn't support FTS for all languages\n (('note:PostgreSQLの全文検索は一部の言語のみ対応')) * PGroonga supports FTS for all languages\n (('note:PGroongaの全文検索は全言語対応')) = Wrap up2\n(('note:まとめ2')) * PGroonga is fast stably\n (('note:PGroongaは安定して速い')) * PGroonga supports FTS for all texts in JSON\n (('note:PGroongaはJSON中の全テキストに対する全文検索に対応')) = Wrap up3\n(('note:まとめ3')) * PGroonga supports replication\n (('note:PGroongaはレプリケーション対応')) * PostgreSQL 9.6 is required\n (('note:ただしPostgreSQL 9.6が必要')) = Wrap up4\n(('note:まとめ4')) (('tag:center')) PGroonga makes PostgreSQL\n fast full text search platform\n for all languages! (('tag:center')) (('note:PGroongaでPostgreSQLが'))\n (('note:全言語対応高速全文検索プラットフォームになる!')) = See also ((<"https://pgroonga.github.io/"|URL:https://pgroonga.github.io/>)) * Tutorial: ((<"/tutorial/"|URL:https://pgroonga.github.io/tutorial/>)) * Install: ((<"/install/"|URL:https://pgroonga.github.io/install/>)) * Reference: ((<"/reference/"|URL:https://pgroonga.github.io/reference/>)) * Includes replication doc and benchmark docs * Community: ((<"/community/"|URL:https://pgroonga.github.io/community/>))