[Groonga-commit] pgroonga/pgroonga.github.io at 4c2e0e4 [master] Add note about Japanese similar search

Kouhei Sutou null+****@clear*****
Thu Jul 20 15:41:19 JST 2017


Kouhei Sutou	2017-07-20 15:41:19 +0900 (Thu, 20 Jul 2017)

  New Revision: 4c2e0e40f0b20fb1b27d5c0d9ceaf4e9832b9531
  https://github.com/pgroonga/pgroonga.github.io/commit/4c2e0e40f0b20fb1b27d5c0d9ceaf4e9832b9531

  Message:
    Add note about Japanese similar search

  Modified files:
    _po/ja/reference/operators/similar-search-v2.po
    ja/reference/operators/similar-search-v2.md
    reference/operators/similar-search-v2.md

  Modified: _po/ja/reference/operators/similar-search-v2.po (+32 -1)
===================================================================
--- _po/ja/reference/operators/similar-search-v2.po    2017-07-20 15:32:42 +0900 (e26474f)
+++ _po/ja/reference/operators/similar-search-v2.po    2017-07-20 15:41:19 +0900 (ceb9543)
@@ -1,7 +1,7 @@
 msgid ""
 msgstr ""
 "Project-Id-Version: PACKAGE VERSION\n"
-"PO-Revision-Date: 2017-06-10 13:29+0900\n"
+"PO-Revision-Date: 2017-07-20 15:41+0900\n"
 "Language: ja\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
@@ -160,3 +160,34 @@ msgstr ""
 "SELECT * FROM memos WHERE content &~? 'MroongaはGroongaを使うMySQLの拡張機能です。';\n"
 "-- ERROR:  pgroonga: operator &~? is available only in index scan\n"
 "```"
+
+msgid "## For Japanese"
+msgstr "## 日本語向け"
+
+msgid ""
+"You should use `TokenMecab` tokenizer instead of the default `TokenBigram` for"
+" similar search against Japanese documents:"
+msgstr "日本語の文書を類似文書検索する場合はデフォルトの`TokenBigram`ではなく`TokenMecab`を使う方がよいです。"
+
+msgid ""
+"```sql\n"
+"CREATE INDEX pgroonga_content_index ON memos\n"
+"  USING pgroonga (content pgroonga.text_full_text_search_ops_v2)\n"
+"  WITH (tokenizer='TokenMecab');\n"
+"```"
+msgstr ""
+
+msgid ""
+"`TokenMecab` will tokenize target documents to words. It improves similar sear"
+"ch precision."
+msgstr "`TokenMecab`は対象の文書を(ほぼ)単語にトークナイズします。これにより類似文書検索の精度が上がります。"
+
+msgid ""
+"See also [`CREATE INDEX USING pgroonga`][create-index-using-pgroonga] how to s"
+"pecify `TokenMecab` tokenizer."
+msgstr ""
+"`TokenMecab`トークナイザーの指定方法については[`CREATE INDEX USING pgroonga`][create-index-usin"
+"g-pgroonga]も参照してください。"
+
+msgid "[create-index-using-pgroonga]:../create-index-using-pgroonga.html"
+msgstr ""

  Modified: ja/reference/operators/similar-search-v2.md (+16 -0)
===================================================================
--- ja/reference/operators/similar-search-v2.md    2017-07-20 15:32:42 +0900 (cfbf5df)
+++ ja/reference/operators/similar-search-v2.md    2017-07-20 15:41:19 +0900 (3cb675c)
@@ -72,3 +72,19 @@ SELECT * FROM memos WHERE content &~? 'MroongaはGroongaを使うMySQLの拡張
 SELECT * FROM memos WHERE content &~? 'MroongaはGroongaを使うMySQLの拡張機能です。';
 -- ERROR:  pgroonga: operator &~? is available only in index scan
 ```
+
+## 日本語向け
+
+日本語の文書を類似文書検索する場合はデフォルトの`TokenBigram`ではなく`TokenMecab`を使う方がよいです。
+
+```sql
+CREATE INDEX pgroonga_content_index ON memos
+  USING pgroonga (content pgroonga.text_full_text_search_ops_v2)
+  WITH (tokenizer='TokenMecab');
+```
+
+`TokenMecab`は対象の文書を(ほぼ)単語にトークナイズします。これにより類似文書検索の精度が上がります。
+
+`TokenMecab`トークナイザーの指定方法については[`CREATE INDEX USING pgroonga`][create-index-using-pgroonga]も参照してください。
+
+[create-index-using-pgroonga]:../create-index-using-pgroonga.html

  Modified: reference/operators/similar-search-v2.md (+16 -0)
===================================================================
--- reference/operators/similar-search-v2.md    2017-07-20 15:32:42 +0900 (e3d5105)
+++ reference/operators/similar-search-v2.md    2017-07-20 15:41:19 +0900 (b369d6e)
@@ -72,3 +72,19 @@ You can't use similar search with sequential scan. If you use similar search wit
 SELECT * FROM memos WHERE content &~? 'Mroonga is a MySQL extension taht uses Groonga';
 -- ERROR:  pgroonga: operator &~? is available only in index scan
 ```
+
+## For Japanese
+
+You should use `TokenMecab` tokenizer instead of the default `TokenBigram` for similar search against Japanese documents:
+
+```sql
+CREATE INDEX pgroonga_content_index ON memos
+  USING pgroonga (content pgroonga.text_full_text_search_ops_v2)
+  WITH (tokenizer='TokenMecab');
+```
+
+`TokenMecab` will tokenize target documents to words. It improves similar search precision.
+
+See also [`CREATE INDEX USING pgroonga`][create-index-using-pgroonga] how to specify `TokenMecab` tokenizer.
+
+[create-index-using-pgroonga]:../create-index-using-pgroonga.html
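
The note added by this commit can be tried end to end with a sketch like the one below. It reuses the index definition and the `&~?` query from the diff above. The `memos` table layout, the sample rows, and the `SET enable_seqscan = off` line are assumptions added for illustration and are not part of the commit. It also assumes that MeCab support for Groonga is installed on the server, since `TokenMecab` is likely unavailable otherwise.

```sql
-- Hypothetical setup for illustration: the table layout and rows are assumptions.
CREATE EXTENSION IF NOT EXISTS pgroonga;

CREATE TABLE memos (
  id integer,
  content text
);

INSERT INTO memos VALUES (1, 'PostgreSQLはリレーショナル・データベース管理システムです。');
INSERT INTO memos VALUES (2, 'Groongaは日本語対応の高速な全文検索エンジンです。');
INSERT INTO memos VALUES (3, 'PGroongaはインデックスとしてGroongaを使うためのPostgreSQLの拡張機能です。');

-- Index definition taken from the diff: tokenize into words with TokenMecab
-- instead of the default TokenBigram.
CREATE INDEX pgroonga_content_index ON memos
  USING pgroonga (content pgroonga.text_full_text_search_ops_v2)
  WITH (tokenizer='TokenMecab');

-- The diff context shows that &~? raises an error under a sequential scan,
-- so discourage the planner from choosing one on this tiny table (assumption).
SET enable_seqscan = off;

-- Similar search query taken from the diff context.
SELECT * FROM memos WHERE content &~? 'MroongaはGroongaを使うMySQLの拡張機能です。';
```

Building the same index without the `WITH (tokenizer='TokenMecab')` clause and rerunning the query is one way to compare the two tokenizers on the same data.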