[Groonga-commit] groonga/groonga at 7b2ed93 [master] doc: add table_tokenize

naoa null+****@clear*****
Sun Oct 26 21:42:43 JST 2014


naoa	2014-10-26 21:42:43 +0900 (Sun, 26 Oct 2014)

  New Revision: 7b2ed936771bc5d0fa4142051e64ca1c5d8b1741
  https://github.com/groonga/groonga/commit/7b2ed936771bc5d0fa4142051e64ca1c5d8b1741

  Merged ead7e4e: Merge pull request #233 from naoa/doc-tokenize-add-token_filters/table

  Message:
    doc: add table_tokenize

  Added files:
    doc/source/example/reference/commands/table_tokenize/simple_example.log
    doc/source/reference/commands/table_tokenize.rst
  Modified files:
    doc/locale/en/LC_MESSAGES/reference.po
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/commands/tokenize.rst

  Modified: doc/locale/en/LC_MESSAGES/reference.po (+88 -9)
===================================================================
--- doc/locale/en/LC_MESSAGES/reference.po    2014-10-26 20:30:12 +0900 (d3a6aaa)
+++ doc/locale/en/LC_MESSAGES/reference.po    2014-10-26 21:42:43 +0900 (f779198)
@@ -7317,6 +7317,94 @@ msgstr ""
 "table_removeはテーブルと定義されているカラムを削除します。カラムに付随するイ"
 "ンデックスも再帰的に削除されます。"
 
+msgid "``table_tokenize``"
+msgstr ""
+
+msgid ""
+"``table_tokenize`` command tokenizes text by the specified table's tokenizer."
+msgstr ""
+"``table_tokenize`` command tokenizes text by the specified table's tokenizer."
+
+msgid ""
+"``table_tokenize`` command has required parameters and optional parameters. "
+"``table`` and ``string`` are required parameters. Others are optional::"
+msgstr ""
+"``table_tokenize`` command has required parameters and optional parameters. "
+"``table`` and ``string`` are required parameters. Others are optional::"
+
+msgid ""
+"``Terms`` table is set ``TokenBigram`` tokenizer, ``NormalizerAuto`` normalizer, "
+"``TokenFilterStopWord`` token filter. It returns tokens that is "
+"generated by tokenizeing ``\"Hello and Good-bye\"`` with ``TokenBigram`` tokenizer. "
+"It is normalized by ``NormalizerAuto`` normalizer. "
+"``and`` token is removed with ``TokenFilterStopWord`` token filter."
+msgstr ""
+"``Terms`` table is set ``TokenBigram`` tokenizer, ``NormalizerAuto`` normalizer, "
+"``TokenFilterStopWord`` token filter. It returns tokens that is "
+"generated by tokenizeing ``\"Hello and Good-bye\"`` with ``TokenBigram`` tokenizer. "
+"It is normalized by ``NormalizerAuto`` normalizer. "
+"``and`` token is removed with ``TokenFilterStopWord`` token filter."
+
+msgid "There are required parameters, ``table`` and ``string``."
+msgstr "There are required parameters, ``table`` and ``string``."
+
+msgid ""
+"It specifies the lexicon table. ``table_tokenize`` command uses the tokenizer, "
+"the normalizer, the token filters that is set the lexicon table."
+msgstr ""
+"It specifies the lexicon table. ``table_tokenize`` command uses the tokenizer, "
+"the normalizer, the token filters that is set the lexicon table."
+
+msgid "It specifies any string which you want to tokenize."
+msgstr "It specifies any string which you want to tokenize."
+
+msgid ""
+"See :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` "
+"about details."
+msgstr ""
+"See :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` "
+"about details."
+
+msgid ""
+"It specifies a tokenization customize options. You can specify multiple "
+"options separated by \"``|``\"."
+msgstr ""
+"It specifies a tokenization customize options. You can specify multiple "
+"options separated by \"``|``\"."
+
+msgid ""
+"See :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` "
+"about details."
+msgstr ""
+"See :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` "
+"about details."
+
+msgid "``mode``"
+msgstr ""
+
+msgid "It specifies a tokenize mode."
+msgstr "It specifies a tokenize mode."
+
+msgid ""
+"See :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` about "
+"details."
+msgstr ""
+"See :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` about "
+"details."
+
+msgid "``table_tokenize`` command returns tokenized tokens."
+msgstr "``table_tokenize`` command returns tokenized tokens."
+
+msgid ""
+"See :ref:`tokenize-return-value` option in :doc:`/reference/commands/"
+"tokenize` about details."
+msgstr ""
+"See :ref:`tokenize-return-value` option in :doc:`/reference/commands/"
+"tokenize` about details."
+
+msgid ":doc:`/reference/commands/tokenize`"
+msgstr ""
+
 msgid "``tokenize``"
 msgstr "``tokenize``"
 
@@ -7375,9 +7463,6 @@ msgstr ""
 "<http://www.phontron.com/kytea/>`_ based tokenizer by registering "
 "``tokenizers/kytea``."
 
-msgid "It specifies any string which you want to tokenize."
-msgstr "It specifies any string which you want to tokenize."
-
 msgid ""
 "It specifies the normalizer name. ``tokenize`` command uses the normalizer "
 "that is named ``normalizer``. Normalizer is important for N-gram family "
@@ -7469,9 +7554,6 @@ msgstr ""
 "the target string is treated as already tokenized string. Tokenizer just "
 "tokenizes by tokenized delimiter."
 
-msgid "``mode``"
-msgstr ""
-
 msgid ""
 "It specifies a tokenize mode. If the mode is specified ``ADD``, the text is "
 "tokenized by the rule that adding a document. If the mode is specified "
@@ -7570,9 +7652,6 @@ msgstr ""
 msgid "Tokenizer name."
 msgstr ""
 
-msgid ":doc:`/reference/commands/tokenize`"
-msgstr ""
-
 msgid "``truncate``"
 msgstr "``truncate``"
 

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+91 -0)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2014-10-26 20:30:12 +0900 (e9271bb)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2014-10-26 21:42:43 +0900 (69c028e)
@@ -6884,6 +6884,97 @@ msgid ""
 "ンデックスも再帰的に削除されます。"
 msgstr ""
 
+msgid "``table_tokenize``"
+msgstr ""
+
+msgid ""
+"``table_tokenize`` command tokenizes text by the specified table's tokenizer."
+msgstr ""
+"``table_tokenize`` コマンドは指定したテーブルのトークナイザーでテキストを"
+"トークナイズします。"
+
+msgid ""
+"``table_tokenize`` command has required parameters and optional parameters. "
+"``table`` and ``string`` are required parameters. Others are optional::"
+msgstr ""
+"``table_tokenize`` コマンドには必須の引数と省略可能な引数があります。 "
+"``table`` と ``string`` が必須の引数で、他の引数はすべて省略可能です。"
+
+msgid ""
+"``Terms`` table is set ``TokenBigram`` tokenizer, ``NormalizerAuto`` normalizer, "
+"``TokenFilterStopWord`` token filter. It returns tokens that is "
+"generated by tokenizeing ``\"Hello and Good-bye\"`` with ``TokenBigram`` tokenizer. "
+"It is normalized by ``NormalizerAuto`` normalizer. "
+"``and`` token is removed with ``TokenFilterStopWord`` token filter."
+msgstr ""
+"``Terms`` テーブルには、 ``TokenBigram`` トークナイザーと、 ``NormalizerAuto`` "
+"ノーマライザーと、 ``TokenFilterStopWord`` トークンフィルターがセットされていま"
+"す。 この例は ``TokenBigram`` トークナイザーで ``\"Hello and Good-bye\"`` をトー"
+"クナイズしたトークンを返します。トークンは、 ``NormalizerAuto`` ノーマライザーで"
+"正規化されています。 ``and`` トークンは、 ``TokenFilterStopWord`` トークンフィル"
+"ターで除去されています。"
+
+msgid "There are required parameters, ``table`` and ``string``."
+msgstr "必須引数は二つあります。 ``table`` と ``string`` です。"
+
+msgid ""
+"It specifies the lexicon table. ``table_tokenize`` command uses the tokenizer, "
+"the normalizer, the token filters that is set the lexicon table."
+msgstr ""
+"語彙表テーブルを指定します。 ``table_tokenize`` コマンドは、語彙表テーブルにセット"
+"されたトークナイザーとノーマライザーとトークンフィルターを使います。"
+
+msgid "It specifies any string which you want to tokenize."
+msgstr "トークナイズしたい文字列を指定します。"
+
+msgid ""
+"See :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` "
+"about details."
+msgstr ""
+"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-string` "
+"オプションを参照してください。"
+
+msgid ""
+"It specifies a tokenization customize options. You can specify multiple "
+"options separated by \"``|``\"."
+msgstr ""
+"トークナイズ処理をカスタマイズするオプションを指定します。「 ``|`` 」で区切っ"
+"て複数のオプションを指定することができます。"
+
+msgid ""
+"See :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` "
+"about details."
+msgstr ""
+"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-flags` "
+"オプションを参照してください。"
+
+msgid "``mode``"
+msgstr ""
+
+msgid "It specifies a tokenize mode."
+msgstr "トークナイズモードを指定します。"
+
+msgid ""
+"See :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` about "
+"details."
+msgstr ""
+"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-mode` "
+"オプションを参照してください。"
+
+msgid "``table_tokenize`` command returns tokenized tokens."
+msgstr ""
+"``table_tokenize`` コマンドはトークナイズしたトークンを返します。"
+
+msgid ""
+"See :ref:`tokenize-return-value` option in :doc:`/reference/commands/"
+"tokenize` about details."
+msgstr ""
+"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-return-value` "
+"オプションを参照してください。"
+
+msgid ":doc:`/reference/commands/tokenize`"
+msgstr ""
+
 msgid "``tokenize``"
 msgstr ""
 

  Added: doc/source/example/reference/commands/table_tokenize/simple_example.log (+42 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/reference/commands/table_tokenize/simple_example.log    2014-10-26 21:42:43 +0900 (6979a15)
@@ -0,0 +1,42 @@
+Execution example::
+
+  register token_filters/stop_word
+  # [[0,0.0,0.0],true]
+  table_create Terms TABLE_PAT_KEY ShortText \
+    --default_tokenizer TokenBigram \
+    --normalizer NormalizerAuto \
+    --token_filters TokenFilterStopWord
+  # [[0,0.0,0.0],true]
+  column_create Terms is_stop_word COLUMN_SCALAR Bool
+  # [[0,0.0,0.0],true]
+  load --table Terms
+  [
+  {"_key": "and", "is_stop_word": true}
+  ]
+  # [[0,0.0,0.0],1]
+  table_tokenize Terms "Hello and Good-bye" --mode GET
+  # [
+  #  [
+  #    0,
+  #    0.0,
+  #    0.0
+  #  ],
+  #  [
+  #    {
+  #      "value": "hello",
+  #      "position": 0
+  #    },
+  #    {
+  #      "value": "good",
+  #      "position": 2
+  #    },
+  #    {
+  #      "value": "-",
+  #      "position": 3
+  #    },
+  #    {
+  #      "value": "bye",
+  #      "position": 4
+  #    }
+  #  ]
+  # ]

  Added: doc/source/reference/commands/table_tokenize.rst (+105 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/commands/table_tokenize.rst    2014-10-26 21:42:43 +0900 (702bc44)
@@ -0,0 +1,105 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: commands_table_tokenize
+
+``table_tokenize``
+==================
+
+Summary
+-------
+
+``table_tokenize`` command tokenizes text with the specified table's tokenizer.
+
+Syntax
+------
+
+``table_tokenize`` command has required parameters and optional parameters.
+``table`` and ``string`` are required parameters. Others are
+optional::
+
+  table_tokenize table
+                 string
+                 [flags=NONE]
+                 [mode=ADD]
+
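+As a minimal sketch, an invocation that relies on the default values of
+the optional parameters could look like this (it assumes a lexicon
+table named ``Terms`` already exists)::
+
+  table_tokenize Terms "Hello World"
+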
+Usage
+-----
+
+Here is a simple example.
+
+.. groonga-command
+.. include:: ../../example/reference/commands/table_tokenize/simple_example.log
+.. register token_filters/stop_word
+.. table_create Terms TABLE_PAT_KEY ShortText   --default_tokenizer TokenBigram   --normalizer NormalizerAuto   --token_filters TokenFilterStopWord
+.. column_create Terms is_stop_word COLUMN_SCALAR Bool
+.. load --table Terms
+.. [
+.. {"_key": "and", "is_stop_word": true}
+.. ]
+.. table_tokenize Terms "Hello and Good-bye" --mode GET
+
+The ``Terms`` table has the ``TokenBigram`` tokenizer, the ``NormalizerAuto``
+normalizer and the ``TokenFilterStopWord`` token filter. The command returns the
+tokens that are generated by tokenizing ``"Hello and Good-bye"`` with the
+``TokenBigram`` tokenizer. They are normalized by the ``NormalizerAuto``
+normalizer. The ``and`` token is removed by the ``TokenFilterStopWord`` token filter.
+
+Parameters
+----------
+
+This section describes all parameters. Parameters are categorized.
+
+Required parameters
+^^^^^^^^^^^^^^^^^^^
+
+There are two required parameters, ``table`` and ``string``.
+
+``table``
+"""""""""
+
+It specifies the lexicon table. ``table_tokenize`` command uses the
+tokenizer, the normalizer and the token filters that are set to the
+lexicon table.
+
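+For instance, the ``Terms`` table used in the usage example above is
+created with all three components attached::
+
+  table_create Terms TABLE_PAT_KEY ShortText \
+    --default_tokenizer TokenBigram \
+    --normalizer NormalizerAuto \
+    --token_filters TokenFilterStopWord
+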
+``string``
+""""""""""
+
+It specifies any string which you want to tokenize.
+
+See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` for details.
+
+Optional parameters
+^^^^^^^^^^^^^^^^^^^
+
+There are optional parameters.
+
+``flags``
+"""""""""
+
+It specifies options to customize tokenization. You can specify
+multiple options separated by "``|``".
+
+See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` for details.
+
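+As a sketch, assuming ``flags`` accepts the same values as the
+``tokenize`` command does (``NONE``, ``ENABLE_TOKENIZED_DELIMITER`` and
+so on), a call could look like::
+
+  table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER
+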
+``mode``
+""""""""
+
+It specifies a tokenize mode.
+
+See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` for details.
+
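+For example, the usage example above passes ``--mode GET``; a sketch of
+both modes against the same string::
+
+  table_tokenize Terms "Hello and Good-bye" --mode ADD
+  table_tokenize Terms "Hello and Good-bye" --mode GET
+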
+Return value
+------------
+
+``table_tokenize`` command returns the tokenized tokens.
+
+See :ref:`tokenize-return-value` in :doc:`/reference/commands/tokenize` for details.
+
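+As the usage example above shows, each token is reported as an object
+with ``value`` and ``position`` keys, roughly in this shape::
+
+  [HEADER, [{"value": TOKEN, "position": POSITION}, ...]]
+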
+See also
+--------
+
+* :doc:`/reference/tokenizers`
+* :doc:`/reference/commands/tokenize`

  Modified: doc/source/reference/commands/tokenize.rst (+14 -0)
===================================================================
--- doc/source/reference/commands/tokenize.rst    2014-10-26 20:30:12 +0900 (c869ec0)
+++ doc/source/reference/commands/tokenize.rst    2014-10-26 21:42:43 +0900 (d5fc6d1)
@@ -52,6 +52,8 @@ Required parameters
 
 There are required parameters, ``tokenizer`` and ``string``.
 
+.. _tokenize-tokenizer:
+
 ``tokenizer``
 """""""""""""
 
@@ -71,6 +73,8 @@ tokenizer plugin by :doc:`register` command. For example, you can use
 `KyTea <http://www.phontron.com/kytea/>`_ based tokenizer by
 registering ``tokenizers/kytea``.
 
+.. _tokenize-string:
+
 ``string``
 """"""""""
 
@@ -90,6 +94,8 @@ Optional parameters
 
 There are optional parameters.
 
+.. _tokenize-normalizer:
+
 ``normalizer``
 """"""""""""""
 
@@ -129,6 +135,8 @@ If you want to tokenize by two characters with a normalizer, use
 All alphabets are tokenized by two characters. And they are normalized
 to lower case characters. For example, ``fu`` is a token.
 
+.. _tokenize-flags:
+
 ``flags``
 """""""""
 
@@ -165,6 +173,8 @@ string. So the character is a good character for this purpose. If
 treated as already tokenized string. Tokenizer just tokenizes by
 tokenized delimiter.
 
+.. _tokenize-mode:
+
 ``mode``
 """"""""
 
@@ -191,6 +201,8 @@ Here is an example of the ``GET`` mode.
 
 The last alphabet is tokenized by two characters.
 
+.. _tokenize-token-filters:
+
 ``token_filters``
 """""""""""""""""
 
@@ -199,6 +211,8 @@ tokenizer that is named ``token_filters``.
 
 See :doc:`/reference/token_filters` about token filters.
 
+.. _tokenize-return-value:
+
 Return value
 ------------
 