naoa
null+****@clear*****
Sun Oct 26 21:42:43 JST 2014
naoa	2014-10-26 21:42:43 +0900 (Sun, 26 Oct 2014)

  New Revision: 7b2ed936771bc5d0fa4142051e64ca1c5d8b1741
  https://github.com/groonga/groonga/commit/7b2ed936771bc5d0fa4142051e64ca1c5d8b1741

  Merged ead7e4e: Merge pull request #233 from naoa/doc-tokenize-add-token_filters/table

  Message:
    doc: add table_tokenize

  Added files:
    doc/source/example/reference/commands/table_tokenize/simple_example.log
    doc/source/reference/commands/table_tokenize.rst
  Modified files:
    doc/locale/en/LC_MESSAGES/reference.po
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/commands/tokenize.rst

  Modified: doc/locale/en/LC_MESSAGES/reference.po (+88 -9)
===================================================================
--- doc/locale/en/LC_MESSAGES/reference.po    2014-10-26 20:30:12 +0900 (d3a6aaa)
+++ doc/locale/en/LC_MESSAGES/reference.po    2014-10-26 21:42:43 +0900 (f779198)
@@ -7317,6 +7317,94 @@ msgstr ""
 "table_removeはテーブルと定義されているカラムを削除します。カラムに付随するイ"
 "ンデックスも再帰的に削除されます。"
 
+msgid "``table_tokenize``"
+msgstr ""
+
+msgid ""
+"``table_tokenize`` command tokenizes text with the tokenizer of the "
+"specified table."
+msgstr ""
+"``table_tokenize`` command tokenizes text with the tokenizer of the "
+"specified table."
+
+msgid ""
+"``table_tokenize`` command has required parameters and optional parameters. "
+"``table`` and ``string`` are required parameters. Others are optional::"
+msgstr ""
+"``table_tokenize`` command has required parameters and optional parameters. "
+"``table`` and ``string`` are required parameters. Others are optional::"
+
+msgid ""
+"The ``Terms`` table has the ``TokenBigram`` tokenizer, the "
+"``NormalizerAuto`` normalizer and the ``TokenFilterStopWord`` token filter. "
+"The command returns the tokens that are generated by tokenizing "
+"``\"Hello and Good-bye\"`` with the ``TokenBigram`` tokenizer. They are "
+"normalized by the ``NormalizerAuto`` normalizer. The ``and`` token is "
+"removed by the ``TokenFilterStopWord`` token filter."
+msgstr ""
+"The ``Terms`` table has the ``TokenBigram`` tokenizer, the "
+"``NormalizerAuto`` normalizer and the ``TokenFilterStopWord`` token filter. "
+"The command returns the tokens that are generated by tokenizing "
+"``\"Hello and Good-bye\"`` with the ``TokenBigram`` tokenizer. They are "
+"normalized by the ``NormalizerAuto`` normalizer. The ``and`` token is "
+"removed by the ``TokenFilterStopWord`` token filter."
+
+msgid "There are two required parameters, ``table`` and ``string``."
+msgstr "There are two required parameters, ``table`` and ``string``."
+
+msgid ""
+"It specifies the lexicon table. ``table_tokenize`` command uses the "
+"tokenizer, the normalizer and the token filters that are set for the "
+"lexicon table."
+msgstr ""
+"It specifies the lexicon table. ``table_tokenize`` command uses the "
+"tokenizer, the normalizer and the token filters that are set for the "
+"lexicon table."
+
+msgid "It specifies any string which you want to tokenize."
+msgstr "It specifies any string which you want to tokenize."
+
+msgid ""
+"See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` "
+"for details."
+msgstr ""
+"See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` "
+"for details."
+
+msgid ""
+"It specifies options to customize tokenization. You can specify multiple "
+"options separated by \"``|``\"."
+msgstr ""
+"It specifies options to customize tokenization. You can specify multiple "
+"options separated by \"``|``\"."
+
+msgid ""
+"See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` "
+"for details."
+msgstr ""
+"See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` "
+"for details."
+
+msgid "``mode``"
+msgstr ""
+
+msgid "It specifies the tokenize mode."
+msgstr "It specifies the tokenize mode."
+
+msgid ""
+"See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` "
+"for details."
+msgstr ""
+"See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` "
+"for details."
+
+msgid "``table_tokenize`` command returns the generated tokens."
+msgstr "``table_tokenize`` command returns the generated tokens."
+
+msgid ""
+"See the :ref:`tokenize-return-value` section in :doc:`/reference/commands/"
+"tokenize` for details."
+msgstr ""
+"See the :ref:`tokenize-return-value` section in :doc:`/reference/commands/"
+"tokenize` for details."
+
+msgid ":doc:`/reference/commands/tokenize`"
+msgstr ""
+
 msgid "``tokenize``"
 msgstr "``tokenize``"
 
@@ -7375,9 +7463,6 @@ msgstr ""
 "<http://www.phontron.com/kytea/>`_ based tokenizer by registering "
 "``tokenizers/kytea``."
 
-msgid "It specifies any string which you want to tokenize."
-msgstr "It specifies any string which you want to tokenize."
-
 msgid ""
 "It specifies the normalizer name. ``tokenize`` command uses the normalizer "
 "that is named ``normalizer``. Normalizer is important for N-gram family "
@@ -7469,9 +7554,6 @@ msgstr ""
 "the target string is treated as already tokenized string. Tokenizer just "
 "tokenizes by tokenized delimiter."
 
-msgid "``mode``"
-msgstr ""
-
 msgid ""
 "It specifies a tokenize mode. If the mode is specified ``ADD``, the text is "
 "tokenized by the rule that adding a document. If the mode is specified "
@@ -7570,9 +7652,6 @@ msgstr ""
 
 msgid "Tokenizer name."
 msgstr ""
 
-msgid ":doc:`/reference/commands/tokenize`"
-msgstr ""
-
 msgid "``truncate``"
 msgstr "``truncate``"
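For reference, the syntax block these strings translate declares ``flags=NONE`` and ``mode=ADD`` as defaults, so the two required parameters are enough for a minimal call. A sketch against the same ``Terms`` table as the bundled example; the output is an expectation, not a log from this commit, and it assumes ``TokenFilterStopWord`` filters stop words only at search (``GET``) time, so ``and`` remains under the default ``ADD`` mode::

  table_tokenize Terms "Hello and Good-bye"
  # expected output (assumption, not taken from this commit's logs):
  # [[0,0.0,0.0],
  #  [{"value": "hello", "position": 0},
  #   {"value": "and",   "position": 1},
  #   {"value": "good",  "position": 2},
  #   {"value": "-",     "position": 3},
  #   {"value": "bye",   "position": 4}]]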
+msgstr "" +"語彙表テーブルを指定します。 ``table_tokenize`` コマンドは、語彙表テーブルにセット" +"されたトークナイザーとノーマライザーとトークンフィルターを使います。" + +msgid "It specifies any string which you want to tokenize." +msgstr "トークナイズしたい文字列を指定します。" + +msgid "" +"See :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` " +"about details." +msgstr "" +"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-string` " +"オプションを参照してください。" + +msgid "" +"It specifies a tokenization customize options. You can specify multiple " +"options separated by \"``|``\"." +msgstr "" +"トークナイズ処理をカスタマイズするオプションを指定します。「 ``|`` 」で区切っ" +"て複数のオプションを指定することができます。" + +msgid "" +"See :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` " +"about details." +msgstr "" +"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-flags` " +"オプションを参照してください。" + +msgid "``mode``" +msgstr "" + +msgid "It specifies a tokenize mode." +msgstr "トークナイズモードを指定します。" + +msgid "" +"See :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` about " +"details." +msgstr "" +"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-mode` " +"オプションを参照してください。" + +msgid "``table_tokenize`` command returns tokenized tokens." +msgstr "" +"``table_tokenize`` コマンドはトークナイズしたトークンを返します。" + +msgid "" +"See :ref:`tokenize-return-value` option in :doc:`/reference/commands/" +"tokenize` about details." +msgstr "" +"詳細は、 :doc:`/reference/commands/tokenize` の See :ref:`tokenize-return-value` " +"オプションを参照してください。" + +msgid ":doc:`/reference/commands/tokenize`" +msgstr "" + msgid "``tokenize``" msgstr "" Added: doc/source/example/reference/commands/table_tokenize/simple_example.log (+42 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/commands/table_tokenize/simple_example.log 2014-10-26 21:42:43 +0900 (6979a15) @@ -0,0 +1,42 @@ +Execution example:: + + register token_filters/stop_word + # [[0,0.0,0.0],true] + table_create Terms TABLE_PAT_KEY ShortText \ + --default_tokenizer TokenBigram \ + --normalizer NormalizerAuto \ + --token_filters TokenFilterStopWord + # [[0,0.0,0.0],true] + column_create Terms is_stop_word COLUMN_SCALAR Bool + # [[0,0.0,0.0],true] + load --table Terms + [ + {"_key": "and", "is_stop_word": true} + ] + # [[0,0.0,0.0],1] + table_tokenize Terms "Hello and Good-bye" --mode GET + # [ + # [ + # 0, + # 0.0, + # 0.0 + # ], + # [ + # { + # "value": "hello", + # "position": 0 + # }, + # { + # "value": "good", + # "position": 2 + # }, + # { + # "value": "-", + # "position": 3 + # }, + # { + # "value": "bye", + # "position": 4 + # } + # ] + # ] Added: doc/source/reference/commands/table_tokenize.rst (+105 -0) 100644 =================================================================== --- /dev/null +++ doc/source/reference/commands/table_tokenize.rst 2014-10-26 21:42:43 +0900 (702bc44) @@ -0,0 +1,105 @@ +.. -*- rst -*- + +.. highlightlang:: none + +.. groonga-command +.. database: commands_table_tokenize + +``table_tokenize`` +================== + +Summary +------- + +``table_tokenize`` command tokenizes text by the specified table's tokenizer. + +Syntax +------ + +``table_tokenize`` command has required parameters and optional parameters. +``table`` and ``string`` are required parameters. Others are +optional:: + + table_tokenize table + string + [flags=NONE] + [mode=ADD] + +Usage +----- + +Here is a simple example. + +.. groonga-command +.. include:: ../../example/reference/commands/table_tokenize/simple_example.log +.. register token_filters/stop_word +.. 
  Added: doc/source/reference/commands/table_tokenize.rst (+105 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/commands/table_tokenize.rst    2014-10-26 21:42:43 +0900 (702bc44)
@@ -0,0 +1,105 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: commands_table_tokenize
+
+``table_tokenize``
+==================
+
+Summary
+-------
+
+``table_tokenize`` command tokenizes text with the tokenizer of the
+specified table.
+
+Syntax
+------
+
+``table_tokenize`` command has required parameters and optional parameters.
+``table`` and ``string`` are required parameters. Others are
+optional::
+
+  table_tokenize table
+                 string
+                 [flags=NONE]
+                 [mode=ADD]
+
+Usage
+-----
+
+Here is a simple example.
+
+.. groonga-command
+.. include:: ../../example/reference/commands/table_tokenize/simple_example.log
+.. register token_filters/stop_word
+.. table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord
+.. column_create Terms is_stop_word COLUMN_SCALAR Bool
+.. load --table Terms
+.. [
+.. {"_key": "and", "is_stop_word": true}
+.. ]
+.. table_tokenize Terms "Hello and Good-bye" --mode GET
+
+The ``Terms`` table has the ``TokenBigram`` tokenizer, the
+``NormalizerAuto`` normalizer and the ``TokenFilterStopWord`` token filter.
+The command returns the tokens that are generated by tokenizing
+``"Hello and Good-bye"`` with the ``TokenBigram`` tokenizer. They are
+normalized by the ``NormalizerAuto`` normalizer. The ``and`` token is
+removed by the ``TokenFilterStopWord`` token filter.
+
+Parameters
+----------
+
+This section describes all parameters. Parameters are categorized.
+
+Required parameters
+^^^^^^^^^^^^^^^^^^^
+
+There are two required parameters, ``table`` and ``string``.
+
+``table``
+"""""""""
+
+It specifies the lexicon table. ``table_tokenize`` command uses the
+tokenizer, the normalizer and the token filters that are set for the
+lexicon table.
+
+``string``
+""""""""""
+
+It specifies any string which you want to tokenize.
+
+See the :ref:`tokenize-string` option in :doc:`/reference/commands/tokenize` for details.
+
+Optional parameters
+^^^^^^^^^^^^^^^^^^^
+
+There are optional parameters.
+
+``flags``
+"""""""""
+
+It specifies options to customize tokenization. You can specify
+multiple options separated by "``|``".
+
+See the :ref:`tokenize-flags` option in :doc:`/reference/commands/tokenize` for details.
+
+``mode``
+""""""""
+
+It specifies the tokenize mode.
+
+See the :ref:`tokenize-mode` option in :doc:`/reference/commands/tokenize` for details.
+
+Return value
+------------
+
+``table_tokenize`` command returns the generated tokens.
+
+See the :ref:`tokenize-return-value` section in :doc:`/reference/commands/tokenize` for details.
+
+See also
+--------
+
+* :doc:`/reference/tokenizers`
+* :doc:`/reference/commands/tokenize`
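For contrast with the ``tokenize`` command that the new page links to: plain ``tokenize`` accepts the same tokenizer and normalizer names, but it has no lexicon table to read ``is_stop_word`` data from, and that gap is what ``table_tokenize`` fills. A sketch with expected output (an assumption, not a log from this commit)::

  tokenize TokenBigram "Hello and Good-bye" NormalizerAuto --mode GET
  # expected: "and" survives here, since no lexicon table supplies
  # stop-word data to a token filter
  # [[0,0.0,0.0],
  #  [{"value": "hello", "position": 0},
  #   {"value": "and",   "position": 1},
  #   {"value": "good",  "position": 2},
  #   {"value": "-",     "position": 3},
  #   {"value": "bye",   "position": 4}]]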
  Modified: doc/source/reference/commands/tokenize.rst (+14 -0)
===================================================================
--- doc/source/reference/commands/tokenize.rst    2014-10-26 20:30:12 +0900 (c869ec0)
+++ doc/source/reference/commands/tokenize.rst    2014-10-26 21:42:43 +0900 (d5fc6d1)
@@ -52,6 +52,8 @@ Required parameters
 
 There are required parameters, ``tokenizer`` and ``string``.
 
+.. _tokenize-tokenizer:
+
 ``tokenizer``
 """""""""""""
 
@@ -71,6 +73,8 @@ tokenizer plugin by :doc:`register` command. For example, you can use
 `KyTea <http://www.phontron.com/kytea/>`_ based tokenizer by registering
 ``tokenizers/kytea``.
 
+.. _tokenize-string:
+
 ``string``
 """"""""""
 
@@ -90,6 +94,8 @@ Optional parameters
 
 There are optional parameters.
 
+.. _tokenize-normalizer:
+
 ``normalizer``
 """"""""""""""
 
@@ -129,6 +135,8 @@ If you want to tokenize by two characters with noramlizer, use
 All alphabets are tokenized by two characters. And they are normalized
 to lower case characters. For example, ``fu`` is a token.
 
+.. _tokenize-flags:
+
 ``flags``
 """""""""
 
@@ -165,6 +173,8 @@ string. So the character is good character for this puropose. If
 treated as already tokenized string. Tokenizer just tokenizes by
 tokenized delimiter.
 
+.. _tokenize-mode:
+
 ``mode``
 """"""""
 
@@ -191,6 +201,8 @@ Here is an example to the ``GET`` mode.
 
 The last alphabet is tokenized by two characters.
 
+.. _tokenize-token-filters:
+
 ``token_filters``
 """""""""""""""""
 
@@ -199,6 +211,8 @@ tokenizer that is named ``token_filters``.
 
 See :doc:`/reference/token_filters` about token filters.
 
+.. _tokenize-return-value:
+
 Return value
 ------------