sumom****@users*****
Thursday, October 2, 2008, 17:23:47 JST
Index: julius4/libjulius/jconf.man diff -u julius4/libjulius/jconf.man:1.1 julius4/libjulius/jconf.man:removed --- julius4/libjulius/jconf.man:1.1 Tue Dec 18 23:09:23 2007 +++ julius4/libjulius/jconf.man Thu Oct 2 17:23:47 2008 @@ -1,1053 +0,0 @@ -.TH "jconf " "5 " -.SH NAME -jconf -\- Jconf configuration file specification -.SH DESCRIPTION -The variables that can be written in Jconf file are organized as follows. -.TP 0.2i -\(bu -Global options -.TP 0.2i -\(bu -Instance declaration -.TP 0.2i -\(bu -Language model instance -.TP 0.2i -\(bu -Acoustic model and speech analysis instance -.TP 0.2i -\(bu -Recognizer and search instance -.PP -The details are described in the followings. -.SH EXAMPLE -These are examples of jconf file. -.PP -First example is a simple one with no instance declaration. When -no instance declaration is found, Julius assumes there are only -one AM, LM and recognition process instance. In this case, the -default instance will be named "\fB_default\fR", and -option order does not matter. This is equivalent to older version -of Julius, except for GMM handling (see below). -.PP -\fBExample of Jconf file: no instance declaration\fR -.PP -.nf - - \-C jconffile - (\fIOther global options\fR...) - (\fIAM and analysis options\fR...) - (\fILM options\fR...) - (\fISearch options\fR...) - -.fi -.PP -This is an example using two acoustic models and three language -models of different types. Three recognition process instance is -defined for each combination of AM and LM. The LM type (ngram / -grammar / word) is determined by the arguments. The Global -options are placed at the top in the example, but actually it can -be placed anywhere in the file. -.PP -\fBExample of Jconf file: multi model decoding\fR -.PP -.nf - - \-C jconffile - (\fIOther global options\fR...) - \-AM am1 - (\fIAM and analysis options for am1\fR...) - \-AM am2 - (\fIAM and analysis options for am2\fR...) - \-LM lm_ngram - \-d ngram \-v dictfile - (\fILM options for lm1\fR...) 
- \-LM lm_grammar - \-gram grammarprefix - (\fILM options for lm2\fR...) - \-LM lm_word - \-w dictfile - (\fILM options for lm3\fR...) - \-SR recog_ngram am1 lm_ngram - (\fISearch options for recog_ngram\fR...) - \-SR recog_grammar am1 lm_grammar - (\fISearch options for recog_grammar\fR...) - \-SR recog_word am2 lm_word - (\fISearch options for recog_word\fR...) - -.fi -.PP -This is another example using GMM for frontend processing. Note -that from Rev.4.0 Julius has an independent MFCC calculation scheme -for GMM. This means that you should explicitly specify the -acoustic analysis condition for GMM, not only the AM. -.PP -Option \fB\-AM_GMM\fR switches the current AM configuration -to the one prepared internally for GMM. You can place AM configuration -after the option to specify MFCC computation parameters for GMM. -If you define exactly the same condition as AM for recognition, -the same MFCC calculation instance will be shared among AM and GMM. -Otherwise, each MFCC will be computed independently. -.PP -\fBExample with GMM\fR -.PP -.nf - - \-C jconffile - (\fIOther global options\fR...) - \-gmm gmmdefs \-gmmreject noise - \-AM_GMM - (\fIanalysis options for GMM\fR...) - \-AM am1 - (\fIAM and analysis options for am1\fR...) - \-LM lm_ngram - \-d ngram \-v dictfile - (\fILM options for lm1\fR...) - \-SR recog_ngram am1 lm_ngram - -.fi -.SH "JCONF VARIABLES" -The full list of options and variables that can be specified in jconf -file is listed below. -.SS "GLOBAL OPTIONS " -.RS -.SS "Misc. options" -.RE -.TP -\fB\-C \fR\fIjconffile\fR -Load a jconf file. The options written in the file are -expanded at the point. This option can be used within -another jconf file. -.TP -\fB\-version \fR -Print version information to standard error, and exit. -.TP -\fB\-setting \fR -Print engine setting information to standard error, and exit. -.TP -\fB\-quiet \fR -Output less log. For result, only the best word sequence will be -printed. 
-.TP -\fB\-debug \fR -(For debug) output enormous internal messages and debug -information to log. -.TP -\fB\-check \fR\fB{wchmm|trellis|triphone}\fR -For debug, enter interactive check mode. -.RS -.SS "Audio input" -.RE -.TP -\fB\-input \fR\fB{mic|rawfile|mfcfile|adinnet|stdin|netaudio} \fR -Choose speech input source. 'file' or 'rawfile' for waveform -file, 'htkparam' or 'mfcfile' for HTK parameter file. Users will -be prompted to enter the file name from stdin, or you can use -"\-filelist" option to specify list of files to process. - -\&'mic' is to get audio input from live microphone device, and -\&'adinnet' means receiving waveform data via tcpip network from -an adinnet client. 'netaudio' is from DatLink/NetAudio input, -and 'stdin' means data input from standard input. - -For waveform file input, only WAV (no -compression) and RAW (noheader, 16bit, -big endian) are supported by default. Other formats can be read -when compiled with \fBlibsnd\fR library. To see -what format is actually supported, see the help message using -option "\-help". For stdin input, only WAV and RAW are -supported. (default: mfcfile) -.TP -\fB\-filelist \fR\fIfilename\fR -(With \-input rawfile|mfcfile) perform recognition on all files -listed in the file. The file should contain an input file -per line. Engine ends when all of the files are processed. -.TP -\fB\-notypecheck \fR -By default, Julius checks the input parameter type whether it -matches the AM or not. This option will disable the check and -use the input vector as is. -.TP -\fB\-48 \fR -Record input with 48kHz sampling, and down\-sample it to 16kHz -on\-the\-fly. This option is valid for 16kHz model only. The -down\-sampling routine was ported from sptk. -(Rev. 4.0) -.TP -\fB\-NA \fR\fIdevicename\fR -Host name for DatLink server input (\fB\-input netaudio\fR). -.TP -\fB\-adport \fR\fIport_number\fR -With \fB\-input adinnet\fR, specify adinnet port -number to listen. 
(default: 5530) -.TP -\fB\-nostrip \fR -Julius by default removes successive zero samples in input -speech data. This option inhibits this removal. -.TP -\fB\-zmean \fR, \fB\-nozmean \fR -This option enables/disables DC offset removal of input -waveform. Offset will be estimated from the whole input. For -microphone / network input, zero mean of the first 48000 -samples (3 seconds in 16kHz sampling) will be used for the -estimation. (default: disabled) - -This option uses static offset for the channel. See also -\fB\-zmeanframe\fR for frame\-wise offset removal. -.RS -.SS "Speech segment detection by level and zero\-cross" -.RE -.TP -\fB\-cutsilence \fR, \fB\-nocutsilence \fR -Turn on / off the speech detection by level and zero\-cross. -Default is on for mic / adinnet input, off for files. -.TP -\fB\-lv \fR\fIthres\fR -Level threshold for speech input detection. Values should be -from 0 to 32767. -.TP -\fB\-zc \fR\fIthres\fR -Zero crossing threshold per second. Only waves over the level -threshold (\fB\-lv\fR) will be counted. (default: 60) -.TP -\fB\-headmargin \fR\fImsec\fR -Silence margin at the start of speech segment in -milliseconds. (default: 300) -.TP -\fB\-tailmargin \fR\fImsec\fR -Silence margin at the end of speech segment in milliseconds. -(default: 400) -.TP -\fB\-rejectshort \fR\fImsec\fR -Reject input shorter than specified milliseconds. Search will -be terminated and no result will be output. -.RS -.SS "Input rejection by average power" -.RE -.PP -This feature will be enabled by -\fB\-\-enable\-power\-reject\fR on compilation. Should be -used with Decoder VAD or GMM VAD. Valid for real\-time input only. -.TP -\fB\-powerthres \fR\fIthres\fR -Reject the input segment by its average energy. If the -average energy of the last recognized input is below the -threshold, Julius will reject the input. (Rev.4.0) - -This option is valid when -\fB\-\-enable\-power\-reject\fR is specified -at compilation time. 
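As an illustration of the static offset removal that the -zmean entry above describes, here is a minimal Python sketch. It is not taken from the Julius sources: the function name `remove_dc_offset`, the constant `LIVE_HEAD_SAMPLES` and the `live_input` flag are hypothetical names chosen for this example. Only the two rules come from the man page text: the whole input is used for files, and the first 48000 samples are used for microphone / network input.

```python
from typing import List, Sequence

# 48000 samples = 3 seconds at 16 kHz, the figure quoted for -zmean.
LIVE_HEAD_SAMPLES = 48000


def remove_dc_offset(samples: Sequence[float], live_input: bool = False) -> List[float]:
    """Subtract one static DC offset from every sample (illustrative only).

    File input: the offset is the mean of the whole input.
    Live input: the offset is the mean of the first LIVE_HEAD_SAMPLES samples.
    """
    if not samples:
        return []
    # Choose the span used to estimate the offset.
    head = samples[:LIVE_HEAD_SAMPLES] if live_input else samples
    offset = sum(head) / len(head)
    return [s - offset for s in samples]
```

Because the offset is a single value for the whole channel, this differs from the frame-wise removal of -zmeanframe, which would recompute the mean for each analysis frame.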
-.RS -.SS "Gaussian mixture model" -.RE -.PP -GMM will be used for input rejection by accumulated score, or for -GMM\-based frontend VAD when \fB\-\-enable\-gmm\-vad\fR is specified. -.PP -NOTE: You should also set the proper MFCC parameters required for the -GMM, specifying the acoustic parameters described in AM section -\fB\-AM_GMM\fR. -.TP -\fB\-gmm \fR\fIhmmdefs_file\fR -GMM definition file in HTK format. If specified, GMM\-based -input verification will be performed concurrently with the 1st -pass, and you can reject the input according to the result as -specified by \fB\-gmmreject\fR. The GMM should be -defined as one\-state HMMs. -.TP -\fB\-gmmnum \fR\fInumber\fR -Number of Gaussian components to be computed per frame on GMM -calculation. Only the N\-best Gaussians will be computed for -rapid calculation. The default is 10 and specifying smaller -value will speed up GMM calculation, but too small value (1 or -2) may cause degradation of identification performance. -.TP -\fB\-gmmreject \fR\fIstring\fR -Comma\-separated list of GMM names to be rejected as invalid -input. During recognition, the log likelihoods of GMMs -accumulated for the entire input will be computed concurrently -with the 1st pass. If the GMM name of the maximum score is -within this string, the 2nd pass will not be executed and the -input will be rejected. -.TP -\fB\-gmmmargin \fR\fIframes\fR -Head margin for GMM\-based VAD in frames. (Rev.4.0) - -This option will be valid only if compiled with -\fB\-\-enable\-gmm\-vad\fR. -.RS -.SS "Decoding option" -.RE -.PP -Real\-time processing means concurrent processing of MFCC computation -and 1st pass decoding. By default, real\-time processing on the first pass is on -for microphone / adinnet / netaudio input, and off for others. -.TP -\fB\-realtime \fR, \fB\-norealtime \fR -Explicitly switch on / off real\-time (pipe\-line) processing on -the first pass. The default is off for file input, and on for -microphone, adinnet and NetAudio input. 
This option relates -to the way CMN and energy normalization are performed: if off, -they will be done using average features of whole input. If -on, MAP\-CMN and energy normalization are used to do real\-time processing. -.SS "INSTANCE DECLARATION FOR MULTI DECODING " -The following arguments will create a new configuration set with -default parameters, and switch current set to it. Jconf parameters -specified after the option will be set into the current set. -.PP -To do multi\-model decoding, these arguments should be specified at -the beginning of each model / search instance with different names. -Any options before the first instance definition will be IGNORED. -.PP -When no instance definition is found (as older version of Julius), -all the options are assigned to a default instance named "_default". -.PP -Please note that decoding with a single LM and multiple AMs is not -fully supported. For example, you may want to construct the -jconf file as follows. - -.nf - - \-AM am_1 \-AM am_2 - \-LM lm (LM spec..) - \-SR search1 am_1 lm - \-SR search2 am_2 lm -.fi - -This type of model sharing is not supported yet, since some part -of LM processing depends on the assigned AM. Instead, you can -get the same result by defining the same LMs for each AM, like this: - -.nf - - \-AM am_1 \-AM am_2 - \-LM lm_1 (LM spec..) - \-LM lm_2 (same LM spec..) - \-SR search1 am_1 lm_1 - \-SR search2 am_2 lm_2 -.fi - -.TP -\fB\-AM \fR\fIname\fR -Create a new AM configuration set, and switch current to the -new one. You should give a unique name. (Rev.4.0) -.TP -\fB\-LM \fR\fIname\fR -Create a new LM configuration set, and switch current to the -new one. You should give a unique name. (Rev.4.0) -.TP -\fB\-SR \fR\fIname\fR \fIam_name\fR \fIlm_name\fR -Create a new search configuration set, and switch current to -the new one. The specified AM and LM will be assigned to it. -The \fIam_name\fR and -\fIlm_name\fR can be either name or ID -number. You should give a unique name. 
(Rev.4.0) -.TP -\fB\-AM_GMM \fR -A special command to switch AM configuration set for -specifying speech analysis parameters of GMM. The current AM -will be switched to the GMM specific one already reserved, so -be careful not to confuse with normal AM configurations. -(Rev.4.0) -.SS "LANGUAGE MODEL (\-LM) " -Only one type of LM can be specified for a LM configuration. -If you want to use multi model, you should define them one by one, -each as a new LM. -.RS -.SS N\-gram -.RE -.TP -\fB\-d \fR\fIbingram_file\fR -Use binary format N\-gram. An ARPA N\-gram file can be -converted to Julius binary format by -mkbingram. -.TP -\fB\-nlr \fR\fIarpa_ngram_file\fR -A forward, left\-to\-right N\-gram language model in standard -ARPA format. When both a forward N\-gram and backward N\-gram -are specified, Julius uses this forward 2\-gram for the 1st -pass, and the backward N\-gram for the 2nd pass. - -Since ARPA file often gets huge and requires a lot of time to -load, it may be better to convert the ARPA file to Julius -binary format by mkbingram. Note that if -both forward and backward N\-gram is used for recognition, they -together should be converted to a single binary. - -When only a forward N\-gram is specified by this option and no -backward N\-gram specified by \fB\-nrl\fR, Julius -performs recognition with only the forward N\-gram. The 1st -pass will use the 2\-gram entry in the given N\-gram, and -The 2nd pass will use the given N\-gram, with converting -forward probabilities to backward probabilities by Bayes rule. -(Rev.4.0) -.TP -\fB\-nrl \fR\fIarpa_ngram_file\fR -A backward, right\-to\-left N\-gram language model in standard -ARPA format. When both a forward N\-gram and backward N\-gram -are specified, Julius uses the forward 2\-gram for the 1st -pass, and this backward N\-gram for the 2nd pass. - -Since ARPA file often gets huge and requires a lot of time to -load, it may be better to convert the ARPA file to Julius -binary format by mkbingram. 
Note that if -both forward and backward N\-gram is used for recognition, they -together should be converted to a single binary. - -When only a backward N\-gram is specified by this option and no -forward N\-gram specified by \fB\-nlr\fR, Julius -performs recognition with only the backward N\-gram. The 1st -pass will use the forward 2\-gram probability computed from the -backward 2\-gram using Bayes rule. The 2nd pass fully use the -given backward N\-gram. (Rev.4.0) -.TP -\fB\-v \fR\fIdict_file\fR -Word dictionary file. -.TP -\fB\-silhead \fR\fIword_string\fR \fB\-siltail \fR\fIword_string\fR -Silence word defined in the dictionary, for silences at -the beginning of sentence and end of sentence. (default: -"<s>", "</s>") -.TP -\fB\-iwspword \fR -Add a word entry to the dictionary that should correspond to -inter\-word pauses. This may improve recognition accuracy in -some language model that has no explicit inter\-word pause -modeling. The word entry to be added can be changed by -\fB\-iwspentry\fR. -.TP -\fB\-iwspentry \fR\fIword_entry_string\fR -Specify the word entry that will be added by -\fB\-iwspword\fR. (default: "<UNK> [sp] sp -sp") -.TP -\fB\-sepnum \fR\fInumber\fR -Number of high frequency words to be isolated from the lexicon -tree, to ease approximation error that may be caused by the -one\-best approximation on 1st pass. (default: 150) -.RS -.SS Grammar -.RE -.PP -Multiple grammars can be specified by using \fB\-gram\fR and -\fB\-gramlist\fR. When you specify grammars using these -options multiple times, all of them will be read at startup. Note -that this is unusual behavior from other options (in normal Julius -option, last one override previous ones). You can use -\fB\-nogram\fR to reset the already specified grammars at -that point. -.TP -\fB\-gram \fR\fBgramprefix1[,gramprefix2[,gramprefix3,...]] \fR -Comma\-separated list of grammars to be used. the argument -should be prefix of a grammar, i.e. 
if you have -\fBfoo.dfa\fR and -\fBfoo.dict\fR, you can specify them by single -argument \fBfoo\fR. Multiple grammars can be -specified at a time as a comma\-separated list. -.TP -\fB\-gramlist \fR\fIlist_file\fR -Specify a grammar list file that contains list of grammars to -be used. The list file should contain the prefixes of -grammars, each per line. A relative path in the list file -will be treated as relative to the list file, not the current -path or configuration file. -.TP -\fB\-dfa \fR\fIdfa_file\fR \fB\-v \fR\fIdict_file\fR -An old way of specifying grammar files separately. -.TP -\fB\-nogram \fR -Remove the current list of grammars already specified by -\fB\-gram\fR, \fB\-gramlist\fR, -\fB\-dfa\fR and \fB\-v\fR. -.RS -.SS "Isolated word" -.RE -.PP -Multiple dictionary can be specified by using \fB\-w\fR and -\fB\-wlist\fR. When you specify multiple times, all of them -will be read at startup. You can use \fB\-nogram\fR to -reset the already specified dictionaries at that point. -.TP -\fB\-w \fR\fIdict_file\fR -Word dictionary for isolated word recognition. File format -is the same as other LM. (Rev.4.0) -.TP -\fB\-wlist \fR\fIlist_file\fR -Specify a dictionary list file that contains list of -dictionaries to be used. The list file should contain the -file name of dictionaries, each per line. A relative path in -the list file will be treated as relative to the list file, -not the current path or configuration file. (Rev.4.0) -.TP -\fB\-nogram \fR -Remove the current list of dictionaries already specified by -\fB\-w\fR and \fB\-wlist\fR. -.TP -\fB\-wsil \fR\fIhead_sil_model_name\fR \fItail_sil_model_name\fR \fIsil_context_name\fR -On isolated word recognition, silence models will be appended -to the head and tail of each word at recognition. This option -specifies the silence models to be appended. -\fIsil_context_name\fR is the name of the -head sil model and tail sil model as a context of word head -phone and tail phone. 
For example, if you specify -\fB\-wsil silB silE sp\fR, a word with phone -sequence \fBb eh t\fR will be translated as -\fBsilB sp\-b+eh b\-eh+t eh\-t+sp silE\fR. -(Rev.4.0) -.RS -.SS "User\-defined LM" -.RE -.TP -\fB\-userlm \fR -Declare to use user LM defined in program. This option should be -specified if you use user\-defined LM function. (Rev.4.0) -.RS -.SS "Misc LM options" -.RE -.TP -\fB\-forcedict \fR -Ignore dictionary errors and force running. Words with errors -will be skipped at startup. -.SS "ACOUSTIC MODEL AND SPEECH ANALYSIS (\-AM) (\-AM_GMM) " -Acoustic analysis parameters are included in this section, since the -AM defines the required parameter. You can use different MFCC type -for each AM. For GMM, the same parameter should be specified after -\fB\-AM_GMM\fR -.PP -When using multiple AM, the values of \fB\-smpPeriod\fR, -\fB\-smpFreq\fR, \fB\-fsize\fR and -\fB\-fshift\fR should have the same value among all AMs. -.RS -.SS "acoustic HMM and parameters" -.RE -.TP -\fB\-h \fR\fIhmmdef_file\fR -Acoustic HMM definition file. File should be in HTK ascii -format, or Julius binary format. You can convert HTK ascii hmmdefs -to Julius binary format by mkbinhmm. -.TP -\fB\-hlist \fR\fIhmmlist_file\fR -HMMList file for phone mapping. This option is required when -using a triphone model. This file provides a mapping between -logical triphone names generated from the dictionary and defined -HMM names in hmmdefs. -.TP -\fB\-tmix \fR\fInumber\fR -Specify the number of top Gaussians to be calculated in a -mixture codebook. Small number will speed up the acoustic -computation namely in a tied\-mixture model, but AM accuracy may -get worse on too small value. (default: 2) -.TP -\fB\-spmodel \fR\fIname\fR -Specify an HMM name that corresponds to short\-pause model in -HMM. 
This option will affect various aspects in recognition: -short\-pause skipping process on grammar recognition, word\-end -short\-pause model insertion with \fB\-iwsp\fR on -N\-gram recognition, or short\-pause segmentation -(\fB\-spsegment\fR). (default: "sp") -.TP -\fB\-multipath \fR -Enable multi\-path mode. Multi\-path mode expands state -transition availability to allow model\-skipping, or multiple -output/input transitions in HMMs. However, since it defines -additional word begin / end nodes and performs extra transition -checks on decoding, the beam width may need to be set larger -and recognition becomes a bit slower. - -By default (without this option), Julius automatically checks -the transition type of specified HMMs, and enables the -multi\-path mode if required. You can force Julius to enable multi\-path -mode with this option. (rev.4.0) -.TP -\fB\-gprune \fR\fB{safe|heuristic|beam|none|default} \fR -Set Gaussian pruning algorithm to use. The default setting -will be set according to the model type and engine setting. -"default" will force accepting the default setting. Set this -to "none" to disable pruning and perform full -computation. "safe" guarantees the top N Gaussians to be -computed. "heuristic" and "beam" do more aggressive -computational cost reduction, but may result in small loss of -accuracy. (default: 'safe' (standard), 'beam' (fast) for -tied mixture model, 'none' for non tied\-mixture model). -.TP -\fB\-iwcd1 \fR\fB{max|avg|best number} \fR -Select method to approximate inter\-word triphone on the head -and tail of a word in the first pass. - -"max" will apply the maximum likelihood of the same context -triphones. "avg" will apply the average likelihood of the -same context triphones. "best number" will apply the average -of top N\-best likelihoods of the same context -triphone. - -Default is "best 3" for use with N\-gram, and "avg" for grammar -and word. When this AM is shared by LMs of both types, -the latter one will be chosen. 
-.TP -\fB\-iwsppenalty \fR\fIfloat\fR -Short pause insertion penalty for appended short pauses by -\fB\-iwsp\fR. -.TP -\fB\-gshmm \fR\fIhmmdef_file\fR -If this option is specified, Julius performs Gaussian Mixture -Selection for efficient decoding. The hmmdefs should be a -monophone model generated from an ordinary monophone HMM -model, using mkgshmm. -.TP -\fB\-gsnum \fR\fInumber\fR -On GMS, specify number of monophone state from top to -compute the detailed corresponding triphones. (default: 24) -.RS -.SS "Speech analysis parameters" -.RE -.TP -\fB\-smpPeriod \fR\fIperiod\fR -Set sampling frequency of input speech by its sampling period, -in unit of 100 nanoseconds. Sampling rate can also be -specified by \fB\-smpFreq\fR. Please note that the -input frequency should be the same as trained conditions of -acoustic model you use. (default: 625 = 16000Hz) - -This option corresponds to the HTK Option "SOURCERATE". -The same value can be given to this option. - -When using multiple AM, this value should be the same among all -AMs. -.TP -\fB\-smpFreq \fR\fIHz\fR -Set sampling frequency of input speech in Hz. Sampling rate -can also be specified using "\-smpPeriod". Please note that -this frequency should be the same as the trained conditions of -acoustic model you use. (default: 16000) - -When using multiple AM, this value should be the same among all -AMs. -.TP -\fB\-fsize \fR\fIsample_num\fR -Window size in number of samples. (default: 400) - -This option corresponds to the HTK Option "WINDOWSIZE", -but value should be in samples (HTK value / smpPeriod). - -When using multiple AM, this value should be the same among all -AMs. -.TP -\fB\-fshift \fR\fIsample_num\fR -Frame shift in number of samples. (default: 160) - -This option corresponds to the HTK Option "TARGETRATE", -but value should be in samples (HTK value / smpPeriod). - -When using multiple AM, this value should be the same among all -AMs. -.TP -\fB\-preemph \fR\fIfloat\fR -Pre\-emphasis coefficient. 
(default: 0.97) - -This option corresponds to the HTK Option "PREEMCOEF". -The same value can be given to this option. -.TP -\fB\-fbank \fR\fInum\fR -Number of filterbank channels. (default: 24) - -This option corresponds to the HTK Option "NUMCHANS". -The same value can be given to this option. -Be aware that the default value differs from HTK (22). -.TP -\fB\-ceplif \fR\fInum\fR -Cepstral liftering coefficient. (default: 22) - -This option corresponds to the HTK Option "CEPLIFTER". -The same value can be given to this option. -.TP -\fB\-rawe \fR, \fB\-norawe \fR -Enable/disable using raw energy before pre\-emphasis (default: disabled) - -This option corresponds to the HTK Option "RAWENERGY". -Be aware that the default value differs from HTK (enabled at HTK, -disabled at Julius). -.TP -\fB\-enormal \fR, \fB\-noenormal \fR -Enable/disable normalizing log energy. On live input, this -normalization will be approximated from the average of last -input. (default: disabled) - -This option corresponds to the HTK Option "ENORMALISE". -Be aware that the default value differs from HTK (enabled at HTK, -disabled at Julius). -.TP -\fB\-escale \fR\fIfloat_scale\fR -Scaling factor of log energy when normalizing log -energy. (default: 1.0) - -This option corresponds to the HTK Option "ESCALE". -Be aware that the default value differs from HTK (0.1). -.TP -\fB\-silfloor \fR\fIfloat\fR -Energy silence floor in dB when normalizing log energy. -(default: 50.0) - -This option corresponds to the HTK Option "SILFLOOR". -.TP -\fB\-delwin \fR\fIframe\fR -Delta window size in number of frames. (default: 2) - -This option corresponds to the HTK Option "DELTAWINDOW". -The same value can be given to this option. -.TP -\fB\-accwin \fR\fIframe\fR -Acceleration window size in number of frames. (default: 2) - -This option corresponds to the HTK Option "ACCWINDOW". -The same value can be given to this option. 
-.TP -\fB\-hifreq \fR\fIHz\fR -Enable band\-limiting for MFCC filterbank computation: set -upper frequency cut\-off. Value of \-1 will disable it. -(default: \-1) - -This option corresponds to the HTK Option "HIFREQ". -The same value can be given to this option. -.TP -\fB\-lofreq \fR\fIHz\fR -Enable band\-limiting for MFCC filterbank computation: set -lower frequency cut\-off. Value of \-1 will disable it. -(default: \-1) - -This option corresponds to the HTK Option "LOFREQ". -The same value can be given to this option. -.TP -\fB\-zmeanframe \fR, \fB\-nozmeanframe \fR -With speech input, this option enables/disables frame\-wise DC -offset removal. This corresponds to HTK configuration -ZMEANSOURCE. This cannot be used with "\-zmean". -(default: disabled) -.RS -.SS "Real\-time cepstral mean normalization" -.RE -.TP -\fB\-cmnload \fR\fIfile\fR -Load initial cepstral mean vector from file on startup. The -file should be one saved by \fB\-cmnsave\fR. -Loading an initial cepstral mean enables Julius to better -recognize the first utterance on a microphone / network input. -.TP -\fB\-cmnsave \fR\fIfile\fR -Save cepstral mean vector at each input. The parameters will -be saved to the file at each input end, so the output file -always keeps the last cepstral mean. If the output file already -exists, it will be overwritten. -.TP -\fB\-cmnupdate \fR\fB\-cmnnoupdate \fR -Control whether to update the cepstral mean at each input on -microphone / network input. Disabling this and specifying -\fB\-cmnload\fR will make the engine use the initial -cepstral mean permanently. -.TP -\fB\-cmnmapweight \fR\fIfloat\fR -Specify weight of initial cepstral mean for MAP\-CMN. Specify -larger value to retain the initial cepstral mean for a longer -period, and smaller value to rely more on the current input. -(default: 100.0) -.RS -.SS "Spectral subtraction" -.RE -.TP -\fB\-sscalc \fR -Perform spectral subtraction using head part of each file. -Valid only for raw speech file input. 
Conflict with -\fB\-ssload\fR. -.TP -\fB\-sscalclen \fR\fImsec\fR -With \fB\-sscalc\fR, specify the length of head part -silence in milliseconds. (default: 300) -.TP -\fB\-ssload \fR\fIfile\fR -Perform spectral subtraction for speech input using -pre\-estimated noise spectrum from file. The noise spectrum -should be computed beforehand by mkss. -Valid for all speech input. Conflict with -\fB\-sscalc\fR. -.TP -\fB\-ssalpha \fR\fIfloat\fR -Alpha coefficient of spectral subtraction for -\-sscalc and \-ssload. -Noise will be subtracted stronger as this value gets larger, -but distortion of the resulting signal also becomes -remarkable. (default: 2.0) -.TP -\fB\-ssfloor \fR\fIfloat\fR -Flooring coefficient of spectral subtraction. The spectral -power that goes below zero after subtraction will be -substituted by the source signal with this coefficient -multiplied. (default: 0.5) -.RS -.SS "Misc AM options" -.RE -.TP -\fB\-htkconf \fR\fIfile\fR -Parse the given HTK Config file, and set corresponding -parameters to Julius. When using this option, the default -parameter values are switched from Julius defaults to HTK -defaults. -.SS "RECOGNIZER AND SEARCH (\-SR) " -Default values for beam width and LM weights will change according to -compile\-time setup of JuliusLib and model specification. Please see -the startup log for the actual values. -.RS -.SS "General parameters" -.RE -.TP -\fB\-inactive \fR -Start this recognition process instance with inactive state. (Rev.4.0) -.TP -\fB\-1pass \fR -Perform only the first pass. This mode is automatically set -at isolated word recognition. -.TP -\fB\-no_ccd \fR, \fB\-force_ccd \fR -Normally Julius determines whether the specified acoustic -model is a context\-dependent model from the model names, i.e., -whether the model names contain character \fB+\fR -and \fB\-\fR. You can explicitly specify by these -options to avoid mis\-detection. These option will override -automatic detection. 
-.TP -\fB\-cmalpha \fR\fIfloat\fR -Smoothing parameter for confidence scoring. (default: 0.05) -.TP -\fB\-iwsp \fR -(Multi\-path mode only) Enable inter\-word context\-free short -pause handling. This option appends a skippable short pause -model for every word end. The added model will be skipped on -inter\-word context handling. The HMM model to be appended can -be specified by \fB\-spmodel\fR. -.TP -\fB\-transp \fR\fIfloat\fR -Additional insertion penalty for transparent words. (default: -0.0) -.TP -\fB\-demo \fR -Equivalent to \fB\-progout \-quiet\fR. -.RS -.SS "1st pass parameters" -.RE -.TP -\fB\-lmp \fR\fIweight\fR \fIpenalty\fR -(N\-gram) Language model weights and word insertion penalties -for the first pass. -.TP -\fB\-penalty1 \fR\fIpenalty\fR -(Grammar) word insertion penalty for the first pass. (default: 0.0) -.TP -\fB\-b \fR\fIwidth\fR -Beam width for rank beam in number of HMM nodes on the first -pass. This value defines search width on the 1st pass, and -has great effect on the total processing time. Smaller width -will speed up the decoding, but too small value will result in -a substantial increase of recognition errors due to search -failure. Larger value will make the search stable and will -lead to failure\-free search, but processing time and memory -usage will grow in proportion to the width. - -The default value is dependent on acoustic model type: 400 -(monophone), 800 (triphone), or 1000 (triphone, setup=v2.1) -.TP -\fB\-nlimit \fR\fInum\fR -Upper limit of token per node. This option is valid when -\fB\-\-enable\-wpair\fR and -\fB\-\-enable\-wpair\-nlimit\fR are enabled at -compilation time. -.TP -\fB\-progout \fR -Enable progressive output of the partial results on the first pass. -.TP -\fB\-proginterval \fR\fImsec\fR -Set the output time interval of \fB\-progout\fR in -milliseconds. 
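The -b entry above describes a rank beam measured in HMM nodes. As a generic illustration of rank-based pruning, and not a reproduction of the Julius decoder, the following Python sketch keeps only the `width` best-scoring nodes of one frame. The function name `rank_beam_prune` and the dictionary representation of node scores are assumptions made for this example.

```python
import heapq
from typing import Dict


def rank_beam_prune(node_scores: Dict[str, float], width: int) -> Dict[str, float]:
    """Keep the `width` highest-scoring nodes of one frame (illustrative only).

    node_scores maps a hypothetical node identifier to its log score.
    A smaller width keeps fewer nodes, which is faster but risks dropping
    the node that would have led to the correct result.
    """
    if width <= 0:
        return {}
    # nlargest returns (node, score) pairs ordered by descending score.
    kept = heapq.nlargest(width, node_scores.items(), key=lambda kv: kv[1])
    return dict(kept)
```

With this kind of pruning the work per frame is bounded by the width, which matches the statement that processing time and memory usage grow in proportion to it.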
-.RS -.SS "2nd pass parameters" -.RE -.TP -\fB\-lmp2 \fR\fIweight\fR \fIpenalty\fR -(N\-gram) Language model weights and word insertion penalties -for the second pass. -.TP -\fB\-penalty2 \fR\fIpenalty\fR -(Grammar) word insertion penalty for the second pass. (default: 0.0) -.TP -\fB\-b2 \fR\fIwidth\fR -Envelope beam width (number of hypotheses) in second pass. If -the count of word expansion at a certain length of hypothesis -reaches this limit during search, shorter hypotheses are not -expanded further. This prevents the search from falling into a -breadth\-first\-like status stacking on the same position, and -reduces search failure. (default: 30) -.TP -\fB\-sb \fR\fIfloat\fR -Score envelope width for enveloped scoring. When calculating -hypothesis score for each generated hypothesis, its trellis -expansion and viterbi operation will be pruned in the middle -of the speech if score on a frame goes under the width. -Giving small value makes the second pass faster, but -computation error may occur. (default: 80.0) -.TP -\fB\-s \fR\fInum\fR -Stack size, i.e. the maximum number of hypotheses that can be -stored on the stack during the search. A larger value may -give more stable results, but increases the amount of memory -required. (default: 500) -.TP -\fB\-m \fR\fIcount\fR -Number of expanded hypotheses required to discontinue the -search. If the number of expanded hypotheses is greater than -this threshold, the search is discontinued at that point. -The larger this value is, the longer Julius takes to give up -the search. (default: 2000) -.TP -\fB\-n \fR\fInum\fR -The number of candidates Julius tries to find. The search -continues till this number of sentence hypotheses have been -found. The obtained sentence hypotheses are sorted by score, -and final result is displayed in the order (see also the -\fB\-output\fR). The possibility that the optimum -hypothesis is correctly found increases as this value gets -increased, but the processing time also becomes longer. 
The -default value depends on the engine setup at compilation time: -10 (standard) or 1 (fast or v2.1) -.TP -\fB\-output \fR\fInum\fR -The top N sentence hypotheses to be output at the end of -search. Use with \fB\-n\fR. (default: 1) -.TP -\fB\-lookuprange \fR\fIframe\fR -When performing word expansion on the second pass, this option -sets the number of frames before and after to look up next -word hypotheses in the word trellis. This prevents the -omission of short words, but with a large value, the number of -expanded hypotheses increases and the system becomes -slow. (default: 5) -.TP -\fB\-looktrellis \fR -(Grammar) Expand only the words that survived the first pass -instead of expanding all the words predicted by the grammar. This -option makes second pass decoding slightly faster, especially -for large vocabulary conditions, but may increase deletion -errors of short words. (default: disabled) -.RS -.SS "Short\-pause segmentation" -.RE -.PP -When compiled with \fB\-\-enable\-decoder\-vad\fR, the -short\-pause segmentation will be extended to support decoder\-based -VAD. -.TP -\fB\-spsegment \fR -Enable short\-pause segmentation mode. Input will be segmented -when a short pause word (a word with only a silence model in its -pronunciation) gets the highest likelihood for a certain number of -successive frames on the first pass. When a segment end is -detected, Julius stops the 1st pass at that point, performs the 2nd pass, -and continues with the next segment. The word context will be considered -among segments. (Rev.4.0) - -When compiled with \fB\-\-enable\-decoder\-vad\fR, -this option enables decoder\-based VAD, to skip long silence. -.TP -\fB\-spdur \fR\fIframe\fR -Short pause duration length to detect end of input segment, in -number of frames. (default: 10) -.TP -\fB\-pausemodels \fR\fIstring\fR -A comma\-separated list of pause model names to be used at short\-pause -segmentation. A word consisting only of the pause models will be treated -as a "pause word" for pause detection. 
If not specified, the names -of \fB\-spmodel\fR, \fB\-silhead\fR and -\fB\-siltail\fR will be used. (Rev.4.0) -.TP -\fB\-spmargin \fR\fIframe\fR -Backstep margin at trigger up for decoder\-based VAD. (Rev.4.0) - -This option will be valid only if compiled with -\fB\-\-enable\-decoder\-vad\fR. -.TP -\fB\-spdelay \fR\fIframe\fR -Trigger decision delay frame at trigger up for decoder\-based -VAD. (Rev.4.0) - -This option will be valid only if compiled with -\fB\-\-enable\-decoder\-vad\fR. -.RS -.SS "Lattice / confusion network output" -.RE -.TP -\fB\-lattice \fR, \fB\-nolattice \fR -Enable / disable generation of a word graph. The search -algorithm is also changed to optimize for better word graph -generation, so the sentence result may not be the same as -normal N\-best recognition. (Rev.4.0) -.TP -\fB\-confnet \fR, \fB\-noconfnet \fR -Enable / disable generation of a confusion network. Enabling -this will also activate \fB\-lattice\fR internally. -(Rev.4.0) -.TP -\fB\-graphrange \fR\fIframe\fR -Merge identical words at neighboring positions during graph generation. If -the positions of identical words differ by less than this value, -they will be merged. The default is 0 (allow merging only at -exactly the same location) and specifying a larger value will -result in a smaller graph output. Setting it to \-1 disables -merging; in that case identical words at the same location with -different scores are left as they are. (default: 0) -.TP -\fB\-graphcut \fR\fIdepth\fR -Cut the resulting graph by its word depth at the post\-processing -stage. The depth value is the number of words to be allowed -at a frame. Setting to \-1 disables this feature. (default: -80) -.TP -\fB\-graphboundloop \fR\fIcount\fR -Limit the number of boundary adjustment loops at the -post\-processing stage. This parameter prevents Julius from -blocking in an infinite adjustment loop caused by short word -oscillation. 
(default: 20) -.TP -\fB\-graphsearchdelay \fR, \fB\-nographsearchdelay \fR -When the "\-graphsearchdelay" option is set, Julius modifies its -graph generation algorithm on the 2nd pass not to terminate -search by graph merging, until the first sentence candidate is -found. This option may improve graph accuracy, especially -when you are going to generate a huge word graph by setting -broad search. Namely, it may result in better graph accuracy -when you set wide beams on both 1st pass \fB\-b\fR -and 2nd pass \fB\-b2\fR, and a large number for -\fB\-n\fR. (default: disabled) -.RS -.SS "Multi\-gram / multi\-dic output" -.RE -.TP -\fB\-multigramout \fR, \fB\-nomultigramout \fR -On grammar recognition using multiple grammars, Julius will -output only the best result among all grammars. Enabling this -option will make Julius output a result for each grammar. -(default: disabled) -.RS -.SS "Forced alignment" -.RE -.TP -\fB\-walign \fR -Do Viterbi alignment per word unit for the recognition -result. The word boundary frames and the average acoustic -scores per frame will be calculated. -.TP -\fB\-palign \fR -Do Viterbi alignment per phone unit for the recognition -result. The phone boundary frames and the average acoustic -scores per frame will be calculated. -.TP -\fB\-salign \fR -Do Viterbi alignment per state for the recognition result. -The state boundary frames and the average acoustic scores per -frame will be calculated.
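-.PP -As a sketch of the broad\-search word graph setup described for -\fB\-graphsearchdelay\fR, the options might be combined as follows. -The beam widths and the N\-best count are illustrative assumptions, not -tuned values. -.PP -.nf - - # wide beams on both passes and a large N (illustrative values) - \-b 2000 - \-b2 200 - \-n 100 - \-lattice - \-graphsearchdelay - # optionally add per\-word alignment to the result - \-walign - -.fi 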