ChineseLexicon (Stanford JavaNLP API)

edu.stanford.nlp.parser.lexparser
Class ChineseLexicon

java.lang.Object
  edu.stanford.nlp.parser.lexparser.BaseLexicon
      edu.stanford.nlp.parser.lexparser.ChineseLexicon

All Implemented Interfaces: Lexicon, Serializable

public class ChineseLexicon
extends BaseLexicon

A lexicon class for Chinese. Extends BaseLexicon, overriding its score and train methods to include a ChineseUnknownWordModel.

Author: Roger Levy
See Also: Serialized Form

Field Summary
`static boolean`	`useCharBasedUnknownWordModel`
`static boolean`	`useGoodTuringUnknownWordModel`

Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseLexicon
`lastSentencePosition, lastSignatureIndex, lastWordToSignaturize, nullTag, nullWord, rulesWithWord, seenCounter, smartMutation, smoothInUnknownsThreshold, tags, unknownLevel, unSeenCounter, words`

Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
`BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD`

Constructor Summary
`ChineseLexicon(Options.LexOptions op)`

Method Summary
`double`	`score(IntTaggedWord iTW, int loc)` Get the score of this word with this tag (as an IntTaggedWord) at this loc.
`void`	`train(Collection trees)` Trains this lexicon on the Collection of trees.

Methods inherited from class edu.stanford.nlp.parser.lexparser.BaseLexicon
`addTagging, evaluateCoverage, getSignature, getSignatureIndex, initRulesWithWord, isKnown, isKnown, printLexStats, readData, ruleIteratorByWord, train, treeToEvents, tune, writeData`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

useCharBasedUnknownWordModel

public static boolean useCharBasedUnknownWordModel

useGoodTuringUnknownWordModel

public static boolean useGoodTuringUnknownWordModel

Constructor Detail

ChineseLexicon

public ChineseLexicon(Options.LexOptions op)

Method Detail

train

public void train(Collection trees)

Description copied from class: BaseLexicon

Trains this lexicon on the Collection of trees.

Specified by: train in interface Lexicon
Overrides: train in class BaseLexicon

score

public double score(IntTaggedWord iTW,
                    int loc)

Description copied from class: BaseLexicon

Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Implementation documentation:

Seen:
  c_W = count(W)
  c_TW = count(T,W)
  c_T = count(T)
  c_Tunseen = count(T) among new words in 2nd half
  total = count(seen words)
  totalUnseen = count("unseen" words)
  p_T_U = Pmle(T|"unseen")
  pb_T_W = P(T|W).
    If (c_W > smoothInUnknownsThreshold) pb_T_W = c_TW/c_W
    Else (if not smart mutation) pb_T_W = bayes prior smooth[1] with p_T_U
  p_T = Pmle(T)
  p_W = Pmle(W)
  pb_W_T = log(pb_T_W * p_W / p_T) [Bayes rule]
  Note that this doesn't really properly reserve mass to unknowns.

Unseen:
  c_TS = count(T,Sig|Unseen)
  c_S = count(Sig)
  c_T = count(T|Unseen)
  c_U = totalUnseen above
  p_T_U = Pmle(T|Unseen)
  pb_T_S = Bayes smooth of Pmle(T|S) with P(T|Unseen) [smooth[0]]
  pb_W_T = log(P(W|T)) inverted
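The seen-word case in the implementation documentation above can be illustrated with a small standalone sketch. It is not the BaseLexicon or ChineseLexicon source: the class name SeenWordScoreSketch, its method names, and the counts in main are hypothetical, and the additive form used for "bayes prior smooth[1] with p_T_U" is an assumption about what that phrase means. The smart-mutation branch and the unseen-word case are not covered.

```java
// Hypothetical, standalone illustration of the seen-word scoring steps.
// It does not use or reproduce any Stanford JavaNLP class.
class SeenWordScoreSketch {

    /**
     * Estimate P(T|W) for a seen word.
     * cTW = count(T,W), cW = count(W), pTU = Pmle(T|"unseen").
     * threshold plays the role of smoothInUnknownsThreshold and
     * smoothWeight the role of smooth[1].
     */
    public static double tagGivenWord(double cTW, double cW, double pTU,
                                      double threshold, double smoothWeight) {
        if (cW > threshold) {
            // Frequent word: plain relative frequency c_TW / c_W.
            return cTW / cW;
        }
        // Rare word: pull the estimate toward P(T|"unseen").
        // ASSUMPTION: "bayes prior smooth" is read as additive prior smoothing.
        return (cTW + smoothWeight * pTU) / (cW + smoothWeight);
    }

    /**
     * Invert P(T|W) into log P(W|T) with Bayes' rule:
     * log(pb_T_W * p_W / p_T), where p_W = c_W/total and p_T = c_T/total.
     */
    public static double logWordGivenTag(double pbTW, double cW, double cT,
                                         double total) {
        double pT = cT / total;
        double pW = cW / total;
        return Math.log(pbTW * pW / pT);
    }

    public static void main(String[] args) {
        // Made-up counts: the word occurs 4 times, 3 of them with tag T;
        // tag T occurs 50 times among 1000 seen tokens.
        double rare = tagGivenWord(3, 4, 0.5, 100, 1.0);   // smoothed branch
        double frequent = tagGivenWord(3, 4, 0.5, 2, 1.0); // relative frequency
        System.out.println("P(T|W), smoothed:   " + rare);
        System.out.println("P(T|W), unsmoothed: " + frequent);
        System.out.println("log P(W|T):         "
                + logWordGivenTag(frequent, 4, 50, 1000));
    }
}
```

With these counts the smoothed branch gives (3 + 1.0 * 0.5) / (4 + 1.0) = 0.7, while the unsmoothed branch gives 3/4 = 0.75. Because p_W is computed over seen words only, no probability mass is set aside for unknown words, which is the limitation noted in the implementation documentation.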

Specified by: score in interface Lexicon
Overrides: score in class BaseLexicon

Parameters:
  iTW - An IntTaggedWord pairing a word and POS tag
  loc - The position in the sentence. In the default implementation this is used only for unknown words, to change their probability distribution when sentence-initial.
Returns: A double-valued score, usually - log P(word|tag)
