edu.stanford.nlp.parser.lexparser
Class ChineseUnknownWordModel

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.ChineseUnknownWordModel
All Implemented Interfaces:
Serializable

public class ChineseUnknownWordModel
extends Object
implements Serializable

Stores, trains, and scores with an unknown word model. A small set of filters deterministically forces rewrites for certain proper nouns, dates, and cardinal and ordinal numbers; when none of these filters applies, the model uses either the distribution of terminals with the same first character or Good-Turing smoothing. Although the model was developed for Chinese, its training and storage methods could be used cross-linguistically.
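
The Good-Turing fallback mentioned above can be illustrated with a minimal, self-contained sketch. It is not this class's code: the name GoodTuringSketch and the method unseenMass are hypothetical, and the sketch shows only the standard Good-Turing estimate of the total probability mass reserved for unseen words (the number of word types seen exactly once, divided by the number of tokens). How ChineseUnknownWordModel actually applies the smoothing, for example whether it does so per tag, is not stated in this documentation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not part of edu.stanford.nlp.parser.lexparser. */
class GoodTuringSketch {

    /**
     * Good-Turing estimate of the probability mass of unseen words:
     * N1 / N, where N1 is the number of word types observed exactly once
     * and N is the total number of observed tokens.
     */
    static double unseenMass(List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0; // nothing observed, so no estimate can be made
        }
        // Count how often each word type occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        // N1: types that occur exactly once.
        long singletons = counts.values().stream().filter(c -> c == 1).count();
        return (double) singletons / tokens.size();
    }
}
```

A training sample in which many words occur only once yields a large unseen mass, which is the situation an unknown word model is meant to handle.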

Author:
Roger Levy
See Also:
Serialized Form

Constructor Summary
ChineseUnknownWordModel()
           
 
Method Summary
static void main(String[] args)
           
 double score(IntTaggedWord itw)
           
 double score(TaggedWord tw)
           
 void train(Collection trees)
          Trains the first-character-based unknown word model.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ChineseUnknownWordModel

public ChineseUnknownWordModel()
Method Detail

score

public double score(IntTaggedWord itw)

score

public double score(TaggedWord tw)

train

public void train(Collection trees)
Trains the first-character-based unknown word model.

Parameters:
trees - the collection of trees to train on
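
As a rough illustration of what a first-character-based model might collect during training, the sketch below counts tags by the first character of each word and scores a new word by the log relative frequency of a tag among training words that share its first character. Everything in it is an assumption made for illustration: the class FirstCharTagModel, its method signatures, and the use of {word, tag} pairs in place of trees are hypothetical. The quantity that ChineseUnknownWordModel.score actually returns is not described on this page.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not the Stanford parser's implementation. */
class FirstCharTagModel {

    // tagCounts.get(firstCodePoint).get(tag) = training tokens with that first character and tag
    private final Map<Integer, Map<String, Integer>> tagCounts = new HashMap<>();
    // totals.get(firstCodePoint) = training tokens with that first character
    private final Map<Integer, Integer> totals = new HashMap<>();

    /** Trains on {word, tag} pairs (a stand-in for the leaves of parse trees). */
    void train(List<String[]> taggedWords) {
        for (String[] pair : taggedWords) {
            String word = pair[0];
            String tag = pair[1];
            if (word.isEmpty()) {
                continue; // no first character to index on
            }
            int first = word.codePointAt(0);
            tagCounts.computeIfAbsent(first, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
            totals.merge(first, 1, Integer::sum);
        }
    }

    /**
     * Returns log P(tag | first character of word) estimated by relative
     * frequency, or negative infinity if the character or tag was never seen.
     */
    double score(String word, String tag) {
        if (word.isEmpty()) {
            return Double.NEGATIVE_INFINITY;
        }
        int first = word.codePointAt(0);
        Map<String, Integer> byTag = tagCounts.get(first);
        if (byTag == null || !byTag.containsKey(tag)) {
            return Double.NEGATIVE_INFINITY;
        }
        return Math.log((double) byTag.get(tag) / totals.get(first));
    }
}
```

A real model would smooth the unseen cases rather than return negative infinity; the sketch leaves them unsmoothed to keep the counting step visible.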

main

public static void main(String[] args)


Stanford NLP Group