edu.stanford.nlp.parser.lexparser
Class ChineseCharacterBasedLexicon

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.ChineseCharacterBasedLexicon
All Implemented Interfaces:
Lexicon, Serializable

public class ChineseCharacterBasedLexicon
extends Object
implements Lexicon

Author:
Galen Andrew
See Also:
Serialized Form

Field Summary
protected static NumberFormat formatter
           
static PrintWriter pw
           
 
Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD
 
Constructor Summary
ChineseCharacterBasedLexicon()
           
 
Method Summary
 Distribution getPOSDistribution()
           
static boolean isForeign(String s)
           
 boolean isKnown(int word)
          Checks whether a word is in the lexicon.
 boolean isKnown(String word)
          Checks whether a word is in the lexicon.
static void main(String[] args)
           
static void printStats(Collection<Tree> trees)
           
 void readData(BufferedReader in)
          Read the lexicon from the BufferedReader in the format written by writeData.
 Iterator ruleIteratorByWord(int word, int loc)
          Get an iterator over all rules (pairs of (word, POS)) for this word.
 String sampleFrom()
          Samples over words regardless of POS: first samples a POS, then samples a word according to that POS.
 String sampleFrom(String tag)
          Samples from the distribution over words with this POS according to the lexicon.
 double score(IntTaggedWord iTW, int loc)
          Get the score of this word with this tag (as an IntTaggedWord) at this loc.
 void train(Collection trees)
          Trains this lexicon on the Collection of trees.
 void tune(List trees)
           
 void writeData(Writer w)
          Write the lexicon in human-readable format to the Writer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

pw

public static PrintWriter pw

formatter

protected static NumberFormat formatter
Constructor Detail

ChineseCharacterBasedLexicon

public ChineseCharacterBasedLexicon()
Method Detail

printStats

public static void printStats(Collection<Tree> trees)

train

public void train(Collection trees)
Description copied from interface: Lexicon
Trains this lexicon on the Collection of trees.

Specified by:
train in interface Lexicon

getPOSDistribution

public Distribution getPOSDistribution()

isForeign

public static boolean isForeign(String s)

score

public double score(IntTaggedWord iTW,
                    int loc)
Description copied from interface: Lexicon
Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Specified by:
score in interface Lexicon
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation, this is used only for unknown words, to change their probability distribution when they are sentence-initial.
Returns:
A double-valued score, usually -log P(word | tag)
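
As a rough illustration of the return convention described above, the sketch below computes -log P(word | tag) from raw relative-frequency counts. It is not the character-based model this class implements. The class TagWordCounts and its methods are hypothetical names introduced for the example, and only the JDK is used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical count table for illustration; not part of the parser API. */
final class TagWordCounts {
    // tag -> (word -> number of times the word was seen with that tag)
    private final Map<String, Map<String, Integer>> wordCountsByTag = new HashMap<>();
    // tag -> total number of observations of that tag
    private final Map<String, Integer> tagTotals = new HashMap<>();

    /** Records one observation of the given word carrying the given tag. */
    void observe(String word, String tag) {
        wordCountsByTag.computeIfAbsent(tag, t -> new HashMap<>()).merge(word, 1, Integer::sum);
        tagTotals.merge(tag, 1, Integer::sum);
    }

    /**
     * Returns -log P(word | tag) estimated by relative frequency,
     * or positive infinity if the (word, tag) pair was never observed.
     */
    double negLogProb(String word, String tag) {
        Map<String, Integer> counts = wordCountsByTag.get(tag);
        if (counts == null) {
            return Double.POSITIVE_INFINITY;
        }
        Integer count = counts.get(word);
        if (count == null) {
            return Double.POSITIVE_INFINITY;
        }
        int total = tagTotals.get(tag);
        return -Math.log((double) count / total);
    }
}
```

Under this convention a smaller score means a more probable pairing. A real lexicon would smooth the counts rather than return infinity for unseen pairs.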

sampleFrom

public String sampleFrom(String tag)
Samples from the distribution over words with this POS according to the lexicon.

Parameters:
tag - the POS of the word to sample
Returns:
a sampled word

sampleFrom

public String sampleFrom()
Samples over words regardless of POS: first samples a POS, then samples a word according to that POS.

Returns:
a sampled word
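
The two sampleFrom variants above describe a two-stage draw: pick a POS, then pick a word conditioned on that POS. The following is a minimal sketch of that idea using explicit probability tables. TwoStageSampler, put, and draw are hypothetical names, and the tables stand in for whatever distributions the lexicon actually maintains.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical two-stage sampler for illustration; not part of the parser API. */
final class TwoStageSampler {
    private final Map<String, Double> tagWeights = new LinkedHashMap<>();
    private final Map<String, Map<String, Double>> wordWeightsByTag = new LinkedHashMap<>();
    private final Random random;

    TwoStageSampler(Random random) {
        this.random = random;
    }

    /** Registers a tag with its weight and the weights of the words it can emit. */
    void put(String tag, double tagWeight, Map<String, Double> wordWeights) {
        tagWeights.put(tag, tagWeight);
        wordWeightsByTag.put(tag, new LinkedHashMap<>(wordWeights));
    }

    /** Draws a key with probability proportional to its weight; returns null for an empty map. */
    private String draw(Map<String, Double> weights) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        double r = random.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0.0) {
                return last;
            }
        }
        return last; // fallback for floating-point rounding
    }

    /** Samples a word from the distribution over words for the given tag. */
    String sampleFrom(String tag) {
        Map<String, Double> words = wordWeightsByTag.get(tag);
        if (words == null) {
            throw new IllegalArgumentException("Unknown tag: " + tag);
        }
        return draw(words);
    }

    /** Samples a tag first, then a word according to that tag. */
    String sampleFrom() {
        return sampleFrom(draw(tagWeights));
    }
}
```

Passing a seeded Random makes the draws repeatable, which is convenient when comparing sampled output across runs.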

ruleIteratorByWord

public Iterator ruleIteratorByWord(int word,
                                   int loc)
Description copied from interface: Lexicon
Get an iterator over all rules (pairs of (word, POS)) for this word.

Specified by:
ruleIteratorByWord in interface Lexicon
Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
Returns:
An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
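
To make the "tag -> word rule" reading concrete, here is a small sketch of a table that maps an integer word id to its possible integer tag ids and hands back an iterator over the pairs. WordTagRules and Rule are hypothetical stand-ins (Rule plays the role of IntTaggedWord). The loc parameter is accepted only to mirror the signature and is ignored, which is an assumption of this sketch rather than a statement about this class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical rule table for illustration; not part of the parser API. */
final class WordTagRules {
    /** Hypothetical stand-in for IntTaggedWord: a (word id, tag id) pair. */
    static final class Rule {
        final int word;
        final int tag;

        Rule(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    private final Map<Integer, List<Rule>> rulesByWord = new HashMap<>();

    /** Registers that the given word id may carry the given tag id. */
    void addRule(int word, int tag) {
        rulesByWord.computeIfAbsent(word, w -> new ArrayList<>()).add(new Rule(word, tag));
    }

    /** Returns a read-only iterator over all rules for the word; loc is ignored in this sketch. */
    Iterator<Rule> ruleIteratorByWord(int word, int loc) {
        List<Rule> rules = rulesByWord.get(word);
        if (rules == null) {
            return Collections.emptyIterator();
        }
        return Collections.unmodifiableList(rules).iterator();
    }
}
```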

tune

public void tune(List trees)

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException

readData

public void readData(BufferedReader in)
              throws IOException
Description copied from interface: Lexicon
Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)

Specified by:
readData in interface Lexicon
Parameters:
in - The BufferedReader to read from
Throws:
IOException

writeData

public void writeData(Writer w)
               throws IOException
Description copied from interface: Lexicon
Write the lexicon in human-readable format to the Writer. (An optional operation.)

Specified by:
writeData in interface Lexicon
Parameters:
w - The writer to output to
Throws:
IOException
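
readData is documented as reading the format that writeData produces, so the two should round-trip. The sketch below shows that contract with a made-up tab-separated format of tag, word, and count per line. CountStore, its format, and roundTrip are assumptions for illustration and say nothing about the format this class actually writes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical store with a made-up text format; not part of the parser API. */
final class CountStore {
    // key is "tag<TAB>word", value is the observed count; TreeMap keeps output order stable
    final Map<String, Integer> counts = new TreeMap<>();

    /** Writes one "tag<TAB>word<TAB>count" line per entry. */
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    /** Reads lines in the format written by writeData, skipping empty lines. */
    void readData(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            int cut = line.lastIndexOf('\t');
            if (cut < 0) {
                throw new IOException("Malformed line: " + line);
            }
            counts.put(line.substring(0, cut), Integer.parseInt(line.substring(cut + 1)));
        }
    }

    /** Writes the store to an in-memory buffer and reads it back into a fresh store. */
    static CountStore roundTrip(CountStore original) throws IOException {
        StringWriter buffer = new StringWriter();
        original.writeData(buffer);
        CountStore copy = new CountStore();
        copy.readData(new BufferedReader(new StringReader(buffer.toString())));
        return copy;
    }
}
```

A round-trip check like roundTrip is a cheap way to catch a reader and writer drifting apart when the format changes.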

isKnown

public boolean isKnown(int word)
Description copied from interface: Lexicon
Checks whether a word is in the lexicon.

Specified by:
isKnown in interface Lexicon
Parameters:
word - The word as an int
Returns:
Whether the word is in the lexicon

isKnown

public boolean isKnown(String word)
Description copied from interface: Lexicon
Checks whether a word is in the lexicon.

Specified by:
isKnown in interface Lexicon
Parameters:
word - The word as a String
Returns:
Whether the word is in the lexicon
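
The two isKnown overloads take the same word either as a String or as its integer id. One way to keep them consistent, sketched below, is to number words in a shared index and have the String overload look up the id without creating a new one. KnownWords, number, and markKnown are hypothetical names; the numbering scheme the parser itself uses is not shown here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical word index for illustration; not part of the parser API. */
final class KnownWords {
    private final Map<String, Integer> idByWord = new HashMap<>();
    private final Set<Integer> known = new HashSet<>();

    /** Returns the id for a word, assigning the next free id if the word is new. */
    int number(String word) {
        Integer id = idByWord.get(word);
        if (id == null) {
            id = idByWord.size();
            idByWord.put(word, id);
        }
        return id;
    }

    /** Marks a word as present in the lexicon. */
    void markKnown(String word) {
        known.add(number(word));
    }

    /** Checks whether the word with this id is in the lexicon. */
    boolean isKnown(int word) {
        return known.contains(word);
    }

    /** Checks whether the word is in the lexicon, without assigning it an id. */
    boolean isKnown(String word) {
        Integer id = idByWord.get(word);
        return id != null && known.contains(id);
    }
}
```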


Stanford NLP Group