ChineseCharacterBasedLexicon (Stanford JavaNLP API)

edu.stanford.nlp.parser.lexparser
Class ChineseCharacterBasedLexicon

java.lang.Object
  edu.stanford.nlp.parser.lexparser.ChineseCharacterBasedLexicon

All Implemented Interfaces: Lexicon, Serializable

public class ChineseCharacterBasedLexicon
extends Object
implements Lexicon

Author: Galen Andrew
See Also: Serialized Form

Field Summary
`protected static NumberFormat`	`formatter`
`static PrintWriter`	`pw`

Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
`BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD`

Constructor Summary
`ChineseCharacterBasedLexicon()`

Method Summary
`Distribution`	`getPOSDistribution()`
`static boolean`	`isForeign(String s)`
`boolean`	`isKnown(int word)` Checks whether a word is in the lexicon.
`boolean`	`isKnown(String word)` Checks whether a word is in the lexicon.
`static void`	`main(String[] args)`
`static void`	`printStats(Collection<Tree> trees)`
`void`	`readData(BufferedReader in)` Read the lexicon from the BufferedReader in the format written by writeData.
`Iterator`	`ruleIteratorByWord(int word, int loc)` Get an iterator over all rules (pairs of (word, POS)) for this word.
`String`	`sampleFrom()` Samples over words regardless of POS: first samples a POS, then samples a word according to that POS.
`String`	`sampleFrom(String tag)` Samples from the distribution over words with this POS according to the lexicon.
`double`	`score(IntTaggedWord iTW, int loc)` Get the score of this word with this tag (as an IntTaggedWord) at this loc.
`void`	`train(Collection trees)` Trains this lexicon on the Collection of trees.
`void`	`tune(List trees)`
`void`	`writeData(Writer w)` Write the lexicon in human-readable format to the Writer.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

pw

public static PrintWriter pw

formatter

protected static NumberFormat formatter

Constructor Detail

ChineseCharacterBasedLexicon

public ChineseCharacterBasedLexicon()

Method Detail

printStats

public static void printStats(Collection<Tree> trees)

train

public void train(Collection trees)

Description copied from interface: Lexicon

Trains this lexicon on the Collection of trees.

Specified by:
train in interface Lexicon

getPOSDistribution

public Distribution getPOSDistribution()

isForeign

public static boolean isForeign(String s)

score

public double score(IntTaggedWord iTW,
                    int loc)

Description copied from interface: Lexicon

Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Specified by:
score in interface Lexicon

Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words, to change their probability distribution when sentence-initial.
Returns:
A double-valued score, usually - log P(word|tag)
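
As a rough illustration of what a score of the form - log P(word|tag) means, the sketch below estimates P(word|tag) by relative frequency over observed (word, tag) pairs. The class ToyLexiconScore and its methods observe and negLogProb are hypothetical names invented for this example. They are not part of the Stanford API, and this page does not say how ChineseCharacterBasedLexicon actually computes its score.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy lexicon for illustration only; not the Stanford implementation. */
final class ToyLexiconScore {
    // How often each tag was observed.
    private final Map<String, Integer> tagCounts = new HashMap<>();
    // How often each (tag, word) pair was observed, keyed as "tag<TAB>word".
    private final Map<String, Integer> pairCounts = new HashMap<>();

    /** Records one (word, tag) observation. */
    void observe(String word, String tag) {
        tagCounts.merge(tag, 1, Integer::sum);
        pairCounts.merge(tag + "\t" + word, 1, Integer::sum);
    }

    /**
     * Returns -log P(word | tag) under a relative-frequency estimate,
     * or positive infinity when the pair was never observed.
     */
    double negLogProb(String word, String tag) {
        int pair = pairCounts.getOrDefault(tag + "\t" + word, 0);
        int total = tagCounts.getOrDefault(tag, 0);
        if (pair == 0 || total == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return -Math.log((double) pair / total);
    }

    public static void main(String[] args) {
        ToyLexiconScore lex = new ToyLexiconScore();
        lex.observe("gou", "NN");
        lex.observe("mao", "NN");
        lex.observe("pao", "VV");
        // "gou" accounts for one of the two NN observations.
        System.out.println(lex.negLogProb("gou", "NN"));
    }
}
```

A real lexicon would normally smooth these estimates so that unseen words do not receive an infinite score; the toy version omits smoothing to keep the arithmetic visible.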

sampleFrom

public String sampleFrom(String tag)

Samples from the distribution over words with this POS according to the lexicon.

Parameters:
tag - the POS of the word to sample
Returns:
a sampled word

sampleFrom

public String sampleFrom()

Samples over words regardless of POS: first samples a POS, then samples a word according to that POS.

Returns:
a sampled word
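
The two-stage procedure described for sampleFrom() (sample a POS, then sample a word given that POS) can be sketched with plain weighted maps. TwoStageSampler, sample, and sampleWord are hypothetical names used only for this example, and the map-based representation is an assumption; the page does not describe the class's internal data structures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of two-stage sampling; not the Stanford implementation. */
final class TwoStageSampler {
    /** Draws one key from a map of non-negative weights, proportionally to its weight. */
    static String sample(Map<String, Double> weights, Random rng) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0.0) {
                return last;
            }
        }
        return last; // fallback for floating-point round-off (null if the map is empty)
    }

    /**
     * Stage 1: sample a POS tag from tagDist.
     * Stage 2: sample a word from that tag's word distribution.
     * Assumes every tag in tagDist has an entry in wordsByTag.
     */
    static String sampleWord(Map<String, Double> tagDist,
                             Map<String, Map<String, Double>> wordsByTag,
                             Random rng) {
        String tag = sample(tagDist, rng);
        return sample(wordsByTag.get(tag), rng);
    }

    public static void main(String[] args) {
        Map<String, Double> tagDist = new LinkedHashMap<>();
        tagDist.put("NN", 0.75);
        tagDist.put("VV", 0.25);

        Map<String, Double> nouns = new LinkedHashMap<>();
        nouns.put("gou", 1.0);
        nouns.put("mao", 1.0);
        Map<String, Double> verbs = new LinkedHashMap<>();
        verbs.put("pao", 1.0);

        Map<String, Map<String, Double>> wordsByTag = new LinkedHashMap<>();
        wordsByTag.put("NN", nouns);
        wordsByTag.put("VV", verbs);

        System.out.println(sampleWord(tagDist, wordsByTag, new Random(42)));
    }
}
```

Under this scheme sampleFrom(String tag) corresponds to the second stage alone, with the tag supplied by the caller instead of being sampled.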

ruleIteratorByWord

public Iterator ruleIteratorByWord(int word, int loc)

Description copied from interface: Lexicon

Get an iterator over all rules (pairs of (word, POS)) for this word.

Specified by:
ruleIteratorByWord in interface Lexicon

Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
Returns:
An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
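
To make the "tag -> word rule" reading concrete, the sketch below keeps a list of (word, tag) integer pairs per word id and hands out an iterator over them. ToyRuleIndex, WordTag, and addRule are hypothetical names for this example; WordTag is only a stand-in for IntTaggedWord, and the integer ids are assumed to come from some external numbering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-word rule index; not the Stanford implementation. */
final class ToyRuleIndex {
    /** Stand-in for IntTaggedWord: a (word, tag) pair of integer ids. */
    static final class WordTag {
        final int word;
        final int tag;

        WordTag(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // All known taggings, grouped by word id.
    private final Map<Integer, List<WordTag>> rulesByWord = new HashMap<>();

    /** Registers that the given word id may carry the given tag id. */
    void addRule(int word, int tag) {
        rulesByWord.computeIfAbsent(word, k -> new ArrayList<>()).add(new WordTag(word, tag));
    }

    /**
     * Returns an iterator over all (word, tag) rules for this word.
     * The loc argument is accepted to mirror the documented signature but is ignored here.
     */
    Iterator<WordTag> ruleIteratorByWord(int word, int loc) {
        List<WordTag> rules = rulesByWord.get(word);
        return rules == null ? Collections.<WordTag>emptyIterator() : rules.iterator();
    }

    public static void main(String[] args) {
        ToyRuleIndex index = new ToyRuleIndex();
        index.addRule(7, 1);
        index.addRule(7, 2);
        Iterator<WordTag> it = index.ruleIteratorByWord(7, 0);
        while (it.hasNext()) {
            WordTag rule = it.next();
            System.out.println("tag " + rule.tag + " -> word " + rule.word);
        }
    }
}
```

Returning an empty iterator for an unknown word is a choice made for this sketch; how the actual class treats unknown words is not stated on this page.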

tune

public void tune(List trees)

main

public static void main(String[] args) throws IOException

Throws:
IOException

readData

public void readData(BufferedReader in) throws IOException

Description copied from interface: Lexicon

Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)

Specified by:
readData in interface Lexicon

Parameters:
in - The BufferedReader to read from
Throws:
IOException

writeData

public void writeData(Writer w) throws IOException

Description copied from interface: Lexicon

Write the lexicon in human-readable format to the Writer. (An optional operation.)

Specified by:
writeData in interface Lexicon

Parameters:
w - The writer to output to
Throws:
IOException

isKnown

public boolean isKnown(int word)

Description copied from interface: Lexicon

Checks whether a word is in the lexicon.

Specified by:
isKnown in interface Lexicon

Parameters:
word - The word as an int
Returns:
Whether the word is in the lexicon

isKnown

public boolean isKnown(String word)

Description copied from interface: Lexicon

Checks whether a word is in the lexicon.

Specified by:
isKnown in interface Lexicon

Parameters:
word - The word as a String
Returns:
Whether the word is in the lexicon

Stanford NLP Group
