java.lang.Object
  edu.stanford.nlp.parser.lexparser.ChineseCharacterBasedLexicon
public class ChineseCharacterBasedLexicon extends Object implements Lexicon
Field Summary | |
---|---|
protected static NumberFormat | formatter |
static PrintWriter | pw |
Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon |
---|
BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD |
Constructor Summary | |
---|---|
ChineseCharacterBasedLexicon() | |
Method Summary | | |
---|---|---|
Distribution | getPOSDistribution() | |
static boolean | isForeign(String s) | |
boolean | isKnown(int word) | Checks whether a word is in the lexicon. |
boolean | isKnown(String word) | Checks whether a word is in the lexicon. |
static void | main(String[] args) | |
static void | printStats(Collection<Tree> trees) | |
void | readData(BufferedReader in) | Read the lexicon from the BufferedReader in the format written by writeData. |
Iterator | ruleIteratorByWord(int word, int loc) | Get an iterator over all rules (pairs of (word, POS)) for this word. |
String | sampleFrom() | Samples over words regardless of POS: first samples POS, then samples word according to that POS. |
String | sampleFrom(String tag) | Samples from the distribution over words with this POS according to the lexicon. |
double | score(IntTaggedWord iTW, int loc) | Get the score of this word with this tag (as an IntTaggedWord) at this loc. |
void | train(Collection trees) | Trains this lexicon on the Collection of trees. |
void | tune(List trees) | |
void | writeData(Writer w) | Write the lexicon in human-readable format to the Writer. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static PrintWriter pw
protected static NumberFormat formatter
Constructor Detail |
---|
public ChineseCharacterBasedLexicon()
Method Detail |
---|
public static void printStats(Collection<Tree> trees)
public void train(Collection trees)
Specified by: train in interface Lexicon
public Distribution getPOSDistribution()
public static boolean isForeign(String s)
public double score(IntTaggedWord iTW, int loc)
Specified by: score in interface Lexicon
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words, to change their probability distribution when they are sentence-initial.
public String sampleFrom(String tag)
Parameters:
tag - the POS of the word to sample
public String sampleFrom()
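The summary for sampleFrom() describes a two-stage draw: a POS is sampled first, then a word is sampled given that POS. The following is a minimal, self-contained sketch of that idea only. The class TwoStageSamplerSketch, its add and toy helpers, the probabilities, and the placeholder words are all hypothetical and are not taken from ChineseCharacterBasedLexicon, whose internal distributions are not documented on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for illustration; not the library's implementation.
final class TwoStageSamplerSketch {
    private final Map<String, Double> tagProbs = new LinkedHashMap<>();
    private final Map<String, Map<String, Double>> wordProbsByTag = new LinkedHashMap<>();
    private final Random random;

    TwoStageSamplerSketch(long seed) {
        this.random = new Random(seed);
    }

    // Registers P(tag) and P(word | tag); each distribution is assumed to sum to 1.
    void add(String tag, double tagProb, Map<String, Double> wordProbs) {
        tagProbs.put(tag, tagProb);
        wordProbsByTag.put(tag, new LinkedHashMap<>(wordProbs));
    }

    // Draws one key by walking the cumulative distribution.
    private String draw(Map<String, Double> probs) {
        double r = random.nextDouble();
        String last = null;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0) {
                return last;
            }
        }
        return last; // falls back to the final key if rounding leaves r >= 0
    }

    // Stage two alone: sample a word given a POS tag.
    String sampleFrom(String tag) {
        Map<String, Double> words = wordProbsByTag.get(tag);
        if (words == null) {
            throw new IllegalArgumentException("Unknown tag: " + tag);
        }
        return draw(words);
    }

    // Both stages: sample a POS tag, then a word for that tag.
    String sampleFrom() {
        return sampleFrom(draw(tagProbs));
    }

    // Builds a toy model with made-up probabilities and placeholder words.
    static TwoStageSamplerSketch toy(long seed) {
        TwoStageSamplerSketch s = new TwoStageSamplerSketch(seed);
        Map<String, Double> nouns = new LinkedHashMap<>();
        nouns.put("shu", 0.5);
        nouns.put("mao", 0.5);
        Map<String, Double> verbs = new LinkedHashMap<>();
        verbs.put("pao", 1.0);
        s.add("NN", 0.7, nouns);
        s.add("VV", 0.3, verbs);
        return s;
    }

    public static void main(String[] args) {
        TwoStageSamplerSketch s = toy(1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(s.sampleFrom());
        }
    }
}
```

In this sketch the no-argument method simply delegates to the tagged one, which mirrors how the two summaries relate to each other; whether the real class shares code in this way is an assumption.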
public Iterator ruleIteratorByWord(int word, int loc)
Specified by: ruleIteratorByWord in interface Lexicon
Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
tag -> word rule.)
public void tune(List trees)
public static void main(String[] args) throws IOException
Throws: IOException
public void readData(BufferedReader in) throws IOException
Specified by: readData in interface Lexicon
Parameters:
in - The BufferedReader to read from
Throws: IOException
public void writeData(Writer w) throws IOException
Specified by: writeData in interface Lexicon
Parameters:
w - The writer to output to
Throws: IOException
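Because readData is documented as reading the format that writeData produces, the two methods should round-trip: writing a lexicon and reading the output back should restore the same contents. The sketch below illustrates that contract with in-memory streams. The class LexiconRoundTripSketch, its add, counts, and roundTrip helpers, and the tab-separated line format are assumptions made for the example; the actual on-disk format of ChineseCharacterBasedLexicon is not specified on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration; the line format here is invented.
final class LexiconRoundTripSketch {
    // Key is word + TAB + tag; value is an observed count.
    private final Map<String, Integer> counts = new TreeMap<>();

    void add(String word, String tag, int count) {
        counts.merge(word + "\t" + tag, count, Integer::sum);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    // Writes one "word TAB tag TAB count" line per entry.
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    // Replaces the current contents with what is read, in the writeData format.
    void readData(BufferedReader in) throws IOException {
        counts.clear();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t");
            if (parts.length != 3) {
                throw new IOException("Malformed line: " + line);
            }
            add(parts[0], parts[1], Integer.parseInt(parts[2]));
        }
    }

    // Serializes the source to a string and loads it into a fresh instance.
    static LexiconRoundTripSketch roundTrip(LexiconRoundTripSketch source) {
        try {
            StringWriter buffer = new StringWriter();
            source.writeData(buffer);
            LexiconRoundTripSketch copy = new LexiconRoundTripSketch();
            copy.readData(new BufferedReader(new StringReader(buffer.toString())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LexiconRoundTripSketch original = new LexiconRoundTripSketch();
        original.add("shu", "NN", 3);
        original.add("pao", "VV", 1);
        System.out.println(roundTrip(original).counts().equals(original.counts()));
    }
}
```

With the real class, the same check would be run through its own writeData(Writer) and readData(BufferedReader) methods, substituting a FileWriter and FileReader where persistence to disk is wanted.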
public boolean isKnown(int word)
Specified by: isKnown in interface Lexicon
Parameters:
word - The word as an int
public boolean isKnown(String word)
Specified by: isKnown in interface Lexicon
Parameters:
word - The word as a String
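The two isKnown overloads accept the same word in two representations: as a String, or as the integer assigned to it by a numbering scheme. A small sketch of how the overloads can be kept consistent follows. KnownWordSketch, number, and observe are hypothetical names, and the separation between "has an integer id" and "is in the lexicon" is an assumption of this example rather than documented behavior of ChineseCharacterBasedLexicon or Numberer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; not the library's implementation.
final class KnownWordSketch {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> wordOf = new ArrayList<>();
    private final Set<Integer> known = new HashSet<>();

    // Returns the integer id for a word, assigning the next free id if needed.
    int number(String word) {
        Integer id = idOf.get(word);
        if (id == null) {
            id = wordOf.size();
            idOf.put(word, id);
            wordOf.add(word);
        }
        return id;
    }

    // Marks a word as seen in training data, so it counts as known.
    void observe(String word) {
        known.add(number(word));
    }

    // Integer form: known only if the id was recorded by observe.
    boolean isKnown(int word) {
        return known.contains(word);
    }

    // String form: looks up the id without assigning a new one, then delegates.
    boolean isKnown(String word) {
        Integer id = idOf.get(word);
        return id != null && isKnown(id.intValue());
    }

    public static void main(String[] args) {
        KnownWordSketch lex = new KnownWordSketch();
        lex.observe("shu");
        System.out.println(lex.isKnown("shu"));
        System.out.println(lex.isKnown("mao"));
    }
}
```

The String overload here deliberately avoids calling number, so that merely asking about a word does not register it.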