java.lang.Object
  edu.stanford.nlp.parser.lexparser.ChineseLexiconAndWordSegmenter
public class ChineseLexiconAndWordSegmenter implements Lexicon, WordSegmenter
This class lets you train a lexicon and segmenter at the same time.
Field Summary |
---|
Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon |
---|
BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD |
Constructor Summary | |
---|---|
ChineseLexiconAndWordSegmenter(ChineseLexicon lex, WordSegmenter seg) | |
ChineseLexiconAndWordSegmenter(String segmenterFileOrUrl, Options op) | Construct a new ChineseLexiconAndWordSegmenter. |
ChineseLexiconAndWordSegmenter(Treebank trainTreebank, Options op) | |
Method Summary | |
---|---|
static ChineseLexiconAndWordSegmenter | getSegmenterDataFromFile(String parserFileOrUrl, Options op) |
protected static ChineseLexiconAndWordSegmenter | getSegmenterDataFromSerializedFile(String serializedFileOrUrl) |
boolean | isKnown(int word): Checks whether a word is in the lexicon. |
boolean | isKnown(String word): Checks whether a word is in the lexicon. |
static void | main(String[] args) |
void | readData(BufferedReader in): Read the lexicon from the BufferedReader in the format written by writeData. |
Iterator | ruleIteratorByWord(int word, int loc): Get an iterator over all rules (pairs of (word, POS)) for this word. |
double | score(IntTaggedWord iTW, int loc): Get the score of this word with this tag (as an IntTaggedWord) at this loc. |
Sentence | segmentWords(String s) |
void | train(Collection trees): Trains this lexicon on the Collection of trees. |
void | writeData(Writer w): Write the lexicon in human-readable format to the Writer. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ChineseLexiconAndWordSegmenter(ChineseLexicon lex, WordSegmenter seg)
public ChineseLexiconAndWordSegmenter(Treebank trainTreebank, Options op)
public ChineseLexiconAndWordSegmenter(String segmenterFileOrUrl, Options op)
Construct a new ChineseLexiconAndWordSegmenter.
Throws:
IllegalArgumentException - If segmenter data cannot be loaded
Method Detail |
---|
public Sentence segmentWords(String s)
Specified by:
segmentWords in interface WordSegmenter
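To make the idea of turning an unsegmented string into words concrete, here is a minimal, self-contained sketch of greedy forward maximum matching against a dictionary. It is an illustration only: the class `MaxMatchSketch`, its `segment` method, and the `maxLen` parameter are hypothetical, the result is a plain `List<String>` rather than a `Sentence`, and nothing here is claimed to be the algorithm this class actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not the Stanford segmenter's algorithm.
class MaxMatchSketch {

    // Greedy forward maximum matching: at each position take the longest
    // dictionary entry that matches, falling back to a single character.
    static List<String> segment(String s, Set<String> dict, int maxLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int end = Math.min(s.length(), i + maxLen);
            // Shrink the candidate until it is a dictionary word or one char long.
            while (end > i + 1 && !dict.contains(s.substring(i, end))) {
                end--;
            }
            words.add(s.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only.
        Set<String> dict = new HashSet<>(Arrays.asList(
                "\u5317\u4eac",             // Beijing
                "\u5927\u5b66",             // university
                "\u5317\u4eac\u5927\u5b66", // Peking University
                "\u5b66\u751f"));           // student
        System.out.println(segment("\u5317\u4eac\u5927\u5b66\u5b66\u751f", dict, 4));
    }
}
```

Greedy matching is easy to follow but commits to the longest match at each step, which is one reason trained segmenters are generally preferred in practice.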
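public boolean isKnown(int word)
Description copied from interface: Lexicon
Checks whether a word is in the lexicon.
Specified by:
isKnown in interface Lexicon
Parameters:
word - The word as an int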
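public boolean isKnown(String word)
Description copied from interface: Lexicon
Checks whether a word is in the lexicon.
Specified by:
isKnown in interface Lexicon
Parameters:
word - The word as a String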
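The two isKnown overloads suggest that a word can be looked up either by its string form or by an integer id. The sketch below shows one way such a pair of lookups could relate, assuming a simple string-to-int index plus a set of ids seen during training. The class `KnownWordSketch` and its methods `idOf` and `observe` are hypothetical and do not reproduce the library's own numbering scheme.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: a tiny word index plus the set of ids seen in training.
class KnownWordSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final Set<Integer> seen = new HashSet<>();

    // Returns the id of a word, assigning the next free id on first use.
    int idOf(String word) {
        Integer id = index.get(word);
        if (id == null) {
            id = index.size();
            index.put(word, id);
        }
        return id;
    }

    // Records that a word occurred in the (toy) training data.
    void observe(String word) {
        seen.add(idOf(word));
    }

    // int form: the id must belong to a word that was observed.
    boolean isKnown(int word) {
        return seen.contains(word);
    }

    // String form: resolve the id without assigning a new one, then check it.
    boolean isKnown(String word) {
        Integer id = index.get(word);
        return id != null && seen.contains(id);
    }
}
```

Note that a word can have an id without being known: in this sketch, numbering a word and observing it in training are separate steps.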
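public Iterator ruleIteratorByWord(int word, int loc)
Description copied from interface: Lexicon
Get an iterator over all rules (pairs of (word, POS)) for this word.
Specified by:
ruleIteratorByWord in interface Lexicon
Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
Returns:
An iterator over the rules for this word. (Each can be thought of as a tag -> word rule.)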
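As a rough picture of what iterating over the (word, POS) rules for one word might look like, the following self-contained sketch keeps a map from word id to its list of (word, tag) pairs. The class `RuleTableSketch`, the nested `WordTag` type, and `addRule` are hypothetical stand-ins; `WordTag` is only loosely analogous to an IntTaggedWord, and the real data structures may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical rule table: maps a word id to its (word, tag) pairs.
class RuleTableSketch {

    // A (word, tag) pair held as integers.
    static final class WordTag {
        final int word;
        final int tag;

        WordTag(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    private final Map<Integer, List<WordTag>> rulesByWord = new HashMap<>();

    // Registers one tag -> word rule.
    void addRule(int word, int tag) {
        List<WordTag> rules = rulesByWord.get(word);
        if (rules == null) {
            rules = new ArrayList<>();
            rulesByWord.put(word, rules);
        }
        rules.add(new WordTag(word, tag));
    }

    // loc is accepted but ignored here, as the implementation note describes.
    Iterator<WordTag> ruleIteratorByWord(int word, int loc) {
        List<WordTag> rules = rulesByWord.get(word);
        return rules == null ? Collections.<WordTag>emptyIterator() : rules.iterator();
    }
}
```

A caller would typically loop with `hasNext()` and `next()`, reading the `tag` field of each pair to enumerate the candidate taggings for the word.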
public void train(Collection trees)
Description copied from interface: Lexicon
Trains this lexicon on the Collection of trees.
Specified by:
train in interface Lexicon
train in interface WordSegmenter
public double score(IntTaggedWord iTW, int loc)
Description copied from interface: Lexicon
Get the score of this word with this tag (as an IntTaggedWord) at this loc.
Specified by:
score in interface Lexicon
Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words, to change their probability distribution when sentence-initial.
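The role of loc can be illustrated with a toy scorer in which known words are scored from counts regardless of position, while unknown words receive a different probability when they are sentence-initial. Everything in this sketch is an assumption made for illustration: the class `PositionScoreSketch`, the `observe` method, the use of a log relative frequency, and the two unknown-word constants are hypothetical and are not taken from the library.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical scorer: log P(word | tag) from counts, with a
// position-dependent fallback for unknown words.
class PositionScoreSketch {
    // Made-up unknown-word probabilities; a real model would estimate these.
    static final double UNKNOWN_INITIAL = 1e-3;
    static final double UNKNOWN_ELSEWHERE = 1e-4;

    private final Map<String, Integer> wordTagCounts = new HashMap<>();
    private final Map<String, Integer> tagCounts = new HashMap<>();
    private final Set<String> seenWords = new HashSet<>();

    // Records one (word, tag) occurrence from the toy training data.
    void observe(String word, String tag) {
        wordTagCounts.merge(word + "\t" + tag, 1, Integer::sum);
        tagCounts.merge(tag, 1, Integer::sum);
        seenWords.add(word);
    }

    double score(String word, String tag, int loc) {
        if (seenWords.contains(word)) {
            // Known word: position is ignored.
            int joint = wordTagCounts.getOrDefault(word + "\t" + tag, 0);
            if (joint == 0) {
                return Double.NEGATIVE_INFINITY; // word never seen with this tag
            }
            return Math.log((double) joint / tagCounts.get(tag));
        }
        // Unknown word: only here does loc change the result.
        return Math.log(loc == 0 ? UNKNOWN_INITIAL : UNKNOWN_ELSEWHERE);
    }
}
```

With this design, calling `score` for a known word at loc 0 and at any later position returns the same value, whereas an unknown word scores differently at the start of the sentence.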
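public void readData(BufferedReader in) throws IOException
Description copied from interface: Lexicon
Read the lexicon from the BufferedReader in the format written by writeData.
Specified by:
readData in interface Lexicon
Parameters:
in - The BufferedReader to read from
Throws:
IOException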
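public void writeData(Writer w) throws IOException
Description copied from interface: Lexicon
Write the lexicon in human-readable format to the Writer.
Specified by:
writeData in interface Lexicon
Parameters:
w - The writer to output to
Throws:
IOException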
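Since readData is documented as reading the format that writeData produces, the pair forms a round trip. The sketch below shows that contract with a made-up, tab-separated text format of one word, tag, and count per line. The class `LexiconIoSketch`, its `counts` field, the `roundTrip` helper, and the file format are all hypothetical; the actual on-disk format is not described here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lexicon store with a made-up "word<TAB>tag<TAB>count" format.
class LexiconIoSketch {
    // Key is "word<TAB>tag"; TreeMap keeps the written order stable.
    final Map<String, Integer> counts = new TreeMap<>();

    // Writes one human-readable line per (word, tag) pair.
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    // Reads lines in the format produced by writeData.
    void readData(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            int cut = line.lastIndexOf('\t');
            if (cut < 0) {
                throw new IOException("Malformed line: " + line);
            }
            counts.put(line.substring(0, cut), Integer.parseInt(line.substring(cut + 1)));
        }
    }

    // Writes src to an in-memory buffer and reads it back into a new object.
    static LexiconIoSketch roundTrip(LexiconIoSketch src) {
        try {
            StringWriter w = new StringWriter();
            src.writeData(w);
            LexiconIoSketch copy = new LexiconIoSketch();
            copy.readData(new BufferedReader(new StringReader(w.toString())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use the Writer and BufferedReader would usually wrap files, with the same IOException handling the method signatures above require.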
public static ChineseLexiconAndWordSegmenter getSegmenterDataFromFile(String parserFileOrUrl, Options op)
protected static ChineseLexiconAndWordSegmenter getSegmenterDataFromSerializedFile(String serializedFileOrUrl)
public static void main(String[] args)