edu.stanford.nlp.parser.lexparser
Interface Lexicon

All Superinterfaces:
Serializable
All Known Implementing Classes:
BaseLexicon, ChineseCharacterBasedLexicon, ChineseLexicon, ChineseLexiconAndWordSegmenter

public interface Lexicon
extends Serializable

An interface for lexicons used by the lexparser package.

Author:
Galen Andrew

Field Summary
static String BOUNDARY
           
static String BOUNDARY_TAG
           
static String UNKNOWN_WORD
           
 
Method Summary
 boolean isKnown(int word)
          Checks whether a word is in the lexicon.
 boolean isKnown(String word)
          Checks whether a word is in the lexicon.
 void readData(BufferedReader in)
          Read the lexicon from the BufferedReader in the format written by writeData.
 Iterator ruleIteratorByWord(int word, int loc)
          Get an iterator over all rules (pairs of (word, POS)) for this word.
 double score(IntTaggedWord iTW, int loc)
          Get the score of this word with this tag (as an IntTaggedWord) at this loc.
 void train(Collection trees)
          Trains this lexicon on the Collection of trees.
 void writeData(Writer w)
          Write the lexicon in human-readable format to the Writer.
 

Field Detail

UNKNOWN_WORD

static final String UNKNOWN_WORD
See Also:
Constant Field Values

BOUNDARY

static final String BOUNDARY
See Also:
Constant Field Values

BOUNDARY_TAG

static final String BOUNDARY_TAG
See Also:
Constant Field Values

Method Detail

isKnown

boolean isKnown(int word)
Checks whether a word is in the lexicon.

Parameters:
word - The word as an int
Returns:
Whether the word is in the lexicon

isKnown

boolean isKnown(String word)
Checks whether a word is in the lexicon.

Parameters:
word - The word as a String
Returns:
Whether the word is in the lexicon
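The two isKnown overloads suggest that a lexicon stores words as integer ids (via the Numberer mentioned under ruleIteratorByWord) and can answer the same question for either form. Below is a minimal, self-contained sketch under that assumption. WordIndex, ToyLexicon, observe, getOrAdd and lookup are hypothetical names, not part of the Stanford API, and the sketch does not use the real Numberer class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the Stanford implementation or the real Numberer API.
public class KnownWordSketch {

    /** Hypothetical word -> int index, standing in for Numberer. */
    static class WordIndex {
        private final Map<String, Integer> ids = new HashMap<>();

        /** Returns the id of word, assigning the next free id if it is new. */
        int getOrAdd(String word) {
            Integer id = ids.get(word);
            if (id == null) {
                id = ids.size();
                ids.put(word, id);
            }
            return id;
        }

        /** Returns the id of word, or -1 if it has never been indexed. */
        int lookup(String word) {
            return ids.getOrDefault(word, -1);
        }
    }

    /** Toy lexicon: a word is "known" if it was observed during training. */
    static class ToyLexicon {
        private final WordIndex index;
        private final Set<Integer> seen = new HashSet<>();

        ToyLexicon(WordIndex index) {
            this.index = index;
        }

        /** Hypothetical training hook: records one occurrence of word. */
        void observe(String word) {
            seen.add(index.getOrAdd(word));
        }

        boolean isKnown(int word) {
            return seen.contains(word);
        }

        /** The String overload maps the word to an id and defers to isKnown(int). */
        boolean isKnown(String word) {
            int id = index.lookup(word);
            return id >= 0 && isKnown(id);
        }
    }

    public static void main(String[] args) {
        WordIndex index = new WordIndex();
        ToyLexicon lex = new ToyLexicon(index);
        lex.observe("dog");
        index.getOrAdd("cat"); // indexed elsewhere, but never seen by the lexicon

        System.out.println(lex.isKnown("dog"));               // true
        System.out.println(lex.isKnown("cat"));               // false
        System.out.println(lex.isKnown("zebra"));             // false
        System.out.println(lex.isKnown(index.lookup("dog"))); // true
    }
}
```

In this sketch a word can have an id without being known to the lexicon ("cat" above), which is why isKnown(int) checks the lexicon's own record rather than the index.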

ruleIteratorByWord

Iterator ruleIteratorByWord(int word,
                            int loc)
Get an iterator over all rules (pairs of (word, POS)) for this word.

Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
Returns:
An Iterator over a List of IntTaggedWords, which pair the word with its possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
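To make the shape of ruleIteratorByWord concrete, here is a hypothetical sketch in which a toy lexicon remembers which tag ids each word id was seen with and yields one (word, tag) pair per tag. TaggedPair is a simplified stand-in for IntTaggedWord, not its real API. Offering every training tag for an unseen word is an assumption of this sketch, not documented behavior. Like BaseLexicon (per the note above), the sketch ignores loc.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of ruleIteratorByWord; TaggedPair is NOT the real IntTaggedWord.
public class RuleIteratorSketch {

    /** Simplified stand-in for IntTaggedWord: a word id paired with a tag id. */
    static final class TaggedPair {
        final int word;
        final int tag;

        TaggedPair(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return "tag " + tag + " -> word " + word;
        }
    }

    static class ToyLexicon {
        // word id -> tag ids seen with it; TreeSets give a stable iteration order
        private final Map<Integer, Set<Integer>> tagsByWord = new TreeMap<>();
        private final Set<Integer> allTags = new TreeSet<>();

        /** Hypothetical training hook: records that word was seen with tag. */
        void observe(int word, int tag) {
            tagsByWord.computeIfAbsent(word, w -> new TreeSet<>()).add(tag);
            allTags.add(tag);
        }

        /**
         * Yields one (word, tag) rule per candidate tag; loc is ignored.
         * Assumption: an unseen word is offered every tag seen in training.
         */
        Iterator<TaggedPair> ruleIteratorByWord(int word, int loc) {
            Set<Integer> tags = tagsByWord.getOrDefault(word, allTags);
            List<TaggedPair> rules = new ArrayList<>();
            for (int tag : tags) {
                rules.add(new TaggedPair(word, tag));
            }
            return rules.iterator();
        }
    }

    public static void main(String[] args) {
        ToyLexicon lex = new ToyLexicon();
        lex.observe(0, 10);
        lex.observe(0, 11);
        lex.observe(1, 12);

        Iterator<TaggedPair> seen = lex.ruleIteratorByWord(0, 0);
        while (seen.hasNext()) {
            System.out.println(seen.next()); // tag 10 -> word 0, then tag 11 -> word 0
        }
        Iterator<TaggedPair> unseen = lex.ruleIteratorByWord(7, 3);
        while (unseen.hasNext()) {
            System.out.println(unseen.next()); // tags 10, 11 and 12, each paired with word 7
        }
    }
}
```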

train

void train(Collection trees)
Trains this lexicon on the Collection of trees.

Parameters:
trees - The Collection of trees to train on
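A minimal sketch of what training might involve, assuming each tree has already been reduced to its sequence of (word, tag) leaves: the lexicon counts how often each word occurs with each tag and how often each tag occurs. TaggedToken, ToyLexicon, count, tagCount and the list-of-sentences input are hypothetical simplifications; the real method takes a Collection of trees and may gather more statistics than these.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of train(): trees are assumed to be pre-flattened to tagged leaves.
public class TrainSketch {

    /** A (word, tag) leaf; a hypothetical simplification of a tree's preterminals. */
    static final class TaggedToken {
        final String word;
        final String tag;

        TaggedToken(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    static class ToyLexicon {
        // tag -> (word -> count of that word under that tag)
        private final Map<String, Map<String, Integer>> wordTagCounts = new HashMap<>();
        // tag -> total number of tokens carrying that tag
        private final Map<String, Integer> tagCounts = new HashMap<>();

        /** Accumulates counts over every token of every training sentence. */
        void train(List<List<TaggedToken>> sentences) {
            for (List<TaggedToken> sentence : sentences) {
                for (TaggedToken t : sentence) {
                    wordTagCounts.computeIfAbsent(t.tag, k -> new HashMap<>())
                                 .merge(t.word, 1, Integer::sum);
                    tagCounts.merge(t.tag, 1, Integer::sum);
                }
            }
        }

        int count(String word, String tag) {
            Map<String, Integer> byWord = wordTagCounts.get(tag);
            return byWord == null ? 0 : byWord.getOrDefault(word, 0);
        }

        int tagCount(String tag) {
            return tagCounts.getOrDefault(tag, 0);
        }
    }

    public static void main(String[] args) {
        List<List<TaggedToken>> sentences = Arrays.asList(
            Arrays.asList(new TaggedToken("The", "DT"), new TaggedToken("dog", "NN"),
                          new TaggedToken("barks", "VBZ")),
            Arrays.asList(new TaggedToken("The", "DT"), new TaggedToken("cat", "NN")));

        ToyLexicon lex = new ToyLexicon();
        lex.train(sentences);
        System.out.println(lex.count("dog", "NN")); // 1
        System.out.println(lex.tagCount("NN"));     // 2
        System.out.println(lex.count("The", "DT")); // 2
    }
}
```

These two tables are enough for a relative-frequency estimate of P(word | tag), which is what score is presumed to return.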


score

double score(IntTaggedWord iTW,
             int loc)
Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation, this is used only for unknown words, to change their probability distribution when they are sentence-initial.
Returns:
A double-valued score, usually log P(word|tag)
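Since the score is presumably an estimate of P(word | tag), one simple way to produce it is the relative-frequency estimate log(c(word, tag) / c(tag)). The sketch below does that for seen words and, echoing the note on loc, uses a different fallback for unknown words depending on whether they are sentence-initial. It takes Strings instead of an IntTaggedWord, and the two unknown-word probabilities are made-up constants; this is not BaseLexicon's actual smoothing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of score(); the unknown-word constants are invented.
public class ScoreSketch {

    // Made-up fallback probabilities for unseen words, for illustration only.
    static final double UNK_PROB_SENTENCE_INITIAL = 0.01;
    static final double UNK_PROB_ELSEWHERE = 0.001;

    static class ToyLexicon {
        private final Map<String, Map<String, Integer>> counts = new HashMap<>(); // tag -> word -> count
        private final Map<String, Integer> tagTotals = new HashMap<>();
        private final Map<String, Integer> wordTotals = new HashMap<>();

        /** Hypothetical training hook: records one (word, tag) token. */
        void observe(String word, String tag) {
            counts.computeIfAbsent(tag, t -> new HashMap<>()).merge(word, 1, Integer::sum);
            tagTotals.merge(tag, 1, Integer::sum);
            wordTotals.merge(word, 1, Integer::sum);
        }

        /** Returns log P(word | tag): 0 means certain, more negative means less likely. */
        double score(String word, String tag, int loc) {
            if (!wordTotals.containsKey(word)) {
                // Unknown word: only here does the position (loc) matter.
                return Math.log(loc == 0 ? UNK_PROB_SENTENCE_INITIAL : UNK_PROB_ELSEWHERE);
            }
            int tagTotal = tagTotals.getOrDefault(tag, 0);
            int c = counts.getOrDefault(tag, Collections.emptyMap()).getOrDefault(word, 0);
            if (c == 0) {
                return Double.NEGATIVE_INFINITY; // known word, never seen with this tag
            }
            return Math.log((double) c / tagTotal);
        }
    }

    public static void main(String[] args) {
        ToyLexicon lex = new ToyLexicon();
        lex.observe("dog", "NN");
        lex.observe("cat", "NN");
        lex.observe("the", "DT");

        System.out.println(lex.score("dog", "NN", 1));   // log(1/2), about -0.693
        System.out.println(lex.score("dog", "DT", 1));   // -Infinity
        System.out.println(lex.score("zebra", "NN", 0)); // log(0.01): unknown, sentence-initial
        System.out.println(lex.score("zebra", "NN", 4)); // log(0.001): unknown, elsewhere
    }
}
```

Working in log space lets a parser add lexical and grammar scores instead of multiplying small probabilities; the unknown-word branch here ignores the tag entirely, which a real lexicon would not.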

writeData

void writeData(Writer w)
               throws IOException
Write the lexicon in human-readable format to the Writer. (An optional operation.)

Parameters:
w - The writer to output to
Throws:
IOException

readData

void readData(BufferedReader in)
              throws IOException
Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)

Parameters:
in - The BufferedReader to read from
Throws:
IOException
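readData is documented as reading the format produced by writeData, so the pair is naturally exercised as a round trip. The sketch below invents a simple tab-separated "tag, word, count" line format (hypothetical; the format the real lexicons write is not described here) and checks that writing a toy lexicon and reading it back reproduces the same counts. As in the signatures above, both methods declare IOException; treating a malformed line as an IOException is a choice of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of writeData/readData; the line format is invented.
public class ReadWriteSketch {

    static class ToyLexicon {
        // tag -> word -> count; TreeMaps keep the written output in a stable order
        final Map<String, Map<String, Integer>> counts = new TreeMap<>();

        void observe(String word, String tag) {
            counts.computeIfAbsent(tag, t -> new TreeMap<>()).merge(word, 1, Integer::sum);
        }

        /** Writes one human-readable "tag<TAB>word<TAB>count" line per entry. */
        void writeData(Writer w) throws IOException {
            for (Map.Entry<String, Map<String, Integer>> byTag : counts.entrySet()) {
                for (Map.Entry<String, Integer> e : byTag.getValue().entrySet()) {
                    w.write(byTag.getKey() + "\t" + e.getKey() + "\t" + e.getValue() + "\n");
                }
            }
            w.flush();
        }

        /** Replaces the current contents with the lines produced by writeData. */
        void readData(BufferedReader in) throws IOException {
            counts.clear();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t");
                if (fields.length != 3) {
                    throw new IOException("Malformed line: " + line);
                }
                counts.computeIfAbsent(fields[0], t -> new TreeMap<>())
                      .put(fields[1], Integer.parseInt(fields[2]));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ToyLexicon original = new ToyLexicon();
        original.observe("dog", "NN");
        original.observe("dog", "NN");
        original.observe("runs", "VBZ");

        StringWriter out = new StringWriter();
        original.writeData(out);
        System.out.print(out); // NN<TAB>dog<TAB>2, then VBZ<TAB>runs<TAB>1

        ToyLexicon copy = new ToyLexicon();
        copy.readData(new BufferedReader(new StringReader(out.toString())));
        System.out.println(copy.counts.equals(original.counts)); // true
    }
}
```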


Stanford NLP Group