ChineseLexiconAndWordSegmenter (Stanford JavaNLP API)

edu.stanford.nlp.parser.lexparser
Class ChineseLexiconAndWordSegmenter

java.lang.Object
  edu.stanford.nlp.parser.lexparser.ChineseLexiconAndWordSegmenter

All Implemented Interfaces: Lexicon, WordSegmenter, Serializable

public class ChineseLexiconAndWordSegmenter
extends Object
implements Lexicon, WordSegmenter

This class lets you train a lexicon and segmenter at the same time.

Author: Galen Andrew, Pi-Chuan Chang
See Also: Serialized Form

Field Summary

Fields inherited from interface edu.stanford.nlp.parser.lexparser.Lexicon
`BOUNDARY, BOUNDARY_TAG, UNKNOWN_WORD`

Constructor Summary
`ChineseLexiconAndWordSegmenter(ChineseLexicon lex, WordSegmenter seg)`
`ChineseLexiconAndWordSegmenter(String segmenterFileOrUrl, Options op)` Construct a new ChineseLexiconAndWordSegmenter.
`ChineseLexiconAndWordSegmenter(Treebank trainTreebank, Options op)`

Method Summary
`static ChineseLexiconAndWordSegmenter`	`getSegmenterDataFromFile(String parserFileOrUrl, Options op)`
`protected static ChineseLexiconAndWordSegmenter`	`getSegmenterDataFromSerializedFile(String serializedFileOrUrl)`
`boolean`	`isKnown(int word)` Checks whether a word is in the lexicon.
`boolean`	`isKnown(String word)` Checks whether a word is in the lexicon.
`static void`	`main(String[] args)`
`void`	`readData(BufferedReader in)` Read the lexicon from the BufferedReader in the format written by writeData.
`Iterator`	`ruleIteratorByWord(int word, int loc)` Get an iterator over all rules (pairs of (word, POS)) for this word.
`double`	`score(IntTaggedWord iTW, int loc)` Get the score of this word with this tag (as an IntTaggedWord) at this loc.
`Sentence`	`segmentWords(String s)`
`void`	`train(Collection trees)` Trains this lexicon on the Collection of trees.
`void`	`writeData(Writer w)` Write the lexicon in human-readable format to the Writer.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

ChineseLexiconAndWordSegmenter

public ChineseLexiconAndWordSegmenter(ChineseLexicon lex,
                                      WordSegmenter seg)

ChineseLexiconAndWordSegmenter

public ChineseLexiconAndWordSegmenter(Treebank trainTreebank,
                                      Options op)

ChineseLexiconAndWordSegmenter

public ChineseLexiconAndWordSegmenter(String segmenterFileOrUrl,
                                      Options op)

Constructs a new ChineseLexiconAndWordSegmenter by loading a segmenter file that was previously assembled and stored.

Throws: IllegalArgumentException - If segmenter data cannot be loaded

Method Detail

segmentWords

public Sentence segmentWords(String s)

Specified by: segmentWords in interface WordSegmenter

isKnown

public boolean isKnown(int word)

Description copied from interface: Lexicon

Checks whether a word is in the lexicon.

Specified by: isKnown in interface Lexicon

Parameters: word - The word as an int
Returns: Whether the word is in the lexicon

isKnown

public boolean isKnown(String word)

Description copied from interface: Lexicon

Checks whether a word is in the lexicon.

Specified by: isKnown in interface Lexicon

Parameters: word - The word as a String
Returns: Whether the word is in the lexicon

ruleIteratorByWord

public Iterator ruleIteratorByWord(int word,
                                   int loc)

Description copied from interface: Lexicon

Get an iterator over all rules (pairs of (word, POS)) for this word.

Specified by: ruleIteratorByWord in interface Lexicon

Parameters:
word - The word, represented as an integer in Numberer
loc - The position of the word in the sentence (counting from 0). Implementation note: The BaseLexicon class doesn't actually make use of this position information.
Returns: An Iterator over a List of IntTaggedWords, which pair the word with possible taggings as integer pairs. (Each can be thought of as a tag -> word rule.)
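
The contract described for ruleIteratorByWord can be illustrated with a small self-contained sketch. It is not the Stanford implementation: the class RuleIteratorSketch, its nested WordTag type, and the addRule helper are hypothetical names introduced only for illustration. The sketch stores (word, tag) integer pairs per word and, like the BaseLexicon behaviour noted above, ignores the loc argument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lexicon; NOT the Stanford implementation.
final class RuleIteratorSketch {

    // Hypothetical analogue of an IntTaggedWord: a (word, tag) pair of ints.
    static final class WordTag {
        final int word;
        final int tag;

        WordTag(int word, int tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // All known taggings, grouped by the integer id of the word.
    private final Map<Integer, List<WordTag>> rulesByWord = new HashMap<>();

    // Illustrative helper (an assumption): record that `word` may carry `tag`.
    void addRule(int word, int tag) {
        rulesByWord.computeIfAbsent(word, k -> new ArrayList<>())
                   .add(new WordTag(word, tag));
    }

    // Returns an iterator over every (word, tag) pair recorded for `word`.
    // The position `loc` is accepted but deliberately unused.
    Iterator<WordTag> ruleIteratorByWord(int word, int loc) {
        List<WordTag> rules =
            rulesByWord.getOrDefault(word, Collections.emptyList());
        return Collections.unmodifiableList(rules).iterator();
    }

    public static void main(String[] args) {
        RuleIteratorSketch lexicon = new RuleIteratorSketch();
        lexicon.addRule(5, 1);
        lexicon.addRule(5, 2);
        Iterator<WordTag> it = lexicon.ruleIteratorByWord(5, 0);
        while (it.hasNext()) {
            WordTag rule = it.next();
            System.out.println("tag " + rule.tag + " -> word " + rule.word);
        }
    }
}
```

A word with no recorded taggings yields an empty iterator rather than null, which keeps caller loops simple; whether the real class behaves this way is not stated here.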





train
public void train(Collection trees)

Description copied from interface: Lexicon
Trains this lexicon on the Collection of trees.


Specified by: train in interface Lexicon
Specified by: train in interface WordSegmenter








score
public double score(IntTaggedWord iTW,
                    int loc)

Description copied from interface: Lexicon
Get the score of this word with this tag (as an IntTaggedWord) at this loc. (Presumably an estimate of P(word | tag).)

Specified by: score in interface Lexicon

Parameters:
iTW - An IntTaggedWord pairing a word and POS tag
loc - The position in the sentence. In the default implementation this is used only for unknown words, to change their probability distribution when they are sentence-initial
Returns: A double-valued score, usually -log P(word|tag)
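
To make the "-log P(word|tag)" return value concrete, here is a minimal sketch that estimates P(word | tag) by relative frequency and returns its negative logarithm. It is an assumption-laden illustration, not the library's scoring code: the class NegLogScoreSketch and its observe method are hypothetical, there is no smoothing or unknown-word handling, and the loc argument is omitted entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical relative-frequency scorer; NOT the Stanford implementation.
final class NegLogScoreSketch {

    // count(word, tag), keyed by "word/tag" for simplicity.
    private final Map<String, Integer> pairCounts = new HashMap<>();
    // count(tag), summed over all words seen with that tag.
    private final Map<Integer, Integer> tagCounts = new HashMap<>();

    // Illustrative helper (an assumption): record one tagged-word observation.
    void observe(int word, int tag) {
        pairCounts.merge(word + "/" + tag, 1, Integer::sum);
        tagCounts.merge(tag, 1, Integer::sum);
    }

    // Returns -log(count(word, tag) / count(tag)).
    // An unseen pair has estimated probability 0, so its cost is +infinity.
    double score(int word, int tag) {
        int pair = pairCounts.getOrDefault(word + "/" + tag, 0);
        int total = tagCounts.getOrDefault(tag, 0);
        if (pair == 0 || total == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return -Math.log((double) pair / total);
    }

    public static void main(String[] args) {
        NegLogScoreSketch lexicon = new NegLogScoreSketch();
        lexicon.observe(1, 7);
        lexicon.observe(1, 7);
        lexicon.observe(1, 7);
        lexicon.observe(2, 7);
        System.out.println("frequent word: " + lexicon.score(1, 7));
        System.out.println("rare word:     " + lexicon.score(2, 7));
        System.out.println("unseen word:   " + lexicon.score(3, 7));
    }
}
```

Under this convention a lower score means a more probable word for the tag, so rarer words cost more and unseen pairs cost infinitely much unless some smoothing is applied.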






readData
public void readData(BufferedReader in)
              throws IOException

Description copied from interface: Lexicon
Read the lexicon from the BufferedReader in the format written by writeData. (An optional operation.)

Specified by: readData in interface Lexicon

Parameters: in - The BufferedReader to read from
Throws: IOException





writeData
public void writeData(Writer w)
               throws IOException

Description copied from interface: Lexicon
Write the lexicon in human-readable format to the Writer. (An optional operation.)

Specified by: writeData in interface Lexicon

Parameters: w - The writer to output to
Throws: IOException
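
The readData and writeData methods are documented as a pair: readData consumes whatever format writeData produces. The sketch below shows that round-trip property with the same parameter types (Writer and BufferedReader). Everything else is assumed for illustration: the class LexiconRoundTripSketch, the "word/tag" keys, and the one-entry-per-line, tab-separated text format are hypothetical, since the actual file format is not described on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lexicon store; the text format here is an assumption.
final class LexiconRoundTripSketch {

    // "word/tag" -> count, sorted so the written output is deterministic.
    final Map<String, Integer> counts = new TreeMap<>();

    // Writes one "key<TAB>count" line per entry.
    void writeData(Writer w) throws IOException {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            w.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        w.flush();
    }

    // Reads lines in the format produced by writeData above.
    void readData(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IOException("Malformed line: " + line);
            }
            counts.put(line.substring(0, tab),
                       Integer.parseInt(line.substring(tab + 1)));
        }
    }

    // Writes `original` to an in-memory buffer and reads it back into a copy.
    static LexiconRoundTripSketch roundTrip(LexiconRoundTripSketch original) {
        try {
            StringWriter buffer = new StringWriter();
            original.writeData(buffer);
            LexiconRoundTripSketch copy = new LexiconRoundTripSketch();
            copy.readData(new BufferedReader(new StringReader(buffer.toString())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LexiconRoundTripSketch lexicon = new LexiconRoundTripSketch();
        lexicon.counts.put("word1/NN", 3);
        lexicon.counts.put("word2/VV", 10);
        LexiconRoundTripSketch copy = roundTrip(lexicon);
        System.out.println("round trip preserved data: "
            + copy.counts.equals(lexicon.counts));
    }
}
```

A human-readable line format like this is easy to inspect and diff, which is presumably the motivation for writeData being described as human-readable.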





getSegmenterDataFromFile
public static ChineseLexiconAndWordSegmenter getSegmenterDataFromFile(String parserFileOrUrl,
                                                                      Options op)











getSegmenterDataFromSerializedFile
protected static ChineseLexiconAndWordSegmenter getSegmenterDataFromSerializedFile(String serializedFileOrUrl)











main
public static void main(String[] args)




















  
Stanford NLP Group
