java.lang.Object
  edu.stanford.nlp.process.AbstractListProcessor
    edu.stanford.nlp.process.WordToSentenceProcessor
public class WordToSentenceProcessor
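Transforms a Document of Words into a Document of Sentences by grouping the Words. The word stream is assumed to be adequately tokenized already; this class only divides the list into sentences, possibly discarding some separator tokens, based on the following three sets: the boundary tokens (boundaryTokens), which end a sentence and are kept in it, such as "."; the boundary followers (boundaryFollowers), tokens that commonly follow a sentence boundary and stay with the preceding sentence; and the boundary tokens to discard (boundaryToDiscard), which mark a sentence boundary and are then thrown away.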
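The grouping described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the CoreNLP implementation: words are plain Strings rather than HasWord objects, and the class SentenceSplitSketch and its split method are hypothetical names. The three sets are named as in the constructors (boundaryTokens, boundaryFollowers, boundaryToDiscard): boundary tokens are kept and close the current sentence, followers that appear right after a boundary are attached to the sentence just closed, and discard tokens close the sentence and are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of splitting a token list with three token sets (not the CoreNLP code). */
final class SentenceSplitSketch {

    static List<List<String>> split(List<String> words,
                                    Set<String> boundaryTokens,
                                    Set<String> boundaryFollowers,
                                    Set<String> boundaryToDiscard) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        // True right after a sentence was closed by a boundary token (and while followers attach to it).
        boolean lastWasBoundary = false;

        for (String w : words) {
            if (boundaryToDiscard.contains(w)) {
                // Separator: close any open sentence and drop the token itself.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                lastWasBoundary = false;
                continue;
            }
            if (lastWasBoundary && boundaryFollowers.contains(w)) {
                // A follower such as ")" stays with the sentence that just ended.
                sentences.get(sentences.size() - 1).add(w);
                continue;
            }
            lastWasBoundary = false;
            current.add(w);
            if (boundaryTokens.contains(w)) {
                // The boundary token is kept and ends the sentence.
                sentences.add(current);
                current = new ArrayList<>();
                lastWasBoundary = true;
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current); // trailing words without a final boundary token
        }
        return sentences;
    }
}
```

With boundary tokens {".", "?", "!"}, followers {")"} and discard set {"\n"}, the input [He, left, ., ), Then, rain, \n, Bye, !, !] yields [He, left, ., )], [Then, rain], [Bye, !] and [!]: the second "!" becomes a one-token sentence because it is not a follower.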
Constructor Summary | |
---|---|
WordToSentenceProcessor() | Create a WordToSentenceProcessor using a sensible default list of tokens to split on. |
WordToSentenceProcessor(Pattern regionBeginPattern, Pattern regionEndPattern) | |
WordToSentenceProcessor(Set boundaryTokens) | Flexibly set the set of acceptable sentence boundary tokens, but with a default set of allowed boundary following tokens. |
WordToSentenceProcessor(Set boundaryTokens, Set boundaryFollowers) | Flexibly set the set of acceptable sentence boundary tokens and also the set of tokens commonly following sentence boundaries. |
WordToSentenceProcessor(Set boundaryTokens, Set boundaryFollowers, Set boundaryToDiscard) | Flexibly set the set of acceptable sentence boundary tokens, the set of tokens commonly following sentence boundaries, and also the set of tokens that are sentence boundaries and should be discarded. |
Method Summary | |
---|---|
static void | main(String[] args): Prints text from the files or URLs given as arguments, split into sentences. |
List | process(List words): Returns a List of Sentences where each element is built from a run of Words in the input Document. |
Methods inherited from class edu.stanford.nlp.process.AbstractListProcessor |
---|
processDocument, processLists |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public WordToSentenceProcessor()
Create a WordToSentenceProcessor using a sensible default list of tokens to split on. The default set is: {".", "?", "!"}.
public WordToSentenceProcessor(Set boundaryTokens)
public WordToSentenceProcessor(Set boundaryTokens, Set boundaryFollowers)
public WordToSentenceProcessor(Set boundaryTokens, Set boundaryFollowers, Set boundaryToDiscard)
public WordToSentenceProcessor(Pattern regionBeginPattern, Pattern regionEndPattern)
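One plausible way to organize the Set-based overloads is as telescoping constructors, where each shorter form fills in a default and delegates to the three-set form. The sketch below assumes that structure; SplitterConfig is a hypothetical class, the default boundary set {".", "?", "!"} comes from the no-argument constructor's documentation, and the default follower and discard sets are placeholders, not the library's actual defaults.

```java
import java.util.Set;

/** Hypothetical configuration class mirroring the constructor overloads (not CoreNLP code). */
final class SplitterConfig {
    // Documented default for the no-argument constructor.
    static final Set<String> DEFAULT_BOUNDARIES = Set.of(".", "?", "!");
    // ASSUMPTION: placeholder defaults; the real library's defaults may differ.
    static final Set<String> DEFAULT_FOLLOWERS = Set.of(")", "]", "\"");
    static final Set<String> DEFAULT_DISCARD = Set.of();

    final Set<String> boundaryTokens;
    final Set<String> boundaryFollowers;
    final Set<String> boundaryToDiscard;

    SplitterConfig() {
        this(DEFAULT_BOUNDARIES);
    }

    SplitterConfig(Set<String> boundaryTokens) {
        this(boundaryTokens, DEFAULT_FOLLOWERS);
    }

    SplitterConfig(Set<String> boundaryTokens, Set<String> boundaryFollowers) {
        this(boundaryTokens, boundaryFollowers, DEFAULT_DISCARD);
    }

    SplitterConfig(Set<String> boundaryTokens, Set<String> boundaryFollowers,
                   Set<String> boundaryToDiscard) {
        // Defensive, immutable copies so later changes to the caller's sets have no effect.
        this.boundaryTokens = Set.copyOf(boundaryTokens);
        this.boundaryFollowers = Set.copyOf(boundaryFollowers);
        this.boundaryToDiscard = Set.copyOf(boundaryToDiscard);
    }
}
```

Keeping all defaulting in the shorter constructors means only the three-set constructor stores state, so every overload ends up with the same fully specified configuration.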
Method Detail |
---|
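public List process(List words)
Returns a List of Sentences where each element is built from a run of Words in the input Document. The words must already have been tokenized so that sentence boundary tokens are separate tokens (e.g., by PTBTokenizer).
Parameters:
words - A list of already tokenized words (must implement HasWord)
Returns:
A List of Sentences
See Also:
WordToSentenceProcessor(Set, Set, Set), Sentence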
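The pre-tokenization requirement matters because splitting only compares whole tokens. The sketch below is hypothetical: the TokenizationDemo class and its splitOnBoundaries method are illustrative names, it works on plain Strings, and it handles only boundary tokens, not followers or discarded separators. It uses the documented default set {".", "?", "!"} to show that a period glued to a word, as in "left.", is not recognized, while a separate "." token is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of why boundary tokens must be separate tokens. */
final class TokenizationDemo {
    static final Set<String> DEFAULT_BOUNDARIES = Set.of(".", "?", "!");

    /** Splits after every token that exactly equals a boundary token. */
    static List<List<String>> splitOnBoundaries(List<String> words) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String w : words) {
            current.add(w);
            if (DEFAULT_BOUNDARIES.contains(w)) { // exact match, so "left." does not count
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```

Given [He, left., Then, rain.] the result is a single four-token "sentence"; given [He, left, ., Then, rain, .] it is two sentences. A tokenizer such as PTBTokenizer is what separates "left." into "left" and "." before this step.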
public static void main(String[] args)
Parameters:
args - Command-line arguments: files or URLs
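The page documents main only as taking files or URLs. The driver below is a hypothetical stand-in, not the library's main: PrintSentences and its methods are illustrative names, it treats every argument as a local file path (URLs are not handled), and it uses a naive tokenizer that merely pads ".", "?" and "!" with spaces before splitting on whitespace, an assumption in place of a real tokenizer. It prints one sentence per line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical command-line driver: prints each file's text one sentence per line. */
final class PrintSentences {
    static final Set<String> BOUNDARIES = Set.of(".", "?", "!");

    /** Naive tokenizer (an assumption): pad ., ? and ! with spaces, then split on whitespace. */
    static List<String> tokenize(String text) {
        String padded = text.replaceAll("([.?!])", " $1 ").trim();
        return padded.isEmpty() ? List.of() : List.of(padded.split("\\s+"));
    }

    /** Reads a file and returns its sentences, each rendered as space-joined tokens. */
    static List<String> sentencesOf(Path file) throws IOException {
        List<String> out = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String w : tokenize(Files.readString(file))) {
            current.add(w);
            if (BOUNDARIES.contains(w)) {
                out.add(String.join(" ", current));
                current.clear();
            }
        }
        if (!current.isEmpty()) {
            out.add(String.join(" ", current));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) { // each argument is treated as a local file path
            for (String sentence : sentencesOf(Path.of(arg))) {
                System.out.println(sentence);
            }
        }
    }
}
```

For a file containing "Hi there. How are you?" followed by a newline and "Fine!", this prints "Hi there .", "How are you ?" and "Fine !" on separate lines; the naive tokenizer would wrongly split numbers such as "3.14", which is why a real tokenizer belongs in front of the splitter.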