Package edu.stanford.nlp.process

Interface Summary
Function<T1,T2> An interface for classes that act as a function transforming one object to another.
LexedTokenFactory Constructs a token (of arbitrary type) from a String and its position in the underlying text.
ListProcessor An interface for things that operate on a List.
Processor Top-level interface for transforming Documents.
Tokenizer<T> Tokenizers break up text into individual Objects.
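The relationship between a tokenizer and a token factory can be sketched as follows. This is a minimal, self-contained illustration and not this package's actual API: the names TokenizerSketch, TokenFactory, Span, WhitespaceSplitter, and spans are hypothetical, and the makeToken(text, begin, length) signature is an assumption chosen to match the description "from a String and its position in the underlying text".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical names throughout; this is an illustration, not the package's API.
final class TokenizerSketch {

    /** Builds a token of type T from its text and its position (assumed signature). */
    interface TokenFactory<T> {
        T makeToken(String text, int begin, int length);
    }

    /** A token that remembers where it came from in the source text. */
    static final class Span {
        final String text;
        final int begin;
        final int length;

        Span(String text, int begin, int length) {
            this.text = text;
            this.begin = begin;
            this.length = length;
        }

        @Override
        public String toString() {
            return text + "@" + begin;
        }
    }

    /** Splits on whitespace and delegates token construction to the factory. */
    static final class WhitespaceSplitter<T> implements Iterator<T> {
        private final String input;
        private final TokenFactory<T> factory;
        private int pos = 0;

        WhitespaceSplitter(String input, TokenFactory<T> factory) {
            this.input = input;
            this.factory = factory;
        }

        private void skipWhitespace() {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
        }

        @Override
        public boolean hasNext() {
            skipWhitespace();
            return pos < input.length();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int begin = pos;
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            return factory.makeToken(input.substring(begin, pos), begin, pos - begin);
        }

        /** Drains the remaining tokens into a list. */
        List<T> tokenize() {
            List<T> out = new ArrayList<>();
            while (hasNext()) {
                out.add(next());
            }
            return out;
        }
    }

    /** Convenience entry point: tokenizes text into position-aware Span tokens. */
    static List<Span> spans(String text) {
        return new WhitespaceSplitter<Span>(text, Span::new).tokenize();
    }
}
```

Keeping token construction behind a factory lets the same splitting logic emit tokens of any type, which is the design the "of arbitrary type" wording suggests.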
 

Class Summary
AbstractListProcessor Class AbstractListProcessor
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
DocumentPreprocessor Fully customizable preprocessor for XML, HTML, and plain text documents.
LexerTokenizer An implementation of Tokenizer designed to work with Lexer implementing classes.
Morphology Morphology computes the base form of English words, by removing just inflections (not derivational morphology).
PTBEscapingProcessor Produces a new Document of Words in which special characters of the PTB have been properly escaped.
PTBTokenizer Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory  
StripTagsProcessor A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >).
TokenizerAdapter This class adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer.
WhitespaceTokenizer Simple Tokenizer implementation that tokenizes on whitespace.
WordSegmentingTokenizer  
WordTokenFactory Constructs a Word from a String.
WordToSentenceProcessor Transforms a Document of Words into a Document of Sentences by grouping the Words.
WordToTaggedWordProcessor Transforms a Document of Words into a Document consisting wholly or partly of TaggedWords by splitting each word on a tag divider character.
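Splitting words on a tag divider character, as described for WordToTaggedWordProcessor, can be sketched as follows. This is a standalone illustration and not the class's implementation: TagSplitSketch, MaybeTagged, split, and splitAll are hypothetical names, and splitting at the last occurrence of the divider (so that a token such as 1/2/CD keeps 1/2 as the word) is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a sketch of the idea, not the package's implementation.
final class TagSplitSketch {

    /** A word with an optional tag; tag is null when no usable divider was found. */
    static final class MaybeTagged {
        final String word;
        final String tag;

        MaybeTagged(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        boolean isTagged() {
            return tag != null;
        }
    }

    /** Splits one token at the last divider (assumed policy). */
    static MaybeTagged split(String token, char divider) {
        int i = token.lastIndexOf(divider);
        // Leave the token untagged when the divider is missing or sits at either end.
        if (i <= 0 || i == token.length() - 1) {
            return new MaybeTagged(token, null);
        }
        return new MaybeTagged(token.substring(0, i), token.substring(i + 1));
    }

    /** Applies split to every token, so the result may mix tagged and untagged words. */
    static List<MaybeTagged> splitAll(List<String> tokens, char divider) {
        List<MaybeTagged> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(split(token, divider));
        }
        return out;
    }
}
```

Because tokens without a divider pass through unchanged, the output can contain both tagged and untagged words, matching the "wholly or partly" wording in the summary.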
 



Stanford NLP Group