edu.stanford.nlp.process
Interface Tokenizer<T>

All Superinterfaces:
Iterator<T>
All Known Implementing Classes:
AbstractTokenizer, CHTBTokenizer, LexerTokenizer, NegraPennTokenizer, PennTreebankTokenizer, PTBTokenizer, TokenizerAdapter, WhitespaceTokenizer, WordSegmentingTokenizer

public interface Tokenizer<T>
extends Iterator<T>

Tokenizers break up text into individual Objects. These objects may be Strings, Words, or other Objects. A Tokenizer extends the Iterator interface and adds a lookahead operation, peek(). An implementation of this interface is expected to have a constructor that takes a single argument, a Reader.
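The contract described above can be illustrated with a minimal sketch. It is not the library's code: SketchTokenizer and SketchWhitespaceTokenizer are hypothetical names, the interface is a local stand-in that only mirrors the methods listed on this page, and the whitespace splitting rule and the one-token lookahead buffer are assumptions chosen for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in mirroring the methods documented on this page,
// so the sketch compiles without the Stanford NLP jar.
interface SketchTokenizer<T> extends Iterator<T> {
  T peek();
  List<T> tokenize();
}

// Hypothetical implementation: splits the Reader's characters on whitespace.
class SketchWhitespaceTokenizer implements SketchTokenizer<String> {
  private final Reader reader;
  private String lookahead; // token read but not yet handed out; null if none

  // The single-argument Reader constructor the description asks for.
  SketchWhitespaceTokenizer(Reader reader) {
    this.reader = reader;
  }

  // Reads the next whitespace-delimited token, or returns null at end of input.
  private String readToken() {
    try {
      StringBuilder sb = new StringBuilder();
      int c;
      while ((c = reader.read()) != -1) {
        if (Character.isWhitespace(c)) {
          if (sb.length() > 0) break; // a token has just ended
        } else {
          sb.append((char) c);
        }
      }
      return sb.length() > 0 ? sb.toString() : null;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean hasNext() {
    if (lookahead == null) lookahead = readToken();
    return lookahead != null;
  }

  @Override
  public String peek() {
    if (!hasNext()) throw new NoSuchElementException();
    return lookahead; // the buffer is kept, so next() returns the same token
  }

  @Override
  public String next() {
    if (!hasNext()) throw new NoSuchElementException();
    String token = lookahead;
    lookahead = null; // consume the buffered token
    return token;
  }

  @Override
  public void remove() {
    // Optional Iterator operation; this sketch does not support it.
    throw new UnsupportedOperationException();
  }

  @Override
  public List<String> tokenize() {
    // Assumption: drains whatever has not been consumed yet.
    List<String> out = new ArrayList<>();
    while (hasNext()) out.add(next());
    return out;
  }
}
```

In this sketch, wrapping a java.io.StringReader and calling peek() twice followed by next() yields the same token three times. A later tokenize() returns only the tokens that remain and leaves the tokenizer exhausted. Whether a given library implementation behaves the same way after partial consumption is not stated on this page.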

Author:
Teg Grenager (grenager@stanford.edu)

Method Summary
 boolean hasNext()
          Returns true if and only if this Tokenizer has more elements.
 T next()
          Returns the next token from this Tokenizer.
 T peek()
          Returns the next token without removing it from the Tokenizer, so that the same token is returned again by the next call to next() or peek().
 void remove()
          Removes from the underlying collection the last element returned by the iterator.
 List<T> tokenize()
          Returns all tokens of this Tokenizer as a List for convenience.
 

Method Detail

next

T next()
Returns the next token from this Tokenizer.

Specified by:
next in interface Iterator<T>
Returns:
the next token in the token stream.
Throws:
NoSuchElementException - if the token stream has no more tokens.

hasNext

boolean hasNext()
Returns true if and only if this Tokenizer has more elements.

Specified by:
hasNext in interface Iterator<T>

remove

void remove()
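Removes from the underlying collection the last element returned by the iterator. This is an optional operation for Iterators, and a Tokenizer normally would not support it; under the Iterator contract, an implementation that does not support it throws UnsupportedOperationException. This method can be called only once per call to next().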

Specified by:
remove in interface Iterator<T>

peek

T peek()
Returns the next token without removing it from the Tokenizer, so that the same token is returned again by the next call to next() or peek().

Returns:
the next token in the token stream.
Throws:
NoSuchElementException - if the token stream has no more tokens.
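
A caller typically uses peek() to decide how to treat the current token before committing to consume the next one. The following sketch shows that pattern under stated assumptions: PeekSketch, ListPeekSketch and PercentMerger are hypothetical names, the interface is a local stand-in for the peek()/next()/hasNext() part of Tokenizer, and the rule of gluing a "%" token onto the preceding token is invented purely for illustration. It assumes Java 8 or later, where Iterator.remove() has a default implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the lookahead part of Tokenizer<T>.
interface PeekSketch<T> extends Iterator<T> {
  T peek();
}

// Hypothetical list-backed token source, used only to drive the example.
class ListPeekSketch implements PeekSketch<String> {
  private final List<String> tokens;
  private int pos = 0;

  ListPeekSketch(List<String> tokens) {
    this.tokens = tokens;
  }

  @Override
  public boolean hasNext() {
    return pos < tokens.size();
  }

  @Override
  public String peek() {
    if (!hasNext()) throw new NoSuchElementException();
    return tokens.get(pos); // position is not advanced
  }

  @Override
  public String next() {
    if (!hasNext()) throw new NoSuchElementException();
    return tokens.get(pos++);
  }
}

class PercentMerger {
  // Glues a "%" token onto the token before it, deciding via peek().
  static List<String> merge(PeekSketch<String> in) {
    List<String> out = new ArrayList<>();
    while (in.hasNext()) {
      String tok = in.next();
      // Guard with hasNext(): peek() throws NoSuchElementException at end of stream.
      if (in.hasNext() && in.peek().equals("%")) {
        tok = tok + in.next(); // consume the "%" only after peeking at it
      }
      out.add(tok);
    }
    return out;
  }
}
```

Because peek() throws NoSuchElementException on an exhausted stream, the sketch checks hasNext() before every peek() rather than catching the exception.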

tokenize

List<T> tokenize()
Returns all tokens of this Tokenizer as a List for convenience.



Stanford NLP Group