edu.stanford.nlp.process
Class AbstractTokenizer<T>

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractTokenizer<T>
All Implemented Interfaces:
Tokenizer<T>, Iterator<T>
Direct Known Subclasses:
CHTBTokenizer, LexerTokenizer, PTBTokenizer, TokenizerAdapter, WhitespaceTokenizer, WordSegmentingTokenizer

public abstract class AbstractTokenizer<T>
extends Object
implements Tokenizer<T>

An abstract tokenizer. Subclasses of AbstractTokenizer need only implement the getNext() method. This implementation does not allow null tokens, because the protected nextToken field uses null to signal that no more tokens are available.

Author:
Teg Grenager (grenager@stanford.edu)

Field Summary
protected  T nextToken
           
 
Constructor Summary
AbstractTokenizer()
           
 
Method Summary
protected abstract  T getNext()
          Internally fetches the next token.
 boolean hasNext()
          Returns true if this Tokenizer has more elements.
 T next()
          Returns the next token from this Tokenizer.
 T peek()
          Returns the next token without removing it from the token stream.
 void remove()
          This is an optional operation, by default not supported.
 List tokenize()
          Returns text as a List of tokens.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nextToken

protected T nextToken

Constructor Detail

AbstractTokenizer

public AbstractTokenizer()

Method Detail

getNext

protected abstract T getNext()
Internally fetches the next token.

Returns:
the next token in the token stream, or null if none exists.
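
The getNext() contract can be illustrated with a small sketch. The code below does not use the library: SketchTokenizer is a simplified stand-in written only to mirror the behavior documented on this page (a nextToken field in which null means "no more tokens"), and SpaceTokenizer is a hypothetical subclass. Both names are assumptions made for illustration, and the real class may differ in detail.

```java
import java.util.NoSuchElementException;

// Simplified stand-in for AbstractTokenizer (assumption, not the library source).
abstract class SketchTokenizer<T> {
  // null means "no more tokens", so getNext() must never return null as a token.
  protected T nextToken;

  // The only method a subclass has to supply.
  protected abstract T getNext();

  public boolean hasNext() {
    if (nextToken == null) {
      nextToken = getNext(); // fill the one-token buffer on demand
    }
    return nextToken != null;
  }

  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = nextToken;
    nextToken = null; // consume the buffered token
    return result;
  }
}

// Hypothetical subclass: splits a string on single spaces.
class SpaceTokenizer extends SketchTokenizer<String> {
  private final String[] parts;
  private int index = 0;

  SpaceTokenizer(String text) {
    this.parts = text.isEmpty() ? new String[0] : text.split(" ");
  }

  @Override
  protected String getNext() {
    // Returning null signals the end of the token stream.
    return index < parts.length ? parts[index++] : null;
  }
}
```

In this sketch the subclass contains no iteration logic of its own; returning null from getNext() is enough for hasNext() and next() to report the end of the stream.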

next

public T next()
Returns the next token from this Tokenizer.

Specified by:
next in interface Tokenizer<T>
Specified by:
next in interface Iterator<T>
Returns:
the next token in the token stream.
Throws:
NoSuchElementException - if the token stream has no more tokens.

hasNext

public boolean hasNext()
Returns true if this Tokenizer has more elements.

Specified by:
hasNext in interface Tokenizer<T>
Specified by:
hasNext in interface Iterator<T>

remove

public void remove()
This is an optional operation, by default not supported.

Specified by:
remove in interface Tokenizer<T>
Specified by:
remove in interface Iterator<T>

peek

public T peek()
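Returns the next token without removing it from the token stream, so the same token is returned again by the following call to peek() or next(). This is an optional operation, supported by default.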

Specified by:
peek in interface Tokenizer<T>
Returns:
the next token in the token stream.
Throws:
NoSuchElementException - if the token stream has no more tokens.
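
The difference between peek() and next() can be sketched with a one-token lookahead buffer. PeekSketch below is a hypothetical, library-independent class that wraps a plain java.util.Iterator; it is an assumption made to illustrate the documented behavior, not the implementation used by AbstractTokenizer. Like the class documented here, it treats null as "nothing buffered", so it assumes the source never yields null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lookahead wrapper (assumption, not the library source).
class PeekSketch<T> {
  private final Iterator<T> source;
  private T nextToken; // null means nothing is buffered

  PeekSketch(Iterator<T> source) {
    this.source = source;
  }

  public boolean hasNext() {
    if (nextToken == null && source.hasNext()) {
      nextToken = source.next();
    }
    return nextToken != null;
  }

  // Returns the buffered token and keeps it in the buffer.
  public T peek() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return nextToken;
  }

  // Returns the buffered token and clears the buffer.
  public T next() {
    T result = peek();
    nextToken = null;
    return result;
  }
}
```

Calling peek() repeatedly in this sketch returns the same token each time; only next() advances the stream.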

tokenize

public List tokenize()
Returns text as a List of tokens.

Specified by:
tokenize in interface Tokenizer<T>
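
Because a Tokenizer<T> is also an Iterator<T>, collecting tokens into a List can be pictured as draining an iterator. The helper below is a sketch of that idea using only java.util; the names TokenizeSketch and drain are hypothetical, and whether tokenize() is implemented exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (assumption, not the library source).
class TokenizeSketch {
  // Collects every token that has not been consumed yet, in order.
  static <T> List<T> drain(Iterator<T> tokens) {
    List<T> result = new ArrayList<>();
    while (tokens.hasNext()) {
      result.add(tokens.next());
    }
    return result;
  }
}
```

One consequence of this design is that tokens already consumed through next() do not appear in the returned list, and the iterator is exhausted afterwards.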


Stanford NLP Group