edu.stanford.nlp.trees.international.pennchinese
Class CHTBTokenizer

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractTokenizer
      extended by edu.stanford.nlp.trees.international.pennchinese.CHTBTokenizer
All Implemented Interfaces:
Tokenizer, Iterator

public class CHTBTokenizer
extends AbstractTokenizer

A simple tokenizer for Penn Chinese Treebank files. A token is any parenthesis, node label, or terminal. All SGML content in the files is ignored.
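
The token definition above can be illustrated with a minimal sketch. It is not the CHTBTokenizer source: the class name TreebankTokenSketch and its methods are hypothetical, and it assumes that the SGML content consists only of tags of the form <...> surrounding the bracketed trees, so it drops the tags and nothing else.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is NOT the CHTBTokenizer implementation.
final class TreebankTokenSketch {

    // Splits bracketed treebank text into parentheses, node labels, and terminals.
    // Assumption: SGML content appears only as <...> tags, which are skipped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '<') {
                // Skip everything up to and including the closing '>'.
                flush(current, tokens);
                int close = text.indexOf('>', i);
                i = (close < 0) ? text.length() : close + 1;
                continue;
            }
            if (c == '(' || c == ')') {
                flush(current, tokens);
                tokens.add(String.valueOf(c));
            } else if (Character.isWhitespace(c)) {
                flush(current, tokens);
            } else {
                current.append(c);
            }
            i++;
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the pending label or terminal, if any.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String sample = "<S ID=1>\n( (IP (NP (NR \u4e2d\u56fd)) (VP (VV \u53d1\u5c55))) )\n</S>";
        System.out.println(tokenize(sample));
    }
}
```

In this sketch each parenthesis is its own token, and labels and terminals are whatever runs of non-whitespace characters remain between them.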

Author:
Roger Levy

Field Summary
 
Fields inherited from class edu.stanford.nlp.process.AbstractTokenizer
nextToken
 
Constructor Summary
CHTBTokenizer(Reader r)
          Constructs a new tokenizer from a Reader.
 
Method Summary
 Object getNext()
          Internally fetches the next token.
static void main(String[] args)
          Tokenizes a file in the specified encoding and prints the result to standard output in the same encoding.
 
Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
hasNext, next, peek, remove, tokenize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CHTBTokenizer

public CHTBTokenizer(Reader r)
Constructs a new tokenizer from a Reader. Note that decoding the input bytes into Java's internal Unicode representation is not the tokenizer's job. This can be done by converting the file with ConvertEncodingThread, or by specifying the file's encoding explicitly when constructing the Reader with java.io.InputStreamReader.

Parameters:
r - the Reader supplying the already-decoded treebank text
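
As a hedged sketch of the second option, the following shows one way to build a Reader that decodes bytes with an explicitly named encoding using only the standard library. The class DecodedReaderSketch, its helper methods, and the choice of "UTF-8" are assumptions made for illustration; a real treebank file may use a different encoding. The call to the tokenizer itself appears only as a comment so that the example runs without the library on the classpath.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Illustrative sketch; class and method names are hypothetical.
final class DecodedReaderSketch {

    // Wraps a byte stream in a Reader that decodes with the named charset.
    static Reader openDecoded(InputStream in, String encoding) {
        return new BufferedReader(new InputStreamReader(in, Charset.forName(encoding)));
    }

    // Reads all characters from the Reader; stands in for handing it to a tokenizer.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String encoding = "UTF-8"; // assumption: substitute the file's actual encoding
        byte[] bytes = "(NR \u4e2d\u56fd)".getBytes(Charset.forName(encoding));
        try (Reader r = openDecoded(new ByteArrayInputStream(bytes), encoding)) {
            // With the library available, r would instead be passed to: new CHTBTokenizer(r)
            System.out.println(readAll(r));
        }
    }
}
```

For a file on disk, the ByteArrayInputStream would be replaced by a java.io.FileInputStream opened on that file.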
Method Detail

getNext

public Object getNext()
Internally fetches the next token.

Specified by:
getNext in class AbstractTokenizer
Returns:
the next token in the token stream, or null if none exists.
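
The relationship between getNext() and the inherited hasNext, next, and peek methods can be sketched as a one-token lookahead buffer. This is an assumption about how such a base class might work, not the AbstractTokenizer source; the class names LookaheadTokenizerSketch and FixedTokens are hypothetical, and the real class may behave differently at the end of the stream.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch; NOT the AbstractTokenizer implementation.
abstract class LookaheadTokenizerSketch implements Iterator<Object> {

    protected Object nextToken; // one-token lookahead buffer

    // Subclasses return the next token, or null when the stream is exhausted.
    protected abstract Object getNext();

    public boolean hasNext() {
        if (nextToken == null) {
            nextToken = getNext();
        }
        return nextToken != null;
    }

    public Object next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Object result = nextToken;
        nextToken = null; // consume the buffered token
        return result;
    }

    // Returns the upcoming token without consuming it.
    public Object peek() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nextToken;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    // Hypothetical subclass that serves tokens from a fixed list.
    static final class FixedTokens extends LookaheadTokenizerSketch {
        private final Iterator<String> source;

        FixedTokens(String... tokens) {
            this.source = Arrays.asList(tokens).iterator();
        }

        @Override
        protected Object getNext() {
            return source.hasNext() ? source.next() : null;
        }
    }

    public static void main(String[] args) {
        LookaheadTokenizerSketch t = new FixedTokens("(", "NP", ")");
        while (t.hasNext()) {
            System.out.println(t.next());
        }
    }
}
```

Under this design a subclass only has to implement getNext() and signal exhaustion with null, which matches the return contract documented above.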

main

public static void main(String[] args)
                 throws IOException
Tokenizes a file in the specified encoding and prints the result to standard output in the same encoding. Its arguments are (Infile, Encoding).
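
The output side of this behaviour can be sketched as follows: tokens are written through a Writer that encodes with the same named charset used for reading. The class EncodedOutputSketch, the one-token-per-line layout, and the "UTF-8" fallback are assumptions for illustration; the source does not specify how main() formats its output.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; NOT the CHTBTokenizer.main implementation.
final class EncodedOutputSketch {

    // Writes one token per line (a hypothetical layout), encoded with the named charset.
    static void printTokens(List<String> tokens, OutputStream out, String encoding) {
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(out, Charset.forName(encoding)));
        for (String token : tokens) {
            pw.print(token);
            pw.print('\n');
        }
        pw.flush(); // flush but do not close: out may be System.out
    }

    public static void main(String[] args) {
        // Mirrors the (Infile, Encoding) argument order; falls back to UTF-8 as an assumption.
        String encoding = args.length > 1 ? args[1] : "UTF-8";
        printTokens(Arrays.asList("(", "NR", "\u4e2d\u56fd", ")"), System.out, encoding);
    }
}
```

Using the same encoding name for both the input Reader and the output Writer keeps the printed tokens byte-compatible with the original file.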

Throws:
IOException


Stanford NLP Group