CHTBTokenizer (Stanford JavaNLP API)

edu.stanford.nlp.trees.international.pennchinese
Class CHTBTokenizer

java.lang.Object
  edu.stanford.nlp.process.AbstractTokenizer
      edu.stanford.nlp.trees.international.pennchinese.CHTBTokenizer

All Implemented Interfaces: Tokenizer, Iterator

public class CHTBTokenizer
extends AbstractTokenizer

A simple tokenizer for Penn Chinese Treebank files. A token is any parenthesis, node label, or terminal. All SGML content in the files is ignored.
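The token definition above can be illustrated with a small standalone sketch. This is not the CHTBTokenizer implementation: the class name TreebankTokenSketch, its tokenize method, and the sample input are hypothetical, and the sketch assumes that SGML content appears only as simple, non-nested <...> tags.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; NOT the actual CHTBTokenizer implementation. */
class TreebankTokenSketch {

    /** Splits bracketed tree text into tokens, skipping everything inside <...>. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (inTag) {
                // Assumption: SGML markup is a simple, non-nested <...> tag.
                if (ch == '>') {
                    inTag = false;
                }
            } else if (ch == '<') {
                flush(current, tokens);
                inTag = true;
            } else if (ch == '(' || ch == ')') {
                // Each parenthesis is a token of its own.
                flush(current, tokens);
                tokens.add(String.valueOf(ch));
            } else if (Character.isWhitespace(ch)) {
                flush(current, tokens);
            } else {
                // Part of a node label or terminal.
                current.append(ch);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** Emits the pending label or terminal, if any. */
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input; the terminals are written as Unicode escapes.
        String sample = "<S ID=1>\n( (NP (NN \u7ECF\u6D4E)) )\n</S>";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

In this sketch the two SGML tags produce no tokens, while each parenthesis, label (NP, NN), and terminal is emitted separately.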

Author: Roger Levy

Field Summary

Fields inherited from class edu.stanford.nlp.process.AbstractTokenizer
`nextToken`

Constructor Summary
`CHTBTokenizer(Reader r)` Constructs a new tokenizer from a Reader.

Method Summary
`Object`	`getNext()` Internally fetches the next token.
`static void`	`main(String[] args)` The main() method tokenizes a file in the specified Encoding and prints it to standard output in the specified Encoding.

Methods inherited from class edu.stanford.nlp.process.AbstractTokenizer
`hasNext, next, peek, remove, tokenize`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

CHTBTokenizer

public CHTBTokenizer(Reader r)

Constructs a new tokenizer from a Reader. Note that decoding the file's bytes into Java's internal Unicode representation is not the tokenizer's job. Either convert the file beforehand with ConvertEncodingThread, or specify the file's encoding explicitly when creating the Reader with java.io.InputStreamReader.
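A minimal sketch of the second option, using only java.io and java.nio.charset. The class EncodedReaderSketch and its helper methods are hypothetical names, and UTF-8 is only an example encoding; substitute the encoding your files are actually stored in. With the Stanford library on the classpath, the returned Reader is what would be passed to the constructor.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper showing how to decode bytes before tokenizing. */
class EncodedReaderSketch {

    /** Wraps a byte stream in a Reader that decodes it with the named charset. */
    static Reader openReader(InputStream bytes, String encoding) {
        // Charset.forName throws an unchecked exception if the name is
        // illegal or the charset is not supported by this JVM.
        return new BufferedReader(new InputStreamReader(bytes, Charset.forName(encoding)));
    }

    /** Drains a Reader into a String; used here only to check the decoding. */
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Example bytes standing in for a treebank file on disk.
        byte[] raw = "(NN \u7ECF\u6D4E)".getBytes(Charset.forName("UTF-8"));
        try (Reader r = openReader(new ByteArrayInputStream(raw), "UTF-8")) {
            // In real use: new CHTBTokenizer(r) instead of readAll(r).
            System.out.println(readAll(r));
        }
    }
}
```

For a file on disk, a java.io.FileInputStream would take the place of the ByteArrayInputStream.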

Parameters: r - Reader

Method Detail

getNext

public Object getNext()

Internally fetches the next token.

Specified by: getNext in class AbstractTokenizer

Returns: the next token in the token stream, or null if none exists.

main

public static void main(String[] args)
                 throws IOException

The main() method tokenizes a file in the specified encoding and prints the tokens to standard output in the same encoding. It takes two arguments: the input file (Infile) and the encoding (Encoding).
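Printing to standard output in a chosen encoding can be sketched as follows. This is an assumption about how such output could be produced with the standard library, not the actual body of main(); the class EncodedOutputSketch and its openWriter method are hypothetical names, and the UTF-8 fallback is an illustrative default.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

/** Hypothetical helper showing output in an explicitly named encoding. */
class EncodedOutputSketch {

    /** Wraps a byte stream in an auto-flushing PrintWriter that encodes with the named charset. */
    static PrintWriter openWriter(OutputStream out, String encoding) {
        return new PrintWriter(new OutputStreamWriter(out, Charset.forName(encoding)), true);
    }

    public static void main(String[] args) {
        // Mirrors the (Infile, Encoding) argument order; only the encoding is used here.
        String encoding = args.length > 1 ? args[1] : "UTF-8";
        PrintWriter pw = openWriter(System.out, encoding);
        // One token per line, written in the requested encoding.
        pw.println("(");
        pw.println("NN");
        pw.println("\u7ECF\u6D4E");
        pw.println(")");
        pw.flush();
    }
}
```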

Throws: IOException