edu.stanford.nlp.trees
Class PennTreeReader

java.lang.Object
  extended by edu.stanford.nlp.trees.PennTreeReader
All Implemented Interfaces:
TreeReader
Direct Known Subclasses:
FragDiscardingPennTreeReader

public class PennTreeReader
extends Object
implements TreeReader

A PennTreeReader is a TreeReader that reads Penn Treebank-style files. Example usage:
TreeReader tr = new PennTreeReader(new BufferedReader(new InputStreamReader(new FileInputStream(file),"UTF-8")), myTreeFactory);
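
The nested stream construction in the usage line above can also be written with java.nio. The following is a minimal sketch using only the JDK; the class name Utf8ReaderSketch, the helper openUtf8, and the temporary file are hypothetical and exist only for illustration. The returned BufferedReader is the kind of object that would be passed as the `in` argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of edu.stanford.nlp.trees.
class Utf8ReaderSketch {

    /** Opens a treebank file as a buffered UTF-8 Reader. */
    static BufferedReader openUtf8(Path treebankFile) throws IOException {
        return Files.newBufferedReader(treebankFile, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a one-tree file so the sketch runs without external data.
        Path tmp = Files.createTempFile("trees", ".mrg");
        Files.write(tmp, "(S (NP I) (VP ran))\n".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = openUtf8(tmp)) {
            // In real use, this reader would be handed to a TreeReader constructor.
            System.out.println(reader.readLine());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Naming the charset explicitly, as the original usage line does, avoids depending on the platform default encoding.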

Author:
Christopher Manning, Roger Levy

Constructor Summary
PennTreeReader(Reader in)
          Read parse trees from a Reader.
PennTreeReader(Reader in, Tokenizer st)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn)
          Read parse trees from a Reader.
PennTreeReader(Reader in, TreeFactory tf, TreeNormalizer tn, Tokenizer st)
          Read parse trees from a Reader.
 
Method Summary
 void close()
          Close the Reader behind this TreeReader.
static void main(String[] args)
          Loads treebank data from first argument and prints it.
 Tree readTree()
          Reads a single tree in standard Penn Treebank format, with or without an additional set of parentheses around it (an unnamed ROOT node).
static TokenizerFactory<Tree> tokenizerFactory(TreeFactory tf, TreeNormalizer tn, Tokenizer stringTokenizer)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PennTreeReader

public PennTreeReader(Reader in)
Read parse trees from a Reader. The arguments not supplied default to a SimpleTreeFactory, no TreeNormalizer, and a PennTreebankTokenizer.

Parameters:
in - The Reader

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf)
Read parse trees from a Reader.

Parameters:
in - The Reader
tf - The TreeFactory used to create the Tree objects

PennTreeReader

public PennTreeReader(Reader in,
                      Tokenizer st)
Read parse trees from a Reader.

Parameters:
in - The Reader
st - The Tokenizer

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf,
                      TreeNormalizer tn)
Read parse trees from a Reader.

Parameters:
in - The Reader
tf - The TreeFactory used to create the Tree objects
tn - The TreeNormalizer applied to the trees

PennTreeReader

public PennTreeReader(Reader in,
                      TreeFactory tf,
                      TreeNormalizer tn,
                      Tokenizer st)
Read parse trees from a Reader.

Parameters:
in - The Reader
tf - The TreeFactory used to create the Tree objects
tn - The TreeNormalizer applied to the trees
st - The Tokenizer that divides the Reader's input into tokens

Method Detail

readTree

public Tree readTree()
              throws IOException
Reads a single tree in standard Penn Treebank format, with or without an additional set of parentheses around it (an unnamed ROOT node). If the token stream ends before the current tree is complete, a NoSuchElementException is thrown.

Specified by:
readTree in interface TreeReader
Returns:
A single tree, or null at end of token stream.
Throws:
IOException
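
The contract described for readTree (one tree per call, optional extra parentheses, null at end of stream, NoSuchElementException on a truncated tree) can be illustrated with a small stand-alone reader. This is a sketch under stated assumptions, not the Stanford implementation: MiniPennReader, Node, and the hand-written tokenizer are hypothetical names, and a null label is used here to stand in for the unnamed ROOT node.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of the readTree() contract; NOT the Stanford implementation.
class MiniPennReader {

    /** Minimal tree node. A null label stands in for the unnamed ROOT node. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        @Override
        public String toString() {
            if (children.isEmpty() && label != null) return label; // leaf
            StringBuilder sb = new StringBuilder("(");
            if (label != null) sb.append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    private static final int NONE = -2;   // marker: no character pushed back
    private final Reader in;
    private int pushedBack = NONE;

    MiniPennReader(Reader in) { this.in = in; }

    private int read() throws IOException {
        if (pushedBack == NONE) return in.read();
        int c = pushedBack;
        pushedBack = NONE;
        return c;
    }

    /** Returns "(", ")", a symbol, or null at end of stream. */
    private String nextToken() throws IOException {
        int c = read();
        while (c != -1 && Character.isWhitespace(c)) c = read();
        if (c == -1) return null;
        if (c == '(' || c == ')') return String.valueOf((char) c);
        StringBuilder sb = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c) && c != '(' && c != ')') {
            sb.append((char) c);
            c = read();
        }
        pushedBack = c; // the delimiter belongs to the next token
        return sb.toString();
    }

    /** Reads one tree, or returns null if the stream is already exhausted. */
    Node readTree() throws IOException {
        String tok = nextToken();
        if (tok == null) return null;
        if (!tok.equals("(")) throw new IOException("Expected '(' but found: " + tok);
        return readConstituent();
    }

    /** Parses a constituent whose opening "(" has already been consumed. */
    private Node readConstituent() throws IOException {
        String tok = requireToken();
        Node node;
        if (tok.equals("(")) {              // extra parentheses: unnamed ROOT
            node = new Node(null);
            node.children.add(readConstituent());
        } else if (tok.equals(")")) {       // empty "()"
            return new Node(null);
        } else {
            node = new Node(tok);
        }
        while (!(tok = requireToken()).equals(")")) {
            node.children.add(tok.equals("(") ? readConstituent() : new Node(tok));
        }
        return node;
    }

    /** Like nextToken(), but a premature end of input is an error. */
    private String requireToken() throws IOException {
        String tok = nextToken();
        if (tok == null) throw new NoSuchElementException("Input ended inside a tree");
        return tok;
    }

    public static void main(String[] args) throws IOException {
        MiniPennReader r = new MiniPennReader(
                new StringReader("( (S (NP I) (VP ran)))\n(S (NP You) (VP slept))"));
        // Loop until readTree() signals the end of the stream with null.
        for (Node t = r.readTree(); t != null; t = r.readTree()) {
            System.out.println(t);
        }
    }
}
```

The loop in main shows the calling pattern the Returns clause implies: keep calling readTree until it returns null, and treat a NoSuchElementException as a sign that the input was cut off mid-tree.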

close

public void close()
           throws IOException
Close the Reader behind this TreeReader.

Specified by:
close in interface TreeReader
Throws:
IOException

tokenizerFactory

public static TokenizerFactory<Tree> tokenizerFactory(TreeFactory tf,
                                                      TreeNormalizer tn,
                                                      Tokenizer stringTokenizer)

main

public static void main(String[] args)
Loads treebank data from first argument and prints it.

Parameters:
args - Array of command-line arguments; the first argument specifies the treebank filename


Stanford NLP Group