edu.stanford.nlp.trees
Class Treebank

java.lang.Object
  extended by java.util.AbstractCollection<Tree>
      extended by edu.stanford.nlp.trees.Treebank
All Implemented Interfaces:
Iterable<Tree>, Collection<Tree>
Direct Known Subclasses:
CompositeTreebank, DiskTreebank, MemoryTreebank

public abstract class Treebank
extends AbstractCollection<Tree>

A Treebank object provides access to a corpus of examples with given tree structures. This class implements the Collection interface, but it may offer less than the full Collection contract: some Treebanks are read-only, so mutating operations may throw UnsupportedOperationException.

Author:
Christopher Manning, Roger Levy (added encoding variable and method)
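
The read-only caveat in the class description can be illustrated with a minimal sketch built only on java.util. ReadOnlyCorpus and its rejects helper are hypothetical stand-ins, not part of the library; they show how an AbstractCollection subclass can reject mutation and how a caller might guard against it.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a read-only, Treebank-like collection (not library code). */
public class ReadOnlyCorpus extends AbstractCollection<String> {
    private final List<String> items;

    public ReadOnlyCorpus(String... items) {
        // An unmodifiable view: its iterator's remove() is unsupported as well.
        this.items = Collections.unmodifiableList(Arrays.asList(items));
    }

    @Override
    public Iterator<String> iterator() {
        return items.iterator();
    }

    @Override
    public int size() {
        return items.size();
    }

    @Override
    public boolean remove(Object o) {
        // Fail immediately instead of searching the collection first.
        throw new UnsupportedOperationException("read-only");
    }

    /** Returns true if the mutation was rejected with UnsupportedOperationException. */
    public static boolean rejects(Runnable mutation) {
        try {
            mutation.run();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ReadOnlyCorpus corpus = new ReadOnlyCorpus("(S (NP a) (VP b))", "(S (NP c))");
        System.out.println(corpus.size());                               // 2
        System.out.println(rejects(() -> corpus.remove("(S (NP c))")));  // true
        // add() is inherited from AbstractCollection, which always throws.
        System.out.println(rejects(() -> corpus.add("(S)")));            // true
    }
}
```

Read operations such as size(), contains() and iteration keep working; only the mutating calls are rejected.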

Constructor Summary
Treebank()
          Create a new Treebank (using a LabeledScoredTreeReaderFactory).
Treebank(int initialCapacity)
          Create a new Treebank.
Treebank(int initialCapacity, TreeReaderFactory trf)
          Create a new Treebank.
Treebank(TreeReaderFactory trf)
          Create a new Treebank.
Treebank(TreeReaderFactory trf, String encoding)
          Create a new Treebank.
 
Method Summary
abstract  void apply(TreeVisitor tp)
          Apply a TreeVisitor to each tree in the Treebank.
abstract  void clear()
          Empty a Treebank.
 String encoding()
          Returns the encoding in use for treebank file bytestream access.
 void loadPath(File path)
          Load a sequence of trees from given file or directory and its subdirectories.
abstract  void loadPath(File path, FileFilter filt)
          Load trees from given path specification.
 void loadPath(File path, String suffix, boolean recursively)
          Load trees from given directory.
 void loadPath(String pathName)
          Load a sequence of trees from given directory and its subdirectories.
 void loadPath(String pathName, FileFilter filt)
          Load a sequence of trees from given directory and its subdirectories which match the file filter.
 void loadPath(String pathName, String suffix, boolean recursively)
          Load trees from given directory.
 boolean remove(Object o)
          This operation isn't supported for a Treebank.
 int size()
          Returns the size of the Treebank.
 String textualSummary()
          Return various statistics about the treebank (number of sentences, words, tag set, etc.).
 String toString()
          Return the whole treebank as a series of big bracketed lists.
 Treebank transform(TreeTransformer treeTrans)
          Return a Treebank (actually a MemoryTreebank) where each Tree in the current treebank has been transformed using the TreeTransformer.
protected  TreeReaderFactory treeReaderFactory()
          Get the TreeReaderFactory for a Treebank -- this method is provided in order to make the TreeReaderFactory available to subclasses.
 
Methods inherited from class java.util.AbstractCollection
add, addAll, contains, containsAll, isEmpty, iterator, removeAll, retainAll, toArray, toArray
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.Collection
equals, hashCode
 

Constructor Detail

Treebank

public Treebank()
Create a new Treebank (using a LabeledScoredTreeReaderFactory).


Treebank

public Treebank(TreeReaderFactory trf)
Create a new Treebank.

Parameters:
trf - the factory class to be called to create a new TreeReader

Treebank

public Treebank(TreeReaderFactory trf,
                String encoding)
Create a new Treebank.

Parameters:
trf - the factory class to be called to create a new TreeReader
encoding - The charset encoding to use for treebank file decoding

Treebank

public Treebank(int initialCapacity)
Create a new Treebank.

Parameters:
initialCapacity - The initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)

Treebank

public Treebank(int initialCapacity,
                TreeReaderFactory trf)
Create a new Treebank.

Parameters:
initialCapacity - The initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)
trf - the factory class to be called to create a new TreeReader
Method Detail

treeReaderFactory

protected TreeReaderFactory treeReaderFactory()
Get the TreeReaderFactory for a Treebank -- this method is provided in order to make the TreeReaderFactory available to subclasses.

Returns:
The TreeReaderFactory

encoding

public String encoding()
Returns the encoding in use for treebank file bytestream access.
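
To make the role of the encoding concrete, here is a minimal sketch of decoding a byte stream with a named charset using only the JDK. EncodingSketch and decode are hypothetical names; how a Treebank actually passes its encoding to a TreeReader is not shown in this page, so this only illustrates why the charset name matters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decode bytes to text with a charset given by name. */
public class EncodingSketch {
    public static String decode(byte[] bytes, String encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(encoding))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Nine characters, ten bytes in UTF-8 (the accented letter takes two bytes).
        byte[] bytes = "(NP caf\u00e9)".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(bytes, "UTF-8").equals("(NP caf\u00e9)")); // true
        // Decoding with the wrong charset yields one character per byte.
        System.out.println(decode(bytes, "ISO-8859-1").length());            // 10
    }
}
```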


clear

public abstract void clear()
Empty a Treebank.

Specified by:
clear in interface Collection<Tree>
Overrides:
clear in class AbstractCollection<Tree>

loadPath

public void loadPath(String pathName)
Load a sequence of trees from given directory and its subdirectories. Trees should reside in files with the suffix "mrg". Alternatively, if pathName names a single file, that file is loaded (the name must include its extension).

Parameters:
pathName - file or directory name

loadPath

public void loadPath(File path)
Load a sequence of trees from given file or directory and its subdirectories. If path is a directory, trees must reside in files with the suffix "mrg" (a somewhat non-general Penn Treebank holdover); otherwise the single file at the given path (including extension) is loaded.

Parameters:
path - File specification

loadPath

public void loadPath(String pathName,
                     String suffix,
                     boolean recursively)
Load trees from given directory.

Parameters:
pathName - File or directory name
suffix - Extension of files to load: If pathName is a directory, then, if this is non-null, all and only files ending in "." followed by this extension will be loaded; if it is null, all files in directories will be loaded. If pathName is not a directory, this parameter is ignored.
recursively - descend into subdirectories as well
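
The suffix and recursively rules above can be expressed as a java.io.FileFilter. The following is a sketch under the assumption that the documented semantics are applied per file; SuffixFilterSketch and suffixFilter are hypothetical names, and this is not the library's own filter class.

```java
import java.io.File;
import java.io.FileFilter;

/** Hypothetical filter mirroring the documented suffix/recursively semantics. */
public class SuffixFilterSketch {
    public static FileFilter suffixFilter(final String suffix, final boolean recursively) {
        return new FileFilter() {
            @Override
            public boolean accept(File f) {
                if (f.isDirectory()) {
                    // Directories are only entered when recursion is requested.
                    return recursively;
                }
                // A null suffix accepts every file; otherwise require ".suffix".
                return suffix == null || f.getName().endsWith("." + suffix);
            }
        };
    }

    public static void main(String[] args) {
        FileFilter mrgOnly = suffixFilter("mrg", true);
        // These paths need not exist; File.isDirectory() is false for missing files.
        System.out.println(mrgOnly.accept(new File("wsj_0001.mrg")));  // true
        System.out.println(mrgOnly.accept(new File("README.txt")));    // false
        System.out.println(suffixFilter(null, false).accept(new File("README.txt"))); // true
    }
}
```

A filter of this shape could be passed to the FileFilter-based loadPath overloads.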

loadPath

public void loadPath(File path,
                     String suffix,
                     boolean recursively)
Load trees from given directory.

Parameters:
path - file or directory to load from
suffix - suffix of files to load
recursively - descend into subdirectories as well

loadPath

public void loadPath(String pathName,
                     FileFilter filt)
Load a sequence of trees from given directory and its subdirectories which match the file filter. Alternatively, if pathName names a single file, that file is loaded (the name must include its extension).

Parameters:
pathName - file or directory name
filt - A filter used to determine which files match

loadPath

public abstract void loadPath(File path,
                              FileFilter filt)
Load trees from given path specification.

Parameters:
path - file or directory to load from
filt - a FileFilter selecting the files to load

apply

public abstract void apply(TreeVisitor tp)
Apply a TreeVisitor to each tree in the Treebank. For all current implementations of Treebank, this is the fastest way to traverse all the trees in the Treebank.

Parameters:
tp - The TreeVisitor to be applied
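
The visitor style behind apply can be sketched with stand-in types. Visitor, Corpus and TokenCounter below are hypothetical and only mimic the shape of TreeVisitor and Treebank; the point is that the container drives the traversal while the visitor carries any accumulated state.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {
    /** Hypothetical stand-in for TreeVisitor: one callback per item. */
    public interface Visitor<T> {
        void visit(T item);
    }

    /** Hypothetical stand-in for a Treebank-like container exposing apply(). */
    public static class Corpus<T> {
        private final List<T> items = new ArrayList<>();

        public void add(T item) {
            items.add(item);
        }

        /** The container owns the traversal and calls the visitor once per item. */
        public void apply(Visitor<T> visitor) {
            for (T item : items) {
                visitor.visit(item);
            }
        }
    }

    /** Visitor that accumulates a count of whitespace-separated tokens. */
    public static class TokenCounter implements Visitor<String> {
        public int tokens = 0;

        @Override
        public void visit(String sentence) {
            tokens += sentence.trim().split("\\s+").length;
        }
    }

    public static void main(String[] args) {
        Corpus<String> corpus = new Corpus<>();
        corpus.add("the cat sat");
        corpus.add("dogs bark");
        TokenCounter counter = new TokenCounter();
        corpus.apply(counter);
        System.out.println(counter.tokens); // 5
    }
}
```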

transform

public Treebank transform(TreeTransformer treeTrans)
Return a Treebank (actually a MemoryTreebank) where each Tree in the current treebank has been transformed using the TreeTransformer. This Treebank is unchanged (assuming that the TreeTransformer correctly doesn't change input Trees).

Parameters:
treeTrans - The TreeTransformer to use
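
The contract described for transform (a new in-memory result, with the source left unchanged as long as the transformer does not mutate its input) can be sketched with JDK types only. TransformSketch is hypothetical, and java.util.function.Function stands in for TreeTransformer here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class TransformSketch {
    /** Collect transformed copies into a new list; the source is only read. */
    public static <T> List<T> transform(Iterable<T> source, Function<T, T> transformer) {
        List<T> result = new ArrayList<>();
        for (T item : source) {
            result.add(transformer.apply(item));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("(NP the cat)", "(VP sat)");
        // Strings are immutable, so this transformer cannot alter its input.
        List<String> upper = transform(original, s -> s.toUpperCase(Locale.ROOT));
        System.out.println(upper.get(0));    // (NP THE CAT)
        System.out.println(original.get(0)); // (NP the cat)
    }
}
```

With mutable items, the same guarantee holds only if the transformer returns a new object instead of editing the one it receives.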

toString

public String toString()
Return the whole treebank as a series of big bracketed lists. Calling this is a really bad idea if your treebank is large.

Overrides:
toString in class AbstractCollection<Tree>

size

public int size()
Returns the size of the Treebank.

Specified by:
size in interface Collection<Tree>
Specified by:
size in class AbstractCollection<Tree>
Returns:
The number of trees in the treebank

textualSummary

public String textualSummary()
Return various statistics about the treebank (number of sentences, words, tag set, etc.).


remove

public boolean remove(Object o)
This operation isn't supported for a Treebank. The call fails immediately with an UnsupportedOperationException.

Specified by:
remove in interface Collection<Tree>
Overrides:
remove in class AbstractCollection<Tree>


Stanford NLP Group