edu.stanford.nlp.trees
Class DiskTreebank
java.lang.Object
java.util.AbstractCollection<Tree>
edu.stanford.nlp.trees.Treebank
edu.stanford.nlp.trees.DiskTreebank
- All Implemented Interfaces:
- Iterable<Tree>, Collection<Tree>
public final class DiskTreebank
- extends Treebank
A DiskTreebank is a Collection of Trees.
A DiskTreebank object stores only the information needed to
reach a corpus of trees that is stored on disk. Trees are usually
accessed by passing a TreeVisitor to apply(), which visits each Tree in
the Treebank, or by calling iterator() to iterate over the Trees.
- Author:
- Christopher Manning
Methods inherited from class edu.stanford.nlp.trees.Treebank |
encoding, loadPath, loadPath, loadPath, loadPath, loadPath, remove, size, textualSummary, toString, transform, treeReaderFactory |
DiskTreebank
public DiskTreebank()
- Create a new DiskTreebank.
The trees are made with a LabeledScoredTreeReaderFactory.
Compatibility note: Until Sep 2004, this used to create a Treebank
with a SimpleTreeReaderFactory, but this was changed as the old
default wasn't very useful, especially to naive users.
DiskTreebank
public DiskTreebank(String encoding)
- Create a new DiskTreebank that uses the given encoding for file access.
- Parameters:
encoding
- The charset encoding to use for treebank file decoding
DiskTreebank
public DiskTreebank(TreeReaderFactory trf)
- Create a new DiskTreebank.
- Parameters:
trf
- the factory class to be called to create a new
TreeReader
DiskTreebank
public DiskTreebank(TreeReaderFactory trf,
String encoding)
- Create a new DiskTreebank.
- Parameters:
trf
- the factory class to be called to create a new
TreeReader
encoding
- The charset encoding to use for treebank file decoding
DiskTreebank
public DiskTreebank(int initialCapacity)
- Create a new Treebank.
The trees are made with a LabeledScoredTreeReaderFactory.
Compatibility note: Until Sep 2004, this used to create a Treebank
with a SimpleTreeReaderFactory, but this was changed as the old
default wasn't very useful, especially to naive users.
- Parameters:
initialCapacity
- The initial size of the underlying Collection. For a DiskTreebank, this parameter is ignored.
DiskTreebank
public DiskTreebank(int initialCapacity,
TreeReaderFactory trf)
- Create a new Treebank.
- Parameters:
initialCapacity
- The initial size of the underlying Collection. For a DiskTreebank, this parameter is ignored.
trf
- the factory class to be called to create a new TreeReader
transformOnRead
public void transformOnRead(TreeTransformer tt)
- Transform all trees with this TreeTransformer as they are read.
- Parameters:
tt
- the TreeTransformer to apply to each tree as it is read
clear
public void clear()
- Empty a Treebank.
- Specified by:
clear
in interface Collection<Tree>
- Specified by:
clear
in class Treebank
loadPath
public void loadPath(File path,
FileFilter filt)
- Load trees from given directory. This version just records
the paths to be processed, and actually processes them at apply time.
- Specified by:
loadPath
in class Treebank
- Parameters:
path
- file or directory to load from
filt
- a FileFilter that selects the files to load
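The deferred behavior described for loadPath (record paths now, read them at apply time) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the Stanford NLP implementation: LazyCorpusSketch is a hypothetical class, each non-blank line of a file stands in for one tree, a Consumer<String> stands in for a TreeVisitor, and the ".mrg" extension in main is just an illustrative filter.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a disk-backed treebank; not Stanford NLP code.
// Each non-blank line of a file plays the role of one tree.
class LazyCorpusSketch {
    // One recorded request: a path plus the filter supplied with it.
    private static final class Entry {
        final File path;
        final FileFilter filter;
        Entry(File path, FileFilter filter) { this.path = path; this.filter = filter; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private File currentFile; // file being read during apply(), otherwise null

    // Only records the path and filter; no file is opened here.
    void loadPath(File path, FileFilter filter) {
        entries.add(new Entry(path, filter));
    }

    // Reads the recorded paths now and hands each item to the visitor.
    void apply(Consumer<String> visitor) throws IOException {
        for (Entry e : entries) {
            visit(e.path, e.filter, visitor);
        }
        currentFile = null; // nothing is open once the pass has finished
    }

    File getCurrentFile() { return currentFile; }

    private void visit(File path, FileFilter filter, Consumer<String> visitor) throws IOException {
        if (path.isDirectory()) {
            File[] children = path.listFiles();
            if (children == null) return; // directory could not be listed
            Arrays.sort(children); // deterministic traversal order
            for (File child : children) visit(child, filter, visitor);
        } else if (filter.accept(path)) {
            currentFile = path;
            for (String line : Files.readAllLines(path.toPath(), StandardCharsets.UTF_8)) {
                if (!line.trim().isEmpty()) visitor.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        LazyCorpusSketch bank = new LazyCorpusSketch();
        // The first argument is a file or directory; ".mrg" is an illustrative choice.
        File root = new File(args.length > 0 ? args[0] : ".");
        bank.loadPath(root, f -> f.getName().endsWith(".mrg"));
        bank.apply(System.out::println);
    }
}
```

Because nothing is read until apply runs, files that appear under a recorded directory after loadPath was called are still picked up in this model.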
apply
public void apply(TreeVisitor tp)
- Applies the TreeVisitor to all trees in the Treebank.
- Specified by:
apply
in class Treebank
- Parameters:
tp
- A class that can process trees.
getCurrentFile
public File getCurrentFile()
- Return the File from which trees are currently being read by apply()
and passed to a TreeVisitor. This is useful if one wants to map
the original file and directory structure over to a set of modified trees.
- Returns:
- the file that trees are currently being read from, or
null
if no file is currently open
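The directory mapping mentioned above can be sketched with the standard library alone. This is an illustration under assumptions, not part of the Stanford NLP API: MirrorPathSketch and mirror are hypothetical names, the file passed as current is assumed to be whatever getCurrentFile() returned during a visit, and the paths in main are made up.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical helper; not part of the Stanford NLP API.
class MirrorPathSketch {
    // Maps a file under inputRoot to the same relative location under outputRoot.
    static File mirror(File inputRoot, File current, File outputRoot) {
        Path root = inputRoot.toPath().toAbsolutePath().normalize();
        Path file = current.toPath().toAbsolutePath().normalize();
        if (!file.startsWith(root)) {
            throw new IllegalArgumentException(current + " is not under " + inputRoot);
        }
        // Path of the current file relative to the input root, e.g. a subdirectory plus file name.
        Path relative = root.relativize(file);
        return outputRoot.toPath().resolve(relative).toFile();
    }

    public static void main(String[] args) {
        // Illustrative paths only; nothing is read or written here.
        File out = mirror(new File("wsj"), new File("wsj/02/wsj_0201.mrg"), new File("wsj-modified"));
        System.out.println(out.getPath());
    }
}
```

A visitor could call such a helper once per visited tree and write the modified tree to the returned location, creating parent directories as needed.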
iterator
public Iterator<Tree> iterator()
- Return an Iterator over Trees in the Treebank. This is implemented
by building per-file MemoryTreebanks for the files in the
DiskTreebank. As such, it isn't as efficient as using apply().
- Specified by:
iterator
in interface Iterable<Tree>
- Specified by:
iterator
in interface Collection<Tree>
- Specified by:
iterator
in class AbstractCollection<Tree>
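The per-file strategy described for iterator() can be modeled as an iterator that loads one whole file into memory at a time and then hands out its items. This is a rough sketch under assumptions, not the library's code: PerFileIteratorSketch is a hypothetical class, each line of a file stands in for one tree, and an in-memory List<String> stands in for a per-file MemoryTreebank.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model of per-file iteration; not Stanford NLP code.
class PerFileIteratorSketch implements Iterator<String> {
    private final Iterator<Path> files;
    // Items of the file currently held in memory.
    private Iterator<String> current = Collections.emptyIterator();

    PerFileIteratorSketch(List<Path> files) {
        this.files = files.iterator();
    }

    @Override
    public boolean hasNext() {
        // Move on to the next file until one yields items or the files run out.
        while (!current.hasNext() && files.hasNext()) {
            current = loadWholeFile(files.next()).iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    // Reads the complete file before any item is returned.
    private static List<String> loadWholeFile(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String a : args) files.add(Paths.get(a));
        Iterator<String> it = new PerFileIteratorSketch(files);
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

In this model each file is fully buffered before its first item is returned, which is the extra cost relative to a visitor that processes items as they are read.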
main
public static void main(String[] args)
- Loads treebank and prints it.
All files below the designated filePath are loaded, restricted to the
given number range if one is supplied. You can optionally normalize the trees
(English-specific) and print trees one per line up to a certain length
(for EVALB).
Usage:
java edu.stanford.nlp.trees.DiskTreebank [-maxLength n|-normalize] filePath numberRanges
- Parameters:
args
- Array of command-line arguments
Stanford NLP Group