edu.stanford.nlp.parser.lexparser
Class TreeBinarizer

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.TreeBinarizer
All Implemented Interfaces:
TreeTransformer

public class TreeBinarizer
extends Object
implements TreeTransformer

Binarizes trees in such a way that head-argument structure is respected. Looks only at the value of input tree nodes. Produces LSTrees with CWT (CategoryWordTag) labels; the input trees must already have CWT labels. Although the binarizer always respects heads, you can get left or right binarization by defining an appropriate HeadFinder.
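
The interaction between head position and branching direction can be illustrated with a small standalone sketch. It does not use the Stanford classes: Node, HeadRule, and the "@" prefix on synthetic labels are assumptions made for illustration, and the peeling order (left siblings first, then right siblings) is one plausible choice rather than the library's documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeadBinarizationSketch {

    /** Minimal tree node; hypothetical, not the Stanford Tree class. */
    public static final class Node {
        public final String label;
        public final List<Node> children;

        public Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        public boolean isLeaf() { return children.isEmpty(); }

        @Override public String toString() {
            if (isLeaf()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    /** Hypothetical stand-in for a HeadFinder: picks the index of the head child. */
    public interface HeadRule { int headIndex(Node parent); }

    /** Recursively binarizes t so that every synthetic node still contains the head child. */
    public static Node binarize(Node t, HeadRule rule) {
        if (t.isLeaf()) return t;
        List<Node> kids = new ArrayList<>();
        for (Node c : t.children) kids.add(binarize(c, rule));
        if (kids.size() <= 2) return new Node(t.label, kids.toArray(new Node[0]));
        int head = rule.headIndex(t);
        return new Node(t.label, split(t.label, kids, 0, kids.size() - 1, head));
    }

    /** Returns the two children covering kids[lo..hi]; the head index always stays inside the span. */
    private static Node[] split(String cat, List<Node> kids, int lo, int hi, int head) {
        if (hi - lo == 1) return new Node[] { kids.get(lo), kids.get(hi) };
        if (lo < head) {
            // Peel off the leftmost non-head sibling; the head stays in the synthetic node.
            Node rest = new Node("@" + cat, split(cat, kids, lo + 1, hi, head));
            return new Node[] { kids.get(lo), rest };
        }
        // The head is leftmost, so peel off the rightmost sibling instead.
        Node rest = new Node("@" + cat, split(cat, kids, lo, hi - 1, head));
        return new Node[] { rest, kids.get(hi) };
    }

    public static void main(String[] args) {
        Node np = new Node("NP",
            new Node("DT", new Node("the")),
            new Node("JJ", new Node("big")),
            new Node("NN", new Node("dog")));
        // Head-final rule yields a right-branching result.
        System.out.println(binarize(np, p -> p.children.size() - 1));
        // Head-initial rule yields a left-branching result.
        System.out.println(binarize(np, p -> 0));
    }
}
```

With a head-final rule the sketch produces a right-branching tree, and with a head-initial rule a left-branching one, which mirrors the remark above about choosing the HeadFinder.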

Author:
Dan Klein, Teg Grenager, Christopher Manning

Constructor Summary
TreeBinarizer(HeadFinder hf, TreebankLanguagePack tlp, boolean insideFactor, boolean markovFactor, int markovOrder, boolean useWrappingLabels, boolean unaryAtTop, double selectiveSplitThreshold, boolean markFinalStates)
          Build a custom binarizer for Trees.
 
Method Summary
protected static boolean isSynthetic(String label)
           
static void main(String[] args)
          Lets you test out the TreeBinarizer on the command line.
 void setDoSelectiveSplit(boolean doSelectiveSplit)
          If this is set to true, then the binarizer will choose selectively whether or not to split states based on how many counts the states had in a previous run.
 Tree transformTree(Tree t)
          Binarizes the tree according to options set up in the constructor.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TreeBinarizer

public TreeBinarizer(HeadFinder hf,
                     TreebankLanguagePack tlp,
                     boolean insideFactor,
                     boolean markovFactor,
                     int markovOrder,
                     boolean useWrappingLabels,
                     boolean unaryAtTop,
                     double selectiveSplitThreshold,
                     boolean markFinalStates)
Build a custom binarizer for Trees.

Parameters:
hf - the HeadFinder to use in binarization
tlp - the TreebankLanguagePack to use
insideFactor - whether to do inside markovization
markovFactor - whether to markovize the binary rules
markovOrder - the markov order to use; only relevant with markovFactor=true
useWrappingLabels - whether to use state names (labels) that allow wrapping from right to left
unaryAtTop - whether to actually materialize the unary that rewrites a passive state to the active rule at the top of an original local tree; used only when compaction is happening
selectiveSplitThreshold - if selective split is used, this will be the threshold used to decide which state splits to keep
markFinalStates - whether or not to make the state names (labels) of the final active states distinctive
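
As a rough illustration of what markovFactor and markovOrder control, the following standalone sketch builds the name of a synthetic state from the parent category and a limited window of already generated sibling categories. The label format ("@", "|", "_") and the method name syntheticLabel are hypothetical and are not taken from the TreeBinarizer source; only the idea that the order bounds how much sibling history a state name remembers is being shown.

```java
import java.util.Arrays;
import java.util.List;

public class MarkovLabelSketch {

    /**
     * Builds a synthetic state name from the parent category and sibling history.
     * With markovFactor=false the full history is kept; with markovFactor=true
     * only the most recent markovOrder siblings are kept.
     * The "@parent|a_b" format is an assumption made for this sketch.
     */
    public static String syntheticLabel(String parent, List<String> siblings,
                                        boolean markovFactor, int markovOrder) {
        int size = siblings.size();
        int from = 0;
        if (markovFactor) {
            // Clamp so that negative or oversized orders cannot leave the list bounds.
            from = Math.min(size, Math.max(0, size - markovOrder));
        }
        return "@" + parent + "|" + String.join("_", siblings.subList(from, size));
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList("DT", "JJ", "JJ");
        System.out.println(syntheticLabel("NP", history, false, 0)); // full history
        System.out.println(syntheticLabel("NP", history, true, 1));  // last sibling only
        System.out.println(syntheticLabel("NP", history, true, 2));  // last two siblings
    }
}
```

A lower order merges more states that differ only in distant sibling history, which gives fewer distinct state names at the cost of less context per state.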
Method Detail

setDoSelectiveSplit

public void setDoSelectiveSplit(boolean doSelectiveSplit)
If this is set to true, then the binarizer will choose selectively whether or not to split states based on how many counts the states had in a previous run. The counts are kept in an internal counter, which accumulates while doSelectiveSplit is false. Passing false initializes (clears) the counts.

Parameters:
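doSelectiveSplit - true to split states selectively using the counts from a previous run; false to clear the counts and collect new ones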
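
The two-pass protocol described above can be sketched in isolation as follows. This is a toy model, not the TreeBinarizer implementation: the class SelectiveSplitSketch, the method label, the back-off argument, and the use of ">=" when comparing against the threshold are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SelectiveSplitSketch {
    private final Map<String, Double> stateCounts = new HashMap<>();
    private final double threshold;
    private boolean doSelectiveSplit = false;

    public SelectiveSplitSketch(double threshold) {
        this.threshold = threshold;
    }

    /** Passing false clears the counts and starts a counting pass; true starts a selective pass. */
    public void setDoSelectiveSplit(boolean value) {
        doSelectiveSplit = value;
        if (!value) stateCounts.clear();
    }

    /**
     * Chooses the label for a candidate split state.
     * Counting pass: records the state and keeps the split.
     * Selective pass: keeps the split only if it was seen often enough (assumed ">=").
     */
    public String label(String splitState, String backoffState) {
        if (!doSelectiveSplit) {
            stateCounts.merge(splitState, 1.0, Double::sum);
            return splitState;
        }
        double count = stateCounts.getOrDefault(splitState, 0.0);
        return count >= threshold ? splitState : backoffState;
    }

    public static void main(String[] args) {
        SelectiveSplitSketch s = new SelectiveSplitSketch(2.0);
        s.setDoSelectiveSplit(false);          // first pass: collect counts
        s.label("@NP|JJ", "@NP");
        s.label("@NP|JJ", "@NP");
        s.label("@VP|RB", "@VP");
        s.setDoSelectiveSplit(true);           // second pass: apply the threshold
        System.out.println(s.label("@NP|JJ", "@NP")); // seen twice, split kept
        System.out.println(s.label("@VP|RB", "@VP")); // seen once, backs off
    }
}
```

In this model a caller runs the transformer over the training trees once with the flag false, then sets it to true and runs again, so that rare split states collapse to their back-off label on the second pass.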

isSynthetic

protected static boolean isSynthetic(String label)

transformTree

public Tree transformTree(Tree t)
Binarizes the tree according to options set up in the constructor. Does the whole tree by calling itself recursively.

Specified by:
transformTree in interface TreeTransformer
Parameters:
t - A tree to be binarized. The non-leaf nodes must already have CategoryWordTag labels, with heads percolated.
Returns:
A binary tree.

main

public static void main(String[] args)
Lets you test out the TreeBinarizer on the command line. This main method doesn't yet handle as many flags as one would like.



Stanford NLP Group