edu.stanford.nlp.parser.lexparser
Class ParentAnnotationStats

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.ParentAnnotationStats
All Implemented Interfaces:
TreeVisitor

public class ParentAnnotationStats
extends Object
implements TreeVisitor

Determines which parent annotations help in a treebank, based on support and KL divergence.
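The sketch below is self-contained and shows what parent annotation does to a tree. Each phrasal label is rewritten as label^parent, so an NP under S becomes NP^S. It uses a hypothetical Node class instead of edu.stanford.nlp.trees.Tree, and its doTags flag is only a rough analogue of the -tags option. ParentAnnotationStats itself gathers statistics to decide which of these splits are worthwhile, so read this as an illustration of the transformation, not of the class's code.

```java
import java.util.List;

// Hypothetical minimal tree type; the real class works on edu.stanford.nlp.trees.Tree.
final class ParentAnnotationSketch {

  static final class Node {
    final String label;
    final List<Node> kids;

    Node(String label, Node... kids) {
      this.label = label;
      this.kids = List.of(kids);
    }

    boolean isLeaf() {
      return kids.isEmpty();
    }

    boolean isPreterminal() {
      return kids.size() == 1 && kids.get(0).isLeaf();
    }
  }

  /**
   * Returns a copy of t in which each phrasal label becomes label^parent.
   * Part-of-speech tags are annotated only when doTags is true; words and
   * the root (parent == null) keep their labels.
   */
  static Node annotate(Node t, String parent, boolean doTags) {
    if (t.isLeaf()) {
      return t;
    }
    Node[] newKids = new Node[t.kids.size()];
    for (int i = 0; i < newKids.length; i++) {
      // Children see this node's original, unannotated label as their parent.
      newKids[i] = annotate(t.kids.get(i), t.label, doTags);
    }
    boolean keepPlain = parent == null || (t.isPreterminal() && !doTags);
    return new Node(keepPlain ? t.label : t.label + "^" + parent, newKids);
  }

  /** Renders a tree in bracketed Penn Treebank style. */
  static String toBracketed(Node t) {
    if (t.isLeaf()) {
      return t.label;
    }
    StringBuilder sb = new StringBuilder("(").append(t.label);
    for (Node k : t.kids) {
      sb.append(' ').append(toBracketed(k));
    }
    return sb.append(')').toString();
  }

  static Node sampleTree() {
    return new Node("S",
        new Node("NP", new Node("PRP", new Node("She"))),
        new Node("VP", new Node("VBD", new Node("left"))));
  }

  public static void main(String[] args) {
    // prints (S (NP^S (PRP She)) (VP^S (VBD left)))
    System.out.println(toBracketed(annotate(sampleTree(), null, false)));
  }
}
```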

Author:
Christopher Manning

Field Summary
static double[] CUTOFFS
          Minimum support * KL for a category to be included in the output and used as a feature.
static double SUPPCUTOFF
          Minimum support of a parent-annotated node for its grandparent annotation to be studied.
 
Method Summary
static Set getEnglishSplitCategories(String treebankRoot)
          This is hardwired to calculate the split categories from English Penn Treebank sections 2-21 with a default cutoff of 300 (as used in ACL03PCFG).
static Set getSplitCategories(Treebank t, boolean doTags, int algorithm, double phrasalCutOff, double tagCutOff, TreebankLanguagePack tlp)
          Call this method to get a Set of the categories (as Strings) to split on.
static Set getSplitCategories(Treebank t, double cutOff, TreebankLanguagePack tlp)
          Call this method to get a Set of the categories (as Strings) to split on.
static List kidLabels(Tree t)
           
static void main(String[] args)
          Calculate parent annotation statistics suitable for doing selective parent splitting in the PCFGParser inside FactoredParser.
 void printStats()
           
 void processTreeHelper(String gP, String p, Tree t)
           
 void visitTree(Tree t)
          Does whatever one needs to do to a particular parse tree
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CUTOFFS

public static final double[] CUTOFFS
Minimum support * KL for a category to be included in the output and used as a feature.
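The sketch below shows one plausible reading of the support * KL criterion. It makes three assumptions. Support is the count of the parent-annotated category (e.g. NP^S). KL is measured in nats between that category's distribution over child label sequences and the distribution of the unannotated category (e.g. NP). The cutoff value is hypothetical, since the actual CUTOFFS values are not listed here. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a support-weighted KL score for a candidate split.
final class SupportKlSketch {

  /** KL(p || q) in nats, with p and q given as raw counts over child label sequences. */
  static double klDivergence(Map<String, Integer> pCounts, Map<String, Integer> qCounts) {
    double pTotal = total(pCounts);
    double qTotal = total(qCounts);
    double kl = 0.0;
    for (Map.Entry<String, Integer> e : pCounts.entrySet()) {
      if (e.getValue() == 0) {
        continue; // 0 * log(0 / q) contributes nothing
      }
      double p = e.getValue() / pTotal;
      // Assumes q > 0 wherever p > 0: the split's events are also counted in the base category.
      double q = qCounts.getOrDefault(e.getKey(), 0) / qTotal;
      kl += p * Math.log(p / q);
    }
    return kl;
  }

  static double total(Map<String, Integer> counts) {
    double sum = 0;
    for (int c : counts.values()) {
      sum += c;
    }
    return sum;
  }

  /** Support (number of observations of the split category) times its KL from the base category. */
  static double score(Map<String, Integer> splitCounts, Map<String, Integer> baseCounts) {
    return total(splitCounts) * klDivergence(splitCounts, baseCounts);
  }

  public static void main(String[] args) {
    // Made-up counts of child sequences under NP overall and under NP^S.
    Map<String, Integer> base = new HashMap<>();
    base.put("DT NN", 60);
    base.put("PRP", 40);
    Map<String, Integer> split = new HashMap<>();
    split.put("DT NN", 10);
    split.put("PRP", 30);
    double cutoff = 5.0; // hypothetical threshold
    double s = score(split, base);
    System.out.printf("score=%.3f keep=%b%n", s, s >= cutoff);
  }
}
```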


SUPPCUTOFF

public static final double SUPPCUTOFF
Minimum support of a parent-annotated node for its grandparent annotation to be studied. It exists only to reduce runtime and output size.

See Also:
Constant Field Values
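A minimum-support threshold such as SUPPCUTOFF can be sketched as a simple filter over parent-annotated counts: only categories seen often enough go on to grandparent analysis. The counts, the threshold of 100 and the method name are made up for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of applying a minimum-support cutoff before grandparent analysis.
final class SupportCutoffSketch {

  /** Returns, sorted, the parent-annotated categories observed at least minSupport times. */
  static Set<String> eligibleForGrandparentStudy(Map<String, Integer> parentAnnotatedCounts,
                                                 double minSupport) {
    Set<String> eligible = new TreeSet<>();
    for (Map.Entry<String, Integer> e : parentAnnotatedCounts.entrySet()) {
      if (e.getValue() >= minSupport) {
        eligible.add(e.getKey());
      }
    }
    return eligible;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = Map.of("NP^S", 5200, "NP^VP", 8100, "ADVP^NP", 12);
    System.out.println(eligibleForGrandparentStudy(counts, 100.0)); // prints [NP^S, NP^VP]
  }
}
```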
Method Detail

visitTree

public void visitTree(Tree t)
Does whatever one needs to do to a particular parse tree

Specified by:
visitTree in interface TreeVisitor
Parameters:
t - A tree. Classes implementing this interface can assume that the tree passed in is not null.

kidLabels

public static List kidLabels(Tree t)

processTreeHelper

public void processTreeHelper(String gP,
                              String p,
                              Tree t)
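processTreeHelper and kidLabels have no descriptions here. Their (gP, p, t) signature suggests one way a visitor could collect these statistics: walk the tree recursively, pass the parent and grandparent labels down, and count each node's child label sequence under its plain, parent-annotated and grandparent-annotated labels. The following is a hedged sketch of that idea, not the class's actual implementation. The Node class, the ^ and ~ separators, and skipping part-of-speech tags are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Node stands in for edu.stanford.nlp.trees.Tree, and the
// "^" (parent) and "~" (grandparent) separators are illustrative choices.
final class AnnotationCountsSketch {

  static final class Node {
    final String label;
    final List<Node> kids;

    Node(String label, Node... kids) {
      this.label = label;
      this.kids = List.of(kids);
    }

    boolean isLeaf() {
      return kids.isEmpty();
    }

    boolean isPreterminal() {
      return kids.size() == 1 && kids.get(0).isLeaf();
    }
  }

  // context label -> (child label sequence -> count), e.g. "NP^S" -> {"PRP" -> 1}
  final Map<String, Map<String, Integer>> counts = new HashMap<>();

  /** Analogue of kidLabels: the labels of t's children, left to right. */
  static List<String> kidLabels(Node t) {
    List<String> labels = new ArrayList<>();
    for (Node kid : t.kids) {
      labels.add(kid.label);
    }
    return labels;
  }

  /** Analogue of visitTree: start at the root, which has no parent or grandparent. */
  void visitTree(Node t) {
    processTreeHelper(null, null, t);
  }

  /** Counts t's expansion under its plain, parent-annotated and grandparent-annotated labels. */
  void processTreeHelper(String gP, String p, Node t) {
    if (t.isLeaf() || t.isPreterminal()) {
      return; // this sketch only studies phrasal categories
    }
    String expansion = String.join(" ", kidLabels(t));
    increment(t.label, expansion);
    if (p != null) {
      increment(t.label + "^" + p, expansion);
      if (gP != null) {
        increment(t.label + "^" + p + "~" + gP, expansion);
      }
    }
    for (Node kid : t.kids) {
      processTreeHelper(p, t.label, kid); // this node's parent becomes the kid's grandparent
    }
  }

  private void increment(String context, String expansion) {
    counts.computeIfAbsent(context, c -> new HashMap<>()).merge(expansion, 1, Integer::sum);
  }

  static Node sampleTree() {
    return new Node("ROOT",
        new Node("S",
            new Node("NP", new Node("PRP", new Node("She"))),
            new Node("VP", new Node("VBD", new Node("left")))));
  }

  public static void main(String[] args) {
    AnnotationCountsSketch stats = new AnnotationCountsSketch();
    stats.visitTree(sampleTree());
    System.out.println(stats.counts.get("NP^S~ROOT")); // prints {PRP=1}
  }
}
```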

printStats

public void printStats()

main

public static void main(String[] args)
Calculate parent annotation statistics suitable for doing selective parent splitting in the PCFGParser inside FactoredParser.

Usage: java edu.stanford.nlp.parser.lexparser.ParentAnnotationStats [-tags] treebankPath

Parameters:
args - The path to the Treebank, optionally preceded by the -tags flag, as in the usage line above.
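The sketch below handles the documented command line. It assumes that the optional -tags flag must come before the single treebank path, and that the flag presumably turns on the study of part-of-speech tags, as doTags does. The Options class and the parse method are hypothetical.

```java
// Hypothetical sketch of parsing the documented command line "[-tags] treebankPath".
final class UsageSketch {

  static final class Options {
    final boolean doTags;
    final String treebankPath;

    Options(boolean doTags, String treebankPath) {
      this.doTags = doTags;
      this.treebankPath = treebankPath;
    }
  }

  /** Accepts either {path} or {"-tags", path}; anything else is a usage error. */
  static Options parse(String[] args) {
    boolean doTags = args.length > 0 && args[0].equals("-tags");
    int pathIndex = doTags ? 1 : 0;
    if (args.length != pathIndex + 1) {
      throw new IllegalArgumentException(
          "Usage: java edu.stanford.nlp.parser.lexparser.ParentAnnotationStats [-tags] treebankPath");
    }
    return new Options(doTags, args[pathIndex]);
  }

  public static void main(String[] args) {
    Options opts = parse(new String[] {"-tags", "/path/to/treebank"});
    System.out.println(opts.doTags + " " + opts.treebankPath); // prints true /path/to/treebank
  }
}
```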

getSplitCategories

public static Set getSplitCategories(Treebank t,
                                     double cutOff,
                                     TreebankLanguagePack tlp)
Call this method to get a Set of the categories (as Strings) to split on. It calculates parent annotation statistics suitable for doing selective parent splitting in the PCFGParser inside FactoredParser.

If tlp is non-null, tlp.basicCategory() will be called on parent and grandparent nodes.

This version uses default values for the parameters that only the longer overload exposes (such as doTags and algorithm). Implementation note: this method is not designed for concurrent invocation, because it uses static state variables.
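The basicCategory() step can be illustrated with a simplified stand-in. The real TreebankLanguagePack.basicCategory() is language-specific. This sketch assumes that '-', '=', '^' and '~' introduce annotations, and that labels beginning with '-' (such as -NONE-) are left intact. Passing null for the normalizer mimics the case where tlp is null.

```java
import java.util.function.UnaryOperator;

// Simplified, hypothetical stand-in for TreebankLanguagePack.basicCategory().
final class BasicCategorySketch {

  // Characters assumed (for this sketch only) to introduce functional tags or annotations.
  private static final String ANNOTATION_CHARS = "-=^~";

  /** Cuts the label at the first annotation character, e.g. "NP-SBJ-1" becomes "NP". */
  static String basicCategory(String label) {
    if (label == null || label.startsWith("-")) {
      return label; // leave null and special labels such as "-NONE-" or "-LRB-" alone
    }
    for (int i = 1; i < label.length(); i++) {
      if (ANNOTATION_CHARS.indexOf(label.charAt(i)) >= 0) {
        return label.substring(0, i);
      }
    }
    return label;
  }

  /** Normalizes a parent or grandparent label only when a normalizer (the "tlp") is supplied. */
  static String contextLabel(String label, UnaryOperator<String> basicCategoryOrNull) {
    return basicCategoryOrNull == null ? label : basicCategoryOrNull.apply(label);
  }

  public static void main(String[] args) {
    System.out.println(contextLabel("S-TPC-1", BasicCategorySketch::basicCategory)); // prints S
    System.out.println(contextLabel("S-TPC-1", null)); // prints S-TPC-1
  }
}
```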


getSplitCategories

public static Set getSplitCategories(Treebank t,
                                     boolean doTags,
                                     int algorithm,
                                     double phrasalCutOff,
                                     double tagCutOff,
                                     TreebankLanguagePack tlp)
Call this method to get a Set of the categories (as Strings) to split on. It calculates parent annotation statistics suitable for doing selective parent splitting in the PCFGParser inside FactoredParser.

If tlp is non-null, tlp.basicCategory() will be called on parent and grandparent nodes.

Implementation note: this method is not designed for concurrent invocation, because it uses static state variables.
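If several threads need split categories, one option is to serialize every call through a single shared lock. The sketch below demonstrates the pattern on a hypothetical non-thread-safe static counter. The helper names are illustrative and not part of the Stanford API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: serialize calls to code that keeps its results in static fields.
final class SerializedCallSketch {

  private static final ReentrantLock LOCK = new ReentrantLock();

  /** Runs the computation while holding one process-wide lock. */
  static <T> T callExclusively(Supplier<T> computation) {
    LOCK.lock();
    try {
      return computation.get();
    } finally {
      LOCK.unlock();
    }
  }

  // Stand-in for a method that is unsafe to run concurrently because it uses static state.
  private static int sharedCounter = 0;

  private static int unsafeIncrement() {
    int before = sharedCounter;
    sharedCounter = before + 1; // without the lock, concurrent calls could lose updates
    return sharedCounter;
  }

  /** Submits `tasks` guarded increments to a thread pool and returns the final count. */
  static int runDemo(int tasks) {
    sharedCounter = 0; // written before any task is submitted
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        futures.add(pool.submit(() -> callExclusively(SerializedCallSketch::unsafeIncrement)));
      }
      for (Future<Integer> f : futures) {
        f.get(); // wait for every task to finish
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
    return callExclusively(() -> sharedCounter);
  }

  public static void main(String[] args) {
    System.out.println(runDemo(1000)); // prints 1000
  }
}
```

A real call would be wrapped the same way, for example callExclusively(() -> ParentAnnotationStats.getSplitCategories(treebank, cutOff, tlp)). The lock only helps if every caller goes through it. Running each computation in a separate process is another way to avoid the shared static state.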


getEnglishSplitCategories

public static Set getEnglishSplitCategories(String treebankRoot)
This is hardwired to calculate the split categories from English Penn Treebank sections 2-21 with a default cutoff of 300 (as used in ACL03PCFG). It was added to support upgrading code in cases where no Treebank was available and the pre-stored list was being used.



Stanford NLP Group