edu.stanford.nlp.parser.lexparser
Class AbstractTreebankParserParams

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
All Implemented Interfaces:
TreebankLangParserParams, Serializable
Direct Known Subclasses:
ChineseTreebankParserParams, EnglishTreebankParserParams, NegraPennTreebankParserParams, TueBaDZParserParams

public abstract class AbstractTreebankParserParams
extends Object
implements TreebankLangParserParams

An abstract base class providing common method implementations on which to build a complete TreebankLangParserParams implementation.

With some extending classes you'll want to have access to special attributes of the corresponding TreebankLanguagePack while taking advantage of this class's code for making the TreebankLanguagePack accessible. A good way to do this is to pass a new instance of the appropriate TreebankLanguagePack into this class's constructor, then get it back later on by casting a call to treebankLanguagePack(). See ChineseTreebankParserParams for an example.
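The constructor-and-cast pattern described above can be sketched roughly as follows. This is a minimal illustration using hypothetical stand-in types (LanguagePackSketch, ExampleLanguagePack, AbstractParamsSketch, ExampleParams), not the real Stanford classes; specialAttribute() is an invented example of a language-specific attribute.

```java
// Hypothetical stand-ins for TreebankLanguagePack and one language-specific pack.
interface LanguagePackSketch {
    String name();
}

final class ExampleLanguagePack implements LanguagePackSketch {
    @Override
    public String name() { return "example"; }

    // An attribute that exists only on this specific pack (hypothetical).
    String specialAttribute() { return "only-on-the-subclass"; }
}

// Stand-in for the abstract base: it stores the pack and hands it back by its general type.
abstract class AbstractParamsSketch {
    protected final LanguagePackSketch tlp;

    protected AbstractParamsSketch(LanguagePackSketch tlp) { this.tlp = tlp; }

    public LanguagePackSketch treebankLanguagePack() { return tlp; }
}

class ExampleParams extends AbstractParamsSketch {
    ExampleParams() {
        // Pass a new instance of the appropriate pack to the base constructor.
        super(new ExampleLanguagePack());
    }

    String describe() {
        // Cast the accessor's result to reach the language-specific attribute.
        ExampleLanguagePack pack = (ExampleLanguagePack) treebankLanguagePack();
        return pack.name() + ":" + pack.specialAttribute();
    }

    public static void main(String[] args) {
        System.out.println(new ExampleParams().describe());
    }
}
```

The cast is safe here only because the subclass itself chose which pack to pass to the constructor.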

Author:
Roger Levy
See Also:
Serialized Form

Nested Class Summary
protected  class AbstractTreebankParserParams.SubcategoryStripper
           
 
Field Summary
protected  String inputEncoding
           
protected  String outputEncoding
           
protected  TreebankLanguagePack tlp
           
 
Constructor Summary
protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
          Stores the passed-in TreebankLanguagePack.
 
Method Summary
abstract  TreeTransformer collinizer()
          The tree transformer used to produce trees for evaluation.
abstract  TreeTransformer collinizerEvalb()
          The tree transformer used to produce trees for evalb evaluation.
 edu.stanford.nlp.parser.lexparser.Extractor dependencyGrammarExtractor(Options op)
           
static
<E> Collection<E>
dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, DependencyTyper<E> typer)
          Returns the set of dependencies in a tree, according to some DependencyTyper.
abstract  void display()
          Displays language-specific settings.
 String getInputEncoding()
          Returns the input encoding being used.
 String getOutputEncoding()
          Returns the output encoding being used.
abstract  HeadFinder headFinder()
          Returns the HeadFinder to use for your treebank.
 Lexicon lex()
           
 Lexicon lex(Options.LexOptions op)
           
abstract  MemoryTreebank memoryTreebank()
          Returns a MemoryTreebank appropriate to the treebank source.
 double[] MLEDependencyGrammarSmoothingParams()
          Give the parameters for smoothing in the MLEDependencyGrammar.
static Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer)
          Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL.
static Collection<Constituent> parsevalObjectify(Tree t, TreeTransformer collinizer, boolean labelConstituents)
          Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation.
 PrintWriter pw()
          The PrintWriter used to print output.
 PrintWriter pw(OutputStream o)
          The PrintWriter used to print output.
 void setInputEncoding(String encoding)
          Sets the input encoding.
abstract  int setOptionFlag(String[] args, int i)
          Set language-specific options according to flags.
 void setOutputEncoding(String encoding)
          Sets the output encoding.
abstract  String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 TreeTransformer subcategoryStripper()
          Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
 MemoryTreebank testMemoryTreebank()
          You can often return the same thing for testMemoryTreebank as for memoryTreebank.
abstract  Tree transformTree(Tree t, Tree root)
          This method does language-specific tree transformations such as annotating particular nodes with language-relevant features.
 TreebankLanguagePack treebankLanguagePack()
          Returns an appropriate TreebankLanguagePack.
 TokenizerFactory<Tree> treeTokenizerFactory()
           
static EquivalenceClasser<List<String>> typedDependencyClasser()
          Returns an EquivalenceClasser that classes typed dependencies by the syntactic categories of mother, head and daughter, plus direction.
static Collection<List<String>> typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories.
static Collection<List<String>> untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of untyped word-word dependencies for the tree.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.stanford.nlp.parser.lexparser.TreebankLangParserParams
defaultTestSentence, diskTreebank, treeReaderFactory
 

Field Detail

inputEncoding

protected String inputEncoding

outputEncoding

protected String outputEncoding

tlp

protected TreebankLanguagePack tlp
Constructor Detail

AbstractTreebankParserParams

protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
Stores the passed-in TreebankLanguagePack.

Method Detail

setInputEncoding

public void setInputEncoding(String encoding)
Sets the input encoding.

Specified by:
setInputEncoding in interface TreebankLangParserParams

setOutputEncoding

public void setOutputEncoding(String encoding)
Sets the output encoding.

Specified by:
setOutputEncoding in interface TreebankLangParserParams

getOutputEncoding

public String getOutputEncoding()
Returns the output encoding being used.

Specified by:
getOutputEncoding in interface TreebankLangParserParams

getInputEncoding

public String getInputEncoding()
Returns the input encoding being used.

Specified by:
getInputEncoding in interface TreebankLangParserParams

memoryTreebank

public abstract MemoryTreebank memoryTreebank()
Returns a MemoryTreebank appropriate to the treebank source.

Specified by:
memoryTreebank in interface TreebankLangParserParams

testMemoryTreebank

public MemoryTreebank testMemoryTreebank()
You can often return the same thing for testMemoryTreebank as for memoryTreebank.

Specified by:
testMemoryTreebank in interface TreebankLangParserParams

pw

public PrintWriter pw()
The PrintWriter used to print output. It's the responsibility of pw to deal properly with character encodings for the relevant treebank.

Specified by:
pw in interface TreebankLangParserParams

pw

public PrintWriter pw(OutputStream o)
The PrintWriter used to print output. It's the responsibility of pw to deal properly with character encodings for the relevant treebank.

Specified by:
pw in interface TreebankLangParserParams
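As an illustration of what "dealing properly with character encodings" can mean, here is a minimal sketch that wraps an OutputStream in a PrintWriter using an explicit encoding. The class name EncodingAwareWriterSketch and the fallback behaviour are assumptions made for this example; the actual implementation in this class may differ.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

class EncodingAwareWriterSketch {
    // Stand-in for the outputEncoding field (hypothetical holder class).
    private final String outputEncoding;

    EncodingAwareWriterSketch(String outputEncoding) {
        this.outputEncoding = outputEncoding;
    }

    // Wraps the stream so characters are encoded with outputEncoding.
    PrintWriter pw(OutputStream o) {
        try {
            return new PrintWriter(new OutputStreamWriter(o, outputEncoding), true);
        } catch (UnsupportedEncodingException e) {
            // Assumed fallback for this sketch: use the platform default encoding.
            return new PrintWriter(o, true);
        }
    }

    // No-argument variant: write to standard output.
    PrintWriter pw() {
        return pw(System.out);
    }

    public static void main(String[] args) {
        PrintWriter out = new EncodingAwareWriterSketch("UTF-8").pw();
        out.println("caf\u00e9");
    }
}
```

Callers that print trees through such a writer do not need to know which encoding the treebank uses.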

treebankLanguagePack

public TreebankLanguagePack treebankLanguagePack()
Returns an appropriate TreebankLanguagePack.

Specified by:
treebankLanguagePack in interface TreebankLangParserParams

headFinder

public abstract HeadFinder headFinder()
Returns the HeadFinder to use for your treebank.

Specified by:
headFinder in interface TreebankLangParserParams

lex

public Lexicon lex()

lex

public Lexicon lex(Options.LexOptions op)
Specified by:
lex in interface TreebankLangParserParams

MLEDependencyGrammarSmoothingParams

public double[] MLEDependencyGrammarSmoothingParams()
Give the parameters for smoothing in the MLEDependencyGrammar. Defaults are the ones previously hard coded into MLEDependencyGrammar.

Specified by:
MLEDependencyGrammarSmoothingParams in interface TreebankLangParserParams
Returns:
an array of doubles with smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp

parsevalObjectify

public static Collection<Constituent> parsevalObjectify(Tree t,
                                                        TreeTransformer collinizer)
Takes a Tree and a collinizer and returns a Collection of labeled Constituents for PARSEVAL.

Parameters:
t - The tree to extract constituents from
collinizer - The TreeTransformer used to normalize the tree for evaluation
Returns:
The bag of Constituents for PARSEVAL.

parsevalObjectify

public static Collection<Constituent> parsevalObjectify(Tree t,
                                                        TreeTransformer collinizer,
                                                        boolean labelConstituents)
Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. A note on this particular PARSEVAL implementation: it has not yet been rigorously checked against the PARSEVAL definition (Roger).
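To make the idea of a bag of constituents concrete, here is a minimal sketch that collects bracket spans from a toy tree. SpanNode and ParsevalSketch are hypothetical stand-ins, the collinizer step is omitted, and skipping preterminal (part-of-speech) nodes is an assumption of this sketch rather than documented behaviour of parsevalObjectify. The labeled flag mimics the role the labelConstituents parameter appears to play.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tree node (hypothetical); a node with no children is a word.
class SpanNode {
    final String label;
    final SpanNode[] kids;

    SpanNode(String label, SpanNode... kids) {
        this.label = label;
        this.kids = kids;
    }

    boolean isLeaf() { return kids.length == 0; }
}

class ParsevalSketch {
    // Adds "LABEL[start,end)" (or "[start,end)" when unlabeled) for each phrasal node
    // and returns the word position just past the subtree.
    static int collect(SpanNode t, int start, boolean labeled, List<String> out) {
        if (t.isLeaf()) {
            return start + 1;
        }
        int end = start;
        for (SpanNode k : t.kids) {
            end = collect(k, end, labeled, out);
        }
        // Assumption: preterminals are not counted as constituents.
        boolean preterminal = t.kids.length == 1 && t.kids[0].isLeaf();
        if (!preterminal) {
            out.add((labeled ? t.label : "") + "[" + start + "," + end + ")");
        }
        return end;
    }

    static SpanNode exampleTree() {
        return new SpanNode("S",
            new SpanNode("NP",
                new SpanNode("DT", new SpanNode("the")),
                new SpanNode("NN", new SpanNode("dog"))),
            new SpanNode("VP",
                new SpanNode("VBZ", new SpanNode("barks"))));
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        collect(exampleTree(), 0, true, out);
        System.out.println(out);
    }
}
```

Precision and recall would then be computed by comparing the bag from the parser output against the bag from the gold tree.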


untypedDependencyObjectify

public static Collection<List<String>> untypedDependencyObjectify(Tree t,
                                                                  HeadFinder hf,
                                                                  TreeTransformer collinizer)
Returns a collection of untyped word-word dependencies for the tree.


typedDependencyObjectify

public static Collection<List<String>> typedDependencyObjectify(Tree t,
                                                                HeadFinder hf,
                                                                TreeTransformer collinizer)
Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories.


dependencyObjectify

public static <E> Collection<E> dependencyObjectify(Tree t,
                                                    HeadFinder hf,
                                                    TreeTransformer collinizer,
                                                    DependencyTyper<E> typer)
Returns the set of dependencies in a tree, according to some DependencyTyper.


typedDependencyClasser

public static EquivalenceClasser<List<String>> typedDependencyClasser()
Returns an EquivalenceClasser that classes typed dependencies by the syntactic categories of mother, head and daughter, plus direction.


collinizer

public abstract TreeTransformer collinizer()
The tree transformer used to produce trees for evaluation. It is applied both to the parse output tree and to the gold tree. It should strip punctuation and possibly perform other normalization.

Specified by:
collinizer in interface TreebankLangParserParams

collinizerEvalb

public abstract TreeTransformer collinizerEvalb()
The tree transformer used to produce trees for evaluation. It is applied both to the parse output tree and to the gold tree. It should strip punctuation and possibly perform other normalization. The evalb version should strip somewhat more. (This documentation is unfinished.)

Specified by:
collinizerEvalb in interface TreebankLangParserParams

sisterSplitters

public abstract String[] sisterSplitters()
Returns the splitting strings used for selective splits.

Specified by:
sisterSplitters in interface TreebankLangParserParams
Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

public TreeTransformer subcategoryStripper()
Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.

Specified by:
subcategoryStripper in interface TreebankLangParserParams
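The effect of removing functional tags can be illustrated on a single category label. The sketch below works on plain strings rather than trees, and its rule (cut at the first '-' or '=' unless the label itself starts with '-') is an assumption modelled on Penn-Treebank-style labels, not the actual logic of the returned TreeTransformer. FunctionalTagStripperSketch and basicCategory are hypothetical names.

```java
class FunctionalTagStripperSketch {
    // Returns the label with any functional-tag suffix removed, e.g. "NP-TMP" becomes "NP".
    static String basicCategory(String label) {
        // Labels such as "-NONE-" begin with '-' and are left untouched (assumption).
        if (label.startsWith("-")) {
            return label;
        }
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '-' || c == '=') {
                return label.substring(0, i);
            }
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(basicCategory("NP-TMP"));
        System.out.println(basicCategory("-NONE-"));
    }
}
```

A tree-level transformer would apply a rule like this to the label of every non-terminal node.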

transformTree

public abstract Tree transformTree(Tree t,
                                   Tree root)
This method does language-specific tree transformations such as annotating particular nodes with language-relevant features. Such parameterizations should be inside the specific TreebankLangParserParams class. This method is recursively applied to each node in the tree (depth first, left-to-right), so an implementation should not itself recurse into the node's descendants. This method is allowed to (and in some cases does) destructively change the input tree t. It changes both labels and the tree shape.

Specified by:
transformTree in interface TreebankLangParserParams
Parameters:
t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)
Returns:
The fully annotated tree node (with daughters still as you want them in the final result)
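The division of labour described here, where a driver visits every node and the per-node method does not recurse, can be sketched as follows. SketchNode, transformNode, and applyToAll are hypothetical names, the "-U" annotation is an invented example feature, and visiting children before their parent is an assumption: the text only says the traversal is depth first and left-to-right.

```java
import java.util.ArrayList;
import java.util.List;

// Toy mutable tree node (hypothetical); a node with no children is a word.
class SketchNode {
    String label;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label, SketchNode... kids) {
        this.label = label;
        for (SketchNode k : kids) {
            children.add(k);
        }
    }

    boolean isLeaf() { return children.isEmpty(); }

    @Override
    public String toString() {
        if (isLeaf()) {
            return label;
        }
        StringBuilder sb = new StringBuilder("(").append(label);
        for (SketchNode c : children) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }
}

class TransformTreeSketch {
    // Per-node step, analogous to transformTree(t, root): looks only at this node
    // and its immediate daughters, and never calls itself.
    static SketchNode transformNode(SketchNode t, SketchNode root) {
        boolean unaryPhrase = t != root && t.children.size() == 1 && !t.children.get(0).isLeaf();
        if (unaryPhrase) {
            t.label = t.label + "-U"; // destructive label change (invented feature)
        }
        return t;
    }

    // Driver that owns the recursion: depth first, left to right.
    static SketchNode applyToAll(SketchNode t, SketchNode root) {
        for (int i = 0; i < t.children.size(); i++) {
            t.children.set(i, applyToAll(t.children.get(i), root));
        }
        return transformNode(t, root);
    }

    public static void main(String[] args) {
        SketchNode tree = new SketchNode("S",
            new SketchNode("NP", new SketchNode("NN", new SketchNode("dog"))),
            new SketchNode("VP", new SketchNode("VBZ", new SketchNode("barks"))));
        System.out.println(applyToAll(tree, tree));
    }
}
```

If transformNode also recursed, each node would be annotated more than once, which is why the recursion lives only in the driver.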

display

public abstract void display()
Displays language-specific settings.

Specified by:
display in interface TreebankLangParserParams

setOptionFlag

public abstract int setOptionFlag(String[] args,
                                  int i)
Sets language-specific options according to flags. This routine should process the option starting at args[i], which may span several array elements if the option takes arguments. It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Specified by:
setOptionFlag in interface TreebankLangParserParams
Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
Returns:
The index of the item after arguments processed as part of this command line option.
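The return-value contract can be sketched as below. The flags -exampleFlag and -exampleLevel, the fields they set, and the class OptionFlagSketch are all hypothetical; real subclasses define their own language-specific options. The main method shows one way a caller might use the "returned i means unrecognized" convention.

```java
class OptionFlagSketch {
    boolean exampleFlag = false; // hypothetical setting
    int exampleLevel = 0;        // hypothetical setting

    // Processes the option starting at args[i]; returns the index after the last
    // element consumed, or i itself if the option is not recognized.
    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-exampleFlag")) {
            exampleFlag = true;
            return i + 1; // consumed the flag only
        }
        if (args[i].equalsIgnoreCase("-exampleLevel") && i + 1 < args.length) {
            exampleLevel = Integer.parseInt(args[i + 1]);
            return i + 2; // consumed the flag and its argument
        }
        return i; // could not process this option
    }

    public static void main(String[] argv) {
        String[] args = {"-exampleLevel", "3", "-exampleFlag", "-unknown"};
        OptionFlagSketch params = new OptionFlagSketch();
        int i = 0;
        while (i < args.length) {
            int next = params.setOptionFlag(args, i);
            if (next == i) {
                System.err.println("Unrecognized option: " + args[i]);
                next = i + 1; // skip it so the loop makes progress
            }
            i = next;
        }
        System.out.println(params.exampleFlag + " " + params.exampleLevel);
    }
}
```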

treeTokenizerFactory

public TokenizerFactory<Tree> treeTokenizerFactory()
Specified by:
treeTokenizerFactory in interface TreebankLangParserParams

dependencyGrammarExtractor

public edu.stanford.nlp.parser.lexparser.Extractor dependencyGrammarExtractor(Options op)
Specified by:
dependencyGrammarExtractor in interface TreebankLangParserParams


Stanford NLP Group