edu.stanford.nlp.parser.lexparser
Interface TreebankLangParserParams

All Superinterfaces:
Serializable
All Known Implementing Classes:
AbstractTreebankParserParams, ChineseTreebankParserParams, EnglishTreebankParserParams, NegraPennTreebankParserParams, TueBaDZParserParams

public interface TreebankLangParserParams
extends Serializable

Contains language-specific methods necessary to get the parser to parse an arbitrary treebank.

Author:
Roger Levy

Method Summary
 TreeTransformer collinizer()
          The tree transformer used to produce trees for evaluation.
 TreeTransformer collinizerEvalb()
          The tree transformer used to produce trees for evaluation with evalb.
 List defaultTestSentence()
          Returns a default sentence for the language (for testing).
 edu.stanford.nlp.parser.lexparser.Extractor dependencyGrammarExtractor(Options op)
           
 DiskTreebank diskTreebank()
          Returns a DiskTreebank appropriate to the treebank source.
 void display()
          Displays language-specific settings.
 String getInputEncoding()
          Returns the input encoding being used.
 String getOutputEncoding()
          Returns the output encoding being used.
 HeadFinder headFinder()
           
 Lexicon lex(Options.LexOptions op)
           
 MemoryTreebank memoryTreebank()
          Returns a MemoryTreebank appropriate to the treebank source.
 double[] MLEDependencyGrammarSmoothingParams()
          Give the parameters for smoothing in the MLEDependencyGrammar.
 PrintWriter pw()
          Returns a PrintWriter used to print output.
 PrintWriter pw(OutputStream o)
          Returns a PrintWriter used to print output to the OutputStream o.
 void setInputEncoding(String encoding)
          Sets the input encoding to use.
 int setOptionFlag(String[] args, int i)
          Set a language-specific option according to command-line flags.
 void setOutputEncoding(String encoding)
          Sets the output encoding to use.
 String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 TreeTransformer subcategoryStripper()
          Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
 MemoryTreebank testMemoryTreebank()
          Returns a MemoryTreebank appropriate to the testing treebank source.
 Tree transformTree(Tree t, Tree root)
          This method does language-specific tree transformations such as annotating particular nodes with language-relevant features.
 TreebankLanguagePack treebankLanguagePack()
          Returns a TreebankLanguagePack containing treebank-specific (but not parser-specific) information, such as what counts as punctuation and how labels are structured.
 TreeReaderFactory treeReaderFactory()
          Returns a factory for reading in trees from the source you want.
 TokenizerFactory<Tree> treeTokenizerFactory()
           
 

Method Detail

headFinder

HeadFinder headFinder()

setInputEncoding

void setInputEncoding(String encoding)
Sets the input encoding to use.

setOutputEncoding

void setOutputEncoding(String encoding)
Sets the output encoding to use.

getOutputEncoding

String getOutputEncoding()
Returns the output encoding being used.


getInputEncoding

String getInputEncoding()
Returns the input encoding being used.


treeReaderFactory

TreeReaderFactory treeReaderFactory()
Returns a factory for reading in trees from the treebank source. The returned TreeReaderFactory is responsible for handling the character-set encoding of the input correctly and for normalizing the trees it reads.

Returns:
A factory that vends an appropriate TreeReader

lex

Lexicon lex(Options.LexOptions op)

collinizer

TreeTransformer collinizer()
The tree transformer used to produce trees for evaluation. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may perform other normalizations.


collinizerEvalb

TreeTransformer collinizerEvalb()
The tree transformer used to produce trees for evaluation with evalb. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may perform other normalizations; the evalb version should strip more material than collinizer() does.


memoryTreebank

MemoryTreebank memoryTreebank()
Returns a MemoryTreebank appropriate to the treebank source.


diskTreebank

DiskTreebank diskTreebank()
Returns a DiskTreebank appropriate to the treebank source.


testMemoryTreebank

MemoryTreebank testMemoryTreebank()
Returns a MemoryTreebank appropriate to the testing treebank source.


treebankLanguagePack

TreebankLanguagePack treebankLanguagePack()
Returns a TreebankLanguagePack containing treebank-specific (but not parser-specific) information, such as what counts as punctuation and how labels are structured.


pw

PrintWriter pw()
Returns a PrintWriter used to print output. The returned PrintWriter is responsible for handling character encodings correctly for the relevant treebank.


pw

PrintWriter pw(OutputStream o)
Returns a PrintWriter used to print output to the OutputStream o. The returned PrintWriter is responsible for handling character encodings correctly for the relevant treebank.
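
The encoding responsibility described above can be illustrated with a minimal sketch. It assumes that an implementation wraps the stream in an OutputStreamWriter configured with the value that getOutputEncoding() would return; the class EncodingPrintWriterSketch and its encode helper are hypothetical and are not part of the parser API.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical class for illustration; not part of the parser API.
final class EncodingPrintWriterSketch {

  // Assumed stand-in for the value returned by getOutputEncoding().
  private final String outputEncoding;

  EncodingPrintWriterSketch(String outputEncoding) {
    this.outputEncoding = outputEncoding;
  }

  // Wraps the stream so characters are encoded with the configured charset.
  PrintWriter pw(OutputStream o) {
    Charset charset = Charset.forName(outputEncoding);
    return new PrintWriter(new OutputStreamWriter(o, charset), true);
  }

  // Writes text through pw(OutputStream) and returns the bytes produced.
  static byte[] encode(String text, String encoding) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    PrintWriter writer = new EncodingPrintWriterSketch(encoding).pw(bytes);
    writer.print(text);
    writer.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // U+00E9 occupies two bytes in UTF-8 and one byte in ISO-8859-1.
    System.out.println(encode("\u00e9", "UTF-8").length);
    System.out.println(encode("\u00e9", "ISO-8859-1").length);
  }
}
```

The same text yields different byte sequences depending on the configured encoding, which is why the writer, rather than the caller, is expected to own that choice.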


sisterSplitters

String[] sisterSplitters()
Returns the splitting strings used for selective splits.

Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

TreeTransformer subcategoryStripper()
Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
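
As a rough illustration of what removing a functional tag means at the level of a single category label, the sketch below works on plain strings rather than on Tree objects. The helper stripFunctionalTags is hypothetical, and the rule it uses (cut at the first hyphen, but leave labels that begin with a hyphen alone) is a simplifying assumption for Penn-Treebank-style labels, not the behaviour of any particular implementation.

```java
// Hypothetical helper for illustration; real strippers operate on whole trees.
final class FunctionalTagStripperSketch {

  // Returns the basic category, e.g. "NP" for "NP-TMP".
  static String stripFunctionalTags(String category) {
    // Labels such as "-NONE-" or "-LRB-" start with a hyphen and are kept whole.
    if (category.startsWith("-")) {
      return category;
    }
    int dash = category.indexOf('-', 1);
    return dash < 0 ? category : category.substring(0, dash);
  }

  public static void main(String[] args) {
    System.out.println(stripFunctionalTags("NP-TMP"));  // prints NP
    System.out.println(stripFunctionalTags("-NONE-"));  // prints -NONE-
  }
}
```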


transformTree

Tree transformTree(Tree t,
                   Tree root)
This method does language-specific tree transformations, such as annotating particular nodes with language-relevant features. Such parameterizations should live inside the specific TreebankLangParserParams class. The method is applied recursively to each node in the tree (depth first, left-to-right), so an implementation should not itself recurse into the members of the tree. The method is allowed to (and in some cases does) destructively change the input tree t. Changes can affect both labels and the tree shape.

Parameters:
t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories)
root - The root of the current tree (can be null for words)
Returns:
The fully annotated tree node (with daughters still as you want them in the final result)
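
The division of labour described for transformTree (a caller walks the tree, the method handles one node at a time) can be sketched as follows. Everything here is an assumption for illustration: Node is a minimal stand-in for Tree, the "-U" annotation on unary phrases is invented, and the driver visits daughters before their parent, which is only one possible depth-first, left-to-right order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Node is a minimal stand-in for the parser's Tree class.
final class TransformTreeSketch {

  static final class Node {
    String label;
    final List<Node> children;

    Node(String label, Node... children) {
      this.label = label;
      this.children = new ArrayList<>(Arrays.asList(children));
    }

    boolean isLeaf() {
      return children.isEmpty();
    }

    @Override
    public String toString() {
      if (isLeaf()) {
        return label;
      }
      StringBuilder sb = new StringBuilder("(").append(label);
      for (Node child : children) {
        sb.append(' ').append(child);
      }
      return sb.append(')').toString();
    }
  }

  // Per-node hook: inspects only t and its immediate daughters, never recurses.
  // The "-U" mark on unary phrasal nodes is an invented example annotation.
  static Node transformTree(Node t, Node root) {
    boolean unaryPhrase = t.children.size() == 1 && !t.children.get(0).isLeaf();
    if (unaryPhrase) {
      t.label = t.label + "-U";  // destructive change to the input node
    }
    return t;
  }

  // Driver owned by the caller: walks depth first, left to right,
  // and invokes the hook once per node (daughters first in this sketch).
  static Node annotate(Node t, Node root) {
    for (int i = 0; i < t.children.size(); i++) {
      t.children.set(i, annotate(t.children.get(i), root));
    }
    return transformTree(t, root);
  }

  public static void main(String[] args) {
    Node tree = new Node("S",
        new Node("NP", new Node("NN", new Node("dogs"))),
        new Node("VP", new Node("VBP", new Node("bark"))));
    // prints (S (NP-U (NN dogs)) (VP-U (VBP bark)))
    System.out.println(annotate(tree, tree));
  }
}
```

Because the driver already reaches every node, a hook that recursed on its own would annotate some nodes more than once.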

display

void display()
Displays language-specific settings.


setOptionFlag

int setOptionFlag(String[] args,
                  int i)
Sets a language-specific option according to command-line flags. This routine should try to process the option starting at args[i]; an option that takes arguments may span several elements of args. It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
Returns:
The index of the item after arguments processed as part of this command line option.
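
The return-value contract above can be sketched with a small example. The class OptionFlagSketch and the flag names -exampleVerbose and -exampleEncoding are hypothetical and chosen only for illustration; they are not options of any real parser-parameters class.

```java
// Hypothetical sketch of the setOptionFlag contract; flag names are invented.
final class OptionFlagSketch {

  boolean verbose = false;
  String encoding = "UTF-8";

  // Returns the index after the last argument consumed, or i if nothing matched.
  int setOptionFlag(String[] args, int i) {
    if (args[i].equalsIgnoreCase("-exampleVerbose")) {
      verbose = true;
      return i + 1;            // consumed the flag only
    }
    if (args[i].equalsIgnoreCase("-exampleEncoding") && i + 1 < args.length) {
      encoding = args[i + 1];
      return i + 2;            // consumed the flag and its argument
    }
    return i;                  // not recognised: consume nothing
  }

  public static void main(String[] args) {
    String[] demo = {"-exampleEncoding", "ISO-8859-1", "-unknown", "-exampleVerbose"};
    OptionFlagSketch params = new OptionFlagSketch();
    int i = 0;
    while (i < demo.length) {
      int next = params.setOptionFlag(demo, i);
      if (next == i) {
        // The caller decides what to do with options nobody recognised.
        System.out.println("Unrecognised option: " + demo[i]);
        next = i + 1;
      }
      i = next;
    }
    System.out.println(params.encoding + " " + params.verbose);
  }
}
```

Returning i unchanged lets the caller tell "not my option" apart from "consumed", so it can hand the argument to another handler or report it.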

defaultTestSentence

List defaultTestSentence()
Returns a default sentence for the language (for testing).


treeTokenizerFactory

TokenizerFactory<Tree> treeTokenizerFactory()

dependencyGrammarExtractor

edu.stanford.nlp.parser.lexparser.Extractor dependencyGrammarExtractor(Options op)

MLEDependencyGrammarSmoothingParams

double[] MLEDependencyGrammarSmoothingParams()
Give the parameters for smoothing in the MLEDependencyGrammar.

Returns:
An array of doubles with smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp.
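
A caller might unpack the returned array as in the sketch below. It assumes the four values appear in the order in which they are listed above; the class SmoothingParamsSketch is hypothetical, and the numbers are placeholders, not values from any real implementation.

```java
// Hypothetical sketch; the numeric values are placeholders, not real settings.
final class SmoothingParamsSketch {

  // Stand-in for an implementation of MLEDependencyGrammarSmoothingParams().
  static double[] mleDependencyGrammarSmoothingParams() {
    return new double[] {1.0, 2.0, 3.0, 0.5};
  }

  public static void main(String[] args) {
    double[] params = mleDependencyGrammarSmoothingParams();
    // Assumed index order, following the order of the names in the description.
    double smooth_aT_hTWd = params[0];
    double smooth_aTW_hTWd = params[1];
    double smooth_stop = params[2];
    double interp = params[3];
    System.out.println(smooth_aT_hTWd + " " + smooth_aTW_hTWd + " "
        + smooth_stop + " " + interp);
  }
}
```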


Stanford NLP Group