TreebankLangParserParams (Stanford JavaNLP API)

edu.stanford.nlp.parser.lexparser
Interface TreebankLangParserParams

All Superinterfaces: Serializable

All Known Implementing Classes: AbstractTreebankParserParams, ChineseTreebankParserParams, EnglishTreebankParserParams, NegraPennTreebankParserParams, TueBaDZParserParams

public interface TreebankLangParserParams
extends Serializable

Contains language-specific methods necessary to get the parser to parse an arbitrary treebank.

Author: Roger Levy

Method Summary
`TreeTransformer`	`collinizer()` Returns the tree transformer used to produce trees for evaluation.
`TreeTransformer`	`collinizerEvalb()` Returns the evalb version of the tree transformer used to produce trees for evaluation.
`List`	`defaultTestSentence()` Returns a default sentence for the language (for testing).
`edu.stanford.nlp.parser.lexparser.Extractor`	`dependencyGrammarExtractor(Options op)`
`DiskTreebank`	`diskTreebank()` Returns a DiskTreebank appropriate to the treebank source.
`void`	`display()` Displays language-specific settings.
`String`	`getInputEncoding()` Returns the input encoding being used.
`String`	`getOutputEncoding()` Returns the output encoding being used.
`HeadFinder`	`headFinder()`
`Lexicon`	`lex(Options.LexOptions op)`
`MemoryTreebank`	`memoryTreebank()` Returns a MemoryTreebank appropriate to the treebank source.
`double[]`	`MLEDependencyGrammarSmoothingParams()` Gives the parameters for smoothing in the MLEDependencyGrammar.
`PrintWriter`	`pw()` Returns a PrintWriter used to print output.
`PrintWriter`	`pw(OutputStream o)` Returns a PrintWriter used to print output to the OutputStream o.
`void`	`setInputEncoding(String encoding)`
`int`	`setOptionFlag(String[] args, int i)` Sets a language-specific option according to command-line flags.
`void`	`setOutputEncoding(String encoding)`
`String[]`	`sisterSplitters()` Returns the splitting strings used for selective splits.
`TreeTransformer`	`subcategoryStripper()` Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
`MemoryTreebank`	`testMemoryTreebank()` Returns a MemoryTreebank appropriate to the testing treebank source.
`Tree`	`transformTree(Tree t, Tree root)` Performs language-specific tree transformations, such as annotating particular nodes with language-relevant features.
`TreebankLanguagePack`	`treebankLanguagePack()` Returns a TreebankLanguagePack containing Treebank-specific (but not parser-specific) information, such as what counts as punctuation and how labels are structured.
`TreeReaderFactory`	`treeReaderFactory()` Returns a factory for reading in trees from the source you want.
`TokenizerFactory<Tree>`	`treeTokenizerFactory()`

Method Detail

headFinder

HeadFinder headFinder()

setInputEncoding

void setInputEncoding(String encoding)

setOutputEncoding

void setOutputEncoding(String encoding)

getOutputEncoding

String getOutputEncoding()

Returns the output encoding being used.

getInputEncoding

String getInputEncoding()

Returns the input encoding being used.

treeReaderFactory

TreeReaderFactory treeReaderFactory()

Returns a factory for reading in trees from the source you want. The returned TreeReaderFactory is responsible for handling the character-set encoding of the input correctly. It is also responsible for normalizing trees properly.

Returns: A factory that vends an appropriate TreeReader

lex

Lexicon lex(Options.LexOptions op)

collinizer

TreeTransformer collinizer()

Returns the tree transformer used to produce trees for evaluation. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may perform other normalization.
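To make the idea of stripping punctuation before evaluation concrete, here is a minimal sketch. It does not use the Stanford classes: `Node`, `PUNCTUATION_TAGS`, and `collinize` are hypothetical stand-ins, and the set of punctuation tags is an assumption based on Penn Treebank conventions (a real implementation would presumably consult the TreebankLanguagePack). The same function would be applied to both the parse output and the gold tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollinizerSketch {
    // Assumed punctuation tags (Penn Treebank style); not taken from the library.
    static final Set<String> PUNCTUATION_TAGS =
        new HashSet<>(Arrays.asList(",", ".", ":", "``", "''", "-LRB-", "-RRB-"));

    // Hypothetical minimal tree node; stands in for a real tree class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }

        boolean isLeaf() { return children.isEmpty(); }

        boolean isPreTerminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }

        @Override public String toString() {
            if (isLeaf()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    // Returns a pruned copy of t, or null if nothing remains under t.
    static Node collinize(Node t) {
        if (t.isLeaf()) return new Node(t.label);
        if (t.isPreTerminal() && PUNCTUATION_TAGS.contains(t.label)) return null;
        List<Node> kept = new ArrayList<>();
        for (Node child : t.children) {
            Node pruned = collinize(child);
            if (pruned != null) kept.add(pruned);
        }
        if (kept.isEmpty()) return null; // drop constituents left empty
        Node copy = new Node(t.label);
        copy.children.addAll(kept);
        return copy;
    }

    static String demo() {
        Node tree = new Node("S",
            new Node("NP", new Node("NN", new Node("dog"))),
            new Node("VP", new Node("VBZ", new Node("barks"))),
            new Node(".", new Node(".")));
        return collinize(tree).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch returns a copy rather than editing in place, so the gold tree stays usable after evaluation; whether the real transformers copy or mutate is not stated here.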

collinizerEvalb

TreeTransformer collinizerEvalb()

Returns the tree transformer used to produce trees for evaluation. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may perform other normalization. The evalb version should strip more material than collinizer(); the original documentation leaves the details unfinished.

memoryTreebank

MemoryTreebank memoryTreebank()

Returns a MemoryTreebank appropriate to the treebank source.

diskTreebank

DiskTreebank diskTreebank()

Returns a DiskTreebank appropriate to the treebank source.

testMemoryTreebank

MemoryTreebank testMemoryTreebank()

Returns a MemoryTreebank appropriate to the testing treebank source.

treebankLanguagePack

TreebankLanguagePack treebankLanguagePack()

Returns a TreebankLanguagePack containing Treebank-specific (but not parser-specific) information, such as what counts as punctuation and how labels are structured.

pw

PrintWriter pw()

Returns a PrintWriter used to print output. The returned PrintWriter is responsible for handling character encodings correctly for the relevant treebank.

pw

PrintWriter pw(OutputStream o)

Returns a PrintWriter used to print output to the OutputStream o. The returned PrintWriter is responsible for handling character encodings correctly for the relevant treebank.
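One plausible way for an implementation to meet this encoding responsibility is to wrap the stream in an OutputStreamWriter configured with the output encoding. The sketch below uses only the standard library; the class `EncodingAwareWriterSketch`, its `outputEncoding` field, the `encode` helper, and the fallback to the platform default charset are all assumptions for illustration, not the behavior of any actual implementing class.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

class EncodingAwareWriterSketch {
    // Hypothetical field standing in for the value managed by
    // setOutputEncoding / getOutputEncoding.
    private String outputEncoding = "UTF-8";

    void setOutputEncoding(String encoding) { this.outputEncoding = encoding; }

    String getOutputEncoding() { return outputEncoding; }

    PrintWriter pw(OutputStream o) {
        try {
            // Encode characters with the configured charset; autoflush on println.
            return new PrintWriter(new OutputStreamWriter(o, outputEncoding), true);
        } catch (UnsupportedEncodingException e) {
            // Assumed fallback: use the platform default charset.
            return new PrintWriter(o, true);
        }
    }

    PrintWriter pw() { return pw(System.out); }

    // Helper for the demo: the bytes produced when writing text in an encoding.
    static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        EncodingAwareWriterSketch params = new EncodingAwareWriterSketch();
        params.setOutputEncoding(encoding);
        PrintWriter writer = params.pw(buffer);
        writer.print(text);
        writer.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The same character occupies a different number of bytes per encoding.
        System.out.println(encode("\u00e9", "UTF-8").length);
        System.out.println(encode("\u00e9", "ISO-8859-1").length);
    }
}
```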

sisterSplitters

String[] sisterSplitters()

Returns the splitting strings used for selective splits.

Returns: An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

TreeTransformer subcategoryStripper()

Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
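As an illustration of what removing a functional tag means at the level of a single category label, consider the sketch below. It works on plain strings rather than trees, and `stripFunctionalTags` is a hypothetical helper; the rule it applies (cut at the first hyphen, but leave labels that begin with a hyphen untouched) is a simplifying assumption based on Penn Treebank conventions, not the library's actual logic. A real TreeTransformer would apply a rule of this kind to every node label in a tree.

```java
class FunctionalTagStripperSketch {
    // Hypothetical helper: "NP-TMP" becomes "NP"; "-NONE-" is left unchanged.
    static String stripFunctionalTags(String label) {
        // Labels such as "-NONE-" or "-LRB-" start with '-', so keep them whole.
        if (label.isEmpty() || label.charAt(0) == '-') {
            return label;
        }
        int cut = label.indexOf('-');
        return cut < 0 ? label : label.substring(0, cut);
    }

    public static void main(String[] args) {
        for (String label : new String[] {"NP-TMP", "NP-SBJ-1", "VP", "-NONE-"}) {
            System.out.println(label + " -> " + stripFunctionalTags(label));
        }
    }
}
```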

transformTree

Tree transformTree(Tree t,
                   Tree root)

Performs language-specific tree transformations, such as annotating particular nodes with language-relevant features. Such parameterizations belong inside the specific TreebankLangParserParams class. The method is applied recursively to each node in the tree (depth first, left to right), so an implementation should not itself recurse into tree members. The method is allowed to (and in some cases does) destructively change the input tree t. It changes both labels and the tree shape.

Parameters: t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories); root - The root of the current tree (can be null for words)
Returns: The fully annotated tree node (with daughters still as you want them in the final result)
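The division of labor described above (a caller walks the tree, and the transform handles one node at a time) can be sketched as follows. Everything here is hypothetical: `Node` is a minimal stand-in for a tree class, the "-U" marking of unary phrasal nodes is just an example annotation, and the `annotate` driver assumes children are processed before their parent, which the description above does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TransformTreeSketch {
    // Hypothetical minimal tree node with a mutable label.
    static final class Node {
        String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }

        boolean isLeaf() { return children.isEmpty(); }

        boolean isPreTerminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }

        @Override public String toString() {
            if (isLeaf()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    // Per-node transform: looks only at t itself and does not recurse.
    // Example annotation (an assumption): mark phrasal nodes with one child.
    static Node transformTree(Node t, Node root) {
        if (!t.isLeaf() && !t.isPreTerminal() && t.children.size() == 1) {
            t.label = t.label + "-U"; // destructive change to the input node
        }
        return t;
    }

    // Hypothetical driver: depth first, left to right, children before parent.
    static Node annotate(Node t, Node root) {
        for (int i = 0; i < t.children.size(); i++) {
            t.children.set(i, annotate(t.children.get(i), root));
        }
        return transformTree(t, root);
    }

    static String demo() {
        Node tree = new Node("S",
            new Node("NP", new Node("NN", new Node("dog"))),
            new Node("VP", new Node("VBZ", new Node("barks"))));
        return annotate(tree, tree).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping the recursion in the driver means each node is visited exactly once; a transform that also recursed would annotate descendants repeatedly.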

display

void display()

Displays language-specific settings.

setOptionFlag

int setOptionFlag(String[] args,
                  int i)

Sets a language-specific option according to command-line flags. This routine should try to process the option starting at args[i], which may span several arguments if the option takes arguments. It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Parameters: args - Array of command-line arguments; i - Index in the command-line arguments to try to process as an option
Returns: The index of the item after the arguments processed as part of this command-line option.
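The index-returning contract is easiest to see in a small example. The sketch below is not taken from any implementing class: the flags `-exampleFlag` and `-exampleLevel`, the fields they set, and the `parseAll` driver are hypothetical and exist only to show how the return value signals how many arguments were consumed.

```java
class OptionFlagSketch {
    // Hypothetical language-specific settings.
    boolean exampleFlag = false;
    int exampleLevel = 0;

    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-exampleFlag")) {
            exampleFlag = true;
            return i + 1; // consumed the flag only
        } else if (args[i].equalsIgnoreCase("-exampleLevel") && i + 1 < args.length) {
            exampleLevel = Integer.parseInt(args[i + 1]);
            return i + 2; // consumed the flag and its argument
        }
        return i; // option not recognized: nothing consumed
    }

    // Hypothetical driver: counts the arguments that were not recognized.
    static int parseAll(OptionFlagSketch params, String[] args) {
        int unrecognized = 0;
        int i = 0;
        while (i < args.length) {
            int next = params.setOptionFlag(args, i);
            if (next == i) {
                unrecognized++;
                i++; // skip the unknown argument to avoid looping forever
            } else {
                i = next;
            }
        }
        return unrecognized;
    }

    public static void main(String[] args) {
        OptionFlagSketch params = new OptionFlagSketch();
        String[] demoArgs = {"-exampleLevel", "3", "-unknown", "-exampleFlag"};
        System.out.println("unrecognized: " + parseAll(params, demoArgs));
        System.out.println("exampleLevel: " + params.exampleLevel);
        System.out.println("exampleFlag: " + params.exampleFlag);
    }
}
```

Returning i unchanged lets a caller detect an unhandled option and pass it to another handler or report it, without needing a separate error channel.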

defaultTestSentence

List defaultTestSentence()

Returns a default sentence for the language (for testing).

treeTokenizerFactory

TokenizerFactory<Tree> treeTokenizerFactory()

dependencyGrammarExtractor

edu.stanford.nlp.parser.lexparser.Extractor dependencyGrammarExtractor(Options op)

MLEDependencyGrammarSmoothingParams

double[] MLEDependencyGrammarSmoothingParams()

Gives the parameters for smoothing in the MLEDependencyGrammar.

Returns: An array of doubles with smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp