ChineseTreebankParserParams (Stanford JavaNLP API)

edu.stanford.nlp.parser.lexparser
Class ChineseTreebankParserParams

java.lang.Object
  edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
      edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams

All Implemented Interfaces: TreebankLangParserParams, Serializable

public class ChineseTreebankParserParams
extends AbstractTreebankParserParams

Parameter file for parsing the Penn Chinese Treebank. Includes category enrichments specific to the Penn Chinese Treebank.

Author: Roger Levy, Christopher Manning, Galen Andrew
See Also: Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from class edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
`AbstractTreebankParserParams.SubcategoryStripper`

Field Summary
`boolean`	`bikelHeadFinder`
`boolean`	`charTags`
`static boolean`	`chineseSelectiveTagPA`
`static boolean`	`chineseSplitDouHao` Chinese: Split the dou hao (a punctuation mark separating members of a list) from other punctuation.
`static boolean`	`chineseSplitPunct` Chinese: split Chinese punctuation several ways, along the lines of English punctuation plus another category for the dou hao.
`static boolean`	`chineseSplitPunctLR` Chinese: split left/right parentheses and quotes (if chineseSplitPunct is also true).
`static boolean`	`chineseSplitVP3` Chinese: split VPs into VP-COMP, VP-CRD, VP-ADJ.
`static boolean`	`chineseVerySelectiveTagPA`
`boolean`	`discardFrags`
`static boolean`	`gpaAD` Grandparent annotate all AD.
`static boolean`	`markADgrandchildOfIP` Chinese: mark ADs that are grandchild of IP.
`static boolean`	`markCC` Mark phrases which are conjunctions.
`static boolean`	`markIPadjsubj`
`static boolean`	`markIPconj` Chinese: mark IPs that are conjuncts.
`static boolean`	`markIPsisDEC` Chinese: mark IPs that are part of prenominal modifiers.
`static boolean`	`markIPsisterBA` Chinese: mark IPs that are sister of BA.
`static boolean`	`markIPsisterVVorP` Chinese: mark IPs that are sister of VV or P.
`static boolean`	`markModifiedNP` Chinese: mark left-modified NPs (rightmost NPs with a left-side mod).
`static boolean`	`markMultiNtag` Chinese: mark nominal tags that are part of multi-nominal rewrites.
`static boolean`	`markNPconj` Chinese: mark NPs that are conjuncts.
`static boolean`	`markNPmodNP` Chinese: mark NP modifiers of NPs.
`static boolean`	`markPostverbalP` Chinese: mark P with a left aunt VV, and PP with a left sister VV.
`static boolean`	`markPostverbalPP`
`static boolean`	`markPsisterIP` Chinese: mark Ps that are sister of IP.
`static boolean`	`markVPadjunct` Chinese: mark phrases that are adjuncts of VP (these tend to be locatives/temporals, and have a specific distribution).
`static boolean`	`markVVsisterIP` Chinese: mark VVs that are sister of IP (communication & small-clause-taking verbs).
`static boolean`	`mergeNNVV` Chinese: merge NN and VV.
`static boolean`	`paRootDtr` Chinese: parent annotate daughter of root.
`boolean`	`segmentMarkov`
`boolean`	`segmentMaxMatch`
`static boolean`	`splitBaseNP` Mark base NPs.
`static boolean`	`splitNPTMP` Whether to retain the -TMP functional tag on various phrasal categories.
`static boolean`	`splitPPTMP`
`static boolean`	`splitXPTMP`
`boolean`	`sunJurafskyHeadFinder`
`static boolean`	`tagWordSize` Annotate tags for number of characters contained.
`static boolean`	`unaryCP`
`static boolean`	`unaryIP` Chinese: unary category marking
`boolean`	`useCharacterBasedLexicon`
`boolean`	`useMaxentDepGrammar`
`boolean`	`useMaxentLexicon`
`boolean`	`useSimilarWordMap`

Fields inherited from class edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
`inputEncoding, outputEncoding, tlp`

Constructor Summary
`ChineseTreebankParserParams()`

Method Summary
`TreeTransformer`	`collinizer()` Returns a ChineseCollinizer
`TreeTransformer`	`collinizerEvalb()` Returns a ChineseCollinizer that doesn't delete punctuation
`List`	`defaultTestSentence()` Return a default sentence for the language (for testing)
`edu.stanford.nlp.parser.lexparser.Extractor`	`dependencyGrammarExtractor(Options op)`
`DiskTreebank`	`diskTreebank()` Uses a DiskTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer.
`void`	`display()` display language-specific settings
`HeadFinder`	`headFinder()` Returns a ChineseHeadFinder
`Lexicon`	`lex(Options.LexOptions op)` Returns a ChineseLexicon
`static void`	`main(String[] args)` For testing: loads a treebank and prints the trees.
`MemoryTreebank`	`memoryTreebank()` Uses a MemoryTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer
`double[]`	`MLEDependencyGrammarSmoothingParams()` Give the parameters for smoothing in the MLEDependencyGrammar.
`int`	`setOptionFlag(String[] args, int i)` Set language-specific options according to flags.
`String[]`	`sisterSplitters()` Returns the splitting strings used for selective splits.
`Tree`	`transformTree(Tree t, Tree root)` transformTree does all language-specific tree transformations.
`TreeReaderFactory`	`treeReaderFactory()` Returns a factory for reading in trees from the source you want.

Methods inherited from class edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
`dependencyObjectify, getInputEncoding, getOutputEncoding, lex, parsevalObjectify, parsevalObjectify, pw, pw, setInputEncoding, setOutputEncoding, subcategoryStripper, testMemoryTreebank, treebankLanguagePack, treeTokenizerFactory, typedDependencyClasser, typedDependencyObjectify, untypedDependencyObjectify`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

charTags

public boolean charTags

useCharacterBasedLexicon

public boolean useCharacterBasedLexicon

useMaxentLexicon

public boolean useMaxentLexicon

useMaxentDepGrammar

public boolean useMaxentDepGrammar

segmentMarkov

public boolean segmentMarkov

segmentMaxMatch

public boolean segmentMaxMatch

sunJurafskyHeadFinder

public boolean sunJurafskyHeadFinder

bikelHeadFinder

public boolean bikelHeadFinder

discardFrags

public boolean discardFrags

useSimilarWordMap

public boolean useSimilarWordMap

chineseSplitDouHao

public static boolean chineseSplitDouHao

Chinese: Split the dou hao (a punctuation mark separating members of a list) from other punctuation. Good but included below.

chineseSplitPunct

public static boolean chineseSplitPunct

Chinese: split Chinese punctuation several ways, along the lines of English punctuation plus another category for the dou hao. Good.

chineseSplitPunctLR

public static boolean chineseSplitPunctLR

Chinese: split left/right parentheses and quotes (if chineseSplitPunct is also true). Only very marginal gains, but seems positive.

markVVsisterIP

public static boolean markVVsisterIP

Chinese: mark VVs that are sister of IP (communication & small-clause-taking verbs). Good: gives 0.5%.

markPsisterIP

public static boolean markPsisterIP

Chinese: mark Ps that are sister of IP. Negative effect.

markIPsisterVVorP

public static boolean markIPsisterVVorP

Chinese: mark IPs that are sister of VV or P. These rarely have punctuation. Small positive effect.

markADgrandchildOfIP

public static boolean markADgrandchildOfIP

Chinese: mark ADs that are grandchild of IP.

gpaAD

public static boolean gpaAD

Grandparent annotate all AD. Seems slightly negative.

chineseVerySelectiveTagPA

public static boolean chineseVerySelectiveTagPA

chineseSelectiveTagPA

public static boolean chineseSelectiveTagPA

markIPsisterBA

public static boolean markIPsisterBA

Chinese: mark IPs that are sister of BA. These always have overt NP. Very slightly positive.

markVPadjunct

public static boolean markVPadjunct

Chinese: mark phrases that are adjuncts of VP (these tend to be locatives/temporals, and have a specific distribution). Necessary even with chineseSplitVP3 and parent annotation because parent annotation happens with unsplit parent categories. Slightly positive.

markNPmodNP

public static boolean markNPmodNP

Chinese: mark NP modifiers of NPs. Quite positive (0.5%)

markModifiedNP

public static boolean markModifiedNP

Chinese: mark left-modified NPs (rightmost NPs with a left-side mod). Slightly positive.

markNPconj

public static boolean markNPconj

Chinese: mark NPs that are conjuncts. Negative on small set.

markMultiNtag

public static boolean markMultiNtag

Chinese: mark nominal tags that are part of multi-nominal rewrites. Doesn't seem any good.

markIPsisDEC

public static boolean markIPsisDEC

Chinese: mark IPs that are part of prenominal modifiers. Negative.

markIPconj

public static boolean markIPconj

Chinese: mark IPs that are conjuncts, or those that have adjuncts or subjects.

markIPadjsubj

public static boolean markIPadjsubj

chineseSplitVP3

public static boolean chineseSplitVP3

Chinese: split VPs into VP-COMP, VP-CRD, VP-ADJ. Negative value.

mergeNNVV

public static boolean mergeNNVV

Chinese: merge NN and VV. A lark.

unaryIP

public static boolean unaryIP

Chinese: unary category marking

unaryCP

public static boolean unaryCP

paRootDtr

public static boolean paRootDtr

Chinese: parent annotate daughter of root. Meant only for selectivesplit=false.

markPostverbalP

public static boolean markPostverbalP

Chinese: mark P with a left aunt VV, and PP with a left sister VV. Note that it's necessary to mark both to thread the context-marking. Used to identify post-verbal Ps, which are rare.

markPostverbalPP

public static boolean markPostverbalPP

splitBaseNP

public static boolean splitBaseNP

Mark base NPs. Good.

tagWordSize

public static boolean tagWordSize

Annotate tags for number of characters contained.
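
As a rough illustration of what annotating a tag with its word's character count could look like, here is a minimal sketch. The class name, method name, and the "-<n>C" suffix format are all assumptions made for this example; the suffix the parser actually produces is not stated in this documentation.

```java
// Hypothetical sketch: not the parser's actual implementation.
class TagWordSizeSketch {

    /**
     * Returns the tag with a suffix recording how many characters the word has.
     * The "-<n>C" suffix format is an assumption chosen for illustration.
     */
    static String annotate(String tag, String word) {
        // Count code points rather than UTF-16 units, so that characters
        // outside the Basic Multilingual Plane count as one character each.
        int chars = word.codePointCount(0, word.length());
        return tag + "-" + chars + "C";
    }

    public static void main(String[] unused) {
        // "\u4e2d\u56fd" is a two-character word.
        System.out.println(annotate("NN", "\u4e2d\u56fd"));
    }
}
```

Counting code points instead of calling `length()` matters only for rare characters encoded as surrogate pairs, but it keeps the count aligned with what a reader would call a character.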

markCC

public static boolean markCC

Mark phrases which are conjunctions. Appears negative, even with 200K words training data.

splitNPTMP

public static boolean splitNPTMP

Whether to retain the -TMP functional tag on various phrasal categories. On 80K words training, minutely helpful; on 200K words, best option gives 0.6%. Doing splitNPTMP and splitPPTMP (but not splitXPTMP) is best.

splitPPTMP

public static boolean splitPPTMP

splitXPTMP

public static boolean splitXPTMP
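
The interaction of the three -TMP flags described under splitNPTMP can be sketched as a label-normalization step. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the flags are passed as parameters rather than read from the static fields, and splitXPTMP is assumed to mean "retain -TMP on any phrasal category". The real logic lives in the class's tree transformation and may differ.

```java
// Hypothetical sketch: not the parser's actual implementation.
class TmpTagSketch {

    /**
     * Strips functional tags from a treebank label, retaining -TMP only
     * where the corresponding flag allows it.
     */
    static String normalizeLabel(String label, boolean splitNPTMP,
                                 boolean splitPPTMP, boolean splitXPTMP) {
        int dash = label.indexOf('-');
        if (dash <= 0) {
            // No functional tag, or a label such as "-NONE-" that starts with a dash.
            return label;
        }
        String base = label.substring(0, dash);

        // Look for TMP among the functional tags, e.g. "NP-PN-TMP".
        boolean hasTmp = false;
        for (String tag : label.substring(dash + 1).split("-")) {
            if (tag.equals("TMP")) {
                hasTmp = true;
            }
        }
        if (!hasTmp) {
            return base;
        }

        // Assumption: splitXPTMP covers every category; the other two are category-specific.
        boolean keep = splitXPTMP
                || (splitNPTMP && base.equals("NP"))
                || (splitPPTMP && base.equals("PP"));
        return keep ? base + "-TMP" : base;
    }

    public static void main(String[] unused) {
        // The combination the documentation calls best: NP and PP, but not XP.
        System.out.println(normalizeLabel("NP-TMP", true, true, false));
        System.out.println(normalizeLabel("LCP-TMP", true, true, false));
    }
}
```

With splitNPTMP and splitPPTMP enabled and splitXPTMP disabled, this sketch keeps the tag on NP and PP and drops it from every other category.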

Constructor Detail

ChineseTreebankParserParams

public ChineseTreebankParserParams()

Method Detail

headFinder

public HeadFinder headFinder()

Returns a ChineseHeadFinder

Specified by: headFinder in interface TreebankLangParserParams
Specified by: headFinder in class AbstractTreebankParserParams

lex

public Lexicon lex(Options.LexOptions op)

Returns a ChineseLexicon

Specified by: lex in interface TreebankLangParserParams
Overrides: lex in class AbstractTreebankParserParams

MLEDependencyGrammarSmoothingParams

public double[] MLEDependencyGrammarSmoothingParams()

Description copied from class: AbstractTreebankParserParams

Give the parameters for smoothing in the MLEDependencyGrammar. Defaults are the ones previously hard coded into MLEDependencyGrammar.

Specified by: MLEDependencyGrammarSmoothingParams in interface TreebankLangParserParams
Overrides: MLEDependencyGrammarSmoothingParams in class AbstractTreebankParserParams

Returns: an array of doubles with smooth_aT_hTWd, smooth_aTW_hTWd, smooth_stop, and interp

treeReaderFactory

public TreeReaderFactory treeReaderFactory()

Description copied from interface: TreebankLangParserParams

Returns a factory for reading in trees from the source you want. It is the responsibility of the TreeReaderFactory (trf) to deal properly with the character-set encoding of the input and to normalize trees properly.

Returns: A factory that vends an appropriate TreeReader

diskTreebank

public DiskTreebank diskTreebank()

Uses a DiskTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer.

memoryTreebank

public MemoryTreebank memoryTreebank()

Uses a MemoryTreebank with a CHTBTokenizer and a BobChrisTreeNormalizer

Specified by: memoryTreebank in interface TreebankLangParserParams
Specified by: memoryTreebank in class AbstractTreebankParserParams

collinizer

public TreeTransformer collinizer()

Returns a ChineseCollinizer

Specified by: collinizer in interface TreebankLangParserParams
Specified by: collinizer in class AbstractTreebankParserParams

collinizerEvalb

public TreeTransformer collinizerEvalb()

Returns a ChineseCollinizer that doesn't delete punctuation

Specified by: collinizerEvalb in interface TreebankLangParserParams
Specified by: collinizerEvalb in class AbstractTreebankParserParams

sisterSplitters

public String[] sisterSplitters()

Description copied from class: AbstractTreebankParserParams

Returns the splitting strings used for selective splits.

Specified by: sisterSplitters in interface TreebankLangParserParams
Specified by: sisterSplitters in class AbstractTreebankParserParams

Returns: An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

transformTree

public Tree transformTree(Tree t,
                          Tree root)

transformTree does all language-specific tree transformations. Any parameterizations should be inside the specific TreebankLangParserParams class.

Specified by: transformTree in interface TreebankLangParserParams
Specified by: transformTree in class AbstractTreebankParserParams

Parameters: t - The input tree (with non-language-specific annotation already done, so you need to strip back to basic categories); root - The root of the current tree (can be null for words)
Returns: The fully annotated tree node (with daughters still as you want them in the final result)

display

public void display()

Description copied from class: AbstractTreebankParserParams

Display language-specific settings.

Specified by: display in interface TreebankLangParserParams
Specified by: display in class AbstractTreebankParserParams

setOptionFlag

public int setOptionFlag(String[] args,
                         int i)

Set language-specific options according to flags. This routine should process the option starting in args[i] (which might potentially be several arguments long if it takes arguments). It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Specified by: setOptionFlag in interface TreebankLangParserParams
Specified by: setOptionFlag in class AbstractTreebankParserParams

Parameters: args - Array of command line arguments; i - Index in command line arguments to try to process as an option
Returns: The index of the item after arguments processed as part of this command line option.
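
The return-value contract above (advance past what was consumed, return i unchanged when the option is not recognized) can be illustrated with a standalone sketch. Everything here is hypothetical: the class name, the two flag names, and the map used for storage are invented for the example and are not options that ChineseTreebankParserParams accepts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the setOptionFlag contract: not the parser's actual code.
class OptionFlagSketch {

    // Stand-in storage for the example; the real class stores options in its fields.
    static final Map<String, Boolean> flags = new HashMap<>();

    /**
     * Tries to process the option starting at args[i]. Returns the index after
     * the last argument consumed, or i itself if the option is not recognized.
     */
    static int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-exampleBooleanFlag")) {
            // Hypothetical flag that takes no argument: consumes one slot.
            flags.put("exampleBooleanFlag", true);
            return i + 1;
        }
        if (args[i].equalsIgnoreCase("-exampleValueFlag") && i + 1 < args.length) {
            // Hypothetical flag that takes one argument: consumes two slots.
            flags.put("exampleValueFlag", Boolean.parseBoolean(args[i + 1]));
            return i + 2;
        }
        return i; // not recognized: signal this by returning i unchanged
    }

    /** Caller loop: skips unrecognized options so that parsing cannot stall. */
    static int parseAll(String[] args) {
        int unrecognized = 0;
        int i = 0;
        while (i < args.length) {
            int next = setOptionFlag(args, i);
            if (next == i) {
                unrecognized++;
                i++;
            } else {
                i = next;
            }
        }
        return unrecognized;
    }

    public static void main(String[] unused) {
        String[] argv = {"-exampleBooleanFlag", "-unknown", "-exampleValueFlag", "true"};
        System.out.println("unrecognized options: " + parseAll(argv));
    }
}
```

Returning i for an unrecognized option lets a caller chain several option handlers and detect when none of them made progress; the caller must then advance or report an error itself, as parseAll does here.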

dependencyGrammarExtractor

public edu.stanford.nlp.parser.lexparser.Extractor dependencyGrammarExtractor(Options op)

Specified by: dependencyGrammarExtractor in interface TreebankLangParserParams
Overrides: dependencyGrammarExtractor in class AbstractTreebankParserParams

defaultTestSentence

public List defaultTestSentence()

Return a default sentence for the language (for testing)

main

public static void main(String[] args)

For testing: loads a treebank and prints the trees.

Stanford NLP Group