EnglishTreebankParserParams.EnglishTrain (Stanford JavaNLP API)

edu.stanford.nlp.parser.lexparser
Class EnglishTreebankParserParams.EnglishTrain

java.lang.Object
  edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams.EnglishTrain

Enclosing class: EnglishTreebankParserParams

public static class EnglishTreebankParserParams.EnglishTrain
extends Object

Field Summary
`static boolean`	`collinsBaseNP` Mark base NPs and add an NP node above the base NP, if it isn't already in an NP over NP construction, as in Collins 1999.
`static boolean`	`correctTags` 'Correct' tags to produce verbs in VPs, etc., where possible.
`static boolean`	`dominatesC` Verbal distance -- mark whether symbol dominates a conjunction (CC)
`static boolean`	`dominatesI` Verbal distance -- mark whether symbol dominates a preposition (IN)
`static int`	`dominatesV` Verbal distance -- mark whether symbol dominates a verb (V*, MD).
`static boolean`	`gpaRootVP` Grand-parent annotate (root mark) VP below ROOT.
`static boolean`	`joinJJ` Join comparative and superlative adjectives with positive ones.
`static boolean`	`joinNounTags` Join proper nouns with common nouns.
`static boolean`	`joinPound` Join pound with dollar.
`static int`	`markCC` Mark phrases which are conjunctions.
`static boolean`	`markContainedVP`
`static int`	`markDitransV` Attempt to record ditransitive verbs.
`static boolean`	`markReflexivePRP` Mark reflexive PRP words.
`static boolean`	`restructurePossP` Restructure possessive NPs so that they introduce a POSSP node that takes as children the POS and a regularly structured NP.
`static boolean`	`rightPhrasal` Right edge has a phrasal node.
`static int`	`sisterSplitLevel` Set the support * KL cutoff level (1-4) for sister splitting. Based on results so far, don't use it.
`static int`	`splitAux` Make special tags for forms of BE and HAVE.
`static boolean`	`splitBaseNP` Mark base NPs.
`static int`	`splitCC` Provide annotation of conjunctions.
`static int`	`splitIN` Annotate prepositions into subcategories.
`static boolean`	`splitJJCOMP` Put a special tag on 'adjectives with complements'.
`static boolean`	`splitMoreLess` Specially mark the comparative/superlative words: less, least, more, most
`static boolean`	`splitNNPposition` Mark NNP words as to position in phrase (single, left, right, inside).
`static boolean`	`splitNOT` Annotates forms of "not" specially as tag "NOT".
`static int`	`splitNPADV` Retain NP-ADV annotation.
`static int`	`splitNPNNP` Mark NP-NNP.
`static int`	`splitNPpercent` Mark phrases that are headed by %.
`static boolean`	`splitNPPRP`
`static boolean`	`splitNumNP` Mark "numeric NPs".
`static boolean`	`splitPercent` Mark the nouns that are percent signs.
`static boolean`	`splitPoss` Give a special tag to NPs which are possessive NPs (end in 's)
`static boolean`	`splitPPJJ` A special test for "such" mainly ("such as Fred").
`static boolean`	`splitQuotes` Mark quote marks as single vs. double, so that mismatched ones aren't produced.
`static boolean`	`splitRB` Split modifier (NP, AdjP) adverbs from others.
`static int`	`splitSbar` Split SBAR nodes.
`static boolean`	`splitSFP` Separate out sentence final punct.
`static int`	`splitSGapped` Mark specially S nodes with "gapped" subject (control, raising).
`static boolean`	`splitSTag` Mark S nodes according to verbal tag.
`static int`	`splitTMP` Retain NP-TMP (or maybe PP-TMP) annotation.
`static boolean`	`splitTRJJ` Put a special tag on 'transitive adjectives' with an NP complement, like 'due May 15'; it also catches 'such' in 'such as NP', which may be a good thing.
`static int`	`splitVP` Add (head) tags to VPs.
`static boolean`	`splitVPNPAgr` Put enough marking on VP and NP to permit "agreement".
`static boolean`	`tagRBGPA` Grandparent-annotate RB to try to distinguish sentential adverbs from ones in places such as NP postmodifiers (words like 'very' are already distinguished because their parent is ADJP).
`static boolean`	`unaryDT` Mark "Intransitive" DT.
`static boolean`	`unaryIN` Mark "Intransitive" IN.
`static boolean`	`unaryPRP` "Intransitive" PRP.
`static boolean`	`unaryRB` Mark "Intransitive" RB.
`static boolean`	`vpSubCat` Pitiful attempt at marking V* preterms with their surface subcat frames.

Constructor Summary
`EnglishTreebankParserParams.EnglishTrain()`

Method Summary
`static void`	`display()`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

splitIN

public static int splitIN

Annotate prepositions into subcategories. Values:
- 0 = no annotation.
- 1 = IN with a ^S.* parent (putative subordinating conjunctions) is marked differently from other IN (real prepositions). OK.
- 2 = annotate IN prepositions 3 ways: ^S.* parent, ^N.* parent, or the rest (generally predicative ADJP, VP). Better than splitIN = 1. Good.
- 3 = annotate prepositions 6 ways: real feature engineering. Great.
- 4 = refinement of 3: allows -SC under SINV, WHADVP for -T, and no -SCC if the parent is an NP.
- 5 = like 4, but maps TO to IN in a "nominal" (N*, P*, A*) context.
- 6 = like 4, but marks V/A complements and leaves noun ones unmarked instead.

splitQuotes

public static boolean splitQuotes

Mark quote marks as single vs. double, so that mismatched ones aren't produced.
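As a rough illustration, here is a minimal sketch that assumes the option keys off the Penn Treebank word forms, where a double quote is spelled with two backquotes or two apostrophes and a single quote with one. The class `QuoteSketch` and the `-DBL`/`-SGL` suffixes are hypothetical and are not the JavaNLP implementation.

```java
/**
 * Hypothetical sketch of splitQuotes (not the JavaNLP code): tell single from
 * double quote tokens by their Penn Treebank spelling. Suffixes are made up.
 */
final class QuoteSketch {

    /** Annotates a token that is already tagged as an opening or closing quote. */
    static String annotateQuote(String tag, String word) {
        boolean isDouble = word.equals("``") || word.equals("''");
        boolean isSingle = word.equals("`") || word.equals("'");
        if (isDouble) {
            return tag + "-DBL";
        }
        if (isSingle) {
            return tag + "-SGL";
        }
        return tag;   // any other quote spelling is left unchanged
    }
}
```

Keeping the two kinds apart in the tag lets a PCFG learn that an opening double quote is closed by a double quote rather than a single one.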

splitSFP

public static boolean splitSFP

Separate out sentence-final punctuation (. ! ?).

splitPercent

public static boolean splitPercent

Mark the nouns that are percent signs. Slightly good.

splitNPpercent

public static int splitNPpercent

Mark phrases that are headed by %. Values: 0 = do nothing, 1 = only NP, 2 = NP and ADJP, 3 = NP, ADJP and QP, 4 = any phrase.
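The levels nest, each one widening the set of phrase categories that can be marked. A minimal sketch of that reading, assuming the caller has already established that the phrase is headed by %; `NpPercentSketch` is a hypothetical helper, not part of JavaNLP (it uses `Set.of`, Java 9+).

```java
import java.util.Set;

/** Hypothetical sketch of the splitNPpercent levels; not the JavaNLP implementation. */
final class NpPercentSketch {

    /** Should a %-headed phrase of category cat be marked at this level? */
    static boolean shouldMark(int splitNPpercent, String cat) {
        switch (splitNPpercent) {
            case 0:
                return false;                                     // do nothing
            case 1:
                return cat.equals("NP");                          // only NP
            case 2:
                return Set.of("NP", "ADJP").contains(cat);        // NP and ADJP
            case 3:
                return Set.of("NP", "ADJP", "QP").contains(cat);  // NP, ADJP and QP
            case 4:
                return true;                                      // any phrase
            default:
                throw new IllegalArgumentException("unsupported splitNPpercent value: " + splitNPpercent);
        }
    }
}
```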

tagRBGPA

public static boolean tagRBGPA

Grandparent-annotate RB to try to distinguish sentential adverbs from ones in places such as NP postmodifiers (words like 'very' are already distinguished because their parent is ADJP).

splitNNPposition

public static boolean splitNNPposition

Mark NNP words as to position in phrase (single, left, right, inside).
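One plausible reading of the four positions, sketched under the assumption that "position in phrase" means the NNP's index among the daughters of its phrase. `NnpPositionSketch` and its `Position` enum are hypothetical names, not JavaNLP API.

```java
/**
 * Hypothetical sketch of splitNNPposition: classify an NNP by its index among
 * the daughters of its phrase. Not the JavaNLP implementation.
 */
final class NnpPositionSketch {

    enum Position { SINGLE, LEFT, RIGHT, INSIDE }

    /** index is the NNP's 0-based position among the phrase's numDaughters daughters. */
    static Position position(int index, int numDaughters) {
        if (index < 0 || index >= numDaughters) {
            throw new IllegalArgumentException("index out of range");
        }
        if (numDaughters == 1) {
            return Position.SINGLE;   // the NNP is the whole phrase
        }
        if (index == 0) {
            return Position.LEFT;     // leftmost daughter
        }
        if (index == numDaughters - 1) {
            return Position.RIGHT;    // rightmost daughter
        }
        return Position.INSIDE;       // anywhere in between
    }
}
```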

joinPound

public static boolean joinPound

Join pound with dollar.

joinJJ

public static boolean joinJJ

Join comparative and superlative adjectives with positive ones.

joinNounTags

public static boolean joinNounTags

Join proper nouns with common nouns. This isn't to improve performance, but because Genia doesn't use proper noun tags in general.
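The three join options above collapse Penn Treebank tags before training. A minimal sketch of the mappings as described, over plain tag strings, assuming the standard tags # (pound sign), $, JJ/JJR/JJS and NN/NNS/NNP/NNPS; `JoinTagsSketch` is a hypothetical helper rather than the JavaNLP code.

```java
/**
 * Hypothetical sketch of joinPound, joinJJ and joinNounTags as a tag-to-tag
 * mapping. Not the JavaNLP implementation.
 */
final class JoinTagsSketch {

    static String join(String tag, boolean joinPound, boolean joinJJ, boolean joinNounTags) {
        if (joinPound && tag.equals("#")) {
            return "$";                          // pound sign merged with dollar
        }
        if (joinJJ && (tag.equals("JJR") || tag.equals("JJS"))) {
            return "JJ";                         // comparative/superlative -> positive
        }
        if (joinNounTags) {
            if (tag.equals("NNP")) {
                return "NN";                     // proper singular -> common singular
            }
            if (tag.equals("NNPS")) {
                return "NNS";                    // proper plural -> common plural
            }
        }
        return tag;                              // everything else is unchanged
    }
}
```

Keeping number (NN vs. NNS) while dropping properness matches the Genia motivation given for joinNounTags.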

splitPPJJ

public static boolean splitPPJJ

A special test for "such" mainly ("such as Fred"). A wash, so omit.

splitTRJJ

public static boolean splitTRJJ

Put a special tag on 'transitive adjectives' with an NP complement, like 'due May 15'; it also catches 'such' in 'such as NP', which may be a good thing. Matches 658 times in the training corpus (sections 2-21). Wash.

splitJJCOMP

public static boolean splitJJCOMP

Put a special tag on 'adjectives with complements'. This acts as a general subcat feature for adjectives.

splitMoreLess

public static boolean splitMoreLess

Specially mark the comparative/superlative words: less, least, more, most.

unaryDT

public static boolean unaryDT

Mark "Intransitive" DT. Good.

unaryRB

public static boolean unaryRB

Mark "Intransitive" RB. Good.

unaryPRP

public static boolean unaryPRP

"Intransitive" PRP. Wash -- basically a no-op really.

markReflexivePRP

public static boolean markReflexivePRP

Mark reflexive PRP words.

unaryIN

public static boolean unaryIN

Mark "Intransitive" IN. Minutely negative.

splitCC

public static int splitCC

Provide annotation of conjunctions. Gives modest gains (the numbers shown are the F1 increase relative to goodPCFG in June 2005). A value of 1 annotates both "and" and "or" as "CC-C" (+0.29%); 2 annotates "but" and "&" separately (+0.17%); 3 annotates just "and" (equalsIgnoreCase) (+0.11%); 0 annotates nothing (+0.00%).
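A sketch of how the four values could map a CC word to a tag. Only the "CC-C" tag for value 1 comes from the description above; the suffixes for values 2 and 3, the case-insensitive match for value 1, treating each value independently, and the class `SplitCcSketch` are all assumptions for illustration.

```java
/**
 * Hypothetical sketch of the splitCC values for a CC preterminal. Only "CC-C"
 * is documented; "CC-B", "CC-AMP" and "CC-AND" are invented for illustration.
 */
final class SplitCcSketch {

    static String annotateCc(String word, int splitCC) {
        switch (splitCC) {
            case 0:
                return "CC";                                  // no annotation
            case 1:                                           // "and" and "or" share one tag
                return word.equalsIgnoreCase("and") || word.equalsIgnoreCase("or") ? "CC-C" : "CC";
            case 2:                                           // "but" and "&" each get a tag
                if (word.equalsIgnoreCase("but")) {
                    return "CC-B";
                }
                return word.equals("&") ? "CC-AMP" : "CC";
            case 3:                                           // just "and", ignoring case
                return word.equalsIgnoreCase("and") ? "CC-AND" : "CC";
            default:
                throw new IllegalArgumentException("unsupported splitCC value: " + splitCC);
        }
    }
}
```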

splitNOT

public static boolean splitNOT

Annotates forms of "not" specially as tag "NOT". BAD

splitRB

public static boolean splitRB

Split modifier (NP, AdjP) adverbs from others. This does nothing if you're already doing tagPA.

splitAux

public static int splitAux

Make special tags for forms of BE and HAVE. A value of 1 is the basic form: a positive PCFG effect, but neutral to negative in Combo, and impossible if gPA is used. A value of 2 adds in "s" = "'s" and goes further to disambiguate "'s" as BE or HAVE. Theoretically good, but no practical gains.

vpSubCat

public static boolean vpSubCat

Pitiful attempt at marking V* preterms with their surface subcat frames. Bad so far.

markDitransV

public static int markDitransV

Attempt to record ditransitive verbs. The value 0 means do nothing; 1 records two or more NP or S* arguments; 2 records only two or more NP arguments (that aren't NP-TMP). A value of 1 gave neutral to bad results.
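A minimal sketch of the argument counting, assuming the VP's daughter labels are available as strings that may still carry function tags such as NP-TMP, and assuming value 1 counts every NP daughter. `DitransSketch` is hypothetical, not JavaNLP API.

```java
import java.util.List;

/** Hypothetical sketch of markDitransV argument counting; not the JavaNLP code. */
final class DitransSketch {

    /** Bare NP or NP with a function tag, e.g. "NP-TMP". */
    private static boolean isNp(String label) {
        return label.equals("NP") || label.startsWith("NP-");
    }

    static boolean isDitransitive(List<String> vpDaughters, int markDitransV) {
        if (markDitransV == 0) {
            return false;                                            // 0: do nothing
        }
        if (markDitransV != 1 && markDitransV != 2) {
            throw new IllegalArgumentException("unsupported markDitransV value: " + markDitransV);
        }
        int args = 0;
        for (String label : vpDaughters) {
            boolean counts;
            if (markDitransV == 1) {
                counts = isNp(label) || label.startsWith("S");       // NP or S* (S, SBAR, SQ, ...)
            } else {
                counts = isNp(label) && !label.startsWith("NP-TMP"); // NP, but not NP-TMP
            }
            if (counts) {
                args++;
            }
        }
        return args >= 2;                                            // two or more arguments
    }
}
```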

splitVP

public static int splitVP

Add (head) tags to VPs. Values:
- 0 = no head-subcategorization of VPs.
- 1 = add head tags (anything, as given by HeadFinder).
- 2 = add head tags, but collapse finite verb tags (VBP, VBD, VBZ, MD) together.
- 3 = only annotate verbal tags, and collapse finite verb tags (annotation is VBF, TO, VBG, VBN, VB, or zero).
- 4 = only split on categories of VBF, TO, VBG, VBN, VB, and map cases that are not headed by a verbal category to an appropriate category based on word suffix (ing, d, t, s, to) or to VB otherwise.

We usually use a value of 3; 2 or 3 is much better than 0. See also splitVPNPAgr: if it is true, its effects override any value set for this parameter.
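Values 0 to 3 can be read as a function from the head's preterminal tag to a VP label. A sketch under that reading; the "VP-" + tag label format, the use of VBF as the collapsed tag for value 2, and the class `SplitVpSketch` are assumptions, and value 4's suffix heuristics are left out.

```java
import java.util.Set;

/**
 * Hypothetical sketch of splitVP values 0-3, given the preterminal tag of the
 * VP's head (e.g. from a head finder). Not the JavaNLP implementation.
 */
final class SplitVpSketch {

    private static final Set<String> FINITE = Set.of("VBP", "VBD", "VBZ", "MD");
    private static final Set<String> VERBAL = Set.of("VBF", "TO", "VBG", "VBN", "VB");

    static String annotateVp(String headTag, int splitVP) {
        switch (splitVP) {
            case 0:
                return "VP";                                   // no head subcategorization
            case 1:
                return "VP-" + headTag;                        // any head tag
            case 2:
                return "VP-" + collapseFinite(headTag);        // finite tags merged into VBF
            case 3: {
                String t = collapseFinite(headTag);
                return VERBAL.contains(t) ? "VP-" + t : "VP";  // verbal tags only, else zero
            }
            default:
                throw new IllegalArgumentException("unsupported splitVP value: " + splitVP);
        }
    }

    private static String collapseFinite(String tag) {
        return FINITE.contains(tag) ? "VBF" : tag;
    }
}
```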

splitVPNPAgr

public static boolean splitVPNPAgr

Put enough marking on VP and NP to permit "agreement".

splitSTag

public static boolean splitSTag

Mark S nodes according to verbal tag. How they are marked depends on the setting of splitVP. Bad.

markContainedVP

public static boolean markContainedVP

splitNPPRP

public static boolean splitNPPRP

dominatesV

public static int dominatesV

Verbal distance -- mark whether symbol dominates a verb (V*, MD). Very good.
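The underlying "dominates a verb" test is a recursive search for a V* or MD preterminal. The meaning of the individual integer values of dominatesV isn't given here, so this sketch shows only the yes/no test, over a minimal hypothetical tree record rather than the JavaNLP Tree class (records need Java 16+).

```java
import java.util.List;

/** Hypothetical sketch of the test behind dominatesV; not the JavaNLP code. */
final class DominatesVerbSketch {

    /** Minimal tree node: a label plus daughters (leaves have no daughters). */
    record Node(String label, List<Node> children) {
        static Node leaf(String word) { return new Node(word, List.of()); }
        static Node of(String label, Node... kids) { return new Node(label, List.of(kids)); }
        boolean isPreterminal() { return children.size() == 1 && children.get(0).children().isEmpty(); }
    }

    /** True if some preterminal strictly below node is tagged V* or MD. */
    static boolean dominatesVerb(Node node) {
        for (Node kid : node.children()) {
            if (kid.isPreterminal()) {
                String tag = kid.label();
                if (tag.startsWith("V") || tag.equals("MD")) {
                    return true;
                }
            } else if (dominatesVerb(kid)) {
                return true;
            }
        }
        return false;
    }
}
```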

dominatesI

public static boolean dominatesI

Verbal distance -- mark whether symbol dominates a preposition (IN).

dominatesC

public static boolean dominatesC

Verbal distance -- mark whether symbol dominates a conjunction (CC).

markCC

public static int markCC

Mark phrases which are conjunctions. Values:
- 0 = no marking.
- 1 = any phrase with a CC daughter that isn't first or last. Possibly marginally positive.
- 2 = as 1, but also a non-marginal CONJP daughter. In principle good, but no gains.
- 3 = more like Charniak: an NP or VP with two or more NP/VP children, a comma, CC or CONJP, and nothing else. Not yet implemented; this needs to annotate _before_ annotating children!
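A sketch of values 1 and 2 as a check over a phrase's daughter labels, reading value 2 as value 1 plus CONJP daughters and taking "non-marginal" to mean neither first nor last. `MarkCcSketch` is hypothetical, not JavaNLP API.

```java
import java.util.List;

/** Hypothetical sketch of markCC values 0-2; not the JavaNLP implementation. */
final class MarkCcSketch {

    static boolean isConjunctionPhrase(List<String> daughters, int markCC) {
        if (markCC == 0) {
            return false;                                  // 0: no marking
        }
        if (markCC != 1 && markCC != 2) {
            throw new IllegalArgumentException("unsupported markCC value: " + markCC);
        }
        // Skip the first and last daughters: only non-marginal positions count.
        for (int i = 1; i < daughters.size() - 1; i++) {
            String label = daughters.get(i);
            if (label.equals("CC") || (markCC == 2 && label.equals("CONJP"))) {
                return true;
            }
        }
        return false;
    }
}
```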

splitSGapped

public static int splitSGapped

Specially mark S nodes with a "gapped" subject (control, raising). 1 is the basic version. 2 is a better version of marking S nodes with a "gapped" subject. 3 seems best on a small training set, but all of these are too similar; 4 can't be differentiated. 5 is done on the tree before empty splitting. (Bad!?)

splitNumNP

public static boolean splitNumNP

Mark "numeric NPs". Probably bad?

splitPoss

public static boolean splitPoss

Give a special tag to NPs which are possessive NPs (end in 's).

splitBaseNP

public static boolean splitBaseNP

Mark base NPs. Good.

collinsBaseNP

public static boolean collinsBaseNP

Mark base NPs and add an NP node above the base NP, if it isn't already in an NP over NP construction, as in Collins 1999. This option shouldn't really be in EnglishTrain since it's needed at parsing time. But we don't currently use it...

restructurePossP

public static boolean restructurePossP

Restructure possessive NPs so that they introduce a POSSP node that takes as children the POS and a regularly structured NP. I.e., recover standard good linguistic practice circa 1985. This seems a good idea, but is almost a no-op (modulo fine points of markovization), since the previous NP-P phrase already uniquely captured what is now a POSSP.

splitTMP

public static int splitTMP

Retain NP-TMP (or maybe PP-TMP) annotation. Good. The values for this parameter are defined in NPTmpRetainingTreeNormalizer.

splitSbar

public static int splitSbar

Split SBAR nodes. Values:
- 1 = mark 'in order to' purpose clauses; this is actually a small and inconsistent part of what is marked SBAR-PRP in the treebank, which is mainly 'because' reason clauses.
- 2 = mark all infinitive SBAR.
- 3 = do 1 and 2.

A value of 1 seems minutely positive; 2 and 3 seem negative. This also picks up 'in case Sfin', 'In order to', and, on one occasion, 'in order that'.

splitNPADV

public static int splitNPADV

Retain NP-ADV annotation. 0 means strip the "-ADV" annotation. 1 means retain it and percolate it down to a head tag, provided it can do so through a path consisting only of NP nodes.

splitNPNNP

public static int splitNPNNP

Mark NP-NNP. 0 = nothing; 1 = NNP head only; 2 = NNP or NNPS head; 3 = NNP or NNPS anywhere in the local NP. All bad!
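A sketch of the three marking levels, assuming the head tag comes from a head finder and that "local NP" means the NP's immediate daughters. `NpNnpSketch` is a hypothetical helper, not JavaNLP API.

```java
import java.util.List;

/** Hypothetical sketch of splitNPNNP; not the JavaNLP implementation. */
final class NpNnpSketch {

    /** headTag: tag of the NP's head; daughterTags: tags of its immediate daughters. */
    static boolean markNpNnp(String headTag, List<String> daughterTags, int splitNPNNP) {
        switch (splitNPNNP) {
            case 0:
                return false;                                              // nothing
            case 1:
                return headTag.equals("NNP");                              // NNP head only
            case 2:
                return headTag.equals("NNP") || headTag.equals("NNPS");    // NNP or NNPS head
            case 3:
                return daughterTags.contains("NNP") || daughterTags.contains("NNPS"); // anywhere
            default:
                throw new IllegalArgumentException("unsupported splitNPNNP value: " + splitNPNNP);
        }
    }
}
```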

correctTags

public static boolean correctTags

'Correct' tags to produce verbs in VPs, etc., where possible.

rightPhrasal

public static boolean rightPhrasal

Right edge has a phrasal node. Bad?

sisterSplitLevel

public static int sisterSplitLevel

Set the support * KL cutoff level (1-4) for sister splitting. Based on results so far, don't use it.

gpaRootVP

public static boolean gpaRootVP

Grand-parent annotate (root mark) VP below ROOT. Seems negative.

Constructor Detail