java.lang.Object
  edu.stanford.nlp.trees.AbstractTreebankLanguagePack
public abstract class AbstractTreebankLanguagePack
This class implements parts of the TreebankLanguagePack API to reduce the work required of new implementations. Only the abstract methods below need to be implemented to give a reasonable solution for a new language.
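The division of labour described above can be illustrated with a small, self-contained sketch. It does not use the Stanford classes: `LanguagePackSketch` and `ToyLanguagePack` are hypothetical stand-ins, `java.util.function.Predicate` is assumed in place of the library's `Filter` type, and the tag and start-symbol values are invented for the example. The point is only the pattern: a subclass supplies the arrays, and the abstract base class derives the predicates and filters from them.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Demo entry point, declared first so single-file launch finds main().
final class LanguagePackDemo {
    public static void main(String[] args) {
        LanguagePackSketch pack = new ToyLanguagePack();
        System.out.println(pack.isPunctuationTag(","));
        System.out.println(pack.punctuationTagRejectFilter().test("NN"));
        System.out.println(pack.startSymbol());
    }
}

// Hypothetical stand-in for the abstract base class: only the array-returning
// methods are abstract, everything else is derived from them.
abstract class LanguagePackSketch {
    abstract String[] punctuationTags();
    abstract String[] startSymbols();

    // Accepts a String that is a punctuation tag name, rejects everything else.
    boolean isPunctuationTag(String str) {
        return Arrays.asList(punctuationTags()).contains(str);
    }

    // Predicate is an assumption standing in for the library's Filter type.
    Predicate<String> punctuationTagAcceptFilter() {
        return this::isPunctuationTag;
    }

    // The reject filter is simply the negation of the accept filter.
    Predicate<String> punctuationTagRejectFilter() {
        return punctuationTagAcceptFilter().negate();
    }

    // First start symbol, or null if none is defined.
    String startSymbol() {
        String[] symbols = startSymbols();
        return (symbols == null || symbols.length == 0) ? null : symbols[0];
    }
}

// A toy "new language": it only has to supply the data arrays.
final class ToyLanguagePack extends LanguagePackSketch {
    @Override
    String[] punctuationTags() {
        return new String[] {".", ",", ":"}; // illustrative values only
    }

    @Override
    String[] startSymbols() {
        return new String[] {"ROOT"}; // illustrative value only
    }
}
```

A real subclass would extend AbstractTreebankLanguagePack itself and would presumably also need to provide the interface methods that this class does not implement (sentenceFinalPunctuationWords and treebankFileExtension are listed below as inherited from the interface).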
| Field Summary | |
|---|---|
| static String | DEFAULT_ENCODING: Use this as the default encoding for Readers and Writers of Treebank data. |
| Constructor Summary | |
|---|---|
| AbstractTreebankLanguagePack() | Gives a handle to the TreebankLanguagePack. |
| Method Summary | |
|---|---|
| String | basicCategory(String category): Returns the basic syntactic category of a String. |
| String | categoryAndFunction(String category): Returns the syntactic category and 'function' of a String. |
| Filter | evalBIgnoredPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
| Filter | evalBIgnoredPunctuationTagRejectFilter(): Returns a filter that accepts everything except a String that is a punctuation tag that should be ignored by EVALB-style evaluation. |
| String[] | evalBIgnoredPunctuationTags(): Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
| Function | getBasicCategoryFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory method. |
| Function | getCategoryAndFunctionFunction(): Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction method. |
| String | getEncoding(): Returns the input Charset encoding for the Treebank. |
| TokenizerFactory | getTokenizerFactory(): Returns a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
| GrammaticalStructureFactory | grammaticalStructureFactory(): Returns a GrammaticalStructureFactory suitable for this language/treebank. |
| boolean | isEvalBIgnoredPunctuationTag(String str): Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
| boolean | isLabelAnnotationIntroducingCharacter(char ch): Says whether this character is an annotation-introducing character. |
| boolean | isPunctuationTag(String str): Accepts a String that is a punctuation tag name, and rejects everything else. |
| boolean | isPunctuationWord(String str): Accepts a String that is a punctuation word, and rejects everything else. |
| boolean | isSentenceFinalPunctuationTag(String str): Accepts a String that is a sentence end punctuation tag, and rejects everything else. |
| boolean | isStartSymbol(String str): Accepts a String that is a start symbol of the treebank. |
| char[] | labelAnnotationIntroducingCharacters(): Returns an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
| Filter | punctuationTagAcceptFilter(): Returns a filter that accepts a String that is a punctuation tag name, and rejects everything else. |
| Filter | punctuationTagRejectFilter(): Returns a filter that rejects a String that is a punctuation tag name, and accepts everything else. |
| abstract String[] | punctuationTags(): Returns a String array of punctuation tags for this treebank/language. |
| Filter | punctuationWordAcceptFilter(): Returns a filter that accepts a String that is a punctuation word, and rejects everything else. |
| Filter | punctuationWordRejectFilter(): Returns a filter that accepts a String that is not a punctuation word, and rejects punctuation. |
| abstract String[] | punctuationWords(): Returns a String array of punctuation words for this treebank/language. |
| Filter | sentenceFinalPunctuationTagAcceptFilter(): Returns a filter that accepts a String that is a sentence end punctuation tag, and rejects everything else. |
| abstract String[] | sentenceFinalPunctuationTags(): Returns a String array of sentence final punctuation tags for this treebank/language. |
| String | startSymbol(): Returns a String which is the first (perhaps unique) start symbol of the treebank, or null if none is defined. |
| Filter | startSymbolAcceptFilter(): Returns a filter that accepts a String that is a start symbol of the treebank, and rejects everything else. |
| abstract String[] | startSymbols(): Returns a String array of treebank start symbols. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface edu.stanford.nlp.trees.TreebankLanguagePack |
|---|
| sentenceFinalPunctuationWords, treebankFileExtension |
| Field Detail |
|---|
public static final String DEFAULT_ENCODING
| Constructor Detail |
|---|
public AbstractTreebankLanguagePack()
| Method Detail |
|---|
public abstract String[] punctuationTags()
Specified by: punctuationTags in interface TreebankLanguagePack

public abstract String[] punctuationWords()
Specified by: punctuationWords in interface TreebankLanguagePack

public abstract String[] sentenceFinalPunctuationTags()
Specified by: sentenceFinalPunctuationTags in interface TreebankLanguagePack

public String[] evalBIgnoredPunctuationTags()
Specified by: evalBIgnoredPunctuationTags in interface TreebankLanguagePack

public boolean isPunctuationTag(String str)
Specified by: isPunctuationTag in interface TreebankLanguagePack

public boolean isPunctuationWord(String str)
Specified by: isPunctuationWord in interface TreebankLanguagePack

public boolean isSentenceFinalPunctuationTag(String str)
Specified by: isSentenceFinalPunctuationTag in interface TreebankLanguagePack

public boolean isEvalBIgnoredPunctuationTag(String str)
Specified by: isEvalBIgnoredPunctuationTag in interface TreebankLanguagePack

public Filter punctuationTagAcceptFilter()
Specified by: punctuationTagAcceptFilter in interface TreebankLanguagePack

public Filter punctuationTagRejectFilter()
Specified by: punctuationTagRejectFilter in interface TreebankLanguagePack

public Filter punctuationWordAcceptFilter()
Specified by: punctuationWordAcceptFilter in interface TreebankLanguagePack

public Filter punctuationWordRejectFilter()
Specified by: punctuationWordRejectFilter in interface TreebankLanguagePack

public Filter sentenceFinalPunctuationTagAcceptFilter()
Specified by: sentenceFinalPunctuationTagAcceptFilter in interface TreebankLanguagePack

public Filter evalBIgnoredPunctuationTagAcceptFilter()
Specified by: evalBIgnoredPunctuationTagAcceptFilter in interface TreebankLanguagePack

public Filter evalBIgnoredPunctuationTagRejectFilter()
Specified by: evalBIgnoredPunctuationTagRejectFilter in interface TreebankLanguagePack

public String getEncoding()
Returns the input Charset encoding for the Treebank. See the Charset class.
Specified by: getEncoding in interface TreebankLanguagePack

public char[] labelAnnotationIntroducingCharacters()
Specified by: labelAnnotationIntroducingCharacters in interface TreebankLanguagePack

public String basicCategory(String category)
Returns the basic syntactic category of a String. This implementation truncates the label at an occurrence of one of the labelAnnotationIntroducingCharacters(). However, there are two special cases for labelAnnotationIntroducingCharacters in category labels: (i) if the first char is in this set, it is never truncated (e.g., '-' or '=' as a token), and (ii) if the label starts with one of this set, a second item of this set is also excluded (to deal with '-LRB-', '-RCB-', etc.).
Specified by: basicCategory in interface TreebankLanguagePack
Parameters: category - The whole String name of the label
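The truncation rule for basicCategory, including its two special cases, can be sketched as follows. This is one reading of the description above, not the library's source: the class name, the `INTRODUCERS` character set, and the choice to let any character of the set count as the "second item" are all assumptions made for illustration.

```java
final class BasicCategorySketch {

    // Hypothetical annotation-introducing characters; a real language pack
    // defines its own set via labelAnnotationIntroducingCharacters().
    private static final String INTRODUCERS = "-=#";

    static String basicCategory(String category) {
        if (category == null) {
            return null;
        }
        boolean startsWithIntroducer = false;
        int i = 0;
        for (; i < category.length(); i++) {
            char ch = category.charAt(i);
            if (INTRODUCERS.indexOf(ch) < 0) {
                continue; // ordinary character, keep scanning
            }
            if (i == 0) {
                // Case (i): a leading introducer is never a truncation point.
                startsWithIntroducer = true;
            } else if (startsWithIntroducer) {
                // Case (ii): the label began with an introducer, so the next
                // one (e.g. the closing '-' of "-LRB-") is skipped as well.
                startsWithIntroducer = false;
            } else {
                break; // ordinary case: truncate here
            }
        }
        return category.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(basicCategory("NP-SBJ"));  // NP
        System.out.println(basicCategory("-LRB-"));   // -LRB-
        System.out.println(basicCategory("-"));       // -
    }
}
```

Tracking a single boolean is enough here because only the first later introducer is exempt; any further one ends the basic category.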
public Function getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory method.
Specified by: getBasicCategoryFunction in interface TreebankLanguagePack

public String categoryAndFunction(String category)
Returns the syntactic category and 'function' of a String (category-function). This implementation strips numeric tags after label introducing characters (assuming that non-numeric things are functional tags).
Specified by: categoryAndFunction in interface TreebankLanguagePack
Parameters: category - The whole String name of the label
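A minimal sketch of the numeric-tag stripping described for categoryAndFunction is given below. It is an interpretation, not the library's code: the class name, the helper `trailingNumericTagStart`, and the `INTRODUCERS` set are hypothetical, and the sketch assumes that only trailing numeric tags are removed and that a label is never stripped down to the empty string.

```java
final class CategoryAndFunctionSketch {

    // Hypothetical label-introducing characters, for illustration only.
    private static final String INTRODUCERS = "-=#";

    static String categoryAndFunction(String category) {
        if (category == null) {
            return null;
        }
        String result = category;
        int cut = trailingNumericTagStart(result);
        // cut > 0 keeps at least one character, so a bare "-1" is left alone.
        while (cut > 0) {
            result = result.substring(0, cut);
            cut = trailingNumericTagStart(result);
        }
        return result;
    }

    // Returns the index of the introducer that begins a trailing all-digit
    // tag (such as the '-' in "NP-SBJ-1"), or -1 if the label has none.
    private static int trailingNumericTagStart(String label) {
        int i = label.length() - 1;
        while (i >= 0 && Character.isDigit(label.charAt(i))) {
            i--;
        }
        boolean sawDigits = i < label.length() - 1;
        if (sawDigits && i >= 0 && INTRODUCERS.indexOf(label.charAt(i)) >= 0) {
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(categoryAndFunction("NP-SBJ-1"));   // NP-SBJ
        System.out.println(categoryAndFunction("NP-SBJ=2-1")); // NP-SBJ
        System.out.println(categoryAndFunction("NP-1"));       // NP
    }
}
```

Non-numeric tags such as SBJ survive because they are taken to be functional tags, which matches the assumption stated in the description.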
public Function getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction method.
Specified by: getCategoryAndFunctionFunction in interface TreebankLanguagePack

public boolean isLabelAnnotationIntroducingCharacter(char ch)
Specified by: isLabelAnnotationIntroducingCharacter in interface TreebankLanguagePack
Parameters: ch - The character to check
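public boolean isStartSymbol(String str)
Specified by: isStartSymbol in interface TreebankLanguagePack

public Filter startSymbolAcceptFilter()
Specified by: startSymbolAcceptFilter in interface TreebankLanguagePack

public abstract String[] startSymbols()
Specified by: startSymbols in interface TreebankLanguagePack

public String startSymbol()
Specified by: startSymbol in interface TreebankLanguagePack

public TokenizerFactory getTokenizerFactory()
Returns a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). This implementation uses WhitespaceTokenizer.
Specified by: getTokenizerFactory in interface TreebankLanguagePack

public GrammaticalStructureFactory grammaticalStructureFactory()
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack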