java.lang.Object
  edu.stanford.nlp.trees.AbstractTreebankLanguagePack
public abstract class AbstractTreebankLanguagePack extends Object implements TreebankLanguagePack
This provides an implementation of parts of the TreebankLanguagePack API to reduce the load on fresh implementations. Only the abstract methods below need to be implemented to give a reasonable solution for a new language.
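The division of labor described above can be illustrated with a minimal, self-contained sketch. It does not use the Stanford classes: `MiniPack` and `ToyPack` are hypothetical stand-ins, and the tag values are invented for illustration. The base class derives a predicate from one abstract accessor, so a new language only has to supply the data.

```java
import java.util.Arrays;

class MiniLanguagePackSketch {

    /** Hypothetical stand-in for the abstract base class; not the Stanford API. */
    abstract static class MiniPack {
        /** Subclasses supply the raw tag list. */
        abstract String[] punctuationTags();

        /** Implemented once in the base class on top of the abstract accessor. */
        boolean isPunctuationTag(String str) {
            return Arrays.asList(punctuationTags()).contains(str);
        }
    }

    /** Hypothetical language pack; the tag values are illustrative only. */
    static class ToyPack extends MiniPack {
        @Override
        String[] punctuationTags() {
            return new String[] {".", ",", ":"};
        }
    }

    public static void main(String[] args) {
        MiniPack pack = new ToyPack();
        System.out.println(pack.isPunctuationTag(","));  // true
        System.out.println(pack.isPunctuationTag("NN")); // false
    }
}
```

In the real class the same idea presumably applies to each abstract method listed in the summary that follows (`punctuationTags`, `punctuationWords`, `sentenceFinalPunctuationTags`, `startSymbols`).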
Field Summary | |
---|---|
static String |
DEFAULT_ENCODING
Use this as the default encoding for Readers and Writers of Treebank data. |
Constructor Summary | |
---|---|
AbstractTreebankLanguagePack()
Gives a handle to the TreebankLanguagePack |
Method Summary | |
---|---|
String |
basicCategory(String category)
Returns the basic syntactic category of a String. |
String |
categoryAndFunction(String category)
Returns the syntactic category and 'function' of a String. |
Filter |
evalBIgnoredPunctuationTagAcceptFilter()
Returns a filter that accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
Filter |
evalBIgnoredPunctuationTagRejectFilter()
Returns a filter that accepts everything except a String that is a punctuation tag that should be ignored by EVALB-style evaluation. |
String[] |
evalBIgnoredPunctuationTags()
Returns a String array of punctuation tags that EVALB-style evaluation should ignore for this treebank/language. |
Function |
getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's basicCategory method. |
Function |
getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according
to this TreebankLanguagePack's categoryAndFunction method. |
String |
getEncoding()
Return the input Charset encoding for the Treebank. |
TokenizerFactory |
getTokenizerFactory()
Return a tokenizer which might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
GrammaticalStructureFactory |
grammaticalStructureFactory()
Return a GrammaticalStructureFactory suitable for this language/treebank. |
boolean |
isEvalBIgnoredPunctuationTag(String str)
Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
boolean |
isLabelAnnotationIntroducingCharacter(char ch)
Say whether this character is an annotation introducing character. |
boolean |
isPunctuationTag(String str)
Accepts a String that is a punctuation tag name, and rejects everything else. |
boolean |
isPunctuationWord(String str)
Accepts a String that is a punctuation word, and rejects everything else. |
boolean |
isSentenceFinalPunctuationTag(String str)
Accepts a String that is a sentence end punctuation tag, and rejects everything else. |
boolean |
isStartSymbol(String str)
Accepts a String that is a start symbol of the treebank. |
char[] |
labelAnnotationIntroducingCharacters()
Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
Filter |
punctuationTagAcceptFilter()
Return a filter that accepts a String that is a punctuation tag name, and rejects everything else. |
Filter |
punctuationTagRejectFilter()
Return a filter that rejects a String that is a punctuation tag name, and accepts everything else. |
abstract String[] |
punctuationTags()
Returns a String array of punctuation tags for this treebank/language. |
Filter |
punctuationWordAcceptFilter()
Returns a filter that accepts a String that is a punctuation word, and rejects everything else. |
Filter |
punctuationWordRejectFilter()
Returns a filter that accepts a String that is not a punctuation word, and rejects punctuation. |
abstract String[] |
punctuationWords()
Returns a String array of punctuation words for this treebank/language. |
Filter |
sentenceFinalPunctuationTagAcceptFilter()
Returns a filter that accepts a String that is a sentence end punctuation tag, and rejects everything else. |
abstract String[] |
sentenceFinalPunctuationTags()
Returns a String array of sentence final punctuation tags for this treebank/language. |
String |
startSymbol()
Returns a String which is the first (perhaps unique) start symbol of the treebank, or null if none is defined. |
Filter |
startSymbolAcceptFilter()
Return a filter that accepts a String that is a start symbol of the treebank, and rejects everything else. |
abstract String[] |
startSymbols()
Returns a String array of treebank start symbols. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface edu.stanford.nlp.trees.TreebankLanguagePack |
---|
sentenceFinalPunctuationWords, treebankFileExtension |
Field Detail |
---|
public static final String DEFAULT_ENCODING
Constructor Detail |
---|
public AbstractTreebankLanguagePack()
Method Detail |
---|
public abstract String[] punctuationTags()
Specified by: punctuationTags in interface TreebankLanguagePack
public abstract String[] punctuationWords()
Specified by: punctuationWords in interface TreebankLanguagePack
public abstract String[] sentenceFinalPunctuationTags()
Specified by: sentenceFinalPunctuationTags in interface TreebankLanguagePack
public String[] evalBIgnoredPunctuationTags()
Specified by: evalBIgnoredPunctuationTags in interface TreebankLanguagePack
public boolean isPunctuationTag(String str)
Specified by: isPunctuationTag in interface TreebankLanguagePack
public boolean isPunctuationWord(String str)
Specified by: isPunctuationWord in interface TreebankLanguagePack
public boolean isSentenceFinalPunctuationTag(String str)
Specified by: isSentenceFinalPunctuationTag in interface TreebankLanguagePack
public boolean isEvalBIgnoredPunctuationTag(String str)
Specified by: isEvalBIgnoredPunctuationTag in interface TreebankLanguagePack
public Filter punctuationTagAcceptFilter()
Specified by: punctuationTagAcceptFilter in interface TreebankLanguagePack
public Filter punctuationTagRejectFilter()
Specified by: punctuationTagRejectFilter in interface TreebankLanguagePack
public Filter punctuationWordAcceptFilter()
Specified by: punctuationWordAcceptFilter in interface TreebankLanguagePack
public Filter punctuationWordRejectFilter()
Specified by: punctuationWordRejectFilter in interface TreebankLanguagePack
public Filter sentenceFinalPunctuationTagAcceptFilter()
Specified by: sentenceFinalPunctuationTagAcceptFilter in interface TreebankLanguagePack
public Filter evalBIgnoredPunctuationTagAcceptFilter()
Specified by: evalBIgnoredPunctuationTagAcceptFilter in interface TreebankLanguagePack
public Filter evalBIgnoredPunctuationTagRejectFilter()
Specified by: evalBIgnoredPunctuationTagRejectFilter in interface TreebankLanguagePack
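The accept and reject filters above come in complementary pairs. The following sketch shows that relationship using `java.util.function.Predicate` as a stand-in for the library's `Filter` type, which is an assumption made to keep the example self-contained. The tag set is illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

class PunctuationFilterSketch {
    // Illustrative tag set only; a real language pack supplies its own.
    static final Set<String> PUNCTUATION_TAGS =
            new HashSet<>(Arrays.asList(".", ",", ":"));

    /** Accepts punctuation tags and rejects everything else. */
    static Predicate<String> acceptFilter() {
        return PUNCTUATION_TAGS::contains;
    }

    /** Rejects punctuation tags and accepts everything else. */
    static Predicate<String> rejectFilter() {
        return acceptFilter().negate();
    }

    public static void main(String[] args) {
        System.out.println(acceptFilter().test("."));  // true
        System.out.println(rejectFilter().test("."));  // false
        System.out.println(rejectFilter().test("NP")); // true
    }
}
```

Building the reject filter by negating the accept filter keeps the two from drifting apart when the tag set changes.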
public String getEncoding()
Returns the input Charset encoding for the Treebank. See the documentation for the Charset class.
Specified by: getEncoding in interface TreebankLanguagePack
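Because `getEncoding()` returns the encoding as a String, a caller typically resolves it to a `Charset` before opening a Reader. The sketch below shows one way to do that with the standard library only. `ASSUMED_ENCODING` and `firstLine` are hypothetical names, and the value `"UTF-8"` is an assumption, since this page does not state the value of `DEFAULT_ENCODING`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {
    // Assumed value for illustration; the page does not state DEFAULT_ENCODING.
    static final String ASSUMED_ENCODING = "UTF-8";

    /** Decodes the first line of raw treebank bytes using a charset name. */
    static String firstLine(byte[] data, String encodingName) {
        Charset charset = Charset.forName(encodingName);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is a non-ASCII character, so the charset choice matters here.
        byte[] data = "(ROOT (NN caf\u00e9))\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(data, ASSUMED_ENCODING));
    }
}
```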
public char[] labelAnnotationIntroducingCharacters()
Specified by: labelAnnotationIntroducingCharacters in interface TreebankLanguagePack
public String basicCategory(String category)
Returns the basic syntactic category of a String. This implementation truncates the label at an occurrence of one of the labelAnnotationIntroducingCharacters(). However, there is also special-case handling for labelAnnotationIntroducingCharacters in category labels: (i) if the first char is in this set, it is never truncated (e.g., '-' or '=' as a token), and (ii) if the label starts with one of this set, a second item of this set is also excluded (to deal with '-LLB-', '-RCB-', etc.).
Specified by: basicCategory in interface TreebankLanguagePack
Parameters: category - The whole String name of the label
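The truncation rule and its two special cases can be sketched as follows. This is one reading of the description, not the library's source: the character set `"-="` is an assumption (each language pack defines its own), and the names are hypothetical.

```java
class BasicCategorySketch {
    // Assumed character set for illustration; each language pack defines its own.
    static final String ANNOTATION_CHARS = "-=";

    static boolean isAnnotationChar(char ch) {
        return ANNOTATION_CHARS.indexOf(ch) >= 0;
    }

    static String basicCategory(String category) {
        if (category == null) {
            return null;
        }
        // Rule (ii): a label that starts with an annotation character may
        // contain one more such character before truncation applies.
        boolean skipNext = !category.isEmpty() && isAnnotationChar(category.charAt(0));
        // Rule (i): start at index 1 so the first character is never a cut point.
        for (int i = 1; i < category.length(); i++) {
            if (isAnnotationChar(category.charAt(i))) {
                if (skipNext) {
                    skipNext = false;
                } else {
                    return category.substring(0, i);
                }
            }
        }
        return category;
    }

    public static void main(String[] args) {
        System.out.println(basicCategory("NP-SBJ"));  // NP
        System.out.println(basicCategory("-RCB-"));   // -RCB-
        System.out.println(basicCategory("-"));       // -
        System.out.println(basicCategory("-RCB--1")); // -RCB-
    }
}
```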
public Function getBasicCategoryFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's basicCategory method.
Specified by: getBasicCategoryFunction in interface TreebankLanguagePack
public String categoryAndFunction(String category)
Returns the syntactic category and 'function' of a String, in the form category-function. This implementation strips numeric tags after label-introducing characters (assuming that non-numeric things are functional tags).
Specified by: categoryAndFunction in interface TreebankLanguagePack
Parameters: category - The whole String name of the label
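A simplified sketch of the numeric-tag stripping is shown below. It only removes numeric tags at the end of the label, which is an assumption about the intended behavior rather than a copy of the library's logic. The character set `"-="` and all names are hypothetical.

```java
class CategoryAndFunctionSketch {
    // Assumed character set for illustration; each language pack defines its own.
    static final String ANNOTATION_CHARS = "-=";

    /** Repeatedly removes a trailing all-digit tag such as "-1" or "=2". */
    static String categoryAndFunction(String category) {
        if (category == null) {
            return null;
        }
        String result = category;
        while (true) {
            int cut = trailingNumericTagStart(result);
            if (cut < 0) {
                return result;
            }
            result = result.substring(0, cut);
        }
    }

    /** Index of the annotation char that starts a trailing all-digit tag, or -1. */
    static int trailingNumericTagStart(String label) {
        int i = label.length() - 1;
        while (i >= 0 && Character.isDigit(label.charAt(i))) {
            i--;
        }
        boolean sawDigits = i < label.length() - 1;
        // Require i > 0 so a label that is nothing but a tag (e.g. "-1") is kept whole.
        if (sawDigits && i > 0 && ANNOTATION_CHARS.indexOf(label.charAt(i)) >= 0) {
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(categoryAndFunction("NP-SBJ-1")); // NP-SBJ
        System.out.println(categoryAndFunction("NP-1=2"));   // NP
        System.out.println(categoryAndFunction("NP-TMP"));   // NP-TMP
    }
}
```

Non-numeric tags such as `SBJ` or `TMP` survive because they are treated as functional tags, matching the assumption stated in the method description.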
public Function getCategoryAndFunctionFunction()
Returns a Function object that maps Strings to Strings according to this TreebankLanguagePack's categoryAndFunction method.
Specified by: getCategoryAndFunctionFunction in interface TreebankLanguagePack
public boolean isLabelAnnotationIntroducingCharacter(char ch)
Specified by: isLabelAnnotationIntroducingCharacter in interface TreebankLanguagePack
Parameters: ch - The character to check
public boolean isStartSymbol(String str)
Specified by: isStartSymbol in interface TreebankLanguagePack
public Filter startSymbolAcceptFilter()
Specified by: startSymbolAcceptFilter in interface TreebankLanguagePack
public abstract String[] startSymbols()
Specified by: startSymbols in interface TreebankLanguagePack
public String startSymbol()
Specified by: startSymbol in interface TreebankLanguagePack
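The relationship between `startSymbols()`, `startSymbol()`, and `isStartSymbol(String)` can be sketched as below. The helper names and the `"ROOT"`/`"TOP"` values are hypothetical. The only behavior taken from this page is that `startSymbol()` returns the first start symbol, or null if none is defined.

```java
import java.util.Arrays;

class StartSymbolSketch {
    /** Returns the first start symbol, or null if none is defined. */
    static String startSymbol(String[] startSymbols) {
        if (startSymbols == null || startSymbols.length == 0) {
            return null;
        }
        return startSymbols[0];
    }

    /** Accepts a String that is one of the start symbols. */
    static boolean isStartSymbol(String[] startSymbols, String str) {
        return startSymbols != null && Arrays.asList(startSymbols).contains(str);
    }

    public static void main(String[] args) {
        String[] symbols = {"ROOT", "TOP"}; // illustrative values only
        System.out.println(startSymbol(symbols));           // ROOT
        System.out.println(startSymbol(new String[0]));     // null
        System.out.println(isStartSymbol(symbols, "TOP"));  // true
    }
}
```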
public TokenizerFactory getTokenizerFactory()
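Returns a tokenizer factory that might be suitable for tokenizing text used with this Treebank/Language pair. The implementation here is based on WhitespaceTokenizer.
Specified by: getTokenizerFactory in interface TreebankLanguagePack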
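To illustrate what "treating carriage returns as white space" means in practice, here is a minimal standalone sketch of whitespace tokenization. It does not use the library's `WhitespaceTokenizer`; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceTokenizeSketch {
    /** Splits on any run of whitespace, so "\r" and "\n" never become tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("\\s+")) {
            // Leading whitespace yields one empty piece; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The cat\r\nsat ."));
    }
}
```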
public GrammaticalStructureFactory grammaticalStructureFactory()
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack