edu.stanford.nlp.process
Class StripTagsProcessor
java.lang.Object
edu.stanford.nlp.process.AbstractListProcessor
edu.stanford.nlp.process.StripTagsProcessor
- All Implemented Interfaces:
- ListProcessor, Processor
public class StripTagsProcessor
- extends AbstractListProcessor
A Processor whose process method deletes all SGML/XML/HTML tags (tokens
starting with < and ending with >). Optionally, newlines can be inserted
after the end of block-level tags to roughly simulate where continuous text
was broken up (this helps with finding sentence boundaries, for example).
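The behaviour described above can be illustrated with a small stand-alone sketch. It is not the Stanford implementation and does not use the library. The class name TagStripSketch, its methods, and the contents of BLOCK_TAGS are hypothetical and chosen only for illustration. The sketch also assumes that a newline word is emitted for any block-level tag (opening or closing) and that repeated newline words are collapsed; the real class may behave differently on both points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-alone sketch; NOT the Stanford NLP implementation.
class TagStripSketch {

    // Assumed sample of block-level tag names; the real blockTags set may differ.
    static final Set<String> BLOCK_TAGS = new HashSet<>(
            Arrays.asList("p", "div", "br", "li", "h1", "h2", "blockquote"));

    // A token counts as a tag if it starts with '<' and ends with '>'.
    static boolean isTag(String token) {
        return token.length() >= 2 && token.startsWith("<") && token.endsWith(">");
    }

    // Extracts the lower-cased tag name from e.g. "<P>", "</p>" or "<br/>".
    static String tagName(String tag) {
        String body = tag.substring(1, tag.length() - 1).trim();
        if (body.startsWith("/")) {
            body = body.substring(1).trim();
        }
        int end = 0;
        while (end < body.length() && Character.isLetterOrDigit(body.charAt(end))) {
            end++;
        }
        return body.substring(0, end).toLowerCase(Locale.ROOT);
    }

    // Returns a new list without tag tokens; the input list is not modified.
    // With markLineBreaks set, a "\n" word replaces each block-level tag,
    // except at the start of the output or directly after another "\n".
    static List<String> strip(List<String> tokens, boolean markLineBreaks) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!isTag(token)) {
                out.add(token);
                continue;
            }
            if (markLineBreaks && BLOCK_TAGS.contains(tagName(token))
                    && !out.isEmpty() && !"\n".equals(out.get(out.size() - 1))) {
                out.add("\n");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("<p>", "Hello", "world", "</p>", "<b>", "Bye", "</b>");
        System.out.println(strip(tokens, false));
        System.out.println(strip(tokens, true).size());
    }
}
```

In this sketch the inline tag `<b>` is dropped without a newline word, while the closing `</p>` leaves a "\n" word between "world" and "Bye", which is the kind of hint a downstream sentence splitter could use.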
- Author:
- Christopher Manning
Field Summary |
static Set |
blockTags
Block-level HTML tags that are rendered with surrounding line breaks. |
Constructor Summary |
StripTagsProcessor()
Constructs a new StripTagsProcessor that doesn't mark line breaks. |
StripTagsProcessor(boolean markLineBreaks)
Constructs a new StripTagsProcessor that marks line breaks as specified. |
Method Summary |
boolean |
getMarkLineBreaks()
Returns whether the output of the processor will contain newline words
("\n") at the end of block-level tags. |
static void |
main(String[] args)
For internal debugging purposes only. |
List |
process(List in)
Returns a new Document with the same meta-data as in,
and the same words except tags are stripped. |
void |
setMarkLineBreaks(boolean markLineBreaks)
Sets whether the output of the processor will contain newline words
("\n") at the end of block-level tags. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
blockTags
public static final Set blockTags
- Block-level HTML tags that are rendered with surrounding line breaks.
StripTagsProcessor
public StripTagsProcessor()
- Constructs a new StripTagsProcessor that doesn't mark line breaks.
StripTagsProcessor
public StripTagsProcessor(boolean markLineBreaks)
- Constructs a new StripTagsProcessor that marks line breaks as specified.
getMarkLineBreaks
public boolean getMarkLineBreaks()
- Returns whether the output of the processor will contain newline words
("\n") at the end of block-level tags.
setMarkLineBreaks
public void setMarkLineBreaks(boolean markLineBreaks)
- Sets whether the output of the processor will contain newline words
("\n") at the end of block-level tags.
process
public List process(List in)
- Returns a new Document with the same meta-data as in,
and the same words except tags are stripped.
main
public static void main(String[] args)
- For internal debugging purposes only.
Stanford NLP Group