edu.stanford.nlp.process
Class WordToTaggedWordProcessor

java.lang.Object
  extended by edu.stanford.nlp.process.AbstractListProcessor
      extended by edu.stanford.nlp.process.WordToTaggedWordProcessor
All Implemented Interfaces:
ListProcessor, Processor

public class WordToTaggedWordProcessor
extends AbstractListProcessor

Transforms a Document of Words into a Document made up wholly or partly of TaggedWords by splitting each word at a tag divider character.

Author:
Teg Grenager (grenager@stanford.edu), Christopher Manning

Field Summary
protected  char splitChar
          The char that we will split on.
 
Constructor Summary
WordToTaggedWordProcessor()
          Create a WordToTaggedWordProcessor using the default forward slash character to split on.
WordToTaggedWordProcessor(char splitChar)
          Flexibly set the tag splitting character.
 
Method Summary
static void main(String[] args)
          This will print out some text, recognizing tags.
 List process(List words)
          Returns a new Document where each Word with a tag has been converted to a TaggedWord.
 
Methods inherited from class edu.stanford.nlp.process.AbstractListProcessor
processDocument, processLists
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

splitChar

protected char splitChar
The char that we will split on.

Constructor Detail

WordToTaggedWordProcessor

public WordToTaggedWordProcessor()
Create a WordToTaggedWordProcessor using the default forward slash character to split on.


WordToTaggedWordProcessor

public WordToTaggedWordProcessor(char splitChar)
Flexibly set the tag splitting character. A splitChar of 0 is interpreted to mean never split off a tag.

Parameters:
splitChar - The character at which to split

Method Detail

process

public List process(List words)
Returns a new Document where each Word with a tag has been converted to a TaggedWord. Input elements that do not implement HasWord are omitted from the output. Elements that do are checked for the form word + splitChar + tag: if they match, they are split and inserted as TaggedWords; otherwise they are added to the output with their current type. More precisely, an element is split at the last occurrence of splitChar whose index is greater than 0. This gives the correct split provided that tags do not contain the splitChar, regardless of escaping, and it never produces an empty or null word: you can think of the first character as always being escaped.

Parameters:
words - The input Document (its elements should implement HasWord)
Returns:
A new Document in which some elements may be TaggedWords
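
The split rule described for process can be illustrated with a minimal, standalone sketch. It is not the library's source: the class TagSplitSketch and its split method are hypothetical names, plain Strings stand in for Word and TaggedWord, and the handling of a splitChar of 0 follows the constructor documentation.

```java
import java.util.Arrays;

// Hypothetical illustration of the documented split rule; not the library's code.
final class TagSplitSketch {

    /**
     * Returns {word, tag} when the token contains splitChar at an index above 0,
     * otherwise returns {token} unchanged. A splitChar of 0 never splits.
     */
    static String[] split(String token, char splitChar) {
        if (splitChar == 0) {
            return new String[] { token };          // 0 means: never split off a tag
        }
        int index = token.lastIndexOf(splitChar);   // last occurrence of the divider
        if (index > 0) {                            // index 0 would give an empty word
            return new String[] { token.substring(0, index), token.substring(index + 1) };
        }
        return new String[] { token };              // no usable divider: leave as is
    }

    public static void main(String[] args) {
        for (String t : Arrays.asList("dog/NN", "1/2/CD", "/", "//SYM", "plain")) {
            System.out.println(t + " -> " + Arrays.toString(split(t, '/')));
        }
    }
}
```

Under this sketch, "1/2/CD" splits into the word "1/2" and the tag "CD", because only the last divider counts, while a lone "/" is left unsplit, because its only divider sits at index 0.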

main

public static void main(String[] args)
This will print out some text, recognizing tags. It can be used to test tag breaking.
Usage: java edu.stanford.nlp.process.WordToTaggedWordProcessor fileOrUrl

Parameters:
args - Command line argument: a file or URL


Stanford NLP Group