edu.stanford.nlp.util
Class StringUtils

java.lang.Object
  extended by edu.stanford.nlp.util.StringUtils

public class StringUtils
extends Object

StringUtils is a class of miscellaneous static utility methods for working with Strings.

Author:
Dan Klein, Christopher Manning, Tim Grow (grow@stanford.edu), Chris Cox

Method Summary
static Map<String,String[]> argsToMap(String[] args)
          Parses command line arguments into a Map.
static Map<String,String[]> argsToMap(String[] args, Map<String,Integer> flagsToNumArgs)
          Parses command line arguments into a Map.
static Properties argsToProperties(String[] args)
           
static Properties argsToProperties(String[] args, Map flagsToNumArgs)
          Analogous to argsToMap(java.lang.String[]).
static String capitalize(String s)
          Uppercases the first character of a string.
static int editDistance(String s, String t)
          Computes the Levenshtein (edit) distance of the two given Strings.
static String escapeString(String s, char[] charsToEscape, char escapeChar)
           
static String fileNameClean(String s)
          Returns a "clean" version of the given filename in which spaces have been converted to dashes and all other non-alphanumeric chars to underscores.
static boolean find(String str, String regex)
          Say whether this regular expression can be found inside this String.
static String join(Iterable l, String glue)
          Joins each elem in the Iterable with the given glue.
static String join(List l)
          Joins elems with a space.
static String join(List l, String glue)
          Joins each elem in the List with the given glue.
static String join(Object[] elements)
          Joins elems with a space.
static String join(Object[] elements, String glue)
          Joins each elem in the array with the given glue.
static int longestCommonSubstring(String s, String t)
          Computes the longest common substring of s and t.
static boolean lookingAt(String str, String regex)
          Say whether this regular expression can be found at the beginning of this String.
static void main(String[] args)
           
static boolean matches(String str, String regex)
          Say whether this regular expression matches this String.
static int nthIndex(String s, char ch, int n)
          Returns the index of the nth occurrence of ch in s, or -1 if there are fewer than n occurrences of ch.
static String pad(Object obj, int totalChars)
          Pads the toString value of the given Object.
static String pad(String str, int totalChars)
          Returns a String at least totalChars characters long, made by padding the input String str with spaces.
static String padLeft(double d, int totalChars)
           
static String padLeft(int i, int totalChars)
           
static String padLeft(Object obj, int totalChars)
           
static String padLeft(String str, int totalChars)
          Pads the given String to the left with spaces to ensure that it's at least totalChars long.
static String padOrTrim(Object obj, int totalChars)
          Pad or trim the toString value of the given Object.
static String padOrTrim(String str, int num)
          Pad or trim so as to produce a string of exactly a certain length.
static Map parseCommandLineArguments(String[] args)
          A simpler form of command line argument parsing.
static String pennPOSToWordnetPOS(String s)
          Computes the WordNet 2.0 POS tag corresponding to the PTB POS tag s.
static void printStringOneCharPerLine(String s)
           
static void printToFile(File file, String message)
          Prints to a file.
static void printToFile(File file, String message, boolean append)
          Prints to a file.
static void printToFile(String filename, String message)
          Prints to a file.
static void printToFile(String filename, String message, boolean append)
          Prints to a file.
static String slurpFile(File file)
          Returns all the text in the given File.
static String slurpFile(String filename)
          Returns all the text in the given file.
static String slurpFile(String filename, String encoding)
          Returns all the text in the given file with the given encoding.
static String slurpFileNoExceptions(File file)
          Returns all the text in the given File.
static String slurpFileNoExceptions(String filename)
          Returns all the text in the named file.
static String slurpFileNoExceptions(String filename, String encoding)
          Returns all the text in the given file with the given encoding.
static String slurpGBFile(String filename)
           
static String slurpGBFileNoExceptions(String filename)
           
static String slurpGBURL(URL u)
          Returns all the text at the given URL.
static String slurpGBURLNoExceptions(URL u)
          Returns all the text at the given URL.
static String slurpReader(Reader reader)
          Returns all the text from the given Reader.
static String slurpURL(String path)
          Returns all the text at the given URL.
static String slurpURL(URL u)
          Returns all the text at the given URL.
static String slurpURL(URL u, String encoding)
          Returns all the text at the given URL.
static String slurpURLNoExceptions(String path)
          Returns all the text at the given URL.
static String slurpURLNoExceptions(URL u)
          Returns all the text at the given URL.
static String slurpURLNoExceptions(URL u, String encoding)
          Returns all the text at the given URL.
static List split(String s)
          Splits on whitespace (the regular expression \s+).
static List split(String str, String regex)
          Splits the given string using the given regex as delimiters.
static String[] splitOnCharWithQuoting(String s, char splitChar, char quoteChar, char escapeChar)
          This function splits the String s into multiple Strings using the splitChar.
static Properties stringToProperties(String str)
          This method converts a comma-separated String (with whitespace optionally allowed after the comma) representing properties to a Properties object.
static String stripNonAlphaNumerics(String orig)
           
static String trim(Object obj, int maxWidth)
           
static String trim(String s, int maxWidth)
          Returns s if it is at most maxWidth chars long; otherwise truncates it on the right to fit.
static String truncate(int n, int smallestDigit, int biggestDigit)
          This returns a string of the decimal digits of n from digit smallestDigit to digit biggestDigit.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

find

public static boolean find(String str,
                           String regex)
Say whether this regular expression can be found inside this String. This method provides one of the two "missing" convenience methods for regular expressions in the String class in JDK1.4. If you are used to Perl, this is the one you will want to use most of the time.

Parameters:
str - String to search for match in
regex - String to compile as the regular expression
Returns:
Whether the regex can be found in str

lookingAt

public static boolean lookingAt(String str,
                                String regex)
Say whether this regular expression can be found at the beginning of this String. This method provides one of the two "missing" convenience methods for regular expressions in the String class in JDK1.4.

Parameters:
str - String to search for match at start of
regex - String to compile as the regular expression
Returns:
Whether the regex can be found at the start of str

matches

public static boolean matches(String str,
                              String regex)
Say whether this regular expression matches this String. This method is the same as the String.matches() method, and is included just to give a call that is parallel to the other static regex methods in this class.

Parameters:
str - String to match against
regex - String to compile as the regular expression
Returns:
Whether the regex matches the whole of str
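The difference between find, lookingAt and matches can be illustrated with a minimal sketch written directly against java.util.regex. This is not the library's source. The class name RegexSketch is hypothetical, and the sketch assumes that each helper compiles the pattern and then calls the Matcher method of the same name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the three static regex helpers.
class RegexSketch {
  // true if regex matches anywhere inside str (Perl-style search)
  static boolean find(String str, String regex) {
    return Pattern.compile(regex).matcher(str).find();
  }

  // true if regex matches a prefix of str
  static boolean lookingAt(String str, String regex) {
    return Pattern.compile(regex).matcher(str).lookingAt();
  }

  // true only if regex matches the whole of str (same as str.matches(regex))
  static boolean matches(String str, String regex) {
    return Pattern.compile(regex).matcher(str).matches();
  }
}
```

With str = "foobar", the regex o+b is found, but it does not match at the start. The regex fo+ matches at the start, but it does not match the whole string.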

slurpFile

public static String slurpFile(File file)
                        throws IOException
Returns all the text in the given File.

Throws:
IOException

slurpGBFileNoExceptions

public static String slurpGBFileNoExceptions(String filename)

slurpFile

public static String slurpFile(String filename,
                               String encoding)
                        throws IOException
Returns all the text in the given file with the given encoding.

Throws:
IOException

slurpFileNoExceptions

public static String slurpFileNoExceptions(String filename,
                                           String encoding)
Returns all the text in the given file with the given encoding. If the file cannot be read (for example, because it does not exist), the method returns null; this is the only case in which it returns null.


slurpGBFile

public static String slurpGBFile(String filename)
                          throws IOException
Throws:
IOException

slurpReader

public static String slurpReader(Reader reader)
Returns all the text from the given Reader.

Returns:
The text read from the Reader.
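Reading a Reader to exhaustion is a buffered loop that stops when read returns -1. The sketch below shows that loop. The class name is hypothetical. Two further choices are assumptions: the Reader is closed afterwards, and I/O errors are rethrown unchecked, because the documented signature declares no checked exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch of slurpReader; not the library's implementation.
class SlurpReaderSketch {
  static String slurpReader(Reader reader) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[8192];
    // Assumption: the reader is closed when done.
    try (Reader r = reader) {
      int n;
      // read() returns the number of chars read, or -1 at end of stream
      while ((n = r.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    } catch (IOException e) {
      // Assumption: checked I/O errors are wrapped, since no throws clause is documented.
      throw new UncheckedIOException(e);
    }
    return sb.toString();
  }
}
```

Because the sketch copies raw characters rather than lines, line terminators are preserved exactly.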

slurpFile

public static String slurpFile(String filename)
                        throws IOException
Returns all the text in the given file.

Returns:
The text in the file.
Throws:
IOException

slurpFileNoExceptions

public static String slurpFileNoExceptions(File file)
Returns all the text in the given File.

Returns:
The text in the file. This may be an empty string if the file is empty. If the file cannot be read (for example, because it does not exist), the method returns null; this is the only case in which it returns null.

slurpFileNoExceptions

public static String slurpFileNoExceptions(String filename)
Returns all the text in the named file.

Returns:
The text in the file. This may be an empty string if the file is empty. If the file cannot be read (for example, because it does not exist), the method returns null; this is the only case in which it returns null.
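The "NoExceptions" contract can be sketched as follows. The method returns the decoded text, "" for an empty file, and null when the file cannot be read. The class name is hypothetical. Using the platform default charset for the one-argument form, and treating a bad encoding name as "cannot be read", are assumptions rather than documented behaviour.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of the slurpFileNoExceptions overloads.
class SlurpFileSketch {
  // Whole file decoded with the given encoding, or null if it cannot be read.
  static String slurpFileNoExceptions(String filename, String encoding) {
    try {
      byte[] bytes = Files.readAllBytes(Paths.get(filename));
      return new String(bytes, Charset.forName(encoding));
    } catch (IOException | RuntimeException e) {
      // RuntimeException covers an invalid path or unknown charset name (assumption)
      return null;
    }
  }

  // Assumption: the one-argument form uses the platform default charset.
  static String slurpFileNoExceptions(String filename) {
    return slurpFileNoExceptions(filename, Charset.defaultCharset().name());
  }
}
```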

slurpGBURL

public static String slurpGBURL(URL u)
                         throws IOException
Returns all the text at the given URL.

Throws:
IOException

slurpGBURLNoExceptions

public static String slurpGBURLNoExceptions(URL u)
Returns all the text at the given URL.


slurpURLNoExceptions

public static String slurpURLNoExceptions(URL u,
                                          String encoding)
Returns all the text at the given URL.


slurpURL

public static String slurpURL(URL u,
                              String encoding)
                       throws IOException
Returns all the text at the given URL.

Throws:
IOException

slurpURL

public static String slurpURL(URL u)
                       throws IOException
Returns all the text at the given URL.

Throws:
IOException

slurpURLNoExceptions

public static String slurpURLNoExceptions(URL u)
Returns all the text at the given URL.


slurpURL

public static String slurpURL(String path)
                       throws Exception
Returns all the text at the given URL.

Throws:
Exception

slurpURLNoExceptions

public static String slurpURLNoExceptions(String path)
Returns all the text at the given URL. If the URL cannot be read (for example, because it does not exist), the method returns null; this is the only case in which it returns null.


join

public static String join(Iterable l,
                          String glue)
Joins each elem in the Iterable with the given glue. For example, given a list of Integers named numbers, you can create a comma-separated list by calling join(numbers, ", ").
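The joining behaviour described above fits in a few lines: write each element's string form and place the glue only between consecutive elements. This is a hypothetical sketch, not the library's code.

```java
import java.util.Iterator;

// Hypothetical sketch of join(Iterable, String).
class JoinSketch {
  // Appends String.valueOf(elem) for each element, with glue between
  // consecutive elements (none before the first or after the last).
  static String join(Iterable<?> l, String glue) {
    StringBuilder sb = new StringBuilder();
    Iterator<?> it = l.iterator();
    while (it.hasNext()) {
      sb.append(it.next());
      if (it.hasNext()) {
        sb.append(glue);
      }
    }
    return sb.toString();
  }
}
```

On Java 8 and later, String.join(glue, elements) does the same for elements that are already CharSequences. A list of Integers would first need converting to strings, which a helper taking arbitrary objects avoids.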


join

public static String join(List l,
                          String glue)
Joins each elem in the List with the given glue. For example, given a list of Integers named numbers, you can create a comma-separated list by calling join(numbers, ", ").


join

public static String join(Object[] elements,
                          String glue)
Joins each elem in the array with the given glue. For example, given an array of Integers named numbers, you can create a comma-separated list by calling join(numbers, ", ").


join

public static String join(List l)
Joins elems with a space.


join

public static String join(Object[] elements)
Joins elems with a space.


split

public static List split(String s)
Splits on whitespace (the regular expression \s+).


split

public static List split(String str,
                         String regex)
Splits the given string using the given regex as delimiters. This method is the same as the String.split() method (except that it returns the results as a List), and is included just to give a call that is parallel to the other static regex methods in this class.

Parameters:
str - String to split up
regex - String to compile as the regular expression
Returns:
List of Strings resulting from splitting on the regex
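Since the method is described as String.split() with the results put in a List, a sketch can simply wrap String.split. The class name is hypothetical. Wrapping with Arrays.asList, which gives a fixed-size List, is an assumption. The sketch inherits String.split's edge cases: trailing empty strings are dropped, and leading whitespace yields an initial empty string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two split overloads.
class SplitSketch {
  // Same as str.split(regex), returned as a (fixed-size) List.
  static List<String> split(String str, String regex) {
    return Arrays.asList(str.split(regex));
  }

  // Splits on runs of whitespace.
  static List<String> split(String s) {
    return split(s, "\\s+");
  }
}
```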

pad

public static String pad(String str,
                         int totalChars)
Returns a String at least totalChars characters long, made by padding the input String str with spaces. If str is already at least totalChars characters long, it is returned unchanged.


pad

public static String pad(Object obj,
                         int totalChars)
Pads the toString value of the given Object.


padOrTrim

public static String padOrTrim(String str,
                               int num)
Pad or trim so as to produce a string of exactly a certain length.

Parameters:
str - The String to be padded or truncated
num - The desired length

padOrTrim

public static String padOrTrim(Object obj,
                               int totalChars)
Pad or trim the toString value of the given Object.


padLeft

public static String padLeft(String str,
                             int totalChars)
Pads the given String to the left with spaces to ensure that it's at least totalChars long.
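The padding family can be sketched as below. pad and padLeft only ever grow a string, while padOrTrim produces exactly num characters. The class name is hypothetical. That pad adds its spaces on the right, and that padOrTrim truncates on the right, are assumptions: they are consistent with padLeft existing as a separate method, but the text above does not state them.

```java
// Hypothetical sketch of pad, padLeft and padOrTrim for String arguments.
class PadSketch {
  // Appends spaces until str is at least totalChars long (right-padding assumed).
  static String pad(String str, int totalChars) {
    StringBuilder sb = new StringBuilder(str);
    while (sb.length() < totalChars) {
      sb.append(' ');
    }
    return sb.toString();
  }

  // Prepends spaces until str is at least totalChars long.
  static String padLeft(String str, int totalChars) {
    StringBuilder sb = new StringBuilder();
    for (int i = str.length(); i < totalChars; i++) {
      sb.append(' ');
    }
    return sb.append(str).toString();
  }

  // Exactly num characters: truncated on the right if too long (assumed), else padded.
  static String padOrTrim(String str, int num) {
    return str.length() > num ? str.substring(0, num) : pad(str, num);
  }
}
```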


padLeft

public static String padLeft(Object obj,
                             int totalChars)

padLeft

public static String padLeft(int i,
                             int totalChars)

padLeft

public static String padLeft(double d,
                             int totalChars)

trim

public static String trim(String s,
                          int maxWidth)
Returns s if it is at most maxWidth chars long; otherwise truncates it on the right to fit.


trim

public static String trim(Object obj,
                          int maxWidth)

fileNameClean

public static String fileNameClean(String s)
Returns a "clean" version of the given filename in which spaces have been converted to dashes and all other non-alphanumeric chars to underscores.
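The cleaning rule can be sketched by taking the description literally: spaces become dashes, ASCII letters and digits are kept, and every other character becomes an underscore. The class name is hypothetical. Treating "alphanumeric" as ASCII-only, and turning '-' and '_' already in s into underscores, are assumptions, because the description does not say how those characters are handled.

```java
// Hypothetical sketch of fileNameClean, following the description literally.
class FileNameCleanSketch {
  static String fileNameClean(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
      if (c == ' ') {
        sb.append('-');                       // spaces -> dashes
      } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
        sb.append(c);                         // ASCII alphanumerics kept (assumption)
      } else {
        sb.append('_');                       // everything else -> underscore
      }
    }
    return sb.toString();
  }
}
```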


nthIndex

public static int nthIndex(String s,
                           char ch,
                           int n)
Returns the index of the nth occurrence of ch in s, or -1 if there are fewer than n occurrences of ch.
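A direct way to get the nth occurrence is to call String.indexOf repeatedly, starting each search one position after the previous hit. The class name is hypothetical. That n counts from 1 (so n = 1 means the first occurrence) is an assumption.

```java
// Hypothetical sketch of nthIndex; n is assumed to be 1-based.
class NthIndexSketch {
  static int nthIndex(String s, char ch, int n) {
    int index = -1;
    for (int i = 0; i < n; i++) {
      // search from just after the previous occurrence
      index = s.indexOf(ch, index + 1);
      if (index == -1) {
        return -1; // fewer than n occurrences
      }
    }
    return index; // also -1 when n <= 0
  }
}
```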


truncate

public static String truncate(int n,
                              int smallestDigit,
                              int biggestDigit)
This returns a string of the decimal digits of n from digit smallestDigit to digit biggestDigit. The least significant digit is numbered 1, and both limits are inclusive.
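The digit-extraction rule works as follows. First divide away the digits below smallestDigit, then emit the next biggestDigit - smallestDigit + 1 digits, least significant last. The sketch below is hypothetical. It assumes n is non-negative and that positions beyond n's length come out as '0'.

```java
// Hypothetical sketch of truncate(n, smallestDigit, biggestDigit).
class TruncateSketch {
  static String truncate(int n, int smallestDigit, int biggestDigit) {
    int numDigits = biggestDigit - smallestDigit + 1;
    char[] result = new char[numDigits];
    // drop the digits below position smallestDigit (position 1 = ones digit)
    for (int j = 1; j < smallestDigit; j++) {
      n /= 10;
    }
    // fill from the rightmost kept digit leftwards; missing digits become '0'
    for (int j = numDigits - 1; j >= 0; j--) {
      result[j] = Character.forDigit(n % 10, 10);
      n /= 10;
    }
    return new String(result);
  }
}
```

For example, digits 2 through 4 of 12345 are "234".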


argsToMap

public static Map<String,String[]> argsToMap(String[] args)
Parses command line arguments into a Map. Arguments of the form

-flag1 arg1a arg1b ... arg1m -flag2 -flag3 arg3a ... arg3n

will be parsed so that each flag is a key in the Map (including the hyphen) and its value is a String[] containing its arguments (if present). Non-flag values not captured as flag arguments are collected into a String[] array and stored in the Map as the value for the null key. In this overload, flags cannot take arguments, so all the String array values other than the value for null will be zero-length.

Parameters:
args - the argument array to be parsed
Returns:
a Map of flag names to flag argument String[] arrays.

argsToMap

public static Map<String,String[]> argsToMap(String[] args,
                                             Map<String,Integer> flagsToNumArgs)
Parses command line arguments into a Map. Arguments of the form

-flag1 arg1a arg1b ... arg1m -flag2 -flag3 arg3a ... arg3n

will be parsed so that each flag is a key in the Map (including the hyphen) and its value is a String[] containing its arguments (if present). Non-flag values not captured as flag arguments are collected into a String[] array and stored in the Map as the value for the null key. In this overload, the maximum number of arguments for each flag can be specified as the Integer value for that flag's key in the flagsToNumArgs Map. (By default, flags cannot take arguments.)

Example of usage:

Map<String,Integer> flagsToNumArgs = new HashMap<String,Integer>();
flagsToNumArgs.put("-x", 2);
flagsToNumArgs.put("-d", 1);
Map<String,String[]> result = argsToMap(args, flagsToNumArgs);

Parameters:
args - the argument array to be parsed
flagsToNumArgs - a Map of flag names to Integer values specifying the maximum number of allowed arguments for that flag (default 0).
Returns:
a Map of flag names to flag argument String[] arrays.
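The parsing rule above can be sketched as a single left-to-right scan. A flag consumes up to its allowed number of following tokens. Every leftover token goes into the array stored under the null key, which HashMap permits. The class name is hypothetical. Treating any token that starts with '-' as a flag, and stopping a flag's arguments early when the next token is itself a flag, are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of argsToMap(args, flagsToNumArgs).
class ArgsToMapSketch {
  static Map<String, String[]> argsToMap(String[] args, Map<String, Integer> flagsToNumArgs) {
    Map<String, String[]> result = new HashMap<>();
    List<String> remaining = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      String key = args[i];
      if (key.startsWith("-")) {
        Integer max = flagsToNumArgs.get(key);
        int maxArgs = (max == null) ? 0 : max; // default: no arguments
        List<String> flagArgs = new ArrayList<>();
        // consume up to maxArgs following tokens that are not themselves flags (assumption)
        while (flagArgs.size() < maxArgs && i + 1 < args.length && !args[i + 1].startsWith("-")) {
          flagArgs.add(args[++i]);
        }
        result.put(key, flagArgs.toArray(new String[0]));
      } else {
        remaining.add(key); // not captured by any flag
      }
    }
    result.put(null, remaining.toArray(new String[0]));
    return result;
  }
}
```

With -x allowed 2 arguments and -d allowed 1, the input -x a b c -d f g -q maps -x to [a, b], -d to [f], -q to an empty array, and null to [c, g].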

argsToProperties

public static Properties argsToProperties(String[] args)

argsToProperties

public static Properties argsToProperties(String[] args,
                                          Map flagsToNumArgs)
Analogous to argsToMap(java.lang.String[]). However, there are several key differences between this method and argsToMap(java.lang.String[]):

stringToProperties

public static Properties stringToProperties(String str)
This method converts a comma-separated String (with whitespace optionally allowed after the comma) representing properties to a Properties object. Each property has the form "property=value". The value for properties without an explicitly given value is set to "true".
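The format described above can be parsed in a few steps. Split on a comma followed by optional whitespace, then split each entry at its first '='. An entry with no '=' gets the value "true". The class name is hypothetical. Skipping empty entries and splitting at the first '=' only are assumptions.

```java
import java.util.Properties;

// Hypothetical sketch of stringToProperties.
class StringToPropertiesSketch {
  static Properties stringToProperties(String str) {
    Properties props = new Properties();
    for (String entry : str.split(",\\s*")) {
      if (entry.isEmpty()) {
        continue; // tolerate empty input or a trailing comma (assumption)
      }
      int eq = entry.indexOf('=');
      if (eq < 0) {
        props.setProperty(entry, "true"); // no explicit value
      } else {
        props.setProperty(entry.substring(0, eq), entry.substring(eq + 1));
      }
    }
    return props;
  }
}
```

For example, "a=1, b, c=x" yields a=1, b=true and c=x.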


printToFile

public static void printToFile(File file,
                               String message,
                               boolean append)
Prints message to the given file. If the file already exists, appends if append is true, and overwrites if append is false.
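The append/overwrite choice maps directly onto the boolean append parameter of java.io.FileOutputStream. The sketch below is hypothetical. Writing UTF-8, adding no trailing newline, and rethrowing errors unchecked are assumptions, since the documented signature declares no exception.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the printToFile(File, ...) overloads.
class PrintToFileSketch {
  // append=true adds to the end of an existing file; append=false replaces it.
  // Either way, a missing file is created.
  static void printToFile(File file, String message, boolean append) {
    try (Writer w = new OutputStreamWriter(new FileOutputStream(file, append), StandardCharsets.UTF_8)) {
      w.write(message); // no newline added (assumption)
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Two-argument form: overwrite, never append.
  static void printToFile(File file, String message) {
    printToFile(file, message, false);
  }
}
```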


printToFile

public static void printToFile(File file,
                               String message)
Prints message to the given file. If the file already exists, it is overwritten; this method does not append.


printToFile

public static void printToFile(String filename,
                               String message,
                               boolean append)
Prints message to the named file. If the file already exists, appends if append is true, and overwrites if append is false.


printToFile

public static void printToFile(String filename,
                               String message)
Prints message to the named file. If the file already exists, it is overwritten; this method does not append.


parseCommandLineArguments

public static Map parseCommandLineArguments(String[] args)
A simpler form of command line argument parsing. Dan considers this preferable to the more complex argument-parsing code that precedes it. Parses command line arguments into a Map. Arguments of the form -flag1 arg1 -flag2 -flag3 arg3 will be parsed so that each flag is a key in the Map (including the hyphen) and its optional argument, if present, is its value.

Parameters:
args - the argument array to be parsed
Returns:
A Map from flags to their values (a String, or null if the flag has no argument)
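In this simpler scheme, each flag takes at most one value. The value is the next token, provided that token is not itself a flag. The sketch below is hypothetical. Treating any token starting with '-' as a flag is an assumption, and it means a negative number such as -5 would be read as a flag rather than a value. Stray non-flag tokens are simply ignored here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parseCommandLineArguments.
class ParseArgsSketch {
  static Map<String, String> parseCommandLineArguments(String[] args) {
    Map<String, String> result = new HashMap<>();
    for (int i = 0; i < args.length; i++) {
      String key = args[i];
      if (key.startsWith("-")) {
        String value = null;
        // take the next token as the value only if it is not another flag
        if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
          value = args[++i];
        }
        result.put(key, value);
      }
      // non-flag tokens that do not follow a flag are ignored in this sketch
    }
    return result;
  }
}
```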

stripNonAlphaNumerics

public static String stripNonAlphaNumerics(String orig)

printStringOneCharPerLine

public static void printStringOneCharPerLine(String s)

escapeString

public static String escapeString(String s,
                                  char[] charsToEscape,
                                  char escapeChar)

splitOnCharWithQuoting

public static String[] splitOnCharWithQuoting(String s,
                                              char splitChar,
                                              char quoteChar,
                                              char escapeChar)
This function splits the String s into multiple Strings using the splitChar. However, it provides a quoting facility: it is possible to quote strings with the quoteChar. If the quoteChar occurs within a quoted expression, it must be prefaced by the escapeChar.

Parameters:
s - The String to split
splitChar - the character to split on
quoteChar - the character used to quote a section of s
escapeChar - the character that must precede a quoteChar occurring inside a quoted expression
Returns:
An array of Strings that s is split into
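Quote-aware splitting is a small state machine. A quoteChar toggles an "inside quotes" flag. A splitChar ends the current piece only when outside quotes. Inside quotes, escapeChar makes the next character literal. The class name is hypothetical. Three details are assumptions: the quote characters are dropped from the output, the escape is only honoured inside quotes, and an empty trailing piece is kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitOnCharWithQuoting.
class QuotedSplitSketch {
  static String[] splitOnCharWithQuoting(String s, char splitChar, char quoteChar, char escapeChar) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (inQuotes && c == escapeChar && i + 1 < s.length()) {
        current.append(s.charAt(++i));   // escaped char copied literally
      } else if (c == quoteChar) {
        inQuotes = !inQuotes;            // quote chars delimit, not copied (assumption)
      } else if (c == splitChar && !inQuotes) {
        parts.add(current.toString());   // end of one piece
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts.toArray(new String[0]);
  }
}
```

For example, splitting a,'b,c',d on ',' with quote character ' gives a, then b,c, then d.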

longestCommonSubstring

public static int longestCommonSubstring(String s,
                                         String t)
Computes the length of the longest common substring of s and t. As defined here, this is the longest run of characters that appear in order inside both s and t, where both s and t may have other extraneous characters along the way (that is, the longest common subsequence). This is like edit distance but with no substitution, and a higher number means more similar. For example, the value for "abcD" and "aXbc" is 3 (abc).
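The method is named longestCommonSubstring, but the description above allows extraneous characters and gives 3 for "abcD"/"aXbc", so it is really the longest common subsequence. The standard dynamic program for that quantity is sketched below. This is a hypothetical illustration of the textbook algorithm, not the library's source.

```java
// Hypothetical sketch: longest common subsequence length by dynamic programming.
class LcsSketch {
  // dp[i][j] = length of the LCS of s[0..i) and t[0..j)
  static int longestCommonSubstring(String s, String t) {
    int[][] dp = new int[s.length() + 1][t.length() + 1];
    for (int i = 1; i <= s.length(); i++) {
      for (int j = 1; j <= t.length(); j++) {
        if (s.charAt(i - 1) == t.charAt(j - 1)) {
          dp[i][j] = dp[i - 1][j - 1] + 1;                  // extend a common run
        } else {
          dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);  // skip a char of s or t
        }
      }
    }
    return dp[s.length()][t.length()];
  }
}
```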


editDistance

public static int editDistance(String s,
                               String t)
Computes the Levenshtein (edit) distance of the two given Strings.
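Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn s into t. The usual dynamic program needs only two rows of the table. The sketch below is hypothetical and shows the standard algorithm, not the library's code.

```java
// Hypothetical sketch: Levenshtein distance with two rolling rows.
class EditDistanceSketch {
  static int editDistance(String s, String t) {
    int[] prev = new int[t.length() + 1];
    int[] curr = new int[t.length() + 1];
    for (int j = 0; j <= t.length(); j++) {
      prev[j] = j; // "" -> first j chars of t: j insertions
    }
    for (int i = 1; i <= s.length(); i++) {
      curr[0] = i; // first i chars of s -> "": i deletions
      for (int j = 1; j <= t.length(); j++) {
        int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
        // deletion, insertion, or substitution/match
        curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[t.length()]; // after the last swap, prev holds the final row
  }
}
```

For example, "kitten" and "sitting" are at distance 3.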


pennPOSToWordnetPOS

public static String pennPOSToWordnetPOS(String s)
Computes the WordNet 2.0 POS tag corresponding to the PTB POS tag s.

Parameters:
s - a Penn Treebank POS tag.
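WordNet covers only nouns, verbs, adjectives and adverbs, so a Penn Treebank to WordNet mapping amounts to grouping tags by their prefix. The sketch below is hypothetical and not the library's mapping. The return labels ("noun", "verb", "adjective", "adverb") and the null for closed-class tags are assumptions, since the method's actual return values are not documented here.

```java
// Hypothetical sketch of a Penn Treebank -> WordNet POS mapping.
class PennToWordNetSketch {
  static String pennPOSToWordnetPOS(String s) {
    if (s.startsWith("NN")) return "noun";       // NN, NNS, NNP, NNPS
    if (s.startsWith("VB")) return "verb";       // VB, VBD, VBG, VBN, VBP, VBZ
    if (s.startsWith("JJ")) return "adjective";  // JJ, JJR, JJS
    if (s.startsWith("RB")) return "adverb";     // RB, RBR, RBS
    return null;                                 // closed-class tags such as DT, IN
  }
}
```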

capitalize

public static String capitalize(String s)
Uppercases the first character of a string.

Parameters:
s - a string to capitalize
Returns:
a capitalized version of the string
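Upper-casing only the first character leaves the rest of the string untouched. The sketch below is hypothetical. Returning null and "" unchanged is an assumption.

```java
// Hypothetical sketch of capitalize.
class CapitalizeSketch {
  static String capitalize(String s) {
    if (s == null || s.isEmpty()) {
      return s; // nothing to capitalize (assumption)
    }
    // char + String concatenates to a String
    return Character.toUpperCase(s.charAt(0)) + s.substring(1);
  }
}
```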

main

public static void main(String[] args)
                 throws IOException
Throws:
IOException


Stanford NLP Group