edu.stanford.nlp.stats
Class Counters<E>

java.lang.Object
  extended by edu.stanford.nlp.stats.Counters<E>

public class Counters<E>
extends Object

Static methods for operating on Counters.

Author:
Galen Andrew (galand@cs.stanford.edu), Jeff Michels (jmichels@stanford.edu)

Constructor Summary
Counters()
           
 
Method Summary
static
<E> Counter<E>
absoluteDifference(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns |c1 - c2|.
static
<E> Counter<E>
average(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns a new Counter with counts averaged from the two given Counters.
static
<E> double
cosine(GenericCounter<E> c1, GenericCounter<E> c2)
           
static
<E> Counter<E>
createCounterFromCollection(Collection<E> l)
           
static
<E> Counter<E>
createCounterFromList(List<E> l)
           
static
<E> double
crossEntropy(GenericCounter<E> from, Counter<E> to)
          Note that this implementation doesn't normalize the "from" Counter.
static
<E> double
crossEntropy(GenericCounter<E> from, GenericCounter<E> to)
          Note that this implementation doesn't normalize the "from" Counter.
static
<E> Counter<E>
division(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns c1 divided by c2.
static
<E> double
dotProduct(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns the dot product of c1 and c2.
static
<E> double
entropy(GenericCounter<E> c)
          Calculates the entropy of the given counter (in bits).
static
<E> Counter<Double>
getCountCounts(GenericCounter<E> c)
           
static
<E> void
incrementNonzero(Counter<E> c1, Counter<E> c2)
          Increments the count in c1 of every key for which c2 has a nonzero count (i.e., every key in c2's key set).
static
<E> Counter<E>
intersection(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns a counter that is the intersection of c1 and c2.
static
<E> double
jaccardCoefficient(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns the Jaccard Coefficient of the two counters.
static
<E> double
jensenShannonDivergence(GenericCounter<E> c1, GenericCounter<E> c2)
          Calculates the Jensen-Shannon divergence between the two counters.
static
<E> double
klDivergence(GenericCounter<E> from, GenericCounter<E> to)
          Calculates the KL divergence between the two counters.
static
<E> Counter<E>
L2Normalize(GenericCounter<E> c)
          L2 normalize a counter.
static
<E> Counter<E>
linearCombination(GenericCounter<E> c1, double w1, GenericCounter<E> c2, double w2)
          Returns a Counter which is a weighted average of c1 and c2.
static
<E> Counter<E>
loadCounter(String filename, Class c)
          Loads a Counter from a text file.
static IntCounter loadIntCounter(String filename, Class c)
          Loads an IntCounter from a text file.
static
<E> Counter<E>
perturbCounts(GenericCounter<E> c, Random random, double p)
           
static
<E> void
printCounterComparison(GenericCounter<E> a, GenericCounter<E> b)
          Great for debugging.
static
<E> void
printCounterComparison(GenericCounter<E> a, GenericCounter<E> b, PrintStream out)
          Great for debugging.
static
<E> void
printCounterSortedByKeys(GenericCounter<E> c)
           
static
<E> Counter<E>
product(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns the product of c1 and c2.
static
<E> void
saveCounter(GenericCounter<E> c, String filename)
          Saves a Counter to a text file.
static
<E> Counter<E>
scale(GenericCounter<E> c, double s)
          Scales each element in the Counter by the given scale factor.
static
<E> double
skewDivergence(GenericCounter<E> c1, GenericCounter<E> c2, double skew)
          Calculates the skew divergence between the two counters.
static List sortedKeys(Counter x)
           
static String toBiggestValuesFirstString(Counter c)
           
static String toBiggestValuesFirstString(Counter c, int k)
           
static
<E> PriorityQueue
toPriorityQueue(GenericCounter<E> c)
          Returns a PriorityQueue of the objects in c, where the score (count) of each object is its priority.
static
<E> List<E>
toSortedList(GenericCounter<E> c)
           
static
<E> Counter<E>
union(GenericCounter<E> c1, GenericCounter<E> c2)
          Returns a Counter that is the union of the two Counters passed in (counts are added).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Counters

public Counters()
Method Detail

union

public static <E> Counter<E> union(GenericCounter<E> c1,
                                   GenericCounter<E> c2)
Returns a Counter that is the union of the two Counters passed in (counts are added).
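
To illustrate the "counts are added" semantics, here is a minimal sketch that uses a plain java.util.Map<String, Double> in place of the library's Counter types. The class UnionSketch and its method are hypothetical and are not part of edu.stanford.nlp.stats.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class UnionSketch {
    // Adds the counts of both maps key by key; a key missing from one map contributes 0.
    static Map<String, Double> union(Map<String, Double> c1, Map<String, Double> c2) {
        Map<String, Double> result = new HashMap<>(c1);
        for (Map.Entry<String, Double> e : c2.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return result;
    }
}
```

With c1 = {x: 1, y: 2} and c2 = {y: 3, z: 4}, this sketch yields {x: 1, y: 5, z: 4}.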

Parameters:
c1 -
c2 -
Returns:

intersection

public static <E> Counter<E> intersection(GenericCounter<E> c1,
                                          GenericCounter<E> c2)
Returns a counter that is the intersection of c1 and c2. If both c1 and c2 contain a key, the min of the two counts is used.
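
The min-of-counts rule can be sketched as follows, again with a plain java.util.Map<String, Double> standing in for a Counter. IntersectionSketch is a hypothetical name, and dropping keys that appear in only one input is an assumption based on the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class IntersectionSketch {
    // Keeps only keys present in both maps, each with the smaller of its two counts.
    static Map<String, Double> intersection(Map<String, Double> c1, Map<String, Double> c2) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : c1.entrySet()) {
            Double other = c2.get(e.getKey());
            if (other != null) {
                result.put(e.getKey(), Math.min(e.getValue(), other));
            }
        }
        return result;
    }
}
```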

Parameters:
c1 -
c2 -
Returns:

jaccardCoefficient

public static <E> double jaccardCoefficient(GenericCounter<E> c1,
                                            GenericCounter<E> c2)
Returns the Jaccard Coefficient of the two counters. Calculated as |c1 intersect c2| / ( |c1| + |c2| - |c1 intersect c2| ).
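
A worked sketch of the formula, under two assumptions that the description does not state: |c| is taken to be the total count of c, and the intersection uses the min of the two counts per shared key. JaccardSketch is a hypothetical class over plain maps, not the library's implementation.

```java
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class JaccardSketch {
    // Sum of all counts, used here as the size |c| (an assumption).
    static double total(Map<String, Double> c) {
        double sum = 0.0;
        for (double v : c.values()) {
            sum += v;
        }
        return sum;
    }

    // |c1 intersect c2| / (|c1| + |c2| - |c1 intersect c2|)
    static double jaccard(Map<String, Double> c1, Map<String, Double> c2) {
        double shared = 0.0;
        for (Map.Entry<String, Double> e : c1.entrySet()) {
            Double other = c2.get(e.getKey());
            if (other != null) {
                shared += Math.min(e.getValue(), other);
            }
        }
        return shared / (total(c1) + total(c2) - shared);
    }
}
```

For c1 = {x: 1, y: 2} and c2 = {y: 3, z: 4}, the shared mass is 2, so the sketch returns 2 / (3 + 7 - 2) = 0.25.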

Parameters:
c1 -
c2 -
Returns:

product

public static <E> Counter<E> product(GenericCounter<E> c1,
                                     GenericCounter<E> c2)
Returns the product of c1 and c2.

Parameters:
c1 -
c2 -
Returns:

dotProduct

public static <E> double dotProduct(GenericCounter<E> c1,
                                    GenericCounter<E> c2)
Returns the dot product of c1 and c2.
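
Unlike product, which returns a Counter, dotProduct returns a single double. A minimal sketch of a dot product over plain maps follows; DotProductSketch is a hypothetical name, and treating a missing key as count 0 is an assumption.

```java
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class DotProductSketch {
    // Sums c1(k) * c2(k) over all keys; keys missing from c2 contribute 0.
    static double dotProduct(Map<String, Double> c1, Map<String, Double> c2) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : c1.entrySet()) {
            sum += e.getValue() * c2.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }
}
```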

Parameters:
c1 -
c2 -
Returns:

absoluteDifference

public static <E> Counter<E> absoluteDifference(GenericCounter<E> c1,
                                                GenericCounter<E> c2)
Returns |c1 - c2|.

Parameters:
c1 -
c2 -
Returns:

division

public static <E> Counter<E> division(GenericCounter<E> c1,
                                      GenericCounter<E> c2)
Returns c1 divided by c2. Note that this can create NaN if c1 has non-zero counts for keys for which c2 has zero counts.

Parameters:
c1 -
c2 -
Returns:

entropy

public static <E> double entropy(GenericCounter<E> c)
Calculates the entropy of the given counter (in bits). This method internally uses normalized counts (so they sum to one), but the value returned is meaningless if some of the counts are negative.
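
The computation described here (normalize the counts, then sum -p log2 p) can be sketched over a plain map. EntropySketch is a hypothetical class, and skipping zero counts is an assumption, following the usual convention that 0 log 0 = 0.

```java
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class EntropySketch {
    // Entropy in bits of the distribution obtained by normalizing the counts.
    static double entropy(Map<String, Double> c) {
        double total = 0.0;
        for (double v : c.values()) {
            total += v;
        }
        double bits = 0.0;
        for (double v : c.values()) {
            if (v == 0.0) {
                continue; // 0 * log 0 is taken as 0
            }
            double p = v / total;
            bits -= p * Math.log(p) / Math.log(2.0);
        }
        return bits;
    }
}
```

Two equally likely keys give about 1 bit; four equally likely keys give about 2 bits.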

Returns:
The entropy of the given counter (in bits)

crossEntropy

public static <E> double crossEntropy(GenericCounter<E> from,
                                      GenericCounter<E> to)
Note that this implementation doesn't normalize the "from" Counter. It does, however, normalize the "to" Counter. Result is meaningless if any of the counts are negative.

Returns:

crossEntropy

public static <E> double crossEntropy(GenericCounter<E> from,
                                      Counter<E> to)
Note that this implementation doesn't normalize the "from" Counter. Result is meaningless if any of the counts are negative.

Returns:

klDivergence

public static <E> double klDivergence(GenericCounter<E> from,
                                      GenericCounter<E> to)
Calculates the KL divergence between the two counters. That is, it calculates KL(from || to). In other words, it measures how well from can be represented by to. This method internally uses normalized counts (so they sum to one), but the value returned is meaningless if any of the counts are negative. If some key with a nonzero count in from has zero probability in to, the method returns positive infinity.
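
A minimal sketch of KL(from || to) over plain maps, including the positive-infinity case. KlSketch is a hypothetical class, and the use of log base 2 is an assumption (the description does not state the unit).

```java
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class KlSketch {
    static double total(Map<String, Double> c) {
        double sum = 0.0;
        for (double v : c.values()) {
            sum += v;
        }
        return sum;
    }

    // KL(from || to) on normalized counts; log base 2 is an assumption.
    static double klDivergence(Map<String, Double> from, Map<String, Double> to) {
        double fromTotal = total(from);
        double toTotal = total(to);
        double result = 0.0;
        for (Map.Entry<String, Double> e : from.entrySet()) {
            double p = e.getValue() / fromTotal;
            if (p == 0.0) {
                continue; // zero-probability keys contribute nothing
            }
            double q = to.getOrDefault(e.getKey(), 0.0) / toTotal;
            if (q == 0.0) {
                return Double.POSITIVE_INFINITY; // from has mass where to has none
            }
            result += p * Math.log(p / q) / Math.log(2.0);
        }
        return result;
    }
}
```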

Parameters:
from -
to -
Returns:
The KL divergence between the distributions

jensenShannonDivergence

public static <E> double jensenShannonDivergence(GenericCounter<E> c1,
                                                 GenericCounter<E> c2)
Calculates the Jensen-Shannon divergence between the two counters. That is, it calculates 1/2 [KL(c1 || avg(c1,c2)) + KL(c2 || avg(c1,c2))] .
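
The formula can be sketched over plain maps as below. JensenShannonSketch is a hypothetical class; normalizing both inputs before averaging and using log base 2 are assumptions, not details confirmed by this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class JensenShannonSketch {
    // Returns a copy whose values sum to one.
    static Map<String, Double> normalize(Map<String, Double> c) {
        double total = 0.0;
        for (double v : c.values()) {
            total += v;
        }
        Map<String, Double> p = new HashMap<>();
        for (Map.Entry<String, Double> e : c.entrySet()) {
            p.put(e.getKey(), e.getValue() / total);
        }
        return p;
    }

    // KL(p || q) in bits for maps that are already normalized.
    static double kl(Map<String, Double> p, Map<String, Double> q) {
        double result = 0.0;
        for (Map.Entry<String, Double> e : p.entrySet()) {
            double pi = e.getValue();
            if (pi == 0.0) {
                continue;
            }
            double qi = q.getOrDefault(e.getKey(), 0.0);
            if (qi == 0.0) {
                return Double.POSITIVE_INFINITY;
            }
            result += pi * Math.log(pi / qi) / Math.log(2.0);
        }
        return result;
    }

    // 1/2 [KL(c1 || avg) + KL(c2 || avg)], with avg the mean of the two distributions.
    static double jensenShannon(Map<String, Double> c1, Map<String, Double> c2) {
        Map<String, Double> p = normalize(c1);
        Map<String, Double> q = normalize(c2);
        Map<String, Double> avg = new HashMap<>();
        for (Map.Entry<String, Double> e : p.entrySet()) {
            avg.merge(e.getKey(), 0.5 * e.getValue(), Double::sum);
        }
        for (Map.Entry<String, Double> e : q.entrySet()) {
            avg.merge(e.getKey(), 0.5 * e.getValue(), Double::sum);
        }
        return 0.5 * (kl(p, avg) + kl(q, avg));
    }
}
```

Because avg has mass wherever either input does, this sketch stays finite even when the inputs share no keys; two disjoint single-key inputs give 1 bit.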

Parameters:
c1 -
c2 -
Returns:
The Jensen-Shannon divergence between the distributions

skewDivergence

public static <E> double skewDivergence(GenericCounter<E> c1,
                                        GenericCounter<E> c2,
                                        double skew)
Calculates the skew divergence between the two counters. That is, it calculates KL(c1 || (c2*skew + c1*(1-skew))) . In other words, how well can c1 be represented by a "smoothed" c2.
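
A minimal sketch of the smoothing idea over plain maps: each probability of c2 is mixed with the matching probability of c1 before the KL term is taken. SkewSketch is a hypothetical class; normalizing both inputs first and using log base 2 are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class SkewSketch {
    // Returns a copy whose values sum to one.
    static Map<String, Double> normalize(Map<String, Double> c) {
        double total = 0.0;
        for (double v : c.values()) {
            total += v;
        }
        Map<String, Double> p = new HashMap<>();
        for (Map.Entry<String, Double> e : c.entrySet()) {
            p.put(e.getKey(), e.getValue() / total);
        }
        return p;
    }

    // KL(c1 || (c2*skew + c1*(1-skew))) on normalized counts, in bits.
    static double skewDivergence(Map<String, Double> c1, Map<String, Double> c2, double skew) {
        Map<String, Double> p = normalize(c1);
        Map<String, Double> q = normalize(c2);
        double result = 0.0;
        for (Map.Entry<String, Double> e : p.entrySet()) {
            double pi = e.getValue();
            if (pi == 0.0) {
                continue;
            }
            double mixed = skew * q.getOrDefault(e.getKey(), 0.0) + (1.0 - skew) * pi;
            if (mixed == 0.0) {
                return Double.POSITIVE_INFINITY; // only possible when skew is 1
            }
            result += pi * Math.log(pi / mixed) / Math.log(2.0);
        }
        return result;
    }
}
```

In this sketch, skew = 0 compares c1 with itself (result 0), while skew = 1 reduces to the plain KL divergence of c1 from c2.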

Parameters:
c1 -
c2 -
skew -
Returns:
The skew divergence between the distributions

L2Normalize

public static <E> Counter<E> L2Normalize(GenericCounter<E> c)
L2 normalize a counter.
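
L2 normalization divides every count by the Euclidean norm (the square root of the sum of squared counts), so the result has unit length. The sketch below shows this over a plain map; L2Sketch is a hypothetical class, and returning a new map rather than modifying the input is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class L2Sketch {
    // Divides each count by sqrt(sum of squared counts).
    static Map<String, Double> l2Normalize(Map<String, Double> c) {
        double sumOfSquares = 0.0;
        for (double v : c.values()) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : c.entrySet()) {
            result.put(e.getKey(), e.getValue() / norm);
        }
        return result;
    }
}
```

For {x: 3, y: 4} the norm is 5, so the sketch yields {x: 0.6, y: 0.8}.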

Parameters:
c - the GenericCounter to be L2 normalized.

cosine

public static <E> double cosine(GenericCounter<E> c1,
                                GenericCounter<E> c2)

average

public static <E> Counter<E> average(GenericCounter<E> c1,
                                     GenericCounter<E> c2)
Returns a new Counter with counts averaged from the two given Counters. The average Counter will contain the union of keys in both source Counters, and each count will be the average of the two source counts for that key, where as usual a missing count in one Counter is treated as count 0.
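
The missing-count-is-zero rule can be sketched over plain maps by adding half of each count into the result. AverageSketch is a hypothetical class, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class AverageSketch {
    // Mean of the two counts per key over the union of keys; a missing count is 0.
    static Map<String, Double> average(Map<String, Double> c1, Map<String, Double> c2) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : c1.entrySet()) {
            result.merge(e.getKey(), 0.5 * e.getValue(), Double::sum);
        }
        for (Map.Entry<String, Double> e : c2.entrySet()) {
            result.merge(e.getKey(), 0.5 * e.getValue(), Double::sum);
        }
        return result;
    }
}
```

With c1 = {x: 2} and c2 = {x: 4, y: 6}, the sketch yields {x: 3, y: 3}.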

Returns:
A new counter with counts that are the mean of the respective counts in the given counters.

linearCombination

public static <E> Counter<E> linearCombination(GenericCounter<E> c1,
                                               double w1,
                                               GenericCounter<E> c2,
                                               double w2)
Returns a Counter which is a weighted average of c1 and c2. Counts from c1 are weighted with weight w1 and counts from c2 are weighted with w2.
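
Read literally, the result for each key is w1 * c1(key) + w2 * c2(key). A minimal sketch over plain maps follows; LinearCombinationSketch is a hypothetical class, and treating a missing key as count 0 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<String, Double> plays the role of a Counter.
final class LinearCombinationSketch {
    // Computes w1 * c1(k) + w2 * c2(k) for every key k in either map.
    static Map<String, Double> linearCombination(Map<String, Double> c1, double w1,
                                                 Map<String, Double> c2, double w2) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : c1.entrySet()) {
            result.merge(e.getKey(), w1 * e.getValue(), Double::sum);
        }
        for (Map.Entry<String, Double> e : c2.entrySet()) {
            result.merge(e.getKey(), w2 * e.getValue(), Double::sum);
        }
        return result;
    }
}
```

In this sketch, weights of 0.5 and 0.5 reproduce the behavior described for average.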


perturbCounts

public static <E> Counter<E> perturbCounts(GenericCounter<E> c,
                                           Random random,
                                           double p)

createCounterFromList

public static <E> Counter<E> createCounterFromList(List<E> l)

createCounterFromCollection

public static <E> Counter<E> createCounterFromCollection(Collection<E> l)

toSortedList

public static <E> List<E> toSortedList(GenericCounter<E> c)

toPriorityQueue

public static <E> PriorityQueue toPriorityQueue(GenericCounter<E> c)
Returns a PriorityQueue of the objects in c, where the score (count) of each object is its priority.


printCounterComparison

public static <E> void printCounterComparison(GenericCounter<E> a,
                                              GenericCounter<E> b)
Great for debugging.

Parameters:
a -
b -

printCounterComparison

public static <E> void printCounterComparison(GenericCounter<E> a,
                                              GenericCounter<E> b,
                                              PrintStream out)
Great for debugging.

Parameters:
a -
b -

getCountCounts

public static <E> Counter<Double> getCountCounts(GenericCounter<E> c)

scale

public static <E> Counter<E> scale(GenericCounter<E> c,
                                   double s)
Scales each element in the Counter by the given scale factor.


printCounterSortedByKeys

public static <E> void printCounterSortedByKeys(GenericCounter<E> c)

loadCounter

public static <E> Counter<E> loadCounter(String filename,
                                         Class c)
                              throws Exception
Loads a Counter from a text file. File must have the format of one key/count pair per line, separated by whitespace.
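
The file format and the String-constructor requirement can be illustrated with a small parser. This is a sketch under stated assumptions, not the library's code: LoadCounterSketch and parse are hypothetical names, the input is read from a BufferedReader rather than a filename, blank lines are skipped, and the result is a plain Map.

```java
import java.io.BufferedReader;
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map<E, Double> plays the role of a Counter<E>.
final class LoadCounterSketch {
    // Reads one "key count" pair per line, building each key via keyClass's String constructor.
    static <E> Map<E, Double> parse(BufferedReader in, Class<E> keyClass) throws Exception {
        // Fails with NoSuchMethodException if keyClass has no public (String) constructor.
        Constructor<E> fromString = keyClass.getConstructor(String.class);
        Map<E, Double> counts = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skipping blank lines is an assumption
            }
            String[] parts = line.split("\\s+"); // key and count separated by whitespace
            counts.put(fromString.newInstance(parts[0]), Double.parseDouble(parts[1]));
        }
        return counts;
    }
}
```

Because the key and count are split on whitespace, this sketch cannot represent keys that themselves contain spaces.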

Parameters:
filename - the path to the file to load the Counter from
c - the Class to instantiate each member of the set. Must have a String constructor.
Returns:
Throws:
Exception

loadIntCounter

public static IntCounter loadIntCounter(String filename,
                                        Class c)
                                 throws Exception
Loads an IntCounter from a text file. File must have the format of one key/count pair per line, separated by whitespace.

Parameters:
filename - the path to the file to load the Counter from
c - the Class to instantiate each member of the set. Must have a String constructor.
Returns:
Throws:
Exception

saveCounter

public static <E> void saveCounter(GenericCounter<E> c,
                                   String filename)
                        throws IOException
Saves a Counter to a text file. Counter written as one key/count pair per line, separated by whitespace.

Parameters:
c -
filename -
Throws:
IOException

incrementNonzero

public static <E> void incrementNonzero(Counter<E> c1,
                                        Counter<E> c2)
Increments the count in c1 of every key for which c2 has a nonzero count (i.e., every key in c2's key set).


sortedKeys

public static List sortedKeys(Counter x)

toBiggestValuesFirstString

public static String toBiggestValuesFirstString(Counter c)

toBiggestValuesFirstString

public static String toBiggestValuesFirstString(Counter c,
                                                int k)


Stanford NLP Group