edu.stanford.nlp.stats
Class Distribution<E>

java.lang.Object
  extended by edu.stanford.nlp.stats.Distribution<E>
All Implemented Interfaces:
Serializable

public class Distribution<E>
extends Object
implements Serializable

Immutable class representing normalized, smoothed discrete distributions built from Counters. Smoothed distributions reserve probability mass for unseen items, so a query for the probability of an unseen item returns a small positive value. totalCount() should always return 1.

The Counter passed to a constructor is copied.
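
To illustrate how reserved mass can give unseen items a small positive probability, here is a minimal sketch in plain Java. It is not the Stanford implementation: the class ReservedMassSketch, its fields, and the choice to split the reserved mass evenly over the unseen keys are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a smoothed distribution; not the Stanford class. */
final class ReservedMassSketch {
    private final Map<String, Double> probs = new HashMap<>(); // seen keys only
    private final int numberOfKeys;    // seen plus unseen keys
    private final double reservedMass; // mass held back for unseen keys

    ReservedMassSketch(Map<String, Double> counts, int numberOfKeys, double reservedMass) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("total count must be positive");
        }
        // Copy while normalizing, so later changes to counts cannot affect this object.
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / total * (1.0 - reservedMass));
        }
        this.numberOfKeys = numberOfKeys;
        this.reservedMass = reservedMass;
    }

    /** Seen keys return their scaled share; unseen keys split the reserved mass evenly. */
    double probabilityOf(String key) {
        Double p = probs.get(key);
        if (p != null) {
            return p;
        }
        int unseen = numberOfKeys - probs.size();
        return unseen <= 0 ? 0.0 : reservedMass / unseen;
    }
}
```

With counts {a=3, b=1}, four keys in total, and a reserved mass of 0.2, this sketch assigns 0.6 to a, 0.2 to b, and 0.1 to each of the two unseen keys, so the probabilities sum to 1.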

Author:
Galen Andrew (galand@cs.stanford.edu)
See Also:
Serialized Form

Field Summary
protected  Counter<E> counter
           
 
Method Summary
static
<E> Distribution<E>
absolutelyDiscountedDistribution(GenericCounter<E> counter, int numberOfKeys, double discount)
           
 void addToKeySet(E o)
          Ensures that the object is in the key set (possibly with a zero value).
 E argmax()
           
 boolean containsKey(E key)
           
static
<E> Distribution<E>
distributionFromLogisticCounter(GenericCounter<E> cntr)
          Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class.
static
<E> Distribution<E>
distributionWithDirichletPrior(GenericCounter<E> c, Distribution<E> prior, double weight)
          Returns a Distribution that uses prior as a Dirichlet prior weighted by weight.
static
<E> Distribution<E>
dynamicCounterWithDirichletPrior(GenericCounter<E> c, Distribution<E> prior, double weight)
          Like distributionWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front.
 boolean equals(Distribution distribution)
           
 boolean equals(Object o)
           
 double getCount(E key)
          Returns the current count for the given key, which is 0 if it hasn't been seen before.
 Counter<E> getCounter()
           
static
<E> Distribution<E>
getDistribution(GenericCounter<E> counter)
          Creates a Distribution from the given counter, i.e., makes an internal copy of the counter and divides all counts by the total count.
static
<E> Distribution<E>
getDistributionFromLogValues(GenericCounter<E> counter)
          Creates a Distribution from a counter of log values, i.e., makes an internal copy of the counter, exponentiates each value, and divides by the resulting total.
static
<E> Distribution<E>
getDistributionFromPartiallySpecifiedCounter(Counter<E> c, int numKeys)
          Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities.
static
<E> Distribution<E>
getDistributionWithReservedMass(GenericCounter<E> counter, double reservedMass)
           
 int getNumberOfKeys()
           
static
<E> Distribution<E>
getPerturbedDistribution(GenericCounter<E> wordCounter, Random r)
           
static
<E> Distribution<E>
getPerturbedUniformDistribution(Set<E> s, Random r)
           
 double getReservedMass()
           
static
<E> Distribution<E>
getUniformDistribution(Set<E> s)
           
static
<E> Distribution<E>
goodTuringSmoothedCounter(GenericCounter<E> counter, int numberOfKeys)
          Creates a Good-Turing smoothed Distribution from the given counter.
static
<E> Distribution<E>
goodTuringWithExplicitUnknown(GenericCounter<E> counter, E UNK)
          Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the count of the special object UNK in the counter is taken to be the count of "UNSEEN" items.
 int hashCode()
           
 Set<E> keySet()
           
static
<E> Distribution<E>
laplaceSmoothedDistribution(GenericCounter<E> counter, int numberOfKeys)
          Creates a Laplace smoothed Distribution from the given counter, i.e., adds one count to every item, including unseen ones, and divides by the total count.
static
<E> Distribution<E>
laplaceSmoothedDistribution(GenericCounter<E> counter, int numberOfKeys, double lambda)
          Creates a smoothed Distribution using Lidstone's law, i.e., adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count.
static
<E> Distribution<E>
laplaceWithExplicitUnknown(GenericCounter<E> counter, double lambda, E UNK)
          Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items.
static void main(String[] args)
          For internal testing purposes only.
 double probabilityOf(E key)
          Returns the normalized count of the given object.
 E sampleFrom()
          Returns an object sampled from the distribution.
 String toString()
           
 String toString(NumberFormat nf)
           
 double totalCount()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

counter

protected Counter<E> counter
Method Detail

getCounter

public Counter<E> getCounter()

toString

public String toString(NumberFormat nf)

getReservedMass

public double getReservedMass()

getNumberOfKeys

public int getNumberOfKeys()

keySet

public Set<E> keySet()

containsKey

public boolean containsKey(E key)

getCount

public double getCount(E key)
Returns the current count for the given key, which is 0 if it hasn't been seen before. This is a convenient version of get that casts and extracts the primitive value.


getDistributionFromPartiallySpecifiedCounter

public static <E> Distribution<E> getDistributionFromPartiallySpecifiedCounter(Counter<E> c,
                                                                               int numKeys)
Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities. If c has a total count > 1, returns a normalized distribution with no remaining mass.
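
The two cases described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library's code: PartialSpecSketch and its methods are hypothetical names, and a plain Map stands in for the Counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the Stanford implementation. */
final class PartialSpecSketch {
    /** Probabilities of the specified keys: used as given if the total is at most 1, else normalized. */
    static <E> Map<E, Double> probabilities(Map<E, Double> c) {
        double total = total(c);
        double scale = total > 1.0 ? total : 1.0;
        Map<E, Double> probs = new HashMap<>();
        for (Map.Entry<E, Double> e : c.entrySet()) {
            probs.put(e.getKey(), e.getValue() / scale);
        }
        return probs;
    }

    /** Mass left over for the keys that c does not specify; zero once the total reaches 1. */
    static <E> double remainingMass(Map<E, Double> c) {
        return Math.max(0.0, 1.0 - total(c));
    }

    private static <E> double total(Map<E, Double> c) {
        double sum = 0.0;
        for (double v : c.values()) {
            sum += v;
        }
        return sum;
    }
}
```

For c = {a=0.5, b=0.3} the specified probabilities are kept and about 0.2 of the mass remains for the other keys; for c = {a=3, b=1} the result is normalized to {a=0.75, b=0.25} with no remaining mass.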


getUniformDistribution

public static <E> Distribution<E> getUniformDistribution(Set<E> s)
Parameters:
s - a Set of keys.
Returns:
a Distribution that assigns equal probability to each key in s

getPerturbedUniformDistribution

public static <E> Distribution<E> getPerturbedUniformDistribution(Set<E> s,
                                                                  Random r)
Parameters:
s - a Set of keys.
r - the Random used to perturb the probabilities.
Returns:
a new Distribution over the keys in s

getPerturbedDistribution

public static <E> Distribution<E> getPerturbedDistribution(GenericCounter<E> wordCounter,
                                                           Random r)

getDistribution

public static <E> Distribution<E> getDistribution(GenericCounter<E> counter)
Creates a Distribution from the given counter, i.e., makes an internal copy of the counter and divides all counts by the total count.

Parameters:
counter - the counter to normalize
Returns:
a new Distribution

getDistributionWithReservedMass

public static <E> Distribution<E> getDistributionWithReservedMass(GenericCounter<E> counter,
                                                                  double reservedMass)

getDistributionFromLogValues

public static <E> Distribution<E> getDistributionFromLogValues(GenericCounter<E> counter)
Creates a Distribution from a counter of log values, i.e., makes an internal copy of the counter, exponentiates each value, and divides by the resulting total.

Parameters:
counter - the counter of log values to normalize
Returns:
a new Distribution

absolutelyDiscountedDistribution

public static <E> Distribution<E> absolutelyDiscountedDistribution(GenericCounter<E> counter,
                                                                   int numberOfKeys,
                                                                   double discount)

laplaceSmoothedDistribution

public static <E> Distribution<E> laplaceSmoothedDistribution(GenericCounter<E> counter,
                                                              int numberOfKeys)
Creates a Laplace smoothed Distribution from the given counter, i.e., adds one count to every item, including unseen ones, and divides by the total count.

Parameters:
counter - the counter to smooth
numberOfKeys - the total number of keys, including unseen ones
Returns:
a new add-1 smoothed Distribution

laplaceSmoothedDistribution

public static <E> Distribution<E> laplaceSmoothedDistribution(GenericCounter<E> counter,
                                                              int numberOfKeys,
                                                              double lambda)
Creates a smoothed Distribution using Lidstone's law, i.e., adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count.

Parameters:
counter - the counter to smooth
numberOfKeys - the total number of keys, including unseen ones
lambda - the value to add to each count
Returns:
a new Lidstone smoothed Distribution
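
Lidstone's law as described above can be written as p(x) = (count(x) + lambda) / (total + lambda * numberOfKeys), which reduces to add-one (Laplace) smoothing when lambda is 1. The sketch below shows that formula in plain Java; LidstoneSketch is a hypothetical name and a plain Map stands in for the counter, so it should not be read as the library's implementation.

```java
import java.util.Map;

/** Hypothetical illustration of Lidstone smoothing; not the Stanford implementation. */
final class LidstoneSketch {
    /**
     * Smoothed probability of key, where numberOfKeys counts every possible key,
     * seen or unseen. Keys missing from counts are treated as having count 0.
     */
    static <E> double probability(Map<E, Double> counts, int numberOfKeys, double lambda, E key) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        double count = counts.getOrDefault(key, 0.0);
        return (count + lambda) / (total + lambda * numberOfKeys);
    }
}
```

With counts {a=3, b=1}, numberOfKeys = 4, and lambda = 1, the sketch gives 4/8 for a, 2/8 for b, and 1/8 for each of the two unseen keys, which sums to 1.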

laplaceWithExplicitUnknown

public static <E> Distribution<E> laplaceWithExplicitUnknown(GenericCounter<E> counter,
                                                             double lambda,
                                                             E UNK)
Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items. Thus anything not in the original counter will have probability zero.

Parameters:
counter - the counter to normalize
lambda - the value to add to each count
UNK - the UNKNOWN symbol
Returns:
a new Laplace-smoothed distribution

goodTuringSmoothedCounter

public static <E> Distribution<E> goodTuringSmoothedCounter(GenericCounter<E> counter,
                                                            int numberOfKeys)
Creates a Good-Turing smoothed Distribution from the given counter.

Parameters:
counter - the counter to smooth
numberOfKeys - the total number of keys, including unseen ones
Returns:
a new Good-Turing smoothed Distribution.

goodTuringWithExplicitUnknown

public static <E> Distribution<E> goodTuringWithExplicitUnknown(GenericCounter<E> counter,
                                                                E UNK)
Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the count of the special object UNK in the counter is taken to be the count of "UNSEEN" items. The probability of objects not in the original counter will be zero.

Parameters:
counter - the counter
UNK - the unknown symbol
Returns:
a Good-Turing smoothed distribution

distributionWithDirichletPrior

public static <E> Distribution<E> distributionWithDirichletPrior(GenericCounter<E> c,
                                                                 Distribution<E> prior,
                                                                 double weight)
Returns a Distribution that uses prior as a Dirichlet prior weighted by weight. Essentially adds "pseudo-counts" for each Object in prior equal to that Object's mass in prior times weight, then normalizes.

WARNING: If an unseen item is encountered in c, the total may not be 1. NOTE: This will not work if prior is a DynamicDistribution. To fix this, you could add a CounterView to Distribution and use that in the linearCombination call in this method's implementation.

Parameters:
c - the counter of observed counts
prior - the Distribution to use as the prior
weight - multiplier of prior to get "pseudo-count"
Returns:
new Distribution
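
The pseudo-count idea described above amounts to p(x) = (c(x) + weight * prior(x)) / (total(c) + weight) when the prior sums to 1. Here is a minimal sketch in plain Java; DirichletPriorSketch is a hypothetical name, and the prior is modeled as a plain Map of probabilities rather than a Distribution, so this is an illustration and not the library's code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of pseudo-count smoothing; not the Stanford implementation. */
final class DirichletPriorSketch {
    static <E> Map<E, Double> withPrior(Map<E, Double> counts, Map<E, Double> prior, double weight) {
        // Start from a copy of the observed counts.
        Map<E, Double> merged = new HashMap<>(counts);
        // Add weight * prior(x) as a pseudo-count for every key in the prior.
        for (Map.Entry<E, Double> e : prior.entrySet()) {
            merged.merge(e.getKey(), weight * e.getValue(), Double::sum);
        }
        double total = 0.0;
        for (double v : merged.values()) {
            total += v;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counts and pseudo-counts sum to zero");
        }
        // Normalize so the result sums to 1.
        for (Map.Entry<E, Double> e : merged.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return merged;
    }
}
```

For counts {a=2}, a prior of {a=0.5, b=0.5}, and weight 2, the pseudo-counts are 1 for each key, giving {a=0.75, b=0.25}.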

dynamicCounterWithDirichletPrior

public static <E> Distribution<E> dynamicCounterWithDirichletPrior(GenericCounter<E> c,
                                                                   Distribution<E> prior,
                                                                   double weight)
Like distributionWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front. The main advantage is that when you build many distributions from relatively sparse counters using the same relatively dense prior, the prior is represented only once, which saves a great deal of memory.

Parameters:
c - the counter of observed counts
prior - the Distribution to use as the prior
weight - multiplier of prior to get "pseudo-count"
Returns:
new Distribution
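
To make the memory argument concrete, the following sketch computes each probability on demand and holds only a reference to a shared prior. DynamicPriorSketch is a hypothetical class written for this illustration, with plain Maps standing in for the counter and the prior; it assumes the prior's values sum to 1 and is not the library's implementation.

```java
import java.util.Map;

/** Hypothetical illustration of lazily combining counts with a shared prior. */
final class DynamicPriorSketch<E> {
    private final Map<E, Double> counts; // sparse, specific to this distribution
    private final Map<E, Double> prior;  // dense, shared by reference and never copied
    private final double weight;
    private final double total;          // total observed count, computed once

    DynamicPriorSketch(Map<E, Double> counts, Map<E, Double> prior, double weight) {
        double sum = 0.0;
        for (double v : counts.values()) {
            sum += v;
        }
        if (sum + weight <= 0.0) {
            throw new IllegalArgumentException("total count plus weight must be positive");
        }
        this.counts = counts;
        this.prior = prior;
        this.weight = weight;
        this.total = sum;
    }

    /** Computed at query time: (c(x) + weight * prior(x)) / (total + weight). */
    double probabilityOf(E key) {
        double c = counts.getOrDefault(key, 0.0);
        double p = prior.getOrDefault(key, 0.0);
        return (c + weight * p) / (total + weight);
    }
}
```

Many such objects can point at the same prior map, so the dense prior is stored once while each object keeps only its own sparse counts.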

distributionFromLogisticCounter

public static <E> Distribution<E> distributionFromLogisticCounter(GenericCounter<E> cntr)
Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class.
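
A common way to turn the linear weights of a multiclass logistic regression model into class probabilities is the softmax function, p(k) = exp(w_k) / sum_j exp(w_j). The sketch below assumes that this is the mapping intended here; SoftmaxSketch is a hypothetical name, a plain Map stands in for the counter, and subtracting the maximum weight before exponentiating is a numerical-stability step added for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical softmax illustration; not the Stanford implementation. */
final class SoftmaxSketch {
    static <E> Map<E, Double> softmax(Map<E, Double> weights) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("weights must not be empty");
        }
        // Shift by the maximum so that Math.exp never overflows.
        double max = Double.NEGATIVE_INFINITY;
        for (double w : weights.values()) {
            max = Math.max(max, w);
        }
        Map<E, Double> probs = new HashMap<>();
        double total = 0.0;
        for (Map.Entry<E, Double> e : weights.entrySet()) {
            double unnormalized = Math.exp(e.getValue() - max);
            probs.put(e.getKey(), unnormalized);
            total += unnormalized;
        }
        // Divide by the sum so the probabilities add up to 1.
        for (Map.Entry<E, Double> e : probs.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return probs;
    }
}
```

For weights {a=0, b=ln 3} the sketch returns probabilities of 0.25 and 0.75, and equal weights of any magnitude yield equal probabilities.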


sampleFrom

public E sampleFrom()
Returns an object sampled from the distribution. There may be a faster way to do this if you need to...

Returns:
a sampled object
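
One straightforward way to sample from a discrete distribution is a linear scan over the cumulative probabilities (inverse-CDF sampling). The sketch below shows that technique in plain Java; whether sampleFrom works this way is an assumption, and SampleSketch with its explicit Random parameter is a hypothetical helper written for the example.

```java
import java.util.Map;
import java.util.Random;

/** Hypothetical inverse-CDF sampler over a map of probabilities. */
final class SampleSketch {
    static <E> E sample(Map<E, Double> probs, Random rng) {
        if (probs.isEmpty()) {
            throw new IllegalArgumentException("cannot sample from an empty distribution");
        }
        double u = rng.nextDouble(); // uniform in [0, 1)
        double cumulative = 0.0;
        E last = null;
        for (Map.Entry<E, Double> e : probs.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        // Reached only if rounding leaves the running sum slightly below 1.
        return last;
    }
}
```

Each draw costs time linear in the number of keys. If many samples are needed, precomputing the cumulative sums and using binary search is one possible faster approach.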

probabilityOf

public double probabilityOf(E key)
Returns the normalized count of the given object.

Parameters:
key - the object whose probability is requested
Returns:
the normalized count of the object

argmax

public E argmax()

totalCount

public double totalCount()

addToKeySet

public void addToKeySet(E o)
Ensures that the object is in the key set (possibly with a zero value).

Parameters:
o - object to put in keyset

equals

public boolean equals(Object o)
Overrides:
equals in class Object

equals

public boolean equals(Distribution distribution)

hashCode

public int hashCode()
Overrides:
hashCode in class Object

toString

public String toString()
Overrides:
toString in class Object

main

public static void main(String[] args)
For internal testing purposes only.



Stanford NLP Group