org.apache.commons.math3.stat.descriptive
Class DescriptiveStatistics

java.lang.Object
  extended by org.apache.commons.math3.stat.descriptive.DescriptiveStatistics
All Implemented Interfaces:
Serializable, StatisticalSummary
Direct Known Subclasses:
SynchronizedDescriptiveStatistics

public class DescriptiveStatistics
extends Object
implements StatisticalSummary, Serializable

Maintains a dataset of values of a single variable and computes descriptive statistics based on stored data. The windowSize property sets a limit on the number of values that can be stored in the dataset. The default value, INFINITE_WINDOW, puts no limit on the size of the dataset. This value should be used with caution, as the backing store will grow without bound in this case. For very large datasets, SummaryStatistics, which does not store the dataset, should be used instead of this class. If windowSize is not INFINITE_WINDOW and more values are added than can be stored in the dataset, new values are added in a "rolling" manner, with new values replacing the "oldest" values in the dataset.

Note: this class is not thread-safe. Use SynchronizedDescriptiveStatistics if concurrent access from multiple threads is required.

Version:
$Id: DescriptiveStatistics.java 7721 2013-02-14 14:07:13Z CardosoP $
See Also:
Serialized Form

Field Summary
static int INFINITE_WINDOW
          Represents an infinite window size.
protected  int windowSize
          Holds the window size.
 
Constructor Summary
DescriptiveStatistics()
          Construct a DescriptiveStatistics instance with an infinite window
DescriptiveStatistics(DescriptiveStatistics original)
          Copy constructor.
DescriptiveStatistics(double[] initialDoubleArray)
          Construct a DescriptiveStatistics instance with an infinite window and the initial data values in double[] initialDoubleArray.
DescriptiveStatistics(int window)
          Construct a DescriptiveStatistics instance with the specified window
 
Method Summary
 void addValue(double v)
          Adds the value to the dataset.
 double apply(UnivariateStatistic stat)
          Apply the given statistic to the data associated with this set of statistics.
 void clear()
          Resets all statistics and storage
 DescriptiveStatistics copy()
          Returns a copy of this DescriptiveStatistics instance with the same internal state.
static void copy(DescriptiveStatistics source, DescriptiveStatistics dest)
          Copies source to dest.
 double getElement(int index)
          Returns the element at the specified index
 double getGeometricMean()
          Returns the geometric mean of the available values
 UnivariateStatistic getGeometricMeanImpl()
          Returns the currently configured geometric mean implementation.
 double getKurtosis()
          Returns the kurtosis of the available values.
 UnivariateStatistic getKurtosisImpl()
          Returns the currently configured kurtosis implementation.
 double getMax()
          Returns the maximum of the available values
 UnivariateStatistic getMaxImpl()
          Returns the currently configured maximum implementation.
 double getMean()
          Returns the arithmetic mean of the available values
 UnivariateStatistic getMeanImpl()
          Returns the currently configured mean implementation.
 double getMin()
          Returns the minimum of the available values
 UnivariateStatistic getMinImpl()
          Returns the currently configured minimum implementation.
 long getN()
          Returns the number of available values
 double getPercentile(double p)
          Returns an estimate for the pth percentile of the stored values.
 UnivariateStatistic getPercentileImpl()
          Returns the currently configured percentile implementation.
 double getPopulationVariance()
          Returns the population variance of the available values.
 double getSkewness()
          Returns the skewness of the available values.
 UnivariateStatistic getSkewnessImpl()
          Returns the currently configured skewness implementation.
 double[] getSortedValues()
          Returns the current set of values in an array of double primitives, sorted in ascending order.
 double getStandardDeviation()
          Returns the standard deviation of the available values.
 double getSum()
          Returns the sum of the values that have been added.
 UnivariateStatistic getSumImpl()
          Returns the currently configured sum implementation.
 double getSumsq()
          Returns the sum of the squares of the available values.
 UnivariateStatistic getSumsqImpl()
          Returns the currently configured sum of squares implementation.
 double[] getValues()
          Returns the current set of values in an array of double primitives.
 double getVariance()
          Returns the (sample) variance of the available values.
 UnivariateStatistic getVarianceImpl()
          Returns the currently configured variance implementation.
 int getWindowSize()
          Returns the maximum number of values that can be stored in the dataset, or INFINITE_WINDOW (-1) if there is no limit.
 void removeMostRecentValue()
          Removes the most recent value from the dataset.
 double replaceMostRecentValue(double v)
          Replaces the most recently stored value with the given value.
 void setGeometricMeanImpl(UnivariateStatistic geometricMeanImpl)
          Sets the implementation for the geometric mean.
 void setKurtosisImpl(UnivariateStatistic kurtosisImpl)
          Sets the implementation for the kurtosis.
 void setMaxImpl(UnivariateStatistic maxImpl)
          Sets the implementation for the maximum.
 void setMeanImpl(UnivariateStatistic meanImpl)
          Sets the implementation for the mean.
 void setMinImpl(UnivariateStatistic minImpl)
          Sets the implementation for the minimum.
 void setPercentileImpl(UnivariateStatistic percentileImpl)
          Sets the implementation to be used by getPercentile(double).
 void setSkewnessImpl(UnivariateStatistic skewnessImpl)
          Sets the implementation for the skewness.
 void setSumImpl(UnivariateStatistic sumImpl)
          Sets the implementation for the sum.
 void setSumsqImpl(UnivariateStatistic sumsqImpl)
          Sets the implementation for the sum of squares.
 void setVarianceImpl(UnivariateStatistic varianceImpl)
          Sets the implementation for the variance.
 void setWindowSize(int windowSize)
          WindowSize controls the number of values that contribute to the reported statistics.
 String toString()
          Generates a text report displaying univariate statistics from values that have been added.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

INFINITE_WINDOW

public static final int INFINITE_WINDOW
Represents an infinite window size. When getWindowSize() returns this value, there is no limit to the number of data values that can be stored in the dataset.

See Also:
Constant Field Values

windowSize

protected int windowSize
Holds the window size.

Constructor Detail

DescriptiveStatistics

public DescriptiveStatistics()
Construct a DescriptiveStatistics instance with an infinite window


DescriptiveStatistics

public DescriptiveStatistics(int window)
                      throws MathIllegalArgumentException
Construct a DescriptiveStatistics instance with the specified window

Parameters:
window - the window size.
Throws:
MathIllegalArgumentException - if window size is less than 1 but not equal to INFINITE_WINDOW

DescriptiveStatistics

public DescriptiveStatistics(double[] initialDoubleArray)
Construct a DescriptiveStatistics instance with an infinite window and the initial data values in double[] initialDoubleArray. If initialDoubleArray is null, this constructor is equivalent to DescriptiveStatistics().

Parameters:
initialDoubleArray - the initial double[].

DescriptiveStatistics

public DescriptiveStatistics(DescriptiveStatistics original)
                      throws NullArgumentException
Copy constructor. Construct a new DescriptiveStatistics instance that is a copy of original.

Parameters:
original - DescriptiveStatistics instance to copy
Throws:
NullArgumentException - if original is null
Method Detail

addValue

public void addValue(double v)
Adds the value to the dataset. If the dataset is at the maximum size (i.e., the number of stored elements equals the currently configured windowSize), the first (oldest) element in the dataset is discarded to make room for the new value.

Parameters:
v - the value to be added
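
The rolling behavior described for addValue can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class RollingWindowSketch and its ArrayDeque storage are hypothetical, and only the value -1 for INFINITE_WINDOW is taken from this page.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in that mimics the documented window semantics.
class RollingWindowSketch {
    // Same value this page documents for INFINITE_WINDOW.
    static final int INFINITE_WINDOW = -1;

    private final int windowSize;
    private final Deque<Double> values = new ArrayDeque<>();

    RollingWindowSketch(int windowSize) {
        // Mirrors the documented rule: less than 1 is rejected unless it is INFINITE_WINDOW.
        if (windowSize < 1 && windowSize != INFINITE_WINDOW) {
            throw new IllegalArgumentException("window size must be >= 1 or INFINITE_WINDOW");
        }
        this.windowSize = windowSize;
    }

    void addValue(double v) {
        // When the window is full, drop the oldest value before storing the new one.
        if (windowSize != INFINITE_WINDOW && values.size() == windowSize) {
            values.removeFirst();
        }
        values.addLast(v);
    }

    double[] getValues() {
        // Copy out in insertion order (oldest first).
        double[] out = new double[values.size()];
        int i = 0;
        for (double d : values) {
            out[i++] = d;
        }
        return out;
    }

    public static void main(String[] args) {
        RollingWindowSketch window = new RollingWindowSketch(3);
        for (double v : new double[] {1, 2, 3, 4, 5}) {
            window.addValue(v);
        }
        System.out.println(Arrays.toString(window.getValues())); // prints [3.0, 4.0, 5.0]
    }
}
```

With a window of 3, only the three most recently added values remain, so any statistic computed afterwards sees {3, 4, 5}.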

removeMostRecentValue

public void removeMostRecentValue()
                           throws MathIllegalStateException
Removes the most recent value from the dataset.

Throws:
MathIllegalStateException - if there are no elements stored

replaceMostRecentValue

public double replaceMostRecentValue(double v)
                              throws MathIllegalStateException
Replaces the most recently stored value with the given value. There must be at least one element stored to call this method.

Parameters:
v - the value to replace the most recent stored value
Returns:
replaced value
Throws:
MathIllegalStateException - if there are no elements stored
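
The contract of removeMostRecentValue and replaceMostRecentValue can be illustrated with a small stand-alone sketch. The class MostRecentSketch, its helper methods size() and mostRecent(), and the use of the standard IllegalStateException in place of MathIllegalStateException are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in; the newest value lives at the tail of the deque.
class MostRecentSketch {
    private final Deque<Double> values = new ArrayDeque<>();

    void addValue(double v) {
        values.addLast(v);
    }

    void removeMostRecentValue() {
        if (values.isEmpty()) {
            // Stand-in for the documented MathIllegalStateException.
            throw new IllegalStateException("no elements stored");
        }
        values.removeLast();
    }

    double replaceMostRecentValue(double v) {
        if (values.isEmpty()) {
            throw new IllegalStateException("no elements stored");
        }
        double replaced = values.removeLast();
        values.addLast(v);
        return replaced; // the value that was overwritten
    }

    int size() {
        return values.size();
    }

    double mostRecent() {
        return values.getLast();
    }

    public static void main(String[] args) {
        MostRecentSketch s = new MostRecentSketch();
        s.addValue(1.0);
        s.addValue(2.0);
        System.out.println(s.replaceMostRecentValue(9.0)); // prints 2.0
        System.out.println(s.mostRecent());                // prints 9.0
        s.removeMostRecentValue();
        System.out.println(s.size());                      // prints 1
    }
}
```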

getMean

public double getMean()
Returns the arithmetic mean of the available values

Specified by:
getMean in interface StatisticalSummary
Returns:
The mean or Double.NaN if no values have been added.

getGeometricMean

public double getGeometricMean()
Returns the geometric mean of the available values

Returns:
The geometric mean, or Double.NaN if no values have been added or if the product of the available values is less than or equal to 0.

getVariance

public double getVariance()
Returns the (sample) variance of the available values.

This method returns the bias-corrected sample variance (using n - 1 in the denominator). Use getPopulationVariance() for the non-bias-corrected population variance.

Specified by:
getVariance in interface StatisticalSummary
Returns:
The variance, Double.NaN if no values have been added or 0.0 for a single value set.
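
The difference between the bias-corrected sample variance (divisor n - 1) and the population variance (divisor n) can be seen in a short sketch. The class VarianceSketch and its two-pass formula are assumptions chosen for clarity; the library may compute the result differently. The return values for empty and single-value datasets follow the Returns text on this page.

```java
// Hypothetical helper that contrasts the two divisors.
class VarianceSketch {
    static double variance(double[] x, boolean biasCorrected) {
        int n = x.length;
        if (n == 0) {
            return Double.NaN; // no values added
        }
        if (n == 1) {
            return 0.0; // single-value dataset
        }
        // First pass: arithmetic mean.
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        // Second pass: sum of squared deviations from the mean.
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - mean) * (v - mean);
        }
        // n - 1 gives the sample variance, n the population variance.
        return sumSq / (biasCorrected ? n - 1 : n);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(variance(data, false)); // population variance: prints 4.0
        System.out.println(variance(data, true));  // sample variance: 32/7, about 4.57
    }
}
```

For this dataset the sum of squared deviations is 32, so the population variance is 32/8 and the sample variance is 32/7.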

getPopulationVariance

public double getPopulationVariance()
Returns the population variance of the available values.

Returns:
The population variance, Double.NaN if no values have been added, or 0.0 for a single value set.

getStandardDeviation

public double getStandardDeviation()
Returns the standard deviation of the available values.

Specified by:
getStandardDeviation in interface StatisticalSummary
Returns:
The standard deviation, Double.NaN if no values have been added or 0.0 for a single value set.

getSkewness

public double getSkewness()
Returns the skewness of the available values. Skewness is a measure of the asymmetry of a given distribution.

Returns:
The skewness, Double.NaN if no values have been added, or 0.0 for a dataset of 2 or fewer values.

getKurtosis

public double getKurtosis()
Returns the kurtosis of the available values. Kurtosis is a measure of the "peakedness" of a distribution.

Returns:
The kurtosis, Double.NaN if no values have been added, or 0.0 for a dataset of 3 or fewer values.

getMax

public double getMax()
Returns the maximum of the available values

Specified by:
getMax in interface StatisticalSummary
Returns:
The max or Double.NaN if no values have been added.

getMin

public double getMin()
Returns the minimum of the available values

Specified by:
getMin in interface StatisticalSummary
Returns:
The min or Double.NaN if no values have been added.

getN

public long getN()
Returns the number of available values

Specified by:
getN in interface StatisticalSummary
Returns:
The number of available values

getSum

public double getSum()
Returns the sum of the values that have been added.

Specified by:
getSum in interface StatisticalSummary
Returns:
The sum or Double.NaN if no values have been added

getSumsq

public double getSumsq()
Returns the sum of the squares of the available values.

Returns:
The sum of the squares or Double.NaN if no values have been added.

clear

public void clear()
Resets all statistics and storage


getWindowSize

public int getWindowSize()
Returns the maximum number of values that can be stored in the dataset, or INFINITE_WINDOW (-1) if there is no limit.

Returns:
The current window size, or -1 if it is infinite.

setWindowSize

public void setWindowSize(int windowSize)
                   throws MathIllegalArgumentException
WindowSize controls the number of values that contribute to the reported statistics. For example, if windowSize is set to 3 and the values {1,2,3,4,5} have been added in that order, then the available values are {3,4,5} and all reported statistics will be based on these values. If this call decreases windowSize and the current dataset holds more than the new windowSize elements, values are discarded from the front of the array to reduce the dataset to windowSize elements.

Parameters:
windowSize - sets the size of the window.
Throws:
MathIllegalArgumentException - if window size is less than 1 but not equal to INFINITE_WINDOW
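
The effect of shrinking the window on already stored data can be sketched as an array operation. The helper WindowShrinkSketch.shrink is hypothetical and works on a plain array rather than on the class's internal storage; it only illustrates which values survive.

```java
import java.util.Arrays;

// Hypothetical helper showing which values remain after the window shrinks.
class WindowShrinkSketch {
    static final int INFINITE_WINDOW = -1; // value documented on this page

    static double[] shrink(double[] values, int newWindowSize) {
        if (newWindowSize < 1 && newWindowSize != INFINITE_WINDOW) {
            throw new IllegalArgumentException("window size must be >= 1 or INFINITE_WINDOW");
        }
        // Nothing to discard if the window is unbounded or already large enough.
        if (newWindowSize == INFINITE_WINDOW || values.length <= newWindowSize) {
            return values.clone();
        }
        // Discard from the front (oldest values), keep the newest newWindowSize values.
        return Arrays.copyOfRange(values, values.length - newWindowSize, values.length);
    }

    public static void main(String[] args) {
        double[] stored = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(shrink(stored, 3))); // prints [3.0, 4.0, 5.0]
    }
}
```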

getValues

public double[] getValues()
Returns the current set of values in an array of double primitives. The order of addition is preserved. The returned array is a fresh copy of the underlying data -- i.e., it is not a reference to the stored data.

Returns:
the current set of numbers, in the order in which they were added to this set

getSortedValues

public double[] getSortedValues()
Returns the current set of values in an array of double primitives, sorted in ascending order. The returned array is a fresh copy of the underlying data -- i.e., it is not a reference to the stored data.

Returns:
the current set of numbers, sorted in ascending order
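
Both getValues and getSortedValues are described as returning a fresh copy. What that guarantee means for callers can be shown with a minimal sketch; the class CopyOnReadSketch is a hypothetical stand-in, not the library's storage.

```java
import java.util.Arrays;

// Hypothetical stand-in demonstrating copy-on-read accessors.
class CopyOnReadSketch {
    private final double[] stored;

    CopyOnReadSketch(double... initial) {
        this.stored = initial.clone();
    }

    double[] getValues() {
        // A clone, so callers cannot modify the stored data through the result.
        return stored.clone();
    }

    double[] getSortedValues() {
        // Sort a copy; the stored insertion order is left untouched.
        double[] sorted = stored.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        CopyOnReadSketch s = new CopyOnReadSketch(3.0, 1.0, 2.0);
        double[] copy = s.getValues();
        copy[0] = 99.0; // changes only the caller's copy
        System.out.println(Arrays.toString(s.getValues()));       // prints [3.0, 1.0, 2.0]
        System.out.println(Arrays.toString(s.getSortedValues())); // prints [1.0, 2.0, 3.0]
    }
}
```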

getElement

public double getElement(int index)
Returns the element at the specified index

Parameters:
index - the index of the element
Returns:
the element at the specified index

getPercentile

public double getPercentile(double p)
                     throws MathIllegalStateException,
                            MathIllegalArgumentException
Returns an estimate for the pth percentile of the stored values.

The implementation provided here follows the first estimation procedure presented here.

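Preconditions:
p must be a valid percentile on the 0 - 100 scale (otherwise a MathIllegalArgumentException is thrown).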

Parameters:
p - the requested percentile (scaled from 0 - 100)
Returns:
An estimate for the pth percentile of the stored data
Throws:
MathIllegalStateException - if percentile implementation has been overridden and the supplied implementation does not support setQuantile
MathIllegalArgumentException - if p is not a valid quantile
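
One widely used way to estimate a percentile from sorted data is to compute the position p(n + 1)/100 and interpolate linearly between the two neighboring order statistics. The sketch below implements that procedure. Whether it matches the estimation procedure this page refers to is an assumption, as are the class name PercentileSketch and the accepted range 0 < p <= 100.

```java
import java.util.Arrays;

// Hypothetical helper: position p(n + 1)/100 with linear interpolation.
class PercentileSketch {
    static double percentile(double[] values, double p) {
        // Assumed valid range; the page only states that p is scaled from 0 to 100.
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100");
        }
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return values[0];
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);

        double pos = p * (n + 1) / 100.0; // 1-based fractional position
        if (pos < 1) {
            return sorted[0];             // below the first order statistic
        }
        if (pos >= n) {
            return sorted[n - 1];         // at or beyond the last order statistic
        }
        int intPos = (int) Math.floor(pos);
        double fraction = pos - intPos;
        double lower = sorted[intPos - 1];
        double upper = sorted[intPos];
        // Interpolate between the two neighboring sorted values.
        return lower + fraction * (upper - lower);
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 4, 2, 3};
        System.out.println(percentile(data, 50));  // prints 3.0
        System.out.println(percentile(data, 25));  // prints 1.5
        System.out.println(percentile(data, 100)); // prints 5.0
    }
}
```

For five values and p = 25 the position is 1.5, which falls halfway between the first and second sorted values, giving 1.5.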

toString

public String toString()
Generates a text report displaying univariate statistics from values that have been added. Each statistic is displayed on a separate line.

Overrides:
toString in class Object
Returns:
String with line feeds displaying statistics

apply

public double apply(UnivariateStatistic stat)
Apply the given statistic to the data associated with this set of statistics.

Parameters:
stat - the statistic to apply
Returns:
the computed value of the statistic.
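
The idea behind apply, evaluating a caller-supplied statistic against the stored data, can be sketched with a small functional interface. The nested interface Statistic is a simplified, hypothetical stand-in for UnivariateStatistic, and ApplySketch is not the library class.

```java
import java.util.Arrays;

// Hypothetical stand-in showing a pluggable statistic applied to stored data.
class ApplySketch {
    // Simplified stand-in for UnivariateStatistic: one method over an array.
    interface Statistic {
        double evaluate(double[] values);
    }

    private final double[] data;

    ApplySketch(double... data) {
        this.data = data.clone();
    }

    double apply(Statistic stat) {
        // Pass a copy so the statistic cannot alter the stored values.
        return stat.evaluate(data.clone());
    }

    public static void main(String[] args) {
        ApplySketch stats = new ApplySketch(3, 1, 4, 1, 5);
        // A custom statistic: the range (max - min) of the data.
        Statistic range = v ->
                Arrays.stream(v).max().getAsDouble() - Arrays.stream(v).min().getAsDouble();
        System.out.println(stats.apply(range)); // prints 4.0
    }
}
```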

getMeanImpl

public UnivariateStatistic getMeanImpl()
Returns the currently configured mean implementation.

Returns:
the UnivariateStatistic implementing the mean
Since:
1.2

setMeanImpl

public void setMeanImpl(UnivariateStatistic meanImpl)

Sets the implementation for the mean.

Parameters:
meanImpl - the UnivariateStatistic instance to use for computing the mean
Since:
1.2

getGeometricMeanImpl

public UnivariateStatistic getGeometricMeanImpl()
Returns the currently configured geometric mean implementation.

Returns:
the UnivariateStatistic implementing the geometric mean
Since:
1.2

setGeometricMeanImpl

public void setGeometricMeanImpl(UnivariateStatistic geometricMeanImpl)

Sets the implementation for the geometric mean.

Parameters:
geometricMeanImpl - the UnivariateStatistic instance to use for computing the geometric mean
Since:
1.2

getKurtosisImpl

public UnivariateStatistic getKurtosisImpl()
Returns the currently configured kurtosis implementation.

Returns:
the UnivariateStatistic implementing the kurtosis
Since:
1.2

setKurtosisImpl

public void setKurtosisImpl(UnivariateStatistic kurtosisImpl)

Sets the implementation for the kurtosis.

Parameters:
kurtosisImpl - the UnivariateStatistic instance to use for computing the kurtosis
Since:
1.2

getMaxImpl

public UnivariateStatistic getMaxImpl()
Returns the currently configured maximum implementation.

Returns:
the UnivariateStatistic implementing the maximum
Since:
1.2

setMaxImpl

public void setMaxImpl(UnivariateStatistic maxImpl)

Sets the implementation for the maximum.

Parameters:
maxImpl - the UnivariateStatistic instance to use for computing the maximum
Since:
1.2

getMinImpl

public UnivariateStatistic getMinImpl()
Returns the currently configured minimum implementation.

Returns:
the UnivariateStatistic implementing the minimum
Since:
1.2

setMinImpl

public void setMinImpl(UnivariateStatistic minImpl)

Sets the implementation for the minimum.

Parameters:
minImpl - the UnivariateStatistic instance to use for computing the minimum
Since:
1.2

getPercentileImpl

public UnivariateStatistic getPercentileImpl()
Returns the currently configured percentile implementation.

Returns:
the UnivariateStatistic implementing the percentile
Since:
1.2

setPercentileImpl

public void setPercentileImpl(UnivariateStatistic percentileImpl)
                       throws MathIllegalArgumentException
Sets the implementation to be used by getPercentile(double). The supplied UnivariateStatistic must provide a setQuantile(double) method; otherwise a MathIllegalArgumentException is thrown.

Parameters:
percentileImpl - the percentileImpl to set
Throws:
MathIllegalArgumentException - if the supplied implementation does not provide a setQuantile method
Since:
1.2

getSkewnessImpl

public UnivariateStatistic getSkewnessImpl()
Returns the currently configured skewness implementation.

Returns:
the UnivariateStatistic implementing the skewness
Since:
1.2

setSkewnessImpl

public void setSkewnessImpl(UnivariateStatistic skewnessImpl)

Sets the implementation for the skewness.

Parameters:
skewnessImpl - the UnivariateStatistic instance to use for computing the skewness
Since:
1.2

getVarianceImpl

public UnivariateStatistic getVarianceImpl()
Returns the currently configured variance implementation.

Returns:
the UnivariateStatistic implementing the variance
Since:
1.2

setVarianceImpl

public void setVarianceImpl(UnivariateStatistic varianceImpl)

Sets the implementation for the variance.

Parameters:
varianceImpl - the UnivariateStatistic instance to use for computing the variance
Since:
1.2

getSumsqImpl

public UnivariateStatistic getSumsqImpl()
Returns the currently configured sum of squares implementation.

Returns:
the UnivariateStatistic implementing the sum of squares
Since:
1.2

setSumsqImpl

public void setSumsqImpl(UnivariateStatistic sumsqImpl)

Sets the implementation for the sum of squares.

Parameters:
sumsqImpl - the UnivariateStatistic instance to use for computing the sum of squares
Since:
1.2

getSumImpl

public UnivariateStatistic getSumImpl()
Returns the currently configured sum implementation.

Returns:
the UnivariateStatistic implementing the sum
Since:
1.2

setSumImpl

public void setSumImpl(UnivariateStatistic sumImpl)

Sets the implementation for the sum.

Parameters:
sumImpl - the UnivariateStatistic instance to use for computing the sum
Since:
1.2

copy

public DescriptiveStatistics copy()
Returns a copy of this DescriptiveStatistics instance with the same internal state.

Returns:
a copy of this instance

copy

public static void copy(DescriptiveStatistics source,
                        DescriptiveStatistics dest)
                 throws NullArgumentException
Copies source to dest.

Neither source nor dest can be null.

Parameters:
source - DescriptiveStatistics to copy
dest - DescriptiveStatistics to copy to
Throws:
NullArgumentException - if either source or dest is null


Copyright © 2016 CNES. All Rights Reserved.