public class EmpiricalDistribution extends AbstractRealDistribution
Represents an empirical probability distribution -- a probability distribution derived from observed data without making any assumptions about the functional form of the population distribution that the data come from.
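An EmpiricalDistribution maintains data structures, called distribution digests, that describe empirical distributions and support the following operations:
- loading the distribution from a file of observed data values
- dividing the input data into "bin ranges" and reporting bin frequency counts (data for a histogram)
- reporting univariate statistics describing the full set of data values as well as the observations within each bin
- generating random values from the distribution
Applications can use EmpiricalDistribution to build grouped frequency histograms representing the input data or to generate random values "like" those in the input file, i.e., values that follow the distribution of the values in the file.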
The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:
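Digesting the input file:
1. Pass over the file once to compute min and max.
2. Divide the range from min to max into binCount "bins."
3. Pass over the data file again, computing bin counts and univariate statistics (mean and standard deviation) for each bin.
4. Divide the interval (0,1) into subintervals associated with the bins, with the length of each bin's subinterval proportional to its count.
Generating random values from the distribution:
1. Generate a uniformly distributed value in (0,1).
2. Select the subinterval to which the value belongs.
3. Generate a random Gaussian value with mean and standard deviation equal to those of the associated bin.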
EmpiricalDistribution implements the RealDistribution
interface as follows. Given x within the range of
values in the dataset, let B be the bin containing x and let K be the within-bin kernel for B. Let P(B-) be the sum
of the probabilities of the bins below B and let K(B) be the mass of B under K (i.e., the integral of the kernel
density over B). Then set P(X < x) = P(B-) + P(B) * K(x) / K(B) where K(x) is the kernel distribution evaluated at x.
This results in a cdf that matches the grouped frequency distribution at the bin endpoints and interpolates within
bins using within-bin kernels.
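The interpolation rule above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the class name BinnedCdfSketch and its parameters are hypothetical, and a uniform within-bin kernel is swapped in for the Gaussian one so that K(x) / K(B) reduces to the fraction of the bin's width lying below x.

```java
/**
 * Hypothetical sketch of a grouped-frequency CDF with within-bin interpolation.
 * Assumption: a uniform within-bin kernel replaces the Gaussian kernel.
 */
final class BinnedCdfSketch {

    /**
     * @param x           point at which the CDF is evaluated
     * @param min         sample minimum (lower endpoint of the first bin)
     * @param upperBounds upper bound of each bin, increasing, last one = sample maximum
     * @param counts      number of observations in each bin
     */
    static double cdf(double x, double min, double[] upperBounds, int[] counts) {
        int n = upperBounds.length;
        if (x < min) {
            return 0.0;
        }
        if (x >= upperBounds[n - 1]) {
            return 1.0;
        }
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double below = 0.0; // P(B-): combined mass of the bins below the current one
        double lower = min; // lower endpoint of the current bin
        for (int i = 0; i < n; i++) {
            double binMass = (double) counts[i] / total; // P(B)
            if (x <= upperBounds[i]) {
                double width = upperBounds[i] - lower;
                // Uniform kernel: K(x) / K(B) is the fraction of the bin below x.
                double fraction = width > 0 ? (x - lower) / width : 1.0;
                return below + binMass * fraction;
            }
            below += binMass;
            lower = upperBounds[i];
        }
        return 1.0;
    }
}
```

With bins [1,4], (4,7], (7,10] holding 4, 3 and 3 observations, this sketch reproduces the grouped frequencies 0.4, 0.7 and 1.0 at the bin endpoints and interpolates in between.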
binCount is set by default to 1000. A good rule of thumb is to set the bin count to approximately the length of the input file divided by 10.

Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_BIN_COUNT - Default bin count |
Fields inherited from class AbstractRealDistribution: random, SOLVER_DEFAULT_ABSOLUTE_ACCURACY
Constructor and Description |
---|
EmpiricalDistribution() - Creates a new EmpiricalDistribution with the default bin count. |
EmpiricalDistribution(int binCountIn) - Creates a new EmpiricalDistribution with the specified bin count. |
EmpiricalDistribution(int binCountIn, RandomGenerator generator) - Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data. |
EmpiricalDistribution(RandomGenerator generator) - Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data. |
Modifier and Type | Method and Description |
---|---|
double | cumulativeProbability(double x) - For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). |
double | density(double x) - Returns the probability density function (PDF) of this distribution evaluated at the specified point x. |
int | getBinCount() - Returns the number of bins. |
List<SummaryStatistics> | getBinStats() - Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. |
double[] | getGeneratorUpperBounds() - Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. |
double | getNextValue() - Generates a random value from this distribution. |
double | getNumericalMean() - Use this method to get the numerical value of the mean of this distribution. |
double | getNumericalVariance() - Use this method to get the numerical value of the variance of this distribution. |
StatisticalSummary | getSampleStats() - Returns a StatisticalSummary describing this distribution. |
double | getSupportLowerBound() - Access the lower bound of the support. |
double | getSupportUpperBound() - Access the upper bound of the support. |
double[] | getUpperBounds() - Returns a fresh copy of the array of upper bounds for the bins. |
double | inverseCumulativeProbability(double p) - Computes the quantile function of this distribution. |
boolean | isLoaded() - Property indicating whether or not the distribution has been loaded. |
boolean | isSupportConnected() - Use this method to get information about whether the support is connected, i.e. whether all values between the lower and upper bound of the support are included in the support. |
boolean | isSupportLowerBoundInclusive() - Returns true if support contains lower bound. |
boolean | isSupportUpperBoundInclusive() - Returns true if support contains upper bound. |
void | load(double[] array) - Computes the empirical distribution from the provided array of numbers. |
void | load(File file) - Computes the empirical distribution from the input file. |
void | load(URL url) - Computes the empirical distribution using data read from a URL. |
double | probability(double x) - For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). |
void | reSeed(long seed) - Reseeds the random number generator used by getNextValue(). |
void | reseedRandomGenerator(long seed) - Reseed the random generator used to generate samples. |
double | sample() - Generate a random value sampled from this distribution. |
Methods inherited from class AbstractRealDistribution: getSolverAbsoluteAccuracy, probability, sample
public static final int DEFAULT_BIN_COUNT
public EmpiricalDistribution()
public EmpiricalDistribution(int binCountIn)
Parameters:
binCountIn - number of bins
public EmpiricalDistribution(int binCountIn, RandomGenerator generator)
Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data.
Parameters:
binCountIn - number of bins
generator - random data generator (may be null, resulting in default JDK generator)
public EmpiricalDistribution(RandomGenerator generator)
Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data.
Parameters:
generator - random data generator (may be null, resulting in default JDK generator)
public void load(double[] array)
Parameters:
array - the input data array
Throws:
NullArgumentException - if array is null
public void load(URL url) throws IOException
The input file must be an ASCII text file containing one valid numeric entry per line.
Parameters:
url - url of the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if url is null
ZeroException - if URL contains no data
public void load(File file) throws IOException
The input file must be an ASCII text file containing one valid numeric entry per line.
Parameters:
file - the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if file is null
public double getNextValue()
Throws:
MathIllegalStateException - if the distribution has not been loaded
public StatisticalSummary getSampleStats()
Returns a StatisticalSummary describing this distribution.
Preconditions: load must have been called before invoking this method.
Throws:
IllegalStateException - if the distribution has not been loaded
public int getBinCount()
public List<SummaryStatistics> getBinStats()
Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.
public double[] getUpperBounds()
Returns a fresh copy of the array of upper bounds for the bins. Bins are: [min, upperBounds[0]], (upperBounds[0], upperBounds[1]], ..., (upperBounds[binCount-2], upperBounds[binCount-1] = max].
Note: In versions 1.0-2.0 of commons-math, this method incorrectly returned the array of probability generator upper bounds now returned by getGeneratorUpperBounds().
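The bin layout described above can be sketched without any library dependency. This is only an illustration, assuming equal-width bins between the sample minimum and maximum and non-empty input; the class BinDigestSketch and its method names are hypothetical and are not part of this API.

```java
import java.util.Arrays;

/** Hypothetical sketch of bin bounds and bin counts; not the library's code. */
final class BinDigestSketch {

    /** Upper bounds of binCount equal-width bins spanning [min, max] of the data. */
    static double[] upperBounds(double[] data, int binCount) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        bounds[binCount - 1] = max; // the last bin closes exactly at the sample maximum
        return bounds;
    }

    /** Index of the bin containing value, for bins [min, b0], (b0, b1], ... */
    static int findBin(double value, double min, double delta, int binCount) {
        int raw = (int) Math.ceil((value - min) / delta) - 1;
        return Math.min(Math.max(raw, 0), binCount - 1);
    }

    /** Counts how many observations fall into each bin. */
    static int[] binCounts(double[] data, int binCount) {
        double min = Arrays.stream(data).min().getAsDouble();
        double max = Arrays.stream(data).max().getAsDouble();
        double delta = (max - min) / binCount;
        int[] counts = new int[binCount];
        for (double value : data) {
            counts[findBin(value, min, delta, binCount)]++;
        }
        return counts;
    }
}
```

For the values 1 through 10 and three bins, the sketch yields upper bounds 4, 7 and 10, so the bins are [1,4], (4,7] and (7,10]. The first bin is closed on the left, which is why the minimum itself is counted in bin 0.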
public double[] getGeneratorUpperBounds()
Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.
In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned by getUpperBounds().
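As a rough illustration of how subintervals of [0,1] with lengths proportional to bin counts can drive sampling, the sketch below builds cumulative upper bounds from bin counts and uses them to pick a bin. The class GeneratorBoundsSketch, its methods, and the Gaussian draw from per-bin mean and standard deviation are assumptions made for the example; they are not taken from this class's source.

```java
import java.util.Random;

/** Hypothetical sketch of generator upper bounds and bin selection. */
final class GeneratorBoundsSketch {

    /** Cumulative bin frequencies: bounds[i] = (count of bins 0..i) / total. */
    static double[] generatorUpperBounds(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] bounds = new double[counts.length];
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            bounds[i] = (double) running / total;
        }
        bounds[counts.length - 1] = 1.0; // guard against rounding in the last bound
        return bounds;
    }

    /** Returns the first bin whose subinterval upper bound is at least u. */
    static int selectBin(double u, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (u <= bounds[i]) {
                return i;
            }
        }
        return bounds.length - 1;
    }

    /** Assumed sampling step: Gaussian draw using the selected bin's mean and std dev. */
    static double sample(Random rng, double[] bounds, double[] binMeans, double[] binStdDevs) {
        int bin = selectBin(rng.nextDouble(), bounds);
        return binMeans[bin] + binStdDevs[bin] * rng.nextGaussian();
    }
}
```

Bins with more observations get longer subintervals, so a uniform draw lands in them proportionally more often.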
public boolean isLoaded()
public void reSeed(long seed)
Reseeds the random number generator used by getNextValue().
Parameters:
seed - random generator seed
public double probability(double x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
Specified by:
probability in interface RealDistribution
Overrides:
probability in class AbstractRealDistribution
Parameters:
x - the point at which the PMF is evaluated
public double density(double x)
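Returns the probability density function (PDF) of this distribution evaluated at the specified point x. In general, the PDF is the derivative of the CDF. If the derivative does not exist at x, then an appropriate replacement should be returned, e.g. Double.POSITIVE_INFINITY, Double.NaN, or the limit inferior or limit superior of the difference quotient.
Returns the kernel density normalized so that its integral over each bin equals the bin mass.
Algorithm description: find the bin B that x belongs to, compute K(B), the integral of the within-bin kernel density over B, and return k(x) * P(B) / K(B), where k is the within-bin kernel density and P(B) is the mass of B.
Parameters:
x - the point at which the PDF is evaluated
Returns:
the value of the probability density function at point x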
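The normalization stated above (the kernel density integrates to the bin mass over each bin) can be checked with a short sketch. Everything here is an assumption for illustration: the class BinDensitySketch is hypothetical, the kernel is a Gaussian with the bin's mean and standard deviation, and the kernel mass over the bin is approximated with composite Simpson's rule instead of a closed-form Gaussian CDF.

```java
/** Hypothetical sketch of a within-bin kernel density rescaled to the bin mass. */
final class BinDensitySketch {

    /** Gaussian probability density with the given mean and standard deviation. */
    static double gaussianPdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** K(B): integral of the kernel density over [lower, upper], composite Simpson's rule (steps even). */
    static double kernelMass(double lower, double upper, double mean, double sd, int steps) {
        double h = (upper - lower) / steps;
        double sum = gaussianPdf(lower, mean, sd) + gaussianPdf(upper, mean, sd);
        for (int i = 1; i < steps; i++) {
            sum += (i % 2 == 0 ? 2.0 : 4.0) * gaussianPdf(lower + i * h, mean, sd);
        }
        return sum * h / 3.0;
    }

    /** Density at x inside the bin: k(x) * P(B) / K(B). */
    static double density(double x, double lower, double upper,
                          double binMass, double mean, double sd) {
        double kB = kernelMass(lower, upper, mean, sd, 200);
        return binMass * gaussianPdf(x, mean, sd) / kB;
    }
}
```

Dividing by K(B) compensates for the part of the Gaussian that falls outside the bin, so integrating this density over the bin gives back P(B) up to numerical error.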
public double cumulativeProbability(double x)
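For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
Algorithm description: find the bin B that x belongs to and return P(B-) + P(B) * K(x) / K(B), with P(B-), K(x) and K(B) as defined in the class description.
Parameters:
x - the point at which the CDF is evaluated
Returns:
the probability that a random variable with this distribution takes a value less than or equal to x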
public double inverseCumulativeProbability(double p)
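Computes the quantile function of this distribution. For a random variable X distributed according to this distribution, the returned value is
inf{x in R | P(X<=x) >= p} for 0 < p <= 1,
inf{x in R | P(X<=x) > 0} for p = 0.
The default implementation returns
RealDistribution.getSupportLowerBound() for p = 0,
RealDistribution.getSupportUpperBound() for p = 1.
Algorithm description: find the smallest i such that the sum of the masses of the bins through i is at least p. With K the within-bin kernel distribution for bin i, K(B-) the value of K at the lower endpoint of the bin, and pCrit = p - P(B-), return the inverse of K evaluated at K(B-) + pCrit * K(B) / P(B).
Specified by:
inverseCumulativeProbability in interface RealDistribution
Overrides:
inverseCumulativeProbability in class AbstractRealDistribution
Parameters:
p - the cumulative probability
Returns:
the smallest p-quantile of this distribution (largest 0-quantile for p = 0)
public double getNumericalMean()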
Returns:
the mean, or Double.NaN if it is not defined
public double getNumericalVariance()
Returns:
the variance (possibly Double.POSITIVE_INFINITY, as for certain cases in TDistribution), or Double.NaN if it is not defined
public double getSupportLowerBound()
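Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return inf {x in R | P(X <= x) > 0}.
Returns:
the lower bound of the support (might be Double.NEGATIVE_INFINITY)
public double getSupportUpperBound()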
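Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return inf {x in R | P(X <= x) = 1}.
Returns:
the upper bound of the support (might be Double.POSITIVE_INFINITY)
public boolean isSupportLowerBoundInclusive()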
public boolean isSupportUpperBoundInclusive()
public boolean isSupportConnected()
public double sample()
Specified by:
sample in interface RealDistribution
Overrides:
sample in class AbstractRealDistribution
public void reseedRandomGenerator(long seed)
Specified by:
reseedRandomGenerator in interface RealDistribution
Overrides:
reseedRandomGenerator in class AbstractRealDistribution
Parameters:
seed - the new seed
Copyright © 2019 CNES. All rights reserved.