public class ChiSquareTest extends Object
This implementation handles both known and unknown distributions.
Two-sample tests can be used when the distribution is unknown a priori but provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.
Constructor and Description |
---|
ChiSquareTest() |
Modifier and Type | Method and Description |
---|---|
double | chiSquare(double[] expected, long[] observed): Computes the Chi-Square statistic comparing observed and expected frequency counts. |
double | chiSquare(long[][] counts): Computes the Chi-Square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table. |
double | chiSquareDataSetsComparison(long[] observed1, long[] observed2): Computes a Chi-Square two-sample test statistic comparing bin frequency counts in observed1 and observed2. |
double | chiSquareTest(double[] expected, long[] observed): Returns the observed significance level, or p-value, associated with a Chi-square goodness of fit test comparing the observed frequency counts to those in the expected array. |
boolean | chiSquareTest(double[] expected, long[] observed, double alpha): Performs a Chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. |
double | chiSquareTest(long[][] counts): Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table. |
boolean | chiSquareTest(long[][] counts, double alpha): Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input 2-way table are independent of the rows, with significance level alpha. |
double | chiSquareTestDataSetsComparison(long[] observed1, long[] observed2): Returns the observed significance level, or p-value, associated with a Chi-Square two-sample test comparing bin frequency counts in observed1 and observed2. |
boolean | chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha): Performs a Chi-Square two-sample test comparing two binned data sets. |
public double chiSquare(double[] expected, long[] observed)
Computes the Chi-Square statistic comparing observed and expected frequency counts.
This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that the observed counts follow the expected distribution.
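Preconditions:
Expected counts must all be strictly positive.
Observed counts must all be non-negative.
The observed and expected arrays must have the same length, and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.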
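The rescaling described in the note can be illustrated with a short, self-contained sketch. It assumes the usual Pearson form of the statistic, the sum over bins of (observed - expected)^2 / expected, which this page does not spell out. The class name `GoodnessOfFitSketch` is hypothetical and is not part of the documented API.

```java
/** Hypothetical illustration only; not the library's implementation. */
final class GoodnessOfFitSketch {

    /**
     * Assumed Pearson statistic: sum over bins of (observed - expected)^2 / expected,
     * after rescaling expected so that both arrays have the same total.
     */
    static double chiSquare(double[] expected, long[] observed) {
        if (expected.length < 2 || expected.length != observed.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected counts must be strictly positive");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Rescale the expected counts so that their total matches the observed total.
        double ratio = sumObserved / sumExpected;
        double statistic = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double scaledExpected = expected[i] * ratio;
            double deviation = observed[i] - scaledExpected;
            statistic += deviation * deviation / scaledExpected;
        }
        return statistic;
    }

    public static void main(String[] args) {
        // Expected given as proportions; the sketch rescales them to the observed total.
        double[] expected = {0.25, 0.25, 0.25, 0.25};
        long[] observed = {8, 12, 10, 10};
        System.out.println(chiSquare(expected, observed));
    }
}
```

Because of the rescaling, passing expected proportions or expected counts gives the same statistic in this sketch.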
Parameters:
observed - array of observed frequency counts
expected - array of expected frequency counts
Throws:
NotPositiveException - if observed has negative entries
NotStrictlyPositiveException - if expected has entries that are not strictly positive
DimensionMismatchException - if the array length is less than 2

public double chiSquareTest(double[] expected, long[] observed)
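Returns the observed significance level, or p-value, associated with a Chi-square goodness of fit test comparing the observed frequency counts to those in the expected array.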
The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts.
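Preconditions:
Expected counts must all be strictly positive.
Observed counts must all be non-negative.
The observed and expected arrays must have the same length, and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.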
Parameters:
observed - array of observed frequency counts
expected - array of expected frequency counts
Throws:
NotPositiveException - if observed has negative entries
NotStrictlyPositiveException - if expected has entries that are not strictly positive
DimensionMismatchException - if the array length is less than 2
MaxCountExceededException - if an error occurs computing the p-value

public boolean chiSquareTest(double[] expected, long[] observed, double alpha)
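Performs a Chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
Example:
To test the hypothesis that observed follows expected at the 99% level, use chiSquareTest(expected, observed, 0.01)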
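Preconditions:
Expected counts must all be strictly positive.
Observed counts must all be non-negative.
The observed and expected arrays must have the same length, and their common length must be at least 2.
0 < alpha <= 0.5
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.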
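The relationship between the p-value and the boolean result can be sketched as a simple decision rule. The sketch assumes the test rejects when the p-value is strictly below alpha, and that alpha must lie in (0, 0.5], as the OutOfRangeException description on this page states. `DecisionRuleSketch` and `rejects` are hypothetical names, not part of the documented API.

```java
/** Hypothetical illustration only; not the library's implementation. */
final class DecisionRuleSketch {

    /**
     * Returns true when the null hypothesis would be rejected at significance
     * level alpha, assuming rejection means pValue < alpha.
     */
    static boolean rejects(double pValue, double alpha) {
        // Assumed accepted range for alpha: (0, 0.5].
        if (alpha <= 0.0 || alpha > 0.5) {
            throw new IllegalArgumentException("alpha must be in (0, 0.5]");
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        // A p-value of 0.003 rejects at the 99% level (alpha = 0.01).
        System.out.println(rejects(0.003, 0.01));
        // A p-value of 0.2 does not.
        System.out.println(rejects(0.2, 0.01));
    }
}
```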
Parameters:
observed - array of observed frequency counts
expected - array of expected frequency counts
alpha - significance level of the test
Throws:
NotPositiveException - if observed has negative entries
NotStrictlyPositiveException - if expected has entries that are not strictly positive
DimensionMismatchException - if the array length is less than 2
OutOfRangeException - if alpha is not in the range (0, 0.5]
MaxCountExceededException - if an error occurs computing the p-value

public double chiSquare(long[][] counts)
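Computes the Chi-Square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.
The rows of the 2-way table are counts[0], ... , counts[counts.length - 1]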
Preconditions:
All counts must be non-negative.
The counts array must be rectangular.
counts must have at least 2 columns and at least 2 rows.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
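As a rough illustration of the independence statistic, the sketch below assumes the usual Pearson construction: each expected cell count is the product of its row and column totals divided by the grand total. The page does not state the formula, so treat this as an assumption. `IndependenceStatisticSketch` is a hypothetical name.

```java
/** Hypothetical illustration only; not the library's implementation. */
final class IndependenceStatisticSketch {

    /** Assumed Pearson statistic for a two-way table of counts. */
    static double chiSquare(long[][] counts) {
        int nRows = counts.length;
        if (nRows < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("table needs at least 2 rows and 2 columns");
        }
        int nCols = counts[0].length;
        double[] rowSum = new double[nRows];
        double[] colSum = new double[nCols];
        double total = 0.0;
        for (int r = 0; r < nRows; r++) {
            if (counts[r].length != nCols) {
                throw new IllegalArgumentException("table must be rectangular");
            }
            for (int c = 0; c < nCols; c++) {
                if (counts[r][c] < 0) {
                    throw new IllegalArgumentException("counts must be non-negative");
                }
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        double statistic = 0.0;
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                // Expected count under independence of rows and columns.
                double expected = rowSum[r] * colSum[c] / total;
                double deviation = counts[r][c] - expected;
                statistic += deviation * deviation / expected;
            }
        }
        return statistic;
    }

    public static void main(String[] args) {
        long[][] counts = {{10, 20}, {20, 10}};
        System.out.println(chiSquare(counts));
    }
}
```

The sketch does not guard against a row or column whose total is zero, which would make an expected count zero.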
Parameters:
counts - array representation of 2-way table
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular
NotPositiveException - if counts has negative entries

public double chiSquareTest(long[][] counts)
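Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.
The rows of the 2-way table are counts[0], ... , counts[counts.length - 1]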
Preconditions:
All counts must be non-negative.
The counts array must be rectangular.
counts must have at least 2 columns and at least 2 rows.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Parameters:
counts - array representation of 2-way table
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular
NotPositiveException - if counts has negative entries
MaxCountExceededException - if an error occurs computing the p-value

public boolean chiSquareTest(long[][] counts, double alpha)
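Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input 2-way table are independent of the rows, with significance level alpha.
Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
The rows of the 2-way table are counts[0], ... , counts[counts.length - 1]
Example:
To test the null hypothesis that the counts in counts[0], ... , counts[counts.length - 1] all correspond to the same underlying probability distribution at the 99% level, use chiSquareTest(counts, 0.01)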
Preconditions:
All counts must be non-negative.
The counts array must be rectangular.
counts must have at least 2 columns and at least 2 rows.
If any of the preconditions are not met, an IllegalArgumentException is thrown.
Parameters:
counts - array representation of 2-way table
alpha - significance level of the test
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular
NotPositiveException - if counts has any negative entries
OutOfRangeException - if alpha is not in the range (0, 0.5]
MaxCountExceededException - if an error occurs computing the p-value

public double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
Computes a Chi-Square two-sample test statistic comparing bin frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is
$$\sum_i \frac{\left(K \cdot \mathrm{observed1}[i] - \mathrm{observed2}[i] / K\right)^2}{\mathrm{observed1}[i] + \mathrm{observed2}[i]}$$
where $K = \sqrt{\sum_i \mathrm{observed2}[i] \,/\, \sum_i \mathrm{observed1}[i]}$.
This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that both observed counts follow the same distribution.
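The formula above translates fairly directly into code. The following self-contained sketch follows the stated formula term by term; `TwoSampleStatisticSketch` is a hypothetical name and the validation messages are illustrative.

```java
/** Hypothetical illustration only; not the library's implementation. */
final class TwoSampleStatisticSketch {

    /** Sum over bins of (K * o1[i] - o2[i] / K)^2 / (o1[i] + o2[i]), with K = sqrt(sum(o2) / sum(o1)). */
    static double chiSquareDataSetsComparison(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        double sum1 = 0.0;
        double sum2 = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("a bin is zero in both samples");
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0.0 || sum2 == 0.0) {
            throw new IllegalArgumentException("a sample has only zero counts");
        }
        // K compensates for the two samples having different total sizes.
        double k = Math.sqrt(sum2 / sum1);
        double statistic = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            double deviation = k * observed1[i] - observed2[i] / k;
            statistic += deviation * deviation / (observed1[i] + observed2[i]);
        }
        return statistic;
    }

    public static void main(String[] args) {
        long[] observed1 = {10, 30};
        long[] observed2 = {30, 10};
        System.out.println(chiSquareDataSetsComparison(observed1, observed2));
    }
}
```

When the two samples have the same proportions in every bin, the weighted deviations cancel and the statistic is zero even if the sample sizes differ.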
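Preconditions:
Observed counts must be non-negative.
Observed counts for a specific bin must not both be zero.
Observed counts for a specific sample must not all be zero.
observed1 and observed2 must have the same length, and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.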
Parameters:
observed1 - array of observed frequency counts of the first data set
observed2 - array of observed frequency counts of the second data set
Throws:
DimensionMismatchException - if the lengths of the arrays do not match
NotPositiveException - if any entries in observed1 or observed2 are negative
ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays

public double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2)
Returns the observed significance level, or p-value, associated with a Chi-Square two-sample test comparing bin frequency counts in observed1 and observed2.
The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.
See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the test statistic. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.
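To make the link between the statistic, the degrees of freedom, and the p-value concrete, here is a limited sketch. It relies on the closed form of the chi-square upper tail that holds only for an even number of degrees of freedom, exp(-x/2) * sum_{j<df/2} (x/2)^j / j!. It does not reproduce how the library evaluates the distribution in general, and `ChiSquarePValueSketch` is a hypothetical name.

```java
/** Hypothetical illustration only; valid for even degrees of freedom, not a general implementation. */
final class ChiSquarePValueSketch {

    /**
     * Upper-tail probability P(X >= x) of a chi-square variable with an even
     * number of degrees of freedom df = 2m:
     * exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
     */
    static double upperTailEvenDf(double x, int df) {
        if (df < 2 || df % 2 != 0) {
            throw new IllegalArgumentException("this sketch only supports even df >= 2");
        }
        if (x <= 0.0) {
            return 1.0;
        }
        double half = x / 2.0;
        double term = 1.0; // (x/2)^j / j! for j = 0
        double sum = 1.0;
        for (int j = 1; j < df / 2; j++) {
            term *= half / j;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // Two arrays of length 3 give df = 3 - 1 = 2 for the two-sample test.
        double statistic = 5.0;
        int df = 2;
        System.out.println(upperTailEvenDf(statistic, df));
    }
}
```

For odd degrees of freedom the tail involves the complementary error function, which this sketch deliberately leaves out.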
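Preconditions:
Observed counts must be non-negative.
Observed counts for a specific bin must not both be zero.
Observed counts for a specific sample must not all be zero.
observed1 and observed2 must have the same length, and their common length must be at least 2.
If any of the preconditions are not met, an IllegalArgumentException is thrown.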
Parameters:
observed1 - array of observed frequency counts of the first data set
observed2 - array of observed frequency counts of the second data set
Throws:
DimensionMismatchException - if the lengths of the arrays do not match
NotPositiveException - if any entries in observed1 or observed2 are negative
ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
MaxCountExceededException - if an error occurs computing the p-value

public boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha)
Performs a Chi-Square two-sample test comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the Chi-Square statistic used in the test. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.
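Preconditions:
Observed counts must be non-negative.
Observed counts for a specific bin must not both be zero.
Observed counts for a specific sample must not all be zero.
observed1 and observed2 must have the same length, and their common length must be at least 2.
0 < alpha <= 0.5
If any of the preconditions are not met, an IllegalArgumentException is thrown.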
Parameters:
observed1 - array of observed frequency counts of the first data set
observed2 - array of observed frequency counts of the second data set
alpha - significance level of the test
Throws:
DimensionMismatchException - if the lengths of the arrays do not match
NotPositiveException - if any entries in observed1 or observed2 are negative
ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
OutOfRangeException - if alpha is not in the range (0, 0.5]
MaxCountExceededException - if an error occurs performing the test

Copyright © 2020 CNES. All rights reserved.