Developer-Guide/phpOMS/math/statistics.md

3.0 KiB

Statistics

Definitions

Range


range = max(values) - min(values)
ArrayUtils::range($values);

Mean (arithmetic)


\mu = mean = \bar{X} = \frac{1}{n}\sum_{i=1}^{n}a_i
Average::arithmeticMean($values);

Variance


\sigma^{2} = Var(X) = \frac{1}{N} \sum_{i = 1}^{N}\left(x_{i} - \bar{X}\right)^{2}
MeasureOfDispersion::empiricalVariance($x, $y);
MeasureOfDispersion::sampleVariance($x, $y);

The sample variance calculates with N - 1

Covariance


cov(X,Y) = \frac{1}{N} \sum_{i = 1}^{N}\left(x_{i} - \bar{X}\right)\left(y_{i} - \bar{Y}\right)
MeasureOfDispersion::empiricalCovariance($x, $y);
MeasureOfDispersion::sampleCovariance($x, $y);

The sample covariance calculates with N - 1

Variable types

Variables are characteristics (i.e. height, weight)

Quantitive

Quantitative variables represent a measer, they are numeric.

Discrete

Countable and finite possibilities/results/options.

Continuous

Not countable and infinite number of possibilities/results/options.

Qualitative

Qualitative variables represent categories/levels, they are not numeric.

Nominal

Cannot be ordered, they have the same importance/level/value (i.e. hair color). A category can also by yes/no (= two levels).

Ordinal

Can be ordered, they have different importance/level/value (i.e. risk severity)

Transformation

It is possible to transfrom variables:

  • Continuous to discrete: by removing steps in between (i.e. age only in years)
  • Quantitive to qualitive: by asigning numeric ranges a quality (i.e. school grades in pass and fail)

Tests

Chi-square

Goodness of fit test

This tests if observed values follows expected/known proportions (e.g. distributions.)

  • H0: The observation follows the expected frequency/distribution (i.e. normal distribution)
  • H1: The observation doesn't follow the expected frequency/distribution (i.e. normal distribution)

If H0 can be discarded or not depends on the significance (p-value).

ChiSquaredDistribution::testHypothesis($observed, $expected, $significance = 0.05, $degreesOfFreedom = 0);

Test of independence

This tests if there is a relationship between two categorical variables.

  • H0: The variables are independent (there is no relationship between them)
  • H1: The variables are dependent (there is a relationship between them)

t-test

One sample

This tests if the observed mean is different from a expected mean.

Two samples

This tests if the observed mean or median is different from two samples.

Correlation

Pearson correlation

Measures how two variables are related to each other (do they perform similarly, opposite or not related at all).


\rho_{XY} = \frac{cov(X, Y)}{\sigma_X \sigma_Y}
Correlation::bravaisPersonCorrelationCoefficientPopulation($x, $y);
Correlation::bravaisPersonCorrelationCoefficientSample($x, $y);

The sample correlation calculates with N - 1