Developer-Guide/math/statistics.md

136 lines
3.0 KiB
Markdown
Executable File

# Statistics
## Definitions
### Range
$$
range = max(values) - min(values)
$$
```php
ArrayUtils::range($values);
```
### Mean (arithmetic)
$$
\mu = mean = \bar{X} = \frac{1}{n}\sum_{i=1}^{n}a_i
$$
```php
Average::arithmeticMean($values);
```
### Variance
$$
\sigma^{2} = Var(X) = \frac{1}{N} \sum_{i = 1}^{N}\left(x_{i} - \bar{X}\right)^{2}
$$
```php
MeasureOfDispersion::empiricalVariance($x, $y);
MeasureOfDispersion::sampleVariance($x, $y);
```
> The sample variance calculates with N - 1
### Covariance
$$
cov(X,Y) = \frac{1}{N} \sum_{i = 1}^{N}\left(x_{i} - \bar{X}\right)\left(y_{i} - \bar{Y}\right)
$$
```php
MeasureOfDispersion::empiricalCovariance($x, $y);
MeasureOfDispersion::sampleCovariance($x, $y);
```
> The sample covariance calculates with N - 1
## Variable types
Variables are characteristics (i.e. height, weight)
### Quantitive
Quantitative variables represent a measer, they are numeric.
#### Discrete
Countable and finite possibilities/results/options.
#### Continuous
Not countable and infinite number of possibilities/results/options.
### Qualitative
Qualitative variables represent categories/levels, they are not numeric.
#### Nominal
Cannot be ordered, they have the same importance/level/value (i.e. hair color). A category can also by yes/no (= two levels).
#### Ordinal
Can be ordered, they have different importance/level/value (i.e. risk severity)
### Transformation
It is possible to transfrom variables:
* Continuous to discrete: by removing steps in between (i.e. age only in years)
* Quantitive to qualitive: by asigning numeric ranges a quality (i.e. school grades in pass and fail)
## Tests
### Chi-square
#### Goodness of fit test
This tests if observed values follows expected/known proportions (e.g. distributions.)
* H0: The observation follows the expected frequency/distribution (i.e. normal distribution)
* H1: The observation doesn't follow the expected frequency/distribution (i.e. normal distribution)
> If H0 can be discarded or not depends on the significance (p-value).
```php
ChiSquaredDistribution::testHypothesis($observed, $expected, $significance = 0.05, $degreesOfFreedom = 0);
```
#### Test of independence
This tests if there is a relationship between two categorical variables.
* H0: The variables are independent (there is no relationship between them)
* H1: The variables are dependent (there is a relationship between them)
### t-test
#### One sample
This tests if the observed mean is different from a expected mean.
#### Two samples
This tests if the observed mean or median is different from two samples.
### Correlation
#### Pearson correlation
Measures how two variables are related to each other (do they perform similarly, opposite or not related at all).
$$
\rho_{XY} = \frac{cov(X, Y)}{\sigma_X \sigma_Y}
$$
```php
Correlation::bravaisPersonCorrelationCoefficientPopulation($x, $y);
Correlation::bravaisPersonCorrelationCoefficientSample($x, $y);
```
> The sample correlation calculates with N - 1