z-test, t-test, ANOVA and chi-squared tests, binomial distribution

Variance

Descriptive

SD: Standard Deviation, σ (the variance is σ²)
$s^2_n=\frac{1}{n}\sum(x_i - \bar{x})^2$
Bessel’s correction
- $s^2=s^2_n\frac{n}{n-1}=\frac{1}{n-1}\sum(x_i - \bar{x})^2$
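
The effect of Bessel's correction can be checked numerically. A minimal Python sketch, assuming a small made-up data set (`data` is illustrative and not part of these notes); it computes both variances by hand and compares them with the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1).

```python
import statistics

# Illustrative sample (an assumption, not data from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations

var_n = ss / n             # uncorrected variance s_n^2
var_bessel = ss / (n - 1)  # Bessel-corrected variance s^2

print(var_n)       # 4.0
print(var_bessel)  # about 4.571

# The standard library implements both conventions.
print(statistics.pvariance(data), statistics.variance(data))
```

The corrected value is always the larger of the two, by the factor n / (n − 1).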

Inferential

SE: Standard Error
$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$
s² is an estimate of σ², the variance of the population.
The larger the sample size n, the lower the SE.

Margin of error

\[ME = \pm z^* \cdot SE = \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\]
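
As a rough illustration of the standard error and the margin of error, here is a Python sketch using only the standard library. The sample, the 95% confidence level, and the "known" population SD `sigma = 2.0` are all assumptions made for the example.

```python
import math
from statistics import NormalDist, stdev

# Illustrative sample (an assumption, not data from the notes).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

s = stdev(sample)       # Bessel-corrected sample SD
se = s / math.sqrt(n)   # standard error of the mean

# Critical value z* for a two-sided 95% interval (about 1.96).
z_star = NormalDist().inv_cdf(0.975)

sigma = 2.0             # assumed known population SD
me = z_star * sigma / math.sqrt(n)

print(round(se, 4), round(z_star, 2), round(me, 4))
```

Quadrupling n halves both the SE and the margin of error, since both scale with 1/√n.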

Types of tests

Test	Degrees of freedom	Objective	Conditions	Formula
One sample
z-test	-	$\bar{x}$ vs μ	Normal distribution; σ and μ known	$z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}$
t-test	n − 1	$\bar{x}$ vs μ	Normal distribution; σ unknown; μ known	$t=\frac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}$
Two sample
t-test, independent samples	df₁ + df₂ = n₁ + n₂ − 2	$\bar{x}_1$ vs $\bar{x}_2$	Normal distribution; σ₁, σ₂ unknown; σ₁ ∼ σ₂; pooled variance $s^2_p=\frac{(n_1-1)s^2_1+(n_2-1)s^2_2}{n_1+n_2-2}$	$t=\frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$
t-test, dependent samples	n − 1	$\bar{d}$ vs 0, with $d_i = x_{1i} - x_{2i}$	Normal distribution; 2 dependent samples (pre-treatment, post-treatment); σ₁, σ₂ unknown; σ₁ ∼ σ₂; $s_d= \sqrt{\frac{\sum(d_i-\bar{d})^2}{n-1}}$	$t=\frac{\bar{d}}{\frac{s_d}{\sqrt{n}}}$
Three or more samples
One-way ANOVA	df_btw = K − 1; df_w = N − K (N = ∑n_k = total number of elements, K = number of groups)	Difference among 3 or more population means	Normal distribution; s₁², s₂² sample variances	$F= \frac{\frac{SS_{btw}}{df_{btw}}}{\frac{SS_w}{df_w}} = \frac{\sum n_k(\bar{x}_k-\bar{x}_G)^2/(K-1)}{\sum_{k=1}^{K}\sum_{i=1}^{n_k}(x_i-\bar{x}_k)^2/(N-K)}$
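
The t statistics in the table can be sketched in a few lines of Python. The function names and the test data below are hypothetical, chosen for illustration. The paired version uses the Bessel-corrected (n − 1) standard deviation of the differences, the usual convention for a test with n − 1 degrees of freedom. Each function returns the statistic and its degrees of freedom; turning that into a p-value needs a t table or a statistics library.

```python
import math
from statistics import mean, variance

def one_sample_t(xs, mu):
    """t statistic for a sample mean against a hypothesised mean mu."""
    n = len(xs)
    s = math.sqrt(variance(xs))            # sample SD (n - 1 denominator)
    return (mean(xs) - mu) / (s / math.sqrt(n)), n - 1

def pooled_t(xs1, xs2):
    """Independent two-sample t statistic with pooled variance."""
    n1, n2 = len(xs1), len(xs2)
    sp2 = ((n1 - 1) * variance(xs1) + (n2 - 1) * variance(xs2)) / (n1 + n2 - 2)
    t = (mean(xs1) - mean(xs2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def paired_t(xs1, xs2):
    """Dependent (paired) t statistic on the differences d_i = x_1i - x_2i."""
    d = [a - b for a, b in zip(xs1, xs2)]
    n = len(d)
    s_d = math.sqrt(variance(d))           # SD of the differences
    return mean(d) / (s_d / math.sqrt(n)), n - 1

# Made-up data for illustration only.
print(one_sample_t([1, 2, 3, 4, 5], mu=2))
print(pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
print(paired_t([10, 12, 9, 11, 13], [8, 11, 9, 9, 10]))
```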

ANOVA

\[MS_{between}=\frac{SS_{btw}}{df_{btw}}= \frac{\sum_{k=1}^{K}n_k(\bar{x}_k-\bar{x}_G)^2}{K-1}\]

\[MS_{within}= \frac{SS_w}{df_w} = \frac{\sum_{k=1}^{K}\sum_{i=1}^{n_k}(x_i-\bar{x}_k)^2}{N-K}\]

for N the total number of elements and K the total number of groups.

\[F= \frac{\frac{SS_{btw}}{df_{btw}}}{\frac{SS_w}{df_w}} = \frac{\sum_{k=1}^{K} n_k(\bar{x}_k-\bar{x}_G)^2/(K-1)}{\sum_{k=1}^{K}\sum_{i=1}^{n_k}(x_i-\bar{x}_k)^2/(N-K)}\]
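
The F ratio can be computed directly from these sums of squares. A minimal Python sketch, assuming three small made-up groups; the function name `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    N = sum(len(g) for g in groups)   # total number of elements
    K = len(groups)                   # number of groups
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_btw = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_btw = ss_btw / (K - 1)
    ms_w = ss_w / (N - K)
    return ms_btw / ms_w, K - 1, N - K

# Illustrative data: group means 2, 3 and 6.
f_stat, df_btw, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_btw, df_w)  # F is about 13 with (2, 6) degrees of freedom
```

Here SS_btw = 26 and SS_w = 6, so MS_between = 13 and MS_within = 1.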

F-statistic characteristics

- F is never negative (it is a ratio of two mean squares)
- The F-distribution is right-skewed

F-distribution

Cohen’s d for multiple comparisons (effect size)

\[d=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{MS_{within}}}=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{\sum_{k=1}^{K}\sum_{i=1}^{n_k}(x_i-\bar{x}_k)^2}{N-K}}}\]

Eta-squared η²

η² : proportion of total variation due to between group differences (explained variation)

\[\eta^2=\frac{SS_{between}}{SS_{total}}\]
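
Both effect sizes follow from the ANOVA sums of squares. A small Python sketch, assuming made-up groups; `effect_sizes` is a hypothetical helper that returns Cohen's d for the first two groups passed in, together with η².

```python
import math

def effect_sizes(groups):
    """Return (Cohen's d between groups[0] and groups[1], eta squared)."""
    N = sum(len(g) for g in groups)
    K = len(groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    ss_btw = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

    ms_within = ss_w / (N - K)
    d = (means[0] - means[1]) / math.sqrt(ms_within)
    eta_sq = ss_btw / ss_total
    return d, eta_sq

# Illustrative data; the first two groups listed are the pair compared by d.
d, eta_sq = effect_sizes([[5, 6, 7], [1, 2, 3], [2, 3, 4]])
print(d, eta_sq)  # d is about 4, eta squared about 0.8125
```

With these numbers, about 81% of the total variation is explained by group membership.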

Correlation coefficient (Pearson’s r)

For a population

Pearson’s correlation coefficient, when applied to a population, is commonly represented by the Greek letter ρ and may be referred to as the population correlation coefficient or the population Pearson correlation coefficient.

$\rho_{X,Y}= \frac{cov(X,Y)}{\sigma_X\cdot \sigma_Y}$

Where cov(X, Y)=E[(X − μ_X)(Y − μ_Y)], for E[X] the expected value (mean) of X.

When $|\rho_{X,Y}|=1$, the data lie on a perfect straight line.

For a sample

Pearson’s correlation coefficient is represented as r when applied to a sample, and is called the sample correlation coefficient or the sample Pearson correlation coefficient.

\[r= \frac{\sum^n_{i=1}(x_i-\bar{x})(y_i - \bar{y})}{\sqrt{\sum^n_{i=1}(x_i-\bar{x})^2}\sqrt{\sum^n_{i=1}(y_i-\bar{y})^2}}\]
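
The sample formula translates almost term by term into code. A minimal Python sketch with made-up data; `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Illustrative data (an assumption, not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746

# Points on an exact increasing line give r = 1.
print(pearson_r(xs, [2 * x + 1 for x in xs]))
```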

Hypothesis testing

\[\begin{align*}H_0:&\rho=0 \\ H_A: &\rho<0 \\ &\rho >0 \\ &\rho\neq0 \end{align*}\]

Student’s t-distribution

\[t=r\sqrt{\frac{n-2}{1-r^2}}\]

This statistic follows a t-distribution with df = n − 2, for n the number of elements in the sample.
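
A short Python sketch of this test statistic, with `correlation_t` as a hypothetical helper name and illustrative inputs. It returns the statistic and its degrees of freedom; comparing against a critical value still needs a t table or a statistics library, which is outside this sketch.

```python
import math

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing rho = 0."""
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2)), n - 2

# Illustrative values: r = 0.5 observed on n = 27 pairs.
t_stat, df = correlation_t(0.5, 27)
print(round(t_stat, 4), df)  # 2.8868 25
```

For a fixed r, the statistic grows with n, so a modest correlation becomes significant in a large enough sample.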

Regression

Linear regression: $\hat{y}=a+bx$

\[b = \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2} = r \frac{s_y}{s_x}\]

SD of the estimate: how far values typically fall from the regression line.

SD of the estimate = $\sqrt{\frac{\sum(y_i-\hat{y}_i)^2}{n-2}}$

Sum of squared residuals = $\sum{(y_i -\hat{y}_i)^2}$

Line of best fit: minimizes the sum of squared residuals
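
The slope formula and the SD of the estimate can be combined into a small least-squares fit. A Python sketch with made-up data; `fit_line` is a hypothetical name, and the intercept uses the standard least-squares result a = ȳ − b·x̄, which these notes do not state explicitly.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, sd_est) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)

    b = s_xy / s_xx              # slope
    a = y_bar - b * x_bar        # intercept (standard result, assumed here)

    # Sum of squared residuals and SD of the estimate (n - 2 denominator).
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sd_est = math.sqrt(ss_res / (n - 2))
    return a, b, sd_est

# Illustrative data (an assumption, not from the notes).
a, b, sd_est = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 4), round(b, 4), round(sd_est, 4))  # 2.2 0.6 0.8944
```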

F-distribution

Confidence interval (CI)

Population

β₀: population y-intercept
β₁: population slope

Sample

a : sample y-intercept
b : sample slope

Hypothesis testing

\[\begin{align*}H_0:&\beta_1=0 \\ H_A: &\beta_1\neq0 \\ &\beta_1>0 \\ &\beta_1<0 \end{align*}\]

df = n − 2, for n the number of elements in the sample.

χ² test for independence

\[\chi^2 = \sum_{i=1}^n{\frac{(f_{oi} - f_{ei})^2}{f_{ei}}}\]

χ² is never negative
Always a one-directional (right-tailed) test
Compared against the critical value χ²_crit

1 variable with different responses

df = n − 1, for n the number of categories

2 or more variables

df = (n_rows − 1)(n_cols − 1)

\[f_{ei} = \frac{(column\; total)(row\, total)}{grand\, total}\]
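
For a contingency table, the expected frequencies and the χ² statistic can be computed as follows. A minimal Python sketch with a made-up 2×2 table; `chi_squared_independence` is a hypothetical name.

```python
def chi_squared_independence(observed):
    """Return (chi2, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            # Expected count under independence.
            f_e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (f_o - f_e) ** 2 / f_e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Illustrative table: every expected count is 15.
chi2, df = chi_squared_independence([[10, 20], [20, 10]])
print(round(chi2, 4), df)  # 6.6667 1
```

With df = 1, the tabulated critical value at α = 0.05 is 3.841, so this made-up table would lead to rejecting independence.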

Binomial distribution

Probability: $P(X=r)= \binom {n} {r}p^r(1-p)^{n-r}$
Mean: μ = np
Variance: σ² = np(1 − p)
Standard deviation: $\sigma = \sqrt{np(1-p)}$
Standard error of the sample proportion: $SE = \sqrt{\frac{p(1-p)}{n}}$
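
These quantities are easy to evaluate with the standard library. A Python sketch assuming n = 10 trials with p = 0.5, values chosen only for illustration; `binomial_pmf` is a hypothetical name.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for a binomial distribution with n trials and success rate p."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.5                  # illustrative parameters
prob_5 = binomial_pmf(5, n, p)  # 252 / 1024
mean = n * p
variance = n * p * (1 - p)
sd = math.sqrt(variance)
se = math.sqrt(p * (1 - p) / n)

print(prob_5, mean, variance)   # 0.24609375 5.0 2.5

# The probabilities over r = 0..n sum to 1.
total = sum(binomial_pmf(r, n, p) for r in range(n + 1))
```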

Confidence Interval (CI)

$\hat{p}=\frac{x}{N}$

$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}$

The distribution can be approximated as normal if $N\hat{p}>5$ and $N(1-\hat{p})>5$

Margin of error $m = z\cdot SE = z\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}$
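
A confidence interval for a proportion can be sketched as below. The function name `proportion_ci`, the 95% default, and the counts are assumptions for illustration. The normality check requires both expected counts to exceed 5, the usual reading of that rule of thumb.

```python
import math
from statistics import NormalDist

def proportion_ci(x, N, confidence=0.95):
    """Normal-approximation CI for a proportion: x successes out of N."""
    p_hat = x / N
    if N * p_hat <= 5 or N * (1 - p_hat) <= 5:
        raise ValueError("normal approximation not justified for these counts")
    se = math.sqrt(p_hat * (1 - p_hat) / N)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    m = z * se                                          # margin of error
    return p_hat - m, p_hat + m

# Illustrative counts: 20 successes out of 100, so SE = 0.04.
low, high = proportion_ci(20, 100)
print(round(low, 4), round(high, 4))  # 0.1216 0.2784
```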

Pooled Standard Error (SE_p)

Comparing two samples

X_control, X_experiment, N_control, N_experiment

$\hat{p}_{p}= \frac{X_{control}+X_{exp}}{N_{control}+N_{exp}}$

$SE_{p}= \sqrt{\hat{p}_p\cdot(1-\hat{p}_p)\left(\frac{1}{N_{control}}+\frac{1}{N_{exp}}\right)}$

$\hat{d}=\hat{p}_{exp}-\hat{p}_{control}$

\[\begin{align*}&H_0:& d=0 &\rightarrow \hat{d}\sim N(0,SE_{p}) \\ &H_A:& d\neq 0 & \rightarrow Reject\; Null:|\hat{d}| > z^* SE_{p}\end{align*}\]
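
The comparison of a control and an experiment group can be sketched in Python as follows. The function name `pooled_two_proportion_test`, the counts, and the two-sided α = 0.05 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def pooled_two_proportion_test(x_control, n_control, x_exp, n_exp, alpha=0.05):
    """Return (d_hat, se_pool, reject_null) for a two-sided test of d = 0."""
    p_pool = (x_control + x_exp) / (n_control + n_exp)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_exp))
    d_hat = x_exp / n_exp - x_control / n_control
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return d_hat, se_pool, abs(d_hat) > z_star * se_pool

# Illustrative counts: 10% vs 13% success rate, 1000 units in each group.
d_hat, se_pool, reject = pooled_two_proportion_test(100, 1000, 130, 1000)
print(round(d_hat, 3), round(se_pool, 4), reject)  # 0.03 0.0143 True
```

Here the observed difference of 0.03 exceeds z*·SE_p ≈ 0.028, so the null hypothesis d = 0 is rejected at this level.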

Power-sensitivity

References

Pearson’s r: Wikipedia
