Master Inferential Statistics

Your comprehensive guide to understanding statistical inference, hypothesis testing, and data-driven decision making


What is Inferential Statistics?

🎯

Making Predictions

Use sample data to make predictions and inferences about a larger population

🔍

Testing Hypotheses

Scientifically test claims and theories using statistical evidence

📈

Drawing Conclusions

Make data-driven decisions with quantified confidence levels

Fundamental Concepts in Inferential Statistics

Population vs Sample: A population (denoted N) includes all members of a specified group that we want to study. A sample (denoted n) is a smaller subset randomly selected from the population to make inferences about the whole. Since studying entire populations is often impractical or impossible, we use representative samples to estimate population characteristics.
Parameters vs Statistics:
Parameters (Greek letters): Unknown population values we want to estimate
  - μ (mu): Population mean
  - σ (sigma): Population standard deviation
  - σ² (sigma squared): Population variance
  - p: Population proportion
Statistics (Roman letters): Known sample values calculated from data
  - x̄ (x-bar): Sample mean
  - s: Sample standard deviation
  - s²: Sample variance
  - p̂ (p-hat): Sample proportion
Central Limit Theorem (CLT): One of the most important theorems in statistics. For any population with mean μ and standard deviation σ, as sample size n increases (typically n ≥ 30), the sampling distribution of the sample mean x̄ approaches a normal distribution with:
• Mean: μx̄ = μ
• Standard Error: σx̄ = σ / √n
This holds regardless of the shape of the original population distribution!
Standard Error (SE): The standard deviation of a sampling distribution. It measures how much sample statistics typically vary from the population parameter. Formula: SE = σ / √n; when σ is unknown, estimate it as SE = s / √n, where s is the sample standard deviation and n is the sample size. Larger samples have smaller standard errors, leading to more precise estimates.
Sampling Distribution: The probability distribution of a statistic (like x̄ or p̂) obtained from all possible samples of a given size from a population. Understanding sampling distributions is crucial for making inferences about populations from samples.
Law of Large Numbers: As the sample size increases, the sample mean x̄ converges to the population mean μ. This justifies using sample statistics to estimate population parameters.
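Both the CLT and the standard-error formula can be checked with a short NumPy simulation (a sketch; the exponential population, n = 50, and 20,000 replications are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# A deliberately non-normal population: exponential with mean 2 and SD 2
population_mean, population_sd = 2.0, 2.0
n = 50          # sample size
reps = 20_000   # number of repeated samples

# Draw many samples and record each sample mean
sample_means = rng.exponential(scale=2.0, size=(reps, n)).mean(axis=1)

# CLT: the sampling distribution of x-bar centers on mu ...
print(sample_means.mean())           # close to 2.0

# ... with spread given by the standard error sigma / sqrt(n)
theoretical_se = population_sd / np.sqrt(n)
empirical_se = sample_means.std(ddof=1)
print(theoretical_se, empirical_se)  # both about 0.283
```

A histogram of `sample_means` would look bell-shaped even though the population itself is strongly skewed, which is exactly the CLT's claim.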

Hypothesis Testing

Hypothesis testing is a systematic statistical method used to make decisions about population parameters based on sample data. It provides a framework for determining whether observed differences or effects are statistically significant or merely due to random chance.

1

State Hypotheses

H₀ (Null Hypothesis): The default assumption stating no effect, no difference, or no relationship exists (e.g., μ = μ₀ or p₁ = p₂)

H₁ or Ha (Alternative Hypothesis): The research claim we're trying to prove. Can be:

Two-tailed: μ ≠ μ₀ (tests for any difference)

Right-tailed: μ > μ₀ (tests for increase)

Left-tailed: μ < μ₀ (tests for decrease)

2

Set Significance Level

α (Alpha): Maximum probability of Type I error we're willing to accept. Represents the threshold for "statistical significance."

Common levels:

• α = 0.01 (1%): Very stringent, for critical decisions

• α = 0.05 (5%): Standard in most research

• α = 0.10 (10%): More lenient, exploratory research

Critical regions are determined by α: For two-tailed tests, split α/2 in each tail

3

Calculate Test Statistic

Compute the appropriate test statistic from sample data:

Z-score: z = (x̄ - μ₀) / (σ / √n) when σ is known

T-score: t = (x̄ - μ₀) / (s / √n) when σ is unknown

Degrees of Freedom: df = n - 1 for t-tests

The test statistic measures how many standard errors the sample statistic is from the hypothesized parameter.

4

Make Decision & Interpret

Two methods for decision:

1. P-value approach: If p-value < α, reject H₀

2. Critical value approach: If test statistic falls in critical region, reject H₀

Conclusion: State decision in context of the original problem with practical interpretation, not just statistical significance.
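The four steps above can be sketched end-to-end with SciPy (the height measurements are hypothetical):

```python
import numpy as np
from scipy import stats

# Step 1: H0: mu = 168  vs  H1: mu != 168 (two-tailed)
mu0 = 168

# Step 2: significance level
alpha = 0.05

# Hypothetical sample of heights (cm)
sample = np.array([171, 169, 174, 166, 172, 170, 168, 173, 171, 167])

# Step 3: t statistic, since sigma is unknown (df = n - 1 = 9)
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# Step 4: decision via the p-value approach
reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```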

Types of Errors in Hypothesis Testing

Type I Error (α) - False Positive

Definition: Rejecting a true null hypothesis H₀

Probability: P(Type I Error) = α (significance level)

Consequence: Claiming an effect exists when it doesn't

Example: Convicting an innocent person, concluding a drug works when it doesn't (false alarm)

Control: Set a lower α (e.g., 0.01 instead of 0.05)

Type II Error (β) - False Negative

Definition: Failing to reject a false null hypothesis H₀

Probability: P(Type II Error) = β

Power: 1 - β = probability of correctly rejecting false H₀

Example: Acquitting a guilty person, concluding a drug doesn't work when it does (missed detection)

Control: Increase sample size or use a higher α; larger true effects are also easier to detect

Trade-off: Decreasing α (Type I error) increases β (Type II error) and vice versa. The ideal balance depends on the relative costs of each type of error in your specific context.

Statistical Power: Power = 1 - β. Typically aim for power ≥ 0.80 (80% chance of detecting a real effect)
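Both error rates and power can be estimated by simulation (a sketch; the 0.5-SD effect size, n = 30, and replication count are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = 0.05, 30, 5000

def reject_rate(true_mean):
    """Fraction of simulated one-sample t-tests that reject H0: mu = 0."""
    data = rng.normal(loc=true_mean, scale=1.0, size=(reps, n))
    p = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue
    return (p < alpha).mean()

type1 = reject_rate(0.0)   # H0 true: rejection rate should be near alpha
power = reject_rate(0.5)   # H0 false (0.5-SD effect): estimated power
print(type1, power)        # type1 near 0.05; power near 0.75
```

Rerunning with a larger n shows power climbing toward 1 while the Type I rate stays pinned at α, which is the trade-off described above.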

Understanding P-Values

Definition: The p-value is the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis H₀ is true.

Formula (two-tailed): p-value = P(|test statistic| ≥ |observed value|, given that H₀ is true)

Interpretation Guide: Lower p-values provide stronger evidence against H₀

p < 0.01
Very Strong Evidence Against H₀
0.01 ≤ p < 0.05
Strong Evidence Against H₀
0.05 ≤ p < 0.10
Moderate Evidence Against H₀
p ≥ 0.10
Weak/No Evidence Against H₀

Important: A p-value is NOT the probability that H₀ is true! It's the probability of the data given H₀ is true. Also, "not significant" does NOT prove H₀ is true.

Confidence Intervals

A confidence interval (CI) provides a range of plausible values for an unknown population parameter, constructed from sample data. The confidence level represents the long-run proportion of such intervals that would contain the true parameter if we repeated the sampling process many times.

Confidence Interval Formulas

General Form: CI = Point Estimate ± Margin of Error
CI = Point Estimate ± (Critical Value × Standard Error)

1. For Population Mean μ (σ known, or n ≥ 30):

x̄ ± zα/2 × (σ / √n)

Where: x̄ = sample mean, zα/2 = critical z-value,
σ = population standard deviation, n = sample size
Standard Error: SE = σ / √n
Margin of Error: ME = zα/2 × SE

2. For Population Mean μ (σ unknown, small sample):

x̄ ± tα/2, df × (s / √n)

Where: s = sample standard deviation,
tα/2, df = critical t-value with df = n - 1 degrees of freedom
Standard Error: SE = s / √n
Use when: n < 30 and population is approximately normal

3. For Population Proportion p:

p̂ ± zα/2 × √[p̂(1 - p̂) / n]

Where: p̂ = sample proportion = x / n
x = number of successes, n = sample size
Standard Error: SE = √[p̂(1 - p̂) / n]
Condition: np̂ ≥ 10 and n(1 - p̂) ≥ 10

4. For Difference Between Two Means (independent samples):

(x̄₁ - x̄₂) ± tα/2, df × √[(s₁² / n₁) + (s₂² / n₂)]

Where: x̄₁, x̄₂ = sample means, s₁, s₂ = sample standard deviations
n₁, n₂ = sample sizes
Standard Error (unpooled): SE = √[(s₁² / n₁) + (s₂² / n₂)]
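The interval formulas translate directly into code (a sketch; the measurements and success count are hypothetical, and SciPy supplies the critical values):

```python
import numpy as np
from scipy import stats

# 95% CI for a mean with sigma unknown (t interval)
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])
n = len(sample)
xbar, s = sample.mean(), sample.std(ddof=1)
se = s / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)      # t_{alpha/2, df}
mean_ci = (xbar - t_crit * se, xbar + t_crit * se)

# 95% CI for a proportion (normal approximation)
x, n_p = 46, 100                           # 46 successes in 100 trials
p_hat = x / n_p
se_p = np.sqrt(p_hat * (1 - p_hat) / n_p)
z_crit = stats.norm.ppf(0.975)             # about 1.96
prop_ci = (p_hat - z_crit * se_p, p_hat + z_crit * se_p)

print(mean_ci, prop_ci)
```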

Common Confidence Levels & Critical Values

90%

zα/2 = 1.645

α = 0.10, α/2 = 0.05

Narrower interval, less confident

Used for: Preliminary studies

95%

zα/2 = 1.96

α = 0.05, α/2 = 0.025

The standard balance of confidence and precision

Used for: Most research studies

99%

zα/2 = 2.576

α = 0.01, α/2 = 0.005

Wider interval, more confident

Used for: Critical decisions, medical studies

Relationship: Confidence Level = (1 - α) × 100%
Width Trade-off: Higher confidence → Wider interval → Less precision
Sample Size Effect: Larger n → Smaller SE → Narrower interval (more precision)

Correct Interpretation of Confidence Intervals

✓ CORRECT Interpretations:

• "We are 95% confident that the true population mean μ lies between [lower, upper]"

• "If we repeated this sampling process many times, about 95% of the constructed intervals would contain the true μ"

• "The interval [lower, upper] was constructed using a method that captures the true parameter 95% of the time"

✗ INCORRECT Interpretations:

• "There is a 95% probability that μ is in this interval" (μ is fixed, not random!)

• "95% of the data falls in this interval" (CI is for the parameter, not data)

• "There is a 95% chance the interval contains μ" (after construction, it either does or doesn't)

Key Understanding: The confidence level (95%) refers to the procedure, not a specific interval. Once calculated, a particular interval either contains the true parameter (with certainty) or it doesn't. The "confidence" is in the method's long-run success rate.

Factors Affecting Width:
• Sample size (n) ↑ → Width ↓
• Confidence level ↑ → Width ↑
• Population variability (σ) ↑ → Width ↑

Statistical Tests

T-Tests (When σ is Unknown)

One-Sample T-Test

Purpose: Compare sample mean x̄ to a known/hypothesized population mean μ₀

Test Statistic:
t = (x̄ - μ₀) / (s / √n)

Where:
• x̄ = sample mean
• μ₀ = hypothesized population mean
• s = sample standard deviation
• n = sample size
• df = n - 1 (degrees of freedom)

Standard Error: SE = s / √n

Example: Does the average height of students (x̄ = 170 cm) differ from the national average (μ₀ = 168 cm)?
Hypotheses: H₀: μ = 168 vs H₁: μ ≠ 168

Assumptions: Random sample, approximately normal distribution (or n ≥ 30)
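The formula can be computed by hand and cross-checked against SciPy (a sketch with hypothetical height data):

```python
import numpy as np
from scipy import stats

mu0 = 168
sample = np.array([170, 172, 169, 175, 168, 171, 173, 167, 174, 170])

# Manual computation: t = (x-bar - mu0) / (s / sqrt(n))
n = len(sample)
xbar = sample.mean()
s = sample.std(ddof=1)    # sample SD uses n - 1
t_manual = (xbar - mu0) / (s / np.sqrt(n))

# Cross-check against SciPy's implementation
t_scipy, p = stats.ttest_1samp(sample, popmean=mu0)
print(t_manual, t_scipy, p)
```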

Two-Sample T-Test (Independent)

Purpose: Compare means of two independent populations

Test Statistic:
t = (x̄₁ - x̄₂) / √[(s₁² / n₁) + (s₂² / n₂)]

Pooled Variance Method (when σ₁ = σ₂):
sp² = [(n₁-1)s₁² + (n₂-1)s₂²] / (n₁ + n₂ - 2)
t = (x̄₁ - x̄₂) / [sp × √(1/n₁ + 1/n₂)]
df = n₁ + n₂ - 2

Welch's Method (unequal variances):
df from the Welch–Satterthwaite formula:
df ≈ (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁-1) + (s₂²/n₂)²/(n₂-1) ]
(a quick conservative approximation: use the smaller of n₁-1 and n₂-1)

Example: Do males (x̄₁ = 82) and females (x̄₂ = 78) have different average test scores?
Hypotheses: H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂

Assumptions: Independent samples, normal distributions, homogeneity of variance
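A sketch of the independent-samples test in SciPy (the scores are made up; `equal_var=False` selects Welch's method):

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for two independent groups
group1 = np.array([82, 85, 79, 88, 84, 81, 86, 83])
group2 = np.array([78, 75, 80, 77, 74, 79, 76, 81])

# Welch's t-test (does not assume equal variances): the safer default
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Pooled-variance version, if homogeneity of variance is justified
t_pooled, p_pooled = stats.ttest_ind(group1, group2, equal_var=True)

print(t_stat, p_value)
print(t_pooled, p_pooled)
```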

Paired T-Test (Dependent)

Purpose: Compare means of related/paired observations (before/after, matched pairs)

Test Statistic:
t = d̄ / (sd / √n)

Where:
• d̄ = mean of differences = Σdi / n
• di = xi,before - xi,after
• sd = standard deviation of differences
• sd = √[Σ(di - d̄)² / (n-1)]
• df = n - 1 (number of pairs - 1)

Example: Did blood pressure decrease after treatment? Before: 140 mmHg, After: 132 mmHg
Hypotheses: H₀: μd = 0 vs H₁: μd > 0 (one-tailed for decrease)

Assumptions: Paired observations, differences are approximately normal
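A paired-test sketch (the blood-pressure readings are hypothetical; the two-tailed p-value is halved for the one-tailed alternative):

```python
import numpy as np
from scipy import stats

# Hypothetical systolic blood pressure before/after treatment (mmHg)
before = np.array([140, 152, 138, 145, 150, 142, 148, 144])
after  = np.array([132, 147, 135, 140, 144, 138, 141, 139])

# Two-tailed paired t-test on the differences d = before - after
t_stat, p_two = stats.ttest_rel(before, after)

# One-tailed p-value for H1: mu_d > 0 (a decrease after treatment)
p_one = p_two / 2 if t_stat > 0 else 1 - p_two / 2

d_bar = (before - after).mean()
print(d_bar, t_stat, p_one)
```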

ANOVA (Analysis of Variance)

Purpose: Compare means across three or more independent groups simultaneously (extension of t-test)

F-Test Statistic:
F = MSbetween / MSwithin = (Between-Group Variance) / (Within-Group Variance)

Sum of Squares:
• Total: SStotal = ΣΣ(xij - x̄grand)²
• Between Groups: SSbetween = Σ ni(x̄i - x̄grand)²
• Within Groups: SSwithin = ΣΣ(xij - x̄i)²

Mean Squares:
• MSbetween = SSbetween / dfbetween
• MSwithin = SSwithin / dfwithin

Degrees of Freedom:
• dfbetween = k - 1 (k = number of groups)
• dfwithin = N - k (N = total sample size)
• dftotal = N - 1
Between-Group Variation: Measures differences among group means. Large values suggest groups have different means.
Within-Group Variation: Measures variability within each group (error/noise). Used as baseline for comparison.

Example: Do students from three teaching methods (A, B, C) have different average scores?
Hypotheses: H₀: μ₁ = μ₂ = μ₃ vs H₁: At least one mean differs
Post-hoc: If F is significant, use Tukey HSD or Bonferroni to find which pairs differ

Assumptions: Independence, normality within groups, homogeneity of variance (equal variances)
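A one-way ANOVA sketch with SciPy's `f_oneway` (the scores under the three teaching methods are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical scores under three teaching methods
method_a = np.array([85, 88, 82, 90, 86])
method_b = np.array([78, 75, 80, 77, 79])
method_c = np.array([92, 89, 94, 91, 90])

# One-way ANOVA: F = MS_between / MS_within
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)

# Degrees of freedom: k - 1 between, N - k within
k, N = 3, 15
df_between, df_within = k - 1, N - k
print(f_stat, p_value, df_between, df_within)
```

A significant F here says only that at least one mean differs; a post-hoc procedure such as Tukey HSD is still needed to locate which pairs.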

Chi-Square (χ²) Tests

Chi-Square Goodness-of-Fit Test

Purpose: Test if observed frequencies match an expected probability distribution

Test Statistic:
χ² = Σ [(Oi - Ei)² / Ei]

Where:
• Oi = observed frequency in category i
• Ei = expected frequency in category i
• Sum over all k categories
• df = k - 1 - m (m = number of parameters estimated from the data)

Expected Frequency:
Ei = n × Pi (n = total, Pi = expected proportion)

Example: Is a die fair? Roll 600 times, expect each face 100 times
Hypotheses: H₀: Die is fair (P₁=P₂=...=P₆=1/6) vs H₁: Die is not fair

Condition: All Ei ≥ 5
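The die example as code (the observed counts are made up; they sum to 600 as `chisquare` requires):

```python
import numpy as np
from scipy import stats

# Is a die fair? 600 rolls, expecting 100 per face under H0
observed = np.array([95, 108, 92, 110, 98, 97])
expected = np.full(6, 100)

# chi2 = sum((O - E)^2 / E), df = k - 1 = 5
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p_value)
```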

Chi-Square Test of Independence

Purpose: Test if two categorical variables are independent (contingency table analysis)

Test Statistic:
χ² = Σ [(Oij - Eij)² / Eij]

Expected Frequency:
Eij = (Rowi Total × Columnj Total) / Grand Total

Degrees of Freedom:
df = (r - 1) × (c - 1)
r = number of rows, c = number of columns

Example: Is smoking status independent of gender?
Hypotheses: H₀: Variables are independent vs H₁: Variables are dependent

Condition: At least 80% of cells have Eij ≥ 5
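An independence-test sketch (hypothetical counts; note that SciPy applies Yates' continuity correction to 2×2 tables by default):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table: smoking status by gender
#                 smoker  non-smoker
table = np.array([[30,    70],    # male
                  [20,    80]])   # female

# Returns the statistic, p-value, df = (r-1)(c-1), and expected counts
chi2, p_value, df, expected = stats.chi2_contingency(table)
print(chi2, p_value, df)
print(expected)   # E_ij = row total * column total / grand total
```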

Z-Test (When σ is Known)

One-Sample Z-Test for Mean

Purpose: Test hypothesis about population mean when population standard deviation σ is known

Test Statistic:
z = (x̄ - μ₀) / (σ / √n)

Where:
• x̄ = sample mean
• μ₀ = hypothesized population mean
• σ = known population standard deviation
• n = sample size

Standard Error: SE = σ / √n

Critical Values (two-tailed):
• α = 0.10: z = ±1.645
• α = 0.05: z = ±1.96
• α = 0.01: z = ±2.576

Example: Testing if sample mean differs from known population mean when σ is known
Use when: σ known, large sample (n ≥ 30), or population is normal

Z-Test for Proportions: z = (p̂ - p₀) / √[p₀(1-p₀)/n]
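A z-test sketch computed from the formula directly (the summary numbers are hypothetical):

```python
import math
from scipy import stats

# One-sample z-test: sigma is known
xbar, mu0, sigma, n = 103.2, 100.0, 15.0, 50

# z = (x-bar - mu0) / (sigma / sqrt(n))
se = sigma / math.sqrt(n)
z = (xbar - mu0) / se

# Two-tailed p-value from the standard normal
p_value = 2 * stats.norm.sf(abs(z))
print(z, p_value)
```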

Regression Analysis

Regression analysis is a powerful statistical method used to model and examine the relationship between one dependent variable (Y) and one or more independent variables (X). It allows us to predict outcomes, quantify relationships, and test hypotheses about how variables relate to each other.

Simple Linear Regression

Purpose: Model the linear relationship between one predictor (X) and one response variable (Y)

Regression Equation:
y = β₀ + β₁x + ε   (fitted prediction: ŷ = β₀ + β₁x)

Where:
• ŷ = Predicted value of Y
• β₀ = Y-intercept (value of Y when X = 0)
• β₁ = Slope (change in Y per one-unit change in X)
• x = Independent variable (predictor)
• ε = Error term (residual)

Calculating Slope (β₁):

β₁ = Σ[(xi - x̄)(yi - ȳ)] / Σ(xi - x̄)²
OR: β₁ = r × (sy / sx)

Calculating Intercept (β₀):

β₀ = ȳ - β₁x̄

Where: r = correlation coefficient, sx = SD of X, sy = SD of Y
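The slope and intercept formulas in code (a sketch with hypothetical study-hours data):

```python
import numpy as np

# Hypothetical data: hours studied (x) vs exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([52.0, 58.0, 61.0, 67.0, 70.0, 78.0])

# Slope: beta1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
xbar, ybar = x.mean(), y.mean()
beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Intercept: beta0 = ybar - beta1 * xbar
beta0 = ybar - beta1 * xbar

y_hat = beta0 + beta1 * x   # fitted values
print(beta0, beta1)
```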

Multiple Linear Regression

Purpose: Predict Y using multiple predictor variables X₁, X₂, ..., Xk

Regression Equation:
ŷ = β₀ + β₁x₁ + β₂x₂ + ... + βkxk + ε

Where:
• β₀ = Intercept
• βi = Partial regression coefficient for Xi
• βi represents the change in Y for one-unit change in Xi, holding all other predictors constant

Key Advantage: Controls for confounding variables and provides better predictions
Matrix Form: Y = Xβ + ε, solved using: β = (X'X)⁻¹X'Y
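The matrix form in code, using least squares rather than forming (X'X)⁻¹ explicitly, which is numerically safer (a sketch on noise-free synthetic data):

```python
import numpy as np

# Synthetic predictors; y generated from y = 1 + 2*x1 + x2 (noise-free)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = np.array([5.0, 6.0, 11.0, 12.0, 17.0])

# Add a column of ones for the intercept, then solve y = Xb + e
X_design = np.column_stack([np.ones(len(X)), X])

# lstsq solves the normal equations (X'X) b = X'y in least-squares sense
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)   # [intercept, b1, b2]
```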

Key Regression Metrics & Formulas

R² (Coefficient of Determination)

Definition: Proportion of variance in Y explained by X

R² = 1 - (SSresidual / SStotal)
R² = SSregression / SStotal

Where:
• SStotal = Σ(yi - ȳ)²
• SSregression = Σ(ŷi - ȳ)²
• SSresidual = Σ(yi - ŷi)²

Range: 0 to 1 (0% to 100%)

Interpretation: R² = 0.75 means 75% of variance in Y is explained by X

Adjusted R²: R²adj = 1 - [(1-R²)(n-1)/(n-k-1)] (penalizes extra predictors)

Correlation Coefficient (r)

Definition: Measures strength and direction of linear relationship

r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]

Relationship with R²:
R² = r² (in simple linear regression)
r = ±√R² (sign matches slope β₁)

Range: -1 to +1

-1 (Perfect Negative) 0 (No Correlation) 1 (Perfect Positive)

Interpretation: |r| > 0.7 = strong, 0.3-0.7 = moderate, < 0.3 = weak

Standard Error of Estimate (SEest)

Definition: Average distance of observed Y values from regression line

SEest = √[Σ(yi - ŷi)² / (n - 2)]
SEest = √[SSresidual / (n - 2)]

Also called: Root Mean Square Error (RMSE)
Interpretation: Units of Y variable

Better fit: Lower SEest values

Use: Measures prediction accuracy; construct prediction intervals
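The three metrics in code (a sketch; the observed and model-predicted values are hypothetical, with k = 1 predictor):

```python
import numpy as np

# Observed and model-predicted values (hypothetical)
y     = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0])
y_hat = np.array([ 9.5, 12.5, 14.0, 15.0, 18.5, 19.5])

n, k = len(y), 1                      # k = number of predictors

ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)

r_squared = 1 - ss_resid / ss_total
se_est = np.sqrt(ss_resid / (n - 2))  # standard error of estimate (RMSE-style)
adj_r2 = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(r_squared, adj_r2, se_est)
```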

Linear Regression Assumptions (LINE)

L - Linearity: The relationship between X and Y is linear
Check: Scatter plot should show linear pattern
Fix: Transform variables (log, square root) or use polynomial regression
I - Independence: Observations are independent of each other (no autocorrelation)
Check: Durbin-Watson test, plot residuals over time
Violation: Common in time series data
N - Normality: Residuals (errors) are approximately normally distributed
Check: Q-Q plot, histogram of residuals, Shapiro-Wilk test
Note: Most critical for small samples (n < 30)
E - Equal Variance (Homoscedasticity): Constant variance of residuals across all values of X
Check: Residual plot (residuals vs. fitted values should show random scatter)
Fix: Transform Y variable or use weighted least squares

Additional Considerations:
No Multicollinearity: (Multiple regression) Predictors should not be highly correlated with each other (check VIF < 10)
No Outliers/Influential Points: Check Cook's distance, leverage, DFFIT
Residual Formula: ei = yi - ŷi (observed - predicted)
