Your comprehensive guide to understanding statistical inference, hypothesis testing, and data-driven decision making
Use sample data to make predictions and inferences about a larger population
Scientifically test claims and theories using statistical evidence
Make data-driven decisions with quantified confidence levels
Hypothesis testing is a systematic statistical method used to make decisions about population parameters based on sample data. It provides a framework for determining whether observed differences or effects are statistically significant or merely due to random chance.
H₀ (Null Hypothesis): The default assumption stating no effect, no difference, or no relationship exists (e.g., μ = μ₀ or p₁ = p₂)
H₁ or Ha (Alternative Hypothesis): The research claim we are seeking evidence for (a test rejects or fails to reject H₀; it never proves H₁). Can be:
• Two-tailed: μ ≠ μ₀ (tests for any difference)
• Right-tailed: μ > μ₀ (tests for increase)
• Left-tailed: μ < μ₀ (tests for decrease)
α (Alpha): Maximum probability of Type I error we're willing to accept. Represents the threshold for "statistical significance."
Common levels:
• α = 0.01 (1%): Very stringent, for critical decisions
• α = 0.05 (5%): Standard in most research
• α = 0.10 (10%): More lenient, exploratory research
Critical regions are determined by α: For two-tailed tests, split α/2 in each tail
Compute the appropriate test statistic from sample data:
Z-score: z = (x̄ - μ₀) / (σ / √n) when σ is known
T-score: t = (x̄ - μ₀) / (s / √n) when σ is unknown
Degrees of Freedom: df = n - 1 for t-tests
The test statistic measures how many standard errors the sample statistic is from the hypothesized parameter.
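As a minimal sketch of the two formulas above (all summary numbers here are hypothetical, chosen only for illustration), the z- and t-statistics differ solely in which standard deviation goes into the standard error:

```python
from math import sqrt

# Hypothetical summary statistics (illustration only)
x_bar, mu0, n = 172.0, 170.0, 36
sigma = 6.0   # population SD -> z-statistic
s = 6.5       # sample SD     -> t-statistic

z = (x_bar - mu0) / (sigma / sqrt(n))   # z = (x̄ - μ₀) / (σ/√n)
t = (x_bar - mu0) / (s / sqrt(n))       # t = (x̄ - μ₀) / (s/√n)
df = n - 1                              # degrees of freedom for the t-test

print(round(z, 3), round(t, 3), df)     # → 2.0 1.846 35
```

Both statistics count how many standard errors the sample mean sits from μ₀; here the sample mean is about two standard errors above the hypothesized value.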
Two methods for decision:
1. P-value approach: If p-value < α, reject H₀
2. Critical value approach: If test statistic falls in critical region, reject H₀
Conclusion: State decision in context of the original problem with practical interpretation, not just statistical significance.
Definition: Rejecting a true null hypothesis H₀
Probability: P(Type I Error) = α (significance level)
Consequence: Claiming an effect exists when it doesn't
Example: Convicting an innocent person, concluding a drug works when it doesn't (false alarm)
Control: Set a lower α (e.g., 0.01 instead of 0.05)
Definition: Failing to reject a false null hypothesis H₀
Probability: P(Type II Error) = β
Power: 1 - β = probability of correctly rejecting false H₀
Example: Acquitting a guilty person, concluding a drug doesn't work when it does (missed detection)
Control: Increase sample size, use higher α, or increase effect size
Definition: The p-value is the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis H₀ is true.
Formula (two-tailed): p-value = P(|test statistic| ≥ |observed value| | H₀ is true); for a one-tailed test, use the probability in the single relevant tail.
Interpretation Guide: Lower p-values provide stronger evidence against H₀
Important: A p-value is NOT the probability that H₀ is true! It's the probability of the data given H₀ is true. Also, "not significant" does NOT prove H₀ is true.
A confidence interval (CI) provides a range of plausible values for an unknown population parameter, constructed from sample data. The confidence level represents the long-run proportion of such intervals that would contain the true parameter if we repeated the sampling process many times.
1. For Population Mean μ (σ known, or n ≥ 30): x̄ ± zα/2 · (σ / √n)
2. For Population Mean μ (σ unknown, small sample): x̄ ± tα/2, n-1 · (s / √n)
3. For Population Proportion p: p̂ ± zα/2 · √[p̂(1 - p̂) / n]
4. For Difference Between Two Means (independent samples): (x̄₁ - x̄₂) ± zα/2 · √(σ₁²/n₁ + σ₂²/n₂)
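A minimal sketch of two of these intervals, using made-up sample summaries (x̄, s, n, p̂ are all hypothetical) and scipy for the critical values:

```python
from math import sqrt
from scipy.stats import norm, t

alpha = 0.05  # 95% confidence

# Mean, sigma unknown, small sample: x̄ ± t(α/2, n-1) · s/√n
x_bar, s, n = 50.0, 8.0, 25
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
margin = t_crit * s / sqrt(n)
ci_mean = (x_bar - margin, x_bar + margin)

# Proportion: p̂ ± z(α/2) · √[p̂(1-p̂)/n]
p_hat, n_p = 0.60, 400
z_crit = norm.ppf(1 - alpha / 2)
margin_p = z_crit * sqrt(p_hat * (1 - p_hat) / n_p)
ci_prop = (p_hat - margin_p, p_hat + margin_p)

print(tuple(round(v, 3) for v in ci_mean),
      tuple(round(v, 3) for v in ci_prop))
```

With these numbers the mean interval is roughly (46.70, 53.30) and the proportion interval roughly (0.552, 0.648).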
• 90% Confidence Level: zα/2 = 1.645 (α = 0.10, α/2 = 0.05). Narrower interval, less confidence. Used for: preliminary studies.
• 95% Confidence Level: zα/2 = 1.96 (α = 0.05, α/2 = 0.025). Standard in most research. Used for: general research, publications.
• 99% Confidence Level: zα/2 = 2.576 (α = 0.01, α/2 = 0.005). Wider interval, more confidence. Used for: critical decisions, medical studies.
Relationship: Confidence Level = (1 - α) × 100%
Width Trade-off: Higher confidence → Wider interval → Less precision
Sample Size Effect: Larger n → Smaller SE → Narrower interval (more precision)
• "We are 95% confident that the true population mean μ lies between [lower, upper]"
• "If we repeated this sampling process many times, about 95% of the constructed intervals would contain the true μ"
• "The interval [lower, upper] was constructed using a method that captures the true parameter 95% of the time"
• "There is a 95% probability that μ is in this interval" (μ is fixed, not random!)
• "95% of the data falls in this interval" (CI is for the parameter, not data)
• "There is a 95% chance the interval contains μ" (after construction, it either does or doesn't)
Key Understanding: The confidence level (95%) refers to the procedure, not a specific interval. Once calculated, a particular interval either contains the true parameter (with certainty) or it doesn't. The "confidence" is in the method's long-run success rate.
Factors Affecting Width:
• Sample size (n) ↑ → Width ↓
• Confidence level ↑ → Width ↑
• Population variability (σ) ↑ → Width ↑
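The sample-size effect is easy to see numerically. A quick sketch (σ = 10 is a hypothetical population SD) of the full width 2 · zα/2 · σ/√n of a 95% z-interval:

```python
from math import sqrt
from scipy.stats import norm

sigma = 10.0                 # hypothetical population SD
z95 = norm.ppf(0.975)        # ≈ 1.96

# Full width of a 95% z-interval = 2 · z(α/2) · σ/√n
widths = {n: round(2 * z95 * sigma / sqrt(n), 2) for n in (25, 100, 400)}
print(widths)   # → {25: 7.84, 100: 3.92, 400: 1.96}
```

Quadrupling n halves the width: precision grows only with the square root of the sample size.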
Purpose: Compare sample mean x̄ to a known/hypothesized population mean μ₀
Example: Does the average height of students (x̄ = 170 cm) differ from the national average (μ₀ = 168 cm)?
Hypotheses: H₀: μ = 168 vs H₁: μ ≠ 168
Assumptions: Random sample, approximately normal distribution (or n ≥ 30)
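A minimal sketch of this test with scipy (the height sample below is invented for illustration; a real study would use measured data):

```python
from scipy.stats import ttest_1samp

# Hypothetical student heights in cm (illustration only)
heights = [168, 170, 172, 174, 166, 171, 169, 173]

# H0: μ = 168 vs H1: μ ≠ 168 (two-tailed by default)
t_stat, p_value = ttest_1samp(heights, popmean=168)
print(round(t_stat, 2), round(p_value, 3))
```

Here t ≈ 2.52 with p < 0.05, so at α = 0.05 we would reject H₀ and conclude the mean height differs from 168 cm.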
Purpose: Compare means of two independent populations
Example: Do males (x̄₁ = 82) and females (x̄₂ = 78) have different average test scores?
Hypotheses: H₀: μ₁ = μ₂ vs H₁: μ₁ ≠ μ₂
Assumptions: Independent samples, normal distributions, homogeneity of variance
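A sketch with scipy's independent-samples t-test (both score lists are hypothetical; `equal_var=True` invokes the pooled-variance test that matches the homogeneity assumption above):

```python
from scipy.stats import ttest_ind

# Hypothetical test scores (illustration only)
males = [85, 80, 84, 79, 83, 81]
females = [78, 76, 80, 77, 79, 78]

# H0: μ1 = μ2 vs H1: μ1 ≠ μ2, pooled-variance (Student) t-test
t_stat, p_value = ttest_ind(males, females, equal_var=True)
print(round(t_stat, 3), round(p_value, 4))
```

Setting `equal_var=False` instead gives Welch's t-test, the safer default when the equal-variance assumption is doubtful.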
Purpose: Compare means of related/paired observations (before/after, matched pairs)
Example: Did blood pressure decrease after treatment? Before: 140 mmHg, After: 132 mmHg
Hypotheses: H₀: μd = 0 vs H₁: μd > 0 (one-tailed for decrease)
Assumptions: Paired observations, differences are approximately normal
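A sketch of the paired test (the before/after readings are hypothetical; `alternative='greater'` requests the one-tailed test of μd > 0, with d = before − after):

```python
from scipy.stats import ttest_rel

# Hypothetical systolic blood pressure, mmHg (illustration only)
before = [140, 138, 145, 142, 139, 141]
after = [132, 135, 138, 134, 133, 136]

# H0: μd = 0 vs H1: μd > 0, where d = before - after
t_stat, p_value = ttest_rel(before, after, alternative='greater')
print(round(t_stat, 3), round(p_value, 4))
```

Equivalently, one could run a one-sample t-test on the differences against μ₀ = 0; the paired test is exactly that.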
Purpose: Compare means across three or more independent groups simultaneously (extension of t-test)
Example: Do students from three teaching methods (A, B, C) have different average scores?
Hypotheses: H₀: μ₁ = μ₂ = μ₃ vs H₁: At least one mean differs
Post-hoc: If F is significant, use Tukey HSD or Bonferroni to find which pairs differ
Assumptions: Independence, normality within groups, homogeneity of variance (equal variances)
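A sketch of one-way ANOVA with scipy (the three score groups are hypothetical):

```python
from scipy.stats import f_oneway

# Hypothetical scores under three teaching methods (illustration only)
method_a = [85, 88, 90, 84, 87]
method_b = [78, 80, 83, 79, 81]
method_c = [90, 92, 88, 94, 91]

# H0: μ1 = μ2 = μ3 vs H1: at least one mean differs
f_stat, p_value = f_oneway(method_a, method_b, method_c)
print(round(f_stat, 3), round(p_value, 6))
```

With F ≈ 30.9 and a tiny p-value, H₀ is rejected; a post-hoc procedure such as Tukey HSD would then identify which pairs of methods differ.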
Purpose: Test if observed frequencies match an expected probability distribution
Example: Is a die fair? Roll 600 times, expect each face 100 times
Hypotheses: H₀: Die is fair (P₁=P₂=...=P₆=1/6) vs H₁: Die is not fair
Condition: All Ei ≥ 5
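A sketch of the die example with scipy (the observed counts are invented; a fair die over 600 rolls expects 100 per face, so all Ei ≥ 5 holds):

```python
from scipy.stats import chisquare

# Hypothetical counts from 600 rolls (illustration only)
observed = [95, 110, 98, 102, 90, 105]
expected = [100] * 6   # fair-die expectation

# H0: die is fair (each face has probability 1/6)
chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), round(p_value, 3))
```

Here χ² = 2.58 on 5 degrees of freedom and p is well above 0.05, so these counts give no evidence against fairness.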
Purpose: Test if two categorical variables are independent (contingency table analysis)
Example: Is smoking status independent of gender?
Hypotheses: H₀: Variables are independent vs H₁: Variables are dependent
Condition: At least 80% of cells have Eij ≥ 5
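A sketch of the independence test with scipy (the 2×2 counts are hypothetical; note that `chi2_contingency` applies the Yates continuity correction by default for 2×2 tables):

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = gender, cols = smoker / non-smoker
table = [[30, 70],
         [20, 80]]

# H0: smoking status is independent of gender
chi2, p_value, df, expected = chi2_contingency(table)
print(round(chi2, 2), round(p_value, 3), df)
```

The function also returns the expected counts under independence (here 25 and 75 in each row), which is how the "Eij ≥ 5" condition is checked.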
Purpose: Test hypothesis about population mean when population standard deviation σ is known
Example: Testing if sample mean differs from known population mean when σ is known
Use when: σ known, large sample (n ≥ 30), or population is normal
Z-Test for Proportions: z = (p̂ - p₀) / √[p₀(1-p₀)/n]
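The proportion formula above can be sketched directly (the counts x and n and the null value p₀ are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical: 230 successes in 400 trials; H0: p = 0.5 vs H1: p ≠ 0.5
x, n, p0 = 230, 400, 0.5
p_hat = x / n

# z = (p̂ - p₀) / √[p₀(1-p₀)/n]
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))   # two-tailed
print(round(z, 2), round(p_value, 4))   # → 3.0 0.0027
```

Note the standard error uses p₀ (the null value), not p̂, because the sampling distribution is computed assuming H₀ is true.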
Regression analysis is a powerful statistical method used to model and examine the relationship between one dependent variable (Y) and one or more independent variables (X). It allows us to predict outcomes, quantify relationships, and test hypotheses about how variables relate to each other.
Purpose: Model the linear relationship between one predictor (X) and one response variable (Y)
Calculating Slope (β₁): β₁ = r · (sy / sx) = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
Calculating Intercept (β₀): β₀ = ȳ - β₁x̄
Where: r = correlation coefficient, sx = SD of X, sy = SD of Y
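A sketch of computing the slope as r · (sy/sx) and the intercept as ȳ − β₁x̄ (the data points are hypothetical), cross-checked against numpy's least-squares fit:

```python
import numpy as np

# Hypothetical (x, y) data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# β1 = r · (sy/sx), β0 = ȳ - β1·x̄
r = np.corrcoef(x, y)[0, 1]
beta1 = r * (y.std(ddof=1) / x.std(ddof=1))
beta0 = y.mean() - beta1 * x.mean()

# Cross-check against numpy's least-squares polynomial fit
slope_check, intercept_check = np.polyfit(x, y, 1)
print(round(beta1, 3), round(beta0, 3))   # → 1.99 0.05
```

Both routes give the same line, since r · (sy/sx) is algebraically identical to the least-squares slope Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)².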
Purpose: Predict Y using multiple predictor variables X₁, X₂, ..., Xk
Key Advantage: Controls for confounding variables and provides better predictions
Matrix Form: Y = Xβ + ε, with least-squares solution β̂ = (X'X)⁻¹X'Y
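A sketch of the normal-equations solution (X'X)⁻¹X'y with numpy. The design matrix is hypothetical and built so that y = 1 + 2x₁ + x₂ exactly, making the recovered coefficients easy to verify; in practice `np.linalg.lstsq` is preferred over an explicit inverse for numerical stability:

```python
import numpy as np

# Hypothetical design matrix: intercept column plus two predictors
X = np.array([[1, 2.0, 1.0],
              [1, 3.0, 2.0],
              [1, 5.0, 2.0],
              [1, 7.0, 3.0],
              [1, 9.0, 5.0]])
# Response constructed exactly as y = 1 + 2·x1 + 1·x2
y = np.array([6.0, 9.0, 13.0, 18.0, 24.0])

# Normal equations: β̂ = (X'X)⁻¹ X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically stabler equivalent
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))   # → [1. 2. 1.]
```

Because the data are perfectly linear, β̂ recovers (1, 2, 1) exactly; with noisy data the two solvers still agree, but lstsq avoids forming (X'X)⁻¹.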
Definition: Proportion of variance in Y explained by X
Range: 0 to 1 (0% to 100%)
Interpretation: R² = 0.75 means 75% of variance in Y is explained by X
Adjusted R²: R²adj = 1 - [(1-R²)(n-1)/(n-k-1)] (penalizes extra predictors)
Definition: Measures strength and direction of linear relationship
Range: -1 to +1
Interpretation (common rule of thumb): |r| > 0.7 = strong, 0.3 ≤ |r| ≤ 0.7 = moderate, |r| < 0.3 = weak
Definition: Average distance of observed Y values from regression line
Better fit: Lower SEest values
Use: Measures prediction accuracy; construct prediction intervals
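The three fit measures above can be computed together in a short sketch (the data are hypothetical and nearly linear; for simple regression, R² equals r²):

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

b1, b0 = np.polyfit(x, y, 1)          # fitted line ŷ = b0 + b1·x
y_hat = b0 + b1 * x
residuals = y - y_hat                 # e_i = y_i - ŷ_i

ss_res = np.sum(residuals ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares

r_squared = 1 - ss_res / ss_tot       # proportion of variance explained
r = np.corrcoef(x, y)[0, 1]           # correlation; r² = R² in simple regression
se_est = np.sqrt(ss_res / (len(x) - 2))  # standard error of the estimate (df = n - 2)
print(round(r_squared, 4), round(se_est, 4))
```

Note SEest divides by n − 2 because two parameters (slope and intercept) were estimated from the data.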
Additional Considerations:
• No Multicollinearity: (Multiple regression) Predictors should not be highly correlated with each other (check VIF < 10)
• No Outliers/Influential Points: Check Cook's distance, leverage, DFFITS
• Residual Formula: ei = yi - ŷi (observed - predicted)