Simple Linear Regression Example Problems With Solutions


Simple linear regression is a core statistical technique used to quantify the linear relationship between a single independent (predictor) variable and a dependent (response) variable, and practicing with simple linear regression example problems with solutions is the most reliable way to build fluency in applying this method to real-world datasets. This foundational tool underpins everything from basic academic statistics coursework to advanced machine learning model development, making hands-on practice with worked examples essential for anyone working with quantitative data.

Introduction

Simple linear regression models the relationship between two continuous variables by fitting a straight line to observed data points. The line is defined by two parameters: the intercept (b₀), which represents the predicted value of the response variable when the predictor variable is zero, and the slope (b₁), which represents the average change in the response variable for a one-unit increase in the predictor variable. Unlike multiple linear regression, which uses two or more predictor variables, simple linear regression focuses on a single X variable, making it an accessible entry point for new statistics students.

Working through simple linear regression example problems with solutions helps bridge the gap between theoretical formulas and practical application. Many learners memorize the regression formulas without understanding how to apply them to messy, real-world data, but step-by-step worked examples clarify how each calculation contributes to the final model. These examples also highlight common pitfalls, such as misidentifying predictor and response variables, miscalculating sums, or misinterpreting R-squared values.

Key Steps to Solve Simple Linear Regression Problems

Follow this standardized sequence to solve any simple linear regression problem accurately:

  1. Identify variables: Confirm the X (predictor, independent) and Y (response, dependent) variables. The predictor variable is the value you manipulate or observe to predict the response variable.
  2. Calculate summary statistics: Compute the sum and mean of all X values, all Y values, the sum of X multiplied by Y (∑XY), and the sum of squared X values (∑X²). For model fit calculations, you may also compute the sum of squared Y values (∑Y²).
  3. Calculate the slope coefficient (b₁): Use the least squares method formula: b₁ = [n(∑XY) - (∑X)(∑Y)] / [n(∑X²) - (∑X)²] where n is the number of observations. This formula minimizes the sum of squared differences between observed and predicted Y values.
  4. Calculate the intercept (b₀): Use the formula: b₀ = Ȳ - b₁(X̄) where X̄ is the mean of X values and Ȳ is the mean of Y values.
  5. Formulate the regression equation: Write the final model as Ŷ = b₀ + b₁X, where Ŷ is the predicted value of Y for a given X.
  6. Evaluate model fit: Calculate R-squared, which measures the proportion of variance in Y explained by X. The formula is R² = SSR / SSTO, where SSR = ∑(Ŷ - Ȳ)² (sum of squares due to regression) and SSTO = ∑(Y - Ȳ)² (total sum of squares). R-squared values range from 0 to 1, with higher values indicating better fit.
  7. Make predictions and analyze residuals: Use the regression equation to predict Y values for new X inputs. Calculate residuals (observed Y minus predicted Ŷ) to check for patterns that may violate model assumptions.
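The steps above can be sketched as a single Python function. This is a minimal illustration using a tiny made-up dataset; the function and variable names are my own, not from any standard library:

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = b0 + b1*x by least squares and report R-squared."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Step 3: slope via the least squares formula
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Step 4: intercept from the means, b0 = Y-bar - b1 * X-bar
    b0 = sum_y / n - b1 * (sum_x / n)
    # Step 6: R-squared = SSR / SSTO
    y_bar = sum_y / n
    ssto = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
    return b0, b1, ssr / ssto

# Toy data lying exactly on y = 2x: slope 2, intercept 0, R-squared 1
print(fit_simple_linear_regression([1, 2, 3, 4], [2, 4, 6, 8]))  # (0.0, 2.0, 1.0)
```

The same function reproduces the hand calculations in the worked examples that follow.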

Worked Simple Linear Regression Example Problems with Solutions

The following examples apply the steps above to common real-world scenarios. Each includes a full solution and interpretation of results.

Example 1: Study Hours vs. Exam Scores

A statistics instructor wants to model the relationship between hours spent studying for a final exam and the exam score earned (out of 100). They collect data from 10 randomly selected students:

  • X (Study hours): 2, 3, 5, 7, 8, 10, 11, 12, 14, 15
  • Y (Exam score): 55, 60, 65, 70, 72, 78, 80, 82, 85, 88

Step 1: Identify variables

X = Study hours (predictor), Y = Exam score (response).

Step 2: Calculate summary statistics

  • n = 10
  • ∑X = 2 + 3 + 5 + 7 + 8 + 10 + 11 + 12 + 14 + 15 = 87 → X̄ = 87 / 10 = 8.7
  • ∑Y = 55 + 60 + 65 + 70 + 72 + 78 + 80 + 82 + 85 + 88 = 735 → Ȳ = 735 / 10 = 73.5
  • ∑XY = (2×55) + (3×60) + (5×65) + (7×70) + (8×72) + (10×78) + (11×80) + (12×82) + (14×85) + (15×88) = 6835
  • ∑X² = 2² + 3² + 5² + 7² + 8² + 10² + 11² + 12² + 14² + 15² = 937

Step 3: Calculate slope (b₁)

b₁ = [10×6835 - 87×735] / [10×937 - 87²]

b₁ = [68350 - 63945] / [9370 - 7569] = 4405 / 1801 ≈ 2.446

Step 4: Calculate intercept (b₀)

b₀ = 73.5 - (2.446×8.7) ≈ 73.5 - 21.28 ≈ 52.22

Step 5: Regression equation

Ŷ = 52.22 + 2.446X

Step 6: Evaluate model fit

First calculate SSTO: ∑Y² - (∑Y)²/n = 55111 - (735)²/10 = 55111 - 54022.5 = 1088.5

Then calculate SSR: b₁×(∑XY - (∑X∑Y)/n) = 2.446×(6835 - (87×735)/10) ≈ 2.446×440.5 ≈ 1077

R² = 1077 / 1088.5 ≈ 0.989 (98.9%)

This means 98.9% of the variance in exam scores is explained by study hours, indicating an extremely strong linear relationship.

Step 7: Predictions and residuals

For a student who studies 9 hours: Ŷ = 52.22 + (2.446×9) ≈ 74.23, so a predicted score of ~74. For a student who studied 8 hours (observed score 72): Ŷ = 52.22 + (2.446×8) ≈ 71.79, residual = 72 - 71.79 = 0.21. The small positive residual means the student scored slightly higher than predicted.
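The arithmetic in Example 1 can be double-checked in a few lines of Python. This is a quick verification sketch, not part of the original solution:

```python
xs = [2, 3, 5, 7, 8, 10, 11, 12, 14, 15]       # study hours
ys = [55, 60, 65, 70, 72, 78, 80, 82, 85, 88]  # exam scores
n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)                # 87 and 735
sum_xy = sum(x * y for x, y in zip(xs, ys))    # 6835
sum_x2 = sum(x * x for x in xs)                # 937
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
print(f"slope ≈ {b1:.3f}, intercept ≈ {b0:.2f}")     # slope ≈ 2.446, intercept ≈ 52.22
print(f"prediction at 9 hours ≈ {b0 + b1 * 9:.2f}")  # ≈ 74.23
```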

Example 2: Advertising Spend vs. Monthly Sales

A small retail business tracks its monthly advertising spend (in $1000s) and corresponding monthly sales (in $1000s) over 8 months. They want to predict sales based on advertising investment.

  • X (Ad spend): 1, 2, 3, 4, 5, 6, 7, 8
  • Y (Sales): 15, 20, 25, 28, 32, 35, 40, 42

Step 1: Identify variables

X = Ad spend (predictor), Y = Sales (response).

Step 2: Summary statistics

  • n = 8
  • ∑X = 36 → X̄ = 36 / 8 = 4.5
  • ∑Y = 237 → Ȳ = 237 / 8 = 29.625
  • ∑XY = (1×15) + (2×20) + (3×25) + (4×28) + (5×32) + (6×35) + (7×40) + (8×42) = 1228
  • ∑X² = 1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 = 204

Step 3: Slope (b₁)

b₁ = [8×1228 - 36×237] / [8×204 - 36²] = [9824 - 8532] / [1632 - 1296] = 1292 / 336 ≈ 3.845

Step 4: Intercept (b₀)

b₀ = 29.625 - (3.845×4.5) ≈ 29.625 - 17.30 ≈ 12.32

Step 5: Regression equation

Ŷ = 12.32 + 3.845X

Step 6: Model fit

SSTO = ∑Y² - (∑Y)²/n = 7647 - (237)²/8 = 7647 - 7021.125 = 625.875

SSR = 3.845×(1228 - (36×237)/8) ≈ 3.845×161.5 ≈ 621

R² = 621 / 625.875 ≈ 0.992 (99.2%)

Almost all variance in sales is explained by ad spend, suggesting a near-perfect linear relationship. Note that real-world business data rarely produces R-squared values this high, as external factors like seasonality or competitor activity also affect sales.

Step 7: Predictions

For an ad spend of $4000 (X = 4): Ŷ = 12.32 + (3.845×4) ≈ 27.7, so predicted sales of ~$27,700. A business can use this to set advertising budgets aligned with revenue goals.
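As with Example 1, Example 2's fit and prediction can be reproduced in a short verification sketch:

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8]              # ad spend, $1000s
ys = [15, 20, 25, 28, 32, 35, 40, 42]      # sales, $1000s
n = len(xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b1 = (n * sum_xy - sum(xs) * sum(ys)) / (n * sum_x2 - sum(xs) ** 2)
b0 = sum(ys) / n - b1 * sum(xs) / n
# Predicted sales for a $4000 ad spend (X = 4), converted back to dollars
print(f"sales at $4k ad spend ≈ ${1000 * (b0 + b1 * 4):,.0f}")  # ≈ $27,702
```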

Scientific Explanation

The mathematical foundation of simple linear regression is the least squares method, a procedure that minimizes the sum of squared errors (SSE). SSE is the sum of the squared differences between each observed Y value and its corresponding predicted Ŷ value: SSE = ∑(Y - Ŷ)². By minimizing SSE, the least squares method produces the line of best fit: the straight line with a smaller total squared deviation from the observed data points than any other possible straight line.

To derive the slope and intercept formulas, we take the partial derivatives of SSE with respect to b₀ and b₁, set them to zero, and solve the resulting system of linear equations (the normal equations). This derivation confirms that the formulas outlined in the steps section are the only values of b₀ and b₁ that minimize SSE for a given dataset.
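The two normal equations imply that the least squares residuals sum to zero (from ∂SSE/∂b₀ = 0) and are uncorrelated with X (from ∂SSE/∂b₁ = 0). This can be confirmed numerically with Example 1's data; the sketch below uses the deviation form of the slope formula, b₁ = ∑(X - X̄)(Y - Ȳ) / ∑(X - X̄)², which is algebraically equivalent to the formula in the steps section:

```python
xs = [2, 3, 5, 7, 8, 10, 11, 12, 14, 15]
ys = [55, 60, 65, 70, 72, 78, 80, 82, 85, 88]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
# dSSE/db0 = 0  =>  the residuals sum to zero
print(abs(sum(residuals)) < 1e-9)                             # True
# dSSE/db1 = 0  =>  the residuals are orthogonal to X
print(abs(sum(x * e for x, e in zip(xs, residuals))) < 1e-9)  # True
```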

Beyond the line of best fit, three key sum of squares metrics define model performance:

  • SSTO (Total Sum of Squares): Measures total variance in the response variable: SSTO = ∑(Y - Ȳ)²
  • SSR (Regression Sum of Squares): Measures variance in Y explained by the predictor variable: SSR = ∑(Ŷ - Ȳ)²
  • SSE (Error Sum of Squares): Measures unexplained variance left in the residuals: SSE = ∑(Y - Ŷ)²

By definition, SSTO = SSR + SSE, which is why R-squared (SSR/SSTO) represents the proportion of total variance explained by the model.
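The identity SSTO = SSR + SSE holds exactly for any least squares fit, and can be verified numerically. A quick sketch using Example 2's data:

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [15, 20, 25, 28, 32, 35, 40, 42]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
preds = [b0 + b1 * x for x in xs]
ssto = sum((y - y_bar) ** 2 for y in ys)                 # total variance
ssr = sum((p - y_bar) ** 2 for p in preds)               # explained variance
sse = sum((y - p) ** 2 for y, p in zip(ys, preds))       # unexplained variance
print(abs(ssto - (ssr + sse)) < 1e-9)  # True: SSTO = SSR + SSE
print(round(ssr / ssto, 3))            # R-squared ≈ 0.992
```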

Simple linear regression also relies on four core assumptions to produce valid results:

  1. Linearity: The relationship between X and Y is linear. This can be verified with a scatter plot of X vs Y.
  2. Independence: Residuals are independent of each other, meaning no observation influences another. This is often violated in time-series data.
  3. Homoscedasticity: Residuals have constant variance across all values of X. Heteroscedasticity (varying residual variance) biases standard error estimates.
  4. Normality: Residuals are normally distributed. This assumption is only critical for small sample sizes (n < 30) when conducting hypothesis tests on the slope coefficient.

Violating these assumptions does not make the regression line invalid, but it does mean that R-squared values and slope interpretations may be biased or misleading.

Frequently Asked Questions

  1. What is the difference between simple and multiple linear regression? Simple linear regression uses exactly one predictor variable to model a response variable. Multiple linear regression uses two or more predictor variables, requiring matrix algebra to calculate coefficients instead of the simple formulas used here.

  2. Can simple linear regression handle categorical predictor variables? No, simple linear regression requires both variables to be continuous. To use a categorical predictor (e.g., gender, region), you would need to use a different technique like ANOVA or convert categories to dummy numerical variables, which falls under multiple regression.

  3. What does a negative slope coefficient mean? A negative b₁ indicates a negative linear relationship: as X increases, Y decreases on average. For example, a model of daily ice cream sales vs average temperature would have a positive slope, while a model of monthly heating bills vs temperature would have a negative slope.

  4. How do I know if my simple linear regression model is valid? First, check that the four core assumptions (linearity, independence, homoscedasticity, normality) are met using residual plots and statistical tests. Second, ensure R-squared is reasonably high for your use case (there is no universal "good" R-squared value, as it depends on the field). Third, verify that the slope coefficient is statistically significant using a t-test.
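The slope t-test can be carried out by hand: t = b₁ / SE(b₁), where SE(b₁) = √(SSE/(n-2)) / √(∑(X - X̄)²). A sketch with Example 1's data, comparing against 2.306, the two-tailed 5% critical value of the t distribution with n - 2 = 8 degrees of freedom:

```python
import math

xs = [2, 3, 5, 7, 8, 10, 11, 12, 14, 15]
ys = [55, 60, 65, 70, 72, 78, 80, 82, 85, 88]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope
t_stat = b1 / se_b1
print(round(t_stat, 1))  # ≈ 27.9
print(t_stat > 2.306)    # True: the slope is significant at the 5% level
```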

  5. Where can I find more simple linear regression example problems with solutions? Many open-access statistics textbooks include practice problems, and working with public datasets is an excellent way to create your own problems. Replicating the examples above with new datasets is also a great way to build fluency.

Conclusion

Simple linear regression is a versatile, accessible tool for modeling linear relationships between two continuous variables, but mastery only comes with repeated practice. Working through simple linear regression example problems with solutions helps you internalize the calculation steps, avoid common errors, and interpret results correctly for real-world applications.

The examples in this guide cover two common use cases, but the same steps apply to any dataset with a linear relationship between two continuous variables. Always start by verifying the relationship with a scatter plot, follow the standardized calculation sequence, and validate your model against the core assumptions before making predictions. Over time, you will be able to apply simple linear regression intuitively to your own research, coursework, or business problems.
