Find the regression line of y on x for the following data and estimate y when x = 9

We conclude that $\rho^2$ is an indicator of the strength of our regression model in estimating (predicting) $Y$ from $X$. In practice, we often do not know $\rho$, but we do have the observed pairs $(x_1,y_1)$, $(x_2,y_2)$, $\cdots$, $(x_n,y_n)$. We can estimate $\rho^2$ from the observed data; we denote this estimate by $r^2$ and call it $R$-squared or the coefficient of determination.
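
As a concrete illustration (not part of the original text), here is a minimal Python sketch of estimating $r^2$ from a set of observed pairs; the data values are made up and NumPy is assumed to be available.

```python
# Minimal sketch: estimating rho^2 by the sample R-squared r^2 (illustrative data)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # observed x_i (made-up values)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # observed y_i (made-up values)

r = np.corrcoef(x, y)[0, 1]               # sample correlation coefficient
print(f"r = {r:.3f}, R-squared = {r**2:.3f}")
```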

The first portion of the results contains the best-fit values of the slope and Y-intercept terms. These parameter estimates build the regression line of best fit. You can see how they fit into the equation at the bottom of the results section. Our guide can help you learn more about interpreting regression slopes, intercepts, and confidence intervals.

Use the goodness-of-fit section to learn how strong the relationship is. R-square quantifies the percentage of the variation in Y that can be explained by the value of X.

The next question may seem odd at first glance: Is the slope significantly non-zero? This goes back to the slope parameter specifically. If it is significantly different from zero, then there is reason to believe that X can be used to predict Y. If not, the model's line is not any better than no line at all, so the model is not particularly useful!

P-values help with interpretation here: if the p-value is smaller than some threshold (often 0.05), we have evidence of a statistically significant relationship.
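
For readers working outside the calculator, a rough equivalent of this check can be done with SciPy's linregress, which reports the slope, intercept, and the p-value for the null hypothesis that the slope is zero; the data below are taken from the second worked example at the end of this page, and SciPy is an assumed dependency.

```python
# Sketch: testing whether the slope differs significantly from zero (SciPy assumed)
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 7.0, 6.0, 5.0, 6.0])

result = stats.linregress(x, y)
print(f"slope = {result.slope:.3f}, intercept = {result.intercept:.3f}")
print(f"p-value for H0: slope = 0 -> {result.pvalue:.4f}")
# A p-value below the chosen threshold (often 0.05) suggests a
# statistically significant linear relationship between X and Y.
```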

Finally, the equation is given at the end of the results section. Plug in any value of X (within the range of the dataset, anyway) to calculate the corresponding predicted Y value.
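
For instance, with the fitted line Y = -1 + 2X (the coefficients from the first worked example further down this page), the prediction step is just a substitution:

```python
# Plugging an X value into a fitted line (coefficients taken from the example below)
intercept, slope = -1.0, 2.0
x_new = 4.0                       # keep predictions within the range of the data
print(intercept + slope * x_new)  # 7.0
```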

Graphing linear regression

The Linear Regression calculator provides a generic graph of your data and the regression line.

While the graph on this page is not customizable, Prism is a fully-featured research tool used for publication-quality data visualizations. See it in action in our How To Create and Customize High Quality Graphs video!

Graphing is important not just for visualization reasons, but also to check for outliers in your data. If there are a couple points far away from all others, there are a few possible meanings: They could be unduly influencing your regression equation or the outliers could be a very important finding in themselves. Use this outlier checklist to help figure out which is more likely in your case.
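
If you prefer to reproduce such a plot yourself, the sketch below (matplotlib and NumPy assumed, data taken from the second worked example at the end of the page) overlays the fitted line on a scatter plot so that points far from the line stand out.

```python
# Sketch: scatter plot with fitted line for a quick visual outlier check
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 7.0, 6.0, 5.0, 6.0])

slope, intercept = np.polyfit(x, y, 1)    # least-squares line

plt.scatter(x, y, label="data")
plt.plot(x, intercept + slope * x, color="red", label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()   # points far from the red line deserve a closer look
```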

For more information

Liked using this calculator? For additional features like advanced analysis and customizable graphics, we offer a free 30-day trial of Prism.

Some additional highlights of Prism include the ability to:

  • Use the line-of-best-fit equation for prediction directly within the software
  • Graph confidence intervals and use advanced prediction intervals
  • Compare regression curves for different datasets
  • Build multiple regression models (use more than one predictor variable)

Looking to learn more about linear regression analysis? Our ultimate guide to linear regression includes examples, links, and intuitive explanations on the subject.

Prism's curve fitting guide also includes thorough linear regression resources in a helpful FAQ format.

Both of these resources also cover multiple linear regression analysis, a similar method used when there is more than one predictor variable. If more than one predictor is involved in estimating a response, you should try multiple linear regression in Prism (not the calculator on this page!).

The correlation coefficient r measures the strength of the linear association between x and y. The value of r always lies between –1 and +1. When r is positive, x and y tend to increase and decrease together. When r is negative, y tends to decrease as x increases, and to increase as x decreases. The coefficient of determination, r², is equal to the square of the correlation coefficient. When expressed as a percent, r² represents the percent of variation in the dependent variable y that can be explained by variation in the independent variable x using the regression line.
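
The equivalence between r² and the "percent of variation explained" can be checked numerically; the sketch below (NumPy assumed, data from the second worked example) compares r² with 1 − SS_res/SS_tot.

```python
# Sketch: r^2 equals the fraction of variation in y explained by the regression line
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 7.0, 6.0, 5.0, 6.0])

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

ss_res = np.sum((y - y_hat) ** 2)          # unexplained (residual) variation
ss_tot = np.sum((y - y.mean()) ** 2)       # total variation in y
r = np.corrcoef(x, y)[0, 1]

print(r**2, 1 - ss_res / ss_tot)           # the two values agree
```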

X = xi    Y = yi    xi²       xi yi
1         2         1         2
2         1         4         2
3         6         9         18
∑ = 6     ∑ = 9     ∑ = 14    ∑ = 22

From the table, we have

n = 3, ∑ xi = 6, ∑ yi = 9, `sum "x"_"i"^2 = 14`, ∑ xi yi = 22

`bar x = (sum x_i)/"n" = 6/3 = 2`

`bar y = (sum y_i)/"n" = 9/3 = 3`

Now, `"b"_"YX" = (sum"x"_"i" "y"_"i" - "n" bar "x" bar "y")/(sum "x"_"i"^2 - "n" bar"x"^2)`

`= (22 - 3xx2xx3)/(14 - 3(2)^2) = (22 - 18)/(14 - 12) = 4/2 = 2`

Also, `"a" = bar y - "b"_"YX"  bar x` = 3 - 2(2) = - 1

The regression equation of Y on X is,

Y = a + bYX X

∴ Y = - 1 + 2X

For X = 4,

Y = - 1 + 2(4) = - 1 + 8 = 7

The most likely value of Y for X = 4 is 7.
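
As a cross-check (not part of the original solution), NumPy's polyfit reproduces the same slope, intercept, and prediction from the table above:

```python
# Cross-check of the hand computation: data (1, 2), (2, 1), (3, 6)
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, 6.0])

b_yx, a = np.polyfit(x, y, 1)   # slope b_YX and intercept a
print(b_yx, a)                  # 2.0 and -1.0
print(a + b_yx * 4)             # 7.0, the predicted Y at X = 4
```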

X = xi    Y = yi    xi²       xi yi
1         2         1         2
2         4         4         8
3         7         9         21
4         6         16        24
5         5         25        25
6         6         36        36
∑ = 21    ∑ = 30    ∑ = 91    ∑ = 116

From the table, we have

n = 6, ∑ xi = 21, ∑ yi = 30, `sum "x"_"i"^2 = 91`, ∑ xi yi = 116

`bar x = (sum x_i)/"n" = 21/6 = 3.5`

`bar y = (sum y_i)/"n" = 30/6 = 5`

Now, `"b"_"YX" = (sum"x"_"i" "y"_"i" - "n" bar "x" bar "y")/(sum "x"_"i"^2 - "n" bar"x"^2)`

`= (116 - 6xx3.5xx5)/(91 - 6(3.5)^2) = (116 - 105)/(91 - 73.5) = 11/17.5 = 0.63`

Also, `"a" = bar y - "b"_"YX"  bar x`

= 5 - 0.63 × 3.5

= 5 - 2.205 = 2.795 ≈ 2.8

The regression equation of Y on X is,

Y = a + bYX X

∴ Y = 2.8 + 0.63 X

For X = 10,

Y = 2.8 + 0.63 × 10

= 2.8 + 6.3 = 9.1

∴ The value of Y when X = 10 is 9.1.
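
The same cross-check applies to this example (NumPy assumed); the unrounded coefficients are shown below, and rounding b_YX to 0.63, as in the hand solution, yields the reported 9.1:

```python
# Cross-check of the second example: x = 1..6, y = 2, 4, 7, 6, 5, 6
import numpy as np

x = np.arange(1, 7, dtype=float)
y = np.array([2.0, 4.0, 7.0, 6.0, 5.0, 6.0])

b_yx, a = np.polyfit(x, y, 1)
print(b_yx, a)            # about 0.6286 and 2.8
print(a + b_yx * 10)      # about 9.09; with b_YX rounded to 0.63 this is 9.1
```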