
Draw a line through the middle of a cloud of data points that is a "best fit" to the data.

Linear Regression

This explanation looks at regression solely as a descriptive statistic: which line lies "closest" to a given set of points? Here "closest" means the line that minimizes the sum of the squared vertical (y) distances from the points to the line; that line is called the least squares regression line. I won't derive the formula; I'll just present it and then use it. The data are given as a set of points in the plane, i.e., as ordered pairs of x and y values.
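To make the "closest" criterion concrete, here is a minimal Python sketch that scores a candidate line y = b0 + b1x by the sum of the squared vertical distances from the points to it. The function name `sum_squared_vertical_distances` is a hypothetical label, not part of any library. The points are the example data given later on this page, and the candidate lines other than the least squares one are arbitrary picks for comparison.

```python
def sum_squared_vertical_distances(points, b0, b1):
    """Return SUM (yi - (b0 + b1*xi))^2 over the given (x, y) pairs."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


# Example data from later on this page.
points = [(2, -5), (4, 14), (9, -1), (13, 38), (16, 11)]

# Score a few candidate lines; the least squares line (-1.622, 1.48) should score lowest.
for b0, b1 in [(-1.622, 1.48), (0.0, 1.3), (-3.0, 1.6), (11.4, 0.0)]:
    score = sum_squared_vertical_distances(points, b0, b1)
    print(f"b0={b0:7.3f}  b1={b1:5.2f}  sum of squares={score:8.2f}")
```

The last candidate is the horizontal line at Y-bar = 11.4, whose sum of squared vertical distances is exactly SSyy.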

Statistical Formulae

X-bar, written as an X with a line over it, is the mean (average) of the x-values.

Y-bar, a Y with a line over it, is the mean of the y-values.

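SSxx is the sum of the squares of the x-deviations: SSxx = SUM (xi - X-bar)²

SSyy is the sum of the squares of the y-deviations: SSyy = SUM (yi - Y-bar)². (SSyy is not needed to find the line itself.)

SSxy is the sum of the products of the x- and y-deviations: SSxy = SUM (xi - X-bar)(yi - Y-bar)

In each sum, i runs over all the data points (xi, yi).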

The slope of the line is b1 = SSxy/SSxx

The y-intercept is b0 = (Y-bar) - b1(X-bar)

The least squares regression line is y-hat = b0 + b1x

(y-hat is written as a y with a circumflex over it.)
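The formulas above translate almost line for line into code. This is a minimal Python sketch, not the author's own method; `regression_statistics` is a hypothetical name, and it uses `fractions.Fraction` from the standard library so that every result is exact rather than a rounded decimal.

```python
from fractions import Fraction


def regression_statistics(points):
    """Compute X-bar, Y-bar, SSxx, SSyy, SSxy, b1 and b0 for a list of (x, y) pairs.

    Inputs are converted to Fraction so all the arithmetic is exact.
    """
    n = len(points)
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    x_bar = sum(xs) / n                                  # mean of the x-values
    y_bar = sum(ys) / n                                  # mean of the y-values
    ss_xx = sum((x - x_bar) ** 2 for x in xs)            # SUM (xi - X-bar)^2
    ss_yy = sum((y - y_bar) ** 2 for y in ys)            # SUM (yi - Y-bar)^2
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_xx == 0:
        raise ValueError("all x-values are equal; the slope b1 is undefined")
    b1 = ss_xy / ss_xx                                   # slope
    b0 = y_bar - b1 * x_bar                              # y-intercept
    return {"x_bar": x_bar, "y_bar": y_bar, "ss_xx": ss_xx,
            "ss_yy": ss_yy, "ss_xy": ss_xy, "b1": b1, "b0": b0}


stats = regression_statistics([(2, -5), (4, 14), (9, -1), (13, 38), (16, 11)])
for name, value in stats.items():
    print(f"{name} = {value} (about {float(value):.4f})")
```

For the example data this gives b1 = 1027/694 and b0 = -563/347, which round to the 1.48 and -1.622 quoted in the example.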

Example:

 Data Values 
 x  y 
 2  -5 
 4  14 
 9  -1 
 13  38 
 16  11 

Statistical measures of this data:

 X-bar = 8.8 
 Y-bar = 11.4 
 SSxx = 138.8 
 SSyy = 1137.2 
 SSxy = 205.4 
 b1 = 1.48 (205.4/138.8 = 1.4798..., rounded to two places)
 b0 = -1.622 (computed with the unrounded b1)
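As a check on these measures, the sketch below recomputes them with ordinary floating-point arithmetic in Python; the variable names are illustrative. It also compares b0 computed from the unrounded slope with b0 computed from the rounded slope 1.48, since the two differ in the third decimal place.

```python
# Data values from the example table.
xs = [2, 4, 9, 13, 16]
ys = [-5, 14, -1, 38, 11]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = ss_xy / ss_xx                          # about 1.4798; listed as 1.48
b0 = y_bar - b1 * x_bar                     # about -1.6225; listed as -1.622
b0_from_rounded_b1 = y_bar - 1.48 * x_bar   # about -1.624

print(f"X-bar = {x_bar:.1f}, Y-bar = {y_bar:.1f}")
print(f"SSxx = {ss_xx:.1f}, SSyy = {ss_yy:.1f}, SSxy = {ss_xy:.1f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, b0 from rounded b1 = {b0_from_rounded_b1:.4f}")
```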

Substituting b0 = -1.622 and b1 = 1.48 into the formula y-hat = b0 + b1x gives the least squares regression line for this example:

y-hat = -1.622 + (1.48)x
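With b0 and b1 in hand, y-hat can be evaluated at any x. This minimal Python sketch (the helper `y_hat` is illustrative, with the rounded coefficients from this example as its defaults) lists each data point's fitted value and its residual y - y-hat, which is the signed vertical distance from the point to the line, the quantity whose squares the method adds up and minimizes.

```python
def y_hat(x, b0=-1.622, b1=1.48):
    """Evaluate the least squares line y-hat = b0 + b1*x (example coefficients by default)."""
    return b0 + b1 * x


points = [(2, -5), (4, 14), (9, -1), (13, 38), (16, 11)]

# Residual = observed y minus fitted y-hat: the signed vertical distance to the line.
for x, y in points:
    fitted = y_hat(x)
    print(f"x={x:2d}  y={y:3d}  y-hat={fitted:7.3f}  residual={y - fitted:7.3f}")
```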

The webmaster and author of this Math Help site is Graeme McRae.