
Least squares regression draws the "best fit" line through the middle of a cloud of data points.
This explanation looks at regression solely as a descriptive statistic: which line lies "closest" to a given set of points? "Closest" means minimizing the sum of the squared vertical (y) distances of the points from the line; the line that does so is the least squares regression line. I won't derive the formula, merely present it and then use it. The data are given as a set of points in the plane, i.e., as ordered pairs of x- and y-values.
Xbar, written as an X with a line over it, is the mean (average) of the x-values.
Ybar, a Y with a line over it, is the mean of the y-values.
SS_{xx} is the sum of the squares of the x-deviations: SUM (x_{i} - Xbar)²
SS_{yy} is the sum of the squares of the y-deviations: SUM (y_{i} - Ybar)²
SS_{xy} is the sum of the products of the x- and y-deviations: SUM (x_{i} - Xbar)(y_{i} - Ybar)
b_{1} = SS_{xy}/SS_{xx}
b_{0} = Ybar - b_{1}(Xbar)
The least squares regression line is yhat = b_{0} + b_{1}x
(yhat is written as a y with a circumflex over it.)
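
The formulas above translate almost line for line into code. The following Python sketch is one way to do it. The function name least_squares_line and its variable names are illustrative choices, not part of the original explanation, and the deviations are taken to be x_{i} - Xbar and y_{i} - Ybar.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line yhat = b0 + b1*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_bar = sum(xs) / n  # Xbar: mean of the x-values
    y_bar = sum(ys) / n  # Ybar: mean of the y-values
    # SS_xx: sum of squared x-deviations
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    # SS_xy: sum of products of x- and y-deviations
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_xx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    b1 = ss_xy / ss_xx       # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Points that lie exactly on y = 1 + 2x, so the fit should recover
# an intercept of 1 and a slope of 2.
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

SS_{yy} is not needed for the line itself, so the sketch leaves it out.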
Data Values

x    y
2   -5
4   14
9   -1
13  38
16  11

The formula for the least squares regression line is
yhat = b_{0} + b_{1}x
So in our example, where b_{0} = -1.622 and b_{1} = 1.48, the least squares regression line is
yhat = -1.622 + (1.48)x
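
As a check on the arithmetic, the Python sketch below recomputes the example step by step. It takes the five data points to be (2, -5), (4, 14), (9, -1), (13, 38) and (16, 11); with these values the slope comes out to the stated 1.48. The variable names are illustrative only.

```python
# Data points, assumed to be (2, -5), (4, 14), (9, -1), (13, 38), (16, 11).
xs = [2, 4, 9, 13, 16]
ys = [-5, 14, -1, 38, 11]

n = len(xs)
x_bar = sum(xs) / n  # Xbar
y_bar = sum(ys) / n  # Ybar

# Sums of squared deviations and of deviation products.
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = ss_xy / ss_xx       # slope
b0 = y_bar - b1 * x_bar  # intercept

print(f"Xbar = {x_bar:.1f}, Ybar = {y_bar:.1f}")
print(f"SS_xx = {ss_xx:.1f}, SS_xy = {ss_xy:.1f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

Run as written, this should report Xbar = 8.8, Ybar = 11.4, SS_xx = 138.8, SS_xy = 205.4, b1 = 1.480 and b0 = -1.622.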
The webmaster and author of this Math Help site is Graeme McRae.