Draw a line through the middle of a cloud of data points that is a "best fit" to the data.

Linear Regression

This explanation looks at regression solely as a descriptive statistic: which line lies "closest" to a given set of points? "Closest" means minimizing the sum of the squared y (vertical) distances of the points from the least squares regression line. I won't derive the formula, merely present it and then use it.

Data is given as a set of points in the plane, i.e., as ordered pairs of x and y values.

Statistical Formulae

X-bar, written as an X with a line over it, is the mean (average) of the x-values.
Y-bar, written as a Y with a line over it, is the mean of the y-values.
SSxx is the sum of the squares of the x-deviations: SUM (xi - X-bar)²
SSyy is the sum of the squares of the y-deviations: SUM (yi - Y-bar)²
SSxy is the sum of the products of the x- and y-deviations: SUM (xi - X-bar)(yi - Y-bar)

Each SUM runs over all of the data points (xi, yi). The slope b1 and the y-intercept b0 of the line are then

b1 = SSxy/SSxx
b0 = (Y-bar) - b1(X-bar)

The least squares regression line is y-hat = b0 + b1x.
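The formulas above translate directly into a few lines of Python. This is a minimal sketch, not the author's code: the function names `least_squares_line` and `sum_squared_vertical_distances` and the sample points are hypothetical, chosen only for illustration. The last few lines check the "closest" property numerically by nudging b0 or b1 and confirming that the sum of squared vertical distances grows.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def least_squares_line(points: Sequence[Point]) -> Tuple[float, float]:
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(x for x, _ in points) / n  # X-bar: mean of the x-values
    y_bar = sum(y for _, y in points) / n  # Y-bar: mean of the y-values
    # SSxx = SUM (xi - X-bar)^2 and SSxy = SUM (xi - X-bar)(yi - Y-bar)
    ss_xx = sum((x - x_bar) ** 2 for x, _ in points)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    if ss_xx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_vertical_distances(points: Sequence[Point], b0: float, b1: float) -> float:
    """Sum of the squared y-distances of the points from y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


# Hypothetical sample data, for illustration only.
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
b0, b1 = least_squares_line(data)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # prints: y-hat = 1.300 + 0.900x

# Moving either coefficient away from (b0, b1) only increases the sum.
best = sum_squared_vertical_distances(data, b0, b1)
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_squared_vertical_distances(data, b0 + db0, b1 + db1) > best
```

For these made-up points, X-bar = 3, Y-bar = 4, SSxx = 10 and SSxy = 9, so b1 = 0.9 and b0 = 4 - 0.9(3) = 1.3, matching the printed line.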
Example:
Statistical measures of this data:
The formula for the least squares regression line is y-hat = b0 + b1x.
So in our example, where b0 = -1.622 and b1 = 1.48, the least squares regression line is y-hat = -1.622 + 1.48x.
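Once b0 and b1 are known, the line gives a predicted y-value for any x. Here is a minimal sketch using the example's coefficients b0 = -1.622 and b1 = 1.48; the function name `predict` and the x-values passed to it are hypothetical, chosen only for illustration.

```python
# Coefficients from the example: b0 = -1.622, b1 = 1.48.
B0 = -1.622
B1 = 1.48


def predict(x: float, b0: float = B0, b1: float = B1) -> float:
    """Return y-hat = b0 + b1*x, the value the regression line predicts at x."""
    return b0 + b1 * x


for x in (0, 1, 2):
    print(f"x = {x}: y-hat = {predict(x):.3f}")
# prints:
# x = 0: y-hat = -1.622
# x = 1: y-hat = -0.142
# x = 2: y-hat = 1.338
```

At x = 0 the prediction is just b0, the y-intercept, and each unit increase in x raises y-hat by b1 = 1.48.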
The webmaster and author of the Math Help site is Graeme McRae.