
A student asked:
Of the men who are 68 inches tall, what percentage have forearms that are 18 inches long, to the nearest inch? We are given the following:
Average height: 68 inches, SD = 2.7 inches
Average forearm length: 18 inches, SD = 1 inch
r = 0.80
We are allowed to assume the distributions of the two random variables are normal.
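One way to answer the question, sketched under the stated normality assumption taken as bivariate normality of (height, forearm): for men of height h, forearm length is normal with mean 18 + r(1/2.7)(h - 68) and SD 1 * sqrt(1 - r^2). At h = 68 the conditional mean is 18 inches and the conditional SD is sqrt(1 - 0.64) = 0.6 inches. Reading "18 inches, to the nearest inch" as the interval from 17.5 to 18.5 inches (our interpretation), the answer is P(|Z| < 0.5/0.6), roughly 59.5%. The helper names `normal_cdf` and `conditional_share` are hypothetical, chosen for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function from the stdlib."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_share(height: float,
                      mean_x: float = 68.0, sd_x: float = 2.7,
                      mean_y: float = 18.0, sd_y: float = 1.0,
                      r: float = 0.80,
                      target: float = 18.0, half_width: float = 0.5) -> float:
    """Fraction of men of the given height whose forearm rounds to `target`.

    Assumes (height, forearm) is bivariate normal with the given means,
    SDs and correlation r.
    """
    # Conditional mean of Y given X = height: the regression line.
    cond_mean = mean_y + r * (sd_y / sd_x) * (height - mean_x)
    # Conditional SD of Y given X shrinks by a factor sqrt(1 - r^2).
    cond_sd = sd_y * math.sqrt(1.0 - r ** 2)
    # Standardize the rounding interval [target - 0.5, target + 0.5].
    z_lo = (target - half_width - cond_mean) / cond_sd
    z_hi = (target + half_width - cond_mean) / cond_sd
    return normal_cdf(z_hi) - normal_cdf(z_lo)


share = conditional_share(68.0)
print(f"Share of 68-inch men with 18-inch forearms: {share:.1%}")
```

For comparison, setting `r=0.0` ignores height entirely and gives P(|Z| < 0.5), about 38%. Knowing the height narrows the forearm distribution, which is why the conditional share is larger.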
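Recall that the population correlation coefficient \( \rho \) can be estimated by the sample correlation coefficient r, where
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \]
with \( S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \), \( S_{xx} = \sum (x_i - \bar{x})^2 \), and \( S_{yy} = \sum (y_i - \bar{y})^2 \). Assuming the pair (X, Y) has a bivariate normal distribution, we can use r to construct a confidence interval for \( \rho \) and to test for dependence between X and Y.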
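Hypothesis: \( H_0\colon \rho = 0 \) (X and Y are independent) versus \( H_a\colon \rho \neq 0 \).
Statistic: \( T = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} \), which follows a t distribution with \( n-2 \) degrees of freedom under \( H_0 \).
Interval: with \( z = \tfrac{1}{2}\ln\dfrac{1+r}{1-r} \), an approximate \( 100(1-\alpha)\% \) confidence interval for \( \rho \) is \( \left(\tanh\big(z - z_{\alpha/2}/\sqrt{n-3}\big),\ \tanh\big(z + z_{\alpha/2}/\sqrt{n-3}\big)\right) \).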
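Under bivariate normality, the usual tools are r = S_xy / sqrt(S_xx S_yy), the t statistic r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom for testing rho = 0, and an approximate confidence interval for rho built with Fisher's z transformation, z = atanh(r), whose standard error is about 1/sqrt(n - 3). The sketch below computes these on a hypothetical five-point data set; the function names are illustrative. A p-value for t needs t tables or a library such as SciPy, which the standard library lacks, so only the statistic is reported.

```python
import math
from statistics import NormalDist


def correlation(xs, ys):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0; t distribution with n - 2 df under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


def fisher_interval(r, n, level=0.95):
    """Approximate CI for rho via Fisher's z = atanh(r) with SE 1/sqrt(n - 3)."""
    z = math.atanh(r)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half = z_crit / math.sqrt(n - 3)
    # Back-transform the endpoints to the correlation scale.
    return math.tanh(z - half), math.tanh(z + half)


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = correlation(xs, ys)
t = t_statistic(r, len(xs))
lo, hi = fisher_interval(r, len(xs))
print(f"r = {r:.4f}, t = {t:.3f} on {len(xs) - 2} df, "
      f"95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

With only five points the interval is very wide and even includes 0, which is the expected behavior of the Fisher interval for small n.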
Recall the three basic assumptions about the random deviations (errors) \( \varepsilon \): they are independent, normally distributed, and have constant variance. In simple linear regression, we also assume that Y and X are linearly related. Residual plots are a standard tool for examining departures from this assumed model.
In the following example, since the observations come from independent individuals, we use only the first three plots to check the assumptions.
EXAMPLE: A cardiology data set was collected by the University of Virginia School of Medicine. Two variables were examined: aortic valve area (AVA) and body surface area (BSA). Physiologically, as children grow, their intracardiac areas also grow. A linear model relating AVA to BSA, a proxy for physiological growth, is widely accepted in medical science (see Gutgesell and Rembold, 1990, and the references therein). The top left, top right, bottom left, and bottom right panels in Figure (AVA vs. BSA) are, respectively, the scatter plot of AVA vs. BSA with the fitted regression line, the normal probability plot of the residuals, the residual plot with base line 0, and the absolute residual plot. We can see that (1) AVA and BSA appear to be linearly related, and (2) the error variance increases with BSA. Point (2) is even more obvious in the absolute residual plot. Since the error terms do not have a common variance, the normal probability plot does not provide much information here. In general, points in a normal probability plot that do not follow a straight line indicate that the errors are not normal.
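The visual cue in an absolute residual plot (absolute residuals fanning out as the predictor grows) can be mimicked numerically: fit the least-squares line, then compare the average absolute residual for the smaller and larger predictor values. This is a minimal sketch on hypothetical data, not the Virginia cardiology data, and the half-split comparison is our simplification of the plot rather than a formal test for non-constant variance.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def abs_residual_halves(xs, ys):
    """Mean |residual| for the lower and upper halves of the sorted x values."""
    b0, b1 = fit_line(xs, ys)
    pairs = sorted((x, abs(y - (b0 + b1 * x))) for x, y in zip(xs, ys))
    mid = len(pairs) // 2
    lower = [a for _, a in pairs[:mid]]
    upper = [a for _, a in pairs[mid:]]
    return sum(lower) / len(lower), sum(upper) / len(upper)


# Hypothetical data: the error alternates in sign and its size grows as 0.1 * x,
# which produces the funnel shape seen in the AVA vs. BSA residual plot.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + (-1) ** i * 0.1 * x for i, x in enumerate(xs, start=1)]
low_mean, high_mean = abs_residual_halves(xs, ys)
print(f"mean |residual|: lower half {low_mean:.3f}, upper half {high_mean:.3f}")
```

A clearly larger mean absolute residual in the upper half points to the same conclusion as the absolute residual plot: the error variance is not constant.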
In Figure (log[AVA] vs. BSA), we make the same plots but replace AVA with log[AVA]. The scatter plot shows that log[AVA] is not linearly related to BSA, and the systematic pattern (curvature) in the residual plot also indicates a departure from the linear model.
We now log-transform both AVA and BSA and obtain the plots in Figure (log[AVA] vs. log[BSA]). The model now fits all but one observation, in the bottom left corner. The residuals fluctuate in a more or less random pattern around the base line 0, and, apart from that one point, the points in the normal probability plot roughly follow a straight line.
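Regressing log[AVA] on log[BSA] fits a straight line log y = b0 + b1 log x, which corresponds to a power relationship y = exp(b0) * x^b1 on the original scale. The sketch below, assuming natural logarithms (the base does not change the slope), fits this model to hypothetical values generated exactly from y = 1.5 * x^0.8, so the fit should recover the exponent 0.8 and the constant 1.5. The function names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def fit_power_law(xs, ys):
    """Fit log(y) = b0 + b1 * log(x); return (exp(b0), b1) so that y ~ c * x**b1."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    b0, b1 = fit_line(log_x, log_y)
    return math.exp(b0), b1


# Hypothetical positive predictor values; the log transform requires x > 0 and y > 0.
xs = [0.5, 0.8, 1.0, 1.3, 1.7, 2.0]
ys = [1.5 * x ** 0.8 for x in xs]
c, b = fit_power_law(xs, ys)
print(f"fitted model: y = {c:.3f} * x^{b:.3f}")
```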
The goal of this lab is to learn how to use StataQuest for data analysis and how to use its diagnostic plots to check the model assumptions. For the regression analysis we need the menu command Statistics > Simple Regression. StataQuest provides many useful diagnostic plots; to understand them, we use the following example:
First, open the data file reg.dta in the datagen directory. The file contains several variables.
The webmaster and author of this Math Help site is Graeme McRae.