Correlation Inferences
   

   


Inferences for Correlation


A student asked,

Of the men who are 68 inches tall, what percentage have forearms which are 18 inches long, to the nearest inch? We are given the following:

Avg height is 68 inches, SD=2.7 in
Avg forearm is 18 inches, SD = 1 in
r = 0.80

We are allowed to assume the distributions of the two random variables are normal.
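
One way to approach the student's question, sketched here under the stated normality assumption (taken to mean the pair is bivariate normal), is to use the conditional distribution of forearm length given height. For a bivariate normal pair, that conditional distribution is normal with mean mean_y + r (SD_y / SD_x)(x - mean_x) and standard deviation SD_y sqrt(1 - r^2). Reading "18 inches, to the nearest inch" as the interval from 17.5 to 18.5 inches is an interpretation, and the function names below are illustrative only.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def conditional_normal(x, mean_x, sd_x, mean_y, sd_y, r):
    """Mean and SD of Y given X = x for a bivariate normal pair."""
    cond_mean = mean_y + r * (sd_y / sd_x) * (x - mean_x)
    cond_sd = sd_y * sqrt(1.0 - r * r)
    return cond_mean, cond_sd


# Values given in the question: height 68 +/- 2.7 in, forearm 18 +/- 1 in, r = 0.80.
cond_mean, cond_sd = conditional_normal(68, 68, 2.7, 18, 1.0, 0.80)

# Assumption: "18 inches to the nearest inch" means between 17.5 and 18.5 inches.
lower, upper = 17.5, 18.5
share = normal_cdf((upper - cond_mean) / cond_sd) - normal_cdf((lower - cond_mean) / cond_sd)

print(f"conditional mean = {cond_mean:.2f}, conditional SD = {cond_sd:.2f}")
print(f"share of 68-inch men with an 18-inch forearm = {share:.3f}")
```

Because the given height equals the average height, the conditional mean stays at 18 inches; only the spread shrinks, from 1 inch to 0.6 inch, which puts the share at roughly 60 percent.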


Inferences for Population Correlation Coefficient

Recall that the population correlation coefficient $\rho$ can be estimated by the sample correlation coefficient r, where

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]

Assuming the pair (X,Y) has a bivariate normal distribution and using the aforementioned rule, we can find a confidence interval for $\rho$ as well as test for dependence between X and Y.

[Table: hypothesis, test statistic, and confidence interval for $\rho$; the formulas are missing from the source.]
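
Since the formulas in the summary table did not survive, the sketch below uses the standard textbook versions, which may differ in form from the ones originally shown. It tests H0: rho = 0 with t = r sqrt(n - 2) / sqrt(1 - r^2), compared against a t distribution with n - 2 degrees of freedom, and builds an approximate confidence interval with Fisher's z transformation. The sample size n = 30 is a hypothetical value chosen only to make the example run.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def sample_correlation(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0; refer to t with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z transformation."""
    z = atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


# r = 0.80 as in the student's question; n = 30 is a hypothetical sample size.
r, n = 0.80, 30
t = t_statistic(r, n)
low, high = fisher_interval(r, n)
print(f"t = {t:.3f} on {n - 2} degrees of freedom")
print(f"95% interval for rho: ({low:.3f}, {high:.3f})")
```

The interval is not symmetric around r, because the transformation back from the z scale compresses values near 1.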

 


 

Residual Plots and Regression Assumptions

Recall that there are three basic assumptions about the random deviations (errors), $e$: the random deviations are independent, normally distributed, and have a constant variance. In simple linear regression, we also assume that Y and X are linearly related. We shall consider the use of residual plots for examining the following types of departures from the assumed model.

  1. The regression function is not linear.
  2. The error terms do not have a constant variance.
  3. The model fits all but one or a few outlying observations.
  4. The error terms are not normally distributed.
  5. The error terms are not independent.
The common graphical tools for assumption checking include:
  1. Residual Plot: a scatter plot of the residuals against X or the fitted value.
  2. Absolute Residual Plot: a scatter plot of the absolute values of the residuals against X or the fitted value.
  3. Normal Probability Plot of the Residuals.
  4. Time Series Plot of the Residuals: a scatter plot of the residuals against time or index.

The time series plot of the residuals is strongly recommended whenever data are obtained in a time sequence. The purpose is to see whether there is any correlation between the error terms over time (that is, whether the error terms are not independent). When the error terms are independent, we expect the residuals to fluctuate in a more or less random pattern around the base line 0.
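
The quantities behind these plots can be computed with a few lines of code. The following is a minimal sketch using only the Python standard library; the data values are made up for illustration, the (i + 0.5)/n plotting positions are one common convention among several, and the lag-1 autocorrelation is included as a simple numeric companion to the time series plot rather than a replacement for it.

```python
from statistics import NormalDist, mean


def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    mean_x, mean_y = mean(xs), mean(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed minus fitted values."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def normal_plot_points(res):
    """(theoretical quantile, ordered residual) pairs for a normal probability plot."""
    n = len(res)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, sorted(res)))


def lag1_autocorrelation(res):
    """Correlation between successive residuals, for data in time order."""
    m = mean(res)
    numerator = sum((a - m) * (b - m) for a, b in zip(res, res[1:]))
    return numerator / sum((a - m) ** 2 for a in res)


# Made-up illustrative data, listed in the order collected.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

res = residuals(xs, ys)
abs_res = [abs(e) for e in res]      # plot against xs for the absolute residual plot
qq = normal_plot_points(res)         # plot these pairs for the normal probability plot
print("residuals:", [round(e, 3) for e in res])
print("lag-1 autocorrelation:", round(lag1_autocorrelation(res), 3))
```

Plotting `res` against `xs`, `abs_res` against `xs`, the pairs in `qq`, and `res` against its index gives the four plots listed above.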

In the following example, since the observations are from independent individuals, we will just use the first three plots to do assumption checking.

EXAMPLE: A cardiology data set was collected by the University of Virginia School of Medicine. Two variables were examined: aortic valve area (AVA) and body surface area (BSA). Physiologically, as children grow, the intracardiac areas also grow. A linear model relating AVA to BSA, a proxy for physiological growth, has been widely accepted in medical science (see Gutgesell and Rembold, 1990, and its references). The top left, top right, bottom left, and bottom right plots in Figure (AVA vs. BSA) are, respectively, the scatter plot of AVA vs. BSA with the fitted regression line, the normal probability plot of the residuals, the residual plot with the base line 0, and the absolute residual plot. We can see that (1) AVA and BSA seem to be linearly related and (2) the error variance increases with BSA. Point (2) is even more obvious in the absolute residual plot. Since we now know that the error terms do not have a common variance, the normal probability plot does not provide much information here. We just note that, in general, points that do not follow a straight line in this plot indicate that the errors are not normal.

 

 

 

 

 

In Figure (log[AVA] vs. BSA), we make the same plots but replace AVA by log[AVA]. The scatter plot shows that log[AVA] is not linearly related to BSA. We also notice a systematic pattern (curvature) in the residual plot, which indicates a departure from a linear model.

 

 

 

 

 

We now log-transform both AVA and BSA and obtain the plots in Figure (log[AVA] vs. log[BSA]). We now see that the model fits all but one observation, in the bottom left corner. The residuals fluctuate in a more or less random pattern around the base line 0. Also, apart from one point, the points in the normal probability plot roughly follow a straight line.
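
The effect of transforming both variables can be illustrated with a small sketch. The data below are synthetic, not the cardiology data: the response is proportional to x with a multiplicative error that alternates between two fixed factors, which is one simple way to make the spread grow with x. The helper `spread_ratio` is a hypothetical summary (mean absolute residual in the upper half of x divided by that in the lower half) used here in place of looking at the absolute residual plot.

```python
from math import log


def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, res):
    """Mean |residual| for the larger half of x divided by that for the smaller half."""
    pairs = sorted(zip(xs, res))
    half = len(pairs) // 2
    low = [abs(e) for _, e in pairs[:half]]
    high = [abs(e) for _, e in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))


# Synthetic data: y = 1.5 * x * (multiplicative error), error alternating 1.25 and 0.8.
xs = [0.2 * k for k in range(1, 13)]
factors = [1.25 if k % 2 == 0 else 0.8 for k in range(12)]
ys = [1.5 * x * f for x, f in zip(xs, factors)]

raw_ratio = spread_ratio(xs, residuals(xs, ys))

log_xs = [log(x) for x in xs]
log_ys = [log(y) for y in ys]
log_ratio = spread_ratio(log_xs, residuals(log_xs, log_ys))

print(f"spread ratio on the raw scale:     {raw_ratio:.2f}")
print(f"spread ratio on the log-log scale: {log_ratio:.2f}")
```

A ratio well above 1 on the raw scale reflects residuals that fan out as x grows; a ratio near 1 after taking logs of both variables is what a roughly constant error variance looks like in this summary.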

 

Computer Lab

The goal of this lab is to learn how to use StataQuest for data analysis and how to use the diagnostic plots provided by StataQuest to check the model assumptions. We will need the following menu commands for the regression analysis: Statistics -> Simple Regression. There are many interesting diagnostic plots provided by StataQuest. To understand them, we can use the following example:

First open the data file reg.dta under the datagen directory. There are several variables in the file:
  - X: the independent variable.
  - e: the deviation, which contains a random sample from a normal distribution.
  - Y: Y = 1 + X + e.
  - [fourth variable: name and definition lost in extraction]
  - [fifth variable: name and definition lost in extraction]
  - [sixth variable, name lost in extraction]: equal to Y, except that the observation corresponding to max(X) is replaced by the original Y value plus 20.
  - [seventh variable, name lost in extraction]: equal to Y, except that the observation with the X value closest to [symbol lost in extraction] is replaced by the original Y value plus 20.
You may look at the normal quantile plot of e and the scatter plot of e versus X, and then compare them with the diagnostic plots provided by StataQuest after you regress Y on X. You may also want to get the diagnostic plots after regressing the fourth and fifth variables listed above on X, respectively. What do you learn from the plots? Now compare the least-squares (LS) lines for Y and for the last two variables listed above vs. X. What kind of observation has more potential to be influential?
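
The last comparison can also be tried outside StataQuest. The sketch below does not read reg.dta; it builds a small synthetic data set with Y = 1 + X + e, using fixed made-up deviations rather than a random normal sample so that the output is reproducible. It then adds 20 to a single observation, first at max(X) and then at the X value closest to the mean of X. Using the mean of X as the second location is an assumption, since that symbol is missing from the variable list.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def shift_one(ys, index, amount):
    """Copy of ys with `amount` added to the observation at `index`."""
    shifted = list(ys)
    shifted[index] += amount
    return shifted


# Synthetic stand-in for the lab data: Y = 1 + X + e with made-up deviations e.
xs = list(range(1, 12))
e = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.2, -0.2]
ys = [1 + x + d for x, d in zip(xs, e)]

mean_x = sum(xs) / len(xs)
i_max = max(range(len(xs)), key=lambda i: xs[i])
i_mid = min(range(len(xs)), key=lambda i: abs(xs[i] - mean_x))  # assumed: closest to mean(X)

base = fit_line(xs, ys)
at_max = fit_line(xs, shift_one(ys, i_max, 20))
at_mid = fit_line(xs, shift_one(ys, i_mid, 20))

for label, (b0, b1) in [("original Y", base), ("shift at max(X)", at_max), ("shift near mean(X)", at_mid)]:
    print(f"{label:>18}: intercept = {b0:7.3f}, slope = {b1:6.3f}")
```

In this sketch the shifted point at max(X) tilts the fitted line noticeably, while the same shift near the mean of X leaves the slope unchanged and only raises the line, which is the kind of contrast the lab question asks about.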

 


Concept Lab

 
Ch 6: Bivariate Descriptive Statistics -> Least Squares

 


 

 



 

The webmaster and author of the Math Help site is Graeme McRae.