Bivariate Descriptive Statistics

Contents of this section:


Bivariate Data

Previously we looked at one measurement on each observation (or individual), say X is height. Now we're interested in more than one measurement per observation, say X is height and Y is weight. Suppose we take these measurements on n individuals. Then our data are the n pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.


 

Scatterplots

Scatterplots, like histograms, are a good visual means of understanding patterns in bivariate numerical data. Construction of a scatterplot is straightforward: each point on a scatterplot corresponds to one bivariate observation, with its X value read along the horizontal axis and its Y value along the vertical axis.

The scatterplot gives us a visual means of seeing relationships between the two variables. We call a relationship positive if an increase in one variable corresponds to an increase in the other. When one variable increases and the other decreases, we call the relationship negative.
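To make the construction concrete, here is a minimal sketch that draws a text-only scatterplot using nothing but the Python standard library. The function name `ascii_scatter`, the grid size, and the height/weight numbers are all made up for illustration; the sketch also assumes the data contain at least two distinct X values and two distinct Y values.

```python
def ascii_scatter(xs, ys, width=21, height=9):
    """Return a text scatterplot with one '*' per (x, y) observation."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column/row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("|" + "".join(line) for line in grid)


# Hypothetical data: heights in inches (X) and weights in pounds (Y).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 125, 120, 125]

plot = ascii_scatter(heights, weights)
print(plot)
```

In the printed grid the stars drift upward from left to right, which is what a positive relationship looks like.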

What can a scatterplot tell us? In general terms, it gives us an idea of what kind of relationships (or patterns) our bivariate data has. We may have
- Positive (negative) linear relationship
- Positive (negative) curved relationship
- Other relationships
- No relationship

 

 

 

Scatterplots can also give us visual evidence of outliers or suspicious observations (details in Weeks 12 and 13).

NOTE: Scatterplots are used only for quantitative variables (those that can be compared numerically). Examples of quantitative variables are height, weight, rates, and counts. Examples of qualitative variables (those that cannot be compared numerically) are color, type of car, and sex.

As with the other graphical methods we've discussed, such as histograms, there are numerical statistics that give a more precise description of bivariate relationships. The two major ones we'll discuss are correlation and linear regression.

 


Correlation

To get a measure of how strongly X and Y values are related, we will use the correlation coefficient. Correlation is concerned with trends: if X increases, does Y tend to increase or decrease? How much? How strong is this tendency?
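This section does not write out a formula, so the sketch below assumes the usual Pearson sample correlation coefficient: the sum of the products of the X and Y deviations from their means, divided by the square root of the product of the two sums of squared deviations. The function name `correlation` and the data are hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson sample correlation coefficient of paired data (assumed formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: heights in inches (X) and weights in pounds (Y).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 125, 120, 125]

r = correlation(heights, weights)
print(round(r, 4))  # 0.7746
```

The result always lies between -1 and 1. Its sign gives the direction of the trend, and values closer to -1 or 1 indicate a stronger linear tendency.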


 


Least Squares Line

Recall the equation of a line from algebra:

$$Y = \beta_0 + \beta_1 X$$

(You may have seen Y = mX + b; we are going to change notation slightly.) Above, $\beta_1$ is called the slope of the line and $\beta_0$ is the Y-intercept. The slope measures the amount Y increases when X increases by one unit. The Y-intercept is the value of Y when X = 0.

 

 

Our objective is to fit a straight line to points on a scatterplot that do not lie exactly along a straight line (see the figure above). So we want to find $\beta_0$ and $\beta_1$ such that the line $Y = \beta_0 + \beta_1 X$ fits the data as well as possible. First, we need to define what we mean by a "best" fit. We want a line that is in some sense closest to all of the data points simultaneously. In statistics, we define a residual, $e_i$, as the signed vertical distance between a point and the line,

$$e_i = y_i - (\beta_0 + \beta_1 x_i)$$

(see the vertical line in the figure). Since residuals can be positive or negative, we square them to remove the sign. By adding up all of the squared residuals, we get a measure of how far our line is from the data. Thus, the "best" line is the one with the minimum sum of squared residuals, i.e., the one that minimizes $\sum_{i=1}^{n} e_i^2$. This method of finding a line is called least squares.
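As a rough illustration of the criterion, the sketch below computes the sum of squared residuals for two candidate lines on the same made-up data. The function name `sum_squared_residuals`, the data, and both candidate lines are hypothetical.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in residuals)


# Hypothetical data: heights in inches (X) and weights in pounds (Y).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 125, 120, 125]

# Two candidate lines, chosen only for illustration.
sse_a = sum_squared_residuals(heights, weights, intercept=24.0, slope=1.5)
sse_b = sum_squared_residuals(heights, weights, intercept=56.0, slope=1.0)
print(sse_a, sse_b)  # 60.0 70.0
```

By the least squares criterion, the first line fits these points better than the second because its sum of squared residuals is smaller.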

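The formulas for the slope $\beta_1$ and intercept $\beta_0$ of the least squares line are

$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the x's and y's.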

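Using algebra, we can express the slope $\beta_1$ as

$$\beta_1 = r \, \frac{s_y}{s_x}$$

where r is the correlation coefficient and $s_x$ and $s_y$ are the sample standard deviations of the x's and y's.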
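A minimal sketch of the calculation, assuming the standard textbook least squares formulas: the slope is the sum of the products of the X and Y deviations from their means divided by the sum of the squared X deviations, and the intercept is the mean of Y minus the slope times the mean of X. The function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line through paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


# Hypothetical data: heights in inches (X) and weights in pounds (Y).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 125, 120, 125]

intercept, slope = least_squares_line(heights, weights)
print(f"weight = {intercept:.1f} + {slope:.1f} * height")
# weight = 24.0 + 1.5 * height
```

For these made-up numbers, each additional inch of height corresponds to an estimated 1.5 additional pounds of weight.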

 


Coefficient of Determination

A statistic that is widely used to determine how well a regression fits is the coefficient of determination (or squared multiple correlation coefficient), $R^2$. $R^2$ represents the fraction of the variability in y that can be explained by the variability in x. In other words, $R^2$ tells us how much of the variability in the y's is accounted for by their linear relationship with x, i.e., how close the points are to the line. The equation for $R^2$ is

$$R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\text{SSTotal}}$$

where $\sum e_i^2$ is the sum of squared residuals from the least squares line and SSTotal $= \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares of the data.

NOTE: In the simple linear regression case, $R^2$ is simply the square of the correlation coefficient.
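The sketch below checks this relationship numerically on made-up data. It assumes the coefficient of determination is computed as one minus the sum of squared residuals divided by the total sum of squares, and it compares the result with the squared Pearson correlation. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Return (R^2 from the fitted line, squared correlation) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    # Fit the least squares line, then measure what it leaves unexplained.
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    from_fit = 1 - ss_residual / ss_total
    from_correlation = s_xy ** 2 / (s_xx * ss_total)
    return from_fit, from_correlation


# Hypothetical data: heights in inches (X) and weights in pounds (Y).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 125, 120, 125]

from_fit, from_correlation = r_squared(heights, weights)
print(round(from_fit, 4), round(from_correlation, 4))  # 0.6 0.6
```

Here about 60% of the variability in the made-up weights is accounted for by the linear relationship with height, and both ways of computing the statistic agree.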

 


Computer Lab

Applicable StataQuest Commands:

Summaries -> Means and SDs

Summaries -> Means and SDs by group -> One-way of means

Summaries -> Median/Percentiles

Summaries -> Tables -> One-way (frequency) to create relative frequency tables used in histograms

Graphs -> One variable -> Histogram -> Continuous variable

Graphs -> One variable -> Box plot

Graphs -> One variable -> Stem-and-leaf

Graphs -> One variable by group -> Histograms by group -> Continuous variable to compare data in one column with the group indicator in another column

Graphs -> One variable by group -> Box plot by group to compare data in one column with the group indicator in another column

Graphs -> Comparison of variables -> Boxplot comparison to compare data in multiple columns

 


Concept Lab

 
- Ch 4: How Are Populations Distributed?
- Ch 6: Bivariate Descriptive Statistics, Scatterplots I and II


 


 

 

The webmaster and author of the Math Help site is Graeme McRae.