Here's what you'll find in this section:

• # Bivariate Data

Previously, we looked at one measurement on each observation (or individual), say X is height. Now we're interested in more than one measurement per observation (individual), say X is height and Y is weight. Suppose we take these measurements on n individuals. Then our data would be the n pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

• Scatterplots
• ## Scatterplots

Scatterplots, like histograms, are a good visual means of understanding patterns in bivariate numerical data. Construction of a scatterplot is straightforward: each point on a scatterplot corresponds to one bivariate observation, plotted at its (X, Y) values.

The scatterplot gives us a visual means of seeing relationships between the two variables. We call a relationship positive if an increase in one variable corresponds to an increase in the other. When one variable increases and the other decreases, we call the relationship negative.
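As a rough numeric companion to reading direction off a scatterplot, the sketch below classifies a relationship as positive or negative from the sign of the sum of products of deviations from the means. This check, the function name `direction`, and the height and weight values are illustrative assumptions, not part of the original notes.

```python
def direction(xs, ys):
    """Classify the overall direction of a bivariate relationship."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the means:
    # positive when x and y tend to move together, negative when they move apart.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear trend"


heights = [150, 160, 170, 180, 190]  # hypothetical heights (cm)
weights = [52, 58, 69, 77, 85]       # hypothetical weights (kg)

print(direction(heights, weights))        # positive
print(direction(heights, weights[::-1]))  # negative
```

This only captures the overall linear direction; curved or other patterns still need to be judged from the plot itself.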

What can a scatterplot tell us? In general terms, it gives us an idea of what kind of relationships (or patterns) our bivariate data has. We may have

• Positive (negative) linear relationship
• Positive (negative) curved relationship
• Other relationships
• No relationship

Scatterplots can also give us visual evidence of outliers or suspicious observations (details in Weeks 12 and 13).

NOTE: Scatterplots are used only for quantitative variables (those that can be compared numerically). Examples of quantitative variables are height, weight, rates, and counts. Examples of qualitative variables (those that cannot be compared numerically) are color, type of car, and sex.

As with the other graphical methods we've discussed (e.g., histograms), there are numerical statistics that give a more precise description of bivariate relationships. The two major ones we'll discuss are correlation and linear regression.

• Correlation
• ## Correlation

To get a measure of how strongly X and Y values are related, we will use the correlation coefficient. Correlation is concerned with trends: if X increases, does Y tend to increase or decrease? How much? How strong is this tendency?
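The notes do not give a formula here, so the sketch below assumes the usual Pearson sample correlation coefficient, $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{xy}$, $S_{xx}$ and $S_{yy}$ are sums of products of deviations from the means. The function name `correlation` and the data values are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson sample correlation coefficient of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]  # hypothetical X values
ys = [2, 4, 5, 4, 5]  # hypothetical Y values

r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

Under this definition r always lies between -1 and 1; its sign gives the direction of the trend and its magnitude gives the strength of the linear tendency.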

• Least Squares Line
• ## Least Squares Line

Recall the equation of a line from algebra:

$$Y = \beta_0 + \beta_1 X$$

(You may have seen $Y = mX + b$; we are going to change notation slightly.) Above, $\beta_1$ is called the slope of the line and $\beta_0$ is the Y-intercept. The slope measures the amount by which Y changes when X increases by one unit. The Y-intercept is the value of Y when X = 0.

Our objective is to fit a straight line to points on a scatterplot that do not lie along a straight line (see the figure above). So we want to find $\beta_0$ and $\beta_1$ such that the line $Y = \beta_0 + \beta_1 X$ fits the data as well as possible. First, we need to define what we mean by a "best" fit. We want a line that is in some sense closest to all of the data points simultaneously. In statistics, we define a residual, $e_i$, as the signed vertical distance between a point and the line,

$$e_i = y_i - (\beta_0 + \beta_1 x_i)$$

(see the vertical line in the figure). Since residuals can be positive or negative, we square them to remove the sign. By adding up all of the squared residuals, we get a measure of how far away from the data our line is. Thus, the "best" line is the one with the minimum sum of squared residuals, i.e., $\min \sum_{i=1}^{n} e_i^2$. This method of finding a line is called least squares.
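To make the criterion concrete, here is a minimal sketch that computes the sum of squared residuals for two candidate lines on a small made-up data set. The function name `sum_squared_residuals`, the data, and the two candidate lines are assumptions chosen for illustration.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical X values
ys = [2, 4, 5, 4, 5]  # hypothetical Y values

# Two candidate lines, each given as (intercept, slope).
ssr_a = sum_squared_residuals(xs, ys, 2.2, 0.6)
ssr_b = sum_squared_residuals(xs, ys, 1.0, 1.0)

print(round(ssr_a, 4))  # 2.4
print(round(ssr_b, 4))  # 4.0
```

By the least squares criterion, the first line fits these points better than the second because its sum of squared residuals is smaller.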

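The formulas for the slope $\hat\beta_1$ and intercept $\hat\beta_0$ of the least squares line are

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means. Using algebra, we can express the slope as

$$\hat\beta_1 = r \, \frac{s_y}{s_x}$$

where $r$ is the correlation coefficient and $s_x$ and $s_y$ are the sample standard deviations of the x's and y's.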
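A small sketch of the fit, assuming the standard closed-form least squares solution: the slope is the sum of products of deviations divided by the sum of squared x-deviations, and the intercept is $\bar{y}$ minus the slope times $\bar{x}$. The function name `least_squares_line` and the data are illustrative, not taken from the notes.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least squares line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs = [1, 2, 3, 4, 5]  # hypothetical X values
ys = [2, 4, 5, 4, 5]  # hypothetical Y values

intercept, slope = least_squares_line(xs, ys)
print(round(intercept, 2), round(slope, 2))  # 2.2 0.6
```

For these points the fitted line is Y = 2.2 + 0.6 X, so each one-unit increase in X goes with an estimated increase of 0.6 in Y.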

• Coefficient of Determination
• ## Coefficient of Determination

A statistic that is widely used to determine how well a regression fits is the coefficient of determination (or squared multiple correlation coefficient), $R^2$. $R^2$ represents the fraction of variability in y that can be explained by the variability in x. In other words, $R^2$ measures how much of the variability in the y's can be explained by the fact that they are related to x, i.e., how close the points are to the line. The equation for $R^2$ is

$$R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\text{SSTotal}}$$

where $\sum e_i^2$ is the sum of squared residuals from the least squares line and $\text{SSTotal} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares of the data.

NOTE: In the simple linear regression case, $R^2$ is simply the square of the correlation coefficient, $R^2 = r^2$.
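The sketch below checks this relationship numerically on a small made-up data set. It assumes $R^2$ is computed as one minus the ratio of the sum of squared residuals to SSTotal, and that the correlation coefficient is the Pearson sample correlation; the function name `r_squared_and_r` is hypothetical.

```python
import math


def r_squared_and_r(xs, ys):
    """Return (R^2, r) for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_total = sum((y - y_bar) ** 2 for y in ys)  # SSTotal
    # Least squares slope and intercept.
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_resid / ss_total
    r = s_xy / math.sqrt(s_xx * ss_total)
    return r_squared, r


xs = [1, 2, 3, 4, 5]  # hypothetical X values
ys = [2, 4, 5, 4, 5]  # hypothetical Y values

r_squared, r = r_squared_and_r(xs, ys)
print(round(r_squared, 4), round(r ** 2, 4))  # 0.6 0.6
```

Here about 60% of the variability in the y's is accounted for by the fitted line, and the same number comes out of squaring the correlation coefficient.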

• Computer Lab
• # Computer Lab

Applicable StataQuest Commands:

Summaries > Means and SDs

Summaries > Means and SDs by group > One-way of means

Summaries > Median/Percentiles

Summaries > Tables > One-way (frequency): to create relative frequency tables used in histograms

Graphs > One variable > Histogram > Continuous variable

Graphs > One variable > Box plot

Graphs > One variable > Stem-and-leaf

Graphs > One variable by group > Histograms by group > Continuous variable: to compare data in one column with the group indicator in another column

Graphs > One variable by group > Box plot by group: to compare data in one column with the group indicator in another column

Graphs > Comparison of variables > Boxplot comparison: to compare data in multiple columns

• Concept Lab
• # Concept Lab

• Ch 4: How Are Populations Distributed?
• Ch 6: Bivariate Descriptive Statistics, Scatterplots I and II

The webmaster and author of this Math Help site is Graeme McRae.