We will now look at some other measures of location and spread.
The five-number summary is an abbreviated way to describe a sample. It is a list of the following numbers: the minimum, the first quartile (25th percentile), the median, the third quartile (75th percentile), and the maximum.
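As a rough sketch, the five-number summary (minimum, first quartile, median, third quartile, maximum) can be computed with Python's standard library. The function name `five_number_summary` and the sample data are made up for illustration, and the "inclusive" quartile method is an assumption: textbooks and software use several quartile conventions that can give slightly different values for the quartiles.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    ordered = sorted(data)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # method="inclusive" is an assumed convention; other quartile rules
    # can give slightly different first and third quartiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


sample = [1, 3, 5, 7, 9, 11, 13]  # made-up data for illustration
print(five_number_summary(sample))
```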
The five-number summary leads to a graphical representation of a distribution called the boxplot. Boxplots are ideal for comparing two nearly-continuous variables. To draw a boxplot (see the example in the figure below), follow these simple steps:
EXAMPLE: To illustrate boxplots, the figure below shows side-by-side boxplots of the same four data sets whose histograms appeared in the figure in Week 1.
In many places during this course we will assume that a sample comes from a population having the normal (bell-shaped) distribution. A plot based on percentiles that is used to check this assumption is called the normal quantile plot. This is a scatterplot of the percentiles of the data versus the corresponding percentiles of a population that actually has the normal distribution. If the data do come from a normal population, the resulting points should fall close to a straight line.
To illustrate this, the figure below shows the normal quantile plot of a random sample of 50 IQs (we said earlier that IQs do in fact follow a normal distribution). Notice how closely the points follow the line.
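The points of a normal quantile plot can be sketched in a few lines of Python. This is only an illustration: the function name `normal_quantile_points` and the data are hypothetical, and the plotting position (i - 0.5)/n used to pick the percentiles is one common convention among several, not necessarily the one used for the figure.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    points = []
    for i, value in enumerate(ordered, start=1):
        # Assumed plotting position: the i-th smallest value is matched
        # with the (i - 0.5)/n quantile of the standard normal distribution.
        p = (i - 0.5) / n
        points.append((standard_normal.inv_cdf(p), value))
    return points


sample = [98, 112, 87, 105, 101]  # made-up values for illustration
for theoretical, observed in normal_quantile_points(sample):
    print(f"{theoretical:6.2f}  {observed}")
```

Plotting the second coordinate against the first gives the scatterplot; for data from a normal population the points should lie near a straight line.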
To better understand the information that normal quantile plots provide and the relationship among distributions, histograms, boxplots and normal quantile plots, we can look at the figure on the previous page. The four plots in the first row show the distributions from which the data are sampled. The second, third and fourth rows are, respectively, the corresponding histograms, boxplots and normal quantile plots. The four distributions are normal, long tailed, short tailed and skewed, respectively.
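Knowing the mean and standard deviation of a sample or a population gives us a good idea of where most of the data values are because of the following two rules:
Chebychev's rule: For any data set and any k > 1, at least $1 - 1/k^2$ of the observations lie within k standard deviations of the mean.
Empirical rule: If the distribution is bell shaped, then approximately 68% of the observations lie within 1 standard deviation of the mean, approximately 95% lie within 2 standard deviations, and approximately 99.7% lie within 3 standard deviations.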
EXAMPLE: A pharmaceutical company manufactures vitamin pills which contain an average of 507 grams of vitamin C with a standard deviation of 3 grams. Using Chebychev's rule, we know that at least
$$1 - \frac{1}{2^2} = \frac{3}{4}$$
or 75% of the vitamin pills are within k=2 standard deviations of the mean. That is, at least 75% of the vitamin pills will have between 501 and 513 grams of vitamin C, i.e.,
$$[507 - 2(3),\ 507 + 2(3)] = [501,\ 513].$$
EXAMPLE: If the distribution of vitamin C amounts in the previous example is bell shaped, then we can get even more precise results by using the empirical rule. Under these conditions, approximately 68% of the vitamin pills have a vitamin C content in the interval [507-3,507+3]=[504,510], 95% are in the interval [507-2(3),507+2(3)]=[501,513], and 99.7% are in the interval [507-3(3),507+3(3)]=[498,516].
NOTE: Chebychev's rule gives only a minimum proportion of observations which lie within k standard deviations of the mean.
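The arithmetic in the two vitamin C examples can be checked with a short Python sketch. The helper names below are hypothetical. The Chebychev bound is 1 - 1/k^2, and the proportions for a bell-shaped distribution are computed here from an exact normal distribution, which is an assumption slightly stronger than "bell shaped".

```python
from statistics import NormalDist


def chebychev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's rule is informative only for k > 1")
    return 1 - 1 / k**2


def interval_around_mean(mean, sd, k):
    """Interval [mean - k*sd, mean + k*sd] as a (low, high) tuple."""
    return mean - k * sd, mean + k * sd


def normal_proportion_within(k):
    """Proportion of an exactly normal population within k standard deviations."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


mean, sd = 507, 3  # values from the vitamin C example
for k in (1, 2, 3):
    low, high = interval_around_mean(mean, sd, k)
    normal_share = normal_proportion_within(k)
    if k > 1:
        bound = f"at least {chebychev_lower_bound(k):.1%}"
    else:
        bound = "no useful bound"
    print(f"k={k}: [{low}, {high}]  Chebychev: {bound}  normal: about {normal_share:.1%}")
```

Comparing the two columns shows how much weaker the guaranteed minimum is than the proportion observed for bell-shaped data.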
Z-scores are a means of answering the question "how many standard deviations away from the mean is this observation?" If our observation X is from a population with mean $\mu$ and standard deviation $\sigma$, then
$$Z = \frac{X - \mu}{\sigma}.$$
On the other hand, if the observation X is from a sample with mean $\bar{x}$ and standard deviation s, then
$$Z = \frac{X - \bar{x}}{s}.$$
A positive (negative) Z-score indicates that the observation is greater than (less than) the mean.
EXAMPLE: In a certain city the mean price of a quart of milk is 63 cents and the standard deviation is 8 cents. The average price of a package of bacon is $1.80 and the standard deviation is 15 cents. If we pay $0.89 for a quart of milk and $2.19 for a package of bacon at a 24-hour convenience store, which is relatively more expensive? To answer this, we compute Z-scores for each:
$$Z_{\text{milk}} = \frac{0.89 - 0.63}{0.08} = 3.25, \qquad Z_{\text{bacon}} = \frac{2.19 - 1.80}{0.15} = 2.60.$$
Our Z-scores show us that we are overpaying quite a bit more for the milk than we are for the bacon.
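The milk and bacon comparison can be reproduced with a minimal Python sketch. The function name `z_score` is hypothetical, and prices are entered in cents (an assumption made only to keep the arithmetic in whole numbers).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Prices in cents, taken from the example above.
milk_z = z_score(89, 63, 8)
bacon_z = z_score(219, 180, 15)

print(f"milk:  Z = {milk_z:.2f}")
print(f"bacon: Z = {bacon_z:.2f}")
print("relatively more expensive:", "milk" if milk_z > bacon_z else "bacon")
```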
Because of the empirical rule (or Chebychev's rule), the Z-score of a given observation also provides insight into how "typical" this observation is of the population. For example, by the empirical rule, if the data follow a bell-shaped curve, then approximately 95% of the data should have a Z-score between -2 and 2.
The webmaster and author of this Math Help site is Graeme McRae.