

Here's what you'll find in this section:

• # Percentiles

We will now look at some other measures of location and spread.

• The (100p)th percentile of a population, called η(p) (η is the Greek letter eta), is the number such that (100p)% of the population ≤ η(p) and 100(1 - p)% of the population ≥ η(p).
• The 95th percentile:
• 95% ≤ η(.95), 5%  ≥ η(.95).
• The 90th percentile:
• 90% ≤ η(.90), 10%  ≥ η(.90).
• The 75th percentile:
• 75% ≤ η(.75), 25%  ≥ η(.75).
• The 50th percentile:
• 50% ≤ η(.50), 50%  ≥ η(.50).
• The 25th percentile:
• 25% ≤ η(.25), 75%  ≥ η(.25).
• The 10th percentile:
• 10% ≤ η(.10), 90%  ≥ η(.10).
• Quartiles: Values which divide the (ordered) data into fourths.
• Q1 (Lower Quartile): The 25th percentile.
• Q2 (Median): The 50th percentile.
• Q3 (Upper Quartile): The 75th percentile.
• Calculating Sample Percentiles
1. Order the n data values from lowest to highest.
2. p=.50: Calculate the sample median.
3. p=.25 or .75:
• If n is even:
• Q1 = median of the lower half of the data.
• Q3 = median of the upper half of the data.
• If n is odd:
• Q1 = median of the lower "half" of the data (including $\tilde{x}$, the median).
• Q3 = median of the upper "half" of the data (including $\tilde{x}$).
4. p ≠ .25, .5, .75:
• Compute np and round up, call this number m.
• Use the mth point in order.
• Range: The maximum data value minus the minimum data value: $x_{\max} - x_{\min}$.
• Interquartile Range (IQR): The value Q3 - Q1.
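
The steps for calculating sample percentiles can be sketched in Python as follows. This is a minimal illustration, not a standard library routine: the function name `sample_percentile` is hypothetical, and the rounding to 9 decimals before rounding np up is an added guard against floating-point noise that is not part of the steps above. Other textbooks and software packages use different percentile conventions and may give slightly different values.

```python
import math
from statistics import median


def sample_percentile(data, p):
    """Return the sample (100p)th percentile using the steps listed above."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    xs = sorted(data)  # step 1: order the n values from lowest to highest
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")

    if p == 0.5:  # step 2: the sample median
        return median(xs)

    if p in (0.25, 0.75):  # step 3: quartiles as medians of the two halves
        half = n // 2
        if n % 2 == 0:
            lower, upper = xs[:half], xs[half:]
        else:
            # n odd: both "halves" include the median itself
            lower, upper = xs[:half + 1], xs[half:]
        return median(lower) if p == 0.25 else median(upper)

    # step 4: round np up to m and use the m-th ordered point (1-based).
    # round(..., 9) is an added safeguard against floating-point noise.
    m = max(1, math.ceil(round(n * p, 9)))
    return xs[m - 1]


data = [7, 1, 3, 9, 5, 11, 13, 15, 17, 19]
q1, q3 = sample_percentile(data, 0.25), sample_percentile(data, 0.75)
print(sample_percentile(data, 0.5))     # 10.0
print(q1, q3)                           # 5 15
print(sample_percentile(data, 0.9))     # 17
print(max(data) - min(data), q3 - q1)   # 18 10  (range and IQR)
```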

• # Boxplots

The five-number summary is an abbreviated way to describe a sample. It is a list of the following numbers:

1. Minimum
2. First (Lower) Quartile, Q1
3. Median, $\tilde{x}$
4. Third (Upper) Quartile, Q3
5. Maximum

The five-number summary leads to a graphical representation of a distribution called the boxplot. Boxplots are ideal for comparing two nearly-continuous variables. To draw a boxplot (see the example in the figure below), follow these simple steps:

1. The ends of the box (hinges) are at the quartiles, so that the length of the box is the IQR.
2. The median is marked by a line within the box.
3. The two vertical lines (called whiskers) outside the box extend to the smallest and largest observations within 1.5 × IQR of the quartiles.
4. Observations that fall outside of 3 × IQR of the quartiles are called extreme outliers and are marked, for example, with an open circle. Observations between 1.5 × IQR and 3 × IQR of the quartiles are called mild outliers and are distinguished by a different mark, e.g., a closed circle.
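
The whisker and outlier rules in the steps above can be sketched in Python. This is an illustrative sketch only: the function name `classify_for_boxplot` is hypothetical, and the multipliers 1.5 and 3 are the conventional fence values, taken here as an assumption and exposed as parameters. The quartiles are passed in rather than computed, so any quartile convention can be used.

```python
def classify_for_boxplot(data, q1, q3, inner=1.5, outer=3.0):
    """Find whisker ends and mild/extreme outliers for a boxplot.

    inner and outer are the fence multipliers (assumed 1.5 and 3.0).
    """
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - inner * iqr, q3 + inner * iqr
    outer_lo, outer_hi = q1 - outer * iqr, q3 + outer * iqr

    # Observations inside the inner fences determine the whisker ends.
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    # Mild outliers lie between the inner and outer fences.
    mild = [x for x in data
            if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi]
    # Extreme outliers lie beyond the outer fences.
    extreme = [x for x in data if x < outer_lo or x > outer_hi]

    whiskers = (min(inside), max(inside)) if inside else None
    return {"iqr": iqr, "whiskers": whiskers, "mild": mild, "extreme": extreme}


sample = [2, 4, 4, 5, 6, 7, 8, 9, 20, 40]
result = classify_for_boxplot(sample, q1=4, q3=9)
print(result["whiskers"])  # (2, 9)
print(result["mild"])      # [20]
print(result["extreme"])   # [40]
```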

EXAMPLE: To illustrate boxplots, the figure below puts boxplots side by side of the same four data sets that had histograms in the figure in Week 1.

• # Normal Quantile Plots

In many places during this course we will assume that a sample comes from a population having the normal (bell-shaped) distribution. A plot based on percentiles that seeks to verify this assumption is called the normal quantile plot. This is a scatterplot of the percentiles of the data versus the percentiles of a population in fact having the normal distribution. If the data do come from a normal population, the resulting points should fall closely along a straight line.

To illustrate this, the figure below shows the normal quantile plot of a random sample of 50 IQ scores (we said earlier that IQ scores do in fact follow a normal distribution). Notice how the points closely follow the line.

To better understand the information that normal quantile plots provide, and the relationship among distributions, histograms, boxplots, and normal quantile plots, we can look at the figure on the previous page. The four plots in the first row show the distributions from which the data are sampled. The second, third, and fourth rows are, respectively, the corresponding histograms, boxplots, and normal quantile plots. The four distributions are normal, long tailed, short tailed, and skewed, respectively.
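
The construction of a normal quantile plot, ordered data paired with the matching percentiles of a normal population, can be sketched in Python. The function name `normal_quantile_points` is hypothetical, and the plotting positions (i - 0.5)/n are one common convention chosen here as an assumption; statistical packages differ on this detail. The sketch uses `statistics.NormalDist`, which is available in Python 3.8 and later.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (normal percentile, data value) pairs for a normal quantile plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumed plotting position: the i-th ordered value is matched with the
    # 100(i - 0.5)/n th percentile of the standard normal distribution.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


points = normal_quantile_points([104, 96, 100, 111, 89])
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting these pairs as a scatterplot gives the normal quantile plot; points lying close to a straight line are consistent with a normal population.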

• # Chebychev and Empirical Rules

Knowing the mean and standard deviation of a sample or a population gives us a good idea of where most of the data values are because of the following two rules:

• Chebychev's Rule: The proportion of observations within k standard deviations of the mean, where $k > 1$, is at least $1 - \frac{1}{k^2}$, i.e., at least 75%, 89%, and 94% of the data are within 2, 3, and 4 standard deviations of the mean, respectively.
• Empirical Rule: If data follow a bell-shaped curve, then approximately 68%, 95%, and 99.7% of the data are within 1, 2, and 3 standard deviations of the mean, respectively.

EXAMPLE: A pharmaceutical company manufactures vitamin pills which contain an average of 507 grams of vitamin C with a standard deviation of 3 grams. Using Chebychev's rule, we know that at least

$$1 - \frac{1}{2^2} = 0.75$$

or 75% of the vitamin pills are within k=2 standard deviations of the mean. That is, at least 75% of the vitamin pills will have between 501 and 513 grams of vitamin C, i.e., a vitamin C content in the interval [507-2(3),507+2(3)]=[501,513].

EXAMPLE: If the distribution of vitamin C amounts in the previous example is bell shaped, then we can get even more precise results by using the empirical rule. Under these conditions, approximately 68% of the vitamin pills have a vitamin C content in the interval [507-3,507+3]=[504,510], 95% are in the interval [507-2(3),507+2(3)]=[501,513], and 99.7% are in the interval [507-3(3),507+3(3)]=[498,516].

NOTE: Chebychev's rule gives only a minimum proportion of observations which lie within k standard deviations of the mean.
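
As a quick check of the arithmetic in the two examples, the following Python sketch computes the Chebychev lower bound and the interval within k standard deviations of the mean. The function names `chebychev_lower_bound` and `k_sigma_interval` are hypothetical helpers introduced for illustration.

```python
def chebychev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the bound 1 - 1/k^2 is not positive, so it says nothing useful.
    return max(0.0, 1 - 1 / k ** 2)


def k_sigma_interval(mean, sd, k):
    """Interval [mean - k*sd, mean + k*sd]."""
    return (mean - k * sd, mean + k * sd)


# Approximate proportions from the empirical rule (bell-shaped data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

print(chebychev_lower_bound(2))     # 0.75
print(k_sigma_interval(507, 3, 2))  # (501, 513)
for k, share in EMPIRICAL_RULE.items():
    print(k, k_sigma_interval(507, 3, k), share)
```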

• # Z-Scores

Z-scores are a means of answering the question "how many standard deviations away from the mean is this observation?" If our observation X is from a population with mean $\mu$ and standard deviation $\sigma$, then

$$Z = \frac{X - \mu}{\sigma}$$

On the other hand, if the observation X is from a sample with mean $\bar{x}$ and standard deviation s, then

$$Z = \frac{X - \bar{x}}{s}$$

A positive (negative) Z-score indicates that the observation is greater than (less than) the mean.

EXAMPLE: In a certain city the mean price of a quart of milk is 63 cents and the standard deviation is 8 cents. The average price of a package of bacon is \$1.80 and the standard deviation is 15 cents. If we pay \$0.89 for a quart of milk and \$2.19 for a package of bacon at a 24-hour convenience store, which is relatively more expensive? To answer this, we compute Z-scores for each, working in cents:

$$Z_{\text{milk}} = \frac{89 - 63}{8} = 3.25, \qquad Z_{\text{bacon}} = \frac{219 - 180}{15} = 2.60$$

Our Z-scores show us that we are overpaying quite a bit more for the milk than we are for the bacon.
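
The milk and bacon comparison can be reproduced with a short Python sketch. The helper name `z_score` is hypothetical, and all prices are expressed in cents so that each observation, mean, and standard deviation share the same unit.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


z_milk = z_score(89, mean=63, sd=8)       # prices in cents
z_bacon = z_score(219, mean=180, sd=15)
print(z_milk)   # 3.25
print(z_bacon)  # 2.6
print("milk" if z_milk > z_bacon else "bacon", "is relatively more expensive")
```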

Because of the Empirical rule (or Chebychev's rule), the Z-score of a given observation also provides insight into how "typical" this observation is of the population. For example, by the Empirical rule, if data follow a bell-shaped curve, then approximately 95% of the data should have a Z-score between -2 and 2.


The webmaster and author of this Math Help site is Graeme McRae.