There are two basic questions asked by inferential statistics:
- How close is the value of a statistic to the corresponding
parameter of the entire population? For example, if we have a sample
of 30 elements from a population and we find a particular value of the
sample mean $\bar{x}$, we would like to know how far this might be from the mean
$\mu$ of the entire population.
- In many cases, someone has hypothesized a particular value for the
parameter of a population or some relationship between the
parameters of two or more populations. For example, experience may
show that the mean IQ of all people is 100 and someone may want to
test whether a particular teaching method leads to higher mean IQ.
Similarly, someone may wonder if the mean IQ of men (call it
$\mu_M$) is the same as that for women (call it
$\mu_W$).
Both of these inferential questions are answered using an idea called
the sampling distribution of a transformed statistic which we
study in this section.

Statistics and Random Samples

A random sample of size n from a population is a
set of n elements from the population that are chosen in such
a way that every set of n elements has the same probability
of being chosen.
Computers are very good at selecting random samples, and we will
use StataQuest throughout the course to choose samples.
A statistic is a number calculated from a sample. Examples
include the sample mean $\bar{x}$, the sample variance $s^2$, and so on.
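As a rough illustration of these two definitions, here is a minimal Python sketch (the course itself uses StataQuest, so this is only an analogue). The population of 1,000 made-up values, the seed, and the sample size of 30 are all assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up measurements (an assumption for illustration).
rng = random.Random(0)
population = [rng.gauss(100, 15) for _ in range(1000)]

n = 30
# random.sample picks n distinct elements so that every set of n elements
# has the same probability of being chosen.
sample = rng.sample(population, n)

x_bar = statistics.mean(sample)          # sample mean
s_squared = statistics.variance(sample)  # sample variance (divides by n - 1)

print(f"sample mean = {x_bar:.2f}, sample variance = {s_squared:.2f}")
```

Running the sketch again with a different seed gives a different sample and therefore different values of the two statistics, which is exactly why we need to study how statistics vary from sample to sample.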

Sampling Schemes and Inferences from Samples

Much of what we do in this course consists of
- Taking a random sample (or samples) from a population (or
populations),
- Calculating a statistic (or statistics),
- Making inferences about parameters of the whole population
from the corresponding statistics calculated from the samples.
In particular, we consider the following situations:

The Central Limit Theorem

One remarkable fact about the normal distribution is
that if we took many samples of size n from a population
having mean $\mu$ and variance $\sigma^2$
(any distribution we want), then the population of
$\bar{x}$'s would be approximately normally distributed with mean
$\mu$ and variance $\sigma^2/n$. The larger n is, the better the approximation is.
These facts are known collectively as the Central Limit Theorem
and allow us to make inferences about population means using the
normal distribution no matter what the distribution of the
population being sampled from. See the ``Central Limit Theorem''
concept lab for more about this.
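The theorem can be checked informally by simulation. The sketch below is only an illustration, not the concept lab itself. It assumes a strongly skewed population (exponential with mean 1 and variance 1), a sample size of 40, and 5,000 repeated samples, then compares the mean and variance of the sample means with the values the theorem predicts.

```python
import random
import statistics

rng = random.Random(1)

def sample_mean(n):
    # Exponential population with rate 1: mean 1, variance 1, strongly skewed.
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

n = 40              # size of each sample (assumed)
num_samples = 5000  # number of repeated samples (assumed)
means = [sample_mean(n) for _ in range(num_samples)]

mean_of_means = statistics.mean(means)
var_of_means = statistics.pvariance(means)

print(f"mean of sample means     = {mean_of_means:.4f} (theory: 1)")
print(f"variance of sample means = {var_of_means:.4f} (theory: {1 / n:.4f})")
```

A histogram of `means` would look close to a normal curve even though the population itself is far from normal.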

Normal Approximation to Binomial

A particularly useful example of the Central Limit Theorem is
when we are sampling from a 0-1 population. In this case, the number
of 1's observed has the binomial distribution, which is difficult to
make calculations from. But notice that $\bar{x}$
for the sample is in fact the sample proportion p, and the
Central Limit Theorem says that p
is approximately normal with mean equal to the mean of the 0-1
population (also known as $\pi$, the proportion of 1's in the population) and variance
$\pi(1-\pi)/n$. See the ``Z, t, Chi-square, F'' concept lab.
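To see how good the approximation is, we can compare an exact binomial probability with the normal one. The sketch below assumes a sample of 100 from a 0-1 population in which 30% of the elements are 1's, and asks for the probability that the sample proportion is at most 0.35. These numbers and the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

n = 100        # sample size (assumed for illustration)
pi_pop = 0.3   # proportion of 1's in the 0-1 population (assumed)
k = 35         # we ask for P(number of 1's <= k), i.e. P(p <= k/n)

# Exact answer: add up the binomial probabilities for 0, 1, ..., k ones.
exact = sum(
    math.comb(n, j) * pi_pop**j * (1 - pi_pop) ** (n - j)
    for j in range(k + 1)
)

# Normal approximation: p has mean pi and variance pi(1 - pi)/n.
sampling_dist = NormalDist(mu=pi_pop, sigma=math.sqrt(pi_pop * (1 - pi_pop) / n))
approx = sampling_dist.cdf(k / n)

print(f"exact binomial probability = {exact:.4f}")
print(f"normal approximation       = {approx:.4f}")
```

The two answers agree to within a few hundredths here. The agreement generally improves as n grows.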

The Transformed Statistics: Z, t, $\chi^2$, F

The basic idea of statistical inference is that we can determine
(using what is called sampling distributions) the likely values of a
number that measures how far a statistic is from the corresponding
parameter. For example, we can measure how far the statistic $\bar{x}$
is from the parameter $\mu$
by calculating the number (called a ``transformed statistic'')
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
and noting that if $\bar{x}$ is close to $\mu$,
then Z should be close to 0. Similarly, we can measure how
close $s^2$ is to $\sigma^2$ by calculating
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
which should be close to n-1 if $s^2$ is close to $\sigma^2$
(we will see in a minute why we use the symbols Z and $\chi^2$
to represent the numbers).
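As a small numerical sketch of these two transformed statistics, the code below computes Z as (sample mean - mu) / (sigma / sqrt(n)) and chi-square as (n - 1) * s^2 / sigma^2 for one simulated sample. The population values (mean 100, standard deviation 15), the sample size of 25, and the seed are assumptions made for the example.

```python
import math
import random
import statistics

mu, sigma = 100.0, 15.0   # population parameters, assumed known for illustration
n = 25                    # sample size (assumed)

rng = random.Random(2)
sample = [rng.gauss(mu, sigma) for _ in range(n)]

x_bar = statistics.mean(sample)          # sample mean
s_squared = statistics.variance(sample)  # sample variance (divides by n - 1)

# Transformed statistics: Z should be near 0, chi-square near n - 1.
z = (x_bar - mu) / (sigma / math.sqrt(n))
chi_square = (n - 1) * s_squared / sigma**2

print(f"Z = {z:.3f} (should be near 0)")
print(f"chi-square = {chi_square:.2f} (should be near {n - 1})")
```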
In the table below, we write down a number of transformed
statistics and what they should be close to. You may wonder why we
use these transformations rather than some simple measure of
distance such as $\bar{x} - \mu$.
The answer is that statisticians have learned over the past 100
years that the more complicated transformations listed in the table
allow them to find the desired likely values while simple distance
measures are much more difficult to work with.


Sampling Distributions of the Transformed Statistics

So what good are these transformed statistics? As we said, we
know what they should be close to if our statistic is close to the
true parameter. The miracle is that (if certain assumptions are met)
statisticians have determined mathematically intervals of the real
line that a transformed statistic will fall into with specified
probability.
For example, the first transformed statistic is labeled Z
because statisticians have shown that if the population is normally
distributed, then the transformed statistic has the Z
distribution (the standard normal curve). Thus if we repeatedly
selected random samples of size n and calculated Z
for each one, then about 95% of the samples would have a Z
between -1.96 and 1.96. (You will use the ``Sampling Distribution''
concept lab to experiment with this idea.)
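The 95% claim can be tried out with a short simulation. This is a sketch only, not the concept lab. It assumes a normal population with mean 100 and standard deviation 15, samples of size 25, and 10,000 repetitions, and it counts how often Z lands between -1.96 and 1.96.

```python
import math
import random

mu, sigma, n = 100.0, 15.0, 25   # assumed population parameters and sample size
num_samples = 10000              # number of repeated samples (assumed)
rng = random.Random(3)

inside = 0
for _ in range(num_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    if -1.96 <= z <= 1.96:
        inside += 1

fraction_inside = inside / num_samples
print(f"fraction of samples with Z in [-1.96, 1.96]: {fraction_inside:.3f}")
```

The printed fraction should come out close to 0.95, with small differences from run to run because the samples are random.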
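Thus, how close is $\bar{x}$ to $\mu$
in this situation? We saw earlier this week that 95% of the area
under a Z curve falls between -1.96 and 1.96. This tells us
that 95% of all samples will have
$$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96,$$
that is, 95% of all samples will have $\bar{x}$
within $1.96\,\sigma/\sqrt{n}$ of $\mu$.
For example, 95% of all samples of 25 IQ's (remember that IQ's are
thought to be normally distributed with $\mu = 100$
and $\sigma = 15$) will have $\bar{x}$
in $100 \pm 1.96\,(15/\sqrt{25})$,
that is, 95% of all samples will have $\bar{x}$
within $1.96 \times 3 = 5.88$ of $\mu$.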
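The arithmetic in the IQ example can be reproduced in a few lines. The sketch below assumes the values mean 100, standard deviation 15, and sample size 25, and uses Python's `statistics.NormalDist` to look up the 1.96 cutoff rather than taking it from a table.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100.0, 15.0, 25   # IQ example: assumed population values and sample size

# The middle 95% of the standard normal curve ends at the 97.5th percentile.
z_star = NormalDist().inv_cdf(0.975)    # about 1.96
standard_error = sigma / math.sqrt(n)   # 15 / 5 = 3
margin = z_star * standard_error        # about 5.88

lower, upper = mu - margin, mu + margin
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f}")
# prints: 95% of sample means fall between 94.12 and 105.88
```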

Computer Lab

Applicable StataQuest Commands:
- Data > Generate/Replace > Random numbers: to generate random Normals, Binomials, etc.
- Data > Generate/Replace > Formula: to generate z-scores
- Calculator > Statistical tables > Normal: to find probabilities
- Calculator > Inverse statistical tables > Normal: to find z-scores

Concept Lab

- Ch 8: Z, t, Chi-square, F
- Normal Curves
- Ch 5: Sampling From 0-1 Populations
- Ch 2: Random Sampling
- Ch 7: Central Limit Theorem