Here's what you'll find in this section:
There are two basic questions asked by inferential statistics:
A random sample of size n from a population is a set of n elements from the population that are chosen in such a way that every set of n elements has the same probability of being chosen.
Computers are very good at selecting random samples, and we will use StataQuest throughout the course to choose samples.
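The course software handles sampling for you, but the idea can be sketched in a few lines of Python. This is only an illustration: the population of ID numbers 1 to 1000 and the seed are made up, and `random.sample` is used because it draws n distinct elements with every subset of size n equally likely.

```python
import random

# Hypothetical population: ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the run is repeatable

n = 10
# Draw n distinct elements; every set of n elements is equally likely.
sample = rng.sample(population, n)
print(sample)
```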
A statistic is a number calculated from a sample. Examples include the sample mean $\bar{x}$, the sample variance $s^2$, and so on.
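As a small sketch of what "calculated from a sample" means, the following Python computes the sample mean and the sample variance (with the usual n-1 divisor) for a made-up sample of five numbers. The data are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import statistics

data = [4, 8, 6, 5, 7]  # made-up sample

x_bar = statistics.mean(data)    # sample mean
s2 = statistics.variance(data)   # sample variance, divisor n - 1

# Deviations from the mean are -2, 2, 0, -1, 1; their squares sum to 10,
# and 10 / (5 - 1) = 2.5.
print(x_bar, s2)
```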
Much of what we do in this course consists of
In particular, we consider the following situations:
One remarkable fact about the normal distribution is that if we took many samples of size n from a population having mean $\mu$ and variance $\sigma^2$ (any distribution we want), then the population of $\bar{x}$'s would be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. The larger n is, the better the approximation is.
These facts are known collectively as the Central Limit Theorem. They allow us to make inferences about population means using the normal distribution no matter what the distribution of the sampled population is, provided the sample size is large enough. See the ``Central Limit Theorem'' concept lab for more about this.
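A quick simulation can make the Central Limit Theorem concrete. The sketch below assumes an exponential population with rate 1 (so its mean and variance are both 1, and it is clearly not normal); the sample size, number of samples, and seed are arbitrary choices for illustration. It checks that the sample means have mean close to the population mean and variance close to the population variance divided by n.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for a repeatable run

n = 30               # size of each sample
num_samples = 5000   # how many samples we draw

# Population: exponential with rate 1, so mean = 1 and variance = 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f}  (theory: 1)")
print(f"variance of sample means: {var_of_means:.4f}  (theory: {1 / n:.4f})")
```

Plotting a histogram of `sample_means` would also show the bell shape, but that step is left out to keep the sketch dependency-free.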
A particularly useful example of the Central Limit Theorem is when we are sampling from a 0-1 population. In this case, the number of 1's observed has the binomial distribution, which is difficult to make calculations from. But notice that $\bar{x}$ for the sample is in fact the sample proportion $p$, and the Central Limit Theorem says that $p$ is approximately normal with mean equal to the mean of the 0-1 population (also known as $\pi$, the proportion of 1's in the population) and variance $\pi(1-\pi)/n$. See the ``Z, t, Chi-square, F'' concept lab.
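To see how well the normal approximation stands in for the binomial, here is a rough Python comparison. The population proportion 0.3, the sample size 100, and the cutoff 35 are hypothetical values, and the approximation assumes the sample proportion has mean equal to the population proportion and variance equal to that proportion times its complement, divided by n. No continuity correction is applied.

```python
import math
from statistics import NormalDist

pi_pop = 0.3   # hypothetical proportion of 1's in the population
n = 100        # sample size
k = 35         # we ask for P(number of 1's <= 35), i.e. P(p <= 0.35)

# Exact answer: add up binomial probabilities for 0, 1, ..., k ones.
exact = sum(
    math.comb(n, j) * pi_pop**j * (1 - pi_pop) ** (n - j)
    for j in range(k + 1)
)

# Normal approximation for the sample proportion p.
sd_p = math.sqrt(pi_pop * (1 - pi_pop) / n)
approx = NormalDist(mu=pi_pop, sigma=sd_p).cdf(k / n)

print(f"exact binomial:       {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The two numbers should agree to within a few hundredths; the gap shrinks as n grows.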
The basic idea of statistical inference is that we can determine (using what are called sampling distributions) the likely values of a number that measures how far a statistic is from the corresponding parameter. For example, we can measure how far the statistic $\bar{x}$ is from the parameter $\mu$ by calculating the number (called a ``transformed statistic'')
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
and noting that if $\bar{x}$ is close to $\mu$, then Z should be close to 0. Similarly, we can measure how close $s^2$ is to $\sigma^2$ by calculating
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
which should be close to n-1 if $s^2$ is close to $\sigma^2$ (we will see in a minute why we use the symbols Z and $\chi^2$ to represent the numbers).
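The two transformed statistics can be sketched as small Python functions. This assumes the usual forms, Z = (sample mean - population mean) / (sigma / sqrt(n)) and chi-square = (n - 1) s^2 / sigma^2. The function names, the sample, and the parameter values are hypothetical.

```python
import math
import statistics

def z_statistic(sample, mu, sigma):
    """Z = (xbar - mu) / (sigma / sqrt(n)); near 0 when xbar is near mu."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    return (xbar - mu) / (sigma / math.sqrt(n))

def chi_square_statistic(sample, sigma):
    """chi^2 = (n - 1) s^2 / sigma^2; near n - 1 when s^2 is near sigma^2."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s2 / sigma**2

data = [4, 8, 6, 5, 7]  # made-up sample: xbar = 6, s^2 = 2.5
z = z_statistic(data, mu=5, sigma=2)
chi2 = chi_square_statistic(data, sigma=2)
print(f"Z = {z:.3f}, chi-square = {chi2:.3f}")
```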
In the table below, we write down a number of transformed statistics and what they should be close to. You may wonder why we use these transformations rather than some simple measure of distance such as $|\bar{x} - \mu|$. The answer is that statisticians have learned over the past 100 years that the more complicated transformations listed in the table allow them to find the desired likely values while simple distance measures are much more difficult to work with.
So what good are these transformed statistics? As we said, we know what they should be close to if our statistic is close to the true parameter. The miracle is that (if certain assumptions are met) statisticians have determined mathematically intervals of the real line that a transformed statistic will fall into with specified probability.
For example, the first transformed statistic is labeled Z because statisticians have shown that if the population is normally distributed, then the transformed statistic has the Z distribution (the standard normal curve). Thus if we repeatedly selected random samples of size n and calculated Z for each one, then about 95% of the samples would have a Z between -1.96 and 1.96. (You will use the ``Sampling Distribution'' concept lab to experiment with this idea.)
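The repeated-sampling claim can be tried out directly. The sketch below assumes a normal population with mean 100 and standard deviation 15 and samples of size 25; these values, the number of samples, and the seed are illustrative assumptions. It counts how often Z lands between -1.96 and 1.96.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed for a repeatable run

mu, sigma = 100.0, 15.0   # assumed normal population
n = 25                    # sample size
num_samples = 10_000      # number of repeated samples

inside = 0
for _ in range(num_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))
    if -1.96 <= z <= 1.96:
        inside += 1

fraction_inside = inside / num_samples
print(f"fraction of samples with -1.96 <= Z <= 1.96: {fraction_inside:.3f}")
```

The printed fraction should be close to 0.95, though it will vary a little with the seed.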
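Thus, how close is $\bar{x}$ to $\mu$ in this situation? We saw earlier this week that 95% of the area under a Z curve falls between -1.96 and 1.96. This tells us that 95% of all samples will have
$$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96,$$
that is, 95% of all samples will have $\bar{x}$ within $1.96\,\sigma/\sqrt{n}$ of $\mu$. For example, 95% of all samples of 25 IQ's (remember that IQ's are thought to be normally distributed with $\mu = 100$ and $\sigma = 15$) will have $\bar{x}$ in $100 \pm 1.96(15/\sqrt{25})$, that is, 95% of all samples will have $\bar{x}$ within 1.96 standard errors of $\mu$. Since each standard error is $15/\sqrt{25} = 3$, this is about 5.88.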
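The arithmetic for the IQ example can be checked with a few lines of Python. The values 100 for the mean and 15 for the standard deviation are the conventional IQ figures and are assumed here. With samples of 25, the standard error is 3 and the 95% half-width is 1.96 times that.

```python
import math

mu, sigma, n = 100, 15, 25   # assumed IQ mean and sd; 25 IQ's per sample

standard_error = sigma / math.sqrt(n)   # 15 / 5
margin = 1.96 * standard_error          # half-width of the 95% range

low, high = mu - margin, mu + margin
print(f"standard error: {standard_error}")
print(f"95% of sample means fall in [{low:.2f}, {high:.2f}]")
```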
Applicable StataQuest Commands:
Data Generate/Replace Random numbers to generate random Normals, Binomials, etc.
Data Generate/Replace Formula to generate z-scores
Calculator Statistical tables Normal to find probabilities
Calculator Inverse statistical tables Normal to find z-scores
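For readers working outside StataQuest, the two normal-table lookups can be approximated with Python's standard library. This is a sketch and not part of the course software: `NormalDist().cdf` plays the role of the statistical table and `NormalDist().inv_cdf` the role of the inverse table.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Table lookup: probability that Z falls between -1.96 and 1.96.
prob_within = z.cdf(1.96) - z.cdf(-1.96)

# Inverse lookup: the z-score with 97.5% of the area to its left.
z_975 = z.inv_cdf(0.975)

print(f"P(-1.96 <= Z <= 1.96) = {prob_within:.4f}")
print(f"z with area 0.975 to the left = {z_975:.4f}")
```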
The webmaster and author of this Math Help site is Graeme McRae.