Sampling Distribution
   

   



Sampling Distributions

There are two basic questions asked by inferential statistics:

 

  1. How close is the value of a statistic to the corresponding parameter of the entire population? For example, if we have a sample of 30 elements from a population and we compute its sample mean $\bar{x}$, we would like to know how far this might be from the mean $\mu$ of the entire population.
  2. Is a hypothesized value of a population parameter, or a hypothesized relationship between the parameters of two or more populations, consistent with the data? For example, experience may show that the mean IQ of all people is 100, and someone may want to test whether a particular teaching method leads to a higher mean IQ. Similarly, someone may wonder whether the mean IQ of men (call it $\mu_1$) is the same as that of women (call it $\mu_2$).
Both of these inferential questions are answered using an idea called the sampling distribution of a transformed statistic which we study in this section.


 

Statistics and Random Samples

A random sample of size n from a population is a set of n elements from the population that are chosen in such a way that every set of n elements has the same probability of being chosen.

Computers are very good at selecting random samples, and we will use StataQuest throughout the course to choose samples.

A statistic is a number calculated from a sample. Examples include the sample mean $\bar{x}$, the sample variance $s^2$, and so on.
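The two definitions above can be tried out in a few lines of Python. This is only a sketch using Python's standard library rather than StataQuest, and the population below (1,000 made-up measurements) is an assumption invented for illustration.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up measurements (illustration only).
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.gauss(100, 15) for _ in range(1000)]

def draw_sample(population, n, rng):
    """Simple random sample without replacement.

    random.sample makes every set of n elements equally likely to be chosen,
    which matches the definition of a random sample of size n.
    """
    return rng.sample(population, n)

sample = draw_sample(population, 30, rng)

# Two statistics calculated from the sample.
x_bar = statistics.mean(sample)    # sample mean
s2 = statistics.variance(sample)   # sample variance (divides by n - 1)

print(f"sample mean = {x_bar:.2f}, sample variance = {s2:.2f}")
```

Rerunning with a different seed gives a different sample and therefore different values of the statistics, which is exactly the variability that a sampling distribution describes.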


 


Sampling Schemes and Inferences from Samples

Much of what we do in this course consists of

 

  1. Taking a random sample (or samples) from a population (or populations),
  2. Calculating a statistic (or statistics),
  3. Making inferences about parameters of the whole population from the corresponding statistics calculated from the samples.

In particular, we consider the following situations:


 

The Central Limit Theorem

One remarkable fact about the normal distribution is that if we took many samples of size n from a population having mean $\mu$ and variance $\sigma^2$ (any distribution we want), then the population of $\bar{x}$'s would be approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$. The larger n is, the better the approximation is.

These facts are known collectively as the Central Limit Theorem. They allow us to make inferences about population means using the normal distribution, whatever the distribution of the population being sampled, provided n is large enough. See the "Central Limit Theorem" concept lab for more about this.
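A small simulation can make the theorem concrete. The sketch below is not the concept lab itself. It assumes an exponential population with mean 1 and variance 1 (chosen only because it is clearly not normal), draws many samples of size n, and compares the mean and variance of the sample means with the predicted values, the population mean and the population variance divided by n.

```python
import random
from statistics import fmean, variance

def sample_means(n, num_samples, rng):
    """Return the means of num_samples random samples of size n.

    The population is exponential with mean 1 and variance 1, a strongly
    skewed (non-normal) choice made purely for illustration.
    """
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(fmean(sample))
    return means

rng = random.Random(2024)  # fixed seed so the run is repeatable
n = 40
means = sample_means(n, 2000, rng)

# The theorem predicts mean close to 1 and variance close to 1/n.
print(f"mean of sample means:     {fmean(means):.3f}  (predicted 1.000)")
print(f"variance of sample means: {variance(means):.4f} (predicted {1 / n:.4f})")
```

Increasing n should shrink the variance of the sample means and make a histogram of them look more like a normal curve.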


 

Normal Approximation to Binomial

A particularly useful example of the Central Limit Theorem arises when we are sampling from a 0-1 population. In this case, the number of 1's observed has the binomial distribution, which is cumbersome to calculate with directly. But notice that $\bar{x}$ for the sample is in fact the sample proportion p, and the Central Limit Theorem says that $\bar{x}$ is approximately normal with mean equal to the mean of the 0-1 population (also known as $\pi$, the proportion of 1's in the population) and variance $\pi(1-\pi)/n$. See the "Z, t, Chi-square, F" concept lab.
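To see how good the approximation is, one can compare an exact binomial probability with its normal approximation. The sketch below assumes a population proportion of 0.4 and a sample of 100 (both made-up values) and treats the sample proportion as normal with mean equal to the population proportion and variance equal to proportion times (1 minus proportion) over n. No continuity correction is applied, so a small gap between the two numbers is expected.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, pi):
    """Exact P(number of 1's <= k) in n draws from a 0-1 population
    whose proportion of 1's is pi."""
    return sum(comb(n, j) * pi**j * (1 - pi)**(n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, pi):
    """Approximate the same probability by treating the sample proportion
    as Normal(pi, pi * (1 - pi) / n), with no continuity correction."""
    p_dist = NormalDist(mu=pi, sigma=sqrt(pi * (1 - pi) / n))
    return p_dist.cdf(k / n)

# Hypothetical values chosen for illustration.
n, pi = 100, 0.4
exact = binomial_cdf(45, n, pi)
approx = normal_approx_cdf(45, n, pi)

print(f"exact binomial P(X <= 45) = {exact:.4f}")
print(f"normal approximation      = {approx:.4f}")
```

The approximation generally improves as n grows and is weakest when the population proportion is close to 0 or 1.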


 

The Transformed Statistics: Z, t, $\chi^2$, F

The basic idea of statistical inference is that we can determine (using what are called sampling distributions) the likely values of a number that measures how far a statistic is from the corresponding parameter. For example, we can measure how far the statistic $\bar{x}$ is from the parameter $\mu$ by calculating the number (called a "transformed statistic")

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

and noting that if $\bar{x}$ is close to $\mu$, then Z should be close to 0. Similarly, we can measure how close $s^2$ is to $\sigma^2$ by calculating

$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$

which should be close to n-1 if $s^2$ is close to $\sigma^2$ (we will see in a minute why we use the symbols Z and $\chi^2$ to represent the numbers).
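As a rough illustration, the two transformed statistics can be computed for a small sample. The sketch assumes the usual definitions, Z = (sample mean - population mean) / (sigma / sqrt(n)) and chi-square = (n - 1) * sample variance / sigma squared. The data and the hypothesized population values (mean 100, standard deviation 5) are made up.

```python
from math import sqrt
from statistics import mean, variance

def z_statistic(sample, mu, sigma):
    """Z = (sample mean - mu) / (sigma / sqrt(n)); near 0 when the
    sample mean is near mu."""
    n = len(sample)
    return (mean(sample) - mu) / (sigma / sqrt(n))

def chi_square_statistic(sample, sigma):
    """Chi-square = (n - 1) * sample variance / sigma^2; near n - 1 when
    the sample variance is near sigma^2."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma**2

# Made-up data and hypothesized population values (illustration only).
sample = [98, 105, 110, 92, 101, 99, 107, 95, 103, 100]
mu, sigma = 100, 5

print(f"Z          = {z_statistic(sample, mu, sigma):.3f}  (compare with 0)")
print(f"chi-square = {chi_square_statistic(sample, sigma):.3f} "
      f"(compare with n - 1 = {len(sample) - 1})")
```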

In the table below, we write down a number of transformed statistics and what they should be close to. You may wonder why we use these transformations rather than some simple measure of distance such as $|\bar{x} - \mu|$. The answer is that statisticians have learned over the past 100 years that the more complicated transformations listed in the table allow them to find the desired likely values while simple distance measures are much more difficult to work with.

 

 

[Table: transformed statistics and the values they should be close to]

 

Sampling Distributions of the Transformed Statistics

So what good are these transformed statistics? As we said, we know what they should be close to if our statistic is close to the true parameter. The miracle is that, if certain assumptions are met, statisticians have worked out mathematically the intervals of the real line into which a transformed statistic will fall with a specified probability.

For example, the first transformed statistic is labeled Z because statisticians have shown that if the population is normally distributed, then the transformed statistic has the Z distribution (the standard normal curve). Thus if we repeatedly selected random samples of size n and calculated Z for each one, then about 95% of the samples would have a Z between -1.96 and 1.96. (You will use the "Sampling Distribution" concept lab to experiment with this idea.)
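The repeated-sampling idea can also be sketched as a simulation. It is offered as an illustration, not as a replacement for the concept lab. The population mean 50, standard deviation 10, and sample size 20 are arbitrary assumptions, and the function name `z_coverage` is made up for this example.

```python
import random
from math import sqrt
from statistics import fmean

def z_coverage(mu, sigma, n, num_samples, rng):
    """Fraction of simulated samples whose Z falls between -1.96 and 1.96.

    Each sample has n values drawn from a normal population with mean mu
    and standard deviation sigma.
    """
    inside = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / sqrt(n))
        if -1.96 <= z <= 1.96:
            inside += 1
    return inside / num_samples

# Arbitrary illustrative values; the fraction printed should be near 0.95.
fraction = z_coverage(mu=50, sigma=10, n=20, num_samples=5000,
                      rng=random.Random(42))
print(f"fraction of samples with Z in [-1.96, 1.96]: {fraction:.3f}")
```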

So how close is $\bar{x}$ to $\mu$ in this situation? We saw earlier this week that 95% of the area under a Z curve falls between -1.96 and 1.96. This tells us that 95% of all samples will have

$$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96$$

that is, 95% of all samples will have $\bar{x} - \mu$ between $-1.96\,\sigma/\sqrt{n}$ and $1.96\,\sigma/\sqrt{n}$. For example, 95% of all samples of 25 IQ's (remember that IQ's are thought to be normally distributed with $\mu = 100$ and $\sigma = 15$) will have $\bar{x} - \mu$ in $(-5.88, 5.88)$, that is, 95% of all samples will have $\bar{x}$ within $1.96 \times 3 = 5.88$ of $\mu$.
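The same arithmetic can be wrapped in a short helper to show how the 95% margin shrinks as the sample size grows. This is a sketch under stated assumptions: the helper name `margin` is made up, and the standard deviation of 15 is the value usually quoted for IQ scores, assumed here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin(sigma, n, coverage=0.95):
    """Half-width of the interval around mu that contains the sample mean
    for the given fraction of samples: z * sigma / sqrt(n)."""
    # For 95% coverage the cutoff is the 97.5th percentile, about 1.96.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return z * sigma / sqrt(n)

# Assumed IQ standard deviation of 15; sample sizes chosen for illustration.
for n in (25, 100, 400):
    print(f"n = {n:3d}: sample mean within {margin(15, n):.2f} of mu "
          f"in 95% of samples")
```

Because the margin is proportional to 1/sqrt(n), quadrupling the sample size halves it.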


 


Computer Lab

Applicable StataQuest Commands:

Data -> Generate/Replace -> Random numbers (to generate random Normals, Binomials, etc.)

Data -> Generate/Replace -> Formula (to generate z-scores)

Calculator -> Statistical tables -> Normal (to find probabilities)

Calculator -> Inverse statistical tables -> Normal (to find z-scores)


 

Concept Lab

 
- Ch 8: Z, t, Chi-square, F -> Normal Curves
- Ch 5: Sampling From 0-1 Populations
- Ch 2: Random Sampling
- Ch 7: Central Limit Theorem


 

 

The webmaster and author of the Math Help site is Graeme McRae.