
We saw last week that, using sampling distributions, we can get an interval of the real line that will include the transformed statistic any specified proportion of the time in repeated samples, for example, \[ P\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = .95. \]
In fact, such a probability inequality can be written down for each transformed statistic we studied last week. The cutoffs used above are the .025 and .975 points of the sampling distribution, so that the total probability is $.95 = .975 - .025$. It is traditional in statistics to denote this ``inclusion probability'' by $1-\alpha$, which in our example means that $\alpha = .05$, the .025 is $\alpha/2$, and the .975 is $1-\alpha/2$.
Thus if we wanted the inclusion probability to be .99 instead of .95 in the example, we would use $\alpha = .01$ and $\alpha/2 = .005$ ($1-\alpha/2 = .995$).
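As a quick check of how the cutoffs change with the inclusion probability, the following minimal sketch looks up the $\alpha/2$ and $1-\alpha/2$ points of the standard normal distribution. It assumes the standard normal is the relevant sampling distribution (as for the statistic with the 1.96 cutoffs); the function name `z_cutoffs` is ours, chosen for illustration.

```python
from statistics import NormalDist


def z_cutoffs(confidence):
    """Return the alpha/2 and 1 - alpha/2 points of the standard normal."""
    alpha = 1 - confidence
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.inv_cdf(alpha / 2), std_normal.inv_cdf(1 - alpha / 2)


for level in (0.95, 0.99):
    lower, upper = z_cutoffs(level)
    print(f"{level:.2f}: {lower:.3f} to {upper:.3f}")
# 0.95: -1.960 to 1.960
# 0.99: -2.576 to 2.576
```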
Once we have an inequality such as those above, it is a simple matter to solve it to get the parameter of interest by itself in the middle. This gives what is called a ``confidence interval'' for the parameter, that is, an interval of the real line that $100(1-\alpha)$ percent of the time will include the value of the parameter. What we've been calling the inclusion probability is called the confidence level of the confidence interval. Recall again that such a confidence interval will contain the value of the parameter (such as a population mean, for example) in $100(1-\alpha)$ percent of all samples from the population.
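To make the ``percent of all samples'' interpretation concrete, here is a small simulation sketch. It assumes the simplest case, a normal population with known standard deviation $\sigma$, where solving the inequality for $\mu$ gives the interval $\bar{X} \pm z\,\sigma/\sqrt{n}$. The function names and the population values used ($\mu = 40$, $\sigma = 6$, $n = 49$) are illustrative choices, not part of the concept lab.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = mean(sample)
    margin = z * sigma / len(sample) ** 0.5
    return xbar - margin, xbar + margin


def coverage(mu, sigma, n, reps, confidence=0.95, seed=0):
    """Fraction of repeated samples whose interval contains mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        hits += low <= mu <= high
    return hits / reps


if __name__ == "__main__":
    # The printed fraction should be close to the confidence level, 0.95.
    print(coverage(mu=40, sigma=6, n=49, reps=4000))
```

The interval changes from sample to sample while $\mu$ stays fixed; the confidence level describes how often the random interval captures it.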
Rather than go through all of the algebra of the inequalities, we have given at the end of this chapter a list of confidence intervals. This list is indexed by one of 11 `case numbers', which correspond to the dialog box in the `Calculating Confidence Intervals' concept lab. For examples of using confidence intervals, see that lab.
StataQuest makes it easy to get these intervals for a given data set.
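Here we will derive the sample size needed to obtain an interval of length $2B$, where $B$ indicates the largest possible distance between any $\mu$ in the confidence interval and the sample mean $\bar{X}$. Recall that \[ B = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \] where $z_{\alpha/2}$ is the standard normal value with upper-tail probability $\alpha/2$ (1.96 when $1-\alpha = .95$).
Solving for $n$ we obtain \[ n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2. \]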
Use of the formula above requires $\sigma$ to be known, which seldom happens in practice. When $\sigma$ is unknown, one either estimates it with the sample standard deviation from a previous study or uses 1/4 of the range (the difference between the largest and the smallest observations) as a rough guess.
EXAMPLE: A factory claims that the average working hours of its employees is 40. A random sample of 49 workers has average working hours of 42 with a standard deviation of 6. For future studies, if we want to construct a 99% confidence interval with total length less than two hours, how large a sample will we need? (Assume $\sigma = 6$.)
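The arithmetic for this example can be sketched as follows, assuming the standard sample-size formula $n = (z\,\sigma/B)^2$ with $\sigma = 6$ taken from the earlier study and $B = 1$ (half of the two-hour total length). The function name `sample_size_for_mean` is hypothetical, and the normal cutoff comes from Python's standard library rather than a printed table.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, half_width, confidence):
    """Smallest n for which z * sigma / sqrt(n) is at most half_width."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_exact = (z * sigma / half_width) ** 2
    return math.ceil(n_exact)  # always round up


# 99% interval, sigma = 6, total length 2 hours, so B = 1
print(sample_size_for_mean(sigma=6, half_width=1, confidence=0.99))  # 239
```

Here $z \approx 2.576$, so $n \approx (2.576 \times 6)^2 \approx 238.9$, which rounds up to 239 workers.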
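Here we will derive the sample size needed to obtain an interval of length $2B$, where $B$ indicates the largest possible distance between any $\pi$ in the confidence interval and the sample proportion $p$. Here $\pi$ denotes the population proportion. Recall that \[ B = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, \] where $z_{\alpha/2}$ is the standard normal value with upper-tail probability $\alpha/2$.
Solving for $n$ we obtain \[ n = \frac{z_{\alpha/2}^2\,p(1-p)}{B^2}. \]
There is a problem with this equation: the value of $p$ depends on $n$, since $p$ is computed from the very sample whose size we are trying to choose. However, for $p$ between 0 and 1, $p(1-p)$ has a maximum value of 0.25. If we plug this maximum into our equation, we get \[ n = \frac{z_{\alpha/2}^2}{4B^2}. \]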
We are guaranteed that if we take a sample of size n, our confidence interval will be no wider than 2B.
EXAMPLE: A produce supplier claims that 75% of his tomatoes will be ripe upon arrival at a distribution center. To test this claim, a random sample of tomatoes was selected from a shipment. Let $\pi$ denote the true proportion of ripe tomatoes in the particular shipment. What does $n$ have to be to get a 95% confidence interval of length no more than 0.1?
We would need n=385 to get a 95% confidence interval whose length is at most 0.1. Note again that we always round up.
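The figure above can be reproduced with a short sketch, assuming the conservative formula $n = z^2/(4B^2)$ obtained by replacing $p(1-p)$ with its maximum of 0.25, and $B = 0.05$ (half of the total length 0.1). The function name `sample_size_for_proportion` is ours, for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(half_width, confidence):
    """Conservative n for a proportion, using p(1 - p) <= 0.25."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_exact = z ** 2 * 0.25 / half_width ** 2
    return math.ceil(n_exact)  # always round up


# 95% interval of total length 0.1, so B = 0.05
print(sample_size_for_proportion(half_width=0.05, confidence=0.95))  # 385
```

With $z \approx 1.96$ this is $(1.96/0.1)^2 \approx 384.2$, which rounds up to 385.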
Applicable StataQuest Commands:
Data -> Generate/Replace -> Random numbers: to generate random Normals
Summaries -> Confidence intervals: to generate t confidence intervals for data you generated or data from a file
The webmaster and author of this Math Help site is Graeme McRae.