We saw last week that using sampling distributions we can get an
interval of the real line that will include the transformed statistic
any specified proportion of the time in repeated samples, for example,
$$P\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = .95,$$
or
In fact, such a probability inequality can be written down for each
transformed statistic we studied last week. We have used .025 and .975
in the two examples above so that the total probability would be
.95 = .975 - .025. It is traditional in statistics to denote this
"inclusion probability" by $1-\alpha$, which in our example would mean that
$\alpha = .05$; the .025 would be $\alpha/2$ and the .975 would be
$1-\alpha/2$.
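Thus if we wanted the inclusion probability to be .99 instead of .95
in the first example, we would use $\alpha/2 = .005$ and
$1-\alpha/2 = .995$ (so $\alpha = .01$, and the cutoffs $\pm 1.96$
become $\pm 2.576$).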
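A small sketch of how the cutoffs follow from the inclusion probability, assuming the standard normal case of the first example. It uses Python's standard-library `statistics.NormalDist` for the quantiles; the function name `z_cutoffs` is our own, hypothetical helper.

```python
from statistics import NormalDist


def z_cutoffs(conf_level):
    """Return (z_{alpha/2}, z_{1-alpha/2}) for inclusion probability 1 - alpha."""
    alpha = 1.0 - conf_level
    std_normal = NormalDist()  # mean 0, standard deviation 1
    lower = std_normal.inv_cdf(alpha / 2)      # e.g. the .025 quantile
    upper = std_normal.inv_cdf(1 - alpha / 2)  # e.g. the .975 quantile
    return lower, upper


for level in (0.95, 0.99):
    lo, hi = z_cutoffs(level)
    print(f"{level:.2f}: ({lo:.3f}, {hi:.3f})")
# prints:
# 0.95: (-1.960, 1.960)
# 0.99: (-2.576, 2.576)
```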
Once we have an inequality such as the two above, it is a simple
matter to solve it so that the parameter of interest stands by
itself in the middle. This gives what is called a "confidence
interval" for the parameter, that is, an interval of the real line that
will include the value of the parameter $100(1-\alpha)$ percent of the
time. What we've been calling the inclusion probability is called the
confidence level of the confidence interval. Recall again that such a
confidence interval will contain the value of the parameter (such as a
population mean, for example) in $100(1-\alpha)$ percent of all samples
from the population.
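As an illustration, here is the algebra for the first example, treating $\sigma$ as known: multiply through by $\sigma/\sqrt{n}$, subtract $\bar{X}$, and multiply by $-1$ (which reverses the inequalities).

```latex
\begin{aligned}
.95 &= P\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) \\
    &= P\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
    &= P\left(\bar{X}-1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X}+1.96\,\frac{\sigma}{\sqrt{n}}\right)
\end{aligned}
```

Replacing 1.96 by $z_{1-\alpha/2}$ gives the general $100(1-\alpha)$% interval $\bar{X} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$.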
Rather than go through all of the algebra of the inequalities, we
have given at the end of this chapter a list of confidence intervals.
This list is organized by one of 11 case numbers, which correspond to the dialog
box in the 'Calculating Confidence Intervals' concept lab. For examples
of using confidence intervals, see that lab.
StataQuest makes it easy to get these intervals for a given data set.
Sample Size Determination
Estimating $\mu$ with a $100(1-\alpha)$% confidence interval of length 2B
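Here we will derive the sample size needed to obtain an interval
of length 2B, where B is the largest possible distance between any
$\mu$ in the confidence interval and the sample mean $\bar{X}$. Recall
that the interval is $\bar{X} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$, so
$$B = z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}.$$
Solving for n we obtain
$$n = \left(\frac{z_{1-\alpha/2}\,\sigma}{B}\right)^2.$$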
Use of the formula above requires $\sigma$ to be known, which seldom
happens in practice. When $\sigma$ is unknown, one either estimates it
with the sample standard deviation from a previous study or simply uses
one quarter of the range (the difference between the largest and the
smallest observations) as a rough guess.
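A minimal sketch of the sample-size formula for a mean, assuming a normal-based interval. The helpers `sample_size_mean` and `sigma_from_range` are hypothetical names; the optional `z` argument lets you plug in a rounded table value instead of the exact quantile, which can change the rounded-up answer by one.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma, half_width, conf_level=0.95, z=None):
    """Smallest n such that z_{1-alpha/2} * sigma / sqrt(n) <= half_width (= B).

    If z is given (e.g. a rounded table value) it is used as-is;
    otherwise the exact standard normal quantile is computed.
    """
    if z is None:
        alpha = 1.0 - conf_level
        z = NormalDist().inv_cdf(1 - alpha / 2)
    n = (z * sigma / half_width) ** 2
    return math.ceil(n)  # always round up to a whole sample size


def sigma_from_range(largest, smallest):
    """Rough guess for an unknown sigma: one quarter of the range."""
    return (largest - smallest) / 4


# Factory example below: sigma = 6, interval length 2 (so B = 1), 99% confidence.
print(sample_size_mean(6, 1, z=2.58))           # 240 (table value z = 2.58)
print(sample_size_mean(6, 1, conf_level=0.99))  # 239 (exact z = 2.5758...)
```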
EXAMPLE: A factory claims that the average number of working hours of
its employees is 40. Forty-nine workers are chosen at random; their
average is 42 working hours, with a standard deviation of 6.
For future studies, if we want to construct a 99% confidence
interval with a total length of no more than two hours, how large a
sample will we need? (Assume $\sigma = 6$.)
We would need n=240 to get a 99% confidence interval whose
length is at most 2. Note that we always round up.
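The arithmetic behind this answer, assuming the two-decimal table value $z_{.995} \approx 2.58$ (the value used is not shown above; 2.58 is the one consistent with 240). The total length is 2, so $B = 1$:

```latex
n = \left(\frac{z_{.995}\,\sigma}{B}\right)^2
  = \left(\frac{2.58 \times 6}{1}\right)^2
  = 15.48^2 \approx 239.63
  \quad\Longrightarrow\quad n = 240
```

With the more precise $z_{.995} \approx 2.576$ the same computation gives about 238.9, which rounds up to 239.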
Estimating $\pi$ with a $100(1-\alpha)$% confidence interval of length 2B
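Here we will derive the sample size needed to obtain an interval
of length 2B, where B is the largest possible distance between any
$\pi$ in the confidence interval and the sample proportion p.
Recall that the interval is $p \pm z_{1-\alpha/2}\sqrt{p(1-p)/n}$, so
$$B = z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$
Solving for n we obtain
$$n = \frac{z_{1-\alpha/2}^2\,p(1-p)}{B^2}.$$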
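There is a problem with this equation: the value of p is not known
until the sample has been drawn, since $p = X/n$, where X is the number
of successes among the n observations. However, for p between 0 and 1,
p(1-p) has a maximum value of 0.25. If we plug this maximum into our
equation, we get
$$n = \frac{z_{1-\alpha/2}^2}{4B^2}.$$
We are then guaranteed that if we take a sample of this size n (rounded
up), our confidence interval will be no wider than 2B.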
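A minimal sketch of the conservative formula, assuming a normal-based interval for a proportion. `sample_size_proportion` is a hypothetical helper; its `p_guess` parameter is our own addition, and its default of 0.5 makes $p(1-p) = 0.25$, the worst case used in the text.

```python
import math
from statistics import NormalDist


def sample_size_proportion(half_width, conf_level=0.95, p_guess=0.5):
    """Smallest n with z_{1-alpha/2} * sqrt(p(1-p)/n) <= half_width (= B).

    The default p_guess = 0.5 maximizes p(1-p) at 0.25, so the
    resulting n works whatever the sample proportion turns out to be.
    """
    alpha = 1.0 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = z ** 2 * p_guess * (1 - p_guess) / half_width ** 2
    return math.ceil(n)  # always round up


# Tomato example below: interval length 0.1, so B = 0.05, at 95% confidence.
print(sample_size_proportion(0.05))  # 385
```

Any other `p_guess` gives a smaller n, but then the width guarantee holds only if the true proportion is close to that guess.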
EXAMPLE: A produce supplier claims that 75% of his tomatoes will be ripe
upon arrival at a distribution center. To test this claim, a random
sample of tomatoes is selected from a shipment. Let $\pi$ denote the
true proportion of ripe tomatoes in the particular shipment. How large
must n be to get a 95% confidence interval of length no more than 0.1?
We would need n=385 to get a 95% confidence interval whose
length is at most 0.1. Note again that we always round up.
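The arithmetic, using the conservative formula with $z_{.975} = 1.96$ and $B = 0.1/2 = 0.05$:

```latex
n = \frac{z_{.975}^2}{4B^2}
  = \frac{1.96^2}{4\,(0.05)^2}
  = \frac{3.8416}{0.01}
  = 384.16
  \quad\Longrightarrow\quad n = 385
```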
Computer Lab for Week 6
Applicable StataQuest commands:
- Data > Generate/Replace > Random numbers, to generate random normals
- Summaries > Confidence intervals, to generate t confidence intervals
  for data you generated or data from a file
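For readers without StataQuest, here is a rough stand-in for this lab, written in Python (our substitution, not a StataQuest feature): generate random normal samples and count how often the interval covers the true mean. Because the standard library has no t quantiles, it uses the z interval with $\sigma$ known rather than the t interval; `coverage` and its default values (borrowed from the factory example) are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=40.0, sigma=6.0, n=49, conf_level=0.95, reps=10_000, seed=1):
    """Fraction of simulated samples whose z interval contains mu.

    Each sample has n draws from Normal(mu, sigma); the interval is
    xbar +/- z_{1-alpha/2} * sigma / sqrt(n), with sigma treated as known.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(coverage())  # should be close to 0.95
```

The printed fraction should land near the confidence level, which is exactly the "in $100(1-\alpha)$ percent of all samples" interpretation.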
Concept Lab for Week 6
- Ch 10: Minimum Variance Estimation
- Ch 8: Z, t, Chi-square, F
- Ch 9: Sampling Distributions
- Ch 12: Interpreting Confidence Intervals