
Two-way ANOVA and Nonparametric Inferences
EXAMPLE: An agricultural scientist is interested in the corn yield when three different fertilizers are available and corn is planted in four different soil types. The questions he is interested in answering are:
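Because we are studying the effect of two factors (fertilizer and soil type), we will use two-way ANOVA to analyze this type of problem. We will consider two types of two-way ANOVA: the additive model, in which the two factors do not interact, and the model with interaction.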
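If we use the model
$$X_{ij} = \mu_{ij} + \epsilon_{ij}, \qquad i = 1, \dots, I, \quad j = 1, \dots, J,$$
where the errors $\epsilon_{ij}$ are independent $N(0, \sigma^2)$, we have to estimate IJ means $\mu_{ij}$ and $\sigma^2$ (a total of IJ+1 parameters) using only IJ observations! Since we can't estimate all of our parameters, we will change models (slightly),
$$X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij},$$
where $\alpha_i$ is the effect of level i of factor A and $\beta_j$ is the effect of level j of factor B. Now the mean structure has only I+J+1 parameters ($\mu$, the $\alpha_i$ and the $\beta_j$), so estimation becomes possible. (Actually, we also assume $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$, which leaves only I+J-1 free parameters in the mean structure to estimate, plus $\sigma^2$.)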
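For the additive model $X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$ with one observation per cell and the constraints $\sum_i \alpha_i = \sum_j \beta_j = 0$, the least-squares estimates are the grand mean, each row mean minus the grand mean, and each column mean minus the grand mean. The short Python sketch below computes them; the function name `additive_estimates` and the list-of-rows data layout are illustrative assumptions, not anything from StataQuest.

```python
def additive_estimates(x):
    """Estimate mu, alpha_i, beta_j in X_ij = mu + alpha_i + beta_j + eps_ij.

    x is an I-by-J table (list of rows) with one observation per cell;
    row i is level i of factor A and column j is level j of factor B.
    """
    I, J = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (I * J)  # estimate of mu
    alpha = [sum(row) / J - grand for row in x]  # row mean minus grand mean
    beta = [sum(x[i][j] for i in range(I)) / I - grand  # column mean minus grand mean
            for j in range(J)]
    return grand, alpha, beta


mu, alpha, beta = additive_estimates([[1, 2, 3], [4, 5, 6]])
print(mu, alpha, beta)  # 3.5 [-1.5, 1.5] [-1.0, 0.0, 1.0]
```

The estimated effects automatically sum to zero, so imposing the two constraints costs nothing when fitting.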
A slightly more general additive model is
$$X_{ijl} = \mu + \alpha_i + \beta_j + \epsilon_{ijl}, \qquad l = 1, \dots, k,$$
where k is the number of replications at each combination of factor A and factor B levels.
NOTE: When k is small, especially when k=1, we are forced to use the additive model. There will be more about this in Section 10.2.2.
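The ANOVA table for the additive model is given by
$$
\begin{array}{lllll}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F \\
\hline
\text{Factor A} & I-1 & SSA & MSA = SSA/(I-1) & f_A = MSA/MSE \\
\text{Factor B} & J-1 & SSB & MSB = SSB/(J-1) & f_B = MSB/MSE \\
\text{Error} & IJk-I-J+1 & SSE & MSE = SSE/(IJk-I-J+1) & \\
\text{Total} & IJk-1 & SST & &
\end{array}
$$
where $SSA = Jk\sum_i(\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot\cdot\cdot})^2$, $SSB = Ik\sum_j(\bar{x}_{\cdot j\cdot}-\bar{x}_{\cdot\cdot\cdot})^2$, $SST = \sum_i\sum_j\sum_l(x_{ijl}-\bar{x}_{\cdot\cdot\cdot})^2$ and $SSE = SST - SSA - SSB$. With one observation per cell (k = 1), the error degrees of freedom reduce to (I-1)(J-1).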
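The relevant null hypotheses are
$$H_{0A}: \alpha_1 = \alpha_2 = \cdots = \alpha_I = 0 \qquad\text{and}\qquad H_{0B}: \beta_1 = \beta_2 = \cdots = \beta_J = 0,$$
and are tested by $f_A = MSA/MSE$ and $f_B = MSB/MSE$, respectively. In words, these hypotheses are:
$H_{0A}$: the level of factor A has no effect on the mean response;
$H_{0B}$: the level of factor B has no effect on the mean response.
Each hypothesis is rejected at level $\alpha$ when its F ratio exceeds the upper-$\alpha$ critical value of the F distribution with the factor's degrees of freedom and the error degrees of freedom.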
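To make the additive ANOVA table concrete, here is a minimal Python sketch for the one-observation-per-cell case (k = 1) that computes SSA, SSB, SSE and the two F ratios $f_A = MSA/MSE$ and $f_B = MSB/MSE$. The function name `additive_anova` and the list-of-rows layout are hypothetical; the sketch assumes I, J ≥ 2 and SSE > 0, and it stops at the F ratios, which would then be compared with tabled F critical values.

```python
def additive_anova(x):
    """Two-way additive ANOVA with one observation per cell (k = 1).

    x is an I-by-J table (list of rows); returns sums of squares and F ratios.
    """
    I, J = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (I * J)
    row_means = [sum(row) / J for row in x]  # factor A level means
    col_means = [sum(x[i][j] for i in range(I)) / I for j in range(J)]  # factor B level means
    ssa = J * sum((m - grand) ** 2 for m in row_means)
    ssb = I * sum((m - grand) ** 2 for m in col_means)
    sst = sum((v - grand) ** 2 for row in x for v in row)
    sse = sst - ssa - ssb  # residual (error) sum of squares
    df_a, df_b, df_e = I - 1, J - 1, (I - 1) * (J - 1)
    mse = sse / df_e
    return {
        "SSA": ssa, "SSB": ssb, "SSE": sse, "SST": sst,
        "f_A": (ssa / df_a) / mse,
        "f_B": (ssb / df_b) / mse,
        "df": (df_a, df_b, df_e),
    }


res = additive_anova([[1, 2, 4], [4, 5, 6]])
print(round(res["f_A"], 6), round(res["f_B"], 6))  # 64.0 19.0
```

The toy table is made up purely to exercise the formulas; it is not the particulate-matter data from the example that follows.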
EXAMPLE: In a study of automobile traffic and air pollution, air samples taken at four different times and at five different locations were analyzed to obtain the amount of particulate matter present in the air. Is there any difference in the true average amount of particulate matter present in the air due either to different sampling times or to different locations?
Notice that in this case, both $f_A$ and $f_B$ are significantly greater than one, that is, large enough that we reject both null hypotheses. Thus, there is an effect due both to time and to location.
When the additive model holds, there is no interaction between factors A and B. In other words, the effect of factor A is the same no matter what the level of factor B is. When the additive model doesn't hold, we have to go to a model which allows A and B to interact.
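We will use the model
$$X_{ijl} = \mu_{ij} + \epsilon_{ijl}, \qquad l = 1, \dots, k,$$
and define
$$\mu = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\mu_{ij}, \qquad \alpha_i = \bar{\mu}_{i\cdot} - \mu, \qquad \beta_j = \bar{\mu}_{\cdot j} - \mu,$$
where $\bar{\mu}_{i\cdot}$ and $\bar{\mu}_{\cdot j}$ are the row and column averages of the $\mu_{ij}$, but represent the model in the form
$$X_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijl},$$
where $\gamma_{ij} = \mu_{ij} - \mu - \alpha_i - \beta_j$ is the interaction of factors A and B.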
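The relevant null hypotheses are
$H_{0AB}$: $\gamma_{ij} = 0$ for all i and j (no interaction),
$H_{0A}$: $\alpha_1 = \cdots = \alpha_I = 0$ (no factor A effect),
$H_{0B}$: $\beta_1 = \cdots = \beta_J = 0$ (no factor B effect),
and are tested by their respective F values in the following ANOVA table.
$$
\begin{array}{lllll}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F \\
\hline
\text{Factor A} & I-1 & SSA & MSA = SSA/(I-1) & f_A = MSA/MSE \\
\text{Factor B} & J-1 & SSB & MSB = SSB/(J-1) & f_B = MSB/MSE \\
\text{Interaction} & (I-1)(J-1) & SSAB & MSAB = SSAB/[(I-1)(J-1)] & f_{AB} = MSAB/MSE \\
\text{Error} & IJ(k-1) & SSE & MSE = SSE/[IJ(k-1)] & \\
\text{Total} & IJk-1 & SST & &
\end{array}
$$
Here SSA and SSB are the factor A and factor B sums of squares, $SSAB = k\sum_i\sum_j(\bar{x}_{ij\cdot}-\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot j\cdot}+\bar{x}_{\cdot\cdot\cdot})^2$ and $SSE = \sum_i\sum_j\sum_l(x_{ijl}-\bar{x}_{ij\cdot})^2$. The error degrees of freedom IJ(k-1) are positive only when k ≥ 2.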
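For the interaction model $X_{ijl} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijl}$ with k ≥ 2 replications per cell, a similar sketch computes all four sums of squares from the cell means. The name `twoway_interaction_anova` and the nested-list layout (`x[i][j]` is the list of the k replicates in cell (i, j)) are illustrative assumptions, and the design is assumed balanced, with the same k in every cell.

```python
def twoway_interaction_anova(x):
    """Two-way ANOVA with interaction for a balanced design.

    x[i][j] is the list of k >= 2 replicates at level i of A and level j of B.
    """
    I, J, k = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / k for j in range(J)] for i in range(I)]  # cell means
    grand = sum(map(sum, cell)) / (I * J)  # equals the overall mean when balanced
    row = [sum(cell[i]) / J for i in range(I)]  # factor A level means
    col = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]  # factor B level means
    ssa = J * k * sum((r - grand) ** 2 for r in row)
    ssb = I * k * sum((c - grand) ** 2 for c in col)
    ssab = k * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                   for i in range(I) for j in range(J))
    sse = sum((v - cell[i][j]) ** 2  # spread of replicates around their cell mean
              for i in range(I) for j in range(J) for v in x[i][j])
    df_a, df_b, df_ab, df_e = I - 1, J - 1, (I - 1) * (J - 1), I * J * (k - 1)
    mse = sse / df_e
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
        "f_A": (ssa / df_a) / mse,
        "f_B": (ssb / df_b) / mse,
        "f_AB": (ssab / df_ab) / mse,
        "df": (df_a, df_b, df_ab, df_e),
    }


res = twoway_interaction_anova([[[1, 3], [5, 7]], [[4, 6], [2, 4]]])
print(round(res["f_A"], 6), round(res["f_B"], 6), round(res["f_AB"], 6))  # 0.0 1.0 9.0
```

In this made-up 2 × 2 example the effect of factor B reverses between the two levels of A, so the interaction ratio is large while the main-effect ratios are not; this is exactly the situation the additive model cannot describe.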
If we have small samples, the one- and two-sample t tests and the F test for comparing K means (one-way ANOVA) are all valid only if we are sampling from normal populations. This week we study methods for comparing the distributions of populations that do not require normality (or any other distributional assumption). There are two basic points to be made:
1. The distribution-free methods are valid for any distribution of the populations being compared; that is, if we specify a certain significance level $\alpha$, then the true type I error probability is $\alpha$ (at most $\alpha$ when the test statistic is discrete).
2. If the populations being compared do in fact have the normal distribution, then the previous methods (t tests and so on) are in fact better than the distribution-free methods we will study. They are better in the sense that if the populations are different, then the parametric procedures have a better chance of concluding they are different (that is, they are more powerful).
These two points are illustrated in the "Comparing Parametric and Nonparametric Tests" concept lab.
Given n pairs of data, the sign test tests the hypothesis that the median of the differences in the pairs is zero. The test statistic is the number of positive differences. If the null hypothesis is true, then the numbers of positive and negative differences should be approximately the same. In fact, under the null hypothesis the number of positive differences has a binomial distribution with parameters n and p = 1/2 (pairs whose difference is exactly zero are usually discarded and n is reduced accordingly). StataQuest will return the p-value associated with the test statistic.
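The binomial reasoning above translates directly into an exact calculation. The sketch below uses only Python's standard library; the function name `sign_test` is hypothetical, and the two-sided p-value is taken as twice the smaller binomial tail (capped at 1), which is one common convention and may not match StataQuest's output in every case.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided sign test of H0: median of the differences x_i - y_i is 0.

    Returns (number of positive differences, n used, p-value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(d)
    s = sum(1 for v in d if v > 0)  # test statistic: count of positive differences
    # Under H0, s ~ Binomial(n, 1/2); double the smaller tail probability.
    smaller = min(s, n - s)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return s, n, min(1.0, 2 * tail)


print(sign_test([5, 6, 7, 8, 9, 10], [4, 5, 6, 7, 8, 11]))  # (5, 6, 0.21875)
```

Five of six differences are positive, yet the p-value is about 0.22: with so few pairs the sign test has little power, which echoes point 2 above.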
The Wilcoxon signed-rank test is a similar test of the hypothesis that the median difference in paired data is zero. It consists of sorting the absolute values of the (nonzero) differences from smallest to largest, assigning ranks to the absolute values (rank 1 to the smallest, rank 2 to the next smallest, and so on) and then finding the sum of the ranks of the positive differences. If the null hypothesis is true, the sum of the ranks of the positive differences should be about the same as the sum of the ranks of the negative differences. Again, StataQuest will return the p-value of the test.
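A minimal sketch of the signed-rank procedure follows. The names `average_ranks` and `signed_rank_test` are hypothetical, and the p-value uses the large-sample normal approximation to the null distribution of $T_+$ (mean $n(n+1)/4$, variance $n(n+1)(2n+1)/24$, no tie or continuity correction), so it may differ from an exact p-value such as software reports for small samples.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def signed_rank_test(x, y):
    """Wilcoxon signed-rank statistic T+ with a two-sided normal-approximation p-value."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(v) for v in d])  # rank the absolute differences
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)  # ranks of positive differences
    mean = n * (n + 1) / 4  # E[T+] under H0
    var = n * (n + 1) * (2 * n + 1) / 24  # Var[T+] under H0, no tie correction
    z = (t_plus - mean) / sqrt(var)
    return t_plus, erfc(abs(z) / sqrt(2))  # 2 * P(Z >= |z|)


t_plus, p = signed_rank_test([5, 6, 7, 8, 9, 10], [4, 4, 4, 4, 4, 11])
print(t_plus)  # 19.5
```

The two differences of size 1 tie and share rank 1.5, so the positive ranks sum to 19.5 and the negative rank sum is 1.5; together they give n(n+1)/2 = 21.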
The Wilcoxon rank-sum (Mann-Whitney) test is used in place of a two-sample t test when the populations being compared are not normal. It requires independent random samples of sizes m and n. The test is very simple: combine the two samples into one sample of size m + n, sort the result, assign ranks to the sorted values (giving the average rank to any tied observations), and let T be the sum of the ranks for the observations in the first sample. If the two populations have the same distribution, then the average rank in the first sample and the average rank in the second sample should be close to each other; equivalently, T should be close to its null expected value m(m + n + 1)/2. StataQuest returns a p-value for the null hypothesis that the two distributions are the same.
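The ranking procedure for two independent samples can be sketched as follows. The function names are hypothetical, and, as with the signed-rank sketch, the p-value comes from a normal approximation (mean $m(m+n+1)/2$, variance $mn(m+n+1)/12$, no tie or continuity correction) rather than the exact null distribution, so it can differ noticeably from an exact result when m and n are tiny.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def rank_sum_test(x, y):
    """Wilcoxon rank-sum statistic T (first sample) with a two-sided normal-approximation p-value."""
    m, n = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # rank the combined sample of size m + n
    t = sum(ranks[:m])  # sum of the ranks belonging to the first sample
    mean = m * (m + n + 1) / 2  # E[T] under H0
    var = m * n * (m + n + 1) / 12  # Var[T] under H0, no tie correction
    z = (t - mean) / sqrt(var)
    return t, erfc(abs(z) / sqrt(2))  # 2 * P(Z >= |z|)


t, p = rank_sum_test([1, 2, 3], [4, 5, 6])
print(t)  # 6.0
```

Here every observation in the first sample is smaller than every observation in the second, so T takes its smallest possible value, 1 + 2 + 3 = 6, against a null expectation of 10.5.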
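The Kruskal-Wallis test is the nonparametric version of one-way ANOVA and is a straightforward generalization of the Wilcoxon test for two independent samples. If we have K independent samples of sizes $n_1, \dots, n_K$ (with $N = n_1 + \cdots + n_K$), we combine all the samples into one large sample, sort the result from smallest to largest and assign ranks (again assigning the average rank to any observation in a group of tied observations), and then find $\bar{R}_i$, the average of the ranks of the observations in the ith sample. The test statistic is then
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{K} n_i \left(\bar{R}_i - \frac{N+1}{2}\right)^2,$$
and we reject the null hypothesis that all K distributions are the same if $H \ge \chi^2_{\alpha, K-1}$, the upper-$\alpha$ critical value of the chi-squared distribution with K-1 degrees of freedom (a large-sample approximation). Again, StataQuest will return the p-value for the test.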
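A standard-library sketch of the Kruskal-Wallis statistic and its chi-squared p-value is below. Since Python's `math` module has no chi-squared distribution, the hypothetical helper `chi2_sf` evaluates the upper tail through the series for the regularized lower incomplete gamma function; the statistic omits the tie correction, and all function names are illustrative assumptions.

```python
from math import exp, lgamma, log


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def chi2_sf(x, df):
    """P(chi-squared with df degrees of freedom >= x), via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a  # k = 0 term of sum z^k / (a (a+1) ... (a+k))
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = exp(a * log(z) - z - lgamma(a)) * total  # regularized lower gamma P(a, z)
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*samples):
    """Kruskal-Wallis H (no tie correction) and its chi-squared(K - 1) p-value."""
    pooled = [v for s in samples for v in s]
    N = len(pooled)
    ranks = average_ranks(pooled)  # rank all N observations together
    h, start = 0.0, 0
    for s in samples:
        rbar = sum(ranks[start:start + len(s)]) / len(s)  # average rank in this sample
        start += len(s)
        h += len(s) * (rbar - (N + 1) / 2) ** 2
    h *= 12 / (N * (N + 1))
    return h, chi2_sf(h, len(samples) - 1)


h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h, 6))  # 7.2
```

With K = 3 samples the reference distribution has 2 degrees of freedom, whose upper tail is simply $e^{-H/2}$, so here the p-value is $e^{-3.6} \approx 0.027$.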
Applicable StataQuest Commands:
Statistics > ANOVA > Two-way (this also includes interaction plots)
Statistics > Nonparametric test > Sign test for Wilcoxon signed-rank test
Statistics > Nonparametric test > Mann-Whitney for Wilcoxon-Mann-Whitney rank-sum test
Statistics > Nonparametric test > Kruskal-Wallis for Kruskal-Wallis test
The webmaster and author of this Math Help site is Graeme McRae.