Mann-Whitney U

The Mann-Whitney U test is one of the best-known non-parametric statistical significance tests. It was proposed, apparently independently, by Mann and Whitney (1947) and Wilcoxon (1945), and is therefore sometimes also called the Mann-Whitney-Wilcoxon (MWW) test or the Wilcoxon rank-sum test.

The test is appropriate to the case of two independent samples of observations measured at least at an ordinal level, i.e. we can at least say, of any two observations, which is the greater. The test assesses whether the degree of overlap between the two observed distributions is less than would be expected by chance, on the null hypothesis that the two samples are drawn from a single population.

The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is known. In the case of small samples, the distribution is tabulated, but for samples above about 20 there is a good approximation using the normal distribution. Some books tabulate statistics other than U, such as the sum of ranks in one of the samples, but this deviation from standard practice is unhelpful.

The U test is included in most modern statistical packages. However, it is easily calculated by hand, especially for small samples. There are two ways of doing this.

For small samples a direct method is recommended. It is very quick, and gives an insight into the meaning of the U statistic.

  1. Choose the sample for which the ranks seem to be smaller (the choice matters only for ease of computation). Call this "sample 1", and call the other sample "sample 2".
  2. Taking each observation in sample 1, count the number of observations in sample 2 that are smaller than it.
  3. The total of these counts is U.
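The direct method can be sketched in a few lines of Python. This is an illustrative implementation, not from the original text; the function name is invented, and the half-count for ties is the usual convention rather than something the steps above spell out.

```python
def u_direct(sample1, sample2):
    """Direct method: for each observation in sample 1, count the
    observations in sample 2 that are smaller; the total is U.
    Ties are counted as one half (the usual convention)."""
    u = 0.0
    for x in sample1:
        for y in sample2:
            if y < x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u
```

For example, `u_direct([1, 2], [3, 4])` gives 0 (no sample-2 value is beaten), while `u_direct([3, 4], [1, 2])` gives the maximum, 4.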

For larger samples, a formula can be used.

  1. Arrange all the observations into a single ranked series.
  2. Add up the ranks in the smaller group. The sum of ranks in the other group follows by calculation, since the sum of all the ranks equals N(N + 1)/2 where N is the total number of observations.
  3. U is then given by:
<math>U=n_1 n_2 +{n_1(n_1+1) \over 2}-R_1</math>
where n1 and n2 are the two sample sizes, and R1 is the sum of the ranks in sample 1.

Note that the maximum value of U is the product of the two sample sizes, and if the value obtained by either of the methods above is more than half of this maximum, it should be subtracted from the maximum to obtain the value to look up in tables.
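The rank-sum formula above can likewise be sketched in Python. This is an illustrative version with an invented function name; it assumes no tied observations (ties would require average ranks, as noted later in the article).

```python
def u_from_ranks(sample1, sample2):
    """Rank-sum method: rank all observations jointly (1 = smallest),
    sum the ranks of sample 1, and apply
    U = n1*n2 + n1*(n1 + 1)/2 - R1.
    Assumes no ties; tied values would need average ranks."""
    pooled = sorted(sample1 + sample2)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank[v] for v in sample1)
    return n1 * n2 + n1 * (n1 + 1) / 2 - r1
```

On the tortoise-and-hare finishing positions used in the example below, with the tortoises' positions as sample 1, this returns 11, agreeing with the worked calculation.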

Example

Suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one hare in a race, and decides to carry out a significance test to discover whether the results could be extended to tortoises and hares in general. He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post is as follows, writing T for a tortoise and H for a hare:

T H H H H H T T T T T H

(his original tortoise still goes at warp speed, and his original hare is still lazy, but the others run truer to stereotype). What is the value of U?

  • Using the direct method, we take each tortoise in turn, and count the number of hares it beats, getting 6, 1, 1, 1, 1, 1. So U = 6 + 1 + 1 + 1 + 1 + 1 = 11.
  • Using the indirect method:
the sum of the ranks achieved by the tortoises is 1 + 7 + 8 + 9 + 10 + 11 = 46.
Therefore U = 6×6 + 6×7/2 − 46 = 36 + 21 − 46 = 11.

Consulting the table referenced below, we find that this result does not confirm the greater speed of tortoises, though neither does it show any significant speed advantage for hares. It is left as an exercise for the reader to establish that statistical packages will give the same result, at rather greater expense.


Approximation

For large samples, the normal approximation:

<math>z=(U-m_U)/\sigma_{U}</math>

can be used, where z is a standard normal deviate whose significance can be checked in tables of the normal distribution. mU and σU are the mean and standard deviation of U if the null hypothesis is true, and are given by:

<math>m_U=n_1 n_2/2.</math>
<math>\sigma_U=\sqrt{n_1 n_2 (n_1+n_2+1) \over 12}.</math>
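The normal approximation is a one-liner given these formulas. The sketch below uses an invented function name and ignores the tie correction to σU:

```python
import math

def u_normal_z(u, n1, n2):
    """Normal approximation for large samples:
    z = (U - m_U) / sigma_U, with m_U = n1*n2/2 and
    sigma_U = sqrt(n1*n2*(n1 + n2 + 1)/12).
    Ignores the correction for tied ranks."""
    m_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - m_u) / sigma_u
```

For the example above (U = 11, n1 = n2 = 6) this gives z ≈ −1.12, well short of the conventional ±1.96 threshold, consistent with the non-significant result found from the tables (though the samples here are really too small for the approximation to be trustworthy).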

All the formulae here become more complicated in the presence of tied ranks, but if the number of ties is small (and especially if there are no large tie bands) they can be ignored when doing calculations by hand. Computer statistical packages apply the tie corrections as a matter of routine.


Relation to other tests

The U test is useful in the same situations as the independent samples Student's t-test, and the question arises of which should be preferred. Before electronic calculators and computer packages made calculations easy, the U test was preferred on grounds of speed of calculation. It remains the logical choice when the data are inherently ordinal; and it is much less likely than the t-test to give a spuriously significant result because of one or two outliers.

On the other hand, the U test is often recommended for situations where the distributions of the two samples are very different. This is an error: it tests whether the two samples come from a common distribution, and Monte Carlo methods have shown that it is capable of giving erroneously significant results in some situations where they are drawn from distributions with the same mean and different variances. In that situation, the version of the t-test that allows for the samples to come from populations of different variance is likely to give more reliable results.

The U test is related to a number of other nonparametric statistical procedures. For example, it is equivalent to Kendall's τ correlation coefficient if one of the variables is binary (that is, it can only take two values).

The ρ statistic proposed by Richard Herrnstein is linearly related to U and widely used in studies of categorization (discrimination learning involving concepts) in birds (see animal cognition). ρ is calculated by dividing U by its maximum value for the given sample sizes, which is simply n1 × n2. ρ is thus a non-parametric measure of the overlap between two distributions; it can take values between 0 and 1, and it is equal to <math>p(Y > X) + 0.5 p(Y = X)</math>, where X and Y are randomly chosen observations from the two distributions. Both extreme values represent complete separation of the distributions, while a ρ of 0.5 represents complete overlap.
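The definition of ρ as p(Y > X) + 0.5 p(Y = X) can be estimated directly by averaging over all pairs, which is equivalent to dividing U by n1 × n2. A minimal sketch (the function name is invented for illustration):

```python
def herrnstein_rho(sample_x, sample_y):
    """Estimate rho = P(Y > X) + 0.5 * P(Y = X) over all pairs
    (x, y); this equals U divided by its maximum, n1 * n2."""
    n_pairs = len(sample_x) * len(sample_y)
    score = sum(1.0 if y > x else 0.5 if y == x else 0.0
                for x in sample_x for y in sample_y)
    return score / n_pairs
```

Complete separation gives 0 or 1 (e.g. every y above every x gives 1), and identical single observations give exactly 0.5, matching the description of complete overlap.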

References

  • Table of critical values of the Mann-Whitney U distribution (pdf)
  • Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural concepts in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 2, 285-302.
  • Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of 2 random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50-60.
  • Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83.
