Geometric standard deviation
From Free net encyclopedia
In probability theory and statistics, the geometric standard deviation describes how spread out a set of numbers is when the preferred average is the geometric mean. If the geometric mean of a set of numbers {A1, A2, ..., An} is denoted as μg, then the geometric standard deviation is
- <math> \sigma_g = \exp \left( \sqrt{ \sum_{i=1}^n ( \ln A_i - \ln \mu_g )^2 \over n } \right). \qquad \qquad (1) </math>
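As an illustration, equation (1) can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `geometric_sd` is an assumption made for this example, and the input is assumed to be a non-empty sequence of strictly positive numbers.

```python
import math

def geometric_sd(values):
    """Geometric standard deviation per equation (1), dividing by n.

    Hypothetical helper for illustration; requires positive inputs
    because the logarithm is undefined otherwise.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    n = len(values)
    logs = [math.log(v) for v in values]
    ln_mu_g = sum(logs) / n  # natural log of the geometric mean
    variance = sum((x - ln_mu_g) ** 2 for x in logs) / n
    return math.exp(math.sqrt(variance))

print(geometric_sd([1, 10, 100]))
```

For the set {1, 10, 100} the geometric mean is 10 and the result equals 10 raised to the power sqrt(2/3). The geometric standard deviation is a dimensionless multiplicative factor and is never smaller than 1.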
Derivation
If the geometric mean is
- <math> \mu_g = \sqrt[n]{ A_1 A_2 \cdots A_n },\, </math>
then taking the natural logarithm of both sides results in
- <math> \ln \mu_g = {1 \over n} \ln (A_1 A_2 \cdots A_n). </math>
The logarithm of a product is a sum of logarithms, so
- <math> \ln \mu_g = {1 \over n} [ \ln A_1 + \ln A_2 + \cdots + \ln A_n ].\, </math>
It can now be seen that <math> \ln \, \mu_g </math> is the arithmetic mean of the set <math> \{ \ln A_1, \ln A_2, \dots , \ln A_n \} </math>. By analogy, the logarithm of the geometric standard deviation is taken to be the arithmetic standard deviation of this same set:
- <math> \ln \sigma_g = \sqrt{ \sum_{i=1}^n ( \ln A_i - \ln \mu_g )^2 \over n }. </math>
Thus
- ln(geometric SD of A1, ..., An) = arithmetic (i.e. usual) SD of ln(A1), ..., ln(An).
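The identity above suggests a shortcut: take logarithms, apply an ordinary standard deviation routine, and exponentiate. The sketch below assumes Python's standard `statistics` module and an arbitrary illustrative data set; `statistics.pstdev` is used because it divides by n, matching the formula in this article, whereas `statistics.stdev` divides by n - 1.

```python
import math
import statistics

# Illustrative data (an assumption for this example); all values must be positive.
data = [2.0, 4.0, 8.0, 16.0]
logs = [math.log(a) for a in data]

# Arithmetic mean and population SD of the logarithms.
ln_mu_g = statistics.mean(logs)
ln_sigma_g = statistics.pstdev(logs)

# Exponentiating maps both back to the original scale.
mu_g = math.exp(ln_mu_g)        # geometric mean
sigma_g = math.exp(ln_sigma_g)  # geometric standard deviation

print(mu_g, sigma_g)
```

Whether to divide by n or n - 1 is a modelling choice; the derivation here uses n, so a sample-based estimate computed with n - 1 will be slightly larger.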
Geometric standard score
The geometric version of the standard score is
- <math> z = {\ln ( x/\mu_g ) \over \ln \sigma_g }.\, </math>
If the geometric mean, geometric standard deviation, and z-score of a datum are known, then the raw score can be reconstructed by
- <math> x = \mu_g \sigma_g^z. </math>
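The two formulas above are inverses of each other, which a short round trip can illustrate. This is a minimal sketch; the function names `geometric_z` and `from_geometric_z` are hypothetical, and the geometric standard deviation is assumed to be strictly greater than 1 so that its logarithm is non-zero.

```python
import math

def geometric_z(x, mu_g, sigma_g):
    """Geometric standard score: z = ln(x / mu_g) / ln(sigma_g)."""
    if x <= 0 or mu_g <= 0:
        raise ValueError("x and mu_g must be positive")
    if sigma_g <= 1:
        raise ValueError("sigma_g must be greater than 1")
    return math.log(x / mu_g) / math.log(sigma_g)

def from_geometric_z(z, mu_g, sigma_g):
    """Reconstruct the raw score: x = mu_g * sigma_g ** z."""
    return mu_g * sigma_g ** z

# Illustrative values (assumptions for this example).
mu_g, sigma_g = 10.0, 2.0
z = geometric_z(40.0, mu_g, sigma_g)
x = from_geometric_z(z, mu_g, sigma_g)
print(z, x)
```

With a geometric mean of 10 and a geometric standard deviation of 2, the value 40 lies two multiplicative steps above the mean, since 10 × 2 × 2 = 40.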
Relationship to log-normal distribution
The geometric standard deviation is related to the log-normal distribution, which is the distribution of a variable whose logarithm is normally distributed. Taking logarithms shows that the geometric standard deviation is the exponentiated value of the standard deviation of the log-transformed values, i.e. exp(stdev(ln(A))).
As such, the geometric mean and the geometric standard deviation of a sample of data from a log-normally distributed population may be used to find the bounds of confidence intervals analogously to the way the arithmetic mean and standard deviation are used to bound confidence intervals for a normal distribution. See discussion in log-normal distribution for details.
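To make the analogy concrete: for normal data, roughly 95% of values lie within 1.96 standard deviations of the mean, so for log-normal data roughly 95% should lie between μg / σg^1.96 and μg · σg^1.96. The sketch below checks this on simulated data; the seed, sample size, and log-scale parameters (1.0 and 0.5) are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated log-normal sample; ln(value) is normal with mean 1.0 and SD 0.5.
sample = [random.lognormvariate(1.0, 0.5) for _ in range(20000)]

logs = [math.log(v) for v in sample]
mu_g = math.exp(sum(logs) / len(logs))       # geometric mean
sigma_g = math.exp(statistics.pstdev(logs))  # geometric standard deviation

# Multiplicative bounds: divide and multiply by sigma_g ** k
# instead of subtracting and adding k standard deviations.
k = 1.96
lower = mu_g / sigma_g ** k
upper = mu_g * sigma_g ** k

inside = sum(lower <= v <= upper for v in sample) / len(sample)
print(lower, upper, inside)
```

The bounds are asymmetric around the geometric mean on the original scale but symmetric on the logarithmic scale, which is why they are formed by division and multiplication rather than subtraction and addition.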