Statistical significance
In statistics, a result is significant if it is unlikely to have occurred by chance, assuming that the independent variable (the test condition being examined) in fact has no effect or, formally stated, that a presumed null hypothesis is true.
Technically, in traditional frequentist statistical hypothesis testing, the significance level of a test is the maximum probability of mistakenly rejecting a true null hypothesis (a decision known as a Type I error). The significance of a result is also called its p-value; the smaller the p-value, the more significant the result is said to be.
Popular levels of significance are 10%, 5%, and 1%; the chosen level is conventionally denoted by the Greek letter α (alpha).
For example, one may choose a significance level of, say, 5%, and calculate a critical value of a statistic (such as the mean) such that the probability of the statistic exceeding that value, given the truth of the null hypothesis, is 5%. If the statistic calculated from the observed data exceeds the critical value, the result is significant "at the 5% level".
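As a minimal sketch of this procedure (in Python, with invented numbers and a normal population of known standard deviation; none of these values come from the article):

```python
# Hypothetical one-sided test of a sample mean at the 5% level,
# assuming a normal population with known standard deviation.
import math
from scipy import stats

alpha = 0.05          # chosen significance level
mu0 = 100.0           # population mean under the null hypothesis
sigma = 15.0          # known population standard deviation
n = 36                # sample size
sample_mean = 104.8   # observed sample mean (invented)

# Critical value: the sample-mean value that the null distribution
# exceeds with probability alpha.
se = sigma / math.sqrt(n)
critical_value = stats.norm.ppf(1 - alpha, loc=mu0, scale=se)

print(f"critical value: {critical_value:.2f}")   # about 104.11
if sample_mean > critical_value:
    print("significant at the 5% level: reject the null hypothesis")
else:
    print("not significant at the 5% level")
```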
If the significance level is smaller, a value is less likely to be more extreme than the critical value, so a result that is "significant at the 1% level" is more significant than one that is "significant at the 5% level". However, a test at the 1% level is more likely to commit a Type II error than a test at the 5% level, and so has less statistical power. In devising a hypothesis test, the tester aims to maximize power for a given significance level, but ultimately must accept that the best achievable is a balance between significance and power, in other words between the risks of Type I and Type II errors. A Type I error is not necessarily any worse than a Type II error, or vice versa; the severity of each error depends on the individual case.
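The following sketch illustrates this trade-off for a hypothetical one-sided z-test (the effect size and sample size are invented): tightening the significance level from 5% to 1% lowers the power and raises the Type II error rate against a fixed true effect.

```python
# Significance/power trade-off for a one-sided z-test with a fixed
# true effect: smaller alpha means less power (more Type II errors).
import math
from scipy import stats

mu0, mu_true, sigma, n = 100.0, 105.0, 15.0, 36
se = sigma / math.sqrt(n)

for alpha in (0.05, 0.01):
    crit = stats.norm.ppf(1 - alpha, loc=mu0, scale=se)
    power = 1 - stats.norm.cdf(crit, loc=mu_true, scale=se)
    print(f"alpha={alpha:.2f}  critical value={crit:.2f}  "
          f"power={power:.2f}  Type II error={1 - power:.2f}")
```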
Pitfalls with significance
It is important to distinguish the everyday sense of "significant" (a major effect; important; fairly large in amount or quantity) from its meaning when used to describe research, which carries no such connotation of meaningfulness. Statistically, with a large enough sample, a trivial difference can still be described as "significant."
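A short simulation sketch makes this concrete (the 0.05-standard-deviation difference and the sample sizes are invented for illustration): a practically negligible difference between two groups is not detected at small samples, yet becomes highly "significant" once the sample is large enough.

```python
# A trivial group difference (0.05 standard deviations) becomes
# statistically significant as the sample size grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 1_000, 100_000):
    a = rng.normal(0.00, 1.0, n)   # control group
    b = rng.normal(0.05, 1.0, n)   # tiny, practically trivial shift
    t, p = stats.ttest_ind(a, b)
    print(f"n={n:>7}  p-value={p:.4f}")
```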
Even statistical significance can be illusory when data from numerous subgroups of the sample population are evaluated. The Women's Health Initiative, a very large study that tested the effects of nutritional supplements, reported in 2006 no significant effects on certain variables for the population as a whole, but did find significance among subgroups. Given that the number of subgroups (age categories, obesity levels, marital status, and combinations thereof) can be extensive, simple probability predicts that some of these groups will show a spurious significant difference even if the null hypothesis is true.
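A small simulation sketch illustrates the point (the subgroup counts and sizes are invented): with the null hypothesis true in every subgroup, testing 40 subgroups at the 5% level still yields about two spuriously "significant" findings on average.

```python
# Subgroup problem: many tests at the 5% level under a true null
# hypothesis still produce some "significant" results by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subgroups, n_per_group = 40, 50
false_positives = 0
for _ in range(n_subgroups):
    treated = rng.normal(0, 1, n_per_group)   # no true effect anywhere
    control = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_subgroups} subgroups spuriously significant")
# Expected by chance alone: about 0.05 * 40 = 2 spurious findings.
```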
Signal–noise ratio conceptualisation of significance
Statistical significance can be considered the confidence one has in a given result. In a comparison study, it depends on the relative difference between the groups compared, the number of measurements, and the noise associated with those measurements. In other words, the confidence that a given result is non-random (i.e. not a consequence of chance) depends on the signal-to-noise ratio (SNR) and the sample size.
Expressed mathematically, the confidence that a result is not by random chance is given by the following formula by Sackett:<ref>Sackett DL. Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). CMAJ 2001;165(9):1226–37. PMID 11706914.</ref>
<math>\text{confidence} = \frac{\text{signal}}{\text{noise}} \times \sqrt{\text{sample size}}</math>
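As a small numeric illustration of the formula (the signal, noise, and sample-size values are invented), the resulting "confidence" behaves like a z- or t-type statistic: it grows with the signal-to-noise ratio and with the square root of the sample size.

```python
# Sackett's formula: confidence = (signal / noise) * sqrt(sample size).
import math

def confidence(signal: float, noise: float, sample_size: int) -> float:
    """Signal-to-noise ratio scaled by the square root of the sample size."""
    return (signal / noise) * math.sqrt(sample_size)

# A small effect measured with low noise and/or a large sample can
# still yield high confidence.
print(confidence(signal=0.2, noise=1.0, sample_size=25))    # 1.0
print(confidence(signal=0.2, noise=1.0, sample_size=2500))  # 10.0
print(confidence(signal=0.2, noise=0.1, sample_size=25))    # 10.0
```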
For clarity, the above formula is presented in tabular form below.
Dependence of confidence on noise, signal and sample size (tabular form)
| Parameter | Parameter increases | Parameter decreases |
|---|---|---|
| Noise | Confidence decreases | Confidence increases |
| Signal | Confidence increases | Confidence decreases |
| Sample size | Confidence increases | Confidence decreases |
In words, confidence is high if the noise is low, the sample size is large, and/or the effect size (signal) is large. The confidence of a result (and its associated confidence interval) does not depend on effect size alone: if the sample size is large and the noise is low, a small effect size can be measured with great confidence. Whether a small effect size is considered important depends on the context of the events compared.
In medicine, small effect sizes (reflected by small increases in risk) are often considered clinically relevant and frequently guide treatment decisions (provided there is great confidence in them). Whether a given treatment is considered a worthy endeavour depends on its risks, benefits and costs.
References
<references/>