How do you know if a Wilcoxon test is significant?

You determine if a Wilcoxon test is significant by comparing its p-value to a predetermined significance level (alpha, or $\alpha$).

Understanding Wilcoxon Test Significance

The Wilcoxon tests are non-parametric statistical tests that compare the location of distributions, most often interpreted as a comparison of medians. They are often employed when data do not meet the assumptions of parametric tests such as the t-test, most commonly normality. There are two main variants: the Wilcoxon signed-rank test, used for one sample or for paired samples (comparing a population median, or the median of paired differences, to a hypothesized value), and the Wilcoxon rank-sum test, also known as the Mann-Whitney U test, used for two independent samples. For both variants the principle of determining significance is the same: it hinges on the p-value.
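As a rough illustration of the two variants, the sketch below runs each test and reads off its p-value. It assumes SciPy is installed; the measurements, the hypothesized median of 10, and the variable names are made up for this example.

```python
from scipy import stats

# Made-up measurements, for illustration only.
sample = [12.1, 11.4, 13.3, 12.8, 11.9, 13.6, 12.5, 14.0, 13.1, 12.7]
hypothesized_median = 10.0

# One-sample signed-rank test: apply the test to the differences
# between each observation and the hypothesized median.
differences = [x - hypothesized_median for x in sample]
signed_rank = stats.wilcoxon(differences, alternative="two-sided")

# Two-sample rank-sum test (Mann-Whitney U) on two independent groups.
group_a = [1.1, 2.3, 3.2, 4.8, 5.5]
group_b = [6.1, 7.4, 8.2, 9.9, 10.3]
rank_sum = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

alpha = 0.05  # significance level chosen before looking at the data
for name, result in (("signed-rank", signed_rank), ("rank-sum", rank_sum)):
    verdict = "significant" if result.pvalue <= alpha else "not significant"
    print(f"{name}: statistic={result.statistic}, p={result.pvalue:.4f} -> {verdict}")
```

In both cases the returned object exposes the p-value as `pvalue`, which is the number compared against alpha.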

The P-value and Significance Level

To ascertain whether the results of a Wilcoxon test are statistically significant, you must compare its calculated p-value to your chosen significance level.

P-value: The p-value (probability value) represents the probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming that the null hypothesis is true. In simpler terms, it quantifies the strength of evidence against the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis.
Significance Level ($\alpha$): The significance level, denoted as $\alpha$ (alpha), is a threshold you set before conducting the test. It represents the maximum probability of making a Type I error – incorrectly rejecting a true null hypothesis. A commonly used significance level is 0.05 (or 5%). Other common levels include 0.01 (1%) or 0.10 (10%).
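To make the p-value definition concrete, here is a minimal sketch that computes an exact two-sided p-value for the signed-rank test by brute force. It assumes a small sample with no zero differences and no tied absolute values; the function name `signed_rank_exact_p` and the data are hypothetical, and statistical software typically uses faster methods.

```python
from itertools import product

def signed_rank_exact_p(differences):
    """Return (W+, exact two-sided p-value) for the signed-rank test.

    Assumes no zero differences and no ties among absolute values.
    """
    n = len(differences)
    # Rank the absolute differences from smallest (1) to largest (n).
    ordered = sorted(abs(d) for d in differences)
    rank_of = {value: i + 1 for i, value in enumerate(ordered)}
    # W+ is the sum of the ranks that belong to positive differences.
    w_plus = sum(rank_of[abs(d)] for d in differences if d > 0)
    expected = n * (n + 1) / 4  # mean of W+ under the null hypothesis

    # Under the null, every assignment of signs to the ranks is equally likely.
    # Count the assignments at least as far from the mean as the observed W+.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if abs(w - expected) >= abs(w_plus - expected):
            as_extreme += 1
    return w_plus, as_extreme / 2 ** n

diffs = [1.5, -0.4, 2.1, 0.9, 3.0, -0.2, 1.1, 2.6]  # made-up differences
w_plus, p_value = signed_rank_exact_p(diffs)
print(f"W+ = {w_plus}, p = {p_value}")  # W+ = 33, p = 0.0390625
```

Here 10 of the 256 equally likely sign patterns give a statistic at least as extreme as the observed one, so the p-value is 10/256.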

Decision Rule for Significance

The decision to declare a Wilcoxon test significant is straightforward:

Condition	Interpretation
P-value $\le \alpha$	The result is statistically significant. There is enough evidence to reject the null hypothesis.
P-value $> \alpha$	The result is not statistically significant. There is not enough evidence to reject the null hypothesis.

Example: If your Wilcoxon test yields a p-value of 0.02 and you set your significance level ($\alpha$) at 0.05, then because $0.02 \le 0.05$, the result is statistically significant. You would reject the null hypothesis and conclude that the medians being compared differ. Conversely, if the p-value were 0.10 with an $\alpha$ of 0.05, then because $0.10 > 0.05$ the result would not be significant, meaning you do not have enough evidence to claim that the medians differ.
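The decision rule can be written as a one-line comparison. The helper below is a minimal sketch; the name `is_significant` and the input checks are assumptions added for illustration.

```python
def is_significant(p_value, alpha=0.05):
    """Return True when the p-value is at or below the significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return p_value <= alpha

# The two cases from the example above.
print(is_significant(0.02, alpha=0.05))  # True: reject the null hypothesis
print(is_significant(0.10, alpha=0.05))  # False: fail to reject the null hypothesis
```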

Practical Implications

When Significant: A significant result from a Wilcoxon test indicates that the observed difference between medians is unlikely to have occurred by random chance alone, assuming the null hypothesis (no difference) is true. This provides evidence to support the alternative hypothesis (that a difference exists).
When Not Significant: A non-significant result means that the observed difference is consistent with random variation. This can happen because no true difference exists, or because your sample size was not large enough to detect a true difference. It does not show that there is no difference, only that your test did not find sufficient evidence to conclude one at your chosen significance level.
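One way to see the sample-size point: with very few observations, an exact two-sided signed-rank test cannot reach significance no matter how the data look. The sketch below assumes no ties and no zero differences, in which case the most extreme outcome (all differences sharing one sign) has a two-sided p-value of 2 / 2^n. The function name is hypothetical.

```python
def smallest_two_sided_p(n):
    """Smallest possible exact two-sided signed-rank p-value for n untied, nonzero differences."""
    return min(1.0, 2 / 2 ** n)

alpha = 0.05
for n in range(3, 9):
    p_min = smallest_two_sided_p(n)
    reachable = "can" if p_min <= alpha else "cannot"
    print(f"n={n}: smallest p = {p_min:.5f}, so the test {reachable} be significant at alpha={alpha}")
```

Under these assumptions, five paired observations give a smallest possible p-value of 0.0625, so a two-sided test at $\alpha = 0.05$ needs at least six.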

Choosing the appropriate significance level is crucial. A lower alpha (e.g., 0.01) makes it harder to achieve significance, requiring stronger evidence against the null hypothesis, thus reducing the risk of a Type I error. A higher alpha (e.g., 0.10) makes it easier to find significance, but increases the risk of a Type I error.
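The link between alpha and the Type I error rate can be checked directly for a small signed-rank test. The sketch below enumerates the exact null distribution for n = 10 differences (assuming no ties and no zeros) and adds up the probability of every outcome whose two-sided p-value falls at or below alpha. The function names are hypothetical. Because the statistic is discrete, the actual rate comes out at or slightly below alpha.

```python
from collections import Counter
from itertools import product

def null_distribution(n):
    """Exact null distribution of W+ for n untied, nonzero differences."""
    counts = Counter()
    for signs in product((0, 1), repeat=n):
        counts[sum(rank for rank, s in zip(range(1, n + 1), signs) if s)] += 1
    total = 2 ** n
    return {w: count / total for w, count in counts.items()}

def type_i_error_rate(n, alpha):
    """Probability, under a true null, that the two-sided p-value is <= alpha."""
    dist = null_distribution(n)
    expected = n * (n + 1) / 4
    rate = 0.0
    for w, prob in dist.items():
        # Two-sided p-value for this outcome: mass at least as far from the mean.
        p_value = sum(q for v, q in dist.items()
                      if abs(v - expected) >= abs(w - expected))
        if p_value <= alpha:
            rate += prob
    return rate

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}: Type I error rate = {type_i_error_rate(10, alpha):.4f}")
```

The rate grows as alpha grows, which is the trade-off described above: a larger alpha makes significance easier to reach at the cost of more false rejections.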