Checking normality in Excel
statstutor community project www.statstutor.ac.uk
It is very unlikely that a histogram of sample data will produce a perfectly smooth normal curve like
the one displayed over the histogram, especially if the sample size is small. As long as the data is
approximately normally distributed, with a peak in the middle and a fairly symmetrical shape, the
assumption of normality has been met.
The normal Q-Q plot is an alternative to the histogram for assessing normality graphically and is
easier to read when the sample size is small. For the data to be considered normally distributed,
the points should lie as close to the line as possible, with no obvious pattern of departure from it.
Below are the same examples of normally distributed and skewed data.
Q-Q plot of approximately normally distributed data
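The points on a normal Q-Q plot can be computed by hand, which may help when reading one. The sketch below is a minimal illustration in Python, not part of the original Excel workflow: the function name qq_points, the made-up sample and the plotting position (i - 0.5) / n are all assumptions (other plotting positions are in common use). Each sorted observation is paired with the value a fitted normal distribution would be expected to give at the same rank. In Excel, the same theoretical quantile should be obtainable with NORM.INV.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    points = []
    for i, value in enumerate(ordered, start=1):
        # Assumed plotting position; keeps the probability strictly inside (0, 1).
        p = (i - 0.5) / n
        points.append((fitted.inv_cdf(p), value))
    return points


# Made-up values for illustration only.
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.0, 6.3, 6.8, 7.4]

for theoretical, observed in qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the data is approximately normal, the two columns are close to each other, which corresponds to the points lying near the line on the plot.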
Tests for assessing if data is normally distributed
The Kolmogorov-Smirnov test and the Shapiro-Wilk test are two specific methods for testing
normality of data, but they should be used in conjunction with either a histogram or a Q-Q plot
because both tests are sensitive to outliers and are influenced by sample size:
• For smaller samples, non-normality is less likely to be detected, but the Shapiro-Wilk test
should be preferred as it is generally more sensitive than the Kolmogorov-Smirnov test.
• For larger samples (i.e. more than one hundred), the normality tests are overly sensitive:
even small departures from normality can give a significant result, so the assumption of
normality might be rejected too easily.
Null hypothesis for a test of normality: The data is normally distributed.
If the p-value is under 0.05, the null hypothesis is rejected and there is significant evidence that
the data is not normally distributed.
For both of these examples, the sample size is 35, so the Shapiro-Wilk test should be used. For
the skewed data, p = 0.002, suggesting strong evidence of non-normality. For the approximately
normally distributed data, p = 0.585, so normality can be assumed and, provided any other test
assumptions are satisfied, an appropriate parametric test can be used.
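To show what such a test measures, the sketch below computes the Kolmogorov-Smirnov distance between a sample and a normal distribution fitted to it, then applies the 0.05 decision rule to the two p-values quoted above. It is a rough illustration under stated assumptions, not the procedure used to produce those p-values: the names ks_statistic and normality_decision are hypothetical, and the distance alone is not a p-value. When the mean and standard deviation are estimated from the same data, turning the distance into a p-value needs a correction (often called the Lilliefors correction), which is best left to statistical software.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(data):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, value in enumerate(ordered, start=1):
        cdf = fitted.cdf(value)
        # The empirical CDF jumps from (i - 1) / n to i / n at each point,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def normality_decision(p_value, alpha=0.05):
    """Apply the decision rule: reject normality when p is under alpha."""
    if p_value < alpha:
        return "evidence of non-normality"
    return "normality can be assumed"


# p-values quoted in the text for the skewed and approximately normal data.
print(normality_decision(0.002))
print(normality_decision(0.585))
```

A larger distance means the sample's cumulative distribution strays further from the fitted normal curve. Because the distance is based on a fitted distribution, it does not change if the data is rescaled or shifted.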
What if the data is not normally distributed?
If the checks suggest that the data is not normally distributed, there are two options:
• Transform the dependent variable (repeating the normality checks on the transformed data):
Common transformations include taking the log or square root of the dependent variable.
• Use a non-parametric test: Non-parametric tests are often called distribution-free tests and
can be used instead of their parametric equivalents.
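The effect of the first option can be illustrated with a small sketch. The code below is an assumption-laden example rather than a recommended procedure: the data is made up, and the helper skewness (a simple moment-based measure, 0 for symmetric data and positive for a long right tail) is introduced only for illustration. It applies the square root and log transformations to right-skewed values and compares the skewness before and after. The normality checks would still need to be repeated on the transformed data.

```python
import math
from statistics import mean


def skewness(data):
    """Moment-based skewness: 0 if symmetric, positive for a long right tail."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed values for illustration only.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

# The log needs strictly positive values; the square root needs non-negative ones.
rooted = [math.sqrt(x) for x in raw]
logged = [math.log(x) for x in raw]

for label, values in [("raw", raw), ("square root", rooted), ("log", logged)]:
    print(f"{label:12s} skewness = {skewness(values):.2f}")
```

For this sample, the log pulls in the long right tail more strongly than the square root does, so it leaves the data closest to symmetric.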