# When does the Central Limit Theorem kick in?

The central limit theorem (CLT) states that, regardless of the original distribution of the data, the sample mean of n independent, identically distributed observations with finite variance approaches a normal distribution as n goes to infinity. This theorem allows scientists to use the many statistical tests that assume normality. For instance, the t-statistic is a standard normal random variable divided by the square root of an independent chi-square random variable divided by its degrees of freedom. The t-test can be applied to many samples that are not normally distributed because their mean is, as the CLT says, approximately normal.
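Written out, and assuming i.i.d. observations $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$, the CLT and the construction of the t-statistic are:

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty
$$

$$
T = \frac{Z}{\sqrt{V / \nu}} \sim t_\nu
\quad \text{for } Z \sim N(0, 1),\ V \sim \chi^2_\nu,\ Z \text{ and } V \text{ independent}
$$

For the one-sample t-test, $Z = (\bar{X}_n - \mu_0)/(\sigma/\sqrt{n})$ and $V = (n-1)S^2/\sigma^2$ with $\nu = n - 1$, which gives $T = (\bar{X}_n - \mu_0)/(S/\sqrt{n})$. These distributions are exact only for normal data. Otherwise, under $H_0\colon \mu = \mu_0$, the CLT makes $Z$ approximately standard normal for large $n$, and since $S \to \sigma$, $T$ is approximately normal as well.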

I have often wondered, though: how large does n need to be? Many textbooks and professors say 20, 30, or 50, but I could find no formal guidelines that justified these numbers. So I have run some rough simulations to provide a reference for when the CLT “kicks in,” depending on the original distribution. In practice we rarely know the generating distribution, but the general shape of the data may indicate which common distribution it resembles.

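Here is the code for the simulation, run in RStudio (version 0.94.84):

```r
# 100 replications (columns), each holding 100 sample means (rows)
x <- matrix(nrow = 100, ncol = 100)

for (j in 1:100) {
  # Fill column j with 100 means, each from a sample of size 50
  for (i in 1:100) {
    x[i, j] <- mean(rnorm(50))
  }
  # Shapiro-Wilk test of normality on the 100 means in column j
  temp <- shapiro.test(x[, j])
  # Print the p-value whenever the test rejects at alpha = 0.05
  if (temp$p.value <= 0.05) {
    print(temp$p.value)
  }
}
```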

The above code will perform the following tasks:

- Generate 50 observations from a normal distribution (or any distribution with finite variance) and calculate their mean. This is repeated 100 times, generating 100 means.
- Use a Shapiro-Wilk test to assess whether those 100 means are normally distributed. The entire process, including the first step, is then repeated 100 times, giving the results of 100 Shapiro-Wilk tests, each applied to a set of 100 means.
- Print the p-value of each of the 100 tests that is less than or equal to 0.05. If about 5 are printed (I’m personally alright with 4-6), the distribution of the sample means is considered normal, because the test is then rejecting at roughly its nominal alpha-level of 0.05.
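One way to gauge the 4-6 acceptance band: if the means were exactly normal and each test rejected exactly 5% of the time, the number of rejections among 100 independent tests would follow a Binomial(100, 0.05) distribution. The short sketch below (variable names are illustrative) computes how often a count of 4-6 would occur under that assumption, along with the central range that holds about 95% of outcomes.

```r
# Under the null (means exactly normal, each test rejecting at exactly 5%),
# the number of rejections among 100 independent tests is Binomial(100, 0.05).
n_tests <- 100
alpha <- 0.05

# Probability that the rejection count lands in the 4-6 band
p_band <- sum(dbinom(4:6, size = n_tests, prob = alpha))
print(p_band)

# Central range covering roughly 95% of outcomes under the null
band_95 <- qbinom(c(0.025, 0.975), size = n_tests, prob = alpha)
print(band_95)
```

Under that assumption, `p_band` is about 0.51 and `band_95` runs from 1 to 10. So even perfectly normal means would fall outside 4-6 roughly half the time, and a single run with, say, 1 or 8 rejections is weak evidence either way. This may help explain why the gamma result changed on replication.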

Now, I provide a list of the sample sizes needed to achieve normality of the sample means. These results were calculated with the above code, starting at a sample size of 5 and increasing it by 5 as needed until normality was achieved. The distributions used are binomial, Poisson, exponential, beta, gamma, chi-square, and t.

- Binomial (size = half the sample size, rounded down; probability = 0.5): 20 (4 rejections)
- Poisson (lambda = 2): 20 (4 rejections)
- Exponential (rate = 0.5): 140 (6 rejections)
- Beta (shape1 = 2, shape2 = 2): 5 (1 rejection)
- Gamma (shape = 2, scale = 2): 120 (1 rejection). However, on replication this turned out to be an extreme case; the more reasonable sample size seems to be 155 (5 rejections).
- Chi-square (degrees of freedom = n - 1, ncp = 0): 25 (5 rejections)
- t (degrees of freedom = n - 1, ncp = 0): 10 (4 rejections)
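The search described above (start at n = 5 and add 5 until the rejection count looks right) can be sketched as follows, using the parameterizations in the list. This is a hedged reconstruction, not the exact code behind the results. The post does not state its stopping rule precisely, so the sketch stops at the first n with at most 6 rejections, which is consistent with the reported counts of 1 to 6. The names `generators`, `count_rejections()`, `find_sample_size()`, `max_rej`, and `max_n` are hypothetical. The beta entry assumes `shape1 = 2, shape2 = 2`, since `rbeta()` has no scale argument. Unlike the original loop, the sketch counts rejections with `sum(pvals <= alpha)` instead of printing them.

```r
# Generators for the distributions listed above. Each takes the sample size n
# and returns n draws; the binomial size and the chi-square and t degrees of
# freedom depend on n. Beta assumes shape1 = shape2 = 2 (an assumption).
generators <- list(
  binomial    = function(n) rbinom(n, size = floor(n / 2), prob = 0.5),
  poisson     = function(n) rpois(n, lambda = 2),
  exponential = function(n) rexp(n, rate = 0.5),
  beta        = function(n) rbeta(n, shape1 = 2, shape2 = 2),
  gamma       = function(n) rgamma(n, shape = 2, scale = 2),
  chisquare   = function(n) rchisq(n, df = n - 1, ncp = 0),
  t           = function(n) rt(n, df = n - 1, ncp = 0)
)

# Number of Shapiro-Wilk rejections among `reps` tests, each applied to
# `n_means` sample means of size n drawn with `rgen`.
count_rejections <- function(rgen, n, reps = 100, n_means = 100, alpha = 0.05) {
  pvals <- replicate(reps, {
    means <- replicate(n_means, mean(rgen(n)))
    shapiro.test(means)$p.value
  })
  sum(pvals <= alpha)
}

# Start at n = 5 and step by 5 until at most `max_rej` tests reject.
# `max_rej = 6` and the `max_n` cap are assumptions, not the post's exact rule.
find_sample_size <- function(rgen, start = 5, step = 5, max_rej = 6, max_n = 300) {
  for (n in seq(start, max_n, by = step)) {
    r <- count_rejections(rgen, n)
    if (r <= max_rej) return(c(n = n, rejections = r))
  }
  c(n = NA, rejections = NA)  # no sample size up to max_n passed
}

set.seed(2024)
find_sample_size(generators$beta)
```

Running `sapply(generators, find_sample_size)` repeats the search for every distribution. Rerunning it with different seeds shows how much the chosen n moves between replications, as happened with the gamma. For comparison, `rexp(n, rate = 0.5)` draws from the same distribution as `rgamma(n, shape = 1, scale = 2)`, so the exponential entry is a gamma with a smaller shape parameter.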

This experiment suggests that while the means of the binomial, Poisson, beta, chi-square, and t distributions are all approximately normal at the “textbook” sample sizes, the means of the exponential and gamma distributions require much larger samples before they are normally distributed. This was surprising to me.

This post was my first attempt at a simulation, so it is possible that the simulation methodology or the parameter values for each distribution were chosen inappropriately. I welcome comments, criticisms, and suggestions for improving the methodology.


Correction to the code: matrix “x” should have only 100 rows. I am not sure whether this affected the Shapiro-Wilk tests (e.g., whether the missing values might have been counted as means of 0). I will do some testing and report the results.
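Here is a minimal check of this. It assumes the only consequence of the 1000-row matrix was 900 unfilled (NA) entries at the bottom of each column. R's `shapiro.test()` discards missing values before computing the statistic, so a padded column and an unpadded column should give identical p-values. The variable names are illustrative.

```r
set.seed(42)

# One column's worth of data: 100 means of samples of size 50
means <- replicate(100, mean(rnorm(50)))

# The 1000-row matrix left rows 101-1000 unfilled, i.e. NA
padded <- c(means, rep(NA, 900))

p_clean  <- shapiro.test(means)$p.value
p_padded <- shapiro.test(padded)$p.value

# TRUE: the NA entries are dropped, not treated as zeros
identical(p_clean, p_padded)
```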