Bharat S Raj

Introduction to ANOVA

ANOVA (Analysis of Variance) is a statistical technique used to check if the means of two or more groups are significantly different from each other. It may seem odd that it’s called “Analysis of Variance” rather than “Analysis of Means”. However, the name is appropriate since we make inferences about means by analyzing variance.

If we are only comparing two means, ANOVA will produce the same results as the t-test for independent samples (if we are comparing two different groups) or the t-test for dependent samples (if we are comparing two variables in one set of observations).

Then why not use multiple t-tests? If you were to conduct multiple t-tests for comparing more than two samples, it will have a compounded effect on the error rate of the result.

Application: Consider a scenario where you get a sample of the annual income of employees from three different geographies. You might want to compare if there is a difference in the average income of employees based on geographies. In this case, you would use ANOVA to compare the average income.

Another application of ANOVA can be found in the medical sector. In order to understand a reliable treatment method for a disease, multiple test groups (based on cure methodology) would be created. They would try to measure the number of days it takes to cure for each test group. Here, ANOVA can be used to prove/disprove if all the medication treatments were equally effective or not.

Assumptions

The following are the assumptions in ANOVA:

Terminologies

Grand Mean: Mean is a simple or arithmetic average of a range of values. In ANOVA, we use two types of means – the grand mean (mean of the entire sample) and group sample means (mean of each individual groups).

Hypothesis: A hypothesis is a statement which is suggested as a possible explanation for a particular situation or condition, but which has not yet been proved to be correct. In the case of ANOVA, we have a Null Hypothesis and an Alternate Hypothesis.

Note that we still can’t tell which group is specifically different from rest of the others.

Between-Group Variability (Mean Square Effect): It refers to variations between the distributions of individual groups as the values within each group are different. To calculate the Mean Square effect, we look at each sample to calculate the difference between its mean and the grand mean.

Within-Group Variability (Mean Square Effect): It refers to variations caused by differences within individual groups as not all the values within each group are the same. Each sample is considered independently, no interaction between samples is involved and the variability between the individual points in the sample is calculated.

Total Variation: It is the sum of the squares of the differences of each mean with the grand mean which is also the sum of SSB and SSW.

The whole idea behind the analysis of variance is to compare the ratio of between-group variance to within-group variance. If the variance caused by the interaction between the samples is much larger when compared to the variance that appears within each group, then it is because the means aren’t the same.

F test statistic: It measures if the means of different samples are significantly different or not. It’s calculated by dividing MSB and MSW. Lower the F-Ratio, more similar are the sample means. In that case, we cannot reject the null hypothesis.

The F-statistic calculated here is compared with the F-critical value for making a conclusion. If the value of the calculated F-statistic is more than the F-critical value (for a specific α/significance level), then we reject the null hypothesis.

Summary Table:

Variability Sum of Square Degrees of Freedom Mean Square F Statistic
Between SSB k – 1 SSB/(k-1) MSB/MSW
Within SSW N – k SSW/(N – k)
Total SSB + SSW N – 1

Types of ANOVA Test for Univariate Analysis

There are commonly two types of ANOVA tests for univariate analysis – One-Way ANOVA and Two-Way ANOVA. One-way ANOVA is used when we are interested in studying the effect of one independent variable (IDV)/factor on a population, whereas Two-way ANOVA is used for studying the effects of two factors on a population at the same time.

One Way ANOVA in Excel 2013

The following steps will help you carry out ANOVA in Microsoft Excel 2013:

From the Excel Output above, we see that p-value = 0.485 > 0.05, and so we can reject the null hypothesis and conclude there is a significant difference between the means of the four samples. The F-value is also smaller than the f critical value.

Two Way ANOVA in Excel 2013

The steps are similar to One-way ANOVA in Microsoft Excel 2013:

From the excel output above, we can conclude that for Row-wise (Country wise), there is a difference in the means. However, for column-wise (Income Levels), we failed to reject the null hypothesis (p < 0.05) and hence there is no difference in the means.