How to Perform ANCOVA with R: A Simple Guide

ANCOVA is short for analysis of covariance, a statistical method that lets you compare the means of an outcome variable among two or more groups while accounting for (or correcting for) the variability of other variables, known as covariates. 

Covariates are variables related to the outcome variable but not of primary interest to the study. By adjusting for the effects of covariates, ANCOVA can increase the accuracy and the power of the analysis and remove the confounding bias between the groups. ANCOVA can also test the interaction effects between the variables and provide the adjusted group means for the covariates.


Key takeaways

  • ANCOVA stands for analysis of covariance, a statistical method that allows you to compare the means of a continuous dependent variable across different groups while controlling for the effects of one or more continuous covariates.
  • ANCOVA can help you answer questions such as: Does the type of fertilizer affect the growth of plants after adjusting for the amount of water and sunlight they receive? Does the type of exercise affect people's weight loss after adjusting for their age and gender?
  • To perform ANCOVA with R, you need to install R and RStudio, load the necessary packages, and have a suitable data set that meets the assumptions of ANCOVA, such as linearity, homogeneity of variance, and independence.
  • You can perform ANCOVA with R using the aov() function, which fits a linear model to the data; calling summary() on the result prints an ANOVA table. You can also use the Anova() function from the car package, which computes Type II or Type III tests for a fitted model.
  • You can interpret the results of ANCOVA with R by looking at the p-values, coefficients, and effect sizes of the ANOVA table and the ANCOVA model. You can also create and customize plots to visualize the ANCOVA model and the effects of the variables.


Ancova in R

Have you ever wondered how to perform ANCOVA in R? ANCOVA is a simple and powerful statistical method that can help you compare the means of a continuous variable across different groups while considering the effects of other variables that may influence the outcome.

If so, you are not alone. It is a common and important question in many fields and disciplines, such as psychology, education, medicine, and social sciences. I faced this question while working on my master's thesis. I wanted to investigate the effect of different types of online courses on students' satisfaction and performance while controlling for the impact of their prior knowledge, motivation, and learning style.

I had a data set with the students' ratings, scores, and background information, and I needed a statistical method to help me answer my research question. That’s when I learned about ANCOVA, or analysis of covariance. It lets you compare the means of a continuous dependent variable across different groups while controlling for the effects of one or more continuous covariates.

ANCOVA can help you adjust the group means for the covariates, test the interaction effects between the variables, and increase the power and precision of the analysis.

In this article, I will share how to perform ANCOVA in R, a free and powerful software environment for statistical computing and graphics. I will explain what ANCOVA is, why it is useful, and how to perform it with R using the aov() function or the Anova() function. I will also show you how to prepare your data for ANCOVA, interpret and report the results, and create and customize plots to visualize the ANCOVA model and the effects of the variables.

Whether you are a student, a researcher, a teacher, or a practitioner, this article is for you if you want to compare the means of a continuous variable across different groups while controlling for the effects of other variables. If you are ready, let’s get started.

What is ANCOVA with R?

What is ANCOVA?

ANCOVA is a statistical method that stands for analysis of covariance. It is useful for comparing the means of a continuous dependent variable across different groups while controlling for the effects of one or more continuous covariates. Covariates are variables related to the dependent variable but not of primary interest to the study. 

By adjusting for the effects of covariates, ANCOVA can increase the analysis's power and precision and remove the confounding bias between the groups. ANCOVA can also test the interaction effects between the variables and provide the adjusted group means for the covariates. 

Function            Description
aov()               Fits a linear model to the data; summary() on the result prints the ANOVA table
Anova()             From the car package; computes Type II or Type III ANOVA tables for a fitted model
summary()           Provides a summary of the ANOVA table or the ANCOVA model
confint()           Computes the confidence intervals for the coefficients of the ANCOVA model
etaSquared()        From the lsr package; computes the eta-squared effect size for the ANOVA table
plot()              Creates a plot of the ANCOVA model or the residuals
interaction.plot()  Creates an interaction plot of the dependent variable by the factors
ggplot()            From the ggplot2 package; creates a plot using the grammar of graphics

For example, suppose you want to compare the test scores of students who use different studying techniques. In that case, you can use ANCOVA to control for the effects of their prior knowledge, motivation, and learning style and see if there is a significant difference in the test scores between the studying techniques after accounting for these covariates. ANCOVA can help you answer your research question with more accuracy and reliability.
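To make the idea concrete, here is a minimal sketch in base R. The data are simulated, and every name and number in it (technique, prior, score, the effect sizes) is a made-up assumption for illustration, not data from a real study. It compares a one-way ANOVA with an ANCOVA that adds prior knowledge as a covariate.

```r
# Hypothetical simulated data: three studying techniques, with prior
# knowledge as a covariate. All names and effect sizes are invented.
set.seed(42)
n <- 90
technique <- factor(rep(c("flashcards", "rereading", "practice_tests"), each = n / 3))
prior <- rnorm(n, mean = 50, sd = 10)
bump <- c(flashcards = 3, rereading = 0, practice_tests = 6)  # assumed group effects
score <- 20 + 0.8 * prior + unname(bump[as.character(technique)]) + rnorm(n, sd = 5)
study <- data.frame(score, technique, prior)

# One-way ANOVA ignores the covariate
fit_anova <- aov(score ~ technique, data = study)
# ANCOVA enters the covariate first, then tests the group effect
fit_ancova <- aov(score ~ prior + technique, data = study)

summary(fit_anova)
summary(fit_ancova)

# Residual mean squares: the covariate absorbs variance that the
# plain ANOVA leaves in the error term
ms_anova <- summary(fit_anova)[[1]][["Mean Sq"]][2]
ms_ancova <- summary(fit_ancova)[[1]][["Mean Sq"]][3]
c(anova = ms_anova, ancova = ms_ancova)
```

In this simulation the residual mean square is smaller in the ANCOVA model, which is where the gain in power and precision comes from.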

Before performing ANCOVA with R, you must ensure your data is ready and suitable for analysis. This involves checking and cleaning the data and ensuring it meets ANCOVA's assumptions. Here are the steps you need to take to prepare your data for ANCOVA with R:

Data

# Set seed for reproducibility
set.seed(123)
# Generate 100 observations for each variable
height <- rnorm(100, mean = 170, sd = 10) # Height in centimeters
weight <- rnorm(100, mean = 70, sd = 15) # Weight in kilograms
exercise <- sample(c("aerobic", "anaerobic", "none"), 100, replace = TRUE) # Type of exercise
age <- sample(18:65, 100, replace = TRUE) # Age in years
gender <- sample(c("male", "female"), 100, replace = TRUE) # Gender
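# Create a data frame with the variables
# (stringsAsFactors = TRUE stores exercise and gender as factors, i.e. grouping variables)
data <- data.frame(height, weight, exercise, age, gender, stringsAsFactors = TRUE)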
# Save the data frame as a CSV file
write.csv(data, "ancova_data.csv", row.names = FALSE)

Here are the top five rows of the data:

height weight exercise age gender
164.3952 59.34390 none 46 male
167.6982 73.85326 none 23 male
185.5871 66.29962 none 27 female
170.7051 64.78686 aerobic 43 male
171.2929 55.72572 aerobic 26 female

or download this data set from this link.

Check and Clean the data 

You need to check the data for any missing values, outliers, or errors that may affect the accuracy and validity of the analysis. You can use the summary() function to get a summary of the data and the is.na() function to check for missing values. 

You can also use the boxplot() function to identify any outliers in the data and the filter() function from the dplyr package to remove them. The mutate() function from the dplyr package lets you create new variables or modify existing ones.

# Get a summary of the data
summary(data)
# Check for missing values
sum(is.na(data))
# Identify outliers in the height variable
boxplot(data$height, main = "Boxplot of Height")
# Load dplyr for filter() and mutate()
library(dplyr)
# Remove outliers in the height variable (uncomment to apply)
# data <- filter(data, height > 140)
# Create a new variable called bmi: weight (kg) divided by height (m) squared
data <- mutate(data, bmi = weight / (height / 100)^2)
# Show the first five rows
head(data, 5)
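If you prefer to stay in base R, the same checks can be sketched without dplyr. The small data frame below is hypothetical and exists only to show the calls. It counts missing values per column and uses boxplot.stats(), which flags the same points that boxplot() draws as outliers.

```r
# Small hypothetical data frame with one missing value and one extreme height
df <- data.frame(
  height = c(165, 170, 172, 168, 174, 169, 171, 250),
  weight = c(60, 72, NA, 65, 80, 70, 68, 75)
)

# Missing values per column rather than one grand total
colSums(is.na(df))

# Values that boxplot() would draw as outliers
# (more than 1.5 times the interquartile range beyond the hinges)
out_vals <- boxplot.stats(df$height)$out
out_vals

# Keep complete rows whose height is not flagged
clean <- df[complete.cases(df) & !(df$height %in% out_vals), ]
nrow(clean)
```

Whether a flagged value should actually be dropped is a judgment call about the data, not something the code can decide for you.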


Figure: Descriptive statistics from summary() and the missing-value count.

Figure: Boxplot of height created with the base R boxplot() function.

Check of assumptions of ANCOVA 

You need to make sure that the data meets the following assumptions of ANCOVA, which are necessary for the validity and reliability of the analysis:

  1. Linearity
  2. Homogeneity
  3. Independence

Linearity

The relationship between the dependent variable and the covariates should be linear, meaning that a straight line can represent them. You can check the linearity assumption by creating scatter plots of the dependent variable by the covariates and looking for any curvature or non-linearity in the patterns. 

You can also use the cor() function to calculate the correlation coefficients between the variables. A coefficient close to zero indicates a weak linear relationship, whereas a negative coefficient is still a linear relationship, only a decreasing one. If the linearity assumption is violated, you may need to transform the variables or use a different analysis method. You can use the following code to check the linearity assumption:

# Create scatter plots of the dependent variable by the covariates
par(mfrow = c(2, 2)) # Set the layout of the plots
plot(weight ~ age, data = data, main = "Weight by Age")
plot(weight ~ height, data = data, main = "Weight by Height")
plot(weight ~ bmi, data = data, main = "Weight by BMI")
# gender is categorical, so convert it to a factor; this panel is drawn as a box plot
plot(weight ~ factor(gender), data = data, main = "Weight by Gender", xlab = "gender")
par(mfrow = c(1, 1)) # Reset the layout

Figure: Scatter plots of the dependent variable by the covariates.

Correlation

# Calculate the correlation coefficients between the variables
cor(data[, c("weight", "age", "height", "bmi")])
        weight    age height    bmi
weight   1.000  0.060 -0.050  0.900
age      0.060  1.000 -0.166  0.115
height  -0.050 -0.166  1.000 -0.468
bmi      0.900  0.115 -0.468  1.000
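A correlation coefficient alone can miss curvature, so it helps to look at a plot and to test a curved term as well. The sketch below uses simulated data with hypothetical variables x and y: the relationship is strong but U-shaped, so the Pearson correlation is close to zero while a quadratic term fits much better than a straight line.

```r
# Simulated example: a strong but U-shaped relationship
set.seed(1)
x <- runif(150, min = 0, max = 10)
y <- (x - 5)^2 + rnorm(150, sd = 1)

# Pearson correlation only measures the linear part of the relationship
cor(x, y)

# Compare a straight-line model with one that adds a quadratic term
fit_line <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))
lin_test <- anova(fit_line, fit_quad)
lin_test  # a small p-value suggests the straight line is inadequate

# Scatter plot with a lowess smoother to make the curvature visible
plot(x, y, main = "Curved relationship")
lines(lowess(x, y), col = "red", lwd = 2)
```

With your own data, you could apply the same comparison to the dependent variable and each continuous covariate in turn.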

Homogeneity

The variance of the dependent variable should be equal across the groups, meaning that the spread or dispersion of the data points should be similar for each group. You can check the homogeneity assumption by creating box plots of the dependent variable by the group variable and looking for any differences in the size or shape of the boxes. 

You can also use Levene's test, which is a statistical test that compares the variances of the groups. The null hypothesis of Levene's test is that the variances are equal, and the alternative hypothesis is that they are not. You can use the `leveneTest()` function from the `car` package to perform the Levene's test and look at the p-value. 

If the p-value is less than 0.05, you can reject the null hypothesis and conclude that the variances are unequal. If the homogeneity assumption is violated, you may need to transform the variables or use a different analysis method.

# Create box plots of the dependent variable by the group variable
boxplot(weight ~ exercise, data = data, main = "Boxplot of Weight by Exercise")
# Perform the Levene's test
library(car)
leveneTest(weight ~ exercise, data = data)
Figure: Boxplot of weight by exercise created with the boxplot() function.

Figure: Output of Levene's test from the car package.
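To see what Levene's test does under the hood, here is a base R sketch on simulated data (the names g and y are hypothetical). It computes each observation's absolute deviation from its group median and runs a one-way ANOVA on those deviations, which is the median-centred form of the test. To my understanding this corresponds to the default behaviour of leveneTest(), but treat that as an assumption and compare the two outputs yourself.

```r
# Simulated example: group "c" has a much larger spread than "a" and "b"
set.seed(7)
g <- factor(rep(c("a", "b", "c"), each = 40))
y <- rnorm(120, mean = 70, sd = rep(c(5, 5, 15), each = 40))

# Absolute deviation of each observation from its own group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations: the F test compares the group spreads
levene_tab <- anova(lm(abs_dev ~ g))
levene_tab
```

A small p-value here indicates that the spreads differ between groups, which matches how the data were simulated.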

Independence

The observations of the dependent variable should be independent of each other, meaning that any external factors or other observations do not influence them. You can check the independence assumption by looking at the source and design of your data and ensuring there is no clustering, grouping, or pairing of the observations. 


You can also use the Durbin-Watson test, a statistical test that detects the presence of autocorrelation, a pattern of correlation between the observations based on their order or sequence. The test is only informative when the rows of your data have a meaningful order, such as the time of measurement.

  • The null hypothesis of the Durbin-Watson test is that there is no autocorrelation.
  • The alternative hypothesis is that autocorrelation is present.

You can use the `durbinWatsonTest()` function from the `car` package to perform the Durbin-Watson test and look at the p-value. If the p-value is less than 0.05, you can reject the null hypothesis and conclude that there is autocorrelation. If the independence assumption is violated, you may need to adjust the data or use a different analysis method.

# Perform the Durbin-Watson test
durbinWatsonTest(aov(weight ~ exercise + age + height + bmi, data = data))
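The Durbin-Watson statistic itself is easy to compute by hand, which helps when reading the test output. The following base R sketch uses simulated data with hypothetical names. It compares residuals from a model with independent errors against residuals from a model with strongly autocorrelated errors. Values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation.

```r
set.seed(3)
n <- 100
x <- rnorm(n)

# Case 1: independent errors by construction
y_ind <- 1 + 2 * x + rnorm(n)
e_ind <- residuals(lm(y_ind ~ x))
dw_ind <- sum(diff(e_ind)^2) / sum(e_ind^2)

# Case 2: errors follow an AR(1) process with coefficient 0.8
y_ar <- 1 + 2 * x + as.numeric(arima.sim(list(ar = 0.8), n = n))
e_ar <- residuals(lm(y_ar ~ x))
dw_ar <- sum(diff(e_ar)^2) / sum(e_ar^2)

c(independent = dw_ind, autocorrelated = dw_ar)
```

This only computes the statistic; the p-value reported by a packaged test requires additional machinery that is not reproduced here.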

If your data meets all the assumptions of ANCOVA, you can proceed to the next step, which is to perform ANCOVA with R. If not, you may need to take some corrective actions, such as transforming the variables, removing or replacing the outliers, or using a different analysis method.

Figure: Output of the `durbinWatsonTest()` function from the `car` package.

How to Perform ANCOVA with R

After you have prepared your data for ANCOVA, you can perform ANCOVA with R using the aov() function or the Anova() function. Both approaches produce an ANOVA table from a fitted linear model, but they differ in the sums of squares they report: summary() of an aov() fit uses sequential (Type I) sums of squares, while Anova() computes Type II or Type III sums of squares. You can choose the function that suits your needs and preferences, or you can use both and compare the results. Here are the steps you need to take to perform ANCOVA with R:

Specify the ANCOVA model

You need to specify the ANCOVA model using a formula describing the relationship between the dependent, group, and covariate variables. The formula has the following general form:

dependent ~ group + covariate1 + covariate2 + ...

You can also include interaction terms between the variables by using the * operator, which adds the main effects of both variables plus their interaction. An interaction means that the effect of one variable depends on the level of another variable. For example, if you want to test the interaction between the group and the age variables, you can use the following formula:

dependent ~ group * age + covariate1 + covariate2 + ...
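One practical use of the group by covariate interaction is to check whether the covariate's slope is the same in every group. The sketch below runs on simulated data with hypothetical names (group, age, y), where the age slope is deliberately steeper in one group. It compares the additive model with the interaction model using a nested F test.

```r
set.seed(11)
n <- 120
group <- factor(rep(c("control", "treatment"), each = n / 2))
age <- runif(n, min = 18, max = 65)

# Assumed data-generating process: the age slope differs between the groups
slope <- ifelse(group == "treatment", 0.6, 0.1)
y <- 50 + slope * age + rnorm(n, sd = 4)

fit_add <- lm(y ~ group + age)   # one common age slope
fit_int <- lm(y ~ group * age)   # a separate age slope per group
slope_test <- anova(fit_add, fit_int)
slope_test                       # F test for the group:age interaction
```

A small p-value indicates that the slopes differ, in which case a single adjusted group difference does not describe the data well.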

You can also use the : operator to specify only the interaction term without the main effects of the variables. For example, if you want to include only the interaction between the group and the age variables, you can use the following formula:

dependent ~ group : age + covariate1 + covariate2 + ...
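If you are unsure which terms a formula produces, you can ask R to expand it without fitting anything. This short sketch uses placeholder variable names (y, g, x, z) that do not need to exist as data.

```r
# Term labels that R generates for each formula (no data needed)
labels_star  <- attr(terms(y ~ g * x), "term.labels")        # main effects plus interaction
labels_long  <- attr(terms(y ~ g + x + g:x), "term.labels")  # the same model written out
labels_colon <- attr(terms(y ~ g:x + z), "term.labels")      # interaction only for g and x

labels_star
labels_long
labels_colon
```

The first two formulas expand to the same set of terms, which confirms that * is shorthand for the main effects plus the interaction.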

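For this tutorial, we will use the following formula to specify the ANCOVA model, which includes the main effects and the interaction effects of all the variables. Treat it as a demonstration of the syntax: bmi was computed from weight and height, so it is not an independent covariate, and a full factorial model of five variables estimates many parameters from 100 observations. In your own analysis, the general form dependent ~ group + covariate1 + covariate2 is usually the better starting point.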

weight ~ exercise * age * height * bmi * gender

Fit the ANCOVA model using the aov() function

You can fit the ANCOVA model using the aov() function, which takes the formula and the data as the main arguments and returns a fitted model object. You can assign the output of the aov() function to a variable, such as model_aov, and use the summary() function to view the ANOVA table.

You can also use the confint() function to compute the confidence intervals for the coefficients of the ANCOVA model, and the etaSquared() function from the lsr package (loaded with library(lsr)) to compute the eta-squared effect size for the ANOVA table.

# Fit the ANCOVA model using the aov() function
model_aov <- aov(weight ~ exercise * age * height * bmi * gender, data = data)
# View the summary of the ANOVA table
summary(model_aov)
Figure: Summary of the ANOVA table from summary(model_aov).
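Eta-squared can also be computed directly from the sums of squares in the ANOVA table, which shows what the effect size means. The sketch below uses simulated data with hypothetical names and the sequential sums of squares that summary() reports for an aov() fit. A packaged function may use a different type of sums of squares, so its numbers can differ from these.

```r
set.seed(5)
g <- factor(rep(c("a", "b", "c"), each = 30))
x <- rnorm(90, mean = 50, sd = 10)
shift <- c(a = 0, b = 2, c = 4)  # assumed group effects
y <- 10 + 0.5 * x + unname(shift[as.character(g)]) + rnorm(90, sd = 3)

fit <- aov(y ~ x + g)
tab <- summary(fit)[[1]]

# Sums of squares by term (row names are padded with spaces, so trim them)
ss <- tab[["Sum Sq"]]
names(ss) <- trimws(rownames(tab))
ss_resid <- ss[["Residuals"]]
ss_effects <- ss[names(ss) != "Residuals"]

# Eta-squared: share of the total variation attributed to each term
eta_sq <- ss_effects / sum(ss)
# Partial eta-squared: the term's share of (term + residual) variation
partial_eta_sq <- ss_effects / (ss_effects + ss_resid)

round(cbind(eta_sq, partial_eta_sq), 3)
```

Partial eta-squared is never smaller than eta-squared for the same term, because its denominator leaves out the variation explained by the other terms.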

Fit the ANCOVA model using the Anova() function

You can also analyse the ANCOVA model with the Anova() function from the car package. Anova() takes a model that has already been fitted, for example with lm(), and returns an ANOVA table based on Type II or Type III sums of squares, which you choose with the type argument.

Type III tests need contrasts that sum to zero, such as contr.sum, to be interpretable. The contrasts are set when the model is fitted, through the contrasts argument of lm(), not in the call to Anova(). You can assign the output of the Anova() function to a variable, such as model_Anova, and print it to view the ANOVA table.

You can also use the confint() function on the fitted lm object to compute the confidence intervals for the coefficients, and the etaSquared() function from the lsr package on an aov fit to compute the eta-squared effect size. You can use the following code to analyse the ANCOVA model using the Anova() function:

library(car)
library(lsr)
# Fit the linear model with sum-to-zero contrasts for the factors
model_lm <- lm(weight ~ exercise * age * height * bmi * gender, data = data,
               contrasts = list(exercise = "contr.sum", gender = "contr.sum"))
# Compute the ANOVA table with Type III sums of squares
model_Anova <- Anova(model_lm, type = 3)
# View the ANOVA table
model_Anova

The table has the same layout as the one from summary(model_aov), but the values generally differ because aov() reports sequential (Type I) sums of squares, whereas this table uses Type III sums of squares.
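To see why the type of sums of squares matters, consider what happens when the group variable and the covariate are correlated. In this base R sketch on simulated data (hypothetical names x, g, y), the sequential sum of squares for x depends on whether x is entered first or last, while drop1() tests each term after all the others, so its result does not depend on the order.

```r
set.seed(9)
n <- 80
x <- rnorm(n)
# Group membership is correlated with the covariate by construction
g <- factor(ifelse(x + rnorm(n) > 0, "high", "low"))
y <- 1 + x + 0.5 * (g == "high") + rnorm(n)

fit_xg <- lm(y ~ x + g)  # covariate entered first
fit_gx <- lm(y ~ g + x)  # covariate entered last

# Sequential (Type I) sums of squares for x under the two orderings
ss_x_first <- anova(fit_xg)["x", "Sum Sq"]
ss_x_last  <- anova(fit_gx)["x", "Sum Sq"]
c(x_first = ss_x_first, x_last = ss_x_last)

# Each term tested after all other terms, regardless of formula order
d1 <- drop1(fit_xg, test = "F")
d1
```

The drop1() value for x equals the sequential value obtained when x is entered last, which is one way to check your understanding of the two tables.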

How to Interpret and Report the Results of ANCOVA with R

After you have performed ANCOVA with R using the aov() function or the Anova() function, you need to interpret and report the analysis results. It involves looking at the p-values, coefficients, and effect sizes of the ANOVA table and the ANCOVA model and explaining what they mean and imply. 

You must also use clear and concise language and format and follow the APA style and guidelines. 

Look at the p-values of the ANOVA table

The p-values of the ANOVA table tell you whether the effects of the variables are statistically significant. A p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. The p-values are compared to a significance level, usually 0.05, which is the probability of making a type I error, that is, rejecting the null hypothesis when it is true.

If the p-value is less than the significance level, you can reject the null hypothesis and conclude that the effect is statistically significant. If the p-value is greater than or equal to the significance level, you cannot reject the null hypothesis, and the effect is not statistically significant. For example, the summary of the ANOVA table from the aov() function lists a p-value for each term in the Pr(>F) column:

Figure: Summary of the ANOVA table from the aov() model.
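If you want to work with the p-values in code rather than read them off the printed table, you can extract them from the summary object. This is a small sketch on simulated data with hypothetical names, in which the covariate has a real effect and the group variable does not.

```r
set.seed(21)
g <- factor(rep(c("a", "b", "c"), each = 25))
x <- rnorm(75, mean = 40, sd = 8)
y <- 5 + 0.7 * x + rnorm(75, sd = 4)   # no true group effect in this simulation

fit <- aov(y ~ x + g)
tab <- summary(fit)[[1]]

# Pull out the p-values and label them with the (trimmed) term names
p_values <- tab[["Pr(>F)"]]
names(p_values) <- trimws(rownames(tab))
p_values <- p_values[!is.na(p_values)]  # the Residuals row has no p-value

alpha <- 0.05
data.frame(p_value = signif(p_values, 3), significant = p_values < alpha)
```

The same pattern works for a model fitted to your own data; only the formula and the data argument change.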

Look at the coefficients of the ANCOVA model

The coefficients tell you the slope, or the change in the dependent variable for each unit change in an independent variable, holding the other variables constant. The coefficients are also known as the regression coefficients or the parameter estimates. For a model fitted with lm(), the summary() function lists the estimates with their standard errors, t-values, and p-values, and the confint() function provides the confidence intervals for the coefficients.
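As a sketch of what this looks like in practice, the example below fits a small additive model to simulated data. The variable names and effect sizes are assumptions chosen for illustration. The coefficient for the group is the adjusted difference from the reference level, and the coefficient for age is the common slope.

```r
set.seed(13)
g <- factor(rep(c("none", "aerobic"), each = 40), levels = c("none", "aerobic"))
age <- runif(80, min = 18, max = 65)
# Assumed effects: aerobic exercise lowers the outcome by 5 units
y <- 80 - 0.1 * age - 5 * (g == "aerobic") + rnorm(80, sd = 3)

fit <- lm(y ~ g + age)

# Estimates, standard errors, t values, and p-values
coef_tab <- coef(summary(fit))
coef_tab

# 95% confidence intervals for the same coefficients
ci <- confint(fit, level = 0.95)
ci
```

For a model fitted with aov(), calling summary.lm() on the fitted object returns the same kind of coefficient table.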

Conclusion

In this article, you learned how to perform ANCOVA with R, a simple and powerful statistical method that can help you compare the means of a continuous dependent variable across different groups while controlling for the effects of one or more continuous covariates. You learned what ANCOVA is, why it is useful, and how to perform it with R using the aov() function or the Anova() function. You also learned how to prepare your data for ANCOVA, interpret and report the results of ANCOVA, and create and customize plots to visualize the ANCOVA model and the effects of the variables.

I hope you found this article useful and informative. If you have any questions or feedback, please comment below. If you liked this article, please share it with others and help me grow my audience and reputation as a data analyst and writer. You can subscribe to my newsletter or follow me on social media for more content and updates.

Frequently Asked Questions (FAQs)

What is the difference between ANOVA and ANCOVA?

ANOVA stands for analysis of variance, a statistical method that allows you to compare the means of a continuous dependent variable across different groups. ANCOVA stands for analysis of covariance, a statistical technique that lets you compare the means of a continuous dependent variable across different groups while controlling for the effects of one or more continuous covariates. ANCOVA is an extension of ANOVA that considers the variation in the dependent variable explained by the covariates and adjusts the group means accordingly.
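The adjusted group means can be sketched in base R by predicting each group's outcome at the overall mean of the covariate. The data below are simulated with hypothetical names. The groups differ in the covariate but have no group effect of their own, so the raw means differ while the adjusted means are close together.

```r
set.seed(17)
n <- 90
g <- factor(rep(c("a", "b", "c"), each = 30))
# The covariate differs by group, so raw and adjusted means diverge
x <- rnorm(n, mean = rep(c(45, 50, 55), each = 30), sd = 5)
y <- 10 + 1.0 * x + rnorm(n, sd = 3)  # no true group effect once x is accounted for

fit <- lm(y ~ g + x)

# Raw group means ignore the covariate
raw_means <- tapply(y, g, mean)

# Adjusted means: predicted outcome for each group at the overall mean of x
newdat <- data.frame(g = factor(levels(g), levels = levels(g)), x = mean(x))
adj_means <- setNames(predict(fit, newdata = newdat), levels(g))

round(rbind(raw = raw_means, adjusted = adj_means), 2)
```

Dedicated packages can produce these adjusted means with standard errors, but the prediction approach shows where the numbers come from.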

What are the advantages and disadvantages of ANCOVA?

The advantages of ANCOVA are that it can increase the analysis's power and precision by reducing the error variance and the confounding effects of the covariates. It can also test the interaction effects between the variables and adjust the group means for the covariates. The disadvantages of ANCOVA are that it requires some assumptions and conditions to be met, such as linearity, homogeneity, and independence. It can also be sensitive to outliers and multicollinearity and can be complex and difficult to interpret and report.

What are the types of contrasts and error terms in ANCOVA?

The types of contrasts and error terms in ANCOVA are the ways of coding and testing the effects of the group variable in the ANCOVA model. The types of contrasts are the ways of assigning numerical values to the levels or categories of the group variable, such as dummy coding, effect coding, or sum-to-zero coding. What this article calls error terms are the types of sums of squares, such as type 1, type 2, or type 3, which determine how the sum of squares and the degrees of freedom of the ANOVA table are partitioned among the terms. The types of contrasts and sums of squares can affect the results and the interpretation of the ANCOVA model, and they should be chosen and justified based on the research question and the study's design.

What are the plots and the effect sizes in ANCOVA?

The plots and the effect sizes in ANCOVA are the ways of visualizing and measuring the effects of the variables in the ANCOVA model. The plots are the graphical representations of the ANCOVA model and the results of the variables, such as scatter plots, box plots, interaction plots, or regression plots. The plots can help you check the assumptions, explore the relationships, and communicate the findings of the ANCOVA model. The effect sizes are the numerical measures of the effects of the variables in the ANCOVA model, such as the eta-squared, the partial eta-squared, or the omega-squared. The effect sizes can help you quantify the magnitude and significance of the effects of the variables and compare them across different models or studies.

How do I choose the covariates for ANCOVA?

The covariates for ANCOVA should be variables related to the dependent variable but not the group variable. The covariates should also be measured before or independently of the group variable and should not be affected by the group variable. The covariates should also be continuous or categorical variables that can be converted to continuous variables. The covariates should also meet the assumptions of ANCOVA, such as linearity, homogeneity, and independence. You can use your theoretical knowledge, research question, and data exploration to select the covariates for ANCOVA.

How do I check the assumptions of ANCOVA?

The assumptions of ANCOVA are the conditions that need to be met for the validity and reliability of the analysis. The assumptions of ANCOVA are linearity, homogeneity, and independence. You can check the assumptions of ANCOVA by using various methods, such as plots, tests, and statistics. For example, you can use scatter plots and correlation coefficients to check the linearity assumption. You can use box plots and Levene's test to check the homogeneity assumption. You can use the source and design of your data and the Durbin-Watson test to check the independence assumption. You can also consult a statistician or an expert for further guidance.

How do I interpret the interaction effects in ANCOVA?

The interaction effects in ANCOVA are the effects of combining two or more variables on the dependent variable, which differ from the sum of the effects of the individual variables. The interaction effects in ANCOVA can be tested by using the * operator or the : operator in the formula of the ANCOVA model. You can interpret the interaction effects in ANCOVA by looking at the p-values, coefficients, and effect sizes of the ANOVA table and the ANCOVA model and by creating and examining the interaction plots. The interaction plots show the relationship between the dependent variable and one variable for each level of another variable. Lines that cross, diverge, or converge in the interaction plots indicate the presence and the nature of the interaction effects, whereas roughly parallel lines suggest no interaction.
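Here is a minimal sketch of an interaction plot with the base R interaction.plot() function. The data are simulated, and the effect (exercise lowers weight only for one gender) is an assumption made up for illustration.

```r
set.seed(23)
# Balanced hypothetical design: 20 observations per exercise-by-gender cell
d <- expand.grid(rep = 1:20,
                 exercise = c("none", "aerobic"),
                 gender = c("female", "male"))

# Assumed interaction: aerobic exercise lowers weight for males only
effect <- ifelse(d$exercise == "aerobic" & d$gender == "male", -8, 0)
d$weight <- 75 + effect + rnorm(nrow(d), sd = 4)

# Cell means that the plot will display
cell_means <- with(d, tapply(weight, list(exercise, gender), mean))
cell_means

# One line per gender; non-parallel lines point to an interaction
with(d, interaction.plot(x.factor = exercise, trace.factor = gender,
                         response = weight, xlab = "Exercise",
                         ylab = "Mean weight", trace.label = "Gender"))
```

In this simulation the line for one gender drops while the other stays roughly flat, which is the visual signature of an interaction.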

How do I report the results of ANCOVA in APA style?

The results of ANCOVA in APA style should include the following elements: the purpose and the research question of the analysis, the description and the summary of the data set, the specification and the justification of the ANCOVA model, the summary and the interpretation of the ANOVA table and the ANCOVA model, the discussion and the implication of the findings and the limitations of the analysis, and the references and the appendices. You should use clear, concise language and format and follow APA style and guidelines. You should also include the plots and the tables that support your results and label and caption them appropriately.

What are the alternatives to ANCOVA?

The alternatives to ANCOVA are the other statistical methods that can compare the means of a continuous dependent variable across different groups while controlling for the effects of one or more covariates. Some of the alternatives to ANCOVA are multiple regression, multivariate analysis of variance (MANOVA), multivariate analysis of covariance (MANCOVA), generalized linear models (GLM), mixed models, and structural equation models (SEM). The choice of the alternative method depends on the research question, the design of the study, the type and distribution of the variables, and the assumptions and conditions of the method.
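It may help to see that ANCOVA and multiple regression are the same linear model viewed from two angles. In this sketch on simulated data with hypothetical names, the ANOVA table from an aov() fit and the one from an lm() fit of the same formula contain the same F values.

```r
set.seed(29)
g <- factor(rep(c("a", "b", "c"), each = 20))
x <- rnorm(60)
y <- 1 + 0.5 * x + rnorm(60)

# Same formula, two fitting functions
tab_aov <- summary(aov(y ~ x + g))[[1]]
tab_lm  <- anova(lm(y ~ x + g))

# Both tables use sequential sums of squares, so the F values agree
same_f <- all.equal(tab_aov[["F value"]], tab_lm[["F value"]])
same_f
```

The other alternatives listed above change the model itself, for example by allowing several dependent variables or random effects, so they do not reduce to this comparison.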

What are the advantages and disadvantages of R for ANCOVA?

The advantages of R for ANCOVA are that R is a free and powerful software for statistical computing and graphics, that R has many packages and functions that can perform ANCOVA and related methods, that R can handle complex and large data sets, that R can create and customize high-quality plots, and that R has a large and active community of users and developers. The disadvantages of R for ANCOVA are that R has a steep learning curve, R has a complex and inconsistent syntax, R can be slow and memory-intensive, R can have compatibility and dependency issues, and R can have errors and bugs.

How do I learn more about ANCOVA and R?

You can learn more about ANCOVA and R using various resources and methods, such as books, videos, courses, blogs, podcasts, forums, and workshops. You can also learn more about ANCOVA and R by practicing and applying them to real-world data sets and problems, reviewing and replicating the works of others, seeking and providing feedback and help, and joining and participating in the online and offline communities of ANCOVA and R users and enthusiasts.





About the author

Zubair Goraya
Ph.D. Scholar | Certified Data Analyst | Blogger | Completed 5000+ data projects | Passionate about unravelling insights through data.