Data Visualization with ggplot2 | Cheat Sheet for R Users

Key Points

  • ggplot2 is an R package for building statistical graphics that make complex data easier to understand.
  • It turns raw data into clear visuals that communicate your findings.
  • It is built on a layered system: you build plots step by step, adding elements such as points, lines, and bars.
  • You can easily change colors, sizes, labels, and themes to tailor each plot.
  • Explore various graph types like scatter plots, bar charts, and heatmaps to reveal hidden trends and patterns.
  • Use the provided code snippets to visualize your data.

Visualize Your Data Like a Pro with the ggplot2 Cheat Sheet

I used to stare at spreadsheets with a mix of dread and fascination. Rows and columns held whispers of stories, but I couldn't decipher their language. Then, I discovered ggplot2. It was like finding a Rosetta Stone for data. Suddenly, patterns leaped from the screen, trends danced in vibrant hues, and insights whispered secrets I'd never imagined. 

It wasn't just about creating charts—it was about giving data a voice, a canvas to express its hidden truths. Join me as I explore the captivating world of ggplot2, where data transforms from numbers into narratives, and insights ignite a symphony of visual storytelling.


What is ggplot2?

Hadley Wickham designed ggplot2, a widely used R package for data visualization. It is based on the grammar of graphics, which provides an organized way to construct visualizations. The package includes many customization options, making it an effective tool for creating professional-quality plots.

With these customization options, you can use ggplot2 to construct a wide range of plot types, such as scatter plots, bar plots, and histograms.


Understanding the Grammar of Graphics

To make full use of ggplot2, you must first understand the concepts behind it. ggplot2 divides a plot into layers, each representing a different aspect of the visualization. It uses aesthetics to map data variables to visual features and geometries to draw them.

The Layers of ggplot2

The concept of layers is central to ggplot2, with each layer adding a new component to the plot. Building the plot layer by layer gives you fine-grained control over its appearance, and you can easily adjust individual elements.
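A minimal sketch of building a plot layer by layer, assuming the built-in mtcars dataset. The object names (p_base, p_points, p_trend) are illustrative only, and the trend line is just one possible second layer.

```r
library(ggplot2)

# Base object: holds the data and the aesthetic mappings, but draws nothing yet
p_base <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# First layer: one point per car
p_points <- p_base + geom_point()

# Second layer: a straight trend line drawn on top of the points
p_trend <- p_points + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Printing a plot object renders it
print(p_trend)
```

Because each step is stored in its own object, you can print any intermediate plot to see what a single layer contributes.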

Mapping Aesthetics and Geometries

Aesthetics govern how data variables are visually represented, such as how a variable is mapped to the x-axis or the color of points. Geometries, conversely, dictate the type of plot you generate, whether it's a point plot, a line plot, a bar plot, or another shape. 

Adjusting aesthetics and geometries lets you produce plots that are both visually appealing and useful.
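The difference between aesthetics and geometries can be sketched as follows, assuming the built-in mtcars dataset. The object names are illustrative. The first two plots contrast mapping a variable to color inside aes() with setting a fixed color outside it; the third keeps the same mapping and swaps only the geometry.

```r
library(ggplot2)

# Mapping: color depends on a data variable, so it goes inside aes()
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Setting: color is a fixed value, so it goes outside aes()
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue")

# Same x/y mapping, different geometry: a line instead of points
p_line <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_line()

print(p_mapped)
```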

ggplot2 Tutorial

Data visualization, a pivotal component of contemporary research and analysis, is made more accessible and empowering with the use of the ggplot2 package in R. Renowned for its flexibility and user-friendly interface (Jiao et al., 2019), ggplot2 has empowered researchers across diverse fields to create informative and visually captivating plots that effectively represent their findings.

In cancer research, ggplot2 has proven its worth by visualizing biomarkers for liver cancer, such as PGM5, a diagnostic and prognostic marker (Jiao et al., 2019). Similarly, in studies on breast cancer and atherosclerotic plaques, researchers have harnessed the power of ggplot2 to present their findings on potential biomarkers and essential genes (Zhai et al., 2020; Yuan, 2024). These visualizations have played a pivotal role in deciphering the role of these markers in disease progression.

Moreover, in environmental science, ggplot2 has been used to visualize data related to fungal communities in soils and the presence of pathogens like Coccidioides immitis (Wagner et al., 2023). Using ggplot2, researchers created visual representations that aided in uncovering patterns and relationships within the data, contributing to a better understanding of environmental dynamics.

Application of ggplot2

Furthermore, in computational biology and bioinformatics, ggplot2 has been instrumental in visualizing RNA isoforms, gene expression profiles, and molecular pathways (DeMario et al., 2023; Xie, 2023). These visualizations help researchers identify potential therapeutic targets and unravel complex regulatory networks, showcasing the versatility of ggplot2 across diverse biological research domains.

In public health and epidemiology, ggplot2 has been a key tool in analyzing and visualizing the spread of diseases like COVID-19 and predicting crime patterns through spatiotemporal analysis (Song, 2024; Umair et al., 2020). By creating interactive dashboards and geographical maps using ggplot2, researchers can effectively communicate trends and insights from their data, playing a crucial role in aiding policymakers in making informed decisions and helping the public understand the gravity of the situation.

Moreover, in ecological studies, ggplot2 has been employed to visualize plant demographic rates, seasonal dynamics, and successional patterns in various ecosystems (Saenz-Pedroza et al., 2020). These visual representations help ecologists study changes in plant populations over time and understand the impact of environmental factors on ecosystem dynamics.

Additionally, in the field of genetics and genomics, ggplot2 has been used to visualize transcriptomic data, gene co-expression networks, and immune repertoires obtained through next-generation sequencing (R & Jia, 2021; Cui et al., 2021; Aouinti et al., 2016). These visualizations are crucial in identifying essential genes, unraveling regulatory networks, understanding immune responses, and advancing our knowledge.

The widespread adoption of ggplot2 across diverse disciplines underscores its significance as a powerful tool for data visualization in research and inspires the potential it holds for advancing scientific knowledge. By enabling researchers to create high-quality plots and graphics, ggplot2 facilitates the effective communication of findings and enhances data interpretation, contributing significantly to advancements in various scientific fields.

How to Install ggplot2 in R and RStudio

Before starting work with ggplot2, you must install the package on your computer. The procedure is easy and takes a single command. Once installed, you can load ggplot2 into your R session and explore its features.

install.packages("ggplot2") # Install the package; you only need to run this line once
library(ggplot2) # Load the ggplot2 package in each new R session
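If you rerun a script often, you may prefer not to reinstall the package every time. A small sketch using base R's requireNamespace() to install only when ggplot2 is missing:

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package for this session
library(ggplot2)

# Show which version is loaded
print(packageVersion("ggplot2"))
```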

Basic Data Visualization with ggplot2

After installing and loading ggplot2, you can start building visualizations. The main idea in ggplot2 is the layering system, where you add components one at a time to make your plot. The examples below use mtcars, a dataset that ships with R.

# Create a scatter plot
ggplot(data = mtcars, aes(x = disp, y = hp)) +
  geom_point()

Scatter plot of disp and hp from mtcars

Customizing Your Visualizations

ggplot2 offers many customization choices for your visualizations. You can change colors and sizes, add labels and titles, and apply themes for a consistent look.

# Customize a scatter plot
ggplot(data = mtcars, aes(x = disp, y = hp, color = factor(gear))) +
  geom_point(size = 3) +
  labs(title = "Displacement vs. Horsepower", x = "disp", y = "hp", color = "gear") +
  theme_minimal()

ggplot2 customization: adding a title and legend in ggplot2

Visualizing Four Variables

Visualizing several variables at once is a strength of ggplot2. Other aesthetics, such as shape, size, and transparency, can express additional dimensions of your data.
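As a rough sketch of these extra aesthetics, the plot below maps shape, size, and color to mtcars variables and sets a fixed transparency with alpha. The choice of variables for each aesthetic is an assumption made for illustration.

```r
library(ggplot2)

# x and y are mapped in ggplot(); shape, size, and color are mapped in the layer
p_aes <- ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(
    aes(shape = factor(am), size = wt, color = factor(gear)),
    alpha = 0.6  # fixed transparency so overlapping points stay visible
  ) +
  labs(shape = "am", size = "wt", color = "gear") +
  theme_minimal()

print(p_aes)
```

Five variables (disp, hp, am, wt, gear) appear in a single panel here; facets, shown next, are often easier to read when the extra variables are categorical.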

Scatter plot with facet grid by using column (facet_grid) in ggplot2 in R

# Scatter plot with facet grid (column)
ggplot(data = mtcars, aes(x = disp, y = hp, color = factor(gear))) +
  geom_point(size = 3) + 
  facet_grid(. ~ factor(carb)) + # facet_grid() usage
  labs(title = "Displacement vs. Horsepower", x = "disp", y = "hp", color = "gear") + 
  theme_minimal()

Scatter plot with facet grid by using row (facet_grid) in ggplot2 in R

# Scatter plot with facet grid (row)
ggplot(data = mtcars, aes(x = disp, y = hp, color = factor(gear))) +
  geom_point(size = 3) +
  facet_grid(factor(carb) ~ .) + 
  labs(title = "Displacement vs. Horsepower", 
       x = "disp", y = "hp", color = "gear") + 
  theme_minimal()

Visualizing Five Variables

ggplot(data = mtcars, aes(x = disp, y = hp, color = factor(gear))) +
  geom_point(size = 3) +
  facet_grid(factor(carb) ~ factor(am),
             labeller=label_both) + # Include both row and column facets
  labs(title = "Displacement vs. Horsepower", 
       x = "disp", y = "hp", color = "gear")+
  theme_minimal()

Visualizing five variables with ggplot2 in R

Types of Graphs in ggplot2

Simple Scatterplot using ggplot2

ggplot(data = mtcars) + geom_point(aes(x = mpg, y = disp))

Simple Scatterplot using ggplot2 in R

Scatterplot With Encircling

#install.packages('ggalt')
library(ggalt)
ggplot(data = mtcars) + geom_point(aes(x = wt, y = hp)) +
  geom_encircle(aes(x = wt, y = hp), data = subset(mtcars, hp > 150))
Scatterplot With Encircling using ggalt and ggplot function of RStudio

Jitter Plot

# Map 'cyl' to the x-axis and 'mpg' to the y-axis, colored by transmission type
ggplot(data = mtcars) +
  geom_jitter(aes(x = cyl, y = mpg, color = factor(am))) +
  labs(title = "Jitter Plot of Cylinders vs. MPG",
       x = "Cylinders", y = "Miles per Gallon") # add labels

Jitter Plot using the geom_jitter function of ggplot2 in R

Bar Chart

ggplot(data = mtcars) + geom_bar(aes(x = cyl, fill = factor(am)))
Bar Chart using the geom_bar function of ggplot2 in RStudio

Bubble Plot

ggplot(data = mtcars) +
geom_point(aes(x = wt, y = hp, size = mpg))
Bubble Plot using geom_point function of ggplot2 in R

Correlogram: Visualizes the correlation matrix using color-coded tiles

# install.packages("ggcorrplot")
library(ggcorrplot)
# Compute correlation matrix
cor_matrix <- cor(mtcars)
# Create correlation plot
ggcorrplot(cor_matrix, hc.order = TRUE, type = "lower",
           lab = TRUE, lab_size = 3,
           colors = c("blue", "white", "red"),
           title = "Correlation Plot of mtcars Variables")
Correlogram: Visualizes the correlation matrix using color-coded tiles using ggcorrplot library in R

Deviation: Depicts the deviation from a baseline value.

ggplot(data = mtcars) +
  geom_point(aes(x = factor(cyl), y = mpg, color = factor(carb)), size = 3) +
  geom_hline(yintercept = mean(mtcars$mpg), color = "red", linetype = "dashed")
Deviation: Depicts the deviation from a baseline value in R
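Another common way to show deviation is a diverging bar chart, where each bar starts at the baseline. The sketch below is one possible version, assuming the mean mpg of mtcars as the baseline. The data frame car_dev and its columns model, mpg_dev, and direction are names introduced here for illustration.

```r
library(ggplot2)

# Copy mtcars and compute each car's deviation from the mean mpg
car_dev <- mtcars
car_dev$model <- rownames(mtcars)
car_dev$mpg_dev <- car_dev$mpg - mean(car_dev$mpg)
car_dev$direction <- ifelse(car_dev$mpg_dev >= 0, "above mean", "below mean")

# Bars extend to either side of zero; reorder() sorts the models by deviation
p_dev <- ggplot(car_dev,
                aes(x = reorder(model, mpg_dev), y = mpg_dev, fill = direction)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Deviation from mean mpg", fill = NULL)

print(p_dev)
```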

Other Types of Graphs

Plot Name | Code
Bar Chart | ggplot(data, aes(x = categorical_variable)) + geom_bar(stat = "count")
Box Plot | ggplot(data, aes(x = group_variable, y = numeric_variable)) + geom_boxplot()
Bubble Chart | ggplot(data, aes(x = x_variable, y = y_variable, size = bubble_size_variable)) + geom_point()
Scatter Plot | ggplot(data, aes(x = x_variable, y = y_variable)) + geom_point()
Count Chart | ggplot(data, aes(x = categorical_variable)) + geom_bar(stat = "count")
Scatter Plot with Jittered Points | ggplot(data, aes(x = x_variable, y = y_variable)) + geom_jitter()
Stacked Bar Chart | ggplot(data, aes(x = x_variable, fill = group_variable)) + geom_bar(stat = "count")
Density Plot | ggplot(data, aes(x = numeric_variable)) + geom_density()
Dot Plot | ggplot(data, aes(x = x_variable, y = y_variable)) + geom_point()
Histogram | ggplot(data, aes(x = numeric_variable)) + geom_histogram(binwidth = binwidth_value)
Ordered Bar Chart | ggplot(data, aes(x = factor(variable, levels = ordered_levels))) + geom_bar(stat = "count")
Pie Chart | ggplot(data, aes(x = "", y = numeric_variable, fill = categorical_variable)) + geom_bar(stat = "identity", width = 1) + coord_polar(theta = "y")
Scatter Plot with Facet Grid | ggplot(data, aes(x = x_variable, y = y_variable)) + geom_point() + facet_grid(rows = vars(facet_variable))
Violin Plot | ggplot(data, aes(x = group_variable, y = numeric_variable)) + geom_violin()
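The names in the table (data, x_variable, and so on) are placeholders. As a sketch of how to fill them in, here are three of the templates applied to mtcars. The object names and the binwidth of 2 are choices made for this example.

```r
library(ggplot2)

# Box plot: mpg distribution for each cylinder count
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Histogram: mpg in bins that are 2 units wide
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Pie chart: count cars per gear, then bend a stacked bar into a circle
gear_counts <- as.data.frame(table(gear = mtcars$gear))
p_pie <- ggplot(gear_counts, aes(x = "", y = Freq, fill = gear)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")

print(p_pie)
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the pie chart here follows the same recipe as the table row.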

ggplot2 Cheat Sheet

Function | Description
ggplot() | Initializes a ggplot object and specifies the dataset and aesthetic mappings.
geom_point() | Adds points to the plot for a scatter plot.
geom_bar() | Creates a bar plot.
geom_line() | Adds lines to the plot for a line plot.
geom_histogram() | Creates a histogram.
geom_boxplot() | Generates a box plot.
geom_tile() | Draws rectangular tiles, commonly used to produce a heatmap.
labs() | Sets the title, axis labels, and legend titles for the plot.
theme_minimal() | Applies a minimalistic theme to the plot.
facet_wrap() | Divides the plot into multiple panels based on a categorical variable.
scale_fill_manual() | Specifies custom colors for the fill aesthetic.
coord_flip() | Flips the x and y axes to create a horizontal plot.
facet_grid() | Organizes the plot into a grid of panels based on one or two categorical variables.
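Several functions in this table do not appear in the earlier examples. The sketch below combines scale_fill_manual(), facet_wrap(), and coord_flip() in one bar chart and uses geom_tile() for a simple heatmap, again assuming mtcars. The hex colors and object names are arbitrary choices for illustration.

```r
library(ggplot2)

# Bar chart with custom fill colors, one panel per transmission type, axes flipped
p_bars <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar() +
  scale_fill_manual(values = c("3" = "#1b9e77", "4" = "#d95f02", "5" = "#7570b3")) +
  facet_wrap(~ am, labeller = label_both) +
  coord_flip() +
  labs(x = "cyl", fill = "gear")

# Heatmap: mean mpg for each cyl/gear combination that occurs in the data
mean_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
p_heat <- ggplot(mean_mpg, aes(x = factor(cyl), y = factor(gear), fill = mpg)) +
  geom_tile() +
  labs(x = "cyl", y = "gear", fill = "mean mpg")

print(p_bars)
print(p_heat)
```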

Conclusion

We learned about ggplot2, a powerful data visualization package in R. This tool is not just about creating pretty charts but also about transforming raw data into compelling visuals that can communicate complex information effectively. We started with the installation process, so that you have the necessary tools to begin.

We then covered the basic usage of ggplot2 and the fundamental building blocks of a ggplot object. We discussed the importance of aesthetics and how to map data variables to visual properties such as color, shape, and size. By understanding the grammar of graphics, you can easily create simple yet informative plots.

Next, we ventured into the realm of customization, where you discovered how to tailor your plots to suit your needs. We explored various themes, scales, and annotations, enabling you to create polished and visually appealing visualizations. You also learned about the power of layers, which allows you to build complex plots by combining multiple data layers.

We also explored more advanced techniques such as facets and legends. These tools allow you to organize and present your data in an informative and visually striking way.

Frequently Asked Questions (FAQs)

What is ggplot2 used for in R?

ggplot2 is a powerful data visualization package in R used for creating and customizing a wide range of statistical graphics. It provides a flexible and layered approach to constructing plots, allowing users to visualize complex data structures and relationships easily.

How to plot with ggplot2 in R?

To plot with ggplot2 in R, you must first load the package using the command library(ggplot2). Then, you construct a plot by passing the data and the aesthetics (the mapping of variables to visual properties) to the ggplot() function and adding layers to it.

Geometric shapes are added with the geom_* functions and statistical transformations with the stat_* functions. Finally, you can customize the plot's appearance by adjusting axes, labels, titles, and themes.

What are the 3 components of ggplot2?

The three main components of ggplot2 are:
  • Data: The dataset or data frame containing the variables to be visualized.
  • Aesthetics (aes): The mapping between variables and visual properties such as position, color, size, shape, etc.
  • Geometric objects (geoms): The graphical shapes or elements representing the data, such as points, lines, bars, etc.
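One way to see these three components is to build a small plot and look inside the resulting object. This is a sketch for exploration only: it assumes that the data, mapping, and layers elements of a plot object can be read with $, which is how ggplot objects have traditionally behaved.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# 1. Data: the data frame passed to ggplot()
print(head(p$data, 3))

# 2. Aesthetics: the variable-to-property mapping created by aes()
print(names(p$mapping))

# 3. Geometric object: the geom used by the first (and only) layer
print(class(p$layers[[1]]$geom)[1])
```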

How to include ggplot2?

To include ggplot2 in your R code, you must install the ggplot2 package if you have not already done so. You can install it by running install.packages("ggplot2") in the R console. After installation, you can load the package using the command library(ggplot2) to access the functions and capabilities provided by ggplot2.

Why is ggplot2 important?

ggplot2 is important because it offers a highly flexible and intuitive framework for creating visually appealing and informative data visualizations in R. It allows users to easily represent complex relationships, patterns, and trends in their data. 

With its layered graphics grammar, ggplot2 provides a consistent and powerful approach to visualizing diverse data types, making it a preferred choice for data analysts and researchers.

What library is used for ggplot?

The library used for ggplot is called "ggplot2". It is a widely used data visualization package in R developed by Hadley Wickham. By loading the ggplot2 library, users can access the functions and syntax specific to ggplot2 for creating and customizing visualizations.

What is df in ggplot?

In ggplot, "df" typically refers to a data frame, a tabular data structure in R. A data frame is a standard format for organizing and storing data, where each column represents a variable, and each row represents an observation. In ggplot, data frames are often used as the input data for creating visualizations.
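As a small illustration, the sketch below builds a data frame by hand and passes it to ggplot(). The month and sales values are made up for this example and do not come from any real dataset.

```r
library(ggplot2)

# A hand-made data frame: each column is a variable, each row an observation
# (values are invented for illustration)
df <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  sales = c(120, 95, 140)
)

# Keep the months in calendar order instead of alphabetical order
df$month <- factor(df$month, levels = c("Jan", "Feb", "Mar"))

# The data frame is the first argument to ggplot()
p_df <- ggplot(df, aes(x = month, y = sales)) +
  geom_col()

print(p_df)
```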

How to name a plot in ggplot2?

To name a plot in ggplot2, you can use the labs() function to specify the desired names for the plot title, x-axis, and y-axis labels. For example, labs(title = "My Plot", x = "X-axis", y = "Y-axis") sets the title and axis labels accordingly.
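A short sketch of labs() in use, assuming mtcars. Besides title, x, and y, labs() also accepts subtitle and caption; the label text below is only an example.

```r
library(ggplot2)

p_named <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Fuel Economy by Weight",
    subtitle = "Each point is one car model",
    caption = "Data: mtcars",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  )

print(p_named)
```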

How to plot data in R code?

To plot data in R code, you can use various plotting functions and libraries available in R. One popular option is ggplot2, which provides a powerful and flexible framework for creating customized plots. To plot data using ggplot2, specify the data, aesthetics, and geometric objects used.


References:

  • Aouinti, S., Giudicelli, V., Duroux, P., Malouche, D., Kossida, S., & Lefranc, M. (2016). IMGT/StatClonotype for pairwise evaluation and visualization of NGS IG and TR IMGT clonotype (AA) diversity or expression from IMGT/HighV-QUEST. Frontiers in Immunology, 7. https://doi.org/10.3389/fimmu.2016.00339
  • Cui, Z., Li, Y., He, S., Wen, F., Xu, X., Liu, L., … & Wu, S. (2021). Key candidate genes – VSIG2 of colon cancer identified by weighted gene co-expression network analysis. Cancer Management and Research, 13, 5739-5750. https://doi.org/10.2147/cmar.s316584
  • DeMario, S., Xu, K., He, K., & Chanfreau, G. (2023). Nanoblot: an R-package for visualization of RNA isoforms from long-read RNA-sequencing data. RNA, 29(8), 1099-1107. https://doi.org/10.1261/rna.079505.122
  • Jiao, Y., Li, Y., Jiang, P., Han, W., & Liu, Y. (2019). PGM5: a novel diagnostic and prognostic biomarker for liver cancer. PeerJ, 7, e7070. https://doi.org/10.7717/peerj.7070
  • R, L. & Jia, Z. (2021). Pcadb - a comprehensive and interactive database for transcriptomes from prostate cancer population cohorts. https://doi.org/10.1101/2021.06.29.449134
  • Saenz-Pedroza, I., Feldman, R., Reyes-García, C., Meave, J., Calvo-Irabién, L., May-Pat, F., … & Dupuy, J. (2020). Seasonal and successional dynamics of size-dependent plant demographic rates in a tropical dry forest. PeerJ, 8, e9636. https://doi.org/10.7717/peerj.9636
  • Song, H. (2024). K-Track-Covid: interactive web-based dashboard for analyzing geographical and temporal spread of COVID-19 in South Korea. Frontiers in Public Health, 12. https://doi.org/10.3389/fpubh.2024.1347862
  • Umair, A., Sarfraz, M., Habib, U., Ullah, M., & Mazzara, M. (2020). Spatiotemporal analysis of web news archives for crime prediction. Applied Sciences, 10(22), 8220. https://doi.org/10.3390/app10228220
  • Wagner, R., Montoya, L., Head, J., Campo, S., Remais, J., & Taylor, J. (2023). Coccidioides undetected in soils from agricultural land and uncorrelated with time or the greater soil fungal community on undeveloped land. PLOS Pathogens, 19(5), e1011391. https://doi.org/10.1371/journal.ppat.1011391
  • Xie, R. (2023). Identification of potential therapeutic target SPP1 and related RNA regulatory pathway in keloid based on bioinformatics analysis. https://doi.org/10.21203/rs.3.rs-3008440/v2
  • Yuan, Y. (2024). Identification of M2 macrophage-related key genes in advanced atherosclerotic plaques by network-based analysis. Journal of Cardiovascular Pharmacology, 83(3), 276-288. https://doi.org/10.1097/fjc.0000000000001528
  • Zhai, X., Yang, Z., Liu, X., Dong, Z., & Zhou, D. (2020). Identification of NUF2 and FAM83D as potential biomarkers in triple-negative breast cancer. PeerJ, 8, e9975. https://doi.org/10.7717/peerj.9975


Do you need help with a data analysis project? Let me assist you! With a PhD and ten years of experience, I specialize in solving data analysis challenges using R and other advanced tools. Reach out to me for personalized solutions tailored to your needs.

About the author

Zubair Goraya
Ph.D. Scholar | Certified Data Analyst | Blogger | Completed 5000+ data projects | Passionate about unravelling insights through data.