To visualize a small data set containing multiple categorical (or qualitative) variables, you can create a bar plot, a balloon plot or a mosaic plot.
For large multivariate categorical data, you need specialized statistical techniques dedicated to categorical data analysis, such as simple and multiple correspondence analysis. These methods make it possible to analyze and visualize the association (the categorical counterpart of correlation) between a large number of qualitative variables.
Here, you’ll see some examples of graphs, in the R programming language, for visualizing the frequency distribution of categorical variables contained in small contingency tables. We also provide the R code for computing a simple correspondence analysis.
Load required R packages and set the default theme:
library(ggplot2)
library(ggpubr)
theme_set(theme_pubr())
Demo data set: HairEyeColor (distribution of hair and eye color and sex in 592 statistics students)
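data("HairEyeColor")
# Convert the 3-way table to a long data frame: one row per Hair/Eye/Sex cell
df <- as.data.frame(HairEyeColor)
head(df)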
##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50
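Before plotting, it can help to check that the long format still matches the original table. The following is a minimal base R sketch, not part of the original workflow; the name `hair_eye` is only illustrative.

```r
data("HairEyeColor")
df <- as.data.frame(HairEyeColor)

# Total count should equal the number of students in the data set
sum(df$Freq)

# Collapse over Sex to get a two-way Hair x Eye contingency table
hair_eye <- xtabs(Freq ~ Hair + Eye, data = df)
hair_eye

# Row proportions: distribution of eye color within each hair color
round(prop.table(hair_eye, margin = 1), 2)
```

Row proportions like these are what a stacked or dodged bar plot makes visible, so they are a useful sanity check on the figure.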
ggplot(df, aes(x = Hair, y = Freq)) +
  geom_bar(
    aes(fill = Eye), stat = "identity", color = "white",
    position = position_dodge(0.9)
  ) +
  facet_wrap(~Sex) +
  fill_palette("jco")
A balloon plot is an alternative to a bar plot for visualizing larger categorical data sets. We’ll use the function ggballoonplot() [in ggpubr], which draws a graphical matrix of a contingency table, where each cell contains a dot whose size reflects the relative magnitude of the corresponding cell value.
Demo data set: housetasks (a contingency table giving how often each of 13 household tasks is done by the wife, by the husband, alternately or jointly).
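data("housetasks", package = "factoextra")  # assumption: load the copy shipped with factoextra
head(housetasks, 4)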
##            Wife Alternating Husband Jointly
## Laundry     156          14       2       4
## Main_meal   124          20       5       4
## Dinner       77          11       7      13
## Breakfeast   82          36      15       7
ggballoonplot(housetasks, fill = "value") +
  scale_fill_viridis_c(option = "C")
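To see what a balloon plot encodes, here is a rough hand-built approximation using plain ggplot2. It is only a sketch and not how ggballoonplot() is implemented: it retypes the four rows printed above into a matrix, and the names `tasks`, `long`, `Task` and `Who` are made up for the illustration.

```r
library(ggplot2)

# The four rows shown in the housetasks output above, entered by hand
tasks <- matrix(
  c(156, 14,  2,  4,
    124, 20,  5,  4,
     77, 11,  7, 13,
     82, 36, 15,  7),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Task = c("Laundry", "Main_meal", "Dinner", "Breakfeast"),
    Who  = c("Wife", "Alternating", "Husband", "Jointly")
  )
)

# Long format: one row per cell of the contingency table
long <- as.data.frame(as.table(tasks), responseName = "Freq")

# One dot per cell; both size and fill are mapped to the cell count
p <- ggplot(long, aes(x = Who, y = Task)) +
  geom_point(aes(size = Freq, fill = Freq), shape = 21) +
  scale_fill_viridis_c(option = "C")
p
```

The key step is the reshaping to long format: once each cell is a row, dot size and fill are ordinary aesthetic mappings.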
A mosaic plot is basically an area-proportional visualization of observed frequencies, composed of tiles (corresponding to the cells) created by recursive vertical and horizontal splits of a rectangle. The area of each tile is proportional to the corresponding cell entry, given the dimensions of previous splits.
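The tile geometry described above can be checked numerically. The sketch below, in base R, uses the Hair by Eye table (collapsed over Sex) and assumes Hair is the first split and Eye the second; the variable names are illustrative only.

```r
data("HairEyeColor")
tab <- margin.table(HairEyeColor, c(1, 2))   # Hair x Eye counts

joint  <- prop.table(tab)                    # share of each cell in the total
width  <- rowSums(joint)                     # first split: share of each hair color
height <- prop.table(tab, margin = 1)        # second split: eye color within hair color

# Tile area = width of the strip * height of the tile inside that strip
area <- sweep(height, 1, width, "*")

# The areas reproduce the joint proportions, i.e. they are proportional to the counts
all.equal(as.vector(area), as.vector(joint))
```

This is the sense in which each tile's area is proportional to its cell entry "given the dimensions of previous splits": the second split uses conditional proportions, not raw ones.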
A mosaic plot can be created using either the function mosaicplot() [in graphics] or the function mosaic() [in the vcd package]. Read more at: Visualizing Multi-way Contingency Tables with vcd.
Example of mosaic plot:
library(vcd)
mosaic(HairEyeColor, shade = TRUE, legend = TRUE)
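If vcd is not installed, the base graphics function mosaicplot() mentioned earlier draws a comparable figure. A minimal sketch follows; the title and the `las` setting are arbitrary choices for the example.

```r
data("HairEyeColor")

# shade = TRUE colors the tiles by the residuals of an independence model
mosaicplot(HairEyeColor, shade = TRUE, main = "HairEyeColor", las = 1)
```

The layout and legend differ from the vcd version, so the two figures will not look identical.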
Correspondence analysis can be used to summarize and visualize the information contained in a large contingency table formed by two categorical variables.
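To make the idea concrete, here is a minimal base R sketch of what a simple correspondence analysis computes, using a singular value decomposition of the standardized residuals. It uses the Hair by Eye table from HairEyeColor as an assumed example input, and it is a textbook-style illustration rather than the code used inside any particular package.

```r
data("HairEyeColor")
N <- unclass(margin.table(HairEyeColor, c(1, 2)))  # Hair x Eye counts as a matrix
n <- sum(N)

P  <- N / n          # correspondence matrix (cell proportions)
r  <- rowSums(P)     # row masses
cm <- colSums(P)     # column masses

# Standardized residuals from the independence model r %o% cm
S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
sv <- svd(S)

# Inertia per dimension; the total equals the chi-square statistic divided by n
inertia       <- sv$d^2
total_inertia <- sum(inertia)
round(100 * inertia / total_inertia, 1)   # percentage of inertia per dimension

# Principal coordinates of the rows and columns
row_coord <- sweep(sv$u, 1, sqrt(r),  "/") %*% diag(sv$d)
col_coord <- sweep(sv$v, 1, sqrt(cm), "/") %*% diag(sv$d)
rownames(row_coord) <- rownames(N)
rownames(col_coord) <- colnames(N)
```

Plotting the first two columns of `row_coord` and `col_coord` on the same axes gives the usual CA biplot, where categories that occur together more often than expected are drawn in similar directions.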
Required packages: FactoMineR for the analysis and factoextra for the visualization.
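library(FactoMineR)
library(factoextra)
# Simple CA of the housetasks table (assumed to be the intended input)
res.ca <- CA(housetasks, graph = FALSE)
fviz_ca_biplot(res.ca, repel = TRUE)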
From the graphic above, it’s clear that: