geom_rect() + geom_coord() + {ggiraph}
I manually defined coordinates for geom_rect()
to create custom sunbursts with ggplot()
. Pair this with {ggiraph}
package to create an awesome interactive graphic!
Sunbursts in R first caught my eye from the sunburstR
package with its amazing interactivity through d3
. However, I had trouble customizing it and sought a ggplot
alternative. You can create a version using geom_col
, as shown in this RStudio Community post; I chose to go the more complex geom_rect
route for more customization options.1
pretty pumped I got this custom "sunburst"-esque figure to work!! #ggplot #rstats pic.twitter.com/L88IQWoVCs
— Shannon Pileggi (@PipingHotData) January 10, 2020
To illustrate the concept on a publicly available data set, I am using the May 2021 #TidyTuesday
salary data. Here, Iâll be looking at the top 5 industries and their top 5 job titles by number of respondents. I donât think this visualization is necessarily the best for this data set, I am just using this data as a proof of concept.
The data attributes that work well for this visualization include:
4 to 6 categories, with associated weight or frequency of occurrence
4 to 6 sub-items within each category, with associated weight or frequency of occurrence
optional: a metric for the sub-items
This material was developed using:
Software / package | Version |
---|---|
R | 4.0.5 |
RStudio | 1.4.1103 |
tidyverse |
1.3.1 |
ggiraph |
0.7.8 |
glue |
1.4.2 |
pals |
1.6 |
First, import the data from GitHub.
# import data
survey_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-18/survey.csv')
Next, summarize the data in terms of the top 5 industries and top 5 job titles. For the job title, weâll get (1) the percent of respondents with that title, and (2) the median salary for that title.
survey_summary <- survey_raw %>%
# identify top industries
group_by(industry) %>%
mutate(# number of respondents per industry
n_industry = n()) %>%
ungroup() %>%
arrange(desc(n_industry), industry) %>%
mutate(
# rank of industry in terms of number of survey respondents
# know a better way to do this bit? pls let me know!
n_industry_rank = as.numeric(factor(-n_industry)),
# very light cleaning of job title
job_title = tolower(job_title) %>%
str_remove_all("[:digit:]+") %>%
ifelse(. == "rn", "registered nurse", .) %>%
str_trim()
) %>%
# reduce to top 5 industries present
filter(n_industry_rank %in% 1:5) %>%
# identify top 5 job titles in each industry
group_by(n_industry_rank, n_industry, industry, job_title) %>%
summarize(# number of titles within industry
n_title = n(),
# median salary per title
median_salary = median(annual_salary)) %>%
ungroup() %>%
arrange(n_industry_rank, industry, desc(n_title)) %>%
group_by(industry) %>%
# keep top 5 job titles within industry
slice(1:5) %>%
# recompute number of respondents per industry based on titles selected
mutate(n_industry = sum(n_title)) %>%
ungroup() %>%
mutate(
# relative scale of salary for plotting, on 0 - 100 scale
median_salary_scaled = median_salary / max(median_salary) * 100,
# base n respondents
n_total = sum(n_title),
# percent of respondents per title
pct_title = n_title / n_total * 100,
pct_industry = n_industry / n_total * 100,
# unique job title, as some repeat across industry
title_unique = glue::glue("{str_sub(industry, 1, 3)}: {job_title}")
) %>%
arrange(desc(pct_industry, pct_title))
Here is a preview of our data, which is the basis needed for the sunburstâs custom rectangles.
Pick a color palette for industry, and name the color vector.
colors_industry
Computing or Tech
"#830CAB"
Education (Higher Education)
"#6574FA"
Nonprofits
"#3BBCAC"
Health care
"#62E44A"
Government and Public Administration
"#CCEC5A"
Next, I manually define rectangles for the outer ring of the sunburst. I figured out the bits to create custom rectangles from various blog posts that I did not save. It could have been this post on variable width column charts. I got tips on how to angle text for labels from the R Graph Galleryâs circular barplots; sample code is shown below but not implemented in the figure.
The idea of the rectanglesâ color assignment is that darker shades represent more frequent or heavier weight items within a category. To create the colors, I concocted a solution to implement a degree of transparency off of the original industry colors based on the total number of sub-categories. Jake Riley shared with me an alternative solution for sub-category color assignment that he contributed on Stack Overflow.
ring_outer <- survey_summary %>%
# join colors to data frame ----
left_join(enframe(colors_industry, name = "industry", value = "color_industry"),
by = "industry") %>%
# arrange descending industry but ascending title
# to compute title colors
arrange(desc(n_industry), industry, n_title, job_title) %>%
group_by(industry) %>%
mutate(# enumerate title in industry
id_title = row_number(),
# number of titles per industry
# this can vary - it does not have to be same number per category
num_title = max(id_title)) %>%
ungroup() %>%
mutate(
# degree of transparency based on number of attributes ----
color_trans = id_title / num_title ,
color_title = map2_chr(color_industry, color_trans, ~ adjustcolor(.x, .y))
) %>%
# this arrange specifies the ordering of the figure
# clockwise ---
# had to include job title here as health care had ties in title frequency
arrange(-pct_industry, industry,-pct_title, desc(job_title)) %>%
# counter clockwise ---
# arrange(pct_industry, industry, pct_title, job_title) %>%
mutate(industry = fct_inorder(industry),
title_unique = fct_inorder(title_unique)) %>%
group_by(industry) %>%
mutate(cum_title_pct = cumsum(pct_title)) %>%
ungroup() %>%
mutate(cum_industry_pct = cumsum(cum_title_pct),
cum_all = cumsum(pct_title)) %>%
mutate(
# compute coordinates of the rectangles ----
# the 0.3 is an adjustment on the 0 to 100 scale to add a small
# amount of white space between rectangles
rect_x_max = cumsum(pct_title) - 0.3,
# xmax ----
rect_x_min = rect_x_max - pct_title + 0.3,
# xmin ----
rect_x_mid = (rect_x_min + rect_x_max) / 2,
# xmid ----
# angles in case adding text to plot, not shown in figures below ----
angle = 90 - 360 * rect_x_mid / 100,
hjust = ifelse(angle < -90, 1, 0),
angle = ifelse(angle < -90, angle + 180, angle),
# label for ggiraph
label_title = glue::glue(
'{scales::percent(pct_industry/100, accuracy = 1)} ({n_industry}/{n_total}) {industry}\n {scales::percent(pct_title/100, accuracy = 1)} ({n_title}/{n_total}) {job_title} \n {scales::dollar(median_salary)} median salary'
)
)
# view sunburst data
ring_outer %>%
mutate_at(vars(pct_industry,pct_title, median_salary_scaled, rect_x_max, rect_x_min), scales::number, accuracy = 0.1) %>%
dplyr::select(industry, n_industry, pct_industry, job_title, n_title, pct_title, id_title, num_title, color_trans, color_title, rect_x_max, rect_x_min, median_salary, median_salary_scaled) %>%
DT::datatable()
Combine all industry and title colors in a single vector.
colors_title <- ring_outer[["color_title"]] %>%
set_names(ring_outer[["title_unique"]])
all_colors <- c(colors_industry, colors_title)
Lastly in data prep, we create a smaller data set of rectangle coordinates just for the industries. The requires a dummy extra row to create white space between the first and last categories.
# create an inner ring for industry ----
# there is probably a better way to do this too, please let me know
# if you have suggestions to streamline!
ring_inner <- ring_outer %>%
dplyr::select(industry,
pct_industry,
n_industry,
n_total,
rect_x_max,
rect_x_min) %>%
# first, get stopping point for each rectangle from least common item ----
arrange(industry, desc(rect_x_max)) %>%
group_by(industry) %>%
# keep one entry per industry ----
slice(1) %>%
ungroup() %>%
dplyr::select(industry, pct_industry, n_industry, n_total, stop = rect_x_max) %>%
# second, get starting point for each rectangle from most common item ----
left_join(
ring_outer %>%
dplyr::select(
industry,
pct_industry,
n_industry,
n_total,
rect_x_max,
rect_x_min
) %>%
arrange(industry, rect_x_max) %>%
group_by(industry) %>%
slice(1) %>%
ungroup() %>%
dplyr::select(industry, pct_industry, n_industry, n_total, start = rect_x_min),
by = c("industry", "pct_industry", "n_industry", "n_total")
) %>%
# insert blank row for extra white space where circle starts/stops ---
bind_rows(tibble(
attr_category = NA,
start = 99.7,
stop = 100.1
)) %>%
mutate(
# compute midpoint for labeling if needed ----
mid = (stop - start) / 2 + start,
# a label for industry pct ----
label_industry = ifelse(
is.na(industry),
NA,
glue::glue(
'{scales::percent(pct_industry/100, accuracy = 1)} ({n_industry}/{n_total}) {industry} '
)
)
)
Finally, we can plot! This first figure has variable length bars, which are represented by the scaled median salary. The outer ring goes from a height of 0 up to a height of the scaled median salary, which is 100 here. Adding extra negative space to the ylim
creates the donut in the middle.
# outer ring for job title
ring_outer %>%
ggplot() +
geom_rect(
aes(
ymin = 1,
# variable length bars
ymax = median_salary_scaled,
xmin = rect_x_min,
xmax = rect_x_max,
fill = title_unique
),
show.legend = FALSE
) +
coord_polar() +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid = element_blank(),
legend.position = "none",
) +
# fiddle lower limit here to control donut size
ylim(-40, 100) +
scale_fill_manual(values = all_colors) +
# inner industry ring ----
geom_rect(data = ring_inner,
aes(
# the inner ring spans a height from 0 to -20
ymin = -20,
ymax = 0,
xmin = start,
xmax = stop,
fill = industry
))
Well, it is an example! Not quite as aesthetically appealing as my original example shown in the tweet, but it will do to put code to the page.
Here, we see that the Computing or Tech industry had the most respondents, and within that industry the most common job title was software engineer. The median salary of software engineer was $125,000, which was less than the next most common job title of senior software engineer, with a median salary of $151,500.
Although I think this figure can be eye-catching, it is challenging to make it informative with labeling and still maintain appeal. For this reason, I have omitted labels.
For a simpler alternative, drop the job title metric of median salary and only show the frequency of industry and title in survey respondents by fixing the rectangle height. This figure can effectively communicate parts of a whole.
# outer ring for job title
ring_outer %>%
ggplot() +
geom_rect(
aes(
ymin = 1,
# fix rectangle height here
ymax = 20,
xmin = rect_x_min,
xmax = rect_x_max,
fill = title_unique
),
show.legend = FALSE
) +
coord_polar() +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid = element_blank(),
legend.position = "none",
) +
# fiddle lower limit here to control donut size
# reduce max lim as well
ylim(-40, 30) +
scale_fill_manual(values = all_colors) +
# inner industry ring ----
geom_rect(data = ring_inner,
aes(
# the inner ring spans a height from 0 to -20
ymin = -20,
ymax = 0,
xmin = start,
xmax = stop,
fill = industry
))
With some very small tweaks to incorporate the {ggiraph} package, we can transform this to an interactive sunburst!
Hover over figure to view summary statistics!
# outer ring for job title
gg_interactive_variable <- ring_outer %>%
ggplot() +
# change 1: use geom_rect_interactive function from ggiraph
ggiraph::geom_rect_interactive(
aes(
ymin = 1,
# variable length bars
ymax = median_salary_scaled,
xmin = rect_x_min,
xmax = rect_x_max,
fill = title_unique,
# change 2: add id for hover over tooltip ---
tooltip = label_title
),
show.legend = FALSE
) +
coord_polar() +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid = element_blank(),
legend.position = "none",
) +
# fiddle lower limit here to control donut size
# reduce max lim as well
ylim(-40, 100) +
scale_fill_manual(values = all_colors) +
# change 3: add interactivity to inner ring ----
ggiraph::geom_rect_interactive(
data = ring_inner,
aes(
# the inner ring spans a height from 0 to -20
ymin = -20,
ymax = 0,
xmin = start,
xmax = stop,
fill = industry,
# change 4: add tooltip to inner ring
tooltip = label_industry
))
# change 5: create interactive version using ggirafe
girafe(ggobj = gg_interactive_variable,
# this option fills tooltip background with same color from figure
options = list(opts_tooltip(use_fill = TRUE)))
Hover over figure to view summary statistics!
# outer ring for job title
gg_interactive_fixed <- ring_outer %>%
ggplot() +
# change 1: use geom_rect_interactive function from ggiraph
ggiraph::geom_rect_interactive(
aes(
ymin = 1,
# fix rectangle height here
ymax = 20,
xmin = rect_x_min,
xmax = rect_x_max,
fill = title_unique,
# change 2: add id for hover over tooltip ---
tooltip = label_title
),
show.legend = FALSE
) +
coord_polar() +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid = element_blank(),
legend.position = "none",
) +
# fiddle lower limit here to control donut size
ylim(-40, 30) +
scale_fill_manual(values = all_colors) +
# change 3: add interactivity to inner ring ----
ggiraph::geom_rect_interactive(
data = ring_inner,
aes(
# the inner ring spans a height from 0 to -20
ymin = -20,
ymax = 0,
xmin = start,
xmax = stop,
fill = industry,
# change 4: add tooltip to inner ring
tooltip = label_industry
))
# change 5: create interactive version using ggirafe
girafe(ggobj = gg_interactive_fixed,
# this option fills tooltip background with same color from figure
options = list(opts_tooltip(use_fill = TRUE)))
Like this figure? Use it in your own work? Or have suggestions for improvement in either code or the graphic? Please, let me know!
I did experiment with some variations on this figure that did not make the cut!2
Monday's work đ¤đĨ° #rstats #ggplot pic.twitter.com/GbZWLhLeDA
— Shannon Pileggi (@PipingHotData) February 4, 2020
Alt text for tweet: Thin inner ring with solid shades shows relative frequency of broader categories. Outer ring juts out - shades and width indicate relative frequency of items within category; bar length indicates a metric on item.âŠī¸
Alt text: Left hand side: 4 year old coloring letter âOâ, in rainbow colors and outside the line; right hand side: figure created from ggplot with shape of letter âOâ in rainbow colors with rectangles both inside and outside the âOâ.âŠī¸
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Pileggi (2021, June 1). PIPING HOT DATA: Custom interactive sunbursts with ggplot in R. Retrieved from https://www.pipinghotdata.com/posts/2021-06-01-custom-interactive-sunbursts-with-ggplot-in-r/
BibTeX citation
@misc{pileggi2021custom, author = {Pileggi, Shannon}, title = {PIPING HOT DATA: Custom interactive sunbursts with ggplot in R}, url = {https://www.pipinghotdata.com/posts/2021-06-01-custom-interactive-sunbursts-with-ggplot-in-r/}, year = {2021} }