ggplot2 gallery: tags summary
2026-04-30
Following a misuse of tags myself, as a new ggplot2 extender, I decided
to take a deeper dive into tags in the ggplot2 extension gallery.
This full Quarto doc can be found here.
_config.yml lists every extension with a free-text tags field. Because the field is unconstrained, it is nearly useless for meaningful discovery.
TL;DR
- 158 packages in the extension gallery.
- 193 unique tags after lower-casing (one case-only collision was
Visualization vs visualization).
- On average, each package has 3.15 tags (median = 3, SD = 1.58).
ggDNAvis leads with 12 unique tags.
- visualization is the number one tag with 145 appearances;
general follows with 62. Together they account for 42% of all
tag uses but carry essentially no information.
- The tag distribution is extremely long-tailed: 148 of 193 tags (77%)
are used by exactly one package, and only 20 tags reach n ≥ 3.
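For readers who want to see how numbers like these can be derived, here is a minimal sketch in Python. It is not the analysis script from the .qmd file. The `packages` dictionary is made-up toy data, and `summarise_tags` is a hypothetical helper name, so the figures it produces are illustrative only.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data: package name -> raw tags.
# These are NOT the real gallery entries.
packages = {
    "pkgA": ["Visualization", "general"],
    "pkgB": ["visualization", "time series", "theme"],
    "pkgC": ["visualization", "DNA"],
}


def summarise_tags(packages):
    """Return basic tag statistics after trimming and lower-casing."""
    # De-duplicate within each package once case differences are removed.
    cleaned = {
        name: sorted({tag.strip().lower() for tag in tags})
        for name, tags in packages.items()
    }
    # Count in how many packages each tag appears.
    counts = Counter(tag for tags in cleaned.values() for tag in tags)
    per_package = [len(tags) for tags in cleaned.values()]
    singletons = [tag for tag, n in counts.items() if n == 1]
    return {
        "n_packages": len(cleaned),
        "n_unique_tags": len(counts),
        "mean_tags_per_package": mean(per_package),
        "median_tags_per_package": median(per_package),
        "top_tag": counts.most_common(1)[0],
        "singleton_share": len(singletons) / len(counts),
    }


summary = summarise_tags(packages)
print(summary)
```

On the toy data, four of the five tags are used by a single package, which is the same long-tail pattern in miniature.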
What is wrong with the current tags
The raw .qmd file contains an object tags_in_pkgs with the summary
information. Download and explore it for a better understanding.
- Case / spelling / number variants that should clearly merge:
Visualization vs visualization vs visualisation vs
visualizations; geom vs geoms; time series vs time-series;
theme vs themes; facet vs facets; outlier vs outliers;
customisable vs customizable; algorithm vs algorithms.
- Filler tags that carry no signal. general is applied by 62
packages and tells a reader nothing. visualization is applied by
145 packages, i.e., almost all of them, so it is not a discriminator
either. I mean, this IS a ggplot2 gallery after all.
- Specialized niche vocabulary. Several packages invent their own
private taxonomy. Examples:
gganatogram: anatograms, tissue, anatomy, expression, pharmacology
ggDNAvis: DNA, RNA, customisable, customizable, medicine, methylation, sequence, FASTQ, ...
(12 tags, all unique to this package).
ggblend: blending, affine transformation, layer algebra, compositing (4 tags, all unique).
These are accurate descriptions but they cannot help anyone find the package, because no other entry uses the same words.
- Redundancy within a single entry. ggoutlierscatterplot lists
both outlier and outliers, and both algorithm and algorithms.
ggDNAvis lists both customisable and customizable.
Proposed unification
The core strategy is to reduce and unify tags to clusters that provide
actual information:
- Normalise all tags to lower-case and trim whitespace.
- Collapse spelling / case / number variants to a single canonical form.
- Group small related tags (n < 3) under a topical umbrella when a
clear cluster exists.
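The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for this post. The `CANONICAL` and `UMBRELLA` maps are hand-written examples, the choice of which variant counts as canonical (for instance time-series over time series) is arbitrary, and `normalise_tags` is a hypothetical helper name.

```python
import re

# Hypothetical variant -> canonical form map, built from the variants
# listed in this post. Which spelling "wins" is an arbitrary choice here.
CANONICAL = {
    "visualisation": "visualization",
    "visualizations": "visualization",
    "geoms": "geom",
    "time series": "time-series",
    "themes": "theme",
    "facets": "facet",
    "outliers": "outlier",
    "customisable": "customizable",
    "algorithms": "algorithm",
}

# Hypothetical umbrella groups for rare tags (illustrative, not a proposal).
UMBRELLA = {
    "dna": "life-sciences",
    "rna": "life-sciences",
    "methylation": "life-sciences",
    "anatomy": "life-sciences",
}


def normalise_tags(raw_tags, gallery_counts, min_count=3):
    """Clean one package's tags.

    gallery_counts maps a canonical tag to the number of packages using it;
    tags used by fewer than min_count packages may move under an umbrella.
    """
    result = []
    for tag in raw_tags:
        # Step 1: collapse inner whitespace, trim, lower-case.
        cleaned = re.sub(r"\s+", " ", tag).strip().lower()
        # Step 2: collapse spelling / number variants.
        cleaned = CANONICAL.get(cleaned, cleaned)
        # Step 3: rare tags go under an umbrella when one is defined.
        if gallery_counts.get(cleaned, 0) < min_count:
            cleaned = UMBRELLA.get(cleaned, cleaned)
        # Keep first occurrence only, preserving order.
        if cleaned and cleaned not in result:
            result.append(cleaned)
    return result


# Toy counts, not the real gallery numbers apart from visualization.
counts = {"visualization": 145, "dna": 1, "outlier": 1}
example = normalise_tags(
    ["Visualization", " visualisation ", "DNA", "outlier", "outliers"],
    counts,
)
print(example)  # ['visualization', 'life-sciences', 'outlier']
```

Rare tags without a clear cluster, such as outlier here, are left as they are, which matches the "when a clear cluster exists" condition.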
Better control in the future
- Drop or rename general and visualization. Both are applied
broadly enough that they fail to discriminate between packages and
add little to no signal.
- Encourage informative, topic-based tagging. Publish a suggested
vocabulary in the contributing guide, organized by topic
(life-sciences, spatial, time-series, distributions, etc.). Help us
help the potential future user.
- Tidymodels has a reactive table. Consider using one for the gallery too?
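One way such a vocabulary could be enforced is a small check that runs on new gallery entries. The sketch below is only an idea of what that might look like in Python. The `VOCABULARY` set reuses the topic names suggested above plus two invented examples, and `check_entry` is a hypothetical function, not something the gallery has today.

```python
# Hypothetical suggested vocabulary; a real one would live in the
# contributing guide.
VOCABULARY = {
    "life-sciences",
    "spatial",
    "time-series",
    "distributions",
    "theme",
    "geom",
}

# Tags this post argues carry no signal.
FILLER = {"general", "visualization"}


def check_entry(name, tags):
    """Return a list of human-readable problems for one gallery entry."""
    cleaned = [tag.strip().lower() for tag in tags]
    problems = []
    for tag in cleaned:
        if tag in FILLER:
            problems.append(f"{name}: '{tag}' is a filler tag")
        elif tag not in VOCABULARY:
            problems.append(f"{name}: '{tag}' is not in the suggested vocabulary")
    # An entry should carry at least one tag that helps discovery.
    if not any(tag in VOCABULARY for tag in cleaned):
        problems.append(f"{name}: no informative tag")
    return problems


report = check_entry("pkgA", ["general", "Spatial", "blending"])
for line in report:
    print(line)
```

Whether unknown tags should be rejected or merely flagged for review is a policy question; the sketch only reports them.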
Reproducing the analysis
This analysis relies on _config.yml from commit ed45cf8 and is fully
parameterized in the script.
Download the full script here: ggplot2-gallery-tags-summary.qmd
Yann