DArts

Diversity in the MoMA Collection is a scroll-driven data story for EPFL COM-480. It follows more than 144,000 cleaned MoMA collection records through medium, geography, gender, time, and a final artist-match quiz.

Live site: https://com-480-data-visualization.github.io/DArts/

Project Pitch

MoMA is often experienced through a small set of famous paintings and sculptures. The recorded collection tells a different story: paper works dominate the archive, a small group of countries accounts for most credited works, gender parity arrives late and unevenly, and representation varies strongly by medium. DArts turns those patterns into a martini-glass narrative: authored scenes first, an open medium explorer next, and a personal underrepresented-artist match at the end.

Intended Usage

Open the live site and scroll from top to bottom. Scenes 1-3 (collection areas, geography, and gender) introduce the core findings. Scene 4, the medium explorer, lets users filter by decade, department, region, and selected country. Scene 5, the quiz, returns a deterministic underrepresented-artist match. The global decade slider filters the globe, the gender chart, and the medium explorer; clicking a country on the globe carries that selection into later scenes.
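This README does not describe how the Scene 5 match is computed. One way to get a deterministic result is to score candidates with a pure function and break ties on a stable key, so the same answers always return the same artist. A minimal sketch in Python (the site itself is written in Svelte), where the artist fields, the answer keys, and the scoring rule are all hypothetical:

```python
def match_artist(answers, artists):
    """Return the best-scoring artist; ties break on name so results repeat."""
    def score(artist):
        # Hypothetical rule: one point per answer that equals the artist's attribute.
        return sum(1 for key, value in answers.items() if artist.get(key) == value)

    # Highest score wins; alphabetical name is the stable tie-break.
    return min(artists, key=lambda a: (-score(a), a["name"]))


# Made-up candidates, not records from the MoMA data.
artists = [
    {"name": "Artist B", "medium": "Photography", "region": "Africa"},
    {"name": "Artist A", "medium": "Photography", "region": "Africa"},
    {"name": "Artist C", "medium": "Print", "region": "Asia"},
]
answers = {"medium": "Photography", "region": "Africa"}
print(match_artist(answers, artists)["name"])
```

Because the function uses no randomness and no input-order-dependent tie-break, reordering the candidate list does not change the result.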

Deliverables

Website source: website/
Process book PDF: process_book.pdf
Process book LaTeX source: process_book/main.tex
EDA notebook: notebooks/exploratory_analysis.ipynb
Data aggregation script: data/build_aggregates.py
Deployment workflow: .github/workflows/deploy.yml
Screencast link: https://youtu.be/zW6h9NghxNc

Team

Student	SCIPER
Oussama Ghali	341478
Nour Guermazi	314474
Isabella Linde	423106

Tech Stack

Svelte 5 + Vite
D3 for scales, layouts, geo projection, and path generation
TopoJSON for bundled world topology
Python + pandas for build-time data aggregation
GitHub Pages deployment through GitHub Actions

Technical Setup

Clone with submodules so the raw MoMA data is available:

git clone --recurse-submodules https://github.com/com-480-data-visualization/DArts.git
cd DArts

Install and run the website locally:

cd website
npm install
npm run dev

Run production checks:

cd website
npm run lint
npm run build
npm run preview

Regenerate the data aggregates from the repository root (the script needs Python with pandas installed):

python data/build_aggregates.py

The Vite base path is /DArts/, matching the GitHub Pages deployment URL.

Data

Source: Museum of Modern Art Collection, public dataset.

The raw MoMA files are provided through the data/moma-collection Git submodule. Some upstream raw files are larger than 50 MB, so they remain in the submodule rather than being duplicated into the website bundle. The deployed site uses compact JSON aggregates in website/public/data/.

Current reconciliation from data/build_report.txt:

Raw artworks: 160,248
Cleaned artworks used by the site: 144,149
Dated cleaned artworks after permissive parsing: 141,884
Cleaned artists: 11,879
Artist credits: 160,035
medium_totals.json reconciles to 144,149 works
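The figures above imply that 2,265 cleaned works (144,149 minus 141,884) have no usable date even after permissive parsing. The parsing rules in data/build_aggregates.py are not documented here; as an illustration only, a permissive parser could take the first plausible four-digit year from a free-text date and bin it by decade. The regular expression, the 1500-2099 year range, and both function names are assumptions:

```python
import re

# Assumed rule: a standalone four-digit year between 1500 and 2099.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def parse_year(date_text):
    """Return the first plausible year in a free-text date, or None."""
    if not date_text:
        return None
    match = YEAR.search(date_text)
    return int(match.group(1)) if match else None


def decade_of(year):
    """Map a year to the start of its decade, e.g. 1967 -> 1960."""
    return year - year % 10


for text in ["c. 1960-65", "(1923, published 1925)", "n.d."]:
    print(text, "->", parse_year(text))
```

Records for which such a parser returns None would stay in the cleaned total but drop out of any decade-filtered view.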

The website never fetches raw CSVs at runtime. It uses pre-aggregated JSON, and the larger artist_index.json is loaded only when the explorer expansion or quiz needs it.
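The build-time pattern described above (aggregate once, verify the total, ship compact JSON) can be sketched as follows. This is not the code in data/build_aggregates.py, which uses pandas; the record layout, the "Unknown" bucket, and the function names are hypothetical, and the sketch uses only the standard library:

```python
import json
from collections import Counter


def aggregate_by_medium(records):
    """Count works per medium; missing values go to a hypothetical 'Unknown' bucket."""
    counts = Counter(r.get("medium") or "Unknown" for r in records)
    return dict(sorted(counts.items()))


def reconcile(totals, expected):
    """Fail loudly if the aggregate does not sum to the expected record count."""
    actual = sum(totals.values())
    if actual != expected:
        raise ValueError(f"aggregate sums to {actual}, expected {expected}")
    return actual


# Made-up records standing in for the cleaned artworks.
records = [
    {"medium": "Print"},
    {"medium": "Print"},
    {"medium": "Photograph"},
    {"medium": None},
]
totals = aggregate_by_medium(records)
reconcile(totals, len(records))

# Compact separators keep the payload small for the browser.
payload = json.dumps(totals, separators=(",", ":"))
print(payload)
```

A check like reconcile is what lets a report state that an aggregate "reconciles to" the cleaned total: every record lands in exactly one bucket, so the bucket sum must equal the record count.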

Folder Structure

.
|-- data/
|   |-- build_aggregates.py
|   |-- build_report.txt
|   |-- nationality_to_iso3.json
|   |-- regions.json
|   `-- moma-collection/
|-- notebooks/
|   `-- exploratory_analysis.ipynb
|-- process_book/
|   |-- main.tex
|   `-- figures/
|-- process_book.pdf
|-- website/
|   |-- public/data/
|   |-- public/topology/world-110m.json
|   |-- src/App.svelte
|   `-- src/lib/
|       |-- charts/
|       |-- components/
|       |-- design/
|       |-- scenes/
|       |-- stores/
|       `-- utils/
`-- .github/workflows/deploy.yml

Narrative Scenes

Hero - DArts
The Collection Takes Shape - treemap of collection areas
Where Are These Artists From? - orthographic globe with linked country selection
What About Gender? - female-credited share line chart and department small multiples
Does Your Medium Matter? - filterable medium explorer with 100% stacked bars
Who Are You In The Collection? - deterministic underrepresented artist match
Footer - credits, data attribution, and disclaimer

Deployment

Pushing to master or main runs .github/workflows/deploy.yml.

The workflow installs dependencies with Node 20, runs lint, builds website/dist, and publishes it to the gh-pages branch with peaceiris/actions-gh-pages@v4.

Notes

This project analyzes the recorded MoMA collection metadata. Demographic fields are limited to what MoMA records; absence in the data is not a judgment of curatorial intent or of an artist's significance. Artwork images are not embedded; the site links out to MoMA records where available.
