JSON Schema definitions for all data structures produced by the reprodb-pipeline and consumed by the reprodb.github.io website.
Browse the documentation: reprodb.github.io/data-schemas
Each schema validates one or more output files produced by the pipeline. Output files land in the following locations within the website repo:

- _data/: YAML/JSON consumed by Jekyll templates
- assets/data/: JSON served directly to the browser (charts, search, client-side tables)
- _build/: Intermediate files (not published, used by downstream generators)
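
As a rough illustration of how a schema can be checked against one of these output files, here is a minimal sketch. It assumes the third-party `jsonschema` package is installed. The function names `validation_errors` and `validate_file` are hypothetical helpers, not part of the pipeline, and the location of the schema files in this repo is not specified here, so both paths are supplied by the caller.

```python
import json
from pathlib import Path


def validation_errors(instance, schema):
    """Return human-readable validation errors (an empty list means valid)."""
    # Third-party dependency (assumption): pip install jsonschema
    from jsonschema.validators import validator_for

    # Pick the validator class that matches the schema's "$schema" keyword,
    # falling back to the library's default draft when it is absent.
    validator_cls = validator_for(schema)
    validator_cls.check_schema(schema)  # raises SchemaError if the schema is malformed
    validator = validator_cls(schema)

    messages = []
    for err in validator.iter_errors(instance):
        # absolute_path holds the keys/indices leading to the offending value.
        location = "/".join(str(part) for part in err.absolute_path) or "<root>"
        messages.append(f"{location}: {err.message}")
    return messages


def validate_file(output_path, schema_path):
    """Load a JSON output file and a JSON schema file, then validate."""
    instance = json.loads(Path(output_path).read_text(encoding="utf-8"))
    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    return validation_errors(instance, schema)
```

Calling `validate_file` with the path to `assets/data/artifacts.json` and the path to the corresponding schema file would return an empty list when the file conforms. The YAML outputs under _data/ would need to be parsed into Python objects first (for example with PyYAML's `yaml.safe_load`) and then passed to `validation_errors`.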

| Schema | Output file(s) | Generator | Description |
|---|---|---|---|
| artifacts | assets/data/artifacts.json | generate_statistics.py | Core artifact records with badges and URLs |
| artifacts_by_conference | _data/artifacts_by_conference.yml | generate_statistics.py | Badge breakdown by conference and year |
| artifacts_by_year | _data/artifacts_by_year.yml | generate_statistics.py | Year-over-year artifact counts |
| ae_members | assets/data/ae_members.json, assets/data/{area}_ae_members.json | committee_stats/ | AE committee member lists |
| artifact_availability | assets/data/artifact_availability.json | generate_artifact_availability.py | URL liveness checks for artifacts |
| artifact_citations | assets/data/artifact_citations.json | generate_artifact_citations.py | Citation data for artifacts |
| author_index | assets/data/author_index.json | generate_author_stats.py | Lightweight author lookup index |
| author_profiles | assets/data/author_profiles.json | generate_author_profiles.py | Detailed per-author profile data |
| author_stats | assets/data/authors.json, assets/data/{area}_authors.json | generate_author_stats.py | Per-author statistics and paper lists |
| combined_rankings | assets/data/combined_rankings.json, assets/data/{conf}_combined_rankings.json | generate_combined_rankings.py | Author rankings combining artifacts + AE service |
| committee_stats | assets/data/committee_stats.json | committee_stats/ | AE committee participation statistics |
| institution_ranking_history | assets/data/institution_ranking_history.json | generate_ranking_history.py | Historical institution ranking changes |
| institution_rankings | assets/data/institution_rankings.json, assets/data/{conf}_institution_rankings.json | generate_institution_rankings.py | Institution-level rankings and metrics |
| paper_citations | assets/data/paper_citations.json | generate_paper_citations_doi.py | Paper citation counts from OpenAlex/Crossref |
| paper_index | assets/data/papers.json, _data/papers.json | generate_author_stats.py | Paper metadata index |
| participation_stats | assets/data/participation_stats.json, _data/participation_stats.yml | generate_participation_stats.py | Conference participation trends |
| ranking_history | assets/data/ranking_history.json | generate_ranking_history.py | Historical author ranking changes |
| repo_stats | assets/data/repo_stats_detail.json | generate_repo_stats.py | Per-artifact repository metrics (stars, forks) |
| repo_stats_summary | _data/repo_stats.yml | generate_repo_stats.py | Aggregated repository metrics (overall, by-conference, by-year, by-area) |
| repo_stats_yearly | assets/data/repo_stats_yearly.json | generate_repo_stats.py | Per-conference yearly star/fork trends for charts |
| search_data | assets/data/search_data.json | generate_search_data.py | Merged data for website full-text search |
| summary | _data/summary.yml, assets/data/summary.json | generate_statistics.py | High-level site summary statistics |
| top_repos | assets/data/top_repos.json, assets/data/{area}_top_repos.json | generate_repo_stats.py | Top repositories by stars |
| artifact_citations_summary | assets/data/artifact_citations_summary.json | generate_artifact_citations.py | Citation analysis summary with per-year breakdown and verification |
| citation_history | _build/citation_history.json | generate_paper_citations_doi.py | Time-series of per-paper citation counts from OpenAlex/Semantic Scholar |
| citation_verification_summary | assets/data/citation_verification_summary.json | generate_artifact_citations.py | Crossref-based verification distinguishing genuine vs false-positive citations |
| conf_authors | _build/{conf}_conf_authors.json | generate_author_stats.py | Per-conference author rankings with badge breakdowns |
| geographic_statistics | assets/data/geographic_statistics.json | generate_institution_rankings.py | Country and continent statistics for reproducibility and artifacts |
| institution_timeline | assets/data/institution_timeline.json, _build/institution_timeline.json | committee_stats/ | Year-over-year institution AE participation with per-area breakdown |
| paper_authors_map | assets/data/paper_authors_map.json, _build/paper_authors_map.json | generate_author_stats.py | Paper → author mapping with venue/badge info |
| paper_citations_summary | assets/data/paper_citations_summary.json | generate_paper_citations_doi.py | Summary of paper citation lookup coverage and totals |
| repo_stats_history | assets/data/repo_stats_history.json, _build/repo_stats_history.json | generate_repo_stats.py | Time-series of stars/forks (GitHub) or views/downloads (Zenodo) per repo |
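
Several rows in the table above use `{area}` or `{conf}` placeholders in their output paths. The following sketch shows one way to look up which schema covers a given output file by turning those patterns into regular expressions. Only a few rows are copied in for brevity. The helper names and the assumption that a placeholder expands to a single token of letters, digits, or hyphens are illustrative, not taken from the pipeline.

```python
import re

# A few rows copied from the table: schema name -> output file patterns.
SCHEMA_OUTPUTS = {
    "artifacts": ["assets/data/artifacts.json"],
    "ae_members": [
        "assets/data/ae_members.json",
        "assets/data/{area}_ae_members.json",
    ],
    "author_stats": [
        "assets/data/authors.json",
        "assets/data/{area}_authors.json",
    ],
    "combined_rankings": [
        "assets/data/combined_rankings.json",
        "assets/data/{conf}_combined_rankings.json",
    ],
    "conf_authors": ["_build/{conf}_conf_authors.json"],
}

# Matches placeholders such as {area} or {conf}.
_PLACEHOLDER = re.compile(r"\{[a-z_]+\}")

# Assumption: a placeholder value is one token without "/" or "_".
_TOKEN = "[A-Za-z0-9-]+"


def pattern_to_regex(pattern):
    """Compile a path pattern like 'assets/data/{area}_authors.json'."""
    literal_parts = _PLACEHOLDER.split(pattern)
    return re.compile(_TOKEN.join(re.escape(part) for part in literal_parts))


def schemas_for(path):
    """Return the names of all schemas whose patterns match the path."""
    return [
        name
        for name, patterns in SCHEMA_OUTPUTS.items()
        if any(pattern_to_regex(p).fullmatch(path) for p in patterns)
    ]


# "systems" and "osdi" are made-up placeholder values for illustration.
print(schemas_for("assets/data/systems_authors.json"))
print(schemas_for("_build/osdi_conf_authors.json"))
```

Returning a list rather than a single name keeps any ambiguous match visible instead of silently picking one schema.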

| Output file | Generator | Description | Priority |
|---|---|---|---|
| assets/data/cited_artifacts_by_author.json | generate_cited_artifacts_list.py | Authors mapped to their cited artifacts | Medium |
| assets/data/cited_artifacts_by_institution.json | generate_cited_artifacts_list.py | Institutions mapped to cited artifacts | Medium |
| assets/data/cited_artifacts_list.json | generate_cited_artifacts_list.py | Flat list of all cited artifacts | Medium |
| _data/author_summary.yml | generate_author_stats.py | Author summary for Jekyll | Low |
| _data/combined_summary.yml | generate_combined_rankings.py | Combined ranking summary for Jekyll | Low |
| _data/coverage.yml | generate_statistics.py | Conference/year coverage table | Low |
| _data/navigation.yml | generate_statistics.py | Site navigation structure | Low |
| _data/pipeline_metadata.yml | Pipeline runner | Run timestamp and version info | Low |
| _data/all_results_cache.yml | generate_statistics.py | Full results cache for Jekyll | Low |