Skip to content

Add weighted MAE to results table and tests#2

Merged
andrewpbray merged 1 commit into
mainfrom
extend-metrics
May 26, 2026
Merged

Add weighted MAE to results table and tests#2
andrewpbray merged 1 commit into
mainfrom
extend-metrics

Conversation

@andrewpbray

Copy link
Copy Markdown
Collaborator
  • compute_mae_and_isp now returns MAE, wMAE, and ISP separately; wMAE uses rubric point weights from metadata when supplied, NA otherwise
  • generate_results_row passes metadata.json to compute_mae_and_isp and populates wMAE_ columns alongside MAE_ and ISP_
  • generate_gt_results_table adds wMAE spanner and label stripping
  • New test file test-compute-mae-and-isp.R with 18 tests covering all three metrics across no-metadata, with-metadata, and edge cases

- compute_mae_and_isp now returns MAE, wMAE, and ISP separately;
  wMAE uses rubric point weights from metadata when supplied, NA otherwise
- generate_results_row passes metadata.json to compute_mae_and_isp and
  populates wMAE_ columns alongside MAE_ and ISP_
- generate_gt_results_table adds wMAE spanner and label stripping
- New test file test-compute-mae-and-isp.R with 18 tests covering all
  three metrics across no-metadata, with-metadata, and edge cases
Copilot AI review requested due to automatic review settings May 26, 2026 20:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the grading-accuracy metrics pipeline to include a point-weighted MAE (wMAE) derived from rubric point values in metadata.json, and surfaces it in the generated results table output, along with new unit tests for the updated metric output shape.

Changes:

  • Update compute_mae_and_isp() to return a 3-metric list: MAE, wMAE, and ISP.
  • Add wMAE_ columns to generated results rows and display them in gt tables via a new “wMAE” spanner and label stripping.
  • Add a new testthat file covering core compute_mae_and_isp() behavior with and without metadata.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
R/computing-accuracy.R Changes compute_mae_and_isp() to return MAE, wMAE, ISP and introduces weighted computation via metadata-derived rubric scores.
R/generate-results-table.R Plumbs metadata into metric computation, adds wMAE_ columns, and updates gt formatting to include wMAE.
tests/testthat/test-compute-mae-and-isp.R Adds new tests validating the new return structure and wMAE behavior in basic scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread R/computing-accuracy.R
Comment on lines 274 to +279
eval1 <- readr::read_csv(file1, show_col_types = FALSE)
eval2 <- readr::read_csv(file2, show_col_types = FALSE)
weights <- if (!is.null(metadata_file)) scores_from_metadata(metadata_file) else NULL
list(MAE = rubric_mae(eval1, eval2, weights = weights),
ISP = isp(eval1, eval2))
list(MAE = rubric_mae(eval1, eval2),
wMAE = if (!is.null(weights)) rubric_mae(eval1, eval2, weights = weights) else NA_real_,
ISP = isp(eval1, eval2))
Comment thread R/computing-accuracy.R
Comment on lines 276 to +279
weights <- if (!is.null(metadata_file)) scores_from_metadata(metadata_file) else NULL
list(MAE = rubric_mae(eval1, eval2, weights = weights),
ISP = isp(eval1, eval2))
list(MAE = rubric_mae(eval1, eval2),
wMAE = if (!is.null(weights)) rubric_mae(eval1, eval2, weights = weights) else NA_real_,
ISP = isp(eval1, eval2))
Comment thread R/computing-accuracy.R
Comment on lines +277 to +279
list(MAE = rubric_mae(eval1, eval2),
wMAE = if (!is.null(weights)) rubric_mae(eval1, eval2, weights = weights) else NA_real_,
ISP = isp(eval1, eval2))
Comment on lines +108 to +113
metrics <- compute_mae_and_isp(paste0(dir, f1), paste0(dir, f2),
metadata_file = metadata_file)

mae_vals[[pair_name]] <- metrics$MAE
isp_vals[[pair_name]] <- metrics$ISP
mae_vals[[pair_name]] <- metrics$MAE
wmae_vals[[pair_name]] <- metrics$wMAE
isp_vals[[pair_name]] <- metrics$ISP
Comment on lines +86 to +90
metadata_file <- paste0(dir, "/metadata.json")

mae_vals <- list()
wmae_vals <- list()
isp_vals <- list()
Comment on lines +10 to +19
# compute_mae_and_isp - without metadata ------------------------------------

test_that("compute_mae_and_isp - returns MAE, wMAE (NA), and ISP without metadata", {
result <- compute_mae_and_isp(experts_csv, students_csv)

expect_named(result, c("MAE", "wMAE", "ISP"))
expect_true(is.numeric(result$MAE))
expect_true(is.na(result$wMAE))
expect_true(is.numeric(result$ISP))
})
Comment on lines +76 to +80
# All other rows match. Expected wMAE = 0.5 / 5 = 0.1
weights <- c(1.0, 0.5, 0.5, 0.0)
result <- compute_mae_and_isp(experts_csv, students_csv,
metadata_file = metadata_json)
expected_wmae <- (0 + 0.5 + 0 + 0 + 0) / 5
Comment thread R/computing-accuracy.R
@@ -274,8 +274,9 @@ compute_mae_and_isp <- function(file1, file2, metadata_file = NULL){
eval1 <- readr::read_csv(file1, show_col_types = FALSE)
eval2 <- readr::read_csv(file2, show_col_types = FALSE)
weights <- if (!is.null(metadata_file)) scores_from_metadata(metadata_file) else NULL
@andrewpbray andrewpbray merged commit 5f4b513 into main May 26, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants