Add weighted MAE to results table and tests by andrewpbray · Pull Request #2 · grading-accuracy-study/GradingAccuracy

andrewpbray · 2026-05-26T20:34:25Z

- `compute_mae_and_isp` now returns MAE, wMAE, and ISP separately; wMAE uses rubric point weights from metadata when supplied and is `NA` otherwise.
- `generate_results_row` passes `metadata.json` to `compute_mae_and_isp` and populates `wMAE_` columns alongside `MAE_` and `ISP_`.
- `generate_gt_results_table` adds a wMAE spanner and label stripping.
- New test file `test-compute-mae-and-isp.R` with 18 tests covering all three metrics across no-metadata, with-metadata, and edge-case scenarios.
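The description does not spell out the weighted formula. Judging only from the worked example in the new test (a single disagreement on a 0.5-point item across five rows gives 0.5 / 5 = 0.1), wMAE appears to be the per-row sum of point-weighted absolute disagreements, averaged over rows. A hedged reading, where $n$ is the number of graded rows, $K$ the number of rubric items, $w_k$ the point value of item $k$ taken from `metadata.json`, and $x_{jk}$, $y_{jk}$ the two graders' scores (all symbols are assumptions, not names from the code):

```latex
\[
\mathrm{wMAE} \;=\; \frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{K} w_k \,\bigl\lvert x_{jk} - y_{jk} \bigr\rvert,
\qquad
\mathrm{wMAE} = \texttt{NA} \text{ when no metadata is supplied.}
\]
```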


Copilot

Pull request overview

This PR extends the grading-accuracy metrics pipeline to include a point-weighted MAE (wMAE) derived from rubric point values in metadata.json, and surfaces it in the generated results table output, along with new unit tests for the updated metric output shape.

Changes:

- Update `compute_mae_and_isp()` to return a three-metric list: `MAE`, `wMAE`, and `ISP`.
- Add `wMAE_` columns to generated results rows and display them in `gt` tables via a new "wMAE" spanner and label stripping.
- Add a new testthat file covering core `compute_mae_and_isp()` behavior with and without metadata.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| `R/computing-accuracy.R` | Changes `compute_mae_and_isp()` to return `MAE`, `wMAE`, `ISP` and introduces weighted computation via metadata-derived rubric scores. |
| `R/generate-results-table.R` | Plumbs metadata into metric computation, adds `wMAE_` columns, and updates `gt` formatting to include wMAE. |
| `tests/testthat/test-compute-mae-and-isp.R` | Adds new tests validating the new return structure and wMAE behavior in basic scenarios. |


@@ -274,8 +274,9 @@ compute_mae_and_isp <- function(file1, file2, metadata_file = NULL){
  eval1 <- readr::read_csv(file1, show_col_types = FALSE)
  eval2 <- readr::read_csv(file2, show_col_types = FALSE)
  weights <- if (!is.null(metadata_file)) scores_from_metadata(metadata_file) else NULL
-  list(MAE = rubric_mae(eval1, eval2, weights = weights),
-       ISP = isp(eval1, eval2))
+  list(MAE  = rubric_mae(eval1, eval2),
+       wMAE = if (!is.null(weights)) rubric_mae(eval1, eval2, weights = weights) else NA_real_,
+       ISP  = isp(eval1, eval2))
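The new branching can be exercised in isolation with a stand-in for `rubric_mae()`. The body of `rubric_mae()` is not part of this diff, so `rubric_mae_stub()` and `metrics_stub()` below are hypothetical, and the "mean of row-wise weighted disagreements" definition is an assumption; ISP is left out. The point is the return shape: `MAE` never sees the weights, and `wMAE` is `NA_real_` unless weights were supplied.

```r
# Hypothetical stand-in for rubric_mae(): absolute item-level disagreements,
# summed per row with optional point weights, then averaged over rows.
rubric_mae_stub <- function(eval1, eval2, weights = NULL) {
  diffs <- abs(as.matrix(eval1) - as.matrix(eval2))
  if (is.null(weights)) weights <- rep(1, ncol(diffs))
  mean(diffs %*% weights)
}

# Same branching as the new compute_mae_and_isp() body (ISP omitted):
# MAE is always unweighted; wMAE falls back to NA_real_ without weights.
metrics_stub <- function(eval1, eval2, weights = NULL) {
  list(MAE  = rubric_mae_stub(eval1, eval2),
       wMAE = if (!is.null(weights)) rubric_mae_stub(eval1, eval2, weights = weights) else NA_real_)
}

e1 <- data.frame(q1 = c(1, 1), q2 = c(0, 1))
e2 <- data.frame(q1 = c(1, 0), q2 = c(0, 1))
str(metrics_stub(e1, e2))                     # wMAE is NA
str(metrics_stub(e1, e2, weights = c(2, 1)))  # the q1 miss counts double in wMAE
```

Keeping `NA_real_` (rather than `NULL`) means the list always has three elements of type double, so downstream code can index `metrics$wMAE` unconditionally.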


+      metrics <- compute_mae_and_isp(paste0(dir, f1), paste0(dir, f2),
+                                     metadata_file = metadata_file)

-      mae_vals[[pair_name]] <- metrics$MAE
-      isp_vals[[pair_name]] <- metrics$ISP
+      mae_vals[[pair_name]]  <- metrics$MAE
+      wmae_vals[[pair_name]] <- metrics$wMAE
+      isp_vals[[pair_name]]  <- metrics$ISP
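One way the three per-pair lists could be flattened into a single results row with `MAE_`, `wMAE_`, and `ISP_` column prefixes is sketched below. The pair names and values are invented for illustration, and `prefixed()` is a hypothetical helper; the PR's own row-building code is not shown here.

```r
# Hypothetical per-pair metrics, keyed by pair name as in the loop above
mae_vals  <- list(expert_vs_student = 0.4, expert_vs_llm = 0.2)
wmae_vals <- list(expert_vs_student = 0.9, expert_vs_llm = 0.3)
isp_vals  <- list(expert_vs_student = 0.8, expert_vs_llm = 0.9)

# Prefix each metric's pair names so the three metrics get distinct columns
prefixed <- function(vals, prefix) setNames(vals, paste0(prefix, names(vals)))

# Concatenate the named lists and turn them into a one-row data frame
results_row <- as.data.frame(c(prefixed(mae_vals,  "MAE_"),
                               prefixed(wmae_vals, "wMAE_"),
                               prefixed(isp_vals,  "ISP_")))
names(results_row)
```

A shared prefix per metric is also what lets a table step select all `wMAE_` columns at once, for example to group them under one spanner.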


+  metadata_file <- paste0(dir, "/metadata.json")
+
+  mae_vals  <- list()
+  wmae_vals <- list()
+  isp_vals  <- list()
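The metadata path is built unconditionally here. If some study directories lack `metadata.json`, a guard along these lines would pass `NULL` instead, which the new `compute_mae_and_isp()` turns into `wMAE = NA_real_`. This is a hypothetical helper, not part of the PR, and whether `scores_from_metadata()` already tolerates a missing file is not shown in the diff.

```r
# Hypothetical helper: return the metadata path only if the file exists,
# otherwise NULL so that wMAE falls back to NA instead of erroring.
metadata_or_null <- function(dir) {
  path <- file.path(dir, "metadata.json")
  if (file.exists(path)) path else NULL
}

tmp <- tempfile("study-")
dir.create(tmp)
metadata_or_null(tmp)                              # NULL: no metadata yet
writeLines("{}", file.path(tmp, "metadata.json"))
metadata_or_null(tmp)                              # now the path to metadata.json
```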


+# compute_mae_and_isp - without metadata ------------------------------------
+
+test_that("compute_mae_and_isp - returns MAE, wMAE (NA), and ISP without metadata", {
+  result <- compute_mae_and_isp(experts_csv, students_csv)
+
+  expect_named(result, c("MAE", "wMAE", "ISP"))
+  expect_true(is.numeric(result$MAE))
+  expect_true(is.na(result$wMAE))
+  expect_true(is.numeric(result$ISP))
+})
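One caveat on this test: `NA_real_` is itself numeric, so `expect_true(is.numeric(result$MAE))` would still pass if MAE came back missing. A stricter predicate such as the hypothetical `is_finite_scalar()` below (not in the PR) could be used as `expect_true(is_finite_scalar(result$MAE))`.

```r
# NA_real_ has type double, so is.numeric() cannot tell "computed" from "missing"
is.numeric(NA_real_)   # TRUE
is.na(NA_real_)        # TRUE

# Hypothetical stricter check: one finite, non-missing number
is_finite_scalar <- function(x) is.numeric(x) && length(x) == 1 && is.finite(x)

is_finite_scalar(0.25)      # TRUE
is_finite_scalar(NA_real_)  # FALSE
is_finite_scalar(c(1, 2))   # FALSE (length check short-circuits)
```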


+  # All other rows match.  Expected wMAE = 0.5 / 5 = 0.1
+  weights <- c(1.0, 0.5, 0.5, 0.0)
+  result <- compute_mae_and_isp(experts_csv, students_csv,
+                                metadata_file = metadata_json)
+  expected_wmae <- (0 + 0.5 + 0 + 0 + 0) / 5
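The expected value in this test can be reproduced with a toy fixture. The matrices below are hypothetical (the contents of `experts_csv` and `students_csv` are not shown), and both the row-sum-then-average reading of wMAE and the "one point per miss" unweighted count are inferred from the test comment rather than taken from `rubric_mae()`.

```r
weights  <- c(1.0, 0.5, 0.5, 0.0)            # per-item points, as in the test
experts  <- matrix(1, nrow = 5, ncol = 4)    # hypothetical fixture: 5 rows, 4 items
students <- experts
students[1, 2] <- 0                          # one miss on a 0.5-point item

row_loss <- abs(experts - students) %*% weights  # weighted disagreement per row
wmae <- mean(row_loss)                           # (0.5 + 0 + 0 + 0 + 0) / 5

# A miss on the zero-point item raises the unweighted count but not wMAE
students[2, 4] <- 0
wmae_after <- mean(abs(experts - students) %*% weights)
mae_after  <- mean(rowSums(abs(experts - students)))
```

The zero-weight item is a useful edge case: it is the one place where MAE and wMAE are guaranteed to disagree in direction, which makes it a cheap check that the weights are actually being applied.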




Copilot AI review requested due to automatic review settings May 26, 2026 20:34

Copilot started reviewing on behalf of andrewpbray May 26, 2026 20:34

Copilot AI reviewed May 26, 2026

andrewpbray merged commit 5f4b513 into main May 26, 2026
2 checks passed

andrewpbray merged 1 commit into main from extend-metrics

2 participants
