Use these instructions to de-identify Gradescope submissions with this repo.
remotes::install_github("grading-accuracy-study/GradingAccuracy")
library(GradingAccuracy)
library(tidyverse)
Specify a folder where the de-identified, processed files are saved.
exported_folder <- "./Midterm Exam/"
Store the metadata.json in this folder, and check that it meets all formatting requirements
using validate_metadata_json().
validate_metadata_json(paste0(exported_folder, "metadata.json"),
                       verbose = TRUE)
The following process handles exports from a de-identified Gradescope course. The students were already de-identified in Gradescope before exporting, so the only identifiable data left to remove is the names of the graders. This process usually applies to the expert grades.
The roster_csv is a CSV file with "Name", "Email", and fake "SID" columns.
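As an illustration of the roster format just described, here is a minimal base R sketch that builds a toy roster and checks it. The names, emails, and the `G001`-style SID format are made up for this example; only the three column names come from the description above.

```r
# Hypothetical roster: every value here is invented for illustration.
roster <- data.frame(
  Name  = c("Ada Grader", "Bo Grader"),
  Email = c("ada@example.edu", "bo@example.edu"),
  SID   = c("G001", "G002"),  # fake IDs, not real student IDs
  stringsAsFactors = FALSE
)

# Sanity checks before using the file as a roster:
# all three columns exist, and every fake SID is present and unique.
stopifnot(all(c("Name", "Email", "SID") %in% names(roster)))
stopifnot(!anyNA(roster$SID), !anyDuplicated(roster$SID))

# Write to a temporary location so the sketch does not touch real files.
roster_path <- file.path(tempdir(), "Roster.csv")
write.csv(roster, roster_path, row.names = FALSE)
```

Checking for missing or duplicated SIDs up front is cheap and avoids ambiguous matches later in the pipeline.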
The following code uses deidentify_graders() to read in original-experts.csv, de-identify the graders using the roster_csv,
and export both a CSV with de-identified graders and a lookup table mapping the graders' real names to their fake names.
roster_csv <- "../Roster.csv"
deidentify_graders("original-experts.csv", roster_csv,
                   "experts-calibrated.csv")
It is occasionally necessary to normalize a full credit option.
The following code uses normalize_full_credit() to do so: the full credit
option is normalized to the equivalent rubric items and then removed.
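As a rough illustration of the idea (separate from the package call itself), the toy sketch below treats a selected full credit option as meaning that every equivalent rubric item applies. That rule, the column names, and the data are all assumptions made for this example; this is not the implementation of normalize_full_credit().

```r
# Toy grades: one row per submission, logical columns for rubric items.
# Column names and the "all equivalent items apply" rule are assumptions.
grades <- data.frame(
  SID         = c("S1", "S2", "S3"),
  full_credit = c(TRUE, FALSE, FALSE),
  item_a      = c(FALSE, TRUE, FALSE),
  item_b      = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

equivalent_items <- c("item_a", "item_b")

# Wherever full credit was selected, mark the equivalent items as applied.
for (item in equivalent_items) {
  grades[[item]] <- grades[[item]] | grades$full_credit
}

# The full credit column is now redundant and can be dropped.
grades$full_credit <- NULL
```

After this step, a grader who clicked the full credit option and a grader who selected each item individually produce identical rows, which is what makes the two grading styles comparable.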
students <- read_evals("students-uncalibrated.csv")
students <- normalize_full_credit(students, full_credit = 4,
                                  rubric_items = 5:8)
# remove full credit + rubric items
students <- students[, -c(4, 9:11)]
write_csv(students, "students-uncalibrated.csv")
Then, we use generate_rubric_texts() to create a rubric_items.csv that maps the original rubric items
to the "R1", "R2" structure. This function also changes the headers of experts-calibrated.csv
to the "R1", "R2" structure and saves the file in the exported_folder. For the first set of
grades processed, existing should be false so that a new rubric_items.csv file is created;
afterwards it should be true so that the file keeps being updated.
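To make the "R1", "R2" structure concrete, here is a small base R sketch of the renaming on toy data. The rubric texts and the `label`/`text` column names of the lookup table are hypothetical; the actual layout of the rubric items file produced by the package may differ.

```r
# Toy export whose rubric columns are headed by the full rubric text.
grades <- data.frame(
  SID = c("S1", "S2"),
  `Correct base case` = c(TRUE, FALSE),
  `Correct recursive step` = c(TRUE, TRUE),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

rubric_cols <- setdiff(names(grades), "SID")

# Lookup table from short label to original rubric text.
rubric_items <- data.frame(
  label = paste0("R", seq_along(rubric_cols)),
  text  = rubric_cols,
  stringsAsFactors = FALSE
)

# Swap the long headers for the short labels.
names(grades)[match(rubric_cols, names(grades))] <- rubric_items$label
```

Keeping the lookup table alongside the renamed grades means the short labels can always be traced back to the wording graders actually saw.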
expert <- generate_rubric_texts(csv_path = "experts-calibrated.csv",
                                output_folder = exported_folder,
                                existing = FALSE)
The following code does some additional data processing to remove unnecessary columns.
read_evals(paste0(exported_folder, "experts-calibrated.csv")) |>
  select(Name, SID, Score:Tags) |>
  write_csv(paste0(exported_folder, "experts-calibrated.csv"))
The update_scores() function updates the "Score" column in experts-calibrated.csv
based on the point values in metadata.json. Remember to specify whether to overwrite
the original CSV and whether to use the calibrated or uncalibrated rubric.
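The idea of recomputing a score from point values can be sketched on toy data as follows. This assumes a simple additive rubric in which a submission's score is the sum of the points of its applied items; the point values are invented, and the actual layout of metadata.json and the logic of update_scores() are not shown in this document.

```r
# Hypothetical point values per rubric label (in practice these would be
# read from metadata.json, whose exact layout is an unknown here).
points <- c(R1 = 2, R2 = 3, R3 = -1)

grades <- data.frame(
  SID = c("S1", "S2"),
  R1  = c(TRUE, TRUE),
  R2  = c(TRUE, FALSE),
  R3  = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Convert the logical item columns to a 0/1 matrix, ordered like `points`.
applied <- as.matrix(grades[, names(points)]) * 1

# Score = sum of the point values of the rubric items that were applied.
grades$Score <- as.vector(applied %*% points)
```

Indexing the columns by `names(points)` keeps the matrix columns and the point vector in the same order, so a reordered CSV cannot silently pair an item with the wrong value.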
expert <- update_scores(csv = paste0(exported_folder, "experts-calibrated.csv"),
                        metadata = paste0(exported_folder, "metadata.json"),
                        overwrite = TRUE, calibrated = TRUE)
The following process handles exports from the original Gradescope course. Here the identifiable data we remove is the names of the students. This process usually applies to the student grades.
We use deidentify_gradescope_evals() to take the original export, original-students-calibrated.csv, and
de-identify it using deidentified-lookup-table.csv so that it can be mapped to the other
de-identified grades. The de-identified grades are exported to students-calibrated.csv.
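The lookup-based mapping can be pictured with the toy sketch below. The column names of the lookup table (`Name`, `fake_name`, `SID`) and all values are assumptions for illustration; the real deidentified-lookup-table.csv and the internals of deidentify_gradescope_evals() may differ.

```r
# Hypothetical lookup table from real names to fake identities.
lookup <- data.frame(
  Name      = c("Alice Example", "Bob Example"),
  fake_name = c("Student 1", "Student 2"),
  SID       = c("S001", "S002"),
  stringsAsFactors = FALSE
)

# Toy export with real names; Carol has no entry in the lookup table.
export <- data.frame(
  Name  = c("Bob Example", "Alice Example", "Carol Example"),
  Score = c(7, 9, 4),
  stringsAsFactors = FALSE
)

# match() keeps the export's row order and yields NA for unmatched names.
idx <- match(export$Name, lookup$Name)
deidentified <- data.frame(
  Name  = lookup$fake_name[idx],
  SID   = lookup$SID[idx],
  Score = export$Score,
  stringsAsFactors = FALSE
)
```

Rows without a lookup entry end up with an NA SID rather than being dropped silently, so they can be inspected or filtered out in a later step.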
deidentify_gradescope_evals("original-students-calibrated.csv",
                            "deidentified-lookup-table.csv",
                            "students-calibrated.csv")
We similarly use generate_rubric_texts() to change the headers into the "R1", "R2" structure.
Since the existing argument defaults to false, this will add another row to the rubric-items.csv.
student <- generate_rubric_texts(csv_path = "students-calibrated.csv",
                                 output_folder = exported_folder)
We drop rows with NA values in the SID column because some student grades are often unused in this study (due to random sampling or invalid submissions).
read_evals(paste0(exported_folder, "students-calibrated.csv")) |>
  drop_na(SID) |>
  write_csv(paste0(exported_folder, "students-calibrated.csv"))
We similarly update the scores from the metadata point values using update_scores().
student <- update_scores(csv = paste0(exported_folder, "students-calibrated.csv"),
                         metadata = paste0(exported_folder, "metadata.json"),
                         overwrite = TRUE, calibrated = TRUE)
Additionally, we can now use update_scores_in_metadata() to update n_submissions and mean_score.
While these metrics are computed from the expert grades, the function also checks that the
student-graded and expert-graded exports contain an equal number of students.
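For intuition, the two metrics and the row-count check can be sketched on toy data like this. The data frames and the `summary_fields` list are hypothetical stand-ins; how update_scores_in_metadata() reads the exports and writes metadata.json is not shown in this document.

```r
# Toy stand-ins for the expert-graded and student-graded exports.
expert  <- data.frame(SID = c("S1", "S2", "S3"), Score = c(5, 3, 4))
student <- data.frame(SID = c("S3", "S1", "S2"), Score = c(4, 3, 4))

# Guard: both exports should cover the same number of submissions.
stopifnot(nrow(expert) == nrow(student))

# Summary metrics computed from the expert grades only.
summary_fields <- list(
  n_submissions = nrow(expert),
  mean_score    = mean(expert$Score)
)
```

Failing early on a row-count mismatch is useful because a mean computed over a different set of submissions than the student grades cover would not be comparable.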
update_scores_in_metadata(folder = exported_folder,
                          file = paste0(exported_folder, "metadata.json"))
Pensieve exports deviate slightly from the Gradescope exports and hence require their own data processing.
Pensieve requires manual de-identification with the following code:
roster_pensieve <- read_csv("../Roster_Pensieve.csv")
roster <- read_csv(roster_csv) |>
  rbind(roster_pensieve)
# add SID
read_csv("original-pensive-calibrated.csv") |>
  left_join(roster, by = c("Name", "Email")) |>
  relocate(SID) |>
  write_csv("pensive-calibrated.csv")
We once again use generate_rubric_texts(), but note the slightly different arguments.
pensieve <- generate_rubric_texts("pensive-calibrated.csv",
                                  output_folder = exported_folder,
                                  ignored_nrows = 0,
                                  pensieve = TRUE)
We remove unnecessary columns.
read_csv(paste0(exported_folder, "pensive-calibrated.csv")) |>
  select(-c(`Assignment ID`, `Problem ID`, Email)) |>
  write_csv(paste0(exported_folder, "pensive-calibrated.csv"))
Finally, we update the scores from the metadata point values using update_scores().
pensive <- update_scores(csv = paste0(exported_folder, "pensive-calibrated.csv"),
                         metadata = paste0(exported_folder, "metadata.json"),
                         overwrite = TRUE, calibrated = TRUE)
As a final check, it is useful to confirm that every file contains the same set of SIDs.
identical(sort(expert$SID), sort(pensive$SID))
identical(sort(expert$SID), sort(student$SID))
identical(sort(student$SID), sort(pensive$SID))
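When one of these comparisons returns FALSE, it does not say which SIDs are the problem. A possible follow-up, sketched here on toy vectors standing in for expert$SID, student$SID, and pensive$SID, is to list the SIDs missing from each file with setdiff().

```r
# Toy SID vectors standing in for the three processed exports.
sids <- list(
  expert  = c("S1", "S2", "S3"),
  student = c("S3", "S1", "S2"),
  pensive = c("S1", "S3")
)

# Every SID seen in any file.
all_sids <- sort(unique(unlist(sids, use.names = FALSE)))

# For each file, list the SIDs that appear elsewhere but are missing here.
missing_by_file <- lapply(sids, function(x) setdiff(all_sids, x))
```

Note that setdiff() ignores duplicates, so this complements the identical() checks rather than replacing them: a file with a repeated SID would fail identical() while showing nothing missing here.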