Executor: Cross-year student ID matching by johncmerfeld · Pull Request #60 · edanalytics/runway

johncmerfeld · 2026-05-13T18:40:19Z

Open question: I think it's technically possible for the second pass to not match anything at all. It would probably produce a result set of empty files which we may not want to upload. Do we just skip in that case?

Implement cross-year matching in the executor; companion to #62. The core logic here is:

IF the app says a cross-year roster is available
- AND this is an ODS job
  - AND there are unmatched students from Earthmover
    - THEN run Earthmover a second time on those unmatched students, using a roster we pull from EDU as a year-agnostic source of IDs
- AND this is NOT an ODS job
  - THEN perform a single Earthmover pass, using the EDU roster as the source of IDs
Otherwise behave normally
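
For reviewers, the branching above can be sketched as a small pure function. This is only an illustration: `plan_passes`, its parameters, and the roster labels `"default"` and `"cross_year"` are hypothetical names, not identifiers from the executor.

```python
from typing import List


def plan_passes(cross_year_roster_available: bool,
                is_ods_job: bool,
                unmatched_count: int) -> List[str]:
    """Return the roster label used for each Earthmover pass, in order.

    "default" and "cross_year" are hypothetical labels for the normal
    roster and the year-agnostic EDU roster respectively.
    """
    if not cross_year_roster_available:
        # Otherwise behave normally: one pass with the usual roster.
        return ["default"]

    if is_ods_job:
        passes = ["default"]
        # Only rerun when the first pass left students unmatched.
        if unmatched_count > 0:
            passes.append("cross_year")
        return passes

    # Not an ODS job: a single pass that uses the EDU roster for IDs.
    return ["cross_year"]
```

For example, `plan_passes(True, True, 3)` yields two passes, while `plan_passes(True, False, 0)` yields a single cross-year pass.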

Other notes:

Includes fixes from
When it happens, the second EM pass is special in that it
- switches the input file
- switches the roster file
- eliminates the match rate thresholds
- can still finish running even if no students match. It just returns the first run's unmatched students file and references its unmatched student count
It is not special in that it
- still returns unmatched students to the user
- still assumes a potentially hostile file encoding
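
A minimal sketch of the "no students match" fallback described above, assuming each pass can be summarized by its matched count, unmatched count, and unmatched students file. `PassResult` and `final_unmatched` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PassResult:
    """Hypothetical summary of one Earthmover pass."""
    matched_count: int
    unmatched_count: int
    unmatched_file: Optional[str]  # path to the unmatched students file, if any


def final_unmatched(first: PassResult,
                    second: PassResult) -> Tuple[Optional[str], int]:
    """Pick the unmatched students file and count to report to the user."""
    if second.matched_count == 0:
        # The second pass matched nothing, so fall back to the first
        # run's unmatched students file and its unmatched count.
        return first.unmatched_file, first.unmatched_count
    return second.unmatched_file, second.unmatched_count
```

Either way the user still receives an unmatched students file; only its source changes.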

edandylytics · 2026-05-22T17:35:53Z

+            cross_year_output = self.cross_year_pass(self.output_sets[0])
+            self.output_sets.append(cross_year_output)
+
+        self.upload_artifact(artifact.MATCH_RATES)


If the cross year pass generates unmatched IDs, it seems the file is being saved at non-ods/input_no_student_id_match.csv within the job directory, instead of at the top level where the app expects it.

FWIW, we could definitely tighten up the handoff of the unmatched IDs file... but I feel that's not really worth it in this project and soon we should be moving away from this paradigm anyway.

Thanks for flagging. I think in principle it's actually uploaded to both locations, but it's not being handled correctly due to the bug we discussed in Slack. Still, I'm going to filter out the non-JSONL files from the output set upload so that duplicate copies aren't sent to S3.
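
A rough sketch of the kind of filter meant here, assuming the output set can be treated as a list of file paths. `jsonl_only` is a hypothetical helper, and `students.jsonl` is a made-up file name.

```python
from pathlib import Path
from typing import Iterable, List


def jsonl_only(paths: Iterable[str]) -> List[str]:
    """Keep only the .jsonl files from an output set (hypothetical helper)."""
    return [p for p in paths if Path(p).suffix.lower() == ".jsonl"]


# The unmatched-students CSV is dropped from the output set upload,
# so only one copy of it would be sent to S3.
to_upload = jsonl_only([
    "students.jsonl",
    "non-ods/input_no_student_id_match.csv",
])
```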

jalvord1 · 2026-05-22T19:31:56Z

+        # if we don't return the unmatched students file at all
+        self.logger.debug("too many unmatched students. Skipping upload")
+        artifact.UNMATCHED_STUDENTS.needs_upload = False
+        raise ValueError("insufficient ID matches to continue (highest rate {self.highest_match_rate} < required {config.REQUIRED_ID_MATCH_RATE}; ID column name: {self.highest_match_id_name}; Ed-Fi ID type: {self.highest_match_id_type})")


Does this need to be an f string?
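
For context on the question: in Python, braces in a plain string literal are not interpolated, so without the `f` prefix the message would contain the placeholder text itself. A short illustration with made-up values:

```python
# Illustrative values only; not taken from the executor's config.
highest_match_rate = 0.42
required = 0.5

# Plain string: the braces are kept literally.
plain = "highest rate {highest_match_rate} < required {required}"

# f-string: the expressions inside the braces are evaluated.
formatted = f"highest rate {highest_match_rate} < required {required}"

print(plain)
print(formatted)
```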

jalvord1 · 2026-05-22T19:39:53Z

+        first_run_id_type = self.highest_match_id_type
+
+        first_run_output_dir = os.path.abspath(config.OUTPUT_DIR_FIRST_RUN)
+        os.rename(self.output_dir, first_run_output_dir)


Just checking my understanding: if the first run is successful and kicks out unmatched students to be processed again for xyear matches, and that second run is unsuccessful, we do not want to present the original unmatched students from run 1? My intuition says yes that is correct, and also I'm not sure how run 2 could fail.

Your understanding of the desired behavior is correct, and I agree it seems unlikely to occur in practice.

edandylytics · 2026-05-22T20:28:32Z

+
+def stream_to_file(session, url, dest_path, max_attempts=3):
+    """GET url as a stream and write the body to dest_path"""
+    for attempt in range(1, max_attempts + 1):


I tested the retry and it works well. Had the app interrupt the first two attempts after 10 rows and succeed on the third. The executor detected the interrupted stream and tried again.

# Attempt 1
[EarthbeamApiController] cross-year roster fetch failed for run 30: test error
[ExecutorLocalPythonService] Executor stdout: stream_to_file: attempt 1/3 failed (ChunkedEncodingError); retrying in 5s
# Attempt 2
[EarthbeamApiController] cross-year roster fetch failed for run 30: test error
[ExecutorLocalPythonService] Executor stdout: stream_to_file: attempt 2/3 failed (ChunkedEncodingError); retrying in 10s
# Attempt 3: Success
[EarthbeamApiController] cross-year roster: partnerId=ea tenantCode=grand_bend rowCount=1060 durationMs=756
[ExecutorLocalPythonService] Executor stderr: 2026-05-22 15:22:13.053 runway INFO cross-year pass: matching on TSDS UID / Local
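
The retry behavior exercised in this test could look roughly like the sketch below. It is not the PR's implementation: only the signature prefix and docstring come from the diff, and the `base_delay` and `sleep` parameters, the linear backoff (inferred from the 5s and 10s delays in the log), and the choice to catch `OSError` are assumptions. `session` is expected to behave like a `requests.Session`, and `requests.exceptions.ChunkedEncodingError` derives from `OSError`, so it would be caught here.

```python
import time


def stream_to_file(session, url, dest_path, max_attempts=3,
                   base_delay=5, sleep=time.sleep):
    """GET url as a stream and write the body to dest_path.

    base_delay and sleep are assumptions added for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        response = None
        try:
            response = session.get(url, stream=True)
            response.raise_for_status()
            # "wb" truncates the file, so a retry discards any partial
            # body left behind by an interrupted attempt.
            with open(dest_path, "wb") as fh:
                for chunk in response.iter_content(chunk_size=8192):
                    fh.write(chunk)
            return dest_path
        except OSError as exc:
            # Also retries HTTP errors, since those derive from OSError too.
            if attempt == max_attempts:
                raise
            delay = base_delay * attempt  # 5s, then 10s, ...
            print(f"stream_to_file: attempt {attempt}/{max_attempts} "
                  f"failed ({type(exc).__name__}); retrying in {delay}s")
            sleep(delay)
        finally:
            if response is not None:
                response.close()
```

Passing `sleep` as a parameter lets a test substitute a recorder for the real delay.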

johncmerfeld self-assigned this May 13, 2026

johncmerfeld marked this pull request as ready for review May 14, 2026 22:46

johncmerfeld requested review from edandylytics, jalvord1 and theokaufman May 14, 2026 22:46

edandylytics reviewed May 22, 2026

jalvord1 reviewed May 22, 2026

edandylytics reviewed May 22, 2026

johncmerfeld added 20 commits May 28, 2026 16:49

- iteration 0 (37a7e2a)
- wip (1706e3f)
- iteration 1 (1330dd9)
- add output_sets (cf9d56c)
- lower threshold in EM params (eaffcba)
- fix needs_upload (44fec94)
- log exception body (9b83af7)
- integrate edge-case fix (ad3d790)
- clean up language (62a829a)
- rerun on unmatched students file (024e7c1)
- better integrate other PR (8baaf0b)
- comments (b01bdfc)
- handle the unmatched students edge cases (b80f5d6)
- clean up comments (dea79ea)
- fix stacktrace reporting (cdbb6f0)
- integrate lightbeam guardrail (11f11f9)
- integrate lightbeam guardrail (b839bb6)
- more surgical move (2d3e9f7)
- clarify default upload status (b03dac0)
- merge develop (5c38d4e)

johncmerfeld force-pushed the executor/x-year-matching branch from 4940802 to 5c38d4e Compare May 29, 2026 18:49

Merge branch 'development' into executor/x-year-matching (d3bcc58)

johncmerfeld wants to merge 21 commits into development from executor/x-year-matching

