Multi video backend#2153

Draft
delexagon wants to merge 8 commits into
codeforboston:mainfrom
delexagon:multi-video-backend
@delexagon delexagon commented Jun 2, 2026

Collaborator

Summary

Added backend support for hearings with multiple videos.

Changes:

  • Created an emulator that returns Assembly AI style transcripts when testing locally without ASSEMBLY_API_KEY set.
  • Created a backfill function called backfillHearingVideoFormat.
  • Changed backfillHearingTranscriptions to support multiple videos.
  • Split the video/Assembly AI work out of HearingScraper/scrapeHearings into a separate EventPostProcessor, which is meant to update events after they have occurred and runs on its own as HearingPostProcessor/scrapeVideos.
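
To illustrate what a format backfill like backfillHearingVideoFormat has to do, here is a minimal sketch of an idempotent single-video to multi-video conversion. The field names (`videoURL`, `videoTranscriptionId`, `videos`, `transcriptionId`) and the helper `toMultiVideoFormat` are hypothetical and are not taken from this PR's schema.

```typescript
// Hypothetical shapes for illustration only; the PR's real schema may differ.
interface LegacyHearing {
  id: string
  videoURL?: string
  videoTranscriptionId?: string
}

interface HearingVideo {
  url: string
  transcriptionId?: string
}

interface MultiVideoHearing {
  id: string
  videos: HearingVideo[]
}

type AnyHearing = LegacyHearing | MultiVideoHearing

// A hearing counts as converted once it carries a `videos` array.
function isConverted(h: AnyHearing): h is MultiVideoHearing {
  return Array.isArray((h as MultiVideoHearing).videos)
}

// Converts a legacy hearing; already converted hearings are returned unchanged,
// so running the backfill twice is harmless.
function toMultiVideoFormat(h: AnyHearing): MultiVideoHearing {
  if (isConverted(h)) return h
  const videos: HearingVideo[] = h.videoURL
    ? [
        {
          url: h.videoURL,
          ...(h.videoTranscriptionId
            ? { transcriptionId: h.videoTranscriptionId }
            : {})
        }
      ]
    : []
  return { id: h.id, videos }
}
```

Returning converted documents untouched is what makes a second run of the backfill safe in this sketch.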

Notes:

  • backfillHearingVideoFormat will convert the hearings into the new format
  • backfillHearingTranscriptions will fetch all videos for hearings
  • Because converted hearing documents are not compatible with the previous format, a search index backfill has to be run over hearings after they are converted
  • Interesting hearings to test:
    • 2709 has one video uploaded twice, with one copy labeled MASTER and the other labeled archive.
    • 2731 is like 2709, but one of the listed URLs points to a 2-hour video of a "Missing File" screen.
    • 2858 has two seemingly identical, identically named videos at completely different URLs.
  • A list of all hearings known to have multiple videos up to hearing 5471 is [13, 14, 71, 91, 104, 138, 167, 187, 203, 214, 217, 292, 501, 680, 861, 2118, 2137, 2271, 2289, 2290, 2300, 2476, 2662, 2680, 2709, 2731, 2735, 2858, 2904, 2967, 3073, 3080, 3125, 3167, 3171, 3243, 3317, 3362, 3377, 3381, 3402, 3470, 3480, 3486, 3521, 3579, 3580, 3586, 3642, 3646, 3659, 3660, 3668, 3677, 3685, 3689, 3695, 3713, 3716, 3733, 3774, 3792, 3819, 3829, 3846, 3887, 3891, 3892, 3921, 3930, 3933, 3951, 3976, 3988, 4000, 4016, 4049, 4052, 4065, 4071, 4082, 4111, 4112, 4126, 4127, 4149, 4158, 4201, 4258, 4278, 4458, 4469, 4470, 4558, 4600, 4612, 4641, 4699, 4709, 4711, 4734, 4777, 4847, 4880, 5099, 5173, 5207, 5362, 5382, 5441, 5465, 5471].
  • Assembly AI is only contacted externally if the environment variable ASSEMBLY_API_KEY has been set.
  • I don't think ${process.env.FUNCTIONS_API_BASE}/transcription points to localhost:5001 in the emulator, so I set it manually. This should probably be generalized.
  • The new ballotquestions pages seem to reference the videoURLs but do not use them.
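
As a rough illustration of the two environment-related notes above, the sketch below picks an emulated transcript client when ASSEMBLY_API_KEY is unset and builds the webhook URL with a local override. Everything here is an assumption for illustration: `TranscriptLike` is only a small subset of an Assembly AI style transcript, `EmulatedTranscriptClient`, `pickTranscriptClient`, and `transcriptionWebhookUrl` are hypothetical names, the local base URL is borrowed from the curl commands in the test steps, and the check on `FUNCTIONS_EMULATOR` assumes the Firebase emulator sets that variable to "true".

```typescript
// Small, assumed subset of an Assembly AI style transcript.
interface TranscriptLike {
  id: string
  status: "queued" | "processing" | "completed" | "error"
  text: string | null
}

interface TranscriptClient {
  submit(audioUrl: string, webhookUrl: string): Promise<TranscriptLike>
}

// Hypothetical local stand-in that never touches the network.
class EmulatedTranscriptClient implements TranscriptClient {
  private counter = 0

  async submit(audioUrl: string, _webhookUrl: string): Promise<TranscriptLike> {
    this.counter += 1
    return {
      id: `emulated-${this.counter}`,
      status: "completed",
      text: `Emulated transcript for ${audioUrl}`
    }
  }
}

type Env = Record<string, string | undefined>

// Uses the real client only when an API key is present.
function pickTranscriptClient(
  env: Env,
  makeRealClient: (apiKey: string) => TranscriptClient
): TranscriptClient {
  const apiKey = env.ASSEMBLY_API_KEY
  return apiKey ? makeRealClient(apiKey) : new EmulatedTranscriptClient()
}

// Assumed local base, copied from the emulator URLs in the test steps.
const LOCAL_FUNCTIONS_BASE = "http://localhost:5001/demo-dtp/us-central1"

// Builds the transcription webhook URL, preferring the local base in the emulator.
function transcriptionWebhookUrl(env: Env): string {
  const base =
    env.FUNCTIONS_EMULATOR === "true"
      ? LOCAL_FUNCTIONS_BASE
      : env.FUNCTIONS_API_BASE
  if (!base) throw new Error("FUNCTIONS_API_BASE is not set")
  return `${base}/transcription`
}
```

Passing the environment in as an argument, rather than reading `process.env` directly, keeps both helpers easy to exercise in tests.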

Checklist

  • If I've added new Firestore queries, I've added any new required indexes to firestore.indexes.json (Please do not only create indexes through the Firebase Web UI, even though the error messages may recommend it - indexes created this way may be obliterated by subsequent deploys) - I do not believe this is relevant, since I have not changed firestore.indexes.json.

Known issues

I will not convert this pull request from a draft until I have completed more testing.

Steps to test/reproduce

  1. Test backfillHearingVideoFormat (yarn firebase-admin run-script backfillHearingVideoFormat --env local)
  2. Test backfillHearingTranscription for all hearings (yarn firebase-admin run-script backfillHearingTranscription --env local) and for specific hearings (yarn firebase-admin run-script backfillHearingTranscription --env local --eventId 4258) that exist in the database. Test that rerunning this function without --recreateTranscripts does not create new transcriptIds and vice versa.
  3. Test the functions scrapeSingleHearing and scrapeSingleHearingv2
curl -X POST 'http://localhost:5001/demo-dtp/us-central1/scrapeSingleHearingv2' \
  -H "Content-Type: application/json" \
  -d '{"data": { "eventId": 3713 }}'
  4. Test the pubsub functions (curl 'http://localhost:5001/demo-dtp/us-central1/triggerPubsubFunction?scheduled=scrapeHearings') (curl 'http://localhost:5001/demo-dtp/us-central1/triggerPubsubFunction?scheduled=scrapeVideos')
  5. Test that hearing indexing is functional after running backfillHearingVideoFormat followed by a search index backfill
  6. Test that Assembly AI responses are handled properly after changing ASSEMBLY_API_KEY in functions/.secret.local
  7. Test migrateHearingTranscription
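
The rerun behaviour checked in step 2 (no new transcriptIds unless --recreateTranscripts is passed) can be sketched roughly as follows. The `VideoEntry` shape and the `ensureTranscripts` helper are hypothetical, and the sketch is synchronous for brevity, whereas a real backfill would call Assembly AI asynchronously.

```typescript
// Hypothetical per-video record; not the PR's actual schema.
interface VideoEntry {
  url: string
  transcriptId?: string
}

// Requests a transcript only for videos that lack one, unless
// recreateTranscripts forces a fresh transcript for every video.
function ensureTranscripts(
  videos: VideoEntry[],
  createTranscript: (url: string) => string,
  recreateTranscripts = false
): VideoEntry[] {
  return videos.map(video =>
    video.transcriptId && !recreateTranscripts
      ? video
      : { ...video, transcriptId: createTranscript(video.url) }
  )
}
```

With this shape, a second run over the same hearing leaves existing transcriptIds alone, and passing `true` as the third argument replaces all of them.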

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
maple-dev Ready Ready Preview, Comment Jun 3, 2026 11:07pm
