[Ops]: Add Chrysalis ingestion wrapper #169

@tomvothecoder Once I am clear on the boundaries of the term "NERSC ingestion wrapper", I should be able to comprehend "Chrysalis ingestion wrapper". Does the term "scheduler-agnostic" refer to Jenkins? (I always considered cron to be universal...)
TonyB9000 left a comment
Configures a call to the hpc_archive_ingestor. Understandable.
What process sets "SIMBOARD_API_BASE_URL" and "SIMBOARD_API_TOKEN"?

Hmmmm. The "Tom requested your review" notification took me to a page with 7 files to examine, each with a "Submit review" option. As soon as I completed the first one, all 7 vanished...
I accidentally tagged you for review; I meant to assign this PR to you. It is fixed now.

@tomvothecoder "Accidentally tagged you for review." OK. (I think colleges should offer a master's program in GitHub.)

Chrysalis and other non-NERSC sites require upload-based ingestion rather than path-based ingestion, so follow-up work for a state-first HPC upload flow with DB-backed dedupe parity is tracked in #207.

Using the "upload-based" vs. "path-based" terminology, my thought was that when the NERSC upload-receiving system was delivered an upload from a non-NERSC system, it could unpack it in the existing NERSC PA directory under (say) "From_chrysalis/<new_exec_ids>" and then process it with the existing "path-based" code, assuming PACE would not interfere with it (and vice versa). But on second thought, to avoid crossing paths with PACE, it would be best to unpack it in a separate "PACE-unaware" directory.
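A minimal sketch of the separate staging-directory idea, assuming the receiving side knows both a staging root and the PACE-managed archive root. `staging_dir_for_upload` and its parameters are hypothetical names for illustration, not SimBoard code. Each upload lands under `From_<site>/<exec_id>`, and a staging root nested inside the PACE tree is rejected:

```python
from pathlib import Path


def staging_dir_for_upload(staging_root: Path, pace_archive_root: Path,
                           site: str, exec_id: str) -> Path:
    """Return where an uploaded archive would be unpacked (hypothetical helper).

    Layout assumption: <staging_root>/From_<site>/<exec_id>, kept outside the
    PACE-managed archive tree so PACE never sees or modifies it.
    """
    staging_root = staging_root.resolve()
    pace_archive_root = pace_archive_root.resolve()
    # Refuse a staging root that is, or sits inside, the PACE-watched tree.
    if staging_root == pace_archive_root or pace_archive_root in staging_root.parents:
        raise ValueError("staging root must be outside the PACE archive tree")
    # Reject exec_ids that could escape the per-site staging directory.
    if not exec_id or "/" in exec_id or exec_id in {".", ".."}:
        raise ValueError(f"invalid exec_id: {exec_id!r}")
    return staging_root / f"From_{site}" / exec_id
```

Keeping the check in one place means the path-based code can later be pointed at the staging directory without any risk of it writing into the live performance archive.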

@tomvothecoder I am preparing to exercise "hpc_upload_archive_ingestor.py" on Chrysalis to see the logs and flow in action (dry run), discover parameter faults, etc. QUESTION: Although the NERSC backend ingestion is "path-based" (it returns paths for ingestion), NERSC could in principle run the "https-transfer-based" code just as easily. I might try a dry run on NERSC/Perlmutter first, since that configuration is already a known quantity. Then any differences in behavior on Chrysalis would stand out. Does that make sense?
Branch force-pushed from 22a1a88 to feae197.

@tomvothecoder Apologies if I'm doing this wrong. I attempted to test "hpc_upload" on NERSC, thinking "--help" might be helpful. To get started, I needed an environment where I could install things, so: …

Hey Tony, happy to help, and no apologies needed.
SimBoard defines the Python backend dependencies in … You can run …
I'd check out this branch now that I've rebased it on the latest main commit. The …

@tomvothecoder I'll get the latest stuff, but I have already made progress. The output of my latest run_script (NERSC dry-run test) indicates that I am missing "archive_root" and "has_api_token". By examining the "_build_config" function for "nersc", I can see which variables exist to push into the environment. I'll check out the #169 branch on both NERSC and Chrysalis to compare outputs.

@tomvothecoder git gets me again. You wrote: "I'd checkout this branch now that I've rebased it on the latest main commit." Is "this branch" off of main, as you had advised, or is it off of a fork?
((test_simboard) ) (base) [ac.bartoletti1@chrlogin1 simboard]$ git branch -a
When I get too confused, I do a clean "git clone". Then I can do one of these: … Which is appropriate in this case?

@tomvothecoder If I pull down a branch from someone's fork, am I then in that fork, or can I pull it into a new branch off my local main? The persistence of branches and forks, between local and remote, is a bit of a mystery.
This branch (…) lives on my fork. Something like this should work (I did not verify correctness):
git remote add tomvothecoder https://github.com/tomvothecoder/simboard
git fetch tomvothecoder
git checkout --track tomvothecoder/feature/154-ingestion-ites
I usually work directly on upstream rather than on a fork when possible, but in this case I use a fork for separate testing purposes.

@tomvothecoder Sorry, I guess I must pull from your fork. Quick test: I cd to "/home/ac.bartoletti1/gitrepo/simboard/backend" and issue: …
The result: …

@tomvothecoder The line in "chrysalis.sh"
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; echo $script_dir
clearly won't work for defining "backend_root" as
backend_root="$(cd "${script_dir}/../../../.." && pwd)"
I will modify chrysalis.sh to provide a "backend_root" that does not depend on the user's location, at least for test purposes.
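One way to make `backend_root` independent of where the wrapper is invoked from is to honor an explicit override and otherwise search upward from the script's own location for a known marker directory. The Python sketch below shows that logic; the `SIMBOARD_BACKEND_ROOT` variable and the `app` marker are assumptions for illustration, not existing SimBoard settings, and a bash version would follow the same steps:

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_backend_root(script_path: str,
                         env: Mapping[str, str] = os.environ,
                         marker: str = "app") -> Path:
    """Locate the backend root regardless of the caller's working directory.

    1. A hypothetical SIMBOARD_BACKEND_ROOT override wins if set.
    2. Otherwise, walk upward from the script's own directory until a
       directory containing a `marker` subdirectory is found (assumed layout).
    """
    override = env.get("SIMBOARD_BACKEND_ROOT")
    if override:
        return Path(override).expanduser().resolve()
    start = Path(script_path).resolve().parent
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no ancestor of {start} contains '{marker}/'")
```

Searching for a marker avoids hard-coding the number of `..` hops, so the wrapper keeps working if it is moved one level up or down.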

@tomvothecoder Works better now that it can find "backend/app". When I use this for "chrysalis.sh": … and issue these exports: … I get: … I guess even "dry_run" requires real URLs and API tokens, because we need the "state" up front.
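Since even a dry run queries existing ingestion state through the SimBoard API, a wrapper can fail fast when `SIMBOARD_API_BASE_URL` or `SIMBOARD_API_TOKEN` is unset, before walking the archive. A hedged sketch; the `load_api_config` helper and the URL-scheme check are hypothetical and are not the ingestor's actual code:

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse

# Variable names come from this PR; the checks below are illustrative only.
REQUIRED_VARS = ("SIMBOARD_API_BASE_URL", "SIMBOARD_API_TOKEN")


@dataclass(frozen=True)
class ApiConfig:
    base_url: str
    token: str


def load_api_config(env: Mapping[str, str] = os.environ) -> ApiConfig:
    """Exit early, before any archive traversal, if API settings are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise SystemExit(
            "Missing required environment variable(s): " + ", ".join(missing)
            + " (needed even for a dry run, which queries existing ingestion state)"
        )
    base_url = env["SIMBOARD_API_BASE_URL"].strip().rstrip("/")
    if urlparse(base_url).scheme not in ("http", "https"):
        raise SystemExit(f"SIMBOARD_API_BASE_URL is not an http(s) URL: {base_url!r}")
    return ApiConfig(base_url=base_url, token=env["SIMBOARD_API_TOKEN"].strip())
```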

Hi @tomvothecoder, the document also says: One-case-per-request rule: …
The term "alongside the archive" is a bit ambiguous. Would this be accurate? …
Or am I misunderstanding the intent?
Great to see the progress! Yes, the dry run needs to query the SimBoard database via the REST API. I will send the
Your info sounds more accurate, thanks for the suggestion. Can you point me to the source document with this info? I will update it.

@tomvothecoder Running "chrysalis.sh" with the full (DRY_RUN) parameters yielded the following summary (folded for readability): … Questions that arise: …
Observation: The bulk of the work getting to this point involved setting the right environment variables and creating an environment where miscellaneous modules like "dateutils" could be installed. On Chrysalis, I performed: … On NERSC/Perlmutter, I simply replaced "python3.12" with "python3.11". I intend to perform the same test on Perlmutter, just to exercise the networking mechanisms.

@tomvothecoder For comparison, running the equivalent commands on Perlmutter (swapping out parameters where necessary), we obtain the summary: … I suppose I should re-run the Chrysalis test using "OLD_PERF" as the root_PA directory. It is HUGE.
Since this is a first-time dry-run on the Chrysalis
That exact counter is not in … So yes, a test expecting that exact field is probably unrealistic or stale.
The ingestor script only asks SimBoard for enough existing ingestion state to decide which archive cases and executions are new and may be candidates for ingestion. It does not fetch, return, or summarize the full database state. It also does not show the database query, because the query lives behind the SimBoard API, not inside the ingestor script. So this is not a Chrysalis-specific DB query in the ingestor; it is an API request filtered by the configured …
If more detail is needed, the API response or ingestor summary would need to be expanded to include counts like total known cases, known execution IDs, skipped known cases, and the machine filter used. Feel free to open a new GitHub issue to expand logging in https://github.com/E3SM-Project/simboard/blob/main/backend/app/scripts/ingestion/nersc_archive_ingestor.py and https://github.com/E3SM-Project/simboard/blob/main/backend/app/scripts/ingestion/hpc_upload_archive_ingestor.py.
They are dry-run logging counters.
The point is to avoid massive logs when many candidate cases are found. They do not change which cases are candidates or which cases would be ingested.
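A minimal sketch of capped dry-run logging in the spirit of those counters, assuming candidate case IDs arrive as an iterable of strings. `log_candidates` and `max_logged` are illustrative names, not the ingestor's real API. Only the first few candidates are logged individually, but all of them are counted, so the candidate set is unaffected:

```python
import logging
from typing import Iterable

logger = logging.getLogger("ingestion.dry_run")  # hypothetical logger name


def log_candidates(case_ids: Iterable[str], max_logged: int = 20) -> int:
    """Log at most `max_logged` candidates, then one summary line; return the total.

    Only output volume is capped: every candidate is still counted.
    """
    total = 0
    for case_id in case_ids:
        total += 1
        if total <= max_logged:
            logger.info("dry-run candidate: %s", case_id)
    if total > max_logged:
        logger.info("... %d more candidate(s) not shown", total - max_logged)
    logger.info("dry-run candidates total: %d", total)
    return total
```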
I don't think this is going to work yet, as the directory structure of "OLD_PERF" is different from that of "performance_archive". We might also want to be selective about what we ingest from "OLD_PERF". This will require guidance from Rob/Jill.

@tomvothecoder I could do a more thorough assessment, but "OLD_PERF" seems rather sparse. It contains a subdirectory for every year and month, and each has "performance_archive" subdirectories (approximately one per day), but they appear to contain no "exec_id" material, only logs:

@tomvothecoder More importantly, I think we might want a "DRY_RUN_1" and a "DRY_RUN_2", the latter exercising the "tar.gz" generation. (Even a "DRY_RUN_0" could stub the remote state query, obviating the need for a real URL and API token while still exercising the directory traversal and other accept/reject logic.) As for logging, it would be nice to know how many local entries were rejected as redundant (already accepted), or to get some confirmation that the remote state query actually succeeded.
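The tiered dry-run idea could look roughly like the sketch below. Everything here is hypothetical: `DryRunLevel`, `plan_dry_run`, and the assumption that each execution lives in a directory named by its exec_id. Level 0 stubs the remote state (so no URL or token is needed), level 1 calls a caller-supplied function for the known exec_ids, and level 2 additionally builds each tar.gz in memory without uploading it. The returned counts include how many local entries were skipped as already known:

```python
import enum
import io
import tarfile
from pathlib import Path
from typing import Callable, Dict, Iterable, Set


class DryRunLevel(enum.IntEnum):
    STUB_STATE = 0     # "DRY_RUN_0": no API call; known state stubbed as empty
    REMOTE_STATE = 1   # "DRY_RUN_1": query remote state, build nothing
    BUILD_ARCHIVE = 2  # "DRY_RUN_2": also build each tar.gz in memory, upload nothing


def plan_dry_run(exec_dirs: Iterable[Path], level: DryRunLevel,
                 fetch_known_ids: Callable[[], Set[str]]) -> Dict[str, int]:
    """Classify exec directories and return counts (hypothetical helper)."""
    # Assumption: a directory's name is its exec_id.
    known: Set[str] = set() if level == DryRunLevel.STUB_STATE else fetch_known_ids()
    counts = {"scanned": 0, "skipped_known": 0, "candidates": 0, "archive_bytes": 0}
    for exec_dir in exec_dirs:
        counts["scanned"] += 1
        if exec_dir.name in known:
            counts["skipped_known"] += 1  # rejected as already ingested
            continue
        counts["candidates"] += 1
        if level >= DryRunLevel.BUILD_ARCHIVE:
            buf = io.BytesIO()
            with tarfile.open(fileobj=buf, mode="w:gz") as tar:
                tar.add(exec_dir, arcname=exec_dir.name)
            counts["archive_bytes"] += len(buf.getvalue())
    return counts
```

Passing the remote-state lookup in as a function keeps level 0 fully offline while levels 1 and 2 share the same traversal and accept/reject path.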
Description
This adds a scheduler-agnostic HPC archive ingestor entrypoint and a thin Chrysalis site wrapper for existing Jenkins-driven metadata ingestion.
- hpc_archive_ingestor module that delegates to the existing NERSC ingestor
- sites/chrysalis.sh wrapper with Chrysalis archive and state defaults plus required API env vars
Checklist
Deployment Notes (if any)
No special deployment steps.
Local validation is currently blocked because PostgreSQL was unavailable at 127.0.0.1, so make backend-test and the targeted ingestion test file could not complete in this environment.