[Ops]: Add Chrysalis ingestion wrapper by tomvothecoder · Pull Request #169 · E3SM-Project/simboard

tomvothecoder · 2026-04-30T18:53:22Z

Description

This adds a scheduler-agnostic HPC archive ingestor entrypoint and a thin Chrysalis site wrapper for existing Jenkins-driven metadata ingestion.

Closes [Ops]: Scope automated metadata ingestion at other sites (prioritize Chrysalis first) #154
Adds shared hpc_archive_ingestor module that delegates to existing NERSC ingestor
Adds sites/chrysalis.sh wrapper with Chrysalis archive and state defaults plus required API env vars
Documents shared ingestor and site-wrapper pattern in backend scripts README
Adds test coverage for the generic HPC module entrypoint

Checklist

Code follows project style guidelines
Self-reviewed code
No new warnings
Tests added or updated (if needed)
All tests pass (locally and CI/CD)
Documentation/comments updated (if needed)
Breaking change noted (if applicable)

Deployment Notes (if any)

No special deployment steps.

Local validation is currently blocked because PostgreSQL was unavailable at 127.0.0.1, so make backend-test and the targeted ingestion test file could not complete in this environment.

TonyB9000 · 2026-05-06T22:49:08Z

@tomvothecoder Once I am clear on the boundaries of the term "NERSC ingestion wrapper", I should be able to comprehend "Chrysalis ingestion wrapper". Does the term "scheduler-agnostic" refer to Jenkins? (I always considered cron to be universal...).

TonyB9000

Configures a call to the hpc_archive_ingestor. Understandable.

What process sets "SIMBOARD_API_BASE_URL" and "SIMBOARD_API_TOKEN"?

TonyB9000 · 2026-05-13T17:04:12Z

Hmmmm. The "Tom Requested your review" took me to the page with 7 files to examine, each with a "submit-review" option. As soon as I completed the first one, all 7 vanished...

tomvothecoder · 2026-05-13T22:19:41Z

Configures a call to the hpc_archive_ingestor. Understandable.

What process sets "SIMBOARD_API_BASE_URL" and "SIMBOARD_API_TOKEN"?

SIMBOARD_API_BASE_URL is the URL for the REST API calls -> https://simboard-dev-api.e3sm.org/api/v1
- More info here: https://simboard-dev-api.e3sm.org/docs#/Ingestions/ingest_from_upload_api_v1_ingestions_from_upload_post
SIMBOARD_API_TOKEN -- I create this in the SimBoard backend at NERSC (with this). When we get this far, I will send you the token, which you will then set securely using the preferred method for the Jenkins workflow at LCRC, with help from Wade.

Hmmmm. The "Tom Requested your review" took me to the page with 7 files to examine, each with a "submit-review" option. As soon as I completed the first one, all 7 vanished...

Accidentally tagged you for review. I meant to assign this PR to you. It is fixed now.

TonyB9000 · 2026-05-14T16:59:55Z

@tomvothecoder "Accidentally tagged you for review". OK. (I think colleges should offer a master's program in GitHub.)

tomvothecoder · 2026-05-27T21:36:37Z

Chrysalis and other non-NERSC sites require upload-based ingestion rather than path-based ingestion, so follow-up work is tracked in #207 for a state-first HPC upload flow with DB-backed dedupe parity.

TonyB9000 · 2026-05-27T23:07:04Z

Using the "upload-based" vs "path-based" terminology, my thought was that when the NERSC upload-receiving system was delivered an upload from a non-NERSC system, it could open it in the existing NERSC PA-directory under (say) "From_chrysalis/<new_exec_ids>" and then process it with the existing "path-based" codes, assuming PACE would not interfere with it (and vice versa). But on second thought, to avoid PACE crossing, it would be best to open it in a separate "PACE-unaware" directory.

TonyB9000 · 2026-06-04T18:32:08Z

@tomvothecoder I am preparing to exercise "hpc_upload_archive_ingestor.py" on chrysalis, to see the logs and flow (in dry-run) in action, discover parameter faults, etc.

QUESTION: Although, on NERSC, the backend ingestion is "path-based" (returns paths for ingestion), it could in principle run the "https-transfer-based" codes just as easily. I might try a dry run on NERSC/Perlmutter first, since that configuration is already a known item. Then, differences in behavior on Chrysalis would stand out. Does that make sense?

TonyB9000 · 2026-06-04T22:40:56Z

@tomvothecoder Apologies if I'm doing this wrong.

I attempted to test the "hpc_upload" on NERSC, thinking "--help" might be helpful. To get started, I needed an environment where I could install things, so:

After

    python3.11 -m venv ~/envs/test_simboard
    source ~/envs/test_simboard/bin/activate
    python3.11 -m pip install --upgrade pip

    python3.11 -m pip install python-dateutil
    pip install pydantic
    pip install fastapi_users
 
The (bash script) commands:

    REPO_ROOT="/global/homes/t/tonyb/gitrepo/simboard/backend"
    SCRIPT="$REPO_ROOT/app/scripts/ingestion/hpc_upload_archive_ingestor.py"

    PYTHONPATH="$REPO_ROOT"
    python3.11 "$SCRIPT" --help

Produces the following output:

2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464377+00:00 event=run_started archive_root=/performance_archive mode=ingest
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464638+00:00 event=startup_configuration_begin
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464749+00:00 event=summary_table row_count=10 rows="api.api_base_url=http://backend:8000 | api.endpoint_url=http://backend:8000/api/v1/ingestions/from-hpc-upload | api.state_endpoint_url=http://backend:8000/api/v1/ingestions/state | paths.archive_root=/performance_archive | runtime.machine_name=perlmutter | runtime.dry_run=false | runtime.max_cases_per_run=null | runtime.max_attempts=3 | runtime.request_timeout_seconds=60 | auth.has_api_token=false" title=startup_configuration
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464819+00:00 event=startup_configuration_end
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464876+00:00 event=archive_root_missing archive_root=/performance_archive
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464929+00:00 event=run_finished duration_seconds=0.001 exit_code=1 mode=ingest

I now see there is no commandline parsing.  I need to set “dry_run” as an environment variable so that the auto-generated config will pick it up.  I must have missed where the docs explain setting the environment variables.  I assume I can set them in my “run_script”.
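
To make the "no command-line parsing" behavior concrete, here is a minimal sketch of an environment-driven configuration loader. It is not the ingestor's actual `_build_config`; the `IngestorConfig` class, `load_config` function, and the truthy-string parsing are assumptions for illustration. The variable names are the ones used in the `chrysalis.sh` wrapper quoted later in this thread, and the defaults mirror the startup log above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'true', 'True', or '1' (assumed)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class IngestorConfig:
    """Hypothetical container for the settings shown in the startup log."""

    api_base_url: str
    api_token: Optional[str]
    archive_root: str
    machine_name: str
    dry_run: bool


def load_config(env: Mapping[str, str] = os.environ) -> IngestorConfig:
    """Build the config from environment variables only (no argparse)."""
    return IngestorConfig(
        # Defaults mirror the values printed in the startup log.
        api_base_url=env.get("SIMBOARD_API_BASE_URL", "http://backend:8000"),
        api_token=env.get("SIMBOARD_API_TOKEN") or None,
        archive_root=env.get("PERF_ARCHIVE_ROOT", "/performance_archive"),
        machine_name=env.get("MACHINE_NAME", "perlmutter"),
        dry_run=_as_bool(env.get("DRY_RUN", "false")),
    )
```

With a loader of this shape, flags such as `--help` are simply ignored, which would explain why the run above went straight into `mode=ingest` with the default archive root.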

tomvothecoder · 2026-06-05T17:42:42Z

Hey Tony, happy to help and no apologies needed.

I attempted to test the "hpc_upload" on NERSC, thinking "--help" might be helpful. To get started, I needed an environment where I could install things, so:
After

    python3.11 -m venv ~/envs/test_simboard
    source ~/envs/test_simboard/bin/activate
    python3.11 -m pip install --upgrade pip

    python3.11 -m pip install python-dateutil
    pip install pydantic
    pip install fastapi_users
 

SimBoard defines the Python backend dependencies in pyproject.toml and uses uv for dependency management.

You can run make backend-install if you only need a Python env (source).

The (bash script) commands:

    REPO_ROOT="/global/homes/t/tonyb/gitrepo/simboard/backend"
    SCRIPT="$REPO_ROOT/app/scripts/ingestion/hpc_upload_archive_ingestor.py"

    PYTHONPATH="$REPO_ROOT"
    python3.11 "$SCRIPT" --help

Produces the following output:

2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464377+00:00 event=run_started archive_root=/performance_archive mode=ingest
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464638+00:00 event=startup_configuration_begin
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464749+00:00 event=summary_table row_count=10 rows="api.api_base_url=http://backend:8000 | api.endpoint_url=http://backend:8000/api/v1/ingestions/from-hpc-upload | api.state_endpoint_url=http://backend:8000/api/v1/ingestions/state | paths.archive_root=/performance_archive | runtime.machine_name=perlmutter | runtime.dry_run=false | runtime.max_cases_per_run=null | runtime.max_attempts=3 | runtime.request_timeout_seconds=60 | auth.has_api_token=false" title=startup_configuration
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464819+00:00 event=startup_configuration_end
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464876+00:00 event=archive_root_missing archive_root=/performance_archive
2026-06-04 15:02:26,464 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-04T22:02:26.464929+00:00 event=run_finished duration_seconds=0.001 exit_code=1 mode=ingest

I now see there is no commandline parsing. I need to set “dry_run” as an environment variable so that the auto-generated config will pick it up. I must have missed where the docs explain setting the environment variables. I assume I can set them in my “run_script”.

I'd check out this branch now that I've rebased it on the latest main commit.

The chrysalis.sh bash script exports environment variables and wraps hpc_upload_archive_ingestor.py. You can try experimenting with that script. More info here: https://github.com/tomvothecoder/simboard/tree/feature/154-ingestion-sites/backend/app/scripts#hpc-upload-archive-ingestor.

TonyB9000 · 2026-06-05T19:09:36Z

@tomvothecoder I'll get the latest stuff, but I have made progress. My latest run_script (NERSC dry_run test) says:

    REPO_ROOT="/global/homes/t/tonyb/gitrepo/simboard/backend"
    WORKDIR="/global/homes/t/tonyb/test/simboard"
    SCRIPT="$REPO_ROOT/app/scripts/ingestion/hpc_upload_archive_ingestor.py"

    export PYTHONPATH="$REPO_ROOT"
    export DRY_RUN=True
    python3.11 "$SCRIPT"

The output indicates that I am missing "archive_root" and "has_api_token".

By examining the "nersc" "_build_config" function, I can see what variables exist to push into the environment.

I'll checkout branch #169 on both NERSC and Chrysalis to do comparisons in outputs.

TonyB9000 · 2026-06-05T20:14:53Z

@tomvothecoder git gets me again:

You wrote: "I'd checkout this branch now that I've rebased it on the latest main commit."

is "this branch" off of main, as you had advised? Or is it off of a fork??

((test_simboard) ) (base) [ac.bartoletti1@chrlogin1 simboard]$ git branch -a

main
remotes/origin/HEAD -> origin/main
remotes/origin/copilot/analyze-simboard-devops-issues
remotes/origin/copilot/check-copilot-agent-tokens
remotes/origin/copilot/enhance-simulation-details-page
remotes/origin/copilot/fix-hpc-filepaths-issue
remotes/origin/dev-ai
remotes/origin/fix/181-archive-path-substitution
remotes/origin/main

When I get too confused, I do a clean "git clone". Then I can do one of these:

To pull a remote branch down from remote:

    git fetch --all --prune
    git checkout -b newbranchname origin/newbranchname

To check out a remote branch that was pushed but not merged to main/master:

    git fetch origin <the_remote_branch_name>
    git checkout -b <any_new_local_name> origin/<the_remote_branch_name>

To fetch a branch from a remote fork (example):

    git remote add tomvothecoder https://github.com/tomvothecoder/simboard.git
    git fetch tomvothecoder
    git checkout -b feature/154-ingestion-sites tomvothecoder/feature/154-ingestion-sites

Which is appropriate in this case?

TonyB9000 · 2026-06-05T20:20:36Z

@tomvothecoder If I pull down a branch off of someone's fork, am I in that fork, or can I pull that into a new branch of my local main? The persistence of branches and forks, between local and remote, is a bit of a mystery.

tomvothecoder · 2026-06-05T21:24:12Z

@tomvothecoder If I pull down a branch off of someone's fork, am I in that fork, or can I pull that into a new branch of my local main? The persistence of branches and forks, between local and remote, is a bit of a mystery.

Upstream -> E3SM-Project/simboard
Fork -> tomvothecoder/simboard

This branch (tomvothecoder:feature/154-ingestion-sites) is on my fork (tomvothecoder/simboard), not on upstream (E3SM-Project/simboard). You need to add my fork as a git remote before you can check out branches from it.

Something like this (I did not verify correctness):

    git remote add tomvothecoder https://github.com/tomvothecoder/simboard
    git fetch tomvothecoder
    git checkout -b feature/154-ingestion-sites tomvothecoder/feature/154-ingestion-sites

I usually work directly on upstream rather than a fork when possible, but in this case I use a fork for separate testing purposes.

TonyB9000 · 2026-06-05T21:30:56Z

@tomvothecoder Sorry, I guess I must pull from your fork.

Quick test: I cd to "/home/ac.bartoletti1/gitrepo/simboard/backend" and issue

python3.12 -m app.scripts.ingestion.nersc_archive_ingestor --api-base-url http://backend:8000 --machine-name chrysalis

The result:

2026-06-05 16:23:29,964 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T21:23:29.964677+00:00 event=run_started archive_root=/performance_archive mode=ingest
2026-06-05 16:23:29,964 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T21:23:29.964921+00:00 event=startup_configuration_begin
2026-06-05 16:23:29,965 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T21:23:29.965032+00:00 event=summary_table row_count=10 rows="api.api_base_url=_fake_url_ | api.endpoint_url=_fake_url_/api/v1/ingestions/from-path | api.state_endpoint_url=_fake_url_/api/v1/ingestions/state | paths.archive_root=/performance_archive | runtime.machine_name=perlmutter | runtime.dry_run=false | runtime.max_cases_per_run=null | runtime.max_attempts=3 | runtime.request_timeout_seconds=60 | auth.has_api_token=true" title=startup_configuration
2026-06-05 16:23:29,965 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T21:23:29.965082+00:00 event=startup_configuration_end
2026-06-05 16:23:29,965 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T21:23:29.965118+00:00 event=archive_root_missing archive_root=/performance_archive
2026-06-05 16:23:29,965 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T21:23:29.965168+00:00 event=run_finished duration_seconds=0.0 exit_code=1 mode=ingest

TonyB9000 · 2026-06-05T22:05:09Z

@tomvothecoder The line in "chrysalis.sh"

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"; echo $script_dir
(prints "/home/ac.bartoletti1/test/simboard")

clearly won't work for defining "backend_root" as backend_root="$(cd "${script_dir}/../../../.." && pwd)"

I will modify chrysalis.sh to provide a "backend_root" that does not depend upon the user's location, at least for test purposes.
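
The path arithmetic behind this failure can be sketched in Python. This is an illustration only, not part of `chrysalis.sh`: the `resolve_backend_root` helper and the `BACKEND_ROOT` override variable are hypothetical names. The sketch assumes the wrapper normally lives at `backend/app/scripts/ingestion/sites/chrysalis.sh`, so walking four levels up from its directory lands on `backend`.

```python
from pathlib import PurePosixPath
from typing import Mapping


def resolve_backend_root(script_path: str, env: Mapping[str, str]) -> PurePosixPath:
    """Return the backend directory, preferring an explicit override.

    BACKEND_ROOT is a hypothetical variable name used for illustration.
    """
    override = env.get("BACKEND_ROOT")
    if override:
        return PurePosixPath(override)
    # Equivalent of "${script_dir}/../../../..":
    # sites/ -> ingestion/ -> scripts/ -> app/ -> backend/
    script_dir = PurePosixPath(script_path).parent
    return script_dir.parents[3]


in_repo = resolve_backend_root(
    "/repo/simboard/backend/app/scripts/ingestion/sites/chrysalis.sh", {}
)
copied = resolve_backend_root("/home/ac.bartoletti1/test/simboard/chrysalis.sh", {})
```

Here `in_repo` is `/repo/simboard/backend`, while `copied` resolves to `/`, which is why a copy of the script run from a test directory cannot find the backend package unless the root is supplied explicitly.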

TonyB9000 · 2026-06-05T22:25:06Z

@tomvothecoder Works better now that it can find "backend/app".

When I use this for "chrysalis.sh":

GITREPO="/home/ac.bartoletti1/gitrepo"

: "${SIMBOARD_API_BASE_URL:?SIMBOARD_API_BASE_URL is required}"
: "${SIMBOARD_API_TOKEN:?SIMBOARD_API_TOKEN is required}"

export MACHINE_NAME="${MACHINE_NAME:-chrysalis}"
export PERF_ARCHIVE_ROOT="${PERF_ARCHIVE_ROOT:-/lcrc/group/e3sm/PERF_Chrysalis/performance_archive}"
export STATE_PATH="${STATE_PATH:-${PERF_ARCHIVE_ROOT}/../simboard-ingestion-state.json}"
export DRY_RUN="${DRY_RUN:-true}"

backend_root="$GITREPO/simboard/backend"
python_bin="${PYTHON_BIN:-python}"

cd "${backend_root}"
exec "${python_bin}" -m app.scripts.ingestion.hpc_upload_archive_ingestor

and issue these exports:

export SIMBOARD_API_BASE_URL=" http://backend:8000"
export SIMBOARD_API_TOKEN="_fake_token_"

I get:

2026-06-05 17:17:50,763 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T22:17:50.763649+00:00 event=run_started archive_root=/lcrc/group/e3sm/PERF_Chrysalis/performance_archive mode=dry-run
2026-06-05 17:17:50,763 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T22:17:50.763967+00:00 event=startup_configuration_begin
2026-06-05 17:17:50,764 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T22:17:50.764094+00:00 event=summary_table row_count=10 rows="api.api_base_url=\" http://backend:8000\" | api.endpoint_url=\" http://backend:8000/api/v1/ingestions/from-hpc-upload\" | api.state_endpoint_url=\" http://backend:8000/api/v1/ingestions/state\" | paths.archive_root=/lcrc/group/e3sm/PERF_Chrysalis/performance_archive | runtime.machine_name=chrysalis | runtime.dry_run=true | runtime.max_cases_per_run=null | runtime.max_attempts=3 | runtime.request_timeout_seconds=60 | auth.has_api_token=true" title=startup_configuration
2026-06-05 17:17:50,764 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T22:17:50.764155+00:00 event=startup_configuration_end
2026-06-05 17:17:50,816 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T22:17:50.816097+00:00 event=state_fetch_failed error="URL error: [Errno -2] Name or service not known" machine_name=chrysalis status_code=null
2026-06-05 17:17:50,816 [INFO]: nersc_archive_ingestor.py(_log_event:1344) >> ts=2026-06-05T22:17:50.816211+00:00 event=run_finished duration_seconds=0.053 exit_code=1 mode=dry-run

I guess, even "dry_run" requires real URLs and API_tokens. That is because we need "state" up front.
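
The "state up front" ordering can be sketched as follows. This is a simplified illustration under stated assumptions, not the ingestor's code: `run_ingestor` and its callable parameters are hypothetical, and the real script may catch different exception types.

```python
from typing import Callable, Iterable, Set


def run_ingestor(
    fetch_known_ids: Callable[[], Set[str]],
    discover_ids: Callable[[], Iterable[str]],
    upload: Callable[[str], None],
    dry_run: bool,
) -> int:
    """Return a process-style exit code: 0 on success, 1 on failure."""
    try:
        # State is fetched first, in dry-run mode too.
        known = fetch_known_ids()
    except OSError:
        # For example an unresolvable host name; nothing is scanned or uploaded.
        return 1
    for execution_id in discover_ids():
        if execution_id in known:
            continue  # already known to the server
        if not dry_run:
            upload(execution_id)
    return 0
```

Because the state fetch sits before the scan, a placeholder URL fails immediately with `exit_code=1`, as in the `state_fetch_failed` event above, regardless of the `DRY_RUN` setting.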

TonyB9000 · 2026-06-05T23:26:52Z

Hi @tomvothecoder The document also says:

One-case-per-request rule:

Each upload request contains exactly one case directory.
case_path is sent alongside the archive and becomes the stable dedupe key in the ingestion audit table.
Browser/manual uploads still use /api/v1/ingestions/from-upload; this runner does not call that endpoint.

The term "alongside the archive" is a bit ambiguous. Would this be accurate?

Each upload request contains exactly one case directory, and one or more newly-completed jlid archives.
case_path is sent alongside the archives, and (case_id + jlid) becomes the stable dedupe key in the ingestion audit table.
Browser/manual uploads still use /api/v1/ingestions/from-upload; this runner does not call that endpoint.

Or am I misunderstanding the intent?

tomvothecoder · 2026-06-09T19:16:09Z

I guess, even "dry_run" requires real URLs and API_tokens. That is because we need "state" up front.

Great to see the progress!

Yes, the dry run needs to query the SimBoard database via the REST API. I will send the API_TOKEN over encrypted email to you.

Hi @tomvothecoder The document also says:

One-case-per-request rule:

* Each upload request contains exactly one case directory.

* case_path is sent alongside the archive and becomes the stable dedupe key in the ingestion audit table.

* Browser/manual uploads still use /api/v1/ingestions/from-upload; this runner does not call that endpoint.

The term "alongside the archive" is a bit ambiguous. Would this be accurate?

* Each upload request contains exactly one case directory, and one or more newly-completed jlid archives.

* case_path is sent alongside the archives, and (case_id + jlid) becomes the stable dedupe key in the ingestion audit table.

* Browser/manual uploads still use /api/v1/ingestions/from-upload; this runner does not call that endpoint.

Or am I misunderstanding the intent?

Your info sounds more accurate, thanks for the suggestion. Can you point me to the source document with this info? I will update it.

TonyB9000 · 2026-06-09T21:44:48Z

@tomvothecoder Running "chrysalis.sh" with the full (DRY_RUN) parameters yielded the following summary (folded for readability):

event=summary_table
    row_count=9
    rows="mode=dry-run 
        | discovered_cases=746 
        | candidate_cases=746 
        | execution_dirs_scanned=3155 
        | execution_dirs_accepted=1649 
        | skipped_incomplete=1506
        | skipped_invalid=0 
        | candidate_logs_emitted=20 
        | candidate_logs_suppressed=726" 
    title=dry_run_summary
event=run_finished
    duration_seconds=374.342
    exit_code=0
    mode=dry-run

Questions that arise:

What distinguishes "discovered cases" from "candidate cases"?
What distinguishes "skipped_incomplete" from "skipped_invalid"?
Where is "skipped_already_accepted = 0"? Perhaps this test is unrealistic, as no "state" of previous accepted submissions exists.
Why is there no count of "state" returned from the database? Was the query restricted to Chrysalis-only? Why is the DB query not indicated?
What is "candidate_logs_emitted/suppressed"?

Observation: The bulk of the work getting to this point involved stuffing the right ENV VARS and creating an environment where misc modules like "python-dateutil" could be installed. On Chrysalis, I performed

    python3.12 -m venv ~/envs/test_simboard
    source ~/envs/test_simboard/bin/activate
    python3.12 -m pip install --upgrade pip

    python3.12 -m pip install python-dateutil
    pip install pydantic
    pip install fastapi_users

On NERSC/Perlmutter, I simply replaced "python3.12" with "python3.11". I intend to perform the same test on Perlmutter, just to exercise the mechanisms of networking.

TonyB9000 · 2026-06-09T22:23:34Z

@tomvothecoder For comparison, running the equivalent commands on Perlmutter (swapping out parameters where necessary), we obtain the summary:

event=summary_table
    row_count=9 
    rows="mode=dry-run 
        | discovered_cases=1289 
        | candidate_cases=1289 
        | execution_dirs_scanned=2514 
        | execution_dirs_accepted=1648 
        | skipped_incomplete=866 
        | skipped_invalid=0
        | candidate_logs_emitted=20
        | candidate_logs_suppressed=1269"
    title=dry_run_summary
event=run_finished
    duration_seconds=23.689
    exit_code=0
    mode=dry-run
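
For side-by-side comparison of the two sites, the `rows` string of a `summary_table` event can be split into a dictionary. The sketch below is a hypothetical helper, not part of the ingestor; it assumes the `key=value | key=value` layout seen in the summaries above and uses a subset of the reported counters as sample input.

```python
from typing import Dict


def parse_rows(rows: str) -> Dict[str, str]:
    """Split a 'key=value | key=value' rows string into a dictionary."""
    result: Dict[str, str] = {}
    for cell in rows.split("|"):
        key, sep, value = cell.strip().partition("=")
        if sep:  # skip cells without '='
            result[key.strip()] = value.strip()
    return result


def diff_counts(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, int]:
    """Return b - a for every key whose value is an integer in both runs."""
    out: Dict[str, int] = {}
    for key in a.keys() & b.keys():
        if a[key].isdigit() and b[key].isdigit():
            out[key] = int(b[key]) - int(a[key])
    return out


chrysalis = parse_rows(
    "mode=dry-run | discovered_cases=746 | execution_dirs_scanned=3155 | skipped_incomplete=1506"
)
perlmutter = parse_rows(
    "mode=dry-run | discovered_cases=1289 | execution_dirs_scanned=2514 | skipped_incomplete=866"
)
delta = diff_counts(chrysalis, perlmutter)  # Perlmutter minus Chrysalis
```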

I suppose I should re-run the chrysalis test, using "OLD_PERF" as the root_PA directory. It is HUGE.

tomvothecoder · 2026-06-09T23:03:20Z

What distinguishes "discovered cases" from "candidate cases"?

discovered_cases are everything the archive scan finds that looks like a case.
candidate_cases are the subset that SimBoard does not already know about and may ingest.

Since this is a first-time dry-run on the Chrysalis performance_archive directory, it is expected that discovered_cases and candidate_cases are the same.
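
The discovered/candidate distinction can be pictured as a filter over the scan results. This is a minimal sketch, assuming the scan yields a mapping from case path to execution IDs and the API returns a set of known execution IDs; `select_candidates` and both data shapes are hypothetical.

```python
from typing import Dict, List, Set


def select_candidates(
    discovered: Dict[str, List[str]], known_execution_ids: Set[str]
) -> Dict[str, List[str]]:
    """Keep only cases that still have at least one unknown execution ID.

    `discovered` maps a case path to the execution IDs found under it.
    """
    candidates: Dict[str, List[str]] = {}
    for case_path, execution_ids in discovered.items():
        new_ids = [e for e in execution_ids if e not in known_execution_ids]
        if new_ids:
            candidates[case_path] = new_ids
    return candidates
```

With an empty set of known IDs, as on a first run, every discovered case is returned as a candidate, which matches the equal counts in the dry-run summary.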

What distinguishes "skipped_incomplete" from "skipped_invalid"?

skipped_incomplete means required metadata was missing.
skipped_invalid means the metadata or path looked wrong, unreadable, or unusable.
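
In both reported summaries the three outcomes partition the scanned directories: 1649 + 1506 + 0 = 3155 on Chrysalis and 1648 + 866 + 0 = 2514 on Perlmutter. A rough sketch of such a three-way classification follows. The required file names and the `classify` helper are placeholders for illustration; the real ingestor's checks may differ.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Set

# Hypothetical required-file names; the real ingestor's list may differ.
REQUIRED_FILES: Set[str] = {"case_metadata", "timing_summary"}


class ScanResult(Enum):
    ACCEPTED = "execution_dirs_accepted"
    INCOMPLETE = "skipped_incomplete"
    INVALID = "skipped_invalid"


def classify(files: Iterable[str], readable: bool = True) -> ScanResult:
    """Classify one execution directory from its file names."""
    if not readable:
        # Unreadable or otherwise unusable path.
        return ScanResult.INVALID
    if REQUIRED_FILES - set(files):
        # At least one required metadata file is missing.
        return ScanResult.INCOMPLETE
    return ScanResult.ACCEPTED


scanned = [
    ({"case_metadata", "timing_summary"}, True),
    ({"case_metadata"}, True),
    (set(), False),
]
counts = Counter(classify(files, readable).value for files, readable in scanned)
```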

Where is "skipped_already_accepted = 0"? Perhaps this test is unrealistic, as no "state" of previous accepted submissions exists.

That exact counter is not in nersc_archive_ingestor.py. The script checks existing SimBoard ingestion state and filters out already-known execution IDs, but it does not use the term “accepted” or expose a skipped_already_accepted count.

So yes: a test expecting that exact field is probably unrealistic or stale.

Why is there no count of "state" returned from the database? Was the query restricted to Chrysalis-only? Why is the DB query not indicated?

The ingestor script only asks SimBoard for enough existing ingestion state to decide which archive cases and their executions are new and may be candidates for ingestion. It does not fetch, return, or summarize the full database state. It also does not show the database query because the query is behind the SimBoard API, not inside the ingestor script. So this is not a Chrysalis-specific DB query in the ingestor. It is an API request filtered by the configured machine_name.

If more detail is needed, the API response or ingestor summary would need to be expanded to include counts like total known cases, known execution IDs, skipped known cases, and machine filter used.

Happy for you to open a new GitHub issue to expand logging in https://github.com/E3SM-Project/simboard/blob/main/backend/app/scripts/ingestion/nersc_archive_ingestor.py and https://github.com/E3SM-Project/simboard/blob/main/backend/app/scripts/ingestion/hpc_upload_archive_ingestor.py.

What is "candidate_logs_emitted/suppressed"?

They are dry-run logging counters.

candidate_logs_emitted = how many candidate case details were actually printed to the log.

candidate_logs_suppressed = how many candidate case details were not printed because the script hit its logging limit.

The point is to avoid massive logs when many candidate cases are found. It does not change which cases are candidates or which cases would be ingested.
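
A capped logger of this kind might look like the sketch below. The limit of 20 is inferred from `candidate_logs_emitted=20` in both dry-run summaries; the `log_candidates` function and the `dry_run_candidate` event name are assumptions, not taken from the script.

```python
from typing import Callable, Iterable, Tuple


def log_candidates(
    candidates: Iterable[str],
    emit: Callable[[str], None],
    limit: int = 20,  # inferred from candidate_logs_emitted=20 in both runs
) -> Tuple[int, int]:
    """Log at most `limit` candidates; return (emitted, suppressed)."""
    emitted = suppressed = 0
    for case in candidates:
        if emitted < limit:
            # Hypothetical event name, for illustration only.
            emit(f"event=dry_run_candidate case={case}")
            emitted += 1
        else:
            suppressed += 1
    return emitted, suppressed
```

For 746 candidates this returns (20, 726), and for 1289 it returns (20, 1269), consistent with the Chrysalis and Perlmutter summaries.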

tomvothecoder · 2026-06-09T23:05:22Z

I suppose I should re-run the chrysalis test, using "OLD_PERF" as the root_PA directory. It is HUGE.

I don't think this is going to work yet as the directory structure of "OLD_PERF" is different from "performance_archive".
We need to expand ingestion support for "OLD_PERF" in #209.

We might also want to be targeted in what we ingest from "OLD_PERF". This will require guidance from Rob/Jill.

TonyB9000 · 2026-06-09T23:19:33Z

@tomvothecoder I could do the more thorough assessment, but "OLD_PERF" seems rather sparse. It contains a subdirectory for every year and month, and each has "performance_archive" subdirectories (one per day, approx), but they appear to contain no "exec_id" material, only logs:

((test_simboard) ) (base) [ac.bartoletti1@chrlogin1 simboard]$ ll /lcrc/group/e3sm/OLD_PERF/2026-01
total 23
drwxrwsr-x 3 e3smtest E3SM 4096 Jan  1 00:21 performance_archive_anvil_e3sm_2026_01_01_00_19_55
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  2 00:17 performance_archive_anvil_e3sm_2026_01_02_00_17_24
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  3 00:17 performance_archive_anvil_e3sm_2026_01_03_00_17_43
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  4 00:20 performance_archive_anvil_e3sm_2026_01_04_00_20_49
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  5 00:23 performance_archive_anvil_e3sm_2026_01_05_00_22_56
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  6 00:19 performance_archive_anvil_e3sm_2026_01_06_00_19_00
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  7 00:18 performance_archive_anvil_e3sm_2026_01_07_00_17_47
drwxrwsr-x 3 e3smtest E3SM 4096 Jan  8 00:20 performance_archive_anvil_e3sm_2026_01_08_00_18_23
drwxrwsr-x 2 e3smtest E3SM 4096 Jan  9 00:17 performance_archive_anvil_e3sm_2026_01_09_00_17_32
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 10 00:18 performance_archive_anvil_e3sm_2026_01_10_00_18_46
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 11 00:19 performance_archive_anvil_e3sm_2026_01_11_00_19_34
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 12 00:17 performance_archive_anvil_e3sm_2026_01_12_00_17_13
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 21 05:17 performance_archive_anvil_e3sm_2026_01_21_05_17_46
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 22 00:21 performance_archive_anvil_e3sm_2026_01_22_00_20_40
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 23 00:22 performance_archive_anvil_e3sm_2026_01_23_00_22_20
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 24 00:21 performance_archive_anvil_e3sm_2026_01_24_00_21_28
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 25 00:17 performance_archive_anvil_e3sm_2026_01_25_00_17_10
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 26 00:18 performance_archive_anvil_e3sm_2026_01_26_00_17_46
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 27 00:23 performance_archive_anvil_e3sm_2026_01_27_00_22_54
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 28 00:18 performance_archive_anvil_e3sm_2026_01_28_00_18_06
drwxrwsr-x 3 e3smtest E3SM 4096 Jan 29 00:19 performance_archive_anvil_e3sm_2026_01_29_00_18_23
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 30 00:23 performance_archive_anvil_e3sm_2026_01_30_00_22_48
drwxrwsr-x 2 e3smtest E3SM 4096 Jan 31 00:17 performance_archive_anvil_e3sm_2026_01_31_00_16_50
((test_simboard) ) (base) [ac.bartoletti1@chrlogin1 simboard]$ ll /lcrc/group/e3sm/OLD_PERF/2026-01/performance_archive_anvil_e3sm_2026_01_30_00_22_48
total 32
-rw-rw-r-- 1 e3smtest E3SM 22748 Jan 30 00:22 e3sm_perf_archive_anvil_2026_01_30_00_22_48_out.txt
-rw-rw-r-- 1 e3smtest E3SM     0 Jan 30 00:21 large-files-removed.txt
((test_simboard) ) (base) [ac.bartoletti1@chrlogin1 simboard]$ ll /lcrc/group/e3sm/OLD_PERF/2026-01/performance_archive_anvil_e3sm_2026_01_06_00_19_00
total 32
-rw-rw-r-- 1 e3smtest E3SM 22748 Jan  6 00:19 e3sm_perf_archive_anvil_2026_01_06_00_19_00_out.txt
-rw-rw-r-- 1 e3smtest E3SM     0 Jan  6 00:17 large-files-removed.txt

TonyB9000 · 2026-06-09T23:27:42Z

@tomvothecoder More importantly, I think we might want a "DRY_RUN_1" and "DRY_RUN_2", the latter wherein we exercise the "tar.gz" generation. (Even a "DRY_RUN_0" could stub the remote state test, and obviate the need to exercise the URL and API TOKEN, while still exercising the directory traversal and other "accept/reject" logic.)
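
One way the tiered dry-run proposal could be expressed is sketched below. Nothing like this exists in the ingestor as far as this thread shows; the `DryRunLevel` enum, its member names, and the step names are all hypothetical.

```python
from enum import IntEnum
from typing import List


class DryRunLevel(IntEnum):
    SCAN_ONLY = 0      # DRY_RUN_0: stub the remote state, no URL or token needed
    WITH_STATE = 1     # DRY_RUN_1: query remote state, then scan
    WITH_ARCHIVES = 2  # DRY_RUN_2: also build the tar.gz archives, but skip upload


def planned_steps(level: DryRunLevel) -> List[str]:
    """List the steps a run at the given dry-run level would perform."""
    steps: List[str] = []
    if level >= DryRunLevel.WITH_STATE:
        steps.append("fetch_remote_state")
    steps.append("scan_archive")
    if level >= DryRunLevel.WITH_ARCHIVES:
        steps.append("build_tar_gz")
    return steps
```

No level includes an upload step, so each tier stays a dry run while exercising progressively more of the pipeline.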

As far as logging, it would be nice to simply know how many local entries were rejected due to redundancy (already accepted), or some confirmation that the remote state query had actually succeeded.

tomvothecoder requested a review from TonyB9000 May 6, 2026 22:33

TonyB9000 approved these changes May 13, 2026

tomvothecoder assigned TonyB9000 May 13, 2026

tomvothecoder added the type: enhancement New feature or request label May 13, 2026

tomvothecoder mentioned this pull request May 27, 2026

[Enhancement]: Add state-first HPC upload ingestion flow with DB-backed dedupe parity #207

Closed

tomvothecoder mentioned this pull request Jun 1, 2026

[Ops] Discover stability parameters of performance archive directories (PACE) #192

Open

5 tasks

tomvothecoder added 3 commits June 4, 2026 13:24

Add Chrysalis ingestion wrapper

39f77f7

Add docs for ingestion work

9161b77

Update chrysalis.sh and remove old script

feae197

tomvothecoder force-pushed the feature/154-ingestion-sites branch from 22a1a88 to feae197 Compare June 4, 2026 20:26

tomvothecoder added 2 commits June 4, 2026 13:28

Update docs

fcbf87d

Fix docs

9b7de39

tomvothecoder commented Jun 5, 2026

Comment thread backend/app/scripts/ingestion/sites/chrysalis.sh Outdated

Apply suggestion from @tomvothecoder

53438b2

tomvothecoder modified the milestones: Operations & Deployment, 2. Operations & Deployment, 1. FY26Q4 (07/01/26 - 09/30/26) Jun 10, 2026

tomvothecoder added type: ops Operation and Deployment tasks for DOE sites. and removed type: enhancement New feature or request labels Jun 10, 2026

tomvothecoder changed the title ~~Add Chrysalis ingestion wrapper~~ [Ops]: Add Chrysalis ingestion wrapper Jun 11, 2026
