Skip to content

Add signed batch machine run context / ADR#1772

Open
krowvin wants to merge 6 commits into
developfrom
batch-dynamic-runtime/cda-job-context
Open

Add signed batch machine run context / ADR#1772
krowvin wants to merge 6 commits into
developfrom
batch-dynamic-runtime/cda-job-context

Conversation

@krowvin

@krowvin krowvin commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Discussed adding M2M auth for use with CWMS Batch Jobs. This is the first step towards enabling this form of authentication for use in CWBI.

Adds CDA support for signed batch job run context when requests are made by configured batch machine users.

This allows batch runtimes to authenticate with a shared Keycloak service account while still giving CDA trusted job launch context from the dispatcher, such as the authorized run office, job id, requester, and dispatch source.

Also adds ADR 0009 documenting the signed batch machine run context design.

Related Issue

Closes #

Validation

JDK 21

  • Passed targeted unit test:
    • ./gradlew.bat :cwms-data-api:test --tests cwms.cda.security.BatchJobContextTest
  • Ran full non-integration unit suite:
    • ./gradlew.bat :cwms-data-api:test
    • Result: failed on existing unrelated tests under JDK 21 Mockito/ByteBuddy limitations and one missing resource in RatingsVerticalDatumExtractorTest; new BatchJobContextTest passed.
  • Local dynamic batch runtime harness was used earlier to validate API/dispatcher/runner/CDA flow, including valid signed context and denied unauthorized cross-office write.

Checklist

  • AI tools used

@krowvin krowvin changed the title Add signed batch machine run context Add signed batch machine run context / ADR Jun 9, 2026
@krowvin krowvin marked this pull request as ready for review June 9, 2026 12:03
@krowvin krowvin requested a review from MikeNeilson June 9, 2026 12:03

@MikeNeilson MikeNeilson left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm being a bit pushy here. AND I may in fact be wrong about what we can accomplish with Keycloak and thus may end up having to go this route (there's nothing fundmentally wrong with the design... other that too much reliance on properties, though that's an initial design problem not a new problem) but I think if we can get Keycloak to handle the load the transition is a lot simpler.

final String email = claims.get(EMAIL_CLAIM, String.class);
return dao.createUser(preferredUserName, oidcPrincipal, givenName, email);
DataApiPrincipal dataApiPrincipal = dao.createUser(preferredUserName, oidcPrincipal, givenName, email);
BatchJobContext.prepareContext(ctx, dataApiPrincipal);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the "batch" user is getting created randomly here, we have an issue, In the context of a batch process this should be a failure.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected in bc40f8b

public static final String REQUESTED_BY_ATTR = "BatchRequestedBy";
public static final String DISPATCH_SOURCE_ATTR = "BatchDispatchSource";

public static final String SECRET_PROPERTY = "cwms.dataapi.batch.jobContext.secret";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is Batch becoming an issuer of secrets?

My understanding is that Keycloak would provide the JWT and CDA would just consume it.

if (username == null) {
return false;
}
String machineUsers = readSetting(MACHINE_USERS_PROPERTY, DEFAULT_MACHINE_USERS);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDA should not need to know anything about this, the JWT provided by Keycloak can embed a claim of "machine-auth" or something and decisions made from that.

throw new CwmsAuthException("Batch job context missing run_as_office",
HttpServletResponse.SC_UNAUTHORIZED);
}
ctx.attribute(RUN_AS_OFFICE_ATTR, office.toUpperCase(Locale.ROOT));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While appropriate to put in to a future logging context... these shouldn't need to be part of the Request/Response attributes. Downstream components needing to know that is a definite code smell.


Provide CWMS Data API with a trusted batch run context for jobs that execute through a shared machine identity.

Batch runtimes will authenticate to CDA with a service account (via Keycloak). Each job will also provide a signed context token that identifies the authorized job launch context, including the office for which the scheduler or API approved the run.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting up additional signing is going to be difficult to get right, and will involve yet-more CDK changes. Granted they won't be difficult.

I would like us to first determine if we can, in some way, setup keycloak to be able to receive the required information (such as the office, and specific job identification) and return it back in the already signed JWT access token.

Some of the other information can, and should, still be provided but it doesn't really required singing it's just informational. The office would be used to set the session context (or by the future authorization system) to appropriately limit operations.

e.g. office + job identification can be readily tied to a policy to appropriately limit operations..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants