
DPC-5482: Datadog dashboard, monitors and synthetic tests #3022

Draft: jscott-nava wants to merge 43 commits into main from jscott/DPC-5482

jscott-nava (Contributor) commented Jun 3, 2026

🎫 Ticket

https://jira.cms.gov/browse/DPC-5482
https://jira.cms.gov/browse/DPC-5252
https://jira.cms.gov/browse/DPC-5507

🛠 Changes

This PR adds the following:

  • Datadog dashboard and monitor modules and initial configurations.
  • GitHub Actions workflows for the new modules, and an update to the pre-existing SOPS workflow (for the sake of consistency, but also to keep it compatible with the shared root.tofu.tf file).
  • Updates to the shared root.tofu.tf file to improve the handling of the env variable.
  • Synthetic tests that mirror what exists in New Relic.

ℹ️ Context

Adding the Datadog dashboard, monitors and synthetic tests is one of the tasks in the larger epic of migrating from New Relic to Datadog.

🧪 Validation

A sample of relevant workflow runs is as follows:

jscott-nava marked this pull request as ready for review June 3, 2026 22:12
jscott-nava requested a review from a team as a code owner June 3, 2026 22:12
jscott-nava marked this pull request as draft June 4, 2026 16:45
jscott-nava (Contributor Author)

Moving back to draft to add monitors code to this PR.

jscott-nava marked this pull request as ready for review June 5, 2026 22:50
jscott-nava changed the title from "DPC-5482: Datadog dashboard" to "DPC-5482: Datadog dashboard and monitors" Jun 9, 2026
@@ -0,0 +1,66 @@
name: tf-00-datadog-dashboard
run-name: tf-00-datadog-dashboard

Contributor

Looks like BCDA uses the 40- prefix for Datadog
(e.g. 40-datadog-monitors and 40-datadog-dashboards).

What is the advantage of using this "##-xyz" format? Should we be consistent across teams?

Contributor Author

  • My understanding is that the prefixes are intended to create a sort of dependency tree, so that it is clear in which order each TF module should be applied. In AB2D, for example, 10-core contains all IAM resources and therefore should clearly be applied before 30-api, which depends on those resources.

  • There is not currently a strict standard shared between repos (e.g. BCDA uses 40 for Datadog but AB2D uses 60 and 65), but since Datadog does not depend on anything else and is a team-wide resource (as opposed to an account- or environment-wide one), 00 seems as good a place as any. (These numbers can also be adjusted very easily in the future as the larger Tofu refactoring effort progresses.)
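The ordering convention described above can be sketched as follows. This is a minimal illustration, assuming module directories are named `NN-name` and that a lower numeric prefix means "apply earlier"; the `apply_order` helper is hypothetical and not part of any repo's tooling. The module names are the examples mentioned in this thread.

```python
import re

# Matches directory names like "10-core" or "40-datadog-monitors":
# a numeric prefix, a hyphen, then the module name.
PREFIX_RE = re.compile(r"^(\d+)-(.+)$")


def apply_order(modules: list[str]) -> list[str]:
    """Return modules sorted by numeric prefix (hypothetical helper).

    Modules that share a prefix keep their input order, because sorted()
    is stable; they are assumed not to depend on each other.
    """
    def prefix(name: str) -> int:
        match = PREFIX_RE.match(name)
        if match is None:
            raise ValueError(f"module name lacks a numeric prefix: {name!r}")
        return int(match.group(1))

    return sorted(modules, key=prefix)


if __name__ == "__main__":
    # Example names taken from the discussion (AB2D, BCDA and this PR).
    print(apply_order(["30-api", "40-datadog-monitors", "10-core", "00-datadog-dashboard"]))
```

Zero-padding the prefix (00, 10, 30, ...) also keeps a plain lexical directory listing in the same order as the numeric sort, which is one reason the two-digit form is convenient.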

Contributor

It might make sense to follow BCDA and AB2D and give it a higher number, since wouldn't the dashboard depend on our services being up and running to report data accurately?

Contributor Author

@Jose-verdance Sure, let me know what number makes sense and I'll make that change.

Contributor

I'm OK with the 40s for Datadog, and maybe calling it observability.

Comment thread .github/workflows/tf-00-datadog-dashboard.yml
@@ -0,0 +1,106 @@
name: tf-10-config

Contributor

👍

We still need to figure out a strategy for handling free-floating parameters in upper environments.
As a long-term solution, making this part of the Build and deploy workflow would avoid the need to perform extra steps to update SSM values.

I drafted a new ticket to address this: https://jira.cms.gov/browse/DPC-5519

This is relevant to the multi-CSP work, as we'll want to use the SOPS module for managing things like the CLEAR secret key.

Contributor Author

As discussed, we will sidebar on this with the team to figure out the SOPS deployment strategy. As far as the scope of this PR is concerned, however, the SOPS-related changes are limited to fixes that let it keep working with the changes to the shared root.tofu.tf file that the dashboard and monitors modules required.

lambda: true
s3: true
rds: true
lambda:

Contributor

Values are the same across dev/test/sandbox/prod.yml, so we could probably include these as part of defaults.yml.

Contributor Author

This code was borrowed from the AB2D implementation of the monitors module and seems like a reasonable starting point for the monitors deployment. The environment-specific values will likely change with ongoing Datadog tasks as we realize that not every monitor needs to be in every environment.

Also, while all configuration could be contained in the defaults.yml file for this PR, leaving this current implementation helps to clarify how each environment can be independently configured.
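To illustrate the trade-off discussed above, here is a minimal sketch of how per-environment files could layer over `defaults.yml`. It assumes the configuration is deep-merged with environment values winning over defaults; the monitors module's actual merge behavior isn't shown in this PR. YAML is represented as plain Python dicts to stay within the standard library, `deep_merge` is a hypothetical helper, and the `monitors` wrapper key and the sandbox override are hypothetical (the `lambda`/`s3`/`rds` keys echo the diff context in this thread).

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults.yml contents: every monitor group enabled.
DEFAULTS = {"monitors": {"lambda": True, "s3": True, "rds": True}}

# Hypothetical per-environment overrides. An empty dict means "use defaults";
# sandbox shows how one environment could later drop a monitor group.
ENV_FILES = {
    "dev": {},
    "test": {},
    "sandbox": {"monitors": {"rds": False}},
    "prod": {},
}

if __name__ == "__main__":
    for env, overrides in ENV_FILES.items():
        print(env, deep_merge(DEFAULTS, overrides))
```

Under this assumed layering, repeating identical values in every environment file yields the same merged result as leaving them only in defaults.yml; the difference is whether each environment's full configuration is visible in its own file.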

See https://terraform-docs.io/user-guide/configuration/ for more information.
-->
## Outputs

Contributor

Could you provide some context on where I can find the new Datadog dashboard?

Contributor Author

@lukey-luke You should have received an email from Datadog HQ on (or around) May 26th with details on how to register and log in to the CMS Datadog instance, and from there you can browse all the DASG dashboards and monitors. Please reach out to Bishoy if you have not received that email.

Contributor

Thanks!
For other folks reviewing, the DPC Metrics Dashboard can be found HERE

jscott-nava requested a review from lukey-luke June 9, 2026 22:54
permissions:
  contents: read
  id-token: write
runs-on: codebuild-dpc-app-non-prod-${{ github.run_id }}-${{ github.run_attempt }}

Contributor

Hey @jscott-nava, why does this run on the non-prod runner? Will a separate dashboard for prod be added later?

Contributor Author

@Jose-verdance There is one dashboard per team (not account), and that dashboard covers all environments.

Comment on lines +10 to +11
env = var.env
root_module = "https://github.com/CMSgov/bcda-app/tree/main/ops/services/10-sops"

Contributor

Why is this bcda-app and not cdap or dpc-ops?

@@ -0,0 +1,5 @@
variable "env" {

Contributor

If this is restricted to test, then what are the sandbox and prod monitor metrics connected to?

jscott-nava marked this pull request as draft June 12, 2026 18:32

jscott-nava (Contributor Author)

Converting this PR back to draft to add Datadog synthetic tests (DPC-5507).

jscott-nava changed the title from "DPC-5482: Datadog dashboard and monitors" to "DPC-5482: Datadog dashboard, monitors and synthetic tests" Jun 12, 2026