DPC-5482: Datadog dashboard, monitors and synthetic tests #3022
jscott-nava wants to merge 43 commits into
b33f6a3 to
6c34699
Moving back to draft to add monitors code to this PR.
2b28cdc to
b1eeba3
@@ -0,0 +1,66 @@
name: tf-00-datadog-dashboard
run-name: tf-00-datadog-dashboard
Looks like BCDA uses -40- for datadog
(e.g. 40-datadog-monitors and 40-datadog-dashboards)
What is the advantage of using this "##-xyz" format? Should we be consistent across teams?
- My understanding is that the prefixes are intended to create a sort of dependency tree, so that it is clear in which order each TF module should be applied. In AB2D, for example, 10-core contains all IAM resources and therefore clearly should be applied before 30-api, which depends on those resources.
- There is not currently a strict standard shared between repos (i.e. BCDA uses 40 for Datadog but AB2D uses 60 and 65), but since Datadog does not depend on anything else and is a team-wide resource (as opposed to account- or environment-wide), it seems as though 00 is as good a place as any. (These numbers can also be adjusted very easily in the future as the larger Tofu refactoring effort progresses.)
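To make the ordering idea concrete, here is a minimal Python sketch of how a numeric prefix can be turned into an apply order. It is only an illustration, not tooling from any of the repos: the `apply_order` helper and the `MODULES` list are hypothetical, and the directory names are limited to the examples mentioned in this thread.

```python
import re

# Hypothetical list of module directories. The names are the examples quoted
# in this thread (AB2D: 10-core, 30-api; BCDA: 40-datadog-*), not a real repo.
MODULES = ["30-api", "40-datadog-monitors", "10-core", "40-datadog-dashboards"]

# A module name is expected to look like "<digits>-<name>".
PREFIX = re.compile(r"^(\d+)-(.+)$")


def apply_order(modules):
    """Return module names sorted by numeric prefix, then by name."""

    def key(name):
        match = PREFIX.match(name)
        if match is None:
            raise ValueError(f"module {name!r} has no numeric prefix")
        # Compare the prefix as an integer so "00" sorts before "10".
        return (int(match.group(1)), match.group(2))

    return sorted(modules, key=key)


if __name__ == "__main__":
    for name in apply_order(MODULES):
        print(name)
```

Under this scheme a module with a lower prefix is applied first, so the choice between 00 and 40 only matters if something in between is expected to exist before the Datadog module is applied.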
It might make sense to follow BCDA and AB2D and give it a higher number, since doesn't the dashboard depend on our services being up and running to accurately report data?
@Jose-verdance Sure, let me know what number makes sense and I'll make that change.
I am OK with the 40s for Datadog, and maybe call it observability.
@@ -0,0 +1,106 @@
name: tf-10-config
👍
We still need to figure out a strategy for handling free-floating parameters in upper environments.
As a long-term solution, making this part of the Build and deploy workflow would avoid the need to perform extra steps for updating SSM values.
I drafted a new ticket to address this: https://jira.cms.gov/browse/DPC-5519
This is relevant to the multi-CSP work, as we'll want to use the sops module for managing things like the CLEAR secret key.
As discussed, we will sidebar on this with the team to figure out the SOPS deployment strategy. As far as the scope of this PR is concerned, however, the SOPS-related changes are limited to fixes that allow it to continue to work with the changes made to the shared root.tofu.tf file, which were required by the dashboard and monitors modules.
lambda: true
s3: true
rds: true
lambda:
The values are the same across dev/test/sandbox/prod.yml, so we could probably include these as part of defaults.yml.
This code was borrowed from the AB2D implementation of the monitors module and seems like a reasonable starting point for the monitors deployment. The environment-specific values will likely change with ongoing Datadog tasks as we realize that not every monitor needs to be in each environment.
Also, while all configuration could be contained in the defaults.yml file for this PR, leaving the current implementation helps to clarify how each environment can be independently configured.
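For reviewers weighing the two options, the following Python sketch shows the general defaults-plus-overrides pattern being discussed. It is an assumption-laden illustration, not the monitors module's actual merge logic: `merge_config`, the `monitors` key, and the sample values are all hypothetical stand-ins for whatever defaults.yml and the per-environment files really contain.

```python
import copy


def merge_config(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for defaults.yml and two environment files.
DEFAULTS = {"monitors": {"lambda": True, "s3": True, "rds": True}}
DEV = {}                               # inherits every default
PROD = {"monitors": {"s3": False}}     # overrides a single value

if __name__ == "__main__":
    print(merge_config(DEFAULTS, DEV))
    print(merge_config(DEFAULTS, PROD))
```

With a pattern like this, shared values would live only in the defaults, and an environment file would list just the settings where it differs. Keeping full copies in each file, as the PR currently does, trades that brevity for making each environment's configuration readable on its own.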
See https://terraform-docs.io/user-guide/configuration/ for more information.
-->
## Outputs
Could you point me to where I can find the new Datadog dashboard?
@lukey-luke You should have received an email from Datadog HQ on (or around) May 26th with details on how to register and log in to the CMS Datadog instance, and from there you can browse all the DASG dashboards and monitors. Please reach out to Bishoy if you have not received that email.
Thanks!
For other folks reviewing, DPC Metrics Dashboard can be found HERE
permissions:
  contents: read
  id-token: write
runs-on: codebuild-dpc-app-non-prod-${{ github.run_id }}-${{ github.run_attempt }}
Hey @jscott-nava, why is this run with the non prod version? Will there be a separate dashboard for prod added later?
@Jose-verdance There is one dashboard per team (not account), and that dashboard covers all environments.
env = var.env
root_module = "https://github.com/CMSgov/bcda-app/tree/main/ops/services/10-sops"
Why is this bcda-app and not cdap or dpc-ops?
@@ -0,0 +1,5 @@
variable "env" {
If this is restricted to test, where are the sandbox and prod monitor metrics connected?
Converting this PR back to draft to add Datadog synthetic tests (DPC-5507).
15e819f to
9ac3ff9
🎫 Ticket
https://jira.cms.gov/browse/DPC-5482
https://jira.cms.gov/browse/DPC-5252
https://jira.cms.gov/browse/DPC-5507
🛠 Changes
This PR adds the following:
root.tofu.tf file.
env variable.
ℹ️ Context
The Datadog dashboard, monitors, and synthetic tests are tasks in the larger epic of migrating from New Relic to Datadog.
🧪 Validation
A sample of relevant workflow runs is as follows: