feat(observability): cloud/host OTel attributes + VM-metrics collector #148
Merged
Two coupled slices that give Grafana Cloud cloud/host context, sharing one
resource-attribute set so VM metrics and app telemetry correlate on host.id.
Resource-attribute enrichment (ADR-0030): inject four more OTel resource
attributes inforge alone knows at deploy time: cloud.provider, cloud.region
(Hetzner network_zone), cloud.availability_zone (Hetzner location) and host.type
(server-type SKU). They are provider-supplied fields on types.ComputeOutputs,
read off the host in renderDescriptor, carried in bootstrapper.Deployment, and
emitted by buildEnv omit-if-empty. Descriptor schema bumps v4 -> v5 (the strict
KnownFields decoder makes any field addition a major bump).
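
The omit-if-empty emission described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Deployment` struct here is a hypothetical stand-in for `bootstrapper.Deployment` that models only the four new fields, the sorted output order is an assumption made for determinism, and the example values are placeholders. The env var names are the ones listed in the PR description.

```go
package main

import (
	"fmt"
	"sort"
)

// Deployment is a hypothetical stand-in for bootstrapper.Deployment;
// only the four new cloud/host fields are modelled.
type Deployment struct {
	CloudProvider         string
	CloudRegion           string
	CloudAvailabilityZone string
	HostType              string
}

// buildEnv returns KEY=VALUE pairs for the new attributes, skipping any
// attribute whose value is empty (omit-if-empty). Keys are sorted so the
// output is deterministic; the real ordering is an assumption.
func buildEnv(d Deployment) []string {
	pairs := map[string]string{
		"INFORGE_CLOUD_PROVIDER":          d.CloudProvider,
		"INFORGE_CLOUD_REGION":            d.CloudRegion,
		"INFORGE_CLOUD_AVAILABILITY_ZONE": d.CloudAvailabilityZone,
		"INFORGE_HOST_TYPE":               d.HostType,
	}
	keys := make([]string, 0, len(pairs))
	for k, v := range pairs {
		if v != "" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	env := make([]string, 0, len(keys))
	for _, k := range keys {
		env = append(env, k+"="+pairs[k])
	}
	return env
}

func main() {
	// Example values only; the region and availability zone are left empty
	// to show that they are omitted rather than emitted as KEY=.
	for _, kv := range buildEnv(Deployment{CloudProvider: "hetzner", HostType: "cx22"}) {
		fmt.Println(kv)
	}
}
```

Omitting empty values, rather than emitting `KEY=`, lets a consumer distinguish "attribute unknown" from "attribute set to the empty string".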
Host VM-metrics collector (ADR-0031): new pure internal/otelcol package renders
an off-the-shelf OpenTelemetry Collector Contrib config (hostmetrics -> otlphttp)
and the idempotent install shell (version-pinned .deb, checksum-verified, apt
install keeping our config). program.provisionObservability is an always-on
per-host pass gated on env-level config (variables.yaml observability.otlp_endpoint
plus the observability/otlp_auth secret in secrets.enc.yaml). The agent runs
unprivileged (process scraper off); the credential is base64'd, ToSecret-wrapped
(encrypted in state), written 0600 to the collector user, and referenced from the
config via the collector's ${file:...} provider.
The consumer-side telemetry.rs mapping for the new attributes lands separately in
the wardnet-cloud repo.
Claude-Session: https://claude.ai/code/session_017Kyd98NzojozMZ19d5UCZ2
## Summary

Two coupled slices that give Grafana Cloud cloud/host context, sharing one resource-attribute set so VM metrics and app telemetry correlate on `host.id`.

### Slice 1: Resource-attribute enrichment (ADR-0030)

Inject four more OTel resource attributes inforge alone knows at deploy time:

| Attribute | Hetzner source | Env var |
|---|---|---|
| `cloud.provider` | `hetzner` (provider self-names) | `INFORGE_CLOUD_PROVIDER` |
| `cloud.region` | `network_zone` | `INFORGE_CLOUD_REGION` |
| `cloud.availability_zone` | `location` | `INFORGE_CLOUD_AVAILABILITY_ZONE` |
| `host.type` | server-type SKU | `INFORGE_HOST_TYPE` |

- Provider-supplied fields on `types.ComputeOutputs`, populated in `hetzner.Create()`, read off the host in `renderDescriptor`, carried in `bootstrapper.Deployment`, emitted by `buildEnv` omit-if-empty.
- `host.name`/`os.type` deliberately dropped (the process can self-detect them).
- Descriptor schema bumps v4 → v5 (the strict `KnownFields` decoder makes any field addition a major bump; safe via the pinned-bootstrapper lockstep).

### Slice 2: Host VM-metrics collector (ADR-0031)

- New pure `internal/otelcol` package renders an off-the-shelf OpenTelemetry Collector Contrib config (`hostmetrics` → `otlphttp`) + the idempotent install shell (version-pinned `.deb`, checksum-verified, `apt-get install` keeping our config).
- `program.provisionObservability`: always-on per-host pass, gated on env-level config (`variables.yaml` `observability.otlp_endpoint` + the `observability/otlp_auth` secret in `secrets.enc.yaml`). No endpoint → no-op; endpoint but no credential → deploy fails.
- The agent runs unprivileged (`process` scraper off). Credential is base64'd, `pulumi.ToSecret`-wrapped (encrypted in state), written `0600` to the collector user, referenced via the collector's `${file:…}` provider (never inlined).
- VM metrics and app telemetry correlate on `host.id`.

### Notes / follow-ups

- `wardnet-cloud`: a four-row addition to the `(attribute, env_var)` table in `crates/common/src/telemetry.rs::resource()` mapping the new vars. It's inert until inforge emits them, so this PR can merge first.
- […] `.deb` is fetched onto the host).
- […] `.deb`'s default unit loads `/etc/otelcol-contrib/config.yaml`, and the collector's `${file:…}` provider reads the credential cleanly. A single `inforge deploy` against a test env would confirm both.
- […] (`add-service-exposed-ports`) is stale: that work merged in feat(service): add exposed_ports for private-network port exposure #141; this branch now carries the observability slices.

### Test plan

- `go build ./...`, `go test -race ./...`, `golangci-lint run ./...`: all clean.
- `internal/otelcol` (render determinism, omit-empty, required-field errors, install/credential/apply scripts); `bootstrapper` env + descriptor round-trip carry the new attributes; `program` descriptor round-trip asserts the cloud/host fields.
- `wardnet-cloud` `telemetry.rs` tests updated (in the separate PR).

## Merge Commit Message
feat(observability): cloud/host OTel attributes + VM-metrics collector (ADR-0030, ADR-0031)
https://claude.ai/code/session_017Kyd98NzojozMZ19d5UCZ2