client: add Prometheus HTTP SD endpoint for local allocations #28116
dberkerdem wants to merge 3 commits into
Add GET /v1/client/service_discovery to the client agent HTTP API, serving the node's running allocations as Prometheus HTTP SD target groups (https://prometheus.io/docs/prometheus/latest/http_sd/). One target group is emitted per allocated port of every running allocation, labeled with __meta_nomad_* labels covering namespace, job, task group, allocation, node, and port, plus job/group meta as __meta_nomad_meta_<key>. A ?port=<label> query parameter filters to a single port label (e.g. ?port=metrics). The endpoint serves local client state only and requires node:read, so scrape-target discovery fans out to the client nodes instead of funneling through the servers.
Avoid overloading 'service discovery', which already names Nomad's native service catalog — this endpoint serves allocations, not services. Dashed path segments also match the existing API style.
schmichael left a comment
Took a first pass and it looks pretty good! Not approving yet because there are some design questions worth discussing in the issue: #28115
// Listing every allocation on the node spans namespaces, so require
// node:read like the other node-level client endpoints.
aclObj, err := s.ResolveToken(req)
As per my comment on the issue, if we switch to <namespace>:read-job this should change to filter out allocations in namespaces the token is not allowed to read.
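The per-namespace filtering suggested here could look roughly like the sketch below. It is only an illustration: the `alloc` type and the `canRead` callback are hypothetical stand-ins for Nomad's allocation struct and for whatever check the resolved ACL object would provide, not the real API.

```go
package main

import "fmt"

// alloc is a hypothetical, trimmed-down stand-in for a running allocation.
type alloc struct {
	ID        string
	Namespace string
}

// filterByNamespace keeps only allocations whose namespace the caller may
// read. canRead is an assumed callback; the real handler would consult the
// ACL object resolved from the request token.
func filterByNamespace(allocs []alloc, canRead func(namespace string) bool) []alloc {
	visible := make([]alloc, 0, len(allocs))
	for _, a := range allocs {
		if canRead(a.Namespace) {
			visible = append(visible, a)
		}
	}
	return visible
}

func main() {
	allocs := []alloc{
		{ID: "a1", Namespace: "default"},
		{ID: "a2", Namespace: "platform"},
		{ID: "a3", Namespace: "default"},
	}
	// Pretend the token can only read jobs in the "default" namespace.
	onlyDefault := func(ns string) bool { return ns == "default" }

	for _, a := range filterByNamespace(allocs, onlyDefault) {
		fmt.Println(a.ID, a.Namespace)
	}
}
```

With this shape, a token without access to any namespace would get an empty list rather than an error, which is one of the design questions for the issue.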
}
maps.Copy(baseLabels, nodeLabels)

// Expose job and task group meta (group overrides job) so schedulers
"schedulers" is a bit weird in this context because in Nomad code that usually refers to the scheduler/ package.
- // Expose job and task group meta (group overrides job) so schedulers
+ // Expose job and task group meta (group overrides job) so users
// A running allocation always carries its job and allocated
// resources; their absence means corrupted client state. Skip
// the allocation but say so, because its targets silently
// disappearing from a successful response is otherwise
// undebuggable.
if alloc.Job == nil || alloc.AllocatedResources == nil {
	logger.Warn("skipping running allocation with incomplete state in service discovery",
		"alloc_id", alloc.ID, "job_id", alloc.JobID,
		"has_job", alloc.Job != nil, "has_resources", alloc.AllocatedResources != nil)
	continue
}
Panicking here is fine and probably preferable for what should be "unreachable" code. At least if someone has hit these conditions I'd love to hear about it! Since panics by handlers are recovered and logged by the http server, there's no availability danger.
The downside is that 1 corrupted job breaks service discovery if we panic and return a 500 from the handler. I think this is probably desirable because if unexpected corruption has taken place, what other corruption is lurking? This node probably needs to be drained and rebuilt.
Description
Adds GET /v1/client/allocations/prometheus-sd to the client agent HTTP API, serving the node's running allocations as Prometheus HTTP SD target groups.
One target group is emitted per allocated port of every running allocation, labeled with __meta_nomad_* labels covering namespace, job, task group, allocation, node, and port, plus job/group meta as __meta_nomad_meta_<key> (group meta taking precedence over job meta). A ?port=<label> query parameter filters to a single port label (e.g. ?port=metrics). The endpoint serves local client state only and requires the node:read ACL capability, so scrape-target discovery fans out to the client nodes instead of funneling through the servers. Requests on agents without a client return 400, matching the other /v1/client/* endpoints.
The change is purely additive: a new endpoint file, one route registration in http.go, and a small client method exposing running allocations.
Per the AI usage guidelines: AI tooling was used to assist with boilerplate and porting this change onto main (conflict resolution, license headers); all code has been human-reviewed and tested.
Testing & Reproduction steps
Unit tests cover target-group rendering (including legacy task-network ports), port filtering, allocation status filtering, incomplete-allocation handling, meta label precedence, deterministic output ordering, empty-list (not null) JSON wire format, ACL enforcement, and IPv6 host-IP bracketing.
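For readers unfamiliar with the wire format, the following is a rough sketch of three behaviors listed above: group-over-job meta precedence, IPv6 host-IP bracketing, and the empty-list JSON encoding. The type and function names (`targetGroup`, `mergeMeta`, `renderTarget`, `encodeGroups`) are illustrative assumptions, not the names used in this PR, and label-key sanitization is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"maps"
	"net"
	"strconv"
)

// targetGroup mirrors the Prometheus HTTP SD wire format: a list of
// "host:port" targets plus a flat string-to-string label map.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

// mergeMeta returns __meta_nomad_meta_<key> labels, with group meta
// overriding job meta on key collisions. Keys are used as-is here.
func mergeMeta(jobMeta, groupMeta map[string]string) map[string]string {
	merged := make(map[string]string, len(jobMeta)+len(groupMeta))
	maps.Copy(merged, jobMeta)
	maps.Copy(merged, groupMeta) // copied last, so group values win
	labels := make(map[string]string, len(merged))
	for k, v := range merged {
		labels["__meta_nomad_meta_"+k] = v
	}
	return labels
}

// renderTarget builds one target group for a single allocated port.
// net.JoinHostPort adds the brackets that IPv6 literals need.
func renderTarget(hostIP string, port int, labels map[string]string) targetGroup {
	return targetGroup{
		Targets: []string{net.JoinHostPort(hostIP, strconv.Itoa(port))},
		Labels:  labels,
	}
}

// encodeGroups marshals the groups, making sure "no targets" is
// encoded as [] rather than null.
func encodeGroups(groups []targetGroup) (string, error) {
	if groups == nil {
		groups = []targetGroup{}
	}
	out, err := json.Marshal(groups)
	return string(out), err
}

func main() {
	labels := mergeMeta(
		map[string]string{"team": "job-team", "tier": "web"},
		map[string]string{"team": "group-team"},
	)
	group := renderTarget("::1", 9090, labels)

	body, err := encodeGroups([]targetGroup{group})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)

	empty, _ := encodeGroups(nil)
	fmt.Println(empty) // []
}
```

Encoding an empty result as `[]` matters because the HTTP SD format expects a JSON list in the response body even when there are no targets.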
Manual check: run a dev agent with a job exposing a labeled port, then curl 'http://127.0.0.1:4646/v1/client/allocations/prometheus-sd?port=metrics' returns HTTP SD JSON consumable by a Prometheus http_sd_configs block.
Links
Feature request: #28115
Contributor Checklist
.changelog/ file to this PR's number once assigned)
web-unified-docs if the feature direction is approved