Add configurability to set tags and metadata on langfuse traces by danchild · Pull Request #53 · forge-sdlc/forge

danchild · 2026-05-21T21:19:29Z

Summary

Adds configurability for Langfuse trace tags and metadata via LANGFUSE_TRACE_TAGS and LANGFUSE_TRACE_METADATA env vars
Ports Grafana dashboards (business, engineering, issue-detail) into devtools/grafana/ with full provisioning
Fixes trace field leakage into agent system prompts (workflow-internal fields were appearing in LLM context)

Changes

Langfuse trace configurability

LANGFUSE_TRACE_TAGS / LANGFUSE_TRACE_METADATA: comma-separated field lists that control what workflow context is attached to Langfuse traces
Available fields: ticket_key, ticket_type, project_id, workflow_step, repo, pr_number, ci_status, event_source, event_type, llm_model, retry_count, system_prompt_length
Fields are validated at startup; unknown values are logged and ignored
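As a rough illustration of the startup validation described above, here is a minimal sketch. `TracingField` is the enum name mentioned in the review below, but the member list here is a trimmed subset and `parse_field_list` is a hypothetical helper; the actual parser in Forge may look different.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class TracingField(str, Enum):
    # Illustrative subset of the available fields listed above.
    TICKET_KEY = "ticket_key"
    TICKET_TYPE = "ticket_type"
    WORKFLOW_STEP = "workflow_step"
    RETRY_COUNT = "retry_count"


def parse_field_list(raw: str) -> list[TracingField]:
    """Parse a comma-separated env value; log and skip unknown names."""
    fields: list[TracingField] = []
    for name in (part.strip() for part in raw.split(",")):
        if not name:
            continue  # tolerate an empty value or a trailing comma
        try:
            fields.append(TracingField(name))
        except ValueError:
            logger.warning("Ignoring unknown Langfuse trace field: %s", name)
    return fields
```

A caller would pass in something like `os.environ.get("LANGFUSE_TRACE_TAGS", "")`, so an unset variable simply yields an empty list.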

Grafana dashboard stack

Three dashboards: forge-business (ticket throughput, cycle time), forge-engineering (LLM latency, token usage), forge-issue-detail (per-ticket trace drill-down)
Datasources: ClickHouse (Langfuse traces), Prometheus (Forge metrics), Redis
devtools/grafana/compose.grafana.yml: standalone Grafana compose for dashboard dev
devtools/grafana/compose.langfuse-network.yml: optional overlay that joins Grafana to a self-hosted Langfuse Docker network (requires Langfuse to be running — omit this file if not using self-hosted Langfuse)
Wired into both docker-compose.yml and devtools/docker-compose.dev.yml

Trace context leakage fix (prompted by review feedback)

Added trace_context parameter to ForgeAgent.run_task() — fields passed there are forwarded to Langfuse only and never written to the system prompt
Reverted generate_prd, generate_spec, generate_epics, regenerate_with_feedback, answer_question to their original minimal prompt contexts; workflow state trace fields go via trace_context instead
Fixed sync_pr_description in code_review.py the same way
Previously, fields like current_node, event_type, retry_count were leaking verbatim into the agent system prompt for tasks where they're irrelevant
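To make the separation concrete, the sketch below keeps the two inputs apart. It is not the actual `ForgeAgent.run_task()` body: `build_system_prompt` and `prepare_run` are hypothetical helpers, and only the `context` / `trace_context` split is taken from the description above.

```python
from typing import Any, Optional


def build_system_prompt(base_prompt: str, context: Optional[dict[str, Any]]) -> str:
    """Append only the caller's domain context to the prompt."""
    prompt = base_prompt
    if context:
        prompt += "\n\nContext:\n"
        for key, value in context.items():
            if value is not None:
                prompt += f"- {key}: {value}\n"
    return prompt


def prepare_run(
    base_prompt: str,
    context: Optional[dict[str, Any]] = None,
    trace_context: Optional[dict[str, Any]] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (system_prompt, langfuse_fields) built from separate inputs."""
    system_prompt = build_system_prompt(base_prompt, context)
    # Trace fields are copied for the tracing backend only; they are never
    # read while building the prompt above.
    langfuse_fields = dict(trace_context or {})
    return system_prompt, langfuse_fields
```

With this shape, a node that needs `ticket_key` in the prompt passes it in `context`, while workflow-internal fields such as `retry_count` travel only in `trace_context`.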

Docs

docs/reference/config.md: added Langfuse trace field and Grafana variable reference
docs/developer-guide.md: added trace tags/metadata section and Grafana setup instructions
.env.example: documented new vars with recommended values for dashboard compatibility

Test plan

uv run pytest tests/unit/ -q passes
Start stack with docker compose --env-file .env -f docker-compose.yml up -d prometheus grafana — Grafana reachable at http://localhost:3010
With self-hosted Langfuse: add -f devtools/grafana/compose.langfuse-network.yml — ClickHouse datasource connects and dashboard panels render
Without Langfuse: base command (no network overlay) starts successfully — Prometheus and Redis panels work

🤖 Generated with Claude Code

eshulman2 · 2026-05-31T10:19:40Z

Review Notes

Overall

Good architecture — the TracingField enum + resolver pattern is clean, the config parsing with validation is solid, and test coverage is comprehensive. A few things to address before merge:

1. Bug workflow nodes are not covered

The PR only enriches context in feature workflow nodes (prd_generation, spec_generation, epic_decomposition, task_generation, pr_creation, qa_handler, code_review).

The bug workflow nodes that invoke the agent were not updated:

triage.py — only passes context={"ticket_key": ticket_key}, missing all other trace fields
rca_analysis.py — invokes the agent via ContainerRunner, different code path entirely
plan_bug_fix.py — same, uses ContainerRunner

If someone configures LANGFUSE_TRACE_TAGS=ticket_type,workflow_step, bug workflow traces will be missing those tags.

2. The per-node enrichment approach is fragile

The current design requires every node that calls the agent to manually build a ~6-line context dict:

context = {
    "ticket_key": ticket_key,
    "ticket_type": state.get("ticket_type", ""),
    "current_node": state.get("current_node", ""),
    "event_type": state.get("event_type", ""),
    "event_source": state.get("context", {}).get("source", ""),
    "retry_count": state.get("retry_count", 0),
}

This is copy-pasted into 7 files and will need to be added to every future node that calls the agent. If someone forgets (as happened with the bug workflow nodes), traces from that node get no tags/metadata.

Suggested alternative: Resolve the trace fields once in the orchestrator worker — it already has the full workflow state at invocation time — and store the resolved (tags, metadata) in the state dict or pass them via the LangGraph config. The agent's run_task() would then pick them up automatically without any node needing to know about tracing. This would:

Eliminate the per-node boilerplate
Cover all nodes automatically (including future ones)
Remove the risk of forgetting to add trace context to new nodes
Work with ContainerRunner-based nodes without changes
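A minimal sketch of the suggested alternative, assuming the worker can wrap every node call. All names here (`TRACE_KEYS`, `resolve_trace_fields_once`, `run_node`, the `_trace` state key) are hypothetical and only illustrate the "resolve once, nodes stay unaware" idea; passing the fields via the LangGraph config instead of the state dict would follow the same pattern.

```python
from typing import Any, Callable

# Hypothetical list of state keys that are interesting for tracing.
TRACE_KEYS = ("ticket_key", "ticket_type", "current_node", "retry_count")


def resolve_trace_fields_once(state: dict[str, Any]) -> dict[str, Any]:
    """Pick the trace-relevant keys out of the full workflow state."""
    return {key: state[key] for key in TRACE_KEYS if key in state}


def run_node(
    node: Callable[[dict[str, Any]], dict[str, Any]], state: dict[str, Any]
) -> dict[str, Any]:
    """Worker-side wrapper: resolve once, then hand the enriched state to the node."""
    enriched = {**state, "_trace": resolve_trace_fields_once(state)}
    return node(enriched)


def triage(state: dict[str, Any]) -> dict[str, Any]:
    # The node builds no trace dict of its own; a real agent call would read
    # state["_trace"] at this point. Here it is just echoed back.
    return {"traced_with": state["_trace"]}
```

Because the wrapper runs for every node, a newly added node gets the same trace fields without any tracing code of its own.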

3. Field naming inconsistency

TracingField.WORKFLOW_STEP resolves by reading state["current_node"] but produces a metadata key called "workflow_step". Same for current_repo → repo, current_pr_number → pr_number. The resolvers work correctly, but the mismatch between config names, Langfuse keys, and actual state keys is confusing. Consider naming TracingField members to match their state key names (e.g., CURRENT_NODE instead of WORKFLOW_STEP).

Addresses review feedback from forge-sdlc#53:
- Resolve tags/metadata once in the worker and pass via state/config, eliminating copy-pasted per-node context dicts across 7 feature nodes
- Extend coverage to ContainerRunner-based bug workflow nodes (triage, rca_analysis, plan_bug_fix) so all nodes get trace enrichment
- Fix TracingField naming to match state keys (WORKFLOW_STEP → CURRENT_NODE, etc.) for consistency between config names, Langfuse keys, and state keys

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Dan Childers <dchilder@redhat.com>

danchild · 2026-06-08T17:38:22Z

Hi @eshulman2, thank you for the feedback. I've pushed a number of changes that address all of your concerns. Please let me know if you have questions.

eshulman2 · 2026-06-11T12:45:22Z

Claude Code review

All three concerns from the original review are addressed: bug workflow nodes are now covered via the orchestrator worker, per-node context boilerplate is eliminated, and TracingField members are renamed to match state key names (CURRENT_NODE, REPO, PR_NUMBER).

Found 2 issues in the refactored code:

if value: should be if value is not None: — empty-string resolved values (e.g., ticket_key="") are silently dropped with no warning, making misconfigured state invisible in traces

forge/src/forge/integrations/langfuse/fields.py

Lines 275 to 286 in 3c11e12

    for field in settings.trace_tag_fields:
        value = resolve_field(field, state)
        if value:
            tags.append(value)
    metadata: dict[str, Any] = {}
    for field in settings.trace_metadata_fields:
        value = resolve_field(field, state)
        if value:
            metadata[field.value] = value
    return tags, metadata
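The difference between the two checks is easy to see in isolation. The following is a standalone illustration, not code from the PR; the sample values are made up.

```python
from typing import Any


def keep_truthy(values: list[Any]) -> list[Any]:
    """Mirrors `if value:` and drops every falsy value."""
    return [v for v in values if v]


def keep_not_none(values: list[Any]) -> list[Any]:
    """Mirrors `if value is not None:` and drops only None."""
    return [v for v in values if v is not None]


# Hypothetical resolved values: a key, an empty string, a zero, a missing field.
resolved = ["FORGE-1", "", 0, None]
```

Here `keep_truthy(resolved)` keeps only `"FORGE-1"`, while `keep_not_none(resolved)` also keeps `""` and `0`. If a resolver ever returned the integer `0` (for example for a retry count), the truthiness check would drop that as well.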

"ticket_key": ticket_key or "" in regenerate_with_feedback puts an empty string into task_context when ticket_key=None. Since run_task merges explicit context over trace context (merged = {**get_trace_context(), **(context or {})}), this overwrites the valid ticket_key from the orchestrator's trace context with "", silently breaking Langfuse session tracking for that invocation. Fix: only include ticket_key in task_context when it is not None.

forge/src/forge/integrations/agents/agent.py

Lines 1011 to 1020 in 3c11e12

    logger.info(f"Regenerating {content_type} with feedback using Deep Agents")
    task_context = {
        "is_revision": True,
        "ticket_key": ticket_key or "",
    }
    result = await self.run_task(
        task=skill_name,
        prompt=prompt,
        context=task_context,
    )
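The overwrite comes from plain dict-merge ordering, which the sketch below reproduces. The merge expression is the one quoted above; `merge` and `build_task_context` are hypothetical helpers and the ticket keys are made-up values.

```python
from typing import Any, Optional

# Stand-in for what the orchestrator stored as trace context.
trace_context = {"ticket_key": "FORGE-42", "current_node": "spec_generation"}


def merge(trace: dict[str, Any], context: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Same shape as the merge in run_task: explicit context wins on conflicts."""
    return {**trace, **(context or {})}


def build_task_context(ticket_key: Optional[str]) -> dict[str, Any]:
    """Suggested fix: include ticket_key only when the caller actually has one."""
    task_context: dict[str, Any] = {"is_revision": True}
    if ticket_key is not None:
        task_context["ticket_key"] = ticket_key
    return task_context


# With `ticket_key or ""`, a missing key becomes "" and overwrites the trace value.
broken = merge(trace_context, {"is_revision": True, "ticket_key": None or ""})
fixed = merge(trace_context, build_task_context(None))
```

In `broken` the `ticket_key` is the empty string; in `fixed` the orchestrator's value survives because the key is absent from the explicit context.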

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

danchild · 2026-06-11T19:17:04Z

Hi @eshulman2 - thank you for the feedback. I made the changes and also made a few edits to .env.example to include better documentation and make sure the new env vars are empty by default

eshulman2

The last thing that is missing, IMHO, is docs. Please update the config reference and the developer guide with information regarding this change.

danchild · 2026-06-15T15:01:36Z

Good idea, @eshulman2 - I just added documentation as described and pushed the changes. Please let me know what you think

eshulman2

I noticed you removed some context from some nodes, why is that?

eshulman2 · 2026-06-16T12:38:58Z

                    task="sync-pr-description",
                    prompt=prompt,
-                    context={"owner": owner, "repo": repo, "pr_number": pr_number},
+                    context={


any reason for this change?

eshulman2 · 2026-06-16T12:46:40Z

I believe this might give some context on the issues I mentioned in the review.

The trace context should not be merged into the agent system prompt. Agent prompts and skills should be isolated from the tags/metadata we collect for observability.

The problem is in run_task() — the trace context (set via set_trace_context) is merged into merged which then feeds both resolve_trace_fields() and the system prompt context block. This causes observability fields like is_blocked, ci_fix_attempts, ai_review_status, revision_requested, etc. to appear in every agent's system prompt regardless of whether they're relevant to the task.

The merge was introduced because ticket_key was removed from explicit node context dicts (reclassified as a trace field), and merging the trace context back into the prompt was the way to restore it. But ticket_key is domain context the agent needs — it shouldn't have been removed from the explicit context in the first place.

Suggested fix: keep the two concerns separate in run_task():

System prompt gets only the explicit context dict passed by the caller (domain context)
Langfuse fields are resolved from get_trace_context() alone, independently

# Domain context → system prompt only
if context:
    system_prompt += "\n\nContext:\n"
    for key, value in context.items():
        if value is not None:
            system_prompt += f"- {key}: {value}\n"

# Trace context → Langfuse only, never touches the prompt
trace_state = {**get_trace_context(), "system_prompt_length": len(system_prompt), "llm_model": ...}
trace_tags, trace_metadata = resolve_trace_fields(trace_state)

Nodes that need ticket_key in the prompt should pass it explicitly in their domain context dict — as they did before this PR.

eshulman2 · 2026-06-16T12:55:52Z

A few other things worth addressing:

event_type is never written to workflow state. BaseState gets a new event_type: str field and there's a resolver for it, but event_type lives on QueueMessage — it's never written into the LangGraph state dict. So _resolve_event_type will always return None. Either write it to state at workflow start, or remove the field from the resolver until it has a real source.

The system_prompt_length missing warning is presumptuous. trace_metadata_fields warns at startup when Langfuse is enabled but system_prompt_length isn't in LANGFUSE_TRACE_METADATA. The whole point of this PR is opt-in configuration — treating one specific field as implicitly required contradicts that. If the operator intentionally left it out, they still get a warning. Just remove it.

parse_trace_fields naming is inverted. Called with allow_tags=True for the tags config and allow_tags=False for metadata — but allow_tags=False means "don't enforce tag eligibility" (i.e. allow everything). The boolean reads backwards. tags_only=True/False or enforce_tag_eligibility would be clearer.

ContextVar is never reset after workflow completion. set_trace_context is called before ainvoke but there's no cleanup after it returns. In the worker's long-running loop the same asyncio task context could carry a previous run's trace fields into the next invocation. Reset to {} after each ainvoke call.

Duplicate fields are silently accepted. LANGFUSE_TRACE_TAGS=ticket_key,ticket_key produces duplicate tags. A quick dedup pass in parse_trace_fields before returning would prevent it.

- Remove inline "system_prompt_length"
- Create the mechanism to pass node state to context to allow writing configured metadata and traces to langfuse traces

Signed-off-by: Dan Childers <dchilder@redhat.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…prompts

Grafana dashboards:
- Port forge-business, forge-engineering, forge-issue-detail dashboards into devtools/grafana/dashboards/
- Add datasource provisioning for ClickHouse (Langfuse), Prometheus, Redis
- Add compose.grafana.yml (standalone) and compose.langfuse-network.yml (optional overlay for self-hosted Langfuse) compose files
- Wire Grafana into docker-compose.yml and devtools/docker-compose.dev.yml
- Document required LANGFUSE_TRACE_TAGS/METADATA values for dashboard queries in .env.example, config reference, and developer guide
- Clarify compose.langfuse-network.yml as opt-in: running it without Langfuse up fails the whole stack; base commands now work without it

Trace context fix:
- Add trace_context parameter to ForgeAgent.run_task(); fields passed there go to Langfuse only and are never written to the system prompt
- Revert generate_prd/spec/epics/regenerate/answer_question to original minimal prompt contexts; workflow state trace fields forwarded via trace_context instead
- Fix sync_pr_description in code_review.py the same way

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

danchild force-pushed the feature-metadata-tags-config branch 3 times, most recently from 34b8135 to bc14b93 Compare May 26, 2026 19:58

danchild force-pushed the feature-metadata-tags-config branch from bc14b93 to 1bcdf07 Compare June 8, 2026 14:40

danchild force-pushed the feature-metadata-tags-config branch from 9da99f6 to 6eb63de Compare June 8, 2026 17:19

danchild force-pushed the feature-metadata-tags-config branch from 6eb63de to 2b787f3 Compare June 8, 2026 17:32

danchild force-pushed the feature-metadata-tags-config branch from 2b787f3 to 8e82dcb Compare June 8, 2026 18:07

danchild force-pushed the feature-metadata-tags-config branch from 8e82dcb to 3c11e12 Compare June 8, 2026 18:12

danchild force-pushed the feature-metadata-tags-config branch from 3c11e12 to ede1ba3 Compare June 11, 2026 18:42

danchild force-pushed the feature-metadata-tags-config branch from b47dbef to 27e6c43 Compare June 11, 2026 19:14

eshulman2 reviewed Jun 15, 2026

eshulman2 reviewed Jun 16, 2026

danchild and others added 2 commits June 18, 2026 11:28

fix: keep trace metadata out of agent prompts

1a9a736

eshulman2 force-pushed the feature-metadata-tags-config branch from 3434028 to 1a9a736 Compare June 18, 2026 09:19

feat: add experimental forge dashboards

bb79969

eshulman2 merged commit 6b81238 into forge-sdlc:main Jun 18, 2026
2 checks passed

eshulman2 merged 4 commits into forge-sdlc:main from danchild:feature-metadata-tags-config
