gkrtjd99/AgentOTelStack

AgentOTelStack

A local observability stack for AI coding agents (Claude Code · Codex · OpenCode …) and humans. One shared, Docker-based backend for logs · metrics · traces, with an observe -> reason -> change -> re-run feedback loop driven by plain curl query tools. No SDK needed to read telemetry — any agent reads AGENTS.md and uses the same obs/*.sh tools.


English

A self-contained observability backend you run once on your machine. Any number of your own apps point at it over OTLP (http://localhost:4318); everything lands in the same stores and is queried side by side. The agent (or you) reads telemetry back through ./obs/*.sh.

Architecture

Read this left to right: apps write telemetry into the collector; agents read it back through the query scripts.

flowchart LR
  subgraph Apps["Apps that emit OTLP"]
    Own["Your local app<br/>OTEL_SERVICE_NAME=my-app"]
    Demo["Bundled sample-app<br/>make demo only"]
  end

  Collector["OpenTelemetry Collector<br/>OTLP :4317 / :4318"]

  Own -->|"logs / metrics / traces"| Collector
  Demo -->|"logs / metrics / traces"| Collector

  Collector -->|"logs"| Logs["VictoriaLogs<br/>:9428 LogsQL"]
  Collector -->|"metrics"| Metrics["VictoriaMetrics<br/>:8428 PromQL"]
  Collector -->|"traces"| Traces["VictoriaTraces<br/>:10428 Jaeger query API"]

  Logs --> LogTool["obs/logs.sh"]
  Metrics --> MetricTool["obs/metrics.sh"]
  Traces --> TraceTool["obs/traces.sh"]
  Logs --> Correlate["obs/correlate.sh"]
  Traces --> Correlate

  LogTool --> Reader["Agent or human<br/>reads AGENTS.md"]
  MetricTool --> Reader
  TraceTool --> Reader
  Correlate --> Reader
  • The OpenTelemetry Collector is the single fan-out point: it receives all OTLP signals and routes each signal type to its matching store (logs to VictoriaLogs, metrics to VictoriaMetrics, traces to VictoriaTraces).
  • VictoriaTraces is queried through the Jaeger query API, so obs/traces.sh uses Jaeger-style subcommands.
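To make the read path concrete, here is a minimal sketch of what thin curl wrappers over the three stores could look like. It is not the repository's obs/*.sh code: the function names and the `LOGS_URL`, `METRICS_URL`, and `TRACES_URL` variables are hypothetical, and the Jaeger-style path on the trace store is an assumption to check against the VictoriaTraces documentation.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the obs/*.sh wrappers; the real scripts may differ.
# Base URLs use the ports from the architecture diagram and can be overridden.
LOGS_URL="${LOGS_URL:-http://localhost:9428}"
METRICS_URL="${METRICS_URL:-http://localhost:8428}"
TRACES_URL="${TRACES_URL:-http://localhost:10428}"

# Instant PromQL query through the Prometheus-compatible HTTP API.
metrics_query() {
  curl -sG "${METRICS_URL}/api/v1/query" --data-urlencode "query=$1"
}

# LogsQL query; the optional second argument caps the number of rows returned.
logs_query() {
  curl -sG "${LOGS_URL}/select/logsql/query" \
    --data-urlencode "query=$1" --data-urlencode "limit=${2:-20}"
}

# List services known to the trace store (assumed Jaeger-style endpoint path).
trace_services() {
  curl -s "${TRACES_URL}/select/jaeger/api/services"
}
```

With the stack running, `metrics_query 'sum by (outcome) (orders_processed_total)' | jq .` would print the raw JSON response.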

The feedback loop

The loop is not just "look at logs". Metrics tell you whether something is wrong; logs and traces tell you which request and code path explain it.

flowchart TD
  Workload["Run or rerun workload<br/>workload/run.sh or e2e"]
  Observe["Observe<br/>metrics, logs, traces"]
  Problem{"Bad signal?"}
  TraceID["Pick one failing trace_id<br/>from an error log or trace search"]
  Correlate["Correlate<br/>obs/correlate.sh trace_id"]
  Reason["Reason from spans + logs<br/>find the failing operation"]
  Change["Change code<br/>app/ or your own service"]
  Rebuild["Rebuild and restart<br/>then rerun the workload"]
  Compare["Compare with baseline or target<br/>error rate, latency, failures"]
  Done["Done<br/>keep the measured result"]

  Workload --> Observe
  Observe --> Compare
  Compare --> Problem
  Problem -- "no" --> Done
  Problem -- "yes" --> TraceID
  TraceID --> Correlate
  Correlate --> Reason
  Reason --> Change
  Change --> Rebuild
  Rebuild --> Workload
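The "Bad signal?" decision in the loop can be sketched as a small threshold gate. This is an illustration only: `is_bad_signal` and `loop_step` are hypothetical helpers, and the target value is whatever baseline you choose for your own workload.

```shell
# Decide the "Bad signal?" branch: is the measured error rate above the target?
# Both arguments are percentages; awk handles the floating-point comparison.
is_bad_signal() {
  awk -v m="$1" -v t="$2" 'BEGIN { exit !(m > t) }'
}

# Print which branch of the loop a measurement leads to.
loop_step() {
  if is_bad_signal "$1" "$2"; then
    echo "investigate: pick a failing trace_id and run obs/correlate.sh"
  else
    echo "done: keep the measured result"
  fi
}
```

For example, `loop_step 18.7 1.0` takes the investigate branch, while `loop_step 0 1.0` takes the done branch.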

Why

What an agent (or a human) editing code lacks most is fact-based feedback on whether a change actually worked. Logs alone are fragmentary; metrics alone tell you what broke but not where or why. This stack:

  • Unifies the three signals in one backend, so they connect to each other.
  • Pivots across signals on trace_id — "error rate spiked" (metric) → "this request failed" (log) → "this span in this code path returned 500" (trace), all at once. (./obs/correlate.sh)
  • Needs no SDK to read — just curl wrappers (./obs/*.sh). Any agent reads AGENTS.md and runs the same loop with the same tools.
  • Is shared by every local project once it's up — give each app a different OTEL_SERVICE_NAME and they all report into the same backend and are queried side by side.

What you get

With this stack attached you can answer, in numbers (see Verified):

  • Error rate: sum(...{outcome="error"}) / sum(...) → e.g. 18.7%
  • Latency distribution: histogram_quantile(0.95, ...) → e.g. p95 4.75s
  • Failure localization: from a failed request's trace, instantly see which span (GET /api/checkout) carried http.status_code=500
  • Before/after comparison: after a fix, re-run the same workload and verify the error rate / latency actually dropped

So instead of "I think I fixed it", you say "error rate 18.7% → 0%".
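The arithmetic behind such a statement is small enough to script. The sketch below assumes you have already read the success and error counts from the metrics query; the helper names `error_rate_pct` and `rate_delta` are hypothetical, and the counts in the usage note are made-up illustration values, not results from this stack.

```shell
# Error rate as a percentage with one decimal: error / (success + error) * 100.
error_rate_pct() {
  awk -v ok="$1" -v err="$2" 'BEGIN {
    total = ok + err
    if (total == 0) { print "0.0"; exit }   # no traffic yet: report zero
    printf "%.1f\n", err * 100 / total
  }'
}

# Percentage-point change between two runs (negative means the rate dropped).
rate_delta() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%+.1f\n", after - before }'
}
```

With 90 successes and 10 errors, `error_rate_pct 90 10` prints `10.0`; after a fix that removes all errors, `rate_delta 10.0 0` prints `-10.0`.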

Verified

Actual run results, not claims.

Booted the full stack with make demo, drove load with ./workload/run.sh 150, then queried all four tools:

| Check | Result |
| --- | --- |
| Stack boot | 5 containers (collector + Victoria ×3 + sample-app) all healthy |
| Write path | app → collector → all 3 stores receiving (success 54 / error 12) |
| Read path | logs.sh / metrics.sh / traces.sh / correlate.sh all returned real data |
| Correlation | error log (checkout failed + trace_id) → correlate.sh → same trace's GET /api/checkout span showed http.status_code=500, error=true, 17.4ms |
| Effect metrics | error rate 18.7%, p95 4.75s |
| External app | a second app appeared alongside sample-app in the trace service list → bring-your-own-app path proven |

To reproduce these results, see the Reproduce section.

Prerequisites

  • Docker (Docker Desktop or Engine) — runs the collector, stores, and optional demo app.
  • jq — the ./obs/*.sh query scripts use it to pretty-print JSON (brew install jq / apt install jq).
  • make (optional) — convenience wrapper. Without it, run docker compose up -d directly (raw commands are in Makefile).
  • Your app must emit OTLP. If it doesn't yet, see docs/CONNECT.md (Node / Python / Java / Go).

Quick start

Want to attach your own app? See docs/CONNECT.md. Summary: run make up (infra only), then point your app at http://localhost:4318 with OTEL_SERVICE_NAME=my-app.

Just want the self-contained demo?

# 1. Start infra + bundled sample app
make demo           # = docker compose --profile demo up -d --build

# 2. Generate traffic
./workload/run.sh 300

# 3. Observe (the very tools the agent uses)
./obs/metrics.sh 'sum by (outcome) (orders_processed_total)'
./obs/logs.sh '_time:5m severity_text:error' 20
./obs/traces.sh search-errors sample-app

# 4. (optional) run a browser UI journey
cd e2e && npm install && npm run install-browsers && npm test

# 5. (optional) run the automated smoke test / terminal dashboard
make smoke
make dashboard SERVICE=sample-app
make dashboard SERVICE=sample-app MODE=compact LOOKBACK=15m
./obs/overview.sh --json --since 15m sample-app

# 6. (optional) browser dashboard
make grafana

Sample app UI: http://localhost:3000. Optional Grafana UI: http://localhost:3001, with the ObservabilityStack / Local Observability dashboard provisioned automatically.

make up starts only the shared infra (collector + 3 stores) — the bring-your-own-app default. make demo adds the bundled sample app.
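Before wiring an SDK into your app, you can check the ingest path by hand. The sketch below builds a minimal OTLP/HTTP JSON payload with one log record and posts it to the collector's /v1/logs path. It assumes the bundled collector config leaves JSON encoding enabled on the OTLP HTTP receiver (the receiver's default); `otlp_log_payload`, `send_test_log`, and `OTLP_ENDPOINT` are hypothetical names, not part of this repository.

```shell
# Build a minimal OTLP/HTTP JSON payload carrying one INFO log record.
# Arguments: service name, message, optional Unix timestamp in seconds.
# Values must not contain double quotes, since no JSON escaping is done here.
otlp_log_payload() {
  ts="${3:-$(date +%s)}"
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"%s000000000","severityText":"INFO","body":{"stringValue":"%s"}}]}]}]}\n' \
    "$1" "$ts" "$2"
}

# POST the payload to the collector's OTLP/HTTP logs endpoint.
send_test_log() {
  otlp_log_payload "$1" "$2" |
    curl -s -X POST "${OTLP_ENDPOINT:-http://localhost:4318}/v1/logs" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

After `send_test_log my-app "hello from curl"`, the record should be findable with a logs query filtered on that service name, provided the stack is up.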

Reproduce

make demo                                  # full stack
./workload/run.sh 150                       # load (~10% intentional failures)
sleep 12                                    # wait for metric export interval (10s)

# 1) metrics — success/error counts
./obs/metrics.sh 'sum by (outcome) (orders_processed_total)'
# 2) traces — services reporting + error traces
./obs/traces.sh services
./obs/traces.sh search-errors sample-app 5
# 3) logs — pull one error log and grab its trace_id
./obs/logs.sh '_time:10m severity_text:error' 1
# 4) correlate — that trace_id's spans + logs in one shot
./obs/correlate.sh <trace_id-from-step-3>

Expected: metrics show success/error counts, traces show sample-app, and correlate output shows the GET /api/checkout span with http.status_code=500.
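Steps 3 and 4 can be chained so the trace_id is not copied by hand. This is a sketch under one assumption: that the logs output contains the field as `"trace_id":"<32 hex characters>"`. If the real output names or formats the field differently, the pattern needs adjusting. `first_trace_id` is a hypothetical helper.

```shell
# Read log output on stdin and print the first 32-hex-character trace_id found.
# Assumes the field appears as "trace_id":"<hex>" in JSON output.
first_trace_id() {
  grep -oE '"trace_id" *: *"[0-9a-f]{32}"' | head -n 1 | grep -oE '[0-9a-f]{32}'
}
```

Usage against a running stack would look like `./obs/correlate.sh "$(./obs/logs.sh '_time:10m severity_text:error' 1 | first_trace_id)"`.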

Connect your own app (the two-layer model)

This is not a library you install into each project. It is one shared backend that every project points at. Two layers:

| Layer | Lives where | What you do |
| --- | --- | --- |
| Backend (one copy) | ~/AgentOTelStack/ | Run make up for the 4 shared infra containers: collector + VictoriaLogs/Metrics/Traces. |
| Per app (tiny) | Each project folder | Set 4 env vars; Node apps may add one otel.js; emit OTLP to :4318. |

Use make demo when you also want the bundled sample-app on :3000.

Layer 2 — the only per-app footprint. Set four env vars and run your app:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318   # the collector
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_SERVICE_NAME=my-app                            # unique per app
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment=dev
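A missing variable usually shows up only as silence in the backend, so a quick preflight check before launching the app can save a debugging round. The helper below is a hypothetical sketch, not part of the repository; it only checks that the four variables are non-empty, not that their values are correct.

```shell
# Report which of the four OTel variables are unset or empty.
# Returns 0 when all are present, 1 otherwise.
check_otel_env() {
  missing=0
  for name in OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_PROTOCOL \
              OTEL_SERVICE_NAME OTEL_RESOURCE_ATTRIBUTES; do
    # Indirect lookup of the variable whose name is in $name (fixed list, so eval is safe).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical use is `check_otel_env && node app.js`, where the launch command is whatever your project normally runs.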

Per-language setup (full detail in docs/CONNECT.md):

| Language | What lands in your app folder | New files |
| --- | --- | --- |
| Node/TS | copy app/src/otel.js + deps + --require ./otel.js | 1 (otel.js) |
| Python | pip install + wrap launch with opentelemetry-instrument | 0 (env only) |
| Java | -javaagent:opentelemetry-javaagent.jar | 1 (jar) |
| Go | set up SDK in main() with OTLP/HTTP exporters | code edit |

Multiple apps? They all land in the same stores; filter by service name:

./obs/logs.sh   '_time:15m service.name:my-app severity_text:error'
./obs/metrics.sh 'sum by (outcome) (some_metric{service_name="my-app"})'
./obs/traces.sh  search my-app
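When several apps report in, the same filter can be generated per service instead of retyped. The sketch below only builds the query strings in the form used above; `error_log_query` and `for_each_service` are hypothetical helpers, and service names containing spaces or other special characters may need quoting that this sketch does not handle.

```shell
# Build the error-log filter for one service; the window defaults to 15m.
error_log_query() {
  printf '_time:%s service.name:%s severity_text:error\n' "${2:-15m}" "$1"
}

# Print one query per service name passed as an argument.
for_each_service() {
  for svc in "$@"; do
    error_log_query "$svc"
  done
}
```

For example, `for_each_service my-app sample-app | while read -r q; do ./obs/logs.sh "$q" 5; done` would run the error query once per service.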

What's in here

| Path | What it is |
| --- | --- |
| docker-compose.yml | Orchestrates Victoria ×3 + collector + app on the dev-observability network |
| otel-collector/config.yaml | OTLP receive → fan-out to the 3 stores |
| app/ | Swappable sample service (Node + explicit OTel bootstrap + lockfile). Replace with your own. |
| obs/ | Agent query tools: logs.sh (LogsQL), metrics.sh (PromQL), traces.sh (Jaeger), correlate.sh, plus multi-app helpers |
| scripts/smoke.sh | End-to-end write/read path verification (make smoke) |
| dashboards/local-observability.json | Optional Grafana dashboard provisioned by the dashboard profile |
| grafana/provisioning/ | Grafana datasource and dashboard provider provisioning |
| .github/workflows/ci.yml | Static validation, npm audit, and Docker smoke test |
| workload/run.sh | Synthetic load generator |
| e2e/ | Playwright browser UI journey |
| AGENTS.md | Operating guide every agent reads (CLAUDE.md is a symlink to it) |
| docs/ARCHITECTURE.md | Runtime structure: write/read paths, collector fan-out, querying |
| docs/CONNECT.md | How to point your own app at the stack (per language) |
| docs/DASHBOARD.md | Terminal overview and built-in Victoria UI entry points |
| docs/DASHBOARD_PLAN.md | Dashboard roadmap and implementation phases |
| docs/SECURITY.md | Local-only security model and remote exposure guidance |

Ports

| Service | Port | Purpose |
| --- | --- | --- |
| sample-app | 3000 | app + UI (http://localhost:3000), demo mode only |
| otel-collector | 4317 / 4318 | OTLP gRPC / HTTP ingest (apps send here) |
| VictoriaLogs | 9428 | LogsQL query API (logs.sh) |
| VictoriaMetrics | 8428 | PromQL query API (metrics.sh) |
| VictoriaTraces | 10428 | Jaeger query API (traces.sh) |
| Grafana | 3001 | Optional browser dashboard (make grafana) |

Teardown

make down          # stop (telemetry preserved in volumes)
make clean         # stop + wipe all stored telemetry (docker compose down -v)

Further reading

