Skip to content

feat(jsonlogger): schema presets, structured errors, redaction, term caps#385

Open
Taure wants to merge 7 commits into
novaframework:masterfrom
Taure:feat/jsonlogger-tier1
Open

feat(jsonlogger): schema presets, structured errors, redaction, term caps#385
Taure wants to merge 7 commits into
novaframework:masterfrom
Taure:feat/jsonlogger-tier1

Conversation

@Taure
Copy link
Copy Markdown
Collaborator

@Taure Taure commented May 19, 2026

Summary

Tier 1 modernization of nova_jsonlogger to bring it in line with what shops expect from a 2026 structured-JSON logger.

  • Schema presets via #{schema => nova | ecs | otel | gcp | datadog} config:

    • ECS@timestamp, log.level, message, trace.id, span.id, error.type / error.message / error.stack_trace, log.origin.* for source location.
    • OTelTimestamp (nanosecond), SeverityText + SeverityNumber (1-24 per spec), Body, TraceId, SpanId, exception.*.
    • GCPseverity (uppercase), message, logging.googleapis.com/trace, logging.googleapis.com/spanId, structured sourceLocation.
    • Datadogstatus, message, dd.trace_id, dd.span_id.
    • nova (default) preserves the current shape and now emits RFC 3339 timestamps by default.
  • Structured error object from crash_report (SASL convention) or class / reason / stacktrace meta — #{type, reason, message, stacktrace => [#{mfa, file, line}]} rendered to each schema's conventions.

  • Body-path redaction#{redact => [[req, headers, authorization], [user, password]]} walks nested maps and replaces matched paths with <<\"[REDACTED]\">>. Missing paths are no-ops.

  • Term-size guardsmax_term_size and max_string_length (default 8192 bytes) cap ~0p-formatted output and binary/string values, appending \"...[truncated]\" to truncated payloads. Prevents a 10 MB term from blowing the log line.

  • Existing key_mapping, format_funs, meta_with, meta_without, new_line[_type] keep working and now run after the schema layer so users can override anything the schema rendered.

Trace correlation

trace_id / span_id are picked up from process metadata. Upstream opentelemetry_nova PR #10 (already merged) writes them in pre_request, so once it's pulled the keys land under each schema's conventional names automatically. No OTel dependency in nova_jsonlogger itself.

Pipeline

merge_meta -> extract_error -> redact -> apply_schema ->
apply_key_mapping -> apply_format_funs -> pre_encode (with caps)

Behaviour change

Meta time (microseconds) is now emitted as an RFC 3339 binary by default under every schema. Callers using format_funs => #{time => fun(Int) -> Int end} should switch to a binary-accepting fun, or pass #{time => fun(B) -> B end} to keep the value untouched.

Test plan

  • 35 eunit cases (17 new) all passing
  • Full nova eunit suite (364 tests) passes
  • xref clean
  • dialyzer clean
  • New tests cover: each schema's rendering, source location (ECS + GCP), severity number mapping, stacktrace extraction, error rendering under ECS + OTel, top-level + nested + missing-path redaction, binary + term + string caps.

Taure added 3 commits May 19, 2026 20:54
…caps

Adds the Tier 1 bundle for nova_jsonlogger:

* `schema => nova | ecs | otel | gcp | datadog` config preset.
  - ECS:     @timestamp, log.level, message, trace.id, span.id,
             error.type/message/stack_trace, log.origin.* source location.
  - OTel:    Timestamp (nanosecond), SeverityText + SeverityNumber (1-24),
             Body, TraceId, SpanId, exception.type/message/stacktrace.
  - GCP:     severity (uppercase), message, logging.googleapis.com/trace,
             logging.googleapis.com/spanId, sourceLocation object.
  - Datadog: status, message, dd.trace_id, dd.span_id.
  - nova (default) preserves current behaviour and now emits RFC 3339
    timestamps by default.
* Structured `error` object from `crash_report` or `class/reason/stacktrace`
  meta, with per-schema rendering (error.* under ECS, exception.* under OTel).
* Body-path redaction: `#{redact => [[req, headers, authorization], ...]}`
  walks nested maps and replaces matching paths with `<<"[REDACTED]">>`.
* Term-size guards: `max_term_size` and `max_string_length` (default 8192)
  prevent ~0p formatting and binary expansion from blowing memory; truncated
  values get a "...[truncated]" marker appended.
* Existing `key_mapping`, `format_funs`, `meta_with`, `meta_without`,
  `new_line[_type]` config keys still work and run after the schema layer
  so users can override anything the schema rendered.

Pipeline: merge_meta -> extract_error -> redact -> apply_schema ->
apply_key_mapping -> apply_format_funs -> pre_encode (with caps).

Trace correlation: `trace_id` / `span_id` are picked up from process
metadata (opentelemetry_nova writes these in pre_request as of upstream
PR novaframework#10) and rendered under each schema's conventional key names.

Behaviour change: meta `time` (microseconds) is now emitted as an RFC 3339
binary by default. Callers that previously formatted `time` via a custom
`format_funs` entry should switch to a fun that takes a binary, or set
`format_funs => #{time => fun(B) -> B end}` to keep the value untouched.

35 eunit cases (17 new), all passing. xref + dialyzer clean.
* New `guides/logging.md` covering setup, schemas, severity numbers,
  source location, structured errors, trace correlation, redaction,
  size caps, hooks, and a "choosing a schema" cheat sheet.
* Wire the guide into `guides/README.md` (under "Deeper into Nova")
  and `rebar.config` ex_doc extras.
* Add `-moduledoc` / `-doc` attributes on `nova_jsonlogger`, `format/2`,
  and the public timestamp helpers using the OTP doc format.
@Taure Taure requested a review from burbas May 19, 2026 19:04
Copy link
Copy Markdown
Contributor

@burbas burbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it looks good except for the hard-coded thoas-lib usage :-). Fix that and it's A-Ok!

Comment thread src/nova_jsonlogger.erl Outdated
<<"{\"level\":\"error\",\"report\":\"[{hej,\\\"hopp\\\"}]\",\"time\":1}">>,
format(ErrorReport, #{})
).
{ok, Decoded} = thoas:decode(format(ErrorReport, #{})),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should call nova to see which json-lib to use.

The eunit block lived inside src/nova_jsonlogger.erl and hard-coded
thoas:decode/1 throughout. Move it to test/nova_jsonlogger_tests.erl
and resolve the JSON decoder through nova:get_env(json_lib, thoas) so
the tests follow the same configurable pattern as the production
encode/2 path.

Internal helpers touched by tests (jsonify/3, cap_binary/2,
severity_number/1) get a TEST-only export.
Taure added a commit to Taure/triagebot that referenced this pull request May 24, 2026
…request (#10)

Three changes that go together:

1. Revert the io:format debug code in triagebot_webhook_controller back
   to structured ?LOG_ERROR / ?LOG_WARNING / ?LOG_INFO calls. The
   io:format escape hatch was added in v0.1.7 to debug the webhook
   secret mismatch when nova_jsonlogger was silently dropping the
   diagnostic maps - that root cause is now fixed upstream.

2. Pin nova to Taure/nova feat/jsonlogger-tier1 (the open PR
   novaframework/nova#385). The tier1 jsonlogger does structured
   error extraction from class/reason/stacktrace meta and applies
   term-size caps, so the stacktrace field that previously broke the
   formatter is now rendered safely. TODO comment in rebar.config
   to swap to a tag once the upstream PR lands.

3. Opt the prod config into the ECS schema with term and string caps
   (8192 bytes), plus drop logger_level back from debug to info now
   that the active debug round is over.

4. When triagebot's label_proposer asks for the claude-try label,
   post a follow-up @claude implement... comment so the
   claude-code-action workflow in the target repo picks it up
   without the maintainer needing to apply the label manually. The
   action listens on issues.labeled too, but an explicit @claude
   mention also makes the chain visible in the issue thread.
Route JSON encoding/decoding through a single configurable ?JSONLIB
macro in nova.hrl instead of resolving the library at each call site.
Addresses review feedback on hard-coded json-lib usage.
@Taure
Copy link
Copy Markdown
Collaborator Author

Taure commented May 26, 2026

Good catch. Removed the hard-coded json-lib usage: introduced a shared ?JSONLIB macro in include/nova.hrl ((nova:get_env(json_lib, thoas))) and routed nova_jsonlogger encoding plus all the tests through it, so the library is fetched from config in one place rather than resolved at each call site. Default lib is unchanged. eunit (364) + xref + dialyzer all green.

Copy link
Copy Markdown
Contributor

@burbas burbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very picky now, but I think you can call the module directly instead of adding a macro for it :)

Comment thread include/nova.hrl Outdated
…acro

Inline nova:get_env(json_lib, thoas) at the call site instead of the
shared ?JSONLIB macro in nova.hrl, per review feedback. Tests use a
local json_lib/0 helper.
Copy link
Copy Markdown
Contributor

@burbas burbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite function and explain where trace_id and span_id comes from

Comment thread src/nova_jsonlogger.erl Outdated
Acc
end.

type_and_reason() ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this now and I'm a bit unsure why we return a lambda here. Could we not just match in the header and have to clauses? Seems unnecessary to create a lambda here imho.

Comment thread src/nova_jsonlogger.erl
}
```

## Trace correlation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure where trace_id (And perhaps span_id) is set? I can see that the code tries to do call rename/3, but it's just a rename op - not a set. I might have missed this, but where is this done? :)

…enance

Replace the type_and_reason/0 lambda + polymorphic report_value/4 helper
with direct error_info/1 clauses and a plain take_into/3, per review.

Rewrite the trace-correlation docs to be precise: the formatter relocates
trace_id/span_id from log metadata under each schema's conventional key but
never generates them; populate upstream via logger:update_process_metadata/1
or the optional opentelemetry_nova plugin (not a nova dependency). Add a
code comment at the schema layer noting rename/3 is a no-op when absent.

Tests: trace ids surface from metadata and are not invented when absent;
crash_report extraction (previously untested) for both tuple and non-tuple
error_info.
@Taure
Copy link
Copy Markdown
Collaborator Author

Taure commented May 26, 2026

Good questions, both addressed in aad5296.

trace_id/span_id: you're right that rename/3 only relocates, never sets. That's deliberate - the formatter doesn't generate trace context. The keys arrive in the log record's metadata, merge_meta surfaces them, and rename/3 moves them to each schema's conventional name (trace.id, TraceId, ...), no-op when absent. The producer lives upstream: opentelemetry_nova #10 writes the hex ids into logger metadata in pre_request, or you call logger:update_process_metadata/1 yourself. nova has no OTel dependency, so I reworded the moduledoc to stop implying it's automatic and added a comment at the schema layer. Two tests now lock it: ids surface from metadata, and nothing is invented when they're absent.

normalise_crash_report: dropped the lambda and the polymorphic report_value/4. error_info/1 pattern-matches the {Class, Reason, Stack} tuple directly in the head, with a small take_into/3 for registered_name/pid. Added crash_report tests too - that path had none.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants