feat(jsonlogger): schema presets, structured errors, redaction, term caps#385
feat(jsonlogger): schema presets, structured errors, redaction, term caps#385Taure wants to merge 7 commits into
Conversation
…caps Adds the Tier 1 bundle for nova_jsonlogger: * `schema => nova | ecs | otel | gcp | datadog` config preset. - ECS: @timestamp, log.level, message, trace.id, span.id, error.type/message/stack_trace, log.origin.* source location. - OTel: Timestamp (nanosecond), SeverityText + SeverityNumber (1-24), Body, TraceId, SpanId, exception.type/message/stacktrace. - GCP: severity (uppercase), message, logging.googleapis.com/trace, logging.googleapis.com/spanId, sourceLocation object. - Datadog: status, message, dd.trace_id, dd.span_id. - nova (default) preserves current behaviour and now emits RFC 3339 timestamps by default. * Structured `error` object from `crash_report` or `class/reason/stacktrace` meta, with per-schema rendering (error.* under ECS, exception.* under OTel). * Body-path redaction: `#{redact => [[req, headers, authorization], ...]}` walks nested maps and replaces matching paths with `<<"[REDACTED]">>`. * Term-size guards: `max_term_size` and `max_string_length` (default 8192) prevent ~0p formatting and binary expansion from blowing memory; truncated values get a "...[truncated]" marker appended. * Existing `key_mapping`, `format_funs`, `meta_with`, `meta_without`, `new_line[_type]` config keys still work and run after the schema layer so users can override anything the schema rendered. Pipeline: merge_meta -> extract_error -> redact -> apply_schema -> apply_key_mapping -> apply_format_funs -> pre_encode (with caps). Trace correlation: `trace_id` / `span_id` are picked up from process metadata (opentelemetry_nova writes these in pre_request as of upstream PR novaframework#10) and rendered under each schema's conventional key names. Behaviour change: meta `time` (microseconds) is now emitted as an RFC 3339 binary by default. Callers that previously formatted `time` via a custom `format_funs` entry should switch to a fun that takes a binary, or set `format_funs => #{time => fun(B) -> B end}` to keep the value untouched. 35 eunit cases (17 new), all passing. xref + dialyzer clean.
* New `guides/logging.md` covering setup, schemas, severity numbers, source location, structured errors, trace correlation, redaction, size caps, hooks, and a "choosing a schema" cheat sheet. * Wire the guide into `guides/README.md` (under "Deeper into Nova") and `rebar.config` ex_doc extras. * Add `-moduledoc` / `-doc` attributes on `nova_jsonlogger`, `format/2`, and the public timestamp helpers using the OTP doc format.
burbas
left a comment
There was a problem hiding this comment.
Think it looks good except for the hard-coded thoas-lib usage :-). Fix that and it's A-Ok!
| <<"{\"level\":\"error\",\"report\":\"[{hej,\\\"hopp\\\"}]\",\"time\":1}">>, | ||
| format(ErrorReport, #{}) | ||
| ). | ||
| {ok, Decoded} = thoas:decode(format(ErrorReport, #{})), |
There was a problem hiding this comment.
Here we should call nova to see which json-lib to use.
The eunit block lived inside src/nova_jsonlogger.erl and hard-coded thoas:decode/1 throughout. Move it to test/nova_jsonlogger_tests.erl and resolve the JSON decoder through nova:get_env(json_lib, thoas) so the tests follow the same configurable pattern as the production encode/2 path. Internal helpers touched by tests (jsonify/3, cap_binary/2, severity_number/1) get a TEST-only export.
…request (#10) Three changes that go together: 1. Revert the io:format debug code in triagebot_webhook_controller back to structured ?LOG_ERROR / ?LOG_WARNING / ?LOG_INFO calls. The io:format escape hatch was added in v0.1.7 to debug the webhook secret mismatch when nova_jsonlogger was silently dropping the diagnostic maps - that root cause is now fixed upstream. 2. Pin nova to Taure/nova feat/jsonlogger-tier1 (the open PR novaframework/nova#385). The tier1 jsonlogger does structured error extraction from class/reason/stacktrace meta and applies term-size caps, so the stacktrace field that previously broke the formatter is now rendered safely. TODO comment in rebar.config to swap to a tag once the upstream PR lands. 3. Opt the prod config into the ECS schema with term and string caps (8192 bytes), plus drop logger_level back from debug to info now that the active debug round is over. 4. When triagebot's label_proposer asks for the claude-try label, post a follow-up @claude implement... comment so the claude-code-action workflow in the target repo picks it up without the maintainer needing to apply the label manually. The action listens on issues.labeled too, but an explicit @claude mention also makes the chain visible in the issue thread.
Route JSON encoding/decoding through a single configurable ?JSONLIB macro in nova.hrl instead of resolving the library at each call site. Addresses review feedback on hard-coded json-lib usage.
|
Good catch. Removed the hard-coded json-lib usage: introduced a shared |
burbas
left a comment
There was a problem hiding this comment.
Very picky now, but I think you can call the module directly instead of adding a macro for it :)
…acro Inline nova:get_env(json_lib, thoas) at the call site instead of the shared ?JSONLIB macro in nova.hrl, per review feedback. Tests use a local json_lib/0 helper.
burbas
left a comment
There was a problem hiding this comment.
Rewrite function and explain where trace_id and span_id comes from
| Acc | ||
| end. | ||
|
|
||
| type_and_reason() -> |
There was a problem hiding this comment.
Looking at this now and I'm a bit unsure why we return a lambda here. Could we not just match in the header and have to clauses? Seems unnecessary to create a lambda here imho.
| } | ||
| ``` | ||
|
|
||
| ## Trace correlation |
There was a problem hiding this comment.
I'm a bit unsure where trace_id (And perhaps span_id) is set? I can see that the code tries to do call rename/3, but it's just a rename op - not a set. I might have missed this, but where is this done? :)
…enance Replace the type_and_reason/0 lambda + polymorphic report_value/4 helper with direct error_info/1 clauses and a plain take_into/3, per review. Rewrite the trace-correlation docs to be precise: the formatter relocates trace_id/span_id from log metadata under each schema's conventional key but never generates them; populate upstream via logger:update_process_metadata/1 or the optional opentelemetry_nova plugin (not a nova dependency). Add a code comment at the schema layer noting rename/3 is a no-op when absent. Tests: trace ids surface from metadata and are not invented when absent; crash_report extraction (previously untested) for both tuple and non-tuple error_info.
|
Good questions, both addressed in aad5296. trace_id/span_id: you're right that normalise_crash_report: dropped the lambda and the polymorphic |
Summary
Tier 1 modernization of
nova_jsonloggerto bring it in line with what shops expect from a 2026 structured-JSON logger.Schema presets via
#{schema => nova | ecs | otel | gcp | datadog}config:@timestamp,log.level,message,trace.id,span.id,error.type/error.message/error.stack_trace,log.origin.*for source location.Timestamp(nanosecond),SeverityText+SeverityNumber(1-24 per spec),Body,TraceId,SpanId,exception.*.severity(uppercase),message,logging.googleapis.com/trace,logging.googleapis.com/spanId, structuredsourceLocation.status,message,dd.trace_id,dd.span_id.Structured error object from
crash_report(SASL convention) orclass/reason/stacktracemeta —#{type, reason, message, stacktrace => [#{mfa, file, line}]}rendered to each schema's conventions.Body-path redaction —
#{redact => [[req, headers, authorization], [user, password]]}walks nested maps and replaces matched paths with<<\"[REDACTED]\">>. Missing paths are no-ops.Term-size guards —
max_term_sizeandmax_string_length(default 8192 bytes) cap~0p-formatted output and binary/string values, appending\"...[truncated]\"to truncated payloads. Prevents a 10 MB term from blowing the log line.Existing
key_mapping,format_funs,meta_with,meta_without,new_line[_type]keep working and now run after the schema layer so users can override anything the schema rendered.Trace correlation
trace_id/span_idare picked up from process metadata. Upstream opentelemetry_nova PR #10 (already merged) writes them inpre_request, so once it's pulled the keys land under each schema's conventional names automatically. No OTel dependency in nova_jsonlogger itself.Pipeline
Behaviour change
Meta
time(microseconds) is now emitted as an RFC 3339 binary by default under every schema. Callers usingformat_funs => #{time => fun(Int) -> Int end}should switch to a binary-accepting fun, or pass#{time => fun(B) -> B end}to keep the value untouched.Test plan