From 4d7d284cebea8912adb9b46d816d833cb1635d09 Mon Sep 17 00:00:00 2001
From: Matthias Vallentin
Date: Thu, 28 May 2026 17:47:07 +0200
Subject: [PATCH 1/4] Document read_auto operator

Add the reference entry for automatic reader detection, including strict
detection behavior, fallback modes, probe limits, and examples.

Assisted-by: GPT-5 (pi)
---
 src/content/docs/reference/operators.mdx | 12 ++
 .../docs/reference/operators/read_auto.mdx | 109 ++++++++++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 src/content/docs/reference/operators/read_auto.mdx

diff --git a/src/content/docs/reference/operators.mdx b/src/content/docs/reference/operators.mdx
index 15c5042e6..0a76d91d4 100644
--- a/src/content/docs/reference/operators.mdx
+++ b/src/content/docs/reference/operators.mdx
@@ -515,6 +515,10 @@ operators:
     description: 'Parses an incoming bytes stream into a single event.'
     example: 'read_all binary=true'
     path: 'reference/operators/read_all'
+  - name: 'read_auto'
+    description: 'Detects the input format of a byte stream and selects a matching reader.'
+    example: 'read_auto fallback="lines"'
+    path: 'reference/operators/read_auto'
   - name: 'read_bitz'
     description: 'Parses bytes as *BITZ* format.'
     example: 'read_bitz'
@@ -2157,6 +2161,14 @@ read_all binary=true
+
+
+```tql
+read_auto fallback="lines"
+```
+
+
+
 ```tql
diff --git a/src/content/docs/reference/operators/read_auto.mdx b/src/content/docs/reference/operators/read_auto.mdx
new file mode 100644
index 000000000..afab7476f
--- /dev/null
+++ b/src/content/docs/reference/operators/read_auto.mdx
@@ -0,0 +1,109 @@
+---
+title: read_auto
+category: Parsing
+example: 'read_auto fallback="lines"'
+---
+
+Detects the input format of a byte stream and selects a matching reader.
+
+```tql
+read_auto [fallback=string, max_probe_bytes=uint]
+```
+
+## Description
+
+The `read_auto` operator probes the first bytes of its input and starts the
+reader whose detector returns the best unique match. Use it when the input format
+is unknown at authoring time, but should still be one of Tenzir's structured
+formats.
+
+By default, detection is strict. If no detector matches, or if multiple
+detectors match with the same score, `read_auto` emits an error instead of
+falling back to a generic text reader.
+
+The built-in detectors cover common JSON, delimited text, security log, and
+magic-byte formats, including NDJSON, JSON objects, JSON arrays of objects, CSV,
+TSV, SSV, key-value text, YAML, Syslog, CEF, LEEF, Zeek TSV, Suricata EVE JSON,
+Zeek JSON, GELF, PCAP, Feather, BITZ, and Parquet.
+
+### `fallback = string (optional)`
+
+Controls what happens when no detector matches.
+
+Valid values are:
+
+- `"none"`: Emit an error. This is the default.
+- `"lines"`: Use read_lines. The input must be valid UTF-8.
+- `"all"`: Use read_all. If the input is not valid UTF-8, `read_auto`
+  uses `read_all binary=true`.
+
+### `max_probe_bytes = uint (optional)`
+
+The maximum number of bytes to inspect before forcing a detection decision.
+
+Defaults to `1048576` bytes.
+
+## Examples
+
+### Detect JSON lines
+
+Given this input:
+
+```json title="events.ndjson"
+{"x":1}
+{"x":2}
+```
+
+Use `read_auto` where you would normally use a concrete reader:
+
+```tql
+load "events.ndjson"
+read_auto
+```
+
+```tql
+{x: 1}
+{x: 2}
+```
+
+### Fall back to lines
+
+For arbitrary UTF-8 text, opt into line-based parsing explicitly:
+
+```txt title="messages.txt"
+hello
+world
+```
+
+```tql
+load "messages.txt"
+read_auto fallback="lines"
+```
+
+```tql
+{line: "hello"}
+{line: "world"}
+```
+
+### Fall back to a single event
+
+Use `fallback="all"` when unknown input should become one event instead of one
+event per line:
+
+```tql
+load "payload.bin"
+read_auto fallback="all"
+```
+
+If the input is binary, the resulting event contains a `blob` value in the
+`data` field.
+
+## See Also
+
+- read_all
+- read_csv
+- read_json
+- read_lines
+- read_ndjson
+- read_syslog
+- read_yaml

From 84fd14129aee98ac3d43b29fbb11184905faf36a Mon Sep 17 00:00:00 2001
From: Matthias Vallentin
Date: Thu, 28 May 2026 18:08:22 +0200
Subject: [PATCH 2/4] Clarify read_auto fallback probing

State that fallback=all chooses text or binary mode from the current probe
bytes, not from the entire stream. Point users with binary payloads that
start with a UTF-8 prefix to a larger probe or direct read_all binary mode.

Assisted-by: GPT-5 (pi)
---
 src/content/docs/reference/operators/read_auto.mdx | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/reference/operators/read_auto.mdx b/src/content/docs/reference/operators/read_auto.mdx
index afab7476f..6b914062b 100644
--- a/src/content/docs/reference/operators/read_auto.mdx
+++ b/src/content/docs/reference/operators/read_auto.mdx
@@ -34,8 +34,12 @@ Valid values are:

 - `"none"`: Emit an error. This is the default.
 - `"lines"`: Use read_lines. The input must be valid UTF-8.
-- `"all"`: Use read_all. If the input is not valid UTF-8, `read_auto`
-  uses `read_all binary=true`.
+- `"all"`: Use read_all. `read_auto` uses the current probe to
+  choose between text and binary output: valid UTF-8 probe bytes select
+  `read_all`, while invalid probe bytes select `read_all binary=true`. If
+  binary input can start with a valid UTF-8 prefix longer than
+  `max_probe_bytes`, use a larger probe limit or read_all with
+  `binary=true` directly.

 ### `max_probe_bytes = uint (optional)`

From a7f2ace569094fa10649439a85acf0ea48a8cb1a Mon Sep 17 00:00:00 2001
From: Matthias Vallentin
Date: Thu, 28 May 2026 18:16:33 +0200
Subject: [PATCH 3/4] Use valid TQL in read_auto examples

Replace the invalid load snippets with from_file subpipelines, matching the
documented file-reading syntax for parsing byte streams.

Assisted-by: GPT-5 (pi)
---
 .../docs/reference/operators/read_auto.mdx | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/content/docs/reference/operators/read_auto.mdx b/src/content/docs/reference/operators/read_auto.mdx
index 6b914062b..9689e5f19 100644
--- a/src/content/docs/reference/operators/read_auto.mdx
+++ b/src/content/docs/reference/operators/read_auto.mdx
@@ -61,8 +61,9 @@ Given this input:
 Use `read_auto` where you would normally use a concrete reader:

 ```tql
-load "events.ndjson"
-read_auto
+from_file "events.ndjson" {
+  read_auto
+}
 ```

 ```tql
@@ -80,8 +81,9 @@ world
 ```

 ```tql
-load "messages.txt"
-read_auto fallback="lines"
+from_file "messages.txt" {
+  read_auto fallback="lines"
+}
 ```

 ```tql
@@ -95,8 +97,9 @@ Use `fallback="all"` when unknown input should become one event instead of one
 event per line:

 ```tql
-load "payload.bin"
-read_auto fallback="all"
+from_file "payload.bin" {
+  read_auto fallback="all"
+}
 ```

 If the input is binary, the resulting event contains a `blob` value in the

From 00671547bfef0938e9980365a6ce4c6eedce88ba Mon Sep 17 00:00:00 2001
From: Matthias Vallentin
Date: Thu, 28 May 2026 18:28:02 +0200
Subject: [PATCH 4/4] Use SI literal in read_auto docs

Document the default probe limit as 1Mi to match the TQL spelling users can
configure.

Assisted-by: GPT-5 (pi)
---
 src/content/docs/reference/operators/read_auto.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/reference/operators/read_auto.mdx b/src/content/docs/reference/operators/read_auto.mdx
index 9689e5f19..b8d3617f1 100644
--- a/src/content/docs/reference/operators/read_auto.mdx
+++ b/src/content/docs/reference/operators/read_auto.mdx
@@ -45,7 +45,7 @@ Valid values are:

 The maximum number of bytes to inspect before forcing a detection decision.

-Defaults to `1048576` bytes.
+Defaults to `1Mi` bytes.

 ## Examples
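
Reviewer sketch, not part of the patches: the series documents `fallback` and `max_probe_bytes` separately, so it may help to see them combined. The snippet below is a minimal illustration under stated assumptions. The file name `capture.dat` is hypothetical, the comma-separated argument form is inferred from the synopsis `read_auto [fallback=string, max_probe_bytes=uint]`, and `4Mi` is assumed to be accepted as a literal by analogy with the documented `1Mi` default.

```tql
// Hypothetical input file; the name is illustrative only.
from_file "capture.dat" {
  // Assumption: raising the probe limit above the documented 1Mi default
  // lets the text-versus-binary choice for fallback="all" see more bytes.
  read_auto fallback="all", max_probe_bytes=4Mi
}
```

If the payload is already known to be binary, using `read_all binary=true` directly avoids depending on the probe at all, which is the alternative the second patch recommends.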