12 changes: 12 additions & 0 deletions src/content/docs/reference/operators.mdx
@@ -515,6 +515,10 @@ operators:
    description: 'Parses an incoming bytes stream into a single event.'
    example: 'read_all binary=true'
    path: 'reference/operators/read_all'
  - name: 'read_auto'
    description: 'Detects the input format of a byte stream and selects a matching reader.'
    example: 'read_auto fallback="lines"'
    path: 'reference/operators/read_auto'
  - name: 'read_bitz'
    description: 'Parses bytes as *BITZ* format.'
    example: 'read_bitz'
@@ -2157,6 +2161,14 @@ read_all binary=true

</ReferenceCard>

<ReferenceCard title="read_auto" description="Detects the input format of a byte stream and selects a matching reader." href="/reference/operators/read_auto">

```tql
read_auto fallback="lines"
```

</ReferenceCard>

<ReferenceCard title="read_bitz" description="Parses bytes as *BITZ* format." href="/reference/operators/read_bitz">

```tql
116 changes: 116 additions & 0 deletions src/content/docs/reference/operators/read_auto.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
---
title: read_auto
category: Parsing
example: 'read_auto fallback="lines"'
---

Detects the input format of a byte stream and selects a matching reader.

```tql
read_auto [fallback=string, max_probe_bytes=uint]
```

## Description

The `read_auto` operator probes the first bytes of its input and starts the
reader whose detector returns the best unique match. Use it when the input
format is unknown at authoring time but is still expected to be one of Tenzir's
structured formats.

By default, detection is strict. If no detector matches, or if multiple
detectors match with the same score, `read_auto` emits an error instead of
falling back to a generic text reader.
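
The strict selection rule can be modeled in a few lines of Python. This is an
illustrative sketch only, not Tenzir's implementation: the `select_reader`
function, the score dictionary, the convention that a score of `0` means "no
match", and the reading that only a tie for the top score counts as ambiguous
are all assumptions made for the example.

```python
def select_reader(scores: dict[str, int]) -> str:
    """Return the reader with the single highest positive score.

    Hypothetical model: `scores` maps a reader name to a detector score,
    where 0 means the detector did not match.
    """
    # Keep only detectors that reported a match.
    matches = {name: score for name, score in scores.items() if score > 0}
    if not matches:
        raise ValueError("no detector matched")
    best = max(matches.values())
    winners = sorted(name for name, score in matches.items() if score == best)
    # Strict mode: a tie for the best score is an error, not a guess.
    if len(winners) > 1:
        raise ValueError(f"ambiguous detection: {', '.join(winners)}")
    return winners[0]
```

In this model, `select_reader({"read_ndjson": 90, "read_csv": 10})` returns
`"read_ndjson"`, while two readers that share the top score raise an error.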

The built-in detectors cover common JSON, delimited text, security log, and
magic-byte formats, including NDJSON, JSON objects, JSON arrays of objects, CSV,
TSV, SSV, key-value text, YAML, Syslog, CEF, LEEF, Zeek TSV, Suricata EVE JSON,
Zeek JSON, GELF, PCAP, Feather, BITZ, and Parquet.

### `fallback = string (optional)`

Controls what happens when no detector matches.

Valid values are:

- `"none"`: Emit an error. This is the default.
- `"lines"`: Use <Op>read_lines</Op>. The input must be valid UTF-8.
- `"all"`: Use <Op>read_all</Op>. `read_auto` uses the current probe to
  choose between text and binary output: valid UTF-8 probe bytes select
  `read_all`, while invalid probe bytes select `read_all binary=true`. If
  binary input can start with a valid UTF-8 prefix longer than
  `max_probe_bytes`, use a larger probe limit or <Op>read_all</Op> with
  `binary=true` directly.
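
The text-versus-binary choice for `fallback="all"` can be pictured with a
short Python sketch. It is a simplified model, not the operator's actual code:
`choose_read_all_mode` is a hypothetical helper, and treating a probe that ends
in the middle of a multi-byte character as invalid is an assumption of this
sketch.

```python
def choose_read_all_mode(probe: bytes) -> str:
    """Pick the read_all variant based on whether the probe is valid UTF-8."""
    try:
        probe.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8 in the probe: treat the input as binary.
        return "read_all binary=true"
    return "read_all"


# Binary data that happens to start with a valid UTF-8 prefix.
data = b"hello\xff"
print(choose_read_all_mode(data[:5]))  # probe covers only the text prefix
print(choose_read_all_mode(data))      # probe reaches the invalid byte
```

The two calls illustrate the caveat from the list: a probe that ends before the
first invalid byte selects the text variant, so such input needs a larger
probe limit or an explicit `read_all binary=true`.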

### `max_probe_bytes = uint (optional)`

The maximum number of bytes to inspect before `read_auto` must make a detection
decision.

Defaults to `1Mi` (1,048,576 bytes).

## Examples

### Detect JSON lines

Given this input:

```json title="events.ndjson"
{"x":1}
{"x":2}
```

Use `read_auto` where you would normally use a concrete reader:

```tql
from_file "events.ndjson" {
  read_auto
}
```

```tql
{x: 1}
{x: 2}
```

### Fall back to lines

For arbitrary UTF-8 text, opt into line-based parsing explicitly:

```txt title="messages.txt"
hello
world
```

```tql
from_file "messages.txt" {
  read_auto fallback="lines"
}
```

```tql
{line: "hello"}
{line: "world"}
```

### Fall back to a single event

Use `fallback="all"` when unknown input should become one event instead of one
event per line:

```tql
from_file "payload.bin" {
  read_auto fallback="all"
}
```

If the input is binary, the resulting event contains a `blob` value in the
`data` field.

## See Also

- <Op>read_all</Op>
- <Op>read_csv</Op>
- <Op>read_json</Op>
- <Op>read_lines</Op>
- <Op>read_ndjson</Op>
- <Op>read_syslog</Op>
- <Op>read_yaml</Op>