22 changes: 19 additions & 3 deletions docs/api-reference/configuration.md
@@ -172,11 +172,15 @@ RDMA transport settings:

### Flows

`rx.flows:` — Flow rules that steer packets to specific queues based on match criteria.
`rx.flows:` — Static startup flow rules that steer packets to specific queues based on
match criteria. This sequence may be omitted; a queues-only RX config can add DPDK RX
flows later with the dynamic flow API.

- **`name`**: Flow name.
- type: `string`
- **`id`**: Flow ID. Retrievable at runtime via `get_packet_flow_id()`.
- **`id`**: Non-zero static flow ID. Retrievable at runtime via `get_packet_flow_id()`.
Static IDs are reserved for the lifetime of the process and are not deletable through
the dynamic flow API.
- type: `integer`
- **`action`**: What to do with matched packets.
- **`type`**: Action type. Only `queue` is currently supported.
@@ -201,11 +205,23 @@ RDMA transport settings:
### Flow Isolation

`rx.flow_isolation:` — When `true`, only packets matching an explicit flow rule are delivered.
Unmatched packets are dropped. When `false`, unmatched packets go to a default queue.
Unmatched packets are dropped. When `false`, unmatched packets go to a default queue. A
queues-only config can set `flow_isolation: true` and then install dynamic RX flows after
`daqiri_init()`.

- type: `boolean`
- default: `false`

### Dynamic Flow Capacity

`rx.dynamic_flow_capacity:` — DPDK template-table capacity reserved for dynamic RX flow
rules on this interface. DAQIRI uses this when the DPDK template/async fast path is
available; legacy fallback paths still accept dynamic RX flow operations but do not use a
template table.

- type: `integer`
- default: `1024`

### Hardware Timestamps

`rx.hardware_timestamps:` — Enable per-packet hardware RX timestamps for Raw Ethernet
99 changes: 97 additions & 2 deletions docs/api-reference/cpp.md
@@ -77,7 +77,7 @@ For a single-segment configuration (CPU-only or batched GPU):
for (int i = 0; i < daqiri::get_num_packets(burst); i++) {
void *pkt = daqiri::get_packet_ptr(burst, i);
uint32_t len = daqiri::get_packet_length(burst, i);
uint16_t flow = daqiri::get_packet_flow_id(burst, i);
daqiri::FlowId flow = daqiri::get_packet_flow_id(burst, i);
uint64_t rx_ts_ns = 0;
if (daqiri::get_packet_rx_timestamp(burst, i, &rx_ts_ns) == daqiri::Status::SUCCESS) {
// rx_ts_ns is in the NIC timestamp clock domain.
@@ -130,6 +130,92 @@ daqiri::free_all_segment_packets(burst, seg);
daqiri::free_rx_burst(burst);
```

## Dynamic RX Flows

DPDK RX flows can be added and deleted after `daqiri_init()`. This supports
queues-only startup configs, including `rx.flow_isolation: true` with no
initial `rx.flows`. Static YAML flows still use explicit configured IDs and are
not deletable through this API.

```cpp
daqiri::FlowRuleConfig flow;
flow.name_ = "udp_5000";
flow.action_.type_ = daqiri::FlowType::QUEUE;
flow.action_.id_ = 0;  // target ID for the action (here, RX queue 0)
flow.match_.type_ = daqiri::FlowMatchType::IPV4_UDP;
flow.match_.udp_dst_ = 5000;

daqiri::FlowOpId add_op = 0;
auto st = daqiri::add_rx_flow_async(0, flow, &add_op);
if (st != daqiri::Status::SUCCESS) {
  // invalid port/queue/match, unsupported backend, or no flow IDs available.
  // Nothing was enqueued, so bail out here instead of falling through to
  // the poll loop below.
}

daqiri::FlowId flow_id = 0;
daqiri::FlowOpResult result;
while (flow_id == 0) {
  st = daqiri::poll_flow_op(&result);
  if (st == daqiri::Status::NOT_READY) {
    continue;  // no completion yet; production code may want a poll budget
  }
  if (st != daqiri::Status::SUCCESS) {
    // handle poll error
    break;
  }
  if (result.op_id_ == add_op) {
    if (result.status_ != daqiri::Status::SUCCESS) {
      // handle flow create failure
      break;
    }
    flow_id = result.flow_id_;
  }
}
```

Packets matching a dynamic rule are marked with the same `FlowId` returned by
the add completion, so `get_packet_flow_id()` gives the handle to pass to
`delete_flow_async()`. `poll_flow_op()` returns `Status::NOT_READY` when no flow
operation has completed yet.
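
The polling loop shown earlier spins until its completion arrives. A bounded variant might look like the sketch below. It is not DAQIRI code: the types in `namespace sketch` are stand-ins that mirror names used in this document (`Status`, `FlowOpId`, `FlowId`, `FlowOpResult`), their integer widths and the `GENERIC_FAILURE` enumerator are assumptions, and the poll function is passed in as a callable so the snippet compiles without the DAQIRI headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>

// Stand-in types (assumptions, not the real DAQIRI declarations).
namespace sketch {
enum class Status { SUCCESS, NOT_READY, GENERIC_FAILURE };
using FlowOpId = std::uint64_t;  // assumed width
using FlowId = std::uint32_t;    // assumed width

struct FlowOpResult {
  FlowOpId op_id_ = 0;
  Status status_ = Status::SUCCESS;
  FlowId flow_id_ = 0;
};

// Polls until the completion for `wanted` arrives, the poll itself fails,
// or `max_polls` attempts have been used. Returns the completion when it is
// found (the caller still has to check result.status_), otherwise
// std::nullopt. Completions that belong to other operations are passed to
// `on_other` rather than silently dropped.
inline std::optional<FlowOpResult> wait_for_op(
    const std::function<Status(FlowOpResult*)>& poll,
    FlowOpId wanted,
    std::size_t max_polls,
    const std::function<void(const FlowOpResult&)>& on_other = {}) {
  for (std::size_t i = 0; i < max_polls; ++i) {
    FlowOpResult result;
    const Status st = poll(&result);
    if (st == Status::NOT_READY) {
      continue;  // nothing completed yet, spend one unit of the budget
    }
    if (st != Status::SUCCESS) {
      return std::nullopt;  // poll error
    }
    if (result.op_id_ == wanted) {
      return result;
    }
    if (on_other) {
      on_other(result);
    }
  }
  return std::nullopt;  // budget exhausted
}
}  // namespace sketch
```

With the real API, the `poll` argument would presumably be a thin lambda around `daqiri::poll_flow_op()`. A poll budget (or a deadline) keeps a failed or lost completion from hanging the caller, and `on_other` matters once adds and deletes are in flight at the same time.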

Multiple RX flows can be added as one operation. This maps to a single DPDK
template queue push when the IPv4/UDP template path is available, and
`poll_flow_op()` returns one batch completion when all creates in the batch have
resolved.

```cpp
std::vector<daqiri::FlowRuleConfig> flows;
flows.push_back(flow);
flows.push_back(flow);
flows.back().name_ = "udp_5001";
flows.back().match_.udp_dst_ = 5001;

daqiri::FlowOpId batch_op = 0;
st = daqiri::add_rx_flows_async(0, flows, &batch_op);
if (st != daqiri::Status::SUCCESS) {
  // the batch was not enqueued; do not poll for batch_op
}

std::vector<daqiri::FlowId> flow_ids;
while (flow_ids.empty()) {
  st = daqiri::poll_flow_op(&result);
  if (st == daqiri::Status::NOT_READY) {
    continue;
  }
  if (st != daqiri::Status::SUCCESS) {
    break;  // poll error; avoid spinning forever
  }
  if (result.op_id_ == batch_op) {
    // check result.status_ before trusting every entry
    flow_ids = result.flow_ids_;
    break;
  }
}
```

For batch completions, `flow_ids_` is in the same order as the input rules. If
the completion status is not `SUCCESS`, nonzero entries were installed and zero
entries were not installed.
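
One way to act on a partially failed batch is to pair each input rule with its entry in `flow_ids_`. The helper below is a sketch under assumptions: `FlowId` is modeled as a 32-bit integer, rules are represented only by their names, and a completion that carries fewer IDs than rules is treated as "not installed" for the missing tail (the text above does not say whether that can happen).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

namespace sketch {
using FlowId = std::uint32_t;  // assumed width, stand-in for daqiri::FlowId

struct BatchOutcome {
  std::vector<std::pair<std::string, FlowId>> installed;  // name -> FlowId
  std::vector<std::string> failed;                        // names to retry
};

// names[i] is the name of the i-th submitted rule; flow_ids[i] is the i-th
// entry of the batch completion. A zero entry means the rule was not
// installed, following the convention described above.
inline BatchOutcome split_batch(const std::vector<std::string>& names,
                                const std::vector<FlowId>& flow_ids) {
  BatchOutcome out;
  for (std::size_t i = 0; i < names.size(); ++i) {
    const FlowId id = i < flow_ids.size() ? flow_ids[i] : 0;
    if (id != 0) {
      out.installed.emplace_back(names[i], id);
    } else {
      out.failed.push_back(names[i]);
    }
  }
  return out;
}
}  // namespace sketch
```

The `installed` list holds the IDs that later need `delete_flow_async()`, while `failed` is what an application might resubmit or report.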

```cpp
daqiri::FlowOpId delete_op = 0;
auto delete_status = daqiri::delete_flow_async(flow_id, &delete_op);
```

Dynamic flow support is RX-only in v1. Socket, RDMA, and software loopback
managers return `NOT_SUPPORTED`.

## Reordered RX Bursts

For an overview of what RX reorder is and when to use it, see
@@ -411,9 +497,18 @@ workflow sections above show the common call order and ownership rules.
| `get_segment_packet_ptr(burst, seg, idx)` | Return a packet pointer for a specific segment. |
| `get_packet_length(burst, idx)` | Return the logical packet length. |
| `get_segment_packet_length(burst, seg, idx)` | Return the length of one packet segment. |
| `get_packet_flow_id(burst, idx)` | Return the matched flow ID, or `0` when no flow matched. |
| `get_packet_flow_id(burst, idx)` | Return the matched `FlowId`, or `0` when no flow matched. |
| `get_packet_rx_timestamp(burst, idx, &timestamp_ns)` | Return the hardware RX timestamp when enabled and available. |

### Dynamic RX Flow Lifecycle

| Function | Purpose |
| --- | --- |
| `add_rx_flow_async(port, flow, &op_id)` | Enqueue a dynamic RX flow create. The add completion returns the allocated `FlowId`. |
| `add_rx_flows_async(port, flows, &op_id)` | Enqueue a dynamic RX flow batch create. One completion returns allocated `FlowId`s in input order. |
| `delete_flow_async(flow_id, &op_id)` | Enqueue deletion of an active dynamic flow. Passing a static YAML flow ID or an unknown ID returns `INVALID_PARAMETER`. |
| `poll_flow_op(&result)` | Return one completed flow operation, or `NOT_READY` when none are ready. |
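
Because `poll_flow_op()` hands back one completion at a time, for whichever operation finished, an application with several adds and deletes in flight needs to map each `op_id` back to the request that produced it. A minimal bookkeeping sketch follows. The class and its method names are hypothetical, and `FlowOpId` is assumed to be a 64-bit integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

namespace sketch {
using FlowOpId = std::uint64_t;  // assumed width, stand-in for daqiri::FlowOpId

// Remembers a label for every enqueued operation until its completion is
// seen. The label could be replaced by any per-request context.
class PendingOps {
 public:
  // Call right after a successful *_async enqueue.
  void track(FlowOpId op, std::string label) {
    pending_[op] = std::move(label);
  }

  // Call with the op ID of each polled completion. Returns the label that
  // was tracked for it, or std::nullopt for an unknown or already
  // completed operation.
  std::optional<std::string> complete(FlowOpId op) {
    auto it = pending_.find(op);
    if (it == pending_.end()) {
      return std::nullopt;
    }
    std::string label = std::move(it->second);
    pending_.erase(it);
    return label;
  }

  std::size_t in_flight() const { return pending_.size(); }

 private:
  std::unordered_map<FlowOpId, std::string> pending_;
};
}  // namespace sketch
```

An `in_flight()` count of zero is a simple signal that all enqueued flow operations have been accounted for, for example before shutdown.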

### RX and Reorder

| Function | Purpose |
9 changes: 8 additions & 1 deletion docs/api-reference/python.md
@@ -573,6 +573,10 @@ The workflow sections above show the common call order and ownership rules.
| `drop_all_traffic(port)` | Install a high-priority drop rule on a port. |
| `allow_all_traffic(port)` | Remove a drop rule installed by `drop_all_traffic`. |
| `flush_port_queue(port, queue)` | Drain stale packets from a port queue. |
| `add_rx_flow_async(port, flow)` | Return `(Status, op_id)` after enqueueing one dynamic RX flow create. |
| `add_rx_flows_async(port, flows)` | Return `(Status, op_id)` after enqueueing a dynamic RX flow batch create. One completion returns `flow_ids` in input order. |
| `delete_flow_async(flow_id)` | Return `(Status, op_id)` after enqueueing deletion of one dynamic flow. |
| `poll_flow_op()` | Return `(Status, FlowOpResult)`, or `NOT_READY` when no dynamic flow operation has completed. |
| `socket_connect_to_server(server_addr, server_port[, src_addr])` | Return `(Status, conn_id)`. |
| `socket_get_port_queue(conn_id)` | Return `(Status, port, queue)`. |
| `socket_get_server_conn_id(server_addr, server_port)` | Return `(Status, conn_id)`. |
@@ -614,7 +618,8 @@ The workflow sections above show the common call order and ownership rules.
| `RDMATransportMode` | `RC`, `UC`, `UD`, `INVALID` |
| `SocketMode` | `CLIENT`, `SERVER`, `INVALID` |
| `FlowType` | `QUEUE` |
| `FlowMatchType` | `NORMAL`, `FLEX_ITEM` |
| `FlowMatchType` | `IPV4_UDP`, `FLEX_ITEM` |
| `FlowOpType` | `ADD_RX`, `ADD_RX_BATCH`, `DELETE` |
| `ReorderMethod` | `INVALID`, `SEQ_BATCH_NUMBER`, `SEQ_PACKETS_PER_BATCH` |
| `ReorderDataType` | `SAME`, `INT4`, `INT8`, `INT16`, `INT32`, `FP16`, `BF16`, `FP32`, `FP64`, `INVALID` |
| `ReorderEndianness` | `HOST`, `NETWORK`, `INVALID` |
@@ -647,6 +652,8 @@ names that mostly omit the trailing underscore from the C++ member name (e.g.
| `FlowAction` | Flow action type and target ID. |
| `FlowMatch` | Flow match fields for UDP, IPv4, and flex item matching. |
| `FlowConfig` | Named flow rule combining action and match. |
| `FlowRuleConfig` | Dynamic flow rule match and action. |
| `FlowOpResult` | Dynamic flow operation completion. Batch adds return `flow_ids` in input order. |
| `FlexItemConfig` | Flexible parser item configuration. |
| `FlexItemMatch` | Flexible parser match value and mask. |
| `SocketConfig` | Socket client/server endpoint and timing settings. |
11 changes: 10 additions & 1 deletion docs/concepts.md
@@ -222,7 +222,16 @@ Flow rules are only available in Raw Ethernet (`stream_type: "raw"`).
A flow's match can combine fields such as `udp_src`, `udp_dst`, and
`ipv4_len`; multiple flows can target the same queue, and the matching
flow's ID is available at runtime so the application can distinguish
them. Flows are configured under `rx.flows` in the YAML.
them.

Flows can be static or dynamic. Static flows are configured under
`rx.flows` in the YAML and keep their configured IDs for the process lifetime.
Dynamic RX flows are added after `daqiri_init()` with `add_rx_flow_async()` or
`add_rx_flows_async()`; their non-zero `FlowId`s are allocated by DAQIRI,
returned in the add completion, and used as the packet marks returned by
`get_packet_flow_id()`. Batch adds complete with a single operation result whose
flow IDs are in input order. Only dynamic flows can be deleted dynamically. TX
dynamic flows are not part of v1.
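
Since the flow ID is what lets the application tell flows apart, a common pattern is a handler table keyed by `FlowId`. The sketch below is illustrative only: `FlowDispatcher` is a hypothetical class, `FlowId` is assumed to be a 32-bit integer, and ID `0` is treated as "no flow matched", as the API reference describes for `get_packet_flow_id()`.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

namespace sketch {
using FlowId = std::uint32_t;  // assumed width, stand-in for daqiri::FlowId

class FlowDispatcher {
 public:
  using Handler = std::function<void(const void* pkt, std::uint32_t len)>;

  // Registers a handler for a static or dynamic flow ID. Rejects the
  // reserved ID 0, empty handlers, and IDs that already have a handler.
  bool add(FlowId id, Handler handler) {
    if (id == 0 || !handler) {
      return false;
    }
    return handlers_.emplace(id, std::move(handler)).second;
  }

  // Drops the handler, for example after a dynamic flow delete completes.
  bool remove(FlowId id) { return handlers_.erase(id) > 0; }

  // Runs the handler for `id`. Returns false when the packet matched no
  // flow (id == 0) or no handler is registered for the ID.
  bool dispatch(FlowId id, const void* pkt, std::uint32_t len) const {
    if (id == 0) {
      return false;
    }
    const auto it = handlers_.find(id);
    if (it == handlers_.end()) {
      return false;
    }
    it->second(pkt, len);
    return true;
  }

 private:
  std::unordered_map<FlowId, Handler> handlers_;
};
}  // namespace sketch
```

In an RX loop, the `id` argument would come from `get_packet_flow_id()`. Handlers for dynamic flows would be registered when the add completion reports the allocated ID and removed when the flow is deleted.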

### Flow Steering

5 changes: 3 additions & 2 deletions docs/tutorials/configuration-walkthrough.md
@@ -96,8 +96,9 @@ For a shorter backend-selection guide, start with the [Benchmarking overview](..
??? question "4. I need flow-based load balancing across multiple RX queues"
- **Closed-loop TX+RX with four queues** — [`daqiri_bench_raw_tx_rx_4q.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_tx_rx_4q.yaml) (runs on `daqiri_bench_raw_gpudirect`).
- [`daqiri_bench_raw_rx_multi_q.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_bench_raw_rx_multi_q.yaml) (runs on `daqiri_bench_raw_gpudirect`).
- **Dynamic RX flow lifecycle** — [`daqiri_example_dynamic_rx_flow.yaml`](https://github.com/nvidia/daqiri/blob/main/examples/daqiri_example_dynamic_rx_flow.yaml) (runs on `daqiri_example_dynamic_rx_flow`). Starts with `flow_isolation: true` and no configured flows, then dynamically routes one UDP flow to RX queue 0 and queue 1 in sequence.

The four-queue TX+RX config is self-contained and maps each `bench_tx`/`bench_rx` list entry to the matching DAQIRI queue. The RX-only config is for an external traffic source. Both demonstrate flow-rule-based routing across multiple RX queues, each pinned to its own CPU core.
The four-queue TX+RX config is self-contained and maps each `bench_tx`/`bench_rx` list entry to the matching DAQIRI queue. The RX-only config is for an external traffic source. The dynamic-flow example demonstrates queues-only startup and runtime flow insertion/deletion. All three demonstrate flow-rule-based routing across multiple RX queues, each pinned to its own CPU core.

*Requires: Raw Ethernet build (`DAQIRI_MGR` includes `dpdk`) + NVIDIA ConnectX-class NIC. The RX-only config also requires a separate TX traffic source.*

@@ -368,7 +369,7 @@ flows:
udp_dst: 5000
```

1. **`id`** · `integer` · *required* — Flow tag attached to matching packets. Set to a non-zero value here so the `reorder_configs:` block below can reference it via `flow_ids:` to select which packets to reorder.
1. **`id`** · `integer` · *required* — Static flow tag attached to matching packets. Set to a non-zero value here so the `reorder_configs:` block below can reference it via `flow_ids:` to select which packets to reorder. Dynamic RX flows are added after initialization and are not attached to reorder configs in v1.

**The `reorder_configs:` block.** The core of the feature — sits inside the `rx:` section alongside `queues` and `flows`.

3 changes: 3 additions & 0 deletions examples/CMakeLists.txt
@@ -34,6 +34,7 @@ set(DAQIRI_BENCH_CONFIGS
daqiri_bench_raw_rx_reorder_seq_batch.yaml
daqiri_bench_raw_rx_multi_q.yaml
daqiri_bench_raw_sw_loopback.yaml
daqiri_example_dynamic_rx_flow.yaml
daqiri_example_gds_write_sw_loopback.yaml
daqiri_example_gds_write_tx_rx.yaml
daqiri_example_pcap_writer_sw_loopback.yaml
@@ -95,6 +96,7 @@ if(DAQIRI_ENABLE_OTEL_METRICS)
endif()
add_daqiri_raw_bench(daqiri_bench_raw_reorder_seq raw_reorder_seq_bench.cpp)
add_daqiri_raw_bench(daqiri_bench_raw_reorder_quantize raw_reorder_quantize_bench.cpp)
add_daqiri_raw_bench(daqiri_example_dynamic_rx_flow dynamic_rx_flow_example.cpp)
add_daqiri_raw_bench(daqiri_example_gds_write gds_write_example.cpp)
add_daqiri_raw_bench(daqiri_example_pcap_writer pcap_writer_example.cpp)

@@ -125,6 +127,7 @@ install(TARGETS
daqiri_bench_raw_gpudirect
daqiri_bench_raw_reorder_seq
daqiri_bench_raw_reorder_quantize
daqiri_example_dynamic_rx_flow
daqiri_example_gds_write
daqiri_example_pcap_writer
daqiri_bench_rdma
5 changes: 5 additions & 0 deletions examples/README.md
@@ -7,6 +7,9 @@ Standalone benchmark applications for testing performance of DAQIRI with various
- `daqiri_bench_raw_reorder_seq`: raw RX sequence-number reorder benchmark
- `daqiri_bench_raw_reorder_quantize`: raw RX sequence reorder with payload conversion
- `daqiri_example_pcap_writer`: RX pcap writer with optional GPUDirect demo TX traffic
- `daqiri_example_dynamic_rx_flow`: raw TX/RX example that starts with RX flow
isolation and no configured flows, then dynamically steers one UDP flow to
queues 0 and 1 in sequence
- `daqiri_bench_rdma`: RDMA benchmark logic (former `rdma_bench.h`)
- `daqiri_bench_socket`: TCP/UDP socket benchmark logic
- `daqiri_example_gds_write`: one-shot capture that demonstrates synchronous and
@@ -50,6 +53,7 @@ Run:
./build/examples/daqiri_bench_raw_reorder_seq ./build/examples/daqiri_bench_raw_tx_rx_reorder_seq_1024.yaml --seconds 10
./build/examples/daqiri_bench_raw_reorder_quantize ./build/examples/daqiri_bench_raw_tx_rx_reorder_quantize_seq_batch.yaml --seconds 10
./build/examples/daqiri_example_pcap_writer ./build/examples/daqiri_example_pcap_writer_sw_loopback.yaml /tmp/daqiri-capture.pcap --tx
./build/examples/daqiri_example_dynamic_rx_flow ./build/examples/daqiri_example_dynamic_rx_flow.yaml --target-gbps 10
./build/examples/daqiri_bench_rdma ./build/examples/daqiri_bench_rdma_tx_rx.yaml --seconds 10 --mode both
./build/examples/daqiri_bench_socket ./build/examples/daqiri_bench_socket_udp_tx_rx.yaml --seconds 10 --mode both
./build/examples/daqiri_bench_socket ./build/examples/daqiri_bench_socket_tcp_tx_rx.yaml --seconds 10 --mode both
@@ -72,6 +76,7 @@ Included configs:
| `daqiri_bench_raw_tx_rx.yaml` | `daqiri_bench_raw_gpudirect` |
| `daqiri_bench_raw_tx_rx_4q.yaml` | `daqiri_bench_raw_gpudirect` |
| `daqiri_bench_raw_sw_loopback.yaml` | `daqiri_bench_raw_gpudirect` |
| `daqiri_example_dynamic_rx_flow.yaml` | `daqiri_example_dynamic_rx_flow` |
| `daqiri_example_gds_write_sw_loopback.yaml` | `daqiri_example_gds_write` |
| `daqiri_example_gds_write_tx_rx.yaml` | `daqiri_example_gds_write` |
| `daqiri_bench_raw_rx_multi_q.yaml` | `daqiri_bench_raw_gpudirect` |
77 changes: 77 additions & 0 deletions examples/daqiri_example_dynamic_rx_flow.yaml
@@ -0,0 +1,77 @@
%YAML 1.2
---
daqiri:
  cfg:
    version: 1
    stream_type: "raw"
    master_core: 3
    debug: false
    log_level: "info"
    loopback: ""

    memory_regions:
    - name: "Data_TX_GPU"
      kind: "device"
      affinity: 0
      num_bufs: 51200
      buf_size: 1064
    - name: "Data_RX_GPU_0"
      kind: "device"
      affinity: 0
      num_bufs: 51200
      buf_size: 1064
    - name: "Data_RX_GPU_1"
      kind: "device"
      affinity: 0
      num_bufs: 51200
      buf_size: 1064

    interfaces:
    - name: "tx_port"
      address: <0000:00:00.0>
      tx:
        queues:
        - name: "tx_q_0"
          id: 0
          batch_size: 1024
          cpu_core: 4
          memory_regions:
            - "Data_TX_GPU"
          offloads:
            - "tx_eth_src"
    - name: "rx_port"
      address: <0000:00:00.0>
      rx:
        flow_isolation: true
        dynamic_flow_capacity: 1024
        queues:
        - name: "rx_q_0"
          id: 0
          cpu_core: 8
          batch_size: 1024
          memory_regions:
            - "Data_RX_GPU_0"
        - name: "rx_q_1"
          id: 1
          cpu_core: 9
          batch_size: 1024
          memory_regions:
            - "Data_RX_GPU_1"

bench_rx:
  - interface_name: "rx_port"
    queue_id: 0
  - interface_name: "rx_port"
    queue_id: 1

bench_tx:
  - interface_name: "tx_port"
    queue_id: 0
    batch_size: 1024
    payload_size: 1000
    header_size: 64
    eth_dst_addr: <00:00:00:00:00:00>
    ip_src_addr: <1.2.3.4>
    ip_dst_addr: <5.6.7.8>
    udp_src_port: 4096
    udp_dst_port: 4096