AXI-Stream compressed input → AXI-Stream RGBA pixel output, fully pipelined.
| Signal | Width | Direction | Description |
|---|---|---|---|
| `clk` | 1 | in | Clock |
| `rst` | 1 | in | Synchronous reset (active high) |
| `s_axis_tdata` | 40 | in | Packed input bytes (up to 5 per beat) |
| `s_axis_tkeep` | 5 | in | Byte-enable, MSB-aligned |
| `s_axis_tvalid` | 1 | in | |
| `s_axis_tready` | 1 | out | |
| `s_axis_tlast` | 1 | in | Asserted on last beat of compressed frame |
| `m_axis_tdata` | 32 | out | RGBA pixel (R at [31:24]) |
| `m_axis_tvalid` | 1 | out | |
| `m_axis_tready` | 1 | in | |
| `m_axis_tlast` | 1 | out | Asserted on last pixel of frame |
| `m_cfg_width` | 16 | out | Frame width in pixels (from QOI header) |
| `m_cfg_height` | 16 | out | Frame height in pixels (from QOI header) |
| `m_cfg_channels` | 8 | out | Channel count from QOI header |
| `m_cfg_colorspace` | 8 | out | Colorspace field from QOI header |
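
As an illustration of the input interface, here is a minimal Python sketch of how a testbench might split a compressed byte stream into 40-bit beats. It is not taken from this project. It assumes that "MSB-aligned" means valid bytes fill the upper lanes first, and that the first byte of the stream sits in bits [39:32]; the table above does not state the byte order, so check the RTL before relying on it. The helper name `pack_beats` is hypothetical.

```python
def pack_beats(data: bytes, lanes: int = 5):
    """Split a byte string into (tdata, tkeep, tlast) tuples, one per beat.

    Assumption (not confirmed by the source): the first stream byte goes in
    the most significant lane, and tkeep bit (lanes - 1) covers that lane.
    """
    beats = []
    for off in range(0, len(data), lanes):
        chunk = data[off:off + lanes]
        n = len(chunk)
        # Zero-pad the unused low lanes so valid bytes stay in the upper lanes.
        tdata = int.from_bytes(chunk + bytes(lanes - n), "big")
        # n ones in the top bits of a lanes-bit field, e.g. n=2 gives 0b11000.
        tkeep = ((1 << n) - 1) << (lanes - n)
        # tlast marks the final beat of the compressed frame.
        tlast = off + lanes >= len(data)
        beats.append((tdata, tkeep, tlast))
    return beats
```

For a 7-byte input this yields one full beat with `tkeep = 0b11111` followed by a final beat with `tkeep = 0b11000` and `tlast` set.
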
| Parameter | Default | Description |
|---|---|---|
| `CHANNELS` | 4 | 3 = RGB (alpha forced to 255), 4 = RGBA |
| `FIFO_DEPTH` | 64 | Internal byte FIFO depth |
Claude Code and Codex were used in this project. I designed the architecture, test plan, formal assertions, and synthesis strategy; AI was used for scripting tasks, cocotb tests, Makefiles, and documentation.
The QOI repository includes a reference implementation. This project tests against it, which lends additional confidence that the decoder is correct.
Additionally, I want to acknowledge Alex Forencich's TAXI project, which provides the underlying AXI infrastructure and excellent examples of modern testbenches.
The decoder is a 3-stage pipeline:
- Byte unpacker (`qoi_unpacker`): strips the 14-byte header, buffers the byte stream, and emits 5-byte aligned beats into an internal FIFO. Parses width, height, channels, and colorspace into sideband outputs.
- Chunk decoder (`qoi_chunk_dec`): reads from the FIFO and identifies chunk boundaries (RUN, INDEX, DIFF, LUMA, RGB, RGBA). Emits a chunk descriptor (type, data, run length) per chunk.
- Pixel reconstructor (`qoi_pixel_rec`): maintains `prev_pixel` and a 64-entry distributed-RAM LUT. Expands RUN repeats, applies DIFF/LUMA arithmetic, looks up INDEX chunks. LUT writes are 2-cycle pipelined: cycle 1 captures the emitted pixel, cycle 2 computes the hash from the registered value.
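
The behavior of the three stages can be sketched as a small software model. The Python below is an illustrative sketch written from the public QOI specification, not code from this repository, and it does not model the hardware timing (FIFO, beat alignment, or the 2-cycle LUT write). The function names `qoi_hash`, `parse_header`, and `decode` are hypothetical. Note that the QOI header stores width and height as 32-bit big-endian values, while this sketch returns them untruncated.

```python
import struct

QOI_MAGIC = b"qoif"

def qoi_hash(px):
    """Index position for an (r, g, b, a) pixel, per the QOI spec."""
    r, g, b, a = px
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def parse_header(data):
    """Parse the 14-byte header: magic, width, height, channels, colorspace."""
    magic, width, height, channels, colorspace = struct.unpack(">4sIIBB", data[:14])
    if magic != QOI_MAGIC:
        raise ValueError("not a QOI stream")
    return width, height, channels, colorspace

def decode(data):
    """Return (header, pixels), where pixels is a list of (r, g, b, a) tuples."""
    header = parse_header(data)
    total = header[0] * header[1]
    index = [(0, 0, 0, 0)] * 64          # 64-entry table, zero-initialized
    prev = (0, 0, 0, 255)                # initial previous pixel per the spec
    pixels = []
    pos = 14                             # first chunk follows the header
    while len(pixels) < total:
        b1 = data[pos]
        pos += 1
        run = 1
        tag = b1 >> 6
        if b1 == 0xFE:                   # RGB: alpha carried over
            prev = (data[pos], data[pos + 1], data[pos + 2], prev[3])
            pos += 3
        elif b1 == 0xFF:                 # RGBA: full pixel
            prev = tuple(data[pos:pos + 4])
            pos += 4
        elif tag == 0b00:                # INDEX: table lookup
            prev = index[b1 & 0x3F]
        elif tag == 0b01:                # DIFF: 2-bit deltas, bias 2
            dr = ((b1 >> 4) & 3) - 2
            dg = ((b1 >> 2) & 3) - 2
            db = (b1 & 3) - 2
            prev = ((prev[0] + dr) & 0xFF, (prev[1] + dg) & 0xFF,
                    (prev[2] + db) & 0xFF, prev[3])
        elif tag == 0b10:                # LUMA: green delta plus offsets
            b2 = data[pos]
            pos += 1
            dg = (b1 & 0x3F) - 32
            dr = dg + ((b2 >> 4) & 0xF) - 8
            db = dg + (b2 & 0xF) - 8
            prev = ((prev[0] + dr) & 0xFF, (prev[1] + dg) & 0xFF,
                    (prev[2] + db) & 0xFF, prev[3])
        else:                            # RUN: repeat prev, length bias 1
            run = (b1 & 0x3F) + 1
        index[qoi_hash(prev)] = prev     # table update after every chunk
        pixels.extend([prev] * run)
    return header, pixels[:total]
```

A model like this is useful as a golden reference in a testbench: feed the same byte stream to the model and to the DUT, then compare the pixel sequences.
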
Target: Artix-7 xc7a35t-CPG236-1, out-of-context
| Resource | Used | Available | Util% |
|---|---|---|---|
| Slice LUTs | 1051 | 20,800 | 5.1% |
| — LUT as Logic | 1039 | 20,800 | 5.0% |
| — LUT as Distributed RAM | 12 | 9,600 | 0.1% |
| Slice Registers (FF) | 2450 | 41,600 | 5.9% |
| RAMB18 | 0 | 100 | 0% |
| DSP48 | 1 | 90 | 1.1% |
| F7 Muxes | 257 | 16,300 | 1.6% |
| F8 Muxes | 128 | 8,150 | 1.6% |
| Parameter | Value |
|---|---|
| Target clock | 120 MHz (8.333 ns) |
| WNS (worst negative slack) | +1.153 ns |
| Timing constraints met | Yes |
| Failing setup endpoints | 0 / 7020 |
| Failing hold endpoints | 0 / 7020 |
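
The positive slack implies some headroom above the 120 MHz target. As a rough, hedged estimate (not a reported result), the minimum achievable period can be approximated as the constrained period minus the setup WNS. The Python sketch below does that arithmetic; the helper name `fmax_mhz` is hypothetical, and a real figure would require re-running implementation with a tighter constraint.

```python
def fmax_mhz(period_ns: float, wns_ns: float) -> float:
    """Estimate the maximum clock frequency in MHz from period and setup WNS."""
    min_period_ns = period_ns - wns_ns   # shortest period that still meets setup
    return 1000.0 / min_period_ns        # 1000 / ns gives MHz

# Values from the timing table above: 8.333 ns period, +1.153 ns WNS.
print(round(fmax_mhz(8.333, 1.153), 1))  # 139.3
```
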
| Power (W) | |
|---|---|
| Total on-chip | 0.084 |
| Dynamic | 0.016 |
| — Clocks | 0.007 |
| — Logic + signals | 0.008 |
| Static | 0.068 |
MIT License. See LICENSE for the full text.
The QOI format specification and reference implementation are by Dominic Szablewski (@phoboslab), also MIT licensed.