Deflux

Streaming XML-from-ZIP engine with serializable checkpoints for .NET 8+.
Read XLSX, ODS, and other ZIP-based XML formats with pause/resume across process restarts — no re-decompression, no re-parsing. Pure C#, zero native dependencies.

Read(0 -> P) + Save(P) + [restart] + Restore(P) + Read(P -> end) === Read(0 -> end)

Install

Deflux.Core     — low-level engine (ZIP + DEFLATE + XML + checkpoints)
Deflux.Xlsx     — XLSX streaming reader
Deflux.Ods      — ODS streaming reader

All APIs accept a seekable Stream. Nothing takes file paths — you control the stream lifetime.

Examples

XLSX — streaming rows with checkpoint

using Deflux.Xlsx;
using Deflux.Core;

using var stream = File.OpenRead("report.xlsx");
using var reader = new XlsxReader(stream);

Console.WriteLine(string.Join(", ", reader.GetSheetNames()));

reader.OpenSheet(0);

byte[]? checkpoint = null;
int n = 0;

while (reader.ReadRow(out Row row))
{
    foreach (var cell in row.Cells.Span)
        Console.Write($"[{cell.ColumnIndex}:{cell.Type}] {cell.Value}  ");
    Console.WriteLine();

    // Save checkpoint every 10k rows
    if (++n % 10_000 == 0)
        checkpoint = reader.SaveCheckpoint();
}

// Later (even in a different process):
if (checkpoint != null)
{
    using var stream2 = File.OpenRead("report.xlsx");
    using var reader2 = new XlsxReader(stream2);
    reader2.RestoreCheckpoint(checkpoint);

    while (reader2.ReadRow(out Row row))
        ProcessRow(row); // continues exactly where we left off
}

ODS — scan sheets, instant open via checkpoint

using Deflux.Ods;
using Deflux.Core;

using var stream = File.OpenRead("data.ods");
using var reader = new OdsReader(stream);

// One pass through content.xml — finds all sheets and caches a
// checkpoint (~45 KB) at the start of each one.
IReadOnlyList<SheetInfo> sheets = reader.ScanSheets();

foreach (var s in sheets)
    Console.WriteLine($"{s.Name}  (checkpoint {s.Checkpoint.Length} bytes)");

// Open any sheet instantly — checkpoint restore, no re-decompression.
reader.OpenSheet("Summary");

while (reader.ReadRow(out Row row))
{
    foreach (var cell in row.Cells.Span)
        Console.Write($"{cell.Value}\t");
    Console.WriteLine();
}

Low-level XML — read any entry from any ZIP

using Deflux.Core.Reader;
using Deflux.Core;

// Works with EPUB, DOCX, KMZ, 3MF — anything that is ZIP + XML.
using var stream = File.OpenRead("book.epub");
using var reader = new CheckpointableXmlReader(stream, "OEBPS/chapter3.xhtml");

if (reader.SkipTo("body"))
{
    foreach (var _ in reader.ReadElements("p"))
    {
        reader.Read(); // Text node
        Console.WriteLine(reader.Value);

        if (NeedPause())
        {
            byte[] cp = reader.SaveCheckpoint();
            File.WriteAllBytes("progress.bin", cp);
            break;
        }
    }
}

How it works

Stream
  -> ZipNavigator        parse Central Directory, open entry as SubStream
  -> CheckpointableInflater   forked SharpZipLib DEFLATE with save/restore
  -> MinimalXmlParser    feed-model, chunk-boundary safe, namespace-aware
  -> CheckpointableXmlReader  unified public API
  -> XlsxReader / OdsReader   format modules

Checkpoint (~45 KB) captures the full vertical state:

DEFLATE: 32 KB sliding window + Huffman trees + bit buffer + stream position
XML parser: element stack, namespace bindings, parse state
Pending bytes between inflater and parser

Restore seeks the compressed stream to the saved position, rebuilds all state, and the next Read() continues transparently.

Checkpoint binary format

┌─────────┬─────────────┬──────────────┬──────────────────────────────┐
│ version │ totalLength │ payloadCRC32 │           payload            │
│  1 byte │   4 bytes   │   4 bytes    │         ~40-45 KB            │
└─────────┴─────────────┴──────────────┴──────────────────────────────┘

payload:
  ┌─ Entry ID ──────────────────────────────────────────────────────┐
  │  entryName (UTF-8, length-prefixed)                             │
  │  entryCRC32 · compressedSize · adjustedCompressedOffset         │
  ├─ DEFLATE state (section with length prefix) ────────────────────┤
  │  mode · neededBits · repLength · repDist · uncomprLen           │
  │  isLastBlock · totalOut · totalIn                               │
  │  OutputWindow: byte[32768] + windowEnd + windowFilled           │
  │  StreamManipulator: window + start/end + bitBuffer + bitsCount  │
  │  Huffman trees: litlen[] + dist[] (null if static/inactive)     │
  │  DynHeader state (if mid-dynamic block)                         │
  │  Adler32 checksum · adjustedOffset · totalBytesFeeded           │
  ├─ Pending decompressed bytes (0-8 KB) ───────────────────────────┤
  ├─ XML parser state (section with length prefix) ─────────────────┤
  │  parseState · depth · elementStack[] · namespaceBindings[]      │
  │  pendingText · lineNumber · columnNumber · incompleteUTF8       │
  └─────────────────────────────────────────────────────────────────┘

Three-level verification on restore:
  1. version ≠ expected  →  CheckpointVersionException
  2. CRC-32 mismatch    →  CheckpointMismatchException (corrupted)
  3. ZIP entry CRC/size  →  CheckpointMismatchException (file modified)

All values little-endian. BinaryWriter/BinaryReader, no external dependencies.

Performance

Benchmark on a 70 MB ODS file (50 sheets, 1.25M rows, 6.7M cells).
Each library ran in a separate process for clean memory measurement (PeakWorkingSet64).

Library	Read all rows	Need only sheet 50	Peak memory
Deflux	16.3 s	7.8 s	83 MB
OdsReaderWriter	29.7 s	29.7 s (must load all)	7,037 MB
Aspose.Cells	22.7 s	22.5 s (must load all)	2,019 MB

DOM libraries (OdsReaderWriter, Aspose) must load the entire file into memory even if you only need one sheet. Deflux streams — ScanSheets() does one pass (~8s), then any OpenSheet() restores from a cached checkpoint in ~0.2s.

Deflux internals
Checkpoint size	~45 KB
Checkpoint save	< 5 ms
Checkpoint restore	< 50 ms
Steady-state memory	< 1 MB

Limitations

By design:

UTF-8 only — this covers 99%+ of real-world XLSX/ODS/EPUB/DOCX files. Supporting legacy encodings would add complexity with near-zero practical benefit.
Sync-only — the checkpoint model is inherently sequential (save state at point P, resume from P). Async would add overhead to every Read() call with no throughput gain, since the bottleneck is DEFLATE decompression which is CPU-bound.
Single instance = single thread — each reader instance is not thread-safe. However, multiple reader instances can safely work on the same file concurrently (each opens its own stream).

Currently not supported:

No ZIP encryption support
No async overloads

XLSX-specific:

SharedStringsTable is loaded fully into memory
No date/style detection — Excel stores dates as serial floats; parsing them requires styles.xml, which is not yet implemented. Raw float values are returned as-is.

License

MIT

DEFLATE decompressor forked from SharpZipLib (MIT).

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github/workflows		.github/workflows
src		src
tests		tests
.gitignore		.gitignore
Deflux.slnx		Deflux.slnx
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deflux

Install

Examples

XLSX — streaming rows with checkpoint

ODS — scan sheets, instant open via checkpoint

Low-level XML — read any entry from any ZIP

How it works

Checkpoint binary format

Performance

Limitations

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Deflux

Install

Examples

XLSX — streaming rows with checkpoint

ODS — scan sheets, instant open via checkpoint

Low-level XML — read any entry from any ZIP

How it works

Checkpoint binary format

Performance

Limitations

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages