s3-cache

Transparent S3 proxy for the webtor.io platform. Sits between vault and S3-compatible object storage, signing requests itself (SigV4), fan-out-fetching large range requests across parallel HTTP/1.1 connections, and caching aligned 4 MiB chunks on local NVMe so per-source-IP upstream rate limits stop dominating cold traffic.

Why this exists, plainly

Two problems we want to solve:

Per-source-IP rate caps on object storage. Some providers throttle bursty single-IP traffic via a token bucket. A single TCP stream from one node sees the full effect; once depleted, requests crawl until the bucket refills. Multi-range alone doesn't fix this — opening more connections from the same IP shares the same shaper. The proper fix is a local cache so we stop hitting the upstream for hot ranges.
TLS / SigV4 / presigning overhead in vault. vault currently mints a presigned URL per webseed request. Pushing signing into a dedicated service lets vault stay stateless about S3 specifics.

MVP-2 delivers both: chunked multi-range upstream fetch + on-disk LRU cache + sequential readahead + singleflight dedup + Prometheus metrics.

Architecture

client → vault (302 with stable URL) → s3-cache → object storage
                                          │
                                          ├─ HEAD pass-through
                                          └─ GET  (aligned chunks, 4 MiB)
                                               ├─ disk cache lookup
                                               │    hit → serve from /webtor/data*/s3-cache
                                               │    miss → singleflight → upstream
                                               ├─ N=8 workers per request (own TCP each)
                                               ├─ wait for chunk 0 → 200/206
                                               ├─ stream chunks 1..N in order
                                               └─ kick readahead (K aligned chunks)

Calling convention: GET /{key}. HEAD returns metadata. Add Range: bytes=start-end for partial fetch. The bucket is fixed per-deploy via AWS_BUCKET — single-tenant, so URLs don't carry it.

Every request goes through the same aligned-chunk path, including sub-chunk-sized HLS-segment reads — the cache keys on absolute aligned offsets, so request-relative offsets would defeat reuse across requests. For a < 4 MiB request this still amounts to one upstream GET that gets sliced down to the requested span.
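The mapping from a requested byte range onto absolute aligned chunks can be sketched as below. This is a minimal illustration, not the service's actual code: `chunkSpan` and `alignedSpans` are hypothetical names, and the sketch assumes the caller has already clamped `end` to the object size.

```go
package main

// chunkSpan describes one aligned chunk and the part of it a request needs.
// Hypothetical type, used only for this illustration.
type chunkSpan struct {
	Offset int64 // absolute, chunk-aligned offset in the object
	From   int64 // first byte to serve, relative to Offset
	To     int64 // one past the last byte to serve, relative to Offset
}

// alignedSpans maps the inclusive byte range [start, end] onto chunk-aligned
// spans. Offsets depend only on chunkSize, never on where the request began,
// so two requests touching the same region produce the same chunk offsets.
func alignedSpans(start, end, chunkSize int64) []chunkSpan {
	if chunkSize <= 0 || start < 0 || end < start {
		return nil
	}
	var spans []chunkSpan
	first := start / chunkSize * chunkSize // round down to a chunk boundary
	for off := first; off <= end; off += chunkSize {
		from := int64(0)
		if start > off {
			from = start - off
		}
		to := chunkSize
		if end+1 < off+chunkSize {
			to = end + 1 - off
		}
		spans = append(spans, chunkSpan{Offset: off, From: from, To: to})
	}
	return spans
}
```

With a 4 MiB chunk size, a request for bytes 100-199 maps to a single span at offset 0 sliced to [100, 200), which is the "one upstream GET, sliced down" case described above.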

Status semantics (RFC 7233): A Range request gets 206 Partial Content + Content-Range. A plain GET gets 200 OK + Content-Length, no Content-Range header. Mismatching this (returning 206 on a request that didn't ask for a range) broke prod: thp's redirectFollowingTransport, which proxies our status back to the original client, returned 403 on the unexpected 206. Don't refactor the chunked fetch path to short-circuit this status-code branching.

The status code is only committed after chunk 0 lands, so any upstream failure on the first chunk yields a clean 502 Bad Gateway instead of a torn-off mid-response.

Workers force HTTP/1.1 (ForceAttemptHTTP2: false + empty TLSNextProto) so each chunk gets an independent TCP connection — under HTTP/2 they'd all share one socket and stall together.

There are no slow-detector aborts and no per-chunk retries. We tried them; empirically the upstream's throttle is per-source-IP, so retries from the same node hit the same shaper. The cache is what decouples us.

Disk cache

Pod-local (hostPath on the DaemonSet, same /webtor mount as torrent-web-seeder and content-transcoder so one disk allocation per node serves all three).
Shard topology piggybacks on TWS — we read /webtor/data* to pick a shard (same wildcard convention as content-transcoder's GetDir), then own <shard>/s3-cache/ for our own chunks. This way s3-cache automatically follows whatever multi-disk layout the node admin gave TWS, and eviction stays scoped to our subdir so torrent data sitting next to us under data1/ is never touched.
Chunk file at /webtor/dataN/s3-cache/<keyHash[:2]>/<keyHash>/<chunkHash> where keyHash = sha1(key) (bucket fixed per deploy) and chunkHash = sha1(key + ":" + offset). Uniform 40-char hex names, no prefix/suffix — matches TWS's naming style. Atomic writes via tmp file + rename — never serves a torn chunk.
LRU eviction, per-shard size cap. os.Chtimes bumps mtime on every cache hit, and the evictor sweeps oldest-mtime files until each shard is below ~90% of the cap. Per-shard (not global) so one hot key family can't starve evenly-distributed traffic.
Singleflight dedup — concurrent identical chunk misses collapse to one upstream GET. Typical for HLS where N viewers want the same segment simultaneously.
Sequential readahead — after the served range, prefetch K aligned chunks past the tail. Best-effort: drops kicks when the readahead worker pool is saturated, never blocks foreground.

Cache content is assumed immutable per key (typical for webtor storage). There is no ETag invalidation — bust the cache by deleting the shard files or by changing the key.

Build & run

go build -o server
./server

Local probe:

AWS_ACCESS_KEY_ID=... \
AWS_SECRET_ACCESS_KEY=... \
AWS_ENDPOINT=https://... \
AWS_REGION=... \
AWS_BUCKET=webtor-vault \
CACHE_ENABLED=true \
CACHE_DIR=/tmp/s3-cache/* \
./server

curl -I http://localhost:8080/<key>
curl -r 0-52428800 http://localhost:8080/<key> -o /dev/null

Configuration

All settings via env vars (or matching CLI flags).

| Env | Default | Purpose |
| --- | --- | --- |
| `WEB_PORT` | 8080 | HTTP listen port |
| `PROBE_PORT` | 8081 | Liveness/readiness port |
| `PPROF_PORT` | 8082 | pprof `/debug/pprof`-equivalent endpoints (`USE_PPROF=true` default) |
| `PROM_PORT` | 8083 | Prometheus `/metrics` port |
| `AWS_ACCESS_KEY_ID` | — | S3 access key |
| `AWS_SECRET_ACCESS_KEY` | — | S3 secret key |
| `AWS_ENDPOINT` | — | S3-compatible endpoint |
| `AWS_REGION` | — | S3 region |
| `AWS_NO_SSL` | `false` | Disable TLS to upstream |
| `AWS_BUCKET` | — | S3 bucket (required; single-tenant, same bucket for every request) |
| `CHUNK_SIZE` | 4194304 (4 MiB) | Chunk granularity (also cache granularity) |
| `WORKERS` | 8 | Concurrent S3 fetches per request |
| `CACHE_ENABLED` | `false` | Enable on-disk chunk cache |
| `CACHE_DIR` | `/webtor/data*` | Cache shard roots (wildcard); piggybacks on TWS shards |
| `CACHE_SHARD_SUBDIR` | `s3-cache` | Subdirectory inside each shard we own |
| `EVICTION_MAX_BYTES` | 10737418240 (10 GiB) | Per-shard size cap (0 disables) |
| `EVICTION_INTERVAL` | `1m` | Eviction sweep interval |
| `READAHEAD_CHUNKS` | 4 | Chunks to prefetch past served range (0 disables) |
| `READAHEAD_CONCURRENCY` | 8 | Max concurrent readahead fetches (process-wide) |
| `READAHEAD_TIMEOUT` | `30s` | Per-chunk readahead timeout |

Metrics

Scrape :8083/metrics. Key series:

s3cache_cache_lookups_total{result="hit|miss|error"}
s3cache_cache_writes_total{result="ok|error"}
s3cache_cache_bytes_served_total — bytes served from disk
s3cache_upstream_bytes_fetched_total
s3cache_upstream_chunk_seconds{source="foreground|readahead"} (histogram)
s3cache_singleflight_shared_total — fetches that joined an in-flight call
s3cache_readahead_kicks_total{result="scheduled|dropped|already_cached"}
s3cache_eviction_runs_total, s3cache_eviction_bytes_freed_total
s3cache_shard_bytes{shard="..."} — current shard size (gauge)
s3cache_requests_total{method,status}

Hit ratio: sum(rate(s3cache_cache_lookups_total{result="hit"}[5m])) / sum(rate(s3cache_cache_lookups_total[5m])).

Deployment

Docker multi-stage build (scratch base). Published to GHCR via GitHub Actions on push to main. Helm chart lives in infra/helmfile/ and is symlinked at chart/.

Deployed as a DaemonSet on the worker pool with internalTrafficPolicy: Local so vault/thp on each node always hit the local s3-cache pod, no cross-node hops. The chart mounts /webtor from the host (same convention as TWS / content-transcoder).

Integration with vault

When S3_CACHE_URL is set in vault, its /webseed/{id}/{path} handler redirects clients to ${S3_CACHE_URL}/{key} instead of generating a presigned upstream URL. Empty env disables — falls back to direct presigned upstream, so rollback is a single env unset.
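The fallback behaviour can be illustrated with a small helper. This is a sketch only: vault's actual handler is not shown in this README, so `webseedTarget` and the `presign` callback are hypothetical stand-ins, and the sketch does not deal with escaping special characters in keys.

```go
package main

import "strings"

// webseedTarget returns the URL a webseed request would be redirected to.
// cacheURL is the value of S3_CACHE_URL; presign is a hypothetical stand-in
// for whatever mints the direct presigned upstream URL.
func webseedTarget(cacheURL, key string, presign func(key string) string) string {
	if cacheURL == "" {
		// Unset env: fall back to the direct presigned upstream URL.
		return presign(key)
	}
	// Stable URL: no signature, so it stays the same across requests.
	return strings.TrimRight(cacheURL, "/") + "/" + strings.TrimLeft(key, "/")
}
```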
