Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions agent-learnings.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,3 +8,5 @@ Mistakes and non-obvious patterns worth remembering. Any agent (any tool, any te
- When `msc` is in a project venv (`.venv/bin/msc`), MCP server configs must use the absolute path — tools launch the server outside the venv.
- `git commit --no-edit` on a merge commit uses `cleanup=whitespace`, which keeps any `#`-prefixed comment lines from the auto-generated `MERGE_MSG` template. To strip those without opening an editor, pass `--cleanup=strip` explicitly (e.g. `git commit --no-edit --cleanup=strip`) or supply `-m "..."` with the final message. Fixing it after the push requires `git commit --amend --cleanup=strip --no-edit` plus `git push --force-with-lease`, so prefer to get it right the first time.
- Long-running `nix-shell` sessions set `TMPDIR` to a per-shell directory (e.g. `/tmp/nix-shell.XXXXXX`) that gets swept off disk when /tmp is cleaned or the original spawning shell exits. Any tool that calls `os.TempDir()` then fails opaquely: `git commit` ("could not create temporary file: No such file or directory" → "failed to write commit object"), `golangci-lint` ("typechecking error: go: creating work dir: stat /tmp/nix-shell.XXXX: no such file or directory" → exit 7 even though the lint itself found 0 issues), `go build` / `go test` with similar messages. Fix from inside the nix-shell with `export TMPDIR=/tmp` (or `unset TMPDIR`) before re-running. If you must persist it, set it in `shellHook` in `flake.nix`. Reading `pgrep -fa golangci-lint` showing fast-churn PIDs from Cursor's `go.lintOnSave` is a separate problem (parallel-runner lock); set `run.allow-parallel-runners: true` in `.golangci.yml` to let editor + CLI coexist.
- MSFS manifest **ingest** performance is governed by `process_memory_limit` (the Go soft-memory / GOMEMLIMIT knob), which **defaults to 4 GiB**. A 100M-inode ingest needs ~6-8 GiB resident, so on the default the heap sits permanently above the soft limit and Go GCs continuously — a death spiral where current throughput collapses to ~1K obj/sec, `gcPauses` climb to thousands, and a 100M ingest stretches past 24 min. Symptom in the log: `manifest-ingest: progress` lines where `current obj/sec` drops to ~1000 while `mem` hovers above the limit and `gcPauses` grows ~linearly. Fix: set `process_memory_limit` generously for the host (e.g. `68719476736` = 64 GiB on a big node); ingest then completed in ~3m22s @ ~518K obj/sec with only 94 GC pauses (vs 3317). On 256-vCPU hosts also cap `GOMAXPROCS` (e.g. `GOMAXPROCS=32` env at mount) so GC stop-the-world doesn't pause 256 Ps. NOTE: the old EC2 configs bounded ingest memory via `max_memory_inodes: 20000000`, but that key **no longer exists in the current code** (replaced by PebbleDB metadata-cache paging: `pebble_cache_size`, `inode_map_*`); it is silently ignored if present, so use `process_memory_limit` instead.
- MSFS AIStore backend supports `manifest_gen_backend: <dir_name>` (in the `AIStore:` block) to route LIST/STAT-DIR (manifest generation, readdir) to another already-configured backend (e.g. a direct-S3 backend) while object reads still go via AIS for caching/fast-tier. Use this when AIS fronts a remote cloud bucket: AIS's cold cross-cloud LIST is extremely slow (proxy→target `/begin` control-plane timeouts under concurrency; a 100M-dir crawl never completes), whereas listing the underlying store directly is fast (100M objects in ~1m28s @ ~1.1M obj/sec, identical to native S3). Only the AIS backend should carry `manifest_path`; if a backend named as someone's `manifest_gen_backend` ALSO sets its own `manifest_path`, both run independent generate+ingest pipelines that compete on the same PebbleDB and memory budget (double the S3 LIST load; corrupt manifest if they share the same `manifest_path` dir).
4 changes: 3 additions & 1 deletion multi-storage-file-system/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,8 @@ The MSFS-specific global (i.e. "top-level") settings are described in the follow
| virtual_dir_ttl | decimal milliseconds | 1000000 | Amount of time a created but still empty directory should be maintained (should be at least evictable_inode_ttl) |
| virtual_file_ttl | decimal milliseconds | 1000000 | Amount of time a created but still not flushed file should be maintained (should be at least evictable_inode_ttl) |
| ttl_check_interval | decimal milliseconds | 250 | Amount of time between checking for evictions and cache pruning |
| mapped_cache | boolean | true | If true, the caching layer uses a memory-mapped file (which can be much larger than RAM); if false, the caching layer uses only RAM |
| mapped_cache | boolean | true | If true, the caching layer uses a memory-mapped file (which can be much larger than RAM); if false, the caching layer uses only RAM (only applies when cache_backend == "memory") |
| cache_backend | string | "memory" | Cache-line storage/serve backend: "memory" (shared mmap content buffer served via memcpy) or "disk" (per-inode contiguous files under <cache_dir>/cachelines served via pread, with FOPEN_DIRECT_IO dropped) |
| cache_line_size | decimal bytes | 10485760 (10Mi) | Granularity of caching layer for both file read and write traffic |
| cache_lines | decimal | 128 | Number of cache lines provisioned |
| cache_lines_to_prefetch | decimal | 4 | Maximum number of cache lines to prefetch while fetching a cache line to satisfy a read operation |
Expand Down Expand Up @@ -146,6 +147,7 @@ the following table:
| authnTokenFile | string | "${AIS_AUTHN_TOKEN_FILE:=~/.config/ais/cli/auth.token}" | If != "", specifies location of AUTHN Token file |
| provider | string | "s3" | IF != "ais", specifies the backend of which bucket contents are cached |
| timeout | decimal milliseconds | 30000 | Limit on allowed duration of requests (including retries) |
| manifest_gen_backend | string | "" | IF != "", `dir_name` of another (non-AIStore) backend used for LIST/STAT-DIR (manifest generation, readdir); object reads still go via AIS. Lets listing hit the underlying store (e.g. S3) directly while reads benefit from AIS caching. The referenced backend should target the same bucket/prefix; a mismatch is logged as a warning (not rejected). |

### GCS Backend Configuration

Expand Down
40 changes: 39 additions & 1 deletion multi-storage-file-system/backend_aistore.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,14 @@ type aistoreContextStruct struct {
backend *backendStruct
baseParams api.BaseParams // Connection parameters
bck cmn.Bck // Bucket metadata/ structure
// `listBackend`, when non-nil, is the backend named by the AIStore backend's
// `manifest_gen_backend` (e.g. a direct S3 backend). LIST/STAT-DIR operations are
// delegated to its context so that manifest generation and readdir bypass AIS's
// (slow, cold) cross-cloud listing, while object reads continue to flow through
// AIS for its caching/fast-tier. We store the *backendStruct (not its context)
// and dereference `.context` at call time so a later re-setup can't leave a stale
// pointer here.
listBackend *backendStruct
}

// `backendCommon` is called to return a pointer to the context's common `backendStruct`.
Expand Down Expand Up @@ -94,12 +102,27 @@ func (backend *backendStruct) setupAIStoreContext() (err error) {
}

// Store context
backend.context = &aistoreContextStruct{
aisContext := &aistoreContextStruct{
backend: backend,
baseParams: baseParams,
bck: bck,
}

// If a manifest_gen_backend is configured, route LIST/STAT-DIR operations to it
// (reads still go via AIS). Ensure its context is set up (it may not have been
// set up yet, depending on backend mount ordering).
if backendAIStore.manifestGenBackend != nil {
if backendAIStore.manifestGenBackend.context == nil {
if err = backendAIStore.manifestGenBackend.setupContext(); err != nil {
err = fmt.Errorf("[AIStore] failed to set up manifest_gen_backend %q: %v", backendAIStore.manifestGenBackend.dirName, err)
return
}
}
aisContext.listBackend = backendAIStore.manifestGenBackend
}

backend.context = aisContext

// Record backendPath
if backend.prefix == "" {
backend.backendPath = backendAIStore.endpoint + "/"
Expand Down Expand Up @@ -157,6 +180,11 @@ func (aisContext *aistoreContextStruct) deleteFile(deleteFileInput *deleteFileIn
// indicates the `directory` has been completely enumerated. The `isTruncated` field will also
// align with this convention.
func (aisContext *aistoreContextStruct) listDirectory(listDirectoryInput *listDirectoryInputStruct) (listDirectoryOutput *listDirectoryOutputStruct, err error) {
// Delegate listing to the manifest_gen_backend (e.g. direct S3) when configured.
if aisContext.listBackend != nil {
return aisContext.listBackend.context.listDirectory(listDirectoryInput)
}

var (
backend = aisContext.backend
fullDirPath = backend.prefix + listDirectoryInput.dirPath
Expand Down Expand Up @@ -222,6 +250,11 @@ func (aisContext *aistoreContextStruct) listDirectory(listDirectoryInput *listDi
// empty list of elements (`objects`) indicates the list of `objects` has been completely
// enumerated. The `isTruncated` field will also align with this convention.
func (aisContext *aistoreContextStruct) listObjects(listObjectsInput *listObjectsInputStruct) (listObjectsOutput *listObjectsOutputStruct, err error) {
// Delegate listing to the manifest_gen_backend (e.g. direct S3) when configured.
if aisContext.listBackend != nil {
return aisContext.listBackend.context.listObjects(listObjectsInput)
}

var (
backend = aisContext.backend
lsmsg = &apc.LsoMsg{
Expand Down Expand Up @@ -337,6 +370,11 @@ func (aisContext *aistoreContextStruct) readFile(readFileInput *readFileInputStr
// `statDirectory` is called to verify that the specified path refers to a `directory`.
// An error is returned if either the specified path is not a `directory` or non-existent.
func (aisContext *aistoreContextStruct) statDirectory(statDirectoryInput *statDirectoryInputStruct) (statDirectoryOutput *statDirectoryOutputStruct, err error) {
// Delegate directory stat to the manifest_gen_backend (e.g. direct S3) when configured.
if aisContext.listBackend != nil {
return aisContext.listBackend.context.statDirectory(statDirectoryInput)
}

var (
backend = aisContext.backend
fullDirPath = backend.prefix + statDirectoryInput.dirPath
Expand Down
68 changes: 59 additions & 9 deletions multi-storage-file-system/cache.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,18 @@ func dataCacheUp() (err error) {

dataCacheLineContentSize = globals.config.cacheLines * globals.config.cacheLineSize

if globals.config.mappedCache {
switch {
case globals.config.cacheBackend == "disk":
// Disk backend: no shared content arena. Each inode's cache-line bytes
// live in its own contiguous file under <cacheDir>/cachelines/, served
// via pread. The tracker/LRU machinery below is unchanged and bounds the
// number of resident lines exactly as in memory mode.
globals.dataCacheLinesFile = nil
globals.dataCacheLinesContent = nil
if err = diskCacheUp(); err != nil {
return
}
case globals.config.mappedCache:
globals.dataCacheLinesFile, err = os.OpenFile(filepath.Join(globals.cacheDir, DataCacheFileName), os.O_RDWR|os.O_CREATE|os.O_EXCL, 0o600)
if err != nil {
err = fmt.Errorf("os.OpenFile(filepath.Join(globals.cacheDir, DataCacheFileName), os.O_RDWR|os.O_CREATE|os.O_EXCL, 0o600) failed: %v", err)
Expand All @@ -42,7 +53,7 @@ func dataCacheUp() (err error) {
err = fmt.Errorf("syscall.Mmap(int(globals.dataCacheLinesFile.Fd()),,,,,) failed: %v", err)
return
}
} else {
default:
// RAM-backed cache: use an anonymous private mmap (fd=-1, MAP_ANON) instead
// of `make([]byte, N)`. The kernel only commits physical pages on first
// touch, so workloads that never read file data (e.g. pure manifest ingest
Expand Down Expand Up @@ -120,6 +131,10 @@ func dataCacheUp() (err error) {
func dataCacheDown() (err error) {
globals.dataCacheActivityWG.Wait()

if globals.config.cacheBackend == "disk" {
diskCacheDown()
}

// Both mapped_cache=true (file-backed) and mapped_cache=false (anonymous) paths
// use mmap-allocated memory, so Munmap covers both. The file fd is only present
// for the mapped_cache=true case.
Expand Down Expand Up @@ -364,6 +379,15 @@ func allocateDataCacheLines(count uint64) (cacheLineNumbers []uint64, neededToBl
globals.logger.Fatalf("[FATAL] globals.inodeMap.get(dataCacheLineTracker.inodeNumber[%v]) returned !ok", dataCacheLineTracker.inodeNumber)
}

if globals.config.cacheBackend == "disk" {
// Slot is being recycled for a different (inode, line): release
// the evicted line's disk bytes. Bump the generation first so any
// in-flight unlocked optimistic reader fails its post-copy
// re-check and retries rather than accepting now-punched bytes.
dataCacheLineTracker.contentGeneration++
dataCacheLineTracker.punchHoleDisk()
}
Comment thread
coderabbitai[bot] marked this conversation as resolved.

delete(inode.cacheMap, dataCacheLineTracker.lineNumber)

cacheLineNumbers = append(cacheLineNumbers, dataCacheLineTracker.pos)
Expand Down Expand Up @@ -393,7 +417,7 @@ func allocateDataCacheLines(count uint64) (cacheLineNumbers []uint64, neededToBl

cacheLineWaiter.Wait()

globalsLock("cache.go:396:3:allocateDataCacheLines")
globalsLock("cache.go:416:3:allocateDataCacheLines")
}
}

Expand All @@ -416,11 +440,15 @@ func releaseDataCacheLines(cacheLineNumbers []uint64) {
// `free` resets a data cache line that is not currently on any LRU and returns
// it to the Free LRU. The caller must hold the globals lock.
func (dataCacheLineTracker *dataCacheLineTrackerStruct) free() {
if globals.config.cacheBackend == "disk" {
dataCacheLineTracker.punchHoleDisk() // no-op if this line had no disk backing
}
dataCacheLineTracker.contentLength = 0
dataCacheLineTracker.contentGeneration++
dataCacheLineTracker.inodeNumber = 0 // not yet applicable
dataCacheLineTracker.lineNumber = 0 // not yet applicable
dataCacheLineTracker.eTag = "" // not yet applicable
dataCacheLineTracker.fetchFailed = false
dataCacheLineTracker.waiters = make([]*sync.WaitGroup, 0, 1)
globals.dataCacheLineFreeLRU.pushTail(dataCacheLineTracker)
}
Expand All @@ -441,7 +469,7 @@ func (dataCacheLineTracker *dataCacheLineTrackerStruct) fetch() {

defer globals.dataCacheActivityWG.Done()

globalsLock("cache.go:444:2:(*dataCacheLineTrackerStruct).fetch")
globalsLock("cache.go:468:2:(*dataCacheLineTrackerStruct).fetch")

inode, ok = globals.inodeMap.get(dataCacheLineTracker.inodeNumber)
if !ok {
Expand All @@ -468,12 +496,12 @@ func (dataCacheLineTracker *dataCacheLineTrackerStruct) fetch() {
globalsUnlock()

readFileOutput, err = readFileWrapper(backend.context, readFileInput)
if err == nil {
if err == nil && globals.config.cacheBackend == "memory" {
content = globals.dataCacheLinesContent[dataCacheLineTracker.contentStart : dataCacheLineTracker.contentStart+globals.config.cacheLineSize]
dataCacheLineTracker.contentLength = uint64(copy(content, readFileOutput.buf))
}

globalsLock("cache.go:476:2:(*dataCacheLineTrackerStruct).fetch")
globalsLock("cache.go:500:2:(*dataCacheLineTrackerStruct).fetch")
inode, ok = globals.inodeMap.get(dataCacheLineTracker.inodeNumber)
if ok {
inode.inboundCacheLineCount--
Expand All @@ -489,12 +517,34 @@ func (dataCacheLineTracker *dataCacheLineTrackerStruct) fetch() {
globals.dataCacheLineInboundLRU.popThis(dataCacheLineTracker)
dataCacheLineTracker.contentGeneration++

if err != nil {
globals.logger.Printf("[WARN] [TODO] (*dataCacheLineTrackerStruct) fetch() needs to handle error reading cache line")
switch {
case err != nil:
// The backend read failed. Mark the line as failed and move it to the
// Clean LRU so waiters are released and the state machine stays consistent;
// DoRead detects .fetchFailed, surfaces EIO to the caller, and evicts the
// line (so a subsequent read re-fetches rather than serving empty content).
// Leaving it un-flagged caused DoRead to compute an inverted slice
// (contentLength==0 < offset) and panic while holding the global lock.
globals.logger.Printf("[WARN] (*dataCacheLineTrackerStruct) fetch() backend read failed for inode %d line %d: %v", dataCacheLineTracker.inodeNumber, dataCacheLineTracker.lineNumber, err)
dataCacheLineTracker.contentLength = 0
dataCacheLineTracker.eTag = ""
} else {
dataCacheLineTracker.fetchFailed = true
case globals.config.cacheBackend == "disk":
// Write the fetched bytes through to the inode's backing file under
// the lock (matches the memory path, which set contentLength above).
dataCacheLineTracker.contentLength = dataCacheLineTracker.storeContentDisk(readFileOutput.buf)
if len(readFileOutput.buf) > 0 && dataCacheLineTracker.contentLength == 0 {
// storeContentDisk failed (open/WriteAt error) on a non-empty read:
// treat as a fetch failure so DoRead surfaces EIO instead of EOF.
dataCacheLineTracker.eTag = ""
dataCacheLineTracker.fetchFailed = true
} else {
dataCacheLineTracker.eTag = readFileOutput.eTag
dataCacheLineTracker.fetchFailed = false
}
default:
dataCacheLineTracker.eTag = readFileOutput.eTag
dataCacheLineTracker.fetchFailed = false
}

globals.dataCacheLineCleanLRU.pushTail(dataCacheLineTracker)
Expand Down
Loading