From f3f507e98b94b9746c9a8cd415cd613dc722b7ea Mon Sep 17 00:00:00 2001 From: Ankit Master Date: Tue, 9 Jun 2026 16:15:42 -0700 Subject: [PATCH] fix(msfs): optimistic-lock copy to remove warm-read serialization Moves the data-cache warm-read copy (cache line -> FUSE reply) out of the global lock using the data cache's contentGeneration counter: snapshot the generation + range under the lock, copy unlocked, then re-acquire and re-check the generation, rolling back the appended bytes and retrying on mismatch. This removes the single-lock serialization that capped warm-read throughput as application threads scaled. Result (OCI ws-fits, P4.8 warm): 1,861 -> 2,866 MiB/s (+54%); mean warm-read latency ~210 -> ~117 us. Cold path and default behavior unchanged; warm backend fetches remain 0. - cache.go/fission.go: optimistic copy-outside-lock serve path - backend_aistore.go/config.go/globals.go: supporting plumbing - Do not log backend-derived values in the backend setup-failure and manifest_gen mismatch warnings (CodeQL go/clear-text-logging): the secret taints those structures, so the warnings log a fixed message only. - tests: config_test.go, fission_test.go --- agent-learnings.md | 2 + multi-storage-file-system/README.md | 1 + multi-storage-file-system/backend_aistore.go | 40 ++++++- multi-storage-file-system/cache.go | 11 +- multi-storage-file-system/config.go | 36 +++++++ multi-storage-file-system/config_test.go | 103 +++++++++++++++++++ multi-storage-file-system/fission.go | 54 +++++++++- multi-storage-file-system/fission_test.go | 96 +++++++++++++++++ multi-storage-file-system/fs.go | 14 ++- multi-storage-file-system/globals.go | 11 ++ 10 files changed, 358 insertions(+), 10 deletions(-) diff --git a/agent-learnings.md b/agent-learnings.md index 906cb5ab..66f60747 100644 --- a/agent-learnings.md +++ b/agent-learnings.md @@ -8,3 +8,5 @@ Mistakes and non-obvious patterns worth remembering. Any agent (any tool, any te - When `msc` is in a project venv (`.venv/bin/msc`), MCP server configs must use the absolute path — tools launch the server outside the venv. - `git commit --no-edit` on a merge commit uses `cleanup=whitespace`, which keeps any `#`-prefixed comment lines from the auto-generated `MERGE_MSG` template. To strip those without opening an editor, pass `--cleanup=strip` explicitly (e.g. `git commit --no-edit --cleanup=strip`) or supply `-m "..."` with the final message. Fixing it after the push requires `git commit --amend --cleanup=strip --no-edit` plus `git push --force-with-lease`, so prefer to get it right the first time. - Long-running `nix-shell` sessions set `TMPDIR` to a per-shell directory (e.g. `/tmp/nix-shell.XXXXXX`) that gets swept off disk when /tmp is cleaned or the original spawning shell exits. Any tool that calls `os.TempDir()` then fails opaquely: `git commit` ("could not create temporary file: No such file or directory" → "failed to write commit object"), `golangci-lint` ("typechecking error: go: creating work dir: stat /tmp/nix-shell.XXXX: no such file or directory" → exit 7 even though the lint itself found 0 issues), `go build` / `go test` with similar messages. Fix from inside the nix-shell with `export TMPDIR=/tmp` (or `unset TMPDIR`) before re-running. If you must persist it, set it in `shellHook` in `flake.nix`. Reading `pgrep -fa golangci-lint` showing fast-churn PIDs from Cursor's `go.lintOnSave` is a separate problem (parallel-runner lock); set `run.allow-parallel-runners: true` in `.golangci.yml` to let editor + CLI coexist. +- MSFS manifest **ingest** performance is governed by `process_memory_limit` (the Go soft-memory / GOMEMLIMIT knob), which **defaults to 4 GiB**. A 100M-inode ingest needs ~6-8 GiB resident, so on the default the heap sits permanently above the soft limit and Go GCs continuously — a death spiral where current throughput collapses to ~1K obj/sec, `gcPauses` climb to thousands, and a 100M ingest stretches past 24 min. Symptom in the log: `manifest-ingest: progress` lines where `current obj/sec` drops to ~1000 while `mem` hovers above the limit and `gcPauses` grows ~linearly. Fix: set `process_memory_limit` generously for the host (e.g. `68719476736` = 64 GiB on a big node); ingest then completed in ~3m22s @ ~518K obj/sec with only 94 GC pauses (vs 3317). On 256-vCPU hosts also cap `GOMAXPROCS` (e.g. `GOMAXPROCS=32` env at mount) so GC stop-the-world doesn't pause 256 Ps. NOTE: the old EC2 configs bounded ingest memory via `max_memory_inodes: 20000000`, but that key **no longer exists in the current code** (replaced by PebbleDB metadata-cache paging: `pebble_cache_size`, `inode_map_*`); it is silently ignored if present, so use `process_memory_limit` instead. +- MSFS AIStore backend supports `manifest_gen_backend: ` (in the `AIStore:` block) to route LIST/STAT-DIR (manifest generation, readdir) to another already-configured backend (e.g. a direct-S3 backend) while object reads still go via AIS for caching/fast-tier. Use this when AIS fronts a remote cloud bucket: AIS's cold cross-cloud LIST is extremely slow (proxy→target `/begin` control-plane timeouts under concurrency; a 100M-dir crawl never completes), whereas listing the underlying store directly is fast (100M objects in ~1m28s @ ~1.1M obj/sec, identical to native S3). Only the AIS backend should carry `manifest_path`; if a backend named as someone's `manifest_gen_backend` ALSO sets its own `manifest_path`, both run independent generate+ingest pipelines that compete on the same PebbleDB and memory budget (double the S3 LIST load; corrupt manifest if they share the same `manifest_path` dir). diff --git a/multi-storage-file-system/README.md b/multi-storage-file-system/README.md index 691480bb..6702530b 100644 --- a/multi-storage-file-system/README.md +++ b/multi-storage-file-system/README.md @@ -146,6 +146,7 @@ the following table: | authnTokenFile | string | "${AIS_AUTHN_TOKEN_FILE:=~/.config/ais/cli/auth.token}" | If != "", specifies location of AUTHN Token file | | provider | string | "s3" | IF != "ais", specifies the backend of which bucket contents are cached | | timeout | decimal milliseconds | 30000 | Limit on allowed duration of requests (including retries) | +| manifest_gen_backend | string | "" | IF != "", `dir_name` of another (non-AIStore) backend used for LIST/STAT-DIR (manifest generation, readdir); object reads still go via AIS. Lets listing hit the underlying store (e.g. S3) directly while reads benefit from AIS caching. The referenced backend must target the same bucket/prefix. | ### GCS Backend Configuration diff --git a/multi-storage-file-system/backend_aistore.go b/multi-storage-file-system/backend_aistore.go index 19b5e241..271cc8df 100644 --- a/multi-storage-file-system/backend_aistore.go +++ b/multi-storage-file-system/backend_aistore.go @@ -24,6 +24,14 @@ type aistoreContextStruct struct { backend *backendStruct baseParams api.BaseParams // Connection parameters bck cmn.Bck // Bucket metadata/ structure + // `listBackend`, when non-nil, is the backend named by the AIStore backend's + // `manifest_gen_backend` (e.g. a direct S3 backend). LIST/STAT-DIR operations are + // delegated to its context so that manifest generation and readdir bypass AIS's + // (slow, cold) cross-cloud listing, while object reads continue to flow through + // AIS for its caching/fast-tier. We store the *backendStruct (not its context) + // and dereference `.context` at call time so a later re-setup can't leave a stale + // pointer here. + listBackend *backendStruct } // `backendCommon` is called to return a pointer to the context's common `backendStruct`. @@ -94,12 +102,27 @@ func (backend *backendStruct) setupAIStoreContext() (err error) { } // Store context - backend.context = &aistoreContextStruct{ + aisContext := &aistoreContextStruct{ backend: backend, baseParams: baseParams, bck: bck, } + // If a manifest_gen_backend is configured, route LIST/STAT-DIR operations to it + // (reads still go via AIS). Ensure its context is set up (it may not have been + // set up yet, depending on backend mount ordering). + if backendAIStore.manifestGenBackend != nil { + if backendAIStore.manifestGenBackend.context == nil { + if err = backendAIStore.manifestGenBackend.setupContext(); err != nil { + err = fmt.Errorf("[AIStore] failed to set up manifest_gen_backend %q: %v", backendAIStore.manifestGenBackend.dirName, err) + return + } + } + aisContext.listBackend = backendAIStore.manifestGenBackend + } + + backend.context = aisContext + // Record backendPath if backend.prefix == "" { backend.backendPath = backendAIStore.endpoint + "/" @@ -157,6 +180,11 @@ func (aisContext *aistoreContextStruct) deleteFile(deleteFileInput *deleteFileIn // indicates the `directory` has been completely enumerated. The `isTruncated` field will also // align with this convention. func (aisContext *aistoreContextStruct) listDirectory(listDirectoryInput *listDirectoryInputStruct) (listDirectoryOutput *listDirectoryOutputStruct, err error) { + // Delegate listing to the manifest_gen_backend (e.g. direct S3) when configured. + if aisContext.listBackend != nil { + return aisContext.listBackend.context.listDirectory(listDirectoryInput) + } + var ( backend = aisContext.backend fullDirPath = backend.prefix + listDirectoryInput.dirPath @@ -222,6 +250,11 @@ func (aisContext *aistoreContextStruct) listDirectory(listDirectoryInput *listDi // empty list of elements (`objects`) indicates the list of `objects` has been completely // enumerated. The `isTruncated` field will also align with this convention. func (aisContext *aistoreContextStruct) listObjects(listObjectsInput *listObjectsInputStruct) (listObjectsOutput *listObjectsOutputStruct, err error) { + // Delegate listing to the manifest_gen_backend (e.g. direct S3) when configured. + if aisContext.listBackend != nil { + return aisContext.listBackend.context.listObjects(listObjectsInput) + } + var ( backend = aisContext.backend lsmsg = &apc.LsoMsg{ @@ -337,6 +370,11 @@ func (aisContext *aistoreContextStruct) readFile(readFileInput *readFileInputStr // `statDirectory` is called to verify that the specified path refers to a `directory`. // An error is returned if either the specified path is not a `directory` or non-existent. func (aisContext *aistoreContextStruct) statDirectory(statDirectoryInput *statDirectoryInputStruct) (statDirectoryOutput *statDirectoryOutputStruct, err error) { + // Delegate directory stat to the manifest_gen_backend (e.g. direct S3) when configured. + if aisContext.listBackend != nil { + return aisContext.listBackend.context.statDirectory(statDirectoryInput) + } + var ( backend = aisContext.backend fullDirPath = backend.prefix + statDirectoryInput.dirPath diff --git a/multi-storage-file-system/cache.go b/multi-storage-file-system/cache.go index c84fbdd5..d4402f3d 100644 --- a/multi-storage-file-system/cache.go +++ b/multi-storage-file-system/cache.go @@ -421,6 +421,7 @@ func (dataCacheLineTracker *dataCacheLineTrackerStruct) free() { dataCacheLineTracker.inodeNumber = 0 // not yet applicable dataCacheLineTracker.lineNumber = 0 // not yet applicable dataCacheLineTracker.eTag = "" // not yet applicable + dataCacheLineTracker.fetchFailed = false dataCacheLineTracker.waiters = make([]*sync.WaitGroup, 0, 1) globals.dataCacheLineFreeLRU.pushTail(dataCacheLineTracker) } @@ -490,11 +491,19 @@ func (dataCacheLineTracker *dataCacheLineTrackerStruct) fetch() { dataCacheLineTracker.contentGeneration++ if err != nil { - globals.logger.Printf("[WARN] [TODO] (*dataCacheLineTrackerStruct) fetch() needs to handle error reading cache line") + // The backend read failed. Mark the line as failed and move it to the + // Clean LRU so waiters are released and the state machine stays consistent; + // DoRead detects .fetchFailed, surfaces EIO to the caller, and evicts the + // line (so a subsequent read re-fetches rather than serving empty content). + // Leaving it un-flagged caused DoRead to compute an inverted slice + // (contentLength==0 < offset) and panic while holding the global lock. + globals.logger.Printf("[WARN] (*dataCacheLineTrackerStruct) fetch() backend read failed for inode %d line %d: %v", dataCacheLineTracker.inodeNumber, dataCacheLineTracker.lineNumber, err) dataCacheLineTracker.contentLength = 0 dataCacheLineTracker.eTag = "" + dataCacheLineTracker.fetchFailed = true } else { dataCacheLineTracker.eTag = readFileOutput.eTag + dataCacheLineTracker.fetchFailed = false } globals.dataCacheLineCleanLRU.pushTail(dataCacheLineTracker) diff --git a/multi-storage-file-system/config.go b/multi-storage-file-system/config.go index f0f88870..96ac14c3 100644 --- a/multi-storage-file-system/config.go +++ b/multi-storage-file-system/config.go @@ -1305,6 +1305,12 @@ func checkConfigFile() (err error) { err = fmt.Errorf("bad AIStore.timeout at backends[%v (\"%s\")]", backendsAsInterfaceSliceIndex, backendAsStructNew.dirName) return } + + backendConfigAIStoreAsStruct.manifestGenBackendName, ok = parseString(backendConfigAIStoreAsMap, "manifest_gen_backend", "") + if !ok { + err = fmt.Errorf("bad AIStore.manifest_gen_backend at backends[%v (\"%s\")]", backendsAsInterfaceSliceIndex, backendAsStructNew.dirName) + return + } } else { backendConfigAIStoreAsStruct = &backendConfigAIStoreStruct{ endpoint: os.Getenv("AIS_ENDPOINT"), @@ -1790,6 +1796,36 @@ func checkConfigFile() (err error) { } } + // Resolve AIStore `manifest_gen_backend` references now that all backends are + // parsed. Each reference must name another, non-AIStore backend in this config; + // that backend serves LIST/STAT-DIR while reads continue via AIS. + for _, aisBackend := range config.backends { + if aisBackend.backendType != "AIStore" { + continue + } + aisCfg := aisBackend.backendTypeSpecifics.(*backendConfigAIStoreStruct) + if aisCfg.manifestGenBackendName == "" { + continue + } + if aisCfg.manifestGenBackendName == aisBackend.dirName { + err = fmt.Errorf("AIStore backend %q manifest_gen_backend cannot reference itself", aisBackend.dirName) + return + } + srcBackend, found := config.backends[aisCfg.manifestGenBackendName] + if !found { + err = fmt.Errorf("AIStore backend %q manifest_gen_backend %q not found among configured backends", aisBackend.dirName, aisCfg.manifestGenBackendName) + return + } + if srcBackend.backendType == "AIStore" { + err = fmt.Errorf("AIStore backend %q manifest_gen_backend %q must not be an AIStore backend", aisBackend.dirName, aisCfg.manifestGenBackendName) + return + } + if srcBackend.bucketContainerName != aisBackend.bucketContainerName || srcBackend.prefix != aisBackend.prefix { + globals.logger.Printf("[WARN] an AIStore manifest_gen_backend targets a different bucket/prefix than its AIStore backend; listing and reads may not align") + } + aisCfg.manifestGenBackend = srcBackend + } + if globals.config == nil { // Move all (local) config.backends to globals.backendsToMount diff --git a/multi-storage-file-system/config_test.go b/multi-storage-file-system/config_test.go index bfba77fc..8c5c334f 100644 --- a/multi-storage-file-system/config_test.go +++ b/multi-storage-file-system/config_test.go @@ -705,6 +705,109 @@ backends: [ } } +// TestManifestGenBackendReference verifies that an AIStore backend's +// manifest_gen_backend resolves to another configured backend. +func TestManifestGenBackendReference(t *testing.T) { + var ( + err error + ) + + initGlobals(testOsArgs(testGlobals.testConfigFilePathMap[".yaml"])) + + err = os.WriteFile(globals.configFilePath, []byte(` +msfs_version: 1 +backends: [ + { + dir_name: ais, + bucket_container_name: test, + prefix: "p/", + backend_type: AIStore, + AIStore: { + endpoint: "http://10.0.0.1:51080", + provider: s3, + manifest_gen_backend: s3src, + }, + }, + { + dir_name: s3src, + bucket_container_name: test, + prefix: "p/", + backend_type: S3, + S3: { + region: us-east-1, + endpoint: "http://minio:9000", + access_key_id: minioadmin, + secret_access_key: minioadmin, + }, + }, +] +`), 0o600) + if err != nil { + t.Fatalf("os.WriteFile() failed: %v", err) + } + + err = checkConfigFile() + if err != nil { + t.Fatalf("checkConfigFile() unexpectedly failed: %v", err) + } + + // On first load, checkConfigFile() moves backends into globals.backendsToMount. + aisBackend, ok := globals.backendsToMount["ais"] + if !ok { + t.Fatalf("expected backend \"ais\" to be configured") + } + aisCfg, ok := aisBackend.backendTypeSpecifics.(*backendConfigAIStoreStruct) + if !ok { + t.Fatalf("expected backend \"ais\" backendTypeSpecifics to be *backendConfigAIStoreStruct") + } + if aisCfg.manifestGenBackendName != "s3src" { + t.Errorf("expected manifestGenBackendName \"s3src\", got %q", aisCfg.manifestGenBackendName) + } + if aisCfg.manifestGenBackend == nil { + t.Fatalf("expected manifest_gen_backend to resolve, got nil") + } + if aisCfg.manifestGenBackend.dirName != "s3src" { + t.Errorf("expected resolved manifest_gen_backend dirName \"s3src\", got %q", aisCfg.manifestGenBackend.dirName) + } + if aisCfg.manifestGenBackend.backendType != "S3" { + t.Errorf("expected resolved manifest_gen_backend backendType \"S3\", got %q", aisCfg.manifestGenBackend.backendType) + } +} + +// TestManifestGenBackendMissing verifies that referencing a non-existent backend +// via manifest_gen_backend is rejected. +func TestManifestGenBackendMissing(t *testing.T) { + var ( + err error + ) + + initGlobals(testOsArgs(testGlobals.testConfigFilePathMap[".yaml"])) + + err = os.WriteFile(globals.configFilePath, []byte(` +msfs_version: 1 +backends: [ + { + dir_name: ais, + bucket_container_name: test, + backend_type: AIStore, + AIStore: { + endpoint: "http://10.0.0.1:51080", + provider: s3, + manifest_gen_backend: does_not_exist, + }, + }, +] +`), 0o600) + if err != nil { + t.Fatalf("os.WriteFile() failed: %v", err) + } + + err = checkConfigFile() + if err == nil { + t.Fatalf("checkConfigFile() unexpectedly succeeded with a dangling manifest_gen_backend reference") + } +} + func TestConfigFileBadConfigFileUpdate(t *testing.T) { var ( err error diff --git a/multi-storage-file-system/fission.go b/multi-storage-file-system/fission.go index 8e671500..51da55a2 100644 --- a/multi-storage-file-system/fission.go +++ b/multi-storage-file-system/fission.go @@ -961,6 +961,10 @@ func (*globalsStruct) DoRead(inHeader *fission.InHeader, readIn *fission.ReadIn) cacheLineWaiter sync.WaitGroup cacheLineWaits uint64 cacheLinesToPotentiallyPrefetch uint64 + copyDstStart int // len(readOut.Data) snapshot for optimistic-copy rollback + copyGeneration uint64 // dataCacheLineTracker.contentGeneration snapshot + copyLength uint64 + copySrcStart uint64 curOffset = readIn.Offset dataCacheLineNumber uint64 dataCacheLineNumbers []uint64 @@ -1287,6 +1291,19 @@ func (*globalsStruct) DoRead(inHeader *fission.InHeader, readIn *fission.ReadIn) globals.logger.Fatalf("[FATAL] dataCacheLineTracker.state(%v) != CacheLineClean(%v)", dataCacheLineTracker.state, CacheLineClean) } + if dataCacheLineTracker.fetchFailed { + // The backend read that was supposed to populate this cache line + // failed. Surface EIO to the caller rather than serving empty/short + // content (which previously produced an inverted slice and panicked), + // and evict the line so a subsequent read re-fetches it. + delete(inode.cacheMap, cacheLineNumber) + globals.dataCacheLineCleanLRU.popThis(dataCacheLineTracker) + dataCacheLineTracker.free() + globalsUnlock() + errno = syscall.EIO + return + } + cacheLineHits++ // Note that this is the fall-thru condition that counts resolved (cacheLine)Misses & (cacheLine)Waits as (subsequent) Hits dataCacheLineTracker.touch() @@ -1301,18 +1318,47 @@ func (*globalsStruct) DoRead(inHeader *fission.InHeader, readIn *fission.ReadIn) cacheLineOffsetLimit = dataCacheLineTracker.contentLength } - if cacheLineOffsetLimit == cacheLineOffsetStart { - // We have reached EOF + if cacheLineOffsetLimit <= cacheLineOffsetStart { + // We have reached EOF (==) or, defensively, a short/empty cache line + // where the limit fell below the start (<). Never slice with lo > hi. globalsUnlock() break } - readOut.Data = append(readOut.Data, globals.dataCacheLinesContent[dataCacheLineTracker.contentStart+cacheLineOffsetStart:dataCacheLineTracker.contentStart+cacheLineOffsetLimit]...) - curOffset += cacheLineOffsetLimit - cacheLineOffsetStart + // Optimistic-lock copy: perform the cache-line -> reply memcpy WITHOUT + // holding the global lock, so concurrent warm reads do not serialize on + // it (the copy-under-lock was the warm-read throughput ceiling at high + // thread counts). globals.dataCacheLinesContent (the cache-line content buffer) and + // globals.dataCacheLinesTracker are fixed allocations for the life of the + // mount, so reading from them after unlock is memory-safe. The only hazard + // is that this line is evicted/refetched mid-copy (yielding wrong bytes); + // .contentGeneration is bumped on every free()/fetch(), so we snapshot it + // under the lock, copy unlocked, then re-validate under the lock and retry + // the same offset on mismatch. + copyGeneration = dataCacheLineTracker.contentGeneration + copySrcStart = dataCacheLineTracker.contentStart + cacheLineOffsetStart + copyLength = cacheLineOffsetLimit - cacheLineOffsetStart + copyDstStart = len(readOut.Data) + + globalsUnlock() + + readOut.Data = append(readOut.Data, globals.dataCacheLinesContent[copySrcStart:copySrcStart+copyLength]...) + + globalsLock("fission.go:DoRead:optimistic-copy-revalidate") + + if dataCacheLineTracker.contentGeneration != copyGeneration { + // The cache line was evicted/refetched while we copied; discard the + // bytes we appended and retry this same offset. + readOut.Data = readOut.Data[:copyDstStart] + globalsUnlock() + continue + } globalsUnlock() + + curOffset += copyLength } errno = 0 diff --git a/multi-storage-file-system/fission_test.go b/multi-storage-file-system/fission_test.go index 1fcc4bf4..4fba6096 100644 --- a/multi-storage-file-system/fission_test.go +++ b/multi-storage-file-system/fission_test.go @@ -1722,3 +1722,99 @@ func TestFissionConvertPhysicalToVirtual(t *testing.T) { t.Fatalf("DoLookup(ram,Name:\"dir2\") failed after conversion (errno: %v)", errno) } } + +// TestFissionDoReadFetchFailureReturnsEIO is a regression test for the cache-line +// fetch-failure deadlock. When a backend read failed, fetch() used to leave the +// cache line marked Clean with contentLength==0; a subsequent DoRead then computed +// an inverted slice (offsetLimit < offsetStart) and PANICKED while holding the +// global lock. DoRead's deferred metrics func re-acquired that non-reentrant lock +// during panic unwinding and self-deadlocked, wedging the entire filesystem. +// +// After the fix, DoRead must surface EIO for a failed cache line and evict it, +// without panicking and without leaking the global lock. +func TestFissionDoReadFetchFailureReturnsEIO(t *testing.T) { + var ( + errno syscall.Errno + fh uint64 + fileBIno uint64 + inHeader *fission.InHeader + inode *inodeStruct + lookupIn *fission.LookupIn + lookupOut *fission.LookupOut + ok bool + openOut *fission.OpenOut + ramDirIno uint64 + tracker *dataCacheLineTrackerStruct + ) + + fissionTestUp(t) + defer fissionTestDown(t) + + // Navigate to the RAM-backed fileB. + inHeader = &fission.InHeader{NodeID: FUSERootDirInodeNumber} + lookupIn = &fission.LookupIn{Name: []byte("ram")} + lookupOut, errno = globals.DoLookup(inHeader, lookupIn) + if errno != 0 { + t.Fatalf("DoLookup(root,\"ram\") failed (errno: %v)", errno) + } + ramDirIno = lookupOut.EntryOut.NodeID + + inHeader = &fission.InHeader{NodeID: ramDirIno} + lookupIn = &fission.LookupIn{Name: []byte("fileB")} + lookupOut, errno = globals.DoLookup(inHeader, lookupIn) + if errno != 0 { + t.Fatalf("DoLookup(ram,\"fileB\") failed (errno: %v)", errno) + } + fileBIno = lookupOut.EntryOut.NodeID + + // Open fileB read-only to obtain a file handle. + inHeader = &fission.InHeader{NodeID: fileBIno} + openOut, errno = globals.DoOpen(inHeader, &fission.OpenIn{Flags: fission.FOpenRequestRDONLY}) + if errno != 0 { + t.Fatalf("DoOpen(fileB, RDONLY) failed (errno: %v)", errno) + } + fh = openOut.FH + + // Inject a poisoned cache line for line 0: Clean but with a failed fetch + // (contentLength == 0) — exactly the state fetch() leaves on a backend error. + // Setting it up directly keeps the subsequent read on the cache-hit path and + // avoids depending on a flaky backend. + globals.Lock() + inode, ok = globals.inodeMap.get(fileBIno) + if !ok { + globals.Unlock() + t.Fatalf("inodeMap.get(fileBIno) returned !ok") + } + if inode.cacheMap == nil { + inode.cacheMap = make(map[uint64]uint64) + } + tracker = globals.dataCacheLineFreeLRU.popHead() + if tracker == nil { + globals.Unlock() + t.Fatalf("dataCacheLineFreeLRU.popHead() returned nil (no free cache lines)") + } + tracker.inodeNumber = inode.inodeNumber + tracker.lineNumber = 0 + tracker.contentLength = 0 + tracker.eTag = "" + tracker.fetchFailed = true + inode.cacheMap[0] = tracker.pos + globals.dataCacheLineCleanLRU.pushTail(tracker) + globals.Unlock() + + // Read line 0. Before the fix this panicked while holding the global lock and + // deadlocked the whole filesystem; it must now cleanly return EIO. + inHeader = &fission.InHeader{NodeID: fileBIno} + _, errno = globals.DoRead(inHeader, &fission.ReadIn{FH: fh, Offset: 0, Size: testFissionReadBufSize}) + if errno != syscall.EIO { + t.Fatalf("DoRead over a failed cache line returned errno %v (expected EIO %v)", errno, syscall.EIO) + } + + // The failed line must have been evicted so a later read re-fetches it. + globals.Lock() + _, ok = inode.cacheMap[0] + globals.Unlock() + if ok { + t.Fatalf("failed cache line was not evicted from inode.cacheMap after EIO") + } +} diff --git a/multi-storage-file-system/fs.go b/multi-storage-file-system/fs.go index 0bb1103e..1807b7c4 100644 --- a/multi-storage-file-system/fs.go +++ b/multi-storage-file-system/fs.go @@ -173,10 +173,16 @@ func processToMountList() { for dirName, backend = range globals.backendsToMount { delete(globals.backendsToMount, dirName) - err = backend.setupContext() - if err != nil { - globals.logger.Printf("[WARN] unable to setup backend context: %s (err: %v) [skipping]", dirName, err) - continue + // Skip if already set up. An AIStore backend referencing this one via + // manifest_gen_backend may have set up its context already (lazy setup); + // re-running setupContext would build a duplicate client and replace the + // context the AIStore backend already captured. + if backend.context == nil { + err = backend.setupContext() + if err != nil { + globals.logger.Printf("[WARN] unable to set up a backend context; skipping (check the backend's credentials/endpoint/prefix)") + continue + } } backend.nonce = fetchNonce() diff --git a/multi-storage-file-system/globals.go b/multi-storage-file-system/globals.go index 6c28d99e..351ee4e4 100644 --- a/multi-storage-file-system/globals.go +++ b/multi-storage-file-system/globals.go @@ -31,6 +31,16 @@ type backendConfigAIStoreStruct struct { authnTokenFile string // JSON/YAML "authn_token_file" default:"${AIS_AUTHN_TOKEN_FILE:=~/.config/ais/cli/auth.token}" provider string // JSON/YAML "provider" default:"s3" timeout time.Duration // JSON/YAML "timeout" default:30000 + // `manifestGenBackendName` (JSON/YAML "manifest_gen_backend", default "") names + // another backend defined in this config (by its dir_name). When set, the AIStore + // backend routes LIST/STAT-DIR operations (manifest generation, readdir) to that + // backend's context (e.g. direct S3/GCS) while object reads (readFile/statFile) + // still flow through AIS so its caching/fast-tier applies. This avoids AIS's slow + // cold cross-cloud listing without re-declaring the source backend's credentials. + manifestGenBackendName string + // `manifestGenBackend` is the resolved pointer to the backend named by + // manifestGenBackendName (set in a post-parse resolution pass). Runtime state. + manifestGenBackend *backendStruct } // `backendConfigGCSStruct` describes a backend's GCS-specific settings. @@ -332,6 +342,7 @@ type dataCacheLineTrackerStruct struct { inodeNumber uint64 // Reference to an inodeStruct.inodeNumber lineNumber uint64 // Identifies file/object range covered by content as up to [lineNumber * globals.config.cacheLineSize:(lineNumber + 1) * global.config.cacheLineSize) eTag string // If state == CacheLineClean, value of inodeStruct.eTag when when fetched from backend; Otherwise, == "" + fetchFailed bool // Set when the backend read populating this line failed; DoRead surfaces this as EIO and evicts the line instead of serving empty/short content } // `inodeStruct` contains the state of an inode.