INF-1318 flock config.yaml against concurrent writers#1
Conversation
Han-9 from the adversarial review: the load-modify-write cycle in SetContext / UseContext / DeleteContext was not protected against two twoctl invocations racing on the same config.yaml. The later os.Rename won; the earlier update was lost. Realistic trigger: background 'twoctl config use-context' while a foreground 'auth login' completes. Now wrapped with withLock(), an exclusive advisory flock on config.yaml.lock. Implementation split by build tag: - lock_unix.go: syscall.Flock(LOCK_EX) - blocking, per-fd, released on file close. Works on linux, darwin, *bsd. - lock_windows.go: no-op for now. Windows residual race documented; proper fix needs LockFileEx via golang.org/x/sys/windows in a follow-up. - lock.go: shared dirOf helper. No new third-party deps. Test confirms unix serialisation: 10 goroutines contending for the same lockfile never enter the critical section concurrently.
There was a problem hiding this comment.
Code Review
This pull request implements a file-based locking mechanism using flock on Unix systems to prevent race conditions during configuration updates in SetContext, UseContext, and DeleteContext. Feedback includes a recommendation to include keychain operations within the lock in DeleteContext to ensure data consistency, as well as a suggestion to improve the test suite by asserting strict serialization on Unix-like platforms.
| if err := withLock(p, func() error { | ||
| cfg, err := LoadFile() | ||
| if err != nil { | ||
| return err | ||
| } | ||
| if _, ok := cfg.Contexts[name]; !ok { | ||
| return fmt.Errorf("context %q does not exist", name) | ||
| } | ||
| delete(cfg.Contexts, name) | ||
| if cfg.CurrentContext == name { | ||
| cfg.CurrentContext = "" | ||
| } | ||
| return SaveFile(cfg) | ||
| }); err != nil { | ||
| return err | ||
| } | ||
| return DeleteKey(name) |
There was a problem hiding this comment.
Similar to SetContext, DeleteKey is called outside the lock. A racing SetContext and DeleteContext could result in a context existing in config.yaml but having its key deleted from the keychain, or vice versa. The keyring deletion should be wrapped in the same lock as the file update to maintain consistency.
| if err := withLock(p, func() error { | |
| cfg, err := LoadFile() | |
| if err != nil { | |
| return err | |
| } | |
| if _, ok := cfg.Contexts[name]; !ok { | |
| return fmt.Errorf("context %q does not exist", name) | |
| } | |
| delete(cfg.Contexts, name) | |
| if cfg.CurrentContext == name { | |
| cfg.CurrentContext = "" | |
| } | |
| return SaveFile(cfg) | |
| }); err != nil { | |
| return err | |
| } | |
| return DeleteKey(name) | |
| return withLock(p, func() error { | |
| cfg, err := LoadFile() | |
| if err != nil { | |
| return err | |
| } | |
| if _, ok := cfg.Contexts[name]; !ok { | |
| return fmt.Errorf("context %q does not exist", name) | |
| } | |
| delete(cfg.Contexts, name) | |
| if cfg.CurrentContext == name { | |
| cfg.CurrentContext = "" | |
| } | |
| if err := SaveFile(cfg); err != nil { | |
| return err | |
| } | |
| return DeleteKey(name) | |
| }) |
| // On Unix, withLock holds an exclusive flock per call so the inner | ||
| // counter never exceeds 1. On Windows (no-op) it can. | ||
| if maxSeen < 1 { | ||
| t.Fatalf("test never entered critical section") | ||
| } |
There was a problem hiding this comment.
The test currently only verifies that the critical section was entered at least once (maxSeen < 1). It does not actually assert that serialization occurred on platforms where withLock is implemented (Unix). On Unix-like systems, maxSeen should be exactly 1. Note that you will need to import the runtime package to check runtime.GOOS.
// On Unix, withLock holds an exclusive flock per call so the inner
// counter never exceeds 1.
if maxSeen < 1 {
t.Fatalf("test never entered critical section")
}
if runtime.GOOS != "windows" && maxSeen > 1 {
t.Errorf("withLock failed to serialise on %s: max concurrent workers = %d", runtime.GOOS, maxSeen)
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3aef89b3d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if maxSeen < 1 { | ||
| t.Fatalf("test never entered critical section") |
There was a problem hiding this comment.
Assert lock serialization in contention test
This assertion does not verify the behavior the test claims to cover: maxSeen < 1 only checks that the critical section ran at least once, so the test still passes even if withLock allows full concurrency (e.g., a future Unix regression to no-op locking). Because this test is intended to guard the lock’s correctness, it should fail when more than one goroutine enters simultaneously (on Unix), otherwise it provides false confidence and won’t catch lock breakage.
Useful? React with 👍 / 👎.
Summary
SetContext/UseContext/DeleteContextwith an exclusive advisory flock onconfig.yaml.lockso twotwoctlinvocations can't lose each other's updates.syscall.Flock(LOCK_EX)(stdlib, no new third-party dep). Windows is a no-op for now with the residual race documented in the file header — proper fix needsLockFileExviagolang.org/x/sys/windowsand is tracked as the Windows-specific follow-up to Han-9.dirOflives in a sharedlock.goso both build-tag variants pick it up.Addresses Han-9 from the adversarial review (INF-1318).
Test plan