Sync with upstream 2.35 by edolstra · Pull Request #530 · DeterminateSystems/nix-src

edolstra · 2026-07-02T19:13:57Z

Motivation

WIP, not finished yet.

Context

We should never call reset() on a value (such as vRes) that can be seen by another thread. This was causing random failures about 'partially applied built-in function' etc. (cherry picked from commit 839aec2)

Resolves NixOS#15420

Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>

…ToPath

This is totally equivalent on unix, because the path is guaranteed to be absolute (there's a check right before the call to rootPath) and absPath is equivalent to canonPath when the path is absolute and resolveSymlinks is false (as in our case). There are a couple of reasons we want to do this:

* Portability. Windows should always use unix-style paths in the evaluator.
* Performance. We don't want to pay the overhead of constructing a std::filesystem::path.

Windows build can evaluate flakes

Fix "due not" → "do not" typo, and replace the incomplete sentence "Different stores that disagree." with a complete, accurate explanation.

We can reuse the asynchronous graph traversal logic from computeClosure to traverse the "references" edges asynchronously to avoid relying on the narinfo disk cache. queryMissing will get the same treatment. This is limited to the binary cache store implementation so that we don't needlessly spawn coroutines when the underlying queryPathInfo is synchronous.

Coroutines are cheap and the FileTransfer already implements its own rate limiting. This is needed just to limit the maximum number of coroutines in flight.
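The idea of capping the number of coroutines in flight can be sketched in Python with asyncio. This is only an illustration of the bounding technique, not the C++ code in computeClosure: the names for_each_async, handler, and max_concurrent are hypothetical, and the worker-pool approach is an assumption about one reasonable way to enforce such a cap.

```python
import asyncio


async def for_each_async(items, handler, max_concurrent=4):
    """Run handler(item) for every item with at most max_concurrent in flight.

    Hypothetical helper: a fixed pool of workers pulls from one shared
    iterator, so no more than max_concurrent handlers are ever active.
    """
    it = iter(items)

    async def worker():
        # next() on the shared iterator is synchronous, so in single-threaded
        # asyncio each item is handed to exactly one worker.
        for item in it:
            await handler(item)

    await asyncio.gather(*(worker() for _ in range(max_concurrent)))


def demo():
    """Process 10 items with a cap of 3 and report the peak concurrency."""
    active = 0
    peak = 0
    seen = []

    async def handler(x):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)  # stand-in for an asynchronous query
        seen.append(x)
        active -= 1

    asyncio.run(for_each_async(range(10), handler, max_concurrent=3))
    return peak, sorted(seen)
```

Because the workers suspend while awaiting, the pool does not busy-loop; raising max_concurrent only raises the ceiling on simultaneously active handlers.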

Make queryMissing, querySubstitutablePaths, topoSortPaths async

Fix compatibility with lowdown 3

In 08887ca I neglected the case of interrupts and got bitten by destruction order. fd gets closed before FdSink destructor gets a chance to run. This doesn't matter in the successful code path, but does get hit during interrupts and leads to a bunch of annoying ignored error messages:

error (ignored): write of 16384 bytes: Bad file descriptor
error (ignored): write of 16384 bytes: Bad file descriptor

…in-destructor libutil/fs-sink: Flush FdSink before closing the file descriptor

Make even more parts of the evaluator thread-safe

This wasn't handling Interrupted, since it's a subclass of BaseError, not Error. Since this is called from handleExceptions() while printing exceptions, the uncaught exception here was causing Nix to crash with a call to std::unexpected().
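The failure mode described here (a handler that catches only one branch of the exception hierarchy) can be shown with a small Python analogy. It is not Nix code: the class names mirror the ones mentioned in the commit message, while get_source_narrow, get_source_hardened, and load are hypothetical names invented for the sketch.

```python
class BaseError(Exception):
    """Stand-in for the root of the hierarchy."""


class Error(BaseError):
    """Ordinary errors."""


class Interrupted(BaseError):
    """Sibling of Error, so `except Error` does not catch it."""


def get_source_narrow(load):
    # Only handles Error; an Interrupted raised by load() escapes.
    try:
        return load()
    except Error:
        return None


def get_source_hardened(load):
    # Catches everything, so a caller that is already reporting an
    # exception never sees a second one from this function.
    try:
        return load()
    except Exception:
        return None


def raise_interrupted():
    raise Interrupted("interrupted by the user")


def narrow_leaks():
    """Return True if the narrow handler lets Interrupted escape."""
    try:
        get_source_narrow(raise_interrupted)
    except Interrupted:
        return True
    return False
```

In this sketch narrow_leaks() reports that the exception escapes the narrow handler, while get_source_hardened(raise_interrupted) simply returns None.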

This hardens Nix against uncaught exceptions (like Interrupted) while printing exceptions.

My SNAFU from initially implementing computeClosure and callbackToAwaitable.

* computeClosure must ensure that all callbacks have been invoked before destroying the event loop. That wasn't the case when we hit an early error.
* Callbacks must not take (shared) ownership of the shared_ptr when the completion has been run. I ran into a crash due to the shared_ptr being destroyed on the libcurl thread:

0x00007f64d2b99e4e _ZNSt23_Sp_counted_ptr_inplaceIN5boost4asio6detail23strand_executor_service11strand_implENS1_17execution_context9allocatorIvEELN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv (libnixstore.so.2.35.0 + 0x2bae4e)
0x00007f64d2c1b254 _ZN5boost4asio9execution6detail22shared_target_executor4implINS0_6strandINS0_15any_io_executorEEEED0Ev (libnixstore.so.2.35.0 + 0x33c254)
0x00007f64d2c13508 _ZNSt17_Function_handlerIFvSt6futureIN3nix3refIKNS1_13ValidPathInfoEEEEEZZNS1_19callbackToAwaitableIS5_ZZNS1_L32querySubstitutablePathInfosAsyncERNS1_5StoreERKSt3mapINS1_9StorePathESt8optionalINS1_14ContentA>
0x00007f64d2c8a23b _ZNSt23_Sp_counted_ptr_inplaceIN3nix8CallbackINS0_3refIKNS0_13ValidPathInfoEEEEESaIvELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libnixstore.so.2.35.0 + 0x3ab23b)
0x00007f64d2a424fa _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv (libnixstore.so.2.35.0 + 0x1634fa)
0x00007f64d2c8b9aa _ZNSt17_Function_handlerIFvSt6futureISt10shared_ptrIKN3nix13ValidPathInfoEEEEZNS2_5Store13queryPathInfoERKNS2_9StorePathENS2_8CallbackINS2_3refIS4_EEEEEUlS6_E_E10_M_managerERSt9_Any_dataRKSI_St18_Manager_ope>
0x00007f64d2a7196b _ZNSt23_Sp_counted_ptr_inplaceIN3nix8CallbackISt10shared_ptrIKNS0_13ValidPathInfoEEEESaIvELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libnixstore.so.2.35.0 + 0x19296b)
0x00007f64d2a424fa _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv (libnixstore.so.2.35.0 + 0x1634fa)
0x00007f64d2a5b5c2 _ZNSt17_Function_handlerIFvSt6futureISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEEZN3nix16BinaryCacheStore21queryPathInfoUncachedERKNSB_9StorePathENSB_8CallbackISt10shared_ptrIKNSB_13Va>
0x00007f64d2b9961b _ZNSt23_Sp_counted_ptr_inplaceIN3nix8CallbackISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEESaIvELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libnixstore.so.2.35.0 + 0x2ba61b)
0x00007f64d2a424fa _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv (libnixstore.so.2.35.0 + 0x1634fa)
0x00007f64d2b9132a _ZNSt17_Function_handlerIFvSt6futureIN3nix18FileTransferResultEEEZNS1_20HttpBinaryCacheStore7getFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS1_8CallbackISt8optionalISB_EEEEUlS3_E_E10_M_manage>
0x00007f64d2b38f54 _ZN3nix16curlFileTransfer12TransferItemD2Ev (libnixstore.so.2.35.0 + 0x259f54)
0x00007f64d2a4d252 _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE24_M_release_last_use_coldEv (libnixstore.so.2.35.0 + 0x16e252)
0x00007f64d2b47555 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN3nix16curlFileTransferC4ERKNS3_20FileTransferSettingsEEUlvE_EEEEE6_M_runEv (libnixstore.so.2.35.0 + 0x268555)
0x00007f64d203ffa4 execute_native_thread_routine (libstdc++.so.6 + 0xf2fa4)
0x00007f64d1db2d53 start_thread (libc.so.6 + 0x9dd53)
0x00007f64d1e3a63c __clone3 (libc.so.6 + 0x12563c)

So apparently this had a very high overhead with the loop constantly enqueueing more work into the executor, which wastes cpu cycles needlessly.

libutil: throw Error instead of asserting in `AbsolutePath` constructors

During the initial implementation I messed up here and also included the whole closure. The invariant of topoSortPaths is that it doesn't include more stuff than what's present in the starting set. The code is updated to reflect that.
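The invariant can be illustrated with a short Python sketch: a topological sort that follows references only when they point back into the starting set, so the result never grows beyond its input. This is not the Nix implementation. The function name, the dict-based reference map, and the choice to emit references before referrers are all assumptions made for the example, and the actual output order in Nix may differ.

```python
def topo_sort_paths(paths, references):
    """Topologically sort `paths` without pulling in the rest of the closure.

    paths: set of path names (the starting set).
    references: dict mapping a path to the paths it references.
    Returns a list in which each path comes after the in-set paths it references.
    """
    visited = set()
    in_progress = set()
    order = []

    def visit(path):
        if path in visited:
            return
        if path in in_progress:
            raise ValueError(f"cycle detected at {path!r}")
        in_progress.add(path)
        for ref in sorted(references.get(path, ())):
            # Edges leaving the starting set are ignored: this is what keeps
            # the result from expanding into the whole closure.
            if ref in paths and ref != path:
                visit(ref)
        in_progress.discard(path)
        visited.add(path)
        order.append(path)

    for path in sorted(paths):
        visit(path)
    return order
```

For example, with the starting set {"a", "b"} where "a" references "b" and "c", and "b" references "c", the sketch returns ["b", "a"]; "c" is referenced but never added because it was not in the starting set.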

@docroot

File transfer retries now use AWS-style "full jitter" exponential backoff, treat HTTP 503 as rate-limited (same longer delay as 429), and honor the Retry-After response header.

New nix.conf settings:

- filetransfer-retry-attempts (was download-attempts, old name aliased)
- filetransfer-retry-delay (100ms): base for transient errors
- filetransfer-retry-delay-rate-limited (5000ms): base for 429/503
- filetransfer-retry-max-delay (60000ms): ceiling on backoff growth
- filetransfer-retry-jitter (true): enable full jitter

Per-substituter overrides are available as store URL parameters (retry-delay, retry-delay-rate-limited, retry-max-delay, retry-attempts), e.g. s3://bucket?retry-attempts=8. The override docstrings link to the corresponding nix.conf settings via @docroot@.

Implementation notes:

- computeRetryDelayMs + RetryDelayParams + clampedExponential + saturateMs live in filetransfer-impl.hh, included only by filetransfer.cc and unit tests, keeping <random> out of the public header
- clampedExponential widens to uint64_t for the intermediate shift to avoid uint32_t overflow
- Retry-After parsing uses std::chrono (seconds->ms conversion is overflow-safe since milliseconds::rep is >=45 bits); saturateMs narrows back to uint32_t once at the boundary
- retryAfterMs is consumed via std::exchange so it applies to at most one retry attempt (otherwise a transport-level failure after a 503 would reuse the stale server-provided floor)
- the retry-guard conditional is a canRetry() IIFE for readability; the Range+Content-Encoding interaction is documented inline
- HttpStatus enum struct replaces magic status numbers; std::to_underlying is used where a raw long is required

This addresses thundering-herd scenarios where many CI jobs hit the same S3 prefix and receive 503 SlowDown; previously the retry window for 503 was only ~4 seconds.

Closes NixOS#15023
Part of NixOS#15419
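As a rough illustration of the backoff scheme described in this commit message, here is a minimal Python sketch of full-jitter exponential backoff with a ceiling and a Retry-After floor. It is not the code in filetransfer-impl.hh. The function names are Python-style stand-ins, and several details are assumptions: attempts are numbered from 1, the shift is clamped at 32, and a server-provided Retry-After is allowed to exceed the ceiling. The default values are the ones quoted in the settings list.

```python
import random

UINT32_MAX = 2**32 - 1


def clamped_exponential(base_ms, attempt, max_ms):
    """Return base_ms * 2**(attempt - 1), capped at max_ms.

    The shift is clamped (assumed limit: 32). Python integers do not
    overflow, so the clamp only mirrors what a fixed-width version needs.
    """
    shift = min(max(attempt - 1, 0), 32)
    return min(base_ms << shift, max_ms)


def compute_retry_delay_ms(attempt, base_ms=100, max_ms=60_000,
                           jitter=True, retry_after_ms=None, rng=None):
    """Delay in milliseconds before retry number `attempt` (1-based, assumed)."""
    rng = rng or random.Random()
    ceiling = clamped_exponential(base_ms, attempt, max_ms)
    # Full jitter: pick uniformly from [0, ceiling] instead of sleeping
    # for the full exponential value.
    delay = rng.randint(0, ceiling) if jitter else ceiling
    # Retry-After acts as a floor on the computed delay (assumed semantics).
    if retry_after_ms is not None:
        delay = max(delay, retry_after_ms)
    # Saturate to the uint32 range once, at the boundary.
    return min(delay, UINT32_MAX)
```

With jitter disabled, the transient-error base of 100 ms gives 100, 200, 400, 800 ms for attempts 1 to 4 and levels off at 60000 ms; with the rate-limited base of 5000 ms, the third attempt waits 20000 ms. With jitter enabled, each of those values becomes the upper bound of a uniform draw, which is what spreads out clients that failed at the same moment.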

Table-driven parameterized suite (36 cases) covering exponential growth, ceiling interaction, Retry-After floor semantics, shift clamp boundary at attempt 32/33, and integer-extreme inputs. Plus 5 jitter loop tests for stochastic bounds.

runCommand derivation spinning up a Python HTTP server that returns 503 with Retry-After, verifying that nix retries with the expected backoff timing. Registered in hydra.nix.

libutil: Fix computeClosure and callbackToAwaitable lifetime issues

coderabbitai · 2026-07-02T19:14:07Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b02625e-1c50-4fbe-bb0a-a4af92ba348e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


xokdvium and others added 30 commits March 14, 2026 15:09

libutil: Make PosTable thread-safe too

c0bf78a

callFunction(): Create the primop app chain safely

e94e15e

We should never call reset() on a value (such as vRes) that can be seen by another thread. This was causing random failures about 'partially applied built-in function' etc. (cherry picked from commit 839aec2)

Fix compatibility with lowdown 3

342faaa

Resolves NixOS#15420

Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>

libstore: Use CURLSSLOPT_NATIVE_CA by default on windows

0dbe5f0

Merge pull request NixOS#15471 from NixOS/windows-can-evaluate-flakes

a776974

Windows build can evaluate flakes

doc: fix typo and clarify requisites description in store-object

7ba80df

Fix "due not" → "do not" typo, and replace the incomplete sentence "Different stores that disagree." with a complete, accurate explanation.

doc: fix wording of paragraph discussing informal property

a86de00

libstore: Make Store::querySubstitutablePathInfos async

d2798d7

libstore: Make queryMissing async too

624a2d0

libutil: Crank up maxConcurrent in computeClosure

af52a18

Coroutines are cheap and the FileTransfer already implements its own rate limiting. This is needed just to limit the maximum number of coroutines in flight.

Merge pull request NixOS#15474 from NixOS/async-for-days

5e3c9f9

Make queryMissing, querySubstitutablePaths, topoSortPaths async

Merge pull request NixOS#15475 from lisanna-dettwyler/support-lowdown-3

a1b4805

Fix compatibility with lowdown 3

Merge pull request NixOS#15486 from NixOS/restore-regular-file-flush-…

e656f41

…in-destructor libutil/fs-sink: Flush FdSink before closing the file descriptor

Merge pull request NixOS#15467 from NixOS/even-more-thread-safety

531132b

Make even more parts of the evaluator thread-safe

Pos::getSource(): Catch all exceptions

39f0fc6

This wasn't handling Interrupted, since it's a subclass of BaseError, not Error. Since this is called from handleExceptions() while printing exceptions, the uncaught exception here was causing Nix to crash with a call to std::unexpected().

Handle exceptions during logError()

6a1fd5f

This hardens Nix against uncaught exceptions (like Interrupted) while printing exceptions.

libutil: throw Error instead of asserting in AbsolutePath constructors

8be0039

libutil/async: Don't spin in forEachAsync

1c74bd0

So apparently this had a very high overhead with the loop constantly enqueueing more work into the executor, which wastes cpu cycles needlessly.

Merge pull request NixOS#15491 from obsidiansystems/absolutepath-throw

2568770

libutil: throw Error instead of asserting in `AbsolutePath` constructors

Add FIXME

67dfcda

test(nix): add filetransfer-retry-backoff integration test

e45b4e5

runCommand derivation spinning up a Python HTTP server that returns 503 with Retry-After, verifying that nix retries with the expected backoff timing. Registered in hydra.nix.

docs: add release note for configurable HTTP retry backoff

fa74911

Merge pull request NixOS#15485 from NixOS/fix-computeClosure-on-errors

e24bfb9

libutil: Fix computeClosure and callbackToAwaitable lifetime issues

edolstra added 28 commits July 2, 2026 19:52

Merge commit '4239a7ae2c7e79c567eacdbe2ab56195796acd91' into sync-2.35

1370919

Merge commit '9b8f2ef2adc69a43dab0a602e632cc229e7c0d90' into sync-2.35

5b640ef

Merge commit 'e6ac69fbb8a25b36fe6d3c279a475373dc1ce7c5' into sync-2.35

5faa267

Merge commit 'a39d724a35133dd1feaeed2deb841307e79d6f88' into sync-2.35

254138c

Merge commit '22531ce276bf762dc5eb96159568f16b06df6a52' into sync-2.35

37017d8

Merge commit '5b18a3160b693c25aaf7a716381caf555304ce14' into sync-2.35

79f650a

Merge commit '45e6194c49b5582be23a14775109753742894731' into sync-2.35

0bcccca

Merge commit 'c047acd88bda4fc13a5d2c1aef746acf7f13a02c' into sync-2.35

16409ca

Merge commit '0eff3eefb68b071fe5a13c50247cd73d8f52d7f6' into sync-2.35

572112f

Merge commit '202111d710cb3a1db541790119254be7bbd2ec9a' into sync-2.35

086dfeb

Merge commit '8bdee4d8dbc1100c0a467312256e1bcca41ffe72' into sync-2.35

6b84269

Merge commit '4a2fc610d77f6ffc7b2f4c767c82dc4447ea434b' into sync-2.35

13d9e6a

Merge commit 'b348d9616ca158da92df8cf9ea45e2ed119c335a' into sync-2.35

b933d40

Fix test

365b572

Merge commit '7d9f7cb4d1da38ae77abaeaf74127aeed4f54fa7' into sync-2.35

ebf0fe5

Merge commit '23ceb797b4545044fa4d8d5f91627865baa0aeb4' into sync-2.35

16dd7f4

Merge commit '75f87497be837620423e1548a35d364c015fd811' into sync-2.35

5ebccbf

Merge commit '582b36168f419c90efd8fa7326e517a6a357cddf' into sync-2.35

58f8c96

Merge commit '2d286fde251c486e0771272dbadc47d23ce33b56' into sync-2.35

8c13d64

Merge commit 'a4b847b4d9dc350125eb7259ecd8bd44d67f39a6' into sync-2.35

2bda550

Merge commit '074d03c26218f95aa5826cf50da5bef5ad927b9f' into sync-2.35

7bc3255

Merge commit '3512a88212e017e880c2c5a14b135011547cab15' into sync-2.35

1ced51e

Merge commit 'dde0b10775907ebc748cacfc3e1d1831a98263e4' into sync-2.35

090fe61

Merge commit '25405812fc5ce64b719dace172c21d2301829deb' into sync-2.35

ee06671

Merge commit '1e63e4d24d11bb450d1356e59ac29c4678c4f27e' into sync-2.35

f064196

Merge commit '8661c9ab14115b47d9230e2a4849549bea35d4a5' into sync-2.35

21421ed

Merge commit 'dd3f676d093aceebccd3cf3f575c11441f01a47b' into sync-2.35

770d6e7

Merge commit '7edcd0a24dc71abb7caa600527833ef540c1bc86' into sync-2.35

0a043f4

edolstra added the flake-regression-test label (Run the flake regressions test suite on this PR) on Jul 2, 2026

edolstra wants to merge 395 commits into main from sync-2.35


17 participants
