
tree-ops optimizations for large trees #90

Merged: DrChainsaw merged 2 commits into shashi:master from DrChainsaw:tree-ops-opt on Jun 11, 2026

DrChainsaw (Collaborator) commented Jun 10, 2026

Some optimizations for large trees and slow disks:

  • Rewrote a couple of tree-ops (mainly regex_rewrite_tree, which powers mv and cp) to avoid excessive copying.
  • Changed path and Path to drastically reduce the number of allocations from concatenating strings.
  • Added paralleldepth and follow_symlinks to the FileTree constructor.
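
For illustration only (this is not the code from this PR): a minimal sketch of why building a path by repeated concatenation allocates more than joining the segments once. The helper names `path_concat` and `path_joined` are hypothetical and not part of FileTrees.jl.

```julia
# Hypothetical helpers for illustration; not part of FileTrees.jl.

# Naive: each loop iteration builds a new, longer intermediate string.
function path_concat(segments)
    p = ""
    for (i, s) in enumerate(segments)
        p = i == 1 ? s : p * "/" * s
    end
    return p
end

# Single pass: `join` writes all segments into one buffer and
# produces the final string once.
path_joined(segments) = join(segments, '/')

segments = ["root", "1", "42", "data_1_42"]
println(path_concat(segments))
println(path_joined(segments))
```

Both helpers return the same string; the difference is only in how many intermediate strings are created on the way, which adds up when it happens for every node of a large tree.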

I realized that the pre-walkdir implementation followed symlinks by default, whereas walkdir defaults to follow_symlinks=false, which technically made that commit breaking. Defaulting follow_symlinks to true fixes that breakage. Counterintuitively, follow_symlinks=true is also the faster option: when it is set to false, walkdir has to stat each directory to test whether it is a symlink (not a huge cost, since this generally does not touch the filesystem, but still).
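
To make the behavioral difference concrete, here is a small sketch that uses only Base's `walkdir` (not FileTrees itself). The temporary directory layout and the helper `relfiles` are made up for the demo, and it assumes a platform where creating symlinks is permitted.

```julia
# Illustration with Base only; the directory layout is invented for the demo.

# Collect all file paths under `root`, relative to `root`.
function relfiles(root; follow_symlinks)
    out = String[]
    for (dir, _, files) in walkdir(root; follow_symlinks=follow_symlinks)
        for f in files
            push!(out, relpath(joinpath(dir, f), root))
        end
    end
    return sort!(out)
end

tmp = mktempdir()
root = joinpath(tmp, "root")
outside = joinpath(tmp, "outside")
mkpath(joinpath(root, "a"))
mkpath(outside)
touch(joinpath(root, "a", "x.txt"))
touch(joinpath(outside, "y.txt"))
symlink(outside, joinpath(root, "link"))   # directory symlink inside root

followed   = relfiles(root; follow_symlinks=true)
unfollowed = relfiles(root; follow_symlinks=false)

println("follow_symlinks=true:  ", followed)
println("follow_symlinks=false: ", unfollowed)
```

With `follow_symlinks=false` the walk does not descend into `link`, so `y.txt` is only found when symlinks are followed. A tree built on top of `walkdir` would differ in the same way between the two settings.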

mv speedup example:

julia> selfirst(x,y) = x;

julia> t1 = maketree("root" => [string(x) => [string(y) => [(name="data_$(x)_$(y)", value=(x,y))] for y in 1:100] for x in 1:100]);

0.4.3 (after warmup):

julia> length(files(t1))
10000

julia> @time mv(t1, r"(0|13)", s"4"; combine=selfirst);
 21.483043 seconds (150.06 M allocations: 6.289 GiB, 4.15% gc time)

This PR (after warmup):

julia> length(files(t1))
10000

julia> @time mv(t1, r"(0|13)", s"4"; combine=selfirst);
  0.044058 seconds (671.86 k allocations: 24.961 MiB)

FileTree ctor example:

julia> @time FileTree(evilnfsdir);
186.243081 seconds (7.95 M allocations: 701.794 MiB, 0.70% gc time)

julia> @time FileTree(evilnfsdir);
 87.029311 seconds (7.95 M allocations: 701.799 MiB, 1.43% gc time)

julia> @time FileTree(evilnfsdir; paralleldepth=Inf);
 34.147295 seconds (8.18 M allocations: 719.123 MiB, 2.28% gc time, 47 lock conflicts)

julia> @time FileTree(evilnfsdir; paralleldepth=Inf);
 13.578561 seconds (8.18 M allocations: 718.286 MiB, 8.87% gc time, 51 lock conflicts)

@ghyatzo: This might be interesting for you.

DrChainsaw merged commit 44d69d6 into shashi:master Jun 11, 2026
3 checks passed
DrChainsaw deleted the tree-ops-opt branch June 11, 2026 12:22