Initialize SKObject ownedObjects/keepAliveObjects ConcurrentDictionary with concurrency level and capacity by nietras · Pull Request #4182 · mono/SkiaSharp

nietras · 2026-06-17T12:57:11Z

Just a suggestion for what a change might look like for #4181.
It must be evaluated against whether normal usage matches these assumptions.
I am assuming the normal/default use case is one UI thread, as in Avalonia. This does not prevent usage with more threads. But are more buckets/lock objects really necessary for that anyway, given this is simple "pointer" storage?

github-actions · 2026-06-17T12:57:24Z

📦 Try the packages from this PR

Warning

Do not run these scripts without first reviewing the code in this PR.

Step 1 — Download the packages

bash / macOS / Linux:

curl -fsSL https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.sh | bash -s -- 4182

PowerShell / Windows:

iex "& { $(irm https://raw.githubusercontent.com/mono/SkiaSharp/main/scripts/get-skiasharp-pr.ps1) } 4182"

Step 2 — Add the local NuGet source

dotnet nuget add source ~/.skiasharp/hives/pr-4182/packages --name skiasharp-pr-4182

More options

| Option | Description |
|--------|-------------|
| `--successful-only` / `-SuccessfulOnly` | Only use successful builds |
| `--force` / `-Force` | Overwrite previously downloaded packages |
| `--list` / `-List` | List available artifacts without downloading |
| `--build-id ID` / `-BuildId ID` | Download from a specific build |

Or download manually from Azure Pipelines — look for the nuget artifact on the build for this PR.

Remove the source when you're done:

dotnet nuget remove source skiasharp-pr-4182

mattleibow · 2026-06-24T22:38:06Z

📊 Benchmark: allocations for `SKSurface.Canvas`

I added a BenchmarkDotNet benchmark to benchmarks/SkiaSharp.Benchmarks that exercises the exact path from #4181 — accessing SKSurface.Canvas, which lazily creates the owner's OwnedObjects ConcurrentDictionary. A fresh surface is created per invocation so the dictionary allocation happens on every operation. [MemoryDiagnoser] reports managed allocations per op.

Both runs are identical except for SKObject.cs: before = main (default new ConcurrentDictionary<…>()), after = this PR (concurrencyLevel: 1, capacity: 1).

Before (default constructor — `main`)

BenchmarkDotNet=v0.13.5, OS=macOS 26.5 (25F71) [Darwin 25.5.0]
Apple M3 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK=10.0.201
  [Host] : .NET 10.0.5 (10.0.526.15411), Arm64 RyuJIT AdvSIMD

Toolchain=InProcessEmitToolchain  

|           Method |     Mean |     Error |    StdDev |   Median |   Gen0 | Allocated |
|----------------- |---------:|----------:|----------:|---------:|-------:|----------:|
| GetSurfaceCanvas | 3.746 us | 0.2367 us | 0.6677 us | 3.936 us | 0.1869 |   1.54 KB |

After (this PR — `concurrencyLevel: 1, capacity: 1`)

BenchmarkDotNet=v0.13.5, OS=macOS 26.5 (25F71) [Darwin 25.5.0]
Apple M3 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK=10.0.201
  [Host] : .NET 10.0.5 (10.0.526.15411), Arm64 RyuJIT AdvSIMD

Toolchain=InProcessEmitToolchain  

|           Method |     Mean |     Error |    StdDev |   Gen0 | Allocated |
|----------------- |---------:|----------:|----------:|-------:|----------:|
| GetSurfaceCanvas | 3.496 us | 0.0444 us | 0.0371 us | 0.0687 |     600 B |

Result

Allocations for SKSurface.Canvas drop from 1.54 KB → 600 B (~62% less, ≈977 B/op) and Gen0 collections fall from 0.1869 → 0.0687 per 1000 ops, with no measurable change in mean time. The dictionary's default concurrencyLevel equals the CPU core count (12 here, so 12 lock objects), so the saving grows on machines with more cores — this is why @nietras observed ~31 object allocations.

Benchmark source

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

namespace SkiaSharp.Benchmarks;

[MemoryDiagnoser]
[Config(typeof(Config))]
public class SKObjectBenchmark
{
	private class Config : ManualConfig
	{
		public Config() =>
			AddJob (Job.Default.WithToolchain (InProcessEmitToolchain.Instance));
	}

	private readonly SKImageInfo info = new SKImageInfo (256, 256);

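	[Benchmark]
	public SKCanvas GetSurfaceCanvas ()
	{
		// A fresh surface per invocation, so the lazy OwnedObjects dictionary is allocated on every operation.
		using var surface = SKSurface.Create (info);
		// The first access to Canvas creates the owner's OwnedObjects dictionary.
		return surface.Canvas;
	}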
}

Run with: dotnet run -c Release --project benchmarks/SkiaSharp.Benchmarks -- --filter '*SKObjectBenchmark*'
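
For a quick check without BenchmarkDotNet, the constructor cost alone can be approximated with `GC.GetAllocatedBytesForCurrentThread`. This is only a minimal sketch, not the benchmark used above: `MeasureAllocatedBytes` is a hypothetical helper, `object` stands in for `SKObject`, and the byte counts will vary with runtime version and core count.

```csharp
using System;
using System.Collections.Concurrent;

// Default constructor: concurrency level and capacity are chosen by the runtime.
Func<object> createDefault = () => new ConcurrentDictionary<IntPtr, object>();
// Explicit minimal sizing, as proposed in this PR.
Func<object> createSmall = () => new ConcurrentDictionary<IntPtr, object>(concurrencyLevel: 1, capacity: 1);

long defaultBytes = MeasureAllocatedBytes(createDefault);
long smallBytes = MeasureAllocatedBytes(createSmall);

Console.WriteLine($"default ctor: {defaultBytes} B");
Console.WriteLine($"(1, 1) ctor:  {smallBytes} B");

// Hypothetical helper: managed bytes allocated on this thread by one call to create().
static long MeasureAllocatedBytes(Func<object> create)
{
    create(); // warm-up, so JIT and type-initialization allocations are not counted
    long before = GC.GetAllocatedBytesForCurrentThread();
    object instance = create();
    long after = GC.GetAllocatedBytesForCurrentThread();
    GC.KeepAlive(instance);
    return after - before;
}
```

This measures only the dictionary construction, so the absolute numbers are smaller than the per-operation figures in the tables above, which also include the surface and canvas wrappers.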

Copilot

Pull request overview

This PR tweaks SKObject’s lazy-initialized OwnedObjects and KeepAliveObjects dictionaries to use explicit ConcurrentDictionary constructor parameters, aiming to reduce the allocations caused by the default capacity/concurrency settings (related to #4181’s allocation observations).

Changes:

Initialize OwnedObjects with ConcurrentDictionary(concurrencyLevel: 1, capacity: 1) instead of the default constructor.
Initialize KeepAliveObjects with ConcurrentDictionary(concurrencyLevel: 1, capacity: 1) instead of the default constructor.

 					lock (locker) {
-						ownedObjects ??= new ConcurrentDictionary<IntPtr, SKObject> ();
+						ownedObjects ??= new ConcurrentDictionary<IntPtr, SKObject> (
+							concurrencyLevel: 1, capacity: 1);


nietras · 2026-06-25T05:22:49Z

 					lock (locker) {
-						keepAliveObjects ??= new ConcurrentDictionary<IntPtr, SKObject> ();
+						keepAliveObjects ??= new ConcurrentDictionary<IntPtr, SKObject> (
+							concurrencyLevel: 1, capacity: 1);


@mattleibow thanks for looking at this. I have no problem as such with committing the changes suggested by Copilot, but would it perhaps be better to have these parameters forwarded from the different children of SKObject so that the parameters match usage? SKCanvas keeps capacity 1, others can have more?
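
One way the forwarding idea could look is a virtual capacity hint on the base class that derived types override. This is a hypothetical sketch only: `OwnerBase`, `OwnedCapacity`, `SingleChildOwner` and `ManyChildOwner` are made-up names and not part of SkiaSharp, and `object` stands in for `SKObject`.

```csharp
using System;
using System.Collections.Concurrent;

var single = new SingleChildOwner();
var many = new ManyChildOwner();

single.Owned[new IntPtr(1)] = new object();
for (int i = 1; i <= 16; i++)
    many.Owned[new IntPtr(i)] = new object();

Console.WriteLine($"{single.Owned.Count} {many.Owned.Count}"); // prints "1 16"

// Hypothetical base type: each derived type supplies its own initial capacity.
abstract class OwnerBase
{
    private readonly object locker = new object();
    private ConcurrentDictionary<IntPtr, object> owned;

    // Assumed default: one child, matching the surface -> canvas case.
    protected virtual int OwnedCapacity => 1;

    public ConcurrentDictionary<IntPtr, object> Owned
    {
        get
        {
            if (owned == null)
            {
                lock (locker)
                {
                    // Lazy creation, sized from the derived type's hint.
                    owned ??= new ConcurrentDictionary<IntPtr, object>(
                        concurrencyLevel: 1, capacity: OwnedCapacity);
                }
            }
            return owned;
        }
    }
}

sealed class SingleChildOwner : OwnerBase { }

sealed class ManyChildOwner : OwnerBase
{
    protected override int OwnedCapacity => 16;
}
```

The capacity is only an initial size, so an owner that exceeds its hint still works and simply pays for resizing.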

mattleibow · 2026-06-25T15:50:55Z

Have you run this in a real app, and do you have data that shows an improvement? Also, SkiaSharp can run in a web server on multiple threads. There typically should not be cross-thread usages of these fields.

However, do you have a scenario where these changes help besides the allocations?

What are the downsides of this PR and what scenarios will it impact?

ramezgerges · 2026-06-25T15:55:48Z

I've tested before and after this PR with an Uno Platform sample app, once measuring just the startup allocations and once measuring a steady-state animation playing on repeat. The difference is very minor and is indistinguishable from noise, so I'm not sure it's worth it unless we have a concrete scenario where these allocations cause significant GC time or something similar. @nietras did you encounter a real scenario where this PR would meaningfully affect the allocations and/or GC time?

mattleibow · 2026-06-25T17:37:00Z

📉 Perf impact note: `capacity: 1` trades a smaller common case for a costlier "many children" case

I wanted to quantify the one downside of hard-coding capacity: 1: the OwnedObjects / KeepAliveObjects dictionaries start with a single bucket, so an owner that accumulates several children pays repeated resize + rehash, where the old default pre-sized 31 buckets. I benchmarked the exact lifecycle SKObject uses — create → insert N children via the indexer → enumerate + Clear() on dispose — comparing the current default, this PR's (concurrencyLevel: 1, capacity: 1), and a middle-ground (1, 4).

Note: this is a synthetic microbenchmark of the dictionary itself (no native Skia allocation), to isolate the resize cost. Allocations are deterministic; the time columns are noisy in this ShortRun config (errors frequently exceed the means), so I'm only drawing conclusions from Allocated.

BenchmarkDotNet=v0.13.5, OS=macOS 26.5 (25F71) [Darwin 25.5.0]
Apple M3 Pro, 1 CPU, 12 logical and 12 physical cores
  [Host] : .NET 8.0.23 (8.0.2325.60607), Arm64 RyuJIT AdvSIMD
Job=ShortRun  Toolchain=InProcessEmitToolchain  IterationCount=3  LaunchCount=1  WarmupCount=3

|     Method |  N |   Gen0 | Allocated | Alloc Ratio |
|----------- |--- |-------:|----------:|------------:|
|    Default |  1 | 0.1702 |    1424 B |        1.00 |
| Cap1_Conc1 |  1 | 0.0861 |     720 B |        0.51 |
| Cap4_Conc1 |  1 | 0.0899 |     752 B |        0.53 |
|    Default |  2 | 0.1760 |    1472 B |        1.00 |
| Cap1_Conc1 |  2 | 0.0918 |     768 B |        0.52 |
| Cap4_Conc1 |  2 | 0.0956 |     800 B |        0.54 |
|    Default |  4 | 0.1874 |    1568 B |        1.00 |
| Cap1_Conc1 |  4 | 0.1459 |    1224 B |        0.78 |
| Cap4_Conc1 |  4 | 0.1068 |     896 B |        0.57 |
|    Default |  8 | 0.2098 |    1760 B |        1.00 |
| Cap1_Conc1 |  8 | 0.2441 |    2048 B |        1.16 |
| Cap4_Conc1 |  8 | 0.2050 |    1720 B |        0.98 |
|    Default | 16 | 0.2556 |    2144 B |        1.00 |
| Cap1_Conc1 | 16 | 0.2899 |    2432 B |        1.13 |
| Cap4_Conc1 | 16 | 0.2508 |    2104 B |        0.98 |
|    Default | 32 | 0.3471 |    2912 B |        1.00 |
| Cap1_Conc1 | 32 | 0.5341 |    4472 B |        1.54 |
| Cap4_Conc1 | 32 | 0.4940 |    4144 B |        1.42 |
|    Default | 64 | 0.9155 |    7688 B |        1.00 |
| Cap1_Conc1 | 64 | 1.0338 |    8656 B |        1.13 |
| Cap4_Conc1 | 64 | 0.9956 |    8328 B |        1.08 |

Reading the numbers (Allocated):

N ≤ 2 (the dominant real case) — most owners keep exactly one child (surface→canvas, document→stream, colorspace→profile), so this is what actually happens in practice. (1,1) ≈ 49% less (720 B vs 1424 B). This is the win this PR is about. ✅
Crossover at N ≈ 8 — once an owner holds ~8+ children, (1,1) flips to a regression because it resizes repeatedly while the default's 31 buckets absorb the inserts.
Worst case at N = 32 — right around the default capacity boundary, (1,1) allocates +54% (4472 B vs 2912 B). (The ShortRun timings are noisy, but this row was also consistently ~1.5× slower.)
N = 64 — both resize repeatedly, so the gap narrows again (+13%).

Takeaway: for the case this PR targets (0–1 children), it's a clear, safe win and there's no threading downside — each SKObject has its own dictionaries, so concurrencyLevel: 1 only ever matters if the same owner is mutated from multiple threads at once, which is already unsupported (sharing a non-thread-safe Skia object concurrently). The only real cost is per-object allocation for owners that accumulate many children — uncommon, but a measurable regression in that tail.

Suggestion: capacity: 4 captures essentially the full small-N win (752 B vs 720 B at N=1) while removing the N=8–16 regression (back to parity) and reducing the N=32 penalty (+42% instead of +54%). If we expect any owners to hold more than a couple of children, (concurrencyLevel: 1, capacity: 4) looks like a safer pick than capacity: 1.
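
The lifecycle described above (create, insert N children via the indexer, enumerate, then Clear()) can be approximated without BenchmarkDotNet. This is a rough sketch under stated assumptions, not the benchmark that produced the table: `RunLifecycle` is a hypothetical helper, `object` stands in for `SKObject`, and the reported bytes depend on runtime version and core count.

```csharp
using System;
using System.Collections.Concurrent;

object child = new object();

// The three configurations compared in the table above.
var configs = new (string Name, Func<ConcurrentDictionary<IntPtr, object>> Create)[]
{
    ("Default", () => new ConcurrentDictionary<IntPtr, object>()),
    ("Cap1_Conc1", () => new ConcurrentDictionary<IntPtr, object>(concurrencyLevel: 1, capacity: 1)),
    ("Cap4_Conc1", () => new ConcurrentDictionary<IntPtr, object>(concurrencyLevel: 1, capacity: 4)),
};

foreach (int n in new[] { 1, 2, 4, 8, 16, 32, 64 })
{
    foreach (var (name, create) in configs)
    {
        RunLifecycle(create, n, child); // warm-up run, result discarded
        long bytes = RunLifecycle(create, n, child);
        Console.WriteLine($"{name,-10} N={n,2} {bytes} B");
    }
}

// Hypothetical helper: bytes allocated on this thread by one owner lifecycle.
static long RunLifecycle(Func<ConcurrentDictionary<IntPtr, object>> create, int n, object child)
{
    long before = GC.GetAllocatedBytesForCurrentThread();
    var owned = create();
    for (int i = 1; i <= n; i++)
        owned[new IntPtr(i)] = child;   // insert via the indexer
    foreach (var pair in owned)
        GC.KeepAlive(pair.Value);       // enumerate, as a dispose pass would
    owned.Clear();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}
```

Comparing the printed rows for a given N shows where a small initial capacity stops paying off on the machine at hand.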

nietras · 2026-06-29T13:20:14Z

A little detail on one example app: a simple app showing one bitmap, whose pixels are updated (a simple fill of all bytes) at some fixed rate, e.g. 50 Hz. Using Visual Studio's ".NET Object Allocation Tracking" with in-code UserMarks, I then select a period of 1000 updates after warmup.

There are a LOT of allocations for this simple scenario in the libraries used (not my/user code). But for a single type, the OwnedObjects object allocations are significant. Note how there are exactly 1000 allocations for some things, and that 31000 is exactly 31 x 1000, as I've written about.

> difference is very minor and is indistinguishable from noise

This depends on the scenario, but it is also the main question here: there are so many allocations in so many parts that any given part may look small next to the sum. If one always dismisses those "small" parts as "very minor", the number of allocations will never get small. That is why I asked whether this is viewed as a priority: there are so many allocations that, to reduce them, we would need to address each one in turn. :)

Here there are about 140 "reference type" allocations per image update. 140 is a lot, I think 😅 (around 2500 bytes per update). This includes dispatcher timer stuff, though. Not all of it is bitmap related, but most of it is the bitmap display code.

I hope to open source this simple benchmark app at some point as I am comparing different .NET UI libraries, many use SkiaSharp so any improvement here would help all.

Initialize ConcurrentDictionary with concurrency level and capacity

0a72031

Just a suggested change based on mono#4181 Must be evaluated by whether normal usage does not match this

github-project-automation Bot added this to SkiaSharp Backlog Jun 17, 2026

dotnet-policy-service Bot added the community ✨ label Jun 17, 2026

mattleibow requested a review from Copilot June 25, 2026 00:03

Copilot started reviewing on behalf of mattleibow June 25, 2026

Copilot AI reviewed Jun 25, 2026

Merge branch 'main' into patch-1

888b610

nietras added 3 commits June 29, 2026 23:22

Merge branch 'main' into patch-1

db52c08

Merge branch 'main' into patch-1

47338e9

change capacity to 4 instead of 1

1592971

nietras wants to merge 5 commits into mono:main from nietras:patch-1
