Skip to content

⚡ Bolt: reduce allocations in hot scan loop via scratch pooling#83

Merged
cungminh2710 merged 2 commits into
mainfrom
bolt-scratch-pooling-10819076682661675997
May 24, 2026
Merged

⚡ Bolt: reduce allocations in hot scan loop via scratch pooling#83
cungminh2710 merged 2 commits into
mainfrom
bolt-scratch-pooling-10819076682661675997

Conversation

@cungminh2710
Copy link
Copy Markdown
Contributor

Reduced memory allocations in the database result scanning path by implementing a pooling mechanism for scan targets and typed scratch variables. Eliminated interface type assertions in the main assignment loop.


PR created automatically by Jules for task 10819076682661675997 started by @cungminh2710

This change introduces a sync.Pool for rowScanScratch objects, which
contains the []any scan targets and typed sql.Null* scratch buffers.
By reusing these buffers across queries for the same scan plan, we
reduce heap allocations.

Additionally, scanDirectRow now accepts the scratch object directly,
allowing it to access typed buffers via pre-calculated indices, which
eliminates interface type assertions in the hot iteration loop.

Benchmark impact:
- BenchmarkSQLiteSelectPointLookup/small: 51 -> 44 allocs/op (~14%)
- BenchmarkSQLiteSelectBulkScan/large: 139782 -> 139777 allocs/op

Co-authored-by: cungminh2710 <8063319+cungminh2710@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 24, 2026

Greptile Summary

This PR introduces a sync.Pool-backed rowScanScratch object to eliminate per-query allocations of []any scan targets and sql.Null* wrappers in the hot scan path. Typed null values are now accessed via pre-computed scratchIndex fields, removing all interface type assertions from scanDirectRow.

  • rowScanScratch bundles typed slices (ints, strings, bools, floats, times) with pre-wired scanTargets pointers; the pool's New function closes over the completed rowScanPlan to size and wire these slices correctly at allocation time.
  • scanDirectRow signature changes from scanned []any to *rowScanScratch, and each typed-column handler now indexes directly into the scratch's typed slice instead of asserting scanned[scanIndex].(*sql.NullInt64).
  • newScanTargets is trimmed to a simple allocation helper used only by the non-direct readCachedSelectRows path.

Confidence Score: 5/5

Safe to merge; the pooling logic is correctly wired and no data contamination or concurrency issues were found.

The core correctness invariants all hold: the pool is per-plan so scratch objects are never mixed across incompatible column layouts; clearIndices correctly covers only non-direct columns, preserving the stable typed-null pointers in scanTargets across rows; rows.Scan always overwrites all scan targets before scanDirectRow reads them, so pool-reused values never carry over stale data. The only finding is that s.scanned[p.scanIndex] is set for direct-type columns in pool.New even though that slot is never read — dead writes that do not affect correctness.

No files require special attention.

Important Files Changed

Filename Overview
pkg/rain/model.go Core ORM scanning path refactored to use a per-plan sync.Pool of rowScanScratch objects; typed null values are now accessed via pre-computed scratchIndex instead of interface{} type assertions; pool.New closure contains minor redundant scanned[] initialization for direct-type columns.
pkg/rain/model_internal_test.go Test updated to construct rowScanScratch directly with the new typed-buffer layout, correctly mirroring the pool.New initialization for the string path.
.jules/bolt.md Learning log updated with the new scratch-buffer pooling approach and its ~14% allocation reduction result.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant scanRowsAgainstTableDirect
    participant rowScanPlanCache
    participant sync.Pool
    participant rows.Scan
    participant scanDirectRow

    Caller->>scanRowsAgainstTableDirect: dest, table
    scanRowsAgainstTableDirect->>rowScanPlanCache: Load/LoadOrStore(key)
    rowScanPlanCache-->>scanRowsAgainstTableDirect: "*rowScanPlan (cached)"
    scanRowsAgainstTableDirect->>sync.Pool: Get()
    alt Pool empty
        sync.Pool->>sync.Pool: New() — allocate rowScanScratch (typed slices + pre-wired scanTargets)
    end
    sync.Pool-->>scanRowsAgainstTableDirect: "*rowScanScratch"
    Note over scanRowsAgainstTableDirect: defer pool.Put(scratch)
    loop for each row (slice path)
        scanRowsAgainstTableDirect->>scanRowsAgainstTableDirect: clear clearIndices in scratch.scanned
        scanRowsAgainstTableDirect->>rows.Scan: scratch.scanTargets...
        rows.Scan-->>scanRowsAgainstTableDirect: fills scratch.ints/strings/bools/floats/times
        scanRowsAgainstTableDirect->>scanDirectRow: target, plan, scratch
        scanDirectRow->>scanDirectRow: read scratch.ints[scratchIndex] (no type assertion)
        scanDirectRow-->>scanRowsAgainstTableDirect: nil / error
    end
    scanRowsAgainstTableDirect->>sync.Pool: Put(scratch) [via defer]
Loading

Fix All in Codex

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
pkg/rain/model.go:892-941
The `s.scanned[p.scanIndex] = &s.ints[p.scratchIndex]` (and equivalent) assignments for every direct-type column slice in `pool.New` are dead writes. `scanDirectRow` reads direct columns exclusively via `scratch.ints[col.scratchIndex]`, `scratch.strings[col.scratchIndex]`, etc. — it never touches `scratch.scanned` for those slots. The only consumer of `scratch.scanned` is the `otherCols` loop, which covers non-direct columns whose entries are already set by the initial `scanTargets[i] = &s.scanned[i]` loop. Removing the `s.scanned[p.scanIndex]` half of each assignment makes the intent clearer and trims ~N pointer stores per scratch allocation (where N is the number of direct columns).

```suggestion
		for i := range plan.int64ValueCols {
			p := &plan.int64ValueCols[i]
			s.scanTargets[p.scanIndex] = &s.ints[p.scratchIndex]
		}
		for i := range plan.int64PointerCols {
			p := &plan.int64PointerCols[i]
			s.scanTargets[p.scanIndex] = &s.ints[p.scratchIndex]
		}
		for i := range plan.stringValueCols {
			p := &plan.stringValueCols[i]
			s.scanTargets[p.scanIndex] = &s.strings[p.scratchIndex]
		}
		for i := range plan.stringPointerCols {
			p := &plan.stringPointerCols[i]
			s.scanTargets[p.scanIndex] = &s.strings[p.scratchIndex]
		}
		for i := range plan.boolValueCols {
			p := &plan.boolValueCols[i]
			s.scanTargets[p.scanIndex] = &s.bools[p.scratchIndex]
		}
		for i := range plan.boolPointerCols {
			p := &plan.boolPointerCols[i]
			s.scanTargets[p.scanIndex] = &s.bools[p.scratchIndex]
		}
		for i := range plan.float64ValueCols {
			p := &plan.float64ValueCols[i]
			s.scanTargets[p.scanIndex] = &s.floats[p.scratchIndex]
		}
		for i := range plan.float64PointerCols {
			p := &plan.float64PointerCols[i]
			s.scanTargets[p.scanIndex] = &s.floats[p.scratchIndex]
		}
		for i := range plan.timeValueCols {
			p := &plan.timeValueCols[i]
			s.scanTargets[p.scanIndex] = &s.times[p.scratchIndex]
		}
		for i := range plan.timePointerCols {
			p := &plan.timePointerCols[i]
			s.scanTargets[p.scanIndex] = &s.times[p.scratchIndex]
		}
```

Reviews (2): Last reviewed commit: "perf(rain): reduce allocations in hot sc..." | Re-trigger Greptile

Comment thread pkg/rain/model.go
Comment on lines +237 to 245
scratch := plan.pool.Get().(*rowScanScratch)
defer plan.pool.Put(scratch)

if !rows.Next() {
if err := rows.Err(); err != nil {
return err
}
return sql.ErrNoRows
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Scratch acquired before checking for rows

pool.Get() is called (and a new scratch allocated if the pool is empty) even when the result set turns out to be empty. For point-lookup queries that commonly return sql.ErrNoRows this adds a gratuitous pool get/put on every miss, partially undermining the stated allocation-reduction goal. Moving pool.Get() (and its matching defer pool.Put) to after the rows.Next() check would avoid that overhead on the empty-result fast path.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/rain/model.go
Line: 237-245

Comment:
**Scratch acquired before checking for rows**

`pool.Get()` is called (and a new scratch allocated if the pool is empty) even when the result set turns out to be empty. For point-lookup queries that commonly return `sql.ErrNoRows` this adds a gratuitous pool get/put on every miss, partially undermining the stated allocation-reduction goal. Moving `pool.Get()` (and its matching `defer pool.Put`) to after the `rows.Next()` check would avoid that overhead on the empty-result fast path.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jules fix

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented the scratch buffer pooling and updated scanRowsAgainstTableDirect to use it. This significantly reduces allocations in the hot path.

Comment thread pkg/rain/model.go
Comment on lines 324 to 327
for idx := range cols {
scanned[idx] = nil
scanTargets[idx] = &scanned[idx]
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The scanned[idx] = nil assignment inside the loop is redundant: make([]any, len(cols)) already initialises every element to nil. The loop body reduces to only the scanTargets assignment.

Suggested change
for idx := range cols {
scanned[idx] = nil
scanTargets[idx] = &scanned[idx]
}
for idx := range cols {
scanTargets[idx] = &scanned[idx]
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/rain/model.go
Line: 324-327

Comment:
The `scanned[idx] = nil` assignment inside the loop is redundant: `make([]any, len(cols))` already initialises every element to `nil`. The loop body reduces to only the `scanTargets` assignment.

```suggestion
	for idx := range cols {
		scanTargets[idx] = &scanned[idx]
	}
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jules fix

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified newScanTargets as it's now only used for the generic/cached scanning path. The high-performance direct path now uses the pooled scratch buffer.

This change introduces a sync.Pool for rowScanScratch objects, which
contains the []any scan targets and typed sql.Null* scratch buffers.
By reusing these buffers across queries for the same scan plan, we
reduce heap allocations.

Additionally, scanDirectRow now accepts the scratch object directly,
allowing it to access typed buffers via pre-calculated indices, which
eliminates interface type assertions in the hot iteration loop.

Benchmark impact:
- BenchmarkSQLiteSelectPointLookup/small: 51 -> 44 allocs/op (~14%)
- BenchmarkSQLiteSelectBulkScan/large: 139782 -> 139777 allocs/op

Co-authored-by: cungminh2710 <8063319+cungminh2710@users.noreply.github.com>
@cungminh2710 cungminh2710 merged commit 2bac408 into main May 24, 2026
4 checks passed
@cungminh2710 cungminh2710 deleted the bolt-scratch-pooling-10819076682661675997 branch May 24, 2026 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant