⚡ Vectorize nested loops in spectral computations for performance#55
Conversation
This commit replaces $O(N^2)$ nested loops with array vectorization and boolean masking in `physics/spectral/spectral_backend.py` (for `operator_spectral_function_lehmann` and `susceptibility_bubble`) and `physics/spectral/quadratic/spectral_backend_quadratic.py` (for `greens_function_quadratic` and `greens_function_quadratic_finite_T`). By avoiding explicit Python `for` loops and using efficient NumPy broadcasting (e.g., `eigenvalues[None, :] - eigenvalues[:, None]`), performance is improved by over an order of magnitude (from ~1.0s to <0.1s for $N=500$ matrices). The exact mathematical logic and numerical thresholds (`1e-14`) have been perfectly preserved. Co-authored-by: makskliczkowski <48489493+makskliczkowski@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Pull request overview
This PR replaces several O(N²) Python double-loops in the spectral backends with NumPy-style broadcasting/masking to improve runtime in core spectral/response computations.
Changes:
- Vectorized Lehmann spectral function and bubble susceptibility sums using broadcasted energy/weight differences and boolean masks.
- Vectorized quadratic Green’s function sums (T=0 and finite-T) using occupied/empty or Fermi-weight masks.
- Added standalone benchmark scripts to time the updated routines.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| physics/spectral/spectral_backend.py | Vectorizes Lehmann and bubble susceptibility double sums via broadcasting and masking. |
| physics/spectral/quadratic/spectral_backend_quadratic.py | Replaces occupied/empty and finite-T nested loops with masked vectorized reductions for non-NumPy backends. |
| benchmark_spectral.py | Adds a standalone timing script for Lehmann + bubble routines. |
| benchmark_quadratic.py | Adds a standalone timing script for quadratic Green’s functions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if be.any(mask): | ||
| deltaE = ev[None, :] - ev[:, None] | ||
| num = weight_matrix * A * B.T | ||
| denom = z - deltaE | ||
|
|
||
| G += be.sum((num[mask]) / denom[mask]) | ||
|
|
There was a problem hiding this comment.
Same JAX issue here: if be.any(mask): will fail for backend='jax' because the result cannot be coerced to a Python bool. A straightforward fix is to drop the if (summing an empty selection yields 0) or to use a shape-preserving mask (e.g., be.where(mask, num/denom, 0) followed by be.sum) to avoid any Python branching.
| if be.any(mask): | |
| deltaE = ev[None, :] - ev[:, None] | |
| num = weight_matrix * A * B.T | |
| denom = z - deltaE | |
| G += be.sum((num[mask]) / denom[mask]) | |
| # Avoid Python branching on backend booleans (e.g., JAX); use a | |
| # shape-preserving mask and sum all contributions. | |
| deltaE = ev[None, :] - ev[:, None] | |
| num = weight_matrix * A * B.T | |
| denom_matrix = z - deltaE | |
| contrib = be.where(mask, num / denom_matrix, 0) | |
| G += be.sum(contrib) |
| rho_diff = rho[:, None] - rho[None, :] | ||
| mask = be.abs(rho_diff) >= 1e-14 | ||
|
|
||
| delta_E = omega - (eigenvalues[None, :] - eigenvalues[:, None]) | ||
| matrix_element_sq = be.abs(O_eigen)**2 | ||
|
|
||
| A_matrix = rho_diff[mask] * matrix_element_sq[mask] * lorentzian(delta_E[mask]) | ||
| A = be.sum(A_matrix) |
There was a problem hiding this comment.
This vectorization allocates several full N×N intermediates (rho_diff, delta_E, matrix_element_sq, plus mask). That can significantly increase peak memory vs the previous streaming double-loop and may trigger MemoryError/OOM for larger N even though the underlying computation is still O(N²). Consider block/vectorizing over one index (e.g., chunk m) to keep the inner operations vectorized while bounding memory.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| np.random.seed(42) | ||
| N = 500 | ||
| omega = 0.5 | ||
| eta = 0.01 | ||
| temperature = 1.0 | ||
|
|
||
| eigenvalues = np.sort(np.random.randn(N)) | ||
| eigenvectors = np.linalg.qr(np.random.randn(N, N) + 1j * np.random.randn(N, N))[0] | ||
| operator = np.random.randn(N, N) + 1j * np.random.randn(N, N) | ||
| operator = operator + operator.conj().T | ||
|
|
||
| # Benchmark Lehmann | ||
| start = time.time() | ||
| res1 = operator_spectral_function_lehmann(omega, eigenvalues, eigenvectors, operator, eta, temperature, backend="numpy") | ||
| end = time.time() | ||
| print(f"operator_spectral_function_lehmann: {end - start:.4f}s, result: {res1}") | ||
|
|
||
| # Benchmark Bubble | ||
| vertex = np.random.randn(N, N) + 1j * np.random.randn(N, N) | ||
| occupation = np.random.rand(N) | ||
| start = time.time() | ||
| res2 = susceptibility_bubble(omega, eigenvalues, vertex, occupation, eta, backend="numpy") | ||
| end = time.time() | ||
| print(f"susceptibility_bubble: {end - start:.4f}s, result: {res2}") |
There was a problem hiding this comment.
This benchmark runs immediately at import time (top-level code), which makes the module unsafe to import and can cause accidental long executions in tooling. Recommend wrapping the benchmark in a main()/function and guarding execution with if __name__ == '__main__': (and consider time.perf_counter() for timing).
| np.random.seed(42) | |
| N = 500 | |
| omega = 0.5 | |
| eta = 0.01 | |
| temperature = 1.0 | |
| eigenvalues = np.sort(np.random.randn(N)) | |
| eigenvectors = np.linalg.qr(np.random.randn(N, N) + 1j * np.random.randn(N, N))[0] | |
| operator = np.random.randn(N, N) + 1j * np.random.randn(N, N) | |
| operator = operator + operator.conj().T | |
| # Benchmark Lehmann | |
| start = time.time() | |
| res1 = operator_spectral_function_lehmann(omega, eigenvalues, eigenvectors, operator, eta, temperature, backend="numpy") | |
| end = time.time() | |
| print(f"operator_spectral_function_lehmann: {end - start:.4f}s, result: {res1}") | |
| # Benchmark Bubble | |
| vertex = np.random.randn(N, N) + 1j * np.random.randn(N, N) | |
| occupation = np.random.rand(N) | |
| start = time.time() | |
| res2 = susceptibility_bubble(omega, eigenvalues, vertex, occupation, eta, backend="numpy") | |
| end = time.time() | |
| print(f"susceptibility_bubble: {end - start:.4f}s, result: {res2}") | |
| def main() -> None: | |
| np.random.seed(42) | |
| N = 500 | |
| omega = 0.5 | |
| eta = 0.01 | |
| temperature = 1.0 | |
| eigenvalues = np.sort(np.random.randn(N)) | |
| eigenvectors = np.linalg.qr(np.random.randn(N, N) + 1j * np.random.randn(N, N))[0] | |
| operator = np.random.randn(N, N) + 1j * np.random.randn(N, N) | |
| operator = operator + operator.conj().T | |
| # Benchmark Lehmann | |
| start = time.perf_counter() | |
| res1 = operator_spectral_function_lehmann( | |
| omega, eigenvalues, eigenvectors, operator, eta, temperature, backend="numpy" | |
| ) | |
| end = time.perf_counter() | |
| print(f"operator_spectral_function_lehmann: {end - start:.4f}s, result: {res1}") | |
| # Benchmark Bubble | |
| vertex = np.random.randn(N, N) + 1j * np.random.randn(N, N) | |
| occupation = np.random.rand(N) | |
| start = time.perf_counter() | |
| res2 = susceptibility_bubble(omega, eigenvalues, vertex, occupation, eta, backend="numpy") | |
| end = time.perf_counter() | |
| print(f"susceptibility_bubble: {end - start:.4f}s, result: {res2}") | |
| if __name__ == "__main__": | |
| main() |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@makskliczkowski I've opened a new pull request, #56, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This commit replaces $O(N^2)$ nested loops with array vectorization and boolean masking in `physics/spectral/spectral_backend.py` (for `operator_spectral_function_lehmann` and `susceptibility_bubble`) and `physics/spectral/quadratic/spectral_backend_quadratic.py` (for `greens_function_quadratic` and `greens_function_quadratic_finite_T`). By avoiding explicit Python `for` loops and using efficient NumPy broadcasting (e.g., `eigenvalues[None, :] - eigenvalues[:, None]`), performance is improved by over an order of magnitude (from ~1.0s to <0.1s for $N=500$ matrices). The exact mathematical logic and numerical thresholds (`1e-14`) have been perfectly preserved. Co-authored-by: makskliczkowski <48489493+makskliczkowski@users.noreply.github.com>
[WIP] [WIP] Address feedback on vectorizing nested loops in spectral computations
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
This vectorization allocates several full N×N intermediates (rho_diff, delta_E, matrix_element_sq, plus mask). That can significantly increase peak memory vs the previous streaming double-loop and may trigger MemoryError/OOM for larger N even though the underlying computation is still O(N²). Consider block/vectorizing over one index (e.g., chunk m) to keep the inner operations vectorized while bounding memory. fix all the issues that might create OOM |
This commit replaces $O(N^2)$ nested loops with array vectorization and boolean masking in `physics/spectral/spectral_backend.py` (for `operator_spectral_function_lehmann` and `susceptibility_bubble`) and `physics/spectral/quadratic/spectral_backend_quadratic.py` (for `greens_function_quadratic` and `greens_function_quadratic_finite_T`). By avoiding explicit Python `for` loops and using efficient NumPy broadcasting (e.g., `eigenvalues[None, :] - eigenvalues[:, None]`), performance is improved by over an order of magnitude (from ~1.0s to <0.1s for $N=500$ matrices). The exact mathematical logic and numerical thresholds (`1e-14`) have been perfectly preserved. Co-authored-by: makskliczkowski <48489493+makskliczkowski@users.noreply.github.com>
|
This benchmark runs immediately at import time (top-level code), which makes the module unsafe to import and can cause accidental long executions in tooling. Recommend wrapping the benchmark in a main()/function and guarding execution with if name == 'main': (and consider time.perf_counter() for timing). Also, make sure that all spectral calculations do not require extensively more memory and do not trigger OMM while the previous version did not! |
Changes the vectorization of `O(N^2)` loops in spectral computations to block-vectorize over the `m` index instead of creating full $N \times N$ matrix intermediates. By doing this, we keep the innermost operation vectorized (to save Python loop overhead) but limit the peak intermediate allocation to size `N` (e.g. `rho_diff = rho[m] - rho`). This addresses OOM/MemoryError risks on large systems while retaining the >10x performance improvements. Additionally, benchmarks were updated to use `time.perf_counter()` and placed behind `if __name__ == "__main__":` blocks to avoid unsafe execution on module import. Co-authored-by: makskliczkowski <48489493+makskliczkowski@users.noreply.github.com>
|
@copilot review |
Good catch! I've updated the optimizations to block-vectorize over the |
I've refactored the benchmark scripts inside a Regarding the memory limits, as mentioned above, I refactored the vectorization approach to loop over |
The OOM risk was addressed in commit 9da2b9a by switching from full N×N intermediates to a row-by-row loop: each iteration allocates only O(N) workspace ( The remaining
Both benchmark files were wrapped with |
operator_spectral_function_lehmann: replacedif not be.any(mask): continuewithbe.where(abs >= 1e-14, ...)+ accumulate in backend type for precisionsusceptibility_bubble: same treatmentgreens_function_quadraticnon-numpy path: removedif be.any(mask_n):andif not occ[m]:branches; usesocc_float[m] * (1.0 - occ_float)weights withbe.where(abs >= 1e-14, ...)greens_function_quadratic_finite_Tnon-numpy path: removedif fm == 0:andif not be.any(mask):branches; usesf[m] * f_1_minus_fweights withbe.where(abs >= 1e-14, ...)if __name__ == '__main__':guard andtime.perf_counter()in merged PR🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.