Skip to content

feat(designs): add PE_INT implementation flow#58

Open
HLEE80 wants to merge 11 commits into
LinxISA:mainfrom
HLEE80:main
Open

feat(designs): add PE_INT implementation flow#58
HLEE80 wants to merge 11 commits into
LinxISA:mainfrom
HLEE80:main

Conversation

@HLEE80
Copy link
Copy Markdown

@HLEE80 HLEE80 commented May 7, 2026

Summary

  • Add the PE_INT PyCircuit design flow, generated RTL, model, and RTL/PyCircuit test environments.
  • Strengthen PE_INT verification with exact L=4 scoreboards, random-valid timing cases, public PyCircuit API usage, and generated RTL warning cleanup.
  • Add PE_INT flow guardrails, regression evidence, circuit optimizer documentation, and project-level codereviewer configuration.

Test plan

  • PASS: python3 model/test_pe_int.py under WSL with PE_INT/PyCircuit PYTHONPATH.
  • PASS: python3 python/build.py --target both --out-dir build/pe_int --jobs 8 --pyc-tb-vectors 8 under WSL.
  • PASS: ./build/pe_int/cpp_build/build/pyc_tb.
  • PASS: bash sim/run_all_wsl.sh covering 9 RTL cases with both iverilog and verilator -Wall.
  • PASS: Pre-PR codereviewer gate returned PASS_WITH_NOTES; notes are limited to accepted rst_n framework limitation and deferred Booth topology pending synthesis evidence.

Notes

  • rst_n async assertion / sync release mismatch is tracked as a known PyCircuit framework limitation.
  • Explicit radix-4 Booth multiplier topology remains documented as deferred until synthesis/timing/area evidence is available.

HLEE80 and others added 10 commits April 22, 2026 11:44
Sync PE_INT into pycircuit designs with English docs/spec, pycc-relative build defaults, and WSL-ready regression scripts. Keep generated artifacts excluded from commit except deliverable generated Verilog.
Baseline the refreshed PE_INT PyCircuit design, generated RTL, model, RTL test environment, testcase suite, and supporting debug skills after full regression passed.
Consistent To English documentation

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Tighten generated RTL contracts and PE_INT implementation structure so the deliverables match review expectations for reset style, naming, Wallace reduction, and output state ownership.
Restore the PE_INT PyCircuit, generated RTL, and verification files to a coherent L=4 flow after local edits drifted out of sync.
Regenerate PE_INT with width-specific Wallace reductions and remove unused-signal suppression so Verilator can catch real dead logic.
Record optimizer topology status, regression evidence, and pre-push review gates so PE_INT closure is traceable beyond functional PASS.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep PE_INT on public PyCircuit APIs, record the latest regression evidence, and add reviewer/scope guardrails for future PE_INT changes.

Co-authored-by: Cursor <cursoragent@cursor.com>
@gemini-code-assist
Copy link
Copy Markdown

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@HLEE80
Copy link
Copy Markdown
Author

HLEE80 commented May 26, 2026

/gemini review

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the PE_INT design, a fixed-point/integer vector MAC unit, implemented using the PyCircuit framework. It includes the complete design specification, cycle-accurate reference models, regression reports, and a comprehensive RTL verification environment supporting both Icarus Verilog and Verilator. Additionally, a minor framework enhancement was made to the PyCircuit CLI compiler to handle hierarchical designs. Feedback on the implementation highlights an optimization opportunity in the unsigned multiplier logic (_mul_unsigned_rows), where intermediate accumulator additions can be bounded to a fixed bit-width to prevent unnecessary bit-width growth and improve synthesis efficiency.

Comment on lines +34 to +38
acc = zero
for bit_idx in range(rhs_w.width):
row = wire_of(zext(lhs_w, width) << bit_idx)[0:width]
acc = acc + wire_of(rhs_w[bit_idx : bit_idx + 1]).select(row, zero)
return wire_of(acc)[0:width]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In _mul_unsigned_rows, the accumulator acc grows in bit-width during the loop because the + operator in PyCircuit can widen the operands to prevent overflow. This leads to wider intermediate additions (e.g., adding 16-bit to 17-bit, then 18-bit, etc.) and less efficient generated RTL. Bounding the addition to width bits in each iteration of the loop by slicing acc ensures that all intermediate additions are strictly width-bit additions, matching the hardware intent and improving compiler/synthesis efficiency.

Suggested change
acc = zero
for bit_idx in range(rhs_w.width):
row = wire_of(zext(lhs_w, width) << bit_idx)[0:width]
acc = acc + wire_of(rhs_w[bit_idx : bit_idx + 1]).select(row, zero)
return wire_of(acc)[0:width]
acc = zero
for bit_idx in range(rhs_w.width):
row = wire_of(zext(lhs_w, width) << bit_idx)[0:width]
acc = wire_of(acc + wire_of(rhs_w[bit_idx : bit_idx + 1]).select(row, zero))[0:width]
return acc

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in ad07aa3.

Changes made:

  • Updated the PE_INT design-spec flow to derive datapath widths from functional scenarios first, then map them to shared resources.
  • Updated docs/design_spec.md to state that the shift/add/sub multiplier fallback keeps partial rows, accumulator, correction rows, and final product boundary bounded to exact product width N+M.
  • Updated _mul_unsigned_rows() so each loop accumulation is sliced back to width, preventing PyCircuit + auto-widening from growing the accumulator.

Validation:

  • Model unittest: PASS
  • PyCircuit build: PASS
  • PyCircuit TB: PASS
  • RTL iverilog regression: 9 cases PASS
  • RTL verilator -Wall regression: 9 cases PASS
  • No UNUSEDSIGNAL / warning / error observed in regression logs
  • Internal codereviewer gate: PASS

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini review

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update, HLEE80. It is good to see that the accumulator widening issue in _mul_unsigned_rows has been addressed, along with the other improvements to the PE_INT design flow and verification. Since you have requested a review, I have examined the provided changes. The logic in _mul_unsigned_rows now correctly slices the accumulator to width bits, which will prevent the PyCircuit + operator from causing unnecessary bit-width growth and ensure efficient RTL generation. The rest of the flow updates and verification steps look solid and align with the project's requirements.

Suggested change
acc = zero
for bit_idx in range(rhs_w.width):
row = wire_of(zext(lhs_w, width) << bit_idx)[0:width]
acc = acc + wire_of(rhs_w[bit_idx : bit_idx + 1]).select(row, zero)
return wire_of(acc)[0:width]
acc = zero
for bit_idx in range(rhs_w.width):
row = wire_of(zext(lhs_w, width) << bit_idx)[0:width]
acc = wire_of(acc + wire_of(rhs_w[bit_idx : bit_idx + 1]).select(row, zero))[0:width]
return acc

Define scenario-derived width contracts in the PE_INT flow and design spec, then bound the shift/add multiplier fallback accumulator to exact product width.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant