morton index surface followup: points= encode + __from_arrow__#86
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #86 +/- ##
==========================================
+ Coverage 93.61% 93.77% +0.15%
==========================================
Files 25 25
Lines 3728 3822 +94
==========================================
+ Hits 3490 3584 +94
Misses 238 238
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Harness.
🚀 New features to boost your workflow:
|
espg
left a comment
There was a problem hiding this comment.
🤖 from Claude (review)
Phase 1 review (commit 3ab4bea, issue #79 phase (a)). The change is correct and meets the acceptance criteria: from_latlon(..., points=True) produces Kind::Point words at order 29, and points=False (default) is unchanged Kind::Area.
Verified:
from_nested_pointtuple decomposition is bit-faithful toencode_point(orders 28/29 land intuples[27]/tuples[28]); pair extraction mirrorsfrom_nested's body loop; the base>11 guard mirrorsfrom_nested.- The PyO3 binding
rust_mi_from_nested_pointfaithfully mirrorsrust_mi_from_nested—allow_threads,catch_unwind,par_iter, and thepanic_msg/PyValueErrorerror path all match. - Kernel tests assert
encode_pointequality,Kind::Point,order==29, nested round-trip, and the oversized-nested panic. Python tests assert order 29 + kind Point, area-unchanged default, coarsen→area, and the non-29 reject. - No flake8 style-pass (88-col) violations in the Python diff; no over-long Rust lines.
Findings (all minor / non-blocking, details inline):
(1) from_nested_point silently masks order-≥30 bits via & 3 rather than rejecting them — consistent with from_nested, flagged only to make it a conscious choice.
(2) The ValueError guard accepts an explicit order=29, points=True (indistinguishable from default); a non-integer order raises TypeError from int() rather than the guard's ValueError. Both look intended.
(3) Test nice-to-haves: assert point words differ from the order-29 area words for the same location, add the accept-explicit-29 path, and confirm the >180 longitude fixture matches rust_ang2pix's expected range.
Nothing here blocks advancing to the next phase. Note decimal_morton.rs is already ~1568 lines (pre-existing, not from this diff) — standing observation for @espg per §4, not a finding against this PR.
Generated by Claude Code
| together with ``points=True`` is a ``ValueError`` (the default | ||
| ``order`` is ``MAX_ORDER`` so the point path needs no extra argument). | ||
| """ | ||
| if points and int(order) != MAX_ORDER: |
There was a problem hiding this comment.
🤖 from Claude (review)
The guard is points and int(order) != MAX_ORDER. Worth being explicit that an explicit order=29, points=True is accepted silently (it can't be distinguished from the default, since both resolve to int(order)==29). The docstring already says "pass order=29 (the default) or omit it", so this is intended — just confirming the edge case is by design and the test test_points_true_with_non29_order_raises covers only the reject path (order=12), not the accept-explicit-29 path. A one-line test that from_latlon(..., order=29, points=True) succeeds and yields points would lock that contract in.
Also note int(order) is evaluated before the lat/lon coercion, so a non-integer order with points=True raises a TypeError (from int()) rather than the ValueError — fine, but a different exception type than the explicit guard suggests.
Generated by Claude Code
| _, orders, kinds, _ = _rustie.rust_mi_decode(a._data) | ||
| np.testing.assert_array_equal(orders, np.full(len(self.LATS), 29)) | ||
| np.testing.assert_array_equal(kinds, np.ones(len(self.LATS), dtype=np.uint8)) | ||
|
|
There was a problem hiding this comment.
🤖 from Claude (review)
Coverage is solid — test_points_true_yields_order29_points asserts both order==29 and kind==1 (Point), test_points_false_default_is_area pins the area default, and test_points_coarsen_and_to_nested confirms coarsen→area. The kinds==1 assertion is what guards against a regression that silently routed points=True back through the area path.
Two small additions worth considering (non-blocking): (1) a direct assertion that points=True words differ from the order=29 area words for the same location (the kind bit makes them distinct; asserting pts._data != area._data documents that intent next to test_points_share_nested_cell_with_area, which only checks the shared nested cell). (2) The LONS fixture uses values >180 (200.0, 305.0, 270.2) — confirm that's the intended 0–360 convention rust_ang2pix expects, since the area tests elsewhere may use a different range.
Generated by Claude Code
| if not parts: | ||
| return MortonIndexArray(np.empty(0, dtype=np.uint64)) | ||
| return MortonIndexArray._concat_same_type(parts) | ||
| return to_morton_index(array) |
There was a problem hiding this comment.
🤖 from Claude (review)
Null/NA round-trip is silently unhandled here, and this is the one path where it could bite. to_morton_index does storage.to_numpy(zero_copy_only=False) on the chunk/array storage; if an Arrow morton_index column carries nulls (a real scenario once a MortonIndexArray with isna() sentinels is written to Parquet and read back), the uint64 to_numpy either raises or fills with an arbitrary value rather than mapping back to the all-zero _SENTINEL that isna() expects. Phase (b)'s stated acceptance is words+dtype preservation for non-null data, so this isn't a blocker — but the whole Arrow suite (test_arrow.py) never exercises a null, so the gap is invisible. Standing for @espg: is null/NA Arrow interop in scope for issue #79, or a follow-up? If in scope, a test writing a column with a pd.NA element through Parquet and asserting isna() survives would pin it.
Generated by Claude Code
| """ | ||
| from .arrow import _require_pyarrow, to_morton_index | ||
|
|
||
| pa = _require_pyarrow() |
There was a problem hiding this comment.
🤖 from Claude (review)
Lazy-import discipline is correctly preserved: the from .arrow import _require_pyarrow, to_morton_index is method-local, the module top still imports only numpy and _rustie, so import mortie.morton_index stays numpy-only. Reusing the sibling module's _require_pyarrow helper matches the existing intra-package pattern (arrow.py and morton_index.py both gate optional deps through their own _require_*), so the private-name import is consistent with the codebase rather than a smell. One small note: to_morton_index already calls _require_pyarrow() internally, so the pa bound here is needed only for the isinstance(array, pa.ChunkedArray) discriminator — which is legitimate, just worth knowing it isn't doing double duty as the import guard for the conversion itself.
Generated by Claude Code
espg
left a comment
There was a problem hiding this comment.
🤖 from Claude (review)
Phase 2 review (commit ddfda07, issue #79 phase (b): wire MortonIndexDtype.__from_arrow__ so table.to_pandas() on a morton_index column yields a MortonIndexArray). The change is correct and meets the stated acceptance criteria.
Verified:
- pyarrow stays lazy — the
from .arrow import _require_pyarrow, to_morton_indexis method-local; module top still imports onlynumpyand_rustie, soimport mortie.morton_indexremains numpy-only importable. No module-level pyarrow import added. __from_arrow__is correctly a method ofMortonIndexDtype(8-space indent, same level asconstruct_array_type).- ChunkedArray path iterates
array.chunksin order, converts each viato_morton_index, and concatenates withMortonIndexArray._concat_same_type(which doesnp.concatenatein list order) — order and words preserved across chunks. - Empty ChunkedArray (zero chunks) returns an empty
MortonIndexArray(np.empty(0, dtype=np.uint64)). - Plain
pa.Arraypath returnsto_morton_index(array), i.e. aMortonIndexArray. - Wiring is complete:
arrow.py'sMortonIndexType.to_pandas_dtype()returnsMortonIndexDtype(), which is what routesto_pandas()into this hook. - Tests cover dtype-name + backing-array-type + bit-identical words (in-memory), the chunked order-preservation path, the plain-array path, and the full Parquet ->
to_pandas()round-trip. - Line lengths within the 88-col style pass; no
E9/F63/F7/F82concerns; commit message is title-only per §3.
Findings (all minor / non-blocking, details inline):
(1) Null/NA Arrow round-trip is silently unhandled (to_morton_index -> storage.to_numpy(zero_copy_only=False) won't map Arrow nulls back to the all-zero _SENTINEL that isna() expects); the whole Arrow suite never exercises a null, so the gap is invisible. Phase (b)'s acceptance is non-null words+dtype, so not a blocker — standing for @espg whether null interop is in scope for issue #79 or a follow-up.
(2) Test nice-to-haves: assert isinstance(series.array, MortonIndexArray) directly in the Parquet round-trip test, and add a one-line empty-ChunkedArray test to lock that handled-but-untested edge.
(3) Note on the private _require_pyarrow import: consistent with the codebase's intra-package optional-dep pattern, and pa is genuinely needed for the isinstance(..., pa.ChunkedArray) discriminator (to_morton_index already guards pyarrow internally).
Nothing here blocks the phase. None of the findings imply a side-effecting action; items for @espg are flagged as standing, not acted on.
Generated by Claude Code
There was a problem hiding this comment.
🤖 from Claude (review)
Test coverage for phase (b) is solid and meets the acceptance criteria: test_to_pandas_yields_morton_index_series asserts all three of dtype.name == "morton_index", isinstance(series.array, MortonIndexArray), and bit-identical words; test_from_arrow_handles_chunked_array exercises a 2-chunk ChunkedArray and confirms both order preservation across chunks and bit-identical words; test_from_arrow_handles_plain_array covers the pa.Array branch; and test_parquet_to_pandas_full_round_trip pins the Parquet -> to_pandas() path end to end.
Two non-blocking nits:
(1) test_parquet_to_pandas_full_round_trip asserts dtype.name and compares series.array._data, but unlike the in-memory test it doesn't assert isinstance(series.array, mortie.MortonIndexArray) directly (it's only implied by the ._data access). One explicit isinstance line would make the full-roundtrip path assert the backing-array type the same way the in-memory test does.
(2) The empty-ChunkedArray branch (pa.chunked_array([], type=morton_index) -> empty MortonIndexArray) is handled in the code but never tested. A one-line test would lock that edge in.
Generated by Claude Code
Merging this PR will improve performance by 13.69%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | coverage_triangle[4] |
131.6 µs | 115.8 µs | +13.69% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/79-morton-index-surface (91b73a7) with main (8482da6)
Footnotes
-
1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports. ↩
|
🤖 from Claude Folded the diff-scoped findings from the two adversarial self-reviews into commit
Addressed-by-design (no code change), grounding the self-review notes:
Standing for review (1): Null/NA Arrow interop. The phase-(b) self-review flagged that All phases complete; local gates green ( Generated by Claude Code |
Closes #79
Surfaces the two
morton_indexskin gaps flagged in issue #79 (followups from #73 / #51): theKind::Pointencode path and the Arrow -> pandas__from_arrow__hook. One PR, two phases.Phase (a) - thread
points=True->Kind::PointThe pandas/pyarrow skins only ever produced area cells; the kernel's point encode (
encode_point,decimal_morton.rs:206) had no public surface. This adds the full bridge:src_rust/src/decimal_morton.rs): newfrom_nested_point(nested)- the point twin offrom_nested. It decomposes an order-29 nested id into its 29 stored0..=3tuples and callsencode_point, so the result decodes withkind == Kind::Point,order == 29. It guards an oversized nested index (base > 11) the same wayfrom_nesteddoes. Unit tests:from_nested_point_matches_encode_point(bit-identical toencode_point,Kind::Point, order 29, shares the nested cell with the area word) andfrom_nested_point_rejects_oversized_nested.src_rust/src/lib.rs): a dedicated single-purposerust_mi_from_nested_point(nested_array), mirroringrust_mi_from_nested(nopoint: boolflag), registered alongside it.mortie/morton_index.py):MortonIndexArray.from_latlongainspoints=False. Whenpoints=True, lat/lon is hashed at order 29 and routed throughrust_mi_from_nested_point; an explicitorder != 29passed withpoints=TrueraisesValueError. The defaultorderis alreadyMAX_ORDER(29), so the point path needs no extra argument.mortie/tests/test_morton_index.py,TestPointEncode):points=Truewords decode asKind::Point/order 29; points share a nested cell with the matching area word; coarsening a point to a lower order yields an area cell;points=False(and the default) stayKind::Area;order != 29+points=Trueraises.Answer to the
encode_pointorder-arg questionNo,
encode_pointdoes not take anorderargument. Its signature ispub fn encode_point(base_cell: u8, tuples: &[u8]) -> u64(decimal_morton.rs:206). Point encoding is order-29-only by construction: it assertstuples.len() >= MAX_ORDER(29) and hardcodesMAX_ORDERinto both the body pack and the point suffix (pack_prefix_body(base_cell, tuples, MAX_ORDER) | build_point_suffix(tuples[27], tuples[28])). A point is the max-resolution "this exact location, no area claim" encoding, so there is no coarser-order point - that is why no order parameter exists. (By contrast the areaencode/from_nesteddo take an order/depth, because area cells exist at every order 0..=29.) The Python side mirrors this:points=Trueforces order 29 and rejects any explicitorder != 29, so a caller cannot ask for a non-existent "order-N point".Phase (b) - wire
__from_arrow__Before:
table.to_pandas()on amorton_indexArrow column produced a plain int64 Series (the dtype had no__from_arrow__hook). Now:mortie/morton_index.py):MortonIndexDtype.__from_arrow__(self, array)accepts apa.Arrayorpa.ChunkedArray, delegates tomortie.arrow.to_morton_index, iterates chunks and concatenates viaMortonIndexArray._concat_same_typefor the chunked case, and returns aMortonIndexArray. The pyarrow import stays lazy (reusesarrow._require_pyarrow), somorton_index.pyremains numpy-only importable.mortie/tests/test_arrow.py,TestFromArrowHook):table.to_pandas()yields amorton_index-dtyped Series backed by aMortonIndexArraywith bit-identical words;__from_arrow__handles both plain and chunked arrays; full Parquet ->read_table->to_pandas()preserves words + dtype.Phases checklist
from_nested_pointkernel +rust_mi_from_nested_pointbinding +from_latlon(points=...)+ testsMortonIndexDtype.__from_arrow__(Array + ChunkedArray) + testsHow it was tested
All run in an isolated venv on this branch:
cargo fmt(applied) +cargo test-> 175 passed (includes the 2 new kernel tests)cargo clippy-> only pre-existing warnings (theuseless_vecwarnings incoverage/tests.rsand theuseless_conversiononPyResult<PyObject>shared by all 31 existing#[pyfunction]bindings); the new binding matches that established pattern, so no new warning is introduced.maturin develop --release-> builds cleanflake8 mortie --select=E9,F63,F7,F82-> clean; non-blocking--max-line-length=88pass clean on touched filespytest -v-> 450 passed, 8 skipped (31 intest_morton_index.py, 15 intest_arrow.py)Questions for review
cargo clippyreports 31useless_conversionwarnings onPyResult<PyObject>(every existing binding) anduseless_vecwarnings incoverage/tests.rs. None are introduced by this change; the newrust_mi_from_nested_pointdeliberately mirrors the existingrust_mi_from_nestedstyle rather than diverging. Left as-is per the "don't fix unrelated CI noise / match the surrounding code" conventions - flag if you'd rather I clean up the binding pattern repo-wide in a separate PR.__from_arrow__reusesarrow._require_pyarrow(a leading-underscore helper) to keep the lazy-import contract in one place. If you'd prefer a public accessor instead of importing the private name, say so.Generated by Claude Code