XorqBuckarooWidget: render xorq/ibis expressions natively without materializing to pandas

## Gap

Buckaroo today registers display formatters for `pd.DataFrame`, `pl.DataFrame`, and `geopandas.GeoDataFrame` (`widget_utils.py:101-111`). xorq / ibis expressions have no native rendering path — users have to call `.execute()`, which materializes the whole table to pandas before `BuckarooInfiniteWidget` picks it up. That defeats the push-down model that `XorqStatPipeline` (#691) was built for.

## What's missing

A `XorqBuckarooWidget` (and `XorqBuckarooInfiniteWidget`) that:

1. Takes an ibis/xorq expression as input and never `.execute()`s the whole thing.
2. Computes summary stats via `XorqStatPipeline` (already exists — single batched aggregate + per-column histograms run on the backend).
3. Pages through data via `expr.order_by(...).limit(page_size, offset=page * page_size)` queries, so only the rows currently visible round-trip to Python.
4. Registers itself with `widget_utils.enable()` so `expr` in a notebook cell renders without an explicit `.execute()`.
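
The paging query in item 3 can be sketched as a small helper. This is a minimal sketch, not Buckaroo code: `page_query` and its parameters are hypothetical names, and it assumes the expression follows the ibis `Table` interface, where `order_by(...)` returns a new expression and `limit(n, offset=k)` takes the offset as a keyword argument.

```python
from typing import Any


def page_query(expr: Any, order_key: str, page: int, page_size: int) -> Any:
    """Build (but do not execute) the expression for one page of rows.

    Hypothetical helper. `expr` is assumed to behave like an ibis Table:
    `order_by(key)` and `limit(n, offset=k)` each return a new lazy expression.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    # A stable ordering is needed so that consecutive pages do not overlap.
    ordered = expr.order_by(order_key)
    # Only `page_size` rows starting at this offset are requested from the backend.
    return ordered.limit(page_size, offset=page * page_size)
```

Calling `.execute()` on the result would then materialize only that page, for example `page_query(expr, "id", page=2, page_size=100).execute()`.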

## Architectural template

`LazyInfinitePolarsBuckarooWidget` (`buckaroo/lazy_infinite_polars_widget.py`) is the closest analog. It already handles the lazy/paginated case for Polars: stats are computed once on the lazy plan, and rows are fetched on demand. The same shape works for xorq with three substitutions: `pl.LazyFrame.collect_schema()` → `ibis.Table.schema()`, `pl.col(...).hist(...)` → `XorqStatPipeline`, and lazy `.collect()` page slicing → `expr.limit(n, offset=k).execute()`.
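
The "stats once, rows on demand" shape described above might look roughly like the following. This is a sketch under stated assumptions, not the existing Polars widget or a proposed final design: `LazyExprSource` is a hypothetical name, `compute_stats` is a stand-in callable for whatever entry point `XorqStatPipeline` exposes, and `expr` is assumed to be ibis-like (`limit(n, offset=k)` plus `execute()`).

```python
from typing import Any, Callable, Dict, Optional


class LazyExprSource:
    """Hypothetical data source: stats computed once, rows fetched per window.

    `compute_stats` stands in for the XorqStatPipeline entry point (its real
    API is not shown here). `expr` is assumed to be ibis-like and already
    ordered, so that repeated window queries return consistent rows.
    """

    def __init__(self, expr: Any, compute_stats: Callable[[Any], Dict[str, Any]]):
        self._expr = expr
        self._compute_stats = compute_stats
        self._stats: Optional[Dict[str, Any]] = None

    def stats(self) -> Dict[str, Any]:
        # Run the backend aggregate on first use only, then reuse the result.
        if self._stats is None:
            self._stats = self._compute_stats(self._expr)
        return self._stats

    def rows(self, start: int, end: int) -> Any:
        # Materialize only the requested [start, end) window.
        if start < 0 or end < start:
            raise ValueError("expected 0 <= start <= end")
        return self._expr.limit(end - start, offset=start).execute()
```

Injecting `compute_stats` keeps the sketch independent of the stats pipeline's actual signature; a real widget would presumably bind it to `XorqStatPipeline` directly.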

## Tie-in with #700

#700 proposes folding all histogram queries into a single round-trip per phase. That's a prereq for `XorqBuckarooWidget` to feel snappy: N+1 round-trips per render hurt on remote backends (Snowflake, Postgres). Order is: land #700 first, then build the widget on top.

## post_processing_method gap

Same issue blocks `post_processing_method` for xorq: `CustomizableDataflow._compute_processed_result` (`dataflow/dataflow.py:376`) passes `cleaned_df: pd.DataFrame` to `post_process_df`. The whole DataFlow chain (`raw_df` → `cleaned` → `processed` → `summary_sd` → `widget`) is pandas-shaped. A real xorq widget needs a `XorqDataFlow` analog that runs `cleaned`/`processed` on the expression itself.

If `post_process_df` accepted a xorq expression and returned one, the push-down would be preserved. Polars solved this by having `PolarsBuckarooWidget` subclass `BuckarooWidget` and override the relevant DataFlow steps; the xorq version would follow the same pattern.

## Sketch

```python
# Hypothetical
class XorqBuckarooWidget(BuckarooWidget):
    DFStatsClass = XorqDfStatsV2  # already exists
    sampling_klass = XorqSampling  # new — limit/offset based
    autocleaning_klass = XorqAutocleaning  # new — would need to map cleaning ops to ibis
    
    def __init__(self, expr, *args, **kwargs):
        # bind 'expr' (an ibis.Table) instead of pd.DataFrame
        # XorqDataFlow handles per-step chaining
        ...

# In widget_utils.enable():
try:
    import xorq.api as xo
    ip_formatter.for_type(xo.expr.types.relations.Table, _display_xorq_as_buckaroo)
except ImportError:
    pass
```

## Scope

Likely a meaningful chunk of work — `cleaned`/`processed` on ibis exprs is the hard part (mapping cleaning rules to ibis transforms). MVP could skip cleaning + post-processing and just paginate `expr` with `XorqStatPipeline` stats overlay; that already gives a useful widget.

Surfaced in #691 review.
