## Gap
Buckaroo today registers display formatters for pd.DataFrame, pl.DataFrame, and geopandas.GeoDataFrame (widget_utils.py:101-111). xorq / ibis expressions have no native rendering path — users have to call .execute(), which materializes the whole table to pandas before BuckarooInfiniteWidget picks it up. That defeats the push-down model that XorqStatPipeline (#691) was built for.
## What's missing
A XorqBuckarooWidget (and XorqBuckarooInfiniteWidget) that:
- Takes an ibis/xorq expression as input and never .execute()s the whole thing.
- Computes summary stats via XorqStatPipeline (already exists: a single batched aggregate plus per-column histograms, run on the backend).
- Pages through data via expr.order_by(...).limit(page_size).offset(page * page_size) queries, so only the rows currently visible round-trip to Python.
- Registers itself with widget_utils.enable() so a bare expr in a notebook cell renders without an explicit .execute().
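The paging step above can be sketched as a small helper. fetch_page and the in-memory FakeExpr stub below are hypothetical; with a real ibis/xorq Table the same method chain would push the window query down to the backend instead of slicing a Python list.

```python
def fetch_page(expr, order_cols, page, page_size):
    # One visible window per call: only page_size rows round-trip to Python.
    return (expr.order_by(order_cols)
                .limit(page_size)
                .offset(page * page_size)
                .execute())

# Minimal in-memory stand-in so the helper can be exercised without a backend.
class FakeExpr:
    def __init__(self, rows):
        self.rows, self._limit, self._offset = list(rows), None, 0

    def order_by(self, cols):
        # single-column stand-in for a real ORDER BY
        self.rows = sorted(self.rows)
        return self

    def limit(self, n):
        self._limit = n
        return self

    def offset(self, n):
        self._offset = n
        return self

    def execute(self):
        return self.rows[self._offset:self._offset + self._limit]
```

Note that a stable order_by matters here: without it, limit/offset windows on most SQL backends are not guaranteed to be disjoint across calls.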
## Architectural template
LazyInfinitePolarsBuckarooWidget (buckaroo/lazy_infinite_polars_widget.py) is the closest analog. It already handles the lazy/paginated case for Polars: stats computed once on the lazy plan, rows fetched on demand. Same shape works for xorq — substitute pl.LazyFrame.collect_schema() → ibis.Table.schema(), pl.col(...).hist(...) → XorqStatPipeline, lazy .collect() page slicing → expr.limit().offset().execute().
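The "stats once on the plan, rows on demand" split can be sketched generically. LazyCore, stat_fn, and fetch_fn are illustrative names standing in for the widget core, XorqStatPipeline, and the limit/offset page query; this is a shape sketch, not the actual LazyInfinitePolarsBuckarooWidget internals.

```python
class LazyCore:
    """Compute summary stats exactly once; fetch row pages lazily."""

    def __init__(self, expr, stat_fn, fetch_fn, page_size=100):
        self.expr, self.fetch_fn, self.page_size = expr, fetch_fn, page_size
        self.summary = stat_fn(expr)   # runs once, against the deferred plan
        self._pages = {}               # page index -> fetched rows

    def page(self, n):
        if n not in self._pages:       # each window fetched at most once
            self._pages[n] = self.fetch_fn(self.expr, n, self.page_size)
        return self._pages[n]
```

The cache means scrolling back to an already-seen page costs no round-trip, which is what makes the lazy widget feel local even against a remote backend.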
## Tie-in with #700
#700 proposes folding all histogram queries into a single round-trip per phase. That's a prereq for XorqBuckarooWidget to feel snappy — N+1 round-trips per render hurts on remote backends (Snowflake, Postgres). Order is: land #700 first, then build the widget on top.
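The difference between the N+1 shape and the batched shape in #700 can be illustrated with query strings. histogram(...) here is a placeholder aggregate and the SQL is schematic, not a claim about any specific backend's dialect.

```python
cols = ["a", "b", "c"]

# N+1 shape: one round-trip per column's histogram
per_column = [f"SELECT histogram({c}) FROM t" for c in cols]

# Batched shape (#700): all histograms folded into one round-trip per phase
batched = ("SELECT "
           + ", ".join(f"histogram({c}) AS {c}_hist" for c in cols)
           + " FROM t")
```

On a remote backend each round-trip carries fixed latency, so collapsing len(cols) queries into one is what keeps render time roughly constant as column count grows.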
## post_processing_method gap
Same issue blocks post_processing_method for xorq: CustomizableDataflow._compute_processed_result (dataflow/dataflow.py:376) passes cleaned_df: pd.DataFrame to post_process_df. The whole DataFlow chain (raw_df → cleaned → processed → summary_sd → widget) is pandas-shaped. A real xorq widget needs a XorqDataFlow analog that runs cleaned/processed on the expression itself.
If post_process_df accepted a xorq expression and returned one, the push-down would be preserved. Polars solved this by having PolarsBuckarooWidget subclass BuckarooWidget and override the relevant DataFlow steps; the xorq version can follow the same pattern.
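The expression-in, expression-out contract can be sketched with a toy deferred plan. Plan and post_process_plan are hypothetical stand-ins for an ibis expression and an expression-shaped post_process_df; the point is that the hook appends operations without touching rows, so nothing materializes until execute().

```python
class Plan:
    """Toy deferred expression: operations accumulate, nothing runs early."""

    def __init__(self, rows, ops=()):
        self.rows, self.ops = rows, list(ops)

    def filter(self, pred):
        # Returns a new deferred plan; no rows are scanned here.
        return Plan(self.rows, self.ops + [lambda rs: [r for r in rs if pred(r)]])

    def execute(self):
        out = self.rows
        for op in self.ops:
            out = op(out)
        return out

def post_process_plan(plan):
    # Expression in, expression out: the transform stays in the plan,
    # so a real backend would fold it into the same pushed-down query.
    return plan.filter(lambda r: r >= 0)
```

Contrast this with today's post_process_df, which receives an already-materialized pd.DataFrame and therefore forces the whole table through Python first.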
## Sketch
```python
# Hypothetical
class XorqBuckarooWidget(BuckarooWidget):
    DFStatsClass = XorqDfStatsV2           # already exists
    sampling_klass = XorqSampling          # new: limit/offset based
    autocleaning_klass = XorqAutocleaning  # new: would need to map cleaning ops to ibis

    def __init__(self, expr, ...):
        # bind 'expr' (an ibis.Table) instead of a pd.DataFrame
        # XorqDataFlow handles per-step chaining
        ...

# In widget_utils.enable():
try:
    import xorq.api as xo
    ip_formatter.for_type(xo.expr.types.relations.Table, _display_xorq_as_buckaroo)
except ImportError:
    pass
```
## Scope
Likely a meaningful chunk of work — cleaned/processed on ibis exprs is the hard part (mapping cleaning rules to ibis transforms). MVP could skip cleaning + post-processing and just paginate expr with XorqStatPipeline stats overlay; that already gives a useful widget.
Surfaced in #691 review.