docs: add contributor-guide page for python client design #1588

Draft
andygrove wants to merge 1 commit into apache:main from andygrove:docs/contributor-guide-python-client

@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

The contributor guide currently has only a single bullet about the Python bindings under code-organization.md, and it links to python/src/context.rs — a file that no longer exists (the current files are lib.rs, cluster.rs, and utils.rs). There is no contributor-facing explanation of how the wheel actually works, even though the design is non-obvious: the Python package depends on datafusion-python, intercepts SessionContext via a metaclass to return a DistributedDataFrame, and only crosses into Ballista at execution time by serializing the locally-built logical plan and shipping it to a fresh SessionContext::remote_with_state. The known limitations listed in python/README.md are direct consequences of that design but the connection isn't documented anywhere.
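
To make the mechanism above concrete, here is a minimal, self-contained sketch of the general pattern: a metaclass wraps the methods of a local context so they return a distributed dataframe type, the plan keeps being built locally, and only `collect()` hands a serialized plan to a remote side. Every name in it (`LocalContext`, `LocalDataFrame`, `RedirectingMeta`, `fake_remote_execute`, and the toy `DistributedDataFrame`) is hypothetical and chosen for illustration. It does not use or reproduce the datafusion-python or Ballista APIs, and the real wheel may differ in every detail.

```python
import functools


class LocalDataFrame:
    """Stand-in for the dataframe type a local engine would return."""

    def __init__(self, plan):
        self.plan = plan  # a string standing in for a logical plan

    def filter(self, predicate):
        return LocalDataFrame(f"Filter({predicate}, {self.plan})")


def fake_remote_execute(payload):
    """Hypothetical remote side: receives serialized plan bytes."""
    return f"executed remotely: {payload.decode('utf-8')}"


class DistributedDataFrame(LocalDataFrame):
    """Builds the plan locally; only execution leaves the process."""

    def filter(self, predicate):
        # Reuse the local plan-building logic, then re-wrap the result.
        return DistributedDataFrame(super().filter(predicate).plan)

    def collect(self):
        # Assumed bridge step: serialize the locally built plan and ship it.
        return fake_remote_execute(self.plan.encode("utf-8"))


class RedirectingMeta(type):
    """Wraps public methods of the direct bases to swap the return type."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for base in bases:
            for attr, value in vars(base).items():
                if callable(value) and not attr.startswith("_"):
                    setattr(cls, attr, mcls._wrap(value))
        return cls

    @staticmethod
    def _wrap(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            if isinstance(result, LocalDataFrame):
                return DistributedDataFrame(result.plan)
            return result

        return wrapper


class LocalContext:
    """Stand-in for the local session context."""

    def table(self, name):
        return LocalDataFrame(f"Scan({name})")


class DistributedContext(LocalContext, metaclass=RedirectingMeta):
    pass
```

In this sketch, `DistributedContext().table("t").filter("a > 1")` yields a `DistributedDataFrame` whose plan was assembled entirely in-process, and nothing reaches `fake_remote_execute` until `collect()` is called. A design like this means that anything the serialized plan cannot carry is unavailable on the remote side, which is the kind of design consequence the new page ties the documented limitations to.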

What changes are included in this PR?

  • New docs/source/contributors-guide/python-client.md describing the crate/package layout, the metaclass + bridge mechanism, the cluster lifecycle helpers (BallistaScheduler, BallistaExecutor, setup_test_cluster), and how each documented limitation maps back to a specific piece of the design. Cross-links to architecture.md and the relevant tracking issues (Ballista Python Issue(s) #1142, Add support for Python UDFs in distributed queries #173).
  • Fixed the broken python/src/context.rs link in code-organization.md and pointed the PyBallista section at the files that exist today, plus the new design page.
  • Added the new page to the contributors-guide toctree in docs/source/index.rst.

This PR only touches contributor guide content — the user guide is being improved separately.

Are there any user-facing changes?

No code changes; documentation only.

Document how the python wheel layers on top of datafusion-python via
metaclass interception and a logical-plan-shipping bridge, and explain
why the README's known limitations follow from that design. Fix the
broken python/src/context.rs link in code-organization.md and add the
new page to the contributors-guide toctree.
@github-actions github-actions bot added the documentation label Apr 26, 2026
@andygrove andygrove marked this pull request as draft April 26, 2026 21:31
