docs: Add KEP-0812 Composable Kale Notebooks proposal by ederign · Pull Request #847 · kubeflow/kale

ederign · 2026-06-25T20:56:35Z

Summary

Adds the design proposal for KEP-0812: Composable Kale Notebooks.

This KEP is the result of collaborative work between @ederign, @StefanoFioravanzo, and @Ya-shh as part of the GSoC 2026 program. Yash delivered a solid POC that validated the core technical approach — compiling each notebook as a KFP sub-pipeline (GraphComponent) with automatic boundary variable detection via AST/PyFlakes analysis. This KEP builds on those findings to define the full design.

Key design decisions

Compilation: Each notebook becomes a KFP sub-pipeline via GraphComponent, preserving cell-level step visibility in the KFP UI (validated in Yash's POC)
Interface declaration: Automatic inference with zero friction — no new tags needed for MVP. Name collisions raise errors.
Composition format: A new notebook cell type in Kale's tag language. Users add notebook:train cells alongside step: cells in the same notebook. The .ipynb file is the composition format — no external config files.
Output marshaling: Non-issue with the sub-pipeline approach — Kale's existing marshal system handles intra-notebook data, KFP artifacts handle inter-notebook data.

What's in the proposal

Motivation, goals, user stories
Background on Kale's current architecture
Detailed evaluation of all design options with rationale for chosen/rejected
Notes on constraints (cycle detection, merge chain, name collisions)
Implementation plan, test plan, migration, consequences
Open questions for future work (nesting depth, React Flow integration)

Ref: #812

Adds the design proposal for composable notebooks — extending Kale to support composition of multiple notebooks into a single KFP pipeline via a new `notebook` cell type. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Eder Ignatowicz <ignatowicz@gmail.com>

google-oss-prow · 2026-06-25T20:56:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ederign. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

StefanoFioravanzo

I love this KEP. Overall, it is aligned with what we already discussed and provides a comprehensive overview of what we want to build.
I feel like we need to better explain just a couple of aspects, which I commented on inline.

StefanoFioravanzo · 2026-06-30T13:16:40Z

+
+4. **Type inference uses name heuristics.** KFP artifact types are inferred from variable names using Kale's existing type map (`model`→Model, `dataset`→Dataset, `metrics`→Metrics, etc.). Future work may add explicit type annotations.
+
+5. **`notebook` cells break the code merge chain.** In Kale, untagged cells merge into the previous step. A `notebook:` cell is a reference to another notebook, not executable code — subsequent untagged cells must NOT be merged into it. The `notebook` type must be treated as a boundary (like `imports`, `skip`, `pipeline-parameters`) that stops the merge. Untagged cells after a `notebook:` cell should belong to the next explicit `step:` cell, not to the notebook reference.


Are we sure that `imports` and `pipeline-parameters` act as a boundary? `skip` definitely does, but I don't remember whether the others do too. Asking just to make sure.

Ya-shh · 2026-06-30 (edited)

@StefanoFioravanzo, I checked the parser, and the three don't behave the same. After a `skip` cell, a following untagged cell merges into the step before it (`skip` is transparent; it doesn't change the active step). After `imports` or `pipeline-parameters`, a following untagged cell goes into that block, not into a step. So none of them sends a following cell to the next `step:`, which is what we want for `notebook:`. That makes `notebook:` its own case, not the same as any of those three. We would need to fix the wording to describe its own rule. @ederign, please have a look.
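To make the four behaviors in this thread concrete, here is a minimal sketch of the merge rules as described in the comment above. It is not Kale's parser: the `Cell` class, the `assign_cells` function, and the choice to raise an error for trailing untagged cells are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    source: str
    tag: str = ""  # "step:x", "imports", "pipeline-parameters", "skip", "notebook:x", or "" if untagged


def assign_cells(cells):
    """Group cell sources by block, following the merge rules discussed in this thread."""
    blocks = {}
    active = None   # block that receives the next untagged cell
    pending = []    # untagged cells waiting for the next explicit step
    for cell in cells:
        tag = cell.tag
        if tag == "skip":
            continue  # transparent: the active block is unchanged
        if tag.startswith("notebook:"):
            blocks.setdefault(tag, [])  # a reference only, it holds no code
            active = None               # following untagged cells must wait
            continue
        if tag in ("imports", "pipeline-parameters") or tag.startswith("step:"):
            active = tag
            blocks.setdefault(active, [])
            if tag.startswith("step:") and pending:
                blocks[active].extend(pending)  # deferred cells join this step
                pending = []
            blocks[active].append(cell.source)
            continue
        # Untagged cell: merge into the active block, or defer it.
        if active is None:
            pending.append(cell.source)
        else:
            blocks[active].append(cell.source)
    if pending:
        # Assumption of this sketch: deferred cells with no later step are an error.
        raise ValueError("untagged cells have no following step: cell")
    return blocks
```

With cells tagged `step:preprocess`, untagged, `skip`, untagged, `notebook:train`, untagged, `step:evaluate`, the two untagged cells around the `skip` cell land in `step:preprocess`, the `notebook:train` block stays empty, and the untagged cell after it lands in `step:evaluate`.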

StefanoFioravanzo · 2026-06-30T13:30:05Z

+└─────────────────────────────────────────────────────────┘
+```
+
+Kale detects that `train.ipynb` uses `dataset` and `features` (defined by `step:preprocess`) and that it produces `model` and `test_data` (used by `step:evaluate`). The compiled pipeline has `preprocess_step` (a component), `train_pipeline` (a sub-pipeline with its own internal steps), and `evaluate_step` (a component). Data flows automatically between all three — variables cross the step/notebook boundary the same way they cross step/step boundaries.


The train notebook cannot be standalone, since it needs to depend on the `dataset` and `features` variables to be acceptable for this pattern. I wonder: how will someone work on and test this notebook?
This is partially mentioned below, but I feel like we haven't really made explicit what the desired user experience is in this case. It might be acceptable for a v1 to indeed require a sub-notebook to leave some of its variables undefined. For a v2, I feel like we need to overcome this limitation.

Thinking out loud: if I am developing train and I know someone will attach to it, I will probably define stub variables with dummy values so I can test the notebook. These stub variables could then become part of the notebook's signature. Below, in Decision 2, we mention the possibility of using notebook-outputs as a mechanism to define outputs. Doing something similar for inputs might be interesting, or extending the pipeline-parameters mechanism.

Anyway, I think this part is still up for debate. I think we should add some specific considerations about this to the KEP.

Ya-shh · 2026-06-30 (edited)

@StefanoFioravanzo yeah, this is a real gap. Right now a sub-notebook has to leave some of its variables undefined for us to pick them up as inputs, which is exactly why it can't run standalone. And if we add stubs to test it, those variables become defined, so we stop picking them up, the real upstream values never get passed in, and it silently runs on the dummy values. So we can't get both standalone-testable and composable unless the stubs are a declared signature, like you said.

pipeline-parameters fits scalar inputs, but dataset/features are artifacts, so those would need their own declaration: a notebook-inputs alongside the notebook-outputs idea. Probably both, by type.

So v1 keeps the current behavior, in which a sub-notebook leaves some variables undefined, and v2 adds explicit input/output declarations for the reusable case. We could add a short section on this. @ederign, please have a look.
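The stub problem described above can be reproduced with a small free-variable check. This is a simplified sketch using only Python's standard `ast` module; the PR description says Kale's detection relies on AST/PyFlakes analysis, so the `inferred_inputs` helper below is a hypothetical stand-in, not Kale's implementation. It only looks at module-level statements and ignores function parameters and augmented assignments.

```python
import ast
import builtins


def inferred_inputs(source: str) -> set[str]:
    """Names that are read in `source` before any statement has defined them."""
    defined = set(dir(builtins))
    inputs = set()
    for stmt in ast.parse(source).body:
        loads, stores = [], []
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                (loads if isinstance(node.ctx, ast.Load) else stores).append(node.id)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                # An import binds the alias, or the top-level package name.
                stores.extend((a.asname or a.name).split(".")[0] for a in node.names)
        # Reads are checked before this statement's own definitions are recorded.
        inputs.update(name for name in loads if name not in defined)
        defined.update(stores)
    return inputs


# A sub-notebook body that expects `dataset` and `features` from upstream.
train_src = """
from statistics import mean
model = mean(dataset) * len(features)
"""

# The same body with stub values added for standalone testing.
stubbed_src = "dataset = [0, 1]\nfeatures = [[0.0], [1.0]]\n" + train_src

print(inferred_inputs(train_src))    # dataset and features are inferred as inputs
print(inferred_inputs(stubbed_src))  # the stubs hide both names, so nothing is inferred
```

Under this kind of inference the stubbed notebook exposes no inputs at all, which is why a declared signature (for example the notebook-inputs idea mentioned above) would be needed to keep a notebook both standalone-testable and composable.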

google-oss-prow Bot requested review from StefanoFioravanzo and jesuino June 25, 2026 20:56

google-oss-prow Bot added the size/XL label Jun 25, 2026

This was referenced Jun 27, 2026

feature: Composable Kale Notebooks #812

Open

feat(frontend): Refactor labextension React components from class to functional with hooks #648

Open

StefanoFioravanzo reviewed Jun 30, 2026


ederign wants to merge 1 commit into kubeflow:main from ederign:kep-0812-composable-notebooks



3 participants

