build: prepare PyAirbyte slim package boundaries#1035
build: prepare PyAirbyte slim package boundaries#1035Aaron ("AJ") Steers (aaronsteers) wants to merge 8 commits into
Conversation
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1779853894-airbyte-slim-packaging' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1779853894-airbyte-slim-packaging'PR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful ResourcesCommunity SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
| """ | ||
|
|
||
|
|
||
| DEFAULT_CONNECTOR_PYTHON_VERSIONS: tuple[str, ...] = tuple( |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughModernizes connector execution to try configurable Python versions, centralizes venv creation, migrates datetime utilities to stdlib with a new parse_datetime, implements lazy top-level exports, relaxes cloud destination typing, and adds slim/full dependency groups and slim build tooling. ChangesCore Library Infrastructure Improvements
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Would you like me to run a quick checklist of likely follow-ups (e.g., verify uv behavior across platforms or confirm environment variable parsing edge cases), wdyt? 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
airbyte/cloud/workspaces.py (1)
441-449:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCould we validate
destinationTypefor both dict and destination-like inputs?Right now the
destinationTypecheck only runs in the non-dict branch, so raw dict inputs can slip through without early validation and fail later in API calls. Could we move the validation after the branch so both paths are validated, wdyt?Proposed patch
if isinstance(destination, dict): destination_conf_dict = destination.copy() else: destination_conf_dict = destination._hydrated_config.copy() # noqa: SLF001 destination_conf_dict["destinationType"] = destination.name.replace("destination-", "") - if "destinationType" not in destination_conf_dict: - raise exc.PyAirbyteInputError( - message="Missing `destinationType` in configuration dictionary.", - ) + + if "destinationType" not in destination_conf_dict: + raise exc.PyAirbyteInputError( + message="Missing `destinationType` in configuration dictionary.", + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@airbyte/cloud/workspaces.py` around lines 441 - 449, The branch that builds destination_conf_dict should validate destinationType for both input types: after the if/else that sets destination_conf_dict (using destination.copy() for dict inputs or destination._hydrated_config + setting destination_conf_dict["destinationType"] = destination.name.replace("destination-", "") for destination-like objects), move the check that raises exc.PyAirbyteInputError if "destinationType" is missing to immediately follow the branch so both paths are validated; update/remove the in-else duplicate check and ensure the check references the destination_conf_dict variable.
♻️ Duplicate comments (3)
airbyte/logs.py (1)
20-20:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical: UTC import fails in docs generation (confirmed by pipeline failure).
The pipeline explicitly reports:
ImportError: cannot import name 'UTC' from 'datetime' (from 'datetime import UTC, datetime').This confirms that the docs generation environment uses Python <3.11. You need a compatibility shim for UTC across all affected files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@airbyte/logs.py` at line 20, The import "from datetime import UTC, datetime" fails on Python <3.11; update airbyte/logs.py to provide a compatibility shim: import datetime and timezone (e.g., import datetime or from datetime import datetime, timezone), then define UTC = timezone.utc when UTC is not available (or use a try/except ImportError to fall back to timezone.utc); replace usages that rely on UTC in this module to use the shimmed UTC variable so docs and older Python versions don't break.airbyte/cloud/sync_results.py (1)
106-106:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winSame UTC import compatibility issue as in
_message_iterators.py.This import requires Python 3.11+. Please apply the same compatibility fix here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@airbyte/cloud/sync_results.py` at line 106, Replace the Python-3.11-only import "from datetime import UTC, datetime" with a backwards-compatible import and usage: import datetime and timezone (e.g. "from datetime import datetime, timezone") or use a try/except to import UTC and fall back to timezone.utc, then update any use of the UTC symbol in sync_results.py to reference the chosen fallback (timezone.utc) so code works on older Python versions; ensure references to "UTC" and "datetime" in this module are updated accordingly.airbyte/progress.py (1)
27-27:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winSame UTC import compatibility issue.
Please apply the same Python 3.11+ compatibility fix here as well.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@airbyte/progress.py` at line 27, The import of UTC from datetime is incompatible with Python 3.11+; change the import to "from datetime import datetime, timezone" and replace usages of the symbol UTC with timezone.utc (e.g., update any datetime(..., tzinfo=UTC) or UTC references in functions/classes in this module to use timezone.utc) so the code consistently uses the standard timezone.utc available across supported Python versions.
🧹 Nitpick comments (2)
pyproject.toml (2)
75-97: ⚡ Quick winShould the
fullgroup include theslimgroup?Semantically, "full" typically means the complete feature set, but the current structure treats it as only the additional dependencies beyond slim. This could be confusing when users run
uv sync --only-group fullexpecting to get everything.Consider adding
{ include-group = "slim" }at the beginning of thefullgroup, similar to howdevincludesfull. This would makefulltruly represent the complete dependency set, wdyt?♻️ Suggested structure
full = [ + { include-group = "slim" }, "airbyte-cdk>=7.3.9,<8.0.0", "cyclopts>=4.0,<5.0",This would make the dependency hierarchy clearer:
slim= base runtimefull= slim + additional featuresdev= full + development tools🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` around lines 75 - 97, The "full" dependency group currently lists only extra packages and should include the "slim" group so it truly represents the complete install; update the "full" group definition in pyproject.toml to include slim by adding an include-group = "slim" entry at the top of the full group (mirroring how the "dev" group includes "full"), ensuring the group name "full" is used and not changing package entries themselves.
54-97: Clarify dependency duplication betweenproject.dependenciesand[dependency-groups](slim/full)In
pyproject.toml,project.dependenciescontains the full set that’s also split into[dependency-groups].slimand.full, while uv treatsdependency-groupsas local/dev-only requirements (not end-user published requirements). Is the intent that end users always get the “full” union viaproject.dependenciestoday, andslim/fullgroups exist only for local workflows/lockfiles (and future package split)? If so, wdyt adding a short comment explaining this transition so version bumps don’t drift?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pyproject.toml` around lines 54 - 97, Add a short clarifying comment near the project.dependencies section and above the [project.optional-dependencies] / [dependency-groups].slim and .full blocks stating that project.dependencies intentionally contains the union (end-user published requirements) while the slim/full groups are for local/dev workflows and lockfile management (uv/uv.lock), and that developers should update both places when changing published requirements to avoid drift; reference the symbols project.dependencies, dependency-groups.slim, dependency-groups.full, and uv.lock so future maintainers know where to sync changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@airbyte/cloud/workspaces.py`:
- Around line 441-449: The branch that builds destination_conf_dict should
validate destinationType for both input types: after the if/else that sets
destination_conf_dict (using destination.copy() for dict inputs or
destination._hydrated_config + setting destination_conf_dict["destinationType"]
= destination.name.replace("destination-", "") for destination-like objects),
move the check that raises exc.PyAirbyteInputError if "destinationType" is
missing to immediately follow the branch so both paths are validated;
update/remove the in-else duplicate check and ensure the check references the
destination_conf_dict variable.
---
Duplicate comments:
In `@airbyte/cloud/sync_results.py`:
- Line 106: Replace the Python-3.11-only import "from datetime import UTC,
datetime" with a backwards-compatible import and usage: import datetime and
timezone (e.g. "from datetime import datetime, timezone") or use a try/except to
import UTC and fall back to timezone.utc, then update any use of the UTC symbol
in sync_results.py to reference the chosen fallback (timezone.utc) so code works
on older Python versions; ensure references to "UTC" and "datetime" in this
module are updated accordingly.
In `@airbyte/logs.py`:
- Line 20: The import "from datetime import UTC, datetime" fails on Python
<3.11; update airbyte/logs.py to provide a compatibility shim: import datetime
and timezone (e.g., import datetime or from datetime import datetime, timezone),
then define UTC = timezone.utc when UTC is not available (or use a try/except
ImportError to fall back to timezone.utc); replace usages that rely on UTC in
this module to use the shimmed UTC variable so docs and older Python versions
don't break.
In `@airbyte/progress.py`:
- Line 27: The import of UTC from datetime is incompatible with Python 3.11+;
change the import to "from datetime import datetime, timezone" and replace
usages of the symbol UTC with timezone.utc (e.g., update any datetime(...,
tzinfo=UTC) or UTC references in functions/classes in this module to use
timezone.utc) so the code consistently uses the standard timezone.utc available
across supported Python versions.
---
Nitpick comments:
In `@pyproject.toml`:
- Around line 75-97: The "full" dependency group currently lists only extra
packages and should include the "slim" group so it truly represents the complete
install; update the "full" group definition in pyproject.toml to include slim by
adding an include-group = "slim" entry at the top of the full group (mirroring
how the "dev" group includes "full"), ensuring the group name "full" is used and
not changing package entries themselves.
- Around line 54-97: Add a short clarifying comment near the
project.dependencies section and above the [project.optional-dependencies] /
[dependency-groups].slim and .full blocks stating that project.dependencies
intentionally contains the union (end-user published requirements) while the
slim/full groups are for local/dev workflows and lockfile management
(uv/uv.lock), and that developers should update both places when changing
published requirements to avoid drift; reference the symbols
project.dependencies, dependency-groups.slim, dependency-groups.full, and
uv.lock so future maintainers know where to sync changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 056713d3-2756-43f4-a0a4-10c09ee57715
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
airbyte/__init__.pyairbyte/_executors/python.pyairbyte/_executors/util.pyairbyte/_message_iterators.pyairbyte/cloud/sync_results.pyairbyte/cloud/workspaces.pyairbyte/constants.pyairbyte/logs.pyairbyte/progress.pypyproject.toml
| def _parse_datetime(value: str | int) -> datetime: | ||
| if isinstance(value, int) or ( | ||
| isinstance(value, str) | ||
| and (value.isdigit() or (value.startswith("-") and value[1:].isdigit())) | ||
| ): | ||
| return datetime.fromtimestamp(int(value), tz=timezone.utc) | ||
|
|
||
| if not isinstance(value, str): | ||
| raise TypeError(f"Could not parse datetime string: {value}") | ||
|
|
||
| normalized = value.replace("Z", "+00:00") | ||
| return datetime.fromisoformat(normalized) | ||
|
|
There was a problem hiding this comment.
I get why we want to reuse cdk dateparse logic but higher priority is a deterministic code path. Vendor in what's needed and drop cdk for datetime parse logic here.
There was a problem hiding this comment.
Oh. You already did? Sorry, my bad.
There was a problem hiding this comment.
No worries. I double-checked the current file and there is no remaining airbyte_cdk / ab_datetime_parse import on this path; it is already using the local deterministic helper.
There was a problem hiding this comment.
Let's move this to a proper _util helper module and import a non-underscored helper from there.
There was a problem hiding this comment.
Applied in 9cbc54d: moved this to airbyte._util.datetime_parse.parse_datetime and updated sync_results.py to import the non-underscored helper from there. I also added focused unit coverage for Unix timestamp, RFC3339/Z, offset, date-only, and invalid inputs.
Validation run:
uv run --directory /home/ubuntu/repos/PyAirbyte ruff check scripts/build_slim_package.uv airbyte/_util/datetime_parse.py airbyte/cloud/sync_results.py tests/unit_tests/test_datetime_parse.pyuv run --directory /home/ubuntu/repos/PyAirbyte ruff format --check scripts/build_slim_package.uv airbyte/_util/datetime_parse.py airbyte/cloud/sync_results.py tests/unit_tests/test_datetime_parse.pyuv run --directory /home/ubuntu/repos/PyAirbyte pytest tests/unit_tests/test_datetime_parse.pyUV_DYNAMIC_VERSIONING_BYPASS=0.46.0.dev0 uv run --directory /home/ubuntu/repos/PyAirbyte scripts/build_slim_package.uv
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/build_slim_package.uv`:
- Around line 98-101: In build_airbyte_slim ensure dist_dir and build_dir are
resolved to absolute paths up front (e.g., using Path.resolve() or
os.path.abspath) so relative-path differences don't change behavior when calling
dist_dir.mkdir(...) and subprocess.run(..., cwd=build_dir); replace any uses of
the original variables with the resolved Path objects and pass
str(resolved_dist_dir) and str(resolved_build_dir) into subprocess.run and mkdir
to keep output deterministic.
- Around line 72-73: The code currently calls shutil.rmtree(build_dir)
unconditionally if build_dir.exists(); add a safety-guard before deletion that
validates build_dir.resolve() is a non-root, non-repo-root, and within an
expected build/output directory subtree (e.g., ensure build_dir.resolve() !=
Path("/") and is a child of the project temp/output directory or matches an
allowed basename like "build" or "dist"); if the check fails, raise an exception
or abort instead of calling shutil.rmtree; update the deletion site that uses
build_dir and shutil.rmtree to perform this validation first.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3f8ddf5d-3a64-47b0-8db8-acdfb4d66401
📒 Files selected for processing (3)
.github/workflows/release_drafter.ymlpyproject.tomlscripts/build_slim_package.uv
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
| _LAZY_EXPORTS = { | ||
| "BigQueryCache": "airbyte.caches.bigquery", | ||
| "CachedDataset": "airbyte.datasets", | ||
| "Destination": "airbyte.destinations.base", | ||
| "DuckDBCache": "airbyte.caches.duckdb", | ||
| "ReadResult": "airbyte.results", | ||
| "Source": "airbyte.sources.base", | ||
| "WriteResult": "airbyte.results", | ||
| "get_colab_cache": "airbyte.caches.util", | ||
| "get_default_cache": "airbyte.caches.util", | ||
| "get_destination": "airbyte.destinations.util", | ||
| "get_source": "airbyte.sources.util", | ||
| "new_local_cache": "airbyte.caches.util", | ||
| } |
There was a problem hiding this comment.
I don't follow what 'lazy exports' is supposed to mean. If obvious/easy, answer here. If shameful/controversial, answer in slack.
There was a problem hiding this comment.
Not shameful. Here "lazy exports" means the top-level airbyte module keeps the existing public API names, but it no longer imports full-package modules until the specific name is accessed.
Example: import airbyte as ab stays lightweight; ab.get_source then imports airbyte.sources.util on first access and caches the resolved function. This lets airbyte-slim import the shared top-level package without immediately requiring full-only modules like caches, sources, destinations, CLI, MCP, DuckDB, SQLAlchemy, or CDK.
Summary
This draft PR starts the PyAirbyte multi-package split requested by AJ by preparing import, dependency, and build boundaries for a new
airbyte-slimdistribution. It keeps the currentairbytepackage behavior intact while reducing eager imports of full-package-only modules, defaulting Python connector execution to uv-managed Python 3.12 with 3.11 fallback, and building matchingairbyteandairbyte-slimartifacts from the same version tag.The slim distribution is now a first-class checked-in package project at
packages/airbyte-slim/pyproject.toml, using the sharedairbyte/source tree with declarative Hatch wheel/sdist excludes instead of a generated pyproject build script. It publishes package metadata asairbyte-slim, targets Python>=3.14,<3.15, excludes local execution, MCP, CLI, cache, source, destination, and SQL processor modules, and supports runtime version lookup from either the full or slim distribution metadata.The PR also moves deterministic datetime parsing into
airbyte._util.datetime_parse.parse_datetimewith focused unit coverage, so cloud sync result parsing no longer depends on CDK datetime helpers.Review & Testing Checklist for Human
packages/airbyte-slim/pyproject.toml,packages/airbyte-slim/airbyte, and.github/workflows/release_drafter.ymlto confirm the first-class slim package layout and release build path should attach bothairbyteandairbyte-slimwheels/sdists from one tag.__getattr__lazy export boundary inairbyte/__init__.py; this follows PEP 562 and keeps full-only imports deferred until their top-level names are accessed.pyproject.tomlanduv.lock; this PR modelsslim,full, anddevgroups for lockfile and packaging boundaries.Recommended end-to-end release artifact check:
UV_DYNAMIC_VERSIONING_BYPASS=0.46.0.dev0 uv build.UV_DYNAMIC_VERSIONING_BYPASS=0.46.0.dev0 uv build packages/airbyte-slim --out-dir dist.dist/contains matchingairbyteandairbyte_slimwheel/sdist artifacts.airbyte_slimwheel metadata forName: airbyte-slimandRequires-Python: <3.15,>=3.14.import airbyteplusairbyte.version.get_version().Notes
Local validation run so far:
uv run --project /home/ubuntu/repos/PyAirbyte ruff check /home/ubuntu/repos/PyAirbyte/airbyte/version.py /home/ubuntu/repos/PyAirbyte/packages/airbyte-slim/pyproject.toml /home/ubuntu/repos/PyAirbyte/pyproject.tomlpyproject.tomlandpackages/airbyte-slim/pyproject.toml.uv run --project /home/ubuntu/repos/PyAirbyte pyrefly check /home/ubuntu/repos/PyAirbyte/airbyte/version.pyuv run --project /home/ubuntu/repos/PyAirbyte pytest /home/ubuntu/repos/PyAirbyte/tests/unit_tests/test_datetime_parse.py /home/ubuntu/repos/PyAirbyte/tests/unit_tests/test_cloud_api_util.pyUV_DYNAMIC_VERSIONING_BYPASS=0.46.0.dev0 uv build /home/ubuntu/repos/PyAirbyte --out-dir /home/ubuntu/repos/PyAirbyte/dist-check && UV_DYNAMIC_VERSIONING_BYPASS=0.46.0.dev0 uv build /home/ubuntu/repos/PyAirbyte/packages/airbyte-slim --out-dir /home/ubuntu/repos/PyAirbyte/dist-checkdist-check/contained matchingairbyteandairbyte_slimwheel/sdist artifacts.airbyte_slimwheel and sdist exclude CLI, MCP, caches, sources, destinations, and SQL processor modules.airbyte_slimwheel metadata hasName: airbyte-slim,Version: 0.46.0.dev0, andRequires-Python: <3.15,>=3.14.uv run --python 3.14 --with /home/ubuntu/repos/PyAirbyte/dist-check/airbyte_slim-0.46.0.dev0-py3-none-any.whl python -c "import airbyte; from airbyte.version import get_version; print(airbyte.__name__, get_version())"uv run --project /home/ubuntu/repos/PyAirbyte poe -C /home/ubuntu/repos/PyAirbyte build-slimLatest CI snapshot before the first-class package pivot was commit
9cbc54d: 20 passed, 0 failed, 0 pending. CI for latest commitbb99bc2is running.Current dependency compatibility finding: target Python 3.14 for the first
airbyte-slimrelease. Python 3.15 metadata resolution works, but install reliability is blocked today by missing or source-build-failing wheels for compiled dependencies such aspydantic-core, withcryptographywheel availability also still catching up.Link to Devin session: https://app.devin.ai/sessions/e8366a4b403b4b9daf05e812c50696b4
Requested by: Aaron ("AJ") Steers (@aaronsteers)