Skip to content

[OPIK-6964] [SDK] [DOCS] feat: scope opik export/import to a project (v2)#7145

Open
dsblank wants to merge 13 commits into
mainfrom
opik-cli-export-import-v2
Open

[OPIK-6964] [SDK] [DOCS] feat: scope opik export/import to a project (v2)#7145
dsblank wants to merge 13 commits into
mainfrom
opik-cli-export-import-v2

Conversation

@dsblank

@dsblank dsblank commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Details

Moves the opik export / opik import CLI from the Opik v1 model (datasets/prompts/experiments could exist without a project) to v2, where every dataset, prompt, and experiment belongs to a project. Project becomes a required positional, the on-disk layout is keyed by ID, and the v1 project-less code paths are removed.

New CLI command format

Both commands now take a PROJECT positional right after WORKSPACE, and the positional after that is the ITEM:

# before (v1)
opik export WORKSPACE ITEM [NAME]
opik import WORKSPACE ITEM [NAME]

# after (v2)
opik export WORKSPACE PROJECT ITEM [NAME] [OPTIONS]
opik import WORKSPACE PROJECT ITEM [NAME] [OPTIONS]
  • ITEM is one of all, dataset, prompt, experiment, traces. The old project item is renamed to traces (the project is the positional now, so it takes no NAME). all --include values are datasets,prompts,traces,experiments.
  • --to-project <NAME> (import only) imports into a different destination project than the source; defaults to the source project's name. Enables cross-project / scratch restores.
opik export my-workspace my-project all
opik export my-workspace my-project dataset "my-dataset"
opik export my-workspace my-project traces --filter 'created_at >= "2024-01-01T00:00:00Z"'

opik import my-workspace my-project all                       # same --path round-trips
opik import my-workspace my-project all --to-project restore  # import into a different project

Highlights

  • ID-based, self-describing layout…/<workspace>/projects/<project_id>/ with a project.json ({id, name}) and id-named files (dataset_<id>.json, prompt_<id>.json, experiment_<id>.json, trace_<id>.json). Human names live inside the files, so project/dataset names containing /, :, or spaces (e.g. scout:comet-ml/opik) no longer break paths. You still pass names on the CLI — export resolves name→id, import resolves name→folder via project.json.
  • Symmetric --path — export and import use the same --path (both resolve <path>/<workspace>/projects/<id>/); no workspace-dir juggling on import.
  • Cross-workspace import — import resolves <path>/<workspace>/projects/ and falls back to <path>/projects/, so you can export from workspace A and import into workspace B by pointing --path at the exported workspace dir.
  • Fail-fast imports — import now sums all *_errors (including nested experiment-recreation failures), exits non-zero on any error, and only marks the manifest completed on a fully clean run (so failed imports stay resumable on a non---force rerun).
  • v1 removed — no project_name=None, no "Default Project"/"default" fallbacks, no per-trace project resolution; no backwards compatibility with old (name-based) export layouts or pre-dataset_item_data export files.
  • opik migrate unaffectedrecreate_experiment (shared with migrate) keeps its target_project_name path; the import-from-disk path was decoupled so migrate behavior is identical (migrate unit + e2e suites pass).

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves #
  • OPIK-

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.8 (1M context)
  • Scope: full implementation (CLI export/import v2 rework, ID layout, cross-workspace/symmetric path, fail-fast), tests, docs, and PR-review fixes
  • Human verification: pending review

Testing

Run from sdks/python:

  • python -m pytest tests/unit/cli/ tests/unit/test_cli_changes.py tests/unit/test_export_import_all.py452 passed (includes the opik migrate unit suites that exercise the shared recreate_experiment).
  • ruff check + python -m mypy on src/opik/cli/exports and src/opik/cli/imports — clean.
  • Live end-to-end against a Comet cloud workspace: exported scout:comet-ml/opik (clean UUID folder), imported back by name into a scratch project via --to-project (no IDs typed), imported cross-workspace into another workspace, and re-exported to confirm traces+spans round-tripped.

Not run: tests/e2e/test_cli_import_export.py (requires a live backend) — updated for the new API/layout, py_compile + ruff clean. (Unrelated CI failures are external LLM-provider 429/rate-limit flakes; the SDK unit suites pass on 3.10–3.14.)

Documentation

Updated fern/docs/tracing/import_export_commands.mdx and fern/docs-v2/observability/export_data.mdx: new WORKSPACE PROJECT ITEM grammar, the traces item, ID-based directory layout + project.json, the --to-project flag, and the symmetric/cross-workspace --path behavior.

Move the `opik export` and `opik import` CLI commands to the Opik v2 model
where every dataset, prompt, and experiment belongs to a project. Project is
now a required positional (`opik export WORKSPACE PROJECT ITEM`), the on-disk
layout is project-nested (projects/<project>/{datasets,prompts,experiments}
plus traces), `project_name` is always threaded into the REST/SDK calls, and
the v1 project-less code paths (Default Project fallbacks, per-trace project
resolution) are removed. The old `project` item is renamed to `traces`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added documentation Improvements or additions to documentation python Pull requests that update Python code tests Including test files, or tests related like configuration. Python SDK labels Jun 17, 2026
Comment thread sdks/python/src/opik/cli/exports/all.py Outdated
Comment thread sdks/python/src/opik/cli/imports/project.py
Comment thread sdks/python/src/opik/cli/imports/all.py Outdated
Comment thread sdks/python/src/opik/cli/imports/__init__.py Outdated
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🌿 Preview your docs: https://opik-preview-d738ad18-1c34-460e-b02c-e5aa21d9982c.docs.buildwithfern.com/docs/opik

No broken links found


📌 Results for commit 84522d2

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Python SDK E2E Tests Results (Python 3.11)

285 tests   279 ✅  5m 24s ⏱️
  1 suites    3 💤
  1 files      3 ❌

For more details on these failures, see this check.

Results for commit 95a572b.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Python SDK E2E Tests Results (Python 3.14)

285 tests   280 ✅  5m 6s ⏱️
  1 suites    2 💤
  1 files      3 ❌

For more details on these failures, see this check.

Results for commit 95a572b.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Python SDK E2E Tests Results (Python 3.10)

285 tests   279 ✅  5m 11s ⏱️
  1 suites    3 💤
  1 files      3 ❌

For more details on these failures, see this check.

Results for commit 95a572b.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Python SDK E2E Tests Results (Python 3.12)

285 tests   280 ✅  5m 13s ⏱️
  1 suites    2 💤
  1 files      3 ❌

For more details on these failures, see this check.

Results for commit 95a572b.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Python SDK E2E Tests Results (Python 3.13)

285 tests   280 ✅  5m 32s ⏱️
  1 suites    2 💤
  1 files      3 ❌

For more details on these failures, see this check.

Results for commit 95a572b.

♻️ This comment has been updated with latest results.

…ames via project.json

On-disk export artifacts are now named by ID instead of human names:
projects/<project_id>/ with a project.json ({id, name}), and id-named
dataset_<id>.json / prompt_<id>.json / experiment_<id>.json / trace_<id>.json.
This removes the '/', ':' and whitespace problems of name-based paths (e.g. a
project named "scout:comet-ml/opik"). Human names are stored as data inside the
files. The CLI still takes project/item names; export resolves name->id and
import resolves a project name back to its id-folder by scanning project.json.
Adds --to-project to redirect the destination project on import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread sdks/python/src/opik/cli/imports/utils.py
Comment thread sdks/python/src/opik/cli/imports/__init__.py Outdated
…space segment)

Import now resolves <path>/<workspace>/projects/<id>/ — the same layout export
writes — instead of <path>/projects/<id>/. This removes the asymmetry that
forced users to pass `--path <export-path>/<workspace>` on import: the same
--path (including the default opik_exports) now round-trips between export and
import. Updated unit/e2e tests and docs accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread sdks/python/src/opik/cli/imports/all.py Outdated
Douglas Blank and others added 4 commits June 17, 2026 11:04
…em mapping

import/export targets Opik v2 project-specified data only — no backwards
compatibility. Remove the `item_data.get("input")` fallback that tolerated
pre-`dataset_item_data` export files when building the dataset-item id map.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fallback

Import now resolves the source folder at <path>/<workspace>/projects/ (symmetric
with export, so the same --path round-trips within a workspace) and falls back to
<path>/projects/ when that workspace segment is absent. This enables importing
into a different workspace than the data was exported from: point --path at the
exported workspace directory. Adds resolve_import_base_path() + unit tests and a
docs note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch CLI

The CLI's positional after PROJECT is ITEM (subcommand metavar "ITEM [ARGS]...",
"Missing ITEM" error, usage "WORKSPACE PROJECT ITEM [NAME]"). Align both docs to
ITEM so users don't confuse it with a "type" field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Previously `opik import` reported per-item failures only as a warning and exited
0, masking errors (e.g. invalid trace JSON, or the whole-directory failure path
that reports under projects_errors). Both `_import_by_type` and `import_all` now
sum every `*_errors` counter across all phases and exit 1 when any are present,
so failures surface with a non-zero exit code. Adds a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Python SDK Unit Tests Results (Python 3.11)

4 029 tests   4 027 ✅  2m 0s ⏱️
    1 suites      2 💤
    1 files        0 ❌

Results for commit 521f4a1.

♻️ This comment has been updated with latest results.

…ead, dedup import lookup, docs note

- imports/utils._read_project_metadata_name: guard against non-object JSON and
  non-string "name" so a malformed project.json can't crash project discovery.
- Extract resolve_import_project_root(path, workspace, project_name, to_project)
  shared by import_all and _import_by_type so the source lookup + not-found
  messaging + destination resolution stay in one place.
- docs: add the "maintained by hand" maintenance note to the Example Workflow
  (covers the pandas CSV snippet).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dsblank dsblank marked this pull request as ready for review June 17, 2026 18:34
@dsblank dsblank requested review from a team as code owners June 17, 2026 18:34
Comment thread sdks/python/src/opik/cli/imports/all.py
Comment thread sdks/python/src/opik/cli/imports/__init__.py
Comment thread sdks/python/src/opik/cli/imports/__init__.py Outdated
…e; count recreated-experiment errors

Addresses PR review (manifest/error-handling):
- Compute total_errors before manifest.complete() in both _import_by_type and
  import_all, and only complete() when total_errors == 0. A partial failure now
  leaves the manifest in_progress so a non---force rerun resumes/retries instead
  of short-circuiting on manifest.is_completed.
- recreate_experiments() now returns (successful, failed); import_traces_from_directory
  surfaces experiments/experiments_errors in its stats so traces imports that
  recreate experiments don't exit 0 on a partial failure.
- Add regression test that a failed import leaves the manifest incomplete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread sdks/python/src/opik/cli/imports/all.py Outdated
Comment thread sdks/python/tests/unit/test_export_import_all.py Outdated
Comment thread sdks/python/tests/unit/test_export_import_all.py Outdated
…r; fix comment typo

Address PR review nits:
- Rewrite the manifest-incomplete-on-failure test to assert observable behavior
  (a rerun without --force re-invokes the importer) instead of poking the
  internal MigrationManifest.is_completed, per .agents/skills/python-sdk/testing.md.
- Fix a "non---force" typo in an imports/__init__.py comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread sdks/python/src/opik/cli/imports/__init__.py Outdated
…h/complete

Both _import_by_type and import_all ran identical flush + failed-upload checks
and the total_errors==0 gate on manifest.complete(). Extract that into
imports/utils.finalize_import(manifest, client, total_errors, dry_run) so
resume/manifest behavior stays in one place (the per-type vs. aggregate summary
messaging remains in each caller). No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread sdks/python/src/opik/cli/imports/all.py Outdated
@dsblank dsblank changed the title [NA] [SDK] [DOCS] feat: scope opik export/import to a project (v2) [OPIK-6964] [SDK] [DOCS] feat: scope opik export/import to a project (v2) Jun 17, 2026
@github-actions

github-actions Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

📋 PR Linter Failed

Incomplete Issues Section. You must reference at least one GitHub issue (#xxxx), Jira ticket (OPIK-xxxx), CUST ticket (CUST-xxxx), DEV ticket (DEV-xxxx), or DND ticket (DND-xxxx) under the ## Issues section.

@JetoPistola

Copy link
Copy Markdown
Contributor

Read-path performance gap vs. opik migrate (non-blocking — out of scope for this re-scoping PR, noting for a follow-up)

Export reads traces via offset pagination — get_traces_by_project(page=N, size=500) (project.py:190-196) — whereas opik migrate reads via search_traces(...) with cursor pagination (last_retrieved_id) at 2000/page (experiments.py:805-811). On large projects, deep OFFSET re-scans all preceding rows per page on ClickHouse, so export gets progressively slower the deeper it pages.

Credit where due — export already pushes created_at filters to the API for incremental runs (project.py:1016-1044) and already does a single bulk span GET grouped client-side to dodge the /spans/search rate limit (project.py:598-651). The only remaining gap is cursor pagination. Worth a follow-up perf ticket, not this PR.

@JetoPistola

Copy link
Copy Markdown
Contributor

Assertion results are dropped on an export→import round-trip (non-blocking — likely a separate ticket)

assertion_results is a separately-stored entity (table added in migration 000071_add_assertion_results_table.sql), written only via the dedicated PUT /v1/private/assertion-results endpoint. The backend does not recompute it on trace/span ingestion — TraceService.create() inserts trace data and emits TracesCreated but makes no assertion-result calls; it's read back by JOINing the stored table (ExperimentItemDAO.java:341-358).

Since opik import recreates traces/spans but never calls log_assertion_results(), assertion results are silently lost on a round-trip. opik migrate does carry them through the cascade (experiments.py:1043-1123) — export/import don't. Flagging in case it should be in scope or tracked as a follow-up.

Comment thread sdks/python/src/opik/cli/imports/__init__.py Outdated
prompt=prompt_content,
metadata=metadata,
type=prompt_type_enum,
project_name=project_name,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 suggestion | Consistency

opik.Prompt() / ChatPrompt() are deprecated — the constructors themselves log "opik.Prompt() is deprecated. Use client.create_prompt() ...", so a multi-prompt import will emit one deprecation warning per prompt. The migrate path already uses rest_client.prompts.create_prompt / create_prompt_version (see cli/migrate/prompts/executor.py). Consider switching the import path to client.create_prompt(...) for consistency and to silence the warnings.

🤖 Review posted via /review-github-pr

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 997d9d6: the import path now uses client.create_prompt(...) / client.create_chat_prompt(...) instead of the deprecated opik.Prompt()/ChatPrompt() constructors, so a multi-prompt import no longer logs a deprecation warning per prompt (and it's consistent with the migrate path).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 997d9d6 addressed this comment by replacing the deprecated Prompt() and ChatPrompt() constructors with client.create_prompt() and client.create_chat_prompt(). That routes imports through the client factories, matching the migrate path and avoiding the per-prompt deprecation warnings.

Comment thread sdks/python/src/opik/cli/imports/utils.py Outdated
@JetoPistola

Copy link
Copy Markdown
Contributor

👋 Review summary

What looks good

  • Symmetric, self-describing layout. Keying folders by project ID with a project.json carrying the human name is the right call — it eliminates the /, :, and whitespace path-breakage of names like scout:comet-ml/opik, and the same --path round-trips between export and import. Every single-type export still writes project.json via prepare_project_export_dir, so import can always resolve a folder by name regardless of what was exported.
  • Fail-fast import. Summing every *_errors counter (including cascaded experiment/dataset/trace failures), exiting non-zero, and only calling manifest.complete() on a fully clean run is a real correctness improvement — partial failures now stay resumable instead of being masked by a 0 exit code.
  • Clean migrate decoupling. recreate_experiment keeps target_project_name as the migrate-path marker while project_name now drives experiment placement on both paths; the migrate caller passes both, so is_migrate_path stays true and migrate fidelity is genuinely unchanged. Nicely done without touching the migrate module.
  • Thorough docs. Both MDX pages were updated consistently — new grammar, the traces rename, ID layout, project.json, --to-project, and the cross-workspace --path story — including the maintained-by-hand callout.

Overall
Solid, well-scoped v2 rework with strong test coverage and honest documentation. One blocker to resolve before merge: the resume/completion manifest is keyed only to the source project folder, so --to-project doesn't participate in resume state — a clean import into destination A then into destination B short-circuits on "already completed" and writes nothing to B (and a partially-failed first import would resume into B, splitting data). Details inline. The two suggestions are non-blocking.

Inline comments: 1 blocker, 2 suggestions.

🤖 Review posted via /review-github-pr

Addresses JetoPistola review:
- BLOCKER: the resume/completion manifest was keyed only to the source folder,
  so importing one export into different --to-project targets shared one manifest
  (a clean import into A made a later import into B short-circuit on
  "already completed"; a partial import into A would resume into B). The manifest
  is now keyed by destination via destination_manifest_dir(project_root, dest)
  (hashed, names may contain / :). Regression test: import into A then B both run.
- Switch the import prompt path to client.create_prompt()/create_chat_prompt()
  instead of the deprecated opik.Prompt()/ChatPrompt() constructors.
- Warn in find_project_export_dir when >1 exported folder records the same
  project name (ambiguous resolution).
- Docs: manifest now lives under projects/<id>/import_manifests/<destination-key>/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dsblank

dsblank commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review, @JetoPistola. Status:

Blocker (per-destination manifest) — fixed in 997d9d6 (details on the inline thread). Manifest is now keyed by destination; added a regression test covering import→A then import→B.

Suggestions — both done in 997d9d6: import prompts now go through client.create_prompt()/create_chat_prompt() (no per-prompt deprecation warning), and find_project_export_dir warns on ambiguous same-name folders.

Non-blocking follow-ups (agreed, out of scope for this re-scoping PR):

  • Read-path cursor pagination — export still uses offset pagination vs. migrate's search_traces cursor (last_retrieved_id). Real perf gap on deep pages; worth a dedicated perf ticket, not this PR.
  • Assertion results dropped on round-tripassertion_results is a separately-stored entity that import doesn't recreate (migrate's cascade does). Genuine fidelity gap; should be tracked as its own ticket.

Both are accurate; I'd prefer to land them as separate follow-ups so this PR stays scoped to the v1→v2 re-scoping. Happy to file the tickets if that's the preferred workflow.

Comment thread sdks/python/src/opik/cli/imports/all.py Outdated
Comment on lines 78 to 84
manifest_dir = destination_manifest_dir(project_root, project_name)

if not dry_run:
manifest = MigrationManifest(base_path)
manifest = MigrationManifest(manifest_dir)
if force:
if MigrationManifest.exists(base_path):
if MigrationManifest.exists(manifest_dir):
manifest.reset()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using destination_manifest_dir as MigrationManifest's base_path mismatches export file paths under project_root, causing relative_path() to raise ValueError and break resume tracking — should we initialize MigrationManifest with project_root as base_path instead, and extract a shared prepare_migration_manifest helper so import_all and _import_by_type stay in sync?

Severity

Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
In `import_all` (lines 78–84), `MigrationManifest` is initialized with
`destination_manifest_dir(project_root, project_name)` as its `base_path`, but export
files (datasets/prompts/traces/experiments) live under `project_root/...`. This causes
`MigrationManifest.relative_path()` → `file_path.relative_to(self.base_path)` to raise
`ValueError`, breaking resume tracking. Fix by decoupling the manifest's key root from
its filesystem location: initialize `MigrationManifest` with `project_root` as
`base_path` (or update `relative_path()` to compute keys relative to `project_root`
regardless of manifest directory). Also, this manifest lifecycle setup duplicates
`_import_by_type` — extract a shared `prepare_migration_manifest` helper to keep both
in sync. Ensure `is_file_completed()`/`mark_file_completed()` receive export paths
safely convertible to manifest keys without relying on `manifest_dir` ancestry.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 95a572b addressed this comment by changing MigrationManifest to key files relative to project_root while allowing the SQLite file to live in the per-destination manifest directory. It also extracted a shared setup_import_manifest helper and wired both import paths to use it, keeping the resume/force lifecycle in sync.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 95a572b. MigrationManifest now takes an optional manifest_path: base_path stays project_root (so file keys under project_root/datasets|prompts|experiments and the trace files resolve via relative_to), while the SQLite db lives in the per-destination subdir. Extracted the shared setup_import_manifest(project_root, dest, dry_run, force) helper used by both import_all and _import_by_type (returns (manifest, already_completed)). Verified live: a real import completes and a non---force rerun correctly reports "already completed"; added a regression test that marking a file under project_root no longer raises. No old manifest to migrate (this layout is new in this PR).

Comment on lines +91 to +94
manifest_dir = destination_manifest_dir(project_root, target_project_name)

if not dry_run:
manifest = MigrationManifest(base_path)
manifest = MigrationManifest(manifest_dir)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MigrationManifest still keys files with file_path.relative_to(self.base_path) even though it now gets manifest_dir, so source files under project_root/datasets, prompts, traces, and experiments can raise ValueError on manifest.is_file_completed(...); should we keep project_root as the key root and only move the SQLite path, or migrate the old migration_manifest.db?

Severity web_search

Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/imports/__init__.py around lines 90-114 in the
`_import_by_type` function, the code computes `manifest_dir =
destination_manifest_dir(project_root, target_project_name)` and then constructs
`MigrationManifest(manifest_dir)`. Update this so that `MigrationManifest` continues to
key completion state using paths relative to `project_root` (so
`file_path.relative_to(self.base_path)` won’t raise when passed
`project_root/datasets`, `project_root/prompts`, `project_root/experiments`, etc.),
while only the SQLite database file path is relocated under `manifest_dir`. Prefer
either: (1) adjust `MigrationManifest` to accept separate `key_root` (project_root) and
`db_path` (manifest_dir/db file), and wire `_import_by_type` accordingly; or (2)
implement a migration step that loads an existing manifest from the old root-level
location and re-saves/relocates it to the new hashed manifest directory before resuming.
Ensure existing non-dry-run imports can resume without `ValueError` and upgraded runs
can resume prior interrupted imports rather than reprocessing everything.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 95a572b addressed this comment by separating the manifest's key root from its SQLite file location. MigrationManifest now keys paths relative to project_root, while setup_import_manifest() stores the DB under the per-destination manifest directory, preventing ValueError for datasets/, prompts/, traces/, and experiments/ paths.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 95a572b — kept project_root as the key root and moved only the SQLite path: MigrationManifest(project_root, manifest_path=<dest-subdir>/migration_manifest.db). So is_file_completed(...) / mark_file_completed(...) for files under project_root/{datasets,prompts,experiments} (and the trace files directly under project_root) no longer raise ValueError. Regression test added; confirmed live.

…t setup_import_manifest

Fixes a regression from the per-destination manifest change: passing the
per-destination subdir as MigrationManifest.base_path made file keys resolve
relative_to() that subdir, but exported files live under project_root/... — so
mark_file_completed()/is_file_completed() raised ValueError and broke resume
tracking (mocked unit tests didn't catch it; live import would).

- MigrationManifest gains an optional manifest_path so the SQLite file can live
  in the per-destination subdir while base_path stays project_root (the key root
  for relative file paths). mkdir the manifest file's parent, not base_path.
- Extract setup_import_manifest(project_root, dest, dry_run, force) — used by
  both import_all and _import_by_type — which builds the manifest correctly and
  runs the start/resume/force lifecycle, returning (manifest, already_completed).
  This also dedups the manifest-lifecycle block flagged in review.
- Add a regression test that files under project_root are keyable while the db
  lives in the per-destination subdir.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +195 to +199
manifest = MigrationManifest(project_root, manifest_path=manifest_file)
if force:
if MigrationManifest.exists(project_root, manifest_path=manifest_file):
manifest.reset()
console.print(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MigrationManifest(...) opens manifest_file before the --force check, so MigrationManifest.exists(...) is tautologically true on a first opik import ... --force and we end up warning/resetting against a non-existent manifest — should we check manifest_file.exists() first or gate reset() on the pre-construction state?

Severity

Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/src/opik/cli/imports/utils.py around lines 171-214, in the
setup_import_manifest() force branch, fix the logic so the “discarding existing
manifest, starting fresh” warning and manifest.reset() only occur when a prior
manifest actually exists on disk. Refactor by checking manifest_file.exists() (or an
equivalent filesystem-based pre-check) before constructing MigrationManifest or before
calling reset, and only call reset/print when that pre-check is true; if it’s the
first run, skip the reset entirely. Ensure the existing
manifest/is_completed/is_in_progress paths are unaffected.

"""The per-destination manifest must key files relative to project_root even
though the SQLite db lives in a per-destination subdir."""

def test_files_under_project_root_are_keyable(self, tmp_path):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdks/python/tests/unit/test_export_import_all.py should rename these tests to the required test_WHAT__CASE__EXPECTED_RESULT format, e.g. test_setup_import_manifest__files_under_project_root__allows_marking_files_completed_without_value_error and test_setup_import_manifest__dry_run__returns_no_manifest.

Severity

Want Baz to fix this for you? Activate Fixer You can also update your AI coding guidelines based on this comment by apply pr to [branch name]

Other fix methods

Fix in Cursor

Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
sdks/python/tests/unit/test_export_import_all.py around lines 403-442 within class
TestSetupImportManifest, rename the two test methods to follow the required naming
convention from .agents/skills/python-sdk/testing.md. Specifically, change
`test_files_under_project_root_are_keyable` to a
`test_setup_import_manifest__files_under_project_root__allows_marking_files_completed_without_value_error`-style
name, and change `test_dry_run_creates_no_manifest` to a
`test_setup_import_manifest__dry_run__returns_no_manifest`-style name. Ensure the new
names encode WHAT, CASE, and EXPECTED_RESULT, and update only the method definitions (no
test logic changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation Python SDK python Pull requests that update Python code tests Including test files, or tests related like configuration.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants