Add --split flag for per-object file output #2
setuptools could not find the `src` package without an explicit `[tool.setuptools.packages.find]` directive, causing a ModuleNotFoundError when running the installed `sqlextract` console script.
Writes each schema object to its own file organized in typed subdirectories (schemas/, tables/, constraints/, indexes/, views/, procedures/, functions/, deferred_fks/, seed_data/). Useful for version control and code review where individual object diffs are easier to track.
📝 Walkthrough
Adds a CLI --split flag that routes extraction to a new split-output path, implements per-object file output in the formatter, threads split through the extractor, adds filename sanitization and new SQLExtractError subclasses, and updates configs and tests accordingly.
Sequence Diagram
sequenceDiagram
actor User
participant CLI as CLI
participant Extractor as Extractor
participant Formatter as Formatter
participant FS as FileSystem
User->>CLI: run command with --split
CLI->>CLI: parse args (split=True)
CLI->>Extractor: extract(output_dir..., split=True)
Extractor->>Extractor: collect metadata and per-table seed_data dict
Extractor->>Formatter: write_split_format(..., seed_data_dict)
Formatter->>Formatter: iterate object collections
Formatter->>Formatter: sanitize_filename(name)
Formatter->>FS: create type subdirectories
Formatter->>FS: write individual files per object
FS-->>Formatter: confirm writes
Formatter-->>Extractor: return status
Extractor-->>CLI: return completion
CLI-->>User: print result
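The first two steps of the diagram (parsing `--split` and passing it on) could look roughly like the sketch below. It assumes the CLI is built on `argparse`, which this thread does not confirm, and the `--output-dir` option name is hypothetical; only `--split` and the `sqlextract` script name come from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; the real CLI's other options are not shown in this PR."""
    parser = argparse.ArgumentParser(prog="sqlextract")
    # Assumption: hypothetical option name for the output location.
    parser.add_argument("--output-dir", default="output")
    # store_true means the flag is off unless passed, so the modular format stays the default.
    parser.add_argument(
        "--split",
        action="store_true",
        help="write each schema object to its own file in typed subdirectories",
    )
    return parser


def main(argv=None) -> dict:
    args = build_parser().parse_args(argv)
    # The real CLI would call extractor.extract(output_dir=..., split=...) here;
    # this sketch only returns the keyword arguments it would pass.
    return {"output_dir": args.output_dir, "split": args.split}
```

Because the flag uses `store_true`, omitting it leaves `split` as `False`, which matches the test-plan item that `--split` is not set by default.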
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/formatter.py`:
- Around line 460-468: The _write helper currently writes sanitized filenames
(via sanitize_filename) directly and can silently overwrite when different
source names map to the same sanitized path; update _write to detect collisions
by computing the target path from sanitize_filename(filename), then if the path
already exists either (a) compare existing file contents and raise an error if
they differ, or (b) append a deterministic disambiguator (e.g., a short hash of
the original filename or an incremental suffix) to the filename before writing
to avoid overwriting; reference the _write function and sanitize_filename to
implement this check and ensure a clear error or unique filename is produced
instead of silent overwrite.
In `@src/utils.py`:
- Around line 98-109: The sanitize_filename function currently replaces illegal
characters but still returns Windows-reserved basenames (e.g., CON, PRN, AUX,
NUL, COM1..COM9, LPT1..LPT9) which cause failures; update sanitize_filename to
split the filename and extension (use os.path.splitext) check the base name
case-insensitively against the reserved set, and if it matches, modify the base
(for example prefix or suffix an underscore) before recombining so the returned
sanitized name is not a reserved Windows device name; keep the existing
character-replacement and rstrip logic and still raise ValueError if the final
sanitized string is empty.
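A minimal sketch of the reserved-name handling requested for `sanitize_filename`. The set of replaced characters and the replacement character are assumptions, since the body of the real function in `src/utils.py` is not shown in this thread; appending an underscore to the base name follows the suggestion above.

```python
import os
import re

# Windows device names that are invalid as file basenames, with or without an extension.
_WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

# Assumption: this character class is illustrative, not the project's actual list.
_UNSAFE_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str) -> str:
    """Return a filesystem-safe version of ``name`` (illustrative sketch)."""
    # Replace unsafe characters, then strip trailing dots and spaces.
    cleaned = _UNSAFE_CHARS.sub("_", name).rstrip(". ")
    if not cleaned:
        raise ValueError(f"Cannot derive a filename from {name!r}")
    # Check the base name (without extension) case-insensitively.
    base, ext = os.path.splitext(cleaned)
    if base.upper() in _WINDOWS_RESERVED:
        base += "_"  # escape the reserved device name
    return base + ext
```

With this sketch, `sanitize_filename("CON.sql")` yields `CON_.sql`, while a name such as `dbo.users.sql` passes through unchanged because only the part before the final extension is compared against the reserved set.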
- Detect filename collisions in _write() when different SQL names sanitize to the same path, raising OutputError instead of silently overwriting
- Escape Windows-reserved device names (CON, PRN, AUX, NUL, COM1-9, LPT1-9) in sanitize_filename() by appending underscore to the base name
33 new tests covering sanitize_filename (unsafe chars, empty input, Windows reserved names), write_split_format (directory structure, file content, collision error), and extractor split flag routing.
♻️ Duplicate comments (1)
src/formatter.py (1)
460-475: ⚠️ Potential issue | 🟠 Major
Scope collision detection to the current run and normalize write failures as OutputError.
Line 467 currently treats any pre-existing file as a collision, so rerunning --split into the same directory fails even when there is no in-run name collision. Also, Line 464/473 can still surface raw ValueError/OSError instead of OutputError.
🔧 Proposed fix
     header = self.write_header()
+    written_paths: dict[str, str] = {}

     def _write(subdir: str, filename: str, content: str) -> None:
         """Write a single object file inside a typed subdirectory."""
         dir_path = os.path.join(self.output_dir, subdir)
         os.makedirs(dir_path, exist_ok=True)
-        path = os.path.join(dir_path, sanitize_filename(filename))
-
-        # Detect collisions from different source names sanitizing to the same path
-        if os.path.exists(path):
-            raise OutputError(
-                f"Filename collision: '{filename}' maps to '{path}' which already exists. "
-                f"Two different objects sanitized to the same filename."
-            )
-
-        with open(path, 'w', encoding='utf-8') as f:
-            f.write(header)
-            f.write(content)
+        try:
+            safe_filename = sanitize_filename(filename)
+            path = os.path.join(dir_path, safe_filename)
+            collision_key = os.path.normcase(os.path.normpath(path))
+            previous_source = written_paths.get(collision_key)
+            if previous_source is not None and previous_source != filename:
+                raise OutputError(
+                    f"Filename collision after sanitization: '{previous_source}' vs '{filename}' -> '{safe_filename}'"
+                )
+            written_paths[collision_key] = filename
+
+            with open(path, 'w', encoding='utf-8') as f:
+                f.write(header)
+                f.write(content)
+        except (OSError, ValueError) as exc:
+            raise OutputError(f"Failed to write split output file: {filename}") from exc
Collision detection now uses an in-memory dict instead of filesystem checks, so re-running --split into the same directory works. Raw ValueError/OSError from sanitize_filename and file I/O are wrapped as OutputError. Added test for re-run idempotency.
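The behavior described in this commit can be illustrated with a self-contained sketch. `SplitWriter`, `_sanitize`, and the local `OutputError` class are hypothetical stand-ins, not the project's code; the point is only that collisions are tracked in a per-run dict rather than by checking the filesystem, so a second run into the same directory overwrites cleanly.

```python
import os


class OutputError(Exception):
    """Stand-in for the project's OutputError."""


def _sanitize(name: str) -> str:
    # Hypothetical stand-in for sanitize_filename(): only replaces path separators.
    return name.replace("/", "_").replace("\\", "_")


class SplitWriter:
    """Tracks, for the current run only, which source name produced each path."""

    def __init__(self, output_dir: str) -> None:
        self.output_dir = output_dir
        self._written: dict[str, str] = {}  # normalized path -> source name

    def write(self, subdir: str, filename: str, content: str) -> str:
        dir_path = os.path.join(self.output_dir, subdir)
        path = os.path.join(dir_path, _sanitize(filename))
        # normcase makes the key case-insensitive on Windows.
        key = os.path.normcase(os.path.normpath(path))

        previous = self._written.get(key)
        if previous is not None and previous != filename:
            raise OutputError(
                f"Filename collision: {previous!r} and {filename!r} both map to {path!r}"
            )
        self._written[key] = filename

        try:
            os.makedirs(dir_path, exist_ok=True)
            with open(path, "w", encoding="utf-8") as f:
                f.write(content)
        except OSError as exc:
            # Normalize low-level I/O failures to the project's error type.
            raise OutputError(f"Failed to write split output file: {filename}") from exc
        return path
```

A fresh `SplitWriter` starts with an empty dict, which is what makes re-running into an existing output directory succeed while two different object names that sanitize to the same file within one run still raise.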
@coderabbitai review
✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
tests/test_extractor.py (1)
68-68: Use temp-path fixtures instead of hardcoded /tmp/test.
Line 68, Line 118, and Line 167 hardcode a Unix-specific path. Switching to tmp_path/TemporaryDirectory will make these tests portable and less stateful.
♻️ Suggested update
-    def _run_extract(self, split):
+    def _run_extract(self, split, output_dir):
@@
-        extractor.extract(output_dir="/tmp/test", split=split)
+        extractor.extract(output_dir=str(output_dir), split=split)
         return formatter

-    def test_split_true_calls_write_split_format(self):
+    def test_split_true_calls_write_split_format(self, tmp_path):
@@
-        formatter = self._run_extract(split=True)
+        formatter = self._run_extract(split=True, output_dir=tmp_path)

-    def test_split_false_calls_write_modular_format(self):
+    def test_split_false_calls_write_modular_format(self, tmp_path):
@@
-        formatter = self._run_extract(split=False)
+        formatter = self._run_extract(split=False, output_dir=tmp_path)
Also applies to: 118-118, 167-167
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_extractor.py`:
- Line 68: Replace the hardcoded "/tmp/test" in extractor.extract(...) calls
with a temporary directory created by the test (use the pytest tmp_path fixture
or tempfile.TemporaryDirectory), e.g., create a Path from tmp_path or the
TemporaryDirectory's name and pass that as the output_dir argument to
extractor.extract; ensure tests use the temporary path for all three occurrences
and let pytest/tempfile handle cleanup so tests become OS-agnostic and
non-stateful.
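A rough sketch of what the suggested tests could look like with pytest's `tmp_path` fixture. `FakeExtractor` is a hypothetical stand-in that models only the split routing, and the formatter method signatures are assumed; the real tests in `tests/test_extractor.py` patch the project's own extractor.

```python
from pathlib import Path
from unittest.mock import MagicMock


class FakeExtractor:
    """Hypothetical stand-in for the project's extractor; only the routing is modeled."""

    def __init__(self, formatter) -> None:
        self.formatter = formatter

    def extract(self, output_dir: str, split: bool = False) -> None:
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        if split:
            self.formatter.write_split_format(output_dir)
        else:
            self.formatter.write_modular_format(output_dir)


def test_split_true_calls_write_split_format(tmp_path):
    # tmp_path is a per-test pathlib.Path supplied by pytest, so no "/tmp/test" is needed.
    formatter = MagicMock()
    FakeExtractor(formatter).extract(output_dir=str(tmp_path), split=True)
    formatter.write_split_format.assert_called_once_with(str(tmp_path))
    formatter.write_modular_format.assert_not_called()


def test_split_false_calls_write_modular_format(tmp_path):
    formatter = MagicMock()
    FakeExtractor(formatter).extract(output_dir=str(tmp_path), split=False)
    formatter.write_modular_format.assert_called_once_with(str(tmp_path))
    formatter.write_split_format.assert_not_called()
```

Since pytest creates a fresh directory for each test, the tests no longer share state through a fixed path and run the same way on Windows.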
Summary
- `--split` CLI flag that writes each database object to its own file, organized in typed subdirectories (`schemas/`, `tables/`, `constraints/`, `indexes/`, `views/`, `procedures/`, `functions/`, `deferred_fks/`, `seed_data/`)
- `sanitize_filename()` helper to make SQL object names filesystem-safe
- Seed data as `Dict[str, List[str]]` keyed by `schema.table` to support per-table file output
Test plan
- `--help` shows the new `--split` flag
- … (`sql-worko-wdp-silver-dev`): all 266 objects written to correct subdirectories
- `--split` is not set by default (modular format still works)