KPMP-5807: rename files in-place in globus dir before copying to dlu #167

Open
HaneenT wants to merge 1 commit into develop from KPMP-5807_rename_file_in_place

Conversation

HaneenT (Contributor) commented May 11, 2026

Summary by CodeRabbit

  • Refactor
    • Enhanced file handling process with improved metadata computation and operation logging during file movement operations.

coderabbitai Bot commented May 11, 2026

Walkthrough

The pull request modifies file handling in DLUFileHandler.rename_and_move_files to use a two-step process: rename source files to slide-mapped destination names before copying them to final destinations. Metadata is now constructed from the renamed source path and final copied file.
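
As a rough illustration of building metadata from the final copied file, the sketch below computes a checksum and size for a file at its destination path. The helper names `file_checksum` and `describe_copied_file` are hypothetical, and SHA-256 is an assumption: the algorithm behind the repository's own `calculate_checksum` is not shown in this PR.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large slide images are not read into memory at once.

    SHA-256 is an assumption here; the project's calculate_checksum may differ.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_copied_file(dest_dir, dest_name):
    """Collect name, directory, checksum, and size from the copied file itself."""
    dest_path = os.path.join(dest_dir, dest_name)
    return {
        "name": dest_name,
        "path": dest_dir,
        "checksum": file_checksum(dest_path),
        "size": os.path.getsize(dest_path),
    }
```

Reading both values from the destination path means the recorded metadata describes the bytes that actually landed in the DLU directory, not the source file.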

Changes

File Rename-Then-Copy Workflow

| Layer / File(s) | Summary |
| --- | --- |
| Rename and Copy Logic (data_management/services/dlu_filesystem.py) | Source files are renamed to mapped destination names in the source directory, then copied to final DLU destinations. DLUFile metadata uses mapped destination filename with checksum/size computed from final copied file. Logging tracks both operations. |

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
data_management/services/dlu_filesystem.py (1)

137-137: 💤 Low value

Avoid reusing loop variable name.

Line 137 reassigns the loop variable file to a new DLUFile instance. This shadows the original loop variable and makes the code harder to understand. Use a distinct name like dlu_file for clarity.

♻️ Proposed variable rename
-            file = DLUFile(name=dest_file_name, path=dest_package_directory,
+            dlu_file = DLUFile(name=dest_file_name, path=dest_package_directory,
                            checksum=calculate_checksum(dest_path), size=os.path.getsize(dest_path))
-            dluFiles.append(file)
+            dluFiles.append(dlu_file)
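
To illustrate why the shadowing matters, here is a small sketch. `FileRecord` is a hypothetical stand-in for DLUFile, and the file names in the example are made up. Once the loop variable is reassigned, any later lookup by `file.name` uses the mapped name instead of the original one.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:  # hypothetical stand-in for DLUFile
    name: str


def lookup_after_shadowing(records, name_map):
    """Reassigning `file` means later lookups see the mapped name."""
    results = []
    for file in records:
        file = FileRecord(name=name_map[file.name])  # shadows the loop variable
        results.append(name_map.get(file.name))      # original name is gone
    return results


def lookup_with_distinct_name(records, name_map):
    """A separate variable keeps the original record available."""
    results = []
    for file in records:
        mapped = FileRecord(name=name_map[file.name])
        results.append((name_map.get(file.name), mapped.name))
    return results
```

In the current diff nothing reads `file.name` after the reassignment, so this is a readability concern rather than a live bug.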

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96f24bf9-83e9-4997-aa9e-269ed7181ca8

📥 Commits

Reviewing files that changed from the base of the PR and between e931c91 and 11591be.

📒 Files selected for processing (1)
  • data_management/services/dlu_filesystem.py

Comment on lines 128 to 140
         for file in file_list:
-            dest_file = os.path.join(dest_package_directory, slide_name_map[file.name])
-            logger.info("Copying file " + os.path.join(source_package_directory, file.name) + " to "
-                        + os.path.join(dest_package_directory, slide_name_map[file.name]))
-            shutil.copy(os.path.join(source_package_directory, file.name),
-                        dest_file)
-            file = DLUFile(name=slide_name_map[file.name], path=dest_package_directory,
-                           checksum=calculate_checksum(dest_file), size=os.path.getsize(dest_file))
+            dest_file_name = slide_name_map[file.name]
+            renamed_src_path = os.path.join(source_package_directory, dest_file_name)
+            dest_path = os.path.join(dest_package_directory, dest_file_name)
+
+            logger.info("Renaming file " + os.path.join(source_package_directory, file.name) + " to " + renamed_src_path)
+            os.rename(os.path.join(source_package_directory, file.name), renamed_src_path)
+            logger.info("Copying file " + renamed_src_path + " to " + dest_path)
+            shutil.copy(renamed_src_path, dest_path)
+            file = DLUFile(name=dest_file_name, path=dest_package_directory,
+                           checksum=calculate_checksum(dest_path), size=os.path.getsize(dest_path))
             dluFiles.append(file)
         return dluFiles

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add error handling and validate unique destination names.

The method lacks error handling around file operations and doesn't check for duplicate destination filenames. This can cause:

  1. Silent overwrites: If multiple source files map to the same dest_file_name, later files will overwrite earlier ones without warning
  2. Unhandled exceptions: Any I/O error (permissions, disk full, etc.) will crash without cleanup
  3. Partial state: Failed operations leave the destination directory in an inconsistent state
🛡️ Proposed fix with validation and error handling
     def rename_and_move_files(self, file_list: list[DLUFile], slide_name_map, package_id ):
         dluFiles = []
         dest_package_directory = os.path.join(self.dlu_data_directory, self.dlu_package_dir_prefix + package_id)
         if os.path.exists(dest_package_directory):
             shutil.rmtree(dest_package_directory)
         if not os.path.exists(dest_package_directory):
             logger.info("Creating directory " + dest_package_directory)
             os.makedirs(dest_package_directory, exist_ok=True)
 
         source_package_directory = self.globus_data_directory + '/' + self.globus_dir_prefix + package_id
+        
+        # Validate no duplicate destination names
+        dest_names = set()
         for file in file_list:
+            if file.name not in slide_name_map:
+                raise ValueError(f"File '{file.name}' not found in slide_name_map")
             dest_file_name = slide_name_map[file.name]
+            if dest_file_name in dest_names:
+                raise ValueError(f"Duplicate destination filename: '{dest_file_name}'")
+            dest_names.add(dest_file_name)
+            
+        # Process files with error handling
+        for file in file_list:
+            dest_file_name = slide_name_map[file.name]
-            renamed_src_path = os.path.join(source_package_directory, dest_file_name)
+            src_path = os.path.join(source_package_directory, file.name)
             dest_path = os.path.join(dest_package_directory, dest_file_name)
-
-            logger.info("Renaming file " + os.path.join(source_package_directory, file.name) + " to " + renamed_src_path)
-            os.rename(os.path.join(source_package_directory, file.name), renamed_src_path)
-            logger.info("Copying file " + renamed_src_path + " to " + dest_path)
-            shutil.copy(renamed_src_path, dest_path)
-            file = DLUFile(name=dest_file_name, path=dest_package_directory,
+            
+            try:
+                logger.info(f"Copying file {src_path} to {dest_path}")
+                shutil.copy(src_path, dest_path)
+            except OSError as e:
+                logger.error(f"Failed to copy {src_path} to {dest_path}: {e}")
+                raise
+            
+            dlu_file = DLUFile(name=dest_file_name, path=dest_package_directory,
                            checksum=calculate_checksum(dest_path), size=os.path.getsize(dest_path))
-            dluFiles.append(file)
+            dluFiles.append(dlu_file)
         return dluFiles
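
The duplicate check in the fix above can also be sketched as a standalone helper, which makes it easy to unit test without touching the filesystem. The function name `find_duplicate_targets` is hypothetical and not part of the repository; it assumes every file name already has an entry in the map.

```python
from collections import Counter


def find_duplicate_targets(file_names, name_map):
    """Return destination names that more than one source file maps to.

    Assumes every entry of file_names is a key in name_map.
    """
    counts = Counter(name_map[name] for name in file_names)
    return sorted(target for target, count in counts.items() if count > 1)
```

A caller could raise `ValueError` when the returned list is non-empty, before the first rename or copy, so that no file is overwritten silently.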

-                        dest_file)
-            file = DLUFile(name=slide_name_map[file.name], path=dest_package_directory,
-                           checksum=calculate_checksum(dest_file), size=os.path.getsize(dest_file))
+            dest_file_name = slide_name_map[file.name]

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add KeyError protection for missing slide name mappings.

Line 129 accesses slide_name_map[file.name] without checking if the key exists. If a file name is not present in the mapping, this will raise a KeyError and crash the process, potentially leaving the source directory in a partially renamed state.

🛡️ Proposed fix with KeyError handling
         for file in file_list:
-            dest_file_name = slide_name_map[file.name]
+            if file.name not in slide_name_map:
+                raise ValueError(f"File '{file.name}' not found in slide_name_map")
+            dest_file_name = slide_name_map[file.name]
             renamed_src_path = os.path.join(source_package_directory, dest_file_name)
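
Note that a check placed inside the loop still runs after earlier iterations have renamed their files. One possible variation, sketched here with hypothetical helper names, is to check every file name up front so the error is raised before anything in the source directory changes.

```python
def find_unmapped(file_names, name_map):
    """Return the file names that have no entry in name_map, in input order."""
    return [name for name in file_names if name not in name_map]


def require_all_mapped(file_names, name_map):
    """Raise once, listing every unmapped file, before any file is touched."""
    unmapped = find_unmapped(file_names, name_map)
    if unmapped:
        raise ValueError(f"Files not found in slide_name_map: {unmapped}")
```

Reporting all missing names in one error also saves a fix-and-rerun cycle per file.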

Comment on lines +133 to +136
+            logger.info("Renaming file " + os.path.join(source_package_directory, file.name) + " to " + renamed_src_path)
+            os.rename(os.path.join(source_package_directory, file.name), renamed_src_path)
+            logger.info("Copying file " + renamed_src_path + " to " + dest_path)
+            shutil.copy(renamed_src_path, dest_path)

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Critical: In-place source mutation creates data loss risk.

The code renames files directly in the source Globus directory (line 134), then copies them (line 136). This approach is risky because:

  1. Data loss on failure: If shutil.copy() fails after os.rename() succeeds, the file stays in the source directory under its new name without having been copied, so a rerun that looks for the original name will not find it and the source is left in an inconsistent state
  2. No rollback: If the process crashes partway through, previously renamed files cannot be easily restored
  3. Concurrent access issues: Other processes reading from the Globus directory may fail if files are renamed mid-operation
  4. Source preservation: The source directory should typically remain unchanged until the entire operation succeeds

The standard pattern is to copy directly from source to destination with the new name, leaving the source intact until success is confirmed.

♻️ Proposed fix: Copy directly without mutating source
         for file in file_list:
+            if file.name not in slide_name_map:
+                raise ValueError(f"File '{file.name}' not found in slide_name_map")
             dest_file_name = slide_name_map[file.name]
-            renamed_src_path = os.path.join(source_package_directory, dest_file_name)
+            src_path = os.path.join(source_package_directory, file.name)
             dest_path = os.path.join(dest_package_directory, dest_file_name)
 
-            logger.info("Renaming file " + os.path.join(source_package_directory, file.name) + " to " + renamed_src_path)
-            os.rename(os.path.join(source_package_directory, file.name), renamed_src_path)
-            logger.info("Copying file " + renamed_src_path + " to " + dest_path)
-            shutil.copy(renamed_src_path, dest_path)
+            logger.info(f"Copying file {src_path} to {dest_path}")
+            shutil.copy(src_path, dest_path)
             file = DLUFile(name=dest_file_name, path=dest_package_directory,
                            checksum=calculate_checksum(dest_path), size=os.path.getsize(dest_path))
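
If renaming in the Globus directory is a requirement of this change, as the PR title suggests, an alternative to dropping the rename is to undo it when the copy fails. The sketch below is one way to do that under stated assumptions: `rename_then_copy` is a hypothetical helper, not code from the repository, and it assumes no file named `new_name` already exists in the source directory (on POSIX, `os.rename` would silently replace it).

```python
import os
import shutil


def rename_then_copy(src_dir, dest_dir, old_name, new_name):
    """Rename old_name to new_name inside src_dir, then copy it to dest_dir.

    If the copy fails, the rename is reverted so the source directory
    is left as it was found, and the error is re-raised.
    """
    original_path = os.path.join(src_dir, old_name)
    renamed_path = os.path.join(src_dir, new_name)
    dest_path = os.path.join(dest_dir, new_name)

    os.rename(original_path, renamed_path)
    try:
        shutil.copy(renamed_path, dest_path)
    except OSError:
        # Roll back the rename and remove any partially written destination file.
        os.rename(renamed_path, original_path)
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
    return dest_path
```

This covers a failed copy for a single file. It does not protect against a process crash between the rename and the copy, and it does not address other processes reading the directory at the same time.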