Crypto Module Migration#104
Conversation
Summary of ChangesHello @harvey0100, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The PR successfully migrates the crypto and genio modules to autils, providing good test coverage and documentation. I've identified a few areas for improvement, primarily focusing on memory efficiency when handling large files and a bug in hash_file when dealing with special files (like /dev/zero) where os.path.getsize returns 0.
I am having trouble creating individual review comments. Click here to see my feedback.
autils/file/crypto.py (57-60)
Using os.path.getsize() to cap the size parameter causes issues with special files (e.g., /dev/zero, pipes, or some files in /proc) where the reported size is 0, even though they contain data. This results in an empty hash being returned even if a positive size was requested. Additionally, it introduces a race condition (TOCTOU) if the file size changes between the call to getsize() and the subsequent read(). It's more robust to read until EOF or until the requested number of bytes is reached.
if not size or size <= 0:
size = None
autils/file/crypto.py (71-78)
This loop should be updated to handle the case where size is None (meaning hash the entire file), ensuring it works correctly for both regular and special files without relying on a pre-calculated file size.
while size is None or size > 0:
read_amt = chunksize if size is None else min(chunksize, size)
data = file_to_hash.read(read_amt)
if not data:
if size is not None:
LOG.debug("Nothing left to read but size=%d", size)
break
hash_obj.update(data)
if size is not None:
size -= len(data)
autils/file/genio.py (122)
Calling file_obj.readlines() creates an intermediate list of all lines in memory before the list comprehension runs. Iterating directly over the file object is more memory-efficient.
contents = [line.rstrip("\n") for line in file_obj]
autils/file/genio.py (150)
Using file_obj.readlines() loads the entire file into memory. For large files (like logs), this can lead to high memory consumption. Iterating over the file object directly processes it line by line.
for line in file_obj:
autils/file/genio.py (286)
Reading the entire file into memory with content_file.read() can be problematic for very large files. While re.MULTILINE is used, if the pattern doesn't actually span multiple lines, iterating line by line would be much safer. If multi-line matching is required, consider reading in chunks or using a memory-mapped file for large inputs.
autils/file/genio.py (310-311)
Comparing file sizes before computing cryptographic hashes is a significant optimization. If the sizes differ, the files cannot be equal, and we can avoid the expensive hashing process entirely.
if os.path.getsize(filename) != os.path.getsize(other):
return False
hash_1 = crypto.hash_file(filename)
hash_2 = crypto.hash_file(other)
- Add autils/file/crypto.py (hash_file without deprecation block) - Add unit and functional tests with updated imports - Add metadata/file/crypto.yml - Add crypto to docs under File section Reference: avocado-framework#43 Assisted-By: Cursor-Claude-4-Sonnet Signed-off-by: Harvey Lynden <hlynden@redhat.com>
Remove test_file_tampering_detection and test_symlink_follows_to_target to match avocado selftests/functional/utils/crypto.py line for line. Made-with: Cursor
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #104 +/- ##
===========================
===========================
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Migrates the crypto module from avocado to autils. Adds hash_file() for file hashing, plus unit and functional tests, metadata, and docs. The implementation matches the avocado version, with the deprecation block removed and imports updated for autils.