Update shared path, serialization, and evaluation helpers #30
Open
T4ras123 wants to merge 1 commit into
Bundle the path fallback, serialization support, evaluation robustness, and training checkpoint path fixes that need to land together so the shared utilities stay consistent on top of main.

Made-with: Cursor
This pull request introduces significant improvements and new features to the SMILES encoder/decoder module, focusing on more robust path handling, enriched coordinate binning functionality, and minor bug fixes. The main highlights are the addition of a flexible binning configuration class and methods for encoding/decoding molecules using these bin configurations, as well as improved error handling for file system paths.
Path handling robustness:
- Updated `_resolve_path_value` and `_base_candidate_values` to gracefully skip over unreadable or inaccessible file system paths, preventing crashes due to `OSError` when traversing shared or restricted mounts.

Coordinate binning enhancements:
- Added a `BinConfig` dataclass to encapsulate binning configuration, including methods for saving/loading configurations and calculating digit widths. This enables flexible and reusable binning strategies for molecular coordinate encoding.
- Added `fit_uniform_bins` and `fit_quantile_bins` functions to generate bin edges using uniform or quantile-based strategies, supporting robust and data-driven binning.
- Added `encode_cartesian_with_config` and `decode_cartesian_with_config` to serialize and deserialize 3D molecular coordinates using the new binning configuration, supporting more accurate and customizable encoding/decoding workflows.

Bug fixes and minor improvements:
- Changed the coordinate range used by `encode_cartesian_binned` and `encode_cartesian_binned_v2` from `(-13.0, 13.0)` to `(-11.0, 11.0)` for all axes, likely to better fit the data distribution.
- Fixed the output of `encode_cartesian_binned_v2` by ensuring each atom entry is terminated with a semicolon, improving consistency for downstream parsing.

Code cleanup:
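For reviewers who want a concrete picture of the binning workflow summarized here, the following is a minimal, self-contained sketch. It borrows the names `BinConfig`, `fit_uniform_bins`, and `fit_quantile_bins` from the summary, but the fields, signatures, JSON layout, and the per-value helpers `encode_value` and `decode_value` are assumptions made for illustration. The diff itself is not shown on this page, so this should not be read as the PR's actual implementation.

```python
import bisect
import json
import math
from dataclasses import asdict, dataclass
from typing import List, Sequence


@dataclass
class BinConfig:
    """Illustrative stand-in (assumed fields): sorted bin edges for one axis."""
    edges: List[float]

    @property
    def n_bins(self) -> int:
        return len(self.edges) - 1

    def digit_width(self) -> int:
        # Number of digits needed to print the largest bin index.
        return len(str(max(self.n_bins - 1, 0)))

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "BinConfig":
        return cls(**json.loads(text))


def fit_uniform_bins(lo: float, hi: float, n_bins: int) -> BinConfig:
    """Equal-width bins over [lo, hi] (assumed signature)."""
    step = (hi - lo) / n_bins
    return BinConfig([lo + i * step for i in range(n_bins + 1)])


def fit_quantile_bins(values: Sequence[float], n_bins: int) -> BinConfig:
    """Edges at evenly spaced quantiles, using linear interpolation."""
    data = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        pos = i * (len(data) - 1) / n_bins
        low = math.floor(pos)
        high = min(low + 1, len(data) - 1)
        edges.append(data[low] + (data[high] - data[low]) * (pos - low))
    return BinConfig(edges)


def encode_value(x: float, cfg: BinConfig) -> str:
    """Hypothetical helper: map a coordinate to a zero-padded bin index."""
    idx = bisect.bisect_right(cfg.edges, x) - 1
    idx = min(max(idx, 0), cfg.n_bins - 1)  # clamp out-of-range values
    return str(idx).zfill(cfg.digit_width())


def decode_value(token: str, cfg: BinConfig) -> float:
    """Hypothetical helper: map a bin index back to the bin midpoint."""
    idx = int(token)
    return 0.5 * (cfg.edges[idx] + cfg.edges[idx + 1])
```

In this sketch decoding returns the bin midpoint, so a round trip loses at most half a bin width for in-range values. Out-of-range coordinates are clamped to the first or last bin, which is one possible policy and may differ from what the PR does.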
Summary of most important changes:
1. Path handling improvements
- Added `OSError` exception handling in `_resolve_path_value` and `_base_candidate_values` to skip unreadable paths and continue searching for valid candidates.

2. Coordinate binning functionality
- Added the `BinConfig` dataclass for managing binning configuration, including persistence methods.
- Added `fit_uniform_bins` and `fit_quantile_bins` for flexible bin edge generation.
- Added `encode_cartesian_with_config` and `decode_cartesian_with_config` for encoding/decoding molecules using bin configs.

3. Bug fixes
- Changed the coordinate range to `(-11.0, 11.0)` for better data fit.
- Fixed `encode_cartesian_binned_v2` to ensure semicolon separation for atom entries.

4. Code cleanup
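As a rough illustration of the path-handling change in item 1, the sketch below walks a list of candidate paths and skips any that raise `OSError` instead of aborting the search. The function name `first_readable_path` and its signature are hypothetical. The actual `_resolve_path_value` and `_base_candidate_values` helpers are not shown on this page and may be structured differently.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_readable_path(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that exists, skipping unreadable ones.

    Hypothetical helper: illustrates the skip-on-OSError pattern only.
    """
    for raw in candidates:
        path = Path(raw).expanduser()
        try:
            if path.exists():
                return path
        except OSError:
            # For example, PermissionError on a restricted mount:
            # ignore this candidate and keep searching.
            continue
    return None
```

Catching `OSError` also covers its subclasses such as `PermissionError`, so a single inaccessible mount does not stop later candidates from being tried.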