Updates for 0.8.0 #104
Merged
Add abstraction layer that allows use of either hdf5r or rhdf5
Apply the hdf5 abstraction functions wherever data are created in or written to the output file.
Performance improvements
Performance Improvements and HDF5 Backend Flexibility
Summary
This PR introduces significant performance improvements to the NEONiso package and adds flexible HDF5 backend support. The package can now use either `hdf5r` (CRAN) or `rhdf5` (Bioconductor) for HDF5 file I/O, with `hdf5r` as the preferred backend when available. Additionally, multiple calibration and data processing functions have been optimized to reduce runtime and memory usage.
Major Changes
1. HDF5 Backend Abstraction Layer
New file:
- `R/hdf5_utils.R` (192 lines)
- Supports both `hdf5r` (CRAN) and `rhdf5` (Bioconductor) backends
- Prefers `hdf5r` when available (CRAN installation) but falls back to `rhdf5` (Bioconductor) seamlessly
Package dependency updates (`DESCRIPTION`):
- Moved `rhdf5` from `Imports` to `Suggests`
- Added `hdf5r` to `Suggests`
- Set `neonUtilities` minimum version to 2.3.0
- Removed `caret` dependency (see performance improvements below)
2. Performance Optimizations
Calibration Functions (`R/reference_data_regression.R`)
Cross-validation improvements:
- Removed use of the `caret` package in `estimate_calibration_error()`
Memory and speed improvements in `fit_carbon_regression()`:
- Reuses `summary()` results to avoid redundant computations
Output Functions (`R/output_functions.R`)
Enhanced `setup_output_file()`:
- Added `attrs` parameter to accept pre-read attributes (avoids redundant file reads)
- Added `keep_open` parameter to return an open file handle for subsequent operations
- Replaced `rhdf5::` calls with abstraction layer functions
Optimized write functions:
- `write_carbon_calibration_data()` and `write_carbon_ambient_data()` now accept an optional open file handle (`fid` parameter)
Similar optimizations in:
- `write_water_calibration_data()`
- `write_water_ambient_data()`
Time Functions (`R/time_functions.R`)
Data Ingestion (`R/restructure_data.R`)
Quality Control (`R/quality_control.R`)
3. Enhanced Test Coverage
New test files:
- `test-hdf5_utils.R` (186 lines) - Comprehensive tests for HDF5 abstraction layer
- `test-hdf5_roundtrip.R` (255 lines) - Tests for full read-write-read cycles
- `test-regression_snapshots.R` (144 lines) - Snapshot tests for calibration outputs
- `test-gold_file_regression.R` (123 lines) - Gold file comparison tests
Updated test files:
- `test-data_regression.R` - Enhanced with backend-agnostic tests
- `test-high_level_functions.R` - Updated for new function signatures
- `test-data_ingestion.R` - Improved coverage of edge cases
- `test-utility_functions.R` - Additional utility function tests
Test statistics:
New snapshot file:
- `_snaps/gold_file_regression.md` - Comprehensive golden file snapshots for regression testing
4. Calibration Function Updates
`R/calibrate_carbon.R` and `R/calibrate_water.R`:
- `setup_output_file()`
5. Infrastructure and Documentation
Build and ignore files:
- `packageVersion` import
Manual pages updated:
- `estimate_calibration_error.Rd` - Updated to reflect removal of `caret` dependency
- `setup_output_file.Rd` - Documented new `attrs` and `keep_open` parameters
- `write_*_data.Rd` files - Documented new `fid` parameter
Workflow files:
- `workflows/test_workflows.R` and `workflows/test_workflows_parallel.R` updated for compatibility
Performance Impact
These changes provide measurable performance improvements:
- Removing the `caret` dependency reduces installation time and package bloat
- `caret::train()`
Backward Compatibility
- Users with `rhdf5` already installed will see no behavioral changes
- Users can switch to `hdf5r` by installing it: `install.packages("hdf5r")`
Migration Path
For users currently using the package:
- Continue using `rhdf5` if already installed
- Install `hdf5r` from CRAN for easier dependency management: `install.packages("hdf5r")`
Key commits:
- `cbd1e8a` - Add abstraction layer for hdf5 package
- `e29a3cf` - Apply hdf5 abstraction layers
- `8872bdf` - Various performance improvements
- `45abf91` - Update functions with some performance tweaks
- `1090e77` - Add additional tests
- `880e340` - Update tests
- `12ed984` - Use close instead of close_all for hdf5r to avoid testing errors
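The backend preference described under "HDF5 Backend Abstraction Layer" (use `hdf5r` when it is installed, otherwise fall back to `rhdf5`) could be sketched roughly as follows. This is a minimal illustration and not the code in `R/hdf5_utils.R`: the function name `choose_hdf5_backend` and its `available` argument are hypothetical, introduced only so the selection logic can be exercised without either package installed.

```r
# Hypothetical helper, NOT the function defined in R/hdf5_utils.R.
# `available` is an assumed argument: a named logical vector saying which
# backends are installed. By default it is detected with requireNamespace().
choose_hdf5_backend <- function(
    available = c(
      hdf5r = requireNamespace("hdf5r", quietly = TRUE),
      rhdf5 = requireNamespace("rhdf5", quietly = TRUE)
    )) {
  # Prefer the CRAN package when it is present.
  if (isTRUE(available[["hdf5r"]])) {
    return("hdf5r")
  }
  # Otherwise fall back to the Bioconductor package.
  if (isTRUE(available[["rhdf5"]])) {
    return("rhdf5")
  }
  stop(
    "No HDF5 backend found: install 'hdf5r' (CRAN) or 'rhdf5' (Bioconductor).",
    call. = FALSE
  )
}
```

Calling `choose_hdf5_backend()` with no arguments checks the installed packages. Passing `available` explicitly, for example `choose_hdf5_backend(c(hdf5r = FALSE, rhdf5 = TRUE))`, makes both branches easy to test.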
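With `caret` removed from `estimate_calibration_error()`, cross-validation has to be done by other means. One plausible approach, shown here as a sketch under assumptions and not as the package's actual implementation, is a manual k-fold loop around `lm()`. The name `kfold_rmse`, its arguments, and the choice of RMSE as the error metric are all hypothetical.

```r
# Hypothetical base-R k-fold cross-validation for a simple linear calibration.
# This illustrates the general technique only; it is not NEONiso code.
kfold_rmse <- function(x, y, k = 5) {
  stopifnot(length(x) == length(y), k >= 2, k <= length(y))
  n <- length(y)
  # Randomly assign each observation to one of k folds of near-equal size.
  folds <- sample(rep_len(seq_len(k), n))
  sq_err <- numeric(n)
  for (i in seq_len(k)) {
    held_out <- folds == i
    # Fit on the other k - 1 folds.
    train <- data.frame(x = x[!held_out], y = y[!held_out])
    fit <- lm(y ~ x, data = train)
    # Predict the held-out fold and record squared errors.
    pred <- predict(fit, newdata = data.frame(x = x[held_out]))
    sq_err[held_out] <- (y[held_out] - pred)^2
  }
  # Root-mean-square error over all held-out predictions.
  sqrt(mean(sq_err))
}

# Example with synthetic data (values are made up for illustration).
set.seed(42)
x_demo <- seq(-12, -8, length.out = 30)
y_demo <- 1.02 * x_demo + 0.3 + rnorm(30, sd = 0.05)
cv_error <- kfold_rmse(x_demo, y_demo, k = 5)
```

A loop like this needs only the `stats` functions that ship with R, which is the kind of dependency reduction the removal of `caret` is aiming for.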