Feature/dominant frequency#685
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds rolling-window mean and median frequency transforms with PSD helpers, re-exports, Patch instance shortcuts, and tests. ChangesMean and Median Frequency Transforms
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #685 +/- ##
==========================================
- Coverage 99.93% 99.91% -0.02%
==========================================
Files 135 136 +1
Lines 11704 11868 +164
==========================================
+ Hits 11696 11858 +162
- Misses 8 10 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
tests/test_transform/test_mean_median_frequency.py (1)
11-124: ⚡ Quick winConsider consolidating test classes to eliminate duplication.
The
TestMeanFrequencyandTestMedianFrequencyclasses are structurally identical, differing only in the method name and expected strings. This duplication increases maintenance burden and risks inconsistency.♻️ Proposed refactor using pytest parametrization
-class TestMeanFrequency: - """Tests for mean_frequency patch function.""" +@pytest.mark.parametrize( + "method_name,expected_data_type", + [ + ("mean_frequency", "Mean Frequency"), + ("median_frequency", "Median Frequency"), + ], +) +class TestFrequencyTransforms: + """Tests for mean_frequency and median_frequency patch functions.""" - def test_runs(self, random_patch): - """Ensure mean_frequency runs as a patch method.""" - out = random_patch.mean_frequency(winlen=0.01, step=0.01) + def test_runs(self, random_patch, method_name, expected_data_type): + """Ensure frequency transform runs as a patch method.""" + method = getattr(random_patch, method_name) + out = method(winlen=0.01, step=0.01) - assert out.attrs.data_type == "Mean Frequency" + assert out.attrs.data_type == expected_data_type assert out.attrs.data_units == units.Hz assert "time" in out.dims assert "distance" in out.dims - def test_fft_method_runs(self, random_patch): + def test_fft_method_runs(self, random_patch, method_name, expected_data_type): """Ensure FFT method runs.""" - out = random_patch.mean_frequency(winlen=0.01, step=0.01, method="fft") + method = getattr(random_patch, method_name) + out = method(winlen=0.01, step=0.01, method="fft") - assert out.attrs.data_type == "Mean Frequency" + assert out.attrs.data_type == expected_data_type assert out.attrs.data_units == units.Hz assert np.isfinite(out.data).any() - def test_step_defaults_to_winlen(self, random_patch): + def test_step_defaults_to_winlen(self, random_patch, method_name, expected_data_type): """Ensure step defaults to winlen.""" - out = random_patch.mean_frequency(winlen=0.01) - expected = random_patch.mean_frequency(winlen=0.01, step=0.01) + method = getattr(random_patch, method_name) + out = method(winlen=0.01) + expected = method(winlen=0.01, step=0.01) assert np.allclose(out.data, expected.data, equal_nan=True) - def test_frequency_bounds(self, random_patch): + def test_frequency_bounds(self, random_patch, method_name, expected_data_type): """Ensure frequency bounds are accepted.""" - out = random_patch.mean_frequency( + method = getattr(random_patch, method_name) + out = method( winlen=0.01, step=0.01, fmin=10, fmax=100, ) - assert out.attrs.data_type == "Mean Frequency" + assert out.attrs.data_type == expected_data_type assert out.attrs.data_units == units.Hz - def test_invalid_frequency_bounds_raise(self, random_patch): + def test_invalid_frequency_bounds_raise(self, random_patch, method_name, expected_data_type): """Ensure invalid frequency bounds raise.""" + method = getattr(random_patch, method_name) with pytest.raises(AssertionError, match="must be smaller than fmax"): - random_patch.mean_frequency( + method( winlen=0.01, step=0.01, fmin=100, fmax=10, ) - def test_collapse_mode_runs(self, random_patch): + def test_collapse_mode_runs(self, random_patch, method_name, expected_data_type): """Ensure winlen=None collapses over the full time range.""" - out = random_patch.mean_frequency() + method = getattr(random_patch, method_name) + out = method() - assert out.attrs.data_type == "Mean Frequency" + assert out.attrs.data_type == expected_data_type assert out.attrs.data_units == units.Hz assert out.data.shape != random_patch.data.shape - - -class TestMedianFrequency: - """Tests for median_frequency patch function.""" - - def test_runs(self, random_patch): - """Ensure median_frequency runs as a patch method.""" - out = random_patch.median_frequency(winlen=0.01, step=0.01) - - assert out.attrs.data_type == "Median Frequency" - assert out.attrs.data_units == units.Hz - assert "time" in out.dims - assert "distance" in out.dims - - def test_fft_method_runs(self, random_patch): - """Ensure FFT method runs.""" - out = random_patch.median_frequency(winlen=0.01, step=0.01, method="fft") - - assert out.attrs.data_type == "Median Frequency" - assert out.attrs.data_units == units.Hz - assert np.isfinite(out.data).any() - - def test_step_defaults_to_winlen(self, random_patch): - """Ensure step defaults to winlen.""" - out = random_patch.median_frequency(winlen=0.01) - expected = random_patch.median_frequency(winlen=0.01, step=0.01) - - assert np.allclose(out.data, expected.data, equal_nan=True) - - def test_frequency_bounds(self, random_patch): - """Ensure frequency bounds are accepted.""" - out = random_patch.median_frequency( - winlen=0.01, - step=0.01, - fmin=10, - fmax=100, - ) - - assert out.attrs.data_type == "Median Frequency" - assert out.attrs.data_units == units.Hz - - def test_invalid_frequency_bounds_raise(self, random_patch): - """Ensure invalid frequency bounds raise.""" - with pytest.raises(AssertionError, match="must be smaller than fmax"): - random_patch.median_frequency( - winlen=0.01, - step=0.01, - fmin=100, - fmax=10, - ) - - def test_collapse_mode_runs(self, random_patch): - """Ensure winlen=None collapses over the full time range.""" - out = random_patch.median_frequency() - - assert out.attrs.data_type == "Median Frequency" - assert out.attrs.data_units == units.Hz - assert out.data.shape != random_patch.data.shape🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_transform/test_mean_median_frequency.py` around lines 11 - 124, Duplicate test classes TestMeanFrequency and TestMedianFrequency should be merged using pytest parametrization: replace the two classes with a single set of parametrized tests that iterate over method names ("mean_frequency", "median_frequency") and expected attrs ("Mean Frequency", "Median Frequency"), calling the function via getattr(random_patch, method_name); update each test (runs, fft_method_runs, step_defaults_to_winlen, frequency_bounds, invalid_frequency_bounds_raise, collapse_mode_runs) to accept the parameters and use getattr to invoke the method and assert expected attrs and behavior (e.g., compare data, check units.Hz, check exceptions), keeping the same test logic but removing duplicated code in TestMeanFrequency and TestMedianFrequency.dascore/transform/mean_median_frequency.py (1)
173-179: ⚡ Quick winUse explicit exceptions for argument validation.
These checks are guarding public inputs, so
assertcan disappear under optimized Python and let invalidmethod/ frequency bounds fall through into misleading errors or empty-spectrum behavior. RaiseValueErrorhere instead.Suggested change
- assert ( - method.upper() in allowed_methods - ), f"Invalid method: '{method}'. Must be one of {allowed_methods}" + if method.upper() not in allowed_methods: + raise ValueError( + f"Invalid method: '{method}'. Must be one of {allowed_methods}" + ) - if fmin is not None and fmax is not None: - assert fmin < fmax, f"fmin ({fmin}) must be smaller than fmax ({fmax})" + if fmin is not None and fmax is not None and fmin >= fmax: + raise ValueError(f"fmin ({fmin}) must be smaller than fmax ({fmax})")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` around lines 173 - 179, Replace the use of assert for validating public inputs with explicit exceptions: where the code checks method membership against allowed_methods (using method.upper() in allowed_methods) raise a ValueError with the same message instead of using assert, and likewise replace the assert that verifies fmin < fmax with a ValueError describing the invalid bounds; update the validation logic surrounding the allowed_methods check and the fmin/fmax check in mean_median_frequency.py so callers get a clear ValueError when inputs are invalid.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 389-390: The example incorrectly calls patch.mean_frequency(...)
while documenting the median-frequency transform; change the example to call
patch.median_frequency(...) (and adjust the variable name if you want to reflect
median, e.g., med -> medf or median) so the documented example uses the correct
function (median_frequency) and matches the surrounding documentation.
- Around line 183-187: The code unconditionally resets nperseg to
data.shape[-1], ignoring any public nperseg parameter; change the logic in
mean_median_frequency (around the get_window/_welch call) to only set nperseg =
data.shape[-1] when the function parameter nperseg is None, validate that
provided nperseg <= data.shape[-1] (raise ValueError otherwise), then create win
= get_window("hann", nperseg) and call _welch(data, win, (1/dt),
nperseg=nperseg) so the public nperseg parameter is honored.
- Around line 410-420: The median_frequency implementation should guard against
empty or zero-power PSDs: after calling _get_psd_in_window (frq, pxx), first
handle the case where pxx has no frequency bins (e.g., pxx.shape[-1] == 0) by
returning an array of np.nan matching the expected output shape; next compute
cumulative_psd and total_power, and where total_power == 0 set the median result
to np.nan instead of using argmax (similar to mean_frequency's handling). Use
the existing names frq, pxx, cumulative_psd, total_power, median_idx and ensure
median_idx is only computed for non-zero-power spectra so you don't index frq
with invalid indices or report fmin/DC when there is no spectral content.
- Around line 196-200: The current logic sets inf2 = -1 when fmax is None which
causes pxx = pxx[:, :, inf1:inf2] and frq = frq[inf1:inf2] to drop the final
(Nyquist) bin; change the inf2 assignment so the slice includes the end (e.g.
inf2 = None if fmax is None else np.searchsorted(frq, fmax) or inf2 = frq.size
when fmax is None) so pxx and frq slicing (pxx[:, :, inf1:inf2], frq[inf1:inf2])
does not exclude the last frequency bin.
---
Nitpick comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 173-179: Replace the use of assert for validating public inputs
with explicit exceptions: where the code checks method membership against
allowed_methods (using method.upper() in allowed_methods) raise a ValueError
with the same message instead of using assert, and likewise replace the assert
that verifies fmin < fmax with a ValueError describing the invalid bounds;
update the validation logic surrounding the allowed_methods check and the
fmin/fmax check in mean_median_frequency.py so callers get a clear ValueError
when inputs are invalid.
In `@tests/test_transform/test_mean_median_frequency.py`:
- Around line 11-124: Duplicate test classes TestMeanFrequency and
TestMedianFrequency should be merged using pytest parametrization: replace the
two classes with a single set of parametrized tests that iterate over method
names ("mean_frequency", "median_frequency") and expected attrs ("Mean
Frequency", "Median Frequency"), calling the function via getattr(random_patch,
method_name); update each test (runs, fft_method_runs, step_defaults_to_winlen,
frequency_bounds, invalid_frequency_bounds_raise, collapse_mode_runs) to accept
the parameters and use getattr to invoke the method and assert expected attrs
and behavior (e.g., compare data, check units.Hz, check exceptions), keeping the
same test logic but removing duplicated code in TestMeanFrequency and
TestMedianFrequency.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c864e698-43e0-4de3-9945-0007fd0a41ad
📒 Files selected for processing (4)
dascore/core/patch.pydascore/transform/__init__.pydascore/transform/mean_median_frequency.pytests/test_transform/test_mean_median_frequency.py
| nperseg = data.shape[-1] | ||
| win = get_window("hann", nperseg) | ||
|
|
||
| if method.upper() == "WELCH": | ||
| frq, pxx = _welch(data, win, (1 / dt), nperseg=nperseg) |
There was a problem hiding this comment.
nperseg is currently ignored for Welch PSD.
nperseg gets unconditionally reset to the full window length here, so the public parameter never affects the result and "welch" collapses to a single full-window segment. That materially changes the estimator and makes the API contract misleading.
Suggested change
- nperseg = data.shape[-1]
+ if nperseg is None:
+ nperseg = data.shape[-1]
+ else:
+ nperseg = min(int(nperseg), data.shape[-1])
win = get_window("hann", nperseg)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| nperseg = data.shape[-1] | |
| win = get_window("hann", nperseg) | |
| if method.upper() == "WELCH": | |
| frq, pxx = _welch(data, win, (1 / dt), nperseg=nperseg) | |
| if nperseg is None: | |
| nperseg = data.shape[-1] | |
| else: | |
| nperseg = min(int(nperseg), data.shape[-1]) | |
| win = get_window("hann", nperseg) | |
| if method.upper() == "WELCH": | |
| frq, pxx = _welch(data, win, (1 / dt), nperseg=nperseg) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dascore/transform/mean_median_frequency.py` around lines 183 - 187, The code
unconditionally resets nperseg to data.shape[-1], ignoring any public nperseg
parameter; change the logic in mean_median_frequency (around the
get_window/_welch call) to only set nperseg = data.shape[-1] when the function
parameter nperseg is None, validate that provided nperseg <= data.shape[-1]
(raise ValueError otherwise), then create win = get_window("hann", nperseg) and
call _welch(data, win, (1/dt), nperseg=nperseg) so the public nperseg parameter
is honored.
| inf1 = 0 if fmin is None else np.searchsorted(frq, fmin) | ||
| inf2 = -1 if fmax is None else np.searchsorted(frq, fmax) | ||
|
|
||
| pxx = pxx[:, :, inf1:inf2] | ||
| frq = frq[inf1:inf2] |
There was a problem hiding this comment.
The default slice drops the last frequency bin.
When fmax is None, inf2 = -1 excludes the Nyquist bin from every default call. That biases both mean and median frequency downward.
Suggested change
inf1 = 0 if fmin is None else np.searchsorted(frq, fmin)
- inf2 = -1 if fmax is None else np.searchsorted(frq, fmax)
+ inf2 = None if fmax is None else np.searchsorted(frq, fmax, side="right")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| inf1 = 0 if fmin is None else np.searchsorted(frq, fmin) | |
| inf2 = -1 if fmax is None else np.searchsorted(frq, fmax) | |
| pxx = pxx[:, :, inf1:inf2] | |
| frq = frq[inf1:inf2] | |
| inf1 = 0 if fmin is None else np.searchsorted(frq, fmin) | |
| inf2 = None if fmax is None else np.searchsorted(frq, fmax, side="right") | |
| pxx = pxx[:, :, inf1:inf2] | |
| frq = frq[inf1:inf2] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dascore/transform/mean_median_frequency.py` around lines 196 - 200, The
current logic sets inf2 = -1 when fmax is None which causes pxx = pxx[:, :,
inf1:inf2] and frq = frq[inf1:inf2] to drop the final (Nyquist) bin; change the
inf2 assignment so the slice includes the end (e.g. inf2 = None if fmax is None
else np.searchsorted(frq, fmax) or inf2 = frq.size when fmax is None) so pxx and
frq slicing (pxx[:, :, inf1:inf2], frq[inf1:inf2]) does not exclude the last
frequency bin.
| frq, pxx = _get_psd_in_window( | ||
| data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg | ||
| ) | ||
|
|
||
| # calculate median frequency | ||
| cumulative_psd = np.cumsum(pxx, axis=-1) | ||
| total_power = cumulative_psd[:, :, -1] | ||
| half = total_power[..., None] / 2 | ||
| median_idx = np.argmax(cumulative_psd >= half, axis=-1) | ||
|
|
||
| return frq[median_idx] |
There was a problem hiding this comment.
Guard empty and zero-power spectra before computing the median bin.
If the frequency filter removes all bins, cumulative_psd[:, :, -1] raises immediately. If total power is zero, argmax returns index 0 and reports fmin/DC as the median despite there being no spectral content. mean_frequency already handles the zero-power case; median_frequency should too.
Suggested change
frq, pxx = _get_psd_in_window(
data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg
)
+ if frq.size == 0:
+ raise ValueError("Frequency bounds exclude the entire spectrum")
# calculate median frequency
cumulative_psd = np.cumsum(pxx, axis=-1)
total_power = cumulative_psd[:, :, -1]
half = total_power[..., None] / 2
median_idx = np.argmax(cumulative_psd >= half, axis=-1)
- return frq[median_idx]
+ out = np.zeros_like(total_power, dtype=float)
+ valid = total_power > 0
+ out[valid] = frq[median_idx[valid]]
+ return out📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| frq, pxx = _get_psd_in_window( | |
| data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg | |
| ) | |
| # calculate median frequency | |
| cumulative_psd = np.cumsum(pxx, axis=-1) | |
| total_power = cumulative_psd[:, :, -1] | |
| half = total_power[..., None] / 2 | |
| median_idx = np.argmax(cumulative_psd >= half, axis=-1) | |
| return frq[median_idx] | |
| frq, pxx = _get_psd_in_window( | |
| data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg | |
| ) | |
| if frq.size == 0: | |
| raise ValueError("Frequency bounds exclude the entire spectrum") | |
| # calculate median frequency | |
| cumulative_psd = np.cumsum(pxx, axis=-1) | |
| total_power = cumulative_psd[:, :, -1] | |
| half = total_power[..., None] / 2 | |
| median_idx = np.argmax(cumulative_psd >= half, axis=-1) | |
| out = np.zeros_like(total_power, dtype=float) | |
| valid = total_power > 0 | |
| out[valid] = frq[median_idx[valid]] | |
| return out |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dascore/transform/mean_median_frequency.py` around lines 410 - 420, The
median_frequency implementation should guard against empty or zero-power PSDs:
after calling _get_psd_in_window (frq, pxx), first handle the case where pxx has
no frequency bins (e.g., pxx.shape[-1] == 0) by returning an array of np.nan
matching the expected output shape; next compute cumulative_psd and total_power,
and where total_power == 0 set the median result to np.nan instead of using
argmax (similar to mean_frequency's handling). Use the existing names frq, pxx,
cumulative_psd, total_power, median_idx and ensure median_idx is only computed
for non-zero-power spectra so you don't index frq with invalid indices or report
fmin/DC when there is no spectral content.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
dascore/transform/mean_median_frequency.py (2)
151-151: 💤 Low valueRemove redundant import.
numpyis already imported at the module level (line 11). This inner import is unnecessary.Suggested fix
- import numpy as np - n = xx.shape[-1]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` at line 151, Remove the redundant local "import numpy as np" statement inside the module (present around line 151) — numpy is already imported at the top of dascore/transform/mean_median_frequency.py, so delete the inner import to avoid duplication and potential shadowing while leaving all uses of "np" unchanged.
1-5: 💤 Low valueUpdate module docstring to describe actual purpose.
The docstring still contains Spyder Editor boilerplate ("temporary script file") rather than describing the mean/median frequency transforms. Consider replacing with a meaningful module description.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` around lines 1 - 5, Replace the placeholder Spyder Editor docstring at the top of mean_median_frequency.py with a concise module docstring that describes the purpose and public API of this module (e.g., that it implements mean and median frequency transforms for signals), mentions the key functions/classes such as mean_frequency and median_frequency (or whichever public functions are defined in this file), documents expected inputs/outputs (signal array, frequency axis, return value), and notes any important behavior or dependencies; keep it short, informative, and aligned with the actual function names in the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 384-385: The default step behavior in median_frequency is
inconsistent with mean_frequency: change the median_frequency branch that
currently sets step = winlen when step is None to instead set step = dt so both
transforms default to the sampling interval; update any related docstring or
comment for median_frequency to reflect that step defaults to dt, and ensure
variable names used are the same as in mean_frequency (dt, winlen) to keep
output shapes consistent.
In `@tests/test_transform/test_mean_median_frequency.py`:
- Line 211: The test in TestMedianFrequency incorrectly calls
patch.mean_frequency(...) so median_frequency's transpose behavior isn't tested;
update the call where out = patch.mean_frequency(...) to call
patch.median_frequency(...) instead (keeping the same arguments and assertions)
so the TestMedianFrequency test actually exercises the median_frequency
function.
---
Nitpick comments:
In `@dascore/transform/mean_median_frequency.py`:
- Line 151: Remove the redundant local "import numpy as np" statement inside the
module (present around line 151) — numpy is already imported at the top of
dascore/transform/mean_median_frequency.py, so delete the inner import to avoid
duplication and potential shadowing while leaving all uses of "np" unchanged.
- Around line 1-5: Replace the placeholder Spyder Editor docstring at the top of
mean_median_frequency.py with a concise module docstring that describes the
purpose and public API of this module (e.g., that it implements mean and median
frequency transforms for signals), mentions the key functions/classes such as
mean_frequency and median_frequency (or whichever public functions are defined
in this file), documents expected inputs/outputs (signal array, frequency axis,
return value), and notes any important behavior or dependencies; keep it short,
informative, and aligned with the actual function names in the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6becbb98-c994-42a0-a0d0-f40e4a2176b6
📒 Files selected for processing (2)
dascore/transform/mean_median_frequency.pytests/test_transform/test_mean_median_frequency.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dascore/transform/mean_median_frequency.py (2)
179-184:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReplace
assertwith explicit exceptions.
assertstatements are removed when Python runs with optimization (-O), causing these validations to silently disappear. Use explicitValueErrorraises instead.Suggested fix
- assert ( - method.upper() in allowed_methods - ), f"Invalid method: '{method}'. Must be one of {allowed_methods}" + if method.upper() not in allowed_methods: + raise ValueError(f"Invalid method: '{method}'. Must be one of {allowed_methods}") if fmin is not None and fmax is not None: - assert fmin < fmax, f"fmin ({fmin}) must be smaller than fmax ({fmax})" + if fmin >= fmax: + raise ValueError(f"fmin ({fmin}) must be smaller than fmax ({fmax})")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` around lines 179 - 184, Replace the two assert checks with explicit ValueError raises so validation remains in optimized runs: for the method check, raise ValueError with the message "Invalid method: '{method}'. Must be one of {allowed_methods}" when method.upper() not in allowed_methods (use the same variables method and allowed_methods), and for the frequency bounds check, if fmin is not None and fmax is not None and not (fmin < fmax) raise ValueError with the message "fmin ({fmin}) must be smaller than fmax ({fmax})" (use the same fmin/fmax variables).
241-242:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocstring contradicts implementation for
stepdefault.The docstring states step "Defaults to
winlen", but line 275 setsstep = dt(the sampling interval) whenstep is None.Suggested fix
step - Step between windows in seconds. Defaults to `winlen`. + Step between windows in seconds. Defaults to the sampling interval (`dt`).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` around lines 241 - 242, Docstring and implementation disagree: the docstring for the step parameter says "Defaults to `winlen`" but the code sets step = dt when step is None; pick one behavior and make them consistent. Either change the implementation to set step = winlen when step is None (update the assignment where step is set) or update the docstring to state that step defaults to the sampling interval `dt`; adjust the text near the parameter description in mean_median_frequency.py so the parameter description for `step` accurately matches the assignment that uses `dt`, and ensure any references to `winlen` vs `dt` are consistent in that function.
🧹 Nitpick comments (4)
dascore/transform/mean_median_frequency.py (4)
327-327: 💤 Low valueUse
inttype hint fornperseg.Same as
mean_frequency-npersegrepresents a number of samples.Suggested fix
- nperseg: float = 256, + nperseg: int = 256,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` at line 327, The nperseg parameter in mean_median_frequency is incorrectly typed as float; change its annotation to int (matching mean_frequency) so nperseg: int = 256, and update any related type hints or docstring references in mean_median_frequency to reflect it represents a sample count rather than a float.
218-218: 💤 Low valueUse
inttype hint fornperseg.
npersegrepresents a number of samples and should be an integer.Suggested fix
- nperseg: float = 256, + nperseg: int = 256,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` at line 218, The parameter nperseg is annotated as float but represents a count of samples and should be an int; update the type hint for the nperseg parameter (in the mean/median frequency function in dascore.transform.mean_median_frequency where nperseg: float = 256 is declared) to use int (nperseg: int = 256) and ensure any internal usage or calls pass/expect an integer accordingly.
1-5: 💤 Low valueReplace placeholder module docstring.
The docstring contains leftover Spyder editor boilerplate. It should describe the module's purpose (e.g., "Rolling mean and median frequency transforms for DASCore patches").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` around lines 1 - 5, Replace the placeholder Spyder editor docstring at the top of the module with a real module-level docstring that describes the purpose and high-level functionality (e.g., "Rolling mean and median frequency transforms for DASCore patches"), mention the main utilities provided (rolling_mean_frequency, rolling_median_frequency if those functions exist) and any important inputs/outputs or expected data types so readers know what the module implements.
151-151: 💤 Low valueRemove redundant import.
numpyis already imported at the module level (line 11).Suggested fix
- import numpy as np - n = xx.shape[-1]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dascore/transform/mean_median_frequency.py` at line 151, Remove the redundant "import numpy as np" in mean_median_frequency.py (the extra import on line where the diff shows "import numpy as np"); the module already imports numpy at the top, so delete this duplicate import to avoid redundancy and potential linter warnings while leaving any uses of the symbol "np" elsewhere (e.g., inside functions or class methods in this module) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 179-184: Replace the two assert checks with explicit ValueError
raises so validation remains in optimized runs: for the method check, raise
ValueError with the message "Invalid method: '{method}'. Must be one of
{allowed_methods}" when method.upper() not in allowed_methods (use the same
variables method and allowed_methods), and for the frequency bounds check, if
fmin is not None and fmax is not None and not (fmin < fmax) raise ValueError
with the message "fmin ({fmin}) must be smaller than fmax ({fmax})" (use the
same fmin/fmax variables).
- Around line 241-242: Docstring and implementation disagree: the docstring for
the step parameter says "Defaults to `winlen`" but the code sets step = dt when
step is None; pick one behavior and make them consistent. Either change the
implementation to set step = winlen when step is None (update the assignment
where step is set) or update the docstring to state that step defaults to the
sampling interval `dt`; adjust the text near the parameter description in
mean_median_frequency.py so the parameter description for `step` accurately
matches the assignment that uses `dt`, and ensure any references to `winlen` vs
`dt` are consistent in that function.
---
Nitpick comments:
In `@dascore/transform/mean_median_frequency.py`:
- Line 327: The nperseg parameter in mean_median_frequency is incorrectly typed
as float; change its annotation to int (matching mean_frequency) so nperseg: int
= 256, and update any related type hints or docstring references in
mean_median_frequency to reflect it represents a sample count rather than a
float.
- Line 218: The parameter nperseg is annotated as float but represents a count
of samples and should be an int; update the type hint for the nperseg parameter
(in the mean/median frequency function in
dascore.transform.mean_median_frequency where nperseg: float = 256 is declared)
to use int (nperseg: int = 256) and ensure any internal usage or calls
pass/expect an integer accordingly.
- Around line 1-5: Replace the placeholder Spyder editor docstring at the top of
the module with a real module-level docstring that describes the purpose and
high-level functionality (e.g., "Rolling mean and median frequency transforms
for DASCore patches"), mention the main utilities provided
(rolling_mean_frequency, rolling_median_frequency if those functions exist) and
any important inputs/outputs or expected data types so readers know what the
module implements.
- Line 151: Remove the redundant "import numpy as np" in
mean_median_frequency.py (the extra import on line where the diff shows "import
numpy as np"); the module already imports numpy at the top, so delete this
duplicate import to avoid redundancy and potential linter warnings while leaving
any uses of the symbol "np" elsewhere (e.g., inside functions or class methods
in this module) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a09ff1a8-d6fd-4575-8c40-26c8bca071e3
📒 Files selected for processing (2)
dascore/transform/mean_median_frequency.pytests/test_transform/test_mean_median_frequency.py
* small tweaks for agents * address review
* add *args and **kwargs to rolling.apply * add example to apply for rolling
* add more details about making a release * address review
…as-wuestefeld/dascore_AWU into feature/dominant-frequency
d-chambers
left a comment
There was a problem hiding this comment.
Main suggestions here:
-
Consider broadening the API to permit any dimension name to work, eg for higher order patches with multiple time-like dimensions.
-
Decide if patches with 1 or 3+ dimensions are allowed. If so, rework the helper functions. If not, raise a error early on.
-
Consider renaming some of the opaque numpy/scipy style variables to something more informative unless it becomes overly verbose. Particularly, the function parameters would benefit from clear names, even if internal variables keep the abbreviated form.
-
See if
maybe_numba_jitcan come in handy here, especially for the_welchfunction.
| @@ -0,0 +1,418 @@ | |||
| """Rolling mean and median frequency transforms for DASCore patches""" | |||
There was a problem hiding this comment.
Maybe some maybe_numba_jit would be appropriate here too?
| if patch.get_axis("distance") != mfr_patch.get_axis("distance"): | ||
| mfr_patch = mfr_patch.transpose() |
There was a problem hiding this comment.
This is problematic because you didn't declare distance was required. You might have time and depth for example, then this would fail.
A better approach is to not depend on distance at all.
original_dims = patch.dims
... # do stuff here
out = out.transpose(*original_dims)| @patch_function(required_dims=("time",), history="full") | ||
| def mean_frequency( |
There was a problem hiding this comment.
You probably already know what I am going to say here, but...
I really don't like to restrict patch functions to a particular dimension when there isn't really a very good technical reason to do it. It is not just because there might be some theoretical (but bizarre) application to the distance axis.
Consider, for example, that in a higher-dimensional patch, there might be multiple time dimensions (eg absolte_time, shot_time, relative_time, lag_time, etc.). It could become vary awkward to use this function in such a case if the desired time axis isn't called exactly "time".
So, my suggestion here is like with the fbe function: use kwargs to indicate window length. Then we know the window length and the dimension from one parameter call.
| fmin: float | None = None, | ||
| fmax: float | None = None, | ||
| method: str = "welch", | ||
| nperseg: int = 256, |
There was a problem hiding this comment.
This parameter is not documented in your docstring. Also, it has a very opaque name which probably corresponds to the scipy/numpy name, but I find it a bit unhelpful. I suggest naming int something like
segment_samples, welsh_samples or spectral_window_samples.
| from dascore.utils.patch import patch_function | ||
|
|
||
|
|
||
| def _welch(xx, win, fs=1.0, nperseg=256): |
There was a problem hiding this comment.
Two suggestions here.
First, I know this is mainly adapted from scipy's function, but the variable naming in scipy is often less than clear. Perhaps we could give the public facing parameters some names that are a bit longer but more specific?
Second, this function assumes a 2d array as xx. We probably want to just refactor a bit to allow an array of any number of dimension (eg for higher order patches common in active source workflows).
Here are codex's suggestions to make this work. If this is too onerous let's add an explicit check before the code gets here to indicate only 2D patches are accepted.
- Stop indexing xx.shape[2].
n_samples = xx.shape[-1]
- Flatten all leading dimensions before the segment loop:
lead_shape = xx.shape[:-1]
x_flat = xx.reshape(-1, n_samples)
- Allocate PSD using flattened trace count:
n_traces = x_flat.shape[0]
nfreq = nfft // 2 + 1
psd = np.zeros((n_traces, nfreq), dtype=float)
- Remove the loop over xx.shape[1].
Current code loops over the middle “rolling window” axis:
for i in range(xx.shape[1]):
x = xx[:, i, :]
That is the 3D assumption. Instead, operate on all flattened traces together:
x = x_flat
- Compute each Welch segment for all traces at once:
for k, ind in enumerate(indices):
segment = x[:, ind : ind + nperseg]
segment = segment - np.mean(segment, axis=-1, keepdims=True)
xft = fftpack.rfft(segment * win, nfft)
...
- Accumulate/average PSD across segments.
You can keep the existing incremental average logic, but it should update pxx shaped (n_traces, nfreq) instead of (distance, nfreq).
- Keep separate even/odd FFT unpacking branches, but make them use x_flat.
Even branch:
pxx[:, (0, -1)] = xft[:, (0, -1)] ** 2
pxx[:, 1:-1] = xft[:, 1:-1:2] ** 2 + xft[:, 2::2] ** 2
Odd branch:
pxx[:, 0] = xft[:, 0] ** 2
pxx[:, 1:] = xft[:, 1::2] ** 2 + xft[:, 2::2] ** 2
- Apply scaling to pxx once after averaging.
- Reshape back to the original leading dimensions:
psd = pxx.reshape(lead_shape + (pxx.shape[-1],))
- Build frequency axis from nfft:
f = np.arange(pxx.shape[-1]) * (fs / nfft)
So structurally _welch should go from (only works for (a, b, samples):
psd = np.zeros((xx.shape[0], xx.shape[1], nfreq))
for i in range(xx.shape[1]):
x = xx[:, i, :]
...
psd[:, i, :] = pxx
to (works for (..., samples)):
lead_shape = xx.shape[:-1]
x = xx.reshape(-1, xx.shape[-1])
pxx = compute_welch_for_2d_array(x)
psd = pxx.reshape(lead_shape + (pxx.shape[-1],))
| if patch.get_axis("distance") != mfr_patch.get_axis("distance"): | ||
| mfr_patch = mfr_patch.transpose() | ||
|
|
||
| return mfr_patch.update(attrs={"data_type": "Mean Frequency", "data_units": "Hz"}) |
There was a problem hiding this comment.
If we no longer assume time index, data units should become inverse of current axis units.
coord = patch.get_coordinate(dim)
units = 1/dc.get_quantity(coord.units) if coord.units else None|
|
||
| # use only desired part of spectrum | ||
| inf1 = 0 if fmin is None else np.searchsorted(frq, fmin) | ||
| inf2 = -1 if fmax is None else np.searchsorted(frq, fmax) |
There was a problem hiding this comment.
As code rabbit caught, this drops the final frequency bin when fmax is None because it uses inf2 = -1. That omits the Nyquist bin from default calculations, biasing both mean and median frequency. You probably just want None as the slice stop when no upper bound is provided.
Some more transformations that have been useful for me in the past:
mean_frequency and median_frequency apply a rolling window and calculate a measure of the "dominant frequency"
Sometimes mean_frequency shows better what you expect from your data, sometimes median_frequency. Here are options for both.
Note that these functions make use of the "partial" function on the rolling_function.
this can be updated once the rolling_function allows for arguments directly
Checklist
I have (if applicable):
Summary by CodeRabbit
New Features
Tests