Skip to content

Feature/dominant frequency#685

Open
andreas-wuestefeld wants to merge 34 commits into
DASDAE:masterfrom
andreas-wuestefeld:feature/dominant-frequency
Open

Feature/dominant frequency#685
andreas-wuestefeld wants to merge 34 commits into
DASDAE:masterfrom
andreas-wuestefeld:feature/dominant-frequency

Conversation

@andreas-wuestefeld
Copy link
Copy Markdown
Contributor

@andreas-wuestefeld andreas-wuestefeld commented May 21, 2026

Some more transformations that have been useful for me in the past:
mean_frequency and median_frequency apply a rolling window and calculate a measure of the "dominant frequency"

Sometimes mean_frequency shows better what you expect from your data, sometimes median_frequency. Here are options for both.

Note that these functions make use of the "partial" function on the rolling_function.
this can be updated once the rolling_function allows for arguments directly

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings and/or appropriate doc page.
  • included tests. See testing guidelines.
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

Summary by CodeRabbit

  • New Features

    • Patch objects gain mean_frequency and median_frequency transforms for rolling-window spectral analysis with configurable window/step, optional fmin/fmax, and selectable PSD method; outputs include updated data type and units (Hz).
  • Tests

    • Added tests for PSD/FFT behavior, helper routines, frequency-bounds validation, output attributes, numerical stability, and axis/shape handling for the new transforms.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds rolling-window mean and median frequency transforms with PSD helpers, re-exports, Patch instance shortcuts, and tests.

Changes

Mean and Median Frequency Transforms

Layer / File(s) Summary
PSD computation infrastructure
dascore/transform/mean_median_frequency.py
_welch, _fft_psd, and _get_psd_in_window compute one-sided PSDs and slice frequency ranges.
Mean frequency transform
dascore/transform/mean_median_frequency.py
mean_frequency computes PSD-weighted mean frequency per rolling time window and sets data_type="Mean Frequency" and data_units="Hz".
Median frequency transform
dascore/transform/mean_median_frequency.py
median_frequency computes per-window median frequency via cumulative PSD and sets data_type="Median Frequency" and data_units="Hz".
Module and Patch API wiring
dascore/transform/__init__.py, dascore/core/patch.py
Re-exports mean_frequency/median_frequency from the transform package and adds Patch.mean_frequency and Patch.median_frequency instance shortcuts.
Test coverage
tests/test_transform/test_mean_median_frequency.py
Tests for _welch, _fft_psd, mean_frequency, and median_frequency, including shape checks, SciPy comparisons, FFT-scaling, attribute/dimension assertions, and frequency-bound validation.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Feature/dominant frequency' is vague and uses generic branch naming rather than describing the actual feature being added. Revise to a more specific title that clearly describes the feature, such as 'Add mean_frequency and median_frequency transforms' or 'Implement dominant frequency spectral analysis methods'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description explains the feature well and completes most checklist items, though the GitHub issue reference checkbox remains unchecked with no issue linked.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 98.97436% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.91%. Comparing base (f8a171a) to head (8971839).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
dascore/transform/mean_median_frequency.py 98.43% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #685      +/-   ##
==========================================
- Coverage   99.93%   99.91%   -0.02%     
==========================================
  Files         135      136       +1     
  Lines       11704    11868     +164     
==========================================
+ Hits        11696    11858     +162     
- Misses          8       10       +2     
Flag Coverage Δ
unittests 99.91% <98.97%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
tests/test_transform/test_mean_median_frequency.py (1)

11-124: ⚡ Quick win

Consider consolidating test classes to eliminate duplication.

The TestMeanFrequency and TestMedianFrequency classes are structurally identical, differing only in the method name and expected strings. This duplication increases maintenance burden and risks inconsistency.

♻️ Proposed refactor using pytest parametrization
-class TestMeanFrequency:
-    """Tests for mean_frequency patch function."""
+@pytest.mark.parametrize(
+    "method_name,expected_data_type",
+    [
+        ("mean_frequency", "Mean Frequency"),
+        ("median_frequency", "Median Frequency"),
+    ],
+)
+class TestFrequencyTransforms:
+    """Tests for mean_frequency and median_frequency patch functions."""
 
-    def test_runs(self, random_patch):
-        """Ensure mean_frequency runs as a patch method."""
-        out = random_patch.mean_frequency(winlen=0.01, step=0.01)
+    def test_runs(self, random_patch, method_name, expected_data_type):
+        """Ensure frequency transform runs as a patch method."""
+        method = getattr(random_patch, method_name)
+        out = method(winlen=0.01, step=0.01)
 
-        assert out.attrs.data_type == "Mean Frequency"
+        assert out.attrs.data_type == expected_data_type
         assert out.attrs.data_units == units.Hz
         assert "time" in out.dims
         assert "distance" in out.dims
 
-    def test_fft_method_runs(self, random_patch):
+    def test_fft_method_runs(self, random_patch, method_name, expected_data_type):
         """Ensure FFT method runs."""
-        out = random_patch.mean_frequency(winlen=0.01, step=0.01, method="fft")
+        method = getattr(random_patch, method_name)
+        out = method(winlen=0.01, step=0.01, method="fft")
 
-        assert out.attrs.data_type == "Mean Frequency"
+        assert out.attrs.data_type == expected_data_type
         assert out.attrs.data_units == units.Hz
         assert np.isfinite(out.data).any()
 
-    def test_step_defaults_to_winlen(self, random_patch):
+    def test_step_defaults_to_winlen(self, random_patch, method_name, expected_data_type):
         """Ensure step defaults to winlen."""
-        out = random_patch.mean_frequency(winlen=0.01)
-        expected = random_patch.mean_frequency(winlen=0.01, step=0.01)
+        method = getattr(random_patch, method_name)
+        out = method(winlen=0.01)
+        expected = method(winlen=0.01, step=0.01)
 
         assert np.allclose(out.data, expected.data, equal_nan=True)
 
-    def test_frequency_bounds(self, random_patch):
+    def test_frequency_bounds(self, random_patch, method_name, expected_data_type):
         """Ensure frequency bounds are accepted."""
-        out = random_patch.mean_frequency(
+        method = getattr(random_patch, method_name)
+        out = method(
             winlen=0.01,
             step=0.01,
             fmin=10,
             fmax=100,
         )
 
-        assert out.attrs.data_type == "Mean Frequency"
+        assert out.attrs.data_type == expected_data_type
         assert out.attrs.data_units == units.Hz
 
-    def test_invalid_frequency_bounds_raise(self, random_patch):
+    def test_invalid_frequency_bounds_raise(self, random_patch, method_name, expected_data_type):
         """Ensure invalid frequency bounds raise."""
+        method = getattr(random_patch, method_name)
         with pytest.raises(AssertionError, match="must be smaller than fmax"):
-            random_patch.mean_frequency(
+            method(
                 winlen=0.01,
                 step=0.01,
                 fmin=100,
                 fmax=10,
             )
 
-    def test_collapse_mode_runs(self, random_patch):
+    def test_collapse_mode_runs(self, random_patch, method_name, expected_data_type):
         """Ensure winlen=None collapses over the full time range."""
-        out = random_patch.mean_frequency()
+        method = getattr(random_patch, method_name)
+        out = method()
 
-        assert out.attrs.data_type == "Mean Frequency"
+        assert out.attrs.data_type == expected_data_type
         assert out.attrs.data_units == units.Hz
         assert out.data.shape != random_patch.data.shape
-
-
-class TestMedianFrequency:
-    """Tests for median_frequency patch function."""
-
-    def test_runs(self, random_patch):
-        """Ensure median_frequency runs as a patch method."""
-        out = random_patch.median_frequency(winlen=0.01, step=0.01)
-
-        assert out.attrs.data_type == "Median Frequency"
-        assert out.attrs.data_units == units.Hz
-        assert "time" in out.dims
-        assert "distance" in out.dims
-
-    def test_fft_method_runs(self, random_patch):
-        """Ensure FFT method runs."""
-        out = random_patch.median_frequency(winlen=0.01, step=0.01, method="fft")
-
-        assert out.attrs.data_type == "Median Frequency"
-        assert out.attrs.data_units == units.Hz
-        assert np.isfinite(out.data).any()
-
-    def test_step_defaults_to_winlen(self, random_patch):
-        """Ensure step defaults to winlen."""
-        out = random_patch.median_frequency(winlen=0.01)
-        expected = random_patch.median_frequency(winlen=0.01, step=0.01)
-
-        assert np.allclose(out.data, expected.data, equal_nan=True)
-
-    def test_frequency_bounds(self, random_patch):
-        """Ensure frequency bounds are accepted."""
-        out = random_patch.median_frequency(
-            winlen=0.01,
-            step=0.01,
-            fmin=10,
-            fmax=100,
-        )
-
-        assert out.attrs.data_type == "Median Frequency"
-        assert out.attrs.data_units == units.Hz
-
-    def test_invalid_frequency_bounds_raise(self, random_patch):
-        """Ensure invalid frequency bounds raise."""
-        with pytest.raises(AssertionError, match="must be smaller than fmax"):
-            random_patch.median_frequency(
-                winlen=0.01,
-                step=0.01,
-                fmin=100,
-                fmax=10,
-            )
-
-    def test_collapse_mode_runs(self, random_patch):
-        """Ensure winlen=None collapses over the full time range."""
-        out = random_patch.median_frequency()
-
-        assert out.attrs.data_type == "Median Frequency"
-        assert out.attrs.data_units == units.Hz
-        assert out.data.shape != random_patch.data.shape
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_transform/test_mean_median_frequency.py` around lines 11 - 124,
Duplicate test classes TestMeanFrequency and TestMedianFrequency should be
merged using pytest parametrization: replace the two classes with a single set
of parametrized tests that iterate over method names ("mean_frequency",
"median_frequency") and expected attrs ("Mean Frequency", "Median Frequency"),
calling the function via getattr(random_patch, method_name); update each test
(runs, fft_method_runs, step_defaults_to_winlen, frequency_bounds,
invalid_frequency_bounds_raise, collapse_mode_runs) to accept the parameters and
use getattr to invoke the method and assert expected attrs and behavior (e.g.,
compare data, check units.Hz, check exceptions), keeping the same test logic but
removing duplicated code in TestMeanFrequency and TestMedianFrequency.
dascore/transform/mean_median_frequency.py (1)

173-179: ⚡ Quick win

Use explicit exceptions for argument validation.

These checks are guarding public inputs, so assert can disappear under optimized Python and let invalid method / frequency bounds fall through into misleading errors or empty-spectrum behavior. Raise ValueError here instead.

Suggested change
-    assert (
-        method.upper() in allowed_methods
-    ), f"Invalid method: '{method}'. Must be one of {allowed_methods}"
+    if method.upper() not in allowed_methods:
+        raise ValueError(
+            f"Invalid method: '{method}'. Must be one of {allowed_methods}"
+        )

-    if fmin is not None and fmax is not None:
-        assert fmin < fmax, f"fmin ({fmin}) must be smaller than fmax ({fmax})"
+    if fmin is not None and fmax is not None and fmin >= fmax:
+        raise ValueError(f"fmin ({fmin}) must be smaller than fmax ({fmax})")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 173 - 179, Replace
the use of assert for validating public inputs with explicit exceptions: where
the code checks method membership against allowed_methods (using method.upper()
in allowed_methods) raise a ValueError with the same message instead of using
assert, and likewise replace the assert that verifies fmin < fmax with a
ValueError describing the invalid bounds; update the validation logic
surrounding the allowed_methods check and the fmin/fmax check in
mean_median_frequency.py so callers get a clear ValueError when inputs are
invalid.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 389-390: The example incorrectly calls patch.mean_frequency(...)
while documenting the median-frequency transform; change the example to call
patch.median_frequency(...) (and adjust the variable name if you want to reflect
median, e.g., med -> medf or median) so the documented example uses the correct
function (median_frequency) and matches the surrounding documentation.
- Around line 183-187: The code unconditionally resets nperseg to
data.shape[-1], ignoring any public nperseg parameter; change the logic in
mean_median_frequency (around the get_window/_welch call) to only set nperseg =
data.shape[-1] when the function parameter nperseg is None, validate that
provided nperseg <= data.shape[-1] (raise ValueError otherwise), then create win
= get_window("hann", nperseg) and call _welch(data, win, (1/dt),
nperseg=nperseg) so the public nperseg parameter is honored.
- Around line 410-420: The median_frequency implementation should guard against
empty or zero-power PSDs: after calling _get_psd_in_window (frq, pxx), first
handle the case where pxx has no frequency bins (e.g., pxx.shape[-1] == 0) by
returning an array of np.nan matching the expected output shape; next compute
cumulative_psd and total_power, and where total_power == 0 set the median result
to np.nan instead of using argmax (similar to mean_frequency's handling). Use
the existing names frq, pxx, cumulative_psd, total_power, median_idx and ensure
median_idx is only computed for non-zero-power spectra so you don't index frq
with invalid indices or report fmin/DC when there is no spectral content.
- Around line 196-200: The current logic sets inf2 = -1 when fmax is None which
causes pxx = pxx[:, :, inf1:inf2] and frq = frq[inf1:inf2] to drop the final
(Nyquist) bin; change the inf2 assignment so the slice includes the end (e.g.
inf2 = None if fmax is None else np.searchsorted(frq, fmax) or inf2 = frq.size
when fmax is None) so pxx and frq slicing (pxx[:, :, inf1:inf2], frq[inf1:inf2])
does not exclude the last frequency bin.

---

Nitpick comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 173-179: Replace the use of assert for validating public inputs
with explicit exceptions: where the code checks method membership against
allowed_methods (using method.upper() in allowed_methods) raise a ValueError
with the same message instead of using assert, and likewise replace the assert
that verifies fmin < fmax with a ValueError describing the invalid bounds;
update the validation logic surrounding the allowed_methods check and the
fmin/fmax check in mean_median_frequency.py so callers get a clear ValueError
when inputs are invalid.

In `@tests/test_transform/test_mean_median_frequency.py`:
- Around line 11-124: Duplicate test classes TestMeanFrequency and
TestMedianFrequency should be merged using pytest parametrization: replace the
two classes with a single set of parametrized tests that iterate over method
names ("mean_frequency", "median_frequency") and expected attrs ("Mean
Frequency", "Median Frequency"), calling the function via getattr(random_patch,
method_name); update each test (runs, fft_method_runs, step_defaults_to_winlen,
frequency_bounds, invalid_frequency_bounds_raise, collapse_mode_runs) to accept
the parameters and use getattr to invoke the method and assert expected attrs
and behavior (e.g., compare data, check units.Hz, check exceptions), keeping the
same test logic but removing duplicated code in TestMeanFrequency and
TestMedianFrequency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c864e698-43e0-4de3-9945-0007fd0a41ad

📥 Commits

Reviewing files that changed from the base of the PR and between f8a171a and 879caa0.

📒 Files selected for processing (4)
  • dascore/core/patch.py
  • dascore/transform/__init__.py
  • dascore/transform/mean_median_frequency.py
  • tests/test_transform/test_mean_median_frequency.py

Comment on lines +183 to +187
nperseg = data.shape[-1]
win = get_window("hann", nperseg)

if method.upper() == "WELCH":
frq, pxx = _welch(data, win, (1 / dt), nperseg=nperseg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

nperseg is currently ignored for Welch PSD.

nperseg gets unconditionally reset to the full window length here, so the public parameter never affects the result and "welch" collapses to a single full-window segment. That materially changes the estimator and makes the API contract misleading.

Suggested change
-    nperseg = data.shape[-1]
+    if nperseg is None:
+        nperseg = data.shape[-1]
+    else:
+        nperseg = min(int(nperseg), data.shape[-1])
     win = get_window("hann", nperseg)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
nperseg = data.shape[-1]
win = get_window("hann", nperseg)
if method.upper() == "WELCH":
frq, pxx = _welch(data, win, (1 / dt), nperseg=nperseg)
if nperseg is None:
nperseg = data.shape[-1]
else:
nperseg = min(int(nperseg), data.shape[-1])
win = get_window("hann", nperseg)
if method.upper() == "WELCH":
frq, pxx = _welch(data, win, (1 / dt), nperseg=nperseg)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 183 - 187, The code
unconditionally resets nperseg to data.shape[-1], ignoring any public nperseg
parameter; change the logic in mean_median_frequency (around the
get_window/_welch call) to only set nperseg = data.shape[-1] when the function
parameter nperseg is None, validate that provided nperseg <= data.shape[-1]
(raise ValueError otherwise), then create win = get_window("hann", nperseg) and
call _welch(data, win, (1/dt), nperseg=nperseg) so the public nperseg parameter
is honored.

Comment on lines +196 to +200
inf1 = 0 if fmin is None else np.searchsorted(frq, fmin)
inf2 = -1 if fmax is None else np.searchsorted(frq, fmax)

pxx = pxx[:, :, inf1:inf2]
frq = frq[inf1:inf2]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The default slice drops the last frequency bin.

When fmax is None, inf2 = -1 excludes the Nyquist bin from every default call. That biases both mean and median frequency downward.

Suggested change
     inf1 = 0 if fmin is None else np.searchsorted(frq, fmin)
-    inf2 = -1 if fmax is None else np.searchsorted(frq, fmax)
+    inf2 = None if fmax is None else np.searchsorted(frq, fmax, side="right")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
inf1 = 0 if fmin is None else np.searchsorted(frq, fmin)
inf2 = -1 if fmax is None else np.searchsorted(frq, fmax)
pxx = pxx[:, :, inf1:inf2]
frq = frq[inf1:inf2]
inf1 = 0 if fmin is None else np.searchsorted(frq, fmin)
inf2 = None if fmax is None else np.searchsorted(frq, fmax, side="right")
pxx = pxx[:, :, inf1:inf2]
frq = frq[inf1:inf2]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 196 - 200, The
current logic sets inf2 = -1 when fmax is None which causes pxx = pxx[:, :,
inf1:inf2] and frq = frq[inf1:inf2] to drop the final (Nyquist) bin; change the
inf2 assignment so the slice includes the end (e.g. inf2 = None if fmax is None
else np.searchsorted(frq, fmax) or inf2 = frq.size when fmax is None) so pxx and
frq slicing (pxx[:, :, inf1:inf2], frq[inf1:inf2]) does not exclude the last
frequency bin.

Comment thread dascore/transform/mean_median_frequency.py Outdated
Comment on lines +410 to +420
frq, pxx = _get_psd_in_window(
data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg
)

# calculate median frequency
cumulative_psd = np.cumsum(pxx, axis=-1)
total_power = cumulative_psd[:, :, -1]
half = total_power[..., None] / 2
median_idx = np.argmax(cumulative_psd >= half, axis=-1)

return frq[median_idx]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard empty and zero-power spectra before computing the median bin.

If the frequency filter removes all bins, cumulative_psd[:, :, -1] raises immediately. If total power is zero, argmax returns index 0 and reports fmin/DC as the median despite there being no spectral content. mean_frequency already handles the zero-power case; median_frequency should too.

Suggested change
         frq, pxx = _get_psd_in_window(
             data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg
         )
+        if frq.size == 0:
+            raise ValueError("Frequency bounds exclude the entire spectrum")

         # calculate median frequency
         cumulative_psd = np.cumsum(pxx, axis=-1)
         total_power = cumulative_psd[:, :, -1]
         half = total_power[..., None] / 2
         median_idx = np.argmax(cumulative_psd >= half, axis=-1)

-        return frq[median_idx]
+        out = np.zeros_like(total_power, dtype=float)
+        valid = total_power > 0
+        out[valid] = frq[median_idx[valid]]
+        return out
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
frq, pxx = _get_psd_in_window(
data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg
)
# calculate median frequency
cumulative_psd = np.cumsum(pxx, axis=-1)
total_power = cumulative_psd[:, :, -1]
half = total_power[..., None] / 2
median_idx = np.argmax(cumulative_psd >= half, axis=-1)
return frq[median_idx]
frq, pxx = _get_psd_in_window(
data, dt=dt, method=method, fmin=fmin, fmax=fmax, nperseg=nperseg
)
if frq.size == 0:
raise ValueError("Frequency bounds exclude the entire spectrum")
# calculate median frequency
cumulative_psd = np.cumsum(pxx, axis=-1)
total_power = cumulative_psd[:, :, -1]
half = total_power[..., None] / 2
median_idx = np.argmax(cumulative_psd >= half, axis=-1)
out = np.zeros_like(total_power, dtype=float)
valid = total_power > 0
out[valid] = frq[median_idx[valid]]
return out
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 410 - 420, The
median_frequency implementation should guard against empty or zero-power PSDs:
after calling _get_psd_in_window (frq, pxx), first handle the case where pxx has
no frequency bins (e.g., pxx.shape[-1] == 0) by returning an array of np.nan
matching the expected output shape; next compute cumulative_psd and total_power,
and where total_power == 0 set the median result to np.nan instead of using
argmax (similar to mean_frequency's handling). Use the existing names frq, pxx,
cumulative_psd, total_power, median_idx and ensure median_idx is only computed
for non-zero-power spectra so you don't index frq with invalid indices or report
fmin/DC when there is no spectral content.

@andreas-wuestefeld andreas-wuestefeld added patch related to Patch class transform Related to transform operations labels May 21, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
dascore/transform/mean_median_frequency.py (2)

151-151: 💤 Low value

Remove redundant import.

numpy is already imported at the module level (line 11). This inner import is unnecessary.

Suggested fix
-    import numpy as np
-
     n = xx.shape[-1]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` at line 151, Remove the redundant
local "import numpy as np" statement inside the module (present around line 151)
— numpy is already imported at the top of
dascore/transform/mean_median_frequency.py, so delete the inner import to avoid
duplication and potential shadowing while leaving all uses of "np" unchanged.

1-5: 💤 Low value

Update module docstring to describe actual purpose.

The docstring still contains Spyder Editor boilerplate ("temporary script file") rather than describing the mean/median frequency transforms. Consider replacing with a meaningful module description.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 1 - 5, Replace the
placeholder Spyder Editor docstring at the top of mean_median_frequency.py with
a concise module docstring that describes the purpose and public API of this
module (e.g., that it implements mean and median frequency transforms for
signals), mentions the key functions/classes such as mean_frequency and
median_frequency (or whichever public functions are defined in this file),
documents expected inputs/outputs (signal array, frequency axis, return value),
and notes any important behavior or dependencies; keep it short, informative,
and aligned with the actual function names in the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 384-385: The default step behavior in median_frequency is
inconsistent with mean_frequency: change the median_frequency branch that
currently sets step = winlen when step is None to instead set step = dt so both
transforms default to the sampling interval; update any related docstring or
comment for median_frequency to reflect that step defaults to dt, and ensure
variable names used are the same as in mean_frequency (dt, winlen) to keep
output shapes consistent.

In `@tests/test_transform/test_mean_median_frequency.py`:
- Line 211: The test in TestMedianFrequency incorrectly calls
patch.mean_frequency(...) so median_frequency's transpose behavior isn't tested;
update the call where out = patch.mean_frequency(...) to call
patch.median_frequency(...) instead (keeping the same arguments and assertions)
so the TestMedianFrequency test actually exercises the median_frequency
function.

---

Nitpick comments:
In `@dascore/transform/mean_median_frequency.py`:
- Line 151: Remove the redundant local "import numpy as np" statement inside the
module (present around line 151) — numpy is already imported at the top of
dascore/transform/mean_median_frequency.py, so delete the inner import to avoid
duplication and potential shadowing while leaving all uses of "np" unchanged.
- Around line 1-5: Replace the placeholder Spyder Editor docstring at the top of
mean_median_frequency.py with a concise module docstring that describes the
purpose and public API of this module (e.g., that it implements mean and median
frequency transforms for signals), mentions the key functions/classes such as
mean_frequency and median_frequency (or whichever public functions are defined
in this file), documents expected inputs/outputs (signal array, frequency axis,
return value), and notes any important behavior or dependencies; keep it short,
informative, and aligned with the actual function names in the file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6becbb98-c994-42a0-a0d0-f40e4a2176b6

📥 Commits

Reviewing files that changed from the base of the PR and between 879caa0 and 94b0e19.

📒 Files selected for processing (2)
  • dascore/transform/mean_median_frequency.py
  • tests/test_transform/test_mean_median_frequency.py

Comment thread dascore/transform/mean_median_frequency.py Outdated
Comment thread tests/test_transform/test_mean_median_frequency.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dascore/transform/mean_median_frequency.py (2)

179-184: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace assert with explicit exceptions.

assert statements are removed when Python runs with optimization (-O), causing these validations to silently disappear. Use explicit ValueError raises instead.

Suggested fix
-    assert (
-        method.upper() in allowed_methods
-    ), f"Invalid method: '{method}'. Must be one of {allowed_methods}"
+    if method.upper() not in allowed_methods:
+        raise ValueError(f"Invalid method: '{method}'. Must be one of {allowed_methods}")

     if fmin is not None and fmax is not None:
-        assert fmin < fmax, f"fmin ({fmin}) must be smaller than fmax ({fmax})"
+        if fmin >= fmax:
+            raise ValueError(f"fmin ({fmin}) must be smaller than fmax ({fmax})")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 179 - 184, Replace
the two assert checks with explicit ValueError raises so validation remains in
optimized runs: for the method check, raise ValueError with the message "Invalid
method: '{method}'. Must be one of {allowed_methods}" when method.upper() not in
allowed_methods (use the same variables method and allowed_methods), and for the
frequency bounds check, if fmin is not None and fmax is not None and not (fmin <
fmax) raise ValueError with the message "fmin ({fmin}) must be smaller than fmax
({fmax})" (use the same fmin/fmax variables).

241-242: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Docstring contradicts implementation for step default.

The docstring states step "Defaults to winlen", but line 275 sets step = dt (the sampling interval) when step is None.

Suggested fix
     step
-        Step between windows in seconds. Defaults to `winlen`.
+        Step between windows in seconds. Defaults to the sampling interval (`dt`).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 241 - 242, Docstring
and implementation disagree: the docstring for the step parameter says "Defaults
to `winlen`" but the code sets step = dt when step is None; pick one behavior
and make them consistent. Either change the implementation to set step = winlen
when step is None (update the assignment where step is set) or update the
docstring to state that step defaults to the sampling interval `dt`; adjust the
text near the parameter description in mean_median_frequency.py so the parameter
description for `step` accurately matches the assignment that uses `dt`, and
ensure any references to `winlen` vs `dt` are consistent in that function.
🧹 Nitpick comments (4)
dascore/transform/mean_median_frequency.py (4)

327-327: 💤 Low value

Use int type hint for nperseg.

Same as mean_frequency - nperseg represents a number of samples.

Suggested fix
-    nperseg: float = 256,
+    nperseg: int = 256,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` at line 327, The nperseg
parameter in mean_median_frequency is incorrectly typed as float; change its
annotation to int (matching mean_frequency) so nperseg: int = 256, and update
any related type hints or docstring references in mean_median_frequency to
reflect it represents a sample count rather than a float.

218-218: 💤 Low value

Use int type hint for nperseg.

nperseg represents a number of samples and should be an integer.

Suggested fix
-    nperseg: float = 256,
+    nperseg: int = 256,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` at line 218, The parameter
nperseg is annotated as float but represents a count of samples and should be an
int; update the type hint for the nperseg parameter (in the mean/median
frequency function in dascore.transform.mean_median_frequency where nperseg:
float = 256 is declared) to use int (nperseg: int = 256) and ensure any internal
usage or calls pass/expect an integer accordingly.

1-5: 💤 Low value

Replace placeholder module docstring.

The docstring contains leftover Spyder editor boilerplate. It should describe the module's purpose (e.g., "Rolling mean and median frequency transforms for DASCore patches").

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` around lines 1 - 5, Replace the
placeholder Spyder editor docstring at the top of the module with a real
module-level docstring that describes the purpose and high-level functionality
(e.g., "Rolling mean and median frequency transforms for DASCore patches"),
mention the main utilities provided (rolling_mean_frequency,
rolling_median_frequency if those functions exist) and any important
inputs/outputs or expected data types so readers know what the module
implements.

151-151: 💤 Low value

Remove redundant import.

numpy is already imported at the module level (line 11).

Suggested fix
-    import numpy as np
-
     n = xx.shape[-1]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dascore/transform/mean_median_frequency.py` at line 151, Remove the redundant
"import numpy as np" in mean_median_frequency.py (the extra import on line where
the diff shows "import numpy as np"); the module already imports numpy at the
top, so delete this duplicate import to avoid redundancy and potential linter
warnings while leaving any uses of the symbol "np" elsewhere (e.g., inside
functions or class methods in this module) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@dascore/transform/mean_median_frequency.py`:
- Around line 179-184: Replace the two assert checks with explicit ValueError
raises so validation remains in optimized runs: for the method check, raise
ValueError with the message "Invalid method: '{method}'. Must be one of
{allowed_methods}" when method.upper() not in allowed_methods (use the same
variables method and allowed_methods), and for the frequency bounds check, if
fmin is not None and fmax is not None and not (fmin < fmax) raise ValueError
with the message "fmin ({fmin}) must be smaller than fmax ({fmax})" (use the
same fmin/fmax variables).
- Around line 241-242: Docstring and implementation disagree: the docstring for
the step parameter says "Defaults to `winlen`" but the code sets step = dt when
step is None; pick one behavior and make them consistent. Either change the
implementation to set step = winlen when step is None (update the assignment
where step is set) or update the docstring to state that step defaults to the
sampling interval `dt`; adjust the text near the parameter description in
mean_median_frequency.py so the parameter description for `step` accurately
matches the assignment that uses `dt`, and ensure any references to `winlen` vs
`dt` are consistent in that function.

---

Nitpick comments:
In `@dascore/transform/mean_median_frequency.py`:
- Line 327: The nperseg parameter in mean_median_frequency is incorrectly typed
as float; change its annotation to int (matching mean_frequency) so nperseg: int
= 256, and update any related type hints or docstring references in
mean_median_frequency to reflect it represents a sample count rather than a
float.
- Line 218: The parameter nperseg is annotated as float but represents a count
of samples and should be an int; update the type hint for the nperseg parameter
(in the mean/median frequency function in
dascore.transform.mean_median_frequency where nperseg: float = 256 is declared)
to use int (nperseg: int = 256) and ensure any internal usage or calls
pass/expect an integer accordingly.
- Around line 1-5: Replace the placeholder Spyder editor docstring at the top of
the module with a real module-level docstring that describes the purpose and
high-level functionality (e.g., "Rolling mean and median frequency transforms
for DASCore patches"), mention the main utilities provided
(rolling_mean_frequency, rolling_median_frequency if those functions exist) and
any important inputs/outputs or expected data types so readers know what the
module implements.
- Line 151: Remove the redundant "import numpy as np" in
mean_median_frequency.py (the extra import on line where the diff shows "import
numpy as np"); the module already imports numpy at the top, so delete this
duplicate import to avoid redundancy and potential linter warnings while leaving
any uses of the symbol "np" elsewhere (e.g., inside functions or class methods
in this module) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a09ff1a8-d6fd-4575-8c40-26c8bca071e3

📥 Commits

Reviewing files that changed from the base of the PR and between 94b0e19 and 43b8bcf.

📒 Files selected for processing (2)
  • dascore/transform/mean_median_frequency.py
  • tests/test_transform/test_mean_median_frequency.py

Copy link
Copy Markdown
Contributor

@d-chambers d-chambers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main suggestions here:

  1. Consider broadening the API to permit any dimension name to work, eg for higher order patches with multiple time-like dimensions.

  2. Decide if patches with 1 or 3+ dimensions are allowed. If so, rework the helper functions. If not, raise a error early on.

  3. Consider renaming some of the opaque numpy/scipy style variables to something more informative unless it becomes overly verbose. Particularly, the function parameters would benefit from clear names, even if internal variables keep the abbreviated form.

  4. See if maybe_numba_jit can come in handy here, especially for the _welch function.

@@ -0,0 +1,418 @@
"""Rolling mean and median frequency transforms for DASCore patches"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some maybe_numba_jit would be appropriate here too?

Comment on lines +308 to +309
if patch.get_axis("distance") != mfr_patch.get_axis("distance"):
mfr_patch = mfr_patch.transpose()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic because you didn't declare distance was required. You might have time and depth for example, then this would fail.

A better approach is to not depend on distance at all.

original_dims = patch.dims
... # do stuff here
  out = out.transpose(*original_dims)

Comment on lines +205 to +206
@patch_function(required_dims=("time",), history="full")
def mean_frequency(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably already know what I am going to say here, but...

I really don't like to restrict patch functions to a particular dimension when there isn't really a very good technical reason to do it. It is not just because there might be some theoretical (but bizarre) application to the distance axis.

Consider, for example, that in a higher-dimensional patch, there might be multiple time dimensions (eg absolte_time, shot_time, relative_time, lag_time, etc.). It could become vary awkward to use this function in such a case if the desired time axis isn't called exactly "time".

So, my suggestion here is like with the fbe function: use kwargs to indicate window length. Then we know the window length and the dimension from one parameter call.

fmin: float | None = None,
fmax: float | None = None,
method: str = "welch",
nperseg: int = 256,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter is not documented in your docstring. Also, it has a very opaque name which probably corresponds to the scipy/numpy name, but I find it a bit unhelpful. I suggest naming int something like
segment_samples, welsh_samples or spectral_window_samples.

from dascore.utils.patch import patch_function


def _welch(xx, win, fs=1.0, nperseg=256):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions here.

First, I know this is mainly adapted from scipy's function, but the variable naming in scipy is often less than clear. Perhaps we could give the public facing parameters some names that are a bit longer but more specific?

Second, this function assumes a 2d array as xx. We probably want to just refactor a bit to allow an array of any number of dimension (eg for higher order patches common in active source workflows).

Here are codex's suggestions to make this work. If this is too onerous let's add an explicit check before the code gets here to indicate only 2D patches are accepted.

  1. Stop indexing xx.shape[2].

n_samples = xx.shape[-1]

  1. Flatten all leading dimensions before the segment loop:

lead_shape = xx.shape[:-1]
x_flat = xx.reshape(-1, n_samples)

  1. Allocate PSD using flattened trace count:

n_traces = x_flat.shape[0]
nfreq = nfft // 2 + 1
psd = np.zeros((n_traces, nfreq), dtype=float)

  1. Remove the loop over xx.shape[1].

Current code loops over the middle “rolling window” axis:

for i in range(xx.shape[1]):
x = xx[:, i, :]

That is the 3D assumption. Instead, operate on all flattened traces together:

x = x_flat

  1. Compute each Welch segment for all traces at once:

for k, ind in enumerate(indices):
segment = x[:, ind : ind + nperseg]
segment = segment - np.mean(segment, axis=-1, keepdims=True)
xft = fftpack.rfft(segment * win, nfft)
...

  1. Accumulate/average PSD across segments.

You can keep the existing incremental average logic, but it should update pxx shaped (n_traces, nfreq) instead of (distance, nfreq).

  1. Keep separate even/odd FFT unpacking branches, but make them use x_flat.

Even branch:

pxx[:, (0, -1)] = xft[:, (0, -1)] ** 2
pxx[:, 1:-1] = xft[:, 1:-1:2] ** 2 + xft[:, 2::2] ** 2

Odd branch:

pxx[:, 0] = xft[:, 0] ** 2
pxx[:, 1:] = xft[:, 1::2] ** 2 + xft[:, 2::2] ** 2

  1. Apply scaling to pxx once after averaging.
  2. Reshape back to the original leading dimensions:

psd = pxx.reshape(lead_shape + (pxx.shape[-1],))

  1. Build frequency axis from nfft:

f = np.arange(pxx.shape[-1]) * (fs / nfft)

So structurally _welch should go from (only works for (a, b, samples):

psd = np.zeros((xx.shape[0], xx.shape[1], nfreq))
for i in range(xx.shape[1]):
x = xx[:, i, :]
...
psd[:, i, :] = pxx

to (works for (..., samples)):

lead_shape = xx.shape[:-1]
x = xx.reshape(-1, xx.shape[-1])
pxx = compute_welch_for_2d_array(x)
psd = pxx.reshape(lead_shape + (pxx.shape[-1],))

if patch.get_axis("distance") != mfr_patch.get_axis("distance"):
mfr_patch = mfr_patch.transpose()

return mfr_patch.update(attrs={"data_type": "Mean Frequency", "data_units": "Hz"})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we no longer assume time index, data units should become inverse of current axis units.

coord = patch.get_coordinate(dim)
units = 1/dc.get_quantity(coord.units) if coord.units else None


# use only desired part of spectrum
inf1 = 0 if fmin is None else np.searchsorted(frq, fmin)
inf2 = -1 if fmax is None else np.searchsorted(frq, fmax)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As code rabbit caught, this drops the final frequency bin when fmax is None because it uses inf2 = -1. That omits the Nyquist bin from default calculations, biasing both mean and median frequency. You probably just want None as the slice stop when no upper bound is provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch related to Patch class transform Related to transform operations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants