Skip to content

siardv/weasel

Repository files navigation

weasel

R-CMD-check

Wave-based Extraction and Selection for Longitudinal Data.

Tools for selecting, filtering, and balancing longitudinal panel data across survey waves. A respondent counts as observed at a wave when a row with that (id, wave) pair exists in the long-format data; a missed wave is an absent row.

Installation

# install.packages("remotes")
remotes::install_github("siardv/weasel", build_vignettes = TRUE)

Quick start

library(weasel)

d <- generate_weasel_dummy_data(n_ids = 300, n_times = 12, seed = 1)

# scenario planning
p <- weasel_plan(d, id = "id", wave = "time", span = "core")
p  # compact print: span, scenario table, attached data size

cmp <- weasel_compare_scenarios(p)
weasel_print_table(cmp, title = "Scenarios")

analysis_data <- weasel_apply(p, "anchored_balanced")
cat(weasel_justify_subset(p, "anchored_balanced"))

# audit the selection
weasel_print_table(weasel_sensitivity(p, max_missing = 0:2), n = 10)
weasel_print_table(weasel_selectivity(p, "anchored_balanced"))

# interactive pattern exploration
set_weasel_scope(d, "id", "time", gap = 1)
evaluate_weasel_scope()
weasel_reshape_to_wide()
weasel_summarize_waves()
weasel_print_table(weasel_filter_wave_summary(), n = 10)
weasel_scope_info()
weasel_clear_scope()

Panels with non-consecutive schedules (biennial waves, waves recorded as years) are supported through grid = "observed" in weasel_plan() and set_weasel_scope().

See the guide for a full walkthrough, or the vignettes vignette("introduction", package = "weasel") and vignette("advanced-usage", package = "weasel").

Two pipelines

Pipeline Entry point Purpose
Scope set_weasel_scope() Interactive exploration of wave patterns
Plan weasel_plan() Named, comparable, defensible selection scenarios

Both share the same structural vocabulary: endpoints (observed first and last wave of the window) and interior gaps (runs of missing waves strictly between a respondent's first and last observed wave).

Diagnostics

  • weasel_sensitivity() sweeps the selection tolerances and reports the retained sample size for every combination.
  • weasel_selectivity() compares retained and excluded respondents on covariates (standardized mean differences) to check whether completeness-based selection skews the sample.
  • weasel_scope_info() prints the state of the active scope.

Options

options(weasel.verbose = FALSE) silences all status messages.

A note on how this package was built

I started building weasel in June 2021, before AI coding assistants were a realistic option, and it has been a constant companion project ever since. The problem it addresses, the selection logic, the architecture, and the design decisions grew out of five years of building, discarding, and rebuilding.

I also want to be open about the fact that AI language models (including Anthropic's Claude) contributed to later versions. I used them as assistants, not as authors: to review code, stress-test logic, propose refactorings, draft tests and documentation, and speed up the grueling parts of package development. Nothing was accepted on trust. Every suggestion was read, questioned, run, and frequently rejected or rewritten; whatever ships has passed the full test suite and R CMD check, and responsibility for every line, including the mistakes, is mine alone.

weasel exists to make selection decisions in panel data explicit instead of silent. It seems only consistent to be equally explicit about how the package itself was made. If you have questions about any part of that process, the issue tracker is open.

License

MIT

About

Wave-based Extraction and Selection for Longitudinal Data (R package and guide)

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages