
rep_slice_sample on groups with multiple n values #527

Description

@adrie-stclair

Hello package maintainers!
I am building bootstrap confidence intervals for groups, and I'm having trouble creating the multiple resampled datasets from which to build them.

Using the palmerpenguins library as an example:

library(tidyverse)
library(infer)
library(palmerpenguins)

There are 344 total observations and each species has a different number of observations:

nrow(penguins)
# [1] 344

penguins %>% group_by(species) %>% count()

# A tibble: 3 × 2
# Groups:   species [3]
#   species       n
#   <fct>     <int>
# 1 Adelie      152
# 2 Chinstrap    68
# 3 Gentoo      124

I want to group by species and, for each species, draw multiple samples that each keep that group's original number of observations.

set.seed(100)

slices <- penguins %>% 
    group_by(species) %>% 
    rep_slice_sample(prop = 1, replace = TRUE, reps = 10)

That should give me 344 * 10 = 3440 rows in the full new data set, and it does. But when you look at the data, each replicate has a different number of observations per species. In every replicate, n should be 152 for Adelie, 68 for Chinstrap, and 124 for Gentoo. Instead we find this:

slices %>% group_by(species, replicate) %>% count()

# A tibble: 30 × 3
# Groups:   species, replicate [30]
#    species replicate     n
#    <fct>       <int> <int>
#  1 Adelie          1   148
#  2 Adelie          2   147
#  3 Adelie          3   148
#  4 Adelie          4   151
#  5 Adelie          5   138
#  6 Adelie          6   157
#  7 Adelie          7   161
#  8 Adelie          8   157
#  9 Adelie          9   151
# 10 Adelie         10   138
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows
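
For comparison, here is a minimal sketch of the result I expected, built with dplyr::slice_sample() instead of infer. It assumes that slice_sample() samples within each group of a grouped data frame. The name `expected_slices` and the lapply() loop are only illustrative and are not part of infer.

```r
library(dplyr)
library(palmerpenguins)

set.seed(100)

# Illustrative sketch, not infer's implementation: draw one bootstrap
# sample per replicate, resampling within each species so that every
# group keeps its original number of rows.
expected_slices <- bind_rows(lapply(1:10, function(i) {
  penguins %>%
    group_by(species) %>%
    slice_sample(prop = 1, replace = TRUE) %>%  # per-group resample
    ungroup() %>%
    mutate(replicate = i, .before = 1)          # label the replicate
}))

# One row per species and replicate, with the per-group sample size
expected_slices %>% count(species, replicate)
```

With this approach every replicate should contain 152 Adelie, 68 Chinstrap, and 124 Gentoo rows, which is what I was hoping to get from rep_slice_sample().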

What am I missing?
Thanks for your insight.

Labels: feature (a feature request or enhancement)
