Feature eair: Extra combined statistic required to be able to present "n (EAIR per 100 SY)"#362
Feature eair: Extra combined statistic required to be able to present "n (EAIR per 100 SY)"#362iaugusty wants to merge 31 commits into
Conversation
…hnson/junco into feature/hotfix_eair
nikolas-burkoff
left a comment
There was a problem hiding this comment.
Overall it's looking good. A few comments from me (note I have not gone into detail to check the maths is correct).
I think if both eair is needed by subject and by events having two separate calls is fine - though I would add that as a note in the roxygen comment - it feels unlikely the same output would need both
I think the row labels behaviour makes sense though I don't know if there's an expectation from others for how it should behave
| lifecycle, | ||
| dplyr | ||
| dplyr, | ||
| ratesci (>= 1.0.0.9000) |
There was a problem hiding this comment.
I see this is only used for tests now. If it's going to stay like this (and the custom implementation is going to be used) I might actually remove the dependency here and create test cases like below.
We can then have control over when we update our test cases rather than tests failing because something changed in ratesci (the argument against this is if we think it's likely there are bugs and ratesci and our implementation and ratesci fixes them, then we might miss them)
testthat::test_that("our implementation matches ratesci", {
our_result <- ...
# These values have been created by running:
# ratesci::package_function(...) using version xxxx of the ratesci package
ratesci_result <- c("0.0045", "0.33544")
expect_equal(our_result, ratesci_result)
})| insightsengineering/formatters@main, | ||
| insightsengineering/tern@main | ||
| insightsengineering/tern@main, | ||
| petelaud/ratesci |
There was a problem hiding this comment.
In terms of keeping our custom implementation v using ratesci - once ratesci is updated on CRAN ( I wouldn't use ratesci until it is), personally I'd prefer to use ratesci rather than maintaining an implementation ourselves as this simplifies thing.
However, adding a dependency on a stats package adds risks in terms of validation. I don't know junco/tern's policy on this so it may be that the control we gain from our own implementation actually outweights the benefits of adding the dependency in our case (especially as the custom implementation is already written)
| #' Defaults to `100`, yielding a rate per 100 person-years. | ||
| #' For example, use `1` for a rate per person-year or `1000` for a rate per | ||
| #' 1000 person-years. | ||
| #' @param count_events (`logical`)\cr if `TRUE`, counts the total number of events |
There was a problem hiding this comment.
How about count_multiple_events_per_subj being TRUE or FALSE or count_events_per_subject being "first" (or "single") or "all"
There was a problem hiding this comment.
I also think the default label should change dependent on the value of this, like tern - as it makes it very clear to users what's going on
| #' with at least one occurrence of a specified adverse event divided by the total | ||
| #' at-risk exposure time across all subjects. | ||
| #' | ||
| #' At-risk exposure time is calculated for each subject |
There was a problem hiding this comment.
I think this needs updating with the new count_events argument
| #' @param fup_var (`string`)\cr follow-up variable name. | ||
| #' @param occ_var (`string`)\cr occurrence variable name. | ||
| #' @param occ_dy (`string`)\cr occurrence day variable name. | ||
| #' @param fup_var (`string`)\cr name of the variable containing the total follow-up duration |
There was a problem hiding this comment.
These add a lot more detail which I think is great
|
@munoztd0 can you take a look into failed checks and pkgdown issues, these seem to be from files not touched in this PR? |
|
@danielinteractive could you take a look into this PR as well, especially your expertise on derivations of confidence interval for difference in rates - own derivations vs dependency on other statistical package (here ratesci) would be appreciated current approach is : perform own calculations, in tests confirm results with ratesci package |
There was a problem hiding this comment.
Hi @iaugusty , thanks for reaching out, I had a look at this file in particular. In general I would say it is ok to implement the calculations ourselves and then compare with (hardcoded results from) another package.
However, I think we should be careful not to have more locations where we put differences of rates code - currently we have https://github.com/insightsengineering/tern/blob/main/R/prop_diff.R and also in our junco package here we have in principle https://github.com/johnsonandjohnson/junco/blob/main/R/estimate_proportion_diff.R. If we could use the tern functions as much as possible here that would be good - if the interface from tern is not convenient we could also propose a change to that there. If we need something faster then we could also add it here in the junco file, but then if possible follow similar conventions for the function names, such that they can also be reused again in other places later.
There was a problem hiding this comment.
@danielinteractive
incidence rate code is in https://github.com/insightsengineering/tern/blob/v0.9.10/R/incidence_rate.R, rather than proportions.
The design of incidence rate functions, esp how input datasets are utilized, is quite different between tern and junco. In tern, input dataset requires event and non events in the same input dataset, we have only events in input, non events information is from adsl/alt_counts_df dataset. Our design is easier to generate incidence rates per aedecod, and input dataset requirements is very similar to ADAM ADAE dataset.
As this difference in design exists, it seemed harder to integrate our function within tern.
wrt confidence intervals, in tern only confidence intervals for the incidence rates itself are available, not for difference in rates. This would be something we can propose in tern.
There was a problem hiding this comment.
Ah those are incidence rates, ok, I did not see that quickly.
Maybe then for now we could put them in a similar incidence_rates_diff.R file in junco and propose an inclusion upstream in tern later. But then we will need more documentation of the calculation functions, including literature references where applicable.
| waldcc = h_s_eair_diff_wald(x1, n1, x2, n2, conf_level, cc = TRUE), | ||
| wald = h_s_eair_diff_wald(x1, n1, x2, n2, conf_level, cc = FALSE), | ||
| mn = h_s_eair_diff_mn(x1, n1, x2, n2, conf_level, skew = FALSE), |
There was a problem hiding this comment.
I would be relatively optimistic that we have these 3 in tern already implemented - did you have a look already?
| waldcc = h_s_eair_diff_wald(x1, n1, x2, n2, conf_level, cc = TRUE), | ||
| wald = h_s_eair_diff_wald(x1, n1, x2, n2, conf_level, cc = FALSE), | ||
| mn = h_s_eair_diff_mn(x1, n1, x2, n2, conf_level, skew = FALSE), | ||
| scas = h_s_eair_diff_mn(x1, n1, x2, n2, conf_level, skew = TRUE) |
There was a problem hiding this comment.
This skewness correction could be interesting and might not be in tern yet
Pull Request
Fixes #361
Checks