We currently use INDEX as a rough way to get access to exported package datasets so we don't flag them as false positives.
But both data/Rdata.rds and Meta/data.rds have this info as well:
data/Rdata.rds — a load-time manifest
- Written by data2LazyLoadDB at src/library/tools/R/makeLazyLoad.R:153, alongside the
lazy-load DB (Rdata.rdb/Rdata.rdx).
- Contents: a named list, one entry per data topic (original file name), whose value is the
character vector of object names that topic binds when loaded.
- Read by list_data_in_pkg at src/library/tools/R/index.R and by utils::data()
(src/library/utils/R/data.R:74) so they know which R objects a call like data("foo") will
create.
- Only present when the package's data is lazy-loaded.
Meta/data.rds — a documentation index
- Written by .install_package_Rd_indices at src/library/tools/R/admin.R:493 via
.build_data_index (src/library/tools/R/index.R:32).
- Contents: a 2-column character matrix of [topic, Rd title]. The topic column is formatted
for display (e.g. "obj (file)" when the object name differs from the file name). It is built
by combining list_data_in_pkg (which itself reads Rdata.rds when present) with the package's
Rd contents table.
- Consumed when producing the user-facing "Data sets in package" listing from data() and by
tools that need topic → title.
Meta/data.rds looks to be a superset that also covers packages that don't use LazyData for their datasets. The list_data_in_pkg helper that feeds it resolves a package's datasets like this:
1. `data/Rdata.rds` exists → use it (LazyData package).
2. Else `data/datalist` exists → parse it.
3. Else → scan `data/`, call `utils::data()` on each file, record the objects it creates.
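The first two steps of this lookup can be approximated without running any package code. The following is only a sketch: `dataset_objects()` is a hypothetical helper (not part of R or of our codebase), it assumes `pkg_dir` points at an installed package directory, and it assumes `data/datalist` lines have the form `topic` or `topic: obj1 obj2`. Step 3 is deliberately left out because it requires calling `utils::data()` on each file.

```r
# Hypothetical helper: map each data topic to the objects it binds,
# using only files that can be read statically.
dataset_objects <- function(pkg_dir) {
  data_dir <- file.path(pkg_dir, "data")
  manifest <- file.path(data_dir, "Rdata.rds")
  datalist <- file.path(data_dir, "datalist")

  # Step 1: LazyData package, the manifest already has the answer.
  if (file.exists(manifest)) {
    return(readRDS(manifest))
  }

  # Step 2: parse data/datalist, one topic per line.
  if (file.exists(datalist)) {
    lines <- trimws(readLines(datalist))
    lines <- lines[nzchar(lines)]
    topics <- trimws(sub(":.*$", "", lines))
    objects <- lapply(seq_along(lines), function(i) {
      if (grepl(":", lines[i], fixed = TRUE)) {
        # "topic: a b c" binds the objects listed after the colon.
        rest <- trimws(sub("^[^:]*:", "", lines[i]))
        strsplit(rest, "[[:space:]]+")[[1]]
      } else {
        # A bare "topic" line binds an object with the same name.
        topics[i]
      }
    })
    names(objects) <- topics
    return(objects)
  }

  # Step 3 (scan data/ and call utils::data() on each file) is not sketched.
  NULL
}
```

The result has the same shape in both branches, a named list of character vectors, so a caller does not need to know which file it came from.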
Here is what both files look like for a LazyData package, using workflowsets as the example:
readRDS("/Users/davis/Library/R/arm64/4.5/library/workflowsets/data/Rdata.rds")
#> $chi_features_set
#> [1] "chi_features_res" "chi_features_set"
#>
#> $two_class_set
#> [1] "two_class_res" "two_class_set"
readRDS("/Users/davis/Library/R/arm64/4.5/library/workflowsets/Meta/data.rds")
#> [,1] [,2]
#> [1,] "chi_features_res (chi_features_set)" "Chicago Features Example Data"
#> [2,] "chi_features_set" "Chicago Features Example Data"
#> [3,] "two_class_res (two_class_set)" "Two Class Example Data"
#> [4,] "two_class_set" "Two Class Example Data"
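If we ever did need to read Meta/data.rds, the display-formatted first column would have to be split back into object and file names. A rough sketch, assuming the only two shapes are `"obj"` and `"obj (file)"` as in the output above; `split_topic()` is a hypothetical helper, and the matrix is a hand-copied subset of the workflowsets output:

```r
# Hypothetical helper: undo the "obj (file)" display formatting.
split_topic <- function(x) {
  has_file <- grepl(" \\(.*\\)$", x)
  object <- sub(" \\(.*\\)$", "", x)
  # When no "(file)" suffix is present, the file name equals the object name.
  file <- ifelse(has_file, sub("^.* \\((.*)\\)$", "\\1", x), x)
  data.frame(object = object, file = file, stringsAsFactors = FALSE)
}

# Hand-copied from the workflowsets Meta/data.rds output above.
meta <- matrix(
  c(
    "chi_features_res (chi_features_set)", "Chicago Features Example Data",
    "chi_features_set", "Chicago Features Example Data"
  ),
  ncol = 2,
  byrow = TRUE
)

split_topic(meta[, 1])
```

This extra parsing step is one more reason to prefer data/Rdata.rds, where the object names are already stored unformatted.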
{clinical} is an example of a package that doesn't use LazyData. Instead, it lists prostate in data/datalist, so the dataset is not available via clinical::prostate but is available via data(prostate, package = "clinical").
try(readRDS("/Users/davis/Library/R/arm64/4.5/library/clinical/data/Rdata.rds"))
#> Warning in gzfile(file, "rb"): cannot open compressed file
#> '/Users/davis/Library/R/arm64/4.5/library/clinical/data/Rdata.rds', probable
#> reason 'No such file or directory'
#> Error in gzfile(file, "rb") : cannot open the connection
readRDS("/Users/davis/Library/R/arm64/4.5/library/clinical/Meta/data.rds")
#> [,1] [,2]
#> [1,] "prostate" "Clinical Data of a Cohort of Prostate Cancer Patiens"
For completions after pkg:: and diagnostics after a call to library(), I think we'd only want LazyData-supported datasets, because datasets that are not lazy-loaded require a call to data() first.
So the result of this analysis is that data/Rdata.rds is probably what we should be looking at.
We could add a parsed result of looking at this file to our package cache on cache creation.
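As a sketch of what that cache entry might hold, the following reads the manifest and flattens it to the set of object names reachable as `pkg::name`. `lazy_dataset_names()` is a hypothetical name, and the real cache code would not necessarily be written in R; this only illustrates the shape of the data we would extract.

```r
# Hypothetical helper: object names provided by a package's lazy-loaded data.
# Returns character(0) for packages without data/Rdata.rds (no LazyData).
lazy_dataset_names <- function(pkg_dir) {
  manifest <- file.path(pkg_dir, "data", "Rdata.rds")
  if (!file.exists(manifest)) {
    return(character())
  }
  # The manifest is a named list of character vectors; the names are topics
  # (file names) and the values are the objects we actually care about.
  objects <- as.character(unlist(readRDS(manifest), use.names = FALSE))
  sort(unique(objects))
}
```

For workflowsets this would yield the four object names shown earlier, and for clinical it would yield an empty vector, which matches the behavior of `clinical::prostate` at the console.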
For help, i.e. ?pkg::data, note that ?clinical::prostate DOES work even though clinical::prostate at the console does not. I think this works because prostate is listed as a topic in the help system. I think we can probably get this information elsewhere without having to look into Meta/data.rds for it. For example, it's in the help/clinical lazy-load database:
env <- new.env()
lazyLoad("/Users/davis/Library/R/arm64/4.5/library/clinical/help/clinical", env)
#> NULL
names(env)
#> [1] "intersect" "correlation.test" "as.data.matrix" "categorical.test"
#> [5] "initialization" "add_analysis" "frequency_matching" "prostate"
#> [9] "continuous.test" "txtsummary" "multi_analysis"
env$prostate
#> \title{Clinical Data of a Cohort of Prostate Cancer Patiens}\name{prostate}\alias{prostate}\keyword{datasets}\description{The data belong to a cohort of 35 patients with prostate cancer from two different hospitals.
#> }\usage{data(prostate)}\value{
#> The data.frame "\code{prostate}" with the following elements: "\code{Hospital}", "\code{Gender}", "\code{Gleason score}", "\code{BMI}", and "\code{Age}".
#> }\examples{
#> data(prostate)
#>
#> head(prostate)
#> }