add one more step for filtering out some geographics.#42
Conversation
lindsay-fredrick
left a comment
There was a problem hiding this comment.
Looks fine, left some suggestions. Also with this being in here, is the 15 script even used anymore? Might be worth archiving. Also, the README should be updated with all the steps - seems to be lagging behind a bit.
I'm also still worried that there is a general lack of documentation surrounding how to re-run this from year to year - but I'm hoping that that is going to get built as the next year data is run? (i.e. if you were to win the lottery next week, how would someone else come in and run this entire pipeline with minimal effort to piece it all together). Basically some step by step instructions...
| #------------------------------------------------------------------------------------------- | ||
|
|
||
| # Input: Geo Suppression IDs file | ||
| geo_suppression_file <- file.path( |
There was a problem hiding this comment.
This feels like an extremely ad hoc/undocumented place to store these files. There should be a document somewhere that explains where this came from... there is clearly some non version-controlled code sitting in this folder as well. Should we not have that in here?
There was a problem hiding this comment.
After dealing with three options, I ended up using the 2021 Geographic Attribute File from StatsCAN since it has the CSDTYPE column.
- I removed those non-version controled files.
- I created a new README.md in the src folder.
| "Geo Suppression IDs.xlsx" | ||
| ) | ||
|
|
||
| # Output path for 2023 data catalogue products |
There was a problem hiding this comment.
just a note that hard coding years is usually a bad idea. Will want to make this usable from year to year.
There was a problem hiding this comment.
I added a parameter for this target year which can be changed in the config section.
| cat(" Original records:", nrow(sei_long_chsa), "\n") | ||
|
|
||
| # Keep consistent with DET - filter out CHSAs removed in DET | ||
| sei_long_chsa_filtered <- sei_long_chsa %>% |
There was a problem hiding this comment.
Why not just re-run the remove_geographies function?
There was a problem hiding this comment.
Now it is handled by the remove_geographies function.
|
We remove the dependency on the ad hoc list of indigenous CSDs. Instead, switch to a programmatically way to deal with the identification of indigenous CSDs based on StatsCAN's documentation and classification of CSDTYPE. |
lindsay-fredrick
left a comment
There was a problem hiding this comment.
looks good, just left one suggestion for simplifying the data pull.
|
|
||
| # Download 2021 Geographic Attribute File for BC CSDs directly from StatsCan | ||
| # Reference: https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/definition-eng.cfm?ID=geo044 | ||
| cat("Downloading StatsCan Geographic Attribute File for BC CSDs...\n") |
There was a problem hiding this comment.
suggest skipping all this and just pulling via cancensus:
cancensus::get_statcan_geographic_attributes('2021')
pulls the exact same file.
There was a problem hiding this comment.
This function does not work on my workstation because it cannot unzip the downloaded file. I will keep both options.
There was a problem hiding this comment.
there's no file to unzip? I'm confused, it just pulls it directly into my R session (exact same file as the one you are pulling in from stats can). Maybe just make sure you have your API set up correctly?
…assfication file.
| geo_attr <<- NULL | ||
| } | ||
| ) | ||
| # geo_attr_url <- 'https://www12.statcan.gc.ca/census-recensement/2021/geo/aip-pia/attribute-attribs/files-fichiers/2021_92-151_X.zip' |
There was a problem hiding this comment.
if not using anymore I'd just delete to keep things clean
Just one file, 17_remove_geo_suppression_ids.R. The source of suppressed geo IDs is in the LAN folder.