Skip to content

add one more step for filtering out some geographics.#42

Open
jonjunduan wants to merge 10 commits into
mainfrom
geo-suppression
Open

add one more step for filtering out some geographics.#42
jonjunduan wants to merge 10 commits into
mainfrom
geo-suppression

Conversation

@jonjunduan
Copy link
Copy Markdown
Contributor

Just one file, 17_remove_geo_suppression_ids.R. The source of suppressed geo IDs is in the LAN folder.

Copy link
Copy Markdown
Contributor

@lindsay-fredrick lindsay-fredrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, left some suggestions. Also with this being in here, is the 15 script even used anymore? Might be worth archiving. Also, the README should be updated with all the steps - seems to be lagging behind a bit.

I'm also still worried that there is a general lack of documentation surrounding how to re-run this from year to year - but I'm hoping that that is going to get built as the next year data is run? (i.e. if you were to win the lottery next week, how would someone else come in and run this entire pipeline with minimal effort to piece it all together). Basically some step by step instructions...

Comment thread src/17_remove_geo_suppression_ids.R Outdated
#-------------------------------------------------------------------------------------------

# Input: Geo Suppression IDs file
geo_suppression_file <- file.path(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an extremely ad hoc/undocumented place to store these files. There should be a document somewhere that explains where this came from... there is clearly some non version-controlled code sitting in this folder as well. Should we not have that in here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After dealing with three options, I ended up using the 2021 Geographic Attribute File from StatsCAN since it has the CSDTYPE column.

  1. I removed those non-version controled files.
  2. I created a new README.md in the src folder.

Comment thread src/17_remove_geo_suppression_ids.R Outdated
"Geo Suppression IDs.xlsx"
)

# Output path for 2023 data catalogue products
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a note that hard coding years is usually a bad idea. Will want to make this usable from year to year.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a parameter for this target year which can be changed in the config section.

Comment thread src/17_remove_geo_suppression_ids.R Outdated
cat(" Original records:", nrow(sei_long_chsa), "\n")

# Keep consistent with DET - filter out CHSAs removed in DET
sei_long_chsa_filtered <- sei_long_chsa %>%
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just re-run the remove_geographies function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is handled by the remove_geographies function.

@jonjunduan
Copy link
Copy Markdown
Contributor Author

We remove the dependency on the ad hoc list of indigenous CSDs. Instead, switch to a programmatically way to deal with the identification of indigenous CSDs based on StatsCAN's documentation and classification of CSDTYPE.

Copy link
Copy Markdown
Contributor

@lindsay-fredrick lindsay-fredrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just left one suggestion for simplifying the data pull.


# Download 2021 Geographic Attribute File for BC CSDs directly from StatsCan
# Reference: https://www12.statcan.gc.ca/census-recensement/2021/ref/dict/az/definition-eng.cfm?ID=geo044
cat("Downloading StatsCan Geographic Attribute File for BC CSDs...\n")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest skipping all this and just pulling via cancensus:
cancensus::get_statcan_geographic_attributes('2021')
pulls the exact same file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not work on my workstation because it cannot unzip the downloaded file. I will keep both options.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no file to unzip? I'm confused, it just pulls it directly into my R session (exact same file as the one you are pulling in from stats can). Maybe just make sure you have your API set up correctly?

geo_attr <<- NULL
}
)
# geo_attr_url <- 'https://www12.statcan.gc.ca/census-recensement/2021/geo/aip-pia/attribute-attribs/files-fichiers/2021_92-151_X.zip'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not using anymore I'd just delete to keep things clean

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants