
OC data cleaning: people vs. counts #5

Description

@cclatterbuck

Need some ideas from collaborators regarding cleaning the OC (Ocean Conservancy) dataset.

While normalizing counts from the raw dataset (filtered to the dates of Coastal Cleanup Days), I noticed that the total number of People and Adults per cleanup (row) ranged from 0 to 12,620. When cleanups without any count data were excluded, the range narrowed to 0 to 9,600. I also noticed that some cleanups with large numbers of people (>1000 people; e.g., Cleanup IDs 17836, 34455, 34464, 34469) collected only a single trash item. These appear to be data entry errors and, in my opinion, can be excluded.

Potential decisions to make, with the goal of cleaning the data as much as reasonable:

  • Remove cleanups with 0 people
  • Remove cleanups with improbable numbers of people (I need help determining a reasonable threshold)
  • Remove cleanups with a single item collected (& more?)
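
To make the options above easier to compare, here is a minimal sketch of how each rule could be flagged separately before anything is dropped. It is only an illustration: the keys `cleanup_id`, `people`, and `total_items`, the function name `flag_cleanup`, and the toy rows are hypothetical and not taken from the OC dataset. The 1000-person cutoff simply echoes the ">1000 people" examples mentioned above and is not an agreed threshold.

```python
def flag_cleanup(row, people_key="people", items_key="total_items",
                 max_people=1000, min_items=1):
    """Return one boolean flag per candidate cleaning rule for a single cleanup.

    All key names and default thresholds are assumptions for illustration.
    """
    people = row[people_key]
    items = row[items_key]
    flags = {
        # Rule 1: cleanups reporting 0 people
        "zero_people": people == 0,
        # Rule 2: cleanups with an improbable number of people (cutoff is a placeholder)
        "improbable_people": people > max_people,
        # Rule 3: cleanups that collected a single item (or fewer)
        "few_items": items <= min_items,
    }
    # Combined pattern described in the issue: many people but almost no trash
    flags["many_people_few_items"] = flags["improbable_people"] and flags["few_items"]
    return flags


# Toy rows with made-up values, only to show the flags in action
toy_rows = [
    {"cleanup_id": 1, "people": 0, "total_items": 30},
    {"cleanup_id": 2, "people": 12, "total_items": 250},
    {"cleanup_id": 3, "people": 1500, "total_items": 1},
    {"cleanup_id": 4, "people": 40, "total_items": 1},
    {"cleanup_id": 5, "people": 2000, "total_items": 5000},
]

# Attach the flags to each row so they can be reviewed before removing anything
flagged = [{**row, **flag_cleanup(row)} for row in toy_rows]

# One possible policy: drop zero-people rows and many-people/few-items rows only
kept = [r for r in flagged
        if not (r["zero_people"] or r["many_people_few_items"])]

print([r["cleanup_id"] for r in kept])
```

Keeping the flags separate means we can tabulate how many cleanups each rule would remove and settle on thresholds together, rather than committing to a single filter up front.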

Labels: data, question
