Data science is a fast-growing field, but data literacy is not just for specialists: everyone in your generation will need a high level of it. The goal of this project is to build both your programming skills and your data literacy skills.
Numerous organizations, especially governments and governmental agencies, have made data about the communities they serve available to the public in open data portals. Some agencies even provide tools to help users explore, visualize, and make sense of the data. Most, however, dump the data in a variety of formats, and leave it to the user to make sense of it all.
In this project, you will work individually to:
- Identify at least one data source you find interesting,
  - It must have multiple records that you have to analyze. It should not be data that is already summarized.
  - The source should be publicly available and free to use (in the public domain or licensed under an appropriate Creative Commons license).
- Develop at least three related descriptive questions (summarizing, finding trends), rather than inferential ones (looking for causation, correlations, or predictions), that you'd like to answer using the data,
- Write a Python program that answers the questions through descriptive analysis (sorting, searching, summarizing, finding trends),
  - Your program should read in the data, process it, and write out information to the console. That information will help you create visualizations.
- Generate visuals that help answer your questions (using Google Sheets or other programs),
- Deliver an oral presentation of about 5 minutes that explains to your audience what data sources you used, what research questions you had, a brief overview of your program, and the results of your work supported by the visual representations.
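
To make the "read in the data, process it, and write out information to the console" step concrete, here is a minimal sketch of what such a program might look like. It is only one possible shape, not a required design. The sample records, the column names (`neighborhood`, `year`, `trees_planted`), and the helper functions `load_rows`, `count_by`, and `total_by` are all invented for illustration; your own data source will have different columns and may call for different summaries.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented placeholder records standing in for a real open-data CSV file.
# These values are NOT from any actual data portal.
SAMPLE_CSV = """neighborhood,year,trees_planted
Eastside,2021,120
Eastside,2022,150
Riverside,2021,80
Riverside,2022,95
Hilltop,2022,40
"""


def load_rows(source):
    """Read CSV text from a file-like object into a list of dictionaries."""
    return list(csv.DictReader(source))


def count_by(rows, column):
    """Count how many records share each value in the given column."""
    return Counter(row[column] for row in rows)


def total_by(rows, group_column, value_column):
    """Sum a whole-number column within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_column]] += int(row[value_column])
    return dict(totals)


def main():
    # For a real file, replace io.StringIO(SAMPLE_CSV) with something like
    # open("your_file.csv", newline="", encoding="utf-8").
    rows = load_rows(io.StringIO(SAMPLE_CSV))

    # Summarizing: how many records does each neighborhood have?
    print("Records per neighborhood:")
    for name, count in count_by(rows, "neighborhood").most_common():
        print(f"{name},{count}")

    # Finding trends: how does the total change from year to year?
    print("Trees planted per year:")
    for year, total in sorted(total_by(rows, "year", "trees_planted").items()):
        print(f"{year},{total}")


if __name__ == "__main__":
    main()
```

Printing each summary as comma-separated lines is a deliberate choice here: the console output can be pasted into Google Sheets or a similar program and charted directly.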