Project 1 Data Analysis and Processing
Task 1: Data Analysis
When we did Lab 04 – US Baby Names, you downloaded the data from the following website: https://www.ssa.gov/oact/babynames/limits.html
The website has a table titled “Percentage of all names represented in the top 1000 names”. Please write Python code, based on Lab 04, that reproduces the results in that table exactly. This is a “creative” project: you are expected to “create” your own code based on the Lab 04 scripts. Please attach your Python code as Part 1 of your Project 1 report.
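As a starting point, here is a minimal sketch of the core computation: the share of births covered by the n most common names, per year and sex. It assumes the Lab 04 data has been loaded into a pandas DataFrame with columns year, sex, name and births. These column names, the function name top_n_share, and the toy data are assumptions made for illustration, not the Lab 04 code itself. Whether the output matches the website's table exactly depends on how that table was computed, which you should check yourself.

```python
import pandas as pd


def top_n_share(names: pd.DataFrame, n: int = 1000) -> pd.DataFrame:
    """Percentage of births covered by the n most common names, per year and sex.

    Assumes `names` has the columns year, sex, name, births (hypothetical layout).
    """
    # Total births recorded for each (year, sex) group
    total = names.groupby(["year", "sex"])["births"].sum()

    # Births belonging to the n most frequent names in each group
    top = (
        names.sort_values("births", ascending=False)
        .groupby(["year", "sex"])
        .head(n)
        .groupby(["year", "sex"])["births"]
        .sum()
    )

    # Percentage, reshaped to one row per year and one column per sex
    share = (100 * top / total).round(2)
    return share.unstack("sex")


# Toy data standing in for the real files (values are made up)
toy = pd.DataFrame(
    {
        "year": [2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001, 2001],
        "sex": ["F", "F", "F", "M", "M", "F", "F", "F", "M", "M", "M"],
        "name": ["Ann", "Bea", "Cat", "Al", "Bo", "Ann", "Bea", "Cat", "Al", "Bo", "Cy"],
        "births": [60, 30, 10, 50, 50, 40, 40, 20, 70, 20, 10],
    }
)

result = top_n_share(toy, n=2)
print(result)
```

With the real data you would call top_n_share(names) and compare the output, year by year, against the table on the website.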
Task 2: Data Processing
A big dataset collected from the real world cannot be analyzed immediately because it is not clean. So far in our “Intelligent Big Data” course, you have learned enough useful methods and skills to process such a dataset, i.e., to clean up messy, real-world data.
Please walk through an example of that, using an open recipe database compiled from various sources on the Web. Your goal will be to parse the recipe data into ingredient lists, so you can quickly find a recipe based on some ingredients you have on hand.
This is a “practice” project: you are not expected to “create” anything. The scripts used to compile the database can be found at https://github.com/fictivekin/openrecipes, and the link to the current version of the database is found there as well. Please spend as much time as you can on it so that you get plenty of practice.
I strongly suggest that you go through the last part of the notebook 03.10-Working-WithStrings.ipynb before you explore further. Record what you have tried and done as Part 2 of your Project 1 report (Python code, with your understanding and detailed explanations as comments).
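To make the goal of Task 2 concrete, here is a small sketch of the ingredient search using pandas string methods. It runs on a toy in-memory table instead of the real database, and assumes each recipe has a name column and a free-text ingredients column. The column names, the helper functions ingredient_flags and find_recipes, and the sample recipes are all hypothetical, so adapt them to what you actually find in the data.

```python
import pandas as pd

# Toy stand-in for the recipe database (made-up rows, assumed column names)
recipes = pd.DataFrame(
    {
        "name": ["Herb Chicken", "Plain Rice", "Tomato Soup"],
        "ingredients": [
            "1 chicken\n2 tsp Paprika\n1 tbsp parsley, chopped",
            "1 cup rice\n2 cups water\nsalt",
            "4 tomatoes\n1 tsp salt\nfresh parsley",
        ],
    }
)


def ingredient_flags(df: pd.DataFrame, ingredients: list) -> pd.DataFrame:
    """Boolean table: one column per ingredient, True where the text mentions it."""
    return pd.DataFrame(
        {
            # Case-insensitive plain substring match (no regular expressions)
            item: df["ingredients"].str.contains(item, case=False, regex=False)
            for item in ingredients
        }
    )


def find_recipes(df: pd.DataFrame, on_hand: list) -> list:
    """Names of recipes whose ingredient text mentions every item in on_hand."""
    flags = ingredient_flags(df, on_hand)
    return df.loc[flags.all(axis=1), "name"].tolist()


matches = find_recipes(recipes, ["parsley", "paprika"])
print(matches)
```

Plain substring matching is a deliberate simplification: it cannot tell “salt” from “unsalted butter”, which is the kind of messiness you are expected to explore and explain in your report.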
Have Fun!