DSCI 510: Principles of Programming for Data Science
Fall 2023
Lab 9 Assignment
Due: 10/24/2023 11:59 PM PT
Assignment Link - https://classroom.github.com/a/r-cJHAl8
Assignment Overview
In this assignment, you will write two Python functions using the provided starter code.
Deliverable:
A Python file named run.py
1. Counting Occurrences. [15 points]
For this question, we will use the requests library to make web pages behave like files. We have a web page that displays an extract from the famous play Romeo and Juliet. Shakespeare was fond of words such as "thou", "thy", "thee", and "O", so we want to calculate the frequency of a given word in the extract at http://data.pr4e.org/romeo-full.txt
Function: get_frequency
Arguments: (str) url, (str) word_to_search
Return: (int) count
Example:
Input-
url = http://data.pr4e.org/romeo-full.txt
word_to_search = "thou"
Output -
32
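One possible shape for this function is sketched below. The handout does not say how matches should be counted (case sensitivity, attached punctuation), so the sketch assumes an exact, case-sensitive match on whitespace-separated tokens; it may not reproduce the count shown in the example, and the starter code in run.py takes precedence. The helper name count_word is hypothetical and not part of the assignment.

```python
def count_word(text, word_to_search):
    """Count whitespace-separated tokens in `text` equal to `word_to_search`."""
    # Assumption: exact, case-sensitive match, so "thou," and "Thou"
    # are NOT counted as "thou". Adjust if the grader expects otherwise.
    return sum(1 for token in text.split() if token == word_to_search)


def get_frequency(url, word_to_search):
    """Fetch the page at `url` and return how often the word occurs."""
    # Imported here so count_word can be tried out without the
    # third-party requests package or a network connection.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return count_word(response.text, word_to_search)
```

Keeping the counting rule in its own helper makes it easy to check against a short string before running it on the full page.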
2. Web Scraping [15 points]
For this question, we will use Beautiful Soup to scrape table contents from a Wikipedia page.
Given the Wikipedia article
https://en.wikipedia.org/wiki/List_of_spaceflight_launches_in_January%E2%80%93June_2023
We want to identify all operators and their counts for the given outcome (for example, successful or operational) among spaceflight launches between January 2023 and June 2023.
Consider both tables on the page: Orbital launches and Suborbital flights. Pseudocode for scraping this information is given in run.py for your reference.
Function: get_contents_from_web
Arguments: (str) url, (str) outcome
Return: (dict) operator_dict
Example:
Input -
url = https://en.wikipedia.org/wiki/List_of_spaceflight_launches_in_January%E2%80%93June_2023
outcome = "successful"
Output -
{
"AFGSC":1,
"Antaris Space":1,
"CU Boulder":1,
"Clemson University":2,
...
....
.....
}
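A rough sketch of the scrape-then-tally approach follows. It is not the pseudocode from run.py, and the table layout it relies on is an assumption that should be checked against the live page: it assumes the tables carry the wikitable class, that payload rows have six cells, and that the operator is the second cell and the outcome the last. The helper name tally_operators is hypothetical. The keys are sorted only because the example output appears to be in key order.

```python
def tally_operators(rows, outcome):
    """Count operators over rows whose outcome cell matches `outcome`.

    `rows` is an iterable of lists of cell text.
    """
    counts = {}
    for cells in rows:
        # Assumption: payload rows have exactly 6 cells; skip everything else.
        if len(cells) != 6:
            continue
        # Assumption: operator is the 2nd cell, outcome is the last cell.
        operator, row_outcome = cells[1], cells[-1]
        if row_outcome.lower() == outcome.lower():
            counts[operator] = counts.get(operator, 0) + 1
    return dict(sorted(counts.items()))


def get_contents_from_web(url, outcome):
    """Fetch the page, collect table rows as cell text, and tally operators."""
    # Imported here so tally_operators can be tried out without the
    # third-party packages or a network connection.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")

    rows = []
    # Assumption: both launch tables use the "wikitable" CSS class.
    for table in soup.find_all("table", class_="wikitable"):
        for tr in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
            rows.append(cells)
    return tally_operators(rows, outcome)
```

Separating the tally from the fetch lets the counting logic be checked on a few hand-written rows before dealing with the page's actual markup.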