Analyzing Climate Data
Global warming is the ongoing rise of the average temperature of the Earth’s climate system and has been demonstrated by direct temperature measurements and by measurements of various effects of the warming (Wikipedia). A dataset of daily temperatures in major world cities can therefore help track the progression of global warming. Weather data is also useful for many other data science tasks, such as sales forecasting and logistics.
Thanks to the University of Dayton, a dataset of average daily temperatures (in Fahrenheit) is available for many major world cities. You will use it to write code that can help educate people about global warming.
1 Part I: Historical average change in one city
In Tasks 0, 1, and 2, the input comes from the keyboard. In Tasks 3 and 4, the input comes from a file in a simplified format: the file contains floating-point temperature values, one number per line. The name of the file indicates the city and the year of the measurements.
1.1 Task 0
Write a program that repeatedly asks the user to enter a temperature value until they type "quit", and then outputs the average temperature. Each temperature is a floating-point number.
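A minimal sketch of the Task 0 loop, assuming the user types one value per prompt and that every entry other than "quit" is a valid number. The names `read_until_quit`, `read`, and `main` are hypothetical; the `read` parameter exists only so the loop can be exercised without a keyboard.

```python
def read_until_quit(read=input):
    """Collect temperatures until the user types 'quit'; return them as floats."""
    temps = []
    while True:
        entry = read("Enter a temperature (or 'quit'): ").strip()
        if entry.lower() == "quit":  # case-insensitive stop word (an assumption)
            break
        temps.append(float(entry))  # assumes every other entry is a valid number
    return temps


def main():
    """Task 0: read temperatures from the keyboard and print their average."""
    temperatures = read_until_quit()
    if temperatures:
        print("Average temperature:", sum(temperatures) / len(temperatures))
    else:
        print("No temperatures were entered.")
```

Calling `main()` runs the interactive version; guarding against an empty list avoids a division by zero when the user types "quit" immediately.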
After you complete the program in Task 0, use it as a starting point to build programs for tasks 1-4.
1.2 Task 1
Modify the program from Task 0 so that the temperatures are first collected into a list and the average is then computed by the function average(lst). This function takes a list of temperatures (floats) as its parameter and returns their average (a float).
In the function's docstring, provide four tests for this function. They don’t need to be in doctest format, but doctest format is preferred if you can manage it.
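One possible shape for average(lst) with four doctest-style tests in its docstring, shown as a sketch. It assumes the list is non-empty; how an empty list should be handled is not specified, and here it simply raises ZeroDivisionError.

```python
import doctest


def average(lst):
    """Return the average (arithmetic mean) of a list of temperatures.

    Assumes lst is non-empty; an empty list raises ZeroDivisionError.

    >>> average([70.0])
    70.0
    >>> average([60.0, 80.0])
    70.0
    >>> average([-10.0, 10.0])
    0.0
    >>> average([32.0, 33.0, 34.0, 35.0])
    33.5
    """
    return sum(lst) / len(lst)


if __name__ == "__main__":
    doctest.testmod()  # runs the examples in the docstring; silent if all pass
```

The test values were chosen so the results are exact in floating point, which keeps the doctest output stable.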
1.3 Task 2
Modify the program from Task 0 so that it reports the median of the temperatures, using the median function from Python's statistics module. The median can be a more meaningful summary than the average, because the average may be skewed by a few abnormally cold or hot days. If you are not familiar with the median, look it up, and check the Python documentation for how to use this function (https://docs.python.org/3/library/statistics.html).
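The skew argument can be seen on a small made-up sample (the numbers below are illustrative, not from the dataset): a single abnormally cold day drags the average down, while `statistics.median` stays near the typical values.

```python
import statistics

# Hypothetical readings with one abnormally cold day (20.0).
temps = [70.0, 71.0, 72.0, 20.0]

mean_temp = sum(temps) / len(temps)     # pulled down by the cold day
median_temp = statistics.median(temps)  # middle of the sorted values

print("Average:", mean_temp)   # Average: 58.25
print("Median:", median_temp)  # Median: 70.5
```

With an even number of values, `statistics.median` returns the mean of the two middle values of the sorted data, here (70.0 + 71.0) / 2.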
1.4 Task 3
Modify the program from Task 0 so that it reads the input from a file whose name is provided by the user, and computes the average of the temperatures in that file. Round the final average to one decimal place with the round() function; e.g. round(83.45291394, 1) returns 83.5.
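A sketch of the file-reading step, assuming the file holds one number per line and that blank lines (for instance a trailing newline) should be ignored. The helper names `read_temperatures`, `file_average`, and `main` are hypothetical; separating parsing from file opening makes the parsing easy to test on a plain list of strings.

```python
def read_temperatures(lines):
    """Convert lines of text (one number per line) into a list of floats.

    Blank lines are skipped (an assumption; the format does not mention them).
    """
    # float() ignores surrounding whitespace, including the trailing newline.
    return [float(line) for line in lines if line.strip()]


def file_average(filename):
    """Return the average of the temperatures in filename, rounded to 1 decimal."""
    with open(filename) as infile:
        temps = read_temperatures(infile)
    return round(sum(temps) / len(temps), 1)


def main():
    """Task 3: ask for a file name and print the rounded average."""
    filename = input("Enter the name of the temperature file: ")
    print("Average temperature:", file_average(filename))
```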
1.5 Task 4
Write a program that combines all of the above functionality. It should ask the user for the name of the file (containing temperatures, one number per line) and report the number of observations, the average temperature calculated by the average(lst) function, and the median temperature returned by the statistics module. Round the displayed values to one decimal place using the round function.
Run this program on several of the available files and note how the values change over the years.
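A possible outline for the combined program, as a sketch under the same one-number-per-line assumption. The `summarize` and `main` names are hypothetical; returning the three reported values as a tuple keeps the computation separate from the printing.

```python
import statistics


def average(lst):
    """Return the arithmetic mean of a non-empty list of floats."""
    return sum(lst) / len(lst)


def summarize(temps):
    """Return (count, rounded average, rounded median) for a list of temperatures."""
    return (
        len(temps),
        round(average(temps), 1),
        round(statistics.median(temps), 1),
    )


def main():
    """Task 4: read a file and print count, average, and median."""
    filename = input("Enter the name of the temperature file: ")
    with open(filename) as infile:
        temps = [float(line) for line in infile if line.strip()]  # skip blank lines
    count, avg, med = summarize(temps)
    print("Number of observations:", count)
    print("Average temperature:", avg)
    print("Median temperature:", med)
```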
2 Part II: Micro analysis
The actual per-city data files have the following format:
1    1    1995    45.5
1    2    1995    46.8
1    3    1995    40.1
1    4    1995    60.5
1    5    1995    55.4
1    6    1995    48.1
1    7    1995    59.2
1    8    1995    58.7
1    9    1995    …
1    10   1995    …
That is, the first column contains the month, the second the day, the third the year, and the fourth the average temperature for that day. Columns are separated by a varying (unspecified) number of spaces. The example above shows measurements for the first ten days of January 1995.
(Hint: the string method .split() correctly splits a line of this file into its individual values.)
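To illustrate the hint: calling `.split()` with no argument splits on any run of whitespace and drops leading and trailing whitespace, so a varying number of spaces between columns is not a problem. The function name `parse_line` and the sample line are hypothetical.

```python
def parse_line(line):
    """Split one data line into (month, day, year, temperature)."""
    # split() with no argument treats any run of spaces as a single separator.
    month, day, year, temp = line.split()
    return int(month), int(day), int(year), float(temp)


print(parse_line("  7   4   2010   88.2\n"))  # (7, 4, 2010, 88.2)
```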
2.1 Task 5
Another way to analyze global warming is to count the days in a given period on which the temperature exceeds a certain threshold. People might tolerate occasional triple-digit days in a location, for example, but if the yearly number of hot days keeps increasing, living there might become intolerable.
Write a program that calculates this number. The program asks the user for the name of the input file and a threshold temperature (float), and reports how many temperatures in the file are above the threshold. The input file has the format described above.
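A sketch of the counting step, assuming "above" means strictly greater than the threshold and that lines without exactly four columns should be skipped. The names `count_above` and `main` are hypothetical.

```python
def count_above(lines, threshold):
    """Count the data lines whose temperature (4th column) is above threshold."""
    count = 0
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines; compare strictly greater (an assumption).
        if len(fields) == 4 and float(fields[3]) > threshold:
            count += 1
    return count


def main():
    """Task 5: ask for a file and threshold, then print the count."""
    filename = input("Enter the name of the data file: ")
    threshold = float(input("Enter the threshold temperature: "))
    with open(filename) as infile:
        print("Days above", threshold, ":", count_above(infile, threshold))
```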
2.2 Task 6
Sometimes a more granular analysis is needed; for example, we might want to analyze temperatures by month. As a first step, write a function that takes the name of the input file as a parameter, builds a dictionary from the file's data in which each key is a month (integer) and each value is the list of temperatures observed in that month (list of floats), and returns this dictionary.
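One way to build the month dictionary, sketched with `dict.setdefault` so that a month's list is created the first time that month appears. The names `build_month_dict` and `temps_by_month` are hypothetical; the parsing is split out so it can be checked on a list of strings.

```python
def build_month_dict(lines):
    """Map each month (int) to the list of temperatures (floats) observed in it."""
    by_month = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        month, temp = int(fields[0]), float(fields[3])
        by_month.setdefault(month, []).append(temp)  # create the list on first use
    return by_month


def temps_by_month(filename):
    """Task 6 function: read filename and return the month -> temperatures dict."""
    with open(filename) as infile:
        return build_month_dict(infile)
```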
2.3 Task 7
Using your work from the two previous tasks, build an interactive program that lets the user analyze data by month. The user enters the name of an input file containing data for a particular year. The program then asks for a month and displays the average, median, maximum, and minimum temperatures for that month. The built-in min and max functions can be used to find the minimum and maximum values in the list. The program should let the user repeatedly enter a month and see the analysis until they type 'quit'.
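A sketch of the per-month analysis and the repeat-until-quit loop. It assumes the month dictionary from Task 6 has already been built and is passed in; the names `month_report` and `month_menu` are hypothetical, and months with no data are reported rather than raising an error.

```python
import statistics


def month_report(by_month, month):
    """Return (average, median, maximum, minimum) for one month's temperatures."""
    temps = by_month[month]
    return (
        sum(temps) / len(temps),
        statistics.median(temps),
        max(temps),
        min(temps),
    )


def month_menu(by_month):
    """Prompt for months until the user types 'quit'."""
    while True:
        entry = input("Enter a month (1-12) or 'quit': ").strip()
        if entry.lower() == "quit":
            break
        # Reject non-numeric input and months missing from the data.
        if not entry.isdigit() or int(entry) not in by_month:
            print("No data for month", entry)
            continue
        avg, med, high, low = month_report(by_month, int(entry))
        print("Average:", round(avg, 1), "Median:", round(med, 1),
              "Max:", high, "Min:", low)
```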
3 Part III: Macro analysis
In this part, the temperatures are stored in a .csv (comma-separated values) file, a very common data-sharing format. You can open it with a text editor such as TextEdit to view the "raw" data: rows of values separated by commas. Or you can open it with Excel, Numbers, or Google Sheets to see the same data neatly arranged in columns (the file is so large that these programs may not display all of its rows).
The files all start with a header line followed by a line for each measurement.
For the questions in this assignment, the data of interest is in the columns with the headers:
When a measurement is not available for a particular day, the temperature value in the file is set
3.1 Task 8
Write a program that analyzes the data by city. The program first asks the user for the name of the input .csv file. It then lets the user enter a year, a threshold temperature, and the name of an output file, and reports the number of days above the threshold for each city in that year. The results are written to the output file. The analysis can be repeated until the user types 'quit'.
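A sketch of the per-city count using Python's `csv.DictReader`. The column headers and the missing-value marker are not reproduced in this text, so they are parameters here; the values `"City"`, `"Year"`, `"Temp"`, and `"MISSING"` used in `main` are placeholders that must be replaced with the names from the file's header line and the file's actual marker. The missing marker is compared as text, and the output layout (a CSV with a header row) is also an assumption. The names `hot_days_by_city`, `write_report`, and `main` are hypothetical.

```python
import csv


def hot_days_by_city(lines, year, threshold, city_col, year_col, temp_col, missing):
    """Count, per city, the days in `year` whose temperature exceeds `threshold`.

    city_col, year_col, temp_col: header names of the relevant columns.
    missing: the text the file uses when a day's measurement is unavailable.
    """
    counts = {}
    for row in csv.DictReader(lines):  # first line is read as the header
        if int(row[year_col]) != year:
            continue
        city = row[city_col]
        counts.setdefault(city, 0)  # so cities with no hot days still appear
        raw = row[temp_col].strip()
        if raw == missing:
            continue  # skip days with no measurement
        if float(raw) > threshold:
            counts[city] += 1
    return counts


def write_report(counts, out_filename):
    """Write one 'city,count' row per city, sorted by city name."""
    with open(out_filename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["City", "DaysAboveThreshold"])
        for city in sorted(counts):
            writer.writerow([city, counts[city]])


def main():
    """Task 8: repeat the per-city analysis until the user types 'quit'."""
    in_filename = input("Enter the name of the .csv file: ")
    while True:
        entry = input("Enter a year (or 'quit'): ").strip()
        if entry.lower() == "quit":
            break
        threshold = float(input("Enter the threshold temperature: "))
        out_filename = input("Enter the output file name: ")
        # Re-open the large file each time instead of holding it all in memory.
        with open(in_filename, newline="") as infile:
            counts = hot_days_by_city(infile, int(entry), threshold,
                                      "City", "Year", "Temp", "MISSING")  # placeholders
        write_report(counts, out_filename)
```

Re-reading the file on each pass is slower than caching it, but it keeps memory use low for a very large file.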
3.2 What to submit
For this assignment, submit your answers to the individual tasks on Zybook. You can submit as many times as you like before the deadline.
3.3 Tiered scoring system
Submitted programs must run; a program that doesn’t run automatically receives 0 points. A correctly running program should also have readable code: comments, docstrings for every function, tests where specified, no commented-out code, and meaningful variable and function names.
3.4 Partial credit
Partial credit is available for programs that are not fully finished. Please specify known limitations of your programs in the comments. If you have trouble with using functions, a program written without using your functions can receive partial credit. The program should still produce the report in the desired format (or as much of it as possible).
3.5 Optional Extra Credit
You can earn up to 15% extra credit by designing your own problem and answering it using the city temperature data. If you choose to do the extra credit, submit a separate file named temperatures_ec.py. Make sure the multiline comment at the top clearly describes the question, how your code addresses it, and how to run your code.