Assignment 2: Bridges
Due Thursday by 4 p.m.
In Assignment 2, you will work with an open dataset that contains information about bridges in Ontario. You can complete the whole assignment with only the concepts from Week 1 up to and including Nested Lists and Loops from Week 7 of the course. You do not need the Week 7 material on files. This handout explains the problem being solved, and the tasks to complete, for the assignment. Please read it carefully and in its entirety.
Logistics
Due Date: Thursday, November 2nd before 4:00 pm (Toronto Time)
Submission: You will submit your assignment solution on MarkUs.
Late Policy: There are penalties for submitting the assignment after the due date. These penalties
depend on how many hours late your submission is. Please see the syllabus on Quercus for more
information.
No Remark Requests: No remark requests will be accepted. A syntax error could result in a grade of 0
on the assignment. Before the deadline, you are responsible for running your code and the checker
program to identify and resolve any errors that will prevent our tests from running. The best way to check
for this is to run the tests on MarkUs via the Automated Testing tab for this assignment.
Goals of this assignment
The main goal of this assignment is that students will continue to use the Function Design Recipe, with an emphasis on the last two steps of the recipe (Body and Test Your Function). Assignment 2 lets you practice with more programming concepts than before. These are some of the goals for Assignment 2:
Students will be able to write loops (i.e., while , for ) in the body of functions to implement their
description
Students will be able to appropriately use a variety of data types (including lists and nested lists) through
indexing, methods, etc.
Students will be able to apply what they've learned about mutability and mutate function inputs only
when it is appropriate to do so
Students will be able to reuse functions to help them implement other functions according to their
docstring description
Students will learn to use the Wing 101 debugger and automated testing with doctest
About doctests
In this assignment, we will be introducing you to a new tool called doctest. The doctest module in Python allows you to automatically run the examples in your function docstrings. Doctests can sometimes be a bit finicky, and depend on your examples being in a very precise format. If you have issues with doctest, you can test your functions manually by copying and pasting the examples into the shell. Unlike Assignment 1, we have provided you with the docstring examples for the functions you need to implement. You may add your own examples if you like (and you should certainly test your code with more than just the provided examples) but we will not be marking your docstrings. See the section Using doctest for more details on doctests.
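To illustrate the "very precise format" mentioned above, here is a minimal sketch of a docstring example and one way to check it. The function shout is a hypothetical example and is not part of the starter code. Note that the expected output is written exactly as the shell would display it, including the quotes around the string.

```python
import doctest


def shout(word: str) -> str:
    """Return word in uppercase followed by an exclamation mark.

    >>> shout('bridge')
    'BRIDGE!'
    """
    return word.upper() + '!'


# Check only the examples in shout's docstring.
# Nothing is printed when every example passes.
doctest.run_docstring_examples(shout, globals())
```

If the expected line were written as BRIDGE! without the quotes, the example would be reported as a failure even though the function itself is correct.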
Ontario Bridges
The Government of Ontario collects a huge amount of data (https://data.ontario.ca/) on provincial programs and infrastructure, and much of it is provided as open data sets for public use. In this assignment, we’ll work with a particular dataset that contains information about provincially owned and maintained bridges in Ontario. All bridges in Ontario are reviewed every 2 years, and their information is collected to help determine when bridges need inspection and maintenance. The data you'll be working with contains information about all bridges in the Ontario highway network, such as the length of the bridges, their condition over various years, and historical information.
We have included two datasets in your starter files: bridge_data_small.csv and bridge_data_large.csv (do not download the files from the Government of Ontario - use only the datasets in the starter files). These are Comma Separated Value (CSV) files, which contain the data in a table-format similar to a spreadsheet. Each row of the table contains information about a bridge. And each column of the table represents a "feature" of that bridge. For example, the latitude and longitude columns tell us the precise location of the bridge. If you would like to take a peek at the data, we recommend opening the files with a spreadsheet program (e.g., Excel) rather than using Wing 101.
Here is a screenshot of the data opened in Microsoft Excel (your computer may have a different spreadsheet program installed):
We can see that the first bridge is described on row 3. From row 3, column B, we can see that the bridge is
named: Highway 24 Underpass at Highway 403. Subsequent columns include even more information about the bridge.
Inspecting Bridges
Ontario sends inspectors to check the condition of a bridge. The dataset contains a column (LAST INSPECTION DATE) showing the last date a bridge was inspected. When a bridge is inspected, it receives a score based on its condition. This score is called the Bridge Condition Index (BCI). The BCI is a number between 0 and 100, inclusive. You can see the most recent score in the dataset (CURRENT BCI), as well as past scores (the columns with years ranging from 2013 to 2000).
Fixing Bridges
If a bridge is in poor condition, it can be fixed (i.e., "rehabilitated"). These can be major or minor fixes. The dataset includes the year the last major (LAST MAJOR REHAB) or minor (LAST MINOR REHAB) rehabilitation was performed on the bridge.
Bridge Spans
A bridge is made up of one or more spans (# OF SPANS). A span "is the distance between two intermediate supports for a structure, e.g. a beam or a bridge. A span can be closed by a solid beam or by a rope" (Source: Wikipedia (https://en.wikipedia.org/wiki/Span_(engineering)) ). Each span has a length associated with it (SPAN DETAILS). For example, if a bridge has two spans, the SPAN DETAILS data follows the following format:
Total=[total length of all spans] (1)=[the length of the first span];(2)=[the length of the second span]; and so on for each span of the bridge;
There is a semicolon after every span length.
Each span length has a prefix of the form (x)= where x is a number starting from 1 and increasing by 1 for every span.
Span lengths are numeric and greater than zero.
Ontario Bridges in Python
We will represent the dataset as a list of lists in Python (i.e., list[list] ). The outer list has the same length as the number of bridges in the dataset. Each inner list (i.e., list ) corresponds to one row (i.e., bridge) of the dataset. For example, here is what the first bridge of our dataset will look like in our Python program:
>>> MISSING_BCI = -1.0
>>> first_bridge = [
...     1, 'Highway 24 Underpass at Highway 403',
...     '403', 43.167233, -80.275567, '1965', '2014', '2009', 4,
...     [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012',
...     [['2013', '2012', '2011', '2010', '2009', '2008', '2007',
...       '2006', '2005', '2004', '2003', '2002', '2001', '2000'],
...      [MISSING_BCI, 72.3, MISSING_BCI, 69.5, MISSING_BCI, 70.0, MISSING_BCI,
...       70.3, MISSING_BCI, 70.5, MISSING_BCI, 70.7, 72.9, MISSING_BCI]]
... ]
The variable first_bridge has the general type list . Notice how the elements inside the list are not all the same type. The list includes:
integers (e.g., 1 , 4 )
strings (e.g., '403' , '04/13/2012' )
floats (e.g., 65.0 )
lists (e.g., [12.0, 19.0, 21.0, 12.0] )
You may also notice that first_bridge is different from the first bridge in the dataset file itself. This is because the data has been cleaned to suit our needs. For example:
we have replaced the ID with an integer value.
we have replaced the spans with a list of floats, omitting the total ( [12.0, 19.0, 21.0, 12.0] )
we have replaced the BCI scores with two lists. The first is a list of strings containing the dates. The
second is a parallel list (i.e., has the same length as the list of strings) of floats containing the scores,
where empty scores are assigned the value MISSING_BCI
The data is not magically converted into this "clean" format - you will be implementing functions that transform the text data found in the files into a format that is more useful to our program.
Indexing with Constants
The bridge_functions.py file includes many constants to use when indexing the nested data. Much like Assignment 1, you should be using these constants in the bodies of your functions. For example, what should we write if we wanted to access the year a bridge was built?
Consider the following code that does not use the constants:
>>> # Assume that first_bridge is a list containing data
>>> first_bridge[5]
'1965'
How did I know to use index 5? Am I expected to memorize all the indexes? The answer is no; you should not write code like above. Instead, use the constants to provide context into which data feature you are accessing:
>>> # Assume that first_bridge is a list containing data
>>> first_bridge[COLUMN_YEAR_BUILT]
'1965'
The following table shows how the dataset file and Python data are related through constants that begin with the prefix COLUMN_ . The table also includes the data type that you should expect to find if you were to index a bridge list (like first_bridge ) using that constant.
Column Name          | Constant to use as Index | Data Type
ID                   | COLUMN_ID                | int
STRUCTURE            | COLUMN_NAME              | str
HWY NAME             | COLUMN_HIGHWAY           | str
LATITUDE             | COLUMN_LAT               | float
LONGITUDE            | COLUMN_LON               | float
YEAR BUILT           | COLUMN_YEAR_BUILT        | str
LAST MAJOR REHAB     | COLUMN_LAST_MAJOR_REHAB  | str
LAST MINOR REHAB     | COLUMN_LAST_MINOR_REHAB  | str
# OF SPANS           | COLUMN_NUM_SPANS         | int
SPAN DETAILS         | COLUMN_SPAN_DETAILS      | list[float]
DECK LENGTH          | COLUMN_DECK_LENGTH       | float
LAST INSPECTION DATE | COLUMN_LAST_INSPECTED    | str
CURRENT BCI          | N/A (see below)          | N/A
Remaining Columns    | COLUMN_BCI               | list[list]
Note that the COLUMN_ID in our inner list is an integer, which is very different from the ID column in the
dataset.
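To make the mapping concrete, here is a minimal sketch of how such constants could be defined and used. The index values below are assumptions inferred from the order of the elements in first_bridge; the values that count are the ones defined in bridge_functions.py. The list sample_bridge is a shortened, hypothetical stand-in for a cleaned bridge.

```python
# Assumed values, inferred from the order of the elements in first_bridge.
# Rely on the definitions in bridge_functions.py, not on these.
COLUMN_ID = 0
COLUMN_NAME = 1
COLUMN_HIGHWAY = 2
COLUMN_LAT = 3
COLUMN_LON = 4
COLUMN_YEAR_BUILT = 5
COLUMN_NUM_SPANS = 8
COLUMN_SPAN_DETAILS = 9

# A shortened stand-in for a cleaned bridge (BCI data omitted for brevity).
sample_bridge = [
    1, 'Highway 24 Underpass at Highway 403', '403',
    43.167233, -80.275567, '1965', '2014', '2009', 4,
    [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012',
]

name = sample_bridge[COLUMN_NAME]
year_built = sample_bridge[COLUMN_YEAR_BUILT]
longest_span = max(sample_bridge[COLUMN_SPAN_DETAILS])

print(name)          # Highway 24 Underpass at Highway 403
print(year_built)    # 1965
print(longest_span)  # 21.0
```

Reading sample_bridge[COLUMN_SPAN_DETAILS] tells you what is being accessed; sample_bridge[9] does not.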
Storing BCI Scores
Our inner list does not contain the CURRENT BCI column from the dataset (instead, you will implement a function to find the most recent BCI score). Moreover, the remaining columns in the dataset that contain BCI scores are stored in another list (at index COLUMN_BCI ) with type list[list] . This list contains exactly two lists:
The first list is at INDEX_BCI_YEARS with type list[str] and includes the years in decreasing order. The second list is at INDEX_BCI_SCORES with type list[float] and includes the BCI scores. Empty scores in the dataset have a value of MISSING_BCI .
These two lists are parallel lists and should have the same length. Consider the following example:
>>> # Assume that first_bridge is a list containing data
>>> # Assume that MISSING_BCI refers to the value -1.0
>>> first_bridge[COLUMN_BCI][INDEX_BCI_YEARS]
['2013', '2012', '2011', '2010', '2009', '2008', '2007', '2006', '2005', '2004', '2003', '2002', '2001', '2000']
>>> len(first_bridge[COLUMN_BCI][INDEX_BCI_SCORES])
14
From the example above, we can see that first_bridge has no BCI score in the year 2013 (see index 0 of both lists). But it does have a BCI score of 72.3 in the year 2012 (see index 1 of both lists). Therefore,
first_bridge 's most recent BCI score is 72.3 .
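The idea of parallel lists can be sketched with a small example. The lists below are shortened, made-up stand-ins for the years and scores lists; the point is only that the same index refers to the same inspection in both lists.

```python
MISSING_BCI = -1.0

years = ['2013', '2012', '2011', '2010']
scores = [MISSING_BCI, 72.3, MISSING_BCI, 69.5]

# Walk both lists with one index so that years[i] and scores[i]
# always describe the same inspection.
recorded = []
for i in range(len(years)):
    if scores[i] != MISSING_BCI:
        recorded.append([years[i], scores[i]])

print(recorded)  # [['2012', 72.3], ['2010', 69.5]]
```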
Locations and Calculating Distance
The bridges have their locations represented as a latitude and longitude, which we typically refer to as (lat, lon) for short. If you are curious, you can always search for a specific location online (e.g., with Google Maps):
It is very convenient to be able to calculate the straight-line distance between two locations. But this is actually a little tricky due to the curvature of the earth. We are providing you with the full implementation of a function, calculate_distance , that will accurately return the distance between two (lat, lon) points. You do not need to know how this function works - you only need to know how to use it.
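If you are curious about why the curvature of the earth matters, the sketch below shows one common approach, the haversine formula. It is only an illustration: the names haversine_km and EARTH_RADIUS_KM are hypothetical, and the provided calculate_distance may use a different formula, different units, or a different signature. Always call calculate_distance in your solution.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float,
                 lat2: float, lon2: float) -> float:
    """Return the great-circle distance in kilometres between
    (lat1, lon1) and (lat2, lon2), given in decimal degrees.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))  # 111.2
```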
What to do
At a high-level, your next steps are to:
1. Open the file bridge_functions.py.
2. Make sure that the file is in the same directory as a2_checker.py, and the folder pyta.
   a2/
   ├─── pyta/
        ├─── many pyta files...
   ├─── a2_checker.py
   ├─── bridge_functions.py
   ├─── bridge_data_small.csv
   ├─── bridge_data_large.csv
3. Complete the function definitions in bridge_functions.py.
4. Test your Python file by using the Python shell, running the doctest examples, and running the a2_checker.py.
We have provided you with the function headers, docstring descriptions, and at least one example for the functions you need to implement; you do not need to add or change them. The focus of Assignment 2 is implementing the bodies of these functions and testing them. You can assume that we will only test your functions with inputs that satisfy any preconditions included in those headers. You should be sure to do your own testing with more examples than what we have provided.
You can create your own helper functions (with complete docstrings) if you would like, but you do not have to. There are also functions where you will find it helpful to call another function you already wrote - this kind of function reuse is a great idea, and we will be looking for you to reuse functions in your solution.
This assignment is divided into four parts. In Parts 1 to 3, you will implement functions and test them using sample data that is already in bridge_functions.py . In Part 4, you will implement functions that allow us to clean the data from the original dataset files so that we can use them in Python. Once you are done Part 4, you will be able to test your functions from Parts 1 to 3 with real data!
Part 1
In this part, you will focus on functions that work with the data by searching through it. You should not mutate any of the list inputs to these functions. You should refer to the section Indexing with Constants for help.
1. find_bridge_by_id(list[list], int) -> list
Notes:
This function will be useful in the rest of your program to get the data about one specific bridge. You
should complete it first, then practice using it when you need a specific bridge in the other functions.
You must implement the body of this function according to its description.
You must NOT mutate the list argument(s).
2. find_bridges_in_radius(list[list], float, float, int, list[int]) -> list[int]
Notes:
This function helps us find all the bridges within a certain area. It becomes useful when you reach
Part 3.
You must implement the body of this function according to its description.
You must NOT mutate the list argument(s).
You should use the calculate_distance function in the body. See the Locations and Calculating Distance section for help.
3. get_bridge_condition(list[list], int) -> float
Notes:
You must implement the body of this function according to its description.
You must NOT mutate the list argument(s).
See the Storing BCI Scores section for help.
4. calculate_average_condition(list, int, int) -> float
Notes:
Be careful; the years are stored as strings, not integers, in our Python lists.
You must implement the body of this function according to its description.
You must NOT mutate the list argument(s).
See the Storing BCI Scores section for help.
Review the Week 5 PCRS module on "Parallel Lists and Strings" for help.
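The warning about years being stored as strings can be illustrated with a short sketch. The list below is a made-up example, not data from the starter file.

```python
years = ['2013', '2012', '2011']

# A string never equals an integer, even if it looks like one.
print(years[0] == 2013)       # False
print(int(years[0]) == 2013)  # True

# Convert each year before comparing it against an integer bound.
recent = []
for year in years:
    if int(year) >= 2012:
        recent.append(year)

print(recent)  # ['2013', '2012']
```

Comparing a string with an integer using < or >= raises a TypeError in Python 3, so the conversion is needed before any ordering comparison.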
Part 2
In this part, you will work on functions that mutate the data. Notice that the wording of the docstring descriptions has changed (the descriptions do not begin with the word "Return"). Notice also that the return type for these functions is None , so nothing is returned.
1. inspect_bridge(list[list], int, str, float) -> None
Notes:
You must implement the body of this function according to its description.
Remember that the years a bridge has been inspected are stored in decreasing order. See the
Storing BCI Scores section for help.
Review the PCRS modules on "List Methods" and "Mutability and Aliasing" for help.
2. rehabilitate_bridge(list[list], list[int], str, bool) -> None
Notes:
You must implement the body of this function according to its description.
See the Fixing Bridges section for help.
Review the Week 5 PCRS module on "Mutability and Aliasing" for help.
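The difference between returning a new value and mutating an argument, as described at the start of this part, can be sketched with two small functions. Both functions are hypothetical examples and are not part of the assignment.

```python
def with_extra_score(scores: list[float], score: float) -> list[float]:
    """Return a new list containing score followed by the items of scores.

    >>> original = [72.3, 69.5]
    >>> with_extra_score(original, 80.0)
    [80.0, 72.3, 69.5]
    >>> original
    [72.3, 69.5]
    """
    return [score] + scores


def add_score(scores: list[float], score: float) -> None:
    """Insert score at the front of scores.

    >>> original = [72.3, 69.5]
    >>> add_score(original, 80.0)
    >>> original
    [80.0, 72.3, 69.5]
    """
    scores.insert(0, score)
```

Notice that the docstring of add_score does not begin with "Return", and its example shows no output for the call itself; the change is only visible by looking at the list afterwards.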
Part 3
In this part, you will write functions to implement an algorithm to help pick the sequence of bridges a bridge inspector will visit next. You will do this in two steps.
First, you will implement a function that finds the bridge (from a subset of bridges) that is in the worst condition. Second, you will implement a function that chooses the worst bridge within a certain radius, then the next worst bridge, and so on, until the desired number of bridges have been inspected.
These functions will take time - make sure you start early and, if you are stuck, visit us in office hours. Read the docstrings carefully as they will help you understand the algorithm. It will also be especially helpful here to plan your functions first, on paper or in comments. If you aren't already using the debugger, start now!
1. find_worst_bci(list[list], list[int]) -> int
Notes:
You must implement the body of this function according to its description.
You must NOT mutate the list argument(s).
See the Storing BCI Scores section for help.
2. map_route(list[list], float, float, int, int) -> list[int]
Notes:
You must implement the body of this function according to its description.
You must NOT mutate the list argument(s).
Hint: use a while loop
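A while loop is a natural fit when the number of iterations depends on more than one stopping condition. The toy function below, which is hypothetical and unrelated to bridges, sketches that pattern: it keeps choosing items until it has enough of them or runs out of candidates, and it works on a copy so that its argument is not mutated.

```python
def smallest_values(values: list[float], limit: int) -> list[float]:
    """Return up to limit of the smallest items in values, smallest
    first, without mutating values.

    >>> data = [5.0, 1.0, 3.0]
    >>> smallest_values(data, 2)
    [1.0, 3.0]
    >>> data
    [5.0, 1.0, 3.0]
    """
    remaining = values[:]  # work on a copy so the argument is unchanged
    chosen = []
    # Stop when enough items are chosen OR no candidates are left.
    while len(chosen) < limit and len(remaining) > 0:
        smallest = min(remaining)
        chosen.append(smallest)
        remaining.remove(smallest)
    return chosen
```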
Part 4
In this part, we will finally start working with the real data files. We call this data "raw data" because it has not been processed yet.
The clean_data function is already implemented for you. However, it won't work correctly until you implement the functions below. Once you are done Part 4, then you can start loading the dataset files we have provided you. After that, test your functions using real data instead of just the docstring examples.
Note: clean_data does all of the required file reading for you. You do not need to write any code that opens or reads from a file directly.
You must implement the body of this function according to its description.
See the Bridge Spans section for help.
clean_bci_data(list) -> None
Notes:
You must implement the body of this function according to its description.
See the Storing BCI Scores section for help.
Testing Your Solutions
The last step in the Function Design Recipe is to test your function. You can use the a2_checker.py file to check for style and type contracts. You can use the doctest module to test the examples we have provided you in the docstrings. If you pass all of these tests, it does not mean that your function is 100% correct! You must do your own additional testing (e.g., by calling your functions with different arguments in the shell).
Using a2_checker.py
We are providing a checker module ( a2_checker.py ) that tests three things:
- Whether your code follows the Python style guidelines,
- Whether your functions are named correctly, have the correct number of parameters, and return the correct types,
- Whether your functions are appropriately mutating their inputs (some functions SHOULD mutate, others SHOULD NOT).
To run the checker, open a2_checker.py and run it. Note: the checker file should be in the same directory as bridge_functions.py and the pyta folder, as provided in the starter code zip file. After running the checker, be sure to scroll up to the top of the shell and read all the messages! You can also run this checker in MarkUs.
Using doctest
In this assignment, we have provided you with doctest examples and some example data. These can be used as a quick test to see if your function works for a specific example. However, please note that being correct for one example does not mean your function is 100% correct. Be sure to test your code in other ways -- as with Assignment 1, our own tests that evaluate your solution are hidden.
A quick way to run all the doctest examples automatically is by importing the doctest module and calling one of its functions. We have already included the code to do this for you at the bottom of the starter file:
You can uncomment this code to run doctest automatically when you run your file. We have provided this for you as a potentially helpful tool; however we will not run doctest on your submitted code.
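For reference, a typical way to run doctest from the bottom of a file looks like the sketch below. This is an assumption about the general pattern, not a copy of the starter file, so the exact code you find there may differ. The function double is a hypothetical example included only so that doctest has something to check.

```python
def double(n: int) -> int:
    """Return n multiplied by 2.

    >>> double(4)
    8
    """
    return 2 * n


if __name__ == '__main__':
    import doctest
    # testmod() runs every docstring example in this module and
    # prints a report only for examples whose output does not match.
    doctest.testmod()
```

When every example passes, running the file prints nothing. Seeing no output is the expected result, not a sign that doctest failed to run.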