CSE 450: Code Project 1 (Hiking Trails)

This code project should be completed individually. Refer to Canvas for a code template and example test case input/output pairs. When you’re finished, upload your completed code to codePost for autograding. You may resubmit as many times as you want until the deadline.

1 Motivation & Setting

Suppose that a government wants to create a new, large conservation park for wildlife and tourism. They know where they want to build the visitors’ center and the locations of the natural landmarks they want to make accessible, but they need your help to decide where to put the hiking trails connecting them all. To save on costs, they want to build as little trail as possible while respecting the following design constraints.

• The trails must connect the visitors’ center and all natural landmarks in the park.

• There are certain no-trail regions of the park where trails cannot pass through (e.g., lakes, sensitive

nesting areas, excessively steep terrain).

• Trails connecting two locations of interest must be built in a straight line; they cannot curve or bend.

Once the trail network is built, the park service is also hoping you can help them calculate trail distances and other interesting measurements for their trail maps and brochures.

2 Modeling

You ask the park service for the relevant data, and they provide you with (1) a list of latitude-longitude coordinates for the visitors’ center and natural landmarks, and (2) a list of no-trail regions, each expressed as a pair of latitude-longitude coordinates for the region’s northwest and southeast corners. An example set of locations and no-trail regions from ASU campus might look like:

As you start to think about the problem, you realize that the Minimum-Spanning-Tree problem—which you conveniently studied recently in CSE 450—might be helpful for this hiking trail task. You imagine the park as a graph G = (V,E) where the nodes V = {v0,...,vn−1} are the visitors’ center and natural landmarks, and the edges E represent the possible trails between them. You observe that since trails can be walked in either direction, these edges are undirected. You also realize that some pairs of locations shouldn’t have edges between them because they would involve a possible trail crossing through a no-trail region. You aren’t totally sure at this moment how to test which edges are valid and which should be removed, but you know it’s something you need to solve eventually.

You next consider how to assign edge weights w : E → R+ based on the distance between any pair of locations. You’re more used to working with Cartesian coordinates on a 2D grid. In that setting, you know that the distance between any two points p1 = (x1, y1) and p2 = (x2, y2) is given by

22 d(p1,p2)= (x1−x2) +(y1−y2).

But this hiking trail task deals with latitude-longitude—and the Earth is round! You aren’t sure how to calculate distance over a spherical surface (even if you ignore elevation changes), but after some digging around on the Internet you discover the haversine formula does exactly what you need. Using haversines, the distance between two points p1 = (lat1, lng1) and p2 = (lat2, lng2) is very closely approximated by

d(p1,p2)=2·r⊕ ·arcsin

where r⊕ ≈ 6,367.5 km is the radius of the Earth and the latitudes/longitudes are given in radians.

With this modeling and brainstorming done, you feel ready to start implementing your algorithm.

The codePost environment will autograde your project using python 3.7 in a Linux environment. Use other versions of python at your own risk. You may use any standard packages shipped with core python in your implementation, but you only need os.path (for handling OS-independent filepaths), csv (for reading the input data) and math (for trigonometric functions). The numpy package (v1.21.6) is also available in the codePost environment if you want to use it, but it is not necessary. Structure your directory as:

codeproject1

|--- hikingtrails.py // Start with the template. All implementation goes here. |--- inputs // Directory of .csv input files.

|--- |--- <id1>_landmarks.csv

|--- |--- <id1>_notrails.csv

|--- |--- <id2>_landmarks.csv

|--- |--- <id2>_notrails.csv

|--- |--- ...

According to their documentation, any autograded test on codePost will timeout after 30 seconds. You will not be graded based on the runtime of your code, but it does need to be efficient enough that it does not run longer than codePost will allow.

3.2 Input Data

Each test case involves two input files: landmarks and no-trail regions. Landmarks files have filenames <id> + "_landmarks.csv"

1In reality, even when ignoring elevation changes, the Earth is not a perfect sphere and thus using haversines can yield up to a 0.5% error when calculating distances. Alternative (but more complicated) metrics like Vincenty’s formulae accurately estimate distances—without elevation and assuming an ellipsoidal Earth—within 0.5 mm.

3 Implementation

3.1 Environment, Packages, and Directory Structure

and no-trail regions files have filenames

<id> + "_notrails.csv"

where id is the string identifier of the test case. Since your code will be tested in a Linux environment, it is important to avoid hardcoding your filepaths. For example, loading the file inputs\file.csv on Windows will break on Linux, which expects inputs/file.csv. Instead, use os.path.join("inputs", "file.csv") for OS-independent filepaths.

Each landmarks file contains one latitude-longitude location per row, in degrees. Each row corresponds to a landmark, and the first row specifies the location of the visitors’ center. Each no-trail region file contains two latitude-longitude locations per row indicating the northwest and southeast corners of a rectangular no-trail region. For example, the ASU location data generating the map in Section 2 is:

asu_landmarks.csv: 33.4235199204736,-111.9389895178650 33.4161999268015,-111.9379442781375 33.4204818018029,-111.9339702789114 33.4264120936318,-111.9326212152087 33.4189540042138,-111.9345724451219 33.4153789823921,-111.9285127742504 33.4196990211426,-111.9283729910807

asu_notrails.csv: 33.4173262237920,-111.9326952069888,33.4160122364887,-111.930216085321 33.4255050937388,-111.9366697810203,33.4232222352196,-111.934977523185

Throughout the project, landmarks will be referred to as integers corresponding to their index in the land- marks file; the visitors’ center is landmark v0, the next row is the location of landmark v1, and so on.

3.3 Task 1: Helper Functions

There are two helper functions to implement before tackling the core problems.

haversine_distance(lat1, lng1, lat2, lng2)

- Description. Computes the distance in kilometers between two latitude-longitude locations accounting for the curvature of the Earth using the haversine formula. See Section 2 for details.
- Parameters. Four float values representing two latitude-longitude locations in degrees.
- Returns. The float distance between the two locations in kilometers. intersect(lat1, lng1, lat2, lng2, latNW, lngNW, latSE, lngSE)

- Description. Determines if the line segment between (lat1, lng1) and (lat2, lng2) intersects with the region defined by its northwest corner (latNW, lngNW) and southeast corner (latSE, lngSE).
- Parameters. Eight float values representing four latitude-longitude locations in degrees; the first two locations are the start and endpoints of the line segment and the second two locations are the northwest and southeast corners of the region, respectively.
- Assumptions. You may assume that the region does not contain the north or south pole. You may also assume it does not intersect the prime meridian, i.e., 0◦ longitude.
- Returns. True if the line segment between the specified locations intersects with the specified region; False otherwise.

3.4 Task 2: Graph Construction

The first core task is to construct an undirected graph G = (V, E) representing the landmarks and possible trails between them as discussed in Section 2. You are required to use the provided template Graph class and implement its functions, but how you implement it and what internal data structures you use are up to you. The functions you need to implement for this class are:

Graph.__init__(self, id)

- Description. Initializes a new Graph with nodes, edges, and edge weights based on the input data.
- Parameters. A string identifier for the input data. Corresponds to the <id> part of the corresponding input filenames.
- Returns. An instantiated Graph object. Graph.edge_query(self, i, j)

• Description. Determines if the specified edge (vi,vj) exists in the graph and, if so, what its weight is. • Parameters. Two int indices i and j corresponding to landmarks vi and vj.

• Returns. A float weight of the edge (vi,vj) if the edge exists and None otherwise.

Graph.connected(self)

• Description. Determines if the graph is connected.

• Parameters. N/A.

• Returns. True if the graph is connected and False otherwise.

3.5 Task 3: Trail Network Design

With a graph G instantiated, the next core task is to determine the optimal trail network using an algorithm for Minimum-Spanning-Tree. You may use any of the three algorithms we covered in class (Kruskal’s, Reverse- Delete, or Prim’s). Your implementation does not necessarily have to be optimized, but it does need to run fast enough to avoid timing out (see Section 3.1). You are required to use the provided template MST class and implement its functions, but you can implement it however you want.

MST.__init__(self, graph)

- Description. Initializes a new MST by computing the minimum spanning tree of the input Graph object.
- Parameters. A Graph object.
- Exceptions. Raises a ValueError if the input graph is disconnected.
- Assumptions. You may assume that the edge weights of the input Graph object are distinct and thus the resulting minimum spanning tree is unique.
- Returns. An instantiated MST object. MST.edge_query(self, i, j)

• Description. Determines if the specified edge (vi,vj) exists in the minimum spanning tree. • Parameters. Two int indices i and j corresponding to landmarks vi and vj.

• Returns. True if the edge (vi,vj) is in the minimum spanning tree and False otherwise.

• Description. Computes w(e) for the minimum spanning tree T.

• Parameters. N/A.

• Returns. The float total length of the minimum spanning tree, in kilometers.

MST.visitor_length(self, i)

• Description. Computes the walking distance from the visitors’ center to landmark vi along the trails. • Parameters. An int index i corresponding to landmark vi.

• Returns. A float distance from the visitors’ center to vi, in kilometers.

4 Grading and Evaluation

This project is graded out of 80 points, with the opportunity to earn an extra 20 points of extra credit (maximum score 100/80). The grade breakdown is:

- Eight for-credit codePost test cases worth 10 points each. Each test case is autograded in an all-or- nothing, pass-or-fail manner. There is no partial credit.
- One extra-credit codePost test case worth 10 points. This test case has significantly more landmarks than the for-credit cases and thus rewards carefully optimized code.
- One extra-credit visualization task worth 10 points. The goal here is to produce a plot visualizing the landmarks, no-trail regions, and computed minimum spanning tree for a very large instance. I recommend using the matplotlib package for python, but you are free to choose your preferred tools.

The instance to visualize is Yellowstone National Park, corresponding to the yellowstone input files included in the .zip file on Canvas. Your completed visualization should have a dot for each landmark, a special dot for the visitors’ center, boxes for each no-trail region, and edges connecting dots for your minimum spanning tree. Upload your completed figure to Canvas as a .png or .jpg for extra credit.