Homework 1: Math Foundations for ML
This is the coding portion of Homework 1. The homework tests your ability to code up mathematical operations using Python and the numpy library.
For each problem, we provide hints or example test cases to check your answers (see the assert statements below). Your full submission will be autograded on a larger batch of randomly generated test cases.
Note on the autograding process
For this assignment, we are using nbgrader for autograding. We recommend that you use JupyterLab or Jupyter notebook to complete this assignment for compatibility.
The cells containing example test cases also serve as placeholders for the autograder to know where to inject additional random tests. Notice that they are always after your solution; moving/deleting them will cause the tests to fail, and we'd have to manually regrade it. They are marked with DO NOT MOVE/DELETE and set to read-only just in case.
The autograder tests will call the functions named solve_system, split_into_train_and_test, closest_interval. You may not change the function signature (function name and argument list), but otherwise feel free to add helper functions in your solution. You can also make a copy of the notebook and use that as a scratchpad.
To double check your submission format, restart your kernel (Menu bar -> Kernel -> Restart Kernel); execute all cells from top to bottom, and see if you can pass the example test cases.
In [ ]:
import numpy as np
Part 1: Systems of linear equations
Given $n$ equations with $n$ unknown variables ($n \le 4$), write a function solve_system that solves the system and returns a value for each variable such that every equation is satisfied.
The system of equations will be provided as a list of strings as seen in test_eq.
You may assume that the variables are always in $\{a, b, c, d\}$, the system has a unique solution, and all coefficients are integers.
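As a hint for the numerical step only, here is a minimal sketch assuming the coefficients have already been extracted from the strings. The 2-variable system, the names `A`, `rhs`, and `solution`, and the choice of `np.linalg.solve` are illustrative assumptions, not a required approach. Parsing the equation strings is left to you.

```python
import numpy as np

# Hypothetical 2-variable system, written by hand rather than parsed from strings:
#   a + b = 3
#   a - b = 1
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])   # one row per equation, one column per variable
rhs = np.array([[3.0],
                [1.0]])       # right-hand sides as a column vector

# Solve A @ solution = rhs; the result has one row per variable
solution = np.linalg.solve(A, rhs)
```

Because `rhs` is a column vector, `solution` comes back as a column vector too, which matches the shape of `expected` in the example test case.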
In [ ]:
def solve_system(equations):
    """
    Takes in a list of strings, one for each equation.
    Returns a numpy array of shape (n, 1) with one row per variable value,
    in the order a, b, c, d.
    """
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# === DO NOT MOVE/DELETE ===
# This cell is used as a placeholder for autograder script injection.
def test_eq(sys_eq):
    results = solve_system(sys_eq)
    expected = np.array([[3],[5],[2],[4]])
    assert np.allclose(expected, results)

test_eq([
    '2 a + b - 3 c + d = 9',
    '-5 a + b - 4 c + d = -14',
    'a + 2 b - 10 c = -7',
    'a + 2 b = 13',
])
Part 2: Split a dataset into test and train
(For this question, using an existing implementation (e.g. sklearn.model_selection.train_test_split) will give 0 points.)
In supervised learning, the dataset is usually split into a train set (on which the model is trained) and a test set (on which the trained model is evaluated). In this part, write a function split_into_train_and_test that takes a dataset and the fraction of rows to put in the test set (frac_test) and returns the train and test splits. The function also takes a seed argument: calling the function repeatedly with the same seed must produce the same split.
Note: if frac_test does not result in an integer test set size, round down to the nearest integer.
Hints:
- The input array x_all_LF must not be altered by the function call.
- Running the function with the same seed multiple times should yield the same results.
- Every element in the input array should appear either in the train or test set, but not in both.
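The hints above can be illustrated in isolation. This is a minimal sketch, assuming a seeded `np.random.RandomState` and a row permutation are used. The variable names and the values `frac_test = 0.25` and seed `42` are hypothetical, and this is not the full function.

```python
import numpy as np

n_rows = 10
frac_test = 0.25

# Round down to the nearest integer: 0.25 * 10 = 2.5 -> 2 test rows
n_test = int(np.floor(frac_test * n_rows))

# Two generators built from the same seed produce the same permutation
perm_a = np.random.RandomState(42).permutation(n_rows)
perm_b = np.random.RandomState(42).permutation(n_rows)

# Indexing with an integer array returns a copy, so x itself is not modified
x = np.arange(n_rows * 2).reshape(n_rows, 2)
x_before = x.copy()
shuffled = x[perm_a]
```

Since a permutation contains every row index exactly once, slicing it into two disjoint pieces guarantees that each row ends up in exactly one of the two sets.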
In [ ]:
def split_into_train_and_test(x_all_LF, frac_test=0.5, seed=None):
    ''' Divide provided array into train and test sets along first dimension

    https://stackoverflow.com/questions/28064634/random-state-pseudo-random-numberin-scikit-learn
    '''
    # Build the random number generator; a fixed seed gives a reproducible split
    if seed is None:
        rng = np.random.RandomState()
    else:
        rng = np.random.RandomState(seed)
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# === DO NOT MOVE/DELETE ===
# This cell is used as a placeholder for autograder script injection.
N = 10
x_LF = np.eye(N)
xcopy_LF = x_LF.copy() # preserve what input was before the call
train_MF, test_NF = split_into_train_and_test(x_LF, frac_test=0.2, seed=0)
Part 3: Solving a Search Problem
Given a list of N intervals, for each interval $[a, b]$, we want to find the closest non-overlapping interval $[c, d]$ greater than $[a, b]$.
An interval $[c, d]$ is greater than a non-overlapping interval $[a, b]$ if $a < b < c < d$.
The function closest_interval takes in the list of intervals, and returns a list of indices corresponding to the index of the closest non-overlapping interval for each interval in the list. If a particular interval does not have a closest non-overlapping interval in the given list, return -1 corresponding to that element in the list.
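One possible building block is a sorted search over the interval start points. The sketch below assumes that "closest" means the smallest start point strictly greater than a given end point, which is consistent with the example test case but is an interpretation. The array `starts` and the helper `first_start_after` are hypothetical names used only for illustration.

```python
import numpy as np

starts = np.array([1, 2, 8, 6, 9])   # hypothetical interval start points
order = np.argsort(starts)            # original indices, sorted by start point
sorted_starts = starts[order]         # [1, 2, 6, 8, 9]

def first_start_after(end):
    """Return the original index of the smallest start strictly greater than end, or -1."""
    # side='right' skips past values equal to `end`, enforcing the strict inequality
    pos = np.searchsorted(sorted_starts, end, side='right')
    return int(order[pos]) if pos < len(order) else -1

idx_after_6 = first_start_after(6)   # start point 8 lives at original index 2
idx_after_9 = first_start_after(9)   # no start point is greater than 9
```

Sorting once and searching per interval keeps the work at O(N log N) instead of comparing every pair of intervals.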
In [ ]:
def closest_interval(intervals):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# === DO NOT MOVE/DELETE ===
# This cell is used as a placeholder for autograder script injection.
intervals = np.array([
    [1, 4],
    [2, 5],
    [8, 9],
    [6, 8],
    [9, 10],
    [3, 4],
    [7, 9],
    [5, 7],
])
results = closest_interval(intervals)
# Evaluate
expected_closest_intervals = np.array([7, 3, -1, 4, -1, 7, -1, 2])
assert np.allclose(expected_closest_intervals, results)