Machine Learning Homework 1: Math Foundations for ML


Homework 1: Math Foundations for ML

This is the coding portion of Homework 1. The homework tests your ability to code up mathematical operations using Python and the numpy library.

For each problem, we provide hints or example test cases to check your answers (see the assert statements below). Your full submission will be autograded on a larger batch of randomly generated test cases.

Note on the autograding process

For this assignment, we are using nbgrader for autograding. We recommend that you use JupyterLab or Jupyter notebook to complete this assignment for compatibility.

The cells containing example test cases also serve as placeholders for the autograder to know where to inject additional random tests. Notice that they are always after your solution; moving/deleting them will cause the tests to fail, and we'd have to manually regrade it. They are marked with DO NOT MOVE/DELETE and set to read-only just in case.

The autograder tests will call the functions named solve_system, split_into_train_and_test, and closest_interval. You may not change the function signature (function name and argument list), but otherwise feel free to add helper functions in your solution. You can also make a copy of the notebook and use that as a scratchpad.

To double check your submission format, restart your kernel (Menu bar -> Kernel -> Restart Kernel); execute all cells from top to bottom, and see if you can pass the example test cases.

In [ ]:

import numpy as np

Part 1: Systems of linear equations

Given $n$ equations with $n$ unknown variables ($n \leq 4$), write a function solve_system that solves this system of equations and outputs a value for each variable such that the system of equations is satisfied.

The system of equations will be provided as a list of strings as seen in test_eq.

You may assume that the variables are always in $\{a, b, c, d\}$, the system has a unique solution, and all coefficients are integers.
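The parsing and solving steps described above can be sketched as follows. This is only one possible approach, not the reference solution: the helper names parse_equation and solve_system_sketch are hypothetical, the regular expression assumes terms formatted like those in test_eq (an optional sign, an optional integer, then a variable name), and the code assumes the variables that actually appear are ordered alphabetically in the output.

```python
import re
import numpy as np

VARIABLES = "abcd"  # variable names allowed by the problem statement


def parse_equation(equation, variables=VARIABLES):
    """Turn a string like '2 a + b - 3 c = 9' into (coefficient row, right-hand side)."""
    lhs, rhs = equation.split("=")
    row = np.zeros(len(variables))
    # Each term: optional sign, optional integer coefficient, variable name.
    for sign, digits, var in re.findall(r"([+-]?)\s*(\d*)\s*([a-d])", lhs):
        coeff = int(digits) if digits else 1  # a bare variable has coefficient 1
        if sign == "-":
            coeff = -coeff
        row[variables.index(var)] += coeff
    return row, float(rhs)


def solve_system_sketch(equations):
    """Hypothetical helper: build A x = b from the strings and solve it."""
    rows, rhs = zip(*(parse_equation(eq) for eq in equations))
    A = np.array(rows)
    # Assumption: keep only the columns of variables that appear in the system,
    # so that A is square when fewer than four variables are used.
    used = np.any(A != 0, axis=0)
    b = np.array(rhs).reshape(-1, 1)
    return np.linalg.solve(A[:, used], b)
```

np.linalg.solve requires a square, non-singular matrix, which matches the stated assumption that the system has a unique solution.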

In [ ]:

def solve_system(equations):
    """
    Takes in a list of strings, one per equation.
    Returns a numpy array with one row per variable, holding that variable's value.
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:

# === DO NOT MOVE/DELETE ===
# This cell is used as a placeholder for autograder script injection.

def test_eq(sys_eq):
    results = solve_system(sys_eq)
    expected = np.array([[3],[5],[2],[4]])
    assert np.allclose(expected, results)

test_eq([
    '2 a + b - 3 c + d = 9',
    '-5 a + b - 4 c + d = -14',
    'a + 2 b - 10 c = -7',
    'a + 2 b = 13',
])

Part 2: Split a dataset into test and train

(For this question, using an existing implementation (e.g. sklearn.model_selection.train_test_split) will give 0 points.)

In supervised learning, the dataset is usually split into a train set (on which the model is trained) and a test set (to evaluate the trained model). In this part, write a function split_into_train_and_test that takes a dataset and the fraction of it to use for testing, and returns the train and test splits. The function also takes a seed argument; calling it repeatedly with the same seed should produce the same split.

Note: if frac_test does not result in an integer test set size, round down to the nearest integer.

Hints:

The input array x_all_LF should not be altered after the function call.
Running the function with the same seed multiple times should yield the same results.
Every element in the input array should appear either in the train or test set, but not in both.
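One way to satisfy all three hints is to shuffle row indices rather than the rows themselves. The sketch below is illustrative rather than the graded solution: the name split_sketch is hypothetical, and it assumes the function returns the train set first and the test set second, as in the example call in the test cell.

```python
import numpy as np


def split_sketch(x_all_LF, frac_test=0.5, seed=None):
    """Hypothetical helper: split rows of x_all_LF into (train, test)."""
    # seed=None gives an unseeded generator; a fixed seed gives a repeatable split.
    rng = np.random.RandomState(seed)
    L = x_all_LF.shape[0]
    n_test = int(np.floor(frac_test * L))  # round down, per the note above
    # Permute the indices 0..L-1; the input array itself is never modified.
    shuffled_ids = rng.permutation(L)
    test_ids = shuffled_ids[:n_test]
    train_ids = shuffled_ids[n_test:]
    # Integer-array indexing returns new arrays, not views of the input.
    return x_all_LF[train_ids], x_all_LF[test_ids]
```

Because the two index sets partition a single permutation, every row lands in exactly one of the two outputs.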

In [ ]:

def split_into_train_and_test(x_all_LF, frac_test=0.5, seed=None):
    ''' Divide provided array into train and test sets along first dimension
    https://stackoverflow.com/questions/28064634/random-state-pseudo-random-numberin-scikit-learn
    '''
    if seed is None:
        rng = np.random.RandomState()

    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:

# === DO NOT MOVE/DELETE ===
# This cell is used as a placeholder for autograder script injection.

N = 10
x_LF = np.eye(N)
xcopy_LF = x_LF.copy() # preserve what input was before the call
train_MF, test_NF = split_into_train_and_test(x_LF, frac_test=0.2, seed=0)

Part 3: Solving a Search Problem

Given a list of N intervals, for each interval $[a, b]$, we want to find the closest non-overlapping interval $[c, d]$ greater than $[a, b]$.

An interval $[c, d]$ is greater than a non-overlapping interval $[a, b]$ if $a < b < c < d$.

The function closest_interval takes in the list of intervals and returns a list of indices, where each entry is the index of the closest non-overlapping greater interval for the corresponding input interval. If an interval has no such interval in the given list, the entry for that interval should be -1.
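The search can be sketched with a sort followed by a binary search. The text does not define "closest" precisely, so this sketch assumes it means the greater interval whose start $c$ is smallest (that is, the smallest gap $c - b$), and that ties between equal starts go to the lower original index. The name closest_interval_sketch is hypothetical, and this is one possible approach rather than the expected solution.

```python
import numpy as np


def closest_interval_sketch(intervals):
    """Hypothetical helper: for each [a, b], index of the interval with the
    smallest start c such that c > b, or -1 if there is none."""
    intervals = np.asarray(intervals)
    starts, ends = intervals[:, 0], intervals[:, 1]
    # Original indices ordered by start point (stable, so ties keep input order).
    order = np.argsort(starts, kind="stable")
    sorted_starts = starts[order]
    # For each end b, position of the first sorted start strictly greater than b.
    pos = np.searchsorted(sorted_starts, ends, side="right")
    result = np.full(len(intervals), -1)
    found = pos < len(intervals)  # positions past the end mean no greater interval
    result[found] = order[pos[found]]
    return result.tolist()


example = [[1, 4], [2, 5], [8, 9], [6, 7]]
print(closest_interval_sketch(example))  # [3, 3, -1, 2]
```

Sorting costs O(N log N) and each lookup is O(log N), which avoids comparing every pair of intervals.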

In [ ]:

def closest_interval(intervals):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:

# === DO NOT MOVE/DELETE ===
# This cell is used as a placeholder for autograder script injection.

intervals = np.array([
    [1, 4],
    [2, 5],
    [8, 9],