COMP2420/COMP6420 - 2022 Sample - Q4 Linear Regression Model

Q4) (20 marks)

The Diabetes dataset contains ten baseline variables - age, sex, body mass index, average blood pressure, and six blood serum measurements - obtained for each of n = 442 diabetes patients, as well as a quantitative measure of disease progression one year after baseline (column named 'Y').

Your task is to predict the disease progression for each patient based on the given data. The data has already been split into training and test sets: diab_train is the training data and diab_test is the testing data.

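### Do not edit this cell
# Imports assumed from an earlier cell; repeated here so the cell runs on its own.
import numpy as np
import pandas as pd

diab = pd.read_csv('data/diabetes.tab.txt', delimiter='\t')
print(diab.describe())
# Shuffle the rows, then take the first 300 for training and the last 142 for testing.
diab = diab.iloc[np.random.permutation(len(diab))]
diab_train = diab.head(300)
diab_test = diab.tail(142)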

a) (10 marks) Build a linear regression model to predict Y, using diab_train as training data and diab_test as testing data. Use 9 of the 10 features when building your model, and choose which 9 based on the performance of your models. Report the test error and the coefficients of your best model.
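One possible way to approach part a) is sketched below; it is not the official solution. It fits ordinary least squares with `np.linalg.lstsq`, tries dropping each feature in turn, and keeps the 9-feature subset with the lowest error. The function names (`fit_ols`, `predict_ols`, `mse`, `best_nine_feature_model`) are hypothetical, and the 80/20 hold-out inside the training data is an assumption made here so that the test set is not used to choose features; the question itself does not specify how performance should be measured.

```python
import numpy as np
import pandas as pd


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Apply fitted coefficients to a feature matrix."""
    return beta[0] + X @ beta[1:]


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def best_nine_feature_model(train, test, target="Y"):
    """Drop each feature in turn, keep the subset with the lowest validation
    MSE, then refit on all of `train` and score once on `test`."""
    features = [c for c in train.columns if c != target]

    # Assumption: hold out the last 20% of the (already shuffled) training
    # rows for model selection, so the test set is not used to pick features.
    n_fit = int(0.8 * len(train))
    fit_part, val_part = train.iloc[:n_fit], train.iloc[n_fit:]

    val_errors = {}
    for dropped in features:
        cols = [c for c in features if c != dropped]
        beta = fit_ols(fit_part[cols].to_numpy(float),
                       fit_part[target].to_numpy(float))
        pred = predict_ols(beta, val_part[cols].to_numpy(float))
        val_errors[dropped] = mse(val_part[target].to_numpy(float), pred)

    # Feature whose removal hurts validation error the least.
    dropped = min(val_errors, key=val_errors.get)
    cols = [c for c in features if c != dropped]

    # Refit on the full training set and evaluate once on the test set.
    beta = fit_ols(train[cols].to_numpy(float), train[target].to_numpy(float))
    test_pred = predict_ols(beta, test[cols].to_numpy(float))
    test_mse = mse(test[target].to_numpy(float), test_pred)
    coefs = pd.Series(beta, index=["intercept"] + cols)
    return dropped, coefs, test_mse
```

With the frames from the setup cell, this would be called as `dropped, coefs, test_mse = best_nine_feature_model(diab_train, diab_test)`. Because the rows are shuffled randomly, the dropped feature and the reported error can differ between runs.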

b) (10 marks) Modify your best model in part a) to now predict whether the progression of disease is dangerous or not. Consider values greater than 200 for disease progression to be dangerous. Report the results on the test data.
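A minimal sketch of one reading of part b): keep the regression model, label a patient as dangerous when the predicted Y exceeds 200, and compare against the true Y thresholded the same way. The helper name `dangerous_report` and the choice of accuracy plus confusion counts as "the results" are assumptions; the question does not say which metrics to report, and fitting a separate classifier would be another valid interpretation.

```python
import numpy as np


def dangerous_report(y_true, y_pred, threshold=200.0):
    """Threshold true and predicted progression values at `threshold`
    and return accuracy together with the confusion-matrix counts."""
    actual = np.asarray(y_true, dtype=float) > threshold
    predicted = np.asarray(y_pred, dtype=float) > threshold

    tp = int(np.sum(predicted & actual))     # dangerous, flagged
    tn = int(np.sum(~predicted & ~actual))   # not dangerous, not flagged
    fp = int(np.sum(predicted & ~actual))    # not dangerous, flagged
    fn = int(np.sum(~predicted & actual))    # dangerous, missed

    return {
        "accuracy": (tp + tn) / len(actual),
        "tp": tp,
        "tn": tn,
        "fp": fp,
        "fn": fn,
    }
```

Here `y_true` would be `diab_test['Y']` and `y_pred` the regression predictions on `diab_test`. Since only a minority of patients are likely to exceed the threshold, the false-negative count is worth reporting alongside accuracy.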
