K6312 Take-home Assignment I
Import Libraries
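Neither the imports nor the data-loading code appear in this export; the outputs point to pandas, NumPy and scikit-learn. A minimal loading sketch follows. It assumes the data is a semicolon-separated CSV. The file name `student-mat.csv` is an assumption, and the inline two-row excerpt is hypothetical, used so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the real file; the notebook presumably reads
# something like pd.read_csv("student-mat.csv", sep=";") (file name is an assumption)
raw = io.StringIO(
    'school;sex;age;G1;G2;G3\n'
    '"GP";"F";18;5;6;6\n'
    '"GP";"F";17;5;5;6\n'
)
df_data = pd.read_csv(raw, sep=";")

# (rows, columns), the same kind of tuple as the (395, 33) printed in the notebook
print(df_data.shape)
```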
(395, 33)
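| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 393 | MS | M | 18 | R | LE3 | T | 3 | 2 | services | other | ... | 4 | 4 | 1 | 3 | 4 | 5 | 0 | 11 | 12 | 10 |
| 394 | MS | M | 19 | U | LE3 | T | 1 | 1 | other | at_home | ... | 3 | 2 | 3 | 3 | 3 | 5 | 5 | 8 | 9 | 9 |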
4 rows × 33 columns
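The notebook builds these head/tail previews with `df_data.head(2).append(df_data.tail(2))`, and each call raised a FutureWarning because `DataFrame.append` was deprecated (pandas 2.0 removed it). A minimal replacement using `pandas.concat`, shown on a toy frame with hypothetical values; the helper name `head_tail` is illustrative:

```python
import pandas as pd


def head_tail(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Stack the first n and last n rows, replacing df.head(n).append(df.tail(n))."""
    return pd.concat([df.head(n), df.tail(n)])


# Toy frame standing in for df_data (hypothetical values)
df_demo = pd.DataFrame({"age": [18, 17, 15, 15, 18, 19], "G3": [6, 6, 10, 15, 10, 9]})
preview = head_tail(df_demo)
print(preview)  # rows labelled 0, 1, 4, 5
```

Note that, like the original idiom, this repeats rows when `n` exceeds half the frame length.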
(357, 33)
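| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | Grade |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 393 | MS | M | 18 | R | LE3 | T | 3 | 2 | services | other | ... | 4 | 4 | 1 | 3 | 4 | 5 | 0 | 11 | 12 | 10 |
| 394 | MS | M | 19 | U | LE3 | T | 1 | 1 | other | at_home | ... | 3 | 2 | 3 | 3 | 3 | 5 | 5 | 8 | 9 | 9 |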
4 rows × 33 columns
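At this step the frame shrinks from 395 to 357 rows and the final-grade column `G3` is renamed to `Grade`, while the original row labels (393, 394) are kept. The notebook does not show which rows it drops. The sketch below uses a hypothetical mask, `G3 > 0`, purely to illustrate boolean filtering followed by `DataFrame.rename`. It runs on toy data with hypothetical values.

```python
import pandas as pd

# Toy stand-in for df_data (hypothetical values, column names from the notebook)
df_data = pd.DataFrame({
    "studytime": [2, 2, 2, 3, 2],
    "G2": [6, 5, 0, 14, 10],
    "G3": [6, 6, 0, 15, 10],
})

# Hypothetical filter: the notebook's actual criterion (395 -> 357 rows) is not shown
mask = df_data["G3"] > 0
df_data = df_data[mask]  # boolean indexing keeps the original row labels

# Rename the target column to match the 'Grade' header in the notebook output
df_data = df_data.rename(columns={"G3": "Grade"})
print(df_data.shape)
```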
Index(['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu', 'Fedu',
'Mjob', 'Fjob', 'reason', 'guardian', 'traveltime', 'studytime',
'failures', 'schoolsup', 'famsup', 'paid', 'activities', 'nursery',
'higher', 'internet', 'romantic', 'famrel', 'freetime', 'goout', 'Dalc',
'Walc', 'health', 'absences', 'G1', 'G2', 'Grade'],
dtype='object')
Index(['age', 'Medu', 'Fedu', 'traveltime', 'studytime', 'failures', 'famrel',
'freetime', 'goout', 'Dalc', 'Walc', 'health', 'absences', 'G1', 'G2',
'Grade', 'school_GP', 'school_MS', 'sex_F', 'sex_M', 'address_R',
'address_U', 'famsize_GT3', 'famsize_LE3', 'Pstatus_A', 'Pstatus_T',
'Mjob_at_home', 'Mjob_health', 'Mjob_other', 'Mjob_services',
'Mjob_teacher', 'Fjob_at_home', 'Fjob_health', 'Fjob_other',
'Fjob_services', 'Fjob_teacher', 'reason_course', 'reason_home',
'reason_other', 'reason_reputation', 'guardian_father',
'guardian_mother', 'guardian_other', 'schoolsup_no', 'schoolsup_yes',
'famsup_no', 'famsup_yes', 'paid_no', 'paid_yes', 'activities_no',
'activities_yes', 'nursery_no', 'nursery_yes', 'higher_no',
'higher_yes', 'internet_no', 'internet_yes', 'romantic_no',
'romantic_yes'],
dtype='object')
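| | age | Medu | Fedu | traveltime | studytime | failures | famrel | freetime | goout | Dalc | ... | activities_no | activities_yes | nursery_no | nursery_yes | higher_no | higher_yes | internet_no | internet_yes | romantic_no | romantic_yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 4 | 4 | 2 | 2 | 0 | 4 | 3 | 4 | 1 | ... | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 1 | 17 | 1 | 1 | 1 | 2 | 0 | 5 | 3 | 3 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 393 | 18 | 3 | 2 | 3 | 1 | 0 | 4 | 4 | 1 | 3 | ... | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 394 | 19 | 1 | 1 | 1 | 1 | 0 | 3 | 2 | 3 | 3 | ... | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |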
4 rows × 59 columns
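The encoded column names (`school_GP`, `sex_F`, `higher_yes` and so on) and their order, numeric columns first and then one indicator per category, match the default behaviour of `pandas.get_dummies` without `drop_first`. That would explain the jump from 33 to 59 columns: the 16 numeric columns are kept and the 17 categorical columns expand into 43 indicators. A minimal sketch on a toy frame, assuming `get_dummies` was used; the values are hypothetical.

```python
import pandas as pd

# Toy stand-in for df_data (hypothetical values)
df_data = pd.DataFrame({
    "age": [18, 17, 15],
    "school": ["GP", "GP", "MS"],
    "higher": ["yes", "yes", "no"],
    "Grade": [6, 6, 10],
})

# One-hot encode every string column; non-encoded columns stay first, in original order.
# dtype=int keeps 0/1 integers (pandas 2.x otherwise returns bool indicator columns).
df_used = pd.get_dummies(df_data, dtype=int)
print(list(df_used.columns))
# ['age', 'Grade', 'school_GP', 'school_MS', 'higher_no', 'higher_yes']
```

Keeping both `_no` and `_yes` columns makes each pair perfectly collinear with an intercept; `drop_first=True` is the usual option if that matters for a linear model.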
Split the dataframe into a features dataframe and a target dataframe
Index(['age', 'Medu', 'Fedu', 'traveltime', 'studytime', 'failures', 'famrel',
'freetime', 'goout', 'Dalc', 'Walc', 'health', 'absences', 'G1', 'G2',
'school_GP', 'school_MS', 'sex_F', 'sex_M', 'address_R', 'address_U',
'famsize_GT3', 'famsize_LE3', 'Pstatus_A', 'Pstatus_T', 'Mjob_at_home',
'Mjob_health', 'Mjob_other', 'Mjob_services', 'Mjob_teacher',
'Fjob_at_home', 'Fjob_health', 'Fjob_other', 'Fjob_services',
'Fjob_teacher', 'reason_course', 'reason_home', 'reason_other',
'reason_reputation', 'guardian_father', 'guardian_mother',
'guardian_other', 'schoolsup_no', 'schoolsup_yes', 'famsup_no',
'famsup_yes', 'paid_no', 'paid_yes', 'activities_no', 'activities_yes',
'nursery_no', 'nursery_yes', 'higher_no', 'higher_yes', 'internet_no',
'internet_yes', 'romantic_no', 'romantic_yes'],
dtype='object')
Index(['Grade'], dtype='object')
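The features keep every encoded column except `Grade`. The target prints as `Index(['Grade'])`, so it appears to be a one-column DataFrame rather than a Series. A minimal sketch of that split, on toy data with hypothetical values:

```python
import pandas as pd

# Toy stand-in for df_used (hypothetical values)
df_used = pd.DataFrame({"age": [18, 17, 15], "studytime": [2, 2, 3], "Grade": [6, 6, 10]})

# Features: every column except the target
X = df_used.drop(columns=["Grade"])
# Target: double brackets keep a one-column DataFrame (so y.columns exists)
y = df_used[["Grade"]]
print(X.columns)
print(y.columns)
```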
Split data into training/testing sets
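The split settings are not printed. The shuffled row labels in the later training preview (375, 121, 339 and so on) suggest a random split. A minimal sketch using `DataFrame.sample` follows; the 80/20 ratio and seed are hypothetical. scikit-learn's `train_test_split(X, y, test_size=..., random_state=...)` is the more common one-liner for the same job.

```python
import pandas as pd

# Toy features/target (hypothetical values)
X = pd.DataFrame({"studytime": [1, 2, 3, 4, 2, 3, 1, 4, 2, 3]})
y = pd.DataFrame({"Grade": [8, 10, 12, 14, 9, 11, 7, 15, 10, 12]})

# Hypothetical 80/20 split with a fixed seed for reproducibility
X_train = X.sample(frac=0.8, random_state=42)
X_test = X.drop(index=X_train.index)  # the remaining rows form the test set

# Select target rows by the same index labels so features and target stay aligned
y_train = y.loc[X_train.index]
y_test = y.loc[X_test.index]
print(len(X_train), len(X_test))
```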
Baseline for Student Grade Prediction
Baseline on test data, MAE is 2.83
Baseline on test data, RMSE is 3.51
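In the original run, the baseline cell emitted a FutureWarning about `DataFrame.mean(axis=None)` raised from NumPy's `mean`. That suggests the baseline predicts the mean training grade for every test student. A minimal sketch of such a mean baseline follows, assuming that reading and using toy targets with hypothetical values. Taking the mean of the `Grade` column as a Series also avoids the warning.

```python
import numpy as np
import pandas as pd

# Toy train/test targets (hypothetical values)
y_train = pd.DataFrame({"Grade": [8, 10, 12, 14]})
y_test = pd.DataFrame({"Grade": [9, 14]})

# Predict the training mean for every test row
baseline = y_train["Grade"].mean()               # 11.0
errors = y_test["Grade"].to_numpy() - baseline   # [-2.0, 3.0]

mae = np.mean(np.abs(errors))                    # mean absolute error: 2.5
rmse = np.sqrt(np.mean(errors ** 2))             # root mean squared error: sqrt(6.5)
print(f"Baseline on test data, MAE is {mae:.2f}")    # 2.50
print(f"Baseline on test data, RMSE is {rmse:.2f}")  # 2.55
```

RMSE squares each error before averaging, so it is never below MAE and weighs large misses more heavily.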
Let's build a linear regression model that predicts the student's grade from three features: studytime, higher_yes and traveltime.
| | studytime | higher_yes | traveltime |
|---|---|---|---|
| 375 | 3 | 1 | 4 |
| 121 | 4 | 1 | 1 |
| 339 | 2 | 1 | 1 |
| 199 | 2 | 1 | 1 |
| 282 | 4 | 1 | 2 |
LinearRegression()
Using Linear Regression, MAE is 2.96
Using Linear Regression, RMSE is 3.54
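On this test split, the three-feature model scores slightly worse than the mean baseline (MAE 2.96 vs 2.83, RMSE 3.54 vs 3.51). scikit-learn's `LinearRegression` fits ordinary least squares. The sketch below reproduces that kind of fit with `numpy.linalg.lstsq` so the role of the intercept `w0` is explicit. It is not the notebook's code. It uses toy, noise-free data with hypothetical values, so the recovered weights can be checked against the ones that generated the targets.

```python
import numpy as np

# Toy design matrix with columns [studytime, higher_yes, traveltime] (hypothetical values)
X = np.array([
    [1, 1, 1],
    [2, 1, 2],
    [3, 0, 1],
    [4, 1, 3],
    [2, 0, 4],
    [3, 1, 2],
], dtype=float)

# Noise-free targets generated from known weights, so the fit should recover them
true_w0 = 8.0
true_w = np.array([0.5, 2.0, -0.3])
y = true_w0 + X @ true_w

# Prepend a column of ones so the first fitted coefficient is the intercept w0
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w0, w = coef[0], coef[1:]

print("w0:", w0)
print("w1,w2,w3:", w)
```

With scikit-learn, the same quantities are `model.intercept_` and `model.coef_`. Because the target is a one-column DataFrame, `coef_` has shape (1, 3) and `intercept_` has shape (1,), which matches the nested brackets in the printed `w0` and `w1,w2,w3`.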
w0: [8.62325546]
w1,w2,w3: [[ 0.62150473 2.10847672 -0.33782372]]
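Assuming `w1,w2,w3` follow the column order of the training preview (studytime, higher_yes, traveltime), the fitted model can be written out and applied by hand. As a worked check, training row 375 (studytime 3, higher_yes 1, traveltime 4) gets a predicted grade of about 11.2:

```latex
\begin{aligned}
\hat{y} &= w_0 + w_1\,\mathrm{studytime} + w_2\,\mathrm{higher\_yes} + w_3\,\mathrm{traveltime} \\
\hat{y}_{375} &\approx 8.6233 + 0.6215 \times 3 + 2.1085 \times 1 - 0.3378 \times 4 \approx 11.245
\end{aligned}
```

Within this fit, `higher_yes` = 1 adds about 2.1 points, each step of `studytime` adds about 0.6, and each step of `traveltime` subtracts about 0.3.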