# 554.488/688 Computing for Applied Mathematics Fall 2022 - Final Project Assignment - Fannie Mae Loan Performance Prediction Project

Project Aim

The aim of this project is to use data collected on a large number of loans in Oct of 2021 to develop prediction models for the number of months payments are made on mortgage loans and for predicting foreclosure of a loan, based on information available to FNMA at the time the loan is put on their books.

Some background

FNMA, aka Fannie Mae (look it up in Wikipedia), was put in place in order to ensure liquidity in the US mortgage loan markets. When a mortgage holder secures a mortgage from a bank, the bank will sell that mortgage to FNMA, giving the bank the capital to make additional future loans. FNMA bundles the mortgages it acquires into what are called mortgage-backed securities (MBSs) and sells them to investors while insuring the underlying mortgages against losses of principal. The investors receive the bundle of monthly payments associated with the underlying mortgages. When the holder of a mortgage forecloses/defaults, sells their house, or refinances their mortgage, most of the principal is transferred to the holder of the MBS, but this means that the investor's future cash flow might not be as expected (interest rates may have gone down, so any future bond investments bring lower returns). So the investor, in pricing the value of their asset, would like to be able to predict outcomes such as foreclosure or when the loan will be settled.

Hopefully this brief description gives you an understanding of why one would be interested in being able to determine, for a given loan, how likely it is to foreclose, or to determine its duration, i.e. how many months of payments can be expected before monthly payments cease.

An email will be sent to every student in the class with urls for two comma delimited files: a training dataset and a test dataset.

Each student will have their own unique set of data. Each dataset has been drawn from a different population, so using results from someone else's dataset will likely lead to poor performance.

You should not share these datasets with any other students in the class. You should not collaborate with other students in the class.

Any evidence of data sharing or collaboration will be viewed as an ethics violation and subject to the rules and regulations of the university.

Training set

The training set is a comma delimited file consisting of information for exactly 250,000 mortgage loans with 30 variables (LOAN ID, 27 predictor variables, and 2 response variables):

The LOAN ID variable is a unique 12 character identifier for a mortgage loan.

Predictor variables (as well as the others) are described in the Appendix. These variables provide information about the mortgage known to Fannie Mae when the mortgage was acquired by them.

The response variables are values that had become known by the time the data on loan performance were collected in Oct 2021.

The NMONTHS variable is the number of months of mortgage payments made on the loan up until the date when the data were collected.

The FORECLOSURE variable is 1 if the loan had foreclosed as of the date when the data were collected, and 0 otherwise.

Test set

The test set is also a comma delimited file consisting of information for 100,000 mortgage loans (drawn at random from the same loan population as your training set) with only the LOAN ID and the 27 predictor variables. I have the ground truth, i.e. the NMONTHS and FORECLOSURE variables for the loans in your test set. Once I have your predictions I will be able to determine the quality of those predictions.

Your task is to use the training data to build predictors of each of the two response variables, NMONTHS and FORECLOSURE.

For FORECLOSURE, I am asking you to pick 1,000 loans you think are most likely to foreclose.
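Picking the 1,000 most likely foreclosures amounts to ranking the test loans by a predicted score and keeping the top of the list. The following is a minimal sketch of that ranking step only. It assumes you already have a predicted foreclosure probability for each loan from whatever model you fitted; the function name `top_k_loans`, the `scores` dictionary, and the loan IDs in the toy example are hypothetical and not part of the assignment materials.

```python
import heapq

def top_k_loans(scores, k=1000):
    """Return the k LOAN IDs with the highest predicted foreclosure score.

    `scores` maps LOAN ID -> predicted probability (a hypothetical input;
    it would come from a model fitted on your training data).
    """
    # nlargest avoids fully sorting all loans when only k are needed.
    # Ties in score are broken by LOAN ID so the result is reproducible.
    best = heapq.nlargest(k, scores.items(), key=lambda item: (item[1], item[0]))
    return [loan_id for loan_id, _ in best]

# Toy example with made-up IDs and probabilities.
example_scores = {"A001": 0.02, "A002": 0.40, "A003": 0.15, "A004": 0.31}
picked = top_k_loans(example_scores, k=2)
```

If fewer than `k` loans are supplied, the function simply returns all of them in descending score order.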

Reading the data to create a data frame

The two files you are provided with are comma delimited, with all data represented as strings (each column/field of fixed size). To ease the process of reading these files to produce data frames, a Jupyter notebook called “FunctionToReadData” has been provided. This function does the conversions of the fields for you, so to create the data frames (either training or testing) you simply give a command like:

Some recommendations

You can use any method you wish to build your prediction models, but I recommend that you use regression for NMONTHS and logistic regression for FORECLOSURE
get started early!!! don’t put this off!!!
don’t assume prediction rates will be low - do the best you can

it is not just important to get good predictions - it is also important to be able to quantify how well your predictions are likely to perform i.e. do a good job in estimating your error rates

since you only have ground truth in the training set, it is recommended that you separate that dataset into a training set and a test set so that your error rates are not underestimated due to over-fitting.

try various choices of sets of variables to use as predictors and compare performance on test data
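The last three recommendations (estimate your error honestly, hold out part of the training data, and compare candidate predictors) can be illustrated together. The sketch below is one possible way to do it, not a prescribed method: it uses synthetic stand-in data, two made-up predictor names (`X1`, `X2`), a single-predictor least-squares fit, and an 80/20 split chosen arbitrarily. With the real data you would substitute your own data frame, your own candidate variable sets, and the model of your choice.

```python
import math
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (fit, holdout) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * test_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def holdout_rmse(fit_rows, holdout_rows, predictor, response="NMONTHS"):
    """Fit on fit_rows, then report RMSE on rows the fit never saw."""
    slope, intercept = fit_line([r[predictor] for r in fit_rows],
                                [r[response] for r in fit_rows])
    errors = [r[response] - (intercept + slope * r[predictor])
              for r in holdout_rows]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Synthetic stand-in data: X1 drives the response, X2 is pure noise.
rng = random.Random(1)
rows = []
for _ in range(500):
    x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
    rows.append({"X1": x1, "X2": x2, "NMONTHS": 20 + 3 * x1 + rng.gauss(0, 1)})

fit_rows, holdout_rows = holdout_split(rows)
results = {name: holdout_rmse(fit_rows, holdout_rows, name)
           for name in ("X1", "X2")}
```

Because each RMSE is computed on rows that were not used for fitting, the comparison between candidates is not flattered by over-fitting; here the informative predictor `X1` should show a much smaller hold-out error than the noise predictor `X2`.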

I will ask you to provide a couple of summary bits of information about the variables in your training set.

NMONTHS (3): Number of months of mortgage payments made up until date the data was col-
