Coursework: Logistic Regression, Decision Trees and Classification

Resit coursework

Read all the instructions and information in this coursework brief carefully. It gives clear guidance on how to approach the coursework task and what to avoid.

Submission instructions

1. Electronic submission: Submit an electronic copy of your report in PDF format on Moodle.

2. Page limits: Your report must be submitted as a PDF file of no more than 12 pages, using a font size of at least 11 points. This limit is strict and includes appendices (which I strongly recommend that you do not use). If your report exceeds the page limit, your mark will be affected negatively, as you will fail the last assessment criterion (see below).

3. Plagiarism: This is an individual piece of assessment, and you should ensure that your report reflects exclusively your own work.

All reports are checked by automated software that detects plagiarism from a variety of sources (including past and current students' reports, online resources, and conference and journal publications). The consequences of plagiarism are very serious.

4. Report structure: This is not a business report, so it does not need an executive summary, a cover page, a table of contents, or even an introduction describing the context of the task. It is, however, mandatory to end your report with a Conclusions section that summarises your findings (this will be assessed).

Assessment criteria

Please read the following criteria carefully and make sure that you understand them and their implications when preparing your report.

Exploratory analysis

1. Your ability to use the tools covered in the course correctly. Your report needs to show clearly that you understand what the visualisations and statistical measures you use mean, and why they are relevant to this specific task (problem). It is not enough simply to present relevant figures and measures; you also need to explain what they are and why you chose them.

2. Your ability to draw the correct conclusions from the visualisations and statistical measures you use. Again, including a figure or a number/statistic is not sufficient: you have to tell the reader what it means and why it is relevant.

3. Your ability to address the questions posed in the coursework brief through an intelligent interpretation of the evidence provided in the previous two steps.

4. Your ability to express and justify your key findings succinctly (rather than reporting every possible figure, table, statistic or model you created).

5. You will also be assessed on aspects of report quality, such as using figures and tables that are legible, have captions and numbers, and are properly referenced in the text.

6. You will not be assessed on your R programming skills.

Bear in mind that I will read your report and assess your work based on your description and interpretation of your findings. I will not read the tables, figures, screenshots, etc. and draw my own conclusions from them.

Statistical modelling

· For both logistic regression and decision trees, discuss the different settings you used and why you considered them important. (Consider the choice of variable selection method as part of this question as well.)

· For each classification method, develop one or a few candidate models that you think are promising before giving a final recommendation of the most appropriate model. You do not need to include every model you tried in detail, but you must include the results of what you consider the important steps in the process that led to your final recommendations. In particular, you must provide a clear and logical explanation of the steps you followed and justify the decisions you made.

· Justify the recommended model(s) using appropriate performance measures. Comment on your findings and on the generalisation performance of the model(s) you recommend for each type of classifier.

Your coursework will not be evaluated solely on the quality of the final model, or on whether you got a particular answer right. You will be assessed primarily on whether you can correctly justify the steps you took to complete the assignment. In other words, your report needs to document that you are able to analyse the provided data intelligently, that you draw correct conclusions from what you observe, and that these conclusions lead you either to the next logical step of the data mining process or to the revision of decisions made in earlier steps of the analysis. (Refer to the flowchart of data mining stages covered in the first lectures, and in particular to its feedback loops.) Therefore, do not simply present the conclusions or results of your analysis and expect a high mark. Reports that do not document the steps followed and the reasons they were chosen will receive minimal marks, even if the final answer is sensible. Explain your reasoning clearly and in good English; do not provide a list of bullet points or unstructured sentences. Similarly, do not include figures or any other output from R that you do not comment on or explain in the text. I will not assume that you know how to interpret them correctly.

What to avoid

1. Do not replicate the workshop material. The objective of the workshops is to provide hands-on experience with the concepts and methods introduced in the lectures. Workshops are not designed to provide a roadmap for answering the coursework. (Replicating them is typically a sign of little engagement with the coursework task.)

2. Do not simply include figures and/or screenshots from R with no (or hardly any) interpretation.

3. Do not include figures and/or tables without captions and numbers; these are needed so that they can be properly referenced in the text.

4. Do not include R code or explanations of R functions, options, etc. As mentioned above, you will not be assessed on your R programming skills.

Software and assessment

I recommend using R for this coursework, but you are free to use software of your choice. However:

1. You cannot use as an excuse the fact that you could not perform a particular task because the software you chose does not offer a capability that we covered in the workshops.

2. If you use different software, you must be able to explain the details of the output/models it produces. For example (and this is relevant to coursework 2), if you use software such as SPSS to perform variable selection for logistic regression, you need to explain which variable selection method was used, with which parameters, and what the output was.

Dataset description

The dataset for this coursework is included in the UCI Machine Learning Repository and was used in an actual research study aiming to understand and predict which credit card holders default on their debt. Note that the dataset you will be provided with is not identical to the one in the UCI repository, as it has been processed to correct specific errors. Bear in mind, however, that as with any real-world dataset, you should not expect the data to be perfect. Identifying any issues (limitations) with the data and attempting to correct them is part of the assessment.

Description: The data is a sample of 30,000 credit card holders from an important bank in Taiwan. The data was collected in October 2006. All amounts are in New Taiwan dollars (NT$). In the variable list below, we first give the name of each variable as it appears in the data frame, followed by its description.

· LIMIT BAL: Amount of credit, which includes both the individual consumer credit and his/her family (supplementary) credit.

· EDUCATION: Categorical variable representing the level of education: 1 = graduate school; 2 = university; 3 = high school; 4 = other/unknown.

· MARRIAGE: Marital status of the credit card holder. Categorical variable taking the values 1 = married; 2 = single; 0 = unknown.

· AGE: Age of the credit card holder.

· DELAY 1, ..., DELAY 6: Repayment status over the last six months. Specifically, DELAY 1 corresponds to the repayment status in September, DELAY 2 to the repayment status in August, and so on. A value of zero means that the credit card holder has repaid their credit card in full; a value of 1 means a payment delay of one month; 2 means a payment delay of two months; etc.

· BILL AMT1, ..., BILL AMT6: Bill statement amounts over the past six months: BILL AMT1 corresponds to September 2005, BILL AMT2 to August 2005, and so on up to BILL AMT6, which corresponds to April 2005.

· PAY AMT1, ..., PAY AMT6: Amounts of previous payments over the past six months: PAY AMT1 corresponds to September 2005, PAY AMT2 to August 2005, and so on up to PAY AMT6, which corresponds to April 2005.

· default: Binary response (class) variable indicating whether the credit card holder defaulted on the next monthly payment (default = 1) or paid on time (default = 0).
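Because identifying data issues is part of the assessment, a natural first check is whether every categorical value matches one of the codes documented above. The Python sketch below is illustrative only; the course itself recommends R. It assumes that the data has already been loaded as a list of dicts keyed by the column names EDUCATION, MARRIAGE and default, with values converted to integers. The helper name `undocumented_codes` is hypothetical, and the toy rows are invented rather than taken from the real data.

```python
from collections import Counter

# Documented codes, copied from the variable list in this brief.
DOCUMENTED_CODES = {
    "EDUCATION": {1, 2, 3, 4},
    "MARRIAGE": {0, 1, 2},
    "default": {0, 1},
}


def undocumented_codes(rows, documented=DOCUMENTED_CODES):
    """Count values that fall outside the documented codes, per column.

    Assumption: rows is an iterable of dicts mapping column name -> int
    (a hypothetical in-memory representation of the dataset).
    """
    report = {column: Counter() for column in documented}
    for row in rows:
        for column, allowed in documented.items():
            value = row[column]
            if value not in allowed:
                report[column][value] += 1
    # Keep only the columns in which something unexpected was found.
    return {column: dict(counts) for column, counts in report.items() if counts}


# Toy rows, invented for illustration only (not taken from the real data).
sample = [
    {"EDUCATION": 2, "MARRIAGE": 1, "default": 0},
    {"EDUCATION": 5, "MARRIAGE": 3, "default": 1},
    {"EDUCATION": 0, "MARRIAGE": 2, "default": 0},
]
print(undocumented_codes(sample))
# {'EDUCATION': {5: 1, 0: 1}, 'MARRIAGE': {3: 1}}
```

An empty result means that every value matches a documented code. A non-empty entry flags values that the variable list does not explain, which would then need to be investigated and discussed in the report.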

Task description
Exploratory analysis (50% of the marks)

Using appropriate visualisation methods and statistical measures covered in the first part of the course (the meaning of this is explained precisely at the top of this brief), develop general and specific insights from the data that are relevant to the classification problem at hand. Your report should discuss all the variables contained in the dataset, and for each variable your answer should address the following questions:

· Does this variable appear to be important for the task at hand, and why? Support your claims with appropriate visualisations that document whether and how each variable is important.

· Are different variables related, and which variables convey information similar to that provided by other variable(s)?

You should also report key findings related to data quality issues, such as incorrect observations, outliers and unexpected findings. Note that this is not an exhaustive list of questions.

Statistical modelling (50% of the marks)

Your objective is to develop a model that predicts whether card holders will default on their next monthly payment. We are primarily interested in understanding the main factors that influence default, in order to improve future decisions. The problem owner is interested in the following questions:

· What is the best statistical model, and how should it be used to achieve the following goals? (Note that you can recommend a different model for each task.)

1. Suppose that at least 95% of the individuals who default must be correctly identified. What is the maximum proportion of individuals who pay on time that can be correctly predicted under this requirement?

2. Suppose instead that we must guarantee that at least 85% of the credit card holders who pay on time are correctly identified. What is the maximum proportion of individuals who default that can be correctly predicted?

· If the previous two objectives were not specified, which statistical model would you recommend, and why? Justify your choice appropriately and state clearly how to use the model you recommend.

· How many variables, and which ones, are the most important in determining default? (Do these differ depending on the objective?)
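The two threshold-based goals above can be read as constrained searches over the cut-off applied to a classifier's scores (for example, an estimated probability of default). Goal 1 fixes the sensitivity (the proportion of defaulters correctly identified) at 95% or more and maximises the specificity (the proportion of on-time payers correctly identified); goal 2 reverses the roles, with an 85% floor on specificity. The Python sketch below is a minimal illustration under stated assumptions: the function names `threshold_sweep` and `best_threshold` are hypothetical, labels follow the brief's coding (1 = default), "default" is predicted when score >= threshold, and the scores are invented toy values. It is not part of the brief, which recommends R.

```python
def threshold_sweep(scores, labels):
    """Yield (threshold, sensitivity, specificity) for every distinct score,
    plus +inf (nobody predicted to default).

    Assumptions: label 1 = default, 0 = paid on time; both classes are present;
    'default' is predicted when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), reverse=True)  # highest scores first
    tp = fp = 0
    yield float("inf"), 0.0, 1.0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Absorb every observation tied at this score before reporting rates.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        yield t, tp / positives, (negatives - fp) / negatives


def best_threshold(scores, labels, min_sens=None, min_spec=None):
    """Maximise specificity subject to sensitivity >= min_sens, or maximise
    sensitivity subject to specificity >= min_spec (give exactly one).

    Returns (threshold, sensitivity, specificity), or None if no cut-off qualifies.
    """
    if (min_sens is None) == (min_spec is None):
        raise ValueError("give exactly one of min_sens or min_spec")
    best = None
    for t, sens, spec in threshold_sweep(scores, labels):
        if min_sens is not None:
            feasible, objective = sens >= min_sens, spec
        else:
            feasible, objective = spec >= min_spec, sens
        if feasible and (best is None or objective > best[0]):
            best = (objective, t, sens, spec)
    return None if best is None else best[1:]


# Invented toy scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
print(best_threshold(scores, labels, min_sens=0.95))  # (0.2, 1.0, 0.2)
print(best_threshold(scores, labels, min_spec=0.85))  # (0.8, 0.4, 1.0)
```

In practice, the scores would come from a fitted logistic regression or decision tree applied to held-out data, so that the reported proportions reflect generalisation performance rather than fit to the training set. A decision tree produces only a few distinct scores (one per leaf), so fewer cut-offs are available than for logistic regression.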

Please read the assessment criteria for both sections of the report carefully.
