ST2195 Programming for data science Coursework Project - Markov Chain Monte Carlo algorithm

ST2195 Coursework Project

Instructions to candidates

This project contains two questions. Answer BOTH questions. All questions will be given equal weight (50%).

In this part, you are asked to work with the Markov Chain Monte Carlo algorithm, in particular the Metropolis-Hastings algorithm. The aim is to simulate random numbers for the distribution with probability density function given below

$$f(x) = \frac{1}{2} \exp(-|x|),$$

where $x$ takes values in the real line and $|x|$ denotes the absolute value of $x$. More specifically, you are asked to generate $x_0, x_1, \ldots, x_N$ values and store them using the following version of the Metropolis-Hastings algorithm (also known as random walk Metropolis) that consists of the steps below:

Random walk Metropolis

Step 1 Set up an initial value $x_0$ as well as a positive integer $N$ and a positive real number $s$.

Step 2 Repeat the following procedure for $i = 1, \ldots, N$:

  • Simulate a random number $x^*$ from the Normal distribution with mean $x_{i-1}$ and standard deviation $s$.

  • Compute the ratio

    $$r(x^*, x_{i-1}) = \frac{f(x^*)}{f(x_{i-1})}.$$

  • Generate a random number $u$ from the uniform distribution between 0 and 1.

  • If $u < r(x^*, x_{i-1})$, set $x_i = x^*$, else set $x_i = x_{i-1}$.
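The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference solution: the function names `f` and `random_walk_metropolis`, the seed, and the short run length are all choices made for this example.

```python
import math
import random


def f(x):
    """Target density f(x) = (1/2) exp(-|x|)."""
    return 0.5 * math.exp(-abs(x))


def random_walk_metropolis(x0, n, s, rng):
    """Return [x_0, x_1, ..., x_N] following Step 1 and Step 2."""
    xs = [x0]
    for _ in range(n):
        x_prev = xs[-1]
        x_star = rng.gauss(x_prev, s)   # proposal from Normal(x_{i-1}, s)
        r = f(x_star) / f(x_prev)       # ratio r(x*, x_{i-1})
        u = rng.random()                # u from Uniform(0, 1)
        xs.append(x_star if u < r else x_prev)
    return xs


rng = random.Random(2195)               # arbitrary seed, for reproducibility only
chain = random_walk_metropolis(x0=0.0, n=1000, s=1.0, rng=rng)
print(len(chain), chain[:3])
```

Passing the generator `rng` explicitly keeps runs reproducible and makes it easy to produce several independent chains later.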

(a) Apply the random walk Metropolis algorithm using $N = 10000$ and $s = 1$. Use the generated samples $(x_1, \ldots, x_N)$ to construct a histogram and a kernel density plot in the same figure. Note that these provide estimates of $f(x)$. Overlay a graph of $f(x)$ on this figure to visualise the quality of these estimates. Also, report the sample mean and standard deviation of the generated samples (Note: these are also known as the Monte Carlo estimates of the mean and standard deviation respectively).
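The quantities behind such a figure can be computed without any plotting library, as in the hedged sketch below. It assumes a compact sampler (`sample`), a hand-written histogram (`hist_density`) and a Gaussian kernel estimate (`kde`) with a normal-reference rule-of-thumb bandwidth; all of these names and the bandwidth choice are illustrative assumptions. In practice the three curves would be passed to a plotting library and drawn on the same axes.

```python
import math
import random
import statistics


def f(x):
    """Target density f(x) = (1/2) exp(-|x|)."""
    return 0.5 * math.exp(-abs(x))


def sample(x0, n, s, rng):
    """Compact random walk Metropolis; returns x_1, ..., x_N."""
    xs, x = [], x0
    for _ in range(n):
        x_star = rng.gauss(x, s)
        if rng.random() < f(x_star) / f(x):
            x = x_star
        xs.append(x)
    return xs


def hist_density(xs, lo, hi, bins):
    """Histogram heights scaled so the bars integrate to at most 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    return [c / (len(xs) * width) for c in counts]


def kde(xs, t, h):
    """Gaussian kernel density estimate at the point t with bandwidth h."""
    norm = len(xs) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs) / norm


rng = random.Random(2195)                      # arbitrary seed
xs = sample(x0=0.0, n=10_000, s=1.0, rng=rng)

mean = statistics.fmean(xs)                    # Monte Carlo estimate of the mean
sd = statistics.stdev(xs)                      # Monte Carlo estimate of the std. dev.
h = 1.06 * sd * len(xs) ** (-0.2)              # normal-reference rule of thumb (assumption)
heights = hist_density(xs, lo=-6.0, hi=6.0, bins=24)

print(f"Monte Carlo mean = {mean:.3f}, standard deviation = {sd:.3f}")
for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x = {t:+.1f}  f(x) = {f(t):.3f}  KDE = {kde(xs, t, h):.3f}")
```

For reference, the target density has mean 0 and standard deviation $\sqrt{2} \approx 1.414$, so the printed estimates should land near those values. A Gaussian kernel smooths the sharp peak at $x = 0$, so the kernel estimate there is expected to sit somewhat below $f(0) = 0.5$.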

Practical tip: To avoid numerical errors, it is better to use the equivalent criterion $\log u < \log r(x^*, x_{i-1}) = \log f(x^*) - \log f(x_{i-1})$ instead of $u < r(x^*, x_{i-1})$.
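A small illustration of why the log form helps: far in the tails, $f$ underflows to zero in double precision and the direct ratio becomes 0/0, while the difference of log densities stays well defined. The two points below are deliberately extreme values chosen for this example, and `log_f` is an illustrative helper name.

```python
import math


def f(x):
    """Target density f(x) = (1/2) exp(-|x|)."""
    return 0.5 * math.exp(-abs(x))


def log_f(x):
    """log f(x) = -log(2) - |x|, computed without calling exp."""
    return -math.log(2.0) - abs(x)


x_prev, x_star = 800.0, 799.0      # far in the tail, chosen to force underflow

print(f(x_prev), f(x_star))        # both underflow to 0.0 in double precision
try:
    r = f(x_star) / f(x_prev)
except ZeroDivisionError:
    r = None                       # the direct ratio is undefined here (0/0)

log_r = log_f(x_star) - log_f(x_prev)
print(r, log_r)                    # log r is approximately 1, as expected
```

When taking $\log u$ in a sampler, note that a uniform generator on $[0, 1)$ can in principle return exactly 0; drawing `1.0 - rng.random()` instead is one simple way to keep the logarithm defined.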

2023-24/ST2195 Coursework Project Page 1 of 3

(b) The operations in part 1(a) are based on the assumption that the algorithm has converged. One of the most widely used convergence diagnostics is the so-called $\hat{R}$ value. In order to obtain a value of this diagnostic, you need to apply the procedure below:

  • Generate more than one sequence of $x_0, \ldots, x_N$, potentially using different initial values $x_0$. Denote each of these sequences, also known as chains, by $(x_0^{(j)}, x_1^{(j)}, \ldots, x_N^{(j)})$ for $j = 1, 2, \ldots, J$.

  • Define and compute $M_j$ as the sample mean of chain $j$ as

    $$M_j = \frac{1}{N} \sum_{i=1}^{N} x_i^{(j)},$$

    and $V_j$ as the within sample variance of chain $j$ as

    $$V_j = \frac{1}{N} \sum_{i=1}^{N} \left(x_i^{(j)} - M_j\right)^2.$$

  • Define and compute the overall within sample variance $W$ as

    $$W = \frac{1}{J} \sum_{j=1}^{J} V_j.$$

  • Define and compute the overall sample mean $M$ as

    $$M = \frac{1}{J} \sum_{j=1}^{J} M_j,$$

    and the between sample variance $B$ as

    $$B = \frac{1}{J} \sum_{j=1}^{J} (M_j - M)^2.$$

  • Compute the $\hat{R}$ value as

    $$\hat{R} = \sqrt{\frac{B + W}{W}}.$$

In general, values of $\hat{R}$ close to 1 indicate convergence, and it is usually desired for $\hat{R}$ to be lower than 1.05. Calculate the $\hat{R}$ for the random walk Metropolis algorithm with $N = 2000$, $s = 0.001$ and $J = 4$. Keeping $N$ and $J$ fixed, provide a plot of the values of $\hat{R}$ over a grid of $s$ values in the interval between 0.001 and 1.
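The diagnostic can be sketched as follows, assuming the $1/N$ and $1/J$ normalisations, with $W$ the average of the $V_j$ and $M$ the average of the $M_j$. The starting points, the grid of $s$ values and the function names `chain` and `r_hat` are arbitrary choices for illustration; the sampler works on the log scale.

```python
import math
import random


def log_f(x):
    """log f(x) = -log(2) - |x|."""
    return -math.log(2.0) - abs(x)


def chain(x0, n, s, rng):
    """Random walk Metropolis on the log scale; returns x_1, ..., x_N."""
    xs, x = [], x0
    for _ in range(n):
        x_star = rng.gauss(x, s)
        log_u = math.log(1.0 - rng.random())   # 1 - U lies in (0, 1], so log is defined
        if log_u < log_f(x_star) - log_f(x):
            x = x_star
        xs.append(x)
    return xs


def r_hat(chains):
    """R-hat = sqrt((B + W) / W) from a list of equally long chains."""
    j = len(chains)
    means = [sum(c) / len(c) for c in chains]                       # M_j
    variances = [sum((x - m) ** 2 for x in c) / len(c)
                 for c, m in zip(chains, means)]                    # V_j
    w = sum(variances) / j                                          # W
    m_all = sum(means) / j                                          # M
    b = sum((m - m_all) ** 2 for m in means) / j                    # B
    return math.sqrt((b + w) / w)


rng = random.Random(2195)                  # arbitrary seed
starts = (-3.0, -1.0, 1.0, 3.0)            # J = 4 arbitrary initial values
s_grid = (0.001, 0.01, 0.1, 0.5, 1.0)      # coarse illustrative grid

results = {s: r_hat([chain(x0, 2000, s, rng) for x0 in starts]) for s in s_grid}
for s, value in results.items():
    print(f"s = {s:<6} R-hat = {value:.3f}")
```

With a very small $s$ the chains barely move from their starting points, so the between-chain variance dominates and $\hat{R}$ is far above 1; larger step sizes let the chains mix and bring the value down towards 1. A finer grid of $s$ values would be used for the requested plot.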

The 2009 ASA Statistical Computing and Graphics Data Expo consisted of flight arrival and departure details for all commercial flights on major carriers within the USA from October 1987 to April 2008. This is a large dataset; there are nearly 120 million records in total, and it takes up 1.6 gigabytes of space when compressed and 12 gigabytes when uncompressed. The complete dataset, along with supplementary information and variable descriptions, can be downloaded from the Harvard Dataverse at https://doi.org/10.7910/DVN/HG7NV7

Choose any subset of ten consecutive years and any of the supplementary information provided by the Harvard Dataverse to answer the following questions using the principles and tools you have learned in this course:

  (a) What are the best times and days of the week to minimise delays each year?

  (b) Evaluate whether older planes suffer more delays on a year-to-year basis.

  (c) For each year, fit a logistic regression model for the probability of diverted US flights using as many features as possible from attributes of the departure date, the scheduled departure and arrival times, the coordinates and distance between departure and planned arrival airports, and the carrier. Visualize the coefficients across years.
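As one possible starting point for question (a), the sketch below averages arrival delay by day of week and scheduled departure hour using only the standard library. It assumes the yearly files carry columns named `DayOfWeek`, `CRSDepTime` and `ArrDelay` with missing values written as `NA`; check these against the variable descriptions on the Dataverse page. The rows in `SAMPLE` are made up for illustration and are not real flight records.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows standing in for one year's file; not real flight records.
SAMPLE = """DayOfWeek,CRSDepTime,ArrDelay
1,0600,-5
1,0630,3
1,1745,40
5,1750,55
5,0615,NA
6,1200,10
"""


def mean_delay_by_slot(handle):
    """Average ArrDelay per (day of week, scheduled departure hour)."""
    totals = defaultdict(lambda: [0.0, 0])     # key -> [sum of delays, count]
    for row in csv.DictReader(handle):
        if row["ArrDelay"] in ("", "NA"):
            continue                           # skip flights with no recorded arrival delay
        key = (int(row["DayOfWeek"]), int(row["CRSDepTime"]) // 100)
        totals[key][0] += float(row["ArrDelay"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


means = mean_delay_by_slot(io.StringIO(SAMPLE))
best = min(means, key=means.get)               # slot with the lowest mean delay
print(best, means[best])
```

Reading row by row keeps memory use flat, which matters for files of this size; the same function can be given an open file handle for each chosen year. Whether to measure delay by arrival or departure, and how to treat cancelled or diverted flights, are modelling decisions to state in the report.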

General Instructions

  • All questions should be answered using R and Python for all tasks.

  • Your answers should be provided in a separate structured report of no more than 1 page for part 1 and 6 pages for part 2. The page limit excludes title, references and table of contents but includes graphics and tables. The report should be in PDF format and also contain adequate explanations for readers not familiar with programming. In addition to the report, you will also be asked to provide your R and Python code in RMarkdown and Jupyter notebooks, respectively. All the relevant files must be submitted in the designated Atrio or VLE submission portal.

  • For part 2, each report should detail all steps you took starting from raw data up to the answer for each question. Any databases you set up, data wrangling/cleaning operations you carry out, and any modelling decisions you make should be clearly described in each structured report. Each report should also include any relevant graphics and tables as part of the answer.

  • If you are using elements (e.g. code, databases, graphics, etc) from your answer to a previous question to answer the current one, you will need to refer to those elements.

  • You should also supply the code you used to answer each question, in a way that can be used by someone else to replicate your analyses. You can do this either as separate scripts or separate RMarkdown/Jupyter notebooks per question, clearly indicating (both with comments and in the filename) which question each script refers to.
