STAT605: Data Science Computing Project
Homework 3: Distributed Computing via Slurm and the Statistics High Performance Computing (HPC) Cluster
1. Log in to a suitable HPC computer.
- Use lunchbox only for editing and running Slurm commands that launch and manage jobs. (Do not run computations on lunchbox, as it cannot handle computations from many people.)
- Run cd /workspace/<STATuser> to work in a directory the compute nodes can read. (They cannot read your home directory.)
(Optional: Run srun --pty /bin/bash to get an interactive job on a compute node where you can run and debug computations from a terminal. I do this from within the emacs shell. You may ignore the message “bash: .../.bashrc: Permission denied”, which occurs because the compute nodes cannot read your home directory.)
2. Solve the mtcars exercise at www.stat.wisc.edu/~jgillett/605/HPC/examples/5mtcarsPractice/instructions.txt.
Hint: I recommend that you now go to step (4) and turn in an incomplete but working version of your work. (We will grade your last submission before the deadline.)
Since this exercise (2) started as group work, it is ok for your solution to look like the solution of members of your group. For exercise (3), below, you should do independent work, so your solution should not look like other students’ solutions.
3. Read http://stat-computing.org/dataexpo/2009/the-data.html, which links to and describes data on all U.S. flights in the period 1987-2008. Find out, for departures from Madison:
- How far can you get in one flight?
- What is the average departure delay for each day of the week?
To do this, write a program submit.sh and supporting scripts to:
(a) Run 22 parallel jobs, one for each year from 1987 to 2008. The first job should:
i. download the 1987 data via
wget http://pages.stat.wisc.edu/~jgillett/605/HPC/airlines/1987.csv.bz2
ii. unzip the 1987 data via bzip2 -d 1987.csv.bz2
iii. use a short bash pipeline to extract from 1987.csv the columns DayOfWeek, DepDelay, Origin, Dest, and Distance; retain only the rows whose Origin is MSN (Madison's airport code); and write a much smaller file, MSN1987.csv.
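One possible sketch of step iii, using awk to look up column positions from the header row rather than hardcoding them (the tiny demo file below stands in for the real 1987.csv, which has more columns; the real file comes from steps i-ii):

```shell
# Demo input standing in for the real 1987.csv (subset of columns; the
# filter below finds columns by name, so extra columns do not matter).
cat > 1987.csv <<'EOF'
Year,Month,DayofMonth,DayOfWeek,DepDelay,Origin,Dest,Distance
1987,10,1,4,5,MSN,ORD,109
1987,10,1,4,-2,LAX,SFO,337
EOF

# Step iii: map column names to positions from the header, then keep
# only the MSN departure rows and the five requested columns.
awk -F, '
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
$(col["Origin"]) == "MSN" {
    print $(col["DayOfWeek"]) "," $(col["DepDelay"]) "," $(col["Origin"]) "," $(col["Dest"]) "," $(col["Distance"])
}' 1987.csv > MSN1987.csv

cat MSN1987.csv   # -> 4,5,MSN,ORD,109
```

Looking columns up by name keeps the same script correct for every year, as long as each file carries the same header.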
The other 21 jobs should handle the other years analogously. (On a recent run, my jobs took from 18 to 154 seconds to run, with an average of about 111 seconds.)
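The 22 parallel jobs map naturally onto a Slurm job array. A minimal sketch, assuming the file name jobArray.sh and standard Slurm behavior (SLURM_ARRAY_TASK_ID numbers the array tasks); here the script is only written out, and submit.sh would launch it:

```shell
# Write the per-year array job script (jobArray.sh is a hypothetical name).
cat > jobArray.sh <<'EOF'
#!/bin/bash
# Each array task handles one year: task 0 -> 1987, ..., task 21 -> 2008.
year=$((1987 + SLURM_ARRAY_TASK_ID))
wget -q "http://pages.stat.wisc.edu/~jgillett/605/HPC/airlines/${year}.csv.bz2"
bzip2 -d "${year}.csv.bz2"
# ... step iii's pipeline here, writing MSN${year}.csv ...
EOF
chmod +x jobArray.sh

# submit.sh would then launch all 22 jobs with:
#   sbatch --array=0-21 jobArray.sh
```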
(b) Collect the Madison data from your 22 MSN*.csv files into a single allMSN.csv file, and write a set of jobs to answer the following two questions:
- How far can you get from Madison in one flight? Write a line like MSN,ORD,109 to answer. This line says, “You can fly 109 miles from Madison (MSN) to Chicago (ORD).” But 109 isn’t the farthest you can get from Madison in one flight; write the correct line. (Hint: I used a bash pipeline to do this.) Save the result in farthest.txt.
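One possible pipeline for this question, assuming allMSN.csv holds the five extracted columns (DayOfWeek,DepDelay,Origin,Dest,Distance) with no header rows; the demo file below stands in for the real collected data:

```shell
# Demo allMSN.csv (in the real run: cat MSN*.csv > allMSN.csv).
cat > allMSN.csv <<'EOF'
4,5,MSN,ORD,109
4,-2,MSN,MSP,228
5,10,MSN,DEN,826
EOF

# Sort numerically on the Distance field, keep the last (largest) row,
# and print Origin,Dest,Distance.
sort -t, -k5,5n allMSN.csv | tail -n 1 | \
    awk -F, '{ print $3 "," $4 "," $5 }' > farthest.txt

cat farthest.txt   # -> MSN,DEN,826
```

If your MSN*.csv files kept header lines, filter those out first (e.g. with grep -v or an awk numeric test on the Distance field) before sorting.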
- What is the average departure delay for each day of the week? Write a pair of lines like these to a file delays.txt:
Mo Tu We Th Fr Sa Su
8.3 5.0 4.3 5.5 9.5 2.1 3.5
(These are not the correct numbers.) Hint: I used R’s tapply() to do this.
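The hint suggests R's tapply(); as an alternative sketch in plain awk, assuming DayOfWeek is field 1 with 1 = Monday through 7 = Sunday (as the data expo page describes), DepDelay is field 2, and non-numeric delays such as NA should be skipped (the demo file stands in for the real allMSN.csv):

```shell
# Demo allMSN.csv; the real run uses the collected Madison data.
cat > allMSN.csv <<'EOF'
1,10,MSN,ORD,109
1,6,MSN,ORD,109
2,5,MSN,MSP,228
7,NA,MSN,DEN,826
EOF

# Accumulate delay sums and counts per day, then print the two lines.
awk -F, '
$2 ~ /^-?[0-9]/ { sum[$1] += $2; n[$1]++ }   # skip NA / non-numeric delays
END {
    split("Mo Tu We Th Fr Sa Su", day, " ")
    for (i = 1; i <= 7; i++) printf "%s%s", day[i], (i < 7 ? " " : "\n")
    for (i = 1; i <= 7; i++)
        printf "%s%s", (n[i] ? sprintf("%.1f", sum[i] / n[i]) : "NA"), (i < 7 ? " " : "\n")
}' allMSN.csv > delays.txt

cat delays.txt
```

Days with no numeric delays are printed as NA here; on the full 22 years of Madison data every day should have flights, so real output would be all numbers.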
4. Organize files to turn in your solution. (See “Copying files with scp,” below.)
- (a) On your VM, make a directory NetID_hw3, where NetID is your NetID.
- (b) Make a subdirectory NetID_hw3/mtcars. Copy the following files there:
- getData.sh
- jobArray.sh
- findLightest.sh
- submit.sh
- out
We should be able to recreate out by running ./submit.sh.
- (c) Make a subdirectory NetID_hw3/airlines. Copy the following files there:
- submit.sh
- farthest.txt
- delays.txt
- any supporting files required by your submit.sh
We should be able to recreate farthest.txt and delays.txt by running ./submit.sh.
- (d) Make a file README in the directory NetID_hw3 with a line of the form NetID,LastName,FirstName. If you collaborated with any other students on the mtcars part of this homework, add additional lines of this form, one for each of your collaborators. So, for example, if George Box with NetID gepbox worked with John Bardeen with NetID jbardeen, George's README file should look like:
gepbox,Box,George
jbardeen,Bardeen,John
- (e) From the parent directory of NetID_hw3, run tar cvf NetID_hw3.tar NetID_hw3 and then upload NetID_hw3.tar as your HW3 submission on Canvas.
You can verify your submission by downloading it from Canvas, and then:
i. Make a directory to test in, e.g. mkdir test_HW3.
ii. Move your downloaded .tar file there and cd there.
iii. Extract the .tar file with tar xvf NetID_hw3.tar. This will make a new directory, which should be called NetID_hw3.
iv. Check that all your files are there.