CS544 Intro to Big Data Systems - P1: Predicting COVID Deaths with PyTorch

P1 (4% of grade): Predicting COVID Deaths with PyTorch

Overview

In this project, we'll use PyTorch to create a regression model that can predict how many deaths there will be for a WI census tract, given the number of people who have tested positive, broken down by age. The train.csv and test.csv files we provide are based on this dataset: https://data.dhsgis.wi.gov/datasets/wi-dhs::covid-19-vaccination-data-by-census-tract

Learning objectives:

  • multiply tensors
  • use GPUs (when available)
  • optimize inputs to minimize outputs
  • use optimization to optimize regression coefficients

Before starting, please review the general project directions.

Corrections/Clarifications

  • Feb 6: fix trainX and trainY examples

Part 1: Prediction with Hardcoded Model

Install some packages:

pip3 install pandas
pip3 install -f https://download.pytorch.org/whl/torch_stable.html torch==1.13.1+cpu
pip3 install tensorboard

Use train.csv and test.csv to construct four PyTorch tensors: trainX, trainY, testX, and testY. Hints:

trainX (number of positive COVID tests per tract, by age group) should look like this:

tensor([[ 24.,  51.,  44.,  ...,  61.,  27.,   0.],
        [ 22.,  31., 214.,  ...,   9.,   0.,   0.],
        [ 84., 126., 239.,  ...,  74.,  24.,   8.],
        ...,
        [268., 358., 277.,  ..., 107.,  47.,   7.],
        [ 81., 116.,  90.,  ...,  36.,   9.,   0.],
        [118., 156., 197.,  ...,  19.,   0.,   0.]], dtype=torch.float64)

trainY (number of COVID deaths per tract) should look like this (make sure it is vertical, not 1-dimensional!):

tensor([[3.],
        [2.],
        [9.],
        ...,
        [5.],
        [2.],
        [5.]], dtype=torch.float64)
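
One possible way to build tensors of this shape is sketched below on a tiny in-memory CSV rather than the real train.csv. The column names POS_0_9_CP and DEATHS and the row values are hypothetical, and the sketch assumes the label is the last column; check the actual files before relying on that.

```python
import io

import pandas as pd
import torch

# Hypothetical stand-in for train.csv: three feature columns and one label
# column. The real file has more columns; these names are illustrative only.
csv_text = """POS_0_9_CP,POS_50_59_CP,POS_60_69_CP,DEATHS
24,61,27,3
22,9,0,2
84,74,24,9
"""

train = pd.read_csv(io.StringIO(csv_text))

# Assumption: the last column holds the labels; all others are features.
values = train.values.astype("float64")
trainX = torch.from_numpy(values[:, :-1])   # shape (rows, features)
trainY = torch.from_numpy(values[:, -1:])   # shape (rows, 1): vertical, not 1-D

print(trainX.shape, trainY.shape)
```

Slicing with `-1:` instead of `-1` is what keeps trainY two-dimensional.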

Let's predict the number of COVID deaths in the test dataset under the assumption that the death rate is 0.004 for those <60 and 0.03 for those >=60. Encode these assumptions as coefficients in a tensor by pasting the following:

coef = torch.tensor([
        [0.0040],
        [0.0040],
        [0.0040],
        [0.0040],
        [0.0040],
        [0.0040], # POS_50_59_CP
        [0.0300], # POS_60_69_CP
        [0.0300],
        [0.0300],
        [0.0300]
], dtype=testX.dtype)
coef

Multiply the first row of testX by the coef vector and use .item() to print the predicted number of deaths in this tract.

Requirement: your code should be written such that if torch.cuda.is_available() is true, all your tensors (trainX, trainY, testX, testY, and coef) are moved to a GPU prior to any multiplication.
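
A minimal sketch of the device check and the first-row multiplication, using a made-up two-row testX rather than the real test data. The `device` and `first_pred` names are illustrative choices, not part of the project spec.

```python
import torch

# Pick the GPU when one is present, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Made-up stand-in for testX: two tracts, ten age-group columns.
testX = torch.tensor([
    [10.0] * 10,
    [0.0, 5.0, 20.0, 30.0, 25.0, 20.0, 10.0, 5.0, 2.0, 1.0],
], dtype=torch.float64)

# Same rates as the coef tensor above: 0.004 for the six groups under 60,
# 0.03 for the four groups aged 60 and over.
coef = torch.tensor([[0.004]] * 6 + [[0.03]] * 4, dtype=testX.dtype)

# Move every tensor to the chosen device before multiplying.
testX = testX.to(device)
coef = coef.to(device)

# (10,) @ (10, 1) -> (1,), so .item() yields a plain Python float.
first_pred = (testX[0] @ coef).item()
print(first_pred)
```

For the made-up first row, the prediction is 6 * 10 * 0.004 + 4 * 10 * 0.03 = 1.44 deaths.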

Part 2: R^2 Score

Create a predictedY tensor by multiplying all of testX by coef. We'll measure the quality of these predictions by writing a function that can compare predictedY to the true values in testY.

The R^2 score (https://en.wikipedia.org/wiki/Coefficient_of_determination) measures how much of the variance in a y column a model can predict (with 1 being the best score). Different definitions are sometimes used, but we'll define it in terms of two variables:

  • SStot. To compute this, first compute the average testY value. Subtract to get the difference between each testY value and the average. Square the differences, then add the results to get SStot
  • SSreg. Same as SStot, but instead of subtracting the average from each testY value, subtract the prediction from testY

If our predictions are good, SSreg will be much smaller than SStot. So define improvement = SStot - SSreg.

The R^2 score is just improvement/SStot.

Generalize the above logic into an r2_score(trueY, predictedY) function that computes the R^2 score given any vector of true values alongside a vector of predictions.

Call r2_score(testY, predictedY) and display the value in your notebook.
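
The definition above can be sketched as follows. This is one possible reading of the SStot/SSreg description, shown on made-up column vectors rather than the real testY; the local names `ss_tot` and `ss_reg` are illustrative.

```python
import torch

def r2_score(trueY, predictedY):
    # SStot: squared distance of each true value from the mean true value.
    ss_tot = ((trueY - trueY.mean()) ** 2).sum()
    # SSreg: squared distance of each true value from its prediction.
    ss_reg = ((trueY - predictedY) ** 2).sum()
    improvement = ss_tot - ss_reg
    return (improvement / ss_tot).item()

# Made-up values, shaped as column vectors like testY.
trueY = torch.tensor([[1.0], [2.0], [3.0], [6.0]], dtype=torch.float64)
predictedY = torch.tensor([[1.0], [2.0], [4.0], [5.0]], dtype=torch.float64)

print(r2_score(trueY, predictedY))
```

Here the mean of trueY is 3, so SStot = 4 + 1 + 0 + 9 = 14 and SSreg = 0 + 0 + 1 + 1 = 2, giving a score of 12/14. Predicting the mean for every row would score 0, and perfect predictions would score 1.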

Part 3: Optimization

Let's say y = x^2 - 8x + 19. We want to find the x value that minimizes y.

First, what is y when x is a tensor containing 0.0?

x = torch.tensor(0.0)
y = x**2 - 8*x + 19
y

We can use a PyTorch optimizer to try to find a good x value. The optimizer will run a loop where it computes y, computes how a small change in x would affect y, then makes a small change to x to try to make y smaller.

There are many optimizers in PyTorch; we'll use SGD here. You can create the optimizer like this:

optimizer = torch.optim.SGD([????], lr=0.1)

For ????, you can pass in one or more tensors that you're trying to optimize (in this case, you're trying to find the best value for x).

The optimizer is based on gradients (an idea from Calculus, but you don't need to know Calculus to do this project). You'll need to pass requires_grad=True to torch.tensor in your earlier code that defined x so that we can track gradients.

Write a loop that executes the following 30 times:

    optimizer.zero_grad()
    y = ????
    y.backward()
    optimizer.step()
    print(x, y)

Notice the small changes to x with each iteration (and resulting changes to y). Report x.item() in your notebook as the optimized value.
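
Putting the pieces together, one way to fill in the ???? placeholders looks like the sketch below. It assumes x starts at 0.0 as in the earlier snippet; the per-iteration print is moved to the end to keep the output short.

```python
import torch

# requires_grad=True tells PyTorch to track gradients with respect to x.
x = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(30):
    optimizer.zero_grad()        # clear the gradient from the last iteration
    y = x**2 - 8*x + 19          # recompute y from the current x
    y.backward()                 # compute dy/dx and store it in x.grad
    optimizer.step()             # x <- x - lr * x.grad

print(x.item(), y.item())
```

Since x^2 - 8x + 19 = (x - 4)^2 + 3, the true minimum is at x = 4 with y = 3. With lr=0.1 each step shrinks the distance to 4 by a factor of 0.8, so after 30 steps x should be within about 0.005 of 4.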

Create a line plot of x and y values to verify that the best x value you found via optimization seems about right.
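
A minimal plotting sketch, assuming matplotlib is installed (it is not in the pip list above). The range 0 to 8 and the names `xs`, `ys`, and `best_x` are arbitrary illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import torch

# 81 evenly spaced x values from 0 to 8, and the matching y values.
xs = torch.linspace(0, 8, 81)
ys = xs**2 - 8*xs + 19

fig, ax = plt.subplots()
ax.plot(xs.numpy(), ys.numpy())
ax.set_xlabel("x")
ax.set_ylabel("y")

# The grid point with the smallest y, for comparison with the optimizer.
best_x = xs[ys.argmin()].item()
print(best_x)
```

The lowest point of the curve should sit near the x value reported by the optimizer.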

Part 4: Linear Regression

In part 1, you used a hardcoded coef vector to predict COVID deaths. Now, you will start with random coefficients and optimize them.

Steps:

Requirements:

  • report the r2_score of your predictions on the test data; you must get >0.5
  • print out how long training took to run
  • create a bar plot showing each of the numbers in your model.weight tensor. The x-axis should indicate the column names corresponding to each coefficient (from train.columns)
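
A minimal sketch of one way to approach the training and timing requirements, assuming a torch.nn.Linear model (suggested by the model.weight reference above), mean squared error loss, and full-batch SGD. The synthetic data, learning rate, epoch count, and bias=False are all illustrative assumptions, not values from the assignment, and the sketch stays on the CPU for brevity.

```python
import time

import torch

torch.manual_seed(0)

# Synthetic stand-in for trainX/trainY: 200 rows, 10 features, labels built
# from the hardcoded rates in Part 1 plus a little noise. Not the real data.
true_coef = torch.tensor([[0.004]] * 6 + [[0.03]] * 4, dtype=torch.float64)
trainX = torch.rand(200, 10, dtype=torch.float64) * 100
trainY = trainX @ true_coef + torch.randn(200, 1, dtype=torch.float64) * 0.1

# Assumed setup: a bias-free linear model starting from random weights.
model = torch.nn.Linear(10, 1, bias=False, dtype=torch.float64)
loss_fn = torch.nn.MSELoss()
# The learning rate is small because the feature values are large; a larger
# value can make the loss diverge on data of this scale.
optimizer = torch.optim.SGD(model.parameters(), lr=0.00002)

start = time.time()
for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(trainX), trainY)
    loss.backward()
    optimizer.step()
elapsed = time.time() - start

print("training seconds:", elapsed)
print(model.weight)
```

On the real data, the learning rate and number of epochs will likely need tuning, and the score should be reported on testX and testY rather than on the training rows.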

Tips:

Submission

You should commit your work in a notebook named p1.ipynb.

Approximate Rubric:

The following is approximately how we will grade, but we may make changes if we overlooked an important part of the specification or did not consider a common mistake.

  1. [x/1] prediction from hardcoded coefficient vector (part 1)
  2. [x/1] when available, the tensors being multiplied are moved to a GPU (part 1)
  3. [x/1] the R^2 score is computed correctly (part 2)
  4. [x/1] the R^2 score is computed via a reusable function (part 2)
  5. [x/1] the best x value is found (part 3)
  6. [x/1] the line plot is correct (part 3)
  7. [x/1] the training loop correctly optimizes the coefficients (part 4)
  8. [x/1] the R^2 score is reported and >0.5 (part 4)
  9. [x/1] the execution time is printed (part 4)
  10. [x/1] a bar plot shows the coefficients (part 4)
