MGSC 416, Winter 2023 Data-driven Models for Operations Analytics

# Problem Set 6 – Individual Assignment

Consider a pharmaceutical company that is developing a drug to decrease cholesterol levels. It has developed three prototypes of the drug and wishes to choose one to take to market. We model the drug trial as a multi-armed bandit problem. Each prototype is an arm that we “pull” when we test the prototype on a patient. For each patient i, we choose a prototype and observe the “reward”: a binary value that is 1 if the drug works on the patient and 0 if it does not. In this problem, you will implement multi-armed bandit algorithms to maximize the total reward. You may adapt the code given in class for this assignment.
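To make the setup concrete, here is a minimal sketch of the trial as a replay loop over a table of pre-recorded outcomes. Everything named here (`replay`, `uniform_policy`, the synthetic `rewards` array and its made-up success probabilities) is a hypothetical illustration, not the code given in class and not the assignment's data; it assumes one row per patient and one 0/1 column per prototype.

```python
import numpy as np

def replay(rewards, choose_arm, seed=0):
    """Replay a policy over recorded outcomes, one patient (row) at a time.

    rewards    : (n_patients, n_arms) array of 0/1 outcomes (assumed layout)
    choose_arm : callable(pulls, successes, rng) -> index of the arm to pull
    """
    rng = np.random.default_rng(seed)
    n_patients, n_arms = rewards.shape
    pulls = np.zeros(n_arms, dtype=int)      # times each prototype was tested
    successes = np.zeros(n_arms, dtype=int)  # observed rewards per prototype
    total = 0
    for i in range(n_patients):
        arm = choose_arm(pulls, successes, rng)
        r = int(rewards[i, arm])  # only the chosen prototype's outcome is revealed
        pulls[arm] += 1
        successes[arm] += r
        total += r
    return total, pulls, successes

def uniform_policy(pulls, successes, rng):
    """Baseline that ignores the history and picks a prototype at random."""
    return int(rng.integers(len(pulls)))

# Synthetic stand-in data: 250 patients, 3 prototypes, invented probabilities.
data_rng = np.random.default_rng(42)
true_p = np.array([0.3, 0.5, 0.7])
rewards = (data_rng.random((250, 3)) < true_p).astype(int)

total, pulls, successes = replay(rewards, uniform_policy)
print("total reward:", total, "pulls per arm:", pulls)
```

A bandit algorithm replaces `uniform_policy` with a rule that uses `pulls` and `successes` to favor prototypes that have worked well so far.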

- We consider three algorithms: ε-greedy, ε-decreasing, and Thompson sampling. For each algorithm, which parameters do we need to select? (3 pts)
- For various values of the parameters in the previous question (for example, ε ∈ {0.01, 0.02, ..., 0.6}), run the ε-greedy, ε-decreasing, and Thompson sampling algorithms on the training dataset, Training.csv, which represents simulated test scores from 250 adults aged 25-35. Assume that each drug j is effective on every member of this test population independently with probability p_j, which is unknown. Report the sum of rewards obtained by each of the three strategies, and use these results to select the best parameter values for each algorithm. (14 pts) Note: for Thompson sampling, assume a prior distribution of Beta(1,1) (uniform on [0,1]) for the parameter of each arm. This allows you to update the posterior distribution as follows: if your prior is Beta(a,b), then the posterior distribution is Beta(a+1,b) after an observed success and Beta(a,b+1) after an observed failure.
- Using the parameters selected in the previous question, test the three multi-armed bandit strategies on the first test dataset, Test1.csv, which represents simulated successes or failures for 100 adults aged 25-35, drawn from a population similar to that of the training data. (That is, for each test subject, each strategy should access only one of the results in that row of the data: the one corresponding to the prototype selected for that subject.) Report the sum of rewards that you get from the three strategies. (8 pts)
- Now consider instead the second dataset, Test2.csv, which is constructed by taking the same 100 simulated test scores from adults aged 25-35 and appending an additional 100 simulated test scores, this time from adults aged 55-75. Using the same parameters as in the previous question, test the three strategies on this new dataset. Report the sum of rewards that you get from the three strategies. How do these results differ from those you obtained on the first dataset? Explain the differences. (10 pts)
- Which drug prototype would you recommend for further preparation and testing before it is taken to market? (5 pts)
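As a rough illustration of the three strategies named in the questions above, the sketch below implements each as a policy and runs a parameter sweep over ε. It is not the code given in class. All function names are hypothetical, the data is a synthetic stand-in (the layout of Training.csv is not specified here), and the ε-decreasing schedule ε_t = ε0 / sqrt(t + 1) is only one possible choice, assumed for illustration. The Thompson sampling policy uses the Beta(1,1) prior and the Beta(a+1,b) / Beta(a,b+1) update described in the note.

```python
import numpy as np

def eps_greedy(eps):
    """Explore uniformly with probability eps, else pull the best empirical arm."""
    def policy(pulls, successes, t, rng):
        if rng.random() < eps:
            return int(rng.integers(len(pulls)))
        # Empirical success rate; arms never pulled count as 0 (ties go to arm 0).
        return int(np.argmax(successes / np.maximum(pulls, 1)))
    return policy

def eps_decreasing(eps0):
    """eps-greedy with a shrinking rate eps0 / sqrt(t + 1) (assumed schedule)."""
    def policy(pulls, successes, t, rng):
        return eps_greedy(eps0 / np.sqrt(t + 1))(pulls, successes, t, rng)
    return policy

def thompson(a0=1.0, b0=1.0):
    """Sample each arm's success rate from its Beta posterior, pull the largest."""
    def policy(pulls, successes, t, rng):
        a = a0 + successes            # prior a plus observed successes
        b = b0 + (pulls - successes)  # prior b plus observed failures
        return int(np.argmax(rng.beta(a, b)))
    return policy

def run(rewards, policy, seed=0):
    """Replay a policy over a (n_patients, n_arms) 0/1 array; return total reward."""
    rng = np.random.default_rng(seed)
    n, k = rewards.shape
    pulls = np.zeros(k)
    successes = np.zeros(k)
    for t in range(n):
        arm = policy(pulls, successes, t, rng)
        pulls[arm] += 1
        successes[arm] += rewards[t, arm]  # only the chosen arm's result is read
    return int(successes.sum())

# Synthetic stand-in for the training data, with invented success probabilities.
data_rng = np.random.default_rng(1)
train = (data_rng.random((250, 3)) < np.array([0.3, 0.5, 0.7])).astype(int)

grid = np.linspace(0.01, 0.60, 60)  # 0.01, 0.02, ..., 0.60
greedy_totals = [run(train, eps_greedy(e)) for e in grid]
decr_totals = [run(train, eps_decreasing(e)) for e in grid]
best_eps = float(grid[int(np.argmax(greedy_totals))])
best_eps0 = float(grid[int(np.argmax(decr_totals))])
ts_total = run(train, thompson())

print("eps-greedy:     best eps  =", round(best_eps, 2), "reward =", max(greedy_totals))
print("eps-decreasing: best eps0 =", round(best_eps0, 2), "reward =", max(decr_totals))
print("Thompson sampling reward  =", ts_total)
```

Because each run is random, a single replay per parameter value is noisy; averaging the total reward over several seeds before picking the best value may give a more stable choice.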