[2022] DS-UA 201 Causal Inference - Homework3 CATE using GOTV


Homework 3 - 115 points

General Instructions

This homework must be turned in on Gradescope by August 4th, 2022, 11:59pm. It must be your own work, and your own work only: you must not copy anyone's work, or allow anyone to copy yours. This extends to writing code. You may consult with others, but when you write up, you must do so alone. Your homework submission must be written and submitted using Rmarkdown. No handwritten solutions will be accepted. You should submit:

A compiled PDF file named yourNetID_solutions.pdf containing your solutions to the problems.
A .Rmd file, named yourNetID_solutions.Rmd, containing the code and text used to produce your compiled PDF. Note that math can be typeset in Rmarkdown in the same way as in LaTeX.

Please make sure your answers are clearly structured in the Rmarkdown file:

Label each question part (e.g. 3.a).
Do not include written answers as code comments.
The code used to obtain the answer for each question part should accompany the written answer.

Problem 1 - CATE using GOTV - 20 points

Consider again the GOTV data from the last problem set by Gerber, Green and Larimer (APSR, 2008). Although it is not specified in the paper, it is highly possible that the authors created subgroups based on the turnout history for the 5 previous primary and general elections (number of times the individual voted) and the number of registered voters in the household. In this problem, we will create subgroups based on the turnout history, and investigate the CATE (conditional average treatment effect) and the effect modification in each subgroup. We denote the turnout history (number of times voted) as a covariate $X_i$ for individual $i$.

Part a. Data preparation (5 points):

Construct a new dataset for this problem using the individual dataset from the last problem set.

1. Create a new column num_voted to represent the number of times the individual has voted in the previous 5 elections by summing the variables g2000, p2000, g2002, p2002 and p2004 (exclude g2004 because the experiment filtered out people who didn't vote in g2004). The resulting column should be an integer in the range [0, 5].

2. In the following problems, we use the individual data with num_voted defining the subgroups. To simplify the problem, we investigate only the "Neighbor" treatment effect. Construct a cleaner dataset with {id, hh_id, hh_size, num_voted, voted, treatment} as columns and filter out treatment groups other than {Neighbor, Control}.

3. Construct a household-level dataset by taking the means of hh_size, num_voted, and voted in each household (the other variables are all equal within the same household and can simply be left as they are). Round the mean of num_voted up to the nearest integer. Your resulting dataset should have one household per row, and hh_id, hh_size, num_voted, voted, and treatment as columns. The variable num_voted should take only the values 0, 1, 2, 3, 4, 5.

4. Report the number of households in each subgroup for both treatment and control. What do you observe?
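The data preparation steps above can be sketched as follows. This is only an illustration in Python using the standard library; the assignment itself must be completed in Rmarkdown. The rows are made up, the column names follow the ones listed above with underscores assumed, and the treatment label "Neighbors" is an assumption about how the arm is coded in the data.

```python
import math
from collections import Counter, defaultdict

# Made-up individual-level rows, for illustration only.
individuals = [
    {"id": 1, "hh_id": 10, "hh_size": 2, "g2000": 1, "p2000": 0, "g2002": 1,
     "p2002": 0, "p2004": 0, "voted": 1, "treatment": "Neighbors"},
    {"id": 2, "hh_id": 10, "hh_size": 2, "g2000": 0, "p2000": 0, "g2002": 1,
     "p2002": 0, "p2004": 0, "voted": 0, "treatment": "Neighbors"},
    {"id": 3, "hh_id": 11, "hh_size": 1, "g2000": 1, "p2000": 1, "g2002": 1,
     "p2002": 1, "p2004": 1, "voted": 1, "treatment": "Control"},
    {"id": 4, "hh_id": 12, "hh_size": 1, "g2000": 0, "p2000": 0, "g2002": 0,
     "p2002": 0, "p2004": 0, "voted": 0, "treatment": "Civic Duty"},
]

HISTORY = ["g2000", "p2000", "g2002", "p2002", "p2004"]  # g2004 is excluded
KEEP = {"Neighbors", "Control"}  # assumed labels for the two arms of interest


def prepare(rows):
    """Add num_voted and keep only the Neighbors and Control arms."""
    cleaned = []
    for r in rows:
        if r["treatment"] not in KEEP:
            continue
        cleaned.append({
            "id": r["id"], "hh_id": r["hh_id"], "hh_size": r["hh_size"],
            "num_voted": sum(r[c] for c in HISTORY),
            "voted": r["voted"], "treatment": r["treatment"],
        })
    return cleaned


def to_households(rows):
    """Average within household; round the mean of num_voted up."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["hh_id"]].append(r)
    households = []
    for hh_id, members in sorted(groups.items()):
        n = len(members)
        households.append({
            "hh_id": hh_id,
            "hh_size": sum(m["hh_size"] for m in members) / n,
            "num_voted": math.ceil(sum(m["num_voted"] for m in members) / n),
            "voted": sum(m["voted"] for m in members) / n,
            "treatment": members[0]["treatment"],  # constant within household
        })
    return households


households = to_households(prepare(individuals))
# Number of households per (treatment, num_voted) cell.
counts = Counter((h["treatment"], h["num_voted"]) for h in households)
print(counts)
```

In this toy data, household 10 has individual histories of 2 and 1 votes, so its mean of 1.5 is rounded up to 2.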

Part b. CATE for subgroups (6 points)

We define the conditional average treatment effect as the ATE for the different subgroups defined by the num_voted variable:

$$\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x], \quad x \in \{0, 1, 2, 3, 4, 5\}$$

Since treatment was randomized at the household level, positivity and ignorability hold both unconditionally and conditionally within each subgroup. For each subgroup:

1. Estimate the CATE and report the variance of your estimates.

2. Construct a 95% confidence interval around your estimates.

3. What conclusions can you draw from these statistics?

You can skip subgroups that either do not have members in them or do not have any treated/control members.
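One way to organize the per-subgroup computation is sketched below in Python (illustrative only; the assignment requires R). It assumes a difference-in-means estimate with the Neyman variance estimator and a normal-approximation interval, which is a common choice but is not stated in the text above. The outcomes are made up, and subgroups with fewer than two units in either arm are skipped because a sample variance cannot be computed for them.

```python
import math
from statistics import mean, variance


def diff_in_means(treated, control):
    """Return (estimate, variance, (lower, upper)) for a difference in means."""
    est = mean(treated) - mean(control)
    # Neyman variance estimator: s_t^2 / n_t + s_c^2 / n_c
    var = variance(treated) / len(treated) + variance(control) / len(control)
    half = 1.96 * math.sqrt(var)
    return est, var, (est - half, est + half)


def cate_by_subgroup(households):
    """Estimate the CATE within each num_voted subgroup that has enough data."""
    results = {}
    for x in sorted({h["num_voted"] for h in households}):
        t = [h["voted"] for h in households
             if h["num_voted"] == x and h["treatment"] == "Neighbors"]
        c = [h["voted"] for h in households
             if h["num_voted"] == x and h["treatment"] == "Control"]
        if len(t) < 2 or len(c) < 2:
            continue  # sample variance is undefined with fewer than 2 units
        results[x] = diff_in_means(t, c)
    return results


def make(x, arm, outcomes):
    """Build made-up household rows for one subgroup and arm."""
    return [{"num_voted": x, "treatment": arm, "voted": v} for v in outcomes]


households = (
    make(0, "Neighbors", [1, 0, 1, 1]) + make(0, "Control", [0, 0, 1, 0])
    + make(5, "Neighbors", [1, 1]) + make(5, "Control", [1])
)

results = cate_by_subgroup(households)
for x, (est, var, ci) in results.items():
    print(x, round(est, 3), round(var, 4), tuple(round(b, 3) for b in ci))
```

Subgroup 5 is dropped in this toy example because it has a single control household.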

Part c. Effect modification (6 points)

Suppose we want to estimate whether there is a difference in effects for two extreme groups, individuals who always vote ($X_i = 5$) and individuals who never vote ($X_i = 0$). We construct an estimator $\hat{\Delta}$ to estimate the difference. As we saw in class, we can estimate this difference as:

$$\hat{\Delta} = \hat{\tau}(0) - \hat{\tau}(5)$$

1. Calculate the variance of $\hat{\Delta}$ and construct a 95% confidence interval around it. Can we say that there is a significant difference in the treatment effect between people who always vote and people who never vote?

2. Combine your observations with the conclusions from part b and comment on your findings.

Part d. Sample sizes and significance effect (3 points)

In the experiment, the authors claimed no significant differences between groups. One possible reason may be that the sample size for each subgroup is too small. This is a practical problem we may encounter in experimental designs when we are testing multiple hypotheses or have too many subgroups. Explain in your own words why having more hypotheses/subgroups would make a significant effect harder to detect for each group, assuming the overall sample size is fixed.
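Two pieces of arithmetic behind Parts c and d can be sketched numerically. The Python below is illustrative only (the assignment requires R) and rests on stated assumptions: the two subgroup estimates are treated as independent because the subgroups are disjoint, so their variances add; and the standard-error function assumes equal-sized subgroups, half treated, and a common outcome variance. All input numbers are made up.

```python
import math


def difference_in_cates(est_a, var_a, est_b, var_b):
    """Difference of two independent CATE estimates with a 95% interval."""
    delta = est_a - est_b
    var = var_a + var_b  # independence assumed: variances add
    half = 1.96 * math.sqrt(var)
    return delta, var, (delta - half, delta + half)


def subgroup_standard_error(total_n, k, outcome_variance=0.25):
    """SE of a difference in means when total_n units are split evenly into
    k subgroups, each half treated and half control (simplifying assumption)."""
    n_per_arm = total_n / k / 2
    return math.sqrt(2 * outcome_variance / n_per_arm)


# Made-up estimates for two subgroups.
delta, var, ci = difference_in_cates(0.10, 0.0004, 0.06, 0.0005)
print(round(delta, 4), round(var, 6), tuple(round(b, 4) for b in ci))

# Splitting a fixed sample into more subgroups inflates each subgroup's SE.
se_1 = subgroup_standard_error(12000, 1)
se_6 = subgroup_standard_error(12000, 6)
print(round(se_1, 4), round(se_6, 4))
```

Under these assumptions the per-subgroup standard error grows with the square root of the number of subgroups, which is one ingredient of an answer to Part d; corrections for multiple testing are a separate consideration not shown here.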

Problem 2 - 15 points

In this question we will be using the same household-level dataset that you constructed in part a of Problem 1.

Part a (4 points):

Compute the ATE of the "Neighbors" treatment using the standard difference-in-means estimator, i.e., $\hat{\tau} = \bar{Y}_t - \bar{Y}_c$. Provide standard errors and 95% confidence intervals for your estimates.

Part b (5 points):

Now compute the same ATE but with the stratification estimator, defined as the weighted mean of the stratum CATEs that you computed in the previous problem:

$$\hat{\tau}_{block} = \sum_{x} \hat{\tau}(x) \frac{N_x}{N}$$

where $N_x$ is the number of households in stratum $x$ and $N$ is the total number of households. Compute the variance and 95% confidence intervals for this estimator as well, using the stratified variance

$$\text{Var}(\hat{\tau}_{block}) = \sum_{x} \text{Var}(\hat{\tau}(x)) \left( \frac{N_x}{N} \right)^2$$

Comment on the difference between the ATE estimates you obtained here and in part a, and on the difference between their variances. What is it due to?
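The stratification estimator and its variance are weighted sums, which a short sketch can make concrete. The Python below is illustrative only (the assignment requires R); it assumes the weights are the stratum shares N_x / N and that the per-stratum estimates and variances have already been computed. The function name and the input numbers are made up.

```python
import math


def stratified_estimate(strata):
    """Combine per-stratum results into a stratified ATE estimate.

    strata maps a stratum label x to (tau_hat_x, var_hat_x, n_x).
    Returns (estimate, variance, (lower, upper)).
    """
    n = sum(n_x for _, _, n_x in strata.values())
    tau = sum(t * n_x / n for t, _, n_x in strata.values())
    # Weights enter the variance squared.
    var = sum(v * (n_x / n) ** 2 for _, v, n_x in strata.values())
    half = 1.96 * math.sqrt(var)
    return tau, var, (tau - half, tau + half)


# Made-up per-stratum inputs: (CATE estimate, its variance, stratum size).
toy_strata = {
    0: (0.10, 0.0004, 100),
    1: (0.05, 0.0001, 300),
}

tau_block, var_block, ci_block = stratified_estimate(toy_strata)
print(round(tau_block, 4), round(var_block, 8))
```

With these inputs the weights are 0.25 and 0.75, so the estimate is 0.10 * 0.25 + 0.05 * 0.75 = 0.0625.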

Part c (6 points):

Now divide the data set into 6 strata in such a way that each stratum has the same proportion of treated and control observations. You can do so by creating a new variable called "group" with values 0, 1, 2, 3, 4, 5 and randomly assigning each value to $N_t/6$ treated units and $N_c/6$ control units. You may exclude enough treated and control units from the data to make $N_t$ and $N_c$ divisible by 6.

Compute the ATE by applying the stratified estimator $\hat{\tau}_{block}$ to these newly created strata. Provide variance estimates and 95% confidence intervals for these ATE estimates as well, using the stratified variance estimator. Is the variance of this estimator much different from that of the $\hat{\tau}$ you computed in part a? Why do you think this is the case?
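The random stratum assignment described in this part could be sketched as below. This is an illustrative Python version (the assignment requires R); the function name, the seed, and the arm labels "Neighbors" and "Control" are assumptions, and the households are made up. Within each arm the units are shuffled, the leftover units beyond a multiple of 6 are dropped, and the groups are filled in round-robin order so each receives the same number of units.

```python
import random
from collections import Counter


def assign_random_groups(households, n_groups=6, seed=0):
    """Randomly split each arm evenly across n_groups strata."""
    rng = random.Random(seed)
    assigned = []
    for arm in ("Neighbors", "Control"):
        units = [dict(h) for h in households if h["treatment"] == arm]
        rng.shuffle(units)
        usable = len(units) - len(units) % n_groups  # drop the remainder
        for i, unit in enumerate(units[:usable]):
            unit["group"] = i % n_groups
            assigned.append(unit)
    return assigned


# Made-up households: 13 treated and 20 control.
toy = (
    [{"hh_id": i, "treatment": "Neighbors"} for i in range(13)]
    + [{"hh_id": 100 + i, "treatment": "Control"} for i in range(20)]
)

assigned = assign_random_groups(toy)
cell_counts = Counter((u["group"], u["treatment"]) for u in assigned)
print(sorted(cell_counts.items()))
```

Here one treated and two control households are excluded, leaving 2 treated and 3 control households in every group.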

Problem 3 - 25 points

Consider a study with $N$ units. Each unit $i$ in the sample belongs to one of $G$ mutually exclusive strata. $G_i = g$ denotes that the $i$th unit belongs to stratum $g$. $N_g$ denotes the size of stratum $g$ and $N_{t,g}$ denotes the number of treated units in that stratum. Suppose that treatment is assigned via block randomization: within each stratum, $N_{t,g}$ units are randomly selected to receive treatment and the remainder receive control. Suppose that the proportion of treated units in each stratum, $N_{t,g}/N_g$, is not the same for all strata. After treatment is assigned, you record an outcome $Y_i$ for each unit in the sample. Assume consistency holds with respect to the potential outcomes: $Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$.

Part a (5 points)

Show that the ATE, $\tau = E[Y_i(1) - Y_i(0)]$, is identified in this setting, i.e., show that $\tau$ is equal to a function of the observed outcomes.

Part b (10 points)

Assume that $E[\hat{\tau}(g) \mid G = g, N = n] = \tau(g)$ and that $E[N_g/N] = \Pr(G = g)$. Show that the stratified estimator:

$$\hat{\tau} = \sum_{g=1}^{G} \hat{\tau}(g) \frac{N_g}{N}$$

is unbiased for the ATE, i.e., show that $E[\hat{\tau}] = \tau$.

Part c (10 points)

Instead of using the stratified difference-in-means estimator, your colleague suggests an alternative that assigns a weight to each unit and takes two weighted averages. Let $w(G_i) = \Pr(D_i = 1 \mid G_i)$ denote the known (constant) probability that unit $i$ would receive treatment given its stratum membership $G_i$. The new estimator is:

$$\hat{\tau}_w = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{D_i Y_i}{w(G_i)} - \frac{(1 - D_i) Y_i}{1 - w(G_i)} \right]$$

Assuming that $E[N_g/N] = \Pr(G = g)$, show that $\hat{\tau}_w$ is unbiased, i.e., show that $E[\hat{\tau}_w] = \tau$.

Note: either showing that $\hat{\tau}_w$ is unbiased for $\tau = E[Y_i(1) - Y_i(0)]$ or for $\tau = \frac{1}{N} \sum_{i=1}^{N} E[Y_i(1) - Y_i(0)]$ will count as a valid answer.

Problem 4 - Directed Acyclic Graphs (DAGs) - 15 points

Consider the following Directed Acyclic Graph:

Part a (5 points)

Of the five variables in the graph, 2 are colliders and 3 are non-colliders. Which variables are colliders and which are non-colliders?

Part b (5 points)

Suppose that we wanted to estimate the effect of A on Y. Indicate whether we should or should not condition on X and explain why, and indicate whether we should or should not condition on Z and explain why.

Part c (5 points)

Suppose that we wanted to estimate the effect of M on Y. List all the backdoor paths between M and Y, and indicate which variable we should condition on to close each path. There may be multiple valid options for each path.
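The mechanics of listing backdoor paths and spotting colliders can be illustrated on a small hypothetical graph. The Python sketch below does not use the DAG from this problem: the nodes T (treatment), O (outcome), C, and K and their edges are invented for illustration, and the helper names are assumptions. A backdoor path is taken here to be any path between treatment and outcome whose first edge points into the treatment, and a collider on a path is an interior node with both adjacent path edges pointing into it.

```python
def simple_paths(edges, start, end):
    """All simple paths between start and end, ignoring edge direction."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        node = path[-1]
        if node == end:
            paths.append(path)
            return
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return paths


def backdoor_paths(edges, treatment, outcome):
    """Paths whose first edge points into the treatment."""
    edge_set = set(edges)
    return [p for p in simple_paths(edges, treatment, outcome)
            if (p[1], p[0]) in edge_set]


def colliders_on(edges, path):
    """Interior nodes where both neighboring path edges point inward."""
    edge_set = set(edges)
    return [path[i] for i in range(1, len(path) - 1)
            if (path[i - 1], path[i]) in edge_set
            and (path[i + 1], path[i]) in edge_set]


# Hypothetical DAG, each pair (a, b) meaning a -> b. Not the assignment's graph.
toy_edges = [("C", "T"), ("C", "O"), ("T", "O"), ("T", "K"), ("O", "K")]

all_paths = simple_paths(toy_edges, "T", "O")
bd_paths = backdoor_paths(toy_edges, "T", "O")
print(bd_paths)
print(colliders_on(toy_edges, ["T", "K", "O"]))
```

In this hypothetical graph the only backdoor path runs through C, which is a non-collider on that path, while K is a collider on the path T, K, O.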
