ECON7310: Elements of Econometrics Research Project 1
Instruction
Please answer all questions following a format similar to the answers to your tutorial questions. When you use R to conduct empirical analysis, you should show your R script(s) and outputs (e.g., screenshots of commands, tables, and figures). You will lose 2 points whenever you fail to provide R commands and outputs. Please clearly label all your answers and keep your response brief and concise. You should upload your research report (in PDF or Word format) via the Turnitin submission link (in the “Research Project 1” folder under “Assessment”). You are allowed to work on this assignment in groups; however, you must answer all the questions in your own words and submit your report separately.
Background
Use the cps09mar.csv dataset to estimate the effect of education on earnings. Data description and variable definitions can be found in the document cps09mar description.pdf. For all questions below, use the subsample of individuals who are non-Hispanic and at least 22 years old.
Research Questions
1.(10 points) Create two new variables: wage = earnings/(hours × week) and ln(wage). Plot histograms for these two variables to explore their distributions.
2.(15 points) You have read in the news that women make 70 cents for every dollar earned by men. To investigate this phenomenon, you first regress ln(wage) on a constant and a binary variable, which takes on a value of 1 for females and is 0 otherwise.
(a)(3 points) Report the estimation result in the standard equation form as introduced in Lecture 5, where the estimates are presented along with standard errors and some measure of goodness of fit.
(b)(9 points) Based on the estimation result in part (a), calculate the female hourly wages as a percentage of the male hourly wages. Indicate whether or not the percentage difference in the mean hourly wages is statistically significant.
(c)(1 point) Based on the estimation result in part (a), how would you test whether or not women earn less than men on average (in percentage terms)?
(d)(2 points) Are these results enough to argue that there is discrimination against females in the labor market? Why or why not?
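As a reminder for parts (b) and (c), the usual log-linear interpretation of a binary regressor can be sketched as follows. The coefficient names β0 and β1 and the regressor name Female are notation assumed here for illustration, not names taken from the dataset.

```latex
% Assumed notation: Female = 1 for women, 0 otherwise.
\ln(\mathit{wage}) = \beta_0 + \beta_1 \mathit{Female} + u
% Female wage as a percentage of the male wage, and the percentage gap:
100 \times e^{\hat\beta_1}, \qquad 100 \times \left(e^{\hat\beta_1} - 1\right) \approx 100\,\hat\beta_1 \ \text{(for small } \hat\beta_1\text{)}
% One-sided test of whether women earn less on average:
H_0: \beta_1 \ge 0 \quad \text{vs.} \quad H_1: \beta_1 < 0, \qquad t = \frac{\hat\beta_1}{SE(\hat\beta_1)}
```

The approximation 100·β̂1 becomes less accurate as the coefficient moves away from zero, so the exponential form is the safer one to report.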
3.(15 points) You recall from your textbook that additional years of education are supposed to result in higher earnings. For that reason, you decide to include the education variable in the regression in question 2.
(a)(8 points) Report the estimation results. What is the effect of an additional year of education on hourly wages (“returns to education”) for men? For women?
(b)(5 points) Based on the estimation result in part (a), for a given level of education, how much less do females earn on average? Does this result represent stronger evidence of discrimination against females?
(c)(2 points) To investigate whether or not there is discrimination against females, you regress the log of earnings on determining variables, such as education, and a binary variable for females. You consider two possible specifications. First, you run two separate regressions, one for females and one for the others. Second, you run a single regression but allow for the binary variable to appear in the regression. Your professor suggests that the latter option is better for the task at hand, as long as you allow for a shift in both the intercepts and the slopes. Explain her reasoning.
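For part (c), one way to write a single regression that allows both the intercept and the slope to shift is sketched below. The symbols β0, β1, β2, δ and the names Female and educ are assumed notation for this illustration only.

```latex
% Assumed notation: educ = years of education, Female = 1 for women.
\ln(\mathit{wage}) = \beta_0 + \beta_1 \mathit{Female} + \beta_2\, \mathit{educ} + \delta \left(\mathit{Female} \times \mathit{educ}\right) + u
% Implied regression lines in this specification:
\text{men: } \beta_0 + \beta_2\, \mathit{educ}, \qquad \text{women: } (\beta_0 + \beta_1) + (\beta_2 + \delta)\, \mathit{educ}
% Joint hypothesis that the two lines coincide:
H_0: \beta_1 = 0 \ \text{and} \ \delta = 0
```

Written this way, the differences between the two groups appear as coefficients of one model, so they can be tested directly, for instance with an F-test of the joint hypothesis.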
4.(15 points) You read in the literature that there should also be returns to on-the-job training. To approximate on-the-job training, researchers often use the so-called Mincer or potential experience variable, which is defined as exper = age − education − 6.
(a)(4 points) Under what condition(s) would the estimates in Question 3 be biased and inconsistent due to the omission of the work experience?
(b)(8 points) You incorporate the experience variable into your regression in Question 3. Report the estimation results and interpret the estimated coefficients.
(c)(3 points) Draw scatter plots of ln(wage) versus exper for female workers with at least a Bachelor’s degree or equivalent.
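For part (a), the logic of omitted variable bias is easiest to see in the simplified case of one included regressor and one omitted regressor. The sketch below uses generic symbols Y, X1, and X2 that are assumed for illustration; the regression in Question 3 has more than one included regressor, so this is only the textbook special case.

```latex
% True model (assumed for illustration), with X_2 omitted from the estimated regression:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u
% Probability limit of the OLS slope from regressing Y on X_1 alone:
\hat\beta_1 \ \xrightarrow{\,p\,}\ \beta_1 + \beta_2\, \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}
```

The second term vanishes only if the omitted variable does not affect Y (β2 = 0) or is uncorrelated with the included regressor.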
5.(20 points) You suspect the relationship between ln(wage) and exper is not linear. To test this idea, you add the square of experience to your log-linear regression in Question 4. Based on the new estimation result:
(a)(6 points) Test for the significance of the coefficient of the quadratic term. Is it meaningful? Are there strong reasons to assume that this specification is superior to the previous one?
(b)(2 points) Has the coefficient on education changed much compared to the estimation results in Questions 3 and 4? Why or why not?
(c)(6 points) Bob is a 40-year-old male high school graduate. Predict his hourly wage. What is the effect of an additional year of experience on his hourly wage?
(d)(6 points) What is the effect of an additional year of experience on the hourly wage of a person who has 20 years of work experience, holding constant the gender and the education variables? Calculate the 95% confidence interval of the estimated effect. Is it a significant effect?
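For parts (c) and (d), the effect of experience in a quadratic specification depends on the level of experience. The sketch below assumes the coefficient labels β3 (on exper) and β4 (on exper squared), and, for Bob, assumes that a high school graduate has 12 years of education; check the data description for the actual coding.

```latex
% Assumed notation for the quadratic specification:
\ln(\mathit{wage}) = \beta_0 + \beta_1 \mathit{Female} + \beta_2\, \mathit{educ} + \beta_3\, \mathit{exper} + \beta_4\, \mathit{exper}^2 + u
% Approximate effect of one more year of experience (times 100 for percent):
\frac{\partial \ln(\mathit{wage})}{\partial\, \mathit{exper}} = \beta_3 + 2 \beta_4\, \mathit{exper}
% Exact change in ln(wage) when exper rises from e to e + 1:
\beta_3 + \beta_4 \left(2e + 1\right)
% Bob, assuming 12 years of education: exper = 40 - 12 - 6 = 22.
% At exper = 20, the derivative-based effect and its standard error:
\hat\beta_3 + 40 \hat\beta_4, \qquad SE = \sqrt{\widehat{\operatorname{Var}}(\hat\beta_3) + 1600\, \widehat{\operatorname{Var}}(\hat\beta_4) + 80\, \widehat{\operatorname{Cov}}(\hat\beta_3, \hat\beta_4)}
% 95% confidence interval:
\left(\hat\beta_3 + 40 \hat\beta_4\right) \pm 1.96 \times SE
```

The variances and the covariance come from the estimated covariance matrix of the coefficients. An equivalent route is to re-parameterise the regression so that the effect of interest appears as a single coefficient.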
6.(12 points) With the regression model in Question 5, you are still concerned about omitted variable bias. For that reason, you decide to include one more control variable in the regression, and you want to find the effect of introducing marital status. Accordingly, you specify a binary variable, Married, that takes on the value of one if the worker is married (marital ≤ 3) and zero otherwise. Based on the new estimation result:
(a)(6 points) Compare the effect of being married and the effect of an additional year of education, and test whether these two effects are of the same magnitude.
(b)(6 points) What is the percentage difference in hourly wages between a single male and a married female, controlling for education and experience? What about between single males and females? Between married males and females?
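For part (a), a test of equality between two coefficients can be sketched as follows. The labels β2 (on education) and β5 (on Married) are assumed notation for this illustration.

```latex
% Assumed notation: \beta_2 on educ, \beta_5 on Married.
H_0: \beta_5 = \beta_2 \quad \text{vs.} \quad H_1: \beta_5 \ne \beta_2
% t-statistic for a single linear restriction:
t = \frac{\hat\beta_5 - \hat\beta_2}{\sqrt{\widehat{\operatorname{Var}}(\hat\beta_5) + \widehat{\operatorname{Var}}(\hat\beta_2) - 2\, \widehat{\operatorname{Cov}}(\hat\beta_5, \hat\beta_2)}}
```

The covariance term is needed because the two estimates come from the same regression and are generally correlated.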
7.(10 points) In your final specification, you allow for the two binary variables, gender and marital status, to interact, by adding the interaction term to the regression in Question 6. Based on the new estimation result, repeat the exercise in 6(b) of calculating the various percentage differences between gender and marital status. Do you think the approach in this question is more general than the one in Question 6?
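To see what the interaction term adds, it can help to write out the intercept shift of each group relative to single males. The labels β1 (Female), β5 (Married), and β6 (Female × Married) are assumed notation for this sketch, with education and experience held constant.

```latex
% Assumed notation; shifts in ln(wage) relative to single males:
\text{single male: } 0, \qquad \text{married male: } \beta_5
\text{single female: } \beta_1, \qquad \text{married female: } \beta_1 + \beta_5 + \beta_6
% Gender gap by marital status:
\text{single: } \beta_1, \qquad \text{married: } \beta_1 + \beta_6
% Each log difference d converts to a percentage difference via:
100 \times \left(e^{d} - 1\right)
```

Setting β6 = 0 recovers the specification in Question 6, in which the gender gap is restricted to be the same for single and married workers.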
8.(3 points) Report estimation results of all regressions in Questions 2 to 7 using a table similar to those presented in your Tutorials 5–6.