The Job Openings and Labor Turnover Survey (JOLTS) is conducted by the U.S. Bureau of Labor Statistics. The data set for this assignment gives total nonfarm job quits in the U.S., monthly in thousands. A quit is defined as leaving one’s job voluntarily and does not include retirement or transfer to another location. The data are in the file JOLTS22.txt and span the months from January 2001 to July 2022.
To do this assignment, you need to convert the class of two variables in the data frame. To do so, carry out these steps at the outset:
Read in the data frame using this form of command:
JOLTS<-read.csv("F:/Stat711022Fall/JOLTS22.txt")
That is, give the name JOLTS to the data frame. Next, give these commands:
attach(JOLTS)
Time<-as.numeric(Time)
fMonth<-as.factor(Month)
The last two lines convert the variable Time to numeric class and the variable Month to factor class. In addition, augment the data frame using the following command:
JOLTS<-data.frame(JOLTS,fMonth)
1. Make separate time series plots for (i) Quits, (ii) log Quits, and (iii) the log return of Quits. Mark the periods of economic downturn as determined by the Business Cycle Dating Committee of the National Bureau of Economic Research (NBER). Discuss and compare the three plots. Comment on trend structure and volatility. Do the plots reveal any unusual features? If yes, describe what is notable and discuss the underlying causes. Do the plots in (i) and (ii) indicate whether an additive decomposition model or a multiplicative decomposition model should be fit to model the variable Quits? Explain your answer.
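[R sketch: One possible way to draw the three plots with the downturns shaded is given below. It is only a sketch under stated assumptions. The series quits is simulated so that the code runs on its own, and should be replaced by the Quits column of the data frame. The helper plot_with_recessions is a hypothetical name, not part of any package. The shaded intervals are the NBER peak and trough months that fall inside the sample (March to November 2001, December 2007 to June 2009, February to April 2020).]

```r
# Stand-in data so the sketch runs by itself; replace quits with JOLTS$Quits.
set.seed(1)
n <- 259                                        # January 2001 to July 2022
dates <- seq(as.Date("2001-01-01"), by = "month", length.out = n)
quits <- 2500 * exp(cumsum(rnorm(n, 0, 0.03)))  # simulated, NOT real JOLTS values

# NBER recession intervals (peak month to trough month) within the sample
rec <- data.frame(
  start = as.Date(c("2001-03-01", "2007-12-01", "2020-02-01")),
  end   = as.Date(c("2001-11-01", "2009-06-01", "2020-04-01"))
)

# Hypothetical helper: line plot with grey bands over the recession intervals
plot_with_recessions <- function(x, y, ylab) {
  plot(x, y, type = "n", xlab = "Time", ylab = ylab)   # set up axes only
  usr <- par("usr")                                    # plot-region limits
  rect(as.numeric(rec$start), usr[3], as.numeric(rec$end), usr[4],
       col = "grey85", border = NA)
  lines(x, y)                                          # draw series over the bands
  box()
}

log_quits  <- log(quits)
log_return <- diff(log_quits)   # one fewer observation than quits

plot_with_recessions(dates, quits, "Quits (thousands)")
plot_with_recessions(dates, log_quits, "log Quits")
plot_with_recessions(dates[-1], log_return, "Log return of Quits")
```

The log return is plotted against dates[-1] because differencing drops the first month.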
In all the parts which follow, include in the models you fit the two pairs of trigonometric variables which account for calendar structure. If one or both of the pairs are found to be insignificant, remove and refit. In addition, fit subsequent models with data excluded for the years 2020 to 2022, unless otherwise instructed. To fit a model with these years excluded, use a command of the type
model<-lm(y~x1+x2,data=JOLTS[1:228,])
In this command, JOLTS is the name of the data frame.
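[R sketch: The row range 1:228 follows from the span of the data, since 2001 to 2019 is 19 years of 12 months. The short check below rebuilds the monthly calendar from the stated start and end months and looks up the rows on either side of the cutoff. It assumes only that the rows of the file are in time order with no missing months.]

```r
# Monthly calendar implied by the stated span, January 2001 to July 2022
dates <- seq(as.Date("2001-01-01"), as.Date("2022-07-01"), by = "month")

length(dates)                  # total number of rows in the full data frame
format(dates[228], "%Y-%m")    # last month kept by JOLTS[1:228,]
format(dates[229], "%Y-%m")    # first month excluded
```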
2. Fit an additive decomposition model to Quits. Include a polynomial trend, a seasonal component using the fMonth variable, and trigonometric variables to investigate calendar structure. Include only significant trigonometric pairs. Investigate whether an additive model is acceptable. To do so, use the methodology described on page 53 of the 8 September notes.
[R hint: To fit a fifth-degree polynomial trend, for example, include as explanatory variables in the lm command
Time + I(Time^2) + I(Time^3) + I(Time^4) + I(Time^5)
As an alternative, you can use poly(Time,5)
These two approaches give identical overall fits, but produce different coefficient estimates. The latter employs orthogonal polynomials, and the former does not. Either form can be used for this assignment—the overall results will be the same.]
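[R sketch: The claim in the hint can be checked directly. The example below uses a simulated response, not the JOLTS data, and compares the fitted values and the coefficients from the two forms of a fifth-degree trend. The object names fit_raw and fit_orth are illustrative.]

```r
set.seed(2)
Time <- 1:228
y <- 5 + 0.01 * Time + rnorm(228)   # simulated response, for illustration only

# Raw powers of Time
fit_raw  <- lm(y ~ Time + I(Time^2) + I(Time^3) + I(Time^4) + I(Time^5))
# Orthogonal polynomials of the same degree
fit_orth <- lm(y ~ poly(Time, 5))

# Same column space, so the fitted values agree up to rounding error
same_fit <- isTRUE(all.equal(unname(fitted(fit_raw)),
                             unname(fitted(fit_orth)), tolerance = 1e-6))
# Different bases, so the coefficient estimates differ
same_coef <- isTRUE(all.equal(unname(coef(fit_raw)), unname(coef(fit_orth))))

same_fit
same_coef
```

Because the fits coincide, the residuals, R-squared, and overall F statistic also coincide. Only the individual coefficient estimates and their t statistics change.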
3. Next, fit a multiplicative decomposition model to Quits. Discuss the results.
(a) Tabulate and plot the estimated static seasonal indices and give a detailed interpretation of them in the context of the data collection.
(b) Save the residuals from the fit. Form a normal quantile plot of these residuals, test the residuals for normality, plot the residuals vs. time, and plot their autocorrelations. Describe each of these results. The model fails to capture trend structure fully. Where does it fail and what is the cause? What conclusions do you draw from the residual analysis? In particular, what structures in the time series has the model failed to capture?
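[R sketch: The following shows one way the calculations in (a) and (b) might be organized. It is a sketch on simulated data, with a linear trend and no calendar variables, so the model formula is not the one required for the assignment. It assumes the multiplicative model is fit as a regression for log(y), that fMonth has levels 1 to 12 with January as the baseline, and that the Shapiro-Wilk test is an acceptable normality test. The seasonal indices are centered so that their product is 1.]

```r
set.seed(3)
n <- 228
Time <- 1:n
fMonth <- factor(rep(1:12, length.out = n))
true_season <- 0.05 * sin(2 * pi * (1:12) / 12)
# Simulated series with multiplicative trend, seasonal, and irregular parts
y <- exp(7.5 + 0.002 * Time + true_season[as.integer(fMonth)] + rnorm(n, 0, 0.02))

model <- lm(log(y) ~ Time + fMonth)

# (a) fMonth coefficients are log-scale offsets from the baseline month
b <- c(0, coef(model)[paste0("fMonth", 2:12)])
seas_index <- exp(b - mean(b))      # centered: geometric mean of the indices is 1
names(seas_index) <- month.abb
round(seas_index, 3)
plot(1:12, seas_index, type = "b", xaxt = "n",
     xlab = "Month", ylab = "Seasonal index")
axis(1, at = 1:12, labels = month.abb)
abline(h = 1, lty = 2)

# (b) residual diagnostics
r <- resid(model)
qqnorm(r); qqline(r)                # normal quantile plot
shapiro.test(r)                     # one possible normality test
plot(Time, r, type = "l", ylab = "Residual"); abline(h = 0, lty = 2)
acf(r, lag.max = 36)                # residual autocorrelations
```

An index above 1 says the month tends to run above the trend level by that factor, and an index below 1 says it runs below.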
4. Redo the fit in part 2, but now with cosine and sine seasonal dummies, instead of the fMonth variable, for estimation of the seasonal component. Although this is an additive model fit, which was rejected in part 2, proceed nonetheless. The estimates will be correct, but their standard errors will be slightly wrong. Perform the amplitude, phase, and peak calculations and tabulate and interpret the results. [R hint: After you form the cosine and sine variables for this part, add them to the data frame. Then fit the model, and remember to exclude data for the years 2020 to 2022. Code to form the cosines and sines and add them to the data frame follows:
cosm<-matrix(nrow=length(Time),ncol=6)
sinm<-matrix(nrow=length(Time),ncol=5)
for(i in 1:5){
cosm[,i]<-cos(2*pi*i*Time/12)
sinm[,i]<-sin(2*pi*i*Time/12)
}
cosm[,6]<-cos(pi*Time)
c1<-cosm[,1];c2<-cosm[,2];c3<-cosm[,3];c4<-cosm[,4];c5<-cosm[,5];c6<-cosm[,6]
s1<-sinm[,1];s2<-sinm[,2];s3<-sinm[,3];s4<-sinm[,4];s5<-sinm[,5]
JOLTS<-data.frame(JOLTS,c1,s1,c2,s2,c3,s3,c4,s4,c5,s5,c6)
]
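[R sketch: The amplitude, phase, and peak calculations can be organized as below. The coefficient values are hypothetical placeholders, to be replaced by the estimates for c1 to c6 and s1 to s5 from your fit. The sketch assumes the convention a*cos(wt) + b*sin(wt) = A*cos(wt + phase), so that A = sqrt(a^2 + b^2) and phase = atan2(-b, a). If the course notes use a different sign convention, adjust accordingly.]

```r
# Hypothetical coefficient estimates, for illustration only
a <- c(0.040, -0.015, 0.010, 0.004, -0.002, 0.003)  # cosine terms, harmonics 1 to 6
b <- c(-0.025, 0.020, -0.005, 0.006, 0.001, 0)      # sine terms; harmonic 6 has none

j <- 1:6
period    <- 12 / j                      # months per cycle
amplitude <- sqrt(a^2 + b^2)
phase     <- atan2(-b, a)                # radians, in (-pi, pi]
# First time in [0, period) at which the harmonic reaches its maximum
peak      <- (-phase * period / (2 * pi)) %% period

data.frame(harmonic = j, period, amplitude, phase, peak)

# Check: the amplitude-phase form reproduces each harmonic,
# and the harmonic evaluated at its peak time equals the amplitude
t <- seq(0, 24, by = 0.25)
ok_form <- sapply(j, function(k) {
  w <- 2 * pi * k / 12
  isTRUE(all.equal(a[k] * cos(w * t) + b[k] * sin(w * t),
                   amplitude[k] * cos(w * t + phase[k])))
})
at_peak <- a * cos(2 * pi * j * peak / 12) + b * sin(2 * pi * j * peak / 12)
```

If Time equals 1 in January 2001 (an assumption about the data file), a peak value of 1 for the first harmonic corresponds to January, 2 to February, and so on.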
5. Calculate the lag 1, lag 2, lag 3, and lag 7 residuals from the model in part 3 and add these four new variables to the model you fit in part 3. Be sure to include the calendar trigonometric pairs if they turn out to be significant now. In addition, include the variable 2009. Then perform a residual analysis for this new model. What improvements do you notice? [R hint: The following code will create these lagged variables:
# One slot per month in the full sample (259 rows, January 2001 to July 2022), initialized to 0
lresid<-rep(0,259)
lag1resid<-lresid;lag2resid<-lresid;lag3resid<-lresid;lag7resid<-lresid
lag1resid[2]<-resid(model)[1];lag1resid[3]<-resid(model)[2]
lag2resid[3]<-resid(model)[1]
for(i in 4:259){
i1<-i-1;i2<-i-2;i3<-i-3
lag1resid[i]<-resid(model)[i1];lag2resid[i]<-resid(model)[i2]
lag3resid[i]<-resid(model)[i3]
}
for(i in 8:259){
i7<-i-7
lag7resid[i]<-resid(model)[i7]
}
Add these four new variables to the data frame.]
The residuals from the part 3 model do not reduce to white noise. The lagged residuals help to capture additional structure in the irregular component which the part 3 model fails to account for.
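[R sketch: The loops in the hint can also be written as a single shift. The helper lag_resid below is a hypothetical name, and the residual vector is simulated so that the code runs by itself. The check confirms that the shifted vector matches the loop construction for lag 7.]

```r
# Hypothetical helper: shift r back by k months, padding the first k entries with 0
lag_resid <- function(r, k, n = length(r)) {
  c(rep(0, k), r[seq_len(n - k)])
}

set.seed(4)
n <- 259
r <- rnorm(n)          # simulated stand-in for resid(model)

# Loop version, as in the hint, for lag 7
lag7_loop <- rep(0, n)
for (i in 8:n) lag7_loop[i] <- r[i - 7]

identical(lag_resid(r, 7, n), lag7_loop)

# The four variables, ready to be added to the data frame
lag1resid <- lag_resid(r, 1, n)
lag2resid <- lag_resid(r, 2, n)
lag3resid <- lag_resid(r, 3, n)
lag7resid <- lag_resid(r, 7, n)
```

If the part 3 model was fit to rows 1:228, resid(model) has only 228 entries, so both versions give NA in the later rows. This does no harm when the new model is again fit to rows 1:228.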
(a) Perform a thorough residual analysis, with a normal quantile plot and test for normality, a plot of the residuals vs. time, and a residual acf plot. What do these results indicate? Explain the role of the variable 2009.
(b) Calculate the estimated static seasonal estimates from this model. Compare them to the estimates obtained in part 3(a) using a table and a plot. Use one plot to picture the two sets of estimates. Discuss the result obtained.
6. Repeat the part 3 analysis, but now with inclusion of data for the years 2020 to 2022. Compare the static seasonal estimates to those obtained in parts 3 and 5. Discuss the impact of inclusion of these added data points on the seasonal estimation.
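[R sketch: A single table and a single plot for several sets of seasonal estimates might be assembled as follows. The three index vectors are placeholders generated from sine curves, not estimates from the JOLTS data. Replace them with the indices you computed in parts 3, 5, and 6.]

```r
# Placeholder index vectors, NOT estimates from the JOLTS data
m <- 1:12
idx_part3 <- exp(0.05 * sin(2 * pi * m / 12))
idx_part5 <- exp(0.04 * sin(2 * pi * m / 12))
idx_part6 <- exp(0.06 * sin(2 * pi * (m - 1) / 12))

# Side-by-side table
comparison <- data.frame(Month = month.abb,
                         Part3 = idx_part3,
                         Part5 = idx_part5,
                         Part6 = idx_part6)
print(comparison, digits = 4)

# One plot with all three sets of estimates
matplot(m, as.matrix(comparison[, -1]), type = "b",
        pch = 1:3, lty = 1:3, col = 1:3, xaxt = "n",
        xlab = "Month", ylab = "Seasonal index")
axis(1, at = m, labels = month.abb)
abline(h = 1, lty = 3, col = "grey50")
legend("topleft", legend = names(comparison)[-1],
       pch = 1:3, lty = 1:3, col = 1:3, bty = "n")
```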