SDSC2005-22B
Exercise 4. Time Series Analysis for Predictive Research
(V3, Feb 28, 2023)
Your EID (required):
Data1: Ex4_data1.xlsx, which contains the following sheets:
· Intraday_data: the price of HSI per minute from Jan 16, 2023 to Feb 13, 2023 (a total of 18 trading days)
· Interday_data: the price of HSI per day from Jan 2, 1987 to Feb 10, 2023 (a total of 9,167 trading days over 36 years)
· Use linear interpolation to fill any empty cell.
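As a starting point for the interpolation step, here is a minimal sketch using pandas. The toy frame and the column name "Price" are assumptions for illustration; the real sheet would be loaded with pd.read_excel and may use different headers.

```python
import numpy as np
import pandas as pd

# Toy stand-in for one sheet; in practice the frame would come from something like
# pd.read_excel("Ex4_data1.xlsx", sheet_name="Intraday_data").
# The column name "Price" is an assumption about the sheet layout.
prices = pd.DataFrame({"Price": [100.0, np.nan, 104.0, np.nan, np.nan, 110.0]})

# Fill each gap on a straight line between its nearest observed neighbours.
# method="linear" treats the rows as equally spaced.
prices["Price"] = prices["Price"].interpolate(method="linear")

print(prices["Price"].tolist())
```

Because method="linear" ignores the index, it suits data where rows are evenly spaced in trading time; if rows are unevenly spaced, interpolating against a time index may be more appropriate.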
Task:
Identify the following intraday and interday cycles of HSI Price to determine when, on average, is the best time to buy (i.e., at the lowest price) and to sell (i.e., at the highest price) stocks in the HK stock market:
1. Intraday time:
a. Detrend (removing the overall trend of) Price throughout the entire period of the Intraday Data;
b. Use each 30 minutes of trading time as a half-hourly unit within each trading day;[1]
c. Use 1-hot encoding to create the half-hourly variables;
d. Use an OLS regression with Price as the DV and the half-hourly variables as the IVs to measure the effect of “half-hour of the day” on Price;
e. Optional: use alternative way(s) to measure the best time to buy/sell based on OLS regression.[2]
2. Interday time:
a. Detrend Price throughout the entire period of the Interday Data;
b. Use each trading day as a daily unit within each trading week and each month as a monthly unit within each year, respectively;
c. Use 1-hot encoding to create the daily and monthly variable(s), respectively;
d. Use an OLS regression with Price as the DV and daily and monthly variables as the IVs to measure the effect of “day of the week” and “month of the year” on Price, respectively;
e. Optional: use alternative way(s) to measure the best day/month to buy/sell based on the OLS regression, and report the resulting effect if it is significantly greater than that of the 1-hot encoding approach (grading policy: extra point(s) for significantly improved results, depending on the size of the improvement; no penalty for wrong answers).
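The detrend, encode, and regress steps above can be sketched as follows for the intraday case. This is only one possible reading of the task: the data are synthetic, the column names "Time" and "Price" are assumptions, detrending is done by removing a fitted straight line (differencing or a moving average are alternatives), and the regression keeps every dummy with no intercept so that each coefficient is the mean detrended price of that half-hour.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Intraday_data sheet: 5 days of per-minute prices
# from 09:30 to 11:59 (the real sheet's columns and trading hours may differ).
rng = np.random.default_rng(0)
days = pd.date_range("2023-01-16", periods=5, freq="B")
stamps = pd.DatetimeIndex(
    [d + pd.Timedelta(minutes=570 + m) for d in days for m in range(150)]
)
df = pd.DataFrame({"Time": stamps})
t = np.arange(len(df), dtype=float)
# Upward trend + a planted dip in the 10:30 half-hour + noise.
dip = np.where((df["Time"].dt.hour == 10) & (df["Time"].dt.minute >= 30), -20.0, 0.0)
df["Price"] = 20000 + 0.5 * t + dip + rng.normal(0, 1, len(df))

# (a) Detrend: fit a straight line over the whole period and keep the residuals.
slope, intercept = np.polyfit(t, df["Price"].to_numpy(), deg=1)
df["Detrended"] = df["Price"] - (intercept + slope * t)

# (b) Half-hour unit within the day, labelled by its start time, e.g. "10:30".
df["Slot"] = df["Time"].dt.floor("30min").dt.strftime("%H:%M")

# (c) One-hot encoding of the half-hour labels (all dummies kept, no intercept).
X = pd.get_dummies(df["Slot"]).astype(float)

# (d) OLS by least squares on the detrended price.
y = df["Detrended"].to_numpy()
beta, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
effects = pd.Series(beta, index=X.columns)

# Lowest estimated effect = candidate buy time; highest = candidate sell time.
best_buy, best_sell = effects.idxmin(), effects.idxmax()
fitted = X.to_numpy() @ beta
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(effects.round(2))
print("buy:", best_buy, "sell:", best_sell, "R2:", round(r_squared, 3))
```

The same pattern carries over to the interday case by replacing the half-hour label with a day-of-week or month label, for example via df["Date"].dt.day_name() or df["Date"].dt.month_name(), where "Date" is again an assumed column name.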
Report:
1. Quantitative findings in Table 1.
2. A summary paragraph to interpret what investors may learn from the results, if any, for their trading strategies.
Table 1. Results of OLS Regressions
| | Intraday Effect (Half-hour of the day) | Interday Effect (Day of the week) | Interday Effect (Month of the year) |
| Required: | | | |
| Best time to buy | Which half-hour? | Which day? | Which month? |
| Best time to sell | Which half-hour? | Which day? | Which month? |
| Ratio of sell-to-buy price[3] | the s2b ratio | the s2b ratio | the s2b ratio |
| Model R-squared | | | |
| Optional: | | | |
| Best time to buy | Which half-hour? | Which day? | Which month? |
| Best time to sell | Which half-hour? | Which day? | Which month? |
| Ratio of sell-to-buy price | the s2b ratio | the s2b ratio | the s2b ratio |
| Model R-squared | | | |
Data2: Ex4_data2.xlsx, containing the following sheets (using “Adj Close” in column F as Price for all questions below):
· 0005.hk: the price and volume of HSBC (bank)
· 0027.hk: the price and volume of Galaxy Entertainment (casino)
· 0101.hk: the price and volume of Hang Lung Properties
· HSI: the price and volume of Hang Seng Index (Hong Kong)
· DJI: the price and volume of Dow Jones Index (U.S.)
· SSEC: the price and volume of Shanghai Stock Exchange Composite (China)
· Use linear interpolation to fill any empty cell.
Task:
1. ARIMA parameters:
a. Data: use all dates up to Dec 31, 2022 for the three stocks (HSBC, Galaxy, and Hang Lung), respectively;
b. Use ACF (autocorrelation function) and PACF (partial autocorrelation function) to identify the autoregression (AR), integration (I), and moving average (MA) parameters for each stock price;
c. Fit a univariate ARIMA model (i.e., only Price plus AR, I, and MA, without any IV) for each stock;
d. Report the results in Table 2.
2. Predictive models:
a. Data: split the data into a training set (up to Dec 31, 2022) and a test set (from Jan 1 to Feb 21, 2023) for each stock;
b. Model: build a predictive model for each stock, respectively, using Price as the DV and any of the following as the IVs:
i. Time-effects: day of the week, month of the year (“seasonality”), and any other features that represent repeated cycles of time (see questions for Data 1);
ii. Internal factors: the previous price and volume of the stock (no need for previous price if you use ARIMA/SARIMA because it will be automatically included);
iii. Market influences: the previous price of the stock market in Hong Kong (HSI), the U.S. (DJI), and mainland China (SSEC);
iv. Optional IVs: any other time series data measured on a daily unit that you collect and add to the model as IVs (the same grading policy as in Data 1 applies here);
c. Estimation (based on the training set) and test (based on the test set) method: use any method of your choice, including an ensemble of several methods, e.g.,
i. OLS;
ii. Exponential smoothing;
iii. ARIMA/SARIMA;
iv. Machine learning/deep learning;
v. Anything else;
d. Report your model specification and results in Table 3.
3. Forecast future values: use your predictive model to forecast the price of each stock on March 13, 15, and 17. Report the results in Table 4 and Figure 1.
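For the predictive-model steps above, one way to build "previous price" features and split by date is sketched below. The data are synthetic, the column names ("Price", "HSI", "Price_lag1", "HSI_lag1") are assumptions rather than the workbook's headers, and plain OLS is used only as the simplest of the permitted methods.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for one stock sheet plus the HSI sheet.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2022-01-03", "2023-02-21")
hsi = 20000 + np.cumsum(rng.normal(0, 100, len(dates)))
stock = 40 + 0.001 * (hsi - 20000) + np.cumsum(rng.normal(0, 0.3, len(dates)))
df = pd.DataFrame({"Price": stock, "HSI": hsi}, index=dates)

# "Previous" values: shift by one trading day so that each row only uses
# information available before that day's price is known.
df["Price_lag1"] = df["Price"].shift(1)
df["HSI_lag1"] = df["HSI"].shift(1)
df = df.dropna()

# Split by date, not at random, so the test set lies strictly in the future.
train = df.loc[:"2022-12-31"]
test = df.loc["2023-01-01":]


def design(frame):
    """Design matrix: intercept plus the lagged regressors."""
    return np.column_stack(
        [
            np.ones(len(frame)),
            frame["Price_lag1"].to_numpy(),
            frame["HSI_lag1"].to_numpy(),
        ]
    )


# Estimate on the training set only, then predict the test set.
beta, *_ = np.linalg.lstsq(design(train), train["Price"].to_numpy(), rcond=None)
test_pred = design(test) @ beta

print(len(train), len(test), beta.round(4))
```

Note that these test-set predictions are one-step-ahead: each uses the actual previous day's values. Forecasting dates beyond the last observation requires either iterating the model on its own predictions or forecasting the regressors as well.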
Report: Present your results in the following tables:
Table 2. ARIMA Parameters of Individual Stock Price
| | HSBC (005) | Galaxy (027) | Hang Lung (101) |
| Autoregression (AR) | | | |
| · Order (e.g., 0, 1, etc.) | | | |
| · Coefficient | | | |
| Integration (I) | | | |
| · Order | | | |
| Moving Average (MA) | | | |
| · Order | | | |
| · Coefficient | | | |
| Model fit (AIC) | | | |
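The orders in Table 2 are usually read off the ACF and PACF: a slowly decaying ACF of the price level suggests differencing, and for the differenced series a PACF that cuts off after lag p points to AR(p), while an ACF that cuts off after lag q points to MA(q). The sketch below computes both functions with NumPy on a simulated series; the helper names acf and pacf are illustrative, and libraries such as statsmodels provide equivalent functions.

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelations r_0 .. r_nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    phi = np.zeros(0)  # AR coefficients of the previous order
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi, r[1:k][::-1])
        den = 1.0 - np.dot(phi, r[1:k])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        out[k] = phi_kk
    return out


# Simulated price: cumulative sum of AR(1) changes with coefficient 0.6,
# i.e. an ARIMA(1, 1, 0) process (synthetic, for illustration only).
rng = np.random.default_rng(42)
n = 2000
eps = rng.normal(size=n)
change = np.zeros(n)
for i in range(1, n):
    change[i] = 0.6 * change[i - 1] + eps[i]
price = 100 + np.cumsum(change)

band = 1.96 / np.sqrt(n)  # approximate 95% band for a white-noise series
print("ACF of level:", acf(price, 5).round(2))  # slow decay suggests d = 1
diff = np.diff(price)
print("PACF of first difference:", pacf(diff, 5).round(2))
print("band:", round(band, 3))
```

In this simulated case the PACF of the differenced series should show one clear spike at lag 1 and values near the band afterwards, which is the pattern associated with p = 1, d = 1, q = 0. Real stock prices are rarely this clean, so comparing AIC across a few candidate orders is a sensible cross-check.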
Table 3. Predictive Models of Individual Stock Price
| | HSBC (005) | Galaxy (027) | Hang Lung (101) |
| a. Training Set: | | | |
| · Model type | | | |
| · Equation | | | |
| · Accuracy (MAPE)[4] | | | |
| · Justification for using the model | | | |
| b. Test Set: | | | |
| · Accuracy (MAPE)[4] | | | |
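For the accuracy rows in Table 3, MAPE can be computed as below. This uses the standard definition, the mean of |actual − predicted| / |actual| expressed in percent; if the footnote specifies a variant, that takes precedence. The function name mape is illustrative.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


# Errors of 10%, 5%, and 0% average to roughly 5%.
print(mape([100, 200, 400], [110, 190, 400]))
```

The same function applies to both the training set (fitted vs. observed prices) and the test set (predicted vs. observed prices); it is undefined when an actual value is zero, which should not occur for stock prices.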
Table 4. Forecasted Stock Price on March 13, 15, and 17
| | HSBC (005) | Galaxy (027) | Hang Lung (101) |
| March 13 | | | |
| March 15 | | | |
| March 17 | | | |
Figure 1.
1. Use a scatterplot with Price on the y-axis and date on the x-axis, including an “observed period” (up to Feb 21, 2023) and a “forecast period” (March 13, 15, and 17);
2. Show two lines (the observed and estimated prices) in the observed period and three lines (the forecasted price and the lower and upper bounds of the 95% confidence interval) in the forecast period;
3. See slide 34 of Week 5 for two examples.
Optional Question for both Data1 and Data2 (the above grading policy applies)
Note that you are required to detrend for Data1 but not for Data2. Discuss what detrending (for Data1) and not detrending (for Data2) will do to the respective results. If you think either Data1 or Data2 should be handled differently, explain why and what would happen to the results.
Submission:
1. Write your answers to the above questions in this Word document and save it in Word format (doc or docx);
2. Attach your code for both Data1 and Data2 in the original format (e.g., *.py, *.ipynb, etc.);
3. Put the Word file and the code in a zip/rar file package (i.e., *.zip or *.rar) and upload it to the Assignment box.