**Group Assignment**

This group assignment consists of three parts: the first two parts are related to the initial lecture, "Financial Analytics," and the third part is associated with the third lecture, "Linear Regression."

Please assemble groups comprising **4 to 5 students**. Each student must outline their individual contributions to the group project. Within your group, you can, for example, concentrate on specific sections or questions. Ensure you submit your work in Jupyter notebook format (.ipynb) by the deadline of **Sunday, December 3rd, at 11:30pm**. Late submissions will not be accepted, and a grade of 0 will be assigned.

**Part I - Time to Recovery After Maximum Drawdown**

** **

The time to recovery after maximum drawdown is the time it takes for a portfolio to recover from its worst loss. Specifically, if the maximum drawdown period starts at time and ends at time , the time to recovery is the time it takes for the portfolio to go back at or above its value at time after time . In the example below, the drawdown period started on week 4 with a portfolio value at $110, the trough was reached on week 7, and the portfolio was back above $110 on week 12, so the time to recovery was 12 − 7 = 5 weeks.

A portfolio may not always recover by the end of the sample, as in the example below, where the value stayed under $110.

1. Based on the function that returns the maximum drawdown and the start and the end dates of the maximum drawdown period for a portfolio (either your own function or the function *maxDD_Series* provided in the Lecture 1 *‘nb_5_maxDD_solution.ipynb’* Jupyter notebook), write a Python function that accepts a Pandas Series and returns these three outputs and the time to recovery, expressed as a number of days. In case the input series has not recovered, your function should return an identifiable time to recovery, e.g., an empty array.

2. Calculate the time to recovery for the 6 funds in the ‘.csv’ file.

** **

** **

**Part II – Maximum Time Under Water**

** **

The following definition of the maximum time under water is reproduced from the Glossary of Scientific Beta website https://www.scientificbeta.com/factor/#/ :

“Using daily return series, a daily price index is obtained for the strategy portfolio. We define ’High Water Marks’ (or HWM) as the points in the price index at which the index value is greater than any previous index value and is followed by a loss. For each HWM, which is also the starting point of a drawdown, we calculate ’Time Under Water’ (or TUW) as the number of business days required to reach the same index value at the HWM or, in other words, the time required to reach the point at which the cumulative losses for that drawdown are equal to zero. If the HWM value is never reached in the future, which could be case with the latest HWM, we define TUW as the number of business days from HWM till the end of the data series. The maximum across all TUW values is reported as the ’Maximum Time Under Water’. **It should be noted that this ’Maximum Time Under Water’ corresponds to the drawdown that lasted for the longest period of time and not to the maximum drawdown.”**

We are calculating the maximum time under water for the six funds in the *‘funds.csv’* file by following this definition, except that we are counting calendar days rather than business days.

1. Write a Python function that calculates the first hitting index for a Pandas Series. This function should take a Pandas Series and a number and return the first index where the Series is greater than or equal to the number. If the target is never hit, the function should return some default value, e.g. an empty object.

** **

2. Write a Python function that returns the longest “period under water” for a Pandas Series and the duration in days of that period. Here is a possible method:

· The function should first calculate all high water marks (HWMs) in the Series. If 𝑇 is the number of fund quotes in the sample and is the fund value at time 𝑡, is a HWM if

for all date and

** **

· After the list of HWMs has been obtained, the function should find, for each HWM, the first date at which the fund is back above the HWM after reaching it a first time. This “recovery date” can be found by calling the function written in Question 1. The start date of the “period under water” is the date when the HWM is first reached, and the end date is the date when it is reached for the second time. The function should record a list of these dates and the corresponding period duration for all HWMs.

· At last, the function should return the duration of the longest period under water and the corresponding start and end dates.

3. Find the longest periods under water for the 6 funds in the *‘funds.csv’*.

** **

** **

**Part III – Predictive Linear Regression**

The purpose of this part is to estimate a simple predictive model of stock returns and to look at the stability of the model over time.

We are using the economic and financial dataset of Robert Shiller[1]. Among other variables, it contains the monthly values of the S&P 500 index from January 1871 to February 2023 and the price- earnings ratio, defined as the index value divided by the average earnings of the constituents over the past 10 years. Thus, the ratio measures the index value relative to past earnings. Intuitively, it is a “cheapness” indicator, and the purpose of the activity is precisely to check if there is a relationship between the return on the S&P 500 index over a given investment period and the price-earning ratio at the start of that period. Stock returns are hard to predict, especially over small periods, so we must consider long enough periods. In this activity, we are using 5-year periods.

1. Create a blank Jupyter notebook,read the data from the comma separated value file *ie_data2023.csv* into a DataFrame and extract the annual values (in January) of the S&P 500 index and the price-earnings ratio (CAPE).

2. Calculate the 5-year returns (from January of year to January of year ) on the S&P 500 index with a one-year shift. In other words, the first return is from January 1871 to January 1875, the second return is from January 1872 to January 1876 and so on.

These returns are overlapping because two consecutive returns have 4 years in common. In the following questions, we look at the relationship between the price-earnings ratio in January of year 𝑡 and the returns from January of year 𝑡 to January of year 𝑡 + 5.

3. Make two Pandas Series *peratio* and *stockreturns* with the **same length** containing respectively the non-missing price-earnings ratios at the start of every 5-year period and the 5-year stock returns.

4. Make a scatter plot of PERs against 5-year returns. Is there any relationship between the two quantities from the plot?

5. Regress the 5-year returns on the PERs and comment on the relationship between the two variables. The regression equation reads

6. Make a scatter plot of the residuals in year against the residuals in . Is there any apparent relationship between residuals and their lagged values? Where does it come from? What lag *h* should be used to remove the relationship between residuals in year 𝑡 and residuals in year ?

7. Re-estimate the parameters 𝛼 and 𝛽 by using non-overlapping returns, i.e. a sub-sample of 5-year returns and PERs suitably chosen so that there is no overlap between two consecutive returns. How does this affect the significance of the relationship between returns and PERs?

In the following question, you are using the overlapping returns again and you are re-estimating the parameters 𝛼 and 𝛽 over a rolling window of 30 years. This procedure is intended to give us a sense of the stability of the model over time and represents therefore a robustness check.

8. Re-estimate the parameters over a 30-year rolling window. The first window is from January 1871 to January 1901, the second is from 1872 to 1902 and so on. For each window, save the t-stat for the slope 𝛽 in an array or a list.

9. Plot the time series of t-statistics. How often is the relationship between returns and the PER significant?