FINA6229
Machine Learning in Finance
Project 5
A Comprehensive Investment Project with Machine Learning
Introduction
The purpose of Project 5 is to give you some exposure to applying machine-learning techniques to the design of an advanced investment strategy. Note that not all machine-learning techniques are suitable for solving investment problems. You need to figure out whether and how the technique(s) you select can help with the investment problem. You are free to use any resources to help you finish this project. I recommend taking advantage of existing Python packages instead of writing your own code from scratch.
Located in Hong Kong, AllGreen Alpha, LLC is a long-short equity hedge fund with $2 billion in assets under management, focusing on the U.S. financial market. During the past decade, by implementing factor investing strategies, AllGreen Alpha has successfully beaten the benchmark (S&P 500) and delivered favorable returns to its clients after management fees and carried interest. Most factors covered by AllGreen Alpha are traditional anomalies such as size, book-to-market, momentum, idiosyncratic volatility, CAPM market beta, and firm profitability. Beginning this year, however, the portfolio managers have found it increasingly difficult to earn extra returns by simply investing in these factors. The long-short equity portfolio of AllGreen Alpha earned a total year-to-date return of 1.2%, while the S&P 500 has gained 24.6% during the same period. The clients are complaining about the poor fund performance and threatening redemptions. As a result, one of the general partners, James Cohen, has hired you as a research associate to help the equity portfolio managers design new investment strategies to improve the fund’s performance.
James has heard that many new profitable factors have recently been discovered in academia and that some large quant funds are using sophisticated machine-learning algorithms to select rewarding factors from high-dimensional datasets, so that they can pick the most favorable stocks (for either long or short positions).
After discussions with the portfolio managers at AllGreen Alpha, James decided to apply machine-learning techniques to identify, in the cross-section, the individual stocks most likely to outperform, in order to improve the performance of the current portfolio. James learned that there are many advanced machine-learning approaches that can strengthen stock return predictability. Given that he is not familiar with any of those techniques, James decided to give you the flexibility to explore innovative methods for developing investment strategies. After consulting with some professionals in the machine-learning area, James has summarized some popular methods that have been applied successfully in quantitative investment:
1. Ridge regression, LASSO, and ElasticNet
2. Logistic Regression
3. Support Vector Machine
4. Tree Model (Decision Trees, Random Forest, Gradient Boosting)
5. K-Nearest Neighbors
6. Clustering
7. Hidden Markov Model
8. Naïve Bayes
9. Cross Validation
10. Neural Network with Deep Learning
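As one illustration of how a method from this list might be fitted to the cross-sectional problem, the sketch below trains a ridge regression on all past months, predicts next-month returns for the current month, and goes long the stocks with the highest predictions and short those with the lowest. It is a minimal sketch under stated assumptions, not a recommended strategy: the column names `month`, `ret_next`, `f1`, `f2`, `f3` and the simulated panel are hypothetical and stand in for the project data. Ridge is solved in closed form with NumPy to keep the example self-contained; a library estimator such as scikit-learn's `Ridge` could be swapped in.

```python
import numpy as np
import pandas as pd


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge coefficients with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta


def long_short_backtest(panel, features, min_train_months=24, alpha=1.0, q=0.1):
    """Expanding-window fit, then long top / short bottom predicted returns.

    `panel` has one row per (month, stock). `ret_next` is the return earned
    over the month AFTER the features are observed, so training only on
    earlier months uses no information from the future.
    """
    months = np.sort(panel["month"].unique())
    out = {}
    for i in range(min_train_months, len(months)):
        train = panel[panel["month"] < months[i]]
        test = panel[panel["month"] == months[i]]
        beta, intercept = ridge_fit(
            train[features].to_numpy(), train["ret_next"].to_numpy(), alpha
        )
        pred = test[features].to_numpy() @ beta + intercept
        lo, hi = np.quantile(pred, [q, 1 - q])
        realized = test["ret_next"].to_numpy()
        # Equal-weighted long leg minus equal-weighted short leg
        out[months[i]] = realized[pred >= hi].mean() - realized[pred <= lo].mean()
    return pd.Series(out, name="long_short_ret")


# Simulated panel (NOT the project data): 48 months x 200 stocks, 3 features
rng = np.random.default_rng(0)
n_months, n_stocks = 48, 200
features = ["f1", "f2", "f3"]
X = rng.standard_normal((n_months * n_stocks, 3))
panel = pd.DataFrame(X, columns=features)
panel["month"] = np.repeat(np.arange(n_months), n_stocks)
panel["ret_next"] = (
    0.02 * X[:, 0] - 0.01 * X[:, 1] + 0.05 * rng.standard_normal(len(panel))
)

ls = long_short_backtest(panel, features)
print(ls.describe())
```

The expanding window is one design choice among several; a rolling window, or a separate validation period for choosing `alpha`, would follow the same pattern as long as each fit uses only months before the one being predicted.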
To support your research work, James has asked the IT department to help clean and organize the database from Bloomberg. The main table is stored in a “.csv” file. The data can be downloaded from the CUHK Blackboard. The database covers all available stocks in the company’s investment pool. Each stock is labeled by the unique identifier “permno”, which is used by the Center for Research in Security Prices (CRSP) in the United States. There are two datasets:
- Data_Project_5: the historical data for 45 known factors, with each factor corresponding to a firm characteristic (including stock returns “ret”). The sample period is from January
The definition of each variable has been listed in the appendix at the end of the project.
- Benchmark_SP500: the monthly total returns of the S&P 500
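Before any model is fitted, each month's characteristics need to be paired with the return earned in the following month, so that the model predicts rather than explains returns. The sketch below shows one way this alignment might be done with pandas. The identifiers `permno` and `ret` come from the data description above; the column name `date`, its YYYYMM format, the `size` column, and the small inline table are assumptions for illustration only, so check the appendix and the actual file. Whether the characteristics in the file are already lagged is also not stated here and should be verified before shifting.

```python
import io

import pandas as pd


def add_next_month_return(df, id_col="permno", date_col="date", ret_col="ret"):
    """Attach each stock's next-month return as the prediction target.

    Rows whose following observation is not exactly one calendar month
    later (gaps, or a stock's last month) are dropped.
    """
    df = df.sort_values([id_col, date_col]).copy()
    ym = df[date_col].astype(int)                 # assumed YYYYMM integers
    month_idx = (ym // 100) * 12 + ym % 100       # consecutive months differ by 1
    df["ret_next"] = df.groupby(id_col)[ret_col].shift(-1)
    next_idx = month_idx.groupby(df[id_col]).shift(-1)
    consecutive = (next_idx - month_idx) == 1
    return df[consecutive].dropna(subset=["ret_next"]).reset_index(drop=True)


# Tiny made-up table standing in for the real file (hypothetical values)
csv_text = """permno,date,ret,size
10001,198001,0.01,1.0
10001,198002,0.03,1.1
10001,198004,-0.02,1.2
10002,198001,0.05,2.0
10002,198002,-0.01,2.1
"""
raw = pd.read_csv(io.StringIO(csv_text))
panel = add_next_month_return(raw)
print(panel)
```

In this toy table, stock 10001 has no observation for 198003, so its 198002 row is dropped rather than being paired with the 198004 return.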
Given the project target, James asks you to conduct cross-sectional analyses, write an investment policy statement, and submit it to the investment committee. In the report:
1) Briefly describe your investment strategy. Which machine-learning method do you choose to build the model? How do you fit the method to the investment problem? What is the advantage of your method compared with traditional approaches?
2) Output a time series of your portfolio performance (monthly returns) from 198001 to 202211. Then report the statistics needed to describe the backtesting performance of your model. Plot the cumulative returns of your new investment strategy and of the S&P 500 index over the same sample period. Briefly comment on the performance of your new investment strategy.
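The statistics and cumulative-return curves asked for in item 2) might be computed along the following lines. This is a minimal sketch, not a required set of statistics: the function names are hypothetical, the Sharpe ratio assumes a zero risk-free rate, and the strategy and benchmark series are simulated placeholders rather than real backtest output or S&P 500 data.

```python
import numpy as np
import pandas as pd


def cumulative_returns(monthly_ret):
    """Compounded cumulative return from simple monthly returns."""
    return (1 + pd.Series(monthly_ret)).cumprod() - 1


def performance_summary(monthly_ret):
    """Common backtest statistics from a series of simple monthly returns."""
    r = pd.Series(monthly_ret).dropna()
    wealth = (1 + r).cumprod()
    n_years = len(r) / 12
    # Running peak includes the starting wealth of 1
    drawdown = wealth / wealth.cummax().clip(lower=1.0) - 1
    return {
        "annualized_return": wealth.iloc[-1] ** (1 / n_years) - 1,
        "annualized_vol": r.std() * np.sqrt(12),
        "sharpe_ratio": r.mean() / r.std() * np.sqrt(12),  # risk-free rate assumed 0
        "max_drawdown": drawdown.min(),
        "hit_rate": (r > 0).mean(),
    }


# Simulated placeholders over the project's sample period (NOT real data)
idx = pd.period_range("1980-01", "2022-11", freq="M")
rng = np.random.default_rng(0)
strategy = pd.Series(rng.normal(0.010, 0.03, len(idx)), index=idx)
benchmark = pd.Series(rng.normal(0.008, 0.04, len(idx)), index=idx)

report = pd.DataFrame({
    "strategy": performance_summary(strategy),
    "benchmark": performance_summary(benchmark),
})
curves = pd.DataFrame({
    "strategy": cumulative_returns(strategy),
    "benchmark": cumulative_returns(benchmark),
})
print(report.round(4))

# Small hand-checkable case: wealth path is 1.10, 0.55, 0.66
check = performance_summary([0.10, -0.50, 0.20])
```

If matplotlib is installed, `curves.plot()` draws both cumulative-return lines on one chart; a log scale is often easier to read over a sample this long.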