BFF5555 Project Semester 2 2024
Project Overview
This capstone project assesses your ability to apply machine learning concepts and frameworks covered in this unit to build a predictive model for a financial application.
Submission Requirements:
- A PDF report documenting your model building process, including analysis and conclusions.
- A Jupyter Notebook containing all Python code for your implementation. The notebook must be self-contained: the marker should be able to run all of its code without making any adjustments. Clear all outputs before submitting the notebook.
- A data file.
- A Use of AI statement (if applicable); see Use of Generative AI below.
Name each file with your first and last name followed by the file type, for example XiaoWang_REPORT.pdf, XiaoWang_CODE.ipynb, XiaoWang_data.csv, XiaoWang_AI.pdf.
There is no set page limit or required style. Some students prefer to present in bullet points; others choose a more narrative style.
Project Objective
You are required to develop a machine learning model that predicts positive market movements (uptrends). The prediction task is treated as a binary classification problem in which the target variable takes the value 0 or 1.
Key Tasks:
Select one ticker symbol (stock or ETF) of interest.
Focus on predicting short-term returns (e.g., daily or weekly).
Follow the six-step model building process discussed in class.
Perform all computation, plotting and model implementation in a Jupyter Notebook. Document the model building process in a PDF file. The PDF file should contain tables and plots generated from the Notebook.
Additional guidance: Data and preprocessing
For daily predictions, a dataset of 5 years should be sufficient. For weekly predictions, select a suitably longer timeframe.
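If you start from daily bars and want weekly predictions, one option is to aggregate the daily OHLCV data yourself. The sketch below assumes a pandas DataFrame with a DatetimeIndex and columns named Open, High, Low, Close and Volume (the Yahoo Finance naming); the Friday week-end (`"W-FRI"`) and the function name `to_weekly` are illustrative choices, not requirements of the brief.

```python
import numpy as np
import pandas as pd

def to_weekly(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily OHLCV bars into weekly bars that end on Friday."""
    rules = {"Open": "first", "High": "max", "Low": "min",
             "Close": "last", "Volume": "sum"}
    weekly = daily.resample("W-FRI").agg(rules)
    # Weeks with no trading days produce rows with a missing Close; drop them.
    return weekly.dropna(subset=["Close"])

# Illustrative usage on two weeks of synthetic business-day bars (not market data).
idx = pd.bdate_range("2024-01-01", periods=10)
close = pd.Series(np.arange(100.0, 110.0), index=idx)
daily = pd.DataFrame({"Open": close - 0.5, "High": close + 1, "Low": close - 1,
                      "Close": close, "Volume": 1000})
weekly = to_weekly(daily)
```

Each weekly bar keeps the first open, the highest high, the lowest low, the last close and the total volume of its days, so the same feature code can be applied to daily or weekly data.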
The features must be derived from OHLCV data (Open, High, Low, Close, Volume) available from Yahoo Finance. You are expected to:
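Construct features such as the intra-period price range and the sign and magnitude of past returns:
Feature | Formula | Description
O-C, H-L | Open - Close, High - Low | Intra-period price range
Sign | $\operatorname{sign}[r_t]$ | Sign of return or momentum
Past returns | $r_t = \ln(P_t / P_{t-1})$; $r_{t-1}, r_{t-2}, \ldots$ | Lagged returns
Here $P_t$ is the price at the end of period $t$ and $r_t$ is the log return over period $t$.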
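The basic features in the table can be computed in a few lines of pandas. This is a minimal sketch assuming Yahoo Finance column names and the close as $P_t$; the function name `basic_features`, the column labels and the default of three lags are hypothetical choices.

```python
import numpy as np
import pandas as pd

def basic_features(df: pd.DataFrame, n_lags: int = 3) -> pd.DataFrame:
    """Intra-period ranges, return sign and lagged log returns from OHLCV bars."""
    feats = pd.DataFrame(index=df.index)
    feats["O-C"] = df["Open"] - df["Close"]           # open-to-close range
    feats["H-L"] = df["High"] - df["Low"]             # high-to-low range
    ret = np.log(df["Close"] / df["Close"].shift(1))  # r_t = ln(P_t / P_{t-1})
    feats["ret"] = ret
    feats["sign"] = np.sign(ret)                      # +1 up, -1 down, 0 unchanged
    for k in range(1, n_lags + 1):
        feats[f"ret_lag{k}"] = ret.shift(k)           # r_{t-k}
    return feats

# Illustrative usage on three synthetic bars.
bars = pd.DataFrame({"Open": [99.0, 108.0, 101.0], "High": [101.0, 111.0, 102.0],
                     "Low": [98.0, 107.0, 98.0], "Close": [100.0, 110.0, 99.0]})
feats = basic_features(bars, n_lags=1)
```

All of these use information available at the end of period $t$ only, which keeps them safe to pair with a target defined on the next period's return.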
Construct additional technical indicators using Pandas-TA.
The total number of initial features is your design choice.
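To show what such indicators compute, the sketch below hand-rolls a few of them in plain pandas: a moving-average ratio, rolling volatility, RSI and a volume z-score. Pandas-TA provides ready-made versions of indicators like these, and its values may differ slightly from this sketch (for example in RSI warm-up). The window lengths and the names `rsi` and `add_indicators` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, length: int = 14) -> pd.Series:
    """RSI using Wilder-style smoothing (exponential average with alpha = 1/length)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)        # positive price changes
    loss = (-delta).clip(lower=0.0)     # magnitudes of negative price changes
    avg_gain = gain.ewm(alpha=1 / length, adjust=False, min_periods=length).mean()
    avg_loss = loss.ewm(alpha=1 / length, adjust=False, min_periods=length).mean()
    rs = avg_gain / avg_loss            # inf when there were no losses, giving RSI = 100
    return 100 - 100 / (1 + rs)

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """A few illustrative indicators; window lengths are arbitrary, not prescribed."""
    close, volume = df["Close"], df["Volume"]
    out = pd.DataFrame(index=df.index)
    out["sma_ratio_10_50"] = close.rolling(10).mean() / close.rolling(50).mean() - 1  # trend
    out["vol_20"] = np.log(close).diff().rolling(20).std()                           # volatility
    out["rsi_14"] = rsi(close, 14)                                                   # momentum
    vol_mean, vol_std = volume.rolling(20).mean(), volume.rolling(20).std()
    out["volume_z_20"] = (volume - vol_mean) / vol_std                               # abnormal volume
    return out

# Illustrative usage on synthetic bars (a steadily rising price, not market data).
demo = pd.DataFrame({"Close": 100 + np.arange(60.0),
                     "Volume": 1000.0 + np.arange(60) % 5})
indicators = add_indicators(demo)
```

Because every indicator uses a trailing window, the first rows are NaN and must be dropped (or the sample start adjusted) before model fitting.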
Apply feature selection techniques, such as feature importance ranking and regularization, to refine the feature set.
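The two techniques named above can be combined in one table: impurity-based importance from a random forest, and the coefficients an L1-regularised logistic regression keeps. This is a sketch using scikit-learn; the penalty strength `C=0.1`, the 200 trees and the name `rank_features` are assumptions to tune, not prescribed settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def rank_features(X: pd.DataFrame, y: pd.Series, C: float = 0.1, seed: int = 0) -> pd.DataFrame:
    """Rank features by random-forest importance and flag those kept by an L1 penalty.

    Call this on the TRAINING set only, so the test period does not influence selection.
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    # Standardise first so the L1 penalty treats all features on the same scale.
    lasso_logit = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", C=C, solver="liblinear"),
    ).fit(X, y)
    coefs = lasso_logit[-1].coef_.ravel()
    ranking = pd.DataFrame(
        {"rf_importance": forest.feature_importances_, "l1_coef": coefs},
        index=X.columns,
    )
    ranking["kept_by_l1"] = ranking["l1_coef"] != 0  # zero coefficient = dropped by the penalty
    return ranking.sort_values("rf_importance", ascending=False)

# Illustrative usage: one informative column and three pure-noise columns (synthetic).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 4)),
                      columns=["signal", "noise1", "noise2", "noise3"])
y_demo = (X_demo["signal"] + 0.1 * rng.normal(size=500) > 0).astype(int)
ranking = rank_features(X_demo, y_demo)
```

Features that rank low on importance and are also zeroed by the L1 penalty are natural candidates to drop; lowering `C` makes the penalty stricter.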
Algorithms, model training and model selection
Include all suitable machine learning algorithms covered in this unit. Perform hyperparameter tuning and model selection on the training set, and select the final model based on cross-validation.
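A common way to tune and compare several algorithms on time-ordered data is walk-forward cross-validation with scikit-learn's `TimeSeriesSplit` inside `GridSearchCV`. In the sketch below the candidate models, their grids, the ROC AUC scoring choice and the name `select_model` are hypothetical; substitute the algorithms covered in the unit.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical candidate set and grids: replace with the algorithms and ranges covered in the unit.
CANDIDATES = {
    "logistic": (LogisticRegression(max_iter=1000), {"clf__C": [0.01, 0.1, 1.0]}),
    "random_forest": (
        RandomForestClassifier(n_estimators=100, random_state=0),
        {"clf__max_depth": [3, 5]},
    ),
}

def select_model(X_train, y_train, n_splits: int = 5, scoring: str = "roc_auc"):
    """Tune every candidate with walk-forward CV on the training set and return the best one."""
    cv = TimeSeriesSplit(n_splits=n_splits)  # each validation fold comes after its training fold
    searches = {}
    for name, (estimator, grid) in CANDIDATES.items():
        # Scaling sits inside the pipeline so it is re-fitted on each training fold (no leakage).
        pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
        searches[name] = GridSearchCV(pipe, grid, cv=cv, scoring=scoring).fit(X_train, y_train)
    cv_scores = {name: s.best_score_ for name, s in searches.items()}
    best_name = max(cv_scores, key=cv_scores.get)
    return best_name, searches[best_name].best_estimator_, cv_scores

# Illustrative usage on synthetic data (not market data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] + 0.2 * rng.normal(size=300) > 0).astype(int)
best_name, best_model, cv_scores = select_model(X_demo, y_demo)
```

Reporting `cv_scores` for every candidate, not just the winner, gives the comparison table the report needs to justify the final choice.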
The train-test split ratio is your design choice.
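Whatever ratio you choose, the split for time series should be chronological rather than random, so the test period lies entirely after the training period. A minimal sketch follows; the 80/20 default and the name `chronological_split` are assumptions, and scikit-learn's `train_test_split` with `shuffle=False` gives the same ordering.

```python
import pandas as pd

def chronological_split(X: pd.DataFrame, y: pd.Series, train_frac: float = 0.8):
    """Split time-ordered data without shuffling, so every test row is later than every training row."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    cut = int(len(X) * train_frac)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]

# Illustrative usage with ten dated rows (synthetic).
dates = pd.date_range("2024-01-01", periods=10, freq="D")
X_demo = pd.DataFrame({"feature": range(10)}, index=dates)
y_demo = pd.Series([0, 1] * 5, index=dates)
X_train, X_test, y_train, y_test = chronological_split(X_demo, y_demo, train_frac=0.8)
```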
Define the response variable based on your selected ticker and prediction frequency. For example, for weekly returns you may choose to label small positive returns (below 0.25%) as negative (0).
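The labelling rule in the example can be written directly from the next period's return. This sketch assumes simple (not log) returns and labels a period 1 only when the following return exceeds the threshold; the name `make_target` and the 0.25% default (taken from the weekly example) are illustrative.

```python
import pandas as pd

def make_target(close: pd.Series, threshold: float = 0.0025) -> pd.Series:
    """Binary response: 1 if the next period's simple return exceeds `threshold`, else 0."""
    fwd_ret = close.shift(-1) / close - 1      # return from t to t+1; unknown at time t
    y = (fwd_ret > threshold).astype(int)      # returns at or below the threshold are labelled 0
    return y[fwd_ret.notna()]                  # the final period has no next return, so drop it

# Illustrative weekly closes (synthetic): +0.10%, about +0.90%, about -0.50%
close = pd.Series([100.0, 100.1, 101.0, 100.5])
y = make_target(close)
```

Shifting the return backwards by one period aligns the label at time $t$ with the features observed at time $t$; a threshold of 0 recovers the plain "up versus not up" definition.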
Model evaluation
The evaluation must include relevant metrics and a backtest. The backtest should report the annualised return and Sharpe ratio of your strategy compared with a buy-and-hold benchmark.
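A simple long/flat backtest holds the asset for the next period whenever the model predicts 1 and stays in cash otherwise. The sketch below assumes out-of-sample predictions aligned with the next-period simple returns used for the target, ignores transaction costs, and approximates the per-period risk-free rate as `rf / periods_per_year`; the name `backtest` and these conventions are assumptions.

```python
import numpy as np
import pandas as pd

def backtest(signal: pd.Series, fwd_ret: pd.Series,
             periods_per_year: int = 252, rf: float = 0.0) -> pd.DataFrame:
    """Compare a long/flat strategy with buy-and-hold (no transaction costs).

    signal[t]  : 1 = long, 0 = flat, predicted with information up to t
    fwd_ret[t] : simple return earned from t to t+1
    """
    strat = signal * fwd_ret  # earn the next-period return only when the model says "up"
    rows = {}
    for name, r in {"strategy": strat, "buy_and_hold": fwd_ret}.items():
        growth = (1 + r).prod()
        ann_ret = growth ** (periods_per_year / len(r)) - 1   # geometric annualisation
        excess = r - rf / periods_per_year
        sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std()
        rows[name] = {"annualised_return": ann_ret, "sharpe_ratio": sharpe}
    return pd.DataFrame(rows).T

# Illustrative usage with four synthetic weekly returns.
fwd = pd.Series([0.01, -0.01, 0.02, -0.02])
sig = pd.Series([1, 0, 1, 0])
summary = backtest(sig, fwd, periods_per_year=52)
```

Use 252 periods per year for daily data and 52 for weekly data; run the backtest only on the test period so the reported figures are out of sample.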
Support
Post any project-related queries on the Assessments Forum.
Use of Generative AI
You may use Generative AI tools to assist with the Python coding aspects of this assessment. If you choose to do so, you must submit a separate document that includes:
- Acknowledgment of AI usage, with a clear explanation of how and where it was used.
- Documentation of the AI tool employed, including screenshots of the prompts and any interactions with the AI.
For guidance on how to complete this document, please refer to this link. While the use of Generative AI is permitted, it is not mandatory.
Marking rubric

Criterion 1. Ability to follow the six-step process (10 marks)
- Clearly structured workflow following the six-step process discussed in class.
- Logical progression and adherence to all key steps (e.g., data collection, preprocessing).

Criterion 2. Competent execution of technical aspects (10 marks)
- Data collection: sufficient, relevant data collected and explained.
- Preprocessing: correct handling of missing data, normalization, and transformations.
- Model evaluation: robust validation methods (train-test split, cross-validation, backtest) applied and evaluated clearly.
- Coding: efficient Python programming with appropriate use of relevant packages.

Criterion 3. Creative application in feature engineering and model selection (10 marks)
- Thoughtful feature engineering (going beyond the basic features demonstrated).
- Exploration of advanced techniques or creative use of domain knowledge.
- Novel algorithm/model selection and tuning (an attempt to innovate or tailor to the dataset).

Criterion 4. Quality of documentation (10 marks)
- Well-commented code in the Jupyter Notebook.
- Clear, concise, and thorough report explaining choices, findings, and reflections.
- Screenshots of AI tools used and interactions documented (if applicable).