FINTECH 590 Risk Management Project Week03: Covariance Matrix, Nearest PSD, PCA
Project Week03
Instructions:
Be verbose. Explain clearly your reasoning, methods, and results in your written work. Write clear code that is well documented. With 99% certainty, you cannot write too many code comments.
Written answers are worth 8 points. Code is worth 2 points. 10 points total.
- When finished, respond to the questions in Sakai as “done.” We will record your grade there.
- In your code repository, create a folder called “Week03.”
- In that folder, include:
  a. A PDF with your responses.
  b. All code.
  c. A README file with instructions for us to run your code.
Everything must be checked into your repository by 8am Saturday 2/11. A pull will be done at that time. Documents and code checked in after the instructors' pull will not be graded.
You are welcome to use a notebook for coding, but do not rely on code to explain your answer. Use a document editor to write your responses and paste graphs into the document. Use words, math, tables, and charts to explain your results – not code.
This week we will implement the methodologies we discussed in class. These methodologies will be crucial for work later.
Some routines might be available in your programming language, but you need to implement them once to help your understanding. If you rely on a package, you need to prove that it works as expected in this homework. Implementing these routines will aid your understanding of the concepts and help you troubleshoot errors later.
You may find that your implementation is faster than your package, or that the package is faster. If you have proved that the package implementation is faster, and works as needed, you may use it throughout this class.
Data for problems can be found in CSV files with this document in the class repository.
Problem 1
Use the stock returns in DailyReturn.csv for this problem. DailyReturn.csv contains returns for 100 large US stocks as well as the ETF SPY, which tracks the S&P 500.
Create a routine for calculating an exponentially weighted covariance matrix. If you have a package that calculates it for you, verify that it calculates the values you expect. This means you still have to implement it.
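As a starting point, here is a minimal sketch of such a routine in Python with NumPy. It assumes the rows of the return matrix are ordered oldest to newest, that the weight on an observation i steps back is proportional to λ^i and normalized to sum to 1, and that the mean is removed with the same weights. The function name `ew_cov` and these conventions are illustrative assumptions, not the class definition.

```python
import numpy as np

def ew_cov(returns: np.ndarray, lam: float) -> np.ndarray:
    """Exponentially weighted covariance of a (T x N) return matrix.

    Assumption: rows are ordered oldest to newest, so the last row
    receives the largest weight.
    """
    t = returns.shape[0]
    # Weight for the observation i steps back is proportional to lam**i.
    weights = lam ** np.arange(t - 1, -1, -1)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Remove the weighted mean, then form the weighted sum of outer products.
    mean = weights @ returns
    demeaned = returns - mean
    return (demeaned * weights[:, None]).T @ demeaned
```

One way to cross-check the result against a package is `np.cov(returns, rowvar=False, aweights=weights, ddof=0)` with the same normalized weights, which should agree to floating-point precision under the assumptions above.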
Vary λ ∈ (0, 1). Use PCA and plot the cumulative variance explained by each eigenvalue for each λ chosen.
What does this tell us about the value of λ and its effect on the covariance matrix?
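The cumulative variance explained can be computed from the eigenvalues of any covariance matrix. The sketch below, in Python with NumPy, sorts the eigenvalues in descending order and returns the running share of total variance. The function name is hypothetical, and clipping small negative eigenvalues to zero is an assumption about how to treat round-off.

```python
import numpy as np

def cumulative_variance_explained(cov: np.ndarray) -> np.ndarray:
    """Running share of total variance explained by the sorted eigenvalues."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # reverse to descending order
    # Assumption: tiny negative values are round-off and are treated as zero.
    eigvals = np.clip(eigvals, 0.0, None)
    return np.cumsum(eigvals) / eigvals.sum()
```

For a diagonal covariance with variances 3 and 1, this returns `[0.75, 1.0]`. Calling it once per λ and plotting each returned array against the eigenvalue index gives one curve per λ.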
Problem 2
Copy the chol_psd() and near_psd() functions from the course repository and implement them in your programming language of choice. These are core functions you will need throughout the remainder of the class.
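The course repository versions are not reproduced here. The following Python sketch assumes that chol_psd() is a Cholesky factorization that tolerates zero pivots and that near_psd() clips negative eigenvalues and rescales to restore a unit diagonal (the Rebonato and Jäckel approach). Both assumptions, the `epsilon` parameter, and the 1e-8 tolerance are illustrative; check them against the class code.

```python
import numpy as np

def chol_psd(a: np.ndarray) -> np.ndarray:
    """Lower-triangular root L with L @ L.T == a, for a PSD matrix a."""
    n = a.shape[0]
    root = np.zeros((n, n))
    for j in range(n):
        # Diagonal entry: subtract the squares already placed in row j.
        temp = a[j, j] - root[j, :j] @ root[j, :j]
        if -1e-8 <= temp <= 0.0:
            temp = 0.0                      # treat tiny negative round-off as zero
        elif temp < 0.0:
            raise ValueError("matrix is not PSD")
        root[j, j] = np.sqrt(temp)
        if root[j, j] == 0.0:
            continue                        # zero pivot: leave the column at zero
        # Fill the remainder of column j below the diagonal.
        root[j + 1:, j] = (a[j + 1:, j] - root[j + 1:, :j] @ root[j, :j]) / root[j, j]
    return root

def near_psd(a: np.ndarray, epsilon: float = 0.0) -> np.ndarray:
    """Nearby PSD matrix via eigenvalue clipping (assumed Rebonato-Jackel style)."""
    out = a.copy()
    inv_sd = None
    # If given a covariance matrix, convert it to a correlation matrix first.
    if not np.allclose(np.diag(out), 1.0):
        inv_sd = np.diag(1.0 / np.sqrt(np.diag(out)))
        out = inv_sd @ out @ inv_sd
    vals, vecs = np.linalg.eigh(out)
    vals = np.maximum(vals, epsilon)        # clip negative eigenvalues
    # Row scaling that restores a unit diagonal after clipping.
    t = 1.0 / ((vecs * vecs) @ vals)
    b = np.diag(np.sqrt(t)) @ vecs @ np.diag(np.sqrt(vals))
    out = b @ b.T
    if inv_sd is not None:
        sd = np.diag(1.0 / np.diag(inv_sd))  # scale back to a covariance matrix
        out = sd @ out @ sd
    return out
```

A quick sanity check is that `chol_psd(m) @ chol_psd(m).T` reproduces a PSD input `m`, and that the output of `near_psd` has a unit diagonal and no eigenvalue below a small negative tolerance.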
Implement Higham’s 2002 nearest PSD correlation function.
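A minimal sketch of Higham's alternating projections with Dykstra's correction, in Python with NumPy, is given below. It assumes the unweighted case (identity weight matrix) and uses a simplified stopping rule: iteration ends when the unit-diagonal iterate is within `tol` (Frobenius norm) of the PSD iterate. The function names, `tol`, and `max_iter` are illustrative choices, not part of the assignment.

```python
import numpy as np

def _project_psd(a: np.ndarray) -> np.ndarray:
    """Nearest PSD matrix in Frobenius norm: zero out negative eigenvalues."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T

def _project_unit_diag(a: np.ndarray) -> np.ndarray:
    """Nearest matrix with a unit diagonal: overwrite the diagonal with ones."""
    out = a.copy()
    np.fill_diagonal(out, 1.0)
    return out

def higham_nearest_corr(a: np.ndarray, tol: float = 1e-9,
                        max_iter: int = 1000) -> np.ndarray:
    """Approximate nearest correlation matrix (unweighted sketch)."""
    y = a.copy()
    delta_s = np.zeros_like(a, dtype=float)
    for _ in range(max_iter):
        r = y - delta_s              # apply Dykstra's correction
        x = _project_psd(r)          # project onto the PSD cone
        delta_s = x - r              # update the correction term
        y = _project_unit_diag(x)    # project onto unit-diagonal matrices
        # Assumed stopping rule: the two projections nearly agree.
        if np.linalg.norm(y - x, "fro") < tol:
            break
    return y
```

The returned matrix has an exact unit diagonal and is within `tol` of a PSD matrix, so its smallest eigenvalue can be very slightly negative. Whether to return `y` or the PSD iterate `x` is a design choice worth discussing in the write-up.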
Generate a non-PSD correlation matrix that is 500x500. You can use the code I used in class:
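# Build a 500x500 matrix with 0.9 off the diagonal and 1.0 on it
n = 500
sigma = fill(0.9, (n, n))
for i in 1:n
    sigma[i, i] = 1.0
end
# Change one symmetric pair of entries so the matrix is no longer PSD
sigma[1, 2] = 0.7357
sigma[2, 1] = 0.7357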
Use near_psd() and Higham’s method to fix the matrix. Confirm the matrix is now PSD.
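One way to confirm PSD status is to check that the matrix is symmetric and that its smallest eigenvalue is not below a small negative tolerance. The sketch below, in Python with NumPy, rebuilds the class matrix with 0-based indices and applies that check. The helper names and the 1e-8 tolerance are assumptions for illustration.

```python
import numpy as np

def build_test_matrix(n: int = 500) -> np.ndarray:
    """Python version of the class snippet (0-based indices)."""
    sigma = np.full((n, n), 0.9)
    np.fill_diagonal(sigma, 1.0)
    sigma[0, 1] = 0.7357   # sigma[1,2] in the 1-based Julia snippet
    sigma[1, 0] = 0.7357   # sigma[2,1] in the 1-based Julia snippet
    return sigma

def is_psd(a: np.ndarray, tol: float = 1e-8) -> bool:
    """True if a is symmetric and its smallest eigenvalue is >= -tol."""
    if not np.allclose(a, a.T):
        return False
    # Assumption: eigenvalues above -tol are treated as round-off, not failures.
    return bool(np.linalg.eigvalsh(a).min() >= -tol)
```

With these helpers, `is_psd(build_test_matrix(500))` is `False` before the fix, and the same check can be applied to the output of each fixing method.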
Compare the results of both using the Frobenius Norm. Compare the run time between the two. How does the run time of each function compare as N increases?
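The comparison needs a distance measure and a timer. The sketch below, in Python with NumPy, computes the Frobenius norm of the difference between two matrices and records the best wall-clock time over a few repeats. The helper names and the choice of three repeats are illustrative assumptions.

```python
import time
import numpy as np

def frobenius_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Square root of the sum of squared elementwise differences."""
    return float(np.linalg.norm(a - b, "fro"))

def time_call(func, *args, repeats: int = 3) -> float:
    """Best wall-clock time in seconds over a few repeats of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

To study how run time scales, call `time_call` with each fixing function on matrices of increasing size (for example N = 100, 200, 400, 800) and tabulate the times next to the Frobenius distance between each fixed matrix and the original.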
Based on the above, discuss the pros and cons of each method and when you would use each. There is no wrong answer here; I want you to think through this and tell me what you think.
Problem 3
Use DailyReturn.csv for this problem.
Implement a multivariate normal simulation that allows for simulation directly from a covariance matrix or using PCA with an optional parameter for % variance explained. If you have a library that can do these, you still need to implement it yourself for this homework and prove that it functions as expected.
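The two simulation routes can be sketched as follows in Python with NumPy. This sketch assumes zero-mean draws and uses `np.linalg.cholesky` for the direct route, which requires a strictly positive definite matrix. A PSD-tolerant root such as chol_psd() would be needed for a singular covariance. The function names and the `pct_explained` parameter are hypothetical.

```python
import numpy as np

def simulate_direct(cov: np.ndarray, n_draws: int,
                    rng: np.random.Generator) -> np.ndarray:
    """Zero-mean normal draws using a Cholesky root of cov.

    Assumption: cov is positive definite (np.linalg.cholesky fails otherwise).
    """
    root = np.linalg.cholesky(cov)
    z = rng.standard_normal((n_draws, cov.shape[0]))
    return z @ root.T

def simulate_pca(cov: np.ndarray, n_draws: int, rng: np.random.Generator,
                 pct_explained: float = 1.0) -> np.ndarray:
    """Zero-mean normal draws using the leading principal components."""
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]           # sort eigenvalues descending
    vals, vecs = vals[order], vecs[:, order]
    vals = np.clip(vals, 0.0, None)          # treat negative round-off as zero
    cum = np.cumsum(vals) / vals.sum()
    # Smallest number of components whose cumulative share reaches the target.
    k = min(int(np.searchsorted(cum, pct_explained - 1e-12)) + 1, len(vals))
    b = vecs[:, :k] * np.sqrt(vals[:k])      # N x k loading matrix
    z = rng.standard_normal((n_draws, k))
    return z @ b.T
```

A typical call is `simulate_pca(cov, 25_000, np.random.default_rng(0), pct_explained=0.75)`. With `pct_explained=1.0` the PCA route keeps every component, so its sample covariance should match the input about as closely as the direct route.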
Generate a correlation matrix and variance vector 2 ways:
- Standard Pearson correlation/variance (you do not need to reimplement the cor() and var() functions).
- Exponentially weighted with λ = 0.97
Combine these to form 4 different covariance matrices (Pearson correlation + var(), Pearson correlation + EW variance, etc.).
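Combining a correlation matrix with a variance vector uses the identity Σ = D C D, where D is the diagonal matrix of standard deviations. A minimal Python sketch with NumPy follows. The function names are hypothetical.

```python
import numpy as np

def cov_from_corr_and_var(corr: np.ndarray, var: np.ndarray) -> np.ndarray:
    """Covariance matrix from a correlation matrix and a variance vector."""
    sd = np.sqrt(var)
    # Elementwise form of diag(sd) @ corr @ diag(sd).
    return corr * np.outer(sd, sd)

def corr_from_cov(cov: np.ndarray) -> np.ndarray:
    """Correlation matrix extracted from a covariance matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)
```

For example, a correlation of 0.5 with variances 4 and 9 gives a covariance of 0.5 × 2 × 3 = 3. The four matrices come from pairing each of the two correlation matrices with each of the two variance vectors.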
Simulate 25,000 draws from each covariance matrix using:
- Direct Simulation
- PCA with 100% explained.
- PCA with 75% explained.
- PCA with 50% explained.
Calculate the covariance of the simulated values. Compare the simulated covariance to its input matrix using the Frobenius Norm (L2 norm, sum of the square of the difference between the matrices). Compare the run times for each simulation.
What can we say about the trade-offs between time to run and accuracy?