Evaluative Assignment: 098 Correlation Matrix
For this assignment, you will write the first step toward your final evaluative assignment. Ultimately, you will write a program that, given a universe of assets, creates the optimal portfolios by calculating the efficient frontier, as described in your portfolio theory module.
For this part, you will calculate the correlation matrix for a universe (set) of assets, given historical price data. At a high level, your program should:
- Read historical price data from a file.
- For each asset, calculate the rate of return for each time step.
- For each asset, calculate the average return and standard deviation.
- Calculate the covariance matrix for all the assets.
- Calculate the correlation matrix for all the assets.
The system will read an input file specified as a command-line argument. This file is a CSV (comma-separated values) file with a header row. The first column in the header row is the label for the “time index” field. The remaining columns are asset names. (A variable number of assets may be present. At least one asset name will be present.) The remaining rows specify the “time index”, followed by the price of each asset at that time. You may assume that there are no missing rows. Note in year.csv that some of the prices are null. Your program should handle null or non-numeric data in some fields by just repeating the previous valid price for that asset. (Of course, if there is no valid data in a column, that is an error – print a message to stderr and exit with EXIT_FAILURE.)
Time,A,B
0,2193.03,24848.53
1,2291.72,25723.20
2,2349.01,26955.34
3,2373.67,27963.47
4,2297.72,28385.72
Sample Input File: small.csv
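The input handling described above could be sketched as follows. This is only one possible approach, not a reference solution: the names PriceTable, splitCsv, parsePrice, readLine, and readPrices are hypothetical, fields are split on plain commas (no quoted fields), and errors are reported by throwing so that the caller can print to stderr and exit with EXIT_FAILURE. One further assumption: if a column starts with invalid entries, they are filled with the first valid price found later, since the specification only says what to do when a previous valid price exists.

```cpp
#include <cmath>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: asset names plus one price column per asset.
struct PriceTable {
  std::vector<std::string> names;
  std::vector<std::vector<double> > prices;  // prices[asset][row]
};

// Split one CSV line on commas, keeping empty fields.
std::vector<std::string> splitCsv(const std::string & line) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type comma = line.find(',', start);
    if (comma == std::string::npos) {
      fields.push_back(line.substr(start));
      return fields;
    }
    fields.push_back(line.substr(start, comma - start));
    start = comma + 1;
  }
}

// Accept a field only if all of it parses as a finite number
// ("null", "", "abc", and "12x" are all rejected).
bool parsePrice(const std::string & field, double & out) {
  char * end = NULL;
  out = std::strtod(field.c_str(), &end);
  return end != field.c_str() && *end == '\0' && std::isfinite(out);
}

// Read one line, dropping a trailing '\r' from Windows-style files.
bool readLine(std::istream & in, std::string & line) {
  if (!std::getline(in, line)) {
    return false;
  }
  if (!line.empty() && line[line.size() - 1] == '\r') {
    line.erase(line.size() - 1);
  }
  return true;
}

PriceTable readPrices(std::istream & in) {
  PriceTable table;
  std::string line;
  if (!readLine(in, line)) {
    throw std::runtime_error("input is empty");
  }
  std::vector<std::string> header = splitCsv(line);
  if (header.size() < 2) {
    throw std::runtime_error("header needs at least one asset name");
  }
  table.names.assign(header.begin() + 1, header.end());
  const std::size_t n = table.names.size();
  table.prices.resize(n);
  std::vector<bool> seenValid(n, false);
  while (readLine(in, line)) {
    if (line.empty()) {
      continue;  // tolerate a blank line at the end of the file
    }
    std::vector<std::string> fields = splitCsv(line);
    if (fields.size() != n + 1) {
      throw std::runtime_error("row has the wrong number of fields");
    }
    for (std::size_t i = 0; i < n; i++) {
      std::vector<double> & column = table.prices[i];
      double price = 0.0;
      if (parsePrice(fields[i + 1], price)) {
        if (!seenValid[i]) {
          // ASSUMPTION: overwrite leading placeholders with the first valid price.
          column.assign(column.size(), price);
          seenValid[i] = true;
        }
        column.push_back(price);
      }
      else {
        // Repeat the previous valid price; 0.0 is a placeholder until one is seen.
        column.push_back(seenValid[i] ? column.back() : 0.0);
      }
    }
  }
  for (std::size_t i = 0; i < n; i++) {
    if (!seenValid[i]) {
      throw std::runtime_error("no valid price data for asset " + table.names[i]);
    }
  }
  return table;
}
```

Taking a std::istream rather than a file name keeps the parser easy to test with a std::istringstream; the real program would pass it a std::ifstream opened on the command-line argument.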
For each change in the “time index” (i.e., each row; it does not matter whether the data is daily, monthly, or something else), you should compute the rate of return. Writing $p_t$ for the price of an asset at time index $t$, the rate of return can be calculated as
$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}}$$
Rates of return for A in the sample input file: 0.04500166, 0.02499869, 0.01049804, -0.0319969
Once you have the rates of return, compute the average return and standard deviation for that asset. With $N$ the number of time “clicks” (one less than the number of data records, not counting the header row), they are calculated as follows:
$$\bar{r} = \frac{1}{N}\sum_{t=1}^{N} r_t$$
$$\sigma = \sqrt{\frac{1}{N-1}\sum_{t=1}^{N}\left(r_t - \bar{r}\right)^2}$$
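For asset A: $\bar{r}_A = 0.01212538$, $\sigma_A = 0.03263941$
Next, you’ll need to create the covariance matrix, where each element is the covariance of the two assets at that row and column. So the covariance for assets $i$ and $j$, where $r_{i,t}$ is the rate of return of asset $i$ at time step $t$ and $\bar{r}_i$ is its average return, is:
$$\sigma_{i,j} = \frac{1}{N}\sum_{t=1}^{N}\left(r_{i,t} - \bar{r}_i\right)\left(r_{j,t} - \bar{r}_j\right)$$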
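Covariance Matrix =
$$\begin{bmatrix} \sigma_{1,1} & \cdots & \sigma_{1,n} \\ \vdots & \ddots & \vdots \\ \sigma_{n,1} & \cdots & \sigma_{n,n} \end{bmatrix}$$
where $\sigma_{i,j}$ is the covariance of assets $i$ and $j$, and $n$ is the number of assets.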
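Finally, you will need to compute the correlation matrix. Each element is the correlation of the two assets at that row and column, which is their covariance divided by the product of their standard deviations:
$$\rho_{i,j} = \frac{\sigma_{i,j}}{\sigma_i \, \sigma_j}$$
Correlation Matrix =
$$\begin{bmatrix} \rho_{1,1} & \cdots & \rho_{1,n} \\ \vdots & \ddots & \vdots \\ \rho_{n,1} & \cdots & \rho_{n,n} \end{bmatrix}$$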
Recall that $-1 \le \rho_{i,j} \le 1$, where positive correlation means assets change in the same direction, and negative correlation means assets change in opposite directions. Therefore, the correlation of an asset with itself should be exactly 1. Note, however, that these formulas will not give you exactly $\rho_{i,i} = 1$ but will approach 1 for large time series of data. For this project, let $\rho_{i,i}$ be exactly 1 instead of doing the correlation computation.
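The steps from returns to correlation matrix can be sketched as below. All names (Matrix, mean, standardDeviation, covariance, correlationMatrix) are hypothetical. The standard deviation divides by N - 1, which reproduces the value quoted for asset A. The covariance dividing by N is an assumption, chosen because it is consistent with the remark that the computed self-correlation only approaches 1 for long series; check it against the handout's formula and the sample outputs. The sketch also assumes at least two returns per asset and a non-zero standard deviation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Average of a return series.
double mean(const std::vector<double> & r) {
  double sum = 0.0;
  for (std::size_t t = 0; t < r.size(); t++) {
    sum += r[t];
  }
  return sum / r.size();
}

// Standard deviation with an N - 1 divisor (matches the asset A example).
double standardDeviation(const std::vector<double> & r) {
  const double avg = mean(r);
  double sumSquares = 0.0;
  for (std::size_t t = 0; t < r.size(); t++) {
    sumSquares += (r[t] - avg) * (r[t] - avg);
  }
  return std::sqrt(sumSquares / (r.size() - 1));
}

// ASSUMPTION: covariance divides by N rather than N - 1.
double covariance(const std::vector<double> & a, const std::vector<double> & b) {
  assert(a.size() == b.size());
  const double avgA = mean(a);
  const double avgB = mean(b);
  double sum = 0.0;
  for (std::size_t t = 0; t < a.size(); t++) {
    sum += (a[t] - avgA) * (b[t] - avgB);
  }
  return sum / a.size();
}

// returns[i] is the return series of asset i.
Matrix correlationMatrix(const std::vector<std::vector<double> > & returns) {
  const std::size_t n = returns.size();
  std::vector<double> deviations(n);
  for (std::size_t i = 0; i < n; i++) {
    deviations[i] = standardDeviation(returns[i]);
  }
  // Start from all ones so the diagonal is exactly 1 without computing it.
  Matrix corr(n, std::vector<double>(n, 1.0));
  for (std::size_t i = 0; i < n; i++) {
    for (std::size_t j = i + 1; j < n; j++) {
      const double rho = covariance(returns[i], returns[j]) / (deviations[i] * deviations[j]);
      corr[i][j] = rho;
      corr[j][i] = rho;  // the matrix is symmetric
    }
  }
  return corr;
}
```

Under these assumptions, two perfectly proportional return series of length N get a computed correlation of (N - 1)/N rather than 1, which illustrates why the diagonal is set to 1 directly.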
Your program should print the result to stdout as follows.
list of assets, separated by newlines
[correlation matrix]
Note: The matrix must be formatted with opening and closing square brackets and comma-delimited values, such that each floating-point number is printed in a field 7 characters wide with four digits after the decimal point. See ios_base::width, setprecision, and fixed in the C++ library. The files small.out and year.out contain sample output.
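The number formatting can be sketched with the standard stream manipulators. The helper name formatEntry is hypothetical, and the sketch covers only a single matrix entry; the exact placement of brackets, commas, and line breaks should be matched against small.out and year.out.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Format one matrix entry: fixed notation, four digits after the decimal
// point, right-aligned in a field at least 7 characters wide.
std::string formatEntry(double value) {
  std::ostringstream out;
  out << std::fixed << std::setprecision(4) << std::setw(7) << value;
  return out.str();
}
```

Note that std::setw applies only to the next item written, so it has to be repeated before every number, whereas std::fixed and std::setprecision stay in effect on the stream.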
- Create a Makefile that compiles your code to an executable named “correl_matrix”.
- Your executable takes exactly 1 command-line argument – the name of the file to read. The specification is described in the input section. Two sample files have been provided: small.csv and year.csv
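A minimal sketch of the argument check, assuming a hypothetical helper checkArguments that main would call first. The wording of the messages is illustrative only; the assignment requires just that errors go to stderr and that the program exits with EXIT_FAILURE.

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>

// Returns EXIT_SUCCESS only when exactly one argument is given and it names
// a file that can be opened for reading; otherwise reports on stderr.
int checkArguments(int argc, char ** argv) {
  if (argc != 2) {
    std::cerr << "Usage: correl_matrix <input.csv>" << std::endl;
    return EXIT_FAILURE;
  }
  std::ifstream input(argv[1]);
  if (!input.is_open()) {
    std::cerr << "Cannot open file: " << argv[1] << std::endl;
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
```

In main, a result other than EXIT_SUCCESS would simply be returned as the program's exit status before any parsing starts.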
For full credit, your program must run cleanly under valgrind. Of course, you should test your program on many more inputs than those provided.
You will also be graded on code quality. This means your code should make good use of abstraction, have good variable, function, and class names, be well commented and formatted, and have at least one class definition.
While you are free to implement this in any way that is reasonable, here are some recommendations:
- Create an Asset class
  - State: name, time series rate of return, average return, and standard deviation.
  - Behavior: calculate the average return, standard deviation, covariance, correlation
- A mechanism to represent a matrix. This can be another class, a typedef of a vector of vectors of doubles, or just double **matrix.
- Separate your source code into multiple files. One idea is to have files: main.cpp, parse.cpp, asset.hpp, and asset.cpp.
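One way the recommended Asset class might look is sketched below, shown in a single block for brevity even though the declaration would normally go in asset.hpp and the definitions in asset.cpp. Member and method names are hypothetical. The covariance divisor of N is an assumption to verify against the handout's formula; the N - 1 divisor for the standard deviation reproduces the value quoted for asset A. The sketch assumes at least three prices and that both assets cover the same time steps.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

class Asset {
 public:
  // Builds the return series from prices and caches the statistics.
  Asset(const std::string & name, const std::vector<double> & prices) :
      name_(name), average_(0.0), stddev_(0.0) {
    for (std::size_t t = 1; t < prices.size(); t++) {
      returns_.push_back((prices[t] - prices[t - 1]) / prices[t - 1]);
    }
    double sum = 0.0;
    for (std::size_t t = 0; t < returns_.size(); t++) {
      sum += returns_[t];
    }
    average_ = sum / returns_.size();
    double sumSquares = 0.0;
    for (std::size_t t = 0; t < returns_.size(); t++) {
      sumSquares += (returns_[t] - average_) * (returns_[t] - average_);
    }
    stddev_ = std::sqrt(sumSquares / (returns_.size() - 1));  // N - 1 divisor
  }

  const std::string & name() const { return name_; }
  double averageReturn() const { return average_; }
  double standardDeviation() const { return stddev_; }

  // ASSUMPTION: divides by N; both assets must have equally long series.
  double covariance(const Asset & other) const {
    double sum = 0.0;
    for (std::size_t t = 0; t < returns_.size(); t++) {
      sum += (returns_[t] - average_) * (other.returns_[t] - other.average_);
    }
    return sum / returns_.size();
  }

  // The correlation of an asset with itself is defined as exactly 1.
  double correlation(const Asset & other) const {
    if (this == &other) {
      return 1.0;
    }
    return covariance(other) / (stddev_ * other.stddev_);
  }

 private:
  std::string name_;
  std::vector<double> returns_;
  double average_;
  double stddev_;
};
```

Caching the average and standard deviation at construction means each entry of the matrix needs only one pass over the two return series.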