Jason Victor: CS 34 Project Proposal

Predicting excess equity returns from company fundamentals

Jason Victor

Goal

The goal of my project is to predict the excess return of a given stock. Excess return is defined as the return of a security over a given period minus the return of some benchmark (like the S&P 500 or the DAX) over the same period. It measures the degree to which the market "favors" a company, independent of changes in the overall stock market (and of investor attitudes toward securities with that risk profile). Predicting excess returns relative to a market benchmark is key to constructing optimal portfolios.
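As a minimal sketch of the definition above, the following Python computes an excess return from start and end prices. It assumes simple (non-compounded) returns over a single period; the function names and the example prices are hypothetical and chosen only for illustration.

```python
def simple_return(start_price, end_price):
    """Fractional return over one period, e.g. from split-adjusted closes."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return end_price / start_price - 1.0


def excess_return(stock_start, stock_end, bench_start, bench_end):
    """Return of the stock minus the return of the benchmark, same period."""
    stock_ret = simple_return(stock_start, stock_end)
    bench_ret = simple_return(bench_start, bench_end)
    return stock_ret - bench_ret


# Hypothetical prices: the stock rises from 50 to 56 (+12%) while the
# benchmark index rises from 1000 to 1050 (+5%).
example = excess_return(50.0, 56.0, 1000.0, 1050.0)
print(f"excess return: {example:.4f}")
```

A positive value means the stock outperformed the benchmark over the period; a negative value means it lagged.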

While many quantitative analyses of the stock market take into account only technical factors (price movements and trade volumes, for example), I am attempting to make these predictions using fundamental information provided in the financial statements of the given enterprise. Predictions will therefore be based on the financial standing of a company, not on the performance of its stock or any other company's stock.

There has been research on predicting credit events of a company using ML and financial ratios, but credit events deal entirely with the top (debt) part of a company's capital structure. I am going to attempt to apply the same reasoning to making predictions about the bottom (equity) part of the capital structure.

Methods

I am mainly interested in kernel SVMs, both because they have been used extensively in anticipating credit default events and because they generally perform well. I will use k-fold cross-validation to determine the optimal value for the kernel function's parameter (for example, the width of an RBF kernel).
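The selection procedure can be sketched as follows, using only the Python standard library. To keep the example short, the model is a stand-in kernel-weighted voting classifier, not an SVM; in the actual project the scoring step would train a kernel SVM instead. The RBF kernel, the candidate values for its parameter gamma, the synthetic two-cluster data, and all function names are assumptions made for illustration.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_vote_predict(train_x, train_y, point, gamma):
    """Stand-in classifier (NOT an SVM): sign of the kernel-weighted labels."""
    score = sum(y * rbf_kernel(x, point, gamma)
                for x, y in zip(train_x, train_y))
    return 1 if score >= 0 else -1


def cross_val_accuracy(xs, ys, gamma, k=5, seed=0):
    """Mean held-out accuracy over k folds for one value of gamma."""
    accuracies = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(xs)) if i not in held]
        correct = sum(
            kernel_vote_predict(train_x, train_y, xs[i], gamma) == ys[i]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def select_gamma(xs, ys, candidates, k=5, seed=0):
    """Return the candidate with the best cross-validated accuracy."""
    scores = {g: cross_val_accuracy(xs, ys, g, k, seed) for g in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption): two Gaussian clusters labeled +1 and -1.
rng = random.Random(1)
xs, ys = [], []
for label, center in ((1, 2.0), (-1, -2.0)):
    for _ in range(20):
        xs.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
        ys.append(label)

candidate_gammas = [0.01, 0.1, 1.0, 10.0]
best_gamma, cv_scores = select_gamma(xs, ys, candidate_gammas, k=5)
print("best gamma:", best_gamma)
```

Every candidate is scored on the same folds (the seed is fixed), so differences in score reflect the parameter rather than the split.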

However, I also plan on trying backpropagation neural networks (BP-NNs) and random forests for comparison purposes. Random forests, in particular, may be more suitable to this task because they offer some transparency into how they reach a prediction, in a form the user can grasp intuitively. I will decide which method is best based on their relative performance.

My code will be implemented in a combination of Python and R.

Data Sets

I will have to create my own data set by combining the publicly available financial statements on the SEC website with the split-adjusted close price data from Yahoo! Finance. Once I have this information in a usable form, I can calculate the financial ratios (their definitions are provided here), since ratios are likely to be more informative input data than the raw values collected from the statements. I will also create input variables that capture the change of a given variable over time.
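The feature construction described above might look roughly like the sketch below. The three ratios shown (current ratio, debt-to-equity, net margin) are common textbook examples chosen as an assumption; they are not necessarily the ratios the project will use. The dictionary keys, function names, and statement figures are likewise hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when it is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def compute_ratios(statement):
    """Compute a few illustrative financial ratios from one statement."""
    get = statement.get
    return {
        "current_ratio": safe_ratio(get("current_assets"),
                                    get("current_liabilities")),
        "debt_to_equity": safe_ratio(get("total_liabilities"),
                                     get("shareholders_equity")),
        "net_margin": safe_ratio(get("net_income"), get("revenue")),
    }


def relative_change(previous, current):
    """Fractional change from previous to current, or None if undefined."""
    if previous is None or current is None or previous == 0:
        return None
    return (current - previous) / abs(previous)


def build_features(prev_statement, curr_statement):
    """Current-period ratios plus their change since the prior period."""
    prev_ratios = compute_ratios(prev_statement)
    curr_ratios = compute_ratios(curr_statement)
    features = dict(curr_ratios)
    for name, value in curr_ratios.items():
        features[name + "_change"] = relative_change(prev_ratios[name], value)
    return features


# Hypothetical figures for two consecutive reporting periods.
prev_period = {
    "current_assets": 200.0, "current_liabilities": 100.0,
    "total_liabilities": 300.0, "shareholders_equity": 150.0,
    "net_income": 20.0, "revenue": 200.0,
}
curr_period = {
    "current_assets": 300.0, "current_liabilities": 120.0,
    "total_liabilities": 330.0, "shareholders_equity": 165.0,
    "net_income": 30.0, "revenue": 240.0,
}
features = build_features(prev_period, curr_period)
for name, value in sorted(features.items()):
    print(name, value)
```

Undefined ratios (a missing item or a zero denominator) come back as None rather than raising, so incomplete statements can be filtered or imputed later.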

Timeline

Collect data from SEC and Yahoo!
Write parsers to convert the data into a usable form
Implement KSVM
Project milestone
Implement random forest
Implement BP-NN
Test different parameters and configurations
Test different inputs: ratios, changes, raw values, etc.
Determine quality of results