Constructing an Eigenportfolio Using Principal Component Analysis

Eric Schlobohm

Background: Exchange-traded funds (ETFs) have been traded since the early 1990s but have only recently seen widespread use on major markets. An ETF holds a collection of assets and trades near its net asset value (NAV) throughout the day, unlike a mutual fund, which is priced only once at its end-of-day NAV. Since 2008, top hedge funds have begun using principal component analysis (PCA) to build eigenportfolios from the principal components of asset returns for statistical arbitrage.

I propose using PCA to reduce the number of “assets” in an ETF to only its principal components, allowing quicker trade execution and less computation. In addition, identifying the most volatile eigenstocks may allow arbitrage on the derivatives market, since the ETF will have less variance than the eigenportfolio built from those principal components.

Method: My analysis will be done primarily in Matlab. I will use principal component analysis to determine an eigenportfolio of the principal asset components of specific ETFs. From there I will use regression analysis to optimize the number of components in the portfolio, maximizing correlation with the past performance of these ETFs while minimizing computational requirements. I will then attempt to predict future prices of these ETFs using the eigenportfolio models, though this is a secondary goal and possibly outside the scope of this project.
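As a rough illustration of the PCA step described above, the following is a minimal sketch written in Python with only the standard library, rather than the Matlab the project will actually use. It makes several assumptions that are not part of the proposal: the function names are hypothetical, synthetic returns driven by a single market factor stand in for real price data, power iteration is used in place of a library eigensolver, and the weights follow one common convention (leading eigenvector entries divided by each asset's volatility, then normalized to sum to one).

```python
import math
import random


def make_synthetic_returns(n_days, betas, seed=0):
    """Hypothetical stand-in for real data: one market factor plus noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_days):
        market = rng.gauss(0.0, 0.01)
        rows.append([b * market + rng.gauss(0.0, 0.005) for b in betas])
    return rows


def column_stats(returns):
    """Per-asset means and sample standard deviations (rows are days)."""
    n, k = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / n for j in range(k)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in returns) / (n - 1))
        for j in range(k)
    ]
    return means, stds


def correlation_matrix(returns):
    """Sample correlation matrix of the asset returns."""
    means, stds = column_stats(returns)
    n, k = len(returns), len(means)
    z = [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in returns]
    return [
        [sum(z[t][i] * z[t][j] for t in range(n)) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its eigenvector."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(v[i] * mv[i] for i in range(k)), v


def eigenportfolio_weights(returns):
    """Assumed convention: eigenvector entry / volatility, normalized to sum to 1."""
    _, stds = column_stats(returns)
    _, v = leading_eigenpair(correlation_matrix(returns))
    raw = [v[j] / stds[j] for j in range(len(stds))]
    total = sum(raw)  # nonzero when the assets are positively correlated
    return [w / total for w in raw]


if __name__ == "__main__":
    demo = make_synthetic_returns(n_days=250, betas=[0.8, 1.0, 1.2, 0.9])
    print(eigenportfolio_weights(demo))
```

Only the leading component is computed here; further components could be obtained by deflating the matrix or, more practically, by calling a full eigendecomposition routine.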

Data Set: I will use Matlab's Datafeed Toolbox in conjunction with Yahoo! Finance to collect historical training data from 1995 to 2010 for specific ETFs, including SPY, VOO, and QQQ, and for their underlying assets.

Timeline:

Week 1-2: Collect data and parse it into a usable format. Begin writing code for PCA.

Week 3-4: Run PCA on an ETF with a small number of assets and reduce the number of principal assets in the ETF. For the milestone, I expect to have constructed an eigenportfolio that contains fewer assets than the ETF itself but still models its behavior closely.

Week 5-6: Run a supervised-learning regression on the PCA output to decrease computation while keeping the eigenportfolio highly correlated with the ETFs themselves.

Week 7-8: Finish work on optimization and create an eigenportfolio that approximates SPY with fewer component assets and has higher overall volatility than SPY.
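The evaluation implied by the later timeline steps could be sketched as follows, again in standard-library Python rather than Matlab. This is one possible reading, not the planned implementation: the helper names are hypothetical, the greedy rule (keep assets in order of decreasing absolute weight until a simple regression against the ETF reaches a target R^2) is an assumed heuristic, the 0.95 threshold is arbitrary, and annualization assumes 252 trading days per year.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def portfolio_returns(asset_returns, weights):
    """Weighted sum of asset returns for each period (rows are periods)."""
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]


def ols_fit(x, y):
    """Simple least squares y ~ a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation scaled by the square root of periods per year."""
    m = mean(returns)
    var = sum((r - m) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var * periods_per_year)


def smallest_tracking_subset(asset_returns, weights, etf_returns, min_r_squared=0.95):
    """Assumed heuristic: add assets by decreasing |weight| until R^2 hits the target."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    r2 = 0.0
    for count in range(1, len(order) + 1):
        kept = set(order[:count])
        reduced = [w if j in kept else 0.0 for j, w in enumerate(weights)]
        _, _, r2 = ols_fit(portfolio_returns(asset_returns, reduced), etf_returns)
        if r2 >= min_r_squared:
            return count, reduced, r2
    # Target never reached: fall back to the full set of assets.
    return len(order), list(weights), r2
```

Comparing `annualized_volatility` of the reduced portfolio's returns with that of the ETF's returns would then indicate whether the reduced portfolio is in fact the more volatile of the two.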