Can Twitter Predict the Bond Market?

By: Taylor Sipple and Andrew Hannigan

Problem

Predicting the future price of securities is a foundational problem in quantitative finance. In recent years, social media platforms have emerged as outlets for individuals to express their current moods, thoughts, and ideas. These new sources of data could contain valuable information about the general mood of the public and about sentiment surrounding specific economic issues. If the prices of government bonds are influenced by public mood and by sentiment toward the issuing countries, then we may be able to find correlations between mood or sentiment features extracted from social media and price fluctuations in the bond market.

Bollen et al. tested this theory by extracting general mood measures from Twitter posts and correlating fluctuations in those moods with changes in the Dow Jones Industrial Average (DJIA), achieving an 87.6% success rate in predicting the daily directional change of the DJIA [1]. Another study looked at correlations between sentiment towards specific companies and the price fluctuations of those companies' stocks, and created a trading model that outperformed the DJIA [2].

Method

We will begin by pre-processing the Twitter data to extract a set of features that characterize the general mood of the Twitterverse. We will accomplish this by analyzing the Tweets using the Google-Profile of Mood States and OpinionFinder [3] methods detailed by Bollen et al. [1], which both seek to classify commonly used words into certain mood states. We will look at three different subsets of Tweets: all Tweets, to capture general sentiment; for each country C, the Tweets whose users are from country C; and, for each country C, the Tweets about country C, regardless of where their users are located. Using these features, we will investigate whether Twitter sentiment is correlated with the price of government bonds. We are also interested in whether the amount of activity about a country on Twitter, as measured by these features, is correlated with the amount of trading activity surrounding that country's government bonds.
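As a rough illustration of the feature extraction step, the sketch below scores Tweets against a word list and aggregates the scores by day. It is a minimal sketch under stated assumptions, not the OpinionFinder or Google-Profile of Mood States method: the six-word `TOY_LEXICON`, the function names `tweet_score` and `daily_sentiment`, and the keyword filter used as a stand-in for "Tweets about country C" are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (assumption): real mood lexicons are far larger
# and distinguish several mood dimensions rather than a single polarity.
TOY_LEXICON = {
    "good": 1, "happy": 1, "calm": 1,
    "bad": -1, "worried": -1, "crisis": -1,
}


def tweet_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon polarities of the words in one Tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def daily_sentiment(tweets, keyword=None, lexicon=TOY_LEXICON):
    """Aggregate (day, text) pairs into {day: (mean score, tweet count)}.

    If keyword is given, only Tweets containing it (case-insensitive) are
    kept; this is a crude stand-in for selecting Tweets about a country.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in tweets:
        if keyword is not None and keyword.lower() not in text.lower():
            continue
        totals[day] += tweet_score(text, lexicon)
        counts[day] += 1
    return {day: (totals[day] / counts[day], counts[day]) for day in counts}
```

Calling `daily_sentiment(tweets, keyword="Greece")` on a list of `(date, text)` pairs would yield one mean score and one Tweet count per day for Tweets mentioning Greece. The count is one possible proxy for the amount of activity about a country.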

To explore the relationship between Twitter sentiment and bond prices, we will start by computing the cross-correlation coefficient at different time lags, as described by Ruiz et al. [2]. This will give us a broad overview of the relationship between the data streams, and the time lags will allow us to look for any structural leading or lagging behavior between the data sets. Next, we will run a Granger causality test [4], which examines this temporal relationship more closely and seeks to determine whether the Twitter data is useful in forecasting movements in the bond market. Finally, our ultimate goal is to implement the Self-organizing Fuzzy Neural Network [5] that gave Bollen et al. [1] their 87.6% predictive success, which will give us the opportunity to explore the non-linear relationships inherent in the data.
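The cross-correlation step can be sketched as follows. This is a minimal illustration that assumes two equally long, date-aligned daily series (for example, a sentiment feature and a daily bond price change). The function names and the sign convention, in which a positive lag means the first series leads the second, are our own assumptions rather than details taken from Ruiz et al. [2]. The sketch does not cover the Granger causality test.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when either series is constant
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag].

    Assumed convention: lag > 0 tests whether x leads y by `lag` steps,
    and lag < 0 tests whether y leads x.
    """
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return pearson(a, b)


def cross_correlation(x, y, max_lag):
    """Return {lag: correlation} for every lag in [-max_lag, max_lag]."""
    return {lag: lagged_correlation(x, y, lag)
            for lag in range(-max_lag, max_lag + 1)}
```

A peak at a positive lag in `cross_correlation(sentiment, price_change, max_lag)` would suggest that the sentiment series tends to move before the bond series. Each lag shortens the overlapping window, so `max_lag` should be kept small relative to the length of the series.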

Data Set

The Stanford Network Analysis Project at Stanford University hosts a dataset of 476 million Tweets [6]. Bond price data will be drawn from Yahoo! Finance [7] and the CME Group [8].

Milestone Goal

By the milestone, we will have preprocessed our Twitter data as described above and extracted a number of meaningful sentiment features from it. We will have computed the cross-correlation coefficients between these features and the price fluctuations of various government bonds. We will also have tested whether certain Twitter sentiment features Granger-cause bond price changes. Following the milestone, we will shift our attention to constructing a neural network in an attempt to develop a predictive non-linear model for the future price of government bonds.

References

[1] Bollen, J., Mao, H. and Zeng, X.-J. 2010. Twitter mood predicts the stock market. Journal of Computational Science. http://arxiv.org/pdf/1010.3003v1.pdf

[2] Eduardo J. Ruiz et al., 2012. Correlating Financial Time Series with Micro-Blogging Activity. http://www.cs.ucr.edu/~vagelis/publications/wsdm2012-microblog-financial.pdf.

[3] "OpinionFinder." MPQA. University of Pittsburgh, n.d. Web. 12 Apr 2012. http://www.cs.pitt.edu/mpqa/opinionfinder.html.

[4] Granger, C.W.J. "Investigating Causal Relations by Econometric Models and Cross-spectral Methods." Econometrica. 37.3 (1969): 424-38. Web. 12 Apr. 2012. http://www.jstor.org/stable/1912791.

[5] Tung, W.L. and Quek, C. "GenSoFNN: a generic self-organizing fuzzy neural network." IEEE Transactions on Neural Networks, vol. 13, no. 5, pp. 1075-1086, Sep 2002. doi: 10.1109/TNN.2002.1031940. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1031940&isnumber=22161.

[6] Leskovec, Jure. "Network datasets: 476 Million Twitter tweets." Stanford Network Analysis Project. Stanford University, n.d. Web. 12 Apr 2012. http://snap.stanford.edu/data/twitter7.html.

[7] http://finance.yahoo.com/

[8] http://www.cmegroup.com/