This paper proposes a novel framework for building a time-series-oriented lexicon that covers different types of sources and is explicitly linked to the targets of prediction problems. The input to the framework consists of a text stream, such as financial news, and a financial time series, such as the stock prices of a company. We first calculate the Pearson correlation between the frequency series of each word and the company's stock price series. Although Pearson correlation gives a good indication of how strongly two time series are correlated, it fails to capture their similarity when one of the series is stretched or shifted relative to the other. To overcome this limitation, we adopt dynamic time warping (DTW). Finally, the words with high correlations are extracted to build the time-series-oriented lexicon.
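The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the z-normalization applied before DTW, the absolute-difference local cost, the thresholds `min_abs_corr` and `max_dtw`, and the rule for combining the two measures are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    # A constant series has no defined correlation; report 0 by convention.
    return 0.0 if denom == 0 else float((xc * yc).sum() / denom)


def zscore(x):
    """Z-normalize a series so that DTW compares shapes, not scales (assumption)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return x - x.mean() if std == 0 else (x - x.mean()) / std


def dtw_distance(x, y):
    """Classic O(n*m) dynamic time warping with absolute-difference local cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def build_lexicon(word_freqs, prices, min_abs_corr=0.8, max_dtw=2.0):
    """Keep words whose frequency series tracks the price series.

    Hypothetical selection rule: a word is kept if its absolute Pearson
    correlation is high OR its DTW distance (on z-normalized series) is small.
    """
    target = zscore(prices)
    lexicon = []
    for word, freq in word_freqs.items():
        r = pearson(freq, prices)
        d = dtw_distance(zscore(freq), target)
        if abs(r) >= min_abs_corr or d <= max_dtw:
            lexicon.append(word)
    return sorted(lexicon)


# Toy data, invented purely for illustration.
prices = [1, 2, 3, 4, 5, 4, 3]
word_freqs = {
    "profit": [2, 4, 6, 8, 10, 8, 6],  # scaled copy of the price series
    "noise": [3, 1, 3, 1, 3, 1, 3],    # unrelated oscillation
}
lexicon = build_lexicon(word_freqs, prices)
```

In this sketch, DTW is what lets a word whose frequency series lags or leads the price series still be matched: `dtw_distance([0, 1, 2], [0, 0, 1, 2])` is zero even though the second series is delayed by one step, a case in which a point-by-point measure such as Pearson correlation would be penalized.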
Proceedings of the 35th International Symposium on Forecasting (ISF '15), 2015