Financial Data Mining Literature

“Data mining” in computer science means analysing large data sets. It draws on many disciplines, such as statistics, information systems and artificial intelligence. Owing to the exponential increase in computing power and the availability of information, it has been used more widely in recent years.
Janice Wood (2011) wrote that a team of scientists from the University of Vermont analysed three years of tweets from Twitter and concluded that the level of happiness had decreased. They assigned weights to different words, such as “laughter” and “greed”. The overall composite score showed the dynamics of expressed happiness over time, from which they drew their conclusions.
One of the fields in which data mining is applied is finance. As in the happiness analysis, a program can analyse the words in news articles from popular financial newspapers and RSS feeds.
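The word-weighting idea can be illustrated with a minimal sketch. The weights, the 1 to 9 scale and the function names below are hypothetical and chosen only for illustration; they are not the values or the method used in the Vermont study.

```python
import re

# Hypothetical word weights on an arbitrary 1-9 scale (not the study's values).
WORD_WEIGHTS = {"laughter": 8.5, "happy": 8.0, "greed": 3.0, "sad": 2.5}


def happiness_score(text, weights=WORD_WEIGHTS):
    """Average weight of the scored words in text, or None if none appear."""
    words = re.findall(r"[a-z']+", text.lower())
    scored = [weights[w] for w in words if w in weights]
    if not scored:
        return None
    return sum(scored) / len(scored)


def score_by_period(tweets_by_period, weights=WORD_WEIGHTS):
    """Map each period label to the composite score of all its tweets."""
    return {
        period: happiness_score(" ".join(tweets), weights)
        for period, tweets in tweets_by_period.items()
    }
```

Comparing the composite scores of successive periods then gives a rough picture of how expressed happiness changes over time.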
Nagar and Hahsler (2012) analysed news sentiment using R to forecast share price returns. They implemented a series of filters that checked whether each news item was recent, important, relevant and reliable, and assigned weightings accordingly. They then looked for sentences within the article that included the stock symbol of a company and extracted the relevant words. The number of positive and negative words was calculated and divided by the total number of key words. The resulting correlation between the word ratio and stock performance is striking (Picture 1).
Picture 1 – Word ratio and stock performance visual comparison
(Nagar and Hahsler 2012, p. 15)
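A rough sketch of the word-ratio calculation is given here in Python rather than the R used by the authors. The word lists, the sentence splitter and the function names are assumptions made for illustration, and the recency, importance and reliability filters are omitted.

```python
import re

# Hypothetical sentiment word lists, for illustration only.
POSITIVE = {"gain", "growth", "profit", "upgrade"}
NEGATIVE = {"loss", "decline", "lawsuit", "downgrade"}


def symbol_sentences(article, symbol):
    """Return the sentences of the article that mention the stock symbol."""
    sentences = re.split(r"(?<=[.!?])\s+", article)
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    return [s for s in sentences if pattern.search(s)]


def word_ratios(article, symbol):
    """Positive and negative key-word counts divided by the total key words."""
    pos = neg = 0
    for sentence in symbol_sentences(article, symbol):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    total = pos + neg
    if total == 0:
        return 0.0, 0.0
    return pos / total, neg / total
```

Only sentences that contain the symbol contribute to the counts, so sentiment words elsewhere in the article are ignored.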
Themos Kalafatis (2009) wrote a similar program, which analysed headlines. The news was first filtered into “Important” and “Unimportant” categories. Important news was then sorted into 16 packs based on the words used in the headlines, such as “futures”, “yen” or “Fed”. These word combinations then gave a trading recommendation, such as “buy” or “sell”.
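A headline-based rule system of this kind might look like the sketch below. The keyword set, the two rules and the “hold” fallback are hypothetical; the source does not list the actual 16 packs or their recommendations.

```python
import re

# Hypothetical keywords and rules; the real packs are not given in the source.
IMPORTANT_KEYWORDS = {"futures", "yen", "fed"}
RULES = {
    frozenset({"fed", "futures"}): "buy",
    frozenset({"yen"}): "sell",
}


def recommend(headline):
    """Classify a headline and return a recommendation for important ones."""
    words = set(re.findall(r"[a-z]+", headline.lower()))
    found = frozenset(words & IMPORTANT_KEYWORDS)
    if not found:
        return "unimportant"
    # Assumed fallback: important headlines with no matching rule are held.
    return RULES.get(found, "hold")
```

For example, `recommend("Fed decision lifts futures")` matches the first rule and returns `"buy"`, while a headline with none of the keywords is labelled `"unimportant"`.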
Aase (2011) analysed whether automated data filters can forecast a stock price immediately after a news article is released. The author’s program has four steps. First, the news articles are obtained from the internet. Second, they are filtered by stock name. Third, the text within the news articles is processed. The final step is the execution of a corresponding trade in a simulation.
During the third step, after the text is analysed, the news is categorized as “positive”, “neutral” or “negative”. The trading simulation performed in the study showed that most of the trades were profitable, especially short sales made on the arrival of negative news. The program also outperformed the buy-and-hold strategy, which was used as a control.
Schumaker and Chen (2008) compared the forecasting performance of text analysis and regression. Their results showed that text analysis performed significantly better in terms of direction forecasting and trading simulation performance. The trading simulation analysed the news and traded immediately in accordance with each article, forecasting the stock return over the 20 minutes following its release.
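The four steps can be sketched as a small simulation. Everything here is an illustrative assumption rather than Aase’s implementation: the download step is replaced by an in-memory list, the word lists are invented, and each trade is assumed to be closed at a later observed price.

```python
from dataclasses import dataclass


@dataclass
class Article:
    text: str
    price_at_release: float  # price when the article appears
    price_later: float       # price when the simulated trade is closed


# Hypothetical sentiment word lists.
POSITIVE = {"beats", "record", "upgrade"}
NEGATIVE = {"misses", "recall", "downgrade"}


def filter_by_stock(articles, stock_name):
    """Step 2: keep only articles that mention the stock name."""
    return [a for a in articles if stock_name.lower() in a.text.lower()]


def classify(text):
    """Step 3: label the text as positive, neutral or negative."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def simulate(articles, stock_name):
    """Step 4: go long on positive news, short on negative; sum the returns."""
    total = 0.0
    for a in filter_by_stock(articles, stock_name):
        change = (a.price_later - a.price_at_release) / a.price_at_release
        label = classify(a.text)
        if label == "positive":
            total += change
        elif label == "negative":
            total -= change  # a short position profits when the price falls
    return total
```

Neutral articles trigger no trade, so they do not affect the simulated return.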
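Direction forecasting is commonly scored as the share of forecasts whose sign matches the realised return. The function below is a minimal sketch of that measure under this assumption; it is not taken from Schumaker and Chen, and treating a zero return as “not up” is an arbitrary choice.

```python
def directional_accuracy(predicted, actual):
    """Fraction of periods in which the forecast and realised return agree in direction."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one observation is required")
    # A return counts as "up" only if it is strictly positive.
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(predicted)
```

The same function can be applied to the forecasts of a text-based model and of a regression model over the same periods to compare the two.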
Date: 2015-12-17