We explore an extreme case of text classification. Short statements were collected from micro-blogs and associated with a category according to the sentiment indicated by the icons attached to them. We evaluated different methods that assign categories using only the wording of the short statements. Short micro-blog statements are harder to classify because they provide little context, yet it is not uncommon for them to contain words that are directly linked to sentiments. In this work, we considered two sentiment polarities: negative and positive. We employed statistical information about word usage, a dictionary of Chinese synonyms, and a dictionary of emotional phrases to convert the short statements into vectors, and applied support vector machines and probabilistic modeling to the classification task. The classification results varied with the classification method and the experimental setup: the best result exceeded 80%, while the lowest reached only 55%.
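The abstract does not say which probabilistic model was used. Purely as an illustration, the following is a minimal sketch of one common choice: a multinomial naive Bayes classifier over word counts for the two polarities. It assumes each statement has already been segmented into Chinese words. The function names, the smoothing parameter `alpha`, and the toy statements are hypothetical and are not taken from this work. The sketch also omits the synonym and emotional-phrase dictionaries.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha (Laplace) smoothing.

    docs: list of token lists (statements assumed to be pre-segmented words).
    labels: list of polarity labels, e.g. "positive" / "negative".
    """
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    log_priors, log_likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Log prior: fraction of training statements carrying this label.
        log_priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        # Smoothed log P(word | class) for every vocabulary word.
        log_likelihoods[c] = {
            tok: math.log((counts[tok] + alpha) / denom) for tok in vocab
        }
    return classes, vocab, log_priors, log_likelihoods


def predict_nb(model, doc):
    """Return the label with the highest log posterior score for one statement."""
    classes, vocab, log_priors, log_likelihoods = model
    scores = {}
    for c in classes:
        score = log_priors[c]
        for tok in doc:
            if tok in vocab:  # words never seen in training are ignored
                score += log_likelihoods[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only (not from the paper's corpus).
train_docs = [["今天", "好", "开心"], ["太", "棒", "了"], ["好", "难过"], ["真", "讨厌"]]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["开心"]))  # prints "positive"
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer inputs. An SVM-based variant would instead feed the same word-count vectors to a linear classifier.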
Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing, ROCLING 2010