This paper introduces an approach to Twitter sentiment analysis, where the task is to classify tweets as positive, negative, or neutral. In the preprocessing step, we propose a method to deal with two problems: (i) repeated characters in informal spellings of words; and (ii) the effect of contrast words in determining sentence polarity. We propose a set of features for this task and investigate an optimal method of using them. The system is implemented with several classification algorithms, including Decision Tree, k-Nearest Neighbor, Support Vector Machine, and AdaBoost. Experimental results on the Twitter 2016 test dataset show that our system achieves a good result (63.7% F1-score) compared to related research in this field.
Keywords
Twitter, sentiment analysis, word embedding, decision tree, kNN, SVM, AdaBoost