by Alexander Osherenko, Socioware Development, firstname.lastname@example.org
Considering Impact of Sociolinguistic Findings in Believable Opinion Mining Systems
Proceedings of The Fifth International Conference On Cognitive Science. 2012. Kaliningrad, Russia (http://www.informatik.uni-augsburg.de/~osherenk/final_kalinigrad.pdf)
Opinions are a frequent means of communication in human society, and automatic approaches to opinion mining in texts have therefore attracted much attention. Most approaches apply data mining techniques and extract lexical features (words) as a reliable means of classification. Although interest in opinion mining is huge, there are only a few explorations of the words extracted in opinion mining. This study addresses this gap and elaborates on a sociolinguistic explanation. We hypothesize that an opinion mining system should be trained to classify opinions in texts of the same language style. Hence, this contribution focuses on the following questions: 1) do sociolinguistic aspects of corpora, for example, their colloquiality or literariness, influence classification results; 2) how should reliable opinion mining systems be trained to obtain trustworthy classification results.
The study uses four text corpora of the same (emotional) domain: the Sensitive Artificial Listener (SAL) corpus, the Berardinelli movie reviews corpus (BMRC), the Pang movie reviews corpus (PMRC), and the Corpus with product reviews (CwPR).
The table above shows the classification results for the non-sociolinguistic and sociolinguistic datasets of the different corpora (column Corpus), where the R0 and R values refer to recall averaged over classes for the non-sociolinguistic and sociolinguistic datasets respectively, and the CN0 and CN columns specify the corresponding numbers of classes.
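To illustrate the measure behind the R0 and R values, recall averaged over classes is commonly computed as the unweighted mean of the per-class recall values. The Python sketch below assumes this reading; the paper may compute the average differently. The function name macro_averaged_recall and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def macro_averaged_recall(true_labels, predicted_labels):
    """Return recall averaged over classes (unweighted mean of per-class recall).

    Assumption: every class counts equally, regardless of how many
    instances it has. This is one common reading of "averaged over classes".
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("no labels given")

    total = defaultdict(int)    # instances per true class
    correct = defaultdict(int)  # correctly classified instances per true class
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1

    per_class_recall = [correct[label] / total[label] for label in total]
    return sum(per_class_recall) / len(per_class_recall)


# Hypothetical toy labels for a two-class problem, not data from the study.
truth = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
pred = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]
print(macro_averaged_recall(truth, pred))
```

Because each class contributes equally to the average, a classifier cannot reach a high value simply by favoring the most frequent class, which matters when corpora differ in their number of classes.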
The results show that sociolinguistic aspects affect classification results.