Lex E., Juffinger A. and Granitzer M. (2010). A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs. DEXA Workshops 2010.
Abstract: In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are needed to support blog search users to filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. Also, we assess the emotionality facet in news related blogs to enable users to identify people’s feelings towards specific events. Our approach is to evaluate the performance of text classifiers with lexical and stylometric features to determine the best performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
Jin-Cheon Na, Tun Thura Thet (2009). Effectiveness of web search results for genre and sentiment classification. J. Information Science 35(6): 709-726.
Abstract: The motivation of this study is to enhance general topical search with a sentiment-based one where the search results (snippets) returned by the web search engine are clustered by sentiment categories. Firstly we developed an automatic method to identify product review documents using the snippets (summary information that includes the URL, title, and summary text), which is genre classification. Then the identified snippets were automatically classified into positive (recommended) and negative (non-recommended) documents, which is sentiment classification. Thereafter the user may directly decide to access the positive or negative review documents. In this study we used only the snippets rather than their original full-text documents, and applied a common machine learning technique, SVM (support vector machine), and heuristic approaches to investigate how effectively the snippets can be used for genre and sentiment classification. The results show that the web search engine should improve the quality of the snippets especially for opinionated documents (i.e. review documents).