Post signed by: Alexander Osherenko, University of Augsburg
I gained comprehensive knowledge of emotion recognition in texts while working on my PhD thesis “Opinion mining and lexical affect sensing” (http://www.informatik.uni-augsburg.de/~osherenk/promotionsvortrag_english.pdf). In my opinion, this knowledge can be applied to identifying the genres of texts, because I don’t think identifying emotions differs much from identifying genres.
There are two basic categories of approaches to recognizing emotions in texts: the semantic approach and the statistical approach. There are also other categories, for example, information fusion. However, for simplicity I discuss only the first two.
In the semantic approach, specific patterns in parts of a text are identified and used as cues of opinions. Jan Wiebe has done much work on identifying affective patterns: she uses emotional corpora to compose a list of emotional patterns. In the genre-related case, you would use a genre-related corpus to construct this list and identify genre-specific patterns. For example, the pattern “<verb> roses” can match the cue “throw roses”, which points to the genre “love story” (excuse me for this example).
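As a minimal sketch of the semantic approach, the following Python snippet matches a hand-written list of genre cue patterns against a text. Everything in it is an assumption made for illustration: the pattern list, the second genre label, the small verb list, and the function names `compile_pattern` and `find_genre_cues` are hypothetical. In practice the list would be derived from a genre-related corpus, and the `<verb>` slot would more likely be filled by a part-of-speech tagger than by a fixed word list.

```python
import re

# Hypothetical verb list standing in for a real part-of-speech tagger.
VERBS = ["throw", "throws", "threw", "give", "gives", "gave",
         "arrest", "arrests", "arrested"]

# Hypothetical cue patterns: (pattern template, genre).
# "<verb>" is a slot that any word from VERBS may fill.
CUE_PATTERNS = [
    ("<verb> roses", "love story"),
    ("<verb> the suspect", "crime story"),
]


def compile_pattern(template):
    """Turn a template such as '<verb> roses' into a compiled regex."""
    verb_group = "(?:" + "|".join(re.escape(v) for v in VERBS) + ")"
    # Escape the literal parts, then put the verb alternatives in each slot.
    literal_parts = [re.escape(part) for part in template.split("<verb>")]
    return re.compile(r"\b" + verb_group.join(literal_parts) + r"\b",
                      re.IGNORECASE)


def find_genre_cues(text):
    """Return a list of (matched cue, genre) pairs found in the text."""
    cues = []
    for template, genre in CUE_PATTERNS:
        for match in compile_pattern(template).finditer(text):
            cues.append((match.group(0).lower(), genre))
    return cues


print(find_genre_cues("He threw roses at her feet."))
```

A single matched cue is only weak evidence, so a real system would probably count the cues per genre and pick the genre with the most matches.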
The other case is the statistical approach. Here, you compose datasets of features and classify them using mathematical algorithms, for example, NaiveBayes or SVM. The only difference from emotion recognition is that the classes to be recognized are genres rather than emotions; you then see what happens. I would extract the same features for identifying genres as for emotion recognition: lexical (Bag-Of-Words), grammatical, stylometric, and deictic. If you ask why I suggest using, for example, stylometric features to identify genres, I remind you of the work of the psychologist Pennebaker, who argues that function words can also be used to express meaning.
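To make the statistical approach concrete, here is a minimal sketch of a Naive Bayes classifier over Bag-Of-Words features, written from scratch in plain Python. It is not the setup used in the thesis. The class name `NaiveBayesGenreClassifier`, the whitespace tokenizer, the add-one smoothing, and the four toy training sentences are all assumptions chosen for illustration, and only lexical features are used (no grammatical, stylometric, or deictic ones).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple Bag-Of-Words tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesGenreClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, genres):
        self.doc_counts = Counter(genres)        # documents per genre
        self.word_counts = defaultdict(Counter)  # word frequencies per genre
        for text, genre in zip(texts, genres):
            self.word_counts[genre].update(tokenize(text))
        self.vocabulary = {word
                           for counts in self.word_counts.values()
                           for word in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_genre, best_score = None, -math.inf
        for genre, n_docs in self.doc_counts.items():
            counts = self.word_counts[genre]
            total_words = sum(counts.values())
            # Log prior of the genre plus the log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1)
                                  / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Toy training data, invented for this example.
train_texts = [
    "he gave her roses and a kiss",
    "their love grew with every letter",
    "the detective questioned the suspect",
    "police found the weapon near the body",
]
train_genres = ["love story", "love story", "crime story", "crime story"]

classifier = NaiveBayesGenreClassifier().fit(train_texts, train_genres)
print(classifier.predict("she sent him roses"))
print(classifier.predict("the police questioned the suspect"))
```

Switching from emotion recognition to genre identification changes only the labels passed to `fit`; the feature extraction and the classifier stay the same, which is the point made above.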
Which approach is better for the analysis of genre? In the context of opinion mining, the semantic approach was more beneficial for identifying emotions in short texts (a sentence), while the statistical approach was more beneficial for analyzing long texts (more than 200 words). I assume that the same applies to genre identification: the genre of short texts should be analyzed with the semantic approach, whereas the statistical approach can classify the genres of long texts.