Riding the Rough Waves of Genre on the Web
Concepts and Research Questions
Marina Santini, Alexander Mehler, Serge Sharoff

In: Genres on the Web: Computational Models and Empirical Studies
Alexander Mehler, Serge Sharoff and Marina Santini
Text, Speech and Language Technology
Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9

1 Why is Genre Important?
Genre, in the most generic definition, takes the meaning “kind; sort; style” (OED). A more specialised definition of genre in the OED reads: “A particular style or category of works of art; esp. a type of literary work characterised by a particular form, style, or purpose.” Similar definitions are found in other dictionaries; for instance, the OALD reads “a particular type or style of literature, art, film or music that you can recognise because of its special features”. Broadly speaking, then, generalising from lexicographic definitions, genre can be seen as a classificatory principle based on a number of characterising attributes. Traditionally, it was Aristotle, in his attempt to classify existing knowledge, who started genre analysis and defined some attributes for genre classification. Aristotle sorted literary production into different genre classes by focussing on the attributes of purpose and conventions. After him, through the centuries, countless definitions and attributes of the genre of written documents have been provided in differing fields, including literary criticism, linguistics, and library and information science. With the advent of digital media, especially in the last 15 years, the potential of genre for practical applications in language technology and information technology has been vigorously emphasised by scholars, researchers and practitioners. But why is genre important? The short answer is: because it reduces cognitive load by triggering expectations through a number of conventions. Put another way, genres can be seen as sets of conventions that transcend individual texts and create frames of recognition governing document production, recognition and use. Conventions are regularities that affect information processing in a repeatable manner. Regularities engage predictions about the “type of information” contained in the document. Predictions allow humans to identify the communicative purposes and the context underlying a document.
Communicative purposes and context are two important principles of human communication and interaction. In this respect, genre is an implicit way of providing background information and suggesting the cognitive requirements needed to understand a text. For instance, if we read a sequence of short questions and brief answers (conventions), we might surmise that we are reading FAQs (genre); we then realize that the purpose of the document is to instruct or inform us (expectations) about a particular topic or event of interest. When we are able to identify and name a genre thanks to a recurrent set of regular traits, the functions of the document and its communicative context immediately build up in our mind. Essentially, knowing the genre to which a text belongs leads to predictions concerning form, function and context of communication. All these properties together define what Bateman (2008: 196) calls “the most important theoretical property” of genre for empirical study, namely the power of predictivity. The potential of predictivity is certainly highly attractive when the task is to come to terms with the overwhelming mass of information available on the web.

1.1 Zooming In: Information on the Web

The immense quantity of information on the web is the most tangible benefit (and challenge) that the new medium has bestowed on us as web users. This wealth of information is available either by typing a URL (suggested by other web-external or web-internal sources) or by typing a few keywords (the query) in a search box. The web can be seen as the Eldorado of information seekers. However, if we zoom in a little and focus our attention on the most common web documents, i.e. written texts, we realize that finding the “right” information for one’s need is not always straightforward. Indeed, a common complaint is that users are overwhelmed by huge amounts of data and are faced with the challenge of finding the most relevant and reliable information in a timely manner. For some queries we can get thousands of hits. Currently, commercial search engines (like Google and Yahoo!) do not provide any hint about the type of information contained in these documents. Web users may intuit that the documents in the result list contain a topic that is relevant to their query. But what about other dimensions of communication? As a matter of fact, Information Retrieval (IR) research and products are currently trying to provide other dimensions. For instance, some commercial search engines provide specialised facilities, like Google Scholar or Google News. IR research is also active in plagiarism detection, in the identification of the context of interaction and search, in the identification of the “sentiment” contained in a text, and in other aspects affecting the reliability, trust, reputation and, in a word, the appropriateness of a certain document for a certain information need. Still, there are a number of other dimensions that have been little explored on the web for retrieval tasks. Genre is one of these. The potential of genre to improve information seeking and reduce information overload was highlighted a long time ago by Karlgren and Cutting (1994) and Kessler et al. (1997).

