Book Review: Genres on the Web (Mehler et al. 2010)

Post signed by: Michael Oakes, University of Sunderland and Uni Research, Bergen.

Excerpts from the book review: Alexander Mehler, Serge Sharoff and Marina Santini (eds), Genres on the Web: Computational Models and Empirical Studies. Springer, 2010, 362 pp.

Language Resources and Evaluation, Volume 45 / 2011

This comprehensive book makes many original contributions to the field of genres on the web. The identification and characterization of genres is of obvious interest to “pure” linguistics, but as this book makes clear, there are some important practical applications. Chief amongst these will be the advent of genre-aware search engines, where users will be able to specify not only their topics of interest, but also the desired genre of the returned web pages, as in the WEGA search engine described in this book by Stein, Meyer zu Eissen and Lipka in Chapter 8. In Chapter 4, Crowston, Kwaśnik and Rubleske give the example of someone wishing to buy a digital camera. A traditional search engine would return pages on the topic of the specified brand of digital cameras, most of which will just be the web sites of sellers. But what the buyer really wants is information about this type of camera in certain genres only, such as product reviews and opinion-bearing blogs, which provide the opinions of people who have already bought that camera. The idea of genre-aware search engines is already commercially viable, as shown by Google Scholar and Google News, but existing systems tend to work for only one genre. What is needed are search engines that can cope with the entire likely range (or “palette”) of genres that users might ask for, where a genre palette is simply a set of genres. The term “palette” was first introduced by Jussi Karlgren to denote the set or list of genres that were used in his quest to find the ideal genres to use in automatic classification algorithms.

This is not a textbook on the construction of standard search engines, but it tells you everything else you need to know (or at least, how to find out what you need to know) to make a search engine aware of genre. The book shows how such search engines would move beyond the “bag of words” model traditionally used, to include more linguistically-motivated features, aspects of the visual layout, analysis of the links between web pages, and the relationships between the writers and readers of material on the web.

There is no universally-agreed definition of “genre”. In Chapter 6, Kim and Ross describe genres in terms of “forms of dissemination”, such as scientific papers, emails, blogs, news reports, FAQ pages, job descriptions, editorials, and reportage. Other examples of genres are calls for papers, sitemaps, CVs, syllabuses, and e-shops, where e-shops are an example of a newly emerged genre on the web. Several authors use an “ethnographic” definition (see Chapter 13 by Paolillo, Warren and Kunz), where writers and readers create shared expectations about the form and content of the pages. Writing is made easier since the writer knows what the readers expect, and the cognitive burden of reading and understanding is lessened since the readers know what they are looking for. Thus Karlgren (Chapter 2) states that genre is “a form of implicit agreement between readership and authorship”, bringing them closer together. Stein et al. say that genre provides information related to the document’s form, purpose, and intended audience. […]

Overall I strongly recommend this book. It will appeal to linguists looking at new types of language emerging on the web, corpus linguists who wish to build genre-based corpora, and to information retrieval specialists who wish to go further than the current limits of web search. It is destined to become a classic research text, especially if genre-aware search engines “take off”.
