Abstract: Problems in the Use-Centered Development of a Taxonomy of Web Genres

by Kevin Crowston, Barbara Kwasnik and Joseph Rubleske

In: Genres on the Web: Computational Models and Empirical Studies
Alexander Mehler, Serge Sharoff and Marina Santini
Text, Speech and Language Technology
Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9

A document’s genre reflects the purpose of a document and as such is potentially useful meta-data to improve search effectiveness. Using genre in an information retrieval system seems to require a taxonomy of genres to provide a controlled vocabulary and to show relations among genres. In this chapter, we report on a study to develop a ‘bottom-up’ genre taxonomy, that is, one built from the genre terms identified by informants. We collected a total of 767 genre terms from 52 respondents (teachers, journalists and engineers) engaged in natural use of the Web, and reduced this list to a set of 298 genres. We report on various difficulties we encountered in the study. Respondents frequently had difficulty coming up with an unambiguous genre label for a page, offering several possibilities, or applied the same label to many pages. In many cases, respondents could not think of a term, or applied an overly general term, such as an information page. Furthermore, even when respondents did offer a clear genre term, they often were unable to say what about the page led to that choice. These difficulties seem to reflect underlying problems in the definition of genres as social constructions that have meaning only in use.

1 Introduction
Web search engines such as Google or Yahoo determine relevance of Web pages according to the occurrence of words in the pages indexed by the engine (additional information is then used to rank these results). Unfortunately, such searches are not always sufficient to solve information needs, since task-driven searchers often must distinguish between documents that share a set of keywords (i.e., a topic) but assume a different form to serve a different purpose or function. For example, before purchasing a digital camera, an individual may want to read reviews from online magazines and see the blogs in which people who have used this camera express their opinions and personal stories.

