Abstract: Identification of Web Genres by User Warrant

by Mark A. Rosso and Stephanie W. Haas

In: Genres on the Web: Computational Models and Empirical Studies
Alexander Mehler, Serge Sharoff and Marina Santini
Text, Speech and Language Technology
Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9

The use of genre metadata has been proposed as a potentially beneficial supplement to general web search engines. A key issue in this solution is the selection of genre labels and definitions for web pages. What genres should be used in a general search engine? How are these genres to be identified? What are effective methodologies for collecting user terminology for the purpose of deriving web page genre labels? Three criteria for effective labels are proposed. In light of these criteria, traditional genre theory is applied to the web. The existing research literature is examined, focusing on the results of a series of studies in which the feedback of almost 300 users was solicited for the purpose of building a classification of genre labels for web pages from the .edu Internet domain. The chapter includes discussion of the implications of our findings for future studies of web genre, including recommendations for best practice.

1 Introduction
Genre is seen by many as a promising enhancement to the process of web search. The capability to specify or exclude certain types of web pages during a search is intuitively appealing. Historically, document type has proven to be a useful tool for document retrieval. […] A genre recognized as relevant to the user’s information need could be part of the user’s query formulation. For example, a user could specify that only documents of that genre be included in the search results; or, a user might decide to exclude from the search results documents of a genre deemed not to be useful. In either case, document genre is being used to constrain the search space, with the intent of improving the search results. In essence, part of the users’ task of filtering search results would be taken on by the system.

