
The WebGenre Blog: The power of genre applied to digital information. By Marina Santini

Seminar – Towards Contextualized Information: How Automatic Genre Identification Can Help

Seminar Series, Laboratory for Cognition, Interaction and Language Technology (CILTLab), Linköping University, Linköping, Sweden, Tuesday 28 August 2012

Abstract: Genre is one of the textual dimensions that can be used to reconstruct the communicative context needed to assess the value of information with respect to a purpose (business, learning, finding, monitoring, predicting, etc.). When we know the genre of a text, we can infer the CONTEXT in which the text was created and the purpose it was created for. We can therefore decide more confidently whether a text contains the information we are looking for. For example, factual texts might be more credible than opinionated texts. In this respect, genres such as press conferences, declarations or announcements by a White House spokesman might be more reliable than subjective genres, e.g. newspaper editorials or op-ed articles. On the …

Filed under: abstracts, announcements, seminars

Towards Language-Independent Web Genre Detection (2009)

Poster paper by Philipp Scholl, Renato Domínguez García, Doreen Böhnstedt, Christoph Rensing and Ralf Steinmetz

The term web genre denotes the type of a given web resource, in contrast to the topic of its content. In this research, we focus on recognizing the web genres blog, wiki and forum. We present a set of features that exploit the hierarchical structure of the web page's HTML mark-up and thus, in contrast to related approaches, do not depend on a linguistic analysis of the page's content. Our results show that it is possible to achieve very good accuracy for a fully language-independent detection of structured web genres. …
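The abstract does not list the paper's actual features. As a rough illustration of what features that "exploit the hierarchical structure of the HTML mark-up" can look like, the Python sketch below uses the standard library's `html.parser` to compute a few mark-up-only statistics: maximum nesting depth, total tag count, and the relative frequency of some watched tags. The feature set and the names `structural_features` and `MarkupStructure` are hypothetical. Because the text between tags is never inspected, the result is the same whatever language the page is written in.

```python
from collections import Counter
from html.parser import HTMLParser

# HTML void elements never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MarkupStructure(HTMLParser):
    """Records tag counts and nesting depth; the page's text content is ignored."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            # Clamp at zero so stray closing tags in messy HTML cannot go negative.
            self.depth = max(0, self.depth - 1)


def structural_features(html, watched=("form", "table", "li", "a")):
    """Hypothetical feature vector: max nesting depth, tag count, and the
    relative frequency of a few watched tags."""
    parser = MarkupStructure()
    parser.feed(html)
    parser.close()
    n_tags = sum(parser.tags.values())
    features = {"max_depth": parser.max_depth, "n_tags": n_tags}
    for tag in watched:
        features["frac_" + tag] = parser.tags[tag] / n_tags if n_tags else 0.0
    return features


page = ("<html><body><div class='post'><p>Hallo<br>Welt</p></div>"
        "<form><input></form></body></html>")
print(structural_features(page))
```

Replacing the German words with English ones leaves every feature unchanged, which is the sense in which such features are language-independent.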

Filed under: abstracts

Abstract: Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News Article

Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News Article by Ian Bruce. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: Cognitive science proposes that any category, such as a genre as a category for a certain type of text, is formed in relation to human purpose or intentionality. Grouped in relation to three types of high-level, general purpose for academic writing, Young posits three broad categories of genre: those of personal discourse (such as diaries, journals, notebooks); interactive discourse (letters, emails, fora in publications and other written messages) and public discourse (articles, reports, presentations). However, an outcome of internet-based communication and publication has often been to conflate these general types of …

Filed under: abstracts

Abstract: Variation Among Blogs: A Multi-dimensional Analysis

Variation Among Blogs: A Multi-dimensional Analysis by Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: This chapter uses multi-dimensional analysis to investigate functional linguistic variation in internet blogs, with the goal of identifying text types that are distinguished linguistically. A 2-million-word corpus of blogs written in American English, sampled across a wide range of topics, is analyzed for this purpose. The corpus is tagged for grammatical information and a factor analysis is carried out to identify the major linguistic patterns of co-occurrence across this corpus. The resultant factors are interpreted as underlying dimensions of functional linguistic variation. The dimensions are subsequently used as predictors in a …
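In Biber-style multi-dimensional analysis, once the factor analysis has identified which features load on a dimension, each text receives a dimension score: feature frequencies are normalised per 1,000 words, standardised across the corpus, and the standardised values of the positively loading features are summed, minus the sum for the negatively loading ones. The sketch below shows only that scoring step, assuming the loadings are already known; the factor analysis itself is omitted, and the feature names and counts are invented for illustration.

```python
from statistics import mean, pstdev


def normalise(counts, n_words, per=1000):
    """Convert raw feature counts to rates per `per` words, so texts of
    different lengths are comparable."""
    return {feature: count * per / n_words for feature, count in counts.items()}


def dimension_scores(texts, positive, negative):
    """Score each text on one dimension: the sum of standardised frequencies of
    positively loading features minus that of negatively loading features.

    `texts` is a list of dicts mapping feature name -> normalised frequency.
    """
    stats = {}
    for feature in positive + negative:
        values = [t[feature] for t in texts]
        stats[feature] = (mean(values), pstdev(values))

    def z(text, feature):
        m, sd = stats[feature]
        return 0.0 if sd == 0 else (text[feature] - m) / sd

    return [sum(z(t, f) for f in positive) - sum(z(t, f) for f in negative)
            for t in texts]


# Hypothetical blogs: (raw feature counts, length in words).
blogs = [
    normalise({"first_person": 10, "contractions": 6, "nouns": 100}, 1000),
    normalise({"first_person": 40, "contractions": 24, "nouns": 400}, 2000),
    normalise({"first_person": 30, "contractions": 18, "nouns": 300}, 1000),
]
scores = dimension_scores(blogs, positive=["first_person", "contractions"],
                          negative=["nouns"])
print(scores)
```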

Filed under: abstracts

Abstract: Genre Emergence in Amateur Flash

Genre Emergence in Amateur Flash by John C. Paolillo, Jonathan Warren and Breanne Kunz. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: Research on genre emergence in digital media often characterizes the emergence of new genres using notions of "community" and "social interaction". In this chapter, we attempt to provide empirical content to these notions by employing a social network approach. We examine Flash animations posted to Newgrounds.com, in terms of both genre features and favorite author nominations. Results indicate that participants' social network positions are strongly associated with the genres of Flash they produce. We argue from these findings that the social network positions of Flash authors contribute to the establishment of genre norms, and that …
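The abstract does not say which network measures were used. One simple notion of an author's position in a nomination network is normalised in-degree centrality: the fraction of other authors who named that author as a favorite. The sketch below computes it from (nominator, nominee) pairs and averages it per genre, a crude way to look for an association between network position and genre. The choice of measure, the function names, the author names and the genre labels are all assumptions.

```python
from collections import defaultdict


def in_degree_centrality(nominations):
    """Normalised in-degree: the share of other authors who nominated each author.

    `nominations` is an iterable of (nominator, nominee) pairs; duplicate pairs
    and self-nominations do not add to the count.
    """
    authors = set()
    received = defaultdict(int)
    for nominator, nominee in set(nominations):
        authors.update((nominator, nominee))
        if nominator != nominee:
            received[nominee] += 1
    if len(authors) < 2:
        return {a: 0.0 for a in authors}
    return {a: received[a] / (len(authors) - 1) for a in authors}


def mean_centrality_by_genre(centrality, genre_of):
    """Average centrality of the authors working in each genre."""
    groups = defaultdict(list)
    for author, genre in genre_of.items():
        groups[genre].append(centrality.get(author, 0.0))
    return {genre: sum(v) / len(v) for genre, v in groups.items()}


# Hypothetical nomination network and genre assignments.
nominations = [("ann", "cy"), ("bo", "cy"), ("cy", "ann"), ("di", "cy")]
genre_of = {"ann": "game", "bo": "game", "cy": "movie", "di": "movie"}
cent = in_degree_centrality(nominations)
print(mean_centrality_by_genre(cent, genre_of))
```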

Filed under: abstracts

Abstract: Mining Graph Patterns in Web-based Systems: A Conceptual View

Mining Graph Patterns in Web-based Systems: A Conceptual View by Matthias Dehmer and Frank Emmert-Streib. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: This chapter discusses a graph-based perspective for automatically analyzing web genre data by mining graph patterns representing web-based hypertext structures. The major purpose of our contribution is to emphasize that an approach entirely different to the vector space model, frequently used in Web mining and related problems, can not only be applied to these problems but is more suitable conceptually. The graphs in our study are hierarchical and directed and are called generalized trees. Starting from a similarity measure for determining the structural similarity of generalized trees, we discuss some evaluation steps for automatically …
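The abstract does not give the chapter's similarity measure for generalized trees, and the sketch below does not reproduce it. As a much simpler stand-in, it compares two hierarchical site structures, written as nested (label, children) tuples, by the Jaccard similarity of their sets of root-to-node label paths, so the comparison is made on the hierarchy itself rather than on a vector of term weights. The site structures, labels and function names are hypothetical; because paths are collected in sets, repeated sibling labels are counted once.

```python
def label_paths(node, prefix=()):
    """Return the set of root-to-node label paths of a tree given as
    (label, [children]) tuples."""
    label, children = node
    path = prefix + (label,)
    paths = {path}
    for child in children:
        paths |= label_paths(child, path)
    return paths


def path_jaccard(tree_a, tree_b):
    """Jaccard similarity of the two trees' label-path sets (1.0 = same paths)."""
    a, b = label_paths(tree_a), label_paths(tree_b)
    # The union is never empty: each tree contributes at least its root path.
    return len(a & b) / len(a | b)


# Hypothetical site hierarchies.
site_a = ("home", [("blog", [("post", [])]), ("about", [])])
site_b = ("home", [("blog", [("post", []), ("archive", [])])])
print(path_jaccard(site_a, site_b))  # 0.6
```

The two sites share three of five distinct paths (home, home/blog, home/blog/post), hence 0.6.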

Filed under: abstracts

Abstract: Classification of Web Sites at Super-genre Level

Classification of Web Sites at Super-genre Level by Christoph Lindemann and Lars Littig. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: We present an approach for the classification of Web sites at supergenre level. This approach utilizes both structure and content of Web sites in order to distinguish between eight relevant Web genres. We show that this combination of structural and content-based features considerably improves the classification performance compared to approaches solely based on structure or content. We evaluate our approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known pages. The approach achieves an accuracy of 92% for the classification of these Web sites. …
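The abstract describes neither the features nor the classifier. The sketch below only illustrates the general idea of combining structural and content-based features: each site is represented by one concatenated vector and assigned to the genre with the nearest training centroid. Nearest-centroid classification is a deliberately simple, swapped-in choice, and the features, genre labels and values are invented.

```python
from math import dist


def combined_vector(structural, content):
    """Concatenate structure-based and content-based features into one vector."""
    return list(structural) + list(content)


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def classify(train, x):
    """Assign `x` to the genre whose training centroid is nearest (Euclidean)."""
    centroids = {genre: centroid(vs) for genre, vs in train.items()}
    return min(centroids, key=lambda genre: dist(centroids[genre], x))


# Hypothetical features: [share of pages linking to a cart, share of dated pages]
# from structure, [weight of the term "price"] from content.
train = {
    "shop": [combined_vector([0.9, 0.1], [0.8]), combined_vector([0.7, 0.1], [0.6])],
    "blog": [combined_vector([0.1, 0.8], [0.1]), combined_vector([0.2, 0.6], [0.3])],
}
print(classify(train, combined_vector([0.8, 0.2], [0.7])))  # shop
```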

Filed under: abstracts

Abstract: Marrying Relevance and Genre Rankings: an Exploratory Study

Marrying Relevance and Genre Rankings: an Exploratory Study by Pavel Braslavski. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. We propose a method for automatically extracting a formality score, akin to a readability score, using canonical discriminant analysis applied to a sample of genres of decreasing formality. Effects of aggregating genre-related and text-relevance rankings are considered. Evaluation of the results shows moderate positive effects. Findings suggest that further research is needed on the implicit use of genre-related information in Web search. …
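The abstract does not specify how the two rankings were aggregated. One common, simple option is to min-max normalise both scores and rank by a weighted linear combination; the sketch below does this, assuming each document already has a text-relevance score and a formality score (computing the latter by canonical discriminant analysis is not shown). It also assumes, for illustration only, that more formal documents should be boosted; the weight of 0.3 and the example scores are arbitrary.

```python
def min_max(scores):
    """Rescale a {doc: score} mapping to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_rankings(relevance, formality, weight=0.3):
    """Rank documents by (1 - weight) * relevance + weight * formality, both
    min-max normalised. Documents without a formality score get 0 for it."""
    rel, form = min_max(relevance), min_max(formality)
    combined = {doc: (1 - weight) * rel[doc] + weight * form.get(doc, 0.0)
                for doc in rel}
    return sorted(combined, key=combined.get, reverse=True)


relevance = {"d1": 12.0, "d2": 11.5, "d3": 4.0}
formality = {"d1": -1.0, "d2": 2.0, "d3": 0.5}
print(merge_rankings(relevance, formality))              # ['d2', 'd1', 'd3']
print(merge_rankings(relevance, formality, weight=0.0))  # ['d1', 'd2', 'd3']
```

With `weight=0.0` the order is by relevance alone; raising the weight lets the nearly-as-relevant but more formal `d2` overtake `d1`.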

Filed under: abstracts

Abstract: Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues

Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues by Benno Stein, Sven Meyer zu Eissen and Nedim Lipka. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: People who search the World Wide Web often have a multi-faceted understanding of their information need: they know what they are searching for, and they know what form or type the desired documents should have. The former aspect relates to the content of a desired document (= topic), the latter to the presentation of its content and the intended target group. …

Filed under: abstracts

Abstract: In the Garden and in the Jungle: Comparing Genres in the BNC and Internet

In the Garden and in the Jungle: Comparing Genres in the BNC and Internet by Serge Sharoff. In: Alexander Mehler, Serge Sharoff and Marina Santini (eds.), Genres on the Web: Computational Models and Empirical Studies. Text, Speech and Language Technology, Volume 42, 2011. DOI: 10.1007/978-90-481-9178-9

Abstract: In this chapter I will present an approach to classifying the Web into genres. The goal is to have a compact system of categories that can be assigned with little ambiguity to almost every webpage. The proposed typology is organised from the functional viewpoint: generalised categories for genre classification correspond to major aims of text production, such as ‘discussion’ or ‘instruction’. This chapter compares the genre distributions in English and Russian automatically constructed Internet corpora against their human-collected counterparts (BNC and RNC) in terms of these classes using probabilistic classifiers. …
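The abstract mentions probabilistic classifiers without naming them. A minimal member of that family is a multinomial naive Bayes model over word counts with add-one smoothing, sketched below with the functional labels ‘discussion’ and ‘instruction’. The tiny training set is invented, the tokeniser is a bare whitespace split, and nothing here reflects the chapter's actual features or training data.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial naive Bayes model from (label, text) pairs."""
    doc_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()
        doc_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, token_counts, vocab


def posterior(model, text):
    """Return P(label | text) for every label, using add-one smoothing."""
    doc_counts, token_counts, vocab = model
    n_docs = sum(doc_counts.values())
    log_scores = {}
    for label, n in doc_counts.items():
        total = sum(token_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are skipped
                score += math.log((token_counts[label][token] + 1)
                                  / (total + len(vocab)))
        log_scores[label] = score
    # Normalise in log space (subtract the max) to avoid underflow.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


model = train_nb([
    ("instruction", "first click the button then save the file"),
    ("instruction", "open the menu and select install"),
    ("discussion", "i think this argument is wrong"),
    ("discussion", "in my view the policy seems unfair"),
])
print(posterior(model, "click the menu then select save"))
```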

Filed under: abstracts