
The WebGenre Blog: The power of genre applied to digital information. By Marina Santini » Archive

Working Paper: A Web Corpus for eCare: Collection, Annotation and Learning – Preliminary Results

We are building a small corpus of medical documents from the web. The sublanguage used in these web documents has been annotated as “lay” or “specialised” by two annotators (annotation is still ongoing). We would like to use this corpus for bootstrapping (semi-supervised and weakly supervised learning), for lay-specialised terminology extraction, for the automatic identification of related terms, and for similar tasks. If you have suggestions or pointers to interesting new directions in this field of research, we would gladly hear from you. The abstract and a link to the paper follow. A Web Corpus for eCare: Collection, Annotation and Learning – Preliminary Results – DRAFT: 20 March 2017, by Marina Santini, Marjan Alirezai, Mikael Nyström, and Arne Jönsson. Abstract: We present eCare Sv, Beta, a small corpus of web documents written in Swedish. The content of … Read entire article »

Filed under: abstracts

Seminar – Towards Contextualized Information: How Automatic Genre Identification Can Help

Seminar Series, Laboratory for Cognition, Interaction and Language Technology (CILTLab), Linköping University, Linköping, Sweden, Tuesday 28 August 2012. Abstract: Genre is one of the textual dimensions that can be used to reconstruct the communicative context needed to assess the value of information with respect to a purpose (business, learning, finding, monitoring, predicting, etc.). When we know the genre of a text, we can surmise the context in which the text was created and the purpose it was meant to serve. We can therefore decide more confidently whether a text contains the information we are looking for. For example, factual texts might have more credibility than opinionated texts. In this respect, genres such as press conferences, declarations or announcements by a White House spokesman might be more reliable than subjective genres, e.g. newspaper editorials or op-ed articles. On the … Read entire article »

Filed under: abstracts, announcements, seminars

Towards Language–Independent Web Genre Detection (2009)

Poster paper by: Philipp Scholl, Renato Domínguez García, Doreen Böhnstedt, Christoph Rensing, Ralf Steinmetz. The term web genre denotes the type of a given web resource, in contrast to the topic of its content. In this research, we focus on recognizing the web genres blog, wiki and forum. We present a set of features that exploit the hierarchical structure of the web page’s HTML mark-up and thus, in contrast to related approaches, do not depend on a linguistic analysis of the page’s content. Our results show that it is possible to achieve very good accuracy for a fully language-independent detection of structured web genres. … Read entire article »
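To make the idea of structure-only features concrete, here is a minimal Python sketch. It is not the feature set from the poster paper: the three features (`max_depth`, `link_ratio`, `list_item_ratio`) and the helper name `extract_features` are assumptions chosen for illustration. The sketch only looks at HTML tags and nesting, never at the words on the page, which is what makes this kind of feature language independent.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StructureFeatures(HTMLParser):
    """Collects tag counts and the maximum nesting depth of a page."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def extract_features(html):
    """Hypothetical feature extractor: uses mark-up only, no text analysis."""
    parser = StructureFeatures()
    parser.feed(html)
    parser.close()
    total = sum(parser.tag_counts.values()) or 1  # avoid division by zero
    return {
        "max_depth": parser.max_depth,
        "link_ratio": parser.tag_counts["a"] / total,
        "list_item_ratio": parser.tag_counts["li"] / total,
    }


page = "<html><body><ul><li><a href='#'>x</a></li><li>y</li></ul><br></body></html>"
features = extract_features(page)
```

A feature dictionary like this could be fed to any standard classifier; which classifier and which features work best for blogs, wikis and forums is a question the paper itself addresses.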

Filed under: abstracts

Abstract: Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News article

Evolving Genres in Online Domains: The Hybrid Genre of the Participatory News article, by Ian Bruce. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: Cognitive science proposes that any category, such as a genre as a category for a certain type of text, is formed in relation to human purpose or intentionality. Grouped in relation to three types of high-level, general purpose for academic writing, Young posits three broad categories of genre: those of personal discourse (such as diaries, journals, notebooks); interactive discourse (letters, emails, fora in publications and other written messages) and public discourse (articles, reports, presentations). However, an outcome of internet-based communication and publication has often been to conflate these general types of … Read entire article »

Filed under: abstracts

Abstract: Variation Among Blogs: A Multi-dimensional Analysis

Variation Among Blogs: A Multi-dimensional Analysis, by Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: This chapter uses multi-dimensional analysis to investigate functional linguistic variation in internet blogs, with the goal of identifying text types that are distinguished linguistically. A 2-million-word corpus of blogs written in American English, sampled across a wide range of topics, is analyzed for this purpose. The corpus is tagged for grammatical information and a factor analysis is carried out to identify the major linguistic patterns of co-occurrence across this corpus. The resultant factors are interpreted as underlying dimensions of functional linguistic variation. The dimensions are subsequently used as predictors in a … Read entire article »

Filed under: abstracts

Abstract: Genre Emergence in Amateur Flash

Genre Emergence in Amateur Flash, by John C. Paolillo, Jonathan Warren and Breanne Kunz. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: Research on genre emergence in digital media often characterizes the emergence of new genres using notions of “community” and “social interaction”. In this chapter, we attempt to provide empirical content to these notions by employing a social network approach. We examine Flash animations posted to Newgrounds.com, in terms of both genre features and favorite author nominations. Results indicate that participants’ social network positions are strongly associated with the genres of Flash they produce. We argue from these findings that the social network positions of Flash authors contribute to the establishment of genre norms, and that … Read entire article »

Filed under: abstracts

Abstract: Mining Graph Patterns in Web-based Systems: A Conceptual View

Mining Graph Patterns in Web-based Systems: A Conceptual View, by Matthias Dehmer and Frank Emmert-Streib. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: This chapter discusses a graph-based perspective for automatically analyzing web genre data by mining graph patterns representing web-based hypertext structures. The major purpose of our contribution is to emphasize that an approach entirely different to the vector space model, frequently used in Web mining and related problems, can not only be applied to these problems but is more suitable conceptually. The graphs in our study are hierarchical and directed and are called generalized trees. Starting from a similarity measure for determining the structural similarity of generalized trees, we discuss some evaluation steps for automatically … Read entire article »

Filed under: abstracts

Abstract: Classification of Web Sites at Super-genre Level

Classification of Web Sites at Super-genre Level, by Christoph Lindemann and Lars Littig. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: We present an approach for the classification of Web sites at super-genre level. This approach utilizes both structure and content of Web sites in order to distinguish between eight relevant Web genres. We show that this combination of structural and content-based features considerably improves the classification performance compared to approaches solely based on structure or content. We evaluate our approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known pages. The approach achieves an accuracy of 92% for the classification of these Web sites. … Read entire article »

Filed under: abstracts

Abstract: Marrying Relevance and Genre Rankings: an Exploratory Study

Marrying Relevance and Genre Rankings: an Exploratory Study, by Pavel Braslavski. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. A method for automatic extraction of a formality score, akin to a readability score, is proposed; it uses canonical discriminant analysis applied to a sample of genres with decreasing formality. Effects of aggregating genre-related and text-relevance rankings are considered. Evaluation of the results shows moderate positive effects. Findings suggest that further research is needed on implicit use of genre-related information in Web search. … Read entire article »
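As a rough illustration of what merging two rankings can look like, the Python sketch below combines a text-relevance score and a genre-related score with a weighted sum. This is one common aggregation scheme and not necessarily the one used in the chapter; the function name `merge_rankings`, the `weight` parameter and the example scores are all assumptions made for this sketch.

```python
def merge_rankings(relevance, genre, weight=0.7):
    """Rank documents by a weighted sum of two scores.

    relevance, genre: dicts mapping a document id to a score in [0, 1].
    weight: share given to text relevance; the rest goes to the genre score.
    A document missing from one dict is treated as scoring 0.0 there.
    """
    docs = set(relevance) | set(genre)
    combined = {
        doc: weight * relevance.get(doc, 0.0) + (1 - weight) * genre.get(doc, 0.0)
        for doc in docs
    }
    # Sort by descending combined score, breaking ties by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


# Made-up scores for three documents.
relevance_scores = {"a": 0.9, "b": 0.8, "c": 0.1}
genre_scores = {"a": 0.1, "b": 0.9, "c": 0.5}

balanced = merge_rankings(relevance_scores, genre_scores, weight=0.5)
relevance_only = merge_rankings(relevance_scores, genre_scores, weight=1.0)
```

With equal weights, document "b" overtakes "a" because its genre score is much higher; with `weight=1.0` the genre score is ignored and the order is that of text relevance alone.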

Filed under: abstracts

Abstract: Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues

Web Genre Analysis: Use Cases, Retrieval Models, and Implementation Issues, by Benno Stein, Sven Meyer zu Eissen and Nedim Lipka. In: Genres on the Web: Computational Models and Empirical Studies, edited by Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: People who search the World Wide Web often have a multi-faceted understanding of their information need: they know what they are searching for, and they know what form or type the desired documents should have. The former aspect relates to the content of a desired document (= topic), the latter to the presentation of its content and the intended target group. … Read entire article »

Filed under: abstracts