The WebGenre Blog: The power of genre applied to digital information. By Marina Santini » Entries tagged with "marina santini"

Lecture 1: What is Machine Learning (ML4LT 2015)

Opening lecture to the Machine Learning for Language Technology course at Uppsala University, Sweden. Autumn 2015. Lecture 1: What is Machine Learning? from Marina Santini … Read entire article »

Filed under: featured, lectures

Presentation: Text analytics and R – Open Question: is it a good match?

* The Quest: finding the optimal way to handle Big Textual Data for Information Discovery & Actionable Intelligence
* The Open Question: is R convenient for text analytics of Big TEXTUAL Data?
* The Mission: identification of pros, cons, limits, benefits
Current Status: investigation in progress…
Live casts of this R meetup are available here: http://www.meetup.com/StockholmR/pages/Live_casts_from_past_meetups/
First talk: Text analytics and R by Marina Santini
Second and third talks: Wordclouds from Twitter with R by Måns Magnusson and An example of text analytics in R by Joakim Lundborg
You can find R code suggestions in this thread: http://www.meetup.com/StockholmR/events/103353372/?&a=uc1_te … Read entire article »

Filed under: featured, slides

Dissemination: Multi-Labeling Web Pages by Genre

Excerpts from: Chaker Jebari. MLICC: A Multi-Label and Incremental Centroid-Based Classification of Web Pages by Genre. NLDB 2012: 183-190. For the full version, please contact: jebarichaker@yahoo.fr. Evaluation Corpus: In our approach we used the corpus MGC. This corpus was gathered from the internet and consists of 1539 English web pages classified into 20 genres as shown in the following table. In this corpus each web page was assigned by labelers to primary, secondary and final genres. Among the 1539 web pages, 1059 are labeled with one genre, 438 with two genres, 39 with three genres and 3 with four genres. It is clear from the following table that the corpus MGC is unbalanced, meaning that the web pages are not equally distributed among the genres. … Read entire article »
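The label counts quoted in the excerpt can be summarised as a label cardinality (the average number of genres per page), a common descriptive statistic for multi-label corpora. The following is a minimal Python sketch, not code from the cited paper: the variable names are illustrative assumptions, and only the four counts come from the excerpt.

```python
# Pages grouped by how many genres they carry, as quoted in the excerpt.
# The dictionary name and layout are illustrative, not from the paper.
pages_by_label_count = {1: 1059, 2: 438, 3: 39, 4: 3}

# Total number of pages in the corpus.
total_pages = sum(pages_by_label_count.values())

# Total number of (page, genre) assignments.
total_labels = sum(k * n for k, n in pages_by_label_count.items())

# Label cardinality: average number of genres per page.
label_cardinality = total_labels / total_pages

# Share of pages that carry more than one genre.
multi_label_share = 1 - pages_by_label_count[1] / total_pages

print(f"pages: {total_pages}")
print(f"label assignments: {total_labels}")
print(f"label cardinality: {label_cardinality:.3f}")
print(f"multi-label share: {multi_label_share:.3f}")
```

The per-count figures sum to the stated corpus size of 1539 pages, and roughly a third of the pages carry more than one genre, which is why a multi-label method is a natural fit for this corpus.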

Filed under: dissemination, reading suggestions

Seminar – Towards Contextualized Information: How Automatic Genre Identification Can Help

Seminar Series, Laboratory for Cognition, Interaction and Language Technology (CILTLab), Linköping University, Linköping, Sweden, Tuesday 28 August 2012. Abstract: Genre is one of the textual dimensions that can be used to reconstruct the communicative context needed to assess the value of information with respect to a purpose (business, learning, finding, monitoring, predicting, etc.). When we know the genre of a text, we can surmise the CONTEXT in which the text was created and for which purpose. Therefore we can more confidently decide whether a text contains the information we are looking for. For example, factual texts might have more credibility than opinionated texts. In this respect, genres such as press conferences, declarations or announcements by a White House spokesman might be more reliable than subjective genres, e.g. newspapers’ editorials or op-ed articles. On the … Read entire article »

Filed under: abstracts, announcements, seminars

White Paper: Automatic Genre Identification – Testing with Noise

Automatic Genre Identification – Testing with Noise by Efstathios Stamatatos, Serge Sharoff, Marina Santini – Copyright © 2012, All rights reserved. Citation: Stamatatos E., Sharoff S., Santini M. (2012). Automatic Genre Identification – Testing with Noise. [White paper]. Retrieved from http://www.forum.santini.se/2012/03/white-paper-automatic-genre-identification-testing-with-noise/ The genre collections used in the experiments are available here. The reference list is here. In the experiments described below, genre classes coming from three genre collections have been used: Santinis7 (Santini, 2007), KI-04 (Meyer zu Eissen and Stein, 2004), and HGC (Stubbe and Ringlstetter, 2007). These genre collections have been created by different people, in different universities, for different purposes, with different criteria, and different notions of what genre is. Since genre is a complex concept and genre classes can be characterized in different ways, we assume that having an AGI algorithm … Read entire article »

Filed under: collaborative blogging, computational models, featured, signed posts, white papers

CLT seminar (University of Gothenburg): 2011-06-16, 10:15 – 12:00

Marina Santini – Computational Models for Automatic Web Genre Identification http://www.clt.gu.se/seminar/2011-06-16/clt-seminar-marina-santini Date: 2011-06-16 10:15 – 12:00 Where: L308, Lennart Torstenssonsgatan 8. Broadly speaking, “genre” is a classification concept. A genre is a recurring and recognised pattern of communication that has a specific name. The web hosts many recognised genres, such as FAQs, press releases, product descriptions, instructions, guides, e-magazines, blogs, professional profiles, how-tos, web ads and reviews. Each of these genres serves a number of communicative and social purposes and carries additional contextual information that helps the reader interpret the content. Can web genres be identified and detected automatically? Which computational models have been tried out so far in automatic genre identification research? How well do they perform? In this talk, I will present and discuss the latest findings in automatic genre identification and suggest viable … Read entire article »

Filed under: announcements

Is a boat manual still a manual a hundred years later?

(All inaccuracies in this anecdote are mine). During a seminar on genre at Stockholm University, J. K. remarked that genre is too fluctuating to be captured computationally. Genre, he said, is much more than its linguistic markup. What we can capture is the linguistic form but not the genre itself. For instance, he continued, what belongs to a genre today does not necessarily belong to the same genre in the future. To support his claim he told us that he was reading an old manual of a boat built about a century ago. … Read entire article »

Filed under: discussions

Chapter: Any Land in Sight?

Any Land in Sight? by Marina Santini, Serge Sharoff, Alexander Mehler. In: Genres on the Web: Computational Models and Empirical Studies. Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: Is there hope of sorting out the complex issues of genre on the web? Is there any land in sight? We think so. Genre is a multifarious concept that lends itself to many interpretations and uses. For this reason, we included as many approaches and different views as possible. We believe that the plurality and diversity of visions fosters cross-fertilisation of ideas and that inter- and transdisciplinarity are the most productive approaches to increasing our understanding of this important concept. … Read entire article »

Filed under: chapters

Excerpt: Cross-Testing a Genre Classification Model for the Web

Cross-Testing a Genre Classification Model for the Web by Marina Santini. In: Genres on the Web: Computational Models and Empirical Studies. Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. Abstract: The main aim of the experiments described in this chapter is to explore how to assess the robustness of genre models for the web. For this purpose, a simple genre model is presented and cross-tested with four genre collections. In this difficult experimental setting, the model shows some stability and its results are in line with other current genre-enabled applications. The model provides some insights into open issues in AGI on the web. In particular, it shows that we know very little about the effect of noise on genre classification results. The set of experiments presented here offers … Read entire article »

Filed under: chapter excerpts

Chapter Excerpt: Riding the Rough Waves of Genre on the Web

Riding the Rough Waves of Genre on the Web: Concepts and Research Questions, by Marina Santini, Alexander Mehler, Serge Sharoff. In: Genres on the Web: Computational Models and Empirical Studies. Alexander Mehler, Serge Sharoff and Marina Santini. Text, Speech and Language Technology, Volume 42, 2011, DOI: 10.1007/978-90-481-9178-9. 1. Why is Genre Important? Genre, in the most generic definition, takes the meaning “kind; sort; style” (OED). A more specialised definition of genre in OED reads: “A particular style or category of works of art; esp. a type of literary work characterised by a particular form, style, or purpose.” Similar definitions are found in other dictionaries; for instance, OALD reads “a particular type or style of literature, art, film or music that you can recognise because of its special features”. Broadly speaking, then, generalising from lexicographic definitions, genre can be seen … Read entire article »

Filed under: chapter excerpts