The WebGenre Blog: The power of genre applied to digital information. By Marina Santini » Archive

Spreading the Word about (Web)Genre Research

What is genre? Why is it useful to master genre conventions? Can we classify document genres automatically? Around the world, many researchers and scholars from a wide range of disciplines are trying to answer these and many other questions. Aristotle suggested the first genre classification scheme by dividing literature into Tragedy, Comedy and Lyrics (well, I am oversimplifying…). Aristotle smoothly classified all the knowledge of his time, so arguably classifying genres … Read entire article »

Filed under: discussions, reading suggestions, references, reflections

Working Definition of Digital Genre (II)

Last Updated: 22 June 2014 – 26 June 2014 – 3 July 2014 (draft in progress). In this blog post (which I will keep updating), I would like to pin down a working definition of digital genre that is appropriate for our computational experiments. The experiments I refer to are those that will be included in the forthcoming book “Computational Theory of Digital Genre”, which I announced a while ago. With Michael Oakes and Georgios Paltoglou (both at the University of Wolverhampton, UK), we are setting up experiments focussing on the computational modeling of the concept of digital genre. Since the concept of genre is difficult to define in a simple way, because it inherits all the idiosyncrasies and ambiguities that characterize language and human communication in general, I … Read entire article »

Filed under: discussions, reflections

Enterprise Search has a very bright future!

Last updated (Comments): 10 July 2013. On 30 May 2013, I attended the Findability Day 2013, organized by Findwise. The gathering of about 200 participants took place in central Stockholm (Odenplan) on a sunny day, in bright and spacious conference rooms, and in a friendly and laid-back atmosphere. The event – “the biggest event on search and findability in Northern Europe”, as the subtitle says – was free of charge (only registration was required) and was sponsored by Google and Splunk. I will not give a complete debrief of the Findability Day 2013 in this post. Martin White has summarized the highlights in his blog, and Olof Belfrage describes the presentations in more detail in a post published on the Findwise blog. In this post I would like to summarize a … Read entire article »

Filed under: reflections, reports

Opinion Retrieval and Ranking: the creeping and ineluctable force of Genre

Last Updated: 27 May 2013. Two fundamental principles that contribute to the definition and characterization of the concept of genre are conventions and expectations. Simply put, in textual (written or spoken) communication, genres are words that connote different types of text. For instance, on the web the home page genre is different from the blog genre; in a company, the minutes genre is different from the white paper genre; in the press the leader genre is different from the letter to the editor genre… Genres have the power of shaping information following rhetorical and discourse patterns that have become conventionalized. Genre conventions are implemented by the writer(s). When acknowledged, genre conventions raise predictable expectations in the readers or more generally in those who “process” a text… Although I am oversimplifying here, broadly speaking … Read entire article »

Filed under: discussions, quotes, reflections

Towards a Cross-Lingual Lexical Knowledge Base of Lexical Forms

Last updated: 15 May 2013. How do you overcome problems related to cross-linguality? My specific problem at the moment is caused by the poor coverage of everyday language in lexical resources. For instance, the Swedish single-word expression /egenremiss/ (14,900 hits, April 2013), alternatively written as a multiword expression (MWE) /egen remiss/ (8,210 hits, April 2013), denotes a referral to a specialist doctor written by patients themselves. This expression is made up of two common Swedish words, /egen/ ‘own (adj)’ and /remiss/ ‘referral’. It is a recent expression (probably coined around 2010*) and is not yet recorded in any official dictionary, in Wiktionary, or in other multilingual online lexical resources. This compound happens to be very frequent in query logs belonging to a Swedish public health service website. … Read entire article »

Filed under: discussions, featured, queries, reflections, requests
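
The compound/MWE alternation described in the teaser above (/egenremiss/ vs /egen remiss/) can be illustrated with a minimal Python sketch. It is not the method used in the post: the function names and the toy query log are hypothetical, and the normalization (dropping spaces and hyphens) is an assumption that will over-merge unrelated multi-word queries in a real log.

```python
from collections import Counter, defaultdict


def variant_key(query: str) -> str:
    """Drop case, spaces and punctuation so that solid and spaced spellings share a key."""
    return "".join(ch for ch in query.lower() if ch.isalnum())


def group_variants(queries):
    """Map each normalized key to a Counter of the surface forms observed for it."""
    groups = defaultdict(Counter)
    for q in queries:
        surface = " ".join(q.lower().split())  # lowercase and collapse repeated spaces
        groups[variant_key(surface)][surface] += 1
    return groups


# Hypothetical toy query log, not real data.
log = ["egenremiss", "egen remiss", "Egenremiss", "remiss", "egen  remiss"]
groups = group_variants(log)

for key, forms in groups.items():
    print(key, dict(forms))
```

Grouping the variants this way only counts them together; deciding which surface form should be the entry in a lexical knowledge base remains a separate, linguistic decision.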

Reflection: Analysing Emotions of Social Writing

by Marina Santini. A few days ago, I attended a fascinating session organized by the Quantified Self Stockholm (QS) Meetup, in a venue with an inspiring name, Psykologifabriken (The Psychology Factory), in central Stockholm. This QS session – Adding Power to body and soul… – included two presentations: one about adding power to the body through a robotic glove that adds gripping energy to the hand of those who have lost strength in this limb; the other one about methods to enable self-development through digital tools. Since I am not into robotics, I will only say that the empowering glove shown by Johan Ingvast from Bioservo is simply amazing… I am not a psychologist either, but I found the presentation about empowering the “soul” very relevant to some of my interests, namely sentiment analysis, mood … Read entire article »

Filed under: reflections

Question: How to Define Criteria for Subgenre Classification?

I had an interesting email exchange with Christophe Clugston, a researcher currently located in Thailand, about the classification of a specific subgenre belonging to the Netadvertising supergenre. He says: “I am looking at classifying a very narrow sub genre. Within the domain of Netvertising I am looking at an extant, variant genre that I am terming Long Scroll Web Advertisements (as the off line version is termed Long Copy Advertising). This type of advertising is very different than the multi media image tied to a few words or few clauses. It is based entirely on the factor of extended reading (some of these ads are over 24 pages when printed). I have enclosed a link to one type of ad in this category. At current I am looking only at self defense … Read entire article »

Filed under: discussions, queries, reflections, requests

Report: Language in the Digital Age – META-NORD National Workshop

by Marina Santini. Held in Stockholm, Sweden, 23 Nov 2012. Download program and presentations here. I was very happy to attend the workshop “Language in the Digital Age” last week in Stockholm. It was informative and inspiring. The workshop’s venue – Stacken at Nalen’s (a building from the end of the 19th century) – is a fascinating example of architectural reuse. Stacken (literally meaning “The Stack”, but probably a nickname to refer to the boxing ring) was the former boxing gym of the still existing Narva Boxningsklubb. Now Stacken is a cosy conference/banquet room decorated with four thin columns that add status and elegance to events. The speakers and the audience (about 50 people) represented a wide range of interests, from the linguistic needs of the … Read entire article »

Filed under: reflections, reports, seminars

Towards a Computational Theory of Digital Genre (I): Working Definition of Genres for Computational Purposes

by Marina Santini – Last Updated: 29 Oct 2012
1. What is a (textual) genre?
• A genre is a class of texts with similar communicative, textual and linguistic features.
2. What characterizes a genre? A genre:
• Must have a name
• Must be recognized within a community
• Must be produced or retrieved during a task
• Must have conventions
• Must raise expectations
• Can change over time. It is a cultural artifact (culture here includes society, media, technology, etc.)
3. What characterizes a digital genre?
• The same characteristics listed above.
• A digital genre is any kind of genre that has a digital form, such as emails, chats, online academic papers, online newspaper articles, blogs…
• A digital genre can be any paper genre … Read entire article »

Filed under: dialectic, discussions, dissemination, reflections

Mining Query Logs: Query Disambiguation & Understanding through a KB

Marina Santini. Copyright © 2012. Work in progress. Talking about query logs, Karlgren (2010) points out: “There are several reasons to be cautious in drawing too far-reaching conclusions: we cannot say for sure what the users were after; [...]”. However, some linguistic problems can be sorted out, for example those related to sublanguage, terminology, multi-word expressions, etc. Interestingly, the use of different sublanguages has been studied by Karin Friberg Heppin in her PhD thesis: Resolving Power of Search Keys in MedEval. A Swedish Medical Test collection with User Groups: Doctors and Patients. Karin highlights how patients (laymen) and doctors (experts) use different vocabulary (or terminology) to indicate the same concept. For example, patients might use the word “painkiller” while doctors may prefer the word “analgesic” to refer to the same treatment. Different sublanguages … Read entire article »

Filed under: discussions, featured, reflections
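
As a rough illustration of the layman/expert example in the teaser above (“painkiller” vs “analgesic”), the following Python sketch maps query terms from both sublanguages to one concept through a tiny knowledge base. Everything in it is assumed for the sake of the example: the `KB` dictionary, the concept identifier and the function names are hypothetical and are not taken from the post or from MedEval.

```python
# Hypothetical mini knowledge base: surface term -> concept identifier.
KB = {
    "painkiller": "CONCEPT_ANALGESIC",  # lay term (assumed entry)
    "analgesic": "CONCEPT_ANALGESIC",   # expert term (assumed entry)
}


def concepts_for(query: str, kb=KB):
    """Return the concept ids of the query tokens that are found in the knowledge base."""
    return [kb[tok] for tok in query.lower().split() if tok in kb]


def same_concept(q1: str, q2: str, kb=KB) -> bool:
    """True if the two queries share at least one concept in the knowledge base."""
    return bool(set(concepts_for(q1, kb)) & set(concepts_for(q2, kb)))


print(concepts_for("strong painkiller"))
print(same_concept("painkiller dosage", "analgesic"))
```

A lookup of this kind only resolves terms that are already in the knowledge base; as the Karlgren quote warns, it says nothing certain about what the user was actually after.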