Book Review: Building and Using Comparable Corpora (Springer, 2013)

I would like to recommend “Building and Using Comparable Corpora” (edited by S. Sharoff, R. Rapp, P. Zweigenbaum and P. Fung) to anyone working with, or interested in, multilingual and monolingual comparable corpora. The volume is an edited collection of 17 articles, divided into two parts, covering many topics related to the compilation, measurement and use of comparable corpora. I found it useful and inspiring for my research. The volume is comprehensive and still up to date, although it collects extended versions of papers from the BUCC (Building and Using Comparable Corpora) workshop held in 2011, or articles written between 2011 and 2012. The book starts with an informative overview (article 1), in which the issues are neatly presented and the definitions of the different types of corpora …

Spreading the Word about (Web)Genre Research

What is genre? Why is it useful to master genre conventions? Can we classify document genres automatically? Around the world, many researchers and scholars from a wide range of disciplines are trying to answer these and many other questions. Aristotle suggested the first genre classification scheme by dividing literature into Tragedy, Comedy and Lyric (well, I am oversimplifying…). Aristotle smoothly classified all the knowledge of his time, so arguably classifying genres …

Summary: Looking for Corpora…

Dear all, in this post I collect all the suggestions I received in response to the following request: “Looking for Corpora in….” Big thanks to (I hope I have not forgotten anybody): Johannes Heinecke, Dominika Rogozinska, Mohamed-Zakaria Kurdi, Bartosz Ziólko, Olga Whelan, Margarita Borreguero Zuloaga, Ayesha Zafar, Will Snellen, Katherine (Katie) Skees Hund, Anna Matyszczyk, Massinissa Ahmim, Marcin Feder, Maria Pia Montoro, Lawrence Niculescu, Jesus Vilares, Ewa Gwiazdecka, Jack Bowers, Taner Sezer, Yvonne Adesam, Kadri Muischnek, Anne Tamm, Ralf Steinberger, Ricardo Campos, Edyta Jurkiewicz-Rohrbacher, Pat, Sara Castagnoli, Adam Przepiorkowski, Hung Le Khanh, Kristian Kankainen, Norton Roman, Mansur Sayhunov. Suggestions were sent through the mailing lists Corpora List and BCS IRSG, and through the LinkedIn groups Corpus Linguistics, Computational Linguistics, Natural Language Processing, Applied Linguistics and Terminology Services. I hope this list of corpora is useful for everybody working on multilinguality and cross-linguality. Please …

Distributional Semantics applied to Flickr® Tags

Upcoming Publications: Marianna Bolognesi (International Center for Intercultural Exchange), “Distributional Semantics meets Embodied Cognition: Flickr® as a database of semantic features”. Selected Papers from the 4th UK Cognitive Linguistics Conference (in press).

Distributional models such as Latent Semantic Analysis (LSA; Landauer and Dumais, 1997) generate semantic spaces based on words’ co-occurrences in linguistic contexts. The semantic representations that emerge from these models are based solely on linguistic information, leaving aside the information that we retrieve from perceptual experiences. The proposed analysis applies the methods of distributional semantics to Flickr®, a corpus of images enhanced with metadata (tags) that express a wide range of concepts, including perceptual features triggered by the experiences captured in the photographs. A case study on the domain of colors shows how a distributional analysis based on Flickr® can produce semantic …
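The core idea of distributional semantics (represent each item by the contexts it co-occurs with, then compare items by the similarity of those context vectors) can be illustrated directly on tags. The following is only a minimal sketch under stated assumptions, not the method used in the paper: each photo is assumed to be a plain list of tags, tags on the same photo count as co-occurring, the photo data is invented for illustration, and, unlike LSA, the sketch skips the dimensionality-reduction (SVD) step and compares raw co-occurrence vectors with cosine similarity.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(photos):
    """Map each tag to a Counter of the tags it shares a photo with."""
    vectors = defaultdict(Counter)
    for tags in photos:
        unique = sorted(set(tags))  # ignore duplicate tags on one photo
        for a, b in combinations(unique, 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: each inner list is the tag set of one photo.
photos = [
    ["red", "tomato", "garden"],
    ["red", "strawberry", "garden"],
    ["green", "leaf", "garden"],
    ["red", "tomato", "kitchen"],
]
vecs = cooccurrence_vectors(photos)
print(round(cosine(vecs["tomato"], vecs["strawberry"]), 3))  # 0.866
print(round(cosine(vecs["tomato"], vecs["leaf"]), 3))        # 0.289
```

In this toy example "tomato" ends up closer to "strawberry" than to "leaf" because both often share photos with "red", which is the kind of perceptual (color) association a tag-based space can capture.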

Reading Suggestion: Adjectives and adverbs as indicators of affective language for automatic genre detection (2008)

Rittman, Robert and Nina Wacholder (2008). Adjectives and adverbs as indicators of affective language for automatic genre detection. Proceedings of the AISB 2008 Convention, Symposium on Affective Language. Aberdeen, Scotland, April 1-2, 2008.

Abstract. We report the results of a systematic study of the feasibility of automatically classifying documents by genre using adjectives and adverbs as indicators of affective language. In addition to the class of adjectives and adverbs, we focus on two specific subsets of adjectives and adverbs: (1) trait adjectives, used by psychologists to assess human personality traits, and (2) speaker-oriented adverbs, studied by linguists as markers of narrator attitude. We report the results of our machine learning experiments using Accuracy Gain, a measure more rigorous than the standard measure of Accuracy. We find that it is possible to classify …
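To make the idea of adjectives and adverbs as genre features more concrete, here is a minimal sketch that turns one POS-tagged document into relative frequencies of the four feature classes named in the abstract. It assumes the text has already been tagged with Penn Treebank tags by some tagger, and the trait-adjective and speaker-oriented adverb lists are tiny hypothetical examples, not the lists used by Rittman and Wacholder; the resulting feature dictionaries would still have to be fed to a classifier.

```python
from collections import Counter

# Penn Treebank tags for adjectives and adverbs (incl. comparative/superlative).
ADJ_TAGS = {"JJ", "JJR", "JJS"}
ADV_TAGS = {"RB", "RBR", "RBS"}

# Hypothetical, illustrative word lists; NOT the lists used in the paper.
TRAIT_ADJECTIVES = {"friendly", "honest", "anxious", "generous"}
SPEAKER_ADVERBS = {"frankly", "fortunately", "unfortunately", "honestly"}

FEATURES = ("adj", "adv", "trait_adj", "speaker_adv")


def affect_features(tagged):
    """Return per-token rates of adjective/adverb classes in a (word, tag) list."""
    if not tagged:
        return {name: 0.0 for name in FEATURES}
    counts = Counter()
    for word, tag in tagged:
        w = word.lower()
        if tag in ADJ_TAGS:
            counts["adj"] += 1
            if w in TRAIT_ADJECTIVES:
                counts["trait_adj"] += 1
        elif tag in ADV_TAGS:
            counts["adv"] += 1
            if w in SPEAKER_ADVERBS:
                counts["speaker_adv"] += 1
    n = len(tagged)  # normalise by document length so long texts are comparable
    return {name: counts[name] / n for name in FEATURES}


tagged = [("Frankly", "RB"), (",", ","), ("the", "DT"), ("staff", "NN"),
          ("were", "VBD"), ("very", "RB"), ("friendly", "JJ"), (".", ".")]
print(affect_features(tagged))
# {'adj': 0.125, 'adv': 0.25, 'trait_adj': 0.125, 'speaker_adv': 0.125}
```

Normalising by token count is one simple design choice; other normalisations (e.g. per sentence) would be equally plausible.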

Beyond Topic: Genre and Search

One of the central problems of information retrieval (IR) is the difficulty of matching a document to a query in the absence of any contextual information about the searchers and the document creators. Genre is a context carrier, and genre information can be exploited by information systems to improve their matching algorithms. The web hosts many recognised genres that can potentially provide this contextual information, including FAQs, press releases, product descriptions, instructions, guides, and reviews. …

Genre, Social Action and Social Intelligence

An important dimension that has not been investigated so far is the relationship among genre, social action and social intelligence. The interpretation of genre in terms of social action was put forward more than 25 years ago by Carolyn Miller (Miller, 1984) and has been backed up by recent empirical studies on web genres (e.g. Miller and Shepherd, 2004, 2009). More recently, the social implications of the concept of genre have been extended to support the claim that teaching students to master genres from primary school onwards is a way of implementing democracy and social justice (Martin and Rose, 2008). I would suggest extending the social interpretation of genre even further by arguing that the recognition of social action is a sign of social intelligence. …

University of Borås: lecture and seminar slides

The slides of the lecture and seminar held at the University of Borås on 11 April 2011 are available for download. …

Seminar Slides: Stockholm University

The PowerPoint presentation of today’s seminar at Stockholm University (24 Feb 2011) is available for download. …

In quest of the holy grail

Do we really need a definition of (web) genre? Once we are convinced that genre is useful, do we really need a definition? We could just say that genre is a classificatory principle based on a number of attributes. Well, that is easy to say. Without a theoretical definition and characterization of the concept of genre, it is not clear: how to create a genre taxonomy whose classes both humans and automatic classifiers can easily discriminate between; how to select a representative corpus for the genre classes in the taxonomy, since there is a lot of variation in users’ assessments; how to identify the optimal genre-revealing features… …

