

Spreading the Word about (Web)Genre Research

What is genre? Why is it useful to master genre conventions? Can we classify document genres automatically? Around the world, many researchers and scholars from a wide range of disciplines are trying to answer these and many other questions. Aristotle suggested the first genre classification scheme by dividing literature into Tragedy, Comedy and Lyrics (well, I am oversimplifying…). Aristotle smoothly classified all the knowledge of his time, so arguably classifying genres … Read entire article »


Working Definition of Digital Genre (II)

Last Updated: 22 June 2014 – 26 June 2014 – 3 July 2014 (draft in progress). In this blog post (which I will update continuously), I would like to pin down a working definition of digital genre that is appropriate for our computational experiments. The experiments I refer to are those that will be included in the forthcoming book “Computational Theory of Digital Genre”, which I announced a while ago. With Michael Oakes and Georgios Paltoglou (both at the University of Wolverhampton, UK), we are setting up experiments focussing on the computational modeling of the concept of digital genre. Since the concept of genre is difficult to define in a simple way, because it … Read entire article »

Summary: Looking for Corpora…

Dear All, In this post I collect all the suggestions I got for the following request: “Looking for Corpora in….” Big thanks to (hope I have not forgotten anybody): Johannes Heinecke, Dominika Rogozinska, Mohamed-Zakaria KURDI, Bartosz Ziólko, Olga Whelan, Margarita Borreguero, Zuloaga, Ayesha Zafar, Will Snellen, Katherine (Katie) Skees Hund, Anna Matyszczyk, Massinissa Ahmim, Marcin Feder, Maria Pia Montoro, Lawrence Niculescu, Jesus Vilares, Ewa Gwiazdecka, Jack Bowers, Taner Sezer, Yvonne Adesam, Kadri Muischnek, Anne Tamm, Ralf Steinberger, Ricardo Campos, Edyta Jurkiewicz-Rohrbacher, Pat, Sara Castagnoli, Adam Przepiorkowski, Hung Le Khanh, Kristian Kankainen, Norton Roman, Mansur Sayhunov. Suggestions were sent through: Mailing Lists: Corpora List, BCS IRSG. LinkedIn Groups: Corpus Linguistics, Computational Linguistics, Natural Language Processing, Applied linguistics, Terminology … Read entire article »

Looking for Corpora to explore Cross-Linguality

Dear All, I am looking for corpora of any genre in the following languages: English, Swedish, Polish, Italian, Finnish, Estonian, and Hungarian. I am already aware of a number of corpora (several posts in this blog are dedicated to the dissemination of corpora-related information). These corpora are mostly in English. I would now like to focus on: 1) additional languages and 2) additional genres, such as search query logs, TV scripts, emails, tweets, WhatsApp messages, etc. All genres are welcome! The only requirement is that corpora must be free and publicly available, so that everybody can replicate or extend experiments using the same corpora/datasets. The purpose of the experiments is to explore cross-linguality in different settings. Please, read the use … Read entire article »

Lecture 3: Structuring the Unstructured via Sentiment Analysis

Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis from Marina Santini … Read entire article »

Lecture 2: From Semantics to Semantic-Oriented Applications

From the “Natural Language Processing” LinkedIn group: John Kontos, Professor of Artificial Intelligence: “I wonder whether translating into formal logic is nothing more than transliteration which simply isolates the part of the text that can be reasoned upon using the simple inference mechanism of formal logic. The real problem, I think, lies with the part of the text that CANNOT be translated on the one hand, and the part that changes its meaning due to civilization advances on the other. My own proposal is to leave NL text alone and try building inference mechanisms for the UNTRANSLATED text depending on the task requirements. All the best, John” … Read entire article »

Lecture 1: Semantic Analysis in Language Technology – Introduction

Lecture 1: Semantic Analysis in Language Technology from Marina Santini. Quick overview of the basic concepts of semantic analysis, lexical semantics, computational lexical semantics, computational semantics, formal semantics, representation of meaning… … Read entire article »

Course: Semantic Analysis in Language Technology

Uppsala University: Department of Linguistics and Philology
Semantic Analysis in Language Technology (2013)
Credits: 7,5 hp
Syllabus: 5LN456
Teacher: Marina Santini
The course website will be updated regularly during the teaching session with additional material.
Last Updated: 23 October 2013
Course website:

Nov, 12 (Tue) 10-12 9-2042 (Turing): Course introduction [OH]. J&M 17–18
Nov, 14 (Thu) 10-12 9-2042 (Turing): Introduction to essay assignment (EA) [OH].
Nov, 19 (Tue) 10-12 9-2042 (Turing): IE/PAS, PAS assignment [OH]. Johansson and Nugues 2008, J&M 20.9
Nov, 21 (Thu) 10-12 9-2042 (Turing): EA and PAS supervision
Nov, 26 (Tue) 10-12 9-2042 (Turing): Sentiment analysis. BL 1–4
Nov, 28 (Thu) 10-12 9-2042 (Turing): Sentiment analysis. BL 5–7
Dec, 03 (Tue) 10-12 9-2042 (Turing): Supervision
Dec, 06 (Thu): Deadline EA, step 1
Dec, 10 (Tue) 10-12 9-2042 (Turing): EA presentations
Dec, 12 (Thu) 10-12 9-2042 (Turing): WSD [OH]. J&M 19–20
Dec, 17 (Tue) 10-12 9-2042 (Turing): WSD. Deadline EA, feedback to another group (link to submitted essays below)
Jan, 20 (Mon) 2014-01-20: Deadline, all assignments

Intended learning outcomes
In order to pass the course, a student must be able to: describe systems that perform the following tasks, apply them to authentic linguistic data, and evaluate the results: disambiguate … Read entire article »

Lecture 7: Learning from Massive Datasets

Lecture 7: Learning from Massive Datasets from Marina Santini. In this lecture we explore how big datasets can be used with the Weka workbench and what other issues are currently under discussion in the real world, for example: big data applications, predictive linguistic analysis, new platforms and new programming languages. … Read entire article »

Cloud & Big Data Day

On 24th Sept 2013, I attended the CLOUD & BIG DATA DAY in Stockholm (Kista), organized by SICS and EIT ICT Labs. Cloud & Big Data Day is part of SICS Software Week, which takes place every year. The specific purpose of the Cloud & Big Data Day was to “feature leading international and Swedish experts from industry and academia, who present the cutting edge of cloud computing technologies. The intended audience is professionals in IT and its applications for all areas in industry and academia”. The presentations were all interesting and covered a wide range of projects and applications centered on BIG DATA: from how to harness petabytes of data at Spotify, to big cellular … Read entire article »