Book Review: Building and Using Comparable Corpora (Springer, 2013)

I would like to recommend “Building and Using Comparable Corpora” (edited by S. Sharoff, R. Rapp, P. Zweigenbaum and P. Fung) to those who are working with or are interested in multilingual and monolingual comparable corpora. The volume is an edited collection of articles covering many topics related to the compilation, measurement and use of comparable corpora. It is divided into two parts and includes 17 articles. I found this volume useful and inspiring for my research. The volume is comprehensive and still up-to-date, although it collects extended papers from a BUCC (Building and Using Comparable Corpora) workshop held in 2011, or articles written between 2011 and 2012. The book starts with an informative overview (article 1), where issues are presented neatly and where the definitions of the different types of corpora … Read entire article »

Filed under: reading suggestions, references, reviews

Book Review: Fundamentals of Predictive Text Mining 2nd Ed. (2015)

Book Review: Weiss S. M., Indurkhya N. and Zhang T. (2015). Fundamentals of Predictive Text Mining. Springer-Verlag, London. Second Edition. Informer website, Winter 2016 Issue, Book Review. The volume “Fundamentals of Predictive Text Mining”, 2nd ed. has nine chapters, a table of contents, a list of references, a Subject Index and an Author Index. The book also includes a Preface written by the three authors, Summary Abbreviations: ML=Machine Learning; NLP=Natural Language Processing; IR=Information Retrieval 1) In Chapter 1, “Overview of … Read entire article »

Filed under: featured, reading suggestions, reviews

Spreading the Word about (Web)Genre Research

What is genre? Why is it useful to master genre conventions? Can we classify document genres automatically? Around the world, many researchers and scholars from a wide range of disciplines are trying to answer these and many other questions. Aristotle suggested the first genre classification scheme by dividing literature into Tragedy, Comedy and Lyrics (well, I am oversimplifying…). Aristotle smoothly classified all the knowledge of his time, so arguably classifying genres … Read entire article »

Filed under: discussions, reading suggestions, references, reflections

Distributional Semantics applied to Flickr® Tags

Upcoming Publications. MARIANNA BOLOGNESI, International Center for Intercultural Exchange. Distributional Semantics meets Embodied Cognition: Flickr® as a database of semantic features. Selected Papers from the 4th UK Cognitive Linguistics Conference (in press). Distributional models such as Latent Semantic Analysis (LSA; Landauer & Dumais 1997) generate semantic spaces based on words’ co-occurrences in linguistic contexts. The semantic representations that emerge from these models are based solely on linguistic information, leaving aside the information that we retrieve from perceptual experiences. The analysis proposed applies the methods of distributional semantics to Flickr®, a corpus of images enhanced with metadata (tags), expressing a wide range of concepts, including perceptual features triggered by the experiences captured in the photographs. A case study on the domain of colors shows how a distributional analysis based on Flickr® can produce semantic … Read entire article »

Filed under: dissemination, reading suggestions, references

Dissemination: Multi-Labeling Web Pages by Genre

Excerpts from: Chaker Jebari. MLICC: A Multi-Label and Incremental Centroid-Based Classification of Web Pages by Genre. NLDB 2012: 183-190. For the full version, please contact: Evaluation Corpus. In our approach we used the MGC corpus. This corpus was gathered from the internet and consists of 1539 English web pages classified into 20 genres as shown in the following table. In this corpus each web page was assigned by labelers to primary, secondary and final genres. Among the 1539 web pages, 1059 are labeled with one genre, 438 with two genres, 39 with three genres and 3 with four genres. It is clear from the following table that the MGC corpus is unbalanced, meaning that the web pages are not equally distributed among the genres. … Read entire article »

Filed under: dissemination, reading suggestions
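
As a quick sanity check on the figures quoted in the MLICC excerpt above, the per-page label counts can be added up and turned into a label cardinality (average number of genres per page). This is only a sketch: the four counts come from the excerpt, while the variable names and the choice of label cardinality as a summary statistic are assumptions made for illustration, not something the paper excerpt reports.

```python
# Counts quoted in the excerpt: number of genres per page -> number of web pages.
# The variable names below are hypothetical, chosen for this illustration only.
pages_by_label_count = {1: 1059, 2: 438, 3: 39, 4: 3}

# Total number of pages; this should match the corpus size given in the excerpt.
total_pages = sum(pages_by_label_count.values())

# Total number of genre labels assigned across all pages.
total_labels = sum(k * n for k, n in pages_by_label_count.items())

# Label cardinality: average number of genre labels per web page.
label_cardinality = total_labels / total_pages

# Share of pages that carry more than one genre label.
multi_label_share = 1 - pages_by_label_count[1] / total_pages

print(f"pages: {total_pages}, labels: {total_labels}")
print(f"label cardinality: {label_cardinality:.3f}")
print(f"multi-label share: {multi_label_share:.3f}")
```

The four counts do sum to the 1539 pages stated in the excerpt, which confirms that the quoted breakdown is internally consistent.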

Reading Suggestions: Meaning & Genre — Affect & Buying Behaviour

1) Pattern and Meaning across Genres and Disciplines: An Exploratory Study. Author: Groom, Nicholas. Journal of English for Academic Purposes, v4 n3 p257-277 Jul 2005. Abstract: Work in corpus linguistics has led to the development of a theory of language as “phraseology” [Hunston, S., & Francis, G. (1999). "Pattern grammar: A corpus-driven approach to the lexical grammar of English." Amsterdam: John Benjamins. Sinclair, J. M. (1991). "Corpus, concordance, collocation." Oxford: Oxford University Press. Sinclair, J. M. (2004). "Trust the text: Language, corpus and discourse." London: Routledge.]. This paper investigates whether and to what extent phraseology, as exemplified by the grammar patterns “it” v-link ADJ that- (e.g. “It is clear that the problem of evidence continues to vex new historicist criticism”) and “it” v-link ADJ to-inf (e.g. “it is important to compare unemployment rates … Read entire article »

Filed under: reading suggestions

Reblogging: Practical advice for machine learning

Practical advice for machine learning: bias, variance and what to do next. By Mikael Huss at Follow the data. The online machine learning course given by Andrew Ng in 2011 (available here among many other places, including YouTube) is highly recommended in its entirety, but I just wanted to highlight a specific part of it, namely the “Practical advice part”, which touches on things that are not always included in machine learning and data mining courses, like “Deciding what to do next” (the title of this lecture) or “debugging a learning algorithm” (the title of the first slide in that talk). His advice here focuses on the concepts of bias and variance in statistical learning. I had been vaguely aware of the concepts of “bias and variance tradeoff” and “bias/variance decomposition” for a long time, but I had always … Read entire article »

Filed under: dissemination, reading suggestions, reblogging

Impact of Sociolinguistics in Opinion Mining Systems

Signed post by Alexander Osherenko, Socioware Development. Full paper: Considering Impact of Sociolinguistic Findings in Believable Opinion Mining Systems. Proceedings of The Fifth International Conference On Cognitive Science. 2012. Kaliningrad, Russia. Opinions are a frequent means of communication in human society, and automatic approaches to opinion mining in texts have therefore attracted much attention. Most approaches apply data mining techniques and extract lexical features (words) as a reliable means of classification. It is noteworthy that, although the interest in opinion mining is huge, there are only a few explorations of the words extracted in opinion mining. This study addresses this drawback and elaborates on a sociolinguistic explanation. We hypothesize that an opinion mining system should be trained for classifying opinions in texts of the same language style. Hence, this contribution focuses on the following questions: 1) do sociolinguistic … Read entire article »

Filed under: collaborative blogging, computational models, dialectic, discussions, dissemination, featured, reading suggestions, signed posts

Reblogging: Informer, Spring Issue

Informer Newsletter of the BCS Information Retrieval Specialist Group Table of Contents Editorial: By Udo Kruschwitz on April 28, 2012 Conference Review: ECIR 2012 Industry Day: By Franco Maria Nardini on April 26, 2012 Book Review: Search Analytics for Your Site: By Tyler Tate on April 26, 2012 Conference review: ECIR 2012: By Claudia Hauff on April 25, 2012 Call for Book Reviews: By Cathal Gurrin on April 25, 2012 Conference Review: ECIR 2011: By Cathal Gurrin on April 18, 2012 The Information Needs of Mobile Searchers: By Tyler Tate on April 6, 2012 Designing Faceted Search: Getting the basics right (pt 2): By Tony Russell-Rose on April 4, 2012 Events spring 2012: By Andy Macfarlane on March 30, 2012 … Read entire article »

Filed under: dissemination, reading suggestions, reblogging

Reading Suggestions: Social Network Analysis and Mining for Business Applications

Social Network Analysis and Mining for Business Applications by FRANCESCO BONCHI, CARLOS CASTILLO, ARISTIDES GIONIS, and ALEJANDRO JAIMES, Yahoo! Research Barcelona. ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 3, Article 22, Publication date: April 2011. Social network analysis has gained significant attention in recent years, largely due to the success of online social networking and media-sharing sites, and the consequent availability of a wealth of social network data. In spite of the growing interest, however, there is little understanding of the potential business applications of mining social networks. While there is a large body of research on different problems and methods for social network mining, there is a gap between the techniques developed by the research community and their deployment in real-world applications. Therefore the potential business impact of these techniques … Read entire article »

Filed under: reading suggestions