Headline

Working Paper: A Web Corpus for eCare: Collection, Annotation and Learning – Preliminary Results

We are building a small corpus of medical documents from the web. The sublanguage used in these web documents has been annotated as “lay” or “specialised” by two annotators (the annotation is still ongoing). We would like to use this corpus for bootstrapping (semi-supervised and weakly supervised learning), for lay-specialised terminology extraction, for the automatic identification of related terms, and for similar tasks. If you have suggestions or pointers to interesting new directions in this field of research, we would gladly hear from you. The abstract and a link to the paper follow.

A Web Corpus for eCare: Collection, Annotation and Learning – Preliminary Results (DRAFT: 20 March 2017), by Marina Santini, Marjan Alirezai, Mikael Nyström, and Arne Jönsson

Abstract: We present eCare Sv, Beta, a small corpus of web documents written in Swedish. The content of … Read entire article »

Latest

Book Review: Building and Using Comparable Corpora (Springer, 2013)

I would like to recommend “Building and Using Comparable Corpora” (edited by S. Sharoff, R. Rapp, P. Zweigenbaum and P. Fung) to those who work with, or are interested in, multilingual and monolingual comparable corpora. The volume is an edited collection of articles covering many topics related to the compilation, measurement and use of comparable corpora. It is divided into two parts and includes 17 articles. I found this volume useful and inspiring for my research. The volume is comprehensive and still up to date, although it collects extended papers from a BUCC (Building and Using Comparable Corpora) workshop held in 2011, or articles written between 2011 and 2012. The book starts with an informative overview (article 1), … Read entire article »

ML4LT: Machine Learning for Language Technology – A Gentle Introduction

— Last Updated: 27 Feb 2017 — Log: Debriefing available (Jan 2016) Marina Santini’s contact details: marinasantini dot ms at g-m-a-i-l ML4LT is an online self-paced introductory course in Machine Learning for Language Technology. It has been designed for linguists and for undergraduate students in Computational Linguistics. The course includes 10 lectures, both theoretical and practical. The practical part relies on the Weka Machine Learning Workbench (free software). [See Lab1 for installation]. The content of this page is based on selected material from the course: “ML4LT: Machine Learning for Language Technology 2016, Undergraduate Students”, Uppsala University. I will update this page regularly with links, videos, labs, assignments and literature. When visiting this page keep an eye on … Read entire article »

Book Review: The Personal Weblog (Peter Lang, 2016)

[Draft version]

AUTHOR(S): Schildhauer, Peter
TITLE: The Personal Weblog
SUBTITLE: A Linguistic History
SERIES: Hallesche Sprach- und Textforschung. Language and Text Studies. Recherches linguistiques et textuelles – Band 14
YEAR: 2016
PUBLISHER: Peter Lang AG
ISBN13: 9783631662748
ANNOUNCED IN: http://linguistlist.org/issues/27/27-2198.html

Introduction

“The Personal Weblog: A Linguistic History” is a monograph that describes and interprets the evolution of the personal weblog genre. The study is corpus-based: the corpus was created using material from The Internet Archive. The volume is written in English and is based on the author’s PhD thesis (p. 17), originally written in German. This book is recommended to anyone interested in genre analysis, genre evolution, genre classification, and blog genre analysis.

Summary

The volume … Read entire article »

Book Chapter: Genre and Terminology by Margaret Rogers (2000)

Useful insights about the relation between domain-specific lexicons and the corpus-driven approach to terminology.

Genre and Terminology, by Margaret Rogers. Chapter in: Analysing Professional Genres, edited by Anna Trosborg, John Benjamins [Pragmatics & Beyond New Series 74], 2000. Googlebook: … Read entire article »

Lecture: Semantic Word Clouds

Topics: folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, Wordle, context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology. Lecture: Semantic Word Clouds from Marina Santini … Read entire article »

Lecture: Ontologies and the Semantic Web

Topics: Semantic Web, Web 3.0, shared understanding, shared semantic annotation, tree of Porphyry, ontology, WordNet, MeSH, RDF, IRI, description logics (DLs), OWL, WebProtégé, domain-specific, SPARQL, tags, ontology learning, classes, relations, axioms, instances, semantics in language technology. Lecture: Ontologies and the Semantic Web from Marina Santini … Read entire article »

Lecture: Summarization

Topics: abstracting, extractive summarization, abstractive summarization, summarization in question answering, single vs. multiple documents, query-focused summarization, snippets, unsupervised content selection, topic signature-based content selection, ROUGE (Recall-Oriented Understudy for Gisting Evaluation), semantics in language technology. Lecture: Summarization from Marina Santini … Read entire article »

Lecture: Relation Extraction

Topics: databases of relations, knowledge graph, DBpedia, Freebase, ACE, relation extractors, hand-written patterns, supervised machine learning, semi-supervised learning, bootstrapping, distant supervision, unsupervised learning from the web, semantic analysis in language technology. Relation Extraction from Marina Santini … Read entire article »

Course: Probability and Statistics for Language Technology

Uppsala University – Department of Linguistics and Philology

Topics: elementary concepts in probability theory, such as unconditional and conditional probability, Bayes’ theorem, and the law of total probability; elementary concepts in statistics, such as sample, estimation, and hypothesis testing. … Read entire article »