We are building a small corpus of medical documents from the web. The sublanguage used in these web documents has been annotated as “lay” or “specialised” by two annotators (annotation still ongoing). We would like to use this corpus for…
Author: Marina Santini
Computational Linguist, PhD
Book Review: Building and Using Comparable Corpora (Springer, 2013)
I would like to recommend “Building and Using Comparable Corpora” (edited by S. Sharoff, R. Rapp, P. Zweigenbaum and P. Fung) to those who are working with or are interested in multilingual and monolingual comparable corpora. The volume is an…
ML4LT: Machine Learning for Language Technology – A Gentle Introduction
Last updated: 27 Feb 2017. Log: debriefing available (Jan 2016). Marina Santini’s contact details: marinasantini dot ms at g-m-a-i-l. ML4LT is an online self-paced introductory course in Machine Learning for Language Technology. It has been designed for linguists…
Book Review: The Personal Weblog (Peter Lang, 2016)
Published here: LINGUIST List 28.2320, Wed May 24 2017, https://linguistlist.org/issues/28/28-2320.html. AUTHOR(S): Schildhauer, Peter; TITLE: The Personal Weblog; SUBTITLE: A Linguistic History; SERIES: Hallesche Sprach- und Textforschung. Language and Text Studies. Recherches linguistiques et textuelles – Band 14; YEAR: 2016; PUBLISHER:…
Book Chapter: Genre and Terminology by Margaret Rogers (2000)
Useful insights about the relation between domain-specific lexicons and the corpus-driven approach to terminology. Genre and Terminology, by Margaret Rogers. Chapter in: Analysing Professional Genres, edited by Anna Trosborg, John Benjamins [Pragmatics & Beyond New Series 74], 2000. Googlebook:
Lecture: Semantic Word Clouds

Topics: folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, Wordle, context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language…
Lecture: Ontologies and the Semantic Web

Topics: Semantic Web, Web 3.0, shared understanding, shared semantic annotation, tree of Porphyry, ontology, WordNet, MeSH, RDF, IRI, description logics, DLs, OWL, WebProtégé, domain-specific, SPARQL, tags, ontology learning, classes, relations, axioms, instances, semantics in language technology. Lecture: Ontologies and the Semantic Web from…
Lecture: Summarization

Topics: abstracting, extractive summarization, abstractive summarization, summarization in question answering, single vs. multiple documents, query-focused summarization, snippets, unsupervised content selection, topic signature-based content selection, ROUGE (Recall-Oriented Understudy for Gisting Evaluation), semantics in language technology. Lecture: Summarization from Marina Santini
Lecture: Relation Extraction

Topics: databases of relations, knowledge graph, DBpedia, Freebase, ACE, relation extractors, hand-written patterns, supervised machine learning, semi-supervised learning, bootstrapping, distant supervision, unsupervised learning from the web, semantic analysis in language technology. Lecture: Relation Extraction from Marina Santini