The WebGenre Blog: The power of genre applied to digital information. By Marina Santini » Archive

ML4LT: Machine Learning for Language Technology – A Gentle Introduction

Last updated: 27 Feb 2017. Log: Debriefing available (Jan 2016). Marina Santini’s contact details: marinasantini dot ms at g-m-a-i-l

ML4LT is an online, self-paced introductory course in Machine Learning for Language Technology. It has been designed for linguists and for undergraduate students in Computational Linguistics. The course includes 10 lectures, both theoretical and practical. The practical part relies on the Weka Machine Learning Workbench (free software); see Lab1 for installation.

The content of this page is based on selected material from the course “ML4LT: Machine Learning for Language Technology 2016, Undergraduate Students”, Uppsala University. I will update this page regularly with links, videos, labs, assignments and literature. When visiting this page, keep an eye on the “last updated” date. The course and the linked material will be updated and upgraded … Read entire article »

Filed under: featured, lectures, slides

Book Review: The Personal Weblog (2016)

Draft version.

AUTHOR(S): Schildhauer, Peter
TITLE: The Personal Weblog
SUBTITLE: A Linguistic History
SERIES: Hallesche Sprach- und Textforschung. Language and Text Studies. Recherches linguistiques et textuelles – Band 14
YEAR: 2016
PUBLISHER: Peter Lang AG
ISBN13: 9783631662748
ANNOUNCED IN: http://linguistlist.org/issues/27/27-2198.html

Introduction

“The Personal Weblog: A Linguistic History” is a monograph that describes and interprets the evolution of the personal weblog genre. The study is corpus-based, and the corpus was created using material from The Internet Archive. The volume is written in English and is based on the author’s PhD thesis (p. 17), originally written in German. The book is recommended to anyone interested in genre analysis, genre evolution, genre classification, and blog genre analysis.

Summary

The volume “The Personal Weblog: A Linguistic History” has 308 pages. It includes Acknowledgements, Contents Overview, Table … Read entire article »

Filed under: featured, reviews

Lecture: Semantic Word Clouds

Topics: folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, Wordle, context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology. Lecture: Semantic Word Clouds from Marina Santini … Read entire article »

Filed under: featured, lectures, slides

Lecture: Ontologies and the Semantic Web

Topics: Semantic Web, Web 3.0, shared understanding, shared semantic annotation, tree of Porphyry, ontology, WordNet, MeSH, RDF, IRI, description logics, DLs, OWL, WebProtege, domain-specific, SPARQL, tags, ontology learning, classes, relations, axioms, instances, semantics in language technology. Lecture: Ontologies and the Semantic Web from Marina Santini … Read entire article »

Filed under: featured, lectures, slides

Lecture: Summarization

Topics: abstracting, extractive summarization, abstractive summarization, summarization in question answering, single vs. multiple documents, query-focused summarization, snippets, unsupervised content selection, topic signature-based content selection, ROUGE, Recall-Oriented Understudy for Gisting Evaluation, semantics in language technology. Lecture: Summarization from Marina Santini … Read entire article »

Filed under: featured, lectures, slides

Lecture: Relation Extraction

Topics: databases of relations, knowledge graph, DBpedia, Freebase, ACE, relation extractors, hand-written patterns, supervised machine learning, semi-supervised learning, bootstrapping, distant supervision, unsupervised learning from the web, semantic analysis in language technology. Relation Extraction from Marina Santini … Read entire article »

Filed under: featured, lectures, slides

Course: Probability and Statistics for Language Technology

Uppsala University – Department of Linguistics and Philology. Topics: elementary concepts in probability theory, such as unconditional and conditional probability, Bayes’ theorem, and the law of total probability; elementary concepts in statistics, such as sample, estimation, and hypothesis testing. … Read entire article »

Filed under: announcements, featured, TOC

Lecture: Question Answering

Topics: IBM’s Watson, Apple’s Siri, WolframAlpha, factoid questions, complex questions, narrative questions, IR-based approaches, knowledge-based approaches, hybrid approaches, IR-based question answering, answer type taxonomy, passage retrieval, mean reciprocal rank, MRR, semantic analysis in language technology. Lecture: Question Answering from Marina Santini … Read entire article »

Filed under: featured, lectures, slides

Lecture: IE – Named Entity Recognition (NER)

Topics: Information Extraction, Named Entity Recognition, NER, text analytics, text mining, e-discovery, unstructured data, structured data, calendaring, standard evaluation per entity, standard evaluation per token, sequence classifier, sequence labeling, word shapes, semantic analysis in language technology. IE: Named Entity Recognition (NER) from Marina Santini … Read entire article »

Filed under: featured, lectures, slides

Lecture: Vector/Distributional Semantics

Topics: term-context matrix, distributional models, Zellig Harris, John Rupert Firth, PMI, Pointwise Mutual Information, PPMI, Positive Pointwise Mutual Information, joint probability, marginals, smoothing, cosine metric, cosine similarity measure, dot product, vectors. Lecture: Vector Semantics (aka Distributional Semantics) from Marina Santini … Read entire article »

Filed under: featured, lectures, slides