
ML4LT: Machine Learning for Language Technology – A Gentle Introduction

— Last Updated: 27 Feb 2017 —

Log: Debriefing available (Jan 2016)

Marina Santini’s contact details: marinasantini dot ms at g-m-a-i-l

ML4LT is an online self-paced introductory course in Machine Learning for Language Technology. It has been designed for linguists and for undergraduate students in Computational Linguistics. The course includes 10 lectures, both theoretical and practical. The practical part relies on the Weka Machine Learning Workbench (free software). [See Lab1 for installation].

The content of this page is based on selected material from the course: “ML4LT: Machine Learning for Language Technology 2016, Undergraduate Students”, Uppsala University.

I will update this page regularly with links, videos, labs, assignments and literature. When visiting this page, keep an eye on the "last updated" date: the course and the linked material will be updated and upgraded continuously.

Prerequisites: elements of statistics and probability theory.

Note: all the video clips are also available on YouTube.

Lecture 1: What is Machine Learning? – Slides (1); Videos (1, 2, 3). Reading: Handouts (1, 2); Witten et al. (2011: Ch 1). Extra: What’s Machine Learning by Andrew Ng.

Lecture 2: Basic Concepts (slides: 1, 2). Lab1. Videos: (1, 2, 3, 4). Reading: Handout; Daumé III (2015: 8-10; 19-24; 26-28); Witten et al. (2011: Ch 2; Ch 11: 407-410).

Lecture 3: Decision Trees (slides: 1, 2, 3). Lab2. Videos: (1, 2, 3, 4). Reading: Transcripts; Daumé III (2015: 10-18), Witten et al. (2011: 99-108; 192-203), Mitchell (1997: Ch3).
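The labs build trees with Weka's J48, but the core idea behind tree induction (as presented in Witten et al. and Mitchell) is choosing the attribute whose split yields the highest information gain. As a language-agnostic illustration, here is a minimal Python sketch; the class counts come from the classic "play tennis" weather dataset used throughout Witten et al.:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, splits):
    """Information gain of splitting parent_counts into the given subsets."""
    total = sum(parent_counts)
    remainder = sum(sum(s) / total * entropy(s) for s in splits)
    return entropy(parent_counts) - remainder

# Weather data: 9 "yes" / 5 "no" overall.
# Splitting on Outlook gives sunny [2,3], overcast [4,0], rainy [3,2].
print(round(entropy([9, 5]), 3))                              # ≈ 0.940
print(round(info_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # ≈ 0.247
```

The tree learner applies this greedily: it splits on the highest-gain attribute, then recurses on each subset until the leaves are (nearly) pure.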

Lecture 4: Evaluation (slides: 1, 2, 3). Videos: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). Lab3. Reading: Transcripts; Daumé III (2015: 60-67); Witten et al. (2011: Ch 5).
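The evaluation lecture covers the standard metrics computed from a confusion matrix. A minimal sketch in Python, with a made-up spam/ham example (Weka reports the same figures per class in its classifier output):

```python
def evaluate(gold, predicted, positive):
    """Accuracy, precision, recall and F1 for one class, from paired labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

gold = ["spam", "spam", "ham", "ham", "spam"]
pred = ["spam", "ham", "ham", "spam", "spam"]
acc, p, r, f1 = evaluate(gold, pred, positive="spam")
print(acc, p, r, f1)  # 0.6, 2/3, 2/3, 2/3
```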

Lecture 5: k-Nearest Neighbours. Lab4. Reading: Daumé III (2015: 26-32, excl. 2.4); Witten et al. (2011: 131-138).
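k-NN (Weka's IBk) needs no training beyond storing the data: a query is labelled by majority vote among its k nearest training instances. A minimal sketch with Euclidean distance and toy data of my own:

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify query by majority vote among the k nearest training points.
    train is a list of ((features...), label) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b")]
print(knn_classify(train, (2, 2), k=3))  # "a"
```

Note that distance-based learners like this are sensitive to attribute scaling, which is why normalisation matters in the labs.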

Lecture 6: Naive Bayes. (Slides: 1, 2). Lab5. Reading: Daumé III (2015: 53-59; 107-110); Witten et al. (2011: 90-99).
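Naive Bayes picks the class maximising the prior times the product of per-feature likelihoods, assuming the features are independent given the class. A minimal sketch for word features with add-one (Laplace) smoothing, on an invented spam/ham toy corpus; log probabilities are used to avoid underflow:

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns class counts,
    per-class word counts, the vocabulary, and the corpus size."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab, len(docs)

def classify_nb(model, words):
    """Return the class with the highest (smoothed) log posterior."""
    class_counts, word_counts, vocab, n = model
    best, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / n)  # log prior
        total = sum(word_counts[label].values())
        for w in words:
            if w in vocab:  # ignore unseen words
                score += log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [("buy cheap pills".split(), "spam"),
        ("cheap pills online".split(), "spam"),
        ("meeting at noon".split(), "ham"),
        ("lunch meeting tomorrow".split(), "ham")]
model = train_nb(docs)
print(classify_nb(model, "cheap pills".split()))  # "spam"
```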

Lecture 7: Perceptron – Attribute Transformation. Weka Tutorial: Discretization. Lab6. Reading: Daumé III (2015: 39-52); Witten et al. (2011: 305-308; 314-315; 322-323; 328-329; 331-332; 334).
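The perceptron learns a linear decision boundary by a simple error-driven rule: when a prediction is wrong, nudge the weights towards (or away from) the misclassified example. A minimal sketch on the logical AND function, which is linearly separable and so learnable by a perceptron (data and function names are my own):

```python
def train_perceptron(data, epochs=10, lr=1.0):
    """Learn weights and bias with the perceptron update rule.
    data: list of (features, label) pairs with labels in {0, 1}."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = y - pred  # 0 if correct; +1 or -1 if wrong
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

On non-separable data (e.g. XOR) this loop never converges, which motivates the attribute transformations discussed in the same lecture.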

Lecture 8: k-Means Clustering. (Slides: 1 [by AndrewNg]; 2). Videos: (1, 2, 3, 4, 5). Lab7. Reading: Transcripts; Daumé III (2015: 32-33); Witten et al. (2011: 138-141).
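k-means (Weka's SimpleKMeans) alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal sketch of Lloyd's algorithm for 1-D points, with toy data of my own:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm for 1-D points: assign each point to the nearest
    centroid, then recompute each centroid as its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
print(kmeans(points, k=2))  # centroids near 1.0 and 10.0
```

Since the result depends on the random initialisation, k-means is typically restarted several times and the clustering with the lowest within-cluster distance is kept.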

Lecture 9: Hierarchical Clustering (slides). Lab8. Videos: (1, 2, 3, 4, 5, 6). Reading: Witten et al. (2011: 273-284); Evaluation of Clustering.
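Agglomerative hierarchical clustering starts from singleton clusters and repeatedly merges the two closest ones; the linkage criterion defines "closest". A minimal sketch with single linkage (distance between the closest pair of members) on 1-D toy data, stopping once the nearest clusters are farther apart than a threshold:

```python
def single_link_clusters(points, threshold):
    """Agglomerative clustering with single linkage: repeatedly merge the
    two closest clusters until the smallest inter-cluster distance
    exceeds the threshold. 1-D points for simplicity."""
    clusters = [[p] for p in points]

    def link(a, b):  # single link: distance between the closest pair
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        if link(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(single_link_clusters([1.0, 1.5, 2.0, 8.0, 8.4], threshold=1.0))
# [[1.0, 1.5, 2.0], [8.0, 8.4]]
```

Running the merges to completion instead of stopping at a threshold yields the full dendrogram; complete or average linkage only changes the `link` function.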

Lecture 10: Wrapping up. (Slides: 1, 2, 3). Videos: (1, 2, 3). Optional Lab by Svetlana S. Aksenova. Reading: Domingos (2012).

Course Literature

– Hal Daumé III (2015). A Course in Machine Learning. Copyright © 2015. Only chapters specified in the timetable.
– Ian H. Witten, Eibe Frank, Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition. Morgan Kaufmann Publishers. Only chapters specified in the timetable. You can also use the 2nd edition (freely available online). (Fourth Edition is available in Europe in January 2017: http://www.cs.waikato.ac.nz/ml/weka/book.html).
– Pedro Domingos (2012). A Few Useful Things to Know about Machine Learning. Communications of the ACM, 55(10), 78-87.
– Evaluation of Clustering, in C. D. Manning, P. Raghavan & H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press. Website: http://informationretrieval.org/
