Author: Marina Santini

Computational Linguist, PhD

Lecture 9: Machine Learning in Practice (2)

Bag-of-words Representation

Topics: feature representation, unbalanced data, multiclass classification, theoretical modelling, real-world implementations, evaluation, holdout estimation, cross-validation, leave-one-out, bootstrap

Lecture 8: Machine Learning in Practice (1)

Topics: evaluation, t-test, cost-sensitive measures, Occam's razor, k-statistic, lift charts, ROC curves, recall-precision curves, loss function, counting the cost, Weka

Lecture 5: Interval Estimation (ML4LT)

Topics: inferential statistics, statistical inference, language technology, interval estimation, confidence interval, standard error, confidence level, z critical value, confidence interval for proportion, confidence interval for the mean, multiplier

Lecture 4: Decision Trees (Part 2) (ML4LT)

Topics: attribute selection, constructing decision trees, decision trees, divide and conquer, entropy, gain ratio, information gain, machine learning, pruning, rules, surprisal

Book Review: Sequences in Language and Text (2015)

Book review by Marina Santini, published on the LinguistList: http://linguistlist.org/issues/27/27-1505.html. Book announced at http://linguistlist.org/issues/26/26-2205.html. EDITOR: George K. Mikros; EDITOR: Ján Macutek; TITLE: Sequences in Language and Text; SERIES TITLE: Quantitative Linguistics [QL] 69; PUBLISHER: De Gruyter Mouton…

Lecture 3b: Decision Trees (Part 1) (ML4LT)

Decision Tree

slideshare presentation: http://www.slideshare.net/marinasantini1/lecture-3b-decision-trees-1-part Topics: Greediness, Divide and Conquer, Inductive Bias of the Decision Tree, Loss function, Expected loss, Empirical error, Induction.

Dissemination: What sampling size?

Dear All, I am pasting here an interesting discussion I read on the Corpora List some days ago. I think the issue of corpus size is relevant to many of us. Here is the discussion in its entirety: %— start Daniel Elmiger…