Automatic Language Analysis for Suicide Prevention

Text/Content Analytics for Suicide Prevention (II)

Last Updated: 2nd October 2012

Last week I posted a request on several LinkedIn groups about analysing the language of suicidal writers, asking for pointers to previous studies and existing material that could enrich the list of references proposed as a starting point (see here). Noteworthy suggestions and useful reflections are summarized below:

The work of James Pennebaker in Texas. His group has its own corpus-like tool (LIWC), which has been used for this purpose too. He did an analysis of the writings of Sylvia Plath and other poets (several of whom met tragic ends at their own hands) and had some very interesting findings, about their use of pronouns in particular.
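To make the pronoun idea concrete, here is a minimal sketch of counting first-person pronouns as a share of all words in a text. It is not LIWC and does not use LIWC's dictionaries: the two word lists, the function name `pronoun_rates` and the sample sentence are assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists for illustration only;
# these are NOT the LIWC dictionaries.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(text):
    """Return each pronoun category's share of all word tokens in text."""
    # Very rough tokenizer: lowercase runs of letters and apostrophes.
    # Contractions such as "I'm" stay as one token and are not counted.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"first_singular": 0.0, "first_plural": 0.0}
    counts = Counter(tokens)
    singular = sum(counts[w] for w in FIRST_PERSON_SINGULAR)
    plural = sum(counts[w] for w in FIRST_PERSON_PLURAL)
    return {
        "first_singular": singular / total,
        "first_plural": plural / total,
    }


# Neutral, made-up sample sentence.
sample = "I walked home and we talked about my day."
rates = pronoun_rates(sample)
print(rates)
```

Rates like these only become meaningful when compared across many texts or against an author's own baseline; a single number says very little on its own.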

• References:

  1. Pennebaker did a study on the language of suicidal poets, and also on depressed and depression-vulnerable college students (among other language and mental health studies). You can download most of his articles from his website.
  2. Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan and Amit P. Sheth. Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification. In Proceedings of International Conference on Social Computing (SocialCom), 2012: (detection of emotions (e.g., joy, love, sad, anger, etc.) in social media posts at the sentence level)
  3. Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical Informatics Insights, vol. 5 (Suppl. 1) pp. 137-145, 2012.
  4. The journal Biomedical Informatics Insights has a special issue on detecting emotions in suicide note sentences. The focus was on 7 negative and 6 positive emotions (exploratory research on suicidal language).
  5. Carole E. Chaski, Suicide Note Assessment with Quantitative and Qualitative Methods (abstract)
  6. Carole E. Chaski, Is this a Real Suicide Note? Authentication Using Statistical Classifiers and Computational Linguistics (presentation)
  7. Results of a shared task on classifying emotions in suicide notes held last year. The proceedings are available online. Some titles: Statistical and Similarity Methods for Classifying Emotion in Suicide Notes; Rule-based and Lightly Supervised Methods to Predict Emotions in Suicide Notes; Three Hybrid Classifiers for the Detection of Emotions in Suicide Notes; Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes; Discovering Fine-grained Sentiment in Suicide Notes; etc.
  8. NLP on clinical notes to predict suicide outcomes (Rodney Nielsen was the one who did much of the NLP research):

    Heather D. Anderson, Wilson D. Pace, Elias Brandt, Rodney D. Nielsen, David R. West, Richard R. Allen, Anne M. Libby, and Robert J. Valuck. (2011). Methods for enhanced identification and detection of suicidality outcomes in observational comparative effectiveness and safety research. In The Third Symposium on Comparative Effectiveness Research Methods (Methods for Developing and Analyzing Clinically Rich Data for Patient-Centered Outcomes Research). Rockville, Maryland, June 6-7, 2011.

    Wilson Pace, Rodney D. Nielsen, Heather Anderson, Robert Valuck, Elias Brandt, and David R. West. (2010). Data Additions Related to Depression Care through Natural Language Processing. A report to Agency for Healthcare Research and Quality: Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) Program. November, 2010.

• Comments: Carole E. Chaski’s comment: “SNARE (Suicide Note Assessment REsearch) is a suicide note classifier, part of ALIAS (Automated Linguistic Identification and Assessment System) and available to vetted and trained users (law enforcement, psychologists, psychiatrists, security and intelligence analysts, and researchers) through the web (web_ALIAS). I have given some talks about this at the American Academy of Forensic Sciences and other conferences. Please contact me for powerpoints etc at cchaski at ALIAS technology dot com. Basically, I have a database of ~400 real suicide notes and ~500 control documents, and the classifier runs at 86% (leave-one-out cross-validated) accuracy for notes larger than 45 words and 80% for longer notes. The longer the notes the more they get easily confused with similar types of texts such as apologies, love letters and such.”

• Suicide prevention’s controversial issues: One issue that has been pointed out concerns “false alarms”, i.e. flagging people who are just having a bad time rather than people who really intend to take their own life. A more important issue is about “freedom” and “authorities’ control”: does suicide prevention imply having some authority monitoring what citizens are writing and reacting to it in some way? What is more, is it ethical to prevent people (also by force) from committing suicide rather than letting them do what they want?

• Handwriting analysis: Kimmon Iannetta’s work with handwriting and violence? You can go to her website. She is a wealth of knowledge and very helpful. Handwriting reveals how we act (behavioral), think (word selection) and feel (changes from personal baseline). Suicidal tendencies in handwritten notes tend to show compression, baseline deterioration and pressure pattern changes.

• Features: (a) Semantic analysis must surely give some discriminant features, but what about analyzing the way people type? E.g., gothic people could have a morbid sense of humour, and other people could just be depressed without ever attempting suicide; all these people would be flagged as false positives. It is a precision problem. The way people type would give some information about the person's internal emotional state and might be useful. (b) Apparently, few people wake up and suddenly decide to commit suicide. In other words, one could expect a history of communication with tell-tale signs, for example on blogs or Twitter. So a timeline of connected documents sharing the same kind of tone and keywords might be a useful discriminant feature. (c) It would be interesting to define risky behavioral patterns and then search for them, as in fraud detection. (d) James Pennebaker mentioned function words as indicators of depression and suicidal tendencies in his invited talk at this year's NAACL, so you might want to check out his book “The Secret Life of Pronouns” and the associated research publications. (e) A research group at SRI has been working on speech analysis and depression, using prosodic patterns, etc., to pick up on affective state.
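Suggestion (b) could be sketched roughly as follows. This is only an illustration under stated assumptions: it presumes that some upstream classifier has already given each dated document a numeric score, and the function name `longest_flagged_run`, the threshold and the scores are all made up here rather than taken from any of the studies above.

```python
from datetime import date


def longest_flagged_run(posts, threshold):
    """Length of the longest run of consecutive posts (in date order)
    whose score is at or above the threshold."""
    best = 0
    current = 0
    # Sort by date so that "consecutive" means consecutive in time.
    for _, score in sorted(posts, key=lambda p: p[0]):
        if score >= threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0  # the run is broken by an unflagged post
    return best


# Made-up (date, score) pairs standing in for the output of a
# hypothetical per-document classifier.
posts = [
    (date(2012, 9, 3), 0.2),
    (date(2012, 9, 1), 0.7),
    (date(2012, 9, 5), 0.8),
    (date(2012, 9, 8), 0.9),
    (date(2012, 9, 12), 0.1),
]
run = longest_flagged_run(posts, threshold=0.6)
print(run)
```

The point of using the run length as a feature, rather than any single document's score, is the precision problem raised in (a): one dark post is weak evidence, while a sustained sequence with the same tone is harder to explain away.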

Big Thanks to the following people for their great suggestions and for liking the discussion:
Zsofia Demjen, Wenbo Wang, Amit Sheth, Pawel Matykiewicz, Hector-Hugo Franco-Penya, Daniel Lindmark, Stephanus van Schalkwyk, Przemyslaw Maciolek, Chaker Jebari, Costas Gabrielatos, Federica Ferrari, Christian Bauckhage, R. David Weaver, Carole E. Chaski, Sylvie Dalbin, Frank Marsh, Marcel Elfers, Serena Pasqualetto, Alison Rush, Kim Luyckx, Jelena Mitrovic, Kevin Bougé, Gideon Kotzé, Florian Laws, Aaron Lawson, Isabel Picornell, Jonathon Read, Lee Becker … did I forget anyone?

4 comments for “Automatic Language Analysis for Suicide Prevention”

  1. 26 September, 2012 at 08:25

    Dear Marina,

    Are you aware of the competition that was the basis of the Biomedical Informatics Insights special issue? In 2011, the Computational Medicine Center (Cincinnati, Ohio) organized this competition to detect emotions conveyed in suicide notes. On the website, you can download samples of the data.

    Best regards,


  2. 26 September, 2012 at 08:42

    Dear Kim,

    thanks for this update. I was not aware of it. That’s great!

  3. 30 October, 2012 at 22:34

    Have you considered crowdsourcing as a tool for this? A system could identify candidate texts and then, after stripping identifying information if the source isn’t public, send them out to a crowdsourced tool. Volunteers could then confirm to the central system whether the text indicates suicidal ideation, and the central system could alert someone in a pool of trained counselors. This would remove most false positives.

  4. Marina Santini
    22 February, 2013 at 18:19

    Thanks, Eric, for your suggestion.
