Papageno: Predictive Models for Crisis Intelligence

Comments Last Updated: 22 July 2013

Papageno: A Pilot Study to identify suitable Predictive Models for Crisis Intelligence

I need some help compiling real-world use cases for crisis intelligence. Could you please point me to past events or previous experiences that could be useful for a pilot study? “Crisis intelligence” is a new research area that is becoming increasingly important for medium-sized and large organizations and companies. It consists of detecting an upcoming “crisis” (a scandal, general dissatisfaction, or any other negative attitude) by automatically analysing electronic text documents of any kind.

Many commercial and open source solutions have been proposed to identify the “mood” and sentiment of the masses towards a certain event, brand, or person from tweets, blogs, etc. But very little research has been carried out to capture and prevent more subtle phenomena, such as the slow decline of a business model or inefficient communication patterns that negatively affect both internal and external relations and, consequently, a company’s public image.

The aim of this pilot study is to create a suite of predictive models for crisis forecasting. The test bed is the Enron email dataset. The Enron scandal, revealed in October 2001, eventually led to the bankruptcy of the Enron Corporation, an American energy company based in Houston, Texas. The dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. The dataset was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. The Enron email dataset is valuable for many reasons, and it is widely used in research. To my knowledge, it is the only large collection of “real” emails that is publicly available.
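As an illustration of what structuring the raw Enron data might involve, here is a minimal Python sketch that turns one raw message into a structured record using the standard-library `email` package. It assumes each message is a plain-text file with RFC 822-style headers (`From`, `To`, `Date`, `Subject`) followed by a body. The `parse_message` helper and the sample message are hypothetical and are not taken from the corpus.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def parse_message(raw):
    """Turn one raw RFC 822-style message into a flat dict (hypothetical helper)."""
    msg = message_from_string(raw)
    # getaddresses splits a "To" header that lists several recipients
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    body = msg.get_payload()
    return {
        "sender": msg["From"],
        "recipients": recipients,
        "date": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        # multipart messages return a list of parts; this sketch skips them
        "body": body.strip() if isinstance(body, str) else "",
    }


# Hypothetical sample message, not taken from the Enron corpus
SAMPLE = """\
Message-ID: <sample.1@example.com>
Date: Mon, 15 Oct 2001 09:30:00 -0700
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Q3 numbers

Please keep the Q3 numbers confidential until further notice.
"""

record = parse_message(SAMPLE)
print(record["recipients"])  # ['bob@example.com', 'carol@example.com']
```

Once each file is a record like this, messages can be grouped by sender, recipient, or time period for the later analysis steps.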

The initial research question of the pilot study is: “Is it possible to devise and implement predictive models that, by analysing and processing the raw textual data of the emails in the Enron dataset, would have told us that the Enron CRISIS (= scandal & collapse) was going to happen?”

The challenges are: (1) to structure the unstructured data of the Enron dataset; (2) to extract the information that is useful for building predictive models; and (3) to create a set of statistical models that return a “crisis” score.
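Under heavy simplifying assumptions, challenges (2) and (3) can be sketched as a feature extractor plus a logistic scoring function. In this hypothetical Python example, the features are the relative frequencies of two small cue-word groups, and the weights and bias are hand-set placeholders. In a real study they would be learned from labelled data (for instance with logistic regression), and the cue lists themselves would need empirical justification.

```python
import math
import re

# Hypothetical cue-word groups; a real study would derive features empirically.
CUES = {
    "secrecy": {"confidential", "delete", "shred"},
    "stress": {"urgent", "problem", "loss", "lawsuit", "worried"},
}
# Hypothetical weights and bias; in practice they would be fitted to labelled data.
WEIGHTS = {"secrecy": 20.0, "stress": 10.0}
BIAS = -3.0


def extract_features(texts):
    """Relative frequency of each cue group across a batch of email bodies."""
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    total = max(len(tokens), 1)  # avoid dividing by zero on an empty batch
    return {name: sum(t in words for t in tokens) / total for name, words in CUES.items()}


def crisis_score(features):
    """Logistic model: squashes a weighted sum of features into (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


calm = ["Lunch meeting moved to Friday.", "Attached are the slides for the review."]
tense = ["Urgent: delete the confidential files.", "Lawsuit risk is a problem, worried about the loss."]
print(round(crisis_score(extract_features(calm)), 3), round(crisis_score(extract_features(tense)), 3))
```

The logistic form keeps the score in (0, 1), which makes it easy to compare across periods or departments.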

The predictive models could then be applied to any kind of company data to monitor the pulse and overall health of the whole organization, in order to prevent unexpected corporate and social scandals or collapses.
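Assuming a crisis score has already been computed for each period, monitoring could work by tracking a rolling average and raising an alert when it stays high. The window size, the threshold, and the monthly scores below are hypothetical illustration values, not calibrated settings.

```python
from collections import deque


def monitor(scores, window=3, threshold=0.6):
    """Yield (period, rolling_mean, alert) for an ordered sequence of (period, score).

    An alert fires only once a full window is available and its mean reaches
    the threshold. window and threshold are hypothetical, uncalibrated values.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for period, score in scores:
        recent.append(score)
        mean = sum(recent) / len(recent)
        yield period, mean, len(recent) == window and mean >= threshold


# Hypothetical per-month crisis scores for one organisation
monthly = [("M1", 0.20), ("M2", 0.30), ("M3", 0.50),
           ("M4", 0.70), ("M5", 0.80), ("M6", 0.90)]
for period, mean, alert in monitor(monthly):
    print(period, round(mean, 2), "ALERT" if alert else "")
```

Averaging over a window trades some reaction speed for robustness against a single noisy month.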

Any thought, pointer, reflection, suggestion, reference or experience would be very useful.

Thanks in advance


15 comments for “Papageno: Predictive Models for Crisis Intelligence”

  1. Dimitri Popolov
    10 May, 2013 at 19:36

    Hello Marina,

    I have been following the Text Analytics LinkedIn group for some time, as I was involved in that area at IBM. I am between jobs at the moment, so I have some brain power to spend on interesting things rather than profitable ones 🙂

    Since you asked for reflections and thoughts:

    There is a Harvard Business Review article, ‘Leadership Run Amok’, which identifies six styles of leadership: directive, which entails strong, sometimes coercive behavior; visionary, which focuses on clarity and communication; affiliative, which emphasizes harmony and relationships; participative, which is collaborative and democratic; pacesetting, which is characterized by personal heroics; and coaching, which focuses on long-term development and mentoring.

    The article also mentions that certain styles are more conducive to a strong work culture, and that during a crisis certain types of leaders tend to use certain styles of leadership.

    I wonder whether these styles of leadership could be characterised by typical speech patterns. If so, a change in the number of communicative acts of each style used by an organisation’s leaders might serve as an early indicator of a coming crisis.
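One way to prototype Dimitri's idea is to count marker phrases for each leadership style in leaders' messages over two periods, then measure how much the style mix shifts. The marker phrases below are hypothetical placeholders, since the HBR article does not supply lexicons. The sketch covers only three of the six styles, and total variation distance is just one possible measure of shift.

```python
from collections import Counter

# Hypothetical marker phrases for three of the six styles (the article
# provides no such lexicons; these are placeholders for illustration).
MARKERS = {
    "directive": ["do it now", "i expect", "no exceptions"],
    "participative": ["what do you think", "let's discuss", "your input"],
    "coaching": ["long term", "learn from", "develop your"],
}


def style_profile(messages):
    """Share of marker hits per style in a batch of leaders' messages."""
    counts = Counter()
    for text in messages:
        lowered = text.lower()
        for style, phrases in MARKERS.items():
            counts[style] += sum(lowered.count(p) for p in phrases)
    total = sum(counts.values()) or 1  # avoid dividing by zero when nothing matches
    return {style: counts[style] / total for style in MARKERS}


def profile_shift(before, after):
    """Total variation distance between two profiles: 0 = same mix, 1 = disjoint."""
    return 0.5 * sum(abs(before[s] - after[s]) for s in MARKERS)


q1 = ["What do you think about the plan? Let's discuss on Monday.", "Your input is welcome."]
q2 = ["I expect the report today, no exceptions.", "Do it now."]
print(profile_shift(style_profile(q1), style_profile(q2)))  # 1.0
```

A sustained jump in the shift between consecutive periods, rather than any single value, would be the signal worth investigating.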


  2. 11 May, 2013 at 07:09

    Thanx for sharing your experience, Dimitri.

  3. 12 May, 2013 at 09:48

    From GATE (LinkedIn Group):

    obinna onyimadu • In the paper “Out of Sight not out of mind: On the effect of social and physical detachment on information need” by Elad Yom Tov and Fernando Diaz, they tried to model the information needs of users during disasters such as earthquakes and explosions. They also modelled their system on three real-life disasters in the US. While it centres on predicting user behaviour, it is an interesting read and might benefit your work. All the best.

  4. 12 May, 2013 at 09:51

    From Enterprise Search Engine Professionals (LinkedIn Group)

    Bruno Therrien • Hi Marina,
    Interesting piece. This brings to mind the Challenger / Champion approach to managing risk used by banks and credit card issuers (here in Canada and the US).
    This is a BI rather than CI approach but there is a lot available out there on the subject and I’m sure that it has been adapted and applied to other situations.
    One important element to look for in the cases you find is where the risks fall on the crisis event spectrum for medium to large companies.
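In credit risk management, a champion/challenger comparison is commonly described as follows: the incumbent model (the champion) handles most cases, and a random share is routed to a candidate model (the challenger), so that the two track records can be compared before the challenger is promoted. The minimal Python sketch below assumes two hypothetical threshold rules over a precomputed risk score. The rules, the 20% default share, and the case fields are illustrative only.

```python
import random


def champion(case):
    # Hypothetical incumbent rule: flag a case when its risk score exceeds 0.7
    return case["risk"] > 0.7


def challenger(case):
    # Hypothetical candidate rule with a lower threshold
    return case["risk"] > 0.5


def champion_challenger(cases, challenger_share=0.2, seed=0):
    """Route a random share of cases to the challenger; report (n, accuracy) per model."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    hits = {"champion": [], "challenger": []}
    for case in cases:
        if rng.random() < challenger_share:
            name, model = "challenger", challenger
        else:
            name, model = "champion", champion
        hits[name].append(model(case) == case["crisis"])
    return {name: (len(h), sum(h) / len(h) if h else None) for name, h in hits.items()}


# Hypothetical labelled cases: a precomputed risk score and whether a crisis followed
cases = [{"risk": 0.6, "crisis": True}, {"risk": 0.8, "crisis": True},
         {"risk": 0.3, "crisis": False}]
print(champion_challenger(cases, challenger_share=0.5))
```

Keeping the challenger on a small random share limits the cost of a bad candidate while still producing comparable evidence.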


  5. 12 May, 2013 at 09:55

    From Computational Linguistics (LinkedIn Group):

    Bill DeSmedt • Not to belabor the obvious, but there’s always the 9/11/2001 World Trade Center terrorist attack. There were plenty of harbingers, all extensively documented (see, e.g.,).

  6. 12 May, 2013 at 10:01

    From Text Analytics (LinkedIn Group):

    Tamarie Ellis • Hi Marina, there isn’t much data in this document, but it might help:

    Erik Celentano, Dr. Gary Shiffman, Dr. Danielle Sandler, of Giant Oak: Predicting future crime based on structural, institutional, and demographic make-up of neighborhoods where crime occurs
    see this page for a bit more information

  7. 12 May, 2013 at 13:02

    From Internet of Things, Internet of People for a Sustainable Future… (LinkedIn Group):

    Marina Santini • Hi Ina,

    It would be great if you could post them here. Eager to know your suggestions, which are always to the point!

    Ina Lauth • Thank you, Marina, I really appreciate your posts too. I have read about a couple of EU projects on Future Internet Technologies for Crisis Management. The largest of them is called PANDORA (7FP-ICT), and from it you can find all the other running EU projects in this area (look in the CORDIS 7FP projects database). The project called FUTURE-ICT also has a few case studies on predictive modeling for crisis management.

  8. Graham Bennett
    16 May, 2013 at 05:56

  9. 26 June, 2013 at 11:50

    Thanx Graham!

  10. 22 July, 2013 at 11:00

    Very interesting work on prediction by Kira Radinsky

    Kira suggests:
    “1. Many of the AI and machine learning researchers who mine web data do not rely only on big events. Specifically, in my work we have seen that smaller events, like a funeral announcement in a local newspaper followed by other types of events, may lead to a crisis.
    2. There are works that investigate Twitter sentiment for many tasks, including riot prediction:
    I have seen many papers trying to use Twitter for different sorts of predictions. Similar work has been done on query log analysis and prediction. I experimented with some of those many years ago using Google Trends, and since then it has become common practice in many web mining applications.”
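Kira's first point, that small events followed by other events can precede a crisis, suggests a simple baseline: for each event type, measure how often it is followed by a crisis within a fixed window. The sketch below is a plain co-occurrence count, not Radinsky's method. The event labels, the 30-day window, and the timeline are hypothetical.

```python
from collections import defaultdict


def precursor_rates(timeline, target="crisis", window=30):
    """For each non-target event type, the fraction of its occurrences that are
    followed by a `target` event within `window` days.

    timeline: iterable of (day, event_type) pairs in any order.
    """
    events = sorted(timeline)
    target_days = [day for day, kind in events if kind == target]
    seen = defaultdict(int)
    followed = defaultdict(int)
    for day, kind in events:
        if kind == target:
            continue
        seen[kind] += 1
        if any(day < t <= day + window for t in target_days):
            followed[kind] += 1
    return {kind: followed[kind] / seen[kind] for kind in seen}


# Hypothetical event timeline (day number, event type)
timeline = [
    (1, "funeral_announcement"), (5, "protest"), (20, "crisis"),
    (40, "funeral_announcement"), (100, "protest"), (110, "crisis"),
    (200, "funeral_announcement"),
]
print(precursor_rates(timeline))
```

With a real event archive, rates like these would need comparing against each event type's base rate before being read as predictive.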

  11. 22 July, 2013 at 11:07

    Recent work on disaster detection:

    Detecting Natural Disaster Events on Twitter across Languages

    Authors: Andrea Zielinski
    Pages 291–301, DOI: 10.3233/978-1-61499-262-2-291
    Book: Frontiers in Artificial Intelligence and Applications,
    Volume 254: Intelligent Interactive Multimedia Systems and Services

    Social media such as Twitter can act as a human sensor network for real-time event detection and recently has been extensively exploited for crisis management. However, little attention has been paid to applying text mining and NLP techniques to monitor events in a multilingual setting and most of the work focuses on one single language only. This paper investigates a unified framework for detecting natural disaster events on Twitter across a variety of languages, and is embedded into a larger system for real-time decision support for Natural Crisis Management. This work presents the results achieved in classifying tweets in various languages. Among the four ML classifiers we evaluated for each language, the Random Forest classifier produces the best results, achieving on average 85.02% accuracy for known languages. We also bootstrapped classifiers for unknown languages by cross-language text classification. In this case, the average accuracy drops to 66.64% for the Random Forest classifier. Our results based on a specific test scenario indicate that for a timely detection of an earthquake event it is important to consider the distribution of languages spoken at the location of the event.
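The paper reports Random Forest as the best of the four classifiers it evaluated per language. To keep the following sketch dependency-free, it swaps in a different and simpler technique, multinomial naive Bayes with add-one smoothing. It shows the one-classifier-per-language setup on a tiny hypothetical English training set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(text):
                logp += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical toy training data; one classifier per language code.
TRAIN = {
    "en": (["strong earthquake shaking the city", "felt an earthquake just now",
            "great concert in the city tonight", "just now watching the game"],
           ["disaster", "disaster", "other", "other"]),
}
classifiers = {lang: NaiveBayes().fit(texts, labels) for lang, (texts, labels) in TRAIN.items()}
print(classifiers["en"].predict("earthquake shaking our house"))  # disaster
```

Adding a language means adding another entry to `TRAIN`. Python's `\w` is Unicode-aware, so the tokenizer also handles non-Latin scripts such as Greek.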

  12. 22 July, 2013 at 11:12


    Multilingual Analysis of Twitter News in Support of Mass Emergency Events

    Andrea Zielinski & Ulrich Bügel

    Social media are increasingly becoming a source for event-based early warning systems in the sense that they can help to detect natural disasters and support crisis management during or after disasters.
    In this work-in-progress paper we study the problems of analyzing multilingual Twitter feeds for emergency events. The present work focuses on English as “lingua franca” and on under-resourced Mediterranean languages in endangered zones, particularly Turkey, Greece, and Romania, since local civil protection authorities and the population are generally likely to respond in their native language. We investigated ten earthquake events and defined four language-specific classifiers that can be used to detect earthquakes by filtering out irrelevant messages that do not relate to the event. The final goal is to extend this work to more Mediterranean languages and to classify and extract relevant information from tweets, translating the main keywords into English.
    Preliminary results indicate that such a filter has the potential to confirm forecast parameters of tsunami affecting coastal areas where no tide gauges exist and could be integrated into seismographic sensor networks.
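A much simpler stand-in for the language-specific filters described above is a per-language keyword lexicon that also maps local terms to an English keyword. The sketch assumes each tweet's language tag is already known (for example from metadata) and uses small illustrative lexicons. The paper itself uses trained classifiers, not keyword lists.

```python
import re

# Hypothetical, deliberately tiny lexicons mapping local terms to an English keyword.
KEYWORDS = {
    "en": {"earthquake": "earthquake", "quake": "earthquake"},
    "tr": {"deprem": "earthquake"},
    "el": {"σεισμός": "earthquake"},
    "ro": {"cutremur": "earthquake"},
}


def filter_relevant(tweets):
    """Keep (lang, text) tweets that hit their language's lexicon.

    Returns (lang, text, english_keywords) triples; unknown languages are dropped.
    """
    kept = []
    for lang, text in tweets:
        lexicon = KEYWORDS.get(lang, {})
        tokens = re.findall(r"\w+", text.lower())
        english = sorted({lexicon[t] for t in tokens if t in lexicon})
        if english:
            kept.append((lang, text, english))
    return kept


stream = [
    ("tr", "Deprem oldu, herkes sokakta"),
    ("el", "Ισχυρός σεισμός στην Αθήνα"),
    ("ro", "Vreme frumoasă azi"),
    ("en", "Felt a quake in Izmir"),
]
for lang, text, english in filter_relevant(stream):
    print(lang, english, text)
```

Exact token matching misses inflected forms (for example Romanian definite forms), which is one reason trained classifiers are preferable in practice.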

  13. 22 July, 2013 at 11:20


    Getting There First: Real-Time Detection of Real-World Incidents on Twitter
    Social Event Detection on Twitter
    Knowledge based Social Network Applications to Disaster Event Analysis

    Proceedings of the International MultiConference of Engineers and Computer Scientists 2013 Vol I,
    IMECS 2013, March 13 – 15, 2013, Hong Kong

  14. 22 July, 2013 at 11:20


    On-line Trend Analysis with Topic Models: #twitter trends detection topic model online
    Jey Han Lau (1,2), Nigel Collier (3), Timothy Baldwin (1,2)

    (1) Dept. of Computing and Information Systems, The University of Melbourne, Australia
    (2) NICTA Victoria Research Laboratory, Australia
    (3) National Institute of Informatics, Japan
    We present a novel topic modelling-based methodology to track emerging events in microblogs such as Twitter. Our topic model has an in-built update mechanism based on time slices and implements a dynamic vocabulary. We first show that the method is robust in detecting events using a range of datasets with injected novel events, and then demonstrate its application in identifying trending topics in Twitter.

    KEYWORDS: Topic Model, Twitter, Trend Detection, Topic Evolution, Online Processing
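The paper's method is an online topic model with time-slice updates and a dynamic vocabulary, which is too involved for a short example. As a deliberately simpler stand-in that shares the time-slice and growing-vocabulary ideas, the sketch below flags individual terms whose count in a slice far exceeds their average over earlier slices. The `ratio` and `min_count` parameters and the sample slices are hypothetical.

```python
import re
from collections import Counter


def trending_terms(slices, ratio=3.0, min_count=2):
    """Flag terms whose count in a time slice is at least `ratio` times their
    (add-one smoothed) average count over earlier slices.

    slices: list of lists of messages, one inner list per time slice.
    Returns a list of (slice_index, sorted_trending_terms).
    """
    history = Counter()  # cumulative counts; the vocabulary grows as terms appear
    results = []
    for i, messages in enumerate(slices):
        current = Counter(t for m in messages for t in re.findall(r"\w+", m.lower()))
        trending = []
        for term, count in current.items():
            past_avg = history[term] / i if i else 0.0  # unseen terms average 0
            if count >= min_count and count >= ratio * (past_avg + 1):
                trending.append(term)
        results.append((i, sorted(trending)))
        history.update(current)
    return results


# Hypothetical microblog slices
slices = [
    ["the game tonight", "the weather is nice"],
    ["the game was great", "nice weather today"],
    ["earthquake in the city", "earthquake felt here", "big earthquake"],
]
print(trending_terms(slices))
```

Unlike a topic model, this groups nothing: each term is tested in isolation, so related terms for the same event are reported separately.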

  15. 22 July, 2013 at 11:23


    A Support Platform for Event Detection using Social Intelligence

    Timothy Baldwin, Paul Cook, Bo Han, Aaron Harwood, Shanika Karunasekera and Masud Moshtaghi
    Department of Computing and Information Systems
    The University of Melbourne

    This paper describes a system designed to support event detection over Twitter. The system operates by querying the data stream with a user-specified set of keywords, filtering out non-English messages, and probabilistically geolocating each message. The user can dynamically set a probability threshold over the geolocation predictions, and also the time interval to present data for.
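The filtering steps described in this abstract (keyword query, English-only filter, user-set geolocation probability threshold, and time interval) can be sketched as a single selection function. The `Message` record, its fields, and the sample data are hypothetical. The sketch assumes that language tags and geolocation probabilities have already been computed upstream.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    # Hypothetical record layout; language and geolocation assumed computed upstream
    text: str
    lang: str
    location: str    # most probable location predicted for the message
    geo_prob: float  # predicted probability of that location
    time: datetime


def select(messages, keywords, prob_threshold, start, end):
    """Keyword query, English-only filter, geolocation threshold, time interval [start, end)."""
    keywords = {k.lower() for k in keywords}
    return [
        m for m in messages
        if m.lang == "en"
        and keywords & set(re.findall(r"\w+", m.text.lower()))
        and m.geo_prob >= prob_threshold
        and start <= m.time < end
    ]


msgs = [
    Message("Bushfire smoke over the highway", "en", "Melbourne", 0.82, datetime(2013, 1, 1, 14)),
    Message("Smoke everywhere, stay safe", "en", "Sydney", 0.35, datetime(2013, 1, 1, 15)),
    Message("Humo en la carretera", "es", "Madrid", 0.90, datetime(2013, 1, 1, 15)),
    Message("Nice sunset tonight", "en", "Melbourne", 0.90, datetime(2013, 1, 1, 19)),
]
window = (datetime(2013, 1, 1, 12), datetime(2013, 1, 1, 18))
for m in select(msgs, {"smoke", "fire"}, 0.5, *window):
    print(m.location, m.text)
```

Because the threshold and interval are plain arguments, a user interface could re-run the selection each time the user moves a slider.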
