Last Updated Comments: 22 July 2013
Papageno: A Pilot Study to identify suitable Predictive Models for Crisis Intelligence
I need some help collecting real-world use cases for crisis intelligence. Could you please point me to past events or previous experiences that could be useful for a pilot study? “Crisis intelligence” is a new research area that is becoming increasingly important in medium-sized and large organizations and companies. It consists of detecting an upcoming “crisis” (a scandal, general dissatisfaction, or any negative attitude) by automatically analysing text documents of any kind in electronic format.
Many commercial and open source solutions have been proposed to identify the “mood” and sentiment of the masses with respect to a certain event, brand, or person through tweets, blogs, etc. But very little research has been carried out to capture and prevent more subtle phenomena, such as the slow decline of a business model or inefficient communication patterns that negatively affect both internal and external relations and, consequently, companies’ public image.
The aim of this pilot study is to create a suite of predictive models for crisis forecasting. The test bed is the Enron email dataset. The Enron scandal, revealed in October 2001, eventually led to the bankruptcy of the Enron Corporation, an American energy company based in Houston, Texas. The dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. The dataset was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. The Enron email dataset is valuable for many reasons, and it is widely used in research. To my knowledge it is the only large collection of “real” emails that is publicly available.
The initial research question of the pilot study is: “Is it possible to devise and implement predictive models that would have told us that the Enron CRISIS (= scandal & collapse) was going to happen, by analysing and processing the raw textual data of the emails in the Enron dataset?”.
The challenges are: (1) to structure the unstructured data of the Enron dataset; (2) to extract the information that is useful for creating predictive models; (3) to create a set of statistical models that return a “crisis” score.
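To make the three challenges more concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under my own assumptions: the function names, the tiny term list, and the sample message are all hypothetical, and the "score" is a naive monthly average of negative-term frequency, not a real statistical model.

```python
import re
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical term list, purely illustrative; not derived from the Enron data.
NEGATIVE_TERMS = {"problem", "concern", "loss", "risk", "investigation"}

# Made-up sample message in the usual raw email layout (headers, blank line, body).
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Date: Mon, 15 Oct 2001 09:30:00 -0500\n"
    "Subject: Q3 numbers\n"
    "\n"
    "There is a problem with the loss figures.\n"
)

def parse_email(raw):
    """Challenge (1): turn one raw message into a structured record."""
    msg = message_from_string(raw)
    # Multipart messages are skipped here to keep the sketch short.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "sender": msg.get("From", ""),
        "recipients": [r.strip() for r in msg.get("To", "").split(",") if r.strip()],
        "date": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
        "subject": msg.get("Subject", ""),
        "body": body,
    }

def negative_term_rate(record):
    """Challenge (2): one toy feature, the share of tokens found in NEGATIVE_TERMS."""
    tokens = re.findall(r"[a-z']+", record["body"].lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in NEGATIVE_TERMS) / len(tokens)

def monthly_score(records):
    """Challenge (3), very naively: average the feature per (year, month)."""
    buckets = defaultdict(list)
    for rec in records:
        if rec["date"] is not None:
            buckets[(rec["date"].year, rec["date"].month)].append(negative_term_rate(rec))
    return {month: sum(vals) / len(vals) for month, vals in buckets.items()}
```

Calling `monthly_score([parse_email(SAMPLE)])` gives one value per month, so a rising trend over time could be inspected. A real model would need far richer features (for example who writes to whom, and how often) and proper validation.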
The predictive models could then be applied to any kind of company data to monitor the pulse and general state of health of the whole organization, in order to prevent unexpected corporate and social scandals or collapses.
Any thought, pointer, reflection, suggestion, reference or experience would be very useful.
Thanks in advance