The WebGenre Blog: The power of genre applied to digital information. By Marina Santini » Entries tagged with "big data"

Papageno: Predictive Models for Crisis Intelligence

Last updated (comments): 22 July 2013

Papageno: A Pilot Study to Identify Suitable Predictive Models for Crisis Intelligence

I need some help jotting down real-world use cases for crisis intelligence. Could you please point me to past events or previous experiences that could be useful for a pilot study? “Crisis intelligence” is a new research area that is becoming increasingly important for medium-sized and large organizations and companies. It consists of detecting an upcoming “crisis” (a scandal, general dissatisfaction, or any other negative attitude) by automatically analysing electronic text documents of any kind. Many commercial and open-source solutions have been proposed to identify the “mood” and sentiment of the masses towards a certain event, brand, or person through tweets, blogs, etc. But very little research has been carried … Read entire article »
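The excerpt does not say which predictive models Papageno will use, so the following is only a minimal sketch of the general idea of "detecting an upcoming crisis" from text. It assumes a hypothetical toy sentiment lexicon (a real system would use a trained model), scores each document, and flags a time batch (e.g. one day of tweets) when its share of negative documents jumps above the recent average by an assumed threshold. The names `polarity`, `negative_share` and `crisis_alerts`, the lexicons, and the `window`/`jump` parameters are all illustrative assumptions.

```python
from collections import deque

# Hypothetical toy lexicons; a real system would use a trained sentiment model.
NEGATIVE = {"scandal", "angry", "boycott", "fraud", "disappointed", "outrage"}
POSITIVE = {"great", "love", "happy", "excellent", "recommend"}


def polarity(text: str) -> int:
    """Return +1, -1 or 0 by counting lexicon hits in the lowercased text."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def negative_share(docs: list[str]) -> float:
    """Fraction of documents in one batch whose polarity is negative."""
    if not docs:
        return 0.0
    return sum(polarity(d) < 0 for d in docs) / len(docs)


def crisis_alerts(batches, window=3, jump=0.2):
    """Return indices of batches whose negative share exceeds the mean of the
    previous `window` batches by more than `jump` (both assumed settings)."""
    history = deque(maxlen=window)
    alerts = []
    for i, docs in enumerate(batches):
        share = negative_share(docs)
        if len(history) == window and share - sum(history) / window > jump:
            alerts.append(i)
        history.append(share)
    return alerts
```

For example, three calm days followed by a day of "scandal!", "fraud" and "angry customers" posts would flag only the fourth batch. The rolling baseline is a deliberately simple stand-in for the predictive models the pilot study is meant to identify.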

Filed under: queries, requests

Presentation: Text analytics and R – Open Question: is it a good match?

* The Quest: finding the optimal way to handle Big Textual Data for Information Discovery & Actionable Intelligence
* The Open Question: is R convenient for text analytics of Big TEXTUAL Data?
* The Mission: identifying pros, cons, limits, and benefits

Current status: investigation in progress…

Live casts of this R meetup are available here:
* First talk: Text analytics and R, by Marina Santini
* Second and third talks: Wordclouds from Twitter with R, by Måns Magnusson, and An example of text analytics in R, by Joakim Lundborg

You can find R code suggestions in this thread: … Read entire article »

Filed under: featured, slides

Presentation: How Emotional Are Users’ Needs? Emotion in Query Logs

According to recent IR research, searchers’ behaviour is not limited to the traditional informational, navigational and transactional needs. A novel hypothesis is that information-seeking behaviour is driven by emotion. These experiments are part of SearchInFocus, a study centred on search. Slides: How Emotional Are Users’ Needs? Emotion in Query Logs, by Marina Santini … Read entire article »

Filed under: featured, slides

Meetup Report: Big Data & Predictive Modeling – What’s happening in Sthlm?

On Thursday, September 6, 2012, the first meetup on Big Data & Predictive Modeling – What’s Happening in Sthlm? was held at the Klarna headquarters in Stockholm. The event was very successful and (according to the organizer) unexpectedly crowded, with about 90 attendees: passionate practitioners and, more generally, people interested in big data (like myself). Although I could not attend the socializing slots before and, above all, after the event at the bar, it was a very informative and enjoyable meeting, and I hope that similar events will be held in the future. … Read entire article »

Filed under: reports, reviews

Reblogging: Gavagai! Gavagai!

Source: Follow the Data blog – Follow the Data podcast, episode 1: Gavagai! Gavagai!, by Mikael Huss
Podcast link: Follow The Data | Episode 1 – Gavagai! Gavagai!

This first episode, as has been mentioned before on this blog, is about a Stockholm startup company, Gavagai, which provides a technology platform called Ethersource. We interviewed the company’s CDO (chief data officer), Fredrik Olsson, and the chief scientist, Magnus Sahlgren, and we think it resulted in a very interesting chat, although the sound quality is perhaps not ideal due to our inexperience with podcasting. Some interesting tidbits from the conversation: The name “Gavagai” comes from a thought experiment by Quine demonstrating the “indeterminacy of translation”. It’s also the reason for the little rabbit on the Gavagai web page. Olsson describes Ethersource as a “semantic processing layer of … Read entire article »

Filed under: reblogging

Summary: Where is the future? From big data to contextualized information

Comments on the post “The Path Forward: From Big Unstructured Data to Contextualized Information” (LinkedIn discussion: American Society for Information Science & Technology)

Tom Reamy: Hi Marina, good blog – and as someone dealing with the idea of context in text analytics for many years, I’m in total agreement as to its importance. There are quite a few other types of context that are important as well. Another conversation. As far as text analytics tools dealing with this – most of them can but the ones with a full set of operators will probably do best. Two contextual areas come to mind immediately – how to get TA software to recognize context like genre when it is not specified and how to take context into account in categorization or extraction rules. The … Read entire article »

Filed under: featured, summaries

The Path Forward: From Big Unstructured Data to Contextualized Information

How can we convert massive quantities of unstructured data into structured information? What kind of “structure” do we need for a reliable interpretation of this undomesticated data? I suggest thinking of a text-analytic framework based on “context”. Search keywords, events, entities, sentiments, attitudes, polarities, opinions, etc. carry a different weight and require a different assessment depending on the kind of text, the situational context, the field of discussion, and the authority of the source, as well as on the purpose of use. For example, for official use, factual texts might be more credible than opinionated texts. In this respect, press conferences, declarations or announcements by a White House spokesman might be more reliable than newspaper speculation or op-ed articles. Conversely, if we want to take the pulse and … Read entire article »
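The excerpt argues that the same sentiment signal should count differently depending on genre, source authority and purpose of use, but it does not prescribe a formula. One minimal way to sketch that idea is a weighted mean of polarities, where each text's weight is a genre weight chosen for the purpose of use multiplied by a source-authority score. The genre labels, the weight tables, the `"public_mood"` purpose (a guess at the truncated "take the pulse" case) and the `Evidence`/`contextual_score` names are all hypothetical illustrations, not the author's framework.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    polarity: float   # assumed scale: -1.0 (negative) to +1.0 (positive)
    genre: str        # hypothetical labels, e.g. "press_release", "op_ed", "tweet"
    authority: float  # assumed source reliability in [0, 1]


# Hypothetical genre weights per purpose of use; the numbers are illustrative only.
GENRE_WEIGHTS = {
    "official": {"press_release": 1.0, "news": 0.6, "op_ed": 0.3, "tweet": 0.1},
    "public_mood": {"press_release": 0.2, "news": 0.5, "op_ed": 0.8, "tweet": 1.0},
}


def contextual_score(items: list[Evidence], purpose: str) -> float:
    """Weighted mean polarity; each item's weight is the genre weight for the
    given purpose times the source authority. Unknown genres get weight 0."""
    weights = GENRE_WEIGHTS[purpose]
    total = norm = 0.0
    for e in items:
        w = weights.get(e.genre, 0.0) * e.authority
        total += w * e.polarity
        norm += w
    return total / norm if norm else 0.0
```

With a positive press release and a negative tweet, `contextual_score(items, "official")` comes out positive, while the same two items under `"public_mood"` come out negative. This illustrates how one set of texts can be assessed differently depending on the purpose of use.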

Filed under: dialectic, discussions

Reblogging: Big Data Week

A good week for (big) data (science)
Source: Follow the Data – A Data Driven Blog, posted by Mikael Huss, 10 March 2012

Perhaps as a subconscious compensation for my failure to attend Strata 2012 last week (I did watch some of the videos and study the downloads from the “Two Most Important Algorithms in Predictive Modeling Today” session), I devoted this week to more big-data/data-science things than usual. Monday to Wednesday were spent at a Hadoop and NGS (Next Generation [DNA] Sequencing) data processing hackathon hosted by CSC in Espoo, Finland. All of the participants were very nice and accomplished; I’ll just single out two people for having developed high-throughput DNA sequencing related Hadoop software: Matti Niemenmaa, who is the main developer of Hadoop-BAM, a library for manipulating aligned sequence data in the cloud, and Luca Pireddu, who is the … Read entire article »

Filed under: reblogging