Enterprise Search has a very bright future!

Last updated (Comments): 10 July 2013

On 30 May 2013, I attended Findability Day 2013 (findabilityday2013-esli.eventbrite.com), organized by Findwise (www.findwise.com). The gathering of about 200 participants took place in central Stockholm (Odenplan) on a sunny day, in bright and spacious conference rooms and a friendly, laid-back atmosphere. The event – “the biggest event on search and findability in Northern Europe”, as the subtitle says – was free of charge (only registration was required) and was sponsored by Google and Splunk.

I will not give a complete debrief of Findability Day 2013 in this post. Martin White has summarized the highlights in his blog (http://www.intranetfocus.com/?p=1295), and Olof Belfrage describes the presentations in more detail in a post (http://www.findwise.com/blog/impressions-from-findability-day-2013/) published on the Findwise blog.

In this post I would like to summarize a few points that, in my opinion, are important for the future of search and information discovery. These points were neatly listed by Martin White, the first speaker.

In his presentation, Martin firmly stressed that (enterprise) search should not be seen in isolation. He encouraged a closer interaction of three fields that have overlapping areas and many intersections, namely Enterprise Search (ES), Information Retrieval (IR) and Information Science (IS).

At present, ES is mostly unaware of what IR and Text Analytics are doing and has not yet considered including, for example, sentiment analysis as a way of enriching search information, says Martin. That’s an excellent point, I think: I also see this gap in current search applications. If you have hints, suggestions or pointers on this topic, please give me a shout. My personal experiments in this area can be found in the following post: How emotional are query logs (http://www.forum.santini.se/2013/01/emotion-needs/).

Search is an irreplaceable compass that helps us orient ourselves in many fields. For example, search can be seen as a way of guiding information discovery. However, Martin states, one cannot simply “fix search” and automatically get valuable information out, as if by magic. Search must be shaped and moulded for information discovery, if that is the specific focus. The same is true when using search for structuring Big Unstructured Data. For instance, where is the strategic information in petabytes of unstructured text when doing Business Intelligence (BI) (http://en.wikipedia.org/wiki/Business_intelligence)? Simple search is not enough. Traditional BI dashboards are still leaking… Investments are needed in research on search to hunt for the treasure buried in big data. New ways of creating metadata and taxonomies, and of handling interoperability, should be sought after with more determination and open-mindedness.

Searching for entities and connecting people via content across mobile enterprise search are the next steps that cannot be missed, says Martin. He sees big potential in delivering search against documents on all devices, e.g. smartphones, tablets and PCs, which should also be able to print search results and relevant documents (print functionality is currently overlooked in mobile applications).

Another field where search can play an important role is stream analytics. Stream analytics focuses on how to monitor what is going on in an organisation. Why search only your own content? Search the whole ecosystem instead (of course, search with permission!). IR research has recently presented Streamz (http://www.research.ibm.com/haifa/dept/imt/papers/guyCIKM12.pdf), and Jane McConnell is working on the concept of the Digital Workplace (http://www.digital-workplace-trends.com/), a framework that empowers people by connecting everybody within an organization.

Martin gives big praise to the recent book on search, “Designing the Search Experience” by Tony Russell-Rose and Tyler Tate (http://designingthesearchexperience.com/), apparently “the best book on search so far”, with many use cases and a special focus on usability.

Search is “incredibly modular”, is Martin’s final statement, and search should definitely include content analytics in all its forms, e.g. text analytics, sentiment analysis, social media monitoring and the many more things that can be found “under the bonnet”. The convergence of search and text analytics & co. is definitely a cornerstone for next-generation (enterprise) search engines, emphasizes Martin White.

Well, my views on search are basically along the same lines as Martin’s…

I would be interested in getting different views on the future and/or the needs of search. Your opinion is much appreciated…

Thanks to Findwise for the nice event!


IMPORTANT: almost all of the presentations have now been made available. To view the videos and slides, go to the Findwise resources page.

You can also download some of the presentations if you click the links below:

The Future of Search – Martin White

One Common Search Service – Niclas Lillman and Nicklas Eriksson, Scania

Developing a Search & Findability Practice for the Enterprise  – Ravi Mynampaty, Harvard Business School

Unveil the hidden values in your organization – Troels Walsted Hansen, Microsoft

Building the Star Trek Computer – Daniel Bergqvist, Google

Results from Findability Survey 2013 – Kristian Norling, Findwise

Governance and the role of search in user satisfaction – Johan Johansson, Municipality of Norrköping

6 comments for “Enterprise Search has a very bright future!”

  1. JF Delannoy
    18 June, 2013 at 09:58

    are there open-source modules for, e.g.:

    forum scraping (reformatting), including a hybrid strategy of parsing and matching

    lexicon handling

  2. JF Delannoy
    18 June, 2013 at 09:58

    same for visualisation, preferably hybrid relational/clustering

    Tak tak

  3. 24 June, 2013 at 10:10

    From Enterprise Search Engine Professionals (LinkedIn Group)

    Jhon Zika • But today, all the enterprise search software is very weak! It only supplies text search, not true enterprise search. We know that document quality is very important for search quality, but none of the famous enterprise search engines supply functions to clean up the existing data.

    Dmytro Kurylovych • I’m really excited by what is explained here.
    One addition – such systems should be tightly integrated with communication and Web 2.0.

    Marina Santini • Thanks for sharing your thoughts and preferences!

    Jhon Zika • Henry, is Aspire open source software?

    Charlie Hull • @Jhon no it isn’t, sadly, but it is based on some open source parts: http://st2.spacestream.co.uk/aspire-faq.html Findwise also have a pipelining framework which is open source: https://github.com/Findwise/Hydra There are other options too, including Pypes (which is a little stale, I think) and Piped: https://github.com/foundit

    Paul Gerwe • @Marina Search has a long future ahead of it. The specific features and capabilities that you mention, though, are a tool box. Capabilities that might work well for searching structured data, or well-curated unstructured content with significant and complete metadata, will need different tools than if you’re searching a social feed. The tools and weights you deploy should be tuned to the problem at hand. What’s your corpus and what problem are you trying to solve?

    Every offering will have its strengths and weaknesses. Is it more advantageous to have a single search technology that’s general purpose and does most things at least reasonably well, or to have specialized and focused engines that excel at text analytics or semantic analysis, for example? I’d say it’s situational.

    @Jhon In my experience, the available enterprise search tools tend to be capable of delivering the desired data. The larger outages are often in data quality, perception and unrealistic expectations of what an enterprise search experience should deliver. Changing things internal to a company, especially culture, to do things like complete metadata is hard.

    Marina Santini • @Paul: the long-term goal is to extract actionable intelligence from unstructured textual data using search. In my view, one crucial problem that has been overlooked so far is characterizing the different data types. As you said, you cannot apply the same methods to structured data and to social feeds.

    I would be more specific: my guess is that you cannot apply the same methods to the different types of unstructured documents (which I usually call “genres”). If you want to mine actionable intelligence from query logs, for example, you must be aware of what kind of intelligence you want (customer-oriented, future trend detection, emotions, etc.) and of where and how this core information is located. If you search email archives and/or corporate letters and/or financial reports, etc., it is convenient that the search functionality is aware of the communicative structure underlying the different types of documents.

    Personally, I get extremely frustrated when searching for information (through a search box) within websites. So the short-term wish is to improve website search by making the search functionality aware of the types of documents available and building on that. Just returning snippets containing the keywords typed by users is not enough any more. An obvious next step is to study query logs and user interaction. The usual answer I get from search practitioners when I complain about search quality is “clients do not want to pay for it”. My usual reply is “Serialize research and you will get better search quality for the same price.” After all, the best investment is often in “user satisfaction”…

    As you said, Paul, changing things internal to a company, especially culture, is hard. But not impossible 🙂

    Stephen E. Arnold • Am I incorrect in noticing that the buzzword is taking precedence over the specific function required by a user to complete a task? Is it possible that buzzwords describe the failure of systems which have not solved specific information retrieval problems? In order to implement a system which generates useful outputs to a user, has the difficult task of identifying specific requirements been handled in an incomplete way? Buzzwords are good for consultants and marketers. Are the buzzwords helpful to a user who cannot locate information required, often under time and resource constraints? Maybe “good enough” is the future of findability? Imprecise language may be the evidence one needs to answer this question, “Has enterprise search run into a dead end?” Stephen E Arnold, June 29, 2013

    John O’Gorman • @Stephen is on the right track…”enterprise search” is great for marketeers and platform salesmen, but the phrase itself doesn’t really mean anything. Even Google (or maybe especially Google) knows this.

    In other words, E-Search has a profitable future in front of it but not, IMO, for very much longer.

    Constance Ard • This has been an intriguing discussion. The need for specific requirements as well as professionals who understand how the tools operate and how information is created, stored and retrieved are essential elements to successful search in any organization or environment. I was reminded of the essential truth to the need for professionals as I began a new search project recently that requires translation of Boolean operators from one system into another system’s syntax. No machine can make that translation without the analysis of a human first due to the complexity of the information sources being used.

    Charlie Hull • @Constance – interesting, we see this need a lot as we work with several media monitoring companies who can have tens of thousands of complex stored query strings. Our approach is to develop a plugin for Solr that speaks the syntax of the previous engine (see http://www.flax.co.uk/blog/2012/04/24/dtsolr-an-open-source-replacement-for-the-dtsearch-closed-source-search-engine/ ) and then work with the client on the ‘outliers’ – often there are queries that work a certain way due to deficiencies in the original engine and we need to work out how to support the same ‘failings’ (which the client and end user want to retain!) in the new engine. It’s never a 100% drop-in replacement and does need human input.

    The interesting thing for me is to wonder which other closed-source syntaxes we could implement as a plugin for an open source engine….would certainly make migrations easier!
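    [Editor’s note] The kind of legacy-syntax translation Charlie describes can be illustrated in miniature. The sketch below is a hypothetical example, not code from the Flax plugin he links to: it rewrites one dtSearch-style proximity pattern (`term1 w/N term2`) into the equivalent Lucene/Solr phrase-slop syntax. A real migration tool would need a full parser and, as Charlie notes, human review of the outlier queries.

    ```python
    import re

    # Matches the dtSearch-style proximity operator: term1 w/N term2
    PROXIMITY = re.compile(r"(\w+)\s+w/(\d+)\s+(\w+)", re.IGNORECASE)

    def translate_legacy_query(query: str) -> str:
        """Rewrite `term1 w/N term2` as the Lucene/Solr slop query "term1 term2"~N.

        Anything that does not match the proximity pattern passes through
        unchanged; a production translator would handle nesting, fields,
        wildcards, and the rest of the legacy grammar.
        """
        return PROXIMITY.sub(
            lambda m: f'"{m.group(1)} {m.group(3)}"~{m.group(2)}', query
        )

    print(translate_legacy_query("enterprise w/5 search"))   # "enterprise search"~5
    print(translate_legacy_query("findability AND search"))  # passes through unchanged
    ```

    Even this toy version shows why migrations are never 100% drop-in: the slop operator’s matching semantics differ subtly from the source engine’s, so translated queries still need validation against known result sets.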

    Stephen E. Arnold • Is the present line up of enterprise search solutions a result of computational similarities? Are the systems now available almost impossible to differentiate? Has enterprise search reached a dead end because more sophisticated methods of processing content objects hit a computational barrier? Is the talk about the Big O in the analytics world a factor in creating procurement teams who ask, “How does System A add something we don’t have with our existing systems?” As the volume of data goes up, dissatisfaction with enterprise search is a constant theme. Where is the progress? Is there an answer in for-fee magic grids, expensive reports, and endless marketing jargon adjustments? Stephen E Arnold, July 6, 2013

    Dmytro Kurylovych • Stephen, enterprise search alone cannot significantly increase the value. In any case, humans are responsible for decisions.
    Today business needs a set of accelerators which will improve human activity in:
    – semantic search
    – finding experts, collaboration
    – decision making, assessment
    * we have to use all existing experience
    * we have to utilize all existing resources

    I feel that many problems of business are not in the digits, but in something in between (fake constraints).
    One example – I saw a significant increase in productivity and motivation when people started working without a schedule (no hours).
    The world is changing, order is reorganizing, some things will never be the same as in the past; we have to be ready for the new order.

    Runar Buvik • Dmytro, not sure I agree with you that enterprise search cannot significantly increase the value.

    People need information to make decisions. One can always find specific cases where a very advanced system will perform better for the case one is studying, but such systems may also be complex and hard to use, so only a small elite ends up using them. A very basic search system, on the other hand, may be much easier to use, so the whole organization can more easily benefit from it.

    > One example – I saw a significant increase in productivity and
    > motivation when people started working without a schedule (no hours).

    Of course you did, but was it real, or only the Hawthorne effect?

    For those unfamiliar with this effect: it is based on experiments conducted over five years, starting in the 1920s, at an electric factory. Researchers ran experiments on a group of workers and found that every change they made caused the workers to become more productive. Both increasing and decreasing the pay made the workers more productive. The same held for increasing and decreasing breaks, raising or lowering the levels of light, moving workstations around, etc. – all made the workers more productive for a while.

    Apparently the workers may have felt that any change meant the management was taking an interest in them, so they became more dedicated and productive for a while; then the effect wore off. Quite a fascinating read. More information is available at http://en.wikipedia.org/wiki/Hawthorne_effect

    Paul Gerwe • Expectations in terms of findability, in both internet and enterprise search, are set by Google. Users largely don’t know, and likely don’t care, that many of the key reasons Google can deliver the kind of results it does aren’t replicated inside a company.

    For any given topic there are likely dozens, hundreds or thousands of sites that provide the answer the user is looking for (or thinks meets their needs), and as long as one comes up, Google succeeds. Internally, there are likely to be at most a handful of pages or documents, possibly even just one, which may or may not be indexed, that contain the answer. If enterprise search doesn’t turn up that specific page or document, enterprise search fails.

    This is just one key difference in why Google succeeds where ES often fails.

    Improving the corpus and indexed content is a key driver to better overall results. I think that enterprise search tools that can provide better analysis and methods to help manage that underlying content will differentiate themselves from their competitors. Imagine for example that your search engine starts providing reports on the last modified date of content within a given collection. Consider if it could marry that information with at least the frequency that the content is accessed through the search tool and possibly even be able to link to the general usage information. Not only could that information be used to weight for relevancy, but it could also serve as guidance for content management.

    Is the content old and unused? – Recommend deletion
    Is the content old, but somewhat used? – Recommend review and update, with at minimum the inclusion of a “last reviewed on” date and by whom, which can increase confidence in the content’s quality.

    Users expect that everything within the company is indexed by the enterprise search tool. If the search engine could generate a list of the content that is actively indexed, it could help content owners review for obvious gaps. It would also provide visibility on what’s truly in and out.

    Findability within the enterprise is a complex problem. The more the search engine providers can support addressing the bigger issues and challenges with enterprise findability, the greater the likelihood that they and we succeed.
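    [Editor’s note] Paul’s age-and-usage heuristic for content management can be sketched as a small classification function. This is a hypothetical illustration, not a feature of any named engine; the one-year staleness threshold and the field names are assumptions chosen for the example.

    ```python
    from datetime import date

    def recommend_action(last_modified: date, search_hits: int,
                         today: date, stale_after_days: int = 365) -> str:
        """Classify a document using the age/usage heuristic described above.

        last_modified  -- when the content was last changed
        search_hits    -- how often search users accessed it (assumed available)
        """
        is_old = (today - last_modified).days > stale_after_days
        if is_old and search_hits == 0:
            return "recommend deletion"          # old and unused
        if is_old:
            return "recommend review and update"  # old but somewhat used
        return "keep"                             # recent content

    today = date(2013, 7, 1)
    print(recommend_action(date(2011, 1, 1), 0, today))   # recommend deletion
    print(recommend_action(date(2012, 3, 1), 12, today))  # recommend review and update
    print(recommend_action(date(2013, 6, 1), 3, today))   # keep
    ```

    Wired into a search engine’s index metadata, the same signals could also feed relevancy weighting, exactly as Paul suggests.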

  4. 24 June, 2013 at 10:15

    @JF Delannoy: are you in the right thread with your request? If yes, please be more specific…

  5. 2 July, 2013 at 07:45

    From KD2U – Knowledge Discovery in Distributed and Ubiquitous… (LinkedIn Group)

    Silvester Claassen • I can only underscore your emphasis on interdisciplinarity, interoperability, metadata/taxonomies and workplace orientation, while pointing out the further benefit of paradigmatic integration with the modeling, logical-inference, simulation, testing/query-generation and data-mining/warehousing processes of knowledge management, and of interfacing with the Semantic Web (Web 3.0).

  6. 2 July, 2013 at 07:53

    From Information Science and LIS (LinkedIn Group)

    Joni Metcalf-Kemp • Thank you for posting your observations. I enjoyed reading them and hope that you can suggest further reading in this area.
