Enterprise Search Technologies: Enterprise Search Summit Session Notes

by Bill Ives

This is another in a series of notes from the Enterprise Search Summit connected with 2009 KM World. Enterprise Search Technologies was a preconference workshop. It was led by independent consultant Miles Kehoe from New Idea Engineering, Inc. Here is the session description.

“This workshop, by a vendor neutral consultant who has hands-on experience with a broad range of ‘out of the box,’ open source, commercial, and home grown solutions, provides an overview of the enterprise search technology landscape. It reviews technologies currently on the market and discusses pros and cons, strengths and weaknesses, and specific requirements. Kehoe shares case studies that illuminate how search technologies are leveraged in different types of organizations, and provides a good introduction to and understanding of the enterprise search world.”

Miles said great search is conversational, open-ended, flexible, and smart. Conversational search lets you interact with the engine to focus your quest. This is especially important for enterprise search, as search is much harder inside the enterprise. Never return just “no hits”; if the engine cannot find anything, it should ask the user for more. Every search engine is built around a set of indices. This applies even to Google, which builds its index through its spiders. Different search engines just add different features around the indices. Every search engine goes through a process. Some expose parts of it, which gives you added flexibility to pull specific information out.
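To make the “never just no hits” advice concrete, here is a minimal sketch in Python. It is my own illustration, not something Miles showed: the function name, the sample documents, and the prompt text are all hypothetical. It first requires every query term, then relaxes to any term, and only then falls back to asking the user for more.

```python
def search_with_fallback(docs, query):
    """Try a strict match first, then relax it, and never return a bare 'no hits'."""
    terms = query.lower().split()
    if not terms:
        return {"mode": "no match", "hits": [], "prompt": "Please enter a search term."}

    # Tokenize each document into a set of lowercase words.
    tokenized = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    # Pass 1: documents that contain every query term.
    strict = [d for d, words in tokenized.items() if all(t in words for t in terms)]
    if strict:
        return {"mode": "all terms", "hits": sorted(strict)}

    # Pass 2: documents that contain at least one query term.
    relaxed = [d for d, words in tokenized.items() if any(t in words for t in terms)]
    if relaxed:
        return {"mode": "any term", "hits": sorted(relaxed)}

    # Pass 3: nothing matched, so ask the user for more instead of stopping.
    return {"mode": "no match", "hits": [], "prompt": "Try fewer or more general terms."}


# Hypothetical sample content.
docs = {
    1: "enterprise search tips",
    2: "vacation policy",
    3: "search engine connectors",
}
print(search_with_fallback(docs, "enterprise connectors"))
```

A real engine would offer spelling suggestions or related terms at the last step; the point here is only that the empty result carries a next step for the user.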

Enterprise search used to return plain results pages, much like basic Google Web search. Now you get what he called enterprise search 2.0, with visualization, navigation, people, facets, etc. strung around the basic results.

There are two basic parts of search: indexing and the actual search. It is better to spend time at indexing (when people are not waiting) than at search time, when people are waiting. However, I asked whether the real-time capabilities of Twitter were changing those expectations. People want to see stuff as soon as it exists.
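The split between indexing and searching can be sketched with a toy inverted index in Python. This is an assumption-laden illustration of the general idea, not how any particular vendor's engine works: the function names and sample documents are made up, and real engines add tokenization, stemming, and relevance ranking on top.

```python
from collections import defaultdict


def build_index(docs):
    """Index time: map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Query time: return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample content.
docs = {
    1: "enterprise search is hard",
    2: "web search is easier",
    3: "enterprise content management",
}
index = build_index(docs)  # the slow part, done before anyone is waiting
print(sorted(search(index, "enterprise search")))  # prints [1]
```

All the work of reading documents happens in `build_index`; `search` only looks up and intersects sets, which is why the time spent at indexing pays off when people are waiting.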

Before he reviewed vendors, Miles said it is not the technology but the methodology. It is how you implement the search engine. I can agree with this.

As he started to review vendors, Miles mentioned Lucene/Solr, a free open source search engine that is behind a number of search engines, including some commercial ones. It is Java based with an Apache license, prolific documentation, many implementations, and you have total control over search and relevance. However, there is some implementation work required and it is hard to find answers. There are limited enterprise support options. SearchBlox is packaged Lucene. Lucid Imagination is packaged Solr.

Miles’ tier one vendors are: Autonomy, Endeca, Exalead, Fast Search (the original independent version), Google, Vivisimo. I have reviewed Exalead (see Exalead’s CloudView Offers Integrated Search Capabilities). His criteria are: broad enterprise presence, multi-platform search, market penetration, and clear product vision. People like the Google brand, so they have a perception that Google enterprise search works well. Not being in Tier One is not necessarily bad; it just means not meeting all the criteria. Other vendors I have reviewed that Miles also mentioned include Attivio (see Attivio Aligns with Traction and Releases New Features), which is newer, and Recommind (see Recommind Provides Axcelerate eDiscovery 3.0 with New Features), which is more vertically focused.

Dates are important but web servers provide bad data so it is hard to trust what you get. Miles gave the example of a 1996 document appearing as new because it had just been re-indexed.

The wifi started working, so Miles showed us Web sites with good search capabilities. Globrix is a UK real estate site that uses FAST, and you could see a lot of facets in home listings, such as number of bedrooms, bathrooms, price range, etc. Then we looked at Newssift, which displays sentiment on topics. We looked at Kosmix, which provides an example of exploratory search. It shows things that are related and loosely related.

Next we covered supporting technologies including document filters, connectors, social search, and federation. Document filters are the part of the indexing process that converts binary source documents (PDF, Office, etc.) into a stream of text for indexing. Connectors are utility tools that provide a clearly defined interface between a search engine and external content. Some relate to indexing and others to display. Connectbeam is an example (see my review: Connectbeam Offers New Social Networking Application Integration Possibilities).

Social search is a popular term that applies to the capability to search corporate personal profiles to find people in an organization with certain skills or experience. It typically requires users to explicitly self-profile in order for searches to return accurate results. Some products now track user behavior to implicitly associate interests with users.

Federation refers to a program that can dispatch user queries to one or more external data sources (search engines, RDBMS systems, etc.) and present the combined results to the user. Federation from unsecured resources is fairly easy. However, because relevance from each source is calculated differently, it is sometimes difficult to integrate results in a meaningful way.
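Here is a minimal Python sketch of federation and of why merging is awkward. Everything in it is an assumption for illustration: the two source functions are stand-ins for real back ends, their scores are invented, and min-max rescaling is just one simple way to put different score scales side by side.

```python
def normalize(results):
    """Rescale one source's (doc, score) pairs to the 0..1 range (min-max)."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def federate(query, sources):
    """Send the query to every source and merge the rescaled results, best first."""
    merged = []
    for name, search_fn in sources.items():
        for doc, score in normalize(search_fn(query)):
            merged.append((score, name, doc))
    # Highest score first; ties broken by source name, then document id.
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return merged


# Hypothetical sources: one scores hits from 0 to 100, the other on a small scale.
def wiki_search(query):
    return [("wiki/search-faq", 87.0), ("wiki/intranet-home", 12.0)]


def crm_search(query):
    return [("crm/account-42", 8.0), ("crm/account-7", 5.0), ("crm/account-3", 2.0)]


for score, source, doc in federate("search", {"wiki": wiki_search, "crm": crm_search}):
    print(f"{score:.2f}  {source:5s} {doc}")
```

Note the weakness: after rescaling, the top hit from every source scores 1.0 even if one source had nothing truly relevant. That is the integration problem in miniature, and it is why raw relevance numbers from different engines cannot simply be compared.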

Entity extraction recognizes people, places, or things during indexing. In unsupervised extraction entities are recognized through algorithms. In supervised extraction, the process is seeded by human operators prior to processing.

Sentiment analysis recognizes positive or negative sentiment algorithmically during indexing. It is easier to tell positive sentiment than negative.

Results clustering groups sets of documents into categories based on content. It looks like facets and entity extraction; however, clustering can be done independent of the query. Clustering is often used in search results to help the user discover additional related terms and content.

Faceted search is the result of assigning documents in a search result list into a pre-defined taxonomy-like order. Unlike clustering, which can appear similar, facets are based on the query and populate pre-defined classes of content (author, location, etc.). Facets are often used to encourage interaction with the user.
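As a rough Python sketch of how facets work on a result list: count how many results fall under each value of a pre-defined field, then let the user narrow the list by picking a value. The field names, sample results, and function names below are hypothetical, chosen only to echo the author and location examples.

```python
from collections import Counter


def facet_counts(results, facet_fields):
    """Count how many results fall under each value of each pre-defined facet field."""
    counts = {field: Counter() for field in facet_fields}
    for doc in results:
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def refine(results, field, value):
    """Narrow the result list to documents with the chosen facet value."""
    return [doc for doc in results if doc.get(field) == value]


# Hypothetical result list for one query.
results = [
    {"title": "Search tuning guide", "author": "Lee", "location": "Boston"},
    {"title": "Connector setup", "author": "Lee", "location": "London"},
    {"title": "Relevance basics", "author": "Patel", "location": "Boston"},
]

counts = facet_counts(results, ["author", "location"])
print(counts["author"].most_common())
print([doc["title"] for doc in refine(results, "location", "Boston")])
```

The facet fields are fixed in advance, but the counts are computed from the results of the current query, which is what distinguishes this from clustering.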

A key to having good search is to monitor it over time after the initial implementation. Look at what is happening and make corrections. Look at what people are searching for and accommodate them. You need to pull together a diverse collection of skills to have a great search function (e.g., business domain experts and corporate librarians, beyond just technical skills).
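One simple way to start monitoring is to summarize the query log. The Python sketch below assumes a hypothetical log of (query, hit count) pairs, which is my own simplification; real engines log in their own formats. It reports the most frequent queries and the queries that returned nothing.

```python
from collections import Counter


def summarize_query_log(log):
    """Tally all queries and, separately, the ones that returned zero hits."""
    all_queries = Counter()
    zero_hit = Counter()
    for query, hits in log:
        # Normalize case and whitespace so trivially different queries group together.
        normalized = " ".join(query.lower().split())
        all_queries[normalized] += 1
        if hits == 0:
            zero_hit[normalized] += 1
    return all_queries, zero_hit


# Hypothetical log entries: (query text, number of results returned).
log = [
    ("expense report", 14),
    ("Expense  Report", 14),
    ("401k form", 0),
    ("401k form", 0),
    ("holiday schedule", 3),
]

all_queries, zero_hit = summarize_query_log(log)
print("Top queries:", all_queries.most_common(2))
print("Zero-hit queries:", zero_hit.most_common())
```

Frequent zero-hit queries are candidates for the corrections mentioned above, such as adding synonyms, best bets, or missing content.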

Miles mentioned two blogs on the topic that he writes: EnterpriseSearchBlog.com and SearchComponentsOnline.com.



