Attivio Tightly Integrates Structured Data and Unstructured Content for a New Approach to Information Access


Attivio Inc. offers the Active Intelligence Engine™ (AIE) for Unified Information Access, bringing together business intelligence and enterprise search capabilities. AIE enables enterprises to blend their structured data and unstructured content in a rich search context, “providing the precision of SQL with the fuzziness and speed of search.” Recently I spoke with Andrew McKay of Attivio about what they are doing.

Andrew said Attivio brought together many of the best people from enterprise search vendors such as FAST, along with people from structured data tool vendors, to start fresh. Based on their prior experience, they knew what they wanted to build: a comprehensive information access engine that addresses all the data and content within the enterprise in a unified manner. He said this fresh start spared them from having to adapt legacy technology. The resulting innovations include a core footprint of less than 15 megabytes, automatic facet generation, true incremental scaling, and low-effort integration, along with the other features described below. Attivio has filed for five patents on its technology.

Attivio believes that keeping search results from unstructured content isolated from query results on structured data leads to suboptimal business processes. By delivering answers based on all types of corporate information and integrating those results with corporate processes, Attivio lets users go beyond finding information to acting on it within the same tool. AIE lets you save query results to generate alerts that trigger actions. You can define an alert or trigger anywhere in a workflow, and you can use any of several communication transports (e.g., AS400, EJB, Email, File, FTP, HTTP, SOAP, SSL) to ensure the action reaches its target. Examples of alerts include sending a notification via email, IM, or SMS to a mobile device; writing enriched data back into a database through SQL; triggering another application to take action; or posting an event to an MQSeries queue.
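To make the alert idea concrete, here is a minimal Python sketch of a saved query that fires actions when new results match. Everything in it is hypothetical: `SavedQueryAlert`, the predicate standing in for the saved query, and the two action functions, which merely record what an email or message-queue transport would send. It is not AIE's API, only an illustration of query results triggering actions.

```python
from dataclasses import dataclass, field
from typing import Callable

# A search hit is modeled as a plain dict of field values (an assumption).
Hit = dict


@dataclass
class SavedQueryAlert:
    """Hypothetical saved query whose new matches trigger actions (not AIE's API)."""
    name: str
    predicate: Callable[[Hit], bool]             # stands in for the saved query
    actions: list = field(default_factory=list)  # callables taking (alert_name, hits)
    seen_ids: set = field(default_factory=set)   # hits that already fired the alert

    def evaluate(self, hits: list) -> list:
        """Run the query over incoming hits and fire every action on new matches only."""
        matches = [h for h in hits if self.predicate(h) and h["id"] not in self.seen_ids]
        if matches:
            self.seen_ids.update(h["id"] for h in matches)
            for action in self.actions:
                action(self.name, matches)
        return matches


# Stand-ins for real transports: they only record what would have been sent.
outbox = []


def email_action(alert_name, hits):
    outbox.append(("email", alert_name, [h["id"] for h in hits]))


def queue_action(alert_name, hits):
    outbox.append(("queue", alert_name, len(hits)))


alert = SavedQueryAlert(
    name="large-orders",
    predicate=lambda h: h.get("amount", 0) > 10_000,
    actions=[email_action, queue_action],
)

batch = [{"id": 1, "amount": 500}, {"id": 2, "amount": 25_000}]
alert.evaluate(batch)
alert.evaluate(batch)  # re-evaluating does not fire again for hit 2
print(outbox)
```

Tracking already-seen result IDs is one way (an assumption here) to keep an alert from firing repeatedly for the same hit each time the saved query is re-evaluated.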

AIE also gives you finer control over indexing content, processing queries, and returning results by passing each through multiple processing stages before it reaches its destination. These stages are organized into workflows that support branching, conditional logic, and parallel processing. Most of the stages are provided out of the box (e.g., content decomposition, term extraction, results sorting), but you can also create your own.
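The idea of stages composed into workflows with branching, conditional logic, and parallel processing can be sketched with a few Python combinators. This is an illustration under stated assumptions, not Attivio's workflow engine: a stage is assumed to be any function from a document dict to a new document dict, and `sequence`, `branch`, `parallel`, and the toy stages are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A stage is assumed to be a function from a document dict to a new document dict.
Stage = Callable[[dict], dict]


def sequence(*stages: Stage) -> Stage:
    """Run stages one after another, feeding each the previous stage's output."""
    def run(doc: dict) -> dict:
        for stage in stages:
            doc = stage(doc)
        return doc
    return run


def branch(condition: Callable[[dict], bool], if_true: Stage, if_false: Stage) -> Stage:
    """Conditional logic: route the document to one of two sub-workflows."""
    return lambda doc: if_true(doc) if condition(doc) else if_false(doc)


def parallel(*stages: Stage) -> Stage:
    """Run independent stages concurrently on copies of the document and merge their fields."""
    def run(doc: dict) -> dict:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda stage: stage(dict(doc)), stages))
        merged = dict(doc)
        for result in results:
            merged.update(result)
        return merged
    return run


# Toy stages standing in for out-of-the-box ones such as term extraction.
def tokenize(doc):
    return {**doc, "tokens": doc["text"].lower().split()}


def extract_terms(doc):
    # Naive "term extraction": capitalized words.
    return {**doc, "terms": [w for w in doc["text"].split() if w[:1].isupper()]}


def count_words(doc):
    return {**doc, "word_count": len(doc["text"].split())}


def flag_short(doc):
    return {**doc, "short": True}


def keep(doc):
    return doc


ingest = sequence(
    parallel(tokenize, extract_terms),
    count_words,
    branch(lambda d: d["word_count"] < 5, flag_short, keep),
)

doc = ingest({"text": "Attivio indexes Email fast"})
print(doc["terms"], doc["word_count"], doc.get("short"))
```

Because every workflow is itself a stage, workflows nest: a `branch` can route to a whole `sequence`, which is one simple way to get the composability the paragraph describes.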

As an example of these processing steps, zip files and emails with attachments have historically been problematic for existing search engines, but AIE processes them automatically through a simple looping workflow that indexes the container first and then each contained item. Workflows for video and audio files process the metadata first, making it searchable immediately. In parallel, they spawn a separate task to generate the voice-to-text transcription, which is added to the item's metadata in the index once it completes (transcriptions take a long time because they run in real time with the media).
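A minimal sketch of both workflows, assuming an in-memory dict as the index: `index_item` indexes a container first and then recurses into each member (so a zip inside a zip is handled by the same loop), and `index_media` makes metadata searchable at once while a background thread attaches the transcript later. The names, the index layout, and `slow_transcription` (a placeholder for a real voice-to-text job) are all hypothetical; only `zipfile` and `concurrent.futures` are real standard-library modules.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory index: path -> dict of searchable fields.
index = {}


def index_item(path: str, data: bytes) -> None:
    """Looping container workflow: index the container first, then each item inside it."""
    index[path] = {"size": len(data)}
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for name in zf.namelist():
                # Recursing lets the same loop handle nested containers.
                index_item(f"{path}/{name}", zf.read(name))


executor = ThreadPoolExecutor(max_workers=2)


def slow_transcription(path: str) -> str:
    # Placeholder for a voice-to-text job that runs in real time with the media.
    return f"<transcript of {path}>"


def index_media(path: str, metadata: dict):
    """Make metadata searchable now; attach the transcript when the background task finishes."""
    entry = index.setdefault(path, {})
    entry.update(metadata)

    def attach_transcript():
        entry["transcript"] = slow_transcription(path)

    return executor.submit(attach_transcript)


# Build a zip that contains a text file and another zip.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("note.txt", "hello")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("inner.zip", inner.getvalue())
    zf.writestr("readme.txt", "top level")

index_item("mail/attachments.zip", outer.getvalue())
print(sorted(index))

job = index_media("calls/q3-review.mp3", {"title": "Q3 review", "duration_s": 1800})
job.result()  # wait here only to show the result; indexing itself did not wait
print(index["calls/q3-review.mp3"]["transcript"])
executor.shutdown()
```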

Facets are another area that Andrew covered. A facet is usually a piece of text on the screen, or part of a graphical tag cloud or heat map, that represents a property of an object, a general concept, or a term; clicking it returns information relevant to that facet. Facets are generally used to refine a set of search results by “drilling down” through terms or properties, progressively narrowing the query until only the “right” results remain. Facets are especially useful for discovery because they guide you through the content based on what the content says.
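Counting facet values and drilling down can be shown in a few lines of Python over made-up product records. `facet_counts` computes the counts a facet link typically displays next to each value, and `drill_down` applies the clicked values as filters; both are illustrative helpers, not part of any product.

```python
from collections import Counter

# Made-up product records.
products = [
    {"name": "UltraBook 13", "type": "laptop", "cpu": "i5", "screen": "13in"},
    {"name": "UltraBook 15", "type": "laptop", "cpu": "i7", "screen": "15in"},
    {"name": "WorkBook 14", "type": "laptop", "cpu": "i5", "screen": "14in"},
    {"name": "Rack 1U", "type": "server", "cpu": "xeon", "rack_units": 1},
]


def facet_counts(results, field):
    """Count how many results carry each value of a field."""
    return Counter(r[field] for r in results if field in r)


def drill_down(results, **selected):
    """Keep only results that match every clicked facet value."""
    return [r for r in results if all(r.get(f) == v for f, v in selected.items())]


laptops = drill_down(products, type="laptop")
print(facet_counts(laptops, "cpu"))   # Counter({'i5': 2, 'i7': 1})
narrowed = drill_down(laptops, cpu="i5")
print([r["name"] for r in narrowed])  # ['UltraBook 13', 'WorkBook 14']
```

Each click adds one more filter, so the result set only ever shrinks, which is the progressive narrowing described above.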

Instead of requiring you to define facets manually as part of the configuration, a task that can be quite heavy if you have, say, hundreds of thousands of products in a catalog, AIE’s patent-pending approach dynamically recommends the best facets for each query based on that query’s results. It also recommends the order in which they should be displayed. This works right out of the box. For example, a search for “laptop” might recommend the facets “CPU”, “screen size”, and “weight”, whereas a search for “server” might recommend “CPU”, “memory”, and “number of rack units”. AIE still allows you to create static facets, and the two approaches can be combined: in the example above, “price” could be defined as the first facet for every search.
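The sketch below shows one way a system might pick facets from a result set. The scoring heuristic (favor fields that most results share and whose values split the results into a few groups; skip fields with a single value or a unique value per result) is an assumption for illustration, not Attivio’s patent-pending method. Static facets are modeled as a `pinned` tuple placed ahead of the dynamic ones, mirroring the “price first” example; the catalog and the toy `search` function are made up.

```python
from collections import Counter

# Made-up catalog; field names are illustrative.
catalog = [
    {"name": "A", "type": "laptop", "price": 900, "cpu": "i5", "screen_size": "13in", "weight": "1.2kg"},
    {"name": "B", "type": "laptop", "price": 1400, "cpu": "i7", "screen_size": "15in", "weight": "2.0kg"},
    {"name": "C", "type": "laptop", "price": 1100, "cpu": "i5", "screen_size": "15in", "weight": "2.0kg"},
    {"name": "D", "type": "laptop", "price": 1600, "cpu": "i7", "screen_size": "13in", "weight": "1.2kg"},
    {"name": "S1", "type": "server", "price": 3000, "cpu": "xeon", "memory": "64GB", "rack_units": 1},
    {"name": "S2", "type": "server", "price": 5000, "cpu": "xeon", "memory": "128GB", "rack_units": 2},
    {"name": "S3", "type": "server", "price": 4000, "cpu": "epyc", "memory": "64GB", "rack_units": 1},
]


def recommend_facets(results, pinned=("price",), max_dynamic=3, exclude=("name",)):
    """Return pinned (static) facets followed by the best dynamic facets for these results."""
    n = len(results)
    present = Counter(f for r in results for f in r if f not in exclude and f not in pinned)
    scored = []
    for field, count in present.items():
        distinct = len({r[field] for r in results if field in r})
        if distinct < 2 or distinct == count:
            continue  # a single value cannot refine; all-unique values do not group anything
        coverage = count / n
        scored.append((coverage * min(distinct, 5), field))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties alphabetical
    return list(pinned) + [field for _, field in scored[:max_dynamic]]


def search(query):
    # Toy search: exact match on the product type.
    return [p for p in catalog if p["type"] == query]


print(recommend_facets(search("laptop")))  # ['price', 'cpu', 'screen_size', 'weight']
print(recommend_facets(search("server")))  # ['price', 'cpu', 'memory', 'rack_units']
```

Ties are broken alphabetically, which is why the laptop facets come out as cpu, screen_size, weight here; a production engine would presumably use richer signals to decide the display order.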

Finally, Andrew went over JOINs. The JOIN operation in SQL is central to relational databases: it combines rows from two or more tables based on a column they share. For example, a request for “our 100 best-selling products in the last quarter” would JOIN the table of all products with the table of all invoices, keep only the invoices dated in the last quarter, add up the invoice amounts for each product, and return the 100 products with the largest totals. The JOIN is possible because the invoice table contains a product ID number that links each invoice to its product in the product table.
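The query described above can be written out in standard SQL. This sketch runs it against an in-memory SQLite database through Python’s built-in `sqlite3` module; the table layouts, dates, and amounts are made up, and a real “last quarter” filter would compute its date range rather than hard-code it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY,
                       product_id INTEGER REFERENCES products(product_id),
                       invoice_date TEXT, amount REAL);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "Laptop"), (2, "Server"), (3, "Router")])
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", [
    (10, 1, "2009-07-15", 1200.0),
    (11, 1, "2009-08-02", 1100.0),
    (12, 2, "2009-09-30", 4000.0),
    (13, 3, "2009-05-20", 9000.0),  # previous quarter: filtered out
    (14, 3, "2009-08-11", 300.0),
])

TOP_N = 100
rows = conn.execute("""
    SELECT p.name, SUM(i.amount) AS total
    FROM products AS p
    JOIN invoices AS i ON i.product_id = p.product_id   -- the link via product ID
    WHERE i.invoice_date BETWEEN '2009-07-01' AND '2009-09-30'
    GROUP BY p.product_id
    ORDER BY total DESC
    LIMIT ?
""", (TOP_N,)).fetchall()
print(rows)  # [('Server', 4000.0), ('Laptop', 2300.0), ('Router', 300.0)]
```

The JOIN alone only pairs each invoice with its product; the GROUP BY, SUM, ORDER BY, and LIMIT clauses do the work of ranking the best sellers.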

AIE allows you to extend the JOIN to unstructured content like documents and email. For instance, change the example above to “relevant blog and press information about our 100 best-selling products in the last quarter.” A database engine would need the blogs and RSS news feeds reshaped to fit into the database before it could perform the JOIN, and the challenge would be determining which blogs and feeds are relevant to include in the first place. A search engine, on the other hand, could select the relevant blogs and feeds, but determining the best-selling products would be hard. AIE’s JOIN feature understands how to JOIN any two objects that conceptually share a common property. The property could be a field in a database, a tag in a document, or an entity extracted from the text itself. With AIE, the query “relevant blog and press information about our 100 best-selling products in the last quarter” becomes doable.
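The paragraph above can be illustrated with a toy Python sketch in which the shared property is a product name: a database field on one side and an entity extracted from free text on the other. The regex-based `extract_products` is a deliberately naive stand-in for real entity extraction, and the data and helper names are hypothetical; this is not how AIE implements its JOIN.

```python
import re

# Structured side: output of the "top products" query (made-up values).
top_products = [
    {"product_id": 2, "name": "Server", "total": 4000.0},
    {"product_id": 1, "name": "Laptop", "total": 2300.0},
]

# Unstructured side: blog posts and press items (made-up text).
documents = [
    {"id": "blog-1", "source": "blog", "text": "Our new Laptop review is in."},
    {"id": "press-7", "source": "press", "text": "Press release: a faster Server line ships in May."},
    {"id": "blog-2", "source": "blog", "text": "Ten tips for home gardening."},
]


def extract_products(text, known_names):
    """Toy entity extraction: whole-word, case-insensitive matches of known product names."""
    return {
        name for name in known_names
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
    }


def join_on_entity(rows, docs, key="name"):
    """Pair each structured row with every document whose extracted entities include its key."""
    names = {r[key] for r in rows}
    doc_entities = {d["id"]: extract_products(d["text"], names) for d in docs}
    return [(r[key], d["id"]) for r in rows for d in docs if r[key] in doc_entities[d["id"]]]


print(join_on_entity(top_products, documents))  # [('Server', 'press-7'), ('Laptop', 'blog-1')]
```

The database never had to absorb the documents, and the document search never had to compute sales figures; the two sides meet only on the shared property.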

Attivio can do this because it indexes structured data differently from conventional enterprise search technologies. The conventional approach is to query the database before indexing, which flattens the data and inserts each result row into the index as if it were a document. Attivio instead indexes each table’s rows separately and performs the JOINs on the fly at query time.
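To make the contrast concrete, here is a toy Python sketch of both indexing strategies: flattening joined rows into documents at index time, versus storing each table’s rows as separate records and joining them at query time. The data and function names are hypothetical, and this is not Attivio’s internal design. The sketch also shows one general property of keeping rows separate (true of normalized storage in general, not a claim from Attivio): renaming a product touches a single record, and the next query-time join picks up the change, while the flattened documents stay stale until rebuilt.

```python
# Two source tables as lists of row dicts (made-up data).
products = [{"product_id": 1, "name": "Laptop"}, {"product_id": 2, "name": "Server"}]
invoices = [
    {"invoice_id": 10, "product_id": 1, "amount": 1200.0},
    {"invoice_id": 11, "product_id": 2, "amount": 4000.0},
]


def flatten_at_index_time(products, invoices):
    """Conventional approach: join up front and index each joined row as one document."""
    by_id = {p["product_id"]: p for p in products}
    return [{**inv, "product_name": by_id[inv["product_id"]]["name"]} for inv in invoices]


def index_rows_separately(tables):
    """Per-table approach: every row becomes its own record, tagged with its table name."""
    return [{"_table": name, **row} for name, rows in tables.items() for row in rows]


def join_at_query_time(index, left, right, on):
    """Pair records of two tables from the index when the query runs."""
    right_by_key = {}
    for rec in index:
        if rec["_table"] == right:
            right_by_key.setdefault(rec[on], []).append(rec)
    return [
        (left_rec, right_rec)
        for left_rec in index
        if left_rec["_table"] == left
        for right_rec in right_by_key.get(left_rec[on], [])
    ]


flat = flatten_at_index_time(products, invoices)
idx = index_rows_separately({"products": products, "invoices": invoices})

# The product is renamed in the source database: only one index record changes.
for rec in idx:
    if rec["_table"] == "products" and rec["product_id"] == 1:
        rec["name"] = "Laptop Pro"

pairs = join_at_query_time(idx, "invoices", "products", on="product_id")
print([(inv["invoice_id"], prod["name"]) for inv, prod in pairs])  # [(10, 'Laptop Pro'), (11, 'Server')]
print(flat[0]["product_name"])  # prints Laptop: stale until the flattened documents are rebuilt
```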

This combination of features should take enterprise search and information access to another level. Attivio released version 1.0 in January and is about to launch version 1.2. Andrew says client reaction has been very positive so far. This will be an interesting tool to watch as it gains ground in the market. Here is a sample Attivio search result screen:

[Image: sample Attivio search result screen]

