Index Engines Looks Deep into Enterprise Data
by Bill Ives
The eDiscovery and compliance market is expanding as organizations amass vast and growing amounts of content in an increasing variety of formats. I recently spoke with Index Engines about their capabilities. They have developed the means to look inside a variety of backup formats that have previously been difficult to work with. Index Engines recently announced that its Collection Engine now works with EMC Data Domain deduplication storage systems, leveraging existing backup processes to automatically identify and extract specific files and email for regulatory, compliance and legal applications. It also supports a number of other formats.
As they pointed out, only a small subset of the data captured in the backup process is worth retaining for long-term access. Filtering down the volume of archived data requires detailed knowledge of what the backup images contain. The Index Engines Collection Engine automatically indexes backup images, identifies the useful content, collects what is relevant and writes it back to Data Domain storage, making it available for compliance and litigation purposes.
The Index Engines Collection Engine for Data Domain indexes the content of backup images so that they can be searched and analyzed for business relevance. These searches can target high-level metadata, such as user mailboxes, or be detailed queries based on file or email content, location and date ranges. Searches are saved as stored queries that run automatically each time a new backup is executed.
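Index Engines did not describe its query interface, so what follows is only a minimal sketch of the stored-query idea, written in Python against a hypothetical data model: `IndexedItem`, `StoredQuery` and `on_backup_completed` are illustrative names, not the product's API. Each saved query combines optional metadata, location, date-range and content criteria, and every query is re-run over the index of each new backup. Items matched by any query form the small subset that would be collected.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical data model for illustration only; this is not the Index Engines API.

@dataclass(frozen=True)
class IndexedItem:
    """One file or email found while indexing a backup image."""
    kind: str        # "email" or "file"
    owner: str       # mailbox or file owner
    path: str        # location inside the backup image
    modified: date
    text: str        # extracted content used for content matching

@dataclass(frozen=True)
class StoredQuery:
    """A saved search; any criterion left unset is ignored."""
    name: str
    owner: Optional[str] = None        # metadata filter, e.g. a user mailbox
    path_prefix: Optional[str] = None  # location filter
    start: Optional[date] = None       # inclusive start of date range
    end: Optional[date] = None         # inclusive end of date range
    keywords: tuple[str, ...] = ()     # all must appear in the content (case-insensitive)

    def matches(self, item: IndexedItem) -> bool:
        if self.owner is not None and item.owner != self.owner:
            return False
        if self.path_prefix is not None and not item.path.startswith(self.path_prefix):
            return False
        if self.start is not None and item.modified < self.start:
            return False
        if self.end is not None and item.modified > self.end:
            return False
        text = item.text.lower()
        return all(k.lower() in text for k in self.keywords)

def on_backup_completed(index: list[IndexedItem],
                        queries: list[StoredQuery]) -> list[IndexedItem]:
    """Re-run every stored query over a freshly indexed backup and keep the
    items matched by at least one query (the candidate collection)."""
    return [item for item in index if any(q.matches(item) for q in queries)]

# Example run against a tiny, made-up index.
index = [
    IndexedItem("email", "jsmith", "/mail/jsmith/inbox/1", date(2011, 3, 2), "Contract renewal terms"),
    IndexedItem("file", "adoe", "/finance/q1.xlsx", date(2011, 1, 15), "Quarterly numbers"),
    IndexedItem("file", "jsmith", "/tmp/cache.bin", date(2011, 3, 5), ""),
]
queries = [
    StoredQuery("jsmith mailbox", owner="jsmith", path_prefix="/mail/"),
    StoredQuery("finance Q1", path_prefix="/finance/",
                start=date(2011, 1, 1), end=date(2011, 3, 31)),
]
kept = on_backup_completed(index, queries)
print([item.path for item in kept])  # ['/mail/jsmith/inbox/1', '/finance/q1.xlsx']
```

Keeping the queries as data rather than code is what lets them be saved once and re-applied automatically to every subsequent backup; the temporary cache file, matched by no query, is simply left out of the collection.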
The relevant data identified within the backup image is extracted into a Collection image and written back to the Data Domain system. This allows a small subset, typically less than 5% of the backup data, to be retained for long-term access, while also taking advantage of Data Domain deduplication technology. Specific user files or email can be extracted with all metadata intact, meeting legal and regulatory requirements. Here is a sample of the Index Engines interface.
This issue will only grow in importance as organizations generate ever-increasing amounts of data. The introduction of more unstructured data through internal social media makes the job even larger. In addition, Index Engines indicated that courts and regulatory agencies are becoming more demanding as eDiscovery technology becomes more accessible. They used to give organizations more of a break because the process was so expensive. Now that progress has been made and costs are coming down, that tolerance is shrinking. This is certainly a growth field that serves a very useful purpose.


