MarkLogic Server allows organizations to store, search, analyze and dynamically deliver XML content. I have written about one application before (see “US Army’s Battle Command Knowledge System (BCKS) Moves to XML-based Platform”). Recently, Mark Logic announced the release of MarkLogic Server 4.0, and I spoke with John Kreisa, Director of Product Marketing. This new release is their largest so far and adds a number of extensions, which we went over.
But first let me cover the basics. MarkLogic Server is an XML server that provides a software development platform for building XML-based, content-related applications. Its XML foundation provides greater granularity in database searches and more efficient document delivery than traditional approaches. It accommodates semi-structured data. John explained that this is what is often called unstructured data, such as narrative, but that they prefer the term semi-structured because all data has some structure. In the US Army application I wrote about earlier, both the speed of access and the granularity of search were key benefits of this approach.
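To make "granularity" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, not MarkLogic's own search engine. The `<report>`, `<narrative>` and `<para>` element names and the `search_paragraphs` helper are hypothetical. The point is that, because the narrative is wrapped in XML elements, a search can return just the matching paragraph rather than the whole document.

```python
import xml.etree.ElementTree as ET
from typing import List

# A hypothetical semi-structured report: some fields are structured,
# while the narrative is free text split into <para> elements.
REPORT = """<report>
  <unit>2nd Battalion</unit>
  <date>2008-10-14</date>
  <narrative>
    <para>Convoy delayed by flooding on the main supply route.</para>
    <para>Local leaders requested additional water purification units.</para>
  </narrative>
</report>"""

def search_paragraphs(xml_text: str, term: str) -> List[str]:
    """Return only the <para> elements whose text mentions the term, not the whole document."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    return [p.text for p in root.iter("para") if p.text and term in p.text.lower()]

hits = search_paragraphs(REPORT, "water")
```

A real XML server indexes element structure so this kind of scoped query does not require parsing every document at search time; the sketch only shows the shape of the result.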
The new release adds to these features in several ways. First, it adds new geospatial capabilities that enable organizations to build location-based applications that search and analyze content based on location information. The new release provides built-in support for popular geospatial data tagging formats such as GML, KML, and GeoRSS/Simple, as well as new geospatial query functions for point, radius, bounding box and polygon constraints. As information consumers and workers become more mobile, delivering information in the context of their physical location can greatly improve relevance. For example, military personnel operating directly in the battlefield could search for background information relevant to their next mission, just as shoppers can see local places to eat after completing their mission. You can see MarkLogic geospatial bucketing below.
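The radius, bounding box and polygon constraints mentioned above each reduce to a simple geometric test on a document's tagged coordinates. The following is a minimal Python sketch of those tests under stated assumptions. It is not MarkLogic's query API. The helper names, the document dictionaries and the 20-mile radius are hypothetical. Distances use the haversine formula. The polygon test uses ray casting with latitude/longitude treated as planar coordinates, which is only reasonable for small areas.

```python
import math
from typing import Dict, List, Tuple

# A point is (latitude, longitude) in decimal degrees.
Point = Tuple[float, float]

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in miles (haversine formula)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def in_radius(p: Point, center: Point, miles: float) -> bool:
    """Radius constraint: is p within `miles` of center?"""
    return haversine_miles(p, center) <= miles

def in_box(p: Point, south: float, west: float, north: float, east: float) -> bool:
    """Bounding-box constraint (simplified: ignores boxes crossing the 180th meridian)."""
    lat, lon = p
    return south <= lat <= north and west <= lon <= east

def in_polygon(p: Point, vertices: List[Point]) -> bool:
    """Polygon constraint via ray casting, treating lat/lon as planar (fine for small areas)."""
    lat, lon = p
    inside = False
    for i in range(len(vertices)):
        lat_i, lon_i = vertices[i]
        lat_j, lon_j = vertices[i - 1]  # previous vertex; i - 1 wraps to the last one
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which this edge crosses the point's latitude.
            crossing = lon_i + (lon_j - lon_i) * (lat - lat_i) / (lat_j - lat_i)
            if lon < crossing:
                inside = not inside
    return inside

# Hypothetical documents, each tagged with a location (e.g. from a GeoRSS point).
docs: List[Dict] = [
    {"title": "Cafe review", "point": (48.8049, 2.1204)},    # Versailles
    {"title": "Museum guide", "point": (51.5074, -0.1278)},  # London
]
paris = (48.8566, 2.3522)
nearby = [d["title"] for d in docs if in_radius(d["point"], paris, 20)]
```

A production system would evaluate these constraints against a geospatial index rather than scanning every document, but the per-document tests are the same in spirit.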

Release 4.0 also provides built-in support for entity identification and inline markup. This new text mining feature works in 11 languages and identifies 18 different types of entities, including person, organization, location, credit card number, email address, latitude/longitude, date, and time. For example, entity identification can distinguish between Paris Hilton the person and the Paris Hilton hotel, so that a search returns the one relevant to the query. This works by tagging each identified entity inline in the content. In this context, Mark Logic is also introducing the Open Enrichment Framework, an initiative created to speed integration with third-party entity extraction engines and other content enrichment tools.
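To show what "inline markup" means in practice, here is a deliberately toy Python sketch. It is not MarkLogic's entity extractor, which handles far harder types such as people and organizations. It finds two easy entity types (email addresses and ISO-style dates) with regular expressions and wraps each match in an XML element in place. The `PATTERNS` list, the `<email>`/`<date>`/`<p>` element names and the `mark_up_entities` function are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical patterns for two easy entity types; real extractors use
# linguistic models and context, not just regular expressions.
PATTERNS = [
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
]

def mark_up_entities(text: str) -> str:
    """Return an XML paragraph with each recognized entity wrapped in an inline element."""
    spans = []
    for tag, pattern in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), tag))
    spans.sort()  # process matches left to right
    out, pos = [], 0
    for start, end, tag in spans:
        if start < pos:  # skip a match that overlaps an earlier one
            continue
        out.append(escape(text[pos:start]))
        out.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))
    return "<p>" + "".join(out) + "</p>"

marked = mark_up_entities("Contact jane@example.com by 2008-11-03.")
root = ET.fromstring(marked)  # the result is well-formed XML that can be queried by element
```

Once entities are tagged this way, a query can target `<email>` or `<date>` elements directly, which is the mechanism that lets a search tell one kind of "Paris Hilton" from another.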
One new feature that I especially liked is co-occurrence analytics, which finds and counts pairs of entities that appear together in content. This can potentially expose previously unknown relationships and provide useful new insights. The system groups content based on pre-set parameters, such as treatments and side effects.
Co-occurrence would allow, in this example, a medical researcher to find the most common pairings of pharmaceuticals and side effects and then display the results graphically on a map based on frequency. You could also look for pairs of people, or pairings of a person and a place.
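The core of co-occurrence analysis is simple counting: for every document, form each pair of entities of the two chosen types and tally how many documents contain that pair. The Python sketch below assumes entity extraction has already reduced each document to sets of entities by type. The `co_occurrences` function, the `"drug"` and `"side_effect"` type names and the sample documents are hypothetical illustrations, not MarkLogic's implementation or real data.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Set

# Each document is reduced to the entities found in it, grouped by entity type.
Doc = Dict[str, Set[str]]

def co_occurrences(docs: List[Doc], type_a: str, type_b: str) -> Counter:
    """Count, for each (a, b) pair, how many documents mention both."""
    counts: Counter = Counter()
    for doc in docs:
        # Every entity of type_a is paired with every entity of type_b in the same document.
        for pair in product(sorted(doc.get(type_a, ())), sorted(doc.get(type_b, ()))):
            counts[pair] += 1
    return counts

# Hypothetical, made-up sample documents.
docs = [
    {"drug": {"DrugA"}, "side_effect": {"nausea", "headache"}},
    {"drug": {"DrugA", "DrugB"}, "side_effect": {"nausea"}},
    {"drug": {"DrugB"}, "side_effect": {"dizziness"}},
]
counts = co_occurrences(docs, "drug", "side_effect")
for (drug, effect), n in counts.most_common():
    print(drug, effect, n)
```

The highest counts are the candidate relationships worth a closer look; the same function works for person/person or person/place pairs by changing the two type names.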
There are also new alert capabilities. You can be notified of new content that contains the terms, places, and so on that you have specified in advance, and this feature is very scalable. The modular documents feature lets you reuse content more efficiently. Instead of cutting and pasting portions of a document, you simply link back to the original, so content is stored once and linked many times. The feature supports XInclude and XPointer, the W3C mechanisms for including one XML document, or a part of one, in another. This ensures consistency of content, which is especially useful for policies in regulated industries.
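The "store once, link many" idea can be illustrated with Python's standard `xml.etree.ElementInclude`, which expands `xi:include` elements. This is a sketch of XInclude in general, not of MarkLogic's modular documents feature, and the standard-library module does not handle XPointer, so only whole-fragment inclusion is shown. The in-memory `FRAGMENTS` repository, the `repository_loader` and `assemble` helpers, and the document contents are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "repository": the shared policy text is stored exactly once.
FRAGMENTS = {
    "policies/retention.xml": "<section><title>Record retention</title>"
                              "<para>Retain records per the current schedule.</para></section>",
}

def repository_loader(href, parse, encoding=None):
    """Resolve an XInclude href against the in-memory repository instead of the file system."""
    if parse == "xml":
        return ET.fromstring(FRAGMENTS[href])  # a fresh tree for every include
    return FRAGMENTS[href]  # parse="text" includes return plain text

# Two documents that link to the same fragment rather than copying it.
HANDBOOK = """<handbook xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="policies/retention.xml"/>
</handbook>"""
AUDIT_GUIDE = """<guide xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="policies/retention.xml"/>
</guide>"""

def assemble(source: str) -> ET.Element:
    """Parse a document and expand its xi:include elements in place."""
    root = ET.fromstring(source)
    ElementInclude.include(root, loader=repository_loader)
    return root

handbook = assemble(HANDBOOK)
guide = assemble(AUDIT_GUIDE)
```

Because both documents pull the section from a single stored copy, editing that one fragment and reassembling updates every document that links to it, which is the consistency benefit described above.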
John went on to describe their new support for W3C XQuery 1.0. Mark Logic has hundreds of active XQuery-based deployments, and the new release provides compatibility modes to ensure interoperability with applications developed with all previous versions, which should smooth migration to XQuery 1.0. Finally, he described enhancements to the administration functions, including automation of key management activities through scriptable administration and scheduled backups, as well as event logging and auditing support.
XML provides a more flexible foundation for content creation, storage, sharing, and searching. I think MarkLogic Server 4.0 takes greater advantage of the potential within this format and will be even more useful for developing Web 2.0 and Enterprise 2.0 applications.