Lexalytics Salience 5.0 Devours Wikipedia to Enhance its Semantic Understanding


Semantic technology differs from most computing in that it learns on the job. This can provide great benefits, but it can also be time-consuming. I recently talked with Jeff Catlin, CEO of Lexalytics, about what they’re doing with the recent upgrade of their product, Salience 5.0. They came up with a clever idea to reduce the learning curve: they had their semantic engine digest Wikipedia to gain an understanding of human thought and build their Concept Matrices™. This allows it to do things that most computer technology would struggle with, such as understanding that pizza is a food even though the word food was never associated with pizza in the text it was looking at.

While the Wikipedia content is freely reusable under the Creative Commons Attribution-ShareAlike 3.0 Unported License, Jeff said that they made sure the Wikipedia people were comfortable with what they are doing, though they did not ask for a formal endorsement.

These associations are established through Concept Matrices. Salience 5.0 also knows that cat is more closely associated with lion than with hippo. Most semantic engines require extensive training to reach these conclusions. Devouring Wikipedia gives Salience 5.0 an “out-of-the-box” categorization system. You can then add your own categories to extend the system, and Salience will know where to put the content it encounters. Wikipedia was a good choice since it is the work of many people and is constantly refined, so there is a “wisdom of crowds” component to its content.
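To make the idea of term association a little more concrete, here is a toy sketch in Python. It is not how Salience builds its Concept Matrices (Lexalytics did not describe the internals to me); it simply assumes a common textbook approach, in which each term is represented by the words it co-occurs with and two terms are compared by cosine similarity. The three sentences, the stopword list, and the function names are all invented for this illustration.

```python
import math
from collections import Counter, defaultdict

# Toy stand-ins for encyclopedia articles (invented for this sketch,
# not real Wikipedia text and not Lexalytics data).
DOCS = [
    "the cat is a small feline predator with claws and whiskers",
    "the lion is a large feline predator with claws and a mane",
    "the hippo is a large herbivore that lives in rivers and eats grass",
]

# Hypothetical stopword list, just enough for the toy corpus.
STOPWORDS = {"the", "is", "a", "with", "and", "that", "in"}


def build_vectors(docs):
    """Map each term to a count of the other terms sharing a document with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = {w for w in doc.lower().split() if w not in STOPWORDS}
        for term in terms:
            for other in terms:
                if other != term:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


VECTORS = build_vectors(DOCS)


def association(x, y):
    """Association score between two terms in the toy corpus."""
    return cosine(VECTORS[x], VECTORS[y])


print("cat vs lion: ", association("cat", "lion"))
print("cat vs hippo:", association("cat", "hippo"))
```

In this toy corpus, cat scores higher against lion than against hippo because cat and lion share context words such as feline and predator, even though neither sentence mentions the other animal. A corpus the size of Wikipedia gives the same kind of signal far more material to work with.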

For example, the screen below shows the results of a concept topic run for the text “I like chicken”.  Notice that the selected topic is Food (which looks about right).  Notice on the far right that the highlighted definition of Food doesn’t talk about chicken anywhere. The system managed to figure out that chicken is a type of food.

The system can adapt to subtle changes. For example, the screen below shows the results when the text is changed to “I like chickens”. Now we get Agriculture as a topic, which makes more sense for liking chickens. Again, notice that the definition of Agriculture doesn’t talk about chickens. The system managed to figure out that chickens are typically discussed in the context of agriculture, whereas chicken is usually discussed in the context of food.

Lexalytics generally partners with other firms to provide their semantic engine as a component of comprehensive solutions. For example, they are the text analytics engine within Endeca (see my review: Endeca Moves Beyond Deterministic BI Reporting). Other partners include Radian6 (see also my review: Radian6 – Monitoring Social Media) and TripAdvisor. There are also several large organizations, such as Microsoft and Cisco, that use Lexalytics directly.

I really like what they did and can see why it is being adopted by a number of partners.
