Enterprise Search Summit 2010 Notes: BI in the Age of Social Media
by Bill Ives
I recently attended the 2010 Enterprise Search Summit. This session, BI in the Age of Social Media, was led by David Bean, Ph.D., CTO and Founder of Attensity. Here is the session overview. My notes follow.
“Customers are flocking to the web to discuss everything from customer service complaints to product recommendations. JetBlue executives are mining unstructured text for insight into customer thoughts and actions. This session will discuss how companies can develop an efficient knowledge repository to enable more effective creation, maintenance, and administration of business knowledge from the web and other sources, and link it with all service-relevant documents. Presenters will provide examples of what worked and, more importantly, what hasn’t, in analyzing social media content.”
David changed his title at the start to BI in the Age of Social Media and How Search Can Help. Up until now, search and BI have not been well connected. Then he said he has more of a liberal arts background and is not a tech person. I like this, as I am the same. He also said that he teaches linguistics but his degree is not in that field. He is also a welder, raises chickens, and fixes old tractors. I like him more already.
In the past, search was simply put on top of the BI tools to find reports generated by the tool, but it ignored the data behind the reports.
Now social media has started to get into the BI space. This has happened largely with social media monitoring tools rather than traditional BI tools. I have looked at a number of these tools on the AppGap and agree with David.
David then asked why BI should care about search. First, BI is about measurement. It quantifies business processes but has looked mostly inside the company. He said that BI now needs to focus more on the Web, as so many brand-related conversations occur there. There are “banking company x sucks” sites where people post their comments. People also do more research on what they plan to buy, and they look at these comments by others.
There is now confessional intimacy. You look more at individual people who have just used the product than at the professional rating systems. His firm looks at “first person intelligence” (their trademarked term), meaning what individual people say.
He gave a call center example: call volume varies by region and you want to discover why. The structured data is not enough to fully answer the question. You need to look at the call notes to see why the customers called. He wants to ask questions of both structured and unstructured data to go beyond the “what” to the “why.”
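To make the call center example concrete, here is a minimal Python sketch of asking a "what" question of the structured field and a rough "why" question of the call notes. The records, field names, stopword list, and function names are all hypothetical illustrations of mine, not anything David showed, and simple term counting stands in for real text analytics.

```python
from collections import Counter, defaultdict

# Hypothetical call records: "region" is the structured field, "note" is free text.
calls = [
    {"region": "East", "note": "customer reports billing error on invoice"},
    {"region": "East", "note": "billing charge appeared twice"},
    {"region": "East", "note": "cannot log in to account"},
    {"region": "West", "note": "asked about store hours"},
]

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"on", "to", "the", "a", "in", "about", "customer", "reports"}

def call_volume_by_region(records):
    """The 'what': count calls per region using only the structured field."""
    return Counter(r["region"] for r in records)

def top_terms_by_region(records, n=1):
    """A rough 'why': the most frequent non-stopword note terms per region."""
    terms = defaultdict(Counter)
    for r in records:
        words = [w for w in r["note"].lower().split() if w not in STOPWORDS]
        terms[r["region"]].update(words)
    return {region: [w for w, _ in c.most_common(n)] for region, c in terms.items()}

print(call_volume_by_region(calls))
print(top_terms_by_region(calls))
```

The first function only tells you that one region has more calls. The second one hints that billing is behind the extra volume there, which is the kind of answer the structured data alone cannot give.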
In addition to data like call center notes, you need to go out on the Web to forums, Twitter, Facebook, and other sites such as Craigslist and YouTube. The volume is dramatically bigger on the Web. They need a listening post to monitor this content and then the ability to integrate this Web data with a BI tool.
His firm bought a Web search tool, Biz360, to better look at this Web content. It is now Attensity 360. But there is a need to sift through this volume, as much of it is not relevant but some is. They are monitoring 37 million blogs as well as forums. They push all this data into a search engine. They vary case sensitivity in three ways. They have 3 billion documents hosted on Amazon, and the index is updated every 30 minutes through Lucene. They focus on the most recent 3 to 6 months of data on a rolling basis.
They make 30,000 queries on this data every day. The queries can be up to 10,000 characters. The data is also pushed to their A5 analytic application. They use the queries to filter down what is important to their customers.
David discussed what has not worked: search as collector. They did a keyword search on the customer’s products, but it did not work, as they got lots of porn and fetish sites when looking for Old Spice aftershave.
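The problem with a bare keyword search can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Attensity's approach: the function name, the context terms, and the sample posts are hypothetical. The idea is that a product phrase alone matches too much, so the filter also requires at least one term from the product's expected context.

```python
import re

def is_relevant(text, phrase, context_terms):
    """Keep a post only if it contains the product phrase AND a context term."""
    lowered = text.lower()
    if phrase.lower() not in lowered:
        return False
    tokens = set(re.findall(r"[a-z0-9']+", lowered))
    # Require at least one word from the product's expected context.
    return bool(tokens & context_terms)

# Hypothetical context terms for the product category.
CONTEXT = {"aftershave", "deodorant", "scent", "cologne"}

posts = [
    "I love my Old Spice aftershave",
    "Selling an old spice rack, barely used",
]
relevant = [p for p in posts if is_relevant(p, "old spice", CONTEXT)]
print(relevant)
```

Both posts match the keyword, but only the first survives the context check. Real queries would need far longer term lists, which may be part of why the queries mentioned above can run to thousands of characters.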
There is also the new language of social media and its abbreviations. They can build a table for this, but there are other issues, such as shorthand like +1 to say "I agree with the last comment" in a forum. The traditional NLP problems of metaphor, sarcasm, and idioms now appear a lot in social media, which makes them harder. The same goes for typos such as missing spaces.
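A lookup table for abbreviations, plus a special case for shorthand like +1, might look something like the following Python sketch. The table entries and the function name are hypothetical examples of mine, and this does nothing for the harder problems of metaphor, sarcasm, idioms, or missing spaces.

```python
import re

# Hypothetical abbreviation table; a real one would be maintained by linguists.
ABBREVIATIONS = {
    "imo": "in my opinion",
    "thx": "thanks",
    "gr8": "great",
}

def normalize(post):
    """Expand known abbreviations and translate a bare '+1' reply."""
    if post.strip() == "+1":
        return "I agree with the previous comment"

    def expand(match):
        word = match.group(0)
        # Leave words that are not in the table unchanged.
        return ABBREVIATIONS.get(word.lower(), word)

    return re.sub(r"[A-Za-z0-9]+", expand, post)

print(normalize("thx, gr8 product imo"))
print(normalize("+1"))
```

The table handles the easy cases. The +1 case shows why a table is not enough: its meaning depends on the comment it replies to, so the post cannot be understood on its own.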
They have a team of linguists trying to keep up with the changes in language. Darwin bypasses this issue, as it does not try to understand the content but lets the content self-organize, and the user can then make the decisions. The translation and understanding issue is passed to the user, but in a format that supports the user in making these decisions. This is a different approach than looking at massive amounts of data in an automated manner, so Darwin supports a different method of use. Each has its place.
Eric Andersen of IBM asked why they index everything instead of pulling out what the customer wants. David said they do not know what the customer wants in advance. New questions are always emerging. So they look at the same data for all customers and just ask different questions. They create special queries for each customer, starting from standard queries and customizing them. Some of these queries are complex. Eric, who sat next to me, also told me about Emoticons, a new IBM tool for sentiment analysis.



