Monday, Nov. 08, 2004

Super Searches

By Laura A. Locke/San Jose

It was the kind of question you might expect from a kindergartner: "What color is the Web?" But for IBM scientist Dan Gruhl, there was nothing childish about it. A researcher with seven patents to his name, Gruhl, 32, is tasked with solving all sorts of lofty brainteasers. And when this question popped up, he wanted to solve it--quickly and definitively.

It was, in fact, the perfect kind of question for test-driving a brand-new tool that Gruhl and his colleagues at the storied IBM Almaden Research Center in San Jose, Calif., have developed. The new tool, called WebFountain, is a next-generation search technology that lets users ask specific questions, in complete sentences--something today's search engines have trouble handling. Powered by one of the world's fastest supercomputers, WebFountain can whittle down billions of pages of unstructured data from the entire Web in real time, rapidly retrieving and analyzing only the most relevant pages. Geared for corporate applications, WebFountain spots online trends as they emerge, identifies patterns--assessing even word-of-mouth gossip, chatter and sentiment--and keeps track of them, noting how they change over time. "Google on steroids" is how one top IBM executive has described it.

"We're looking at relationships between entities, and between people and places and products," says Gruhl, WebFountain's chief architect. If a company wants to know, for example, what potential customers are saying online about its new gizmo, or even what people are saying in a specific language about the company's product, WebFountain can help give the answer. In one pilot program, WebFountain found that the buzz on college campuses preceded music sales of new CDs by two weeks (good to know if you're a record label). London-based Semagix licenses WebFountain to help its large banking customers track suspicious money flows. "Instead of looking at 5,000 documents, we're only looking at 20," explains Larry Levy, Semagix's president and CEO. Factiva, owned by Dow Jones and Reuters, has licensed WebFountain to help its customers track their corporate reputations. As IBM's Gruhl puts it, "It's a huge untapped area where people are getting blindsided by things that are developing on the Internet in chat rooms, in discussion forums or in blogs."

Why does this matter? Consider, for example, Gruhl's question about the color of the Web. WebFountain helped illustrate how culture, language and context all come into play on the Internet: in the U.S. and Europe, most websites are blue; Southeast Asian sites are typically red or orange. For companies planning on selling products in, say, India or China, developing a lime-hued Web presence may not be the smartest investment.

IBM has reportedly pumped more than $100 million into WebFountain. To date, the market has been dominated by a handful of small companies like ClearForest, Inxight and Intelliseek. Over the next year, Big Blue aims to roll out a flurry of new Web-scale information-discovery services. While IBM is closemouthed about specifics, the intelligence community is among the hungriest customers of such advanced, large-scale text analytics. The CIA's venture-funding arm, for example, has invested one-third of its $30 million portfolio in data-mining and text- and visual-analytics companies like Inxight. When it comes to tracking terrorist threats, says Jim Thompson, chief scientist for information technologies at the Department of Homeland Security's newly created National Visualization and Analytics Center, high-volume text analytics "has saved people's lives." That's hardly child's play.