Building a data search engine

Suppose you want to find data relating to a specific topic.  Software can easily distinguish between alphabetic text and numbers.  Software can also easily recognize tabular formats in HTML and plain text.  Moreover, the extent to which documents contain numbers and tables varies widely, so it is a strong signal for telling documents apart.  "Look for data" seems like a significant search qualifier that wouldn't be costly to implement.  So where are the data search tools?
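
To make the "easy to recognize" claim concrete, here is a minimal sketch of one possible data-density heuristic. It is not how Zanran or any other engine works; the function names, the number pattern, and the weights (0.5 for an HTML table, 0.1 per tabular line, capped at five lines) are all hypothetical choices for illustration.

```python
import re

# Hypothetical pattern for tokens that look like numbers: 123, 1,234, -4.5, 12%
NUMBER_RE = re.compile(r"^[-+]?\d[\d,]*(?:\.\d+)?%?$")
# Opening <table> tag in HTML, case-insensitive
TABLE_TAG_RE = re.compile(r"<table\b", re.IGNORECASE)


def numeric_fraction(text: str) -> float:
    """Share of whitespace-separated tokens that look like numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0
    numeric = sum(1 for t in tokens if NUMBER_RE.match(t.strip(".,;:()")))
    return numeric / len(tokens)


def tabular_lines(text: str) -> int:
    """Count plain-text lines with 3+ cells separated by tabs or runs of 2+ spaces."""
    count = 0
    for line in text.splitlines():
        cells = [c for c in re.split(r"\t|\s{2,}", line.strip()) if c]
        if len(cells) >= 3:
            count += 1
    return count


def data_score(text: str) -> float:
    """Hypothetical score: higher means the document looks more like data."""
    score = numeric_fraction(text)
    if TABLE_TAG_RE.search(text):
        score += 0.5  # arbitrary bonus for HTML table markup
    score += 0.1 * min(tabular_lines(text), 5)  # arbitrary bonus for plain-text tables
    return score


prose = "Suppose you want to find data relating to a specific topic."
table = "Year  Spend  Share\n2000  120.5  1.2%\n2001  130.0  1.3%\n"
print(data_score(prose), data_score(table))
```

A real system would need far more care (dates, phone numbers, and page numbers are numeric too), but even a crude signal like this separates narrative prose from a small table.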

Zanran is a search engine designed to find data.  The only way to tell Google that you want to search for data is to specify a search for .xls (Microsoft Excel) documents.  That's a poor specification for a data search, especially for a company that's not Microsoft.  With Zanran, you specify that you want to search for data simply by using Zanran for the search.  Zanran returns documents containing data.  Unlike Datafiniti, Zanran doesn't attempt to extract the data into a tabular form.  Unlike WolframAlpha, Zanran doesn't attempt to do calculations with data that it finds.  Zanran finds the relevant documents.  You extract the data and make the specific tables or calculations that you want.  At least until the forthcoming world-wide implementation of linked data, Zanran's approach is the most cost-effective division of tasks for using data drawn from the many document types across the web.

A stand-alone data search engine unfortunately doesn't seem economically propitious.  Building a data search engine involves all of the challenges of building a general search engine.[*]  Searching for data is mainly a matter of tuning the relevance-ranking algorithm.  Even without such tuning, a Google search for historical advertising expenditure leads to better data than a similar Zanran search.  Moreover, searches for data don't provide context for lucrative advertising.  Persons searching for data aren't looking to buy mass-market consumer products.  Perhaps Zanran or similar services can succeed on a subscription or pay-per-search basis.  For the sake of fact-based policy and business analysis, let's hope so.
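
The point that data search is "mainly a tuning" of ranking can be sketched as a re-ranking step on top of an ordinary engine's results. This is an assumption about one plausible approach, not a description of Google's or Zanran's ranking: the `Result` type, the linear blend, and the default `data_weight` of 0.4 are hypothetical, and both scores are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    text_relevance: float  # from an ordinary search engine, assumed in [0, 1]
    data_score: float      # from a data-density heuristic, assumed in [0, 1]


def rerank_for_data(results: list[Result], data_weight: float = 0.4) -> list[Result]:
    """Blend ordinary relevance with a data signal and sort best-first."""
    if not 0.0 <= data_weight <= 1.0:
        raise ValueError("data_weight must be between 0 and 1")

    def blended(r: Result) -> float:
        return (1 - data_weight) * r.text_relevance + data_weight * r.data_score

    return sorted(results, key=blended, reverse=True)


results = [
    Result("https://example.com/essay", text_relevance=0.9, data_score=0.05),
    Result("https://example.com/table", text_relevance=0.7, data_score=0.9),
]
print([r.url for r in rerank_for_data(results)])
```

With `data_weight=0` the ordering is just the general engine's ordering; raising it pushes data-heavy pages up. That the "data search" is only one extra parameter on an existing ranker is why a stand-alone data engine must still solve every general search problem.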

*  *  *  *  *

[*] Zanran analyzes images to determine whether they contain a graph, chart, or table.  Thus a Zanran search can potentially return a desired graph or data that a purely textual search would miss.  That's valuable, but searching images for numerical data and data presentations seems to me to account for a rather small share of the overall value of data search to users.