8/07/2007

Search engine... What is it really about?

After I ran into the article The Anatomy of a Large-Scale Hypertextual Web Search Engine by Google founders Sergey Brin and Lawrence Page, my interest in internet search engines was triggered. I knew code optimization was very important, because a search engine has to answer queries from millions of people at the same time, in real time, over billions of pages.

Not that I wanted to start building my own search engine, but while I was digging into what it would take to build such a system, I ran into an article widely read in the search engine community: Why Writing Your Own Search Engine is Hard by Anna Patterson. She did a great job of describing what it is really like to actually write the code for your own search engine! Here is an excerpt from her article:

Big or small, proprietary or open source, Web or intranet, it's a tough job.

There must be 4,000 programmers typing away in their basements trying to build the next "world's most scalable" search engine. It has been done only a few times. It has never been done by a big group; always one to four people did the core work, and the big team came on to build the elaborations and the production infrastructure. Why is it so hard? We are going to delve a bit into the various issues to consider when writing a search engine. This article is aimed at those individuals or small groups that are considering this endeavor for their Web site or intranet. It is fun, but a word of caution: not only is it difficult, but you need two commodities in short supply—time and patience.


Additionally, Anna points out that there is an open source search engine project by Mike Cafarella and Doug Cutting, described in the article "Nutch: Open Source Web Search." You can find more details on this open source project on the Nutch home page.
