When there's an 800 pound gorilla in your space, trying to steal bananas isn't exactly the smartest approach. You figure out what he's not eating, and you start nibbling. Before you know it, you're eating just as much as he is and wouldn't you know it, one bad banana crop and he's toast.
So when I hear that someone wants to build a better search engine than Google, while I don't think it's impossible, I question along what lines they're trying to do it. Can you really do it by indexing more pages than Google? I find that hard to believe, because an infrastructure arms race with Google seems like a bad idea--no matter how efficiently you think you can manage your crawling costs.
Smarter algorithms? Maybe, but isn't algorithm quality a function of the sheer brainpower of your search team? Again, this isn't where I want to take Google on head to head.
No, where I think you can beat Google, or at least make some headway on them, is with people.
At the end of the day, computer interpretation of human behavior and desires is what drives Google. You could attempt building a bigger or faster computer, but no one computer is really going to be able to interpret people better than, well, people.
To me, it's also the reason why Firefox gained so much ground against Internet Explorer. It wasn't that smarter people worked on Firefox--it's that more people worked on only the things they cared about, solving problems for themselves. The best ideas floated to the top and became part of the codebase. Things got addressed that weren't a priority for the IE team, but whose value the more engaged users understood keenly. The more you directly involve people in the process--at scale, which isn't easy--the better your product is, because your product is made for people.
So, right now, Cuil and a number of other startups have teams of a handful of people who are supposed to know better than all the Google people what users want out of their search and how to search better. Why not, instead, open up the process to something more open source--more Firefox-like?
Here's what a more collaborative approach to building a better search engine might look like:
Outsource just the basic crawl to Amazon, because they've probably got the best shot at competing cost effectively, but give outsiders a chance to add elements to the crawl. So, if you have a way of categorizing pages, like Cuil says they do, add that ability to the Amazon powered crawl, and your special taxonomy and tags will be available for anyone to access, work on and improve.
Let others use your infrastructure to target specific pages with a different type of crawl and contribute to the results. In other words, let Indeed and others run their crawlers on your infrastructure, so that the barrier to creating new attempts at search isn't set artificially high. This will make a lot more sense when we talk about plugins.
There are lots of different types of search that Google just doesn't do well, like jobs and events. This has given rise to some opportunities in the vertical search market. Let's take that Indeed example. Right now, Indeed searches jobs much better than Google does, so why not enable Indeed, Simply Hired or anyone else crawling jobs to define which keywords and searches count as job queries, and automatically provide results for them? You could even randomly rotate which job search engine plugin powers your job search and let them duke it out for highest clickthrough rates, or allow the user to set a default.
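To make the rotation idea concrete, here's a rough sketch of how random rotation with a user default and clickthrough tracking might look. Everything in it is an assumption for illustration: the `Rotation` class, its method names, and the plugin names are made up, not any real search engine's API.

```python
import random
from collections import defaultdict


class Rotation:
    """Hypothetical rotation of competing plugins for one vertical (e.g. jobs)."""

    def __init__(self, plugin_names, rng=None):
        self.plugin_names = list(plugin_names)
        self.rng = rng or random.Random()
        self.impressions = defaultdict(int)  # times each plugin was shown
        self.clicks = defaultdict(int)       # times a shown result was clicked

    def pick(self, user_default=None):
        # A user's chosen default wins; otherwise rotate at random.
        if user_default in self.plugin_names:
            name = user_default
        else:
            name = self.rng.choice(self.plugin_names)
        self.impressions[name] += 1
        return name

    def record_click(self, name):
        self.clicks[name] += 1

    def clickthrough_rate(self, name):
        shown = self.impressions[name]
        return self.clicks[name] / shown if shown else 0.0

    def leader(self):
        # The plugin currently winning the clickthrough contest.
        return max(self.plugin_names, key=self.clickthrough_rate)
```

A pure random pick is the simplest version of "let them duke it out." A real system would probably shift more traffic toward the leader over time, but that's a design choice beyond what's sketched here.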
Basically, a "plugin" would be a hosted version of the third party crawler that gets sent queries based on their structure, keywords, etc. It gets to send back all the results it can, and it gets the opportunity to advertise against them. So, in our "open source" search engine, when I type in "marketing jobs, new york, NY", instead of getting a page of links to other search engines' marketing jobs queries--i.e. an "extra click"--I'd actually get jobs as my results, along with Indeed powered job ads.
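Here's a minimal sketch of what that query routing could look like, assuming the simplest possible matching rule: count how many of a plugin's declared keywords show up in the query. The `Plugin` type, the `route` and `search` functions, and the keyword sets are all hypothetical. A real router would look at query structure too, not just keywords.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Plugin:
    name: str
    keywords: Set[str]                   # terms that signal this plugin's vertical
    search: Callable[[str], List[str]]   # the hosted third party search


def route(query: str, plugins: List[Plugin]) -> Optional[Plugin]:
    """Return the plugin whose keywords best match the query, if any match."""
    words = set(query.lower().replace(",", " ").split())
    best, best_hits = None, 0
    for plugin in plugins:
        hits = len(words & plugin.keywords)
        if hits > best_hits:
            best, best_hits = plugin, hits
    return best


def search(query: str, plugins: List[Plugin],
           fallback: Callable[[str], List[str]]) -> Tuple[str, List[str]]:
    """Send the query to the matching plugin, or to the general web crawl."""
    plugin = route(query, plugins)
    if plugin is None:
        return ("web", fallback(query))
    return (plugin.name, plugin.search(query))
```

With a jobs plugin registered for keywords like "jobs" and "hiring", a query such as "marketing jobs, new york, NY" would go straight to that plugin's results instead of a page of links.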
The same could go for movies. How many times do you type a movie name into Google, knowing full well that IMDB is going to be the first result? Why not allow IMDB to be the movie plugin? They could directly provide structured results for all the actor and movie queries and be allowed to advertise against them. This way, you eliminate the Google middle man when all you were really trying to do was reach IMDB in the first place. All you'd need are some standardized display templates for results, which could also allow some interface flexibility for different types of queries, like videos or location searches on local maps.
People could build other types of plugins, like one that would automatically display RSS results when blogs came up high in your ranking. When I get Google results for "Charlie O'Donnell's blog"... let Newsgator build that plugin and power it with all of its clickthrough data on which of my recent posts were the most interesting.
The system of sending queries to the right search tool would be a kind of AdWords platform, but a level up the chain. Instead of a marketplace for advertising next to one kind of search result, you'd have a marketplace of search results, each coming with its own ads in tow (or using a default ad platform that anyone could use). You could attempt to "buy" certain keywords to put your search results next to them, but you'd have to get good clickthrough performance to keep appearing.
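As a sketch of the "buy a keyword, but earn your keep" rule, here's one way it might work: rank the plugins bidding on a keyword by bid, and drop any whose clickthrough rate falls below a floor once they've had a fair number of impressions. The `Listing` type, the `eligible` function, and the thresholds (`min_ctr`, `grace_impressions`) are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Listing:
    plugin: str
    bid: float            # hypothetical price offered for this keyword
    impressions: int = 0
    clicks: int = 0

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def eligible(listings: List[Listing], min_ctr: float = 0.02,
             grace_impressions: int = 100) -> List[Listing]:
    """Keep listings that are still new or performing, highest bid first."""
    keep = [
        listing for listing in listings
        # New listings get a grace period before the CTR floor applies.
        if listing.impressions < grace_impressions or listing.ctr >= min_ctr
    ]
    return sorted(keep, key=lambda listing: listing.bid, reverse=True)
```

The point of the floor is that money alone can't hold a slot: a high bidder whose results nobody clicks gets bumped in favor of a cheaper plugin people actually use.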
You'd definitely allow users to add their own scripts and plugins, as well as have them contribute other types of data. Let me pump in my blog, my del.icio.us tags, twitter feed, etc. in an effort to teach the search engine all about what I like. Let me remove results, follow my clicks... learn about me (and my friends) as I go along.
The company that should really get into this is Yahoo! They couldn't out-Google Google on search or monetization, so they should just crack open the whole thing and let the community and other companies have a shot at it. They could be the default ad network for searches that weren't powered by plugins... and they could strike deals with plugin providers to take a smaller cut than Google would have had on ad clickthroughs.
If not, I still think it would make for a pretty viable community project. Hell, maybe Mozilla should be the one to work on it, or are they getting paid too much by Google for Firefox default search to rock the boat there?