Could Google Open Source Their Rankings?
Google has been asked before whether it could open source its search engine algorithms, and has pretty solidly said that it can’t, because doing so would lead to even heavier gaming once everyone knew what really drives Google rankings.
Of course there’s probably a business reason too. Google likely thinks that if its competitors could see its code, they would have a leg up on improving their own search engines.
I personally think that even though it might be tricky, it’s possible to do an open source ranking algorithm, or at least a partially open one. However, it would depend on reworking the way that the web works. The basic problem is that the anonymity of the web makes it very difficult to know which pages can be trusted.
Currently Google solves this issue through a lot of hacks that get around the anonymity problem: looking at trusted pages and where they link to, judging the writing quality of content, using factors like the age of domains, and then of course traditional keyword matching.
Because there is a lot of hackiness built into this analysis, publishing the details of the hacks could ruin them.
What could change this situation is a way to associate an identity record with a web page, saying that this person or company stands behind this page. This would immediately simplify the ranking and gaming problems by reducing the algorithm to two questions:
1. What is the credibility of this identity, and in what fields?
2. What other identities referenced this page, and what are their credibilities?
Identity credibility is very hard to game, even if you know the algorithm. Because giving out bad references hurts your own identity, people are motivated not to do it. An open Google could tell you exactly what your credibility score is, instead of keeping secret scores somewhere in the Googleplex for indirect things like your blog and your Flickr pages.
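The two questions above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything Google does: the `Identity` and `Page` types, the per-field credibility scores between 0 and 1, and the `owner_weight` blend are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    name: str
    # Hypothetical credibility per field, each score in [0, 1].
    credibility: dict[str, float] = field(default_factory=dict)


@dataclass
class Page:
    url: str
    owner: Identity  # the identity that stands behind the page
    referrers: list[Identity] = field(default_factory=list)


def page_score(page: Page, topic: str, owner_weight: float = 0.5) -> float:
    """Blend the owner's credibility with that of the referencing identities."""
    # Question 1: how credible is the owning identity in this field?
    owner_cred = page.owner.credibility.get(topic, 0.0)

    # Question 2: how credible are the identities that referenced the page?
    # Assumption: a plain average; a real system would need something smarter.
    if page.referrers:
        total = sum(r.credibility.get(topic, 0.0) for r in page.referrers)
        ref_cred = total / len(page.referrers)
    else:
        ref_cred = 0.0

    return owner_weight * owner_cred + (1.0 - owner_weight) * ref_cred


alice = Identity("alice", {"search": 0.8})
bob = Identity("bob", {"search": 0.6})
page = Page("https://example.com/post", owner=alice, referrers=[bob])
print(page_score(page, "search"))
```

Because every input is a published score attached to a named identity, knowing the formula does not by itself help anyone game it. They would have to raise the credibility of an identity, which is the hard part.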
This sounds kind of pie in the sky, because it would require a new identity standard that people might not take up. But compare it to email SPF records. The email standard was written so that you can send email from any “FROM:” address you want. This led to a huge spam issue that was mostly based on trust problems. But once SPF was introduced, email could be tied to an identity: the sending domain. Spam became a much easier problem to deal with.
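To make the SPF comparison concrete, here is a rough sketch of the kind of check a receiving mail server makes. It is heavily simplified: it assumes the domain's SPF TXT record has already been fetched from DNS, it only understands the `ip4`, `ip6` and `all` mechanisms, and it treats anything other than a clear pass as a failure. The function name `spf_allows` and the sample record are made up for illustration, and the addresses come from the reserved documentation ranges.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Return True if sender_ip is explicitly allowed by the SPF record.

    Simplified subset: handles only ip4, ip6 and all. Real SPF also has
    include, a, mx, redirect and more, and distinguishes softfail/neutral.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF record at all

    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        # An optional qualifier prefix; "+" (pass) is the default.
        qualifier = "+"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]

        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return qualifier == "+"
        elif name == "all":
            # Catch-all: matches every sender that got this far.
            return qualifier == "+"
        # Any other mechanism is skipped in this sketch.

    return False


# Hypothetical record: only this one network may send mail for the domain.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_allows(record, "192.0.2.10"))
print(spf_allows(record, "198.51.100.7"))
```

The point of the comparison is that the domain owner publishes the record, so the receiver is no longer guessing about who is allowed to speak for that domain.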
One example of how an identity-based web could work a lot better for search engines is this new blog I’m writing. The address is alexbosworth.net. It’s a new domain and a new blog, and even though my name is right in the domain and I linked to it from all my other pages, Google still stuck it in a sandbox for months and didn’t rank it #1 for a search on my name. Google needs a lot of time to figure out whether it can trust new pages.
If Google knew, however, that it was really from me, it could instantly compare it against my other pages and see that this is the one I am now updating frequently. It wouldn’t take months; it could do it in a couple of days.
I think an identity standard could push the search engines beyond the wall of small incremental gains they are getting now, and let them deliver next-generation features, such as mapping your network of friends onto your search results, or community blacklisting of SEO spammers. If Google won’t do it, hopefully someone else will.