Google Accidentally Published Documentation on How Search Works
On Monday, internal documents describing the factors Google Search considers when ranking and displaying web results were leaked.
These documents were made public by Rand Fishkin of SparkToro, a software company. Fishkin previously worked in the search engine optimization (SEO) industry.
This “Google API Content Warehouse” contains internal API documentation that explains to employees how the various components that generate search results work. There are more than 2,500 pages in total. Some of the documentation describes older systems, but other documents appear to be up-to-date.
Based on what has been published, Google appears to have made the documentation publicly available, perhaps by accident, via GitHub starting on March 27. It was then removed on May 7. However, because a third-party service had indexed it in the meantime, a copy remained available even after Google took it down.
While this data shows what factors Google Search may consider when ranking search results, it doesn't reveal how important each factor is to final rankings.
Those in the SEO community who are trying to adapt to changes in Google Search rankings and appear higher on the page may find this data useful. After reviewing the documents, some of them found the contents to be inconsistent with what Google has publicly said about how Search works.
Google has yet to publicly comment on the leak. The company announced its last major update to Search in March, which was intended to surface more authentic and “useful” content. Its core ranking system was updated to determine whether a page was “made for search engines rather than for people.”
The biggest findings in the leak
It’s clear that Google’s search algorithm hasn’t been leaked, and SEO experts don’t suddenly have all the answers. But the information leaked in a trove of thousands of internal Google documents is still huge. It’s an unprecedented look into Google’s usually closely guarded inner workings.
Websites depend on search traffic to survive, and many will go to great lengths – and great expense – to beat their competitors and rise to the top of the results. Better rankings mean more site visits, which means more money. As a result, website operators are closely monitoring every word Google publishes, as well as every social media post, in relation to search rankings.
Over the years, Google spokespeople have repeatedly denied that user clicks influence how websites rank—but leaked documents have noted that certain types of user clicks factor into search rankings. Testimony from a previous US Department of Justice antitrust case revealed a ranking factor called Navboost that uses user clicks to boost content in search.
“For me, the biggest takeaway is that many of Google’s public statements about what they collect and how their search engine works are at odds with reality,” Rand Fishkin, an expert in the search engine optimization (SEO) industry, told The Verge via email.
The leak first went viral after SEO experts Fishkin and Mike King published some of the contents of the secret documents earlier this week, along with accompanying analysis. The leaked API documentation is filled with information and definitions about the data Google collects, some of which can inform how websites are ranked in search results. Google initially dodged questions about the authenticity of the leaked documents before officially confirming them on Wednesday.
“We are cautious about making incorrect assumptions about Search based on out-of-context, outdated, or incomplete information,” Google spokesperson Davis Thompson told The Verge in an email on Wednesday. “We have shared extensive information about how Search works and the types of factors our systems consider, and we work hard to protect the integrity of our results from manipulation.”
There is no mention in the documentation of how the different attributes are weighted. It is also possible that some of the attributes listed in the documentation—such as the identifier for “small personal sites” or the downgrade for product reviews—may have been implemented at some point, but then removed. They may also have never been used to rank sites.
“We don’t necessarily know how [the elements] are being used, other than the various descriptions of them. While it’s not very rich, it’s still a lot of information for us,” King said. “What aspects should we think about more specifically when building a site or optimizing a site?”
The idea that the world’s largest search platform doesn’t rank search results based on how users interact with content may seem absurd. But repeated denials, carefully worded responses, and industry publications make it a controversial topic.
Another important point Fishkin and King highlight concerns how Google may use Chrome data in its search rankings. Google Search representatives have said that they do not use anything from Chrome for ranking, but leaked documents suggest that may not be the case. For example, one section lists “chrome_trans_clicks” to indicate which links from a domain appear below the main site in search results. Fishkin explains that this means that Google “takes the number of clicks on pages in the Chrome browser and uses that number to determine which URLs on a site are the most popular/important, which URLs are included in the sitelinks calculation.”
There are more than 14,000 attributes mentioned in the document, and researchers would have to dig through the pages for weeks to find hints. “Twiddlers,” or ranking tweaks, are deployed outside of major system updates to boost or demote content based on certain criteria. Site factors, such as who the author is, are mentioned, as are measures of a site’s “authority.” Fishkin points out that there are many things that are not represented in the documents, such as information about AI-generated search results.
So what does all this mean? First, it’s likely that anyone who runs a website will read about this leak and try to make sense of it. Publishers, e-commerce companies, and businesses will likely design various experiments to try to test some of the things suggested in the document. As this happens, websites may start to feel a little different—all as they try to make sense of this new, but still vague, wave of information.
“Journalists and publishers covering SEO and Google Search need to stop parroting Google’s public statements and take a much more critical, adversarial look at the search giant’s rhetoric,” Fishkin said. “Publications that repeat Google’s statements as if they were fact only help Google create a narrative that is useful to the company, not to practitioners, users, or the public.”