A trove of documents that appear to describe how Google ranks search results has surfaced online, likely the result of an accidental posting by an internal bot.
The leaked documentation describes an older version of Google’s Content Warehouse API and provides insight into the inner workings of Google Search.
The material appears to have been inadvertently committed to a publicly accessible Google-owned repository on GitHub around March 13 by the web giant’s own automated tooling, which added an Apache 2.0 open source license to the commit, as is standard for Google’s public documentation. A follow-up commit on May 7 attempted to revert the leak.
The material was nevertheless spotted by Erfan Azimi, CEO of search engine optimization (SEO) firm EA Digital Eagle, and was publicized on Sunday by fellow SEO practitioners Rand Fishkin, CEO of SparkToro, and Michael King, CEO of iPullRank.
The documents do not contain code; rather, they describe how to use Google’s Content Warehouse API and were likely intended for internal use only. The leaked documentation includes numerous references to internal systems and projects. While a Google Cloud API of the same name is already public, what ended up on GitHub appears to go well beyond it.
The files are notable for what they reveal about the elements Google considers important when ranking web pages for relevance, a question of perennial interest to anyone in the SEO industry and anyone operating a website in the hope that Google will send it traffic.
Among the more than 2,500 pages of documentation collected here for easy reading, there are details on more than 14,000 attributes accessible through or associated with the API, though little information on whether all of these signals are actually used or how heavily they are weighted. That makes it difficult to discern how much weight Google gives to any given attribute in its search ranking algorithm.
But SEO consultants say the documents contain remarkable details because they differ from public statements made by Google representatives.
“Many of (Azimi’s) assertions (in an email describing the leak) directly contradict public statements made by Googlers over the years, in particular the company’s repeated denial that click-centric user signals are employed, denial that subdomains are considered separately in rankings, denials of a sandbox for newer websites, denials that a domain’s age is collected or considered, and more,” SparkToro’s Fishkin explained in a report.
iPullRank’s King, in his post about the documents, highlighted a statement by Google search advocate John Mueller, who said in a video that “we don’t have anything like a website authority score” – a measure of whether Google considers a site authoritative and therefore worthy of higher ranking in search results.
But King notes that the documents reveal that, as part of the compressed quality signals Google stores for documents, a “siteAuthority” score is computed.
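The leaked material itself reads like auto-generated API reference pages rather than source code. As a rough illustration of the kind of record those pages describe, here is a minimal Python sketch of how a client might model such a quality-signals record; only the siteAuthority name comes from the reporting, and every other field and function below is a hypothetical placeholder, not something taken from the leak.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompressedQualitySignals:
    """Hypothetical model of a per-document quality-signals record,
    loosely patterned on the attribute listings in the leaked docs.
    Only siteAuthority is named in the reporting; the other field
    is an illustrative placeholder."""
    site_authority: Optional[int] = None  # site-level authority score ("siteAuthority")
    low_quality: Optional[int] = None     # placeholder for one of the thousands of other attributes

def parse_signals(payload: dict) -> CompressedQualitySignals:
    """Map a raw API response dict onto the dataclass, tolerating
    missing fields, as auto-generated API clients typically do."""
    return CompressedQualitySignals(
        site_authority=payload.get("siteAuthority"),
        low_quality=payload.get("lowQuality"),
    )

print(parse_signals({"siteAuthority": 512}))
```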
Several other revelations are cited in the two posts.
The first is the importance of clicks – and of different types of clicks (good, bad, long, and so on) – in determining a web page’s ranking. During the United States v. Google antitrust trial, Google acknowledged (PDF) that it considers click metrics as a ranking factor in web search.
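Neither post details how those click categories are derived. The following toy Python sketch shows one way a good/bad/long classification based on dwell time could work in principle; the thresholds and the dwell-time heuristic are assumptions for illustration, not Google’s actual logic.

```python
def classify_click(dwell_seconds: float, returned_to_results: bool) -> str:
    """Toy classifier for search-result clicks by dwell time.

    The category names echo those mentioned in the leak reporting
    (good, bad, long clicks); the thresholds and the use of dwell
    time are illustrative assumptions only.
    """
    if returned_to_results and dwell_seconds < 10:
        return "bad"    # quick bounce back to the results page
    if dwell_seconds >= 120:
        return "long"   # user stayed on the result for a long time
    return "good"       # a satisfying, non-bounce click

# Example: a 3-second visit followed by a bounce vs. a 5-minute stay
print(classify_click(3.0, returned_to_results=True))     # -> "bad"
print(classify_click(300.0, returned_to_results=False))  # -> "long"
```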
Another is that Google uses views of websites in Chrome as a quality signal, surfaced in the API as the ChromeInTotal parameter. “One of the modules related to page quality scores features a site-level measure of views from Chrome,” according to King.
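As a simple illustration of what such a site-level aggregate might look like, here is a Python sketch that rolls hypothetical per-page Chrome view counts up to a per-site total; the input format and the aggregation method are assumptions, not something documented in the leak.

```python
from collections import defaultdict
from urllib.parse import urlparse

def site_level_chrome_views(page_views: dict[str, int]) -> dict[str, int]:
    """Aggregate per-page Chrome view counts into a per-site total,
    the kind of site-level figure an attribute like ChromeInTotal
    appears to hold. Purely illustrative."""
    totals: dict[str, int] = defaultdict(int)
    for url, views in page_views.items():
        totals[urlparse(url).netloc] += views
    return dict(totals)

print(site_level_chrome_views({
    "https://example.com/a": 120,
    "https://example.com/b": 30,
}))  # -> {'example.com': 150}
```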
Additionally, the documents indicate Google considers other factors such as the freshness of content, authorship, whether a page is related to the central purpose of a site, how well a page’s title matches its content, and “the weighted average font size of a term in the body of the document.”
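That last attribute suggests a straightforward computation. Here is a minimal Python sketch of one plausible reading, averaging the rendered font sizes of a term’s occurrences in the page body; the leak reporting names the attribute but not its formula, so the calculation below is an assumption.

```python
def avg_term_font_size(occurrences: list[tuple[str, float]], term: str) -> float:
    """Average font size of `term` across its occurrences in a document body.

    `occurrences` is a list of (token, font_size_px) pairs, e.g. produced
    by walking the rendered DOM. A plain mean over the term's occurrences
    is an assumption; the leaked docs only name the attribute.
    """
    sizes = [size for token, size in occurrences if token.lower() == term.lower()]
    if not sizes:
        return 0.0
    return sum(sizes) / len(sizes)

# "search" appears once in body text (14px) and once in a heading (24px)
body = [("search", 14.0), ("ranking", 14.0), ("search", 24.0)]
print(avg_term_font_size(body, "search"))  # -> 19.0: heading usage pulls the average up
```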
Google did not respond to a request for comment. ®