4 Insights From the Google Leak — Whiteboard Friday

Tom Capper

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

In this Whiteboard Friday, Tom touches on four particular insights gleaned from the recent Google documentation leak.

Happy Friday, Moz fans. I'm recording this on Wednesday, two days before it hopefully goes live, and you see it. Just this week, there's been a huge, unprecedented leak of internal Google documentation on their search algorithm.

We've not really seen anything like this in the past. It's quite surprising, in a way, that they've managed to keep everything to do with how their search systems work so secure and so private. We did have some unusual inside looks in the US v. Google Department of Justice case last year, and we did see a Yandex leak a while ago as well, which I guess is the closest we've got to something like this in the past.

This leak has confirmed a lot of theories that I and many others have had, and have written about on the Moz blog for a long time, but which Google has always explicitly or nearly explicitly denied. There's also a treasure trove of additional information in here, which, to be honest, I've not really scratched the surface of so far, and I don't think we have yet as an industry either.

So yeah, I just want to share four highlights with you today. Before I get into that, I want to offer some caveats but also some explanation as to where this has come from. So a huge credit should go to Mike King, Rand Fishkin, and a guy they worked with called Erfan, I believe, who brought this document to light.

This is a set of documents that was accidentally made public from Google's GitHub for a period of time earlier this year. I don't know how Erfan found out about this. He's not himself an ex-Googler as far as I can tell, but perhaps he had access to someone who works or has previously worked at Google, who sort of tipped him off.

I'm not sure. In any case, Rand and Mike have done a great job of bringing this to light and cataloging some of the most obvious findings as well, and hopefully the posts from them will be linked to below.

I also want to say we don't know exactly how the features listed in this documentation are used. It does seem like it's recent. It does seem like it's currently in use. But it doesn't give us any clues as to how some of the systems that it details are weighted or whether they're used in, for example, news search or YouTube search as opposed to core search.

So this does confirm that Google has these systems, but it doesn't necessarily confirm how they're being used. So yeah, without any further ado, let's talk about what I think are the four most interesting revelations so far, at least for me.

Clickstream

So, the first one, I'm sort of grouping two things together here under clickstream.

So Google has always denied (a) that any Chrome data is fed into the search algorithm and (b) that they use any kind of click data to inform rankings. Now, I wrote last year about the US v. Google case, sort of explicitly confirming that click data was used through a system called Navboost.

This document gives a little bit more detail on that, including confirming what many SEOs, including me, have talked about for a long time, which is the concept of long clicks. The new confirmation here is Chrome traffic data, which gives Google more of an, I guess, unbiased view of which websites are actually getting significant amounts of traffic, maybe even aside from what they get through search. So that's basically confirmed.

In terms of SEO implications of Chrome and click data being used, I think, to be honest, if you've been doing SEO well, this probably shouldn't change what you've been doing because this just confirms the importance of having a user experience and a brand that your users enjoy, enjoy discovering, want to come back to, this kind of thing. However, it is good to have this, I guess, doubly confirmed at this point.

siteAuthority

The second thing I want to talk about I actually covered only a few weeks ago in my Whiteboard Friday on the March algorithm updates, and this is Google having obviously not a Domain Authority because that's a Moz metric, but sort of an equivalent to Domain Authority, which at least in this document is remarkably similarly called ‘siteAuthority.’

Now, the way it's referenced here makes it seem like it's mainly used to assess new pages. So this is similar to something that John Mueller talked about from the SearchLove stage, I want to say in 2018. He said that maybe they do have a domain-level signal that is used in this way.

It is actually quite similar to what Moz does with Page Authority, as well. So, if we discover a page and we don't have a calculated page-level score for it yet, we will approximate something based on the domain-level score, and it sounds like Google is doing something very similar. What we don't know is if this is the only way that they leverage this score.
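To make that fallback idea concrete, here is a minimal sketch. It is not Moz's actual Page Authority calculation, and it is not anything taken from the leaked documentation. The function name `estimated_page_score` and the `discount` factor are hypothetical, chosen purely for illustration.

```python
from typing import Optional


def estimated_page_score(
    page_score: Optional[float],
    domain_score: float,
    discount: float = 0.5,  # hypothetical factor, not a real Moz or Google value
) -> float:
    """Return the page-level score if one exists, else approximate it.

    When a page has just been discovered and has no calculated score yet,
    fall back to a value derived from the domain-level score.
    """
    if page_score is not None:
        return page_score
    return domain_score * discount


# A page with a calculated score keeps it; a new page borrows from its domain.
known_page = estimated_page_score(42.0, domain_score=80.0)
new_page = estimated_page_score(None, domain_score=80.0)
```

The only point of the sketch is the shape of the logic: a domain-level number stands in until a page-level number is available.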

We know from things like the site reputation abuse update, and from the way they talk about the site-level HCU (that's the helpful content update), that there are other site-level signals that Google is interested in, but there's not too much detail on that here.

Branded search

The third thing I want to talk about, and to me, this is the most novel, as I wouldn't have necessarily guessed it previously, although I have covered some similar theories, is branded search.

So many people have noticed, including myself and including a chap called Malcolm Slade that I remember and some others many years ago now, that branded search volume actually correlates really well with organic rankings.

I always assumed there was some other explanation for this, that Google actually had a better way of measuring brand, and this just happened to correlate, that it wasn't a ranking factor in its own right. Now, this doesn't explicitly address that point. It actually gets at something slightly more interesting.

Now, I must say the way this is written about in the documentation is quite arcane. There's a lot of sort of Google-specific jargon and terminology, a lot of links to other documents that we don't have access to, so we can't really see the context. But it seems to me that Google is interested in the ratio of links to branded search volume for some sites, and that this is part of how Panda worked or works. So, let me just explain how that might work.

So say you have a site that has a lot of links, but no one is actually searching for that specific site (Google might call this navigational search). Take, for example, tcapper.co.uk, my personal site. Spoiler: it doesn't get a huge amount of traffic. If nobody is looking for tcapper.co.uk, but I have millions of links, that is a bit suspect, right? Something is wrong here. So, that ratio would suggest a problem.
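As a rough sketch of that ratio idea: the leaked documentation doesn't give a formula or a threshold, so everything below is an assumption for illustration only. That includes the function names, the `+ 1` smoothing, and the `threshold` value.

```python
def links_per_branded_search(link_count: int, branded_searches: int) -> float:
    """Ratio of inbound links to branded (navigational) search volume.

    The + 1 in the denominator is an assumption that avoids division by zero
    for sites nobody searches for by name.
    """
    return link_count / (branded_searches + 1)


def looks_suspect(
    link_count: int,
    branded_searches: int,
    threshold: float = 1000.0,  # hypothetical cut-off, not from the leak
) -> bool:
    """Flag sites whose link count far outstrips their branded search demand."""
    return links_per_branded_search(link_count, branded_searches) > threshold


# Millions of links but almost no one searching for the site by name.
obscure_site = looks_suspect(link_count=2_000_000, branded_searches=50)

# Similar link count, but with a large volume of branded searches behind it.
known_brand = looks_suspect(link_count=2_000_000, branded_searches=500_000)
```

With these made-up numbers, the first site gets flagged and the second doesn't, which is the intuition behind a heavily linked site with no navigational demand looking a bit suspect.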

The documentation makes it seem like that's somehow related to or some component of how Panda works. It also alludes to something called Baby Panda, which Mike King has suggested could be the helpful content update or a related system. Actually, this makes sense.

So again, in that last Whiteboard Friday I did about the March updates, I theorized that brand would be a good way of handling some of the problems that Google currently faces, some of the crises it currently faces. Anecdotally, a lot of the sites that you see being hit by helpful content updates, they perhaps do have a slightly smaller brand than one might expect for their traffic level.

That's extremely anecdotal. There are exceptions. But this is interesting. It kind of lines up with a lot of experiences, but I don't think anyone suspected such a crude metric here. So we'll dig into this more as time goes on. But yeah, some food for thought there.

Demotions

Then, the last thing I want to talk about, by no means the last in this set of documentation, but the last thing I want to talk about today is demotions.

So, this seems like a bunch of different algorithmic penalties. So there are some more obvious ones, for example, exact match domain demotion, which maybe we've known about for a long time.

There were two that I thought were particularly interesting, and there's a long list. Product review demotion: Again, we've talked before about how Google seems to have it in for product review sites. To some degree, they don't like product review sites. Maybe they're just sending revenue to Amazon. Maybe they're full of low-grade affiliates who haven't actually reviewed the products and who, in reality, are just sort of aggregating other people's reviews or making it up.

But one way or another, Google seems to have a lot of product review focused updates, and a lot of other updates have disproportionately affected product review sites. So this was interesting to see.

Also, nav demotion, not much detail around this. But hypothetically, what if this was something to do with bad navigation experience or difficult to use navigation on the site? That would make sense as something that Google could have. But yeah, we don't have the full details.

I would thoroughly encourage you to go and take a look at the two articles I mentioned before from Rand and Mike, who wrote this up, and also at the original document, which is currently still viewable.

So yeah, I hope you found that interesting. There is a lot of food for thought here, and maybe for a lot of SEOs, a little bit of an "I told you so" moment.

Transcription by Speechpad

Tom Capper

I head up the Search Science team at Moz, working on Moz's next generation of tools, insights, and products.
