A few months back I discovered a shocking bug in how Google handles XML sitemaps, which enabled brand new sites to rank for competitive shopping terms by hijacking the equity from legitimate sites.

I reported the issue to Google, which has since fixed it and paid me a bug bounty.

However, since I published my write-up of the issue, a number of SEO professionals have contacted me. Some worry that they may have been a victim of such an attack, some have asked me to help them use the attack, and others have theorized about variations that may still work.

This article will answer some of the most popular questions I’ve been getting.

What Was Google’s XML Sitemap Bug?

The issue is related to how Google handles and authenticates XML sitemap files, specifically those files that were submitted via the ping mechanism.

Sitemaps can be submitted directly to Google Search Console, via an entry in your robots.txt file, or by ‘pinging’ them by sending the sitemap URL to a special endpoint that Google provides.
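As a rough illustration of the third option, the sketch below builds a ping URL for a sitemap. The endpoint `https://www.google.com/ping` with a `sitemap` parameter is my assumption about the form the ping mechanism took at the time; the helper name `build_ping_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the ping endpoint as it was documented at the time of writing.
PING_ENDPOINT = "https://www.google.com/ping"


def build_ping_url(sitemap_url: str) -> str:
    """Return the URL that 'pings' the endpoint with a sitemap location."""
    # urlencode percent-encodes the sitemap URL so it travels as one parameter.
    return f"{PING_ENDPOINT}?{urlencode({'sitemap': sitemap_url})}"


print(build_ping_url("https://apples.com/sitemap.xml"))
# https://www.google.com/ping?sitemap=https%3A%2F%2Fapples.com%2Fsitemap.xml
```

Requesting that URL is all a ping amounts to, which is why nothing about the request itself proves who controls the sitemap.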

For GSC and robots.txt entries, the sitemap is authenticated as genuine by the fact that you have access to the domain’s GSC account or robots.txt file. For ping URLs, however, Google seemed to decide whether a sitemap was trustworthy simply by looking at the domain in the URL that you sent.

The issue is that if this URL redirects elsewhere, even to a different domain, Google still trusts the sitemap as belonging to the original domain.

So, for example, I may submit a sitemap URL of apples.com/sitemap.xml that redirects to oranges.com/sitemap.xml, and Google would still treat the XML sitemap as belonging to apples.com.
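The missing check can be sketched in a few lines. This is only an illustration of the idea, not a description of what Google does internally: given the chain of URLs a fetch passed through, compare the host of the first URL with the host of the last. The function names are hypothetical, and comparing exact hostnames is a simplification (it treats www.apples.com and apples.com as different hosts).

```python
from typing import Sequence
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def crosses_domain(redirect_chain: Sequence[str]) -> bool:
    """True if the last URL in the chain is on a different host than the first."""
    if not redirect_chain:
        return False
    return host_of(redirect_chain[0]) != host_of(redirect_chain[-1])


chain = ["https://apples.com/sitemap.xml", "https://oranges.com/sitemap.xml"]
print(crosses_domain(chain))  # True
```

A sitemap whose chain crosses hosts like this should be attributed to the final host, or not trusted at all, rather than to the host that was originally submitted.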

What Are Open Redirects?

Many websites succumb to a form of manipulation known as “open redirects,” where an attacker can trick a website into redirecting to a location of their choice.

An example is a website whose logout mechanism takes the form apples.com/logout.php?continue=/shop, which may be manipulated to become apples.com/logout.php?continue=http://evil.com/.
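For site owners, the usual defense is to validate the redirect target before using it. Below is a minimal sketch, assuming the parameter is only ever meant to hold a same-site path such as /shop. The function name is hypothetical and the checks are not exhaustive.

```python
from urllib.parse import urlsplit


def is_safe_redirect_target(target: str) -> bool:
    """Accept only same-site rooted paths such as '/shop'."""
    parts = urlsplit(target)
    # Reject anything that carries a scheme or host, e.g. 'http://evil.com/'.
    if parts.scheme or parts.netloc:
        return False
    # Reject protocol-relative forms ('//evil.com/', '///evil.com') and
    # backslashes, which some browsers treat like forward slashes.
    if target.startswith("//") or "\\" in target:
        return False
    # What remains must be a path rooted on this site.
    return target.startswith("/")


print(is_safe_redirect_target("/shop"))             # True
print(is_safe_redirect_target("http://evil.com/"))  # False
```

A handler would fall back to a fixed default page whenever this returns False instead of redirecting to the supplied value.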

In my research, I found open redirects on Facebook, LinkedIn, Tesco, and a number of other sites (I’ve reported all of these, and many have been fixed).

To give an indication of how widespread they are, Google’s Vulnerability Rewards Program explicitly excludes open redirects as qualifying for a bounty (and indeed there are known open redirects on Google).

This created the opportunity to ping a sitemap via an open redirect on a legitimate site, which would then redirect to an XML file hosted on the attacker’s site.

For example, by submitting a sitemap on the URL apples.com/logout.php?continue=http://evil.com/sitemap.xml, Google would treat it as being an authentic sitemap for apples.com, but it would actually be hosted on evil.com.

At this point, evil.com can submit sitemaps for apples.com, and by including hreflang entries in these sitemaps, it can leverage apples.com’s equity (PageRank) to rank in search results where it has no legitimate right to appear.

Are You a Victim & Now Being Outranked?

Since the news became public, a number of SEO professionals have reached out asking me to review their case, either concerned that they may have been a victim of this bug or wondering whether it explains how a competitor is able to outrank them.

I can certainly understand why.

It can sometimes be super frustrating to try to understand why another site is ranking so well against you, or why your site has suddenly had a lull in performance.

Having an explanation for these edge cases is certainly appealing.

So far I have not seen anything to convince me that this bug was being exploited in the wild.

Google is a complex beast, and there could be all sorts of explanations for why certain sites are ranking the way they are, but at the moment I remain to be convinced that this bug is one of them.

If you are concerned you are the victim of this, then the only real footprint it would leave is an entry in your server logs showing Googlebot coming to your site to collect a sitemap and being 3xx redirected to another domain (JavaScript and meta-refresh redirects wouldn’t work).

This is the best thing you can check.

In my experiment, I was regularly re-pinging the sitemap, but even without re-pinging I believe Google would always go via the open redirect, so you should see entries in your server logs.
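The log check described above can be sketched as follows. I am assuming access logs in the common "combined" format; that format does not record where a redirect pointed, so the sketch uses a heuristic instead: it flags Googlebot requests that received a 3xx response and carried an absolute URL in a query parameter. The function name and the sample log lines are hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumption: "combined" log format, with the user agent as the last quoted field.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
    r'.*"(?P<agent>[^"]*)"\s*$'
)


def suspicious_hits(lines):
    """Yield (target, status) for Googlebot requests that were 3xx redirected
    and that carried an absolute URL in a query parameter."""
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        if "Googlebot" not in match["agent"]:
            continue
        if not match["status"].startswith("3"):
            continue
        query = urlsplit(match["target"]).query
        for _name, value in parse_qsl(query):
            if value.startswith(("http://", "https://", "//")):
                yield match["target"], int(match["status"])
                break


sample = [
    '66.249.66.1 - - [10/Mar/2018:13:55:36 +0000] '
    '"GET /logout.php?continue=http://evil.com/sitemap.xml HTTP/1.1" 302 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Mar/2018:13:56:01 +0000] '
    '"GET /sitemap.xml HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
for target, status in suspicious_hits(sample):
    print(status, target)
```

The user-agent string can be spoofed, so treat any hit as a prompt to look closer, not as proof.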

Does This Change Anything About XML Sitemaps?

Yes. It changes when hreflang entries will be used.

Google will no longer pay attention to hreflang entries in “unverified sitemaps”, which I believe means those submitted via the ping URL.
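For readers who have not used them, the sketch below shows what hreflang entries in a sitemap look like, by generating a small sitemap in which every URL lists its language alternates as `xhtml:link` elements. The page URLs and the helper name are hypothetical, and this covers only the parts of the format relevant here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(alternates: dict) -> str:
    """Build a sitemap in which each URL lists every language alternate.

    `alternates` maps an hreflang code to the URL of that language version.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Each page lists all versions, including itself.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=lang,
                href=href,
            )
    return ET.tostring(urlset, encoding="unicode")


print(build_hreflang_sitemap({
    "en": "https://apples.com/en/",
    "de": "https://apples.com/de/",
}))
```

It is these `xhtml:link` entries that are now ignored when the sitemap is unverified; the plain `loc` entries are a separate matter.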

Those submitted inside Google Search Console or in your robots.txt file will still operate as they always have done, and pinging one of these sitemaps to prompt a recrawl from Google will also work as expected.

I anticipate the change will affect very few sites, but you should be aware of it.

Conclusion

My recommendation: submit sitemaps via the GSC interface and also include them in your robots.txt file.

If your site suffers particularly from scrapers, for whatever reason, then you may wish to leave sitemap entries out of your robots.txt file so that bad actors cannot find them and use them to expedite their efforts.
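To see which sitemaps a robots.txt file currently advertises, to crawlers and scrapers alike, something like the sketch below may help. It uses Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later); the robots.txt content and the helper name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Sitemap: https://apples.com/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> list:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))  # ['https://apples.com/sitemap.xml']
```

An empty list means the file exposes no sitemap locations, which is the state you want if you are keeping them away from scrapers and relying on GSC submission alone.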

18.04.2018

Google Black Hat Sitemap Bug: What It Means for XML Sitemaps
