Gary Illyes of Google has reiterated that site owners should use robots.txt to keep crawlers away from URLs that trigger actions, such as adding items to a cart or wishlist, so that server resources are not wasted.

In a recent LinkedIn post, Illyes repeated longstanding advice for website owners: use the robots.txt file to block web crawlers from action-oriented URLs.

Illyes pointed to a common problem: excessive crawler traffic that burdens servers, often caused by bots crawling URLs intended for user-specific actions.

He stated:

«Upon reviewing the crawling behavior reported by sites, it frequently involves URLs that execute actions such as ‘add to cart’ or ‘add to wishlist.’ These are irrelevant to crawlers and are likely unwanted.»

To mitigate server strain, Illyes recommended blocking URLs containing parameters like «?add_to_cart» or «?add_to_wishlist» in the robots.txt file.

As an illustrative measure, he suggested:

«If your site includes URLs such as:
https://example.com/product/scented-candle-v1?add_to_cart
and
https://example.com/product/scented-candle-v1?add_to_wishlist

It’s advisable to implement a disallow directive for them in your robots.txt file.»
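
As a rough illustration of the advice above, the sketch below embeds a hypothetical robots.txt for the example URLs and checks which URLs its rules would block. The rule patterns, the `ROBOTS_TXT` contents, and the helper names (`disallow_patterns`, `pattern_to_regex`, `is_blocked`) are assumptions made for this example, not something Illyes published. The matcher is a simplified stand-in for a real parser: it treats `*` as a wildcard and a trailing `$` as an end anchor, and it ignores user-agent groups and Allow rules.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules for the example URLs. The "*" wildcard is honored by
# Google's crawlers, but not every robots.txt parser supports it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?add_to_cart
Disallow: /*?add_to_wishlist
"""


def disallow_patterns(robots_txt: str) -> list[str]:
    """Collect Disallow values (user-agent groups are ignored for brevity)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a rule: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the URL's path and query match any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return any(
        pattern_to_regex(p).match(target) for p in disallow_patterns(robots_txt)
    )


if __name__ == "__main__":
    for url in (
        "https://example.com/product/scented-candle-v1?add_to_cart",
        "https://example.com/product/scented-candle-v1?add_to_wishlist",
        "https://example.com/product/scented-candle-v1",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

These patterns only match when the action parameter comes directly after the `?`. A site that places it later in the query string (for example after `&`) would need an additional rule.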

While using the HTTP POST method for such actions can also keep crawlers away from these URLs, Illyes cautioned that crawlers can still issue POST requests, so robots.txt remains relevant.

Reaffirming Traditional Best Practices

Alan Perkins, participating in the discussion, underscored the historical basis of this guidance, which dates back to web standards introduced in the 1990s for analogous reasons.

He quoted from the 1994 document «A Standard for Robot Exclusion»:

«In 1993 and 1994, there were instances where robots accessed WWW servers against their host’s wishes…robots traversed unsuitable parts of WWW servers, such as excessively deep virtual trees, redundant information, transient data, or cgi-scripts with unintended effects (like voting).»

The robots.txt standard, which defines rules that well-behaved crawlers follow to limit their access, emerged as a consensus among web stakeholders in 1994.

Adherence and Exceptions

Illyes affirmed Google’s commitment to honoring robots.txt directives, noting rare documented exceptions involving «user-initiated or contractually obligated fetches.»

This adherence to robots.txt has long been a cornerstone of Google’s web crawling policies.

17.06.2024

Google Issues Reminder on Robots.txt Usage to Block Action URLs

26.05.2024

Google CEO Addresses AI’s Effect on Search Traffic Concerns

Google CEO Sundar Pichai discusses the influence of AI on search traffic, asserting it boosts user engagement. In a recent conversation, Google CEO Sundar Pichai explored […]
14.05.2024

Significant Overhaul to Google’s Product Structured Data Documentation

Google has undertaken a major revamp of its extensive Product Structured Data documentation, breaking it down into three separate pages, each focusing on specific topics. This […]
11.04.2024

Google Enhances INP Metric for Websites Utilizing Consent Management Platforms

In a bid to boost website performance, Google has enhanced its Interaction to Next Paint (INP) metric, particularly benefiting sites employing consent management platforms (CMPs). This […]
25.03.2024

Google Responds to Queries About SEO Impact of Tailoring Content by Country

Google’s John Mueller provides insights into whether displaying different content based on a visitor’s country affects SEO rankings. Recently, John Mueller addressed a query on Reddit […]
23.02.2024

Google has recently announced a groundbreaking agreement with Reddit

Google has recently announced a groundbreaking agreement with Reddit, granting Google real-time access to a vast array of Reddit conversations. This collaboration aims to amplify the […]
12.02.2024

Google Introduces Gemini, Formerly Known as Bard, Alongside Paid Version

Google has rebranded its generative AI-powered conversational tool Bard to Gemini. This renaming comes with the unveiling of Gemini Advanced, a paid version of the platform, […]
29.01.2024

Amazon Q Emerges as AWS’ Response to Microsoft’s GPT-Driven Copilot

Unveiled during AWS CEO Adam Selipsky’s keynote at the ongoing re:Invent 2023 conference, Amazon Q has emerged as Amazon Web Services’ (AWS) solution to Microsoft’s GPT-driven […]
15.01.2024

Google Clarifies the Functionality of the Index, Follow Meta Tag

In an insightful discussion on Reddit, Google’s John Mueller delves into the intricacies of a frequently employed meta robots tag and sheds light on the consequences […]
31.12.2023

Google Enhances Emphasis on Business ‘Availability’ Signal in Local Search Ranking Algorithm

Google’s latest update to its local search ranking algorithm has placed a greater emphasis on the «availability» signal for non-navigational queries, signaling a notable shift in […]