Troubleshooting crawl errors
Overview
Cloudflare allows search engine crawlers and bots. If you observe crawl issues or Cloudflare challenges presented to a search engine crawler or bot, gather the information described in this guide and include it when you contact Cloudflare support.
Disable Anti-bot modules
Search engine crawlers’ requests, when proxied through Cloudflare, can be blocked by anti-bot modules installed on your origin server. Try disabling any anti-bot modules to prevent your origin from blocking these requests.
Adjust Google and Bing crawl rates
To optimize CDN performance, Google and Bing assign special crawl rates to websites that use CDN services. Special crawl rates do not negatively affect search engine optimization (SEO) or search engine results pages (SERPs). To change your crawl rates for Google and Bing, follow the guides below:
- Change the Google crawl rate by reviewing Google’s documentation.
- Change the Bing crawl rate by following the guidance in Bing’s documentation.
Prevent crawl errors
Review the following recommendations to prevent crawler errors:
- Monitor the performance and availability of your website using a third-party tool.
- Do not block Google crawler IP addresses via firewall rules or IP Access Rules within the Security app. If you are using rate limiting rules (new version), make sure they do not apply to the Google crawler. To confirm that an IP address belongs to Google, consult Google’s documentation on verifying Googlebot IP addresses.
- Do not block the United States via firewall rules or IP Access Rules within the Security app.
- Do not block crawler User-Agents in your .htaccess, server configuration, robots.txt, or web application. Google uses a variety of User-Agents to crawl your website. You can test your robots.txt via Google.
- Do not allow crawling of files in the /cdn-cgi/ directory. This path is used internally by Cloudflare and Google encounters errors when crawling it. Disallow crawls of cdn-cgi via robots.txt:
Disallow: /cdn-cgi/
- Ensure your robots.txt file allows the AdSense crawler.
- Restore original visitor IP addresses in your server logs.
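The robots.txt recommendations above can be checked locally before a crawler finds a problem. The following is a minimal sketch that uses Python's standard `urllib.robotparser`. The `ROBOTS_TXT` content, the `can_crawl` helper, and the example URLs are assumptions for illustration, and `Mediapartners-Google` is the User-Agent token of the AdSense crawler. Python's parser may not interpret every directive exactly as Google does, so treat the result as a quick sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the file your site actually serves.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cdn-cgi/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "https://www.example.com"  # placeholder domain
    # Paths under /cdn-cgi/ should be off limits to crawlers.
    print(can_crawl(ROBOTS_TXT, "Googlebot", site + "/cdn-cgi/trace"))
    # Regular pages should stay crawlable, including for the AdSense crawler.
    print(can_crawl(ROBOTS_TXT, "Mediapartners-Google", site + "/index.html"))
```

With the sample content, the first check should report that crawling is not allowed and the second that it is.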
Troubleshoot crawl errors
Troubleshooting steps for the most commonly reported crawl errors are described below.
HTTP 4XX Errors
HTTP 4XX errors are the most common type of crawl error. Cloudflare delivers these errors from your web server to Google. They have various causes, such as a missing page on your web server or a malformed link in your HTML. The solution depends upon the problem encountered.
HTTP 5XX Errors
HTTP 5XX errors indicate that either Cloudflare or your origin web server experienced an internal error. To correlate occurrences of crawl errors with site outages, monitor your origin web server’s health. Monitoring your website health both through Cloudflare and directly against your origin web server IPs helps determine whether errors occurred due to Cloudflare or your origin web server.
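One way to run the two-sided check described above is to probe the site through Cloudflare and directly at the origin at about the same time, then compare the status codes. The sketch below uses Python's standard `http.client` and makes several assumptions: the helper names `fetch_status` and `likely_error_source` are hypothetical, `203.0.113.10` is a placeholder origin IP, and the probe uses plain HTTP on port 80 (an origin that only accepts HTTPS needs a different probe). A "cloudflare" result only means the error appeared on the proxied path and not on the direct one, which still warrants further investigation.

```python
import http.client


def fetch_status(hostname, connect_to=None, timeout=10):
    """Return the HTTP status of GET / for `hostname`.

    If `connect_to` is given (for example, an origin IP), the TCP connection
    goes to that address while the Host header still names `hostname`.
    """
    conn = http.client.HTTPConnection(connect_to or hostname, 80, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": hostname})
        return conn.getresponse().status
    finally:
        conn.close()


def likely_error_source(status_via_cloudflare, status_direct):
    """Classify a pair of probe results taken at about the same time."""
    if status_direct >= 500:
        return "origin"      # the origin itself is returning 5XX
    if status_via_cloudflare >= 500:
        return "cloudflare"  # 5XX seen only on the proxied path
    return "none"


if __name__ == "__main__":
    hostname = "www.example.com"  # placeholder domain
    origin_ip = "203.0.113.10"    # placeholder origin IP
    proxied = fetch_status(hostname)
    direct = fetch_status(hostname, connect_to=origin_ip)
    print(proxied, direct, likely_error_source(proxied, direct))
```

Running this on a schedule and logging the results with timestamps makes it easier to line up outages with the dates on which crawl errors were reported.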
DNS Errors
Troubleshooting steps vary depending on whether your domain is on Cloudflare via a Full or CNAME setup. To verify which setup your domain uses, open a terminal and execute the following command (replace www.example.com with your Cloudflare domain):
dig +short SOA www.example.com
For domains on a CNAME setup, the response contains cdn.cloudflare.net. For example:
example.com.cdn.cloudflare.net.
For domains on a Full setup, the response lists nameservers in the cloudflare.com domain. For example:
josh.ns.cloudflare.com. dns.cloudflare.com. 2013050901 10000 2400 604800 3600
Once you’ve confirmed how your domain is set up with Cloudflare, proceed with the troubleshooting steps appropriate to your domain setup.
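If you need to repeat this check for several hostnames, the same rule can be scripted. The sketch below is an illustration only: `classify_setup` and `query_soa` are hypothetical helper names, the script assumes the `dig` utility is installed, and the classification relies solely on the two patterns described above.

```python
import subprocess


def classify_setup(soa_output):
    """Classify the Cloudflare setup type from `dig +short SOA` output."""
    text = soa_output.lower()
    if "cdn.cloudflare.net" in text:
        return "CNAME"
    if "cloudflare.com" in text:
        return "Full"
    return "unknown"


def query_soa(hostname):
    """Run `dig +short SOA <hostname>` and return its standard output."""
    result = subprocess.run(
        ["dig", "+short", "SOA", hostname],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Replace www.example.com with your Cloudflare domain.
    print(classify_setup(query_soa("www.example.com")))
```

An "unknown" result means the output matched neither pattern, in which case it is worth inspecting the raw `dig` output by hand.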
CNAME
Contact your hosting provider to investigate DNS errors and provide the date Google encountered DNS errors. Additionally, review the Cloudflare System Status page for any network outages on the date the errors were encountered by Google.
Full
Contact Cloudflare support and provide the date and time that Google observed the errors.
Requesting troubleshooting assistance
If the above troubleshooting steps do not resolve your crawl errors, follow the steps below to export crawler errors as a .csv file from your Google Webmaster Tools Dashboard. Include this .csv file when contacting Cloudflare Support.
- Log in to your Google Webmaster Tools account and navigate to the Health section of the affected domain.
- Click Crawl Errors in the left-hand navigation.
- Click Download to export the list of errors as a .csv file.
- Provide the downloaded .csv file to Cloudflare support.
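Before sending the export, a quick summary of the file can help you describe the problem and spot the dominant error type. The sketch below counts rows by the value in one column using Python's standard `csv` module. The file name `crawl_errors.csv` and the column name `Response Code` are assumptions; check the header row of your own export and adjust them.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Count CSV rows grouped by the value found in `column`.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical file and column names; adjust them to match your export.
    with open("crawl_errors.csv", newline="", encoding="utf-8") as handle:
        for value, count in count_by_column(handle, "Response Code").most_common():
            print(value, count)
```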