HREF Builder is the only HREFLang tool that has built-in validation for redirects, 404s, canonical tags and robots directives. The goal is to produce 100% error-free XML site maps so that you make the most of your crawl budget.
Every redirect, canonical mismatch or robots block is a wasted request from the search engines. The search engines have stated that no more than 1% of your submitted URLs should have errors, and if they encounter more they will slow or stop indexing your site and site maps. Why should they waste their time getting bad data?
What are we checking for?
Robots Directives - we check each page to make sure that you have not incorrectly set the robots directive to noindex. If we detect it, we add it to the error report.
Header Status Codes - we check the header status code for each page. If it is not 200, we flag the page as an error with the returned code and, for a redirect, show the destination URL.
Canonical Tags - incorrect canonical tags are a very large problem for global sites. They can be set incorrectly, or deliberately pointed at a different version of the site for content or logistical reasons. Either way, this defeats the purpose of the HREFLang file and needs to be identified.
How does it work?
During the import process we fetch each of the URLs, which gives us the header status, and then we scan the page for a canonical tag and a robots directive.
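The per-page check described above can be sketched roughly as follows. This is a minimal illustration in Python, not HREF Builder's actual implementation: the names `HeadTagScanner` and `audit_page` are hypothetical, the page is assumed to have been fetched already (status code, HTML body and any redirect `Location` are passed in), and the canonical comparison simply ignores a trailing slash. Robots directives sent in an HTTP header are not covered here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadTagScanner(HTMLParser):
    """Collects the canonical link and robots meta directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            # Keep the first canonical tag found on the page.
            self.canonical = self.canonical or a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots += [d.strip().lower() for d in a.get("content", "").split(",")]


def audit_page(url, status, html, location=None):
    """Return a list of error strings for one fetched URL (empty list = clean)."""
    if 300 <= status < 400:
        return [f"redirect {status} -> {location}"]
    if status != 200:
        return [f"status {status}"]
    scanner = HeadTagScanner()
    scanner.feed(html)
    errors = []
    if "noindex" in scanner.robots:
        errors.append("robots noindex")
    # Resolve relative canonicals against the page URL before comparing.
    if scanner.canonical and urljoin(url, scanner.canonical).rstrip("/") != url.rstrip("/"):
        errors.append(f"canonical points to {scanner.canonical}")
    return errors
```

Called with a 200 response whose canonical tag points at another country version, `audit_page` would return a single canonical error; a 301 returns the redirect code together with the destination URL.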
If any of these elements are incorrect, we identify them, add them to the country/language report and exclude them from the final XML output until they have been corrected. This essentially guarantees that no URLs with errors will be submitted. Once the processing is done you can delete any URLs that are known to be wrong or export them for the dev team to investigate.
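The exclusion step might look something like the sketch below. It assumes one set of alternates given as a hypothetical `{hreflang: url}` mapping and an `errors` mapping of URL to error strings produced by an earlier audit. The function name `build_sitemap` is illustrative only; the XML layout follows the standard sitemap format with `xhtml:link` alternates, and the tool's real output may differ.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"


def build_sitemap(alternates, errors):
    """alternates: {hreflang: url}; errors: {url: [error strings]}.

    Returns (xml_text, excluded), leaving out every URL that has errors.
    """
    ET.register_namespace("", SM)
    ET.register_namespace("xhtml", XHTML)
    # Only URLs with no recorded errors make it into the output.
    clean = {lang: u for lang, u in alternates.items() if not errors.get(u)}
    excluded = {u: errors[u] for u in alternates.values() if errors.get(u)}
    urlset = ET.Element(f"{{{SM}}}urlset")
    for url in clean.values():
        entry = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(entry, f"{{{SM}}}loc").text = url
        # Each entry lists every clean alternate, including itself.
        for lang, alt in clean.items():
            ET.SubElement(entry, f"{{{XHTML}}}link",
                          rel="alternate", hreflang=lang, href=alt)
    return ET.tostring(urlset, encoding="unicode"), excluded
```

The `excluded` mapping is what would feed an error report: the URLs held back and the reason for each, ready to hand to the dev team.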
In the master View Screen you can click on the red number that indicates the number of errors. In the example below, 587 URLs (15%) in the source file had some sort of error, which is why some of the URLs for the main global version of the site were not indexed.
You can download this report and share it with the development team so they can fix the individual pages or the way the XML was created. Most of the time, little consideration has been given to the quality of these files.