How To Find and Fix Your 404 Errors Early

We would like to think that the URLs Google has in its index of your website are the most current ones, i.e. the same ones that are in your sitemap. However, for a variety of reasons, that is not always the case.

A link-checker crawl of your own website will only reveal whether the links on your pages lead to 404 errors. It will not tell you what Google has in its index.

What you need to do is figure out whether Google has bad links: crawl Google, if you will, and find out which URLs it is going to send users to. If you can find the bad ones, you can change the visitor's experience.

If you knew that a link in the Google index was going to generate a 404 error, you could set up a redirect in advance, and that URL would no longer give the visitor the 404 experience.

Yes, there are lazy alternatives. You could wait for the visit to show up in your website's 404 log and correct it then.

Also, you could wait for Google to find and correct its index. That'll happen, maybe.

So, if you do nothing, the problem of bad URLs in the Google index should eventually go away. Maybe. However, you can be proactive. It is easy, but it takes several steps.

You need to use the site: command.

The site: command will show you what pages of your domain Google has indexed. An incognito window may give you better results. 
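
If you would rather open the search from a script than type it in, here is a minimal Python sketch that builds the search URL for a site: query. The domain example.com and the function name are placeholders of mine, not part of the steps above.

```python
from urllib.parse import urlencode


def site_query_url(domain: str) -> str:
    """Build a Google search URL for the query 'site:<domain>'."""
    # urlencode percent-encodes the colon, so the query survives in a URL.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


# example.com is a placeholder; substitute your own domain.
print(site_query_url("example.com"))
# prints https://www.google.com/search?q=site%3Aexample.com
```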

You will need a bookmarklet.

You want to view the URL behind the link. is preferred. Bookmarklets are small JavaScript scripts stored in a bookmark as if they were URLs. When clicked, the JavaScript runs against the page currently open in the browser. To undo the effects of a bookmarklet, refresh the page or press F5. Maybe you prefer to read black-on-white text instead of the dark screen your favorite hacking site has selected. Well, a bookmarklet can change the colors instantly.

Results from doing a site: search on the domain.

The bookmarklet I use to convert links into their full URLs is "full urls as link text" by Jesse Ruderman. The effect of this bookmarklet is to replace the link text with the full URL, the "https://..." URL.

Screenshot of Google screen after bookmarklet has converted links to urls.
Results from bookmarklet. Links are now displayed as urls.

You will need to copy the screen into your clipboard buffer.

The effect of the "full urls as link text" bookmarklet will be ugly, but you are not concerned with the on-screen presentation. You want the ugly text, the URLs, to "get in your belly". To get them, copy all the text into the clipboard. Now that it is displayed on the screen, Ctrl-A will select all the text, and Ctrl-C will copy it to the clipboard buffer.

Screenshot showing using ctrl-A and ctrl-C to copy text into clipboard.
Copying the screen into the Windows clipboard.

You will need to make the https links begin each line.

Before we can use it as a URL list, we need to clean it up a bit. I paste the list of URLs into Notepad2. You can use whatever text editor you like, as long as you know how to search-and-replace for carriage returns and linefeeds. In Notepad2, I search for "https://" and replace every occurrence with a carriage-return/line-feed (CRLF) followed by "https://". This makes every URL start on its own line. Copy the modified text into your clipboard again.
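
For anyone who would rather script this step, the same search-and-replace can be sketched in a few lines of Python. This is only an illustration of the idea, not the Notepad2 procedure itself: the function name and the sample text are made up, and it uses a plain newline rather than CRLF.

```python
def urls_on_own_lines(pasted: str) -> str:
    """Put a line break before every 'https://' so each URL starts a line."""
    # Same idea as the editor replace: "https://" becomes newline + "https://".
    broken = pasted.replace("https://", "\nhttps://")
    # Trim each line and drop the empty ones the replacement leaves behind.
    lines = [line.strip() for line in broken.splitlines()]
    return "\n".join(line for line in lines if line)


# Made-up sample of text copied from a results page.
sample = "Home page https://example.com/ About us https://example.com/about.html"
print(urls_on_own_lines(sample))
```

Note that text following a URL stays on that URL's line, just as it does after the editor replace. The later steps are what reduce the list to URLs only.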

Using Notepad2 to search-and-replace the https: with CRLF https

Take a breath. Where are we? 

We have a list of lines, some of which are urls, in the Windows clipboard buffer. We are going to take it to a website and get just the urls we need.

We will dedupe the list and discard the rest.

There are several websites that will do this, as well as several stand-alone utilities. For the purpose of this post, I used This will sort the list and remove duplicates.  

The output of, or whatever site you use, is a list of URLs representing everywhere on your domain that Google is likely to send a searcher.

Copy the lines that begin with from into the clipboard again.  This is the list that you will process.
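
If you prefer a stand-alone approach, the sort-and-dedupe step is small enough to sketch in Python. The function name is hypothetical, and keeping only the first whitespace-delimited token of each line is my own assumption about how to shed any text trailing a URL.

```python
def sorted_unique_urls(text: str) -> list[str]:
    """Return the distinct URLs found at the start of lines, sorted."""
    urls = set()
    for line in text.splitlines():
        line = line.strip()
        # Discard every line that does not begin with a URL.
        if line.startswith("https://"):
            # Assumption: anything after the first space is not part of the URL.
            urls.add(line.split()[0])
    return sorted(urls)


# Made-up sample: one non-URL line, one duplicate, one line with trailing text.
sample = """Home page
https://example.com/b.html Cached
https://example.com/a.html
https://example.com/b.html
"""
for url in sorted_unique_urls(sample):
    print(url)
```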

Sorting the list and discarding the duplicates with

Filter the list to just the URLs for the domain we want to see.

Filter the lines to just your domain.

There are lots of ways to do this. I usually just use Notepad2, but here's a website called that has a filter-lines online utility.
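
A scripted version of the filter might look like the sketch below. It compares host names rather than searching for the domain as plain text, so a URL that merely mentions your domain in its path is not kept. The function name and sample URLs are placeholders, and treating subdomains as part of the domain is an assumption you may want to change.

```python
from urllib.parse import urlparse


def only_domain(urls, domain):
    """Keep the URLs whose host is the domain or one of its subdomains."""
    domain = domain.lower()
    kept = []
    for url in urls:
        # hostname is None for text that is not a URL; treat that as no host.
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            kept.append(url)
    return kept


# Made-up sample: two URLs on the domain, two that only look similar.
sample = [
    "https://example.com/a.html",
    "https://www.example.com/b.html",
    "https://support.google.com/?q=example.com",
    "https://notexample.com/",
]
print(only_domain(sample, "example.com"))
```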

Use a link-checking program that will crawl a list of urls.

Now that we have a list of URLs representing everywhere on our domain that Google can send a searcher, we need to send that list through a link-checking program. Xenu's Link Sleuth is a program for Windows that I highly recommend. However, there are many online alternatives.

For the purpose of this demonstration, I used I pasted the URL list into the website, and in a few seconds I had discovered that four URLs in the Google index will give the visitor a 404 error rather than content.
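
If you want to see what a link checker does with such a list, here is a rough Python sketch using only the standard library. It is not how Xenu or any particular website works; the function names are mine, and it assumes your server answers HEAD requests (some servers do not, in which case a GET would be needed).

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so a working redirect reports its target's status.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers are raised as exceptions; the code is what we want.
        return err.code
    # Network failures (urllib.error.URLError) are left to propagate to the caller.


def find_404s(urls, get_status=fetch_status):
    """Return the URLs that answer with 404, in their original order."""
    return [url for url in urls if get_status(url) == 404]
```

Calling find_404s with your filtered list returns the URLs that need a redirect. The get_status parameter exists only so the function can be tried out with a stand-in instead of real network requests.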

(A custom 404 error-handler could be used to take the user to a "page not found" box and then to the search page so that the visitor might find the desired page. )

A web-based link-checker website.

So, Google has the wrong link, how do we fix it?

Eventually, Google may crawl the sitemap.xml for your site and correct the link, but can you wait? This post is about being proactive and solving the problem before the Google user sees it. You can't fix Google fast enough, so what can you do?

Set up a permanent redirect.

You can inform anyone, even Google Search, that the url has changed by setting up a permanent redirect to the right page. If you don't know what the correct page is, then maybe you redirect them to a custom 404 page, or to your search page. It's up to you.  
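
One way to organize this is to write down, for each dead URL, where it should go, with a fallback for the ones whose new home you do not know. The Python sketch below only builds that from/to list; it does not configure any server. The paths and the "/search" fallback are hypothetical.

```python
SEARCH_PAGE = "/search"  # hypothetical fallback; use your own search or custom 404 page


def plan_redirects(dead_paths, known_moves, fallback=SEARCH_PAGE):
    """Map each dead path to the target of its permanent (301) redirect."""
    # Use the known new location when there is one, otherwise the fallback.
    return {path: known_moves.get(path, fallback) for path in dead_paths}


# Made-up paths: one page that moved, one whose replacement is unknown.
dead = ["/2019/01/old-post.html", "/2018/07/lost-post.html"]
moves = {"/2019/01/old-post.html": "/2020/05/new-post.html"}
for old, new in plan_redirects(dead, moves).items():
    print(old, "->", new)
```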

How to set up a redirect in Google Blogger.

My solution is to redirect the incoming traffic to the right URL by using Google Blogger's redirection tool. In Google Blogger, go to Settings, and under "Errors and Redirects" you will find "Custom Redirects".

After putting in the new equivalents for the old urls, I can re-run the report.