Managing a website or a blog is not as easy as it may seem from the surface. There is a lot of background work that every webmaster has to worry about in order to provide a good user experience and make sure the search bots can “find” and digest the content of the site.
One of the most widely discussed problems is how to properly deal with non-existent (404) pages on your site. Webmasters and SEOs around the world disagree on the best practices, the do’s and the don’ts.
Some common doubts and questions which may arise are as follows:
- Should you 301 redirect 404 pages to the home page (the index page) of your site?
- Should you leave the 404 pages as they are?
- If a website has thousands of 404 pages, will it affect the rankings of other pages?
- Is it considered good practice to remove 404 pages by using the URL removal tool available in Google Webmaster Tools?
- Is it a good practice to redirect the 404 pages to archive pages of your site and maintain a moderate user experience?
- I am using Google Webmaster Tools to remove 404 pages, but the crawl errors come back every few weeks. What is the best way to ensure that the crawl errors are gone and my site remains healthy in Google’s eyes?
The short answer to all of the above questions is that it depends on the nature of the problem and the goals of the site in question. Redirecting a bunch of 404 pages to the homepage may make sense for an ecommerce site, but the same might not hold true for a website that serves content, such as a blog.
There are two main disadvantages of 404 pages which you should worry about. First, 404 pages result in a very bad user experience. If a large number of users are landing on a 404 page, many of them will go to another source and this will hurt your site’s reputation. Second, if there are a good number of external links pointing to these 404 pages, your site is losing a good amount of Google juice.
Stephanie Chang from SEOmoz gives some good advice on how you should deal with expired content and ensure that both users and search engines are not annoyed when they try to reach content or services that no longer exist on your website. She writes:
Create a custom 404 page: If your website has a large number of 404 pages, it makes sense to create a custom 404 page that includes keyword-rich links to other useful pages on the site. That way, visitors who land on your 404 page might convert better than they would on the usual 404 page, which offers nothing. (Tip: If you are using WordPress as the content management system, I highly recommend using the Bing 404 WordPress plugin. This plugin will intercept your standard 404 page and return a list of URLs that may help your users find the content they are looking for.)
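To make the idea of a custom 404 page concrete, here is a minimal sketch written as a plain Python WSGI application. It is only an illustration under stated assumptions: the `SUGGESTED` list and the page wording are made up for the example, and a real site would normally configure this through its web server or CMS. The one detail that matters is that the page still answers with a 404 status while offering useful links.

```python
from html import escape

# Hypothetical pages to suggest to a lost visitor (an assumption for this sketch).
SUGGESTED = [("/", "Home"), ("/archive/", "Archive"), ("/contact/", "Contact")]


def not_found_app(environ, start_response):
    """WSGI app that answers every request with a custom 404 page."""
    # Escape the requested path before echoing it back into the HTML.
    path = escape(environ.get("PATH_INFO", ""))
    links = "".join(
        f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in SUGGESTED
    )
    body = (
        "<h1>Page not found</h1>"
        f"<p>Nothing lives at {path} any more. You may want to try:</p>"
        f"<ul>{links}</ul>"
    ).encode("utf-8")
    # Keep the 404 status so search bots still learn that the URL is gone.
    start_response(
        "404 Not Found",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the app can be passed to `wsgiref.simple_server.make_server("", 8000, not_found_app)` from the standard library.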
How much traffic did the page receive?: Check your visitor analytics program and see whether the page in question received significant traffic in the past. Are there a couple of external links pointing to this page? What was the Google PageRank of this page before you deleted it? (Tip: bulk check the Google PageRank of all the pages of your site.)
If the page did have meaningful traffic or external links, it is recommended to 301 redirect the broken 404 page to a new page on the same site which contains similar content. If no similar content is available, create a new page, add the old content and 301 redirect the old link to this new one. That way, your external links will automatically point to the new page and your website won’t lose the Google juice that was flowing through those external links.
Now where should you redirect a given 404 page, if no similar content is available? Stephanie writes:
Consider what would result in the best user experience. You want to redirect these pages to the most relevant page. A suggestion is to take a look at the breadcrumbs and redirect the page based on the internal navigation of the site. For instance, the product page can be redirected to the most relevant sub-category page. You want to be careful that you’re redirecting the page to another page that is likely to stay on the site in the foreseeable future, otherwise you run the risk of having to deal with this issue again.
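The redirect advice above can be sketched as a small lookup. This is a rough illustration, not a definitive implementation: the `REDIRECTS` table, the example paths and the `resolve` function are all hypothetical. It also checks that the redirect target still exists, which reflects the warning about pointing old URLs at pages that may themselves disappear.

```python
from typing import Optional, Set, Tuple

# Hypothetical map from removed URLs to the most relevant surviving page,
# for example a product page mapped to its sub-category page.
REDIRECTS = {
    "/products/blue-widget": "/products/widgets/",
    "/blog/old-post": "/blog/new-post",
}


def resolve(path: str, existing: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target or None) for a requested path."""
    if path in existing:
        return 200, None
    target = REDIRECTS.get(path)
    # Only redirect when the target is itself a live page; otherwise the
    # visitor would simply be sent on to another 404.
    if target is not None and target in existing:
        return 301, target
    return 404, None
```

For example, with `existing = {"/", "/products/widgets/"}`, a request for `/products/blue-widget` is answered with a 301 to `/products/widgets/`, while `/blog/old-post` stays a 404 because its target is not live.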
Show dynamic messages, inform the user: If you have an ecommerce site and you’ve recently overhauled your online store, it is quite possible that there are more than a thousand 404 pages on the site. In this case, it is a good idea to serve dynamic messages to the visitor via cookies. That way, users will know that the specific product is no longer available on the site and, instead of wandering randomly, they can be directed to related products or more resources on your site.
Redirect only the important pages: Redirecting a whole bunch of pages through .htaccess or via a script is a strict no-no, as this may adversely affect your server performance and slow down the entire website. You should take extra care to filter important pages and redirect only those that have received external links and have good authority and traffic.
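One way to do this filtering is sketched below. Treat it as an assumption-laden example: the `DeadPage` record, its fields and the threshold values are invented for illustration, and the real numbers would come from your own analytics and backlink reports.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DeadPage:
    # Hypothetical record built from analytics and backlink exports.
    url: str
    external_links: int
    monthly_visits: int


def pages_worth_redirecting(
    pages: Iterable[DeadPage],
    min_links: int = 1,      # assumed threshold, tune for your site
    min_visits: int = 50,    # assumed threshold, tune for your site
) -> List[str]:
    """Return URLs of removed pages that have external links or real traffic."""
    return [
        page.url
        for page in pages
        if page.external_links >= min_links or page.monthly_visits >= min_visits
    ]
```

Everything that falls below both thresholds is simply left as a 404, which keeps the redirect list short.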
Leave it as it is: 404 pages are normal and there is no reason to worry about them. If there are 500 pages which you have recently deleted and want removed from Google’s index, keep them 404. Let the search bots (e.g. Googlebot) see that these URLs return 404 and they will automatically drop them from their index sooner rather than later. However, if you redirect all the pages to the homepage or any other archive page that is not related to the content of the page in question, search bots may be misinformed and there is a chance that it will negatively impact your site’s rankings.
Google engineers have already said that if you want Google to de-index the 404 pages, leave them as they are and ensure that there are no internal links pointing to those pages.
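If you do leave the pages alone, it is worth confirming that they really answer with a 404 status and not with a 200 page or a redirect. The sketch below is one possible way to check this with Python’s standard library. The function names and verdict labels are made up for the example, treating 410 the same as 404 is my own assumption, and some servers answer HEAD requests differently from GET, so read the output as a hint rather than a verdict.

```python
import http.client
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str) -> int:
    """Send a HEAD request and return the raw status code (redirects are not followed)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def audit_removed(
    urls: Iterable[str], fetch: Callable[[str], int] = fetch_status
) -> Dict[str, Tuple[int, str]]:
    """Label each deleted URL by the status code it currently returns."""
    report = {}
    for url in urls:
        status = fetch(url)
        if status in (404, 410):
            verdict = "gone"              # bots can drop it from the index
        elif status in (301, 302, 307, 308):
            verdict = "redirected"        # check the target is really related
        elif status == 200:
            verdict = "still served"      # deleted page answering 200
        else:
            verdict = "unexpected"
        report[url] = (status, verdict)
    return report
```

Calling `audit_removed(list_of_deleted_urls)` performs real network requests; passing your own `fetch` function lets you test the logic offline.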
Remove the references from your XML and HTML sitemaps: Clean up your XML and HTML sitemaps and remove all references to the 404 pages. The search bots frequently crawl these pages and there is no reason why you should let Googlebot see those links in the first place. Also clean up internal links that point to those 404 pages from other pages on your site.
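For the XML sitemap, the cleanup can be scripted. Here is a small sketch using Python’s standard `xml.etree.ElementTree` module. It assumes a plain sitemap in the standard sitemaps.org format (not a sitemap index), and the `prune_sitemap` name and the example URLs are my own.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Namespace used by standard XML sitemaps.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def prune_sitemap(xml_text: str, dead_urls: Set[str]) -> str:
    """Return the sitemap XML with every <url> entry for a dead URL removed."""
    # Keep the default namespace on output instead of an "ns0:" prefix.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    # Copy the list first, because entries are removed while looping.
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.find(f"{{{NS}}}loc")
        if loc is not None and (loc.text or "").strip() in dead_urls:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


sitemap = (
    f'<urlset xmlns="{NS}">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/deleted-page</loc></url>"
    "</urlset>"
)
cleaned = prune_sitemap(sitemap, {"https://example.com/deleted-page"})
```

After this runs, `cleaned` holds a sitemap that lists only the home page. The HTML sitemap and internal links still need to be cleaned up separately.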
Finally, here is some useful advice from Rand Fishkin and Google engineer Matt Cutts on how you should handle expired content: