Getting your website indexed by Google is not very difficult, but many webmasters struggle to ensure that their websites are Google friendly, indexable and crawlable by search engines. A whole host of indexing problems can crop up if you are not careful with the structure and other aspects of your website.
Below I will explain some problems novice webmasters face when it comes to indexing their website at Google.
How to know whether a particular page has been indexed by Google?
Easy. Go to Google.com, enter the URL of the page in the Google search box and hit enter. If you see the page listed on the Google search results page, the page is indexed by Google. If you do not see the page at all, chances are that it has not been indexed yet.
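For example, if the page you want to check is domain.com/about.html (a placeholder URL), you can paste the full URL into the search box, or use the site: operator to restrict results to that exact page:

```
site:domain.com/about.html
```

If Google returns no results for that query, the page is almost certainly not in the index yet.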
Sometimes it may take a while for Google to index or “find” a page. This depends on how often your website is updated, how many backlinks it has and a whole host of other signals. Hence, if you add a new page or publish a new blog post, do not expect it to be indexed the next moment. It can take a day, two, or even a week for that page to get indexed at Google. Recommended reading – adding your website to Google’s index
One more thing: you cannot control when Google chooses to index a page or section of your website. You can use the Fetch as Googlebot tool to tell Google that you have a new page up, but it is up to Google to decide when the page will be indexed and whether it deserves to be indexed at all.
How to know when a page was last indexed by Google?
This one is also easy. Go to Google.com and enter the following command in the Google search box:
cache: URL of the page you want to test.
So if your website is domain.com and you want to know when Google last indexed or crawled a specific page, e.g. domain.com/about.html, you have to enter the following command:
cache:domain.com/about.html
Hit enter and you should see the latest version of the page that Google has saved in its cache. Here you can see the date and time when this particular page was last crawled and indexed by Google.
What is the best way to ensure all my website’s pages are indexed at Google?
There is no single foolproof way to ensure smooth indexing or crawling of a website or blog. It largely depends on how your website is built, which content management system you are using and how the site is structured. However, here is a checklist you can use to ensure that Google and other search engines will have fewer difficulties indexing your content and pages:
- Triple-check your robots.txt file to see whether you have disallowed Google from crawling a portion of your website, or the whole website.
- Check the HTML source of your website’s pages. Do you see a meta noindex tag? If yes, remove it.
- Move pages closer to the root directory of your website. The fewer directories or sub-directories you use, the better.
- Make sure every page can be reached through a text link from a page that is already on the website. Remember, Google crawls, finds and indexes new content through links, so when you upload a new page or publish a new blog post, link to it from an older page on your website.
- Use an HTML sitemap to highlight key pages and important sections of your website. An HTML sitemap does not directly influence crawling or indexing, but it may help search bots find pages that would otherwise go undiscovered.
- Most importantly, create a Google Webmaster Tools account and submit an XML sitemap of your website there. Google Webmaster Tools is by far the best way to analyze how Google is crawling your website, whether you have duplicate title and description tags, and whether Googlebot is having issues crawling your website. Read our earlier tips on how you can use Google Webmaster Tools to prevent duplicate content and remove 404 (not found) pages from your website.
- Learn how to properly deal with 404 pages and 301 redirects.
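To illustrate the robots.txt check above: a file like the following (the /private/ path is a hypothetical example) blocks all crawlers from one directory, while a bare `Disallow: /` would block your entire site, so make sure you have not shipped one of those by accident:

```
User-agent: *
Disallow: /private/
```

An empty `Disallow:` line, by contrast, allows everything to be crawled.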
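The XML sitemap you submit in Google Webmaster Tools is just a structured list of your URLs. A minimal one looks roughly like this (domain.com and the date are placeholders):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://domain.com/</loc>
    <lastmod>2014-01-01</lastmod>
  </url>
  <url>
    <loc>https://domain.com/about.html</loc>
  </url>
</urlset>
```

Only the loc element is required for each URL; lastmod is optional but can hint to crawlers which pages have changed.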
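As a simple illustration of the last point, on an Apache server (an assumption on my part; other servers configure this differently) a permanent 301 redirect from a removed page to its replacement can be a one-line .htaccess rule:

```
# Assumes Apache with mod_alias enabled; paths are hypothetical
Redirect 301 /old-page.html /new-page.html
```

A 301 tells Google the move is permanent, so the old URL’s signals are passed to the new one instead of producing a 404.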
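For the meta noindex check, the tag to look for is a robots meta tag in the page’s head. As a rough sketch, here is a small Python script using only the standard library that scans an HTML string for a noindex directive (the sample markup below is made up for illustration):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", "").lower())

def page_blocks_indexing(html):
    """Return True if the HTML contains a meta noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in d for d in finder.directives)

# Hypothetical sample pages
blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
allowed = '<html><head><title>About</title></head></html>'
print(page_blocks_indexing(blocked))  # True
print(page_blocks_indexing(allowed))  # False
```

If this returns True for a page you want in Google, remove the tag from your template or CMS settings.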
Google may choose not to index pages on your website that do not add value
There are situations where Google may decide not to index specific pages or an entire section, even if you have followed all of the best practices. Google’s search algorithm has gotten smarter since the launch of the Panda and Penguin updates, and if some pages of your website do not add value, the algorithms may decide not to index them.
Gary Illyes, a Googler, explained to a user on the Google help forums that Google may deliberately choose not to index specific pages on a website. Specifically, Google may ignore pages that do not add value for its users and choose not to index them at all. Here is what Gary said:
As we improve our algorithms, they may decide to not reindex pages that are likely to be not useful for the users. I took a look on the pages that were once indexed but currently aren’t and it appears there are quite a few that have no real content.
Gary then cites some example pages from the user’s site, which turn out to be soft 404s and empty pages with no content at all.
So the bottom line is that if your website has content that adds no value to users, Google may decide not to index it at all. You might see a drop in the number of indexed pages in your Google Webmaster Tools account, and if the drop is a major one, you might see a huge dip in traffic as well. While Gary suggests checking for canonical issues and whether the sitemap reference of a URL matches the canonical version, my advice would be the following.
Carefully examine the pages of your site from scratch. If the information is outdated, or the method described no longer works, update or improve the content. If updating the content does not make sense, delete the page or blog post from your site and let it return a 404. The algorithms have learned the art of finding junk and obsolete pages that add no value, and eventually they are going to de-index such pages anyway. There is no sure-shot answer and you have to be careful, but if you see a sudden drop in the number of indexed pages, perform a quality check and update or improve the content of weak and “thin” pages on the site.