Can Search Engines or Googlebot Crawl Pages Using HTML Forms or Search Result Pages

If you have a blog or website and have always wondered whether Googlebot or other search engines can crawl deeper pages of your site through HTML forms or internal search result pages, here is a straight answer.

The answer is yes: search bots can discover a deeper page of your website through a search result page within your site, through HTML forms, or through other input elements. When few or no links point to a page on your blog and the bots have no other way to reach it, they may fall back on the HTML forms they find on your site.

Let’s take an example to understand this.

Say you have a website which contains 100 pages, but 20 of those pages have no incoming links, neither from other pages of your site nor from external sources. In such cases, you may wonder how Googlebot manages to index the content of those pages.

If you have a search box and a particular query or search result points to the page in question, Googlebot can follow the links from the search results and crawl that deeper page of your website. The bots can parse the possible input values of a dropdown menu, a search form or a set of radio buttons and use them to reach the deeper pages of your website.
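To make the idea concrete, here is a minimal sketch in Python of how a crawler could turn the fixed choices of a form into URLs worth visiting. It is not Googlebot's actual method, only an illustration: the class FormFieldParser and the function candidate_urls are hypothetical names, and the sketch assumes a single GET form per page with dropdown and radio inputs only.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class FormFieldParser(HTMLParser):
    """Collects the action URL and fixed-choice fields of a GET form.

    Hypothetical helper for illustration; assumes one form per page.
    """

    def __init__(self):
        super().__init__()
        self.action = None          # stays None if no GET form is found
        self.fields = {}            # field name -> list of candidate values
        self._current_select = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and (attrs.get("method") or "get").lower() == "get":
            self.action = attrs.get("action") or ""
        elif tag == "select":
            self._current_select = attrs.get("name")
        elif tag == "option" and self._current_select and attrs.get("value"):
            self.fields.setdefault(self._current_select, []).append(attrs["value"])
        elif tag == "input" and attrs.get("type") == "radio" and attrs.get("name"):
            self.fields.setdefault(attrs["name"], []).append(attrs.get("value") or "on")

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


def candidate_urls(page_url, html):
    """Return one URL per combination of dropdown and radio values."""
    parser = FormFieldParser()
    parser.feed(html)
    if parser.action is None or not parser.fields:
        return []
    names = sorted(parser.fields)
    base = urljoin(page_url, parser.action)
    return [
        base + "?" + urlencode(list(zip(names, combo)))
        for combo in product(*(parser.fields[name] for name in names))
    ]
```

Each URL produced this way is an ordinary GET request, so a page that no link points to becomes reachable as soon as one of these combinations returns it.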

In the following video, Google engineer Matt Cutts explains how Googlebot can use the data from search forms to find undiscovered content on your blog or website:

Matt’s answer is to the point and makes sense. However, as a webmaster you are free to block certain pages from being crawled using the robots.txt file. You can block the search result pages or any other generated HTML page using a wildcard character, e.g.:

User-agent: *
Disallow: /search/*.html
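
To see which URLs a wildcard rule such as Disallow: /search/*.html would cover, here is a minimal sketch in Python. It assumes the pattern semantics that Google documents for robots.txt, where * matches any run of characters, a trailing $ anchors the end of the URL, and rules otherwise match as prefixes. The function name rule_matches is hypothetical, and other crawlers may interpret wildcards differently.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern covers the given URL path.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' anchors the end, and the pattern otherwise matches as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the path, like a prefix rule.
    return re.match(regex, path) is not None


for path in ["/search/results.html", "/search/2010/page.html", "/about.html"]:
    status = "blocked" if rule_matches("/search/*.html", path) else "allowed"
    print(path, status)
```

Note that the pattern starts with a slash: robots.txt rules are matched against the URL path from the root of the site.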

The bottom line is that if some of the pages of your website do not have incoming links, Googlebot can use data from HTML forms to discover your content.

