iSearch PHP Site Search Engine - Excluding Pages from the Search Index

Pages can be excluded from the search index in a number of ways:

  • If a page contains a <META name="robots"> tag whose content includes "noindex", the page will not be indexed. Including "nofollow" prevents the spider engine from following any links found on that page. To prevent a cached copy of the page from being stored, include "noarchive" in the robots META tag.
  • Use a robots.txt file to exclude iSearch from certain parts of your site. See http://www.robotstxt.org/wc/norobots.html for the specification of this file. iSearch follows the rules for the "iSearch" user-agent. The Google extensions "*" (wildcard match) and "$" (match end of URL) are supported by iSearch.
  • If the URL of a page begins with one of the strings in the "Exclude URL(s) Beginning" configuration setting, it will not be indexed.
  • If the URL of a page matches one of the regular expressions in the "Exclude URL(s) Regexp" configuration setting, it will not be indexed.
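
The URL-based rules above can be illustrated with a short Python sketch. This is not iSearch's own code (iSearch is written in PHP), and every function name here is hypothetical. It assumes that robots META directives are comma-separated and case-insensitive, that a robots.txt pattern is matched from the start of the URL path, and that an "Exclude URL(s) Regexp" entry matches if it is found anywhere in the URL.

```python
import re

def parse_robots_meta(content):
    """Split a robots META content string into a set of lowercase directives."""
    return {d.strip().lower() for d in content.split(",") if d.strip()}

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using '*' and '$' into a regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL. Without '$' the pattern is a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_excluded(url, prefixes=(), regexps=()):
    """Apply the 'Beginning' and 'Regexp' exclusion settings to one URL."""
    if any(url.startswith(p) for p in prefixes):
        return True
    # Assumption: a regexp excludes the URL if it matches anywhere in it.
    return any(re.search(r, url) for r in regexps)

directives = parse_robots_meta("NOINDEX, nofollow")
rule = robots_pattern_to_regex("/private/*.php$")
blocked = bool(rule.match("/private/a/b.php"))
```

Under these assumptions, the rule "/private/*.php$" blocks "/private/a/b.php" but not "/private/a/b.php?x=1", because "$" requires the URL to end immediately after ".php".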

Part of a page can be excluded from the search index by using the following HTML comments to mark where indexing begins and ends:

    <!-- ISEARCH_BEGIN_INDEX -->
    <!-- ISEARCH_END_INDEX -->

These are used as follows:

  • The ISEARCH_BEGIN_INDEX and ISEARCH_END_INDEX comments must appear in pairs.
  • If the first ISEARCH_END_INDEX comment appears before the first ISEARCH_BEGIN_INDEX comment, the following body text will be excluded from the search index:
    • any part of the body of the page between the ISEARCH_END_INDEX and ISEARCH_BEGIN_INDEX pairs
    For example:
        <HTML>
        <HEAD>
        <TITLE>Title</TITLE>
        </HEAD>
        <BODY>
        <P>This paragraph will be indexed.
        <!-- ISEARCH_END_INDEX -->
        <P>This paragraph will NOT be indexed.
        <!-- ISEARCH_BEGIN_INDEX -->
        <P>This paragraph will be indexed.
        </BODY>
        </HTML>
    
  • If the first ISEARCH_BEGIN_INDEX comment appears before the first ISEARCH_END_INDEX comment, the following body text will be excluded from the search index:
    • body text before the first ISEARCH_BEGIN_INDEX
    • body text between subsequent ISEARCH_END_INDEX and ISEARCH_BEGIN_INDEX pairs
    • body text after the last ISEARCH_END_INDEX
    For example:
        <HTML>
        <HEAD>
        <TITLE>Title</TITLE>
        </HEAD>
        <BODY>
        <P>This paragraph will NOT be indexed.
        <!-- ISEARCH_BEGIN_INDEX -->
        <P>This paragraph will be indexed.
        <!-- ISEARCH_END_INDEX -->
        <P>This paragraph will NOT be indexed.
        </BODY>
        </HTML>
    
  • If no ISEARCH_BEGIN_INDEX comment is present, the whole body of the page will be indexed.

These exclusion comments only apply to the HTML document <BODY>. They are ignored in the document <HEAD>.

You can prevent the spider from following some links in your site using the following comments:

    <!-- ISEARCH_BEGIN_FOLLOW -->
    <!-- ISEARCH_END_FOLLOW -->

These work in the same way as the index comments, but only affect whether links are followed by the spider.
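
The begin/end rules for both kinds of comment can be sketched in Python as follows. This is an illustration only, not iSearch's PHP implementation: the function names are hypothetical, the tolerance for extra whitespace inside the comments is an assumption, and so is the use of a simple href pattern to find links. It also assumes that the "no BEGIN comment means everything is active" rule carries over to the FOLLOW comments.

```python
import re

def marked_regions(body, kind):
    """Return the parts of `body` left active by the ISEARCH_BEGIN_<kind>
    and ISEARCH_END_<kind> comments, where kind is "INDEX" or "FOLLOW"."""
    marker = re.compile(r"<!--\s*ISEARCH_(BEGIN|END)_" + kind + r"\s*-->")
    markers = list(marker.finditer(body))
    # With no BEGIN comment, the whole body stays active.
    if not any(m.group(1) == "BEGIN" for m in markers):
        return body
    # If END comes first, the body starts active; if BEGIN comes first,
    # everything before it is excluded.
    active = markers[0].group(1) == "END"
    kept, pos = [], 0
    for m in markers:
        if active:
            kept.append(body[pos:m.start()])
        active = m.group(1) == "BEGIN"
        pos = m.end()
    if active:
        kept.append(body[pos:])
    return "".join(kept)

def followable_links(body):
    """Collect href values found only inside the followable regions."""
    text = marked_regions(body, "FOLLOW")
    return re.findall(r'href="([^"]*)"', text, flags=re.IGNORECASE)

page = ('<A HREF="a.html">a</A><!-- ISEARCH_END_FOLLOW -->'
        '<A HREF="b.html">b</A><!-- ISEARCH_BEGIN_FOLLOW -->'
        '<A HREF="c.html">c</A>')
links = followable_links(page)
```

Here the link to b.html sits between an END and a BEGIN comment, so under the assumptions above only a.html and c.html would be followed.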

