Let's say your RSS feed pages are summaries of fuller pages on the same subjects located elsewhere on your website. In that case you might want to stop the search engines indexing the RSS feed pages, so that they don't think you are trying to "spam the index" (a view they sometimes take when you have two similar pages on the same subject).
Now, just because you can stop search engines indexing the RSS feed pages doesn't mean that it is the right thing for you to do. In fact, most sites won't be looking to restrict them at all, but if you do have a case where you don't want the pages indexed, read on.
The way to do this is to use a robots.txt file to tell the search engines' spiders that you are happy for them to look around your site but you really want them to keep out of certain folders. The first example below is for a site that is using an unregistered version of the rssFeedFolder program.
User-agent: *
Disallow: /rssfeedfolder/
This next example is for a site that is using a registered version of the rssFeedFolder program to generate three RSS feed folders, none of which the site wants indexed by the search engines.
User-agent: *
Disallow: /news/
Disallow: /hiring/
Disallow: /investor-relations/
Once you have created the robots.txt file, FTP it up to the top-level directory of your site. For example, for yourdomain.com, copy the robots.txt file into the same directory/folder as your homepage. Then check that it is in place by opening the following URL in your browser: http://www.yourdomain.com/robots.txt.
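Viewing the file in a browser only shows that it was uploaded, not that the rules block what you intended. One way to check the rules themselves is a short Python script using the standard library's urllib.robotparser module. This is only a sketch: the helper name is_allowed and the sample page paths (such as /news/feed.html) are made up for illustration, and real search engine spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three-folder example from this article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/
Disallow: /hiring/
Disallow: /investor-relations/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    is_allowed is a hypothetical helper, not part of rssFeedFolder.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Sample paths are assumptions; substitute pages from your own site.
    for path in ("/", "/news/feed.html", "/products/index.html"):
        url = "http://www.yourdomain.com" + path
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(path, verdict)
```

Paste in your own robots.txt text and a few page addresses from your site; anything inside a disallowed folder should come back as blocked, and everything else as allowed.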