User Tools

Site Tools


features:white_and_black_list
Translations of this page:

White and black list

Using the white and black list you can specify very precisely, what will get in the index and what not.

The base rule is always: A document gets in the index, if its URL comes up to at least one entry from the white list, but no entry from the black list.

How can I use this feature?

The lists are defined in the CrawlerConfiguration.xml by the tags <whitelist> tag resp. <blacklist> tag.

The following configuration defines for example that all URLs should be taken in that start with http://www.mydomain.de, except for those starting with http://www.mydomain.de/some/dynamic/content/:

<whitelist>
  <prefix><nowiki>http://www.mydomain.de</nowiki></prefix>
</whitelist>
<blacklist>
  <prefix><nowiki>http://www.mydomain.de/some/dynamic/content/</nowiki></prefix>
</blacklist>

Additionally, a crawler plugin may be written in order to blacklist files according to more complex conditions (e.g. filesize).

features/white_and_black_list.txt · Last modified: 2014/10/29 10:22 (external edit)