White and black list

Using the white list and the black list, you can specify precisely which documents get into the index and which do not.

The basic rule is always: a document gets into the index if its URL matches at least one entry in the white list and no entry in the black list.
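The basic rule can be illustrated with a small sketch. This is not the crawler's own code: the function name `is_indexed` and the example.org URLs are hypothetical, and the sketch assumes that list entries are matched as URL prefixes.

```python
def is_indexed(url, whitelist, blacklist):
    """Return True if url matches at least one white list entry
    and no black list entry (entries are assumed to be URL prefixes)."""
    on_whitelist = any(url.startswith(prefix) for prefix in whitelist)
    on_blacklist = any(url.startswith(prefix) for prefix in blacklist)
    return on_whitelist and not on_blacklist


# Hypothetical example lists, for illustration only.
whitelist = ["http://www.example.org/"]
blacklist = ["http://www.example.org/private/"]
```

With these lists, a URL under `http://www.example.org/private/` is rejected even though it also matches the white list, because a single black list match is enough to exclude a document.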

How can I use this feature?

The lists are defined in CrawlerConfiguration.xml using the <whitelist> and <blacklist> tags.

For example, a configuration can specify that all URLs starting with a given prefix should be included, except for those starting with a second, more specific prefix.


Additionally, a crawler plugin can be written to blacklist files based on more complex conditions (e.g. file size).
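The kind of condition such a plugin might check can be sketched as follows. This only illustrates the decision logic and does not use the crawler's actual plugin interface: the function `should_blacklist`, its parameters, and the 10 MB limit are all assumptions made for this example.

```python
# Hypothetical size limit: 10 MB. Chosen for illustration only.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def should_blacklist(size_bytes, max_size_bytes=MAX_SIZE_BYTES):
    """Return True if a file of the given size should be kept out of the index.

    The caller (for example a plugin hook) is assumed to supply the file size.
    """
    return size_bytes > max_size_bytes
```

A real plugin would apply a check like this in addition to the static white list and black list, so that a file allowed by the lists can still be excluded.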

