Using the whitelist and the blacklist, you can specify very precisely which documents get into the index and which do not.
The basic rule is always: a document gets into the index if its URL matches at least one entry from the whitelist but no entry from the blacklist.
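To make the rule concrete, here is a minimal sketch of the prefix matching in Java. It only illustrates the rule; it is not the crawler's actual implementation, and the class and method names are invented for this example:

 import java.util.List;
 
 // Illustrative sketch of the base rule; NOT the crawler's actual code.
 public class UrlFilter {
 
     private final List<String> whitelist;
     private final List<String> blacklist;
 
     public UrlFilter(List<String> whitelist, List<String> blacklist) {
         this.whitelist = whitelist;
         this.blacklist = blacklist;
     }
 
     /** A URL is accepted if it starts with at least one whitelist
      *  prefix and with no blacklist prefix. */
     public boolean isAccepted(String url) {
         boolean onWhitelist = whitelist.stream().anyMatch(url::startsWith);
         boolean onBlacklist = blacklist.stream().anyMatch(url::startsWith);
         return onWhitelist && !onBlacklist;
     }
 
     public static void main(String[] args) {
         UrlFilter filter = new UrlFilter(
                 List.of("http://www.example.com"),
                 List.of("http://www.example.com/private/"));
         System.out.println(filter.isAccepted("http://www.example.com/docs/"));     // true
         System.out.println(filter.isAccepted("http://www.example.com/private/x")); // false
     }
 }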
The lists are defined in the CrawlerConfiguration.xml by the <whitelist> and <blacklist> tags.
For example, the following configuration specifies that all URLs starting with http://www.mydomain.de should be included, except for those starting with http://www.mydomain.de/some/dynamic/content/:
 <whitelist>
   <prefix>http://www.mydomain.de</prefix>
 </whitelist>
 <blacklist>
   <prefix>http://www.mydomain.de/some/dynamic/content/</prefix>
 </blacklist>
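With this configuration, a page such as http://www.mydomain.de/products.html would be indexed, while http://www.mydomain.de/some/dynamic/content/page.jsp would be excluded (the page names here are only examples).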
Additionally, a crawler plugin may be written in order to blacklist files according to more complex conditions (e.g. their file size).
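A minimal sketch of such a plugin is shown below. Note that the class name, method signature, and size limit are assumptions made for illustration; consult the crawler's plugin documentation for the actual plugin API:

 import java.io.File;
 
 // Illustrative sketch only: the method name below is assumed for this
 // example; the crawler's real plugin interface may differ.
 public class FileSizeBlacklistPlugin {
 
     // Example limit: blacklist everything larger than 10 MB (assumed value).
     private static final long MAX_SIZE_BYTES = 10L * 1024 * 1024;
 
     /**
      * Decides whether a document should be treated as blacklisted.
      * Here the condition is the file size, but any check (MIME type,
      * URL pattern, last-modified date, ...) could be applied instead.
      */
     public boolean isBlacklisted(File file) {
         return file.length() > MAX_SIZE_BYTES;
     }
 }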