====== White and black list ======

Using the **white and black list** you can specify very precisely what gets into the index and what does not. The basic rule is always: a document gets into the index if its URL matches at least one entry from the white list, but no entry from the black list.

===== How can I use this feature? =====

The lists are defined in the ''CrawlerConfiguration.xml'' by the tags [[:config:CrawlerConfiguration.xml#<whitelist> tag]] and [[:config:CrawlerConfiguration.xml#<blacklist> tag]]. For example, the following configuration specifies that all URLs starting with ''<nowiki>http://www.mydomain.de</nowiki>'' should be indexed, except for those starting with ''<nowiki>http://www.mydomain.de/some/dynamic/content/</nowiki>'':

<code xml>
<whitelist>
  <prefix>http://www.mydomain.de</prefix>
</whitelist>
<blacklist>
  <prefix>http://www.mydomain.de/some/dynamic/content/</prefix>
</blacklist>
</code>

Additionally, a [[components:crawler_plugins#checkdynamicblacklist|crawler plugin]] may be written in order to blacklist files according to more complex conditions (e.g. file size).
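The matching rule described above can be sketched in a few lines of code. The following Python snippet is only an illustration of the prefix-matching logic (it is not part of the crawler itself, and the function name ''is_indexed'' is made up for this example):

```python
def is_indexed(url, whitelist, blacklist):
    """A URL is indexed if it matches at least one whitelist
    entry but no blacklist entry (prefix matching)."""
    return (any(url.startswith(p) for p in whitelist)
            and not any(url.startswith(p) for p in blacklist))

# Entries taken from the example configuration above
whitelist = ["http://www.mydomain.de"]
blacklist = ["http://www.mydomain.de/some/dynamic/content/"]

print(is_indexed("http://www.mydomain.de/page.html",
                 whitelist, blacklist))                      # True
print(is_indexed("http://www.mydomain.de/some/dynamic/content/a.jsp",
                 whitelist, blacklist))                      # False
print(is_indexed("http://www.otherdomain.de/page.html",
                 whitelist, blacklist))                      # False
```

Note that a URL blocked by the black list stays out of the index even though it also matches the white list: the black list always wins.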

features/white_and_black_list.txt · Last modified: 2024/09/18 08:31 (external edit)