How does the crawling process work? Where do the Crawler Plugins interact?
At the beginning,
onStartCrawling(Crawler) is called for all plugins.
When a URL is accepted or declined,
onAcceptURL(String url, CrawlerJob job) or
onDeclineURL(String url) is called to inform the plugin.
boolean checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText) is called on the plugins; if at least one of the plugins returns true, the file isn't indexed.
onDeleteIndexEntry(Document doc, IndexReader index) is called just before an entry is deleted from the index.
The plugins are called before
(onBeforePrepare(RawDocument document, WriteablePreparator preparator)) and after
(onAfterPrepare(RawDocument document, WriteablePreparator preparator)) the actual preparation.
At the end,
onFinishCrawling(Crawler) is called.
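
The hooks listed above can be pictured with a small, self-contained sketch. It is not the real plugin API: the empty classes (Crawler, CrawlerJob, RawDocument, WriteablePreparator, Document, IndexReader) and the CrawlerPlugin interface are hypothetical stand-ins that only mirror the method signatures named in the text, and RecordingPlugin, isDynamicallyBlacklisted and simulate are invented for illustration. The order of the middle steps, the ".tmp" rule, and asking every plugin before deciding are assumptions of the sketch as well.

```java
import java.util.ArrayList;
import java.util.List;

public class CrawlerPluginSketch {

    // Hypothetical stand-ins for the types named in the text (assumption: the real classes differ).
    static class Crawler {}
    static class CrawlerJob {}
    static class RawDocument {}
    static class WriteablePreparator {}
    static class Document {}
    static class IndexReader {}

    // Hypothetical interface that mirrors the hook signatures listed above.
    interface CrawlerPlugin {
        void onStartCrawling(Crawler crawler);
        void onFinishCrawling(Crawler crawler);
        void onAcceptURL(String url, CrawlerJob job);
        void onDeclineURL(String url);
        boolean checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText);
        void onDeleteIndexEntry(Document doc, IndexReader index);
        void onBeforePrepare(RawDocument document, WriteablePreparator preparator);
        void onAfterPrepare(RawDocument document, WriteablePreparator preparator);
    }

    // Example plugin: records every hook call and blacklists URLs ending in ".tmp" (made-up rule).
    static class RecordingPlugin implements CrawlerPlugin {
        final List<String> calls = new ArrayList<>();

        public void onStartCrawling(Crawler crawler) { calls.add("onStartCrawling"); }
        public void onFinishCrawling(Crawler crawler) { calls.add("onFinishCrawling"); }
        public void onAcceptURL(String url, CrawlerJob job) { calls.add("onAcceptURL"); }
        public void onDeclineURL(String url) { calls.add("onDeclineURL"); }
        public boolean checkDynamicBlacklist(String url, String sourceUrl, String sourceLinkText) {
            calls.add("checkDynamicBlacklist");
            return url.endsWith(".tmp");
        }
        public void onDeleteIndexEntry(Document doc, IndexReader index) { calls.add("onDeleteIndexEntry"); }
        public void onBeforePrepare(RawDocument document, WriteablePreparator preparator) { calls.add("onBeforePrepare"); }
        public void onAfterPrepare(RawDocument document, WriteablePreparator preparator) { calls.add("onAfterPrepare"); }
    }

    // A file is skipped if at least one plugin returns true.
    // Assumption: every plugin is asked, even after one has already returned true.
    static boolean isDynamicallyBlacklisted(List<CrawlerPlugin> plugins, String url,
                                            String sourceUrl, String sourceLinkText) {
        boolean blacklisted = false;
        for (CrawlerPlugin plugin : plugins) {
            if (plugin.checkDynamicBlacklist(url, sourceUrl, sourceLinkText)) {
                blacklisted = true;
            }
        }
        return blacklisted;
    }

    // Simulates one crawl of a single accepted URL and returns the recorded hook calls.
    static List<String> simulate(RecordingPlugin plugin, String url) {
        List<CrawlerPlugin> plugins = new ArrayList<>();
        plugins.add(plugin);
        Crawler crawler = new Crawler();

        plugin.onStartCrawling(crawler);
        plugin.onAcceptURL(url, new CrawlerJob());
        if (!isDynamicallyBlacklisted(plugins, url, "http://example.com/", "link text")) {
            RawDocument document = new RawDocument();
            WriteablePreparator preparator = new WriteablePreparator();
            plugin.onBeforePrepare(document, preparator);
            // The actual preparation of the document would happen here.
            plugin.onAfterPrepare(document, preparator);
        }
        plugin.onFinishCrawling(crawler);
        return plugin.calls;
    }

    public static void main(String[] args) {
        System.out.println(simulate(new RecordingPlugin(), "http://example.com/a.html"));
        System.out.println(simulate(new RecordingPlugin(), "http://example.com/x.tmp"));
    }
}
```

In this sketch the second URL is rejected by checkDynamicBlacklist, so neither prepare hook is recorded for it, while onStartCrawling and onFinishCrawling still frame the run.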