This is an old revision of the document!


Crawler Plugins

Crawler Plugins hook into the crawling process in order to add advanced functionality.

What can crawler plugins do?

Some examples:

  • Modify the result of preparators
    • by specifying default values if the chosen preparator does not fill in a certain field (onBeforePrepare)
    • by overriding or modifying the results of whatever preparator was chosen (onAfterPrepare)
  • Modify how those results are stored in the Lucene index
  • Run an action at every start or end of the crawling process (e.g. inform the administrator via email)

How to create a crawler plugin

  1. Create a class that implements CrawlerPlugin.
  2. Package it (and all its dependencies) as a .jar file.
    • In the manifest file, the attribute Plugin-Class must be set to the fully qualified class name of the implementing class.
  3. Drop the .jar file into the plugins directory.
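
The steps above can be sketched as follows. Only the interface name CrawlerPlugin, the two callbacks documented on this page, and the Plugin-Class manifest attribute are taken from this page. The Crawler and CrawlerPlugin declarations below are hypothetical stand-ins so that the sketch compiles on its own, and NotifyingPlugin is an invented example class. A real plugin would compile against the regain classes instead, and the real interface probably declares further callbacks (such as the onBeforePrepare and onAfterPrepare hooks mentioned above).

```java
// Stand-in for regain's crawler class (assumption: the real class has many more members).
class Crawler {
}

// Stand-in for the plugin interface, reduced to the two callbacks documented on this page.
interface CrawlerPlugin {
    void onStartCrawling(Crawler crawler);
    void onFinishCrawling(Crawler crawler);
}

// Hypothetical plugin: collects a message at the start and end of every crawl.
// A real plugin might send these messages to the administrator by email instead.
class NotifyingPlugin implements CrawlerPlugin {
    private final java.util.List<String> messages = new java.util.ArrayList<>();

    @Override
    public void onStartCrawling(Crawler crawler) {
        messages.add("crawl started");
    }

    @Override
    public void onFinishCrawling(Crawler crawler) {
        messages.add("crawl finished");
    }

    // Read-only view of the messages collected so far.
    public java.util.List<String> getMessages() {
        return java.util.Collections.unmodifiableList(messages);
    }
}
```

For a class in the default package, as in this sketch, the manifest entry would read "Plugin-Class: NotifyingPlugin". If the class lives in a package, the entry needs the package prefix as well.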

Crawler Plugin API

onStartCrawling

void onStartCrawling(Crawler crawler)

Called before the crawling process starts (Crawler::run()).

This may be called multiple times during the lifetime of a plugin instance, but onFinishCrawling() is always called in between.

Parameters:

Parameter Name   Description
crawler          The crawler instance that is about to begin crawling

onFinishCrawling

void onFinishCrawling(Crawler crawler)

Called after the crawling process has finished or was aborted (because of an exception).

This may be called multiple times during the lifetime of a plugin instance.

Parameters:

Parameter Name   Description
crawler          The crawler instance that has finished (or aborted) crawling
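
The lifecycle described above (onStartCrawling and onFinishCrawling alternate, and onFinishCrawling is also called when a crawl aborts with an exception) can be illustrated with a small sketch. Everything below is hypothetical: the Crawler and CrawlerPlugin stand-ins exist only so the example compiles on its own, RunCountingPlugin is an invented plugin, and CrawlDriver merely imitates the documented calling order with try/finally. It is not regain's actual implementation.

```java
// Stand-in for regain's crawler class (assumption: the real class has many more members).
class Crawler {
}

// Stand-in for the plugin interface, reduced to the two callbacks documented on this page.
interface CrawlerPlugin {
    void onStartCrawling(Crawler crawler);
    void onFinishCrawling(Crawler crawler);
}

// Hypothetical plugin that counts crawls and tracks whether one is in progress.
class RunCountingPlugin implements CrawlerPlugin {
    private int startedRuns = 0;
    private int finishedRuns = 0;
    private boolean running = false;

    @Override
    public void onStartCrawling(Crawler crawler) {
        // One plugin instance may see several crawls, so per-run state is set here
        // rather than in the constructor.
        running = true;
        startedRuns++;
    }

    @Override
    public void onFinishCrawling(Crawler crawler) {
        // Reached after a normal finish and after an abort alike.
        running = false;
        finishedRuns++;
    }

    public int getStartedRuns() { return startedRuns; }
    public int getFinishedRuns() { return finishedRuns; }
    public boolean isRunning() { return running; }
}

// Hypothetical driver that imitates the documented calling order.
class CrawlDriver {
    static void runCrawl(Crawler crawler, CrawlerPlugin plugin, Runnable crawlWork) {
        plugin.onStartCrawling(crawler);
        try {
            crawlWork.run();
        } finally {
            // Executed even if crawlWork throws, mirroring "finished or aborted".
            plugin.onFinishCrawling(crawler);
        }
    }
}
```

Because the same instance can be reused, a plugin that keeps per-crawl state (timers, counters, open resources) would typically initialise it in onStartCrawling and release it in onFinishCrawling.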

Existing Plugins

components/crawler_plugins.1312016582.txt.gz · Last modified: 2014/10/29 10:20 (external edit)