Crawler

The crawler is the program that creates the search index. The search mask needs this index in order to perform searches.

It crawls the directories and websites specified in the configuration file CrawlerConfiguration.xml for documents. For each document that is not yet in the search index, the text is extracted using the suitable preparator and included in the index.
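
As an illustration only, a minimal CrawlerConfiguration.xml might look roughly like the sketch below. The element names (startlist, whitelist, blacklist, prefix) and the start URLs are assumptions for this example; check them against the sample configuration shipped with your regain installation.

<!-- Sketch only: element names and URLs are assumptions for illustration -->
<configuration>
  <startlist>
    <!-- where the crawler begins: a local directory and an intranet site -->
    <start parse="true" index="true">file://c:/Documents/</start>
    <start parse="true" index="true">http://intranet.example.com/</start>
  </startlist>
  <whitelist>
    <!-- only URLs matching one of these prefixes are followed and indexed -->
    <prefix>file://c:/Documents</prefix>
    <prefix>http://intranet.example.com</prefix>
  </whitelist>
  <blacklist>
    <!-- URLs matching these prefixes are skipped -->
    <prefix>http://intranet.example.com/private</prefix>
  </blacklist>
</configuration>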

Technically speaking, the crawler is a stand-alone Java application (regain-crawler.jar) that runs on the console, i.e. without a graphical user interface. It can therefore be started automatically, e.g. by a cron job.
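
For example, a crontab entry such as the following could rebuild the index every night at 3:00. The installation path /opt/regain/crawler is only a placeholder for this sketch; adjust it to your setup.

# Hypothetical crontab entry: change to the installation directory,
# then run the crawler (path is a placeholder)
0 3 * * * cd /opt/regain/crawler && java -jar regain-crawler.jar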

The desktop search starts the crawler periodically, so there is no need to run it by hand. The interval can be set on the configuration page as well as in DesktopConfiguration.xml.

The crawler should be started from its installation directory; otherwise it may fail to access its resources (e.g. the log file, the preparators, etc.). A typical call on Windows systems is:

c:
cd C:\Program files\regain\crawler
java -jar regain-crawler.jar

Hints:

  • The crawler is not restricted to running on the same machine as the servlet engine that hosts the search mask.
  • This way you can build (parts of) the search index with preparators that are available for Windows only…

Developer's Documentation of the Crawling Process
