The crawler searches the directories and web sites specified in the configuration file CrawlerConfiguration.xml for documents. For each document that is not yet in the search index, it extracts the text using the appropriate preparator and adds this text to the index.
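The indexing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not regain's actual code: the `Preparator` interface, the `crawl` method, the toy `PLAIN_TEXT` preparator and the in-memory `Map` standing in for the search index are all hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrawlSketch {

    // Hypothetical stand-in for a regain preparator (NOT the real regain API).
    interface Preparator {
        boolean accepts(String url);           // can this preparator handle the document?
        String extractText(String rawContent); // returns the plain text to be indexed
    }

    // Adds every document that is not yet in the index, using the first suitable preparator.
    static void crawl(Map<String, String> documents,
                      List<Preparator> preparators,
                      Map<String, String> index) {
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            String url = doc.getKey();
            if (index.containsKey(url)) {
                continue; // already in the index: nothing to do
            }
            for (Preparator p : preparators) {
                if (p.accepts(url)) {
                    index.put(url, p.extractText(doc.getValue()));
                    break; // stop at the first preparator that accepts the document
                }
            }
            // documents without a suitable preparator are simply skipped
        }
    }

    // Hypothetical toy preparator for ".txt" files that just trims the content.
    static final Preparator PLAIN_TEXT = new Preparator() {
        public boolean accepts(String url) { return url.endsWith(".txt"); }
        public String extractText(String raw) { return raw.trim(); }
    };

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("file:///docs/a.txt", "  hello  ");
        documents.put("file:///docs/b.pdf", "%PDF-1.4 binary data");
        documents.put("file:///docs/c.txt", "new content");

        Map<String, String> index = new LinkedHashMap<>();
        index.put("file:///docs/c.txt", "old content"); // already indexed earlier

        crawl(documents, List.of(PLAIN_TEXT), index);
        System.out.println(index);
        // prints {file:///docs/c.txt=old content, file:///docs/a.txt=hello}
    }
}
```

In this sketch the PDF is skipped because no preparator accepts it, and c.txt keeps its old text because only documents not yet in the index are processed.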
Technically speaking, the crawler is a stand-alone Java application (regain-crawler.jar) that runs on the console, i.e. without a graphical user interface. This means it can be started automatically, e.g. by a cron job.
The crawler should be called from its installation directory; otherwise it may fail to access its resources (e.g. the log file, the preparators, etc.). A typical call on Windows systems is:
c:
cd C:\Program files\regain\crawler
java -jar regain-crawler.jar
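When the crawler is started from another program rather than by hand, the same rule applies: the working directory must be the installation directory. The following is a minimal sketch using the standard `ProcessBuilder` class; the class name `CrawlerLauncher`, the method `buildCrawlerProcess` and the hard-coded installation path (taken from the Windows example) are assumptions for illustration, and it also assumes that `java` is on the PATH.

```java
import java.io.File;
import java.io.IOException;

public class CrawlerLauncher {

    // Assumed installation directory (from the Windows example); adjust to your setup.
    static final File INSTALL_DIR = new File("C:\\Program files\\regain\\crawler");

    // Equivalent of "cd <installDir>" followed by "java -jar regain-crawler.jar".
    static ProcessBuilder buildCrawlerProcess(File installDir) {
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", "regain-crawler.jar");
        pb.directory(installDir); // run in the installation directory so resources are found
        pb.inheritIO();           // show the crawler's console output in this console
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process crawler = buildCrawlerProcess(INSTALL_DIR).start();
        int exitCode = crawler.waitFor(); // block until the crawler has finished
        System.out.println("Crawler finished with exit code " + exitCode);
    }
}
```

Setting the working directory with `directory()` replaces the `cd` step, so the launcher itself can be started from anywhere.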