regain manual

====== Crawler ======

The **crawler** is the program that creates the [[search index]]. The [[search mask]] needs this index in order to perform searches.

The crawler crawls the directories and web sites specified in the configuration file [[:config:CrawlerConfiguration.xml]] for documents. For each document that is not yet in the search index, the text is extracted using the suitable [[preparator]] and added to the index.

Technically speaking, the crawler is a stand-alone Java application (''regain-crawler.jar'') that runs on the console, that is, without a graphical user interface. It can therefore be started automatically, e.g. by a cron job.

The [[:project_info:variant_comparison|desktop search]] starts the crawler periodically, so there is no need to execute it by hand. The interval can be set on the configuration page as well as in [[:config:DesktopConfiguration.xml]].

The crawler should be called from its installation directory; otherwise it may fail to access its resources (e.g. log file, [[Preparator]]s, etc.). A typical call on Windows systems is:

  c:
  cd C:\Program files\regain\crawler
  java -jar regain-crawler.jar

Hints:

  * The crawler does not have to run on the same machine as the servlet engine that hosts the [[search mask]].
  * So you may build (parts of) the [[search index]] using the [[Preparator]]s that are available for Windows only.

[[project_info:crawling_process|Developer's Documentation of the Crawling Process]]
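
For the automated start by a cron job mentioned above, a crontab entry on a Unix-like system might look like the following sketch. The installation path ''/opt/regain/crawler'' and the nightly schedule are assumptions chosen for illustration, not values taken from the regain distribution. The entry changes into the installation directory first, so the crawler can find its resources.

  # Hypothetical crontab entry: run the crawler every night at 02:00.
  # /opt/regain/crawler is an assumed installation path; adjust it to your setup.
  0 2 * * * cd /opt/regain/crawler && java -jar regain-crawler.jar

Using ''&&'' ensures that the crawler is only started if changing into the directory succeeded.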
