The search index stores data about the documents in such a way that documents containing a certain keyword can be found quickly when a search is requested. Because of the index's clever design, a search over many thousands of documents can be performed in a fraction of a second.
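The speed comes from inverting the data: instead of scanning every document for a word, the index maps each word to the documents that contain it. The Java class below is a minimal, hypothetical sketch of that idea (an in-memory inverted index). It is not regain's code; regain builds on Lucene, whose on-disk index format is far more elaborate. The class name and its naive tokenizer are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index: term -> IDs of documents containing it. */
public class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Naive tokenizer: lower-cases the text and splits it on non-word characters. */
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** One map lookup instead of a scan over all documents; empty set if no match. */
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "regain is a search engine");
        index.add(2, "The crawler builds the search index");
        index.add(3, "An unrelated document");
        System.out.println(index.search("Search")); // [1, 2]
    }
}
```

Building the index costs time once, during crawling; every later lookup only touches the documents that actually contain the term.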
A search request “regain extension:pdf”, for instance, will look for “regain” in the default fields as well as for “pdf” in the field “extension”.
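To make the difference between default fields and a named field concrete, here is a hypothetical Java sketch that splits a query such as “regain extension:pdf” into clauses. The class name, the record `Clause` and the contents of `DEFAULT_FIELDS` are assumptions; which fields count as default is not specified here, and the sketch ignores phrases, boolean operators and the rest of the real query syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turn "regain extension:pdf" into (fields, term) clauses. */
public class QuerySketch {
    /** One clause: a term and the fields it is looked up in. */
    public record Clause(List<String> fields, String term) {}

    // ASSUMPTION: illustrative choice of default fields, not regain's configuration.
    public static final List<String> DEFAULT_FIELDS = List.of("content", "title", "headlines");

    public static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon > 0 && colon < token.length() - 1) {
                // "field:term" restricts the term to that single field
                clauses.add(new Clause(List.of(token.substring(0, colon)), token.substring(colon + 1)));
            } else if (!token.isEmpty()) {
                // a bare term is searched in all default fields
                clauses.add(new Clause(DEFAULT_FIELDS, token));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (Clause c : parse("regain extension:pdf")) {
            System.out.println(c.term() + " -> " + c.fields());
        }
        // regain -> [content, title, headlines]
        // pdf -> [extension]
    }
}
```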
regain creates the following standard fields:
url - The document's URL.
content - The document's text extracted by the preparator.
title - The document's title (if it has any).
summary - The summary shown in the hit list.
headlines - The headlines (if there are any) contained in the document.
size - The document's size in bytes (can't be searched).
last-modified - The date of the last change in the format YYYY-MM-DD HH:MM (can't be searched).
path - The navigation path to the document (can't be searched).
groups - The user groups that are allowed to read the document. This field is set only when access rights management is enabled.
extension storing the document's file extension (e.g.
content field is created! Instead, the preparation-error field is added and set to
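As a rough illustration of the fields described above, the following hypothetical Java sketch builds a field map for one document. It shows the YYYY-MM-DD HH:MM date written as a `java.time` pattern and the rule that a failed preparation yields no content field. The class and method names are invented, searchability is not modelled, only some of the fields are covered, the value of preparation-error is a placeholder (the text above does not state it), and the extension is naively taken from the URL.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a document's standard fields (not regain's code). */
public class DocumentFieldsSketch {
    // "YYYY-MM-DD HH:MM" expressed as a java.time pattern
    static final DateTimeFormatter LAST_MODIFIED_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    /** extractedText == null means the preparator failed for this document. */
    public static Map<String, String> buildFields(String url, String extractedText, String title,
                                                  long sizeInBytes, LocalDateTime lastModified) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", url);
        if (extractedText != null) {
            fields.put("content", extractedText);
        } else {
            // No "content" field on failure.
            // ASSUMPTION: "true" is a placeholder; the real value is not given above.
            fields.put("preparation-error", "true");
        }
        if (title != null) {
            fields.put("title", title); // only if the document has a title
        }
        fields.put("size", Long.toString(sizeInBytes));
        fields.put("last-modified", lastModified.format(LAST_MODIFIED_FORMAT));
        // ASSUMPTION (naive): extension = text after the last '.' following the last '/'.
        // A URL without a path, e.g. "http://example.com", would wrongly yield "com".
        int dot = url.lastIndexOf('.');
        if (dot > url.lastIndexOf('/')) {
            fields.put("extension", url.substring(dot + 1).toLowerCase(Locale.ROOT));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f = buildFields("file:///docs/report.PDF", null, null,
                52_341L, LocalDateTime.of(2024, 3, 5, 14, 7));
        System.out.println(f);
        // {url=file:///docs/report.PDF, preparation-error=true, size=52341, last-modified=2024-03-05 14:07, extension=pdf}
    }
}
```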
'index directory' regain puts the search indexes. The indexes are stored in different subdirectories, depending on which phase of their life cycle they are in.
regain uses the following subdirectories:
temp - An index in this directory is currently being modified by the crawler.
breakpoint - The crawler periodically creates breakpoints. If the crawler is stopped before it has finished the new index (e.g. when the computer is shut down), it can resume from the last breakpoint the next time it is started and doesn't have to start from the beginning.
new - When the crawler has finished the index, it renames the directory to new. This directory is the interface between the crawler and the search mask. The search mask regularly checks whether there is an index with the state new in the index directory. If it finds one, it switches to that index by renaming the directory to index. This is how hot deployment is implemented.
quarantine - If the crawler finished an index but encountered many errors, the new index doesn't get the state new but the state quarantine. This way the search mask doesn't switch to the faulty index automatically. In this case you should check the log file and, if you still want to switch to that index, rename the directory to new.
index - This index is currently used by the search mask.
backup - Before the search mask switches to the new index, it renames the old index to backup. If the new index turns out to be faulty, you can quickly switch back to the previous index by renaming the directory
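The periodic check performed by the search mask can be sketched with `java.nio.file`. This is a hypothetical illustration, not regain's implementation: the method name `switchToNewIndex` is invented, an existing backup is assumed to be deleted before the old index takes its place, and the sketch ignores closing open index files and concurrent access by the crawler.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch of hot deployment via directory renames. */
public class HotDeploySketch {

    /** Returns true if a finished index in "new" was switched in. */
    public static boolean switchToNewIndex(Path indexDir) throws IOException {
        Path newIndex = indexDir.resolve("new");
        Path current = indexDir.resolve("index");
        Path backup = indexDir.resolve("backup");
        if (!Files.isDirectory(newIndex)) {
            return false; // nothing waiting; temp, breakpoint and quarantine are ignored
        }
        if (Files.exists(current)) {
            deleteRecursively(backup);   // ASSUMPTION: an older backup is simply discarded
            Files.move(current, backup); // keep the old index for a manual rollback
        }
        Files.move(newIndex, current);   // from now on, searches use the new index
        return true;
    }

    /** Deletes a directory tree; does nothing if the path does not exist. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // reverse order visits children before their parent directories
            paths = walk.sorted(Comparator.reverseOrder()).toList();
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("regain-index-demo");
        Files.createDirectories(dir.resolve("index"));
        Files.createDirectories(dir.resolve("new"));
        System.out.println(switchToNewIndex(dir));               // true
        System.out.println(Files.exists(dir.resolve("backup"))); // true
        System.out.println(Files.exists(dir.resolve("new")));    // false
    }
}
```

Because only a directory named new triggers the switch, an index in quarantine stays untouched by this check until it is renamed manually.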