project_info:crawling_process

====== Differences ======

This shows you the differences between two versions of the page.

project_info:crawling_process [2012/01/30 09:43]
benjamin add checkDynamicBlacklist
project_info:crawling_process [2024/09/18 08:31] (current)
Line 23: Line 23:
    * Create a new Index entry (IndexWriterManager::createNewIndexEntry)
    * First the document is prepared for indexation (DocumentFactory::createDocument)
+      * [[features:auxiliary_fields|Auxiliary Fields]] are calculated.
+      * The [[features:access_rights_management|Crawler Access Controller]] (if available) is asked to retrieve the allowed groups.
       * The MIME-Type is identified.(org.semanticdesktop.aperture.mime.identifier.magic.MagicMimeTypeIdentifier::identify())
       * All preparators are collected which accept this MIME-Type.
Line 30: Line 32:
    * Then it is added to the [[http://lucene.apache.org|Lucene]] [[components:search_index|index]], after notification of the plugins (''__onCreateIndexEntry(Document doc, IndexWriter index)__'').
  
-At the en, ''__onFinishCrawling(Crawler)__'' is called.
+At the end, ''__onFinishCrawling(Crawler)__'' is called.
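
The steps listed in this revision (collecting the preparators that accept a MIME-Type, notifying the plugins before an entry is added to the index, and calling the finish hook once at the end) can be sketched in Java roughly as follows. This is a minimal illustration and not the project's actual code: the class ''CrawlSketch'', the nested ''Preparator'' and ''Plugin'' types, and the simplified hook signatures (a plain ''String'' instead of ''Document'', ''IndexWriter'' or ''Crawler'') are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the crawl flow described above; none of these
// types are taken from the real code base.
final class CrawlSketch {

    /** Hypothetical preparator that declares which MIME types it accepts. */
    static final class Preparator {
        final String name;
        final List<String> acceptedMimeTypes;

        Preparator(String name, List<String> acceptedMimeTypes) {
            this.name = name;
            this.acceptedMimeTypes = acceptedMimeTypes;
        }

        boolean accepts(String mimeType) {
            return acceptedMimeTypes.contains(mimeType);
        }
    }

    /** Hypothetical plugin with simplified versions of the two hooks. */
    interface Plugin {
        void onCreateIndexEntry(String doc);
        void onFinishCrawling();
    }

    /** Returns the names of all preparators that accept the given MIME type. */
    static List<String> collectPreparators(List<Preparator> all, String mimeType) {
        List<String> names = new ArrayList<>();
        for (Preparator p : all) {
            if (p.accepts(mimeType)) {
                names.add(p.name);
            }
        }
        return names;
    }

    /** Runs a toy crawl over the documents and returns the ordered event log. */
    static List<String> crawl(List<String> docs) {
        final List<String> log = new ArrayList<>();
        Plugin plugin = new Plugin() {
            @Override
            public void onCreateIndexEntry(String doc) {
                log.add("onCreateIndexEntry:" + doc);
            }

            @Override
            public void onFinishCrawling() {
                log.add("onFinishCrawling");
            }
        };
        for (String doc : docs) {
            plugin.onCreateIndexEntry(doc); // plugins are notified first
            log.add("indexed:" + doc);      // stand-in for adding to the index
        }
        plugin.onFinishCrawling();          // called once, at the end
        return log;
    }

    public static void main(String[] args) {
        List<Preparator> all = new ArrayList<>();
        all.add(new Preparator("html", java.util.Arrays.asList("text/html")));
        all.add(new Preparator("plain", java.util.Arrays.asList("text/plain")));
        System.out.println(collectPreparators(all, "text/html"));
        System.out.println(crawl(java.util.Arrays.asList("a.html", "b.txt")));
    }
}
```

The event log makes the ordering explicit: each document triggers the plugin hook before it is indexed, and the finish hook fires exactly once after the last document.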
project_info/crawling_process.1327912988.txt.gz · Last modified: 2024/09/18 08:31 (external edit)