components:crawler_plugins

Differences

This shows you the differences between two versions of the page.

components:crawler_plugins [2012/01/30 09:39]
benjamin new plugin
components:crawler_plugins [2014/10/29 10:22] (current)
Line 33: Line 33:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | crawler             | The crawler instance that is about to begin crawling     |
  
Line 45: Line 45:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | crawler             | The crawler instance that is about to begin crawling     |
  
Line 57: Line 57:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | url                 | URL of the crawling job that should normally be added.   |
 | sourceUrl           | The URL where the url above has been found (<a>-Tag, PDF or similar) |
Line 65: Line 65:
  
 ''True'': blacklist this URL. ''False'': Allow this URL.
+
+If at least one of the crawler plugins returns true, the file will be treated as blacklisted.
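
The "at least one plugin returns true" rule added in this revision amounts to a logical OR over all registered plugins. A minimal sketch of that aggregation follows, assuming a simplified stand-in interface: ''UrlCheck'', ''shouldBlacklist'' and ''isBlacklisted'' are hypothetical names chosen for illustration, not regain's actual plugin API, and whether the real crawler stops at the first ''true'' is not stated on this page.

```java
import java.util.List;

public class BlacklistSketch {

    /** Hypothetical stand-in for a crawler plugin's blacklist hook (not regain's real interface). */
    interface UrlCheck {
        boolean shouldBlacklist(String url, String sourceUrl);
    }

    /**
     * Combines the votes of all plugins: the URL counts as blacklisted
     * as soon as one plugin returns true. With no plugins it is allowed.
     */
    static boolean isBlacklisted(List<UrlCheck> plugins, String url, String sourceUrl) {
        for (UrlCheck plugin : plugins) {
            if (plugin.shouldBlacklist(url, sourceUrl)) {
                return true; // one "true" is enough
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two illustrative plugins with made-up rules.
        UrlCheck noPdf = (url, src) -> url.endsWith(".pdf");
        UrlCheck noPrivate = (url, src) -> url.contains("/private/");
        List<UrlCheck> plugins = List.of(noPdf, noPrivate);

        // Only the first plugin objects, which is already sufficient.
        System.out.println(isBlacklisted(plugins, "http://example.com/a.pdf", "http://example.com/"));
        // No plugin objects, so the URL is allowed.
        System.out.println(isBlacklisted(plugins, "http://example.com/index.html", "http://example.com/"));
    }
}
```

A consequence of this rule is that a plugin can only add restrictions: returning ''false'' never overrides another plugin's ''true''.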
  
 === onAcceptURL ===
Line 75: Line 77:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | url                 | URL that just was accepted                               |
 | job                 | CrawlerJob that was created as a consequence             |
Line 88: Line 90:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | url                 | URL that just was declined                               |
  
Line 101: Line 103:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | doc                 | Document to write                                        |
 | index               | Lucene Index Writer                                      |
Line 115: Line 117:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | doc                 | Document to read                                         |
 | index               | Lucene Index Reader                                      |
Line 127: Line 129:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | document | Regain document that will be analysed |
 | preparator | Preperator that was chosen to analyse this document |
Line 140: Line 142:
  
 **Parameters**:
-Paramter Name       ^ Description                                              ^
+Parameter Name       ^ Description                                              ^
 | document | Regain document that was analysed |
 | preparator | Preperator that has analysed this document |
components/crawler_plugins.1327912759.txt.gz · Last modified: 2014/10/29 10:21 (external edit)