====== Differences ======

This shows you the differences between two versions of the page.

components:crawler_plugins [2012/01/30 09:39] benjamin new plugin | components:crawler_plugins [2024/09/18 08:31] (current) | ||
---|---|---|---|
Line 33: | Line 33: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| crawler | The crawler instance that is about to begin crawling | | | crawler | The crawler instance that is about to begin crawling | | ||
Line 45: | Line 45: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| crawler | The crawler instance that is about to begin crawling | | | crawler | The crawler instance that is about to begin crawling | | ||
Line 57: | Line 57: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| url | URL of the crawling job that should normally be added. | | | url | URL of the crawling job that should normally be added. | | ||
| sourceUrl | The URL where the URL above was found (<a> tag, PDF or similar) | | | sourceUrl | The URL where the URL above was found (<a> tag, PDF or similar) | | ||
Line 65: | Line 65: | ||
''True'': Blacklist this URL. ''False'': Allow this URL. | ''True'': Blacklist this URL. ''False'': Allow this URL. | ||
+ | |||
+ | If at least one of the crawler plugins returns true, the file will be treated as blacklisted. | ||
=== onAcceptURL === | === onAcceptURL === | ||
Line 75: | Line 77: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| url | URL that was just accepted | | | url | URL that was just accepted | | ||
| job | CrawlerJob that was created as a consequence | | | job | CrawlerJob that was created as a consequence | | ||
Line 88: | Line 90: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| url | URL that was just declined | | | url | URL that was just declined | | ||
Line 101: | Line 103: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| doc | Document to write | | | doc | Document to write | | ||
| index | Lucene Index Writer | | | index | Lucene Index Writer | | ||
Line 115: | Line 117: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| doc | Document to read | | | doc | Document to read | | ||
| index | Lucene Index Reader | | | index | Lucene Index Reader | | ||
Line 127: | Line 129: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| document | Regain document that will be analysed | | | document | Regain document that will be analysed | | ||
| preparator | Preparator that was chosen to analyse this document | | | preparator | Preparator that was chosen to analyse this document | | ||
Line 140: | Line 142: | ||
**Parameters**: | **Parameters**: | ||
- | ^ Paramter Name ^ Description ^ | + | ^ Parameter Name ^ Description ^ |
| document | Regain document that was analysed | | | document | Regain document that was analysed | | ||
| preparator | Preparator that has analysed this document | | | preparator | Preparator that has analysed this document | |
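
The blacklist rule added in this revision (a URL is treated as blacklisted if at least one crawler plugin returns true) can be illustrated with a minimal sketch. The names `UrlCheckPlugin`, `checkBlacklist`, and `BlacklistSketch` are hypothetical stand-ins chosen for this example and are not taken from the plugin API; only the `url` and `sourceUrl` parameters and the true/false semantics come from the page above.

```java
import java.util.List;

// Sketch only: shows the "any plugin returns true" rule, not the real crawler code.
class BlacklistSketch {

    // A URL counts as blacklisted as soon as one plugin votes true.
    static boolean isBlacklisted(List<UrlCheckPlugin> plugins, String url, String sourceUrl) {
        for (UrlCheckPlugin plugin : plugins) {
            if (plugin.checkBlacklist(url, sourceUrl)) {
                return true; // one positive vote is enough
            }
        }
        return false; // no plugin objected, so the URL is allowed
    }

    public static void main(String[] args) {
        // Two illustrative plugins: one rejects PDF links, one allows everything.
        UrlCheckPlugin noPdf = (url, sourceUrl) -> url.endsWith(".pdf");
        UrlCheckPlugin allowAll = (url, sourceUrl) -> false;
        List<UrlCheckPlugin> plugins = List.of(allowAll, noPdf);

        System.out.println(isBlacklisted(plugins, "http://example.com/a.pdf", "http://example.com/"));
        System.out.println(isBlacklisted(plugins, "http://example.com/a.html", "http://example.com/"));
    }
}

// Hypothetical stand-in for a crawler plugin's URL check hook.
interface UrlCheckPlugin {
    // true: blacklist this URL. false: allow this URL.
    boolean checkBlacklist(String url, String sourceUrl);
}
```

Under this rule a plugin can only add restrictions: returning false never overrides another plugin's true.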