owark/archiver/pipelines/actions
Eric van der Vlist 16ef7979b0 Trying to guess content types 2012-04-28 23:12:20 +02:00
mediatypes Trying to guess content types 2012-04-28 23:12:20 +02:00
README.txt Moving action pipelines in their own directory 2012-04-13 10:53:25 +02:00
archive-resource.xpl First version that can produce a packaged archive. 2012-04-13 19:08:04 +02:00
archive-set.xpl Adding a basic skeleton to generate what should ultimately be a WARC archive 2012-04-13 18:01:53 +02:00
crawler-beans-template.cxml Adding whois records 2012-04-23 12:11:17 +02:00
cxml.xslt Modifying the way the Heritrix (spring) config file is generated since it seems to be picky on whitespaces and indentation... 2012-04-22 16:27:16 +02:00
get-heritrix-warc.xpl Download and convert the crawl log 2012-04-26 17:08:28 +02:00
heritrix-archive-set.xpl Adding a mechanism to delay actions in the queue. 2012-04-22 18:56:15 +02:00
package-archive.xpl Still work in progress, but the WARC archive now validates with warc-tools' warcvalid.py... 2012-04-15 00:12:29 +02:00
package-heritrix-warc.xpl Rewriting links in HTML and CSS resources within WARC archives 2012-04-27 18:29:15 +02:00
parse-log.xslt Download and convert the crawl log 2012-04-26 17:08:28 +02:00
resource-index.xslt Cleaning the algorithm to compute friendly local names. 2012-04-28 18:36:16 +02:00
warc-lib.xsl Still work in progress, but the WARC archive now validates with warc-tools' warcvalid.py... 2012-04-15 00:12:29 +02:00

README.txt

Pipelines in this directory are called by the scheduler.

Each pipeline is named after the action it handles.

Inputs:

    * data: the action

Outputs: None

Each pipeline is responsible for removing its action from the queue once it is done.
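
The contract above can be illustrated with a small sketch. This is not the Owark scheduler, and it is written in Python rather than as a pipeline, purely to show the calling convention: the handler is looked up by action name, receives the action as its only input, returns nothing, and dequeues the action itself. The names PIPELINES and run_next, and the "type" and "url" fields of the action, are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical in-memory queue of pending actions (assumption: the real
# queue lives elsewhere and has a different shape).
queue = deque([{"type": "archive-set", "url": "http://example.org/"}])


def archive_set(action, queue):
    """Stand-in for a pipeline: 'action' plays the role of the data input."""
    # ... the actual archiving work would happen here ...
    # The pipeline, not the scheduler, removes the action when it is done.
    queue.remove(action)
    # No return value: the pipeline has no outputs.


# Handlers are registered under the name of the action they handle.
PIPELINES = {"archive-set": archive_set}


def run_next(queue):
    """Stand-in for the scheduler: dispatch the first queued action by name."""
    action = queue[0]
    PIPELINES[action["type"]](action, queue)


run_next(queue)
```

In this sketch the scheduler only peeks at the queue. A handler that forgot to remove its action would leave it at the head of the queue, where it would be dispatched again on the next run.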