owark/archiver/pipelines/actions
Eric van der Vlist be1a361ab9 Implementing yet another WARC parser (the heritrix one didn't work well with Orbeon due to http client library conflicts). 2012-04-26 09:48:43 +02:00
mediatypes First version that can produce a packaged archive. 2012-04-13 19:08:04 +02:00
README.txt Moving action pipelines in their own directory 2012-04-13 10:53:25 +02:00
archive-resource.xpl First version that can produce a packaged archive. 2012-04-13 19:08:04 +02:00
archive-set.xpl Adding a basic skeleton to generate what should ultimately be a WARC archive 2012-04-13 18:01:53 +02:00
crawler-beans-template.cxml Adding whois records 2012-04-23 12:11:17 +02:00
cxml.xslt Modifying the way the Heritrix (spring) config file is generated since it seems to be picky on whitespaces and indentation... 2012-04-22 16:27:16 +02:00
get-heritrix-warc.xpl Queue an action to package the Heritrix WARC. 2012-04-23 11:09:36 +02:00
heritrix-archive-set.xpl Adding a mechanism to delay actions in the queue. 2012-04-22 18:56:15 +02:00
package-archive.xpl Still work in progress, but the WARC archive now validates with warc-tools' warcvalid.py... 2012-04-15 00:12:29 +02:00
package-heritrix-warc.xpl Implementing yet another WARC parser (the heritrix one didn't work well with Orbeon due to http client library conflicts). 2012-04-26 09:48:43 +02:00
warc-lib.xsl Still work in progress, but the WARC archive now validates with warc-tools' warcvalid.py... 2012-04-15 00:12:29 +02:00

README.txt

Pipelines in this directory are called by the scheduler.

Each pipeline is named after the action it handles.

Inputs:

    * data: the action

Outputs: None

Each pipeline is responsible for removing its action from the queue once it is done.
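
The contract described in this README can be modelled with a small sketch. This is not how the scheduler or the XPL pipelines are actually implemented; it is a toy Python model written under assumptions. The Scheduler class, the "name" and "url" fields of the action, and the in-memory list used as a queue are all hypothetical and only illustrate the three rules: a pipeline is looked up by action name, it receives the action as its "data" input, and it returns nothing but removes the action from the queue itself.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: the real action is an XML document, not a dict.
Action = Dict[str, str]
Pipeline = Callable[[Action, List[Action]], None]  # "Outputs: None"


class Scheduler:
    """Toy scheduler that dispatches the first queued action by name."""

    def __init__(self) -> None:
        self.queue: List[Action] = []
        self.pipelines: Dict[str, Pipeline] = {}

    def register(self, name: str, pipeline: Pipeline) -> None:
        # The pipeline is registered under the name of its action.
        self.pipelines[name] = pipeline

    def run_next(self) -> None:
        if not self.queue:
            return
        action = self.queue[0]
        pipeline = self.pipelines[action["name"]]
        # The action is passed as the "data" input; no output is read back.
        # The scheduler does NOT dequeue the action: that is the pipeline's job.
        pipeline(action, self.queue)


def archive_resource(data: Action, queue: List[Action]) -> None:
    """Stand-in for an action pipeline such as archive-resource.xpl."""
    # ... the actual work on data["url"] would happen here ...
    # Once done, the pipeline removes its own action from the queue.
    queue.remove(data)


scheduler = Scheduler()
scheduler.register("archive-resource", archive_resource)
scheduler.queue.append({"name": "archive-resource", "url": "http://example.org/"})
scheduler.run_next()
```

In this model, a pipeline that forgets the final removal leaves its action at the head of the queue, so the same action would be dispatched again on the next run.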