Commit Graph

23 Commits

Author SHA1 Message Date
Eric van der Vlist be1a361ab9 Implementing yet another WARC parser (the Heritrix one didn't work well with Orbeon due to HTTP client library conflicts). 2012-04-26 09:48:43 +02:00
Eric van der Vlist 307b6d2a72 Adding whois records 2012-04-23 12:11:17 +02:00
Eric van der Vlist 22c3028c38 First stab at WARC packaging. 2012-04-23 11:26:59 +02:00
Eric van der Vlist 51c2058aa6 Queue an action to package the Heritrix WARC. 2012-04-23 11:09:36 +02:00
Eric van der Vlist b346236789 Adding a mechanism to delay actions in the queue. 2012-04-22 18:56:15 +02:00
Eric van der Vlist 3bcb813cb7 Unpause Heritrix job. 2012-04-22 17:59:39 +02:00
Eric van der Vlist f25a9246bc Modifying the way the Heritrix (Spring) config file is generated since it seems to be picky about whitespace and indentation... 2012-04-22 16:27:16 +02:00
Eric van der Vlist a3fa073667 Update to follow changes to Orbeon Forms experimental features... 2012-04-22 08:44:12 +02:00
Eric van der Vlist a1dc635607 Update to follow changes to Orbeon Forms experimental features... 2012-04-22 00:01:51 +02:00
Eric van der Vlist 57daa703da Now building and launching Heritrix jobs... 2012-04-21 23:42:16 +02:00
Eric van der Vlist be2f974a4c Update to follow changes to Orbeon Forms experimental features... 2012-04-21 22:51:58 +02:00
Eric van der Vlist c4c4108025 Starting to write pipeline actions that interact with a Heritrix server 2012-04-20 20:39:00 +02:00
Eric van der Vlist ad35672603 Still work in progress, but the WARC archive now validates with warc-tools' warcvalid.py... 2012-04-15 00:12:29 +02:00
Eric van der Vlist ba51ddfb0b Starting to support content lengths in WARC archives 2012-04-14 22:32:33 +02:00
Eric van der Vlist 9d99928c60 Removing the last action from the queue 2012-04-13 19:17:20 +02:00
Eric van der Vlist 01a66903f3 First version that can produce a packaged archive. 2012-04-13 19:08:04 +02:00
Eric van der Vlist 5ac9ea90bb Packaging resources that have not been rewritten... 2012-04-13 18:42:32 +02:00
Eric van der Vlist 0e7bdd1de4 Adding a basic skeleton to generate what should ultimately be a WARC archive 2012-04-13 18:01:53 +02:00
Eric van der Vlist 3d18e9d8a4 Adding a mechanism to avoid archiving the same resource multiple times for a single archive set. 2012-04-13 13:05:25 +02:00
Eric van der Vlist cf97a98416 First version supporting CSS rewriting 2012-04-13 12:27:04 +02:00
Eric van der Vlist 750ccaac7c Dummy (passthrough) implementation of the CSS support... 2012-04-13 11:58:38 +02:00
Eric van der Vlist 16cc943d48 Refactoring before supporting CSS 2012-04-13 11:16:40 +02:00
Eric van der Vlist 11027c068a Moving action pipelines into their own directory 2012-04-13 10:53:25 +02:00