Commit Graph

80 Commits

Author SHA1 Message Date
Eric van der Vlist 5ee8aba026 Header 2020-05-07 15:20:57 +02:00
Eric van der Vlist a2f20cb296 Trying netbeans... 2020-05-07 10:21:34 +02:00
Eric van der Vlist b7c70cfd10 Reformatting 2020-05-07 10:21:10 +02:00
Eric van der Vlist 0d4af6419a cleanup 2020-05-07 09:53:35 +02:00
Eric van der Vlist ba2cc3ec40 Updating these views and adding a third one to list the archives with their status. 2020-05-05 20:38:38 +02:00
Eric van der Vlist 4ef5fe2994 Using these views 2020-05-05 15:42:46 +02:00
Eric van der Vlist 0097514442 Adding views 2020-05-05 15:22:49 +02:00
Eric van der Vlist 7d2b6e53b3 Indentation 2020-05-05 14:08:35 +02:00
Eric van der Vlist 49afb2b9ea Massive refactoring and bug fixes 2020-05-05 14:05:53 +02:00
Eric van der Vlist 460c77f116 Check that the archive directory is writable 2020-05-03 11:22:24 +02:00
Eric van der Vlist 4d62124a03 PHP7 constructor syntax 2020-05-03 10:13:11 +02:00
Eric van der Vlist 885867e065 Removing "&" in function calls by reference (support of PHP 5.6+) 2020-05-02 16:43:41 +02:00
Eric van der Vlist 21807536ca Deleting what doesn't belong to WordPress 2020-05-01 12:28:16 +02:00
Eric van der Vlist 10c0d87b93 Markdown 2020-05-01 12:15:59 +02:00
Eric van der Vlist b9c833fd17 Removing intermediary directories 2020-05-01 12:09:23 +02:00
Eric van der Vlist f907af85c7 Fixing #9 2014-01-11 22:37:00 +01:00
Eric van der Vlist 5acb10101f Rewriting resources with no archived out links 2012-05-09 19:38:21 +02:00
Eric van der Vlist 4473ad6e15 Support HTML @background 2012-05-04 19:57:24 +02:00
Eric van der Vlist 94d335170f Map application/xhtml+xml to .html 2012-05-04 19:52:56 +02:00
Eric van der Vlist 5e2b674092 Store the crawl log into the archive 2012-05-04 19:49:41 +02:00
Eric van der Vlist c25b18f9f5 Support HTML embed/@src 2012-05-04 19:43:20 +02:00
Eric van der Vlist 16ef7979b0 Trying to guess content types 2012-04-28 23:12:20 +02:00
Eric van der Vlist bc581fabf9 Adapting relative links to match the structure of the browsable archive 2012-04-28 22:29:43 +02:00
Eric van der Vlist bf2980567a Cleaning the algorithm to compute friendly local names. 2012-04-28 18:36:16 +02:00
Eric van der Vlist cfaf8ae9c2 Adding XSLTUnit tests for the local-name function. 2012-04-28 17:29:52 +02:00
Eric van der Vlist a7c3525ef6 Hmmm... HTML should be serialized as HTML, of course! 2012-04-28 16:52:28 +02:00
Eric van der Vlist c79bd8e49c Forcing HTML content type for XHTML documents 2012-04-28 09:42:21 +02:00
Eric van der Vlist 9bce34f7c6 Rewriting links in HTML and CSS resources within WARC archives 2012-04-27 18:29:15 +02:00
Eric van der Vlist 5b162a64df WARC mail extract loop 2012-04-27 17:34:18 +02:00
Eric van der Vlist 466d4473ce Generating a resource index to facilitate further processing. 2012-04-27 17:04:17 +02:00
Eric van der Vlist 675ed04aba Download and convert the crawl log 2012-04-26 17:08:28 +02:00
Eric van der Vlist 6f64c7f8a9 Handling payload content types 2012-04-26 14:13:24 +02:00
Eric van der Vlist be1a361ab9 Implementing yet another WARC parser (the heritrix one didn't work well with Orbeon due to http client library conflicts). 2012-04-26 09:48:43 +02:00
Eric van der Vlist 307b6d2a72 Adding whois records 2012-04-23 12:11:17 +02:00
Eric van der Vlist 22c3028c38 First stab at WARC packaging. 2012-04-23 11:26:59 +02:00
Eric van der Vlist 51c2058aa6 Queue an action to package the Heritrix WARC. 2012-04-23 11:09:36 +02:00
Eric van der Vlist b346236789 Adding a mechanism to delay actions in the queue. 2012-04-22 18:56:15 +02:00
Eric van der Vlist 3bcb813cb7 Unpause Heritrix job. 2012-04-22 17:59:39 +02:00
Eric van der Vlist f25a9246bc Modifying the way the Heritrix (spring) config file is generated since it seems to be picky about whitespace and indentation... 2012-04-22 16:27:16 +02:00
Eric van der Vlist a3fa073667 Update to follow changes to Orbeon Forms experimental features... 2012-04-22 08:44:12 +02:00
Eric van der Vlist a1dc635607 Update to follow changes to Orbeon Forms experimental features... 2012-04-22 00:01:51 +02:00
Eric van der Vlist 57daa703da Now building and launching Heritrix jobs... 2012-04-21 23:42:16 +02:00
Eric van der Vlist be2f974a4c Update to follow changes to Orbeon Forms experimental features... 2012-04-21 22:51:58 +02:00
Eric van der Vlist c4c4108025 Starting to write pipeline actions that interact with a Heritrix server 2012-04-20 20:39:00 +02:00
Eric van der Vlist ad35672603 Still work in progress, but the WARC archive now validates with warc-tools' warcvalid.py... 2012-04-15 00:12:29 +02:00
Eric van der Vlist ba51ddfb0b Starting to support content lengths in warc archives 2012-04-14 22:32:33 +02:00
Eric van der Vlist 9d99928c60 Removing the last action from the queue 2012-04-13 19:17:20 +02:00
Eric van der Vlist 01a66903f3 First version that can produce a packaged archive. 2012-04-13 19:08:04 +02:00
Eric van der Vlist 5ac9ea90bb Packaging resources that have not been rewritten... 2012-04-13 18:42:32 +02:00
Eric van der Vlist 0e7bdd1de4 Adding a basic skeleton to generate what should ultimately be a WARC archive 2012-04-13 18:01:53 +02:00