For larger crawls it would be much faster if we calculated digests and wrote the WARC files to DOSS in parallel. A 5 TB crawl takes many hours to import single-threaded, and the hardware is capable of several times that throughput if we used something like 6 threads.
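A minimal sketch of the idea follows. It assumes the import can be split per WARC file and that a digest is a SHA-1 rendered as `sha1:<base32>`, as in WARC digest headers. The `store` callable is a hypothetical stand-in for the DOSS write, since the real DOSS API isn't described here. The helper names (`sha1_digest`, `import_one`, `import_warcs`) are also illustrative. Each worker reads its file in chunks, hashes it, and then hands it to `store`. Up to `workers` files are in flight at once.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large WARCs never sit fully in memory

# Hypothetical signature for the DOSS write: store(path, digest) -> None
StoreFn = Callable[[Path, str], None]


def sha1_digest(path: Path) -> str:
    """Return the file's SHA-1 as 'sha1:<base32>', the form used in WARC digest headers."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return "sha1:" + base64.b32encode(h.digest()).decode("ascii")


def import_one(path: Path, store: StoreFn) -> str:
    """Digest one WARC file, then pass it to the (hypothetical) DOSS store."""
    digest = sha1_digest(path)
    store(path, digest)
    return digest


def import_warcs(paths: Iterable[Path], store: StoreFn, workers: int = 6) -> Dict[Path, str]:
    """Digest and store many WARC files concurrently using a fixed-size thread pool.

    pool.map preserves input order and re-raises the first worker exception,
    so a failed file aborts the import instead of being silently skipped.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: import_one(p, store), paths)
        return dict(zip(paths, digests))
```

Threads are a reasonable fit here even in CPython. File reads release the GIL, and `hashlib` also releases it while hashing large buffers, so several digests can run concurrently. The default of 6 workers just mirrors the rough figure above. The right number would need measuring against the actual disks and DOSS.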