Hi,
I'm trying to compress a large chunk of data produced by another program on the
fly with the lz4 command line tool. The program writes the data (about 8GB) to
stdout or to a FIFO (like the ones created with mkfifo). Then I let lz4 read
either from stdin or from the FIFO. As you may know, the buffer size of a pipe
is about 64KB. This applies both to the | operator as implemented by your shell
and to the FIFOs created by mkfifo. I have no idea how to increase this buffer
size.
Examples:
1) write8GB /dev/stdout | lz4 -1 - output.lz4
2) mkfifo myFifo; write8GB myFifo & lz4 -1 myFifo output.lz4
Now with default settings (-B7, i.e. 4MB blocks), it takes 22 seconds until the
whole process is complete. With -B4 (64KB blocks), on the other hand, it takes
only 13 seconds. The time is almost cut in half.
The issue here is that the program producing the data has to wait when the pipe
is full and lz4 is compressing. Also, if the pipe is suddenly empty because lz4
reads the next block, then lz4 has to wait for the program to produce the data.
In effect, the two programs never really run in parallel, wasting precious time.
This is typically not an issue with compressors that read data in smaller chunks.
It would be nice, IMHO, if lz4 had a buffer of two or three blocks, and if an
internal thread of lz4 read ahead while the current block is being compressed.
A similar problem might exist while decompressing.
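To make the read-ahead idea concrete, here is a rough sketch of what is meant.
It is not lz4 code: the function name compress_with_readahead and its
parameters are made up for illustration, and zlib.compress is only a stand-in
compressor because it ships with Python. A helper thread keeps up to `depth`
blocks queued while the main thread compresses, so the producer on the other
end of the pipe can keep writing.

```python
import io
import queue
import threading
import zlib


def compress_with_readahead(src, dst, compress_block,
                            block_size=4 * 1024 * 1024, depth=3):
    """Compress src to dst block by block, reading ahead on a helper thread.

    Hypothetical helper for illustration only. `compress_block` is any
    callable mapping bytes -> bytes; `depth` is the number of blocks that
    may be buffered ahead of the compressor.
    """
    blocks = queue.Queue(maxsize=depth)  # bounded, so memory use stays limited
    errors = []                          # carries a reader failure to the caller

    def reader():
        try:
            while True:
                block = src.read(block_size)
                if not block:            # empty read means end of input
                    break
                blocks.put(block)        # blocks when `depth` blocks are queued
        except Exception as exc:
            errors.append(exc)
        finally:
            blocks.put(None)             # sentinel: no more blocks

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    while True:
        block = blocks.get()             # waits only if the reader is behind
        if block is None:
            break
        dst.write(compress_block(block))

    thread.join()
    if errors:
        raise errors[0]


if __name__ == "__main__":
    # Small in-memory demonstration; a real tool would pass sys.stdin.buffer
    # and an output file instead.
    source = io.BytesIO(b"example data " * 100000)
    sink = io.BytesIO()
    compress_with_readahead(source, sink, zlib.compress, block_size=64 * 1024)
    print(len(source.getvalue()), "->", len(sink.getvalue()), "bytes")
```

With a queue depth of two or three, the reader drains the pipe while a block is
being compressed, which is the overlap that is missing today. The same
structure, with the roles swapped, would apply to decompression.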
There is a program called buffer (i.e. you execute cmd1|buffer|cmd2 instead of
cmd1|cmd2), but for some reason it wastes a LOT of CPU time (almost as much as
lz4 itself!). Maybe because it copies the data around.
I realize that I could also blame the program that's producing the data (it
does not produce the data ahead of time), but you will find that most tools
don't either (tar, for example).
Original issue reported on code.google.com by
sven.koe...@gmail.com on 13 May 2014 at 1:03