performance issues when piping data to lz4 #129

Description

@GoogleCodeExporter
Hi,

I'm trying to compress a large chunk of data produced by another program on the 
fly with the lz4 command line tool. The program writes the data (about 8GB) to 
stdout or to a FIFO (like the ones created with mkfifo). Then, I let lz4 read 
either from stdin or from the FIFO. As you may know, the buffer size of a pipe 
is about 64KB. (This applies both to the | operator as implemented by your 
shell and to the FIFOs created by mkfifo.) I have no idea how to increase this 
buffer size.
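
On Linux (2.6.35 and later), the capacity of an individual pipe can be changed with the F_SETPIPE_SZ fcntl, up to the limit in /proc/sys/fs/pipe-max-size for unprivileged processes. The following is a minimal sketch in Python, not something lz4 does today. The helper name grow_pipe is hypothetical, and the numeric fallbacks 1031 and 1032 are the Linux values of F_SETPIPE_SZ and F_GETPIPE_SZ, used for Python versions that do not expose the named constants.

```python
import fcntl
import os

# Linux-specific fcntl commands; Python 3.10+ exposes them by name.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def grow_pipe(fd, size=1 << 20):
    """Try to raise the capacity of the pipe behind fd (hypothetical helper).

    Returns the capacity the kernel reports afterwards, or None if the
    platform does not support the query.
    """
    try:
        fcntl.fcntl(fd, F_SETPIPE_SZ, size)
    except OSError:
        pass  # not Linux, not a pipe, or size is above pipe-max-size
    try:
        return fcntl.fcntl(fd, F_GETPIPE_SZ)
    except OSError:
        return None


if __name__ == "__main__":
    r, w = os.pipe()
    print("pipe capacity:", grow_pipe(w))
    os.close(r)
    os.close(w)
```

Even at 1 MB, the pipe would still be smaller than one 4 MB block, so a larger pipe would probably reduce the stalls rather than remove them.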

Examples:
1) write8GB /dev/stdout | lz4 -1 - output.lz4
2) mkfifo myFifo; write8GB myFifo & lz4 -1 myFifo output.lz4

With the default settings (-B7, i.e. 4MB blocks), it takes 22 seconds until the 
whole process is complete. With -B4 (64KB blocks), on the other hand, it takes 
only 13 seconds. The time is almost cut in half.

The issue here is that the program producing the data has to wait when the pipe 
is full and lz4 is compressing. Also, if the pipe is suddenly empty because lz4 
reads the next block, then lz4 has to wait for the program to produce the data.

In effect, the two programs never really run in parallel, wasting precious time. 
This is typically not an issue with compressors that read data in smaller chunks.

It would be nice, IMHO, if lz4 had a buffer of two or three blocks, and if an 
internal thread of lz4 read ahead while lz4 is compressing the current block. 
A similar problem might exist while decompressing.
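
The read-ahead idea can be sketched as follows. This is only an illustration in Python of the proposed structure, not how lz4 is implemented. The function name read_ahead and its parameters block_size and depth are made up for this example.

```python
import io
import queue
import threading


def read_ahead(stream, block_size, depth=2):
    """Yield blocks from stream while a helper thread reads ahead.

    Up to `depth` finished blocks wait in the queue, so the producer on
    the other end of the pipe can keep writing while the caller is busy
    compressing the block it was handed last.
    """
    blocks = queue.Queue(maxsize=depth)
    done = object()  # sentinel that marks the end of the input

    def reader():
        try:
            while True:
                buf = bytearray()
                # A pipe may return short reads, so loop until the block is full.
                while len(buf) < block_size:
                    piece = stream.read(block_size - len(buf))
                    if not piece:
                        break
                    buf += piece
                if buf:
                    blocks.put(bytes(buf))
                if len(buf) < block_size:
                    break  # end of input reached
        finally:
            blocks.put(done)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        block = blocks.get()
        if block is done:
            return
        yield block


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 10)
    print([len(b) for b in read_ahead(src, 4)])  # prints [4, 4, 2]
```

With depth=2 and 4MB blocks this would cost roughly 8MB to 16MB of extra memory, counting the block being filled and the block being compressed. A real implementation would also need to pass read errors on to the consumer, which this sketch does not do.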

There is a program called buffer (i.e. you execute cmd1|buffer|cmd2 instead of 
cmd1|cmd2), but for some reason it wastes a LOT of CPU time (almost as much as 
lz4 itself!). Maybe that is because it copies the data around.

I realize that I could also blame the program that's producing the data (it 
does not produce the data ahead of time), but you will find that most tools 
don't (tar, for example).

Original issue reported on code.google.com by sven.koe...@gmail.com on 13 May 2014 at 1:03
