suspect.io.load_twix() to accept binary stream input #183
Also fixed docstring
@darrencl great to see continued improvements being applied to suspect. I was going to suggest that you might prefer to support passing a file-like object to the The other possible value of doing it that way is that you can have other types of file-like objects, for example in-memory binary arrays. I suppose there might be some alternative method of getting a file-like object representing a networked file that can be read using a different approach and more efficiently than simply using

So basically, I am fine with your proposed change, unless you see sufficient value in the "file-like object" approach to adopt that (note that we would have to continue supporting str/Path-like as well for compatibility anyway).

The one thing that I would suggest is adding some extra detail in the docstring explaining why a user might want to fiddle with the buffer size, and perhaps a link to the issue or this PR so they can see the buffer size you used and the improvement achieved.
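For illustration only, the "file-like object" approach discussed above might be sketched roughly as follows. The helper names `open_binary` and `read_magic` are hypothetical and are not part of suspect's API; the sketch only shows one way a reader could accept either a str/Path-like filename or an already-open binary stream, including an in-memory one.

```python
import contextlib
import io
import os


@contextlib.contextmanager
def open_binary(source, buffering=-1):
    """Yield a binary stream for `source` (hypothetical helper, not suspect's API).

    `source` may be a str/Path-like filename or an already-open binary stream.
    """
    if isinstance(source, (str, os.PathLike)):
        # We opened the file, so we are responsible for closing it.
        with open(source, "rb", buffering=buffering) as stream:
            yield stream
    else:
        # The caller owns the stream (and chose its buffering); leave it open.
        yield source


def read_magic(source, n=4):
    """Return the first `n` bytes of `source`, whichever form it takes."""
    with open_binary(source) as stream:
        return stream.read(n)


# An in-memory file-like object is handled the same way as a file on disk.
in_memory = io.BytesIO(b"\x00\x01\x02\x03rest of the data")
magic = read_magic(in_memory)
```

One design choice in this sketch is that a stream passed in by the caller is never closed by the helper, so the caller keeps control of its lifetime and of the buffer size it was opened with.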
Hi @bennyrowland, great to hear from you! Thanks for the suggestion! I agree that it makes more sense to support file-like objects instead. I will update this PR.
@bennyrowland Done! Both of the following calls now work:

In [4]: with open("/Users/darren/tmp/test-copy.dat", 'rb', buffering=1024*1024) as f:
   ...:     test_twix = suspect.io.load_twix(f)
   ...:

In [5]: test_twix = suspect.io.load_twix("/Users/darren/tmp/test-copy.dat")

I will update this PR's description, and it should be good to go. I think for consistency, it is best if the other readers could also support binary stream input. What do you think?
suspect.io.load_twix() to accept binary stream input
To mitigate the slow reads over network drives reported in #182, load_twix() now accepts a binary stream as well as a file path. This lets the user increase the buffer size in the open() call, for example buffering=1024*1024.

Per my testing, increasing the buffer size significantly improves the read speed. Using the same example as in the issue, the total time drops to ~64s with a 1 MiB buffer (vs ~1174s with io.DEFAULT_BUFFER_SIZE, which is 8 KiB in my environment).

Also: changed np.fromstring() to np.frombuffer().

Closes #182
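As a rough illustration of why the buffer size can matter, the toy model below counts how many low-level reads a buffered stream issues when a file is consumed in small chunks. It assumes each low-level read stands in for one round trip to the network drive. `CountingRaw` and `count_raw_reads` are hypothetical names for this sketch; it does not measure suspect or reproduce the timings above.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts how many low-level reads occur."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)
        self.reads = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Each call here stands in for one request to the underlying storage.
        self.reads += 1
        return self._inner.readinto(b)


def count_raw_reads(data, buffer_size, chunk=128):
    """Read `data` in `chunk`-byte pieces through a buffer of `buffer_size` bytes."""
    raw = CountingRaw(data)
    with io.BufferedReader(raw, buffer_size=buffer_size) as stream:
        while stream.read(chunk):
            pass
    return raw.reads


data = bytes(1024 * 1024)  # 1 MiB of zero bytes as stand-in file content
small_buffer_reads = count_raw_reads(data, 8 * 1024)      # 8 KiB buffer
large_buffer_reads = count_raw_reads(data, 1024 * 1024)   # 1 MiB buffer
print(small_buffer_reads, large_buffer_reads)
```

With the larger buffer the same data is fetched in far fewer low-level reads, which is the effect the buffering argument to open() relies on.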