`suspect.io.load_twix()` to accept binary stream input by darrencl · Pull Request #183 · openmrslab/suspect

darrencl · 2025-07-03T08:16:17Z

To mitigate the slow reading for network drive access as reported in #182, load_twix() now supports handling binary stream. By doing this, user can increase the buffer size in the open() function. For example:

In [4]: with open("/Users/darren/tmp/test-copy.dat", 'rb', buffering=1024*1024) as f:
   ...:     test_twix = suspect.io.load_twix(f)
   ...:

Per my testing, increasing the buffer size significantly improves the read speed. Using the same example in the issue, the total time is down to ~64s when increasing the buffer to 1MiB(vs ~1174s when using io.DEFAULT_BUFFER_SIZE , which is 8KiB in my environment).

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    14985   64.459    0.004   64.459    0.004 {method 'read' of '_io.BufferedReader' objects} # Here
  8261568    2.266    0.000    2.266    0.000 <string>:61(<genexpr>)
        1    1.427    1.427    1.427    1.427 {built-in method _io.open}
     4032    1.125    0.000    3.389    0.001 {built-in method numpy.fromiter}
    14984    0.307    0.000    0.307    0.000 {built-in method _struct.unpack}

Also:

np.fromstring() to np.frombuffer().
Fixed docstring format.

Closes #182

Also fixed docstring

coveralls · 2025-07-03T08:23:05Z

coverage: 70.803% (+0.2%) from 70.632%
when pulling 5b5ed29 on darrencl:io/increase-buffer-twix
into 15ef7e4 on openmrslab:master.

bennyrowland · 2025-07-03T11:14:58Z

@darrencl great to see continued improvements being applied to suspect. I was going to suggest that you might prefer to support passing a file-like object to the load_twix() function, as is done for the json.load() function, for example. The advantage of this is that you then don't have to add any extra parameters to the suspect functions to simply pass through to open(), and you don't run the risk of having to regularly add to the list as a need for another internal parameter to be tweaked comes up. However, looking more carefully at open(), it seems like the other options pretty much only apply to text files, so it is unlikely that any of them may need to be added later.

The other possible value of doing it that way is that you can have other types of file-like objects, for example in-memory binary arrays. I suppose there might be some alternative method of getting a file-like object representing a networked file that can be read using a different approach and more efficiently than simply using open() with a bigger buffer-size, but then again there may not, so this is not really a compelling reason to make a more complex change.

So basically, I am fine with your proposed change, unless you see sufficient value in the "file-like object" approach to adopt that (note that we would have to continue supporting str/Path-like as well for compatibility anyway). The one thing that I would suggest is adding some extra detail in the docstring explaining why a user might want to fiddle with the buffer size, and perhaps a link to the issue or this PR so they can see the buffer size you used and the improvement achieved.

darrencl · 2025-07-04T00:18:26Z

Hi @bennyrowland , great to hear from you!

Thanks for the suggestion! I agree that it makes more sense to support file-like object instead – I will update this PR.

darrencl · 2025-07-04T07:31:49Z

@bennyrowland Done!

The load_twix() can now accept both binary stream and file path, e.g.,

In [4]: with open("/Users/darren/tmp/test-copy.dat", 'rb', buffering=1024*1024) as f:
   ...:     test_twix = suspect.io.load_twix(f)
   ...:

In [5]: test_twix = suspect.io.load_twix("/Users/darren/tmp/test-copy.dat")

I will update this PR's description, and it should be good to go.

I think for consistency, it is best if the other readers could also support binary stream input. What do you think?
If you agree, I will open an issue so I don't forget, and will do it at my convenience.

Option to modify buffer size in open()

dfff544

Also fixed docstring

darrencl self-assigned this Jul 3, 2025

Accept binary stream in load_twix()

5b5ed29

darrencl changed the title ~~Setting IO buffer size via suspect.io.load_twix()~~ suspect.io.load_twix() to accept binary stream input Jul 4, 2025

darrencl merged commit a903fd7 into openmrslab:master Jul 13, 2025
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`suspect.io.load_twix()` to accept binary stream input#183

`suspect.io.load_twix()` to accept binary stream input#183
darrencl merged 2 commits into
openmrslab:masterfrom
darrencl:io/increase-buffer-twix

darrencl commented Jul 3, 2025 •

edited

Loading

Uh oh!

coveralls commented Jul 3, 2025 •

edited

Loading

Uh oh!

bennyrowland commented Jul 3, 2025

Uh oh!

darrencl commented Jul 4, 2025

Uh oh!

darrencl commented Jul 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

darrencl commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coveralls commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bennyrowland commented Jul 3, 2025

Uh oh!

darrencl commented Jul 4, 2025

Uh oh!

darrencl commented Jul 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

darrencl commented Jul 3, 2025 •

edited

Loading

coveralls commented Jul 3, 2025 •

edited

Loading