Skip to content

suspect.io.load_twix() to accept binary stream input#183

Merged
darrencl merged 2 commits into
openmrslab:masterfrom
darrencl:io/increase-buffer-twix
Jul 13, 2025
Merged

suspect.io.load_twix() to accept binary stream input#183
darrencl merged 2 commits into
openmrslab:masterfrom
darrencl:io/increase-buffer-twix

Conversation

@darrencl

@darrencl darrencl commented Jul 3, 2025

Copy link
Copy Markdown
Collaborator

To mitigate the slow reading for network drive access as reported in #182, load_twix() now supports handling binary stream. By doing this, user can increase the buffer size in the open() function. For example:

In [4]: with open("/Users/darren/tmp/test-copy.dat", 'rb', buffering=1024*1024) as f:
   ...:     test_twix = suspect.io.load_twix(f)
   ...:

Per my testing, increasing the buffer size significantly improves the read speed. Using the same example in the issue, the total time is down to ~64s when increasing the buffer to 1MiB(vs ~1174s when using io.DEFAULT_BUFFER_SIZE , which is 8KiB in my environment).

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    14985   64.459    0.004   64.459    0.004 {method 'read' of '_io.BufferedReader' objects} # Here
  8261568    2.266    0.000    2.266    0.000 <string>:61(<genexpr>)
        1    1.427    1.427    1.427    1.427 {built-in method _io.open}
     4032    1.125    0.000    3.389    0.001 {built-in method numpy.fromiter}
    14984    0.307    0.000    0.307    0.000 {built-in method _struct.unpack}

Also:

  • np.fromstring() to np.frombuffer().
  • Fixed docstring format.

Closes #182

@darrencl darrencl self-assigned this Jul 3, 2025
@coveralls

coveralls commented Jul 3, 2025

Copy link
Copy Markdown
Collaborator

Coverage Status

coverage: 70.803% (+0.2%) from 70.632%
when pulling 5b5ed29 on darrencl:io/increase-buffer-twix
into 15ef7e4 on openmrslab:master.

@bennyrowland

Copy link
Copy Markdown
Member

@darrencl great to see continued improvements being applied to suspect. I was going to suggest that you might prefer to support passing a file-like object to the load_twix() function, as is done for the json.load() function, for example. The advantage of this is that you then don't have to add any extra parameters to the suspect functions to simply pass through to open(), and you don't run the risk of having to regularly add to the list as a need for another internal parameter to be tweaked comes up. However, looking more carefully at open(), it seems like the other options pretty much only apply to text files, so it is unlikely that any of them may need to be added later.

The other possible value of doing it that way is that you can have other types of file-like objects, for example in-memory binary arrays. I suppose there might be some alternative method of getting a file-like object representing a networked file that can be read using a different approach and more efficiently than simply using open() with a bigger buffer-size, but then again there may not, so this is not really a compelling reason to make a more complex change.

So basically, I am fine with your proposed change, unless you see sufficient value in the "file-like object" approach to adopt that (note that we would have to continue supporting str/Path-like as well for compatibility anyway). The one thing that I would suggest is adding some extra detail in the docstring explaining why a user might want to fiddle with the buffer size, and perhaps a link to the issue or this PR so they can see the buffer size you used and the improvement achieved.

@darrencl

darrencl commented Jul 4, 2025

Copy link
Copy Markdown
Collaborator Author

Hi @bennyrowland , great to hear from you!

Thanks for the suggestion! I agree that it makes more sense to support file-like object instead – I will update this PR.

@darrencl

darrencl commented Jul 4, 2025

Copy link
Copy Markdown
Collaborator Author

@bennyrowland Done!

The load_twix() can now accept both binary stream and file path, e.g.,

In [4]: with open("/Users/darren/tmp/test-copy.dat", 'rb', buffering=1024*1024) as f:
   ...:     test_twix = suspect.io.load_twix(f)
   ...:

In [5]: test_twix = suspect.io.load_twix("/Users/darren/tmp/test-copy.dat")

I will update this PR's description, and it should be good to go.

I think for consistency, it is best if the other readers could also support binary stream input. What do you think?
If you agree, I will open an issue so I don't forget, and will do it at my convenience.

@darrencl darrencl changed the title Setting IO buffer size via suspect.io.load_twix() suspect.io.load_twix() to accept binary stream input Jul 4, 2025
@darrencl darrencl merged commit a903fd7 into openmrslab:master Jul 13, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Siemens TWIX reader slow when reading through network drive

3 participants