partition speed improvement #16
Open
flowers9 wants to merge 1 commit into
I could not compile with these changes; the errors were below. I think this is because the flowers9:index_t branch changed index_t to idx_t, but this branch still uses index_t. Sorry, I am not that experienced with git, so I am not sure whether there is an elegant way to merge both branches. I manually edited the files in this branch, changing index_t to idx_t, and now it compiles...
Owner
Dear all, thanks for your interest in MECAT. We have updated MECAT to version 1.3 and fixed these issues by adding a new option, '-k', to specify the number of partition files. Please compile the new version and use '-k -1' to let mecat2cns write as many partition files as possible in one pass. Thanks.
When partitioning the *.can file, multiple passes are required: one per 10 output files, plus one to get the number of reads involved. Because the *.can file can be many gigabytes, this slows partitioning considerably. This fix raises the number of files that can be written per pass from a fixed 10 to something closer to the system limit, likely reducing the number of passes to one (plus the one to get the number of reads).
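To make the pass count concrete, here is a minimal sketch of the arithmetic described above. It is not the code in this pull request: the function names and the idea of reserving a few descriptors for already-open files are assumptions for illustration. The only system call used is POSIX getrlimit with RLIMIT_NOFILE.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)
#include <cassert>
#include <cstdint>

// Hypothetical helper: soft limit on open file descriptors for this process.
// Returns `fallback` if the limit cannot be read or is unlimited.
std::uint64_t query_soft_fd_limit(std::uint64_t fallback) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY) {
        return fallback;
    }
    return static_cast<std::uint64_t>(rl.rlim_cur);
}

// Hypothetical helper: how many output files can be open in one pass,
// keeping `reserved` descriptors free for stdin/stdout/stderr, the input
// file, and so on. The reserve size is an assumption, not a MECAT value.
std::uint64_t max_files_per_pass(std::uint64_t soft_limit, std::uint64_t reserved) {
    if (soft_limit <= reserved) {
        return 1;  // always allow at least one output file per pass
    }
    return soft_limit - reserved;
}

// Number of times the input is read: ceil(num_partitions / files_per_pass)
// writing passes, plus one pass to count the reads.
std::uint64_t passes_needed(std::uint64_t num_partitions, std::uint64_t files_per_pass) {
    assert(files_per_pass > 0);
    return (num_partitions + files_per_pass - 1) / files_per_pass + 1;
}
```

For example, 95 partition files at 10 files per pass give passes_needed(95, 10) == 11 reads of the input, while a per-pass budget of 1000 files gives passes_needed(95, 1000) == 2.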
My initial approach of using std::vector<PODArray > failed when other variables got overwritten. I didn't want to muck around in PODArray<> to figure out what the cause was, so I used new[]/delete[] instead.