Function file name sanitisation #49
Open
valbucci wants to merge 5 commits into
File name sanitization
There was a bug, described in issue #45, where data extraction and generation failed if a function's name was too long.
This fix checks whether a file name is longer than 100 characters and, if so, truncates it by keeping the leading 50 and trailing 50 characters and removing the ones in between. Keeping both ends reduces the risk of duplicate file names. For example, a function named:

`sym.slices.siftDownCmpFunc_go.shape.interface__Info____io_fs.FileInfo__error___IsDir___bool__Name___string__Type___io_fs.FileMode` (129 characters)

is truncated to:

`sym.slices.siftDownCmpFunc_go.shape.interface__InfIsDir___bool__Name___string__Type___io_fs.FileMode`

There is a function called
`sanitize_filename` (see src/utils.rs:L95), which replaces all invalid characters with `_` and performs the bidirectional truncation described above.

In the extract mode I added an option
`--func-filename`, which accepts "symbol" (default), "address", or a custom template. This only works with the `bytes` mode, since it is the only one that extracts function-level data files.

For example, a function called
`main0` at address `0xdeadbeef` will be extracted with the following file names, depending on the specified option:

| Option | File name |
|---|---|
| `symbol` | `main0.bin` |
| `address` | `deadbeef.bin` |
| `{address}.{symbol}` | `deadbeef.main0` |
| `func-{symbol}.bin` | `func-main0.bin` |

It would be nice to have this functionality in `bin2ml generate` as well, but it would reduce performance during feature extraction, since it would be necessary to run multiple commands for each function instead of
`agCj @@f`.

Secondary changes
I also made some other secondary changes.
- `FunctionToBeProcessed` (see extract.rs:L706) to hold all the function-related logic, such as getters and the byte extraction script.
- `AFIJFunctionInfo` and `AFLJFuncDetails` (see extract.rs:L1182 and L1200)
- `extraction_job_matcher` and `get_job_type_suffix`, such that they are based on a single source of truth (see the HashMap in extract.rs:L60)
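A minimal Rust sketch of the sanitisation and bidirectional truncation described under "File name sanitization". This is not the code in src/utils.rs. The set of characters treated as valid (ASCII letters, digits, `.`, `_`, `-`) is an assumption, and `MAX_LEN` and `KEEP` are illustrative names for the 100- and 50-character limits described above.

```rust
/// Illustrative: names longer than this many characters are truncated.
const MAX_LEN: usize = 100;
/// Illustrative: number of characters kept from each end of an over-long name.
const KEEP: usize = 50;

/// Hypothetical sketch of a `sanitize_filename`-style helper.
/// Assumption: "invalid" means anything other than ASCII letters, digits, '.', '_' or '-'.
fn sanitize_filename(name: &str) -> String {
    // Replace every invalid character with '_' (one char in, one char out).
    let cleaned: Vec<char> = name
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-') {
                c
            } else {
                '_'
            }
        })
        .collect();

    // Names of at most MAX_LEN characters are returned unchanged.
    if cleaned.len() <= MAX_LEN {
        return cleaned.into_iter().collect();
    }

    // Bidirectional truncation: keep the first and last KEEP characters, drop the middle.
    let head = cleaned[..KEEP].iter();
    let tail = cleaned[cleaned.len() - KEEP..].iter();
    head.chain(tail).collect()
}

fn main() {
    let long = "sym.slices.siftDownCmpFunc_go.shape.interface__Info____io_fs.FileInfo__error___IsDir___bool__Name___string__Type___io_fs.FileMode";
    let short = sanitize_filename(long);
    println!("{} -> {} chars", long.chars().count(), short.chars().count());
    println!("{}", short);
    println!("{}", sanitize_filename("sym.foo<bar>"));
}
```

Because every character outside the assumed valid set becomes `_`, the result is pure ASCII, so the 100-character limit is also a 100-byte limit on disk.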
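The `--func-filename` behaviour illustrated by the `main0` / `0xdeadbeef` examples can be approximated as follows. This is a hypothetical sketch inferred only from those examples. It assumes that the `symbol` and `address` presets append `.bin`, that any other value is a template in which `{symbol}` and `{address}` are substituted, and that the address is written as lowercase hex without the `0x` prefix. `func_filename` is an illustrative name, not a function from this PR.

```rust
/// Hypothetical sketch of turning a `--func-filename` value into a file name.
/// Assumptions (inferred from the PR's examples, not from its code):
/// - "symbol" and "address" are presets that append ".bin";
/// - any other value is a template with `{symbol}` / `{address}` placeholders;
/// - the address is rendered as lowercase hex without a "0x" prefix.
fn func_filename(option: &str, symbol: &str, address: u64) -> String {
    let addr = format!("{:x}", address);
    match option {
        "symbol" => format!("{}.bin", symbol),
        "address" => format!("{}.bin", addr),
        // Substitute {address} first: a hex string can never contain "{symbol}",
        // so the second replacement cannot be triggered by the address value.
        template => template.replace("{address}", &addr).replace("{symbol}", symbol),
    }
}

fn main() {
    for opt in ["symbol", "address", "{address}.{symbol}", "func-{symbol}.bin"] {
        println!("{:<20} -> {}", opt, func_filename(opt, "main0", 0xdeadbeef));
    }
}
```

Note that custom templates get no implicit extension, which matches the `{address}.{symbol}` example producing `deadbeef.main0`.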