Add Shannon entropy analysis to detect obfuscated payloads #97

Open
contemas-tschmidt wants to merge 2 commits into scr34m:master from contemas-tschmidt:feature/entropy-analysis

@contemas-tschmidt

Contributor

Summary

  • Adds an --entropy flag that reports PHP files containing string literals with unusually high Shannon entropy (≥ 5.3 bits/char by default)
  • Targets quoted string literals ≥ 40 characters — the typical carrier for encrypted, XOR-obfuscated, or base64-packed payloads
  • Complements signature-based detection: catches novel obfuscation that has no known pattern/signature
  • Adds --entropy-threshold option to override the default threshold
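
To make the check described above concrete, here is a minimal sketch of the idea in Python. It is not the PR's PHP implementation: the names `shannon_entropy`, `high_entropy_literals`, and `STRING_LITERAL` are hypothetical, and the regex-based literal extraction is an assumption (the PR does not say how literals are located). Only the 5.3 bits/char threshold and the 40-character minimum come from the description.

```python
import math
import re
from collections import Counter

# Defaults taken from the PR description.
DEFAULT_THRESHOLD = 5.3   # bits per character
MIN_LITERAL_LENGTH = 40   # characters

# Hypothetical pattern for quoted literals, honouring backslash escapes.
# A real scanner would also need to skip comments and inline HTML.
STRING_LITERAL = re.compile(
    r'"((?:[^"\\]|\\.)*)"'      # double-quoted literal
    r"|'((?:[^'\\]|\\.)*)'",    # single-quoted literal
    re.DOTALL,
)


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_literals(source, threshold=DEFAULT_THRESHOLD,
                          min_length=MIN_LITERAL_LENGTH):
    """Yield (line_number, entropy) for each long literal at or above `threshold`."""
    for match in STRING_LITERAL.finditer(source):
        literal = match.group(1) if match.group(1) is not None else match.group(2)
        if len(literal) < min_length:
            continue  # short strings are skipped to avoid noise
        entropy = shannon_entropy(literal)
        if entropy >= threshold:
            line_number = source.count("\n", 0, match.start()) + 1
            yield line_number, entropy


if __name__ == "__main__":
    sample = '<?php $a = "K7gH2mPxR9sNqL4vT0cA8fZdYeWoUiJbXlj3Q6nBy5wMhEkVuCp1FtGrIDsOz"; echo $a;'
    for line, entropy in high_entropy_literals(sample):
        print(f"line {line}: entropy {entropy:.2f}")
```

Applied to the 61-character literal used in the Test section, this sketch evaluates to roughly 5.90 bits/char, in line with the value the PR reports; whether the PR measures entropy over characters or bytes is not stated, so that agreement holds only for ASCII input.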

Why entropy analysis?

Signature-based scanners only detect known malware. Attackers increasingly use:

  • XOR-encrypted payloads with rotating keys
  • Multi-layer encoding (gzip → base64 → str_rot13)
  • Custom obfuscators that produce no recognizable function names

High-entropy string literals are a strong indicator of such techniques: a string of 60+ characters with entropy above 5.3 bits/char is very likely encoded or encrypted data.
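
One way to reason about the threshold is to look at the highest entropy a literal can reach at all. Per-character Shannon entropy is bounded by log2 of the number of distinct symbols, which can exceed neither the string's length nor the size of its alphabet. The sketch below (the helper name `entropy_ceiling` is hypothetical) suggests two consequences worth checking during review: a hex-encoded payload cannot exceed 4 bits/char and so would stay under the 5.3 default, and a 40-character literal tops out at about 5.32 bits/char, so it would need almost all-distinct characters to be flagged.

```python
import math


def entropy_ceiling(length: int, alphabet_size: int) -> float:
    """Upper bound on Shannon entropy (bits/char) for a string.

    A string cannot contain more distinct symbols than its own length
    or than its alphabet offers, and entropy is maximal when every
    observed symbol is equally frequent.
    """
    if length <= 0 or alphabet_size <= 0:
        return 0.0
    return math.log2(min(length, alphabet_size))


if __name__ == "__main__":
    # Alphabet sizes: hex digits, base64 (ignoring '=' padding), arbitrary bytes.
    for label, alphabet in [("hex", 16), ("base64", 64), ("raw bytes", 256)]:
        for length in (40, 60, 1000):
            ceiling = entropy_ceiling(length, alphabet)
            print(f"{label:>9} len={length:<5} ceiling={ceiling:.2f}")
```

These are upper bounds only; real payloads sit somewhat below them, which is part of why the maintainer's test run against known samples is a useful way to validate the default.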

Integration

Fits cleanly into the existing scan() pipeline — runs after all pattern checks, respects --no-stop, and uses the same printPath() output format:

# ER # {/path/to/infected.php} #[entropy:5.90] # 12

Works alongside pattern detection — if a file triggers both a signature and high entropy, both findings are reported (with --no-stop).

Test

# Should flag (entropy 5.90, no known pattern match):
echo '<?php $a = "K7gH2mPxR9sNqL4vT0cA8fZdYeWoUiJbXlj3Q6nBy5wMhEkVuCp1FtGrIDsOz"; echo $a;' > test.php
php scan.php -d . --entropy -n --disable-stats -p -L
# → # ER # {./test.php} #[entropy:5.9] # 2

# Should not flag (clean, low entropy):
echo '<?php echo "Hello World"; ?>' > clean.php
php scan.php -d . --entropy -n --disable-stats
# → # OK # {./clean.php}

🤖 Generated with Claude Code

Adds --entropy flag that flags PHP files containing string literals
with unusually high entropy (>= 5.3 bits/char by default), which is
characteristic of encrypted, XOR-obfuscated or base64-packed payloads
that evade signature-based detection.

New options:
  --entropy              Enable entropy-based scanning
  --entropy-threshold    Override threshold (default: 5.3 bits/char)

Targets quoted string literals >= 40 chars only, avoiding noise from
short strings. Integrates cleanly with --no-stop so both pattern and
entropy hits are reported per file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@scr34m

scr34m commented Jun 8, 2026

Owner

This is a very interesting idea. I'll test it against my collection of malicious code samples before merging.

Adds --ast flag (requires nikic/php-parser via composer) that parses
PHP files as an AST and flags constructs that evade signature-based
detection:

  - eval() with any non-literal argument (dynamic code execution)
  - Dynamic function calls via variable: $func(...)
  - create_function() — runtime code compilation
  - assert() with dynamic argument (code execution in PHP < 8)
  - preg_replace() with /e modifier (arbitrary code via regex)

The visitor is isolated in ast_visitor.php and loaded only at runtime
after confirming vendor/autoload.php exists, so the scanner continues
to work without any dependencies for users who do not install php-parser.

nikic/php-parser is listed under suggest (not require) to keep the
PHP >= 5.3.0 minimum intact; v3 supports PHP 5.5+, v4 supports 7.0+,
v5 supports 7.4+. composer.lock is added to .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>