Follow-ups: object-literal reconstruction pass + accurate brackets metric#63

Merged: vasie1337 merged 6 commits into main from claude/stoic-franklin-zft90c on Jun 8, 2026
@vasie1337

Member

Follows up the open items from the deobfuscation-improvement effort (PR #62). Two concrete, general, sound changes; the larger items are deliberately deferred (see below).

1. ReconstructObject — new pass

transformObjectKeys (and hand packers) lower an object literal into an empty object plus a contiguous run of property writes:

const O = {}; O.name = "P1"; O.price = 600;   →   const O = { name: "P1", price: 600 };

This folds that run back into the literal. Beyond readability, it's the keystone for proxy-table recovery: obfuscators build their operator-proxy tables the same way (const t = {}; t.m = function(a,b){…}), so reconstructing them into the { m: function… } literal is what lets proxy_inline recognize and collapse them — which folds the opaque predicates guarding dead branches so DCE removes them.

Sound by construction: only an empty-object seed; only immediately-following contiguous X.<staticKey> = <expr> writes; the value may not reference X (it isn't bound yet inside the literal); __proto__ and duplicate keys stop the run; property order is preserved so side-effect order is unchanged. Runs before ProxyInline.
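The soundness conditions above can be sketched on a toy statement model. This is an illustrative sketch under stated assumptions, not the pass itself: the `Stmt` enum, the `reconstruct_objects` function, and the precomputed `value_refs` list are hypothetical stand-ins for a real AST and its reference analysis.

```rust
/// Toy statement model (hypothetical); a real pass would walk the parser's AST.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// `const <name> = {};`
    EmptyObject { name: String },
    /// `<object>.<key> = <value>;` where `value_refs` lists identifiers read by the value.
    PropWrite { object: String, key: String, value: String, value_refs: Vec<String> },
    /// `const <name> = { k: v, ... };`
    ObjectLiteral { name: String, props: Vec<(String, String)> },
    /// Any other statement, kept verbatim.
    Other(String),
}

/// Folds an empty-object seed plus its contiguous run of static-key writes
/// back into a single object literal.
fn reconstruct_objects(stmts: Vec<Stmt>) -> Vec<Stmt> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < stmts.len() {
        if let Stmt::EmptyObject { name } = &stmts[i] {
            let mut props: Vec<(String, String)> = Vec::new();
            let mut j = i + 1;
            while j < stmts.len() {
                match &stmts[j] {
                    // Accept a write only if it targets the seed, is not `__proto__`,
                    // does not repeat a key, and its value does not read the seed.
                    Stmt::PropWrite { object, key, value, value_refs }
                        if object == name
                            && key.as_str() != "__proto__"
                            && !props.iter().any(|(k, _)| k == key)
                            && !value_refs.contains(name) =>
                    {
                        // Pushing in source order preserves side-effect order.
                        props.push((key.clone(), value.clone()));
                        j += 1;
                    }
                    // Anything else ends the run; later statements stay untouched.
                    _ => break,
                }
            }
            if !props.is_empty() {
                out.push(Stmt::ObjectLiteral { name: name.clone(), props });
                i = j;
                continue;
            }
        }
        out.push(stmts[i].clone());
        i += 1;
    }
    out
}
```

In this sketch a rejected write simply ends the run and is emitted unchanged after the literal, so a partial fold never reorders statements.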

Impact: corpus decoder residue 217 → 149 hexrefs; the split-object profiles shrink further (numbers_keys 71→37, strong 132→98). All 140 generated + 15 real samples still pass equivalence.

2. Accurate brackets metric

The metric counted raw [" occurrences, dominated by array/object literals (= ["x"]), not member access — sample_10 read 96 when only ~6 are real accesses. It now counts [" only where the byte before [ ends an expression (identifier char, ), ], or a closing quote), i.e. genuine string-keyed member access. Mirrored in report.rs and golden.rs. sample_10 96→6; sample_7 576→528 (its residuals are genuine decoder-gated base64-key accesses like )["BX52O0AVdwg="], correctly retained).
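The counting rule described above can be sketched as a byte scan. This is a minimal sketch, not the code in report.rs or golden.rs: the function names are hypothetical, and treating both `"` and `'` as a "closing quote" is an assumption.

```rust
/// Returns true if `b` can be the last byte of an expression:
/// an identifier character, `)`, `]`, or a quote (assumed to be a closing one).
fn ends_expression(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'_' | b'$' | b')' | b']' | b'"' | b'\'')
}

/// Counts `["` occurrences whose preceding byte ends an expression,
/// i.e. likely string-keyed member access rather than an array literal.
fn count_member_brackets(src: &str) -> usize {
    let bytes = src.as_bytes();
    let mut count = 0;
    // Start at 1 so there is always a preceding byte; stop one short so
    // `bytes[i + 1]` stays in bounds.
    for i in 1..bytes.len().saturating_sub(1) {
        if bytes[i] == b'[' && bytes[i + 1] == b'"' && ends_expression(bytes[i - 1]) {
            count += 1;
        }
    }
    count
}
```

For example, `const a = ["x"];` contributes nothing because the byte before `[` is a space, while `b["y"]`, `f()["z"]`, and `c[0]["w"]` each count once.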

Deliberately deferred (each its own effort)

  • Bytecode-VM devirtualization (sample_9): the namesake target — a custom interpreter + encoded bytecode. Needs a dedicated devirt pass, not a follow-up.
  • Deeper opaque-predicate residue in strong/numbers_keys: remaining _0x are mostly top-level object names that RenameByRole intentionally leaves (root-scope bindings), plus a few proxy cases needing predicate folding.
  • Short-base v1…v999 renaming: a size/readability tradeoff, not a clear win.

Verification

New phase1 tests cover the fold and the proxy-table cascade. Full slow net green: golden (re-blessed), sample_equivalence (behavior preserved on every sample), generated_corpus (all 140 reproduce manifest output), determinism (5). cargo clippy --all-targets clean.

https://claude.ai/code/session_01EjhNTCU89wa5zaeRHMnfEc


Generated by Claude Code

claude added 6 commits June 8, 2026 17:21
The synthetic javascript-obfuscator (obfuscator.io) fixtures in
samples/generated/ gated correctness via manifest.json but were excluded
from the readability metrics: both the live report binary and the
committed SCOREBOARD.md read samples/ non-recursively.

Add a per-profile rollup of the generated corpus (aggregated over all
seeds, one row per obfuscation technique) to both surfaces, so the
obfuscator.io samples count toward readability the same way they count
toward correctness. kept% is byte-weighted, opaque% is the mean per-file
ratio, rounds is the worst case, and converged flags any non-fixpoint.
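
The four aggregation rules in this commit message can be sketched as follows. This is a hedged illustration: `FileStats`, `ProfileRow`, and `rollup` are hypothetical names, and reading "byte-weighted" as total output bytes over total input bytes is an assumption.

```rust
/// Per-file measurements for one generated sample (hypothetical shape).
struct FileStats {
    input_bytes: u64,
    output_bytes: u64,
    opaque_ratio: f64, // 0.0..=1.0
    rounds: u32,
    converged: bool,
}

/// One scoreboard row aggregated over all seeds of a profile.
struct ProfileRow {
    kept_pct: f64,
    opaque_pct: f64,
    rounds: u32,
    converged: bool,
}

fn rollup(files: &[FileStats]) -> Option<ProfileRow> {
    if files.is_empty() {
        return None;
    }
    let input: u64 = files.iter().map(|f| f.input_bytes).sum();
    let output: u64 = files.iter().map(|f| f.output_bytes).sum();
    // Byte-weighted: large files influence kept% more than small ones.
    let kept_pct = if input == 0 { 0.0 } else { 100.0 * output as f64 / input as f64 };
    // Unweighted mean of the per-file ratios.
    let opaque_pct =
        100.0 * files.iter().map(|f| f.opaque_ratio).sum::<f64>() / files.len() as f64;
    // Worst case across seeds.
    let rounds = files.iter().map(|f| f.rounds).max().unwrap_or(0);
    // A single non-fixpoint file marks the whole row as not converged.
    let converged = files.iter().all(|f| f.converged);
    Some(ProfileRow { kept_pct, opaque_pct, rounds, converged })
}
```
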
rename: RenameByRole now infers meaningful names for array-iteration
callback params (reduce->acc/value, map/filter/forEach->item/index,
sort->left/right), C-style loop counters (->index), and catch bindings
(->error), instead of falling back to generic varN. Names stay >=3 chars
so they remain idempotent under the opaque-name guard, and reuse the
existing scope de-duplicator.
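
The role-to-name mapping listed in this commit message can be sketched as a small lookup. The `Role` enum and `name_for_role` function are hypothetical; the names themselves come from the message, and scope de-duplication is left out.

```rust
/// Where a binding was found (hypothetical model of the roles named above).
enum Role<'a> {
    /// Parameter at `position` of a callback passed to array method `method`.
    CallbackParam { method: &'a str, position: usize },
    /// Counter of a C-style `for` loop.
    LoopCounter,
    /// Binding of a `catch` clause.
    CatchBinding,
}

/// Suggests a name for a role, or `None` to fall back to a generic name.
fn name_for_role(role: &Role<'_>) -> Option<&'static str> {
    match role {
        Role::CallbackParam { method, position } => {
            let names: [&'static str; 2] = match *method {
                "reduce" => ["acc", "value"],
                "map" | "filter" | "forEach" => ["item", "index"],
                "sort" => ["left", "right"],
                _ => return None,
            };
            // Positions past the second parameter get no suggestion here.
            names.get(*position).copied()
        }
        Role::LoopCounter => Some("index"),
        Role::CatchBinding => Some("error"),
    }
}
```

Every suggested name is at least three characters long, matching the constraint stated in the message.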

report/golden: add a hexrefs column (raw, non-distinct count of _0x...
identifier occurrences) to the live dashboard and committed scoreboard.
opaque% counts DISTINCT tokens, so a single surviving decoder referenced
N times barely moved it; hexrefs spikes when a string-array decoder is
left intact, so the board now flags the worst failures (strarr_base64
163, strarr_rc4 211, numbers_keys 231, strong 385) instead of
greenlighting them. Snapshots/scoreboard re-blessed.
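
The difference between a raw and a distinct count can be sketched like this. It is an illustration only: the function names are hypothetical, identifiers are scanned as ASCII runs, and the real opaque% is a ratio whose denominator is not modeled here.

```rust
use std::collections::HashSet;

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'$'
}

/// Returns every `_0x…` identifier occurrence in source order.
fn hex_identifiers(src: &str) -> Vec<&str> {
    let bytes = src.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if is_ident_byte(bytes[i]) {
            let start = i;
            while i < bytes.len() && is_ident_byte(bytes[i]) {
                i += 1;
            }
            // Both ends sit next to ASCII bytes, so the slice is on char boundaries.
            let token = &src[start..i];
            if token.starts_with("_0x") {
                found.push(token);
            }
        } else {
            i += 1;
        }
    }
    found
}

/// Raw, non-distinct count: grows with every surviving reference.
fn hexrefs(src: &str) -> usize {
    hex_identifiers(src).len()
}

/// Distinct count: one surviving decoder adds 1 regardless of its call sites.
fn distinct_hex(src: &str) -> usize {
    hex_identifiers(src).into_iter().collect::<HashSet<_>>().len()
}
```
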
…ional-chain members

dce: a string-array decoder's accessor memoizes through its own name
(if (f.flag===undefined){ f.cache={}; ... } ... f.cache[k] ...). After
every call site is inlined by decoder-lift, the only surviving references
to f are reads of its own properties inside its own body, which pinned
the spent decoder and its entire encoded string array alive forever.
fn_decl_is_dead now treats a function as dead when every resolved read of
its symbol is lexically inside its own body (shadowing-safe via reference
resolution), with the existing guard that a still-called self-reassigning
function is kept. Collapses the obfuscator.io string-array profiles:
strarr_rc4 kept 72%->19% (hexrefs 211->3), strarr_base64 68%->28%
(163->3); corpus output 328K->154K bytes, hexrefs 998->217.
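
The core of that liveness rule can be sketched with source positions. This is a simplified sketch, not `fn_decl_is_dead` itself: `Span`, `FnDecl`, and `only_self_referenced` are hypothetical, reads are assumed to be already resolved to this function's symbol, and the guard for still-called self-reassigning functions is not modeled.

```rust
/// Half-open byte range `[start, end)` in the source.
#[derive(Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    fn contains(self, pos: usize) -> bool {
        self.start <= pos && pos < self.end
    }
}

/// A function declaration, reduced to the span of its body.
struct FnDecl {
    body: Span,
}

/// True when every resolved read of the function's symbol lies inside its own
/// body, so nothing outside the function can reach it any more.
/// `reads` holds the positions of reads already resolved to this symbol,
/// which is what makes the check safe under shadowing.
fn only_self_referenced(decl: &FnDecl, reads: &[usize]) -> bool {
    reads.iter().all(|&pos| decl.body.contains(pos))
}
```

A function with no reads at all also satisfies the check, which matches ordinary dead-code elimination of unreferenced declarations.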

member-normalize: a?.["foo"] parses as a ChainElement, not an
Expression, so optional-chained computed members were never normalized.
Added enter_chain_element to rewrite them to a?.foo (identifier keys
only, optional flag preserved). Covered by a new phase1 test.
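
The rewrite rule can be sketched on strings rather than AST nodes. This is a hedged sketch: `is_identifier_key` and `normalize_member` are hypothetical helpers, and restricting identifiers to ASCII is a simplifying assumption (JavaScript also allows Unicode identifier characters).

```rust
/// True if `key` can be written after a dot: an ASCII letter, `_`, or `$`
/// followed by ASCII letters, digits, `_`, or `$`.
fn is_identifier_key(key: &str) -> bool {
    let mut chars = key.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == '$' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
}

/// Renders a string-keyed member access, using dot notation when the key is
/// an identifier and preserving the optional-chain flag either way.
fn normalize_member(object: &str, key: &str, optional: bool) -> String {
    if is_identifier_key(key) {
        let op = if optional { "?." } else { "." };
        format!("{object}{op}{key}")
    } else if optional {
        format!("{object}?.[\"{key}\"]")
    } else {
        format!("{object}[\"{key}\"]")
    }
}
```
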

Snapshots/scoreboard re-blessed; full equivalence/determinism/corpus net green.

…-zft90c

# Conflicts:
#	src/bin/report.rs
#	tests/golden.rs
#	tests/snapshots/SCOREBOARD.md

ReconstructObject (new pass): transformObjectKeys and hand packers lower
an object literal into an empty object plus a contiguous run of property
writes (var O = {}; O.a = …; O.b = …). This pass folds that run back into
the literal it came from. Beyond readability, it is the keystone for the
operator-proxy *tables* obfuscators build the same way: reconstructing
var t = {}; t.m = function(){…} into the { m: function(){…} } literal is
what lets proxy-inline recognize and collapse them, which folds the
opaque predicates guarding dead branches so DCE removes them. Sound by
construction: only an empty-object seed, only immediately-following
contiguous X.<staticKey> = <expr> writes, value may not reference X (it
is not yet bound in the literal), __proto__ and duplicate keys stop the
run. Runs before ProxyInline. Cuts corpus decoder residue 217 -> 149
hexrefs and shrinks the split-object profiles further.

brackets metric: counted raw [" occurrences, which are dominated by
array/object literals (= ["x"]), not member access — sample_10 read 96
when only ~6 are real accesses. Now counts [" only where the byte before
[ ends an expression (identifier char, ), ], or quote), i.e. genuine
string-keyed member access. Mirrored in report.rs and golden.rs.

New phase1 tests cover the fold and the proxy-table cascade. Snapshots
and scoreboard re-blessed; full equivalence/determinism/corpus net green.

…-zft90c

# Conflicts:
#	tests/snapshots/SCOREBOARD.md
#	tests/snapshots/sample_11.js.out.js
#	tests/snapshots/sample_7.js.out.js
@vasie1337 vasie1337 merged commit 7e97a1c into main Jun 8, 2026
@vasie1337 vasie1337 deleted the claude/stoic-franklin-zft90c branch June 8, 2026 19:27