unpy

Static extractor and decompiler for PyInstaller-packed executables, written in Rust. Designed for malware reverse engineering — no Python runtime required.

$ unpy malware.exe -o out/
[*] read 41711939 bytes from malware.exe
[*] cookie at offset 0x27c78eb
[*] archive start at offset 0xaaa00
[*] Python version: 312 (python312.dll)
[*] 1281 TOC entries
[*] extracted 2591 files
[*] report written to out/report.json

What unpy actually is

unpy combines two separate steps that analysts usually run manually:

  1. PyInstaller extraction (what pyinstxtractor does): PE → raw .pyc files
  2. Bytecode decompilation (what pycdc does): .pyc → .py source

The real value is not magic — it's the structured output: modules are automatically classified into stdlib / libs / project, a JSON report is generated, and the decompilation chain (pycdc → pycdas fallback) runs without manual intervention.

If you already have a working pyinstxtractor + pycdc setup, unpy is not a replacement. It's a convenience wrapper with analyst-oriented output.
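
To make step 1 concrete, here is a minimal sketch of locating and decoding the PyInstaller archive cookie, the record behind the "cookie at offset" and "archive start" lines in the sample output. It assumes the 88-byte big-endian cookie layout used by PyInstaller 2.1 and later, as read by pyinstxtractor. The names Cookie, find_cookie and archive_start are illustrative and are not taken from unpy's source.

```rust
/// Magic bytes that open a PyInstaller CArchive cookie:
/// "MEI" followed by 0x0C 0x0B 0x0A 0x0B 0x0E.
const MAGIC: [u8; 8] = [b'M', b'E', b'I', 0x0C, 0x0B, 0x0A, 0x0B, 0x0E];

/// Assumed cookie size (PyInstaller 2.1+):
/// 8 (magic) + 4 * 4 (integers) + 64 (Python library name).
const COOKIE_LEN: usize = 88;

/// Hypothetical in-memory view of the cookie; not unpy's actual type.
#[derive(Debug, PartialEq)]
pub struct Cookie {
    pub offset: usize,
    pub package_len: u32,
    pub toc_offset: u32,
    pub toc_len: u32,
    pub python_version: u32,
    pub python_lib: String,
}

impl Cookie {
    /// Start of the embedded archive: the package ends where the cookie ends.
    pub fn archive_start(&self) -> Option<usize> {
        (self.offset + COOKIE_LEN).checked_sub(self.package_len as usize)
    }
}

/// Read a big-endian u32 from the first four bytes of `bytes`.
fn be_u32(bytes: &[u8]) -> u32 {
    u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Find the last occurrence of the magic and decode the cookie that follows.
pub fn find_cookie(data: &[u8]) -> Option<Cookie> {
    let offset = data.windows(MAGIC.len()).rposition(|w| w == &MAGIC[..])?;
    let raw = data.get(offset..offset + COOKIE_LEN)?;
    // The library name is NUL-padded to 64 bytes.
    let name = &raw[24..COOKIE_LEN];
    let end = name.iter().position(|&b| b == 0).unwrap_or(name.len());
    Some(Cookie {
        offset,
        package_len: be_u32(&raw[8..12]),
        toc_offset: be_u32(&raw[12..16]),
        toc_len: be_u32(&raw[16..20]),
        python_version: be_u32(&raw[20..24]),
        python_lib: String::from_utf8_lossy(&name[..end]).into_owned(),
    })
}
```

The sample run above is consistent with this layout: the file is 41711939 (0x27c7943) bytes long and the cookie sits at 0x27c78eb, exactly 88 bytes before the end of the file.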

Dependencies

  • Rust + Cargo — to build unpy
  • pycdc — bytecode decompiler, must be in PATH

pycdc is not always available through a package manager and is not actively maintained. If needed, build it from source: github.com/zrax/pycdc

Install

git clone https://github.com/pinonym/unpy
cd unpy
cargo build --release
# binary at target/release/unpy

Usage

unpy <input.exe> [OPTIONS]

Options:
  -o, --output <DIR>          Output directory [default: <input>_extracted]
  --no-libs                   Skip stdlib and third-party modules
  --pycdc <PATH>              Path to pycdc binary [default: pycdc]
  --uncompyle6 <PATH>         Path to uncompyle6 binary (used for Python <= 3.8)
  --python-version <VER>      Force Python version (e.g. 312) instead of autodetect
  -v, --verbose               Show per-file decompilation status

Output structure

out/
├── src/
│   ├── project/     ← attacker's custom modules — start here
│   │   ├── mainscript.py
│   │   └── encryptor.py
│   ├── libs/        ← known third-party (discord, requests, …) — verify these
│   └── stdlib/      ← Python built-ins — usually safe to ignore
├── pyc/             ← raw .pyc files, mirroring src/ structure
│   ├── project/
│   ├── libs/
│   └── stdlib/
├── binaries/        ← native .dll / .pyd / .so
└── report.json

Use --no-libs to extract only project/ and skip stdlib/third-party noise.
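
The three-way split can be pictured as a lookup on the top-level package name. The sketch below is only an illustration of that idea: the two lists are tiny, hypothetical stand-ins for the real heuristic list, and classify and output_dir are made-up names rather than unpy's API.

```rust
/// Where a module ends up under src/ and pyc/.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Category {
    Stdlib,
    Libs,
    Project,
}

// Hypothetical, deliberately tiny lists; a real list would be far longer.
const STDLIB: &[&str] = &["os", "sys", "json", "socket", "subprocess", "base64"];
const THIRD_PARTY: &[&str] = &["requests", "discord", "cryptography"];

/// Classify a dotted module name by its top-level package.
/// Anything that is not on either list is treated as project code.
pub fn classify(module: &str) -> Category {
    let top = module.split('.').next().unwrap_or(module);
    if STDLIB.iter().any(|&m| m == top) {
        Category::Stdlib
    } else if THIRD_PARTY.iter().any(|&m| m == top) {
        Category::Libs
    } else {
        Category::Project
    }
}

/// Directory for decompiled sources, relative to the output root.
pub fn output_dir(category: Category) -> &'static str {
    match category {
        Category::Stdlib => "src/stdlib",
        Category::Libs => "src/libs",
        Category::Project => "src/project",
    }
}
```

Defaulting unknown names to project/ errs on the side of showing the analyst too much rather than hiding attacker code among libraries.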

Decompilation chain

For each .pyc:

  1. pycdc — attempts full Python source reconstruction
  2. uncompyle6 (Python ≤ 3.8 only, if --uncompyle6 is provided) — better support for older bytecode formats
  3. pycdas (fallback) — triggered if all above fail. Gives readable bytecode disassembly.

Each module in report.json includes which decompiler was used and the final status.
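
The ordering above can be sketched as a small planning function plus a first-success loop. This is an illustration under stated assumptions, not unpy's code: Tool, split_version, chain_for and decompile are hypothetical names, the version number is assumed to use the compact form shown in the sample output (38 for 3.8, 312 for 3.12), and the actual tool invocation is left to a caller-supplied closure.

```rust
/// The external tools that can appear in the chain.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Tool {
    Pycdc,
    Uncompyle6,
    Pycdas,
}

/// Split a compact version number into (major, minor):
/// 38 becomes (3, 8) and 312 becomes (3, 12).
pub fn split_version(pyver: u32) -> (u32, u32) {
    if pyver >= 100 {
        (pyver / 100, pyver % 100)
    } else {
        (pyver / 10, pyver % 10)
    }
}

/// Build the ordered list of tools to try for one .pyc file.
pub fn chain_for(pyver: u32, have_uncompyle6: bool) -> Vec<Tool> {
    let mut chain = vec![Tool::Pycdc];
    // uncompyle6 is only tried for Python <= 3.8 and only when a path was given.
    if have_uncompyle6 && split_version(pyver) <= (3, 8) {
        chain.push(Tool::Uncompyle6);
    }
    // pycdas is always the last resort.
    chain.push(Tool::Pycdas);
    chain
}

/// Try each tool in order and return the first one that yields output.
/// `run` stands in for spawning the real binary and capturing its output.
pub fn decompile<F>(chain: &[Tool], mut run: F) -> Option<(Tool, String)>
where
    F: FnMut(Tool) -> Option<String>,
{
    chain.iter().find_map(|&tool| run(tool).map(|out| (tool, out)))
}
```

Keeping the plan separate from process spawning makes the ordering easy to test without any decompiler installed.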

report.json

{
  "python_version": 312,
  "total_modules": 1327,
  "decompilation": {
    "ok": 232,
    "partial": 0,
    "disasm": 1095,
    "failed": 0
  },
  "suspicious_count": 15,
  "modules": [
    {
      "name": "mainscript",
      "status": "ok",
      "decompiler": "pycdc",
      "category": "project",
      "suspicious": false,
      "path": "out/src/project/mainscript.py"
    },
    {
      "name": "requestss",
      "status": "disasm",
      "decompiler": "pycdas",
      "category": "project",
      "suspicious": true,
      "path": "out/src/project/requestss.py"
    }
  ]
}

Status values:

  • ok — clean pycdc decompile, readable Python source
  • disasm — pycdas fallback, bytecode disassembly (pycdc failed or produced no code)
  • partial — pycdc produced warnings, pycdas also unavailable
  • failed — both decompilers failed
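
As a rough illustration of how the "decompilation" counters relate to the per-module statuses, the sketch below tallies a list of status strings. Status, Tally and tally are hypothetical names, and JSON parsing is left out to keep the example dependency-free.

```rust
/// The four status values used in report.json.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Status {
    Ok,
    Partial,
    Disasm,
    Failed,
}

impl Status {
    /// Parse the lowercase string form used in the report.
    pub fn parse(s: &str) -> Option<Status> {
        match s {
            "ok" => Some(Status::Ok),
            "partial" => Some(Status::Partial),
            "disasm" => Some(Status::Disasm),
            "failed" => Some(Status::Failed),
            _ => None,
        }
    }
}

/// Counters mirroring the "decompilation" object.
#[derive(Debug, Default, PartialEq)]
pub struct Tally {
    pub ok: usize,
    pub partial: usize,
    pub disasm: usize,
    pub failed: usize,
}

impl Tally {
    /// Sum of all counters; expected to equal total_modules.
    pub fn total(&self) -> usize {
        self.ok + self.partial + self.disasm + self.failed
    }
}

/// Count statuses; returns None if an unknown status string is seen.
pub fn tally<'a, I>(statuses: I) -> Option<Tally>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut t = Tally::default();
    for s in statuses {
        match Status::parse(s)? {
            Status::Ok => t.ok += 1,
            Status::Partial => t.partial += 1,
            Status::Disasm => t.disasm += 1,
            Status::Failed => t.failed += 1,
        }
    }
    Some(t)
}
```

In the sample report the counters add up as expected: 232 + 0 + 1095 + 0 = 1327, which matches total_modules.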

suspicious flag: set when a module name in project/ is within Levenshtein distance 2 of a known stdlib or third-party package name — potential typosquatting or name confusion attack.
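
A minimal sketch of such a check follows, using the classic two-row dynamic-programming Levenshtein distance. The function names and the list of known packages are illustrative. Skipping exact matches (distance 0) is an assumption of this sketch, on the grounds that an exact match would have been classified as stdlib or libs in the first place.

```rust
/// Levenshtein edit distance between two strings, counted in characters.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the first known package whose name is 1 or 2 edits away from `name`.
/// Exact matches are skipped (assumption: those are not project modules).
pub fn near_miss<'a>(name: &str, known: &[&'a str]) -> Option<&'a str> {
    known.iter().copied().find(|k| {
        let d = levenshtein(name, k);
        d >= 1 && d <= 2
    })
}
```

With this sketch, the "requestss" module from the sample report is one edit away from "requests", so it would be flagged.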

⚠ Third-party packages in libs/ can also be backdoored. The classification is a best-effort heuristic, not a guarantee. Always verify libs/ modules that appear in sensitive code paths.

Supported targets

  • Python: 3.0 – 3.13
  • PyInstaller: 3.x – 6.x
  • Binary format: PE (Windows .exe); ELF and Mach-O are not yet supported
  • Encryption: PyInstaller 4.x --key AES encryption is detected but not decrypted; affected modules will fall back to pycdas disassembly

Limitations

  • ELF (Linux) and Mach-O (macOS) binaries not yet supported
  • PyInstaller 4.x AES-encrypted bundles: modules decrypt only at runtime, unpy cannot recover plaintext
  • pycdc support for Python 3.12+ opcodes is still incomplete — expect heavy pycdas fallback on recent samples
  • The stdlib/third-party classification is a static heuristic list, not exhaustive

Python 3.6 / 3.7 — bytecode disassembly unavailable

Extraction and .pyc output work correctly, but all decompilation steps fail on Python 3.6–3.7 bytecode due to two separate upstream bugs:

  • pycdc / pycdas: the marshal parser does not handle FLAG_REF. The flag is the 0x80 bit of the type byte (a flagged code object appears as 0xe3, which is TYPE_CODE 0x63 with 0x80 set). Because the bit is not stripped before type dispatch, parsing fails with std::bad_cast on virtually all real-world 3.6 binaries.

  • uncompyle6 / decompyle3: the xdis 6.1.x marshal parser crashes on FLAG_REF code objects.

FLAG_REF is a CPython marshal optimisation: the flag marks an object that the loader appends to a reference table during deserialisation, so that later TYPE_REF entries can refer back to it by index. It became pervasive in PyInstaller-packed 3.6 binaries.
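
The fix on the parser side amounts to masking the flag off before looking at the type code. A minimal sketch of that step follows; the constant values are CPython's marshal values, while split_type_byte is an illustrative name and is not code from pycdc or xdis.

```rust
/// High bit of a marshal type byte: the object goes into the reference table.
pub const FLAG_REF: u8 = 0x80;

/// Marshal type code for a code object, ASCII 'c' (0x63).
pub const TYPE_CODE: u8 = b'c';

/// Split a raw marshal type byte into (type code, has FLAG_REF).
/// Dispatching on the raw byte instead of the masked one is the bug
/// described above: 0xe3 would not be recognised as a code object.
pub fn split_type_byte(raw: u8) -> (u8, bool) {
    (raw & !FLAG_REF, (raw & FLAG_REF) != 0)
}
```

For example, 0xe3 splits into type code 0x63 ('c', a code object) with the flag set, whereas a plain 0x63 yields the same type code with the flag clear.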

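unpy will still extract all .pyc files and emit a warning in report.json. Workaround: run strings on the binary for quick triage, or, if a Python 3.6 interpreter is available, load the code object with marshal and disassemble it with the dis module (python3.6 -m dis on its own expects a source file, not a .pyc).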

Non-goals

No dynamic analysis, no VirusTotal submission, no scoring. The tool extracts and decompiles. The analyst reads.
