| Metric | Value | Target | Status |
|---|---|---|---|
| Full EXE size | 681,824 | 681,824 | ✅ MATCH |
| Stub size | 369,568 | 369,568 | ✅ MATCH |
| Relocations | 597/597 | 597 | ✅ 100% MATCH |
| MZ header | 14/14 | 14/14 | ✅ 100% MATCH |
| Code+Data diffs | 0 / 366,496 | 0 | ✅ 100% MATCH |
| Overlay | 312,256/312,256 | 312,256 | ✅ 100% MATCH |
| String match | 6,805/6,805 | 6,805 | ✅ 100% MATCH |
| Reloc zone opcodes | 17,159/17,159 | 17,159 | ✅ 100% MATCH |
| MD5 | 09BEBE491B015EDE4CEA210469F863CC | Match | ✅ PERFECT MATCH |

STATUS: 100% BINARY-EXACT MATCH, achieved February 27, 2026. This is the first bit-perfect reconstruction of XyWrite IV v4.018 in history.

The MZ EXE stub (369,568 bytes) has been reconstructed to produce a byte-identical binary. MD5: 09BEBE491B015EDE4CEA210469F863CC

- File: EDITOR.EXE (681,824 bytes = 369,568 stub + 312,256 overlay)
- MD5: `09BEBE491B015EDE4CEA210469F863CC`
- Structure: MZ EXE stub (43 segments, 597 relocations) + overlay binary
- Architecture: x86-16 real mode with `.386`/`.486` instructions (MOVSX, SETcc, BT, BSWAP)
- Language: 100% hand-written assembly, with no C runtime and no compiler artifacts

Folder structure:
build/
├── BUILDOVL.BAT Assembles + links 35 overlay files
├── BUILDEXE.BAT Assembles + links 64 core stub files
├── CORELINK.RSP Linker response for core stub (64 OBJs)
├── OVLLINK.RSP Linker response for overlay (35 OBJs)
├── CORE/ Core stub ASM source (64 files + SEGMENTS.INC + XYOPCDES.INC)
├── OVERLAY/ Overlay ASM source (35 files + OVLSEGS.INC)
│ └── OUTPUT/ Overlay build output (OBJ, LST, OVERLAY.EXE, OVERLAY.BIN)
├── BIN/ Core stub build output (OBJ, LST, EDITOR4.EXE, EDITOR.EXE)
├── ORIGINAL/ Reference binaries (EDITOR.EXE, OVERLAY.EXE, OVERLAY.BIN)
├── SCRIPTS/ Post-build PowerShell scripts
│ ├── MAKEEXE.PS1 Master: STRIPHDR → PADHDR → AUDIT
│ ├── STRIPHDR.PS1 Strip MZ header from OVERLAY.EXE → .BIN
│ ├── PADHDR.PS1 Pad stub header + combine + patch checksum
│ └── AUDIT.PS1 Full binary verification audit
└── TOOLCHAIN/ Local MASM 6.11 + LINK 5.13
Phase 1 — DOSBox-X:
└→ BUILDOVL.BAT (35 OVL ASM → OVERLAY\OUTPUT\OVERLAY.EXE)
└→ BUILDEXE.BAT (64 CORE ASM → BIN\EDITOR4.EXE)
Phase 2 — PowerShell: SCRIPTS\MAKEEXE.PS1
└→ STRIPHDR.PS1 (strip MZ header → OVERLAY.BIN)
└→ PADHDR.PS1 (pad header 0xA00→0xC00 + append overlay + patch)
└→ AUDIT.PS1 (MD5 + byte verification → PERFECT BINARY MATCH)
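
The final audit step amounts to a hash plus a byte-for-byte comparison. The sketch below illustrates that idea in Python; it is not a port of AUDIT.PS1, whose internals are not shown here, and the function names `first_mismatch` and `audit` are hypothetical.

```python
import hashlib

def first_mismatch(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # No difference in the shared prefix: one input may simply be shorter.
    return None if len(a) == len(b) else min(len(a), len(b))

def audit(built: bytes, original: bytes) -> dict:
    """Summarize how a rebuilt image compares with the reference image."""
    return {
        "size": len(built),
        "md5": hashlib.md5(built).hexdigest().upper(),
        "first_mismatch": first_mismatch(built, original),
    }
```

Under these assumptions, a perfect match is a report with `first_mismatch` equal to `None`, `size` equal to 681,824, and `md5` equal to `09BEBE491B015EDE4CEA210469F863CC` when `built` is read from `BIN\EDITOR.EXE` and `original` from `ORIGINAL\EDITOR.EXE`. Reporting the first differing offset is a design choice that makes a failed build easier to trace back to a segment.
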
#### Phase 1: Assembly + Link (DOSBox-X)
```batch
mount C F:\RE\REVERSE\XyWrite\XyWrite4\build
mount D F:\RE\REVERSE\XyWrite\Tools
C:
```

1. **BUILDOVL.BAT** — Assembles 35 overlay ASM files from `OVERLAY\`, outputs OBJ/LST to `OVERLAY\OUTPUT\`, links to `OVERLAY\OUTPUT\OVERLAY.EXE`
2. **BUILDEXE.BAT** — Assembles 64 core ASM files from `CORE\`, outputs OBJ/LST to `BIN\`, links to `BIN\EDITOR4.EXE`
#### Phase 2: Post-Processing (PowerShell)
```powershell
cd F:\RE\REVERSE\XyWrite\XyWrite4\build
.\SCRIPTS\MAKEEXE.PS1
```

MAKEEXE.PS1 runs three steps:

| Step | Script | Action |
|---|---|---|
| 1 | STRIPHDR.PS1 | Strip MZ header from OVERLAY\OUTPUT\OVERLAY.EXE → OVERLAY.BIN |
| 2 | PADHDR.PS1 | Pad stub header (0xA00→0xC00) + append overlay + patch checksum |
| 3 | AUDIT.PS1 | MD5 + byte comparison vs ORIGINAL\EDITOR.EXE |
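
Steps 1 and 2 can be pictured as plain byte manipulation on the MZ header. The Python sketch below is an illustration under assumptions, not the contents of STRIPHDR.PS1 or PADHDR.PS1: it assumes the header length comes from the standard `e_cparhdr` field, that padding is zero fill, and that the page-count fields are updated to match. The checksum patch is omitted because its rule is not described here. All function names are hypothetical.

```python
import struct

PARAGRAPH = 16   # MZ header length is counted in 16-byte paragraphs
PAGE = 512       # MZ image length is counted in 512-byte pages

def header_size(exe: bytes) -> int:
    """Header length in bytes, read from e_cparhdr at offset 0x08."""
    if exe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (paragraphs,) = struct.unpack_from("<H", exe, 0x08)
    return paragraphs * PARAGRAPH

def strip_header(exe: bytes) -> bytes:
    """Step 1: drop the MZ header and keep only the load image."""
    return exe[header_size(exe):]

def pad_header(exe: bytes, new_size: int) -> bytes:
    """Step 2, first half: grow the header to new_size bytes with zero fill."""
    old_size = header_size(exe)
    if new_size < old_size or new_size % PARAGRAPH:
        raise ValueError("new size must be paragraph-aligned and not smaller")
    out = bytearray(exe[:old_size] + bytes(new_size - old_size) + exe[old_size:])
    # Keep the size fields consistent with the longer file (assumed behavior).
    pages, last = divmod(len(out), PAGE)
    if last:
        pages += 1
    struct.pack_into("<HH", out, 0x02, last, pages)            # e_cblp, e_cp
    struct.pack_into("<H", out, 0x08, new_size // PARAGRAPH)   # e_cparhdr
    return bytes(out)

def combine(stub: bytes, overlay: bytes) -> bytes:
    """Step 2, second half: append the overlay image after the stub."""
    return stub + overlay
```

With the sizes quoted in this document, padding a 0xA00 header to 0xC00 in front of the 366,496-byte load image gives 3,072 + 366,496 = 369,568 bytes, and appending the 312,256-byte overlay gives 681,824 bytes.
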
*** PERFECT BINARY MATCH ***
MD5: 09BEBE491B015EDE4CEA210469F863CC
Size: 681,824 bytes
Offset 0x00000–0x00BFF MZ header (3,072 bytes, padded from 0x0A00)
Offset 0x00C00–0x5A39F Stub load image (43 segments, 366,496 bytes)
Offset 0x5A3A0–0xA675F Overlay (34 segments, 312,256 bytes)
Total: 681,824 bytes, MD5 09BEBE491B015EDE4CEA210469F863CC
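
The offsets above follow from three MZ header fields. The Python sketch below shows the arithmetic, assuming the header's page-count fields (`e_cp`, `e_cblp`) describe only the stub, so that everything after it is overlay. The field names are the conventional MZ header names; `parse_mz` and `layout` are hypothetical helpers.

```python
import struct

# The 14 little-endian words at the start of a classic MZ executable.
MZ_FIELDS = ("e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr",
             "e_minalloc", "e_maxalloc", "e_ss", "e_sp", "e_csum",
             "e_ip", "e_cs", "e_lfarlc", "e_ovno")

def parse_mz(data: bytes) -> dict:
    """Decode the fixed part of the MZ header into a name -> value dict."""
    hdr = dict(zip(MZ_FIELDS, struct.unpack_from("<14H", data, 0)))
    if hdr["e_magic"] != 0x5A4D:          # the bytes "MZ" read as one word
        raise ValueError("not an MZ executable")
    return hdr

def layout(hdr: dict, file_size: int) -> dict:
    """Compute inclusive (first, last) offsets of each region of the file."""
    header_bytes = hdr["e_cparhdr"] * 16
    stub_bytes = hdr["e_cp"] * 512
    if hdr["e_cblp"]:
        # The last 512-byte page is only partly used.
        stub_bytes -= 512 - hdr["e_cblp"]
    return {
        "header": (0, header_bytes - 1),
        "load_image": (header_bytes, stub_bytes - 1),
        "overlay": (stub_bytes, file_size - 1),
        "relocations": hdr["e_crlc"],
    }
```

For a 369,568-byte stub the assumed field values would be `e_cp = 722` and `e_cblp = 416` (721 × 512 + 416 = 369,568), with `e_cparhdr = 0xC0` for the 3,072-byte header. These values are derived from the sizes in this document, not read from the real file.
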
- Comment all assembly routines (purpose, inputs, outputs, side effects)
- Add file-level headers describing each module
- Document calling conventions used across routines
- Explain any non-obvious optimizations or low-level tricks
- Split large assembly files into smaller, logical modules
- Group related subroutines into dedicated files (e.g., text handling, I/O, UI)
- Standardize file naming conventions
- Ensure each file has a clear, single responsibility
- Identify all ambiguous or auto-generated labels
- Rename labels to meaningful, descriptive names
- Establish naming conventions for:
  - Functions
  - Local labels
  - Global labels
  - Constants and macros
- Remove unused or duplicate labels
- Organize build scripts for modular assembly files
- Ensure reproducible builds
- Add debug vs release build configurations
- Document build steps clearly
- Map out high-level architecture of the original codebase
- Identify core subsystems (editor, rendering, input, file I/O)
- Document data structures and memory layout
- Trace key execution paths (startup, editing loop, save/load)
- Verify behavior matches original XyWrite functionality
- Create small test cases for critical routines
- Check edge cases (large files, unusual input)
- Validate stability after refactoring
- Identify reusable algorithms and logic
- Separate platform-dependent code (DOS-specific parts)
- Mark areas for future C++/Qt reimplementation
- Define clean interfaces for porting components
- Track progress per module
- Maintain changelog of refactors
- Set milestones for incremental cleanup
- Identify and document all existing bugs
- Reproduce bugs consistently with test cases
- Categorize bugs (critical, major, minor)
- Identify architectural or design limitations
- Document constraints imposed by legacy DOS environment
- Fix confirmed bugs systematically
- Refactor or redesign areas causing major limitations
- Identify missing or desirable features for XyWrite IV
- Prioritize features (core vs optional)
- Ensure new features align with lightweight philosophy
- Avoid feature bloat—define strict inclusion criteria
- Create a roadmap for feature implementation
- Identify the original assembler used (if possible) for XyWrite sources
- Evaluate modern compatible assemblers (MASM, TASM, NASM, FASM)
- Select the assembler that produces the most accurate binary output
- Identify and configure a compatible linker for the chosen assembler
- Ensure the build toolchain replicates original binary behavior as closely as possible
- Audit all `db`-encoded instruction sequences
  - Examples:
    - `db 81h, 0FFh, 6, 0` → replace with `cmp di, 6`
    - `db 32h, 0E4h` → `xor ah, ah`
    - `db 8Bh, 0C8h` → `mov cx, ax`
    - `db 36h, 29h, 0Eh, 3Ah, 37h` → `sub word ptr ss:[373Ah], cx`
    - `db 7Eh, 2` → `jle <label>`
    - `db 0F3h, 0A4h` → `rep movsb`
    - `db 8Bh, 0F3h` → `mov si, bx`
- Replace raw opcode (`db`) sequences with proper assembly mnemonics wherever possible
- Identify why raw opcodes were originally used:
  - Assembler limitations
  - Optimization tricks
  - Self-modifying code
  - Macro/workaround behavior
- Validate that rewritten instructions produce identical machine code
- Use disassembly tools to verify correctness of transformations
- Preserve behavior in edge cases (flags, segment overrides, etc.)
- Document any instructions that must remain as raw opcodes and explain why
- Establish guidelines for when `db` usage is acceptable vs prohibited
- Create automated or semi-automated process for opcode-to-mnemonic conversion (if feasible)
- Ensure final codebase is readable, maintainable, and assembler-friendly
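
A semi-automated conversion pass could start from a small table-driven decoder. The Python sketch below is a minimal illustration that covers only the register-only encodings from the examples above. `parse_db` and `decode` are hypothetical names, and anything the decoder does not recognize is left for manual review.

```python
REG16 = ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di")
REG8 = ("al", "cl", "dl", "bl", "ah", "ch", "dh", "bh")
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")

def parse_db(line: str) -> bytes:
    """Turn a MASM line such as 'db 8Bh, 0C8h' into raw bytes."""
    keyword, _, operands = line.strip().partition(" ")
    if keyword.lower() != "db":
        raise ValueError("not a db line")
    values = []
    for token in operands.split(","):
        token = token.strip().lower()
        # MASM hex literals carry an 'h' suffix; bare numbers are decimal.
        values.append(int(token[:-1], 16) if token.endswith("h") else int(token))
    return bytes(values)

def decode(code: bytes):
    """Return (mnemonic, has_shorter_form) for known encodings, else None."""
    if code == b"\xF3\xA4":
        return "rep movsb", False
    if len(code) >= 2 and code[1] >> 6 == 3:      # ModRM mod=11: register operand
        reg, rm = (code[1] >> 3) & 7, code[1] & 7
        if code[0] == 0x8B and len(code) == 2:    # mov r16, r/m16
            return f"mov {REG16[reg]}, {REG16[rm]}", False
        if code[0] == 0x32 and len(code) == 2:    # xor r8, r/m8
            return f"xor {REG8[reg]}, {REG8[rm]}", False
        if code[0] == 0x81 and len(code) == 4:    # group 1 r/m16, imm16
            imm = code[2] | code[3] << 8
            # An immediate that fits a sign-extended byte also has a shorter
            # 83h encoding, so the mnemonic alone may not reproduce the bytes.
            fits_imm8 = imm < 0x80 or imm >= 0xFF80
            return f"{GROUP1[reg]} {REG16[rm]}, {imm}", fits_imm8
    return None
```

The `has_shorter_form` flag marks cases such as `db 81h, 0FFh, 6, 0`. An assembler will typically encode `cmp di, 6` as the three-byte form `83h, 0FFh, 06h`, so that replacement changes the binary unless the assembler can be made to emit the long form. Register-to-register instructions also have an equal-length alternative opcode (for example `89h` versus `8Bh` for `mov`), so every replacement still needs a byte comparison after assembly.
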