siyamsarker/OSFlip

OSFlip

Reinstall the operating system on a live EC2 instance — without replacing the instance.

Version Tests Coverage Python 3.9+ MIT License AWS EC2 AES-256-GCM


What is OSFlip?

OSFlip is a command-line tool that swaps the root OS volume on a running EC2 instance and leaves everything else alone: the instance ID, Elastic IP, ENI, IAM role, tags, security groups, and every attached data volume stay exactly as they were.

The problem it solves: when you need to migrate an instance to a new OS (say, Ubuntu 20.04 → 22.04, or Amazon Linux 2 → Amazon Linux 2023), the usual options are either a destructive terminate-and-replace (you lose the instance identity) or a slow, error-prone in-place upgrade (you risk a broken system). OSFlip gives you a third path — a clean OS swap in minutes, with a safety snapshot as a rollback point and full resumability if anything is interrupted.



Key Capabilities

| Capability | Detail |
| --- | --- |
| Non-destructive OS swap | Only the root EBS volume is replaced; instance identity and all data volumes are preserved |
| Resumable state machine | Nine atomic, idempotent steps persisted after each transition |
| Multi-account support | AES-256-GCM encrypted credential store with OS keychain integration |
| Five auth methods | Access keys, role assumption, AWS SSO, instance profile, environment variables |
| Dry-run mode | Intercepts every planned AWS API call and prints the full operation list without executing |
| Pre-reinstall snapshot | Safety EBS snapshot created before root detach; a rollback point if anything goes wrong |
| Partial-success handling | If a data volume fails to re-attach, the instance stays running and the user retries only that step |
| Interactive TUI | Run osflip with no arguments for a guided menu: grouped actions, numbered shortcuts, instance/AMI pickers, and live progress on every AWS call |
| Machine-readable output | --output json on every list, status, and doctor command for scripting and CI |
| Structured JSON logging | Rotating log files with automatic credential redaction on every record |

How It Works

OSFlip manages the OS swap as a nine-step state machine. Progress is persisted to ~/.osflip/state.json after every step, so an interrupted run resumes from exactly where it stopped — never from scratch.

┌─────────────────────────────────────────────────────────────────────┐
│                        OSFlip Reinstall Flow                        │
├──────┬──────────────────────────────────────────────────────────────┤
│ Step │ Action                                                       │
├──────┼──────────────────────────────────────────────────────────────┤
│  1   │ Record the full EBS volume attachment map                    │
│  2   │ Stop the EC2 instance                                        │
│  3   │ Snapshot the current root volume (safety backup)             │
│  4   │ Detach the root volume — data volumes stay attached          │
│  5   │ Create a new root volume from the target AMI and attach it   │
│  6   │ Start the instance and wait for EC2 health checks            │
│  7   │ Re-attach all data volumes at their original device names    │
│  8   │ Verify every data volume is in the in-use state              │
│  9   │ Delete the old root volume (only on full success)            │
└──────┴──────────────────────────────────────────────────────────────┘

Every step is idempotent: if the process dies and is restarted, a step that already ran detects its own work and skips ahead safely. Data volumes are never modified by steps 1–6 — they are handled exclusively by steps 7 and 8, and even there they are only re-attached, never created or deleted.
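As a rough illustration of the persist-after-every-step idea, here is a minimal sketch in Python. The step names, the state-file layout, and the `run` helper are hypothetical and only mirror the table above; they are not OSFlip's actual internals.

```python
import json
from pathlib import Path

# Hypothetical step names mirroring the nine-step table above.
STEPS = [
    "record_attachments", "stop_instance", "snapshot_root", "detach_root",
    "attach_new_root", "start_instance", "reattach_data_volumes",
    "verify_data_volumes", "delete_old_root",
]


def load_state(path: Path) -> dict:
    """Return persisted progress, or a fresh state if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file and rename it, so a crash never leaves a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run(path: Path, handlers: dict) -> list:
    """Run every step not yet recorded as complete; return the steps executed now."""
    state = load_state(path)
    executed = []
    for name in STEPS:
        if name in state["completed"]:
            continue  # finished in an earlier run, skip ahead
        handlers[name]()  # may raise; earlier progress is already on disk
        state["completed"].append(name)
        save_state(path, state)
        executed.append(name)
    return executed
```

If a handler raises partway through, calling `run` again with the same state file picks up at the first step that was not recorded as complete.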


Requirements

  • Python 3.9 or later
  • An AWS IAM user or role with the permissions in docs/iam-policy.json
  • The target EC2 instance must be reachable (same account and region as the configured credentials)

IAM Permissions

Attach the policy in docs/iam-policy.json to the IAM user or role OSFlip will authenticate as. The policy is scoped to the minimum set of EC2, EBS, and STS actions required.

Core permission groups:

| Action group | Purpose |
| --- | --- |
| ec2:Describe* | Read instance, volume, snapshot, and AMI metadata |
| ec2:StartInstances, ec2:StopInstances | Instance lifecycle for reinstall steps 2 and 6 |
| ec2:CreateVolume, ec2:DeleteVolume | Create new root volume from AMI, delete old root on success |
| ec2:AttachVolume, ec2:DetachVolume | Volume attachment in steps 4, 5, 7 |
| ec2:CreateSnapshot, ec2:DeleteSnapshot | Safety snapshots and manual snapshot commands |
| ec2:ModifyInstanceAttribute | Restore delete-on-termination flags after volumes are re-attached. The policy conditions this action to the block-device-mapping attribute only, so it cannot be used to alter user data, security groups, or termination protection |
| ec2:CreateTags | Tag resources created by OSFlip (scoped to creation actions) |
| sts:GetCallerIdentity, sts:AssumeRole | Identity resolution and role assumption |

Installation

Recommended — pipx (isolated, no venv management):

pipx installs CLI tools in isolated Python environments so they don't interfere with your system packages.

Don't have pipx yet? Install it first

1. Install pipx

| OS / Distro | Command |
| --- | --- |
| Debian / Ubuntu | sudo apt install pipx |
| Fedora / RHEL 8+ / CentOS Stream | sudo dnf install pipx |
| Arch / Manjaro | sudo pacman -S python-pipx |
| openSUSE Tumbleweed / Leap | sudo zypper install python3-pipx |
| Alpine Linux | sudo apk add pipx |
| macOS | brew install pipx |
| Other | python3 -m pip install --user pipx |

2. Add pipx to your PATH

pipx ensurepath

Restart your shell (or source ~/.bashrc / source ~/.zshrc) for the change to take effect.

Once pipx is available, install OSFlip and confirm the version:

pipx install git+https://github.com/siyamsarker/OSFlip.git
osflip --version

From source (for development):

git clone https://github.com/siyamsarker/OSFlip.git
cd OSFlip
pip install -e ".[dev]"
osflip --version

Updating

pipx install:

pipx upgrade osflip
osflip --version

From source:

cd OSFlip
git pull origin main
pip install -e ".[dev]"
osflip --version

Uninstalling

pipx install:

pipx uninstall osflip

From source:

pip uninstall osflip

OSFlip does not remove its data directory on uninstall. To clean up application data as well:

rm -rf ~/.osflip

Warning: ~/.osflip/ contains your encrypted credential store and logs. Back up or export your account credentials before deleting it.


Quick Start

# First-run wizard — add first account and print next steps
osflip init

# Run preflight checks (config, STS identity, IAM permissions, in-progress ops)
osflip doctor

# Add an AWS account (interactive prompts guide you through auth method selection)
osflip account add

# Verify the credentials resolve correctly against STS
osflip account test

# List all running and stopped instances in the active account
osflip instance list

# List instances across all configured accounts
osflip --all-accounts instance list

# List instances and emit machine-readable JSON
osflip --output json instance list

# Launch the interactive menu (no subcommand needed)
osflip

# Reinstall the OS interactively — prompts for instance and AMI selection
osflip reinstall

# Reinstall non-interactively
osflip reinstall --instance i-0abc1234def567890 --ami ami-0abcdef1234567890 --force

# Preview every planned AWS API call without making any changes
osflip reinstall --instance i-0abc1234def567890 --ami ami-0abcdef1234567890 --dry-run

# Resume an interrupted reinstall
osflip reinstall --instance i-0abc1234def567890 --resume

# Retry failed data-volume re-attachments from a partial-success state
osflip reinstall --instance i-0abc1234def567890 --retry-volumes

# Roll back to the previous OS from the safety snapshot
osflip reinstall --instance i-0abc1234def567890 --rollback

# View all tracked reinstall operations with status and next-action hints
osflip status

# Batch reinstall multiple instances from a TOML manifest
osflip batch reinstall manifest.toml

Authentication

OSFlip stores credentials in ~/.osflip/credentials.enc — encrypted with AES-256-GCM using a PBKDF2-HMAC-SHA256 derived key (600,000 iterations per OWASP 2023). The master password can be stored in the OS keychain to avoid prompts on every run.

| Method | Description |
| --- | --- |
| access_key | AWS access key ID + secret access key stored in the encrypted credential file |
| assume_role | Calls sts:AssumeRole using the configured source account's credentials (or the ambient chain); supports external IDs and MFA-protected roles |
| sso | AWS IAM Identity Center; uses the AWS CLI profile named after the account alias (aws configure sso --profile <alias>, then aws sso login --profile <alias>) |
| instance_profile | EC2 IMDS; no credentials needed when running directly on an EC2 instance |
| env | AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, for CI/CD pipelines |

# Add an account interactively (guided prompts for each auth method)
osflip account add

# Switch the active account for subsequent commands
osflip account switch staging

# Use a specific account for a single command
osflip --account prod instance list

# Use a specific region for a single command
osflip --region eu-west-1 instance list
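The key-derivation parameters quoted at the top of this section (PBKDF2-HMAC-SHA256, 600,000 iterations, a 256-bit key) can be reproduced with the Python standard library. This is a minimal sketch only: the salt length and the `derive_key` helper are assumptions, and the AES-256-GCM step itself is left out because it needs a third-party package and this README does not say which one OSFlip uses.

```python
import hashlib
import os

ITERATIONS = 600_000  # iteration count quoted in this section
KEY_BYTES = 32        # 256-bit key, the size AES-256 requires
SALT_BYTES = 16       # assumption: the README does not state the salt length


def derive_key(master_password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from the master password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        salt,
        ITERATIONS,
        dklen=KEY_BYTES,
    )


# A fresh random salt per credential file means the same password
# produces a different key for every store.
salt = os.urandom(SALT_BYTES)
key = derive_key("example master password", salt)
```

The same password and salt always yield the same key, which is why only the salt (never the key) would need to sit alongside the ciphertext.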

Configuration

OSFlip reads ~/.osflip/config.toml on startup and creates it with defaults if it does not exist. Every setting can be overridden with an OSFLIP_ environment variable using double underscores for nested keys.

[global]
default_account               = "prod"          # active account alias
default_region                = "us-east-1"
auto_snapshot                 = true            # snapshot a volume before deleting it
confirm_destructive           = true            # prompt before destructive operations
dry_run                       = false
log_level                     = "INFO"          # DEBUG | INFO | WARNING | ERROR
parallel                      = 4               # worker threads for --all-accounts listings
max_retries                   = 7               # retry budget for throttled AWS calls
api_timeout_seconds           = 30              # boto3 read timeout per API call
protected_instance_tag_key    = "osflip:protected"  # EC2 tag key that marks an instance as protected

[reinstall]
snapshot_before_reinstall     = true
delete_old_volume_on_success  = true
wait_timeout_seconds          = 600             # max seconds to wait per state transition
health_check_retries          = 20              # EC2 health check poll attempts
health_check_interval_seconds = 15              # seconds between polls
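With the defaults above, health-check polling is bounded at roughly 20 attempts spaced 15 seconds apart, or about five minutes. A bounded polling loop of that shape might look like the sketch below; `poll_until` and its parameters are hypothetical names, not OSFlip's API.

```python
import time


def poll_until(check, retries=20, interval=15.0, sleep=time.sleep):
    """Call check() up to `retries` times; return the attempt number that succeeded.

    Sleeps `interval` seconds between attempts (not after the last one) and
    raises TimeoutError if every attempt returns a falsy value.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(interval)
    raise TimeoutError(f"condition not met after {retries} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.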

Protected instances:

Any EC2 instance carrying a tag whose key matches global.protected_instance_tag_key (default: osflip:protected) with a truthy value (true, 1, yes, on) requires the operator to type the instance name before a reinstall proceeds. This safeguards critical instances against accidental OS replacement. Because the tag is an explicit per-resource opt-in, the check cannot be bypassed with --force or by disabling confirm_destructive — remove the tag to reinstall the instance non-interactively.
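The tag check described above can be sketched as follows. The tag list uses the `[{"Key": ..., "Value": ...}]` shape that boto3 returns for EC2 tags; the function names and the case-insensitive comparison are assumptions made for illustration.

```python
TRUTHY_VALUES = {"true", "1", "yes", "on"}


def is_protected(tags, tag_key="osflip:protected"):
    """Return True if the instance carries the protection tag with a truthy value."""
    for tag in tags:
        value = str(tag.get("Value", "")).strip().lower()  # assumption: case-insensitive
        if tag.get("Key") == tag_key and value in TRUTHY_VALUES:
            return True
    return False


def confirmation_matches(instance_name, typed):
    """The operator must type the instance name exactly before the reinstall proceeds."""
    return typed == instance_name
```

For example, `is_protected([{"Key": "osflip:protected", "Value": "yes"}])` is true, while a value of `"no"` or a missing tag is not.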

Multi-account & output:

Pass --all-accounts to fan instance list and snapshot list out across every configured account in a single command. Accounts are queried concurrently (bounded by the parallel setting), and a failure in one account never blocks the others. For volume list, the flag prints a warning and proceeds against the active account only, since volumes are instance-scoped. Pass --output json (or -o json) to any list, status, or doctor command for machine-readable output suitable for scripting and CI pipelines. In fan-out mode, each record includes an Account key identifying its source account.
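A bounded, failure-isolated fan-out of this kind can be sketched with a thread pool. `fan_out` and `list_fn` are hypothetical names; only the `Account` key and the `parallel` bound come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(accounts, list_fn, parallel=4):
    """Query every account concurrently; collect records and per-account errors.

    list_fn(alias) is a caller-supplied function returning a list of dicts.
    An exception in one account is recorded and does not stop the others.
    """
    records, errors = [], {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(list_fn, alias): alias for alias in accounts}
        for future in as_completed(futures):
            alias = futures[future]
            try:
                for record in future.result():
                    records.append({**record, "Account": alias})
            except Exception as exc:  # isolate the failure to this account
                errors[alias] = str(exc)
    return records, errors
```

Results arrive in completion order, so a caller that needs stable output would sort the records before printing them.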

Recovery:

Every reinstall persists its progress, so an interrupted or failed run is never a dead end:

  • osflip reinstall --resume continues an interrupted or failed reinstall from its last recorded step.
  • osflip reinstall --retry-volumes re-attaches only the data volumes that failed after a partial success.
  • osflip reinstall --rollback restores the previous OS from the safety snapshot.

Run osflip status to view all tracked operations with their current status, last completed step, timestamps, and the recommended next action.

Environment variable overrides:

OSFLIP_GLOBAL__LOG_LEVEL=DEBUG
OSFLIP_REINSTALL__WAIT_TIMEOUT_SECONDS=900
OSFLIP_REINSTALL__SNAPSHOT_BEFORE_REINSTALL=false
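The double-underscore convention maps each variable onto a nested key, so `OSFLIP_REINSTALL__WAIT_TIMEOUT_SECONDS` targets `wait_timeout_seconds` in the `[reinstall]` table. A minimal sketch of that mapping, with a hypothetical `env_overrides` helper and values left as strings (type coercion is assumed to happen in the settings schema):

```python
def env_overrides(environ, prefix="OSFLIP_"):
    """Turn OSFLIP_SECTION__KEY=value pairs into a nested dict of overrides."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        path = name[len(prefix):].lower().split("__")
        if len(path) < 2 or not all(path):
            continue  # e.g. OSFLIP_ASCII is a display toggle, not a nested setting
        node = overrides
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value  # still a string at this point
    return overrides
```

Called with `os.environ`, this yields a dict such as `{"global": {"log_level": "DEBUG"}}` that can be merged over the values read from config.toml.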

Terminal display:

OSFlip auto-detects terminal capabilities (colour support and UTF-8). You can override the defaults:

| Variable | Example | Effect |
| --- | --- | --- |
| NO_COLOR | NO_COLOR=1 | Disable all colour output (no-color.org convention) |
| OSFLIP_NO_UNICODE | OSFLIP_NO_UNICODE=1 | Use ASCII glyphs instead of Unicode symbols |
| OSFLIP_ASCII | OSFLIP_ASCII=1 | Force ASCII glyphs and disable colour (maximum compatibility) |

NO_COLOR takes effect when set to any value (including empty); the OSFLIP_* toggles activate on a truthy value (1, true, yes, or on).
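The precedence of these toggles can be expressed in a few lines. This is a sketch under the rules stated above; `display_settings` and its arguments are hypothetical, and the auto-detected capabilities are passed in rather than probed.

```python
TRUTHY = {"1", "true", "yes", "on"}


def _truthy(environ, name):
    """OSFLIP_* toggles activate only on a truthy value."""
    return environ.get(name, "").strip().lower() in TRUTHY


def display_settings(environ, color_supported=True, utf8_supported=True):
    """Combine detected terminal capabilities with the environment overrides."""
    ascii_only = _truthy(environ, "OSFLIP_ASCII")
    # NO_COLOR counts when present at all, even with an empty value.
    use_color = color_supported and "NO_COLOR" not in environ and not ascii_only
    use_unicode = (
        utf8_supported
        and not ascii_only
        and not _truthy(environ, "OSFLIP_NO_UNICODE")
    )
    return {"color": use_color, "unicode": use_unicode}
```

Note the asymmetry: `NO_COLOR=""` disables colour, whereas `OSFLIP_NO_UNICODE=0` leaves Unicode glyphs enabled.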


Available Commands

osflip [--account ALIAS] [--region REGION] [--all-accounts] [--output {table,json}] [--dry-run] [--log-level LEVEL]
│
├── init          — Guided first-run wizard: add first account and print next steps
│
├── doctor        — Preflight checks: config, STS identity, region connectivity,
│                   IAM permission simulation, in-progress operations
│                   (exits non-zero on any failure; supports --output json)
│
├── status        — List all tracked reinstall operations with status, step,
│                   timestamps, and next-action hint (read-only; supports --output json)
│
├── account
│   ├── add       — Add a new AWS account credential (interactive)
│   ├── list      — List all configured accounts
│   ├── remove    — Remove an account from the store
│   ├── switch    — Set the active account for subsequent commands
│   └── test      — Verify credentials via sts:GetCallerIdentity
│
├── instance
│   ├── list      — List instances (with --name and --state filters)
│   ├── stop      — Stop a running instance
│   ├── start     — Start a stopped instance
│   └── reboot    — Send a reboot command
│
├── reinstall     — Full OS swap state machine
│   ├── --instance      Target EC2 instance ID
│   ├── --ami           New AMI ID (or select interactively)
│   ├── --os            OS family for AMI search (default: ubuntu)
│   ├── --arch          AMI architecture (default: x86_64)
│   ├── --dry-run       Preview API calls without executing
│   ├── --force / -y    Skip confirmation prompts
│   ├── --no-snapshot   Skip the pre-reinstall safety snapshot
│   ├── --no-delete     Keep the old root volume after success
│   ├── --resume        Resume an interrupted reinstall
│   ├── --retry-volumes Retry failed data-volume reattachment
│   └── --rollback      Restore the previous OS from the safety snapshot
│
├── batch
│   └── reinstall <manifest.toml>   — Sequential multi-instance reinstall from a TOML manifest
│       ├── --force / -y    Skip batch confirmation
│       ├── --dry-run       Preview without executing
│       └── --output json   Machine-readable summary (exits non-zero if any entry failed)
│
├── volume
│   ├── list      — List volumes attached to an instance
│   ├── snapshot  — Create a snapshot of a volume
│   ├── attach    — Attach a volume to an instance
│   ├── detach    — Detach a volume from an instance
│   └── delete    — Delete a volume (with confirmation)
│
├── ami
│   ├── list      — List private AMIs in the account
│   └── search    — Search public AMIs by OS family and architecture
│
├── snapshot
│   ├── list      — List EBS snapshots
│   ├── create    — Create a snapshot of a volume
│   └── delete    — Delete a snapshot (with confirmation)
│
├── logs
│   ├── view      — Display recent log entries (--lines, --level, --account, --resource)
│   └── tail      — Follow the log file in real time (Ctrl+C to stop)
│
└── config
    ├── get       — Print a single setting by dot-path (e.g. global.dry_run)
    ├── set       — Set a configuration value
    ├── reset     — Reset to defaults
    └── validate  — Validate the config file and print the resolved settings

Run osflip with no arguments to open the interactive main menu.


Batch Reinstall Manifest

osflip batch reinstall <manifest.toml> reads a TOML manifest and reinstalls each instance sequentially. Each [[instance]] table requires id and ami; no_snapshot and no_delete are optional booleans (both default to false).

[[instance]]
id  = "i-0123456789abcdef0"
ami = "ami-0123456789abcdef0"

[[instance]]
id  = "i-0fedcba9876543210"
ami = "ami-0fedcba9876543210"
no_snapshot = false
no_delete   = false

The command exits non-zero if any entry failed. Pass --output json to receive the per-instance summary in machine-readable form, or --dry-run to preview the operations without executing them.
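Before submitting a manifest, it can be useful to check it against the rules above (required id and ami, optional boolean flags defaulting to false). The validator below is a hypothetical sketch that works on an already-parsed manifest dict; it is not OSFlip's own validation code.

```python
def validate_manifest(manifest):
    """Check a parsed manifest and return entries with defaults filled in."""
    entries = manifest.get("instance")
    if not isinstance(entries, list) or not entries:
        raise ValueError("manifest needs at least one [[instance]] table")
    normalised = []
    for index, entry in enumerate(entries, start=1):
        for field in ("id", "ami"):
            if not isinstance(entry.get(field), str) or not entry[field]:
                raise ValueError(f"instance #{index}: missing required field '{field}'")
        item = {"id": entry["id"], "ami": entry["ami"]}
        for flag in ("no_snapshot", "no_delete"):
            value = entry.get(flag, False)  # both flags default to false
            if not isinstance(value, bool):
                raise ValueError(f"instance #{index}: '{flag}' must be a boolean")
            item[flag] = value
        normalised.append(item)
    return normalised
```

On Python 3.11 or later the input dict can come from `tomllib.load`; older interpreters need a separate TOML parser.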


Safety Guarantees

| Guarantee | Implementation |
| --- | --- |
| Root device name preserved | New root attached at the exact device name read from the running instance, not from the AMI, which prevents boot failure after a prior migration |
| Data volumes never deleted | No non-root volume is ever detached, deleted, or modified during any reinstall step |
| Data volumes auto-reattach | All data volumes are re-attached at their original device names after the new OS boots, with original delete-on-termination flags preserved |
| Pre-reinstall snapshot | Safety EBS snapshot of the old root taken before it is detached; provides a recovery point in all cases |
| Old root deleted only on full success | The replaced root volume is deleted only after the new OS passes EC2 health checks and all data volumes are confirmed in-use |
| Partial success handled gracefully | If any data volume fails to re-attach, the instance is left running and the user is prompted to retry; the old root volume is not deleted |
| Confirmation prompts | Every destructive operation asks for confirmation (default No) unless --force is passed or confirm_destructive is disabled in config |
| Protected-instance check | Instances tagged as protected require typing the instance name to confirm; this check cannot be bypassed by --force or configuration |
| Resumable state machine | If the process is killed at any step, the next run resumes from the last persisted step, with no duplicate destructive actions |

Security

  • Encrypted credentials — the credential file is AES-256-GCM encrypted (mode 0600). The master password is never stored on disk by OSFlip itself; it can optionally be cached in the OS keychain.
  • Credential redaction — the logging layer replaces known credential field names and AWS access key patterns with ***REDACTED*** on every log record and in dry-run output.
  • No shell=True — all subprocess calls use list arguments.
  • Short-lived key material — derived encryption keys exist only for the duration of a single encrypt/decrypt call and are never written to disk.
  • Resource-scoped IAM — the provided IAM policy restricts volume, snapshot, and instance actions to specific resource ARNs, and ec2:CreateTags only to creation actions.
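The credential-redaction bullet above can be illustrated with a small recursive filter. The field-name list is a hypothetical example, and the pattern covers only access key IDs with the AKIA and ASIA prefixes; OSFlip's actual rules may be broader.

```python
import re

REDACTED = "***REDACTED***"

# AWS access key IDs: a four-letter prefix followed by 16 uppercase alphanumerics.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Hypothetical field names treated as always-sensitive.
SENSITIVE_FIELDS = {"secret_access_key", "aws_secret_access_key", "session_token", "password"}


def redact(value):
    """Return a copy of a log payload with credential material masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key.lower() in SENSITIVE_FIELDS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return ACCESS_KEY_RE.sub(REDACTED, value)
    return value
```

Applying the filter to the whole record, rather than to individual messages, means nested dicts and lists are covered as well.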

Architecture

osflip/
├── cli/                     ← Typer commands, Rich tables, interactive menus
│   └── commands/            ← one file per subcommand group
├── core/                    ← business logic (pure Python, no AWS imports)
│   ├── instance_manager.py
│   ├── volume_manager.py
│   ├── ami_manager.py
│   ├── snapshot_manager.py
│   └── reinstall/           ← nine-step state machine
│       ├── state.py         ← ReinstallState dataclass + JSON persistence
│       ├── steps.py         ← one function per step (idempotent, resumable)
│       └── orchestrator.py  ← drives the state machine, emits progress events
├── aws/                     ← boto3 session factory, retry middleware, typed wrappers
│   ├── session.py           ← builds boto3 Sessions for all 5 auth methods
│   ├── middleware.py        ← dry-run interceptor (records planned calls)
│   └── ec2.py / sts.py      ← typed EC2 + STS wrapper functions
├── credentials/             ← AES-256-GCM encrypted store
│   ├── encryption.py        ← encrypt / decrypt with PBKDF2-HMAC-SHA256
│   ├── keychain.py          ← OS keychain integration (macOS Keychain, etc.)
│   ├── models.py            ← Pydantic Account / CredentialStore models
│   └── store.py             ← load / save / add / remove operations
├── config/                  ← TOML config + Pydantic settings schema
├── log/                     ← rotating JSON log handler with redaction
├── state/                   ← operation-state persistence for resume support
└── utils/                   ← formatting, validators, sanitisation

Data flow for a reinstall:

CLI command (reinstall.py)
  └─► ReinstallOrchestrator.run()
        └─► step_*(account, state, region)   [steps.py]
              └─► volume_manager / instance_manager   [core/]
                    └─► ec2_api.*()   [aws/ec2.py]
                          └─► boto3 client   [aws/session.py]
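The dry-run interceptor listed under aws/middleware.py sits on this call path. One way such an interceptor could work is a thin proxy around the client that records calls instead of sending them. The class below is a hypothetical sketch, and letting read-only `describe_*` calls pass through is an assumption rather than documented behaviour.

```python
class DryRunRecorder:
    """Wrap a client; in dry-run mode, record mutating calls instead of executing them."""

    def __init__(self, client, dry_run, read_only_prefixes=("describe_",)):
        self._client = client
        self._dry_run = dry_run
        self._read_only_prefixes = tuple(read_only_prefixes)
        self.planned_calls = []

    def __getattr__(self, name):
        # Only reached for names not set in __init__, i.e. client methods.
        target = getattr(self._client, name)
        if not self._dry_run or name.startswith(self._read_only_prefixes):
            return target  # real call

        def planned(**kwargs):
            self.planned_calls.append((name, kwargs))
            return {}  # nothing was sent to AWS

        return planned
```

After a dry run, `planned_calls` holds the ordered list of operations that would have been issued, ready to print.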

Development

git clone https://github.com/siyamsarker/OSFlip.git
cd OSFlip
pip install -e ".[dev]"
osflip --version

See CONTRIBUTING.md for the full development guide — coding standards, testing requirements, coverage thresholds, commit convention, and PR process.


License

MIT — Copyright 2026 Siyam Sarker
