A production-ready, secure Telegram bot for triggering deployments to staging and production environments, with role-based access control, audit logging, real-time log streaming, concurrent health checks, deploy locking, subprocess timeouts, and auto-rollback.
TELEGRAM DEPLOYMENT BOT

Developer (Telegram)
       │
       │  /deploy production
       ▼
┌─────────────┐   RBAC    ┌──────────────────┐  Audit Log
│ Bot Handler │ ────────► │ Role Check       │ ─────────► File/S3
│ (PTB)       │           │ (admin_ids list) │
└─────────────┘           └────────┬─────────┘
                                   │ Authorized
                          ┌────────▼─────────┐
                          │ Deploy Lock      │ ← prevents double-
                          │ (_deploying set) │   deploy race cond.
                          └────────┬─────────┘
                                   │ Lock acquired
                          ┌────────▼─────────┐
                          │ Inline Confirm   │
                          │ (commit hash)    │
                          └────────┬─────────┘
                                   │ Confirmed
                          ┌────────▼─────────┐
                          │ DeploymentManager│
                          │ subprocess exec  │
                          │ + timeout guard  │
                          └────────┬─────────┘
              ┌────────────────────┼────────────────────┐
              ▼                    ▼                    ▼
        ┌───────────┐     ┌──────────────────┐   ┌─────────────┐
        │ Git Pull  │     │ Docker Build     │   │ Push to ECR │
        └───────────┘     └──────────────────┘   └──────┬──────┘
                                                        │
                                   ┌────────────────────┘
                                   ▼
                          ┌──────────────────┐
                          │ Health Check     │ ← state files written only
                          │ (retry loop)     │   AFTER this passes
                          └────────┬─────────┘
                                   │
                            Pass   │   Fail
                    ┌──────────────┴──────────────┐
                    ▼                             ▼
            ┌───────────────┐            ┌─────────────────┐
            │ Notify user   │            │ Auto-Rollback   │
            │ Release lock  │            │ Notify user     │
            └───────────────┘            │ Release lock    │
                                         └─────────────────┘
telegram-deploy-bot/
│
├── bot/                      # Python bot source
│   ├── bot.py                # Entry point, command handlers, deploy lock
│   ├── config.py             # Lazy classmethod config (all values read at call time)
│   ├── rbac.py               # Role-based access control decorator
│   ├── audit_logger.py       # Structured audit log (JSON Lines)
│   ├── deployment.py         # Deployment orchestration + subprocess timeout
│   └── requirements.txt      # Runtime Python dependencies
│
├── scripts/                  # Shell scripts (the actual deploy work)
│   ├── deploy.sh             # Full deployment pipeline
│   └── rollback.sh           # Rollback to previous image
│
├── terraform/                # AWS infrastructure as code
│   ├── main.tf               # EC2 + ECR + IAM + VPC + OIDC
│   └── destroy.sh            # Safe teardown of all AWS resources
│
├── docs/                     # Documentation
│   ├── INSTALLATION.md       # Step-by-step installation guide
│   └── BENEFITS.md           # Why use this bot
│
├── nginx/                    # Reverse proxy (webhook mode)
│   └── nginx.conf
│
├── monitoring/               # Prometheus config
│   └── prometheus.yml
│
├── .github/
│   └── workflows/
│       └── ci-cd.yml         # GitHub Actions CI/CD pipeline
│
├── Dockerfile                # Multi-stage Docker build for the bot
├── docker-compose.yml        # Run the bot + supporting services
├── requirements-dev.txt      # Pinned dev + test dependencies
├── .env.example              # Environment variable template
├── .secrets.baseline         # detect-secrets baseline (committed)
├── pytest.ini                # Pytest configuration
└── README.md
Full step-by-step installation instructions are in docs/INSTALLATION.md.
Prerequisites:
- An AWS account: console.aws.amazon.com (free tier works)
- A GitHub account: github.com
- Terraform ≥ 1.5 (install guide)
- The Telegram app and a bot token from @BotFather
GitHub Actions secrets required. Set these under Settings → Secrets and variables → Actions:
| Secret | Source |
|---|---|
| `TELEGRAM_BOT_TOKEN` | From @BotFather |
| `TELEGRAM_CHAT_ID` | Your Telegram user ID (from @userinfobot) |
| `ECR_REGISTRY` | `terraform output ecr_registry` |
| `AWS_DEPLOY_ROLE_ARN` | `terraform output deploy_role_arn` |
| `STAGING_SSH_KEY` | Contents of `~/.ssh/deploy_key` |
| `PRODUCTION_SSH_KEY` | Contents of `~/.ssh/deploy_key` (same file) |
| `STAGING_HOST` | `terraform output staging_ip` |
| `PRODUCTION_HOST` | `terraform output production_ip` |
| `STAGING_HEALTH_URL` | `http://<staging-ip>/health` |
| `PRODUCTION_HEALTH_URL` | `http://<production-ip>/health` |
| Command | Role Required | Description |
|---|---|---|
| `/start` or `/help` | Any authorized | Show available commands |
| `/deploy staging` | Staging | Deploy develop branch to staging |
| `/deploy production` | Admin | Deploy main branch to production (requires confirmation) |
| `/rollback staging` | Admin | Rollback staging to the previous image |
| `/rollback production` | Admin | Rollback production to the previous image |
| `/status` | Staging | Show health and deployed commit for all environments |
All configuration is read from environment variables at call time, never frozen at import time. Copy .env.example to .env to get started.
| Variable | Required | Default | Description |
|---|---|---|---|
| `TELEGRAM_BOT_TOKEN` | Yes | - | Bot token from @BotFather |
| `ADMIN_TELEGRAM_IDS` | Yes | - | Comma-separated admin user IDs |
| `STAGING_TELEGRAM_IDS` | Yes | - | Comma-separated staging user IDs |
| `REGISTRY_URL` | Yes | - | ECR registry URL |
| `REGISTRY_IMAGE` | No | `myapp` | Docker image name |
| `AWS_REGION` | No | `us-east-1` | AWS region for ECR auth |
| `STAGING_HOST` | Yes | - | Staging server IP/hostname |
| `PRODUCTION_HOST` | Yes | - | Production server IP/hostname |
| `DEPLOY_USER` | No | `deploy` | SSH user on target servers |
| `SSH_KEY_PATH` | No | `/app/secrets/deploy_key` | Path to SSH deploy key |
| `STAGING_HEALTH_URL` | Yes | - | Health check endpoint for staging |
| `PRODUCTION_HEALTH_URL` | Yes | - | Health check endpoint for production |
| `HEALTH_CHECK_TIMEOUT` | No | `30` | Seconds per health check request |
| `HEALTH_CHECK_RETRIES` | No | `5` | Number of health check retries |
| `DEPLOY_TIMEOUT_SECONDS` | No | `600` | Max seconds before deploy is killed |
| `USE_KUBERNETES` | No | `false` | Use kubectl instead of Docker Compose |
| `KUBE_NAMESPACE` | No | `default` | Kubernetes namespace |
| `AUDIT_LOG_PATH` | No | `/var/log/deploybot/audit.log` | Audit log file path |
| `GITHUB_BRANCH_STAGING` | No | `develop` | Branch deployed to staging |
| `GITHUB_BRANCH_PRODUCTION` | No | `main` | Branch deployed to production |
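To illustrate what "read at call time" means in practice, here is a minimal sketch of a lazy classmethod config. Only `Config.github_branch_production()` is named in this README; the other method names are hypothetical, and the real `bot/config.py` may be organized differently.

```python
import os


class Config:
    """Sketch: each accessor reads os.environ when it is called, not at import."""

    @classmethod
    def github_branch_production(cls) -> str:
        # Default taken from the configuration table.
        return os.environ.get("GITHUB_BRANCH_PRODUCTION", "main")

    @classmethod
    def deploy_timeout_seconds(cls) -> int:
        # Hypothetical accessor name; default taken from the configuration table.
        return int(os.environ.get("DEPLOY_TIMEOUT_SECONDS", "600"))

    @classmethod
    def admin_ids(cls) -> set[int]:
        # Hypothetical accessor name; parses "123,456" into {123, 456}.
        raw = os.environ.get("ADMIN_TELEGRAM_IDS", "")
        return {int(part) for part in raw.split(",") if part.strip()}
```

Because nothing is cached, a test (or an operator) can change an environment variable and the next call reflects the new value without re-importing the module.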
ADMIN   → full access: production deploy, rollback, staging, /status
          set via: ADMIN_TELEGRAM_IDS=123456789,987654321
STAGING → limited access: staging deploy + /status only
          set via: STAGING_TELEGRAM_IDS=111222333
Roles are enforced by the @require_role decorator on every handler. The admin role is re-verified on every callback button press, so buttons cannot be replayed by unauthorized users.
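A role decorator of this kind could be sketched as follows. This is an illustration under assumptions, not the contents of `bot/rbac.py`: the helper names `roles_for` and `fake_update` are hypothetical, the denial branch only returns `None` where the real bot replies and audits, and the fake update object stands in for a python-telegram-bot `Update`.

```python
import asyncio
import functools
import os
from types import SimpleNamespace


def _ids_from_env(variable: str) -> set[int]:
    raw = os.environ.get(variable, "")
    return {int(part) for part in raw.split(",") if part.strip()}


def roles_for(user_id: int) -> set[str]:
    """Hypothetical helper: admins implicitly hold the staging role too."""
    roles: set[str] = set()
    if user_id in _ids_from_env("STAGING_TELEGRAM_IDS"):
        roles.add("staging")
    if user_id in _ids_from_env("ADMIN_TELEGRAM_IDS"):
        roles.update({"admin", "staging"})
    return roles


def require_role(role: str):
    """Wrap an async handler so it only runs for users holding `role`."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(update, context):
            user_id = update.effective_user.id
            if role not in roles_for(user_id):
                # The real bot would send a denial message and audit it here.
                return None
            return await handler(update, context)
        return wrapper
    return decorator


@require_role("admin")
async def deploy_production(update, context):
    return "deploying production"


def fake_update(user_id: int) -> SimpleNamespace:
    # Stand-in for a Telegram update, for demonstration only.
    return SimpleNamespace(effective_user=SimpleNamespace(id=user_id))


if __name__ == "__main__":
    os.environ["ADMIN_TELEGRAM_IDS"] = "1"
    print(asyncio.run(deploy_production(fake_update(1), None)))
```

Since the ID lists are read inside the wrapper on each call, applying the same decorator to callback handlers gives the per-press re-verification described above.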
A module-level _deploying: set[str] prevents two concurrent deploys to the same environment. If an admin double-taps "Confirm" or a callback is replayed while a deploy is running, the second request is rejected immediately. The lock is released in a try/finally block so it is always freed, even if an unexpected exception occurs.
# DANGEROUS: shell injection possible
subprocess.run(f"deploy.sh {user_input}", shell=True)
# SAFE: fixed argument list, no shell interpolation
await asyncio.create_subprocess_exec("/app/scripts/deploy.sh", environment, commit)

The environment and commit hash are additionally validated against strict allow-lists before reaching the subprocess call.
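A validation step of that kind might look like the sketch below. The function name and the exact commit pattern (7 to 40 lowercase hex characters) are assumptions for illustration; the project's actual rules may be stricter.

```python
import re

# Assumed allow-list: the two environments this README describes.
ALLOWED_ENVIRONMENTS = frozenset({"staging", "production"})
# Assumed format: abbreviated or full lowercase hex SHA.
COMMIT_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def validate_deploy_args(environment: str, commit: str) -> tuple[str, str]:
    """Return the arguments unchanged, or raise ValueError if either is unsafe."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if COMMIT_SHA_RE.fullmatch(commit) is None:
        raise ValueError(f"invalid commit hash: {commit!r}")
    return environment, commit
```

Using `fullmatch` rather than `match` matters: it rejects a valid-looking prefix followed by extra characters such as `abc1234; rm -rf /`.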
Every deploy and rollback subprocess is wrapped in asyncio.timeout(DEPLOY_TIMEOUT_SECONDS). If deploy.sh hangs (SSH timeout, docker build stall, network issue), the process is killed and an error is streamed back to the user. The bot never hangs indefinitely.
The audit log writes core fields (timestamp, user_id, action) after spreading arbitrary metadata, so no metadata key can silently overwrite the forensic trail. Every action (deploy started, deploy success, deploy failed, rollback, denial) is recorded with user identity, environment, commit, and UTC timestamp.
CI/CD deploy steps use trap 'rm -f /tmp/deploy_key' EXIT to guarantee the private key is deleted from the runner filesystem even if the SSH command fails.
User → /deploy production
  │
  ▼
1. RBAC check → not admin? Denied + audited
  │ admin
  ▼
2. Check deploy lock → env already deploying? Rejected
  │ lock free
  ▼
3. Fetch latest commit from Config.github_branch_production()
  │
  ▼
4. Confirmation dialog (commit hash shown)
  │ Confirm clicked
  ▼
5. Re-verify admin role on callback
  │
  ▼
6. Acquire deploy lock for environment
  │
  ▼
7. Audit log: { user, action=deploy_started, env, commit, timestamp }
  │
  ▼
8. Run deploy.sh production <commit> (timeout: DEPLOY_TIMEOUT_SECONDS)
  ├── Validate inputs (whitelist env, validate commit SHA format)
  ├── git fetch + checkout + pull origin main
  ├── docker build --no-cache (image tagged with exact commit)
  ├── aws ecr get-login-password | docker login
  ├── docker push → ECR
  ├── Save previous image ref for rollback
  ├── ssh deploy@host → docker compose up -d
  └── Health check (10 retries × 10s)
        │
        ├── PASS → write state files (commit + timestamp)
        │          → audit log deploy_success
        │          → notify user
        │          → release deploy lock
        │
        └── FAIL → audit log deploy_failed
                   notify user
                   run rollback.sh (with timeout + streaming)
                   audit log auto_rollback_completed/failed
                   notify user with rollback result
                   release deploy lock
Why state files are written after the health check: If deploy.sh exits with code 1 (health check failed) and the bot triggers a rollback, rollback.sh reads the previous image ref to revert to. Writing state files before the health check would record a broken deployment as the last known-good state, so the rollback would restore the broken image. State files are therefore written only after a successful health check confirms the deployment is live and healthy.
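The pass/fail branch of step 8 can be sketched on the bot side as below. The audit action names are the ones listed in the flow; the function name, its injected callables, and the returned strings are assumptions, with each callable standing in for running a script and returning its exit code.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def deploy_with_auto_rollback(
    run_deploy: Callable[[], Awaitable[int]],
    run_rollback: Callable[[], Awaitable[int]],
    audit: Callable[[str], None],
) -> str:
    """Run the deploy; on a non-zero exit code, roll back and audit both outcomes."""
    if await run_deploy() == 0:
        audit("deploy_success")
        return "deployed"
    audit("deploy_failed")
    if await run_rollback() == 0:
        audit("auto_rollback_completed")
        return "rolled_back"
    audit("auto_rollback_failed")
    return "rollback_failed"


if __name__ == "__main__":
    async def _exit_zero() -> int:
        return 0

    print(asyncio.run(deploy_with_auto_rollback(_exit_zero, _exit_zero, print)))
```

In the real bot this would run inside the deploy lock's try/finally, so the lock is released whichever branch is taken.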
# Install runtime + dev dependencies
pip install -r bot/requirements.txt
pip install -r requirements-dev.txt
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=bot --cov-report=term-missing

95 tests across 5 test files, covering:
- Config lazy evaluation and env-change reflection
- RBAC allow/deny logic and HTML parse mode on denial messages
- Deploy lock acquisition, rejection, and guaranteed release
- Deployment streaming, error detection, and subprocess timeout
- Concurrent health checks via asyncio.gather()
- Audit log integrity (metadata cannot overwrite core fields)
- Auto-rollback triggering on deploy failure
- Callback security (re-verification, double-confirm rejection)
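As an illustration of the concurrent health checks mentioned in the list, the sketch below fans out one retrying check per environment with `asyncio.gather()`. The function names are hypothetical, and each probe is an injected coroutine so the example runs without network access; a real probe would issue an HTTP request against the configured health URL.

```python
import asyncio
from collections.abc import Awaitable, Callable

Probe = Callable[[], Awaitable[bool]]


async def check_with_retries(probe: Probe, retries: int, delay: float) -> bool:
    """Call `probe` up to `retries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(retries):
        if await probe():
            return True
        if attempt < retries - 1:
            await asyncio.sleep(delay)
    return False


async def check_all(probes: dict[str, Probe], retries: int = 5, delay: float = 1.0) -> dict[str, bool]:
    """Check every environment concurrently and map its name to healthy/unhealthy."""
    names = list(probes)
    results = await asyncio.gather(
        *(check_with_retries(probes[name], retries, delay) for name in names)
    )
    return dict(zip(names, results))
```

With this shape, a slow or failing environment does not delay the result for the others beyond its own retry budget, which is what keeps a command like /status responsive.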
Push to develop → test → build → push to ECR → deploy to staging → health check
Push to main → test → build → push to ECR → [approval gate] → deploy to production → health check → notify Telegram
Pull request → test only
All GitHub Actions are pinned to specific versions (no @master tags). The security scan (detect-secrets) runs against a committed .secrets.baseline so it produces stable, reproducible results.
cd terraform/
bash destroy.sh                      # interactive: prompts "type DESTROY to confirm"
bash destroy.sh --dry-run            # preview all commands without executing
bash destroy.sh --force              # skip confirmation (CI use)
bash destroy.sh --region eu-west-1   # override region

Tears down EC2 instances, ECR repository and all images, IAM roles, VPC, subnets, internet gateway, security group, and SSH key pair.
Infrastructure:
[ ] SSH: disable password auth and root login (key-only)
[ ] Security group: restrict port 22 to your IP, not 0.0.0.0/0
[ ] Rotate the SSH deploy key every 90 days
[ ] ECR: scan images on push, fail CI on CRITICAL CVEs (Trivy configured)
[ ] Add SSH server fingerprints to known_hosts instead of StrictHostKeyChecking=no
Bot Security:
[ ] Whitelist only known Telegram user IDs; never run as a public bot
[ ] Permissions re-verified on every callback (already implemented)
[ ] Deploy lock prevents concurrent deploys (already implemented)
[ ] Subprocess timeout prevents hangs (already implemented)
[ ] Never log secrets (TELEGRAM_BOT_TOKEN excluded from safe_env)
Deployment:
[ ] Require PR review before merging to main
[ ] GitHub Environment protection rules with required reviewers for production
[ ] Add post-deploy smoke tests on top of the health check
[ ] Ship audit logs to immutable storage (S3 with Object Lock, CloudWatch Logs)
[ ] Set DEPLOY_TIMEOUT_SECONDS to match your slowest expected build time
Built with Python 3.12 · python-telegram-bot 21 · Runs on AWS EC2 · Deployed via Docker · Controlled via Telegram