feat(ci): add unified deployment pipeline with preflight checks and rollback support #1982
cal-id-actions[bot] wants to merge 13 commits into
Greptile Summary

This PR introduces a comprehensive unified deployment pipeline for all Cal-ID services (Web, API, Worker), replacing the per-service workflows with a single orchestrated
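Confidence Score: 2/5

Not safe to merge. The migration step's SSH script references `$GITHUB_OUTPUT`, which is never forwarded to the remote host via `envs:`, so the migration step fails on every run.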
| Filename | Overview |
|---|---|
| .github/workflows/deploy-all.yml | New 1870-line unified deployment pipeline. Critical bug: `$GITHUB_OUTPUT` is used inside an SSH script but never forwarded via `envs:`, so the migration step fails on every run. |
| .github/workflows/rollback.yml | New manual rollback workflow. AWS credentials persist on the EC2 host via `aws configure set`, and schema validation silently falls back to the branch HEAD when the SHA lookup fails. |
| infra/scripts/migrate.sh | New migration script with backup, idempotency, and timeout; uses `eval` for `DB_BACKUP_COMMAND`. |
| infra/scripts/acquire-lock.sh | New S3 conditional-write lock. Contains a dead `expires_at` variable (set to the current time instead of the expiry), but the lock payload is correct. |
| Dockerfile | Refactored to use BuildKit secret mounts for `NEXTAUTH_SECRET` and `CALENDSO_ENCRYPTION_KEY`, preventing leakage into image metadata. |
| packages/trpc/server/routers/viewer/webhook/testTrigger.handler.ts | Updated the test webhook payload's timezone and email to cal.id brand values. |
Reviews (1): Last reviewed commit: "Trigger auto pr"
```yaml
run: |
  set -euo pipefail
  if [ "$REBUILD" = "true" ]; then
    echo "image_exists=false" >> "$GITHUB_OUTPUT"
    echo "Skipping image existence check — rebuild requested"
  else
    if aws ecr describe-images \
      --repository-name "$REPO_NAME" \
      --image-ids imageTag="$GIT_SHA" >/dev/null 2>&1; then
      echo "image_exists=true" >> "$GITHUB_OUTPUT"
      echo "Image ${REPO_NAME}:${GIT_SHA} already exists — will skip build"
    else
      echo "image_exists=false" >> "$GITHUB_OUTPUT"
      echo "Image ${REPO_NAME}:${GIT_SHA} not found — will build"
    fi
  fi
```
```yaml
- name: Build and push web image
  if: ${{ steps.image-exists.outputs.image_exists != 'true' }}
  id: build
  uses: docker/build-push-action@v5
  with:
    context: .
    file: ./Dockerfile
    platforms: linux/amd64
    push: true
    build-args: |
      NEXT_PUBLIC_GTM_ID=${{ needs.prepare-release.outputs.deploy_env == 'production' && secrets.NEXT_PUBLIC_GTM_ID_PROD || secrets.NEXT_PUBLIC_GTM_ID_STAG }}
      NEXT_PUBLIC_META_WHATSAPP_BUSINESS_APP_ID=${{ secrets.NEXT_PUBLIC_META_WHATSAPP_BUSINESS_APP_ID }}
      NEXT_PUBLIC_META_WHATSAPP_BUSINESS_CONFIG_ID=${{ secrets.NEXT_PUBLIC_META_WHATSAPP_BUSINESS_CONFIG_ID }}
      NEXT_PUBLIC_WEBAPP_URL=${{ format('https://{0}', needs.prepare-release.outputs.deploy_env == 'production' && secrets.DOMAIN_NAME_PROD || secrets.DOMAIN_NAME_STAG) }}
      NEXT_PUBLIC_WEBSITE_URL=${{ format('https://{0}', needs.prepare-release.outputs.deploy_env == 'production' && secrets.DOMAIN_NAME_PROD || secrets.DOMAIN_NAME_STAG) }}
      NEXT_PUBLIC_API_V2_URL=${{ secrets.NEXT_PUBLIC_API_V2_URL }}
      NEXT_PUBLIC_EMBED_LIB_URL=${{ format('https://{0}/embed-link/embed.js', needs.prepare-release.outputs.deploy_env == 'production' && secrets.DOMAIN_NAME_PROD || secrets.DOMAIN_NAME_STAG) }}
      NEXT_PUBLIC_ONEHASH_URL=${{ secrets.NEXT_PUBLIC_ONEHASH_URL }}
      NEXT_PUBLIC_SENDGRID_SENDER_NAME=${{ secrets.NEXT_PUBLIC_SENDGRID_SENDER_NAME }}
      NEXT_PUBLIC_SENTRY_DSN=${{ needs.prepare-release.outputs.deploy_env == 'production' && secrets.NEXT_PUBLIC_SENTRY_DSN_PROD || secrets.NEXT_PUBLIC_SENTRY_DSN_STAG }}
      NEXT_PUBLIC_LOGGER_LEVEL=${{ secrets.NEXT_PUBLIC_LOGGER_LEVEL }}
      NEXT_PUBLIC_TEAM_IMPERSONATION=${{ secrets.NEXT_PUBLIC_TEAM_IMPERSONATION }}
      NEXT_PUBLIC_APP_NAME=${{ secrets.NEXT_PUBLIC_APP_NAME }}
      NEXT_PUBLIC_COMPANY_NAME=${{ secrets.BRAND_NAME }}
      NEXT_PUBLIC_MINUTES_TO_BOOK=${{ secrets.NEXT_PUBLIC_MINUTES_TO_BOOK }}
      NEXT_PUBLIC_BOOKER_NUMBER_OF_DAYS_TO_LOAD=${{ secrets.NEXT_PUBLIC_BOOKER_NUMBER_OF_DAYS_TO_LOAD }}
      NEXT_PUBLIC_CALENDLY_OAUTH_URL=${{ secrets.NEXT_PUBLIC_CALENDLY_OAUTH_URL }}
      NEXT_PUBLIC_CALENDLY_API_BASE_URL=${{ secrets.NEXT_PUBLIC_CALENDLY_API_BASE_URL }}
      NEXT_PUBLIC_CALENDLY_CLIENT_ID=${{ needs.prepare-release.outputs.deploy_env == 'production' && secrets.NEXT_PUBLIC_CALENDLY_CLIENT_ID_PROD || secrets.NEXT_PUBLIC_CALENDLY_CLIENT_ID_STAG }}
```
`$GITHUB_OUTPUT` unbound on remote host: migration step always fails
The `script:` block runs on the EC2 host via `appleboy/ssh-action`, but `GITHUB_OUTPUT` is not listed in `envs:` and is therefore not set on the remote host. With `set -euo pipefail` active, the `-u` flag makes bash abort with `GITHUB_OUTPUT: unbound variable` as soon as the `echo "migrations_applied=..." >> "$GITHUB_OUTPUT"` line is reached. Even if `GITHUB_OUTPUT` were somehow set, it would point to a runner-local file path that does not exist on the EC2 host. Either way, every deployment that performs a migration fails at this step, blocking `deploy-api` and `deploy-web`.
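One possible fix, sketched under assumptions: the remote script prints a marker line to stdout instead of writing to `$GITHUB_OUTPUT`, the runner captures the SSH output into a file, and a runner-side step writes the step output. The `MIGRATIONS_APPLIED=` marker and both function names are hypothetical, and how the remote stdout gets captured depends on how the SSH step is invoked.

```bash
#!/usr/bin/env bash
# Sketch only: forward a value from a remote SSH script to the runner's $GITHUB_OUTPUT.
# Hypothetical names: the MIGRATIONS_APPLIED= marker and both functions below.
set -euo pipefail

# Remote side (EC2 host): print a marker line; never touch $GITHUB_OUTPUT here.
report_migrations() {
  local applied="$1"
  echo "MIGRATIONS_APPLIED=${applied}"
}

# Runner side: take the last marker from the captured remote output and
# append it to $GITHUB_OUTPUT, which does exist on the runner.
forward_migrations_output() {
  local log_file="$1"
  local value
  # `|| true` stops a missing marker from tripping set -e/pipefail; it is handled below.
  value="$(grep -E '^MIGRATIONS_APPLIED=' "$log_file" | tail -n 1 | cut -d= -f2- || true)"
  if [ -z "$value" ]; then
    echo "No MIGRATIONS_APPLIED marker found in remote output" >&2
    return 1
  fi
  echo "migrations_applied=${value}" >> "$GITHUB_OUTPUT"
}
```

Keeping the remote script free of `$GITHUB_OUTPUT` also lets it run unchanged under `set -u`, and a missing marker fails loudly on the runner instead of passing silently.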
```bash
git checkout "$TARGET_SHA" || git checkout "origin/$BRANCH_NAME"
aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
aws configure set default.region "$AWS_REGION"
aws ecr get-login-password --region "$AWS_REGION" \
```
AWS credentials written permanently to EC2 host config
`aws configure set` writes the key pair to `~/.aws/credentials` (and the region to `~/.aws/config`) on the EC2 host, and the script never cleans these files up. The long-lived IAM keys therefore persist on the instance between deployments and are readable by any later session on that host that runs as the same user. Prefer injecting credentials as ephemeral environment variables instead: the environment already contains `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION` via `envs:`, the AWS CLI picks them up automatically without `aws configure set`, and they vanish when the SSH session ends.
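A minimal sketch of the environment-only approach, assuming the three AWS variables arrive through `envs:` as described above and that the truncated `get-login-password` line pipes into `docker login`, as is usual for ECR. `ecr_login` is an illustrative helper name and `ECR_REGISTRY` is a hypothetical variable holding the registry host.

```bash
#!/usr/bin/env bash
# Sketch: authenticate to ECR from environment variables only, so nothing is
# written to ~/.aws/credentials or ~/.aws/config on the EC2 host.
# Assumption: ECR_REGISTRY (hypothetical) holds "<account>.dkr.ecr.<region>.amazonaws.com".
set -euo pipefail

ecr_login() {
  # Fail fast with a clear message if any required variable was not forwarded.
  : "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID must be forwarded via envs:}"
  : "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY must be forwarded via envs:}"
  : "${AWS_REGION:?AWS_REGION must be forwarded via envs:}"
  : "${ECR_REGISTRY:?ECR_REGISTRY must be set}"

  # The AWS CLI reads the key pair from the environment; no `aws configure set`.
  aws ecr get-login-password --region "$AWS_REGION" \
    | docker login --username AWS --password-stdin "$ECR_REGISTRY"
}
```

Because the credentials only live in the SSH session's environment, there is nothing to clean up when the script exits or fails midway.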
```bash
git fetch origin "$TARGET_SHA" --depth 1 || true
git checkout "$TARGET_SHA" || git checkout "origin/$BRANCH_NAME"
```
Schema validation silently falls back to wrong codebase version
If `TARGET_SHA` cannot be fetched (the `|| true` on the `git fetch` line swallows the error, e.g. on a network failure or a SHA the remote cannot serve), `git checkout "$TARGET_SHA"` fails and the `|| git checkout "origin/$BRANCH_NAME"` fallback silently runs the validation against the branch HEAD, a newer commit. `validate-rollback-schema.sh` may then report the schema as compatible even when `TARGET_SHA`'s actual migrations would be incompatible. The fallback should be removed and replaced with a hard failure. The same pattern appears at line 177 (in the `rollback-app` job).
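A sketch of the hard-failure variant, assuming the rollback script can return non-zero to stop the job and that `TARGET_SHA` is a full commit SHA. `checkout_exact_sha` is a hypothetical helper name. It stops swallowing fetch errors, drops the branch fallback, and confirms `HEAD` is exactly the requested commit before validation runs.

```bash
#!/usr/bin/env bash
# Sketch: check out exactly TARGET_SHA or fail; never fall back to a branch HEAD.
set -euo pipefail

checkout_exact_sha() {
  local target_sha="$1"
  # No `|| true`: a fetch failure must stop the rollback, not be ignored.
  if ! git fetch --depth 1 origin "$target_sha"; then
    echo "ERROR: could not fetch ${target_sha}; refusing to validate another revision" >&2
    return 1
  fi
  if ! git -c advice.detachedHead=false checkout --detach "$target_sha"; then
    echo "ERROR: could not check out ${target_sha}" >&2
    return 1
  fi
  # Require HEAD to equal the full SHA, so a branch name or short SHA cannot slip through.
  local head
  head="$(git rev-parse HEAD)"
  if [ "$head" != "$target_sha" ]; then
    echo "ERROR: HEAD is ${head}, expected ${target_sha}" >&2
    return 1
  fi
}
```

Calling `checkout_exact_sha "$TARGET_SHA"` before `validate-rollback-schema.sh` means the validation either runs against the requested commit or not at all.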
```bash
expires_at="$(iso_from_epoch "$now_epoch")"
acquired_at="$(iso_from_epoch "$now_epoch")"
new_expires="$(iso_from_epoch "$expires_at_epoch")"
```
The variable `expires_at` is assigned `iso_from_epoch "$now_epoch"`, which is the current time, not the lock expiry time. The `jq` payload correctly uses `$new_expires` for the `expires_at` field, so `expires_at` is never referenced after this line and the lock object records the right expiry. The dead variable is still misleading: its name suggests the expiry time but it holds "now", which makes the code harder to audit.
Suggested change:

```diff
-expires_at="$(iso_from_epoch "$now_epoch")"
 acquired_at="$(iso_from_epoch "$now_epoch")"
 new_expires="$(iso_from_epoch "$expires_at_epoch")"
```
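For illustration, the timestamp block without the dead variable, with the payload built only from the values that are actually used. The `iso_from_epoch` body is an assumption (the real helper is not shown); it tries GNU `date -d` and falls back to BSD `date -r`. `build_lock_payload`, the `owner` field, and the TTL argument are hypothetical, and `printf` stands in for the script's `jq` call only to keep the sketch dependency-free, so it assumes `owner` needs no JSON escaping.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed implementation: format a Unix epoch as UTC ISO-8601 (GNU date, then BSD date).
iso_from_epoch() {
  local epoch="$1"
  date -u -d "@${epoch}" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
    || date -u -r "$epoch" +%Y-%m-%dT%H:%M:%SZ
}

# Build the lock object: acquired_at is "now", expires_at is now + ttl.
# Every variable name matches what it holds; there is no unused "expires_at".
build_lock_payload() {
  local owner="$1" now_epoch="$2" ttl_seconds="$3"
  local expires_at_epoch=$(( now_epoch + ttl_seconds ))
  local acquired_at new_expires
  acquired_at="$(iso_from_epoch "$now_epoch")"
  new_expires="$(iso_from_epoch "$expires_at_epoch")"
  printf '{"owner":"%s","acquired_at":"%s","expires_at":"%s"}\n' \
    "$owner" "$acquired_at" "$new_expires"
}
```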
```bash
  fail "ENABLE_DB_BACKUP=true but DB_BACKUP_COMMAND is not set — cannot run backup"
fi
log INFO "Backup enabled — executing: ${DB_BACKUP_COMMAND}"
eval "$DB_BACKUP_COMMAND" || fail "Backup step failed — aborting migration. Database may be in an inconsistent state."
```
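`eval "$DB_BACKUP_COMMAND"` executes the secret value as a shell string, so embedded metacharacters (`; rm -rf ...`, command substitution `$(...)`, etc.) can run arbitrary commands. `DB_BACKUP_COMMAND` is a repository secret today, but a safer invocation pattern is still preferable, such as passing the command as arguments to a pre-approved script. Note the limits of the `bash -c "$DB_BACKUP_COMMAND"` suggestion below: it runs the string in a child shell, so the command cannot change the migration script's own variables, functions, or shell options, but it does not prevent metacharacter injection, because `bash -c` parses the string exactly as `eval` does.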
Suggested change:

```diff
-eval "$DB_BACKUP_COMMAND" || fail "Backup step failed — aborting migration. Database may be in an inconsistent state."
+bash -c "$DB_BACKUP_COMMAND" || fail "Backup step failed — aborting migration. Database may be in an inconsistent state."
```
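If the backup command must stay configurable, one option is to split the secret into an argument vector without any shell parsing and execute it directly. This is a sketch, not a drop-in replacement: `run_backup` is a hypothetical helper, and because the splitting is plain `$IFS` word splitting, the secret could no longer contain quoted arguments, pipes, or redirections (those would move into a pre-approved backup script).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Execute DB_BACKUP_COMMAND as "program arg1 arg2 ..." without eval or bash -c.
run_backup() {
  local -a argv=()
  # `read -a` splits on $IFS (whitespace by default) only:
  # no globbing, no $(...) substitution, no handling of ; or |.
  read -r -a argv <<< "${DB_BACKUP_COMMAND:-}"
  if [ "${#argv[@]}" -eq 0 ]; then
    echo "DB_BACKUP_COMMAND is empty; cannot run backup" >&2
    return 1
  fi
  # The first word is the program; every other word is passed as a literal argument.
  "${argv[@]}"
}
```

With this, a value like `pg_dump ...; rm -rf /` would hand `;`, `rm`, `-rf`, and `/` to the first program as literal arguments rather than running a second command.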
Replace sparse checkout with full repository checkout in migrate-db step to resolve workspace dependency resolution failures (@calcom/lib 404 errors).

Changes:
- migrate-db: clone full repo instead of sparse checkout
- migrate.sh: add `DEFER_CLEANUP` env support to defer cleanup until downstream stages complete
- deploy-all.yml: add cleanup steps to verify and rollback-after-promotion jobs
- migrate-db: set `DEFER_CLEANUP=true` to preserve checkout for deploy stages

This ensures all workspace packages resolve correctly during yarn install while deferring cleanup until after the deployment pipeline completes successfully.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
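The commit does not show how `migrate.sh` implements `DEFER_CLEANUP`. One plausible shape, sketched here with a hypothetical `register_cleanup` helper, is to register an `EXIT` trap that removes the checkout unless `DEFER_CLEANUP=true`, in which case a later job (such as the verify job's new cleanup step) owns the removal.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: remove the checkout on exit unless cleanup is deferred.
register_cleanup() {
  local workdir="$1"
  if [ "${DEFER_CLEANUP:-false}" = "true" ]; then
    # A downstream job is expected to remove the checkout later.
    echo "DEFER_CLEANUP=true: leaving ${workdir} in place for downstream stages" >&2
    return 0
  fi
  # Remove the checkout when this shell exits, on success or failure.
  # printf %q quotes the path so it survives being embedded in the trap string.
  trap "rm -rf -- $(printf '%q' "$workdir")" EXIT
}
```

Using an `EXIT` trap rather than a cleanup call at the end of the script means a failed migration still cleans up when deferral is off.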
Summary
Introduces a comprehensive unified deployment pipeline that includes preflight checks, migration validation, rollback mechanisms, and flow hardening to improve deployment reliability and safety.
Changes
Testing Notes