ci: add EC2 deployment workflow#2

Merged
anxi01 merged 14 commits into main from codex/flashcard-aws-deploy-workflow on May 13, 2026

Conversation

@anxi01 (Member) commented May 12, 2026

Add a Docker-based production deployment path for the FastAPI API and Postgres on a single EC2 instance. Document required GitHub secrets and make the settings/test setup compatible with production compose environment values.

@anxi01 anxi01 self-assigned this May 12, 2026
Base automatically changed from codex/flashcard-api to main May 12, 2026 11:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5c8baf736

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/deploy-ec2.yml Outdated
mkdir -p ~/.ssh
printf '%s\n' "$EC2_SSH_KEY" > ~/.ssh/linko-ec2
chmod 600 ~/.ssh/linko-ec2
ssh-keyscan -p "${EC2_PORT:-22}" "$EC2_HOST" >> ~/.ssh/known_hosts

P1 Badge Pin EC2 host key instead of trusting ssh-keyscan

Fetching known_hosts with ssh-keyscan during the same CI run that performs deployment accepts whatever key is presented on the network path at that moment, so StrictHostKeyChecking yes no longer guarantees server authenticity. In a MITM/DNS-hijack scenario, the workflow would trust and deploy to an attacker-controlled host. Store the expected EC2 host key (or fingerprint) in a secret and write that pinned value into known_hosts instead of discovering it live.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 75ee6536cd

name: Deploy
runs-on: ubuntu-latest
needs: test
if: github.event_name == 'workflow_dispatch' || github.ref == 'refs/heads/main'

P1 Badge Restrict manual deploys to main branch

The deploy guard currently allows every workflow_dispatch run, so anyone triggering this workflow from a non-main branch (or tag) will still execute the production deploy job. Because this bypasses the main branch check used for push, it can ship unreviewed or experimental commits to EC2; tighten the condition so manual runs also require github.ref == 'refs/heads/main' (or an explicit input gate).

@anxi01 anxi01 force-pushed the codex/flashcard-aws-deploy-workflow branch from 75ee653 to f9f3f1b Compare May 12, 2026 11:26

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 61bf74381a

Comment thread .github/workflows/deploy-ec2.yml Outdated
name: Deploy
runs-on: ubuntu-latest
needs: test
if: github.event_name == 'workflow_dispatch' || github.event_name == 'pull_request' || github.ref == 'refs/heads/main'

P1 Badge Prevent deploy job from running on pull requests

The deploy guard explicitly includes github.event_name == 'pull_request', and the workflow itself is triggered on pull_request, so every PR that passes tests will execute the production deployment steps. This can push unmerged PR code to EC2 (and for fork PRs, it will at minimum fail when secrets are unavailable), which is a production-impacting regression for normal review flows; the deploy job should be limited to trusted refs/events only.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 3a382d6aac

ssh linko-ec2 "DEPLOY_PATH='${DEPLOY_PATH:-/opt/linko-server}' bash -s" <<'REMOTE'
set -euo pipefail
cd "$DEPLOY_PATH"
tar -xzf release.tar.gz

P2 Badge Clean deploy directory before extracting release

The restart script untars release.tar.gz directly into the existing deploy path without removing files from previous releases, so deleted or renamed files persist on disk and are still included in subsequent Docker build contexts. This can cause production to run stale code paths even after those files were removed from Git; add a cleanup/sync step (or extract into a fresh release directory and switch) before building.

Comment thread README.md
Comment on lines +76 to +82
- `EC2_HOST`: EC2 public host or IP address.
- `EC2_USER`: SSH user, for example `ubuntu`.
- `EC2_SSH_KEY`: private key with SSH access to the instance.
- `EC2_PORT`: optional SSH port, defaults to `22`.
- `EC2_DEPLOY_PATH`: optional deploy directory, defaults to `/opt/linko-server`.
- `PROD_ENV`: full contents of the production `.env` file. Use `.env.example` as
  the template, and replace secrets before deploying.

P2 Badge Add missing EC2 host key secret to deployment docs

The workflow now hard-requires EC2_HOST_KEY (test -n "$EC2_HOST_KEY") during SSH setup, but the README secret list omits it, so a user following the documented setup will configure an incomplete environment and the deploy job will fail at runtime. Document this required secret alongside the other production environment secrets.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 6e3915fefc

Comment thread app/services/transcripts.py Outdated
if not vtt_files:
    try:
        # Fetch list of transcripts
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

P1 Badge Switch to supported youtube-transcript-api v1 calls

download_youtube_captions calls YouTubeTranscriptApi.list_transcripts(video_id), but pyproject.toml allows youtube-transcript-api>=0.6.2 and current 1.x releases removed that static method. On fresh installs this raises AttributeError, which is swallowed by the broad except and causes the function to return None, so transcript generation is treated as unavailable for valid videos. Update this code to the v1 instance API (or pin to a compatible pre-1.2 version) to avoid production transcript failures.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 4cfe4633ed

Comment thread app/services/transcripts.py Outdated
Comment on lines +113 to +116
proxies = {
    "http": "socks5://127.0.0.1:9050",
    "https": "socks5://127.0.0.1:9050",
}

P1 Badge Make Tor proxy optional for caption downloads

download_youtube_captions now hard-codes a SOCKS proxy at 127.0.0.1:9050 for every transcript request, so any runtime without a local Tor daemon (including the documented local uvicorn flow) will hit a connection error and immediately return None via the broad except. That propagates to generate_lesson_artifacts_task as transcript_unavailable, causing lesson generation to fail even when captions exist; add a direct-network fallback or make proxy use configurable instead of mandatory.
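
One way to make the proxy configurable is sketched below. It is a minimal illustration, not the project's code: the `TRANSCRIPT_PROXY_URL` setting name and the `build_proxies` helper are hypothetical, and the returned mapping simply mirrors the `proxies` dict shown in the excerpt above.

```python
import os
from typing import Mapping, Optional


def build_proxies(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return a proxies mapping, or None when no proxy is configured.

    TRANSCRIPT_PROXY_URL is a hypothetical setting name used only in this sketch.
    """
    source = os.environ if env is None else env
    proxy_url = (source.get("TRANSCRIPT_PROXY_URL") or "").strip()
    if not proxy_url:
        # No proxy configured: the caller should use a direct connection.
        return None
    return {"http": proxy_url, "https": proxy_url}
```

With this shape, a local `uvicorn` run without Tor leaves the variable unset and connects directly, while a deployment that needs Tor could set it to `socks5://127.0.0.1:9050`.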

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 527c1881bd

Comment thread app/services/transcripts.py Outdated
return None

# Fetch the actual transcript data using proxy
raw_data = transcript.fetch(proxies=proxies)

P1 Badge Remove unsupported proxies arg from transcript.fetch

download_youtube_captions calls transcript.fetch(proxies=proxies), but youtube-transcript-api's Transcript.fetch does not accept a proxies keyword (proxy settings are applied when creating/listing transcripts). This raises TypeError, which is swallowed by the broad outer except, so valid videos are treated as missing captions and artifact generation falls back to transcript_unavailable.

workflow_dispatch:

concurrency:
  group: deploy-ec2-${{ github.ref }}

P2 Badge Use one deploy concurrency group for production

The workflow-level concurrency key is scoped to github.ref, but the deploy job can run from pull_request, push, and workflow_dispatch; those events often use different refs while targeting the same EC2 host/path. That allows concurrent production deploys to race on release.tar.gz upload/extract and service restarts, which can produce inconsistent rollout state.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 771ed66096

Comment thread app/services/transcripts.py Outdated
Comment on lines +123 to +124
transcript = transcript_list.find_transcript([lang])
source = "youtube_caption"

P2 Badge Select manual transcript before labeling source

find_transcript([lang]) already falls back to autogenerated captions, so assigning source = "youtube_caption" immediately after this call mislabels generated subtitles as manual whenever a video has no human-made captions. This also makes the allow_auto flag ineffective in that scenario, because autogenerated captions can be returned before the allow_auto check path runs. In production this will store incorrect transcript provenance (and future allow_auto=False callers would still receive auto captions).
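
The selection order described above can be sketched with plain data, independent of the library. `CaptionTrack`, `pick_track`, and the `"youtube_auto_caption"` label are hypothetical names introduced for illustration; in the real code, the library's `find_manually_created_transcript` and `find_generated_transcript` methods could plausibly play the same role.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CaptionTrack:
    # Hypothetical stand-in for one entry in a transcript listing.
    lang: str
    is_generated: bool


def pick_track(
    tracks: Iterable[CaptionTrack], lang: str, allow_auto: bool
) -> Optional[Tuple[CaptionTrack, str]]:
    """Prefer a manual track; use an auto-generated one only if allowed."""
    candidates = [t for t in tracks if t.lang == lang]
    for track in candidates:
        if not track.is_generated:
            return track, "youtube_caption"
    if allow_auto:
        for track in candidates:
            if track.is_generated:
                # Label name is an assumption; the point is that it differs
                # from the label used for manual captions.
                return track, "youtube_auto_caption"
    return None
```

Because the source label is decided at the same point the track is chosen, provenance cannot drift from what was actually fetched, and `allow_auto=False` reliably excludes generated captions.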

@anxi01 anxi01 force-pushed the codex/flashcard-aws-deploy-workflow branch from 771ed66 to 61bf743 Compare May 12, 2026 14:42
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 7abdd9579c

Comment on lines +95 to +97
return TranscriptResult(
    source="youtube_caption",
    text=full_text,

P1 Badge Reject fallback transcript when requested language is missing

download_youtube_captions always returns whatever Supadata sends and never verifies that the returned transcript language matches the requested lang. In this code path the caller always requests Korean (lang="ko" in generate_lesson_artifacts_task), but the Supadata response can fall back to another available language, so videos without Korean captions can be treated as successful and generate lesson artifacts from non-Korean text instead of failing with transcript_unavailable as intended.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 2d3c0a422e

Comment on lines +63 to +64
if response.status_code != 200:
    return None

P1 Badge Differentiate Supadata outages from missing captions

Treating every non-200 response as None causes upstream failures (for example invalid API key, rate limiting, or transient 5xx) to be misclassified as "captions unavailable" by the lesson generation flow, which then marks the lesson as failed with transcript_unavailable instead of surfacing a retriable upstream error. This makes production incidents look like user-content problems and prevents targeted retry/error handling.
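
A rough sketch of how the status codes might be separated follows. `TranscriptUpstreamError` and `check_transcript_status` are hypothetical names, and treating 404 as "no captions" is an assumption about the provider that would need checking against its documentation.

```python
class TranscriptUpstreamError(RuntimeError):
    """Raised when the transcript provider fails for non-content reasons."""

    def __init__(self, status_code: int, retriable: bool) -> None:
        super().__init__(f"transcript provider returned HTTP {status_code}")
        self.status_code = status_code
        self.retriable = retriable


def check_transcript_status(status_code: int) -> bool:
    """Return True if the body should be parsed, False if captions are absent.

    Any other status raises, so outages are not reported as missing captions.
    """
    if status_code == 200:
        return True
    if status_code == 404:
        # Assumption: the provider uses 404 for "no transcript available".
        return False
    # Rate limiting and server errors are usually worth retrying;
    # auth or request errors (for example 401, 403) are not.
    retriable = status_code == 429 or 500 <= status_code < 600
    raise TranscriptUpstreamError(status_code, retriable)
```

The caller could then map `False` to `transcript_unavailable` and let the exception reach whatever retry or alerting path the task runner provides.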

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 8cb0addd77

Comment on lines +50 to 51
if not settings.supadata_api_key:
    return None

P1 Badge Fail fast when Supadata API key is unset

Returning None when SUPADATA_API_KEY is missing makes every transcript request look like "captions unavailable" to generate_lesson_artifacts_task, which then marks valid lessons as failed with transcript_unavailable. In any environment where this secret is omitted (including local/mock setups), lesson generation is universally broken but reported as a content issue instead of a configuration error, making incidents hard to diagnose and preventing correct remediation.
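
A fail-fast guard could look roughly like the sketch below. `TranscriptConfigError` and `require_api_key` are hypothetical names; only the `SUPADATA_API_KEY` setting comes from the comment above.

```python
from typing import Optional


class TranscriptConfigError(RuntimeError):
    """Raised when transcript fetching is misconfigured (hypothetical type)."""


def require_api_key(api_key: Optional[str]) -> str:
    """Return the trimmed key, or raise instead of silently returning None."""
    key = (api_key or "").strip()
    if not key:
        raise TranscriptConfigError(
            "SUPADATA_API_KEY is not set; transcript requests cannot be made"
        )
    return key
```

Raising a dedicated error type keeps a missing secret distinguishable from a video that genuinely has no captions.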

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 3c1f1fdd42

Comment on lines +214 to +218
return TranscriptResult(
    source=transcript.source,
    text="\n".join(segment.text for segment in selected),
    segments=selected,
    lang=transcript.lang,

P1 Badge Fallback when sampled windows yield no segments

For long videos, sample_transcript_for_flashcards picks two random 90-second windows and then always returns the sampled result even if neither window overlaps any caption segment. This can happen with sparse transcripts (e.g., short spoken sections separated by long silence), producing an empty transcript that then drives flashcard generation with no real source text (_mock_flashcards falls back to placeholder content, and Gemini can hallucinate ungrounded cards). The previous behavior always used an actual bounded transcript slice, so this is a regression in artifact quality/reliability; add a fallback (for example limit_transcript_for_flashcards) when selected is empty.
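
The empty-selection fallback might be sketched as follows. The `Segment` type, both helper names, and the 180-second limit are assumptions made for illustration; the bounded leading slice here only stands in for whatever `limit_transcript_for_flashcards` actually does.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    # Hypothetical caption segment: start and duration in seconds.
    start: float
    duration: float
    text: str


def segments_in_windows(
    segments: Sequence[Segment], windows: Sequence[Tuple[float, float]]
) -> List[Segment]:
    """Keep segments that overlap at least one (start, end) window."""
    selected = []
    for seg in segments:
        seg_end = seg.start + seg.duration
        if any(seg.start < w_end and seg_end > w_start for w_start, w_end in windows):
            selected.append(seg)
    return selected


def sample_with_fallback(
    segments: Sequence[Segment],
    windows: Sequence[Tuple[float, float]],
    max_fallback_seconds: float = 180.0,
) -> List[Segment]:
    """Use the sampled windows, falling back to a bounded leading slice."""
    selected = segments_in_windows(segments, windows)
    if selected or not segments:
        return selected
    # Fallback: measure the limit from the first caption, so a transcript
    # that starts late still yields real source text.
    limit = segments[0].start + max_fallback_seconds
    return [seg for seg in segments if seg.start < limit]
```

The fallback only returns an empty list when the transcript itself has no segments, which is the case the caller would want to treat as a failure.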

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Reviewed commit: 0f483f2248

Comment thread app/services/transcripts.py Outdated
Comment on lines +70 to +71
if require_requested_lang and actual_lang != lang:
    return None

P1 Badge Accept locale variants when checking transcript language

download_youtube_captions rejects any response where data.lang is not an exact string match to the requested lang, so a valid transcript can be discarded when the provider returns a regional code (for example ko-KR / en-US / zh-TW) for a request like ko or en. In that case the function returns None, and generate_lesson_artifacts_task marks the lesson as transcript_unavailable even though matching-language captions exist. Compare normalized base language codes (or validate against availableLangs) instead of strict equality.
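
Comparing base language codes could be sketched like this. `base_language` and `language_matches` are hypothetical helpers, and the approach assumes the provider returns BCP 47 style tags such as `ko-KR`.

```python
def base_language(code: str) -> str:
    """Reduce a tag like 'ko-KR' or 'en_US' to its lowercase base ('ko', 'en')."""
    return code.strip().replace("_", "-").split("-", 1)[0].lower()


def language_matches(requested: str, actual: str) -> bool:
    """True when both tags share a non-empty base language."""
    requested_base = base_language(requested)
    return bool(requested_base) and requested_base == base_language(actual)
```

One trade-off of this approach: `zh-TW` and `zh-CN` both reduce to `zh`, so a caller that cares about script or region would need a stricter comparison.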

@anxi01 anxi01 merged commit f90427a into main May 13, 2026
2 checks passed
@anxi01 anxi01 deleted the codex/flashcard-aws-deploy-workflow branch May 13, 2026 12:37