feat: add hermetic backend e2e harness #8004

Open
Git-on-my-level wants to merge 13 commits into main from feat/e2e-hermetic-harness

Conversation

@Git-on-my-level

Collaborator

Summary

Adds a manually runnable hermetic backend e2e harness under backend/testing/e2e/.

The harness imports the real FastAPI backend and exercises selected CRUD, failure-mode, and migration-safety paths against fake/disabled external-service boundaries.

What this covers

  • Real FastAPI app import and TestClient execution
  • Firestore replacement using fake-firestore-backed helpers
  • Redis replacement using fakeredis
  • Deterministic fixture seeding for conversations, memories, and action items
  • CRUD golden path coverage for action items, memories, and seeded conversations
  • Migration-safety coverage for legacy conversation/memory shapes
  • Failure/edge coverage for auth, invalid ids, unicode content, long action-item descriptions, and fake Redis paths
  • Local-only network guard during pytest execution:
    • blocks non-local DNS lookup
    • blocks non-local socket connect/connect_ex/create_connection/sendto
  • External credential hardening:
    • overwrites API-key env vars with fake values
    • clears SERVICE_ACCOUNT_JSON, GOOGLE_APPLICATION_CREDENTIALS, and PINECONE_API_KEY

Manual dogfood

Ran locally after commit:

bash backend/testing/e2e/run.sh -q --tb=short

Result:

37 passed, 6 skipped, 35 warnings
✅ All e2e tests passed!

Explicit v1 limitations

The harness intentionally keeps incomplete provider flows as explicit skips rather than silent passes:

  • Full LLM-backed conversation reprocessing/action-item/memory extraction is skipped until deterministic LLM clients are wired into those backend code paths.
  • STT WebSocket timeout/error simulation is skipped until a Deepgram-compatible async WS fake is implemented.
  • run.sh can bootstrap missing Python fake dependencies with pip install before pytest imports the network guard; once pytest starts, non-local DNS/socket attempts are blocked.

Notes

This is manual-only for now; no CI wiring in this PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 713a9e1764

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread backend/testing/e2e/run.sh Outdated
}

echo "Checking/installing fake dependencies..."
install_if_missing fake_firestore fake-firestore

P2 Track e2e dependencies instead of pip-installing them

In a clean checkout or any CI/offline developer environment where PyPI is blocked, bash backend/testing/e2e/run.sh exits before pytest because these required packages are not declared in backend/requirements.txt or another tracked lockfile and the script tries to fetch them dynamically. I checked repo-wide references with rg; the new harness is the only place these packages are named, so the advertised one-command harness is not reproducible without network access.

Comment thread backend/testing/e2e/fakes/redis.py Outdated
original_init(self, *args, **kwargs)
fake = get_fake_redis()
# Delegate all key methods to the fake
for attr in (

P2 Delegate Redis scripts to fakeredis

Rate-limited endpoints in these e2e tests do not actually exercise fakeredis because database/redis_db.py creates _RATE_LIMIT_LUA = r.register_script(...) at import time, but this delegation list never forwards register_script/evalsha/script_load to the fake client. In contexts like /v3/memories, the script remains bound to the real redis.Redis client and failures are swallowed by the fail-open path, so the harness can pass while Redis behavior is not being tested.

@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Contributor

Greptile Summary

Adds a hermetic backend e2e test harness under backend/testing/e2e/ that imports the real FastAPI app, replaces Firestore/Redis with in-memory fakes, overwrites all external API keys, and installs a socket guard to block non-local network calls — enabling CRUD, failure-mode, and migration-safety coverage without any live services.

  • conftest.py wires the entire fake layer: Google auth patch → env hardening → socket guard → MockFirestore/FakeRedis patches → TestClient factory. The sendto guard only handles the 2-arg call form, leaving the 3-arg (data, flags, address) variant mismatched. The session-scoped Firestore store with hardcoded UID "123" accumulates state across all tests.
  • fakes/firestore.py delegates collection, batch, and get_all to the mock but never patches document(), leaving root-level document access on the uninstrumented client.
  • fakes/redis.py bridges only scalar operations; hash/list/sorted-set families are not delegated to FakeRedis.

Confidence Score: 4/5

Safe to merge as a manually-run local harness; the defects are in the test infrastructure itself and do not touch production code.

The network guard's sendto implementation has a wrong signature for the 3-arg call form — instead of raising a clear blocked-network-call assertion it would raise a TypeError, crashing the harness unexpectedly. The remaining issues (missing document() delegation, incomplete Redis method bridging, shared Firestore state) are all contained within the test layer and are non-blocking for the stated v1 scope, but they could cause confusing failures as more test cases are added.

conftest.py (socket guard sendto signature), fakes/firestore.py (missing document() delegation), and fakes/redis.py (incomplete method bridging) are the three files most likely to cause unexpected failures as the harness grows.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/testing/e2e/conftest.py | Core harness setup: env hardening, Google auth patch, socket guard, TestClient factory. The sendto guard has a wrong signature for the 3-arg form, and session-scoped Firestore with hardcoded UID "123" causes test state leakage. |
| backend/testing/e2e/fakes/firestore.py | MockFirestore setup and seed/read helpers. patch_google_firestore delegates collection, batch, and get_all but never patches self.document, leaving root-level document access unguarded. |
| backend/testing/e2e/fakes/redis.py | FakeRedis setup and Redis.Redis monkeypatching. Bridges only scalar operations; hash, list, and sorted-set families are not delegated, so those backend calls fall through to the real unconnected client. |
| backend/testing/e2e/fakes/llm.py | Deterministic LLM response builders and pytest-httpserver configurators. make_openai_router_response is defined after its caller configure_llm_fakes; works at runtime but is confusing. |
| backend/testing/e2e/fakes/embeddings.py | Fake embedding/diarization/VAD HTTP handlers using pytest-httpserver. Clean scaffold; unused until LLM processing tests are wired. |
| backend/testing/e2e/fakes/stt.py | STT fake scaffold. Deepgram WS fake is explicitly deferred (TODO). HTTP pre-recorded endpoint helper present; configure_stt_timeout is intentionally a no-op. |
| backend/testing/e2e/fakes/storage.py | Temp-dir GCS replacement with upload/download/delete/list helpers. Clean and self-contained. |
| backend/testing/e2e/run.sh | One-command entry point: venv detection, fake-dep bootstrapping, pytest invocation, exit-code propagation. Straightforward and correct. |
| backend/testing/e2e/test_crud.py | CRUD golden-path tests for conversations, action items, and memories. Solid route coverage; test_list_action_items swallows create-call failures silently. |
| backend/testing/e2e/test_conversation_processing.py | Conversation processing and state-transition tests. LLM-backed paths are appropriately skipped; seed-and-read and discarded-filter tests are active. |
| backend/testing/e2e/test_failure_modes.py | Failure-mode and edge-case tests. Auth/404/unicode/long-desc edge cases are active; LLM 500 and STT timeout tests are explicitly skipped with clear TODOs. |
| backend/testing/e2e/test_migration_safety.py | Legacy-format reading, idempotency, field-shape evolution, and category-enum migration tests. Well-structured with clear intent for each scenario. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Shell as run.sh
    participant Pytest as pytest
    participant Conf as conftest.py
    participant Fakes as fakes/*
    participant App as FastAPI app (main.py)
    participant DB as MockFirestore / FakeRedis

    Shell->>Pytest: python -m pytest testing/e2e/
    Pytest->>Conf: import conftest.py
    Conf->>Conf: patch google.auth.default
    Conf->>Conf: _set_e2e_env() — overwrite credentials
    Conf->>Conf: install socket guard (connect/getaddrinfo)
    Conf->>Fakes: setup_fake_firestore() → MockFirestore
    Conf->>Fakes: setup_fake_redis() → FakeRedis
    Conf->>Fakes: patch_google_firestore() — delegate collection/batch/get_all
    Conf->>Fakes: patch_redis_client() — delegate scalar ops
    Conf->>App: import main → FastAPI app constructed
    App-->>Conf: app cached in _app_cache

    loop each test
        Pytest->>Conf: client fixture → TestClient(app)
        Pytest->>Fakes: seed_conversation / seed_memory / seed_action_item
        Fakes->>DB: MockFirestore.collection().document().set()
        Pytest->>App: TestClient.get / post / patch / delete
        App->>DB: Firestore query / Redis op (via patched clients)
        DB-->>App: in-memory result
        App-->>Pytest: HTTP response
        Pytest->>Pytest: assert status / body
    end

Comments Outside Diff (3)

  1. backend/testing/e2e/fakes/firestore.py, line 663-681 (link)

    P2 document() method not delegated to mock

    patch_google_firestore delegates collection, batch, and get_all to the mock instance and stores self._mock = mock, but never patches self.document. Any backend code that calls db.document("path") to obtain a root-level document reference will hit the real firestore.Client.document() — whose internal state (_database, etc.) was never properly initialized — and raise an AttributeError. The stored self._mock reference has no effect since self.document still points to the uninstrumented class method.

  2. backend/testing/e2e/fakes/redis.py, line 953-974 (link)

    P2 Hash, sorted-set, and list operations not delegated

    patch_redis_client bridges a fixed allowlist of scalar operations but omits the hash (hget, hset, hmset, hgetall, hdel), list (lpush, lrange, rpop), and sorted-set (zadd, zrange, zrangebyscore) families. Any backend code that uses these operations will call them on the real, unconnected redis.Redis instance rather than the FakeRedis instance. The backend may already handle these failures gracefully (fail-open), but the gap means that data written via the Redis module is invisible to the fake store and tests that rely on Redis-backed state for these operations will silently receive wrong results rather than surfacing an error.

  3. backend/testing/e2e/fakes/llm.py, line 857-862 (link)

    P2 make_openai_router_response called before its definition

    configure_llm_fakes calls make_openai_router_response() before that helper is defined in the file. Python resolves names at call time so this works at runtime, but the out-of-order definition is surprising when reading top-to-bottom. The helper should be defined above the function that uses it, or the call can be replaced with the make_openai_chat_response it delegates to.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "feat: add hermetic backend e2e harness"

Comment on lines +274 to +276
"""Return the fixed dev-test UID."""
return DEV_UID

P1 sendto guard only handles the 2-arg form

socket.socket.sendto has two call forms: sendto(data, address) and sendto(data, flags, address). The patched signature _guarded_socket_sendto(self, data, address) only covers the first. If any library in the test process calls the 3-arg form (e.g. sock.sendto(b"payload", 0, ("host", 80))), Python invokes _guarded_socket_sendto(sock, b"payload", 0, ("host", 80)) — four positional arguments against a three-parameter function — raising an unhandled TypeError instead of the expected AssertionError. The guard would crash the test with an opaque error rather than surfacing a blocked network call.

Comment on lines +300 to +317


# ─── Utility fixtures ──────────────────────────────────────────────────


@pytest.fixture()
def fresh_uid():
    """Generate a unique UID per test for isolation."""
    import uuid

    return str(uuid.uuid4())


def _now_iso() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@pytest.fixture()

P2 Session-scoped Firestore store + hardcoded UID "123" accumulates state across tests

fake_firestore is session-scoped, so the same in-memory store is shared by every test in the session. Multiple test classes (TestConversationCRUD, TestConversationProcessing, TestFailureModes, TestMigrationSafety) each seed_conversation("123", ...) with fixed document IDs such as "fail-persist-001", "discard-test-active", "migration-idempotent-001", etc. Tests that assert on list responses (e.g., test_discarded_conversation_filtered, test_mixed_format_coexistence) see all documents seeded by previously-run tests, making the assertions dependent on execution order. A test that passes in isolation can fail when run after another test that left conflicting data in the shared store.
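
The order dependence described above can be avoided by clearing a user's documents around each test, even when the store itself stays session-scoped. The sketch below uses a hypothetical `InMemoryStore` as a stand-in for the fake Firestore store; the class, its methods, and `isolated_user` are illustrative names, not part of the harness.

```python
import uuid
from contextlib import contextmanager


class InMemoryStore:
    """Hypothetical stand-in for a shared, session-scoped fake document store."""

    def __init__(self):
        self.docs = {}  # maps (uid, doc_id) -> document data

    def set(self, uid, doc_id, data):
        self.docs[(uid, doc_id)] = data

    def list_for(self, uid):
        return [data for (owner, _), data in self.docs.items() if owner == uid]

    def clear_user(self, uid):
        for key in [k for k in self.docs if k[0] == uid]:
            del self.docs[key]


@contextmanager
def isolated_user(store, uid=None):
    """Yield a UID whose documents are wiped before and after the block."""
    uid = uid or str(uuid.uuid4())
    store.clear_user(uid)
    try:
        yield uid
    finally:
        store.clear_user(uid)
```

In a pytest suite the same body could live in a function-scoped fixture that yields the UID, so a fixed UID such as "123" starts every test with an empty view of the store.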

@Git-on-my-level

Collaborator Author

Follow-up pushed in 97186a655 after reviewing Codex/Greptile feedback and a 3-agent harness committee pass.

Changes made:

  • Removed dynamic pip install from run.sh; added pinned backend/testing/e2e/requirements.txt.
  • Disabled dotenv reloading and cleared proxy env vars so local .env / proxy config cannot rehydrate real service credentials.
  • Added harness guard tests for dotenv disabling and non-local DNS/TCP/UDP blocking, including both sendto(data, address) and sendto(data, flags, address) forms.
  • Added per-test Firestore/Redis fake-state isolation around the fixed local-dev UID.
  • Broadened Firestore delegation to the fake client surface, including document() / collection_group() from MockFirestore.
  • Broadened Redis delegation to fakeredis, including constructor/from_url paths and Redis script/hash/list/zset families exposed by fakeredis.
  • Tightened previously permissive assertions in CRUD/edge tests to assert create statuses, captured IDs, durable postconditions, and exact current behavior.
  • Reworded docs/tests from “migration safety” to “legacy shape compatibility” where no production migration script is executed.
  • Kept LLM/STT/failure-provider gaps explicit as skipped/deferred v2 work rather than counting them as green coverage.

Dogfood after the follow-up:

bash backend/testing/e2e/run.sh -q --tb=short
42 passed, 6 skipped, 35 warnings
✅ All e2e tests passed!

GitHub Lint & Format Check passed on the new commit.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97186a6553

def _guarded_getaddrinfo(host, port, *args, **kwargs):
    if host is not None and host not in _ALLOWED_NETWORK_HOSTS:
        raise AssertionError(f"Hermetic e2e blocked DNS lookup for {host!r}")

P2 Make network guard failures uncatchable

When a real outbound call happens inside backend code that intentionally catches Exception (for example Stripe startup validation or vector/persona fallback paths), this AssertionError (and the connect guard above) can be swallowed; with the client fixture also disabling logging, the harness still passes even though a real service call was attempted. To preserve the hermetic guarantee in those contexts, raise a sentinel that broad except Exception handlers do not catch or record the attempt and fail during teardown.
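
Both options mentioned here can be combined: raise a sentinel that derives from `BaseException` (which `except Exception` does not catch) and also record the attempt for a teardown check. This is a sketch under assumptions: `HermeticNetworkViolation`, `blocked_attempts`, the allowlist contents, and `startup_validation_with_fallback` are hypothetical names introduced for illustration.

```python
import socket

# Hypothetical allowlist; the harness's real set of local hosts may differ.
_ALLOWED_NETWORK_HOSTS = {"127.0.0.1", "::1", "localhost"}

_original_getaddrinfo = socket.getaddrinfo
blocked_attempts = []  # can be asserted empty at teardown as a second line of defence


class HermeticNetworkViolation(BaseException):
    """Hypothetical sentinel: not an Exception subclass, so `except Exception` skips it."""


def _guarded_getaddrinfo(host, port, *args, **kwargs):
    if host is not None and host not in _ALLOWED_NETWORK_HOSTS:
        blocked_attempts.append(("getaddrinfo", host))
        raise HermeticNetworkViolation(f"Hermetic e2e blocked DNS lookup for {host!r}")
    return _original_getaddrinfo(host, port, *args, **kwargs)


def startup_validation_with_fallback():
    """Stand-in for backend code that swallows Exception and carries on."""
    try:
        _guarded_getaddrinfo("api.example.com", 443)
    except Exception:
        return "fallback"
    return "resolved"
```

With this shape, `startup_validation_with_fallback()` no longer returns "fallback"; the sentinel propagates past the broad handler. A bare `except:` or `except BaseException` would still swallow it, which is why the recorded list is kept as well.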

@Git-on-my-level

Collaborator Author

Expanded the hermetic harness across the five requested areas in fabdc5a0f.

What changed:

  • Deterministic conversation processing seam

    • Added a reprocess-route test that fakes only the provider-heavy processing function while still exercising real auth, route validation, model serialization, Firestore update, persisted readback, and action-item queryability.
  • Listen/STT route seam

    • Added /v4/web/listen websocket coverage for first-message auth, query parsing, and custom-STT dispatch using a fake stream handler. This is intentionally route/seam coverage, not a full Deepgram protocol fake.
  • Storage/GCS fake

    • Replaced helper-only storage fake with a filesystem-backed fake google.cloud.storage.Client wired before backend import.
    • Added speech-profile route tests for profile existence, signed URL generation, additional sample listing, and sample deletion.
    • Added a harness guard asserting the backend imported utils.other.storage.storage_client is the fake client.
  • Webhooks and task integrations

    • Added developer webhook config/status roundtrip and realtime webhook payload capture with httpx.MockTransport.
    • Added task-integration save/list/default coverage and Todoist task-creation outbound payload capture.
    • Kept the fake-firestore nested task-integration single-doc/delete limitation documented instead of overclaiming delete coverage.
  • User/auth/profile/account

    • Added route coverage for auth guard, profile 410/read, onboarding, language/transcription prefs, people CRUD, notification settings, assistant settings merge, AI profile merge, and BYOK activation/deactivation.

Other harness improvements:

  • Broadened per-test isolation to user doc/subcollections and fake GCS state.
  • Fixed run.sh argument forwarding so -k/specific selections work correctly.
  • Normalized seeded action-item timestamps to match backend sorting expectations.
  • README now reflects partial vs full coverage honestly.

Dogfood:

bash backend/testing/e2e/run.sh -q --tb=short
54 passed, 6 skipped, 35 warnings
✅ All e2e tests passed!

Also verified filtered runner behavior:

bash backend/testing/e2e/run.sh -q --tb=short -k listen_stt
1 passed, 58 deselected

Independent review of the staged diff passed with no blockers.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fabdc5a0f0

Comment on lines +285 to +286
tc = TestClient(app)
yield tc

P2 Run the FastAPI lifespan in the e2e client

In this fixture the TestClient is yielded without entering it as a context manager. FastAPI/Starlette only drive lifespan handlers when the client is used as a context manager, so the backend startup/shutdown handlers in backend/main.py (startup_event/shutdown_event) are skipped for every e2e request. That means regressions in startup tasks or shutdown cleanup can pass this harness; wrap the yield in with TestClient(app) as tc: so these tests exercise the real app lifecycle.

Comment on lines +108 to +110
def download_as_bytes(self, *args, **kwargs) -> bytes:
    if not self.exists():
        raise FileNotFoundError(self.name)

P2 Raise GCS NotFound from missing fake blobs

When a backend path tests a missing GCS object through this fake, it will see FileNotFoundError instead of google.cloud.exceptions.NotFound/BlobNotFound. The real storage helpers catch the Google exception in paths like download_syncing_temporal_file, download_playback_artifact, and delete_blob, so missing-object e2e scenarios either fail/500 under the fake or have to avoid those branches even though production handles them. Raise the same NotFound class here.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2181d5aad6

@@ -0,0 +1,171 @@
# Hermetic Backend E2E Harness

A manually runnable integration test suite that imports the **real omi FastAPI backend** and exercises selected routes against **faked or disabled external dependencies**. It is intended as a local dogfood harness first; there is no CI wiring yet.

P2 Wire the new e2e suite into CI

This adds a large set of backend test files while explicitly leaving them with “no CI wiring yet”, but backend/AGENTS.md requires new test files to be added to backend/test.sh; I also checked backend/test.sh and workflow YAML with rg and found no testing/e2e invocation. As a result these 50+ hermetic scenarios can silently rot and won’t catch regressions unless a developer remembers to run the separate script manually.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d726e5bbd1

def _receive_until(websocket, predicate, *, limit=20):
    for _ in range(limit):
        message = websocket.receive()

P2 Bound websocket receives with a timeout

If the listen websocket regresses by keeping the connection open but not emitting the expected service_status or segment payload, this blocking receive() never reaches the limit check, so bash backend/testing/e2e/run.sh -k listen_stt can hang instead of failing. Add a per-receive timeout (or run the receive under a bounded wait) so missing websocket events produce a deterministic test failure.

Comment on lines +71 to +73
self.metadata = None
self.cache_control = None
self.content_type = None

P2 Persist fake blob metadata across blob instances

Because metadata, cache_control, and content_type are stored only on each transient FakeBlob, production paths that upload with metadata and later call bucket.blob(...).reload().metadata on a fresh object lose that state under the fake. For example utils.other.storage.get_or_create_merged_audio() relies on cached blob metadata to detect valid cached audio, so e2e coverage for that path would rebuild or skip expiry behavior that production exercises; keep per-bucket/name blob state or a sidecar metadata file with the bytes.
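
The sidecar-file option could look roughly like the sketch below. It is not the PR's `FakeBlob`: the constructor arguments, the `.meta.json` suffix, and the helper `_save_meta` are assumptions, and only the attribute names `metadata`, `cache_control`, and `content_type` come from the diff.

```python
import json
from pathlib import Path


class FakeBlob:
    """Hypothetical filesystem-backed blob; metadata lives in a JSON sidecar next to the bytes."""

    def __init__(self, root, name):
        self.name = name
        self._path = Path(root) / name
        self._meta_path = Path(root) / f"{name}.meta.json"
        self.metadata = None
        self.cache_control = None
        self.content_type = None

    def _save_meta(self):
        state = {
            "metadata": self.metadata,
            "cache_control": self.cache_control,
            "content_type": self.content_type,
        }
        self._meta_path.write_text(json.dumps(state), encoding="utf-8")

    def upload_from_string(self, data, content_type=None):
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_bytes(data if isinstance(data, bytes) else data.encode("utf-8"))
        if content_type is not None:
            self.content_type = content_type
        self._save_meta()  # persist attributes set on this instance before the upload

    def reload(self):
        """Refresh attributes from the sidecar so a fresh instance sees stored state."""
        if self._meta_path.exists():
            state = json.loads(self._meta_path.read_text(encoding="utf-8"))
            self.metadata = state["metadata"]
            self.cache_control = state["cache_control"]
            self.content_type = state["content_type"]
```

A second `FakeBlob` created for the same name then recovers the metadata after `reload()`, which is the behaviour a cache-validity check on blob metadata would depend on.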

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee71643b4a

@@ -0,0 +1,4 @@
fake-firestore==0.13.1
fakeredis==2.36.2

P2 Install fakeredis with Lua scripting

In a clean e2e install that follows this requirements file, fakeredis is installed without its lua extra, so EVAL/EVALSHA support needed by the backend's registered Lua scripts is absent. Fresh evidence after the earlier Redis delegation fix is that this new tracked requirements file still installs core fakeredis only; routes that hit database/redis_db.py rate-limit scripts or webhook health scripts can then fail open/fallback instead of exercising the fake Redis script behavior, letting the harness report green while that Redis logic remains untested. Use fakeredis[lua] here or add the equivalent lupa dependency.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b64da7fcf

count = 1
r.hset(key, 'failure_count', str(count))

disabled = count >= _DEV_FAILURE_THRESHOLD

P2 Return true only on the disable transition

When this non-Lua fallback is used in a Redis-compatible store without scripting, concurrent webhook failures can both pass the earlier disabled check, receive HINCRBY counts at or above the threshold, and return True; each caller then runs the auto-disable path and sends the developer notification. The Lua path serializes the disabled check/set, so this fallback is not equivalent under multi-worker traffic right as a webhook crosses the threshold.
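
One way to make a scriptless fallback report the transition exactly once is to let an atomic set-if-absent decide the winner: `HINCRBY` bumps the counter, and `HSETNX` on a `disabled` field succeeds for only one caller. The sketch below is an assumption about how that could look, not the backend's code: `record_failure`, the field names, the threshold value, and the `InMemoryHashes` stand-in are hypothetical. The client is only expected to offer `hincrby` and `hsetnx` with redis-py's argument order.

```python
import threading

_DEV_FAILURE_THRESHOLD = 3  # hypothetical value for illustration


def record_failure(r, key):
    """Return True only for the caller that flips the webhook to disabled."""
    count = r.hincrby(key, "failure_count", 1)
    if count < _DEV_FAILURE_THRESHOLD:
        return False
    # HSETNX writes only if the field is absent, so exactly one caller wins
    # even when several callers observe a count at or above the threshold.
    return bool(r.hsetnx(key, "disabled", "1"))


class InMemoryHashes:
    """Tiny stand-in offering the two hash commands used above."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def hincrby(self, name, key, amount=1):
        with self._lock:
            fields = self._data.setdefault(name, {})
            fields[key] = int(fields.get(key, 0)) + amount
            return fields[key]

    def hsetnx(self, name, key, value):
        with self._lock:
            fields = self._data.setdefault(name, {})
            if key in fields:
                return 0
            fields[key] = value
            return 1
```

Only the third failure reports the transition in this sketch; later failures still increment the counter but return False, so the auto-disable path and its notification would run once.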

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5be32d8d26

echo "ERROR: e2e pytest exceeded timeout ${PYTEST_TIMEOUT}"
fi
else
python -m pytest testing/e2e/ "$@"

P2 Preserve the e2e timeout when timeout is unavailable

In environments where command -v timeout is false, this fallback runs pytest without any bound, so the advertised E2E_PYTEST_TIMEOUT protection is silently disabled and websocket/provider regressions can hang the one-command harness indefinitely. Since the script already documents that those regressions should fail instead of hanging, make the fallback enforce the same timeout or fail with a clear prerequisite error.
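
Since Python is already required to run pytest, one possible fallback is a small Python wrapper that enforces the same bound when the `timeout` binary is missing. This is only a sketch of that idea; `run_with_timeout` is a hypothetical helper, and returning 124 on expiry is a choice made here to mirror GNU `timeout`.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd; return its exit code, or 124 if it exceeds the timeout."""
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        print(f"ERROR: e2e pytest exceeded timeout {timeout_seconds}s", file=sys.stderr)
        return 124
```

A caller could pass something like `[sys.executable, "-m", "pytest", "testing/e2e/", *extra_args]` and exit with the returned code. Note that only the direct child is killed on expiry; processes it spawned may need separate handling.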

@kodjima33 kodjima33 left a comment

Collaborator

Backend e2e test harness — useful infra; approve only (feature/infra, Nik owns backend test-suite direction).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7974ff5f5f

hermetic-e2e:
  name: Hermetic Backend E2E
  runs-on: ubuntu-latest
  continue-on-error: true

P2 Remove the e2e job's continue-on-error

For backend PRs where bash backend/testing/e2e/run.sh fails or times out, this job is still treated as non-blocking by GitHub Actions, so the workflow can remain green and required checks won't prevent the regression from merging. That defeats the new CI coverage for the hermetic route scenarios; remove continue-on-error or restrict it to an explicitly experimental/manual path.

3 participants