feat: add hermetic backend e2e harness#8004
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 713a9e1764
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| echo "Checking/installing fake dependencies..." | ||
| install_if_missing fake_firestore fake-firestore |
There was a problem hiding this comment.
Track e2e dependencies instead of pip-installing them
In a clean checkout or any CI/offline developer environment where PyPI is blocked, bash backend/testing/e2e/run.sh exits before pytest because these required packages are not declared in backend/requirements.txt or another tracked lockfile and the script tries to fetch them dynamically. I checked repo-wide references with rg; the new harness is the only place these packages are named, so the advertised one-command harness is not reproducible without network access.
Useful? React with 👍 / 👎.
| original_init(self, *args, **kwargs) | ||
| fake = get_fake_redis() | ||
| # Delegate all key methods to the fake | ||
| for attr in ( |
There was a problem hiding this comment.
Delegate Redis scripts to fakeredis
Rate-limited endpoints in these e2e tests do not actually exercise fakeredis because database/redis_db.py creates _RATE_LIMIT_LUA = r.register_script(...) at import time, but this delegation list never forwards register_script/evalsha/script_load to the fake client. In contexts like /v3/memories, the script remains bound to the real redis.Redis client and failures are swallowed by the fail-open path, so the harness can pass while Redis behavior is not being tested.
Useful? React with 👍 / 👎.
Greptile SummaryAdds a hermetic backend e2e test harness under
Confidence Score: 4/5Safe to merge as a manually-run local harness; the defects are in the test infrastructure itself and do not touch production code. The network guard's
Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Shell as run.sh
participant Pytest as pytest
participant Conf as conftest.py
participant Fakes as fakes/*
participant App as FastAPI app (main.py)
participant DB as MockFirestore / FakeRedis
Shell->>Pytest: python -m pytest testing/e2e/
Pytest->>Conf: import conftest.py
Conf->>Conf: patch google.auth.default
Conf->>Conf: _set_e2e_env() — overwrite credentials
Conf->>Conf: install socket guard (connect/getaddrinfo)
Conf->>Fakes: setup_fake_firestore() → MockFirestore
Conf->>Fakes: setup_fake_redis() → FakeRedis
Conf->>Fakes: patch_google_firestore() — delegate collection/batch/get_all
Conf->>Fakes: patch_redis_client() — delegate scalar ops
Conf->>App: import main → FastAPI app constructed
App-->>Conf: app cached in _app_cache
loop each test
Pytest->>Conf: client fixture → TestClient(app)
Pytest->>Fakes: seed_conversation / seed_memory / seed_action_item
Fakes->>DB: MockFirestore.collection().document().set()
Pytest->>App: TestClient.get / post / patch / delete
App->>DB: Firestore query / Redis op (via patched clients)
DB-->>App: in-memory result
App-->>Pytest: HTTP response
Pytest->>Pytest: assert status / body
end
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Shell as run.sh
participant Pytest as pytest
participant Conf as conftest.py
participant Fakes as fakes/*
participant App as FastAPI app (main.py)
participant DB as MockFirestore / FakeRedis
Shell->>Pytest: python -m pytest testing/e2e/
Pytest->>Conf: import conftest.py
Conf->>Conf: patch google.auth.default
Conf->>Conf: _set_e2e_env() — overwrite credentials
Conf->>Conf: install socket guard (connect/getaddrinfo)
Conf->>Fakes: setup_fake_firestore() → MockFirestore
Conf->>Fakes: setup_fake_redis() → FakeRedis
Conf->>Fakes: patch_google_firestore() — delegate collection/batch/get_all
Conf->>Fakes: patch_redis_client() — delegate scalar ops
Conf->>App: import main → FastAPI app constructed
App-->>Conf: app cached in _app_cache
loop each test
Pytest->>Conf: client fixture → TestClient(app)
Pytest->>Fakes: seed_conversation / seed_memory / seed_action_item
Fakes->>DB: MockFirestore.collection().document().set()
Pytest->>App: TestClient.get / post / patch / delete
App->>DB: Firestore query / Redis op (via patched clients)
DB-->>App: in-memory result
App-->>Pytest: HTTP response
Pytest->>Pytest: assert status / body
end
|
| """Return the fixed dev-test UID.""" | ||
| return DEV_UID | ||
|
|
There was a problem hiding this comment.
sendto guard only handles the 2-arg form
socket.socket.sendto has two call forms: sendto(data, address) and sendto(data, flags, address). The patched signature _guarded_socket_sendto(self, data, address) only covers the first. If any library in the test process calls the 3-arg form (e.g. sock.sendto(b"payload", 0, ("host", 80))), Python invokes _guarded_socket_sendto(sock, b"payload", 0, ("host", 80)) — four positional arguments against a three-parameter function — raising an unhandled TypeError instead of the expected AssertionError. The guard would crash the test with an opaque error rather than surfacing a blocked network call.
|
|
||
|
|
||
| # ─── Utility fixtures ────────────────────────────────────────────────── | ||
|
|
||
|
|
||
| @pytest.fixture() | ||
| def fresh_uid(): | ||
| """Generate a unique UID per test for isolation.""" | ||
| import uuid | ||
|
|
||
| return str(uuid.uuid4()) | ||
|
|
||
|
|
||
| def _now_iso() -> str: | ||
| return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") | ||
|
|
||
|
|
||
| @pytest.fixture() |
There was a problem hiding this comment.
Session-scoped Firestore store + hardcoded UID "123" accumulates state across tests
fake_firestore is session-scoped, so the same in-memory store is shared by every test in the session. Multiple test classes (TestConversationCRUD, TestConversationProcessing, TestFailureModes, TestMigrationSafety) each seed_conversation("123", ...) with fixed document IDs such as "fail-persist-001", "discard-test-active", "migration-idempotent-001", etc. Tests that assert on list responses (e.g., test_discarded_conversation_filtered, test_mixed_format_coexistence) see all documents seeded by previously-run tests, making the assertions dependent on execution order. A test that passes in isolation can fail when run after another test that left conflicting data in the shared store.
|
Follow-up pushed in Changes made:
Dogfood after the follow-up: GitHub Lint & Format Check passed on the new commit. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 97186a6553
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| def _guarded_getaddrinfo(host, port, *args, **kwargs): | ||
| if host is not None and host not in _ALLOWED_NETWORK_HOSTS: | ||
| raise AssertionError(f"Hermetic e2e blocked DNS lookup for {host!r}") |
There was a problem hiding this comment.
Make network guard failures uncatchable
When a real outbound call happens inside backend code that intentionally catches Exception (for example Stripe startup validation or vector/persona fallback paths), this AssertionError (and the connect guard above) can be swallowed; with the client fixture also disabling logging, the harness still passes even though a real service call was attempted. To preserve the hermetic guarantee in those contexts, raise a sentinel that broad except Exception handlers do not catch or record the attempt and fail during teardown.
Useful? React with 👍 / 👎.
|
Expanded the hermetic harness across the five requested areas in What changed:
Other harness improvements:
Dogfood: Also verified filtered runner behavior: Independent review of the staged diff passed with no blockers. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fabdc5a0f0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| tc = TestClient(app) | ||
| yield tc |
There was a problem hiding this comment.
Run the FastAPI lifespan in the e2e client
In this fixture the TestClient is yielded without entering it as a context manager. FastAPI/Starlette only drive lifespan handlers when the client is used as a context manager, so the backend startup/shutdown handlers in backend/main.py (startup_event/shutdown_event) are skipped for every e2e request. That means regressions in startup tasks or shutdown cleanup can pass this harness; wrap the yield in with TestClient(app) as tc: so these tests exercise the real app lifecycle.
Useful? React with 👍 / 👎.
| def download_as_bytes(self, *args, **kwargs) -> bytes: | ||
| if not self.exists(): | ||
| raise FileNotFoundError(self.name) |
There was a problem hiding this comment.
Raise GCS NotFound from missing fake blobs
When a backend path tests a missing GCS object through this fake, it will see FileNotFoundError instead of google.cloud.exceptions.NotFound/BlobNotFound. The real storage helpers catch the Google exception in paths like download_syncing_temporal_file, download_playback_artifact, and delete_blob, so missing-object e2e scenarios either fail/500 under the fake or have to avoid those branches even though production handles them. Raise the same NotFound class here.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2181d5aad6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -0,0 +1,171 @@ | |||
| # Hermetic Backend E2E Harness | |||
|
|
|||
| A manually runnable integration test suite that imports the **real omi FastAPI backend** and exercises selected routes against **faked or disabled external dependencies**. It is intended as a local dogfood harness first; there is no CI wiring yet. | |||
There was a problem hiding this comment.
Wire the new e2e suite into CI
This adds a large set of backend test files while explicitly leaving them with “no CI wiring yet”, but backend/AGENTS.md requires new test files to be added to backend/test.sh; I also checked backend/test.sh and workflow YAML with rg and found no testing/e2e invocation. As a result these 50+ hermetic scenarios can silently rot and won’t catch regressions unless a developer remembers to run the separate script manually.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d726e5bbd1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| def _receive_until(websocket, predicate, *, limit=20): | ||
| for _ in range(limit): | ||
| message = websocket.receive() |
There was a problem hiding this comment.
Bound websocket receives with a timeout
If the listen websocket regresses by keeping the connection open but not emitting the expected service_status or segment payload, this blocking receive() never reaches the limit check, so bash backend/testing/e2e/run.sh -k listen_stt can hang instead of failing. Add a per-receive timeout (or run the receive under a bounded wait) so missing websocket events produce a deterministic test failure.
Useful? React with 👍 / 👎.
| self.metadata = None | ||
| self.cache_control = None | ||
| self.content_type = None |
There was a problem hiding this comment.
Persist fake blob metadata across blob instances
Because metadata, cache_control, and content_type are stored only on each transient FakeBlob, production paths that upload with metadata and later call bucket.blob(...).reload().metadata on a fresh object lose that state under the fake. For example utils.other.storage.get_or_create_merged_audio() relies on cached blob metadata to detect valid cached audio, so e2e coverage for that path would rebuild or skip expiry behavior that production exercises; keep per-bucket/name blob state or a sidecar metadata file with the bytes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee71643b4a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -0,0 +1,4 @@ | |||
| fake-firestore==0.13.1 | |||
| fakeredis==2.36.2 | |||
There was a problem hiding this comment.
Install fakeredis with Lua scripting
In a clean e2e install that follows this requirements file, fakeredis is installed without its lua extra, so EVAL/EVALSHA support needed by the backend's registered Lua scripts is absent. Fresh evidence after the earlier Redis delegation fix is that this new tracked requirements file still installs core fakeredis only; routes that hit database/redis_db.py rate-limit scripts or webhook health scripts can then fail open/fallback instead of exercising the fake Redis script behavior, letting the harness report green while that Redis logic remains untested. Use fakeredis[lua] here or add the equivalent lupa dependency.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b64da7fcf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| count = 1 | ||
| r.hset(key, 'failure_count', str(count)) | ||
|
|
||
| disabled = count >= _DEV_FAILURE_THRESHOLD |
There was a problem hiding this comment.
Return true only on the disable transition
When this non-Lua fallback is used in a Redis-compatible store without scripting, concurrent webhook failures can both pass the earlier disabled check, receive HINCRBY counts at or above the threshold, and return True; each caller then runs the auto-disable path and sends the developer notification. The Lua path serializes the disabled check/set, so this fallback is not equivalent under multi-worker traffic right as a webhook crosses the threshold.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5be32d8d26
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| echo "ERROR: e2e pytest exceeded timeout ${PYTEST_TIMEOUT}" | ||
| fi | ||
| else | ||
| python -m pytest testing/e2e/ "$@" |
There was a problem hiding this comment.
Preserve the e2e timeout when timeout is unavailable
In environments where command -v timeout is false, this fallback runs pytest without any bound, so the advertised E2E_PYTEST_TIMEOUT protection is silently disabled and websocket/provider regressions can hang the one-command harness indefinitely. Since the script already documents that those regressions should fail instead of hanging, make the fallback enforce the same timeout or fail with a clear prerequisite error.
Useful? React with 👍 / 👎.
kodjima33
left a comment
There was a problem hiding this comment.
Backend e2e test harness — useful infra; approve only (feature/infra, Nik owns backend test-suite direction).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7974ff5f5f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| hermetic-e2e: | ||
| name: Hermetic Backend E2E | ||
| runs-on: ubuntu-latest | ||
| continue-on-error: true |
There was a problem hiding this comment.
Remove the e2e job's continue-on-error
For backend PRs where bash backend/testing/e2e/run.sh fails or times out, this job is still treated as non-blocking by GitHub Actions, so the workflow can remain green and required checks won't prevent the regression from merging. That defeats the new CI coverage for the hermetic route scenarios; remove continue-on-error or restrict it to an explicitly experimental/manual path.
Useful? React with 👍 / 👎.
Summary
Adds a manually runnable hermetic backend e2e harness under
backend/testing/e2e/.The harness imports the real FastAPI backend and exercises selected CRUD, failure-mode, and migration-safety paths against fake/disabled external-service boundaries.
What this covers
TestClientexecutionSERVICE_ACCOUNT_JSON,GOOGLE_APPLICATION_CREDENTIALS, andPINECONE_API_KEYManual dogfood
Ran locally after commit:
Result:
Explicit v1 limitations
The harness intentionally keeps incomplete provider flows as explicit skips rather than silent passes:
run.shcan bootstrap missing Python fake dependencies withpip installbefore pytest imports the network guard; once pytest starts, non-local DNS/socket attempts are blocked.Notes
This is manual-only for now; no CI wiring in this PR.