2 changes: 1 addition & 1 deletion AGENTS.md
@@ -46,7 +46,7 @@ If you hit something requiring GitHub App / PAT / OAuth that the runtime cannot
<claude-mem-context>
# Memory Context

-# [codeiq] recent context, 2026-04-28 1:14am UTC
+# [codeiq] recent context, 2026-04-28 6:43am UTC

No previous sessions found.
</claude-mem-context>
140 changes: 140 additions & 0 deletions CHANGELOG.md
@@ -225,6 +225,146 @@ for that specific tag for the per-commit details.
path-B board ruling, they are not to be re-introduced without an explicit
board reversal — see `shared/runbooks/engineering-standards.md` §5.1.

### Security

- **Production-readiness PR 1 of 5 — security baseline.** First half of the
audit findings catalogued under `docs/audits/2026-04-28-serve-path-prod-readiness.md`
(+ `-counter.md`). Closes audit findings #1, #7, #13 (HIGH/MEDIUM) and C2 (MEDIUM).
- **Bearer-token auth on `/api/**` and `/mcp/**`** (audit #1). Added
`spring-boot-starter-security`. New `config/security/SecurityConfig`,
`BearerAuthFilter`, `TokenResolver`. Token source priority:
`CODEIQ_MCP_TOKEN` env > `codeiq.mcp.auth.token` config > startup failure.
Constant-time compare via SHA-256 pre-hash + `MessageDigest.isEqual` —
32-byte digests on both sides defeat the length oracle. RFC 7235 §2.1
case-insensitive scheme matching (`Bearer`, `bearer`, etc.). Authorization
header value never reaches a logger from this code. Permit list:
`/`, `/index.html`, `/favicon.ico`, `/assets/**`, `/static/**`, `/error`,
`/actuator/health/{liveness,readiness}` — everything else under
`/api/**`, `/mcp/**`, `/actuator/**` requires the bearer token.
- **Fail-fast on misconfiguration** (audit #14 partial). `mode=bearer` with
no token resolved → throws at startup. `mode=none` with active `serving`
profile and `allow_unauthenticated` not explicitly set → throws at
startup. `mode=mtls` is reserved and explicitly throws "not yet
implemented" rather than silently passing through.
- **Defensive response headers** (audit #13). New
`config/security/SecurityHeadersFilter` sets `X-Content-Type-Options:
nosniff`, `X-Frame-Options: DENY`, `Content-Security-Policy: default-src
'self'; ... frame-ancestors 'none'`, `Referrer-Policy: no-referrer`,
`Permissions-Policy` disabling geolocation/camera/microphone.
`Strict-Transport-Security: max-age=31536000; includeSubDomains` is set
only when `X-Forwarded-Proto: https` is present (AKS terminates TLS at
ingress) — setting HSTS over plain HTTP would lock out misconfigured envs.
- **Uniform error envelope** (audit #7). New
`api/GlobalExceptionHandler` (`@RestControllerAdvice`,
`@Profile("serving")`) maps every uncaught exception to
`{"code","message","request_id"}` with the right HTTP status.
`IllegalArgumentException` → 400 with surfaced message.
`ResponseStatusException` → status code passes through. Anything else →
500 with generic message; the actual exception is logged at WARN with
the `request_id` so on-call can correlate without leaking stack frames
to the client. `application.yml` now sets
`server.error.include-stacktrace: never` + `include-message: never` +
`include-binding-errors: never` as belt-and-suspenders.
- **Default CORS deny-all in serving** (audit #13). `config/CorsConfig`
default changed from loopback patterns to empty. Empty means register
no mappings → Spring MVC rejects all preflighted cross-origin requests.
Operators who genuinely need cross-origin (e.g. dev with a separate
Vite server on a different port) explicitly set
`codeiq.cors.allowed-origin-patterns`. Logs the resolved state at
startup. The React UI at `/` is unaffected — it's served same-origin.
- **Swagger UI / api-docs disabled in serving** (counter-audit C2).
`springdoc.api-docs.enabled: false` + `springdoc.swagger-ui.enabled: false`
in the serving profile of `application.yml`. The OpenAPI schema is
reconnaissance data; reachable only when running locally or with the
indexing profile.
- **`management.endpoints.web.exposure.include` narrowed** to `health,info`
in serving (was `health,info,metrics`); `health.show-details: never`.
Defense-in-depth alongside the `SecurityFilterChain` `authenticated()`
rule on `/actuator/**`.
- **Spring Security autoconfig excluded outside serving.** Without the
`serving` profile (CLI, tests, IDE runs), Spring Security's default
HTTP Basic chain would lock all endpoints — adding the starter would
break ~3000 existing tests that pass through MockMvc with no token.
`application.yml` excludes `SecurityAutoConfiguration`,
`SecurityFilterAutoConfiguration`, `UserDetailsServiceAutoConfiguration`
at the default level; the `serving` profile re-enables the first two by
narrowing its exclude list to only `UserDetailsServiceAutoConfiguration`
(so the auto user/password is suppressed but the filter chain is built from `SecurityConfig`).
- **Tests:** 31 new unit tests across `BearerAuthFilterTest` (14 cases:
missing/wrong/empty/correct/lowercase scheme, length-oracle defense,
log-leak audit, `shouldNotFilter` paths, `SecurityContextHolder` cleanup),
`TokenResolverTest` (9 cases for mode/profile/env-priority/fail-fast),
`SecurityHeadersFilterTest` (5 cases for header presence/HSTS gating),
`GlobalExceptionHandlerTest` (3 cases verifying the envelope shape and
no stack-trace leak). Full suite: 3453 tests / 0 failures / 0 errors.

**Known follow-up (not in this PR):** the React UI cannot read env vars,
so the SPA shell and its static assets are served without authentication.
API/MCP calls from the UI must inject `Authorization: Bearer <token>`, with
the token read from operator-supplied localStorage. A first-class UI auth bootstrap (login
flow + token-issuance endpoint, OR server-side template injection) is its
own design — tracked as a follow-up issue.
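
The pre-hashed comparison and case-insensitive scheme match described for audit #1 can be sketched roughly as below. This is a minimal illustration, not the project's `BearerAuthFilter`: the class name `BearerTokenCheck` and both method names are hypothetical, and the handling of empty tokens is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not the project's actual filter code. */
final class BearerTokenCheck {

    private static final String PREFIX = "Bearer ";

    private BearerTokenCheck() {}

    /** Returns the token from an Authorization header value, or null if it is not a usable Bearer credential. */
    static String extractToken(String authorizationHeader) {
        if (authorizationHeader == null || authorizationHeader.length() <= PREFIX.length()) {
            return null;
        }
        // Auth schemes are case-insensitive, so "bearer" and "BEARER" are accepted too.
        if (!authorizationHeader.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return null;
        }
        String token = authorizationHeader.substring(PREFIX.length()).trim();
        return token.isEmpty() ? null : token; // assumption: blank tokens are rejected
    }

    /** Compares two tokens by hashing both first, so isEqual always sees two 32-byte arrays. */
    static boolean matches(String provided, String expected) {
        if (provided == null || expected == null) {
            return false;
        }
        return MessageDigest.isEqual(sha256(provided), sha256(expected));
    }

    private static byte[] sha256(String value) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would typically run `extractToken` on the header first and pass the result to `matches` alongside the configured token; neither value needs to be logged at any point.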

- **Production-readiness PR 2 of 5 — resource limits & abuse protection.**
Closes audit findings #2, #3, C1 (HIGH) and #10, #11 (MEDIUM).
- **Cypher transaction timeout** (audit #2). Neo4j embedded
`GraphDatabaseSettings.transaction_timeout = 30s` configured in
`Neo4jConfig` — every transaction in the JVM, including `run_cypher`
and graph traversals, gets a hard wall-clock cap. Catches runaway
variable-length matches before they starve the page cache.
- **Result-set cap on `run_cypher`** (audit #2). Hard row cap at
`mcp.limits.max_results` (default 500); excess rows dropped, response
carries `truncated: true` + `max_results: N`. Defends the JVM heap
against `MATCH (a),(b),(c) RETURN a,b,c LIMIT 999999999` blowups.
- **MCP `traceImpact` depth cap** (audit #10 corrected, C3). New
`mcp.limits.max_depth` field (default 10) wired into
`McpTools.traceImpact` via `Math.min`. Defends against combinatorial
path explosions from `RELATES_TO*1..1000` on hub nodes.
- **TTL snapshot cache on topology tools** (audit C1).
`McpTools.getCachedData()` now backed by a 60-second TTL snapshot. Without it,
every concurrent `service_dependencies` / `blast_radius` /
`find_path` / `find_bottlenecks` / `find_circular_deps` /
`find_dead_services` / `find_node` call paid the full
`graphStore.findAll()` cost and double-allocated multi-GB heaps.
A bridge fix; the proper refactor (TopologyService → per-tool Cypher)
is a tracked follow-up.
- **Per-client rate limiter** (audit #3). New `RateLimitFilter` using
Bucket4j 8.18.0 (Apache-2.0). Token bucket sized at
`mcp.limits.rate_per_minute` (default 300). Keyed by a SHA-256 hash of
the `Authorization` header (so the token never lives in our key map),
falling back to `X-Forwarded-For` (first hop) or `RemoteAddr` when the header is absent. 429
response with `Retry-After`, `X-RateLimit-Limit`, `X-RateLimit-Remaining`
headers. Registered before `BearerAuthFilter` so unauthenticated
brute-force is also throttled.
- **`/api/file` content-type sniff** (audit #11 corrected). Added
`Files.probeContentType` guard: files whose probed type is not text
(e.g. `.jks`, `.so`, `.png`) return HTTP 415 with the probed type, instead
of being served as garbled `text/plain`. Allowlist: `text/*`,
`application/json`, `application/xml`, `application/x-yaml`,
`application/javascript`. The byte cap (already enforced by
`SafeFileReader`) is unchanged.
- **Tomcat slow-client tarpit** (audit #11).
`server.tomcat.connection-timeout: 10s`, `max-swallow-size: 1MB` in the
serving profile, which drops connections that hold a virtual thread +
Tomcat connection at 1 KB/s.
- **CodeQL hardening on the security baseline.** Sanitised request
method + URI before logging in `BearerAuthFilter` (CWE-117 / CodeQL
`java/log-injection`); removed env-var name from the bearer-token
bootstrap log line in `TokenResolver` (CodeQL `java/sensitive-log`);
documented the deliberate stateless-bearer rationale on
`SecurityConfig.csrf(disable)` (CodeQL `java/spring-disabled-csrf-protection`
— no exploit path on a no-cookie surface).
- **Tests:** new `RateLimitFilterTest` (10 cases: under/over limit,
separate buckets per client, header-hashing, X-Forwarded-For
precedence, permit-list, default-rate fallback). Existing 6 test
classes updated for the new `McpTools` ctor signature. Full suite:
3672 tests / 0 failures / 0 errors.

**Known follow-up:** TopologyService still walks the full snapshot
in-memory after the cache hit — long-term plan is to rewrite each
topology tool as a targeted Cypher query so the snapshot isn't needed.
The cache is the bridge; the rewrite reduces peak memory.
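
The loop-level row cap on `run_cypher` (audit #2) might look roughly like the sketch below. It is an illustration under assumptions: `RowCap`, `collect`, and the `rows` response key are hypothetical names; only the `truncated` and `max_results` fields come from the entry above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not the project's McpTools code. */
final class RowCap {

    private RowCap() {}

    /** Drains at most maxResults rows and flags the response when more rows were available. */
    static Map<String, Object> collect(Iterator<Map<String, Object>> rows, int maxResults) {
        List<Map<String, Object>> kept = new ArrayList<>();
        boolean truncated = false;
        while (rows.hasNext()) {
            if (kept.size() >= maxResults) {
                truncated = true; // at least one more row exists; stop without reading it
                break;
            }
            kept.add(rows.next());
        }
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("rows", kept); // "rows" is an assumed key name
        if (truncated) {
            response.put("truncated", true);
            response.put("max_results", maxResults);
        }
        return response;
    }
}
```

Capping in the loop rather than rewriting the query means the user's own `LIMIT`, if any, is left untouched.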

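A TTL snapshot of the kind described for audit C1 could be sketched as follows. This is one possible shape, not the project's `getCachedData()`: the `TtlSnapshot` class, its constructor parameters, and the injected clock are all hypothetical, and the double-checked reload under a lock is an assumed design choice.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical single-value TTL cache for illustration; not the project's implementation. */
final class TtlSnapshot<T> {

    private record Entry<V>(V value, long loadedAtNanos) {}

    private final AtomicReference<Entry<T>> ref = new AtomicReference<>();
    private final Supplier<T> loader;
    private final long ttlNanos;
    private final LongSupplier clock; // injected so tests can control time

    TtlSnapshot(Supplier<T> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    T get() {
        long now = clock.getAsLong();
        Entry<T> current = ref.get();
        if (isFresh(current, now)) {
            return current.value(); // fast path: no lock, no reload
        }
        synchronized (this) {
            // Re-check under the lock so concurrent callers trigger only one reload.
            current = ref.get();
            if (isFresh(current, now)) {
                return current.value();
            }
            Entry<T> fresh = new Entry<>(loader.get(), now);
            ref.set(fresh);
            return fresh.value();
        }
    }

    private boolean isFresh(Entry<T> entry, long now) {
        return entry != null && now - entry.loadedAtNanos() < ttlNanos;
    }
}
```

In production the clock would be `System::nanoTime` and the TTL `Duration.ofSeconds(60).toNanos()`; the loader stands in for the expensive full-graph read.
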
## [0.1.0] - 2026-03-28

First general-availability cut. See the
14 changes: 14 additions & 0 deletions CLAUDE.md
@@ -433,6 +433,20 @@ bean for code paths that haven't been ported yet.
- **Parallel agent conflicts**: Don't dispatch multiple agents editing the same files concurrently. Use worktree isolation or sequential execution.
- **SonarCloud project key**: `RandomCodeSpace_codeiq`, org: `randomcodespace`
- **CI workflow**: Single `ci-java.yml` runs build + SonarCloud analysis. No cross-platform builds needed (JVM).
- **Spring Security only loads in the `serving` profile.** `application.yml` excludes `SecurityAutoConfiguration` + `SecurityFilterAutoConfiguration` + `UserDetailsServiceAutoConfiguration` at the **default** level so adding `spring-boot-starter-security` doesn't break ~3000 MockMvc tests by activating a default HTTP Basic chain. The `serving` profile re-enables the first two by narrowing its exclude list to just `UserDetailsServiceAutoConfiguration` (which suppresses the auto user/password printout); the chain itself is built by `config/security/SecurityConfig`. **Don't** drop the default exclude: non-serving contexts (CLI, tests) must have no Spring Security wiring at all.
- **`BearerAuthFilter.shouldNotFilter` and `SecurityConfig.permitAll()` paths must stay in sync.** The filter runs before Spring's `AuthorizationFilter`, so if a path is in `permitAll()` but NOT in `shouldNotFilter`, the filter rejects it with 401 before Spring's chain can permit it. Open paths today: `/`, `/index.html`, `/favicon.ico`, `/assets/**`, `/static/**`, `/error`, `/actuator/health`, `/actuator/health/liveness`, `/actuator/health/readiness`. Adding any new permit-all endpoint requires updating BOTH places.
- **Constant-time bearer-token compare uses SHA-256 pre-hash.** Both the provided and expected token are hashed with SHA-256 before `MessageDigest.isEqual`. SHA-256 always produces 32-byte digests, so `isEqual` runs over fixed-size arrays — defeats the length oracle that makes raw `isEqual` unsafe across mismatched-length inputs. **Don't** "optimize" by removing the hash and comparing raw token bytes; that re-introduces the oracle.
- **Never log the `Authorization` header.** `BearerAuthFilter` deliberately never passes the header value to a logger, even at DEBUG. The rejection log line carries only `method` and `requestURI`. There's a regression test (`tokenValueNeverAppearsInLogs`) that captures all log lines for the filter and asserts the secret substring is absent.
- **`mode=none` + active `serving` profile = startup failure** unless `codeiq.mcp.auth.allow_unauthenticated=true` is **explicitly** set. This is by design — operators must opt into permissive mode deliberately. `mode=mtls` is reserved and currently throws "not yet implemented" (better than silently passing through).
- **`server.error.include-stacktrace: never`** is set in the serving profile as defense-in-depth alongside `GlobalExceptionHandler`. Don't enable it for "easier debugging" — stack frames in the response body leak class names + paths (CWE-209). Use the `request_id` in the envelope to correlate to the WARN log line where the full stack is captured.
- **Cypher transaction wall-clock cap is configured at the DBMS level**, not per-call. `Neo4jConfig.databaseManagementService(...)` sets `GraphDatabaseSettings.transaction_timeout = 30s` so every transaction gets the cap automatically. Don't reach for the `graphDb.beginTx(timeout, unit)` overload in tool code: the test suite mocks `beginTx()` with no args, and the overload changes the matcher signature, breaking the existing stubs across `McpToolsTest` / `McpToolsExpandedTest` / `McpToolsEvidenceTest`.
- **`McpTools.runCypher` row cap is enforced in the iteration loop, not via `LIMIT`.** After `maxResults` rows are accumulated the loop breaks and the response carries `truncated: true` + `max_results: N`. Don't try to inject `LIMIT N` into the user-supplied query string — that would require parsing the query (and the user's query may already have its own LIMIT).
- **`McpTools.getCachedData()` 60-second TTL snapshot is a bridge fix.** It's NOT the proper solution — the proper solution is to rewrite each topology MCP tool to use a targeted Cypher query so the full graph never needs to live on heap. The cache caps peak memory under concurrent calls but the snapshot itself is still multi-GB on large graphs. When that refactor lands, the `AtomicReference<CachedSnapshot>` and `getCachedData()` itself can be deleted.
- **`RateLimitFilter` keys by `sha256(Authorization)`**: the raw token NEVER goes into the bucket key map. The digest is truncated to 16 hex chars (64 bits), which is enough collision resistance for keying. Falls back to `X-Forwarded-For` (first hop) → `RemoteAddr` when no auth header is present. Buckets live in a `ConcurrentHashMap`, bounded in practice by `num_distinct_clients`, which for the single-tenant pod shape is small. Swap to a Caffeine cache with a max-size eviction if multi-tenant exposure is ever added.
- **Filter chain order in `serving` profile**: `SecurityHeadersFilter` → `RateLimitFilter` → `BearerAuthFilter` → ... → controller. Each `addFilterBefore(X, UsernamePasswordAuthenticationFilter.class)` inserts X immediately before UPAFilter, pushing the previously-inserted filter farther from the target — so the **registration order in `SecurityConfig.servingFilterChain` IS the chain order**. Don't shuffle without re-reasoning about it: if `RateLimitFilter` ran AFTER `BearerAuthFilter`, an unauthenticated brute-force attempt would never get throttled (would just see 401 over and over, hitting the slow path).
- **`Files.probeContentType` is best-effort** — JDK 25 on Linux uses `/etc/mime.types` + magic-byte fallback. It returns `null` if the type can't be determined; treat that as "let it through" (the byte cap in `SafeFileReader` still bounds size). The allowlist for `/api/file` is `text/*` + `application/{json,xml,x-yaml,javascript}` — extending requires adding to the explicit list in `GraphController.readFile`.
- **Sanitize user-controlled values before logging.** `BearerAuthFilter.sanitizeForLog(String)` strips `\p{Cntrl}` and truncates at 256 chars. Use it on anything tainted by `request.getRequestURI()`, `request.getMethod()`, headers, etc. before passing to a logger. CodeQL `java/log-injection` will flag direct `log.warn("... {} ...", request.getRequestURI())` as a vuln.
- **`mcp.limits.max_depth` is a NEW field on `McpLimitsConfig`** (default 10). Audit #10 / C3 — the original audit assumed it existed but it didn't. When adding new MCP traversal tools, cap depth via `Math.min(callerSupplied, maxDepth)` before passing to Cypher. The REST endpoint already had this guard via `config.getMaxDepth()` from `CodeIqConfig`; the MCP path now mirrors it via `McpLimitsConfig.maxDepth()`.
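
The `RateLimitFilter` key derivation noted above could be sketched like this. It is a hedged illustration only: `RateLimitKeys`, `clientKey`, and the `auth:` / `ip:` prefixes are hypothetical, and taking the first 16 hex characters of the digest is an assumption about how the 16-hex-char key is produced.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper for illustration; not the project's RateLimitFilter. */
final class RateLimitKeys {

    private RateLimitKeys() {}

    /** Derives a bucket key without ever storing the raw Authorization value. */
    static String clientKey(String authorization, String xForwardedFor, String remoteAddr) {
        if (authorization != null && !authorization.isBlank()) {
            // Assumption: keep only the first 16 hex chars (64 bits) of the SHA-256 digest.
            return "auth:" + sha256Hex(authorization).substring(0, 16);
        }
        if (xForwardedFor != null && !xForwardedFor.isBlank()) {
            // The first comma-separated entry is the first hop.
            String firstHop = xForwardedFor.split(",", 2)[0].trim();
            if (!firstHop.isEmpty()) {
                return "ip:" + firstHop;
            }
        }
        return "ip:" + remoteAddr;
    }

    private static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The prefixes keep hashed-token keys and address keys from ever colliding in the same map.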

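A log sanitizer along the lines of `BearerAuthFilter.sanitizeForLog(String)` might be written as below. This is a sketch based only on the description above (strip `\p{Cntrl}`, truncate at 256 chars); the `LogSanitizer` class name and the empty-string result for `null` are assumptions.

```java
/** Hypothetical stand-in for illustration; not the project's sanitizeForLog. */
final class LogSanitizer {

    private static final int MAX_LEN = 256;

    private LogSanitizer() {}

    /** Removes control characters (including CR and LF), then truncates to MAX_LEN chars. */
    static String sanitizeForLog(String value) {
        if (value == null) {
            return ""; // assumption: null is logged as an empty string
        }
        String cleaned = value.replaceAll("\\p{Cntrl}", "");
        return cleaned.length() > MAX_LEN ? cleaned.substring(0, MAX_LEN) : cleaned;
    }
}
```

Stripping before truncating means a payload padded with control characters cannot push the visible part of the value past the length cap.
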
## Supply-chain observability (OpenSSF)
