2 changes: 2 additions & 0 deletions README.md
@@ -217,6 +217,7 @@ Complete documentation is available in [docs/README.md](docs/README.md):
- **[Chrome Extension](docs/06-chrome-extension.md)** - Building, configuring, and publishing the PII Guard extension
- **[Customizing the PII Model](docs/07-customizing-pii-model.md)** - Training a model with your own entity types
- **[Masking Controls & Review](docs/08-masking-controls.md)** - Disable entity types, custom regex, mapping review
- **[Coding Agents (Codex & Claude Code)](docs/09-coding-agents.md)** - Route terminal coding agents through the proxy

**Quick Links:**
- [Installation Guide](docs/01-getting-started.md#quick-installation)
@@ -225,6 +226,7 @@ Complete documentation is available in [docs/README.md](docs/README.md):
- [Build for macOS](docs/03-building-deployment.md#building-for-macos)
- [Build for Linux](docs/03-building-deployment.md#building-for-linux)
- [Masking Controls](docs/08-masking-controls.md) - disable entities, custom regex, review mappings
- [Coding Agents Setup](docs/09-coding-agents.md) - Codex & Claude Code via the proxy

---

173 changes: 173 additions & 0 deletions docs/09-coding-agents.md
@@ -0,0 +1,173 @@
# Chapter 9: Coding Agents (Codex & Claude Code)

Route terminal coding agents — OpenAI **Codex** and Anthropic **Claude Code** — through Kiji Privacy Proxy so that PII in your prompts, files, and tool calls is masked before it reaches the model, and restored in the model's replies. Streaming responses are restored token-by-token, so the agent still feels live.

This chapter focuses on **what to set up on the client side**. For how the proxy itself works, see [Advanced Topics](05-advanced-topics.md#transparent-proxy--mitm).

## How it works

Coding agents talk to their provider over HTTPS. The proxy runs in **transparent (MITM) mode** (port `8081`): it intercepts traffic to known provider hosts, masks PII in the outgoing request, forwards it, then restores PII in the response — buffered or streamed (SSE).

Hosts the proxy intercepts for coding agents:

| Agent | Host(s) | Notes |
|-------|---------|-------|
| Claude Code | `api.anthropic.com` | |
| Codex (API key) | `api.openai.com` | `/v1/responses` and `/v1/chat/completions` |
| Codex (ChatGPT login) | `chatgpt.com` | `/backend-api/codex/responses` |

For **any** agent, two things must be true:

1. **The agent sends its HTTPS traffic through the proxy** — via `HTTP_PROXY` / `HTTPS_PROXY`. The macOS PAC auto-configuration only routes **browsers**; command-line agents must be pointed at the proxy explicitly.
2. **The agent trusts the proxy's CA** — so the MITM TLS handshake is accepted. Each agent reads its trusted CA from a different place (see below).
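
The two conditions above can be checked from the shell before launching an agent. The sketch below is a hypothetical helper (`kiji_preflight` is not shipped with the proxy) and assumes you pass it the CA file path yourself. It only inspects the current shell; it does not contact the proxy.

```bash
# kiji_preflight CA_FILE
# Hypothetical helper: succeeds only if HTTPS_PROXY is set and CA_FILE is readable.
kiji_preflight() {
    _kiji_ca=$1
    _kiji_status=0
    if [ -z "${HTTPS_PROXY:-}" ]; then
        echo "HTTPS_PROXY is not set: the agent would bypass the proxy" >&2
        _kiji_status=1
    fi
    if [ ! -r "$_kiji_ca" ]; then
        echo "CA file is missing or unreadable: $_kiji_ca" >&2
        _kiji_status=1
    fi
    return "$_kiji_status"
}

# Usage (not executed here):
#   kiji_preflight "$HOME/.kiji-proxy/certs/ca.crt" && claude
```

A zero exit status means both preconditions look satisfied in this shell; it does not prove the proxy is actually listening.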

## Prerequisites

- Kiji Privacy Proxy is **running** (desktop app on macOS, or the standalone backend on Linux). See [Getting Started](01-getting-started.md).
- You know the path to the proxy CA certificate:

| Platform | CA certificate path |
|----------|---------------------|
| macOS | `$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt` |
| Linux | `~/.kiji-proxy/certs/ca.crt` |
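
If you share one shell profile across machines, the table above could be folded into a small lookup. This is a sketch only: `kiji_ca_path` is a hypothetical name, and it assumes the default install locations listed in the table.

```bash
# kiji_ca_path [OS_NAME]
# Hypothetical helper: prints the documented default CA path for the platform.
# OS_NAME defaults to the output of `uname -s`.
kiji_ca_path() {
    case "${1:-$(uname -s)}" in
        Darwin)
            printf '%s\n' "$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt" ;;
        Linux)
            printf '%s\n' "$HOME/.kiji-proxy/certs/ca.crt" ;;
        *)
            echo "no documented CA path for this platform" >&2
            return 1 ;;
    esac
}

# Usage (not executed here):
#   export NODE_EXTRA_CA_CERTS="$(kiji_ca_path)"
```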

Throughout this chapter the proxy endpoint is `http://127.0.0.1:8081` (the transparent proxy port). Adjust if you changed `proxy_port`.

## Claude Code

Claude Code runs on Node.js, so it reads extra trusted CAs from `NODE_EXTRA_CA_CERTS` and picks up the proxy from the standard `HTTP_PROXY` / `HTTPS_PROXY` variables.

### Environment variables

```bash
export HTTP_PROXY=http://127.0.0.1:8081
export HTTPS_PROXY=http://127.0.0.1:8081

# macOS
export NODE_EXTRA_CA_CERTS="$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt"
# Linux
export NODE_EXTRA_CA_CERTS="$HOME/.kiji-proxy/certs/ca.crt"
```

Then run `claude` in the same shell. Requests to `api.anthropic.com` now flow through the proxy.
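
Node expects `NODE_EXTRA_CA_CERTS` to name a PEM file and, as far as its documentation describes, only emits a warning when the file is missing or malformed, so a wrong path tends to surface later as a TLS error. A rough sanity check, assuming a hypothetical helper name `kiji_is_pem`, might look like this:

```bash
# kiji_is_pem FILE
# Hypothetical helper: succeeds if FILE is readable and contains a PEM
# certificate header. It does not validate the certificate itself.
kiji_is_pem() {
    [ -r "$1" ] && grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage (not executed here):
#   kiji_is_pem "$NODE_EXTRA_CA_CERTS" || echo "CA file looks wrong" >&2
```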

### Making it persistent

Instead of exporting in every shell, set the variables in Claude Code's settings file so they apply to every session. Add an `env` block to `~/.claude/settings.json`:

```json
{
"env": {
"HTTP_PROXY": "http://127.0.0.1:8081",
"HTTPS_PROXY": "http://127.0.0.1:8081",
"NODE_EXTRA_CA_CERTS": "/Users/you/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt"
}
}
```

(The path may contain spaces — that's fine inside the JSON string. Use the absolute path; `~`/`$HOME` are not expanded here.)
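
Because the settings file does not expand `~` or `$HOME`, one option is to let the shell expand the path and print the block for you to paste. The function below is a sketch with a hypothetical name (`kiji_claude_env_json`); it prints a standalone `env` block and does not merge into an existing `settings.json`. It assumes the path contains no double quotes or backslashes.

```bash
# kiji_claude_env_json CA_PATH [PROXY_URL]
# Hypothetical helper: prints an "env" block with the already-expanded CA path.
# PROXY_URL defaults to the transparent proxy endpoint used in this chapter.
kiji_claude_env_json() {
    _kiji_ca=$1
    _kiji_proxy=${2:-http://127.0.0.1:8081}
    printf '{\n  "env": {\n'
    printf '    "HTTP_PROXY": "%s",\n' "$_kiji_proxy"
    printf '    "HTTPS_PROXY": "%s",\n' "$_kiji_proxy"
    printf '    "NODE_EXTRA_CA_CERTS": "%s"\n' "$_kiji_ca"
    printf '  }\n}\n'
}

# Usage (not executed here):
#   kiji_claude_env_json "$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt"
```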

## Codex

Codex (`codex-cli`) is a **native Rust binary that uses rustls** for TLS, not Node and not the macOS keychain. Two consequences:

- `NODE_EXTRA_CA_CERTS` is a Node concept — but Codex's CA loader happens to honor it as a fallback, so it still works (see below).
- Adding the CA to the macOS **System keychain alone is not enough**, because rustls uses its own root store. You must point Codex at the CA **file** via an environment variable.

### Environment variables

```bash
export HTTP_PROXY=http://127.0.0.1:8081
export HTTPS_PROXY=http://127.0.0.1:8081

# macOS
export CODEX_CA_CERTIFICATE="$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt"
# Linux
export CODEX_CA_CERTIFICATE="$HOME/.kiji-proxy/certs/ca.crt"
```

Then run `codex`. `CODEX_CA_CERTIFICATE` is Codex's native variable. If it is unset, Codex falls back — in order — to these standard CA-bundle variables, so any of them works too:

```
CODEX_CA_CERTIFICATE → SSL_CERT_FILE → REQUESTS_CA_BUNDLE → CURL_CA_BUNDLE
→ NODE_EXTRA_CA_CERTS → GIT_SSL_CAINFO → BUNDLE_SSL_CA_CERT
```

This means if you already export `NODE_EXTRA_CA_CERTS` globally for Claude Code, Codex will pick up the same CA automatically — but setting `CODEX_CA_CERTIFICATE` explicitly is clearest.
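
To see which variable would win under the fallback order listed above, you can walk the same list in the shell. This is only a sketch: `codex_ca_source` is a hypothetical helper, it mirrors the documented order rather than calling into Codex, and it treats an empty value the same as unset, which is an assumption.

```bash
# codex_ca_source
# Hypothetical helper: prints NAME=VALUE for the first non-empty variable in
# the documented fallback order, or fails if none is set.
codex_ca_source() {
    for _kiji_name in CODEX_CA_CERTIFICATE SSL_CERT_FILE REQUESTS_CA_BUNDLE \
        CURL_CA_BUNDLE NODE_EXTRA_CA_CERTS GIT_SSL_CAINFO BUNDLE_SSL_CA_CERT
    do
        # Indirect lookup; the names come from the fixed list above.
        eval "_kiji_value=\${$_kiji_name:-}"
        if [ -n "$_kiji_value" ]; then
            printf '%s=%s\n' "$_kiji_name" "$_kiji_value"
            return 0
        fi
    done
    return 1
}

# Usage (not executed here):
#   codex_ca_source || echo "no CA variable set for Codex" >&2
```

Keep in mind that the agent only sees variables that are exported, whereas this function also sees unexported shell variables.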

### API-key vs ChatGPT-login Codex

- **API-key Codex** (`OPENAI_API_KEY` set) talks to `api.openai.com`. Your API key is forwarded untouched.
- **ChatGPT-login Codex** (signed in with `codex login`) talks to `chatgpt.com/backend-api/codex/responses` with an OAuth bearer token. The proxy leaves the `Authorization` header untouched and only masks/restores content, so your session keeps working.

Both are intercepted with the same setup above; no extra configuration is needed to switch between them.

## A shared snippet for both agents

Drop this in your shell profile (`~/.zshrc` / `~/.bashrc`) to cover both agents at once on macOS:

```bash
KIJI_CA="$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt"
export HTTP_PROXY=http://127.0.0.1:8081
export HTTPS_PROXY=http://127.0.0.1:8081
export NODE_EXTRA_CA_CERTS="$KIJI_CA" # Claude Code (and Codex fallback)
export CODEX_CA_CERTIFICATE="$KIJI_CA" # Codex (explicit)
```
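
Exporting `HTTP_PROXY` / `HTTPS_PROXY` from your profile affects every command-line tool that honors those variables, not just the agents. If you would rather opt in per shell, a pair of toggle functions is one possible variation. The names `kiji_on` and `kiji_off` are hypothetical, and the CA path assumes the macOS default.

```bash
KIJI_CA="$HOME/Library/Application Support/Kiji Privacy Proxy/certs/ca.crt"

# kiji_on: route this shell's agents through the proxy and trust its CA.
kiji_on() {
    export HTTP_PROXY=http://127.0.0.1:8081
    export HTTPS_PROXY=http://127.0.0.1:8081
    export NODE_EXTRA_CA_CERTS="$KIJI_CA"    # Claude Code
    export CODEX_CA_CERTIFICATE="$KIJI_CA"   # Codex
}

# kiji_off: restore direct connections for this shell.
kiji_off() {
    unset HTTP_PROXY HTTPS_PROXY NODE_EXTRA_CA_CERTS CODEX_CA_CERTIFICATE
}
```

Run `kiji_on` before starting an agent and `kiji_off` afterwards. Note that `kiji_off` also clears any value these variables had before `kiji_on` was called.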

## What gets masked and restored

| Direction | Covered |
|-----------|---------|
| Request → model | Chat `messages`; Responses-API `input` (string or message/part arrays), `instructions` (system prompt), tool-result `output`, and tool-call `arguments` |
| Model → response | Assistant text and tool-call `arguments`, for both **streaming** (SSE) and **buffered** replies |

Every interception is recorded in the proxy's request log (visible in the desktop app), with the masked text the model actually saw and the restored text the agent received. See [Masking Controls & Review](08-masking-controls.md) to tune what gets masked and to review or delete recorded mappings.

## Verifying interception

1. **Check the proxy log.** Run a prompt in the agent, then open the desktop app's request log (or the standalone audit log). You should see an entry for the provider host with masked/restored content.
2. **Capture the raw stream (debugging).** Set `KIJI_SSE_CAPTURE_DIR` before starting the proxy to mirror each upstream SSE stream to a file — useful for confirming exactly what the agent received:

```bash
mkdir -p /tmp/agent-sse
KIJI_SSE_CAPTURE_DIR=/tmp/agent-sse <start the proxy>
# run one agent request, then inspect:
grep -o '"type":"[^"]*"' /tmp/agent-sse/sse-*.log | sort -u
```

Leave `KIJI_SSE_CAPTURE_DIR` unset in normal use; capture is off by default.
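
When several capture files exist, a per-type count can be easier to read than a unique list. The sketch below assumes the `sse-*.log` naming used in the command above and wraps it in a hypothetical function, `sse_event_counts`. It relies on `grep -o` and `-h`, which GNU and BSD grep both provide.

```bash
# sse_event_counts DIR
# Hypothetical helper: counts each "type" value across all captured streams,
# most frequent first. -h drops file names so counts combine across files.
sse_event_counts() {
    grep -oh '"type":"[^"]*"' "$1"/sse-*.log | sort | uniq -c | sort -rn
}

# Usage (not executed here):
#   sse_event_counts /tmp/agent-sse
```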

## Troubleshooting

**TLS / certificate error (`unable to get local issuer certificate`, `invalid peer certificate`, handshake refused)**
- *Cause:* the agent doesn't trust the proxy CA.
- *Fix:* confirm the CA variable points at a file that exists and is readable. For Codex, use `CODEX_CA_CERTIFICATE` (a **file path**, not a directory); the macOS keychain alone won't satisfy rustls. For Claude Code, use `NODE_EXTRA_CA_CERTS`. Quote paths that contain spaces.

**Traffic isn't being intercepted (no log entries)**
- *Cause:* the agent isn't using the proxy.
- *Fix:* ensure `HTTP_PROXY` and `HTTPS_PROXY` are exported in the **same shell/process** that runs the agent. Check that `NO_PROXY`/`no_proxy` doesn't list `openai.com`, `chatgpt.com`, or `anthropic.com`. Confirm the proxy is listening on the port you set.
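
The `NO_PROXY` check can be approximated in the shell. This is a simplified sketch under stated assumptions: `no_proxy_matches` is a hypothetical helper, it handles only comma-separated suffix entries and a bare `*`, and each tool implements its own matching rules, so treat a match as a hint rather than proof.

```bash
# no_proxy_matches HOST
# Hypothetical helper: succeeds if NO_PROXY (or no_proxy) appears to exclude
# HOST. Runs in a subshell so IFS and globbing changes do not leak out.
no_proxy_matches() (
    host=$1
    set -f      # keep a literal "*" entry from being glob-expanded
    IFS=','
    for entry in ${NO_PROXY:-${no_proxy:-}}; do
        entry=$(printf '%s' "$entry" | tr -d ' ')
        entry=${entry#.}    # treat ".openai.com" and "openai.com" alike
        [ -n "$entry" ] || continue
        [ "$entry" = "*" ] && exit 0
        case "$host" in
            "$entry" | *."$entry") exit 0 ;;
        esac
    done
    exit 1
)

# Usage (not executed here):
#   for h in api.openai.com chatgpt.com api.anthropic.com; do
#       no_proxy_matches "$h" && echo "$h bypasses the proxy"
#   done
```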

**ChatGPT-login Codex still fails after setup**
- *Cause:* the proxy build doesn't intercept `chatgpt.com`, or the CA isn't trusted on that host.
- *Fix:* verify `chatgpt.com` is in the proxy's intercept domains (it is added automatically when the OpenAI provider is configured) and that `CODEX_CA_CERTIFICATE` is set. As a last resort for diagnosis you can test with API-key Codex against `api.openai.com` to isolate whether the issue is host-specific.

**Streaming feels stuck or arrives all at once**
- *Cause:* a buffering layer between the agent and the proxy.
- *Fix:* the proxy forwards SSE with chunked transfer encoding and flushes after every event, so it should not add buffering itself. Make sure no additional proxy sits between the agent and Kiji, and that you point the agent directly at `127.0.0.1`.

## Alternative: forward proxy without CA trust

If you'd rather not install/trust the CA, agents that let you override the base URL can use the **forward proxy** (port `8080`) instead. The client talks plain HTTP to the proxy, which makes the upstream TLS connection itself — so no client-side CA trust is needed.

```bash
# Claude Code → forward proxy (no CA needed)
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
```

This works for API-key clients with a configurable endpoint. It does **not** work for **ChatGPT-login Codex**, whose `chatgpt.com` endpoint is fixed — that path requires the transparent proxy + CA trust described above.

## See also

- [Getting Started](01-getting-started.md) — installing the proxy and CA certificate
- [Advanced Topics](05-advanced-topics.md#transparent-proxy--mitm) — MITM architecture, CA management, CORS
- [Masking Controls & Review](08-masking-controls.md) — what gets masked, reviewing mappings
22 changes: 22 additions & 0 deletions docs/README.md
@@ -146,6 +146,22 @@ Control what gets masked and review what already has been, from the desktop app.

---

### [Chapter 9: Coding Agents (Codex & Claude Code)](09-coding-agents.md)

Route terminal coding agents through the proxy so PII in prompts, code, and tool calls is masked before it reaches the model and restored in replies.

**Topics:**
- How agents are intercepted (hosts, masking, streaming restore)
- Claude Code setup (`HTTP_PROXY`/`HTTPS_PROXY`, `NODE_EXTRA_CA_CERTS`, `settings.json`)
- Codex setup (rustls CA trust via `CODEX_CA_CERTIFICATE`, API-key vs ChatGPT-login)
- A shared shell snippet for both agents
- Verifying interception and troubleshooting TLS/proxy issues
- Forward-proxy alternative without CA trust

**Start here if you're:** Using OpenAI Codex or Claude Code and want their traffic masked by Kiji.

---

## Quick Links

### Getting Started
@@ -187,6 +203,12 @@ Control what gets masked and review what already has been, from the desktop app.
- [Custom Regex Patterns](08-masking-controls.md#custom-regex-patterns)
- [Review & Delete Mappings](08-masking-controls.md#reviewing-and-deleting-masked-entities)

### Coding Agents
- [Claude Code Setup](09-coding-agents.md#claude-code)
- [Codex Setup](09-coding-agents.md#codex)
- [Shared Shell Snippet](09-coding-agents.md#a-shared-snippet-for-both-agents)
- [Troubleshooting](09-coding-agents.md#troubleshooting)

## Document Status

These documents consolidate and supersede the following original files:
8 changes: 7 additions & 1 deletion src/backend/config/config.go
@@ -317,13 +317,19 @@ func DefaultConfig() *Config {

 // GetInterceptDomains returns the list of intercept domains (as a union of all provider domains)
 func (pc ProvidersConfig) GetInterceptDomains() []string {
-    return []string{
+    domains := []string{
         interceptDomain(pc.AnthropicProviderConfig.APIDomain),
         interceptDomain(pc.OpenAIProviderConfig.APIDomain),
         interceptDomain(pc.GeminiProviderConfig.APIDomain),
         interceptDomain(pc.MistralProviderConfig.APIDomain),
         interceptDomain(pc.CustomProviderConfig.APIDomain),
     }
+    // ChatGPT-login Codex talks to chatgpt.com instead of the configured OpenAI
+    // API domain, so it must be intercepted explicitly whenever OpenAI is enabled.
+    if pc.OpenAIProviderConfig.APIDomain != "" {
+        domains = append(domains, "chatgpt.com")
+    }
+    return domains
 }

 func interceptDomain(apiDomain string) string {
60 changes: 48 additions & 12 deletions src/backend/providers/anthropic.go
@@ -55,6 +55,20 @@ func (p *AnthropicProvider) ExtractRequestText(data map[string]interface{}) (str
         }
         if content, ok := msgMap["content"].(string); ok {
             result.WriteString(content + "\n")
+        } else if blocks, ok := msgMap["content"].([]interface{}); ok {
+            // Messages API content-block array: collect text from text blocks.
+            for _, blk := range blocks {
+                blkMap, ok := blk.(map[string]interface{})
+                if !ok {
+                    continue
+                }
+                if t, _ := blkMap["type"].(string); t != "text" {
+                    continue
+                }
+                if text, ok := blkMap["text"].(string); ok {
+                    result.WriteString(text + "\n")
+                }
+            }
         }
     }
     return result.String(), nil
@@ -91,24 +105,46 @@ func (p *AnthropicProvider) CreateMaskedRequest(maskedRequest map[string]interfa
         return maskedToOriginal, &entities, fmt.Errorf("no messages field in request")
     }

+    // mask runs PII detection over a single piece of text and merges the
+    // resulting entities and mappings into the accumulators above.
+    mask := func(text string) string {
+        maskedText, _maskedToOriginal, _entities := maskPIIInText(text, "[MaskedRequest]")
+        entities = append(entities, _entities...)
+        for k, v := range _maskedToOriginal {
+            maskedToOriginal[k] = v
+        }
+        return maskedText
+    }
+
     for _, msg := range messages {
         msgMap, ok := msg.(map[string]interface{})
         if !ok {
             continue
         }
-        content, ok := msgMap["content"].(string)
-        if !ok {
-            continue
-        }

-        // Mask PII in this message's content and update message content with masked text
-        maskedText, _maskedToOriginal, _entities := maskPIIInText(content, "[MaskedRequest]")
-        msgMap["content"] = maskedText
-
-        // Collect entities and mappings
-        entities = append(entities, _entities...)
-        for k, v := range _maskedToOriginal {
-            maskedToOriginal[k] = v
+        // The Messages API allows `content` to be either a plain string or an
+        // array of typed content blocks (Claude Code always uses the latter).
+        // Handle both so PII is masked in either shape.
+        switch content := msgMap["content"].(type) {
+        case string:
+            msgMap["content"] = mask(content)
+        case []interface{}:
+            for _, blk := range content {
+                blkMap, ok := blk.(map[string]interface{})
+                if !ok {
+                    continue
+                }
+                // Only text blocks carry free text; skip image / tool_use /
+                // tool_result blocks (different/nested shapes).
+                if t, _ := blkMap["type"].(string); t != "text" {
+                    continue
+                }
+                text, ok := blkMap["text"].(string)
+                if !ok {
+                    continue
+                }
+                blkMap["text"] = mask(text)
+            }
         }
     }

31 changes: 26 additions & 5 deletions src/backend/providers/openai.go
@@ -11,11 +11,16 @@ import (
 )

 const (
-    ProviderTypeOpenAI ProviderType = "openai"
-    ProviderSubpathOpenAI string = "/v1/chat/completions"
-    ProviderSubpathOpenAIResp string = "/v1/responses"
-    ProviderAPIDomainOpenAI string = "api.openai.com"
-    ProviderNameOpenAI string = "OpenAI"
+    ProviderTypeOpenAI ProviderType = "openai"
+    ProviderSubpathOpenAI string = "/v1/chat/completions"
+    ProviderSubpathOpenAIResp string = "/v1/responses"
+    ProviderAPIDomainOpenAI string = "api.openai.com"
+    ProviderNameOpenAI string = "OpenAI"
+    // ProviderAPIDomainCodex is the host used by ChatGPT-login Codex (the OpenAI
+    // CLI). It hits chatgpt.com/backend-api/codex/responses with an OAuth bearer
+    // token instead of api.openai.com, so it must be routed to and intercepted by
+    // the OpenAI provider alongside the API-key host.
+    ProviderAPIDomainCodex string = "chatgpt.com"
 )

 // reasoningModelFamilies lists OpenAI model family prefixes that require the
@@ -548,6 +553,22 @@ func restoreResponsesAPIResponse(maskedResponse map[string]interface{}, maskedTo
         if !ok {
             continue
         }
+
+        // function_call output items carry model-generated tool arguments as a
+        // JSON string, which can echo masked PII just like assistant text. Restore
+        // it here so buffered responses match the streaming codec's behavior.
+        if args, ok := itemMap["arguments"].(string); ok {
+            restoredArgs := restorePII(args, maskedToOriginal)
+            if restoredArgs != args && getLogResponses() {
+                log.Printf("PII restored in response tool-call arguments")
+                if getLogVerbose() {
+                    log.Printf("Original tool-call arguments: %s", args)
+                    log.Printf("Restored tool-call arguments: %s", restoredArgs)
+                }
+            }
+            itemMap["arguments"] = restoredArgs
+        }
+
         contents, ok := itemMap["content"].([]interface{})
         if !ok {
             continue
3 changes: 2 additions & 1 deletion src/backend/providers/provider.go
@@ -175,7 +175,8 @@ func (p *Providers) GetProviderFromHost(host string, logPrefix string) (*Provide
     }

     switch {
-    case p.OpenAIProvider != nil && providerHostMatches(host, p.OpenAIProvider.apiDomain):
+    case p.OpenAIProvider != nil && (providerHostMatches(host, p.OpenAIProvider.apiDomain) || host == ProviderAPIDomainCodex):
+        // chatgpt.com is the ChatGPT-login Codex host; route it to OpenAI too.
         provider = p.OpenAIProvider
     case p.AnthropicProvider != nil && providerHostMatches(host, p.AnthropicProvider.apiDomain):
         provider = p.AnthropicProvider