2 changes: 1 addition & 1 deletion DSL/CronManager/DSL/data_resync.yml
@@ -2,4 +2,4 @@ agency_data_resync:
trigger: "0 0 0/1 * * ?"
# trigger: off
type: exec
command: "../app/scripts/agency_data_resync.sh -s 10"
command: "/app/scripts/agency_data_resync.sh -s 10"
2 changes: 1 addition & 1 deletion DSL/CronManager/DSL/delete_from_vault.yml
@@ -2,4 +2,4 @@ delete_secrets:
trigger: off
type: exec
command: "/app/scripts/delete_secrets_from_vault.sh"
allowedEnvs: ['cookie','vaultUuid','llmPlatform', 'llmModel','embeddingModel','embeddingPlatform']
allowedEnvs: ['cookie','vaultUuid','llmPlatform', 'llmModel','embeddingModel','embeddingPlatform', 'vaultAgentUrl']
2 changes: 1 addition & 1 deletion DSL/CronManager/DSL/store_in_vault.yml
@@ -2,4 +2,4 @@ store_secrets:
trigger: off
type: exec
command: "/app/scripts/store_secrets_in_vault.sh"
allowedEnvs: ['cookie','vaultUuid','llmPlatform', 'llmModel','secretKey','accessKey','deploymentName','targetUrl','apiKey','embeddingModel','embeddingPlatform','embeddingAccessKey','embeddingSecretKey','embeddingDeploymentName','embeddingTargetUri','embeddingAzureApiKey','deploymentEnvironment']
allowedEnvs: ['cookie','vaultUuid','llmPlatform', 'llmModel','secretKey','accessKey','deploymentName','targetUrl','apiKey','embeddingModel','embeddingPlatform','embeddingAccessKey','embeddingSecretKey','embeddingDeploymentName','embeddingTargetUri','embeddingAzureApiKey','deploymentEnvironment', 'vaultAgentUrl']
4 changes: 2 additions & 2 deletions DSL/CronManager/script/delete_secrets_from_vault.sh
@@ -6,9 +6,9 @@
set -e # Exit on any error

# Configuration
# Use VAULT_AGENT_URL which points to vault-agent-cron proxy
# Use vaultAgentUrl which points to vault-agent-cron proxy
# The agent automatically injects the authentication token
VAULT_ADDR="${VAULT_AGENT_URL:-http://vault-agent-cron:8203}"
VAULT_ADDR="${vaultAgentUrl:-http://vault-agent-cron:8203}"

# Logging function
log() {
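
The `VAULT_ADDR` assignment above relies on the shell's default-value expansion, and shell variable names are case-sensitive, so a value exported as `VAULT_AGENT_URL` is no longer read once the script looks up `vaultAgentUrl`. A minimal sketch of that behavior follows; the helper name `resolve_vault_addr` and the override URL are illustrative assumptions, not part of the PR's scripts.

```shell
#!/bin/sh
# Hypothetical helper for illustration; the PR's scripts inline this expansion.
# Prints the Vault address the cron scripts would end up using.
resolve_vault_addr() {
    # ${var:-default} yields the default when var is unset OR empty.
    printf '%s\n' "${vaultAgentUrl:-http://vault-agent-cron:8203}"
}

unset vaultAgentUrl
resolve_vault_addr    # falls back to the vault-agent-cron default

vaultAgentUrl="http://localhost:8203"   # assumed override value, for the demo only
resolve_vault_addr    # prints the override
```

Because the variable is only visible to the script if CronManager passes it through, the `allowedEnvs` additions in the YAML files and the script change belong together.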
4 changes: 2 additions & 2 deletions DSL/CronManager/script/store_secrets_in_vault.sh
@@ -6,9 +6,9 @@
set -e # Exit on any error

# Configuration
# Use VAULT_AGENT_URL which points to vault-agent-cron proxy
# Use vaultAgentUrl which points to vault-agent-cron proxy
# The agent automatically injects the authentication token
VAULT_ADDR="${VAULT_AGENT_URL:-http://vault-agent-cron:8203}"
VAULT_ADDR="${vaultAgentUrl:-http://vault-agent-cron:8203}"

# Decryption Configuration
PRIVATE_KEY_CACHE=""
12 changes: 8 additions & 4 deletions docker-compose-ec2.yml
@@ -503,7 +503,11 @@ services:
- ./vault/config:/vault/config:ro
- ./vault/logs:/vault/logs
networks:
- vault-network # Only on vault-network for security
vault-network: # Only on vault-network for security
# Local testing: bare "vault" collides with the ckb stack on the shared
# bykstack network, so expose this Vault under a unique alias instead.
aliases:
- rag-vault
restart: unless-stopped
healthcheck:
test: ["CMD", "sh", "-c", "wget -q -O- http://127.0.0.1:8200/v1/sys/health || exit 0"]
@@ -520,7 +524,7 @@ services:
vault:
condition: service_healthy
environment:
VAULT_ADDR: http://vault:8200
VAULT_ADDR: http://rag-vault:8200
volumes:
- vault-data:/vault/data
- vault-agent-creds:/agent/credentials
@@ -529,8 +533,8 @@ services:
- vault-agent-llm-token:/agent/llm-token
- ./vault-init.sh:/vault-init.sh:ro
networks:
- vault-network # Access vault
- bykstack # Access to write agent tokens
# vault-network only: tokens/creds go via shared volumes, not the network.
- vault-network
entrypoint: ["/bin/sh"]
command:
- -c
14 changes: 9 additions & 5 deletions docker-compose.yml
@@ -193,7 +193,7 @@ services:
environment:
- server.port=9010
- PYTHONPATH=/app:/app/src/vector_indexer:/app/src/intent_data_enrichment:/app/src/api_tool_indexer
- VAULT_AGENT_URL=http://vault-agent-cron:8203
- vaultAgentUrl=http://vault-agent-cron:8203
ports:
- 9010:8080
depends_on:
@@ -451,7 +451,11 @@ services:
- ./vault/config:/vault/config:ro
- ./vault/logs:/vault/logs
networks:
- vault-network # Only on vault-network for security
vault-network: # Only on vault-network for security
# Local testing: bare "vault" collides with the ckb stack on the shared
# bykstack network, so expose this Vault under a unique alias instead.
aliases:
- rag-vault
restart: unless-stopped
healthcheck:
test: ["CMD", "sh", "-c", "wget -q -O- http://127.0.0.1:8200/v1/sys/health || exit 0"]
@@ -468,7 +472,7 @@ services:
vault:
condition: service_healthy
environment:
VAULT_ADDR: http://vault:8200
VAULT_ADDR: http://rag-vault:8200
volumes:
- vault-data:/vault/data
- vault-agent-creds:/agent/credentials
@@ -477,8 +481,8 @@ services:
- vault-agent-llm-token:/agent/llm-token
- ./vault-init.sh:/vault-init.sh:ro
networks:
- vault-network # Access vault
- bykstack # Access to write agent tokens
# vault-network only: tokens/creds go via shared volumes, not the network.
- vault-network
entrypoint: ["/bin/sh"]
command:
- -c
81 changes: 46 additions & 35 deletions docs/VAULT_SECURITY_ARCHITECTURE.md
@@ -197,9 +197,12 @@ Day 0+: Automatic Token Renewal:
Container Restart:
vault-init: Check if Vault is sealed
If unsealed: Regenerate secret_id only
If unsealed: Validate existing secret_ids
vault-agent: Re-authenticate with new secret_id
If valid: Reuse existing secret_id (no churn)
If invalid: Mint new secret_id and write to disk
vault-agent: Re-authenticate with secret_id
New token issued and cached
```
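
The validate-then-reuse behavior described for container restarts can be sketched as below. This is an illustration under stated assumptions, not the contents of `vault-init.sh` (which this diff does not show): `reconcile_secret_id` and the `demo_*` stubs are hypothetical names, and a real validator would presumably attempt `POST /v1/auth/approle/login` with the on-disk `secret_id`, while a real minter would call the role's `secret-id` endpoint.

```shell
#!/bin/sh
# Hypothetical sketch of "reuse if valid, mint if invalid or missing".
# $1 = path to the secret_id file
# $2 = command that validates a secret_id (exit status 0 means valid)
# $3 = command that prints a freshly minted secret_id
reconcile_secret_id() {
    file="$1"; validate="$2"; mint="$3"
    # Reuse only when the file exists, is non-empty, and the validator accepts it.
    if [ -s "$file" ] && "$validate" "$(cat "$file")"; then
        echo "reused"
        return 0
    fi
    # Otherwise mint a replacement and overwrite the credential file.
    new_id="$("$mint")" || return 1
    printf '%s\n' "$new_id" > "$file"
    echo "minted"
}

# Stub validator and minter so the sketch runs without a Vault server.
demo_validate() { [ "$1" = "still-valid" ]; }
demo_mint() { echo "newly-minted"; }

cred="$(mktemp)"
echo "still-valid" > "$cred"
reconcile_secret_id "$cred" demo_validate demo_mint   # prints "reused"
echo "revoked" > "$cred"
reconcile_secret_id "$cred" demo_validate demo_mint   # prints "minted"
rm -f "$cred"
```

Passing the validator and minter in as commands keeps the decision logic separate from the Vault calls, which is a design choice of this sketch rather than something the PR states.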
@@ -413,8 +416,9 @@ Connected Services:
- GUI (React Frontend)

Token Lifecycle:
- Default Lease: 768h (32 days)
- Auto-renewal: Before expiration
- Token type: periodic (token_period 20m, no max-TTL)
- Auto-renewal: Every ~13 minutes (~2/3 of period)
- Re-auth: only on agent restart (never in steady state)
```

#### Agent 2: vault-agent-cron
@@ -429,8 +433,9 @@ Connected Services:
- CronManager (Python worker)

Token Lifecycle:
- Default Lease: 768h (32 days)
- Auto-renewal: Before expiration
- Token type: periodic (token_period 30m, no max-TTL)
- Auto-renewal: Every ~20 minutes (~2/3 of period)
- Re-auth: only on agent restart (never in steady state)
```

#### Agent 3: vault-agent-llm
@@ -445,8 +450,9 @@ Connected Services:
- LLM Orchestration Service (FastAPI)

Token Lifecycle:
- Default Lease: 1h (shorter for higher security)
- Auto-renewal: Every ~45 minutes
- Token type: periodic (token_period 1h, no max-TTL)
- Auto-renewal: Every ~40 minutes (~2/3 of period)
- Re-auth: only on agent restart (never in steady state)
```
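
The renewal intervals quoted for the three agents follow from taking roughly two thirds of each `token_period`. The arithmetic can be checked with a small sketch; `renew_after_seconds` is a hypothetical helper, and Vault Agent's actual scheduling may differ (for example through jitter), so this only mirrors the "~2/3" approximation used in the doc.

```shell
#!/bin/sh
# Hypothetical helper: approximate the renewal point as 2/3 of the token period.
# Integer arithmetic only; the result is in seconds.
renew_after_seconds() {
    period_seconds="$1"
    echo $(( period_seconds * 2 / 3 ))
}

# 20m (GUI), 30m (cron), 1h (LLM), expressed in seconds.
for period in 1200 1800 3600; do
    secs="$(renew_after_seconds "$period")"
    echo "period=${period}s renew_after=${secs}s (~$(( secs / 60 ))m)"
done
```

For the three periods this gives 800 s (about 13 minutes), 1200 s (20 minutes), and 2400 s (40 minutes), matching the figures in the agent descriptions.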

### Token Caching and Auto-Renewal
@@ -464,29 +470,31 @@ T=0: Initial Authentication
├─► POST /v1/auth/approle/login
│ Body: { role_id, secret_id }
└─► Receives: { token, ttl: 3600s, renewable: true }
└─► Receives: { token, period: 3600s, renewable: true } ← periodic token, no max-TTL
└─► Cache token in: /agent/llm-token/token


T=45min: Proactive Renewal (75% of TTL)
T≈40min: Proactive Renewal (~2/3 of period)
vault-agent monitors expiration
├─► POST /v1/auth/token/renew-self
│ Header: X-Vault-Token: <current_token>
└─► Receives: { token, ttl: 3600s } (same token, extended)
└─► Receives: { token, period: 3600s } (same token, period reset)
└─► Update cache: /agent/llm-token/token
└─► Repeats forever — a periodic token never hits a max-TTL,
so steady-state operation never needs approle/login again.


T=59min: Renewal Failed (fallback)
If renewal fails:
On agent restart only:
vault-agent re-reads role_id + secret_id from disk
├─► Re-authenticate from scratch
│ POST /v1/auth/approle/login
├─► POST /v1/auth/approle/login (secret_id must still be valid)
└─► New token issued and cached
└─► New periodic token issued and cached


Application Request (anytime):
@@ -856,15 +864,16 @@ Step 12: Check Vault Seal Status
└─► GET /v1/sys/seal-status
└─► If unsealed: Skip unseal steps

Step 13: Regenerate Secret IDs Only
└─► POST /v1/auth/approle/role/gui-service/secret-id
├─► POST /v1/auth/approle/role/cron-manager-service/secret-id
├─► POST /v1/auth/approle/role/llm-orchestration-service/secret-id
└─► Write new secret_ids to /agent/credentials/
Step 13: Validate and Reconcile Secret IDs
└─► For each role (gui, cron-manager, llm-orchestration):
├─► Test existing on-disk secret_id via AppRole login
├─► If valid: Reuse (no change to credential file)
└─► If invalid/missing: Mint new secret_id and write to disk

Note: role_ids remain unchanged (static identifiers)
Note: Existing secrets and policies preserved
Note: RSA keypair NOT regenerated (preserved)
Note: Stable secret_ids across restarts reduce credential churn

═══════════════════════════════════════════════════════════════════
COMPLETION
@@ -1128,13 +1137,14 @@ Startup Order:
vault-init Behavior:
- Detects Vault already initialized
- Skips initialization steps
- Regenerates secret_ids only
- Updates credential files
- Validates existing secret_ids (reuses if still valid)
- Mints new secret_ids only if existing ones are invalid

Result:
All services start with fresh credentials
All services start with validated credentials
Existing secrets preserved
No manual intervention needed
Stable secret_ids reduce unnecessary credential churn
```

### Token Regeneration Strategy
@@ -1143,22 +1153,23 @@ Result:
Current Implementation:

1. On Every Container Restart:
└─► vault-init regenerates secret_ids
├─► Vault agents get new tokens
└─► Old tokens remain valid until expiration
└─► vault-init validates existing secret_ids
├─► If valid: Reuse (agents continue with same credentials)
└─► If invalid: Mint new secret_id, agents re-authenticate

2. Token Lifecycle:
└─► Issue: vault-agent authenticates
└─► Issue: vault-agent authenticates (periodic token, token_period per role)
└─► Use: Application makes requests
└─► Renew: vault-agent extends TTL
└─► Expire: Automatic renewal failed
└─► Re-issue: vault-agent re-authenticates
└─► Renew: vault-agent renews within the period (~2/3 of period)
└─► No max-TTL: renewal continues indefinitely
└─► Re-issue: only on agent restart, via secret_id login

3. Security Benefits:
Short-lived tokens (1 hour for LLM, 32 days for others)
Automatic rotation on agent restart
No manual token management
Compromised tokens have limited lifetime
Periodic tokens (period 1h LLM, 30m Cron, 20m GUI), renewed continuously
Steady-state operation never re-runs approle/login (a stale secret_id
cannot strand a running agent)
Stable secret_ids (no unnecessary churn on restart)
Compromised tokens limited to one un-renewed period
```
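
The "one un-renewed period" bound in the security benefits can be made concrete with a simple model: each successful renewal resets a periodic token's TTL to the full period, so the time left is the period minus the time since the last renewal. The helper `remaining_ttl` below is a hypothetical illustration of that model, not code from the repository.

```shell
#!/bin/sh
# Hypothetical model of a periodic token's remaining validity.
# $1 = token period in seconds
# $2 = seconds elapsed since the last successful renewal (or initial login)
remaining_ttl() {
    period="$1"; since_renew="$2"
    left=$(( period - since_renew ))
    # A token that has outlived its period is expired, not negative.
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

remaining_ttl 3600 2400   # LLM token just before its ~40 min renewal: 1200
remaining_ttl 3600 4000   # never renewed within the period: 0 (expired)
```

Under this model, a token that stops being renewed becomes unusable at most one period after its last renewal: 1h for the LLM agent, 30m for cron, 20m for the GUI.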

### Audit Logging Capabilities