Replace Slack with Lark for approval notifications#10
Conversation
- Replace SlackClient/SlackNotifier with LarkClient/ApprovalNotifier using Lark's tenant_access_token + interactive card API - Replace SlackWebhookHandler with LarkWebhookHandler at POST /lark/actions with SHA-256 signature verification and URL challenge support - Add examples/mock-lark for local testing (replaces mock-slack) - Update config env vars: SLACK_* → LARK_APP_ID/APP_SECRET/CHAT_ID/VERIFICATION_TOKEN - Expose postgres:5432 and redis:6379 host ports in docker-compose - Update all tests and mocks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ark binary Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Keep Lark notifier/webhook, bring in ApprovalLockTTL from main - NewRedisApprovalBridge now takes config.ApprovalLockTTL - Postgres port stays 5432:5432 (host mapping added by slackToLark) - Submodule pinned to 9fc10bc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace SlackClient/SlackNotifier with LarkClient/ApprovalNotifier using Lark's tenant_access_token + interactive card API - Replace SlackWebhookHandler with LarkWebhookHandler at POST /lark/actions with SHA-256 signature verification and URL challenge support - Add examples/mock-lark for local testing (replaces mock-slack) - Update config env vars: SLACK_* → LARK_APP_ID/APP_SECRET/CHAT_ID/VERIFICATION_TOKEN - Expose postgres:5432 and redis:6379 host ports in docker-compose - Update all tests and mocks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ark binary Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Slack → Lark migration (gateway + eval runner): - Rename all Slack references to Lark throughout gateway config, webhook handler, policy gate, approval bridge, stack health, and tests - Add LARK_APP_ID / LARK_APP_SECRET / LARK_CHAT_ID / LARK_VERIFICATION_TOKEN env vars; remove SLACK_* vars from config and docker-compose files - Implement LarkNotifier in slack_webhook.go using Lark Open Platform APIs (tenant_access_token + im/v1/messages); keep HMAC signature verification for inbound callbacks on /lark/actions - Update mock-lark example server and mock-slack to match new Lark API shape - Update deploy/docker-compose.yml with mock-lark service and Lark env vars Fix approval-timeout eval scenario: - Add Lark reachability precondition check to approval-timeout scenario: fails fast with a clear message if mock-lark is still up (mirrors the existing MCP-crash precondition pattern); mock-lark auto-approves every ticket so the scenario can never reach 'expired' while it is running - Add APPROVAL_TIMEOUT=25s to root docker-compose.yml gateway service (was missing; deploy/docker-compose.yml already had it); without this the 5-minute default outlasts the eval runner's 90s poll window - Wire STACK_HEALTH_SLACK_URL → scenarioDeps.larkURL in serve.go so the precondition probe uses the same address as the stack health check - Update UI scenario description from "Slack is down" to "Lark is down" localstripe_demo submodule (d652019): - Bump to commit that fixes approval-flow race conditions in demo webapp: emit SSE done only after full event stream exhaustion, handle late-arriving tool_end badges, discourage dry_run=True in agent prompt and MCP docstring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolved conflicts between local slackToLark (ApprovalTimeout field) and remote slackToLark (ApprovalLockTTL field): - Keep ApprovalTimeout / APPROVAL_TIMEOUT throughout (gateway config, main) - Keep full larkCallbackEnvelope v1/v2 handler in slack_webhook.go - Use hardcoded demo Lark credentials in root docker-compose.yml (with LARK_API_BASE_URL pointing to mock-lark) for local development - Keep /healthz endpoint in mock-lark Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- slack_webhook: support Lark Schema 2.0 card.action.trigger callbacks (token-in-body auth, nested event.operator/event.action structure) alongside existing HMAC-signed mock-lark path - slack_notifier: add 15s HTTP timeout and success log for card sends - slack_webhook: log decision on approval/deny for observability - docker-compose: parameterise Lark credentials via env vars with mock-lark fallbacks; fix port conflicts (postgres 15432, redis 16379); add eval-trigger service; remove mock-slack port collision - docker-compose.override: drop stale mock-slack gateway dependency - evalsuite/ai-agent: add missing policyOutcome to refund-intercepted case Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dd README setup guide - Rename all send_slack_message → send_lark_message (policy, evals, fake servers, tests) - Rename approval-timeout-slack-down → approval-timeout-lark-down in scenarios/suites/UI - Rename STACK_HEALTH_SLACK_URL → STACK_HEALTH_LARK_URL env var - README: update services table, Scenario 3, add Real Lark approval setup section - docker-compose: remove mock-lark default for LARK_API_BASE_URL so unset env routes to real Lark (open.feishu.cn); set LARK_API_BASE_URL=http://mock-lark:8090/open-apis in .env to use mock-lark for local dev - Bump localstripe_demo submodule to include LangGraph checkpoint repair Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request migrates the approval notification system from Slack to Lark (Feishu), replacing the mock Slack services and tests with Lark equivalents, and introduces conditional rule evaluation based on tool arguments (when clauses) in the policy evaluator. Feedback on the changes highlights critical issues with Lark's API error handling, noting that business-level failures must be decoded and verified even when HTTP 200 is returned. Additionally, the signature verification logic in both the gateway and mock server needs to be corrected to match Lark's official concatenation order, and standard string comparisons should be replaced with subtle.ConstantTimeCompare to prevent timing attacks. Finally, the matchWhen helper should be updated to handle non-comparable types and numeric type mismatches robustly to avoid runtime panics.
| // larkSendMessageReq is the payload for Lark's im/v1/messages endpoint. | ||
| // Content is the JSON-encoded card string (Lark requires a JSON string, not object). | ||
| type larkSendMessageReq struct { | ||
| ReceiveID string `json:"receive_id"` | ||
| MsgType string `json:"msg_type"` | ||
| Content string `json:"content"` | ||
| } |
There was a problem hiding this comment.
Lark Open APIs return HTTP 200 even for business-level failures, indicating the actual status via a code field in the JSON response body. To handle these errors properly, we should define a response structure to decode and verify the response code.
// larkSendMessageReq is the payload for Lark's im/v1/messages endpoint.
// Content is the JSON-encoded card string (Lark requires a JSON string, not object).
type larkSendMessageReq struct {
ReceiveID string `json:"receive_id"`
MsgType string `json:"msg_type"`
Content string `json:"content"`
}
// larkSendMessageResp is the response from Lark's im/v1/messages endpoint.
type larkSendMessageResp struct {
Code int `json:"code"`
Msg string `json:"msg"`
}| if resp.StatusCode != http.StatusOK { | ||
| return fmt.Errorf("slack notifier: unexpected status %d", resp.StatusCode) | ||
| body, _ := io.ReadAll(resp.Body) | ||
| return fmt.Errorf("lark notifier: unexpected status %d: %s", resp.StatusCode, body) | ||
| } | ||
|
|
||
| c.log.Info("lark approval card sent", "ticketID", ticketID, "chatID", c.chatID) | ||
| return nil |
There was a problem hiding this comment.
Lark's message sending API returns HTTP 200 even when the message fails to deliver (e.g., due to invalid chat ID or missing permissions), returning a non-zero code in the JSON payload. We must decode the response body and verify that code == 0 to prevent silent failures.
rawBody, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("lark notifier: read message response: %w", err)
}
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("lark notifier: unexpected status %d: %s", resp.StatusCode, rawBody)
}
var sendResp larkSendMessageResp
if err := json.Unmarshal(rawBody, &sendResp); err != nil {
return fmt.Errorf("lark notifier: decode message response: %w", err)
}
if sendResp.Code != 0 {
return fmt.Errorf("lark notifier: message error code %d: %s", sendResp.Code, sendResp.Msg)
}
c.log.Info("lark approval card sent", "ticketID", ticketID, "chatID", c.chatID)
return nil| func matchWhen(when map[string]any, args map[string]any) bool { | ||
| for k, wantVal := range when { | ||
| gotVal, ok := args[k] | ||
| if !ok || gotVal != wantVal { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
The current implementation of matchWhen uses standard != comparison on any values. This has two major issues:
- Comparing non-comparable types (like slices or maps, which are common in JSON arguments) will cause a runtime panic.
- Numbers parsed from JSON are always
float64, whereas numbers parsed from YAML are typicallyintorint64. This type mismatch causes numeric comparisons to silently fail (e.g.,100.0 != 100evaluates totrue).
Using reflect.DeepEqual and a numeric normalization helper solves both issues.
func matchWhen(when map[string]any, args map[string]any) bool {
for k, wantVal := range when {
gotVal, ok := args[k]
if !ok {
return false
}
if !compareValues(gotVal, wantVal) {
return false
}
}
return true
}
func compareValues(got, want any) bool {
if reflect.DeepEqual(got, want) {
return true
}
gotNum, gotOk := toFloat64(got)
wantNum, wantOk := toFloat64(want)
if gotOk && wantOk {
return gotNum == wantNum
}
return false
}
func toFloat64(v any) (float64, bool) {
switch val := v.(type) {
case int:
return float64(val), true
case int8:
return float64(val), true
case int16:
return float64(val), true
case int32:
return float64(val), true
case int64:
return float64(val), true
case uint:
return float64(val), true
case uint8:
return float64(val), true
case uint16:
return float64(val), true
case uint32:
return float64(val), true
case uint64:
return float64(val), true
case float32:
return float64(val), true
case float64:
return val, true
}
return 0, false
}| // computeLarkSignature returns hex(sha256(verificationToken + timestamp + nonce + body)). | ||
| // Both the gateway and mock-lark use this formula so signatures are mutually verifiable. | ||
| func computeLarkSignature(verificationToken, timestamp, nonce string, body []byte) string { | ||
| h := sha256.New() | ||
| h.Write([]byte(verificationToken)) | ||
| h.Write([]byte(timestamp)) | ||
| h.Write([]byte(nonce)) | ||
| h.Write(body) | ||
| return hex.EncodeToString(h.Sum(nil)) | ||
| } |
There was a problem hiding this comment.
The signature concatenation order in computeLarkSignature is incorrect. According to Lark's official specification, the signature must be computed as hex(sha256(timestamp + nonce + key + body)). The current implementation puts the key first, which will cause signature verification to fail for real Lark callbacks.
| // computeLarkSignature returns hex(sha256(verificationToken + timestamp + nonce + body)). | |
| // Both the gateway and mock-lark use this formula so signatures are mutually verifiable. | |
| func computeLarkSignature(verificationToken, timestamp, nonce string, body []byte) string { | |
| h := sha256.New() | |
| h.Write([]byte(verificationToken)) | |
| h.Write([]byte(timestamp)) | |
| h.Write([]byte(nonce)) | |
| h.Write(body) | |
| return hex.EncodeToString(h.Sum(nil)) | |
| } | |
| // computeLarkSignature returns hex(sha256(timestamp + nonce + verificationToken + body)). | |
| // This matches Lark's official signature verification specification. | |
| func computeLarkSignature(verificationToken, timestamp, nonce string, body []byte) string { | |
| h := sha256.New() | |
| h.Write([]byte(timestamp)) | |
| h.Write([]byte(nonce)) | |
| h.Write([]byte(verificationToken)) | |
| h.Write(body) | |
| return hex.EncodeToString(h.Sum(nil)) | |
| } |
| // computeSignature computes the Lark-compatible HMAC signature used by the gateway. | ||
| // Formula: hex(sha256(verificationToken + timestamp + nonce + body)) | ||
| func computeSignature(token, timestamp, nonce string, body []byte) string { | ||
| h := sha256.New() | ||
| h.Write([]byte(token)) | ||
| h.Write([]byte(timestamp)) | ||
| h.Write([]byte(nonce)) | ||
| h.Write(body) | ||
| return hex.EncodeToString(h.Sum(nil)) | ||
| } |
There was a problem hiding this comment.
Update the signature computation formula in the mock server to match Lark's official specification (timestamp + nonce + token + body) so that it remains in sync with the gateway's corrected signature verification.
| // computeSignature computes the Lark-compatible HMAC signature used by the gateway. | |
| // Formula: hex(sha256(verificationToken + timestamp + nonce + body)) | |
| func computeSignature(token, timestamp, nonce string, body []byte) string { | |
| h := sha256.New() | |
| h.Write([]byte(token)) | |
| h.Write([]byte(timestamp)) | |
| h.Write([]byte(nonce)) | |
| h.Write(body) | |
| return hex.EncodeToString(h.Sum(nil)) | |
| } | |
| // computeSignature computes the Lark-compatible signature used by the gateway. | |
| // Formula: hex(sha256(timestamp + nonce + token + body)) | |
| func computeSignature(token, timestamp, nonce string, body []byte) string { | |
| h := sha256.New() | |
| h.Write([]byte(timestamp)) | |
| h.Write([]byte(nonce)) | |
| h.Write([]byte(token)) | |
| h.Write(body) | |
| return hex.EncodeToString(h.Sum(nil)) | |
| } |
| import ( | ||
| "context" | ||
| "crypto/hmac" | ||
| "crypto/sha256" | ||
| "encoding/hex" | ||
| "encoding/json" | ||
| "io" | ||
| "log/slog" | ||
| "net/http" | ||
| "net/url" | ||
| "strconv" | ||
| "time" |
There was a problem hiding this comment.
We should import "crypto/subtle" to perform constant-time signature comparisons, mitigating potential timing attacks on the webhook signature verification.
| import ( | |
| "context" | |
| "crypto/hmac" | |
| "crypto/sha256" | |
| "encoding/hex" | |
| "encoding/json" | |
| "io" | |
| "log/slog" | |
| "net/http" | |
| "net/url" | |
| "strconv" | |
| "time" | |
| import ( | |
| "context" | |
| "crypto/sha256" | |
| "crypto/subtle" | |
| "encoding/hex" | |
| "encoding/json" | |
| "io" | |
| "log/slog" | |
| "net/http" | |
| "strconv" | |
| "time" |
| providedSig := r.Header.Get("X-Lark-Signature") | ||
| if expectedSig != providedSig { | ||
| h.log.Warn("lark webhook: signature mismatch") | ||
| http.Error(w, "bad request", http.StatusBadRequest) | ||
| return | ||
| } |
There was a problem hiding this comment.
Comparing cryptographic signatures using standard string comparison (!=) is vulnerable to timing attacks. Use subtle.ConstantTimeCompare to ensure constant-time comparison.
| providedSig := r.Header.Get("X-Lark-Signature") | |
| if expectedSig != providedSig { | |
| h.log.Warn("lark webhook: signature mismatch") | |
| http.Error(w, "bad request", http.StatusBadRequest) | |
| return | |
| } | |
| providedSig := r.Header.Get("X-Lark-Signature") | |
| if subtle.ConstantTimeCompare([]byte(expectedSig), []byte(providedSig)) != 1 { | |
| h.log.Warn("lark webhook: signature mismatch") | |
| http.Error(w, "bad request", http.StatusBadRequest) | |
| return | |
| } |
Summary
LARK_API_BASE_URLdocker-compose default — remove mock-lark hardcode so unset env routes to real Lark (open.feishu.cn); setLARK_API_BASE_URL=http://mock-lark:8090/open-apisin.envfor local devsend_slack_message→send_lark_messagein policy, eval suites, fake servers, and testsapproval-timeout-slack-down→approval-timeout-lark-downacross scenarios, suites, and UIToolMessages for orphanedtool_callsbefore the next agent turn.envconfig)Test plan
go test ./...passesdocker compose up -d --waitstarts all services healthycreate_refundin demo-webapp triggers an interactive Lark card in the configured chatINVALID_CHAT_HISTORYerror🤖 Generated with Claude Code