diff --git a/teps/0164-agent-native-workflows.md b/teps/0164-agent-native-workflows.md
new file mode 100644
index 000000000..2db237476
--- /dev/null
+++ b/teps/0164-agent-native-workflows.md
@@ -0,0 +1,1832 @@
+---
+status: proposed
+title: Agent-Native Workflows
+creation-date: '2026-03-20'
+last-updated: '2026-03-20'
+authors:
+- '@waveywaves'
+- '@anithapriyanatarajan'
+---
+
+# TEP-0164: Agent-Native Workflows
+
+---
+
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+  - [Use Cases](#use-cases)
+    - [AI Code Review Gate](#ai-code-review-gate)
+    - [Multi-Step Test Lifecycle](#multi-step-test-lifecycle)
+    - [Deployment Decision Agent](#deployment-decision-agent)
+    - [Cluster Diagnostics Agent](#cluster-diagnostics-agent)
+  - [Requirements](#requirements)
+- [Proposal](#proposal)
+  - [Overview](#overview)
+  - [AgentRun CRD](#agentrun-crd)
+  - [AgentConfig CRD](#agentconfig-crd)
+  - [Agent Lifecycle](#agent-lifecycle)
+    - [Standalone AgentRun](#standalone-agentrun)
+    - [AgentRun in a PipelineRun](#agentrun-in-a-pipelinerun)
+    - [Multiple Agents in a PipelineRun](#multiple-agents-in-a-pipelinerun)
+  - [Integration with kagent](#integration-with-kagent)
+  - [Integration with Tekton Pipelines](#integration-with-tekton-pipelines)
+  - [Codebase Access via MCP Tools](#codebase-access-via-mcp-tools)
+  - [Security Layer](#security-layer)
+  - [Provenance](#provenance)
+  - [Notes and Caveats](#notes-and-caveats)
+- [Design Details](#design-details)
+  - [Execution Flow: Standalone AgentRun](#execution-flow-standalone-agentrun)
+  - [Execution Flow: PipelineRun with Agent Steps](#execution-flow-pipelinerun-with-agent-steps)
+  - [Agent CR Scoping and Reuse](#agent-cr-scoping-and-reuse)
+  - [kagent Resource Resolution](#kagent-resource-resolution)
+  - [Security Implementation Details](#security-implementation-details)
+  - [CustomTask Adapter](#customtask-adapter)
+  - [Provenance Recording](#provenance-recording)
+  - [AgentConfig Snapshot](#agentconfig-snapshot)
+- [Design Evaluation](#design-evaluation)
+  - [Reusability](#reusability)
+  - [Simplicity](#simplicity)
+  - [Flexibility](#flexibility)
+  - [Conformance](#conformance)
+  - [User Experience](#user-experience)
+  - [Performance](#performance)
+  - [Risks and Mitigations](#risks-and-mitigations)
+  - [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+  - [Build Agent Stack Inside Tekton](#build-agent-stack-inside-tekton)
+  - [Pod-Per-Run Without Agent CR](#pod-per-run-without-agent-cr)
+  - [Pipeline spec.agents Field](#pipeline-specagents-field)
+  - [Volume Mounts for Codebase Access](#volume-mounts-for-codebase-access)
+  - [Convention-Based Container Wrapping](#convention-based-container-wrapping)
+- [Implementation Plan](#implementation-plan)
+  - [Milestones](#milestones)
+  - [Test Plan](#test-plan)
+  - [Infrastructure Needed](#infrastructure-needed)
+  - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
+  - [Implementation Pull Requests](#implementation-pull-requests)
+- [References](#references)
+
+
+## Summary
+
+AI agents are appearing in CI/CD pipelines, doing code review, security
+analysis, test generation, and deployment decisions. Today they run as
+opaque Python scripts inside container steps. Tekton cannot see what
+model they called, which tools they used, how many tokens they consumed,
+or whether they followed security policy.
+
+This TEP proposes making Tekton an **agent-native workflow engine** by
+introducing `AgentRun` and `AgentConfig` CRDs that use
+[kagent][kagent]'s Agent CR as the agent runtime and add per-run
+security controls, pipeline integration via CustomTask, and provenance
+recording.
+
+```mermaid
+flowchart TB
+ subgraph User["Pipeline Author"]
+ AR["AgentRun CR<br/>(goal + configRef)"]
+ AC["AgentConfig CR<br/>(model + tools + RBAC + OPA)"]
+ end
+
+ subgraph TEP["AgentRun Controller (this TEP)"]
+ Lifecycle["Agent Lifecycle<br/>create / reuse / cleanup"]
+ Security["Security Layer<br/>RBAC + NetworkPolicy + OPA"]
+ Provenance["Provenance<br/>model + tools + tokens"]
+ end
+
+ subgraph Kagent["kagent"]
+ AgentCR["Agent CR"]
+ MC["ModelConfig"]
+ RMS["RemoteMCPServer"]
+ ADK["ADK Runtime"]
+ end
+
+ subgraph Tekton["Tekton Pipelines"]
+ CT["CustomTask Protocol"]
+ PR["PipelineRun"]
+ Chains["Tekton Chains"]
+ end
+
+ AR --> TEP
+ AC --> TEP
+ TEP --> AgentCR
+ TEP --> Security
+ TEP --> Provenance
+ MC --> AgentCR
+ RMS --> AgentCR
+ AgentCR --> ADK
+ CT --> TEP
+ PR --> CT
+ Provenance --> Chains
+```
+
+The execution model uses **kagent's existing Agent CR**, which creates a
+Deployment and Service for the agent. For standalone AgentRuns, the Agent
+CR is created for a single goal and cleaned up after completion. For
+PipelineRuns with multiple agent steps, the Agent CR is created at the
+first agent step and shared across subsequent steps that reference the
+same AgentConfig, preserving conversation context. The Agent CR is
+cleaned up when the PipelineRun completes.
+
+The controller adds security controls that neither kagent nor Tekton
+Pipelines provides alone: per-run RBAC scoping with configurable rules,
+per-run NetworkPolicy generation, layered policy enforcement (OPA at
+goal submission + kagent tool allowlists at execution), prompt
+auditability, and provenance recording for [Tekton Chains][chains].
+
+A [CustomTask][customtask] adapter allows AgentRuns to participate in
+Tekton Pipeline DAGs, making agent steps composable with traditional
+container steps using standard result passing and `when` expressions.
+
+## Motivation
+
+AI agents are already appearing in CI/CD pipelines, but they are
+invisible to the platform. A [comparison of two real-world pipelines][pipeline-comparison],
+one implemented [without agents][pipeline-without-agents] and one
+[with agents][pipeline-with-agents], illustrates the problem:
+
+**Without agents** (614 lines, 12 tasks): Template-based test plan
+generation using static `case` statements. A manual approval gate where
+humans write tests from scratch. Raw pass/fail counts posted as results.
+
+**With agents** (1247 lines, 14 tasks): An agent reads actual source
+code, generates real test implementations, reviews its own output,
+triages test failures (infrastructure vs test bugs vs real regressions),
+and produces an intelligent summary report.
+
+The following diagram illustrates the difference between today's opaque
+agent steps and the governed model proposed by this TEP:
+
+```mermaid
+flowchart TB
+ subgraph TODAY["Today: Opaque Agent Steps"]
+ direction TB
+ T_Step["Container Step"]
+ T_Pip["pip install anthropic"]
+ T_Key["Read API key from file"]
+ T_Call["Call LLM directly"]
+ T_Parse["Parse response with regex"]
+ T_Out["Write stdout"]
+
+ T_Step --> T_Pip --> T_Key --> T_Call --> T_Parse --> T_Out
+
+ T_Invisible["Tekton sees:<br/>container ran, exit 0"]
+ T_Out -.-> T_Invisible
+ end
+
+ subgraph TEP["TEP-0164: Governed Agent Steps"]
+ direction TB
+ A_Run["AgentRun CR"]
+ A_OPA["Gate 1: OPA evaluates goal"]
+ A_RBAC["Gate 2: scoped RBAC + NetworkPolicy"]
+ A_Agent["kagent Agent CR<br/>(model + tools configured)"]
+ A_Tools["Gate 3: allowedTools enforced"]
+ A_MCP["Gate 4: MCP-only data access"]
+ A_Result["Gate 5: results + provenance"]
+
+ A_Run --> A_OPA --> A_RBAC --> A_Agent --> A_Tools --> A_MCP --> A_Result
+ end
+```
+
+The agents transform the pipeline from a structure-only scaffold into an
+intelligence-augmented workflow. But every agent step is an opaque
+container:
+
+```python
+# Repeated in every agent step: ~100 lines of boilerplate
+import subprocess, sys
+subprocess.check_call([sys.executable, "-m", "pip", "install", "anthropic", "-q"])
+import anthropic  # importable only after the install above
+client = anthropic.Anthropic(api_key=api_key)  # api_key read from a mounted file
+# ... 80 more lines of prompt construction, response parsing
+```
+
+Tekton sees these as ordinary container steps. There is no way for a
+platform operator to:
+
+- Scope agent permissions to exactly the Kubernetes resources they need
+- Enforce which tools an agent may call, with real-time policy evaluation
+- Audit which models and prompts were used across the organization
+- Record agent behavior in SLSA-compatible attestations
+- Set token budgets or network isolation per agent execution
+
+Meanwhile, [kagent][kagent] provides CRDs for [model configuration][kagent-models]
+(8 providers), [MCP tool servers][kagent-tools] (with automatic tool
+discovery), and agent runtimes (Python and Go ADKs). But kagent does not provide pipeline
+orchestration, supply-chain provenance, or the per-execution security
+controls that a CI/CD platform requires.
+
+### Goals
+
+1. Define `AgentRun` and `AgentConfig` CRDs that enable goal-driven
+ agent execution with per-run security controls
+2. Use kagent's Agent CR as the agent runtime, creating Agent CRs
+ scoped to AgentRun or PipelineRun lifetimes
+3. Use kagent's `ModelConfig` and `RemoteMCPServer` CRDs for model and
+ tool configuration without reimplementing them
+4. Add per-run security controls: RBAC scoping with configurable rules,
+ NetworkPolicy, layered policy enforcement (OPA + kagent tool
+ restrictions)
+5. Provide a CustomTask adapter so AgentRuns participate in Tekton
+ Pipeline DAGs with result passing and `when` expression support
+6. Record agent execution provenance for consumption by Tekton Chains
+7. Route all data access (codebase, cluster state) through MCP tool
+ calls for uniform policy enforcement and audit
+
+### Non-Goals
+
+1. Building a new agent runtime (kagent provides the ADK)
+2. Building LLM provider integrations (kagent supports 8 providers)
+3. Building MCP server infrastructure (kagent provides
+ `RemoteMCPServer` and `ToolServer`)
+4. Adding new fields to Tekton's Pipeline or PipelineRun CRDs
+5. Defining multi-agent communication protocols
+6. Implementing or bundling any LLM model or inference engine
+
+### Use Cases
+
+#### AI Code Review Gate
+
+As a platform engineer, I want an agent step in my Pipeline that reviews
+a pull request diff and returns a structured `approved`/`findings`
+result, so that downstream steps can conditionally block or proceed,
+with full provenance of what the agent saw and decided.
+
+This is a standalone agent task. One AgentRun, one goal, one result.
+
+#### Multi-Step Test Lifecycle
+
+As a QA engineer, I want a pipeline where an agent analyzes a codebase,
+generates test implementations, self-reviews them, and triages any
+failures. The agent should maintain context across these steps so it
+does not re-read the codebase at each step.
+
+This is a multi-step agent workflow. Multiple AgentRuns in a PipelineRun
+sharing the same Agent CR, with conversation context preserved across
+steps.
+
+#### Deployment Decision Agent
+
+As a DevOps engineer, I want an agent step that queries my observability
+stack via permitted MCP tools and returns a structured `proceed`/`hold`
+recommendation, enforced by OPA policy so it cannot access resources
+outside its declared scope.
+
+#### Cluster Diagnostics Agent
+
+As a cluster administrator, I want to create an AgentRun with a goal
+like "diagnose why deployment api-server is failing in namespace
+production" and have the agent investigate using only the Kubernetes
+resources I have authorized via per-run RBAC.
+
+### Requirements
+
+| ID | Requirement | Priority |
+|----|-------------|----------|
+| R1 | AgentRun controller MUST create kagent Agent CRs for agent execution | Must |
+| R2 | Agent CRs MUST be scoped to AgentRun (standalone) or PipelineRun (multi-step) lifetime | Must |
+| R3 | Multiple AgentRuns with the same configRef in a PipelineRun MUST reuse the same Agent CR | Must |
+| R4 | AgentRun controller MUST generate per-run RBAC using rules from AgentConfig | Must |
+| R5 | AgentConfig MUST reference kagent ModelConfig for model selection | Must |
+| R6 | AgentConfig MUST reference kagent RemoteMCPServer for tool providers | Must |
+| R7 | Agent execution results MUST be recorded in AgentRun status | Must |
+| R8 | Policy enforcement MUST use a five-gate model: OPA at goal submission (Gate 1) + RBAC and NetworkPolicy (Gate 2) + allowedTools/requireApproval at tool execution (Gate 3) + MCP-only data access (Gate 4) + provenance (Gate 5) | Must |
+| R9 | OPA goal-level input MUST be constructed by the controller from AgentRun metadata, not from LLM-controlled data | Must |
+| R10 | Per-tool-call OPA enforcement inside the kagent ADK SHOULD be contributed upstream as a future enhancement | Should |
+| R11 | A CustomTask adapter MUST allow AgentRun to participate in Pipeline DAGs via the Tekton CustomRun protocol | Must |
+| R12 | AgentRun results MUST be consumable by downstream Pipeline tasks through result references, including in `when` expressions | Must |
+| R13 | NetworkPolicy SHOULD be generated when `networkPolicy` is set to `strict` | Should |
+| R14 | Provenance metadata MUST be recorded in AgentRun status (observable fields in Phase 1, full execution trace when kagent telemetry is available) | Must |
+| R15 | All per-run resources MUST use owner references for garbage collection | Must |
+| R16 | AgentRun SHOULD support Tekton PipelineRun-based pre/post hooks | Should |
+| R17 | AgentRun MUST fail gracefully with a clear message when kagent CRDs are not installed | Must |
+| R18 | When no OPA policy is configured, goal admission MUST fail closed (deny all), regardless of other settings | Must |
+| R19 | All codebase and cluster access SHOULD go through MCP tool calls, not volume mounts | Should |
+
+## Proposal
+
+### Overview
+
+The following diagram shows how a PipelineRun with multiple agent steps
+and a traditional container step works end-to-end:
+
+```mermaid
+sequenceDiagram
+ participant User
+ participant Pipeline as PipelineRun
+ participant Ctrl as AgentRun Controller
+ participant K as kagent Controller
+ participant A1 as Agent: test-agent
+ participant A2 as Agent: security-agent
+ participant MCP as MCP Tool Server
+
+ User->>Pipeline: Create PipelineRun
+
+ Note over Pipeline,Ctrl: Step 1: security-review (configRef: security-agent)
+ Pipeline->>Ctrl: AgentRun created
+ Ctrl->>Ctrl: Gate 1: OPA evaluates goal
+ Ctrl->>Ctrl: Gate 2: Create RBAC (SA + Role)
+ Ctrl->>K: Create Agent CR (security-agent)
+ K->>A2: Deployment + Service ready
+ Ctrl->>A2: POST goal
+ A2->>MCP: read_file (Gate 3: allowedTools)
+ MCP-->>A2: file content
+ A2-->>Ctrl: result: {vulnerabilities: [...]}
+ Pipeline->>Pipeline: results available
+
+ Note over Pipeline,Ctrl: Step 2: analyze-code (configRef: test-agent)
+ Pipeline->>Ctrl: AgentRun created
+ Ctrl->>Ctrl: Gate 1: OPA evaluates goal
+ Ctrl->>Ctrl: Gate 2: Create RBAC (SA + Role)
+ Ctrl->>K: Create Agent CR (test-agent)
+ K->>A1: Deployment + Service ready
+ Ctrl->>A1: POST goal
+ A1->>MCP: search_code, list_functions (Gate 3: allowedTools)
+ A1-->>Ctrl: result: {analysis: ...}
+
+ Note over Pipeline,Ctrl: Step 3: generate-tests (configRef: test-agent, REUSES agent)
+ Pipeline->>Ctrl: AgentRun created
+ Ctrl->>Ctrl: Gate 1: OPA evaluates goal
+ Ctrl->>Ctrl: Find existing Agent CR by labels
+ Ctrl->>A1: POST goal (agent has context from step 2)
+ A1->>MCP: read_file, write_file (Gate 3: allowedTools)
+ A1-->>Ctrl: result: {tests_generated: 12}
+
+ Note over Pipeline,Ctrl: Step 4: run-tests (normal container step)
+ Pipeline->>Pipeline: go test ./...
+
+ Note over Pipeline,Ctrl: PipelineRun completes
+ Pipeline->>Pipeline: Owner refs trigger cleanup
+ K->>A1: Delete Deployment
+ K->>A2: Delete Deployment
+```
+
+```
+PipelineRun
+│
+├── AgentRun Controller sees agent steps (CustomTask references)
+│
+├── Creates kagent Agent CR per unique configRef
+│   └── kagent controller creates Deployment + Service
+│       └── Agent HTTP server running ADK runtime
+│
+├── Step 1 (agent, security-agent): Gate 1 OPA, Gate 2 RBAC, create Agent CR, POST goal
+├── Step 2 (agent, test-agent): Gate 1 OPA, Gate 2 RBAC, create Agent CR, POST goal
+├── Step 3 (agent, test-agent): Gate 1 OPA, reuse Agent CR, POST goal (context preserved)
+├── Step 4 (container): normal Tekton step, uses agent results
+│
+├── PipelineRun completes
+└── Agent CRs garbage collected via owner references
+```
+
+| Layer | Responsibility | Owner |
+|-------|---------------|-------|
+| Agent Runtime | LLM calls, MCP tool execution, agent loop | kagent (Agent CR, ADK) |
+| Model + Tool Config | Model endpoints, credentials, MCP servers | kagent (ModelConfig, RemoteMCPServer) |
+| Agent Lifecycle | Create/reuse/cleanup Agent CRs per scope | AgentRun controller (this TEP) |
+| Security | Per-run RBAC, NetworkPolicy, OPA | AgentRun controller (this TEP) |
+| Pipeline Integration | DAG sequencing, hooks, result passing | Tekton Pipelines + CustomTask |
+| Provenance | Attestation of agent behavior | AgentRun status + Tekton Chains |
+
+### AgentRun CRD
+
+```yaml
+apiVersion: agent.tekton.dev/v1alpha1
+kind: AgentRun
+metadata:
+  name: debug-api-server
+spec:
+  configRef:
+    name: cluster-diagnostics
+  goal: |
+    Diagnose why deployment 'api-server' is failing in namespace 'production'.
+  context:
+    hints:
+      - "Check recent events"
+      - "Review pod logs for OOMKilled"
+status:
+  phase: Succeeded
+  startTime: "2026-03-20T14:20:04Z"
+  completionTime: "2026-03-20T14:22:30Z"
+  iterations: 3
+  agentRef: cluster-diagnostics-7xk2  # kagent Agent CR used
+  results:
+    - name: diagnosis
+      value: "OOMKilled: memory limit 256Mi too low for request pattern"
+    - name: recommendation
+      value: "Increase memory limit to 512Mi"
+  provenance:
+    buildType: "https://tekton.dev/agent-provenance/v1"
+    reproducible: false
+    internalParameters:
+      model:
+        provider: Anthropic
+        modelId: "claude-sonnet-4-6"
+      systemPromptHash: "sha256:abc123..."
+      tokenUsage:
+        totalTokens: 4200
+      policyDecisions:
+        layer1_opa:
+          evaluated: 1
+          allowed: 1
+          denied: 0
+```
+
+### AgentConfig CRD
+
+```yaml
+apiVersion: agent.tekton.dev/v1alpha1
+kind: AgentConfig
+metadata:
+  name: cluster-diagnostics
+spec:
+  # -- kagent references --------------------------
+  modelConfigRef:
+    name: claude-sonnet
+    namespace: kagent-system
+
+  toolServers:
+    - ref:
+        name: k8s-read-tools
+        namespace: kagent-system
+        kind: RemoteMCPServer
+      allowedTools:
+        - k8s_get_resources
+        - k8s_get_logs
+        - k8s_describe
+      requireApproval: []
+
+  # -- Agent behavior -----------------------------
+  maxIterations: 5
+  timeout: 10m
+  tokenBudget: 16384  # enforced in Phase 2; informational in Phase 1
+  systemPrompt: |
+    You are a Kubernetes cluster diagnostics agent.
+    You may ONLY use the tools provided.
+
+  # -- Per-run RBAC (configurable rules) ----------
+  rbac:
+    rules:
+      - apiGroups: [""]
+        resources: [pods, services, events]
+        verbs: [get, list, watch]
+      - apiGroups: [apps]
+        resources: [deployments, replicasets]
+        verbs: [get, list, watch]
+
+  # -- OPA policy ---------------------------------
+  policy:
+    opa:
+      configMapRef:
+        name: agent-policies
+        key: tool-policy.rego
+      defaultDeny: true
+
+  # -- Network isolation --------------------------
+  networkPolicy: strict
+
+  # -- Tekton hooks (optional) --------------------
+  preHooks:
+    pipelineRef:
+      name: prompt-security-scan
+  postHooks:
+    pipelineRef:
+      name: audit-bundle-collection
+```
+
+### Agent Lifecycle
+
+#### Standalone AgentRun
+
+When an AgentRun is created outside a PipelineRun:
+
+1. Controller creates a kagent Agent CR, owner-referenced to the
+ AgentRun
+2. kagent creates the Deployment + Service
+3. Controller waits for Agent Ready condition
+4. Controller POSTs the goal to the Agent Service via HTTP
+5. Controller collects results from the response
+6. AgentRun marked Succeeded/Failed
+7. Agent CR garbage collected via owner reference
+
+The Agent Deployment is short-lived. It exists only for this one goal.
+
+#### AgentRun in a PipelineRun
+
+When multiple AgentRuns in a PipelineRun reference the same
+AgentConfig:
+
+1. First AgentRun: controller creates a kagent Agent CR, labeled
+ with the PipelineRun UID and AgentConfig name
+2. kagent creates the Deployment + Service
+3. Controller POSTs the first goal, collects results
+4. Second AgentRun (same configRef, same PipelineRun): controller
+ finds the existing Agent CR by label, reuses it
+5. Controller POSTs the second goal. The agent has conversation
+ context from the first goal.
+6. PipelineRun completes: Agent CR is cleaned up
+
+The Agent maintains conversation context across all steps that share
+the same configRef within a PipelineRun.
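+
+Below is a minimal sketch of how the controller could key Agent CR reuse.
+It assumes two hypothetical label keys, `agent.tekton.dev/pipelinerun-uid`
+and `agent.tekton.dev/agentconfig`. The TEP only says the Agent CR is
+labeled with the PipelineRun UID and the AgentConfig name.
+
+The deterministic name is also an assumption. Tasks without `runAfter`
+ordering run in parallel, so two tasks sharing a `configRef` could both
+miss a label lookup and create duplicate Agent CRs. A name derived from
+the reuse key makes the second `Create` fail with `AlreadyExists`, and
+the controller can then fall back to the existing object.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+)
+
+// Hypothetical label keys: the TEP only states that shared Agent CRs are
+// labeled with the PipelineRun UID and the AgentConfig name.
+const (
+	labelPipelineRunUID = "agent.tekton.dev/pipelinerun-uid"
+	labelAgentConfig    = "agent.tekton.dev/agentconfig"
+)
+
+// agentScope holds what decides whether an AgentRun gets a fresh Agent CR
+// or reuses one. PipelineRunUID is empty for a standalone AgentRun.
+type agentScope struct {
+	PipelineRunUID string
+	AgentRunUID    string
+	ConfigName     string
+}
+
+// scopeLabels returns the selector used to look up a reusable Agent CR.
+// Standalone AgentRuns return nil because they never share an Agent CR.
+func scopeLabels(s agentScope) map[string]string {
+	if s.PipelineRunUID == "" {
+		return nil
+	}
+	return map[string]string{
+		labelPipelineRunUID: s.PipelineRunUID,
+		labelAgentConfig:    s.ConfigName,
+	}
+}
+
+// agentName derives a deterministic Agent CR name from the reuse key:
+// (PipelineRun UID, configRef) inside a PipelineRun, the AgentRun UID otherwise.
+func agentName(s agentScope) string {
+	key := "agentrun/" + s.AgentRunUID
+	if s.PipelineRunUID != "" {
+		key = "pipelinerun/" + s.PipelineRunUID + "/" + s.ConfigName
+	}
+	sum := sha256.Sum256([]byte(key))
+	return s.ConfigName + "-" + hex.EncodeToString(sum[:])[:8]
+}
+
+func main() {
+	pr := "example-pipelinerun-uid"
+	for _, s := range []agentScope{
+		{PipelineRunUID: pr, AgentRunUID: "uid-1", ConfigName: "security-agent"}, // security-review
+		{PipelineRunUID: pr, AgentRunUID: "uid-2", ConfigName: "test-agent"},     // analyze-code
+		{PipelineRunUID: pr, AgentRunUID: "uid-3", ConfigName: "test-agent"},     // generate-tests
+	} {
+		fmt.Println(s.AgentRunUID, "->", agentName(s), scopeLabels(s))
+	}
+}
+```
+
+With this keying, `analyze-code` and `generate-tests` resolve to the same
+Agent CR name, while `security-review` gets its own.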
+
+#### Multiple Agents in a PipelineRun
+
+Different configRef values create different Agent CRs:
+
+```yaml
+tasks:
+  - name: security-review
+    taskRef:
+      apiVersion: agent.tekton.dev/v1alpha1
+      kind: AgentRun
+    params:
+      - name: configRef
+        value: security-agent  # Agent CR #1
+      - name: goal
+        value: "Review for vulnerabilities"
+
+  - name: analyze-code
+    taskRef:
+      apiVersion: agent.tekton.dev/v1alpha1
+      kind: AgentRun
+    params:
+      - name: configRef
+        value: test-agent  # Agent CR #2
+      - name: goal
+        value: "Analyze the codebase"
+
+  - name: generate-tests
+    runAfter: [analyze-code]
+    taskRef:
+      apiVersion: agent.tekton.dev/v1alpha1
+      kind: AgentRun
+    params:
+      - name: configRef
+        value: test-agent  # Reuses Agent CR #2
+      - name: goal
+        value: "Generate tests based on your analysis"
+```
+
+This PipelineRun creates two Agent CRs. `security-agent` handles one
+step. `test-agent` handles two steps with shared context.
+
+### Integration with kagent
+
+The AgentRun controller uses kagent in two ways:
+
+**1. Agent CR (agent runtime)**
+
+The controller creates kagent `Agent` CRs via the dynamic client. Each
+Agent CR references a kagent `ModelConfig` for the LLM provider and
+kagent `RemoteMCPServer` resources for tools. kagent's controller
+handles creating the Deployment, Service, and configuring the ADK
+runtime. The AgentRun controller does not build Pods directly.
+
+**2. Configuration CRDs (read-only, via dynamic client)**
+
+The controller reads kagent `ModelConfig` and `RemoteMCPServer` CRs to
+validate that referenced models exist and tools are discovered. These
+are long-lived cluster resources managed by platform administrators.
+
+The controller interacts with all kagent CRDs via
+`k8s.io/client-go/dynamic` to avoid Go module version coupling
+(kagent uses k8s.io v0.35, this controller uses v0.32).
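+
+Below is a sketch of the object the controller might hand to the dynamic
+client. It is the `map[string]interface{}` that gets wrapped in an
+`unstructured.Unstructured` and passed to
+`Resource(gvr).Namespace(ns).Create(...)`.
+
+This TEP names only `spec.declarative.deployment.serviceAccountName`,
+`allowedTools`, and `requireApproval`. Everything else is an assumption
+to check against the installed kagent CRD: the `kagent.dev/v1alpha2`
+group/version, the remaining spec field names, and where the tool lists
+sit. Slices are built as `[]interface{}` because unstructured content
+must consist of JSON-compatible types.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// toolServer mirrors one AgentConfig.spec.toolServers entry.
+type toolServer struct {
+	Name, Kind                    string
+	AllowedTools, RequireApproval []string
+}
+
+// agentConfig holds the AgentConfig fields this sketch copies.
+type agentConfig struct {
+	ModelConfig, SystemPrompt string
+	ToolServers               []toolServer
+}
+
+// owner is the AgentRun (standalone) or PipelineRun (shared) that the
+// Agent CR is garbage-collected with; it must be in the same namespace.
+type owner struct {
+	APIVersion, Kind, Name, UID string
+}
+
+// toList converts []string to []interface{}: unstructured objects may only
+// hold JSON-compatible types, and a []string would break their deep copy.
+func toList(in []string) []interface{} {
+	out := make([]interface{}, 0, len(in))
+	for _, s := range in {
+		out = append(out, s)
+	}
+	return out
+}
+
+// buildAgent returns the object to wrap in unstructured.Unstructured.
+// Field names not named by the TEP are assumptions about kagent's schema.
+func buildAgent(name, namespace, serviceAccount string, cfg agentConfig, own owner) map[string]interface{} {
+	tools := make([]interface{}, 0, len(cfg.ToolServers))
+	for _, ts := range cfg.ToolServers {
+		tools = append(tools, map[string]interface{}{
+			"type": "McpServer",
+			"mcpServer": map[string]interface{}{
+				"name":            ts.Name,
+				"kind":            ts.Kind,
+				"allowedTools":    toList(ts.AllowedTools),    // Gate 3 allowlist
+				"requireApproval": toList(ts.RequireApproval), // Gate 3 human gate
+			},
+		})
+	}
+	return map[string]interface{}{
+		"apiVersion": "kagent.dev/v1alpha2",
+		"kind":       "Agent",
+		"metadata": map[string]interface{}{
+			"name":      name,
+			"namespace": namespace,
+			// Owner reference drives garbage collection (R15).
+			"ownerReferences": []interface{}{map[string]interface{}{
+				"apiVersion": own.APIVersion, "kind": own.Kind,
+				"name": own.Name, "uid": own.UID,
+				"controller": true, "blockOwnerDeletion": true,
+			}},
+		},
+		"spec": map[string]interface{}{
+			"type": "Declarative",
+			"declarative": map[string]interface{}{
+				"modelConfig":   cfg.ModelConfig,
+				"systemMessage": cfg.SystemPrompt,
+				"tools":         tools,
+				"deployment": map[string]interface{}{
+					"serviceAccountName": serviceAccount, // Gate 2 scoped SA
+				},
+			},
+		},
+	}
+}
+
+func main() {
+	cfg := agentConfig{
+		ModelConfig:  "claude-sonnet",
+		SystemPrompt: "You are a Kubernetes cluster diagnostics agent.",
+		ToolServers: []toolServer{{
+			Name: "k8s-read-tools", Kind: "RemoteMCPServer",
+			AllowedTools: []string{"k8s_get_resources", "k8s_get_logs", "k8s_describe"},
+		}},
+	}
+	own := owner{APIVersion: "agent.tekton.dev/v1alpha1", Kind: "AgentRun", Name: "debug-api-server", UID: "example-uid"}
+	obj := buildAgent("cluster-diagnostics-7xk2", "default", "example-sa", cfg, own)
+	out, err := json.MarshalIndent(obj, "", "  ")
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(string(out))
+}
+```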
+
+### Integration with Tekton Pipelines
+
+**CustomTask adapter** (Phase 1): AgentRun implements the Tekton
+CustomTask protocol, allowing it to be referenced from Pipeline steps:
+
+```yaml
+apiVersion: tekton.dev/v1
+kind: Pipeline
+metadata:
+  name: review-and-deploy
+spec:
+  tasks:
+    - name: code-review
+      taskRef:
+        apiVersion: agent.tekton.dev/v1alpha1
+        kind: AgentRun
+      params:
+        - name: configRef
+          value: code-review-agent
+        - name: goal
+          value: "Review the PR diff for security issues"
+    - name: deploy
+      runAfter: [code-review]
+      when:
+        - input: "$(tasks.code-review.results.approved)"
+          operator: in
+          values: ["true"]
+      taskRef:
+        name: kubectl-deploy
+```
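+
+Tekton results are strings, so the `when` expression above matches only
+if the adapter writes the literal string `"true"` into the `approved`
+result. Below is a minimal sketch of that mapping. It assumes the agent
+returns its final answer as a JSON object; the TEP does not fix the
+output format. A local struct mirrors the name/value entries in a
+CustomRun's `status.results`. Strings pass through unchanged, and all
+other JSON values are stored compactly encoded.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"sort"
+)
+
+// customRunResult mirrors one name/value entry in CustomRun status.results.
+type customRunResult struct {
+	Name  string
+	Value string
+}
+
+// toResults flattens an agent's final JSON object into string results.
+// Strings are copied verbatim (JSON null becomes ""); booleans, numbers,
+// arrays and objects are stored as compact JSON, so true becomes "true".
+func toResults(answer []byte) ([]customRunResult, error) {
+	var fields map[string]json.RawMessage
+	if err := json.Unmarshal(answer, &fields); err != nil {
+		return nil, fmt.Errorf("agent answer is not a JSON object: %w", err)
+	}
+	names := make([]string, 0, len(fields))
+	for name := range fields {
+		names = append(names, name)
+	}
+	sort.Strings(names) // stable ordering across reconciles
+	results := make([]customRunResult, 0, len(names))
+	for _, name := range names {
+		var s string
+		if err := json.Unmarshal(fields[name], &s); err == nil {
+			results = append(results, customRunResult{Name: name, Value: s})
+			continue
+		}
+		var buf bytes.Buffer
+		if err := json.Compact(&buf, fields[name]); err != nil {
+			return nil, err
+		}
+		results = append(results, customRunResult{Name: name, Value: buf.String()})
+	}
+	return results, nil
+}
+
+func main() {
+	answer := []byte(`{"approved": true, "findings": ["SQL built by string concatenation"], "summary": "1 issue"}`)
+	results, err := toResults(answer)
+	if err != nil {
+		panic(err)
+	}
+	for _, r := range results {
+		fmt.Printf("%s=%s\n", r.Name, r.Value)
+	}
+}
+```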
+
+**Pre/post hooks**: Optional Tekton PipelineRuns for security scanning
+(pre) and audit collection (post) around agent execution. Created via
+dynamic client, owner-referenced for cleanup. When Tekton Pipelines is
+not installed, hooks configuration is rejected at validation time.
+
+### Codebase Access via MCP Tools
+
+Agents access codebases and cluster state through [MCP][mcp] tool
+calls, not through volume mounts. This is a deliberate architectural decision.
+
+Volume mounts give the agent raw filesystem access. The agent can read
+any file on the mounted volume. OPA policy cannot restrict which files
+the agent reads because file reads happen inside the container, outside
+the tool call protocol.
+
+MCP tools route every data access through the tool protocol. Every
+`read_file`, `search_code`, `list_functions` call goes through the MCP
+server, which means every call is restricted by kagent's
+`allowedTools` enforcement and recorded in the agent's tool call
+history for provenance.
+
+```
+Volume mount: Agent reads filesystem directly. No restriction. No audit.
+MCP tools: Agent calls tool. kagent enforces allowedTools. MCP server reads file. Audited.
+```
+
+The following diagram shows how multiple agents access the same
+codebase through a shared MCP server with different tool allowlists:
+
+```mermaid
+flowchart LR
+ subgraph Pipeline["PipelineRun"]
+ Clone["git-clone step<br/>(writes to PVC)"]
+ end
+
+ subgraph MCP["MCP ToolServer Pod"]
+ PVC["Workspace PVC<br/>(mounted read/write)"]
+ RF["read_file"]
+ SC["search_code"]
+ LF["list_functions"]
+ WF["write_file"]
+ end
+
+ subgraph Agents["Agent Deployments"]
+ A1["Security Agent<br/>allowedTools:<br/>read_file, search_code"]
+ A2["Test Agent<br/>allowedTools:<br/>read_file, list_functions,<br/>write_file"]
+ end
+
+ Clone --> PVC
+ A1 -->|"read_file"| RF
+ A1 -->|"search_code"| SC
+ A1 -.->|"BLOCKED"| WF
+ A2 -->|"read_file"| RF
+ A2 -->|"list_functions"| LF
+ A2 -->|"write_file"| WF
+```
+
+For multi-agent pipelines working on the same codebase, all agents talk
+to the same MCP server. Each agent has its own tool allowlist controlling
+what it can access through that server:
+
+```
+PipelineRun
+├── MCP Server (ToolServer CR, has workspace PVC mounted)
+│ ├── read_file
+│ ├── search_code
+│ ├── list_functions
+│ └── write_file
+│
+├── Agent CR #1 (security-agent)
+│ └── allowedTools: [read_file, search_code]
+│
+├── Agent CR #2 (test-agent)
+│ └── allowedTools: [read_file, list_functions, write_file]
+```
+
+Agents that need to write files (test generation) use a `write_file`
+tool on the MCP server. The write is controlled by kagent's
+`allowedTools` (Gate 3) and recorded in the provenance trace (Gate 5).
+
+### Security Layer
+
+#### Threat Model
+
+Agentic workflows introduce threats that do not exist in traditional
+container-based CI/CD:
+
+```mermaid
+flowchart TD
+ subgraph Threats["Threat Sources"]
+ LLM["LLM itself<br/>(hallucination,<br/>instruction failure)"]
+ Prompt["Prompt injection<br/>(via pipeline params,<br/>Jira tickets, PR descriptions)"]
+ Tools["Malicious tools<br/>(compromised MCP servers,<br/>poisoned skill packages)"]
+ Data["Poisoned data<br/>(crafted pod logs,<br/>error messages designed<br/>to manipulate agent)"]
+ Human["Misconfiguration<br/>(overly broad permissions,<br/>missing policy)"]
+ end
+
+ subgraph Impacts["What Can Go Wrong"]
+ Access["Agent accesses data<br/>it should not"]
+ Exfil["Agent exfiltrates data<br/>to LLM provider"]
+ Write["Agent calls write tools<br/>it should not"]
+ Cost["Agent runs indefinitely<br/>consuming tokens"]
+ Invisible["Nobody knows<br/>what happened"]
+ end
+
+ LLM --> Access
+ LLM --> Write
+ Prompt --> Access
+ Prompt --> Write
+ Tools --> Exfil
+ Data --> Write
+ Human --> Access
+ Human --> Cost
+ LLM --> Invisible
+ Tools --> Invisible
+```
+
+The [CoSAI Principles for Secure Agentic Systems][cosai] state: "The
+non-deterministic nature of AI means we cannot always predict the exact
+path an agent will take, making strong foundational cybersecurity
+controls that strictly limit potential actions to expected and intended
+purposes critical."
+
+The [OWASP Top 10 for Agentic Applications][owasp-agentic] identifies
+agent behavior hijacking (ASI01), prompt injection (ASI02), and tool
+misuse (ASI03) as the top risks. These are the threats this security
+layer addresses.
+
+#### Five-Gate Security Architecture
+
+Each gate addresses specific threats at a specific point in the
+execution path, and no single gate is sufficient on its own. Together,
+the five gates compose Kubernetes RBAC, NetworkPolicy, OPA, kagent tool
+restrictions, and the MCP protocol into a coherent security boundary
+around agent execution.
+
+```mermaid
+flowchart TD
+ Goal["Goal submitted"]
+
+ subgraph G1["Gate 1: Goal Admission"]
+ OPA["OPA evaluates goal<br/>+ namespace + tools"]
+ PreHook["Pre-hook pipeline<br/>(prompt security scan)"]
+ Validate["AgentConfig validation<br/>(RBAC rules, token budget)"]
+ end
+
+ subgraph G2["Gate 2: Cluster Access"]
+ SA["Per-run ServiceAccount"]
+ Role["Per-run Role<br/>(from AgentConfig.rbac.rules)"]
+ NetPol["Per-run NetworkPolicy<br/>(only declared endpoints)"]
+ end
+
+ subgraph G3["Gate 3: Tool Restriction"]
+ Allowed["kagent allowedTools<br/>(static allowlist)"]
+ Approval["kagent requireApproval<br/>(human gate)"]
+ FutureOPA["Future: per-tool-call OPA<br/>(conditional policy)"]
+ end
+
+ subgraph G4["Gate 4: Data Flow"]
+ MCPOnly["All access via MCP tools<br/>(no volume mounts)"]
+ Audit["Every tool call<br/>recorded in trace"]
+ end
+
+ subgraph G5["Gate 5: Attestation"]
+ Prov["Provenance recorded<br/>(buildType, model, tools)"]
+ NonDet["reproducible: false<br/>(explicit non-determinism)"]
+ Chains["Tekton Chains<br/>(cryptographic signature)"]
+ PostHook["Post-hook pipeline<br/>(audit bundle)"]
+ end
+
+ Goal --> G1
+ G1 -->|pass| G2
+ G2 --> G3
+ G3 --> G4
+ G4 --> G5
+
+ G1 -->|fail| Reject1["REJECT: goal denied"]
+ G3 -->|fail| Reject2["REJECT: tool blocked"]
+```
+
+| Gate | Threat Addressed | Provided By | Underlying Mechanism Exists Today? |
+|------|-----------------|-------------|---------------|
+| 1. Goal admission | Prompt injection, misconfiguration | OPA (library) + Tekton pre-hook PipelineRun | Yes |
+| 2. Cluster access | Unauthorized data access, lateral movement | Kubernetes RBAC + NetworkPolicy (native K8s) | Yes |
+| 3. Tool restriction | Tool misuse, unauthorized write operations | kagent allowedTools + requireApproval | Yes |
+| 4. Data flow | Data exfiltration, unaudited access | MCP protocol (all access via tool calls) | Yes |
+| 5. Attestation | Invisible agent behavior, no accountability | Provenance struct + Tekton Chains | Partially (provenance new, Chains exists) |
+
+Every gate except provenance uses existing infrastructure. AgentRun
+does not invent new security primitives. It composes existing
+Kubernetes, kagent, and Tekton mechanisms into a per-execution
+security boundary with cleanup and audit trail.
+
+#### Gate 1: Goal Admission (OPA)
+
+The controller evaluates OPA policy at goal submission time. The OPA
+input includes the goal text, the AgentRun namespace, the allowed and
+approval-gated tool lists, and the AgentConfig name. OPA can reject the
+entire execution before any agent is created.
+
+```rego
+package agent.goals
+
+import rego.v1
+
+default allow := false
+
+allow if {
+	input.namespace in data.allowed_namespaces
+}
+
+deny contains msg if {
+	some tool in input.requested_tools
+	tool in data.write_tools
+	not tool in input.require_approval
+	msg := sprintf("write tool %s must have requireApproval set", [tool])
+}
+```
+
+OPA input is constructed by the controller, not the LLM:
+
+```go
+input := map[string]interface{}{
+	"goal":             agentRun.Spec.Goal,
+	"namespace":        agentRun.Namespace,
+	"requested_tools":  allowedToolsList,
+	"require_approval": requireApprovalList,
+	"config":           agentConfigName,
+}
+```
+
+Default policy is **fail-closed**: when no OPA policy ConfigMap is
+configured, goal submission is denied unconditionally. The
+`defaultDeny` field in AgentConfig is reserved for future use to
+allow explicit opt-in to permissive mode; the default behavior is
+always deny.
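+
+The example policy defines both an `allow` rule and a `deny` set, so the
+controller has to combine them. One reasonable combination admits a goal
+only when `allow` is `true` and `deny` is empty. Below is a minimal
+sketch of that fail-closed logic. A hypothetical `evaluator` function
+type stands in for the actual OPA query against `data.agent.goals`.
+Each of the following rejects the goal:
+
+- no policy configured
+- an evaluation error
+- an undefined or non-boolean `allow`
+- any `deny` message
+
+The toy evaluator in `main` hand-codes the example policy and is not OPA.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// evaluator stands in for an OPA query against data.agent.goals. It
+// returns the raw value of "allow" (nil if undefined) and the "deny" messages.
+type evaluator func(input map[string]interface{}) (allow interface{}, deny []string, err error)
+
+// admitGoal is fail-closed: every missing or ambiguous signal rejects.
+func admitGoal(policyConfigured bool, eval evaluator, input map[string]interface{}) error {
+	if !policyConfigured || eval == nil {
+		return errors.New("goal denied: no OPA policy configured")
+	}
+	allow, deny, err := eval(input)
+	if err != nil {
+		return fmt.Errorf("goal denied: policy evaluation failed: %w", err)
+	}
+	if len(deny) > 0 {
+		return fmt.Errorf("goal denied: %v", deny)
+	}
+	if ok, isBool := allow.(bool); !isBool || !ok {
+		return errors.New("goal denied: policy did not return allow == true")
+	}
+	return nil
+}
+
+func main() {
+	// Toy evaluator that hand-codes the example policy; it is not OPA.
+	allowedNamespaces := map[string]bool{"ci": true}
+	writeTools := map[string]bool{"write_file": true}
+	toy := func(in map[string]interface{}) (interface{}, []string, error) {
+		approved := map[string]bool{}
+		for _, t := range in["require_approval"].([]string) {
+			approved[t] = true
+		}
+		var deny []string
+		for _, t := range in["requested_tools"].([]string) {
+			if writeTools[t] && !approved[t] {
+				deny = append(deny, fmt.Sprintf("write tool %s must have requireApproval set", t))
+			}
+		}
+		ns, _ := in["namespace"].(string)
+		return allowedNamespaces[ns], deny, nil
+	}
+	input := map[string]interface{}{
+		"namespace":        "ci",
+		"requested_tools":  []string{"read_file", "write_file"},
+		"require_approval": []string{"write_file"},
+	}
+	fmt.Println(admitGoal(true, toy, input))  // admitted (nil error)
+	fmt.Println(admitGoal(false, toy, input)) // no policy: denied
+}
+```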
+
+The controller also sets `allowedTools` and `requireApproval` on
+the Agent CR, which kagent enforces at Gate 3.
+
+#### Gate 2: Cluster Access (RBAC + NetworkPolicy)
+
+For each AgentRun (or per PipelineRun for shared agents), the
+controller creates:
+- A `ServiceAccount` whose name ends in `-sa`
+- A `Role` with rules from `AgentConfig.spec.rbac.rules`
+- A `RoleBinding` binding the Role to the ServiceAccount
+
+The kagent Agent CR is configured with
+`spec.declarative.deployment.serviceAccountName` so the agent
+Deployment uses this scoped ServiceAccount. All resources are
+owner-referenced for cleanup.
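+
+Below is a sketch of the Gate 2 objects, using local types that mirror
+the `rbac.authorization.k8s.io/v1` `PolicyRule` fields. The wildcard
+rejection is a hypothetical example of the "AgentConfig validation
+(RBAC rules)" step listed under Gate 1. The `-role` and `-binding` name
+suffixes are placeholders, and the per-run prefix is left as a parameter.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// policyRule mirrors the rbac.authorization.k8s.io/v1 PolicyRule fields
+// used by AgentConfig.spec.rbac.rules.
+type policyRule struct {
+	APIGroups, Resources, Verbs []string
+}
+
+// perRunRBAC names the three objects Gate 2 creates. Each would carry an
+// owner reference to the AgentRun or PipelineRun (not shown).
+type perRunRBAC struct {
+	ServiceAccount, Role, RoleBinding string
+	Rules                             []policyRule
+}
+
+// validateRules is a hypothetical admission check: a wildcard in any field
+// would grant far more than the per-run scope intends.
+func validateRules(rules []policyRule) error {
+	for i, r := range rules {
+		for _, field := range [][]string{r.APIGroups, r.Resources, r.Verbs} {
+			for _, v := range field {
+				if strings.Contains(v, "*") {
+					return fmt.Errorf("rbac.rules[%d]: wildcard %q is not allowed", i, v)
+				}
+			}
+		}
+	}
+	return nil
+}
+
+// buildPerRunRBAC derives object names from a per-run prefix. The "-sa"
+// suffix follows the TEP; "-role" and "-binding" are hypothetical.
+func buildPerRunRBAC(prefix string, rules []policyRule) (perRunRBAC, error) {
+	if err := validateRules(rules); err != nil {
+		return perRunRBAC{}, err
+	}
+	return perRunRBAC{
+		ServiceAccount: prefix + "-sa",
+		Role:           prefix + "-role",
+		RoleBinding:    prefix + "-binding",
+		Rules:          rules,
+	}, nil
+}
+
+func main() {
+	rules := []policyRule{
+		{APIGroups: []string{""}, Resources: []string{"pods", "services", "events"}, Verbs: []string{"get", "list", "watch"}},
+		{APIGroups: []string{"apps"}, Resources: []string{"deployments", "replicasets"}, Verbs: []string{"get", "list", "watch"}},
+	}
+	objs, err := buildPerRunRBAC("example-run", rules)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Printf("%s bound to %s via %s (%d rules)\n", objs.ServiceAccount, objs.Role, objs.RoleBinding, len(objs.Rules))
+}
+```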
+
+When `networkPolicy: strict`, the controller generates a
+NetworkPolicy that:
+- Allows ingress from the AgentRun controller (HTTP communication)
+- Allows egress to: Kubernetes API server (resolved cluster IP,
+ 443/tcp), DNS (53/udp), declared MCP tool server endpoints
+- Denies all other traffic
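+
+Below is a minimal sketch of the `networkPolicy: strict` object, using
+local structs that mirror the `networking.k8s.io/v1` NetworkPolicy
+fields. The controller namespace, pod labels, and MCP endpoint CIDRs are
+hypothetical inputs.
+
+One caveat is left unresolved. Kubernetes leaves `ipBlock` matching of
+traffic rewritten by Service NAT to the network plugin. The API server
+rule may therefore need either the `kubernetes` Service ClusterIP or the
+endpoint addresses and port, depending on the CNI.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// Minimal mirrors of the networking.k8s.io/v1 NetworkPolicy fields used here.
+type labelSelector struct {
+	MatchLabels map[string]string `json:"matchLabels,omitempty"`
+}
+type ipBlock struct {
+	CIDR string `json:"cidr"`
+}
+type peer struct {
+	PodSelector       *labelSelector `json:"podSelector,omitempty"`
+	NamespaceSelector *labelSelector `json:"namespaceSelector,omitempty"`
+	IPBlock           *ipBlock       `json:"ipBlock,omitempty"`
+}
+type port struct {
+	Protocol string `json:"protocol"`
+	Port     int    `json:"port"`
+}
+type ingressRule struct {
+	From []peer `json:"from"`
+}
+type egressRule struct {
+	To    []peer `json:"to,omitempty"` // omitted means any destination
+	Ports []port `json:"ports"`
+}
+type policySpec struct {
+	PodSelector labelSelector `json:"podSelector"`
+	PolicyTypes []string      `json:"policyTypes"`
+	Ingress     []ingressRule `json:"ingress"`
+	Egress      []egressRule  `json:"egress"`
+}
+type networkPolicy struct {
+	APIVersion string            `json:"apiVersion"`
+	Kind       string            `json:"kind"`
+	Metadata   map[string]string `json:"metadata"`
+	Spec       policySpec        `json:"spec"`
+}
+
+// endpoint is a resolved MCP tool server address (hypothetical input).
+type endpoint struct {
+	CIDR string
+	Port int
+}
+
+// strictPolicy lists both policy types, so any traffic not matched by a
+// rule below is denied.
+func strictPolicy(name string, agentPods, controllerPods map[string]string,
+	controllerNS, apiServerCIDR string, apiServerPort int, mcp []endpoint) networkPolicy {
+	egress := []egressRule{
+		{To: []peer{{IPBlock: &ipBlock{CIDR: apiServerCIDR}}}, Ports: []port{{Protocol: "TCP", Port: apiServerPort}}},
+		{Ports: []port{{Protocol: "UDP", Port: 53}}}, // DNS
+	}
+	for _, e := range mcp {
+		egress = append(egress, egressRule{
+			To:    []peer{{IPBlock: &ipBlock{CIDR: e.CIDR}}},
+			Ports: []port{{Protocol: "TCP", Port: e.Port}},
+		})
+	}
+	return networkPolicy{
+		APIVersion: "networking.k8s.io/v1",
+		Kind:       "NetworkPolicy",
+		Metadata:   map[string]string{"name": name},
+		Spec: policySpec{
+			PodSelector: labelSelector{MatchLabels: agentPods},
+			PolicyTypes: []string{"Ingress", "Egress"},
+			// One peer with both selectors: controller pods in the controller namespace.
+			Ingress: []ingressRule{{From: []peer{{
+				NamespaceSelector: &labelSelector{MatchLabels: map[string]string{
+					"kubernetes.io/metadata.name": controllerNS,
+				}},
+				PodSelector: &labelSelector{MatchLabels: controllerPods},
+			}}}},
+			Egress: egress,
+		},
+	}
+}
+
+func main() {
+	np := strictPolicy("example-netpol", map[string]string{"app": "example-agent"},
+		map[string]string{"app": "agentrun-controller"}, "tekton-agents",
+		"10.96.0.1/32", 443, []endpoint{{CIDR: "10.0.12.4/32", Port: 8080}})
+	out, err := json.MarshalIndent(np, "", "  ")
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(string(out))
+}
+```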
+
+#### Gate 3: Tool Restriction (kagent)
+
+kagent's ADK runtime enforces `allowedTools` (the agent cannot call
+tools not in the list) and `requireApproval` (the agent pauses and
+waits for human approval before calling specified tools). These are
+set by the AgentRun controller when constructing the Agent CR from
+the AgentConfig.
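+
+The decision these two settings imply for a single tool call can be
+sketched as follows. This illustrates the semantics described above
+and is not kagent's ADK code. The `toolGate` type, the verdict names,
+and the tool names `k8s_delete_pod` and `k8s_exec` are hypothetical.
+
+```go
+package main
+
+import "fmt"
+
+// Verdict is the outcome for one attempted tool call (hypothetical names).
+type Verdict string
+
+const (
+	Blocked       Verdict = "blocked"        // tool not in allowedTools
+	NeedsApproval Verdict = "needs-approval" // allowed, but listed in requireApproval
+	Allowed       Verdict = "allowed"
+)
+
+// toolGate holds the allowedTools and requireApproval lists that the
+// controller copies from AgentConfig onto the Agent CR.
+type toolGate struct {
+	allowed  map[string]bool
+	approval map[string]bool
+}
+
+func newToolGate(allowedTools, requireApproval []string) toolGate {
+	g := toolGate{allowed: map[string]bool{}, approval: map[string]bool{}}
+	for _, t := range allowedTools {
+		g.allowed[t] = true
+	}
+	for _, t := range requireApproval {
+		g.approval[t] = true
+	}
+	return g
+}
+
+// check applies the allowlist first: a tool that is not allowed is blocked
+// even if it also appears in requireApproval.
+func (g toolGate) check(tool string) Verdict {
+	switch {
+	case !g.allowed[tool]:
+		return Blocked
+	case g.approval[tool]:
+		return NeedsApproval
+	default:
+		return Allowed
+	}
+}
+
+func main() {
+	g := newToolGate([]string{"k8s_get_logs", "k8s_delete_pod"}, []string{"k8s_delete_pod"})
+	fmt.Println(g.check("k8s_get_logs"), g.check("k8s_delete_pod"), g.check("k8s_exec"))
+	// allowed needs-approval blocked
+}
+```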
+
+Future: per-tool-call OPA evaluation inside the kagent ADK, where
+the full tool call input (namespace, resource type, label selectors)
+is available. This requires an upstream contribution to kagent.
+
+#### Gate 4: Data Flow (MCP)
+
+All codebase and cluster access goes through MCP tool calls, not
+volume mounts. This means every data access is visible in the
+execution trace, restricted by the tool allowlist, and auditable
+in provenance. See [Codebase Access via MCP Tools](#codebase-access-via-mcp-tools).
+
+#### Gate 5: Attestation (Provenance + Chains)
+
+See [Provenance](#provenance) for the full provenance schema,
+buildType URI, telemetry data flow, and Chains integration.
+
+### Provenance
+
+#### Agent Provenance vs Build Provenance
+
+Traditional CI/CD provenance (SLSA, in-toto) assumes a deterministic
+build: the same source, builder, and parameters produce the same
+artifact. Agent execution is fundamentally different. The same goal,
+model, and tools can produce different tool call sequences, different
+reasoning paths, and different results on every run. This is not a
+bug; it is the nature of LLM-based reasoning.
+
+This means agent provenance must be **descriptive** (what happened)
+rather than **prescriptive** (what should happen). A verifier cannot
+reproduce an agent execution from its provenance. Instead, provenance
+answers: what model was used, what prompt was given, what tools were
+called in what order, what policy decisions were made, and what
+results were produced.
+
+The TEP proposes an agentic provenance extension that records this
+metadata in a format compatible with [SLSA provenance][slsa] and
+informed by the [PROV-AGENT][prov-agent] schema for tracking AI
+agent interactions. The [LLM Agents for Interactive Workflow
+Provenance][workflow-provenance] reference architecture provides
+additional context for provenance capture in non-deterministic
+workflows.
+
+#### buildType
+
+The TEP defines a new buildType URI for agentic executions:
+
+```
+https://tekton.dev/agent-provenance/v1
+```
+
+This buildType signals to verifiers that the execution is
+non-deterministic, the artifact cannot be reproduced from the same
+inputs, and the provenance contains agent-specific fields
+(model identity, tool call sequence, policy decisions).
+
+#### Provenance Fields
+
+The `AgentRun.status.provenance` captures the following, mapped to
+SLSA predicate fields:
+
+```yaml
+provenance:
+ # Build definition
+ buildType: "https://tekton.dev/agent-provenance/v1"
+ reproducible: false
+ reproducibilityNote: "LLM-based agent execution is non-deterministic"
+
+ # External parameters (user-provided inputs)
+ externalParameters:
+ goal: "Diagnose why deployment api-server is failing"
+ goalHash: "sha256:def456..."
+ hints: ["Check recent events", "Review pod logs"]
+ agentConfigRef: cluster-diagnostics
+ agentConfigHash: "sha256:789abc..." # hash of snapshotted config
+
+ # Internal parameters (system-determined)
+ internalParameters:
+ model:
+ provider: Anthropic
+ modelId: "claude-sonnet-4-6"
+ apiVersion: "2023-06-01"
+ temperature: 0.2
+ maxTokens: 4096
+ tokenBudget: 16384
+ systemPromptHash: "sha256:abc123..."
+ maxIterations: 5
+ timeout: "10m"
+ opaPolicy:
+ configMapRef: agent-policies
+ policyHash: "sha256:fed321..."
+
+ # Resolved dependencies (runtime-discovered)
+ resolvedDependencies:
+ - name: kagent-agent-cr
+ uri: "kagent.dev/v1alpha2/Agent/default/cluster-diagnostics-7xk2"
+ - name: adk-runtime-image
+ uri: "ghcr.io/kagent-dev/kagent/app"
+ digest: "sha256:a1b2c3..."
+ - name: mcp-server-k8s-read-tools
+ uri: "kagent.dev/v1alpha2/RemoteMCPServer/kagent-system/k8s-read-tools"
+ toolsDiscovered: ["k8s_get_resources", "k8s_get_logs", "k8s_describe"]
+ - name: model-config
+ uri: "kagent.dev/v1alpha2/ModelConfig/kagent-system/claude-sonnet"
+
+ # Execution trace (ordered tool call sequence)
+ executionTrace:
+ iterations: 3
+ toolCalls:
+ - sequence: 1
+ iteration: 1
+ tool: k8s_get_resources
+ inputHash: "sha256:111..."
+ outputHash: "sha256:222..."
+ timestamp: "2026-03-20T14:20:05Z"
+ durationMs: 340
+ policyVerdict: allowed
+ - sequence: 2
+ iteration: 1
+ tool: k8s_get_logs
+ inputHash: "sha256:333..."
+ outputHash: "sha256:444..."
+ timestamp: "2026-03-20T14:20:06Z"
+ durationMs: 520
+ policyVerdict: allowed
+ - sequence: 3
+ iteration: 2
+ tool: k8s_describe
+ inputHash: "sha256:555..."
+ outputHash: "sha256:666..."
+ timestamp: "2026-03-20T14:20:08Z"
+ durationMs: 280
+ policyVerdict: allowed
+ llmInvocations:
+ - sequence: 1
+ iteration: 1
+ requestHash: "sha256:aaa..."
+ responseHash: "sha256:bbb..."
+ promptTokens: 1200
+ completionTokens: 800
+ - sequence: 2
+ iteration: 2
+ requestHash: "sha256:ccc..."
+ responseHash: "sha256:ddd..."
+ promptTokens: 2400
+ completionTokens: 600
+
+ # Token usage (total and per-invocation breakdown)
+ tokenUsage:
+ totalPromptTokens: 3600
+ totalCompletionTokens: 1400
+ totalTokens: 5000
+
+ # Policy decisions
+ policyDecisions:
+ layer1_opa:
+ engine: OPA
+ evaluated: 1
+ allowed: 1
+ denied: 0
+ layer2_kagent:
+ allowedToolsEnforced: true
+ toolsBlocked: 0
+ approvalsPaused: 0
+
+ # Builder identity
+ builder:
+ controllerVersion: "v0.1.0"
+ kagentVersion: "v0.7.13"
+ adkImageDigest: "sha256:a1b2c3..."
+
+ # Result
+ resultHash: "sha256:eee..."
+```
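+
+The `sha256:` fields and the `tokenUsage` totals above are derived
+values. The following is a minimal sketch of how a controller might
+compute them. It assumes each digest is taken over the raw bytes of
+the value (this TEP does not specify a canonicalization) and that
+totals are plain sums over `llmInvocations`. `digestRef` and
+`sumTokens` are hypothetical helpers.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+)
+
+// digestRef formats a SHA-256 digest in the "sha256:<hex>" form used above.
+// Assumption: the input is hashed as raw bytes with no canonicalization.
+func digestRef(data []byte) string {
+	sum := sha256.Sum256(data)
+	return "sha256:" + hex.EncodeToString(sum[:])
+}
+
+type llmInvocation struct {
+	PromptTokens     int
+	CompletionTokens int
+}
+
+type tokenUsage struct {
+	TotalPromptTokens     int
+	TotalCompletionTokens int
+	TotalTokens           int
+}
+
+// sumTokens aggregates per-invocation counts into the tokenUsage block.
+func sumTokens(calls []llmInvocation) tokenUsage {
+	var u tokenUsage
+	for _, c := range calls {
+		u.TotalPromptTokens += c.PromptTokens
+		u.TotalCompletionTokens += c.CompletionTokens
+	}
+	u.TotalTokens = u.TotalPromptTokens + u.TotalCompletionTokens
+	return u
+}
+
+func main() {
+	// The two invocations from the example above.
+	u := sumTokens([]llmInvocation{{1200, 800}, {2400, 600}})
+	fmt.Printf("%+v\n", u) // {TotalPromptTokens:3600 TotalCompletionTokens:1400 TotalTokens:5000}
+	fmt.Println(digestRef([]byte("Diagnose why deployment api-server is failing")))
+}
+```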
+
+#### Telemetry Data Flow
+
+The controller cannot observe individual tool calls and LLM
+invocations because they happen inside the kagent ADK runtime. The
+execution trace is collected from the kagent Agent's HTTP response.
+
+The controller POSTs a goal to the Agent Service and expects a
+structured JSON response that includes both the result and execution
+telemetry:
+
+```json
+{
+ "result": {
+ "diagnosis": "OOMKilled: memory limit too low",
+ "recommendation": "Increase to 512Mi"
+ },
+ "telemetry": {
+ "iterations": 3,
+ "toolCalls": [...],
+ "llmInvocations": [...],
+ "tokenUsage": {...}
+ }
+}
+```
+
+kagent's ADK already tracks tool calls and LLM invocations internally
+for its session management. Exposing this data in the HTTP response
+is an upstream contribution to kagent. Until this is available, the
+controller records what it can observe directly: model identity,
+prompt hash, policy decisions (Gate 1), and timing metadata.
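+
+The following is a minimal sketch of decoding this response when the
+`telemetry` block may be missing, assuming the JSON shape above. The
+type and function names are hypothetical. Per-call records are left
+as raw JSON because their schema depends on the kagent contribution.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// agentResponse mirrors the response shape sketched above. Result stays raw
+// because its fields depend on the goal. Telemetry is a pointer so that a
+// response without it decodes to nil.
+type agentResponse struct {
+	Result    json.RawMessage `json:"result"`
+	Telemetry *telemetry      `json:"telemetry,omitempty"`
+}
+
+// telemetry keeps only a subset of fields; record schemas are left opaque.
+type telemetry struct {
+	Iterations     int               `json:"iterations"`
+	ToolCalls      []json.RawMessage `json:"toolCalls"`
+	LLMInvocations []json.RawMessage `json:"llmInvocations"`
+}
+
+func decodeAgentResponse(body []byte) (agentResponse, error) {
+	var r agentResponse
+	err := json.Unmarshal(body, &r)
+	return r, err
+}
+
+func main() {
+	body := []byte(`{"result": {"diagnosis": "OOMKilled: memory limit too low"}}`)
+	r, err := decodeAgentResponse(body)
+	if err != nil {
+		panic(err)
+	}
+	if r.Telemetry == nil {
+		// Fall back to what the controller observes directly.
+		fmt.Println("no telemetry; recording controller-side provenance only")
+	}
+}
+```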
+
+#### Tekton Chains Integration
+
+Tekton Chains discovers agent provenance through the CustomTask
+adapter. When an AgentRun completes as a CustomRun within a
+PipelineRun, Chains processes it like any other step:
+
+1. Chains watches CustomRun completion events
+2. The CustomRun status contains the `provenance` struct
+3. Chains maps the struct to an in-toto attestation using the
+ `https://tekton.dev/agent-provenance/v1` buildType
+4. The attestation is signed and stored alongside the PipelineRun
+ attestation
+
+For standalone AgentRuns (not in a Pipeline), a Chains extension
+watches AgentRun completion events directly and produces standalone
+attestations.
+
+The `reproducible: false` flag tells verifiers that understand this
+buildType that the execution cannot be reproduced from the same
+inputs. Recording this explicitly is a necessary extension for
+non-deterministic build steps.
+
+#### Prompt Auditability
+
+The `systemPrompt` field in AgentConfig is mutable. The controller
+records `systemPromptHash` (SHA-256) in provenance. This is
+auditability, not immutability: you can verify what was used and
+detect changes between runs. True immutability would require an
+admission webhook and is out of scope.
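+
+Detecting a changed prompt between runs reduces to comparing digests.
+The following is a sketch that assumes the hash is SHA-256 over the
+prompt bytes, in the `sha256:<hex>` form used in the provenance
+example. `systemPromptHash`, `promptChanged`, and the prompt text are
+illustrative.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+)
+
+// systemPromptHash returns the "sha256:<hex>" digest recorded in provenance.
+func systemPromptHash(prompt string) string {
+	sum := sha256.Sum256([]byte(prompt))
+	return "sha256:" + hex.EncodeToString(sum[:])
+}
+
+// promptChanged reports whether the current prompt differs from the one
+// recorded by an earlier run. It detects a change; it cannot prevent one.
+func promptChanged(recordedHash, currentPrompt string) bool {
+	return systemPromptHash(currentPrompt) != recordedHash
+}
+
+func main() {
+	prev := systemPromptHash("You are a cluster diagnostics agent.")
+	fmt.Println(promptChanged(prev, "You are a cluster diagnostics agent.")) // false
+	fmt.Println(promptChanged(prev, "You are a cluster diagnostics agent!")) // true
+}
+```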
+
+#### Token Budget
+
+The `tokenBudget` field is passed to the kagent Agent as
+configuration. If the ADK runtime does not enforce it natively, the
+controller falls back to a timeout. This fallback is best-effort: a
+model can consume many tokens in a short time, so a timeout bounds
+wall-clock time rather than token spend. This limitation is
+acknowledged.
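+
+The fallback can be sketched as a context deadline plus a post-hoc
+check of the reported token count. The `invokeFunc` signature and the
+`runWithBudget` helper are hypothetical. The sketch also shows why the
+check is best-effort: it runs only after the tokens have been spent.
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"time"
+)
+
+var errBudgetExceeded = errors.New("token budget exceeded")
+
+// invokeFunc stands in for the HTTP call to the agent and returns the total
+// tokens the agent reports having used (hypothetical signature).
+type invokeFunc func(ctx context.Context) (tokensUsed int, err error)
+
+// runWithBudget bounds the call by a timeout (the fallback) and then checks
+// the reported token count. The budget may already be overspent by then.
+func runWithBudget(parent context.Context, timeout time.Duration, budget int, call invokeFunc) (int, error) {
+	ctx, cancel := context.WithTimeout(parent, timeout)
+	defer cancel()
+	used, err := call(ctx)
+	if err != nil {
+		return used, err
+	}
+	if budget > 0 && used > budget {
+		return used, errBudgetExceeded
+	}
+	return used, nil
+}
+
+func main() {
+	fast := func(ctx context.Context) (int, error) { return 20000, nil }
+	_, err := runWithBudget(context.Background(), time.Second, 16384, fast)
+	fmt.Println(err) // token budget exceeded
+
+	slow := func(ctx context.Context) (int, error) {
+		<-ctx.Done()
+		return 0, ctx.Err()
+	}
+	_, err = runWithBudget(context.Background(), 10*time.Millisecond, 16384, slow)
+	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
+}
+```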
+
+#### Prior Art
+
+- [PROV-AGENT][prov-agent] extends W3C PROV with agent-specific
+ entities (AIAgent, AgentTool, AIModelInvocation) and relationships
+ for tracking non-deterministic agent interactions. The provenance
+ schema in this TEP is informed by PROV-AGENT's entity model.
+- [CoSAI Principles][cosai] recommend adapting SLSA for agent and
+ model artifact provenance, with continuous runtime validation.
+- [OWASP Top 10 for Agentic Applications][owasp-agentic] identifies
+ agent behavior hijacking (ASI01), prompt injection (ASI02), and
+ tool misuse (ASI03) as top risks. The provenance trace supports
+ post-hoc investigation of all three.
+
+### Notes and Caveats
+
+- **kagent ADK image compatibility**: The controller creates kagent
+ Agent CRs that reference a specific ADK image tag. Breaking changes
+ in kagent's Agent CR spec would require controller updates. This is
+ mitigated by pinning to tested kagent versions in CI.
+- **Cross-namespace secret access**: ModelConfig in `kagent-system`
+ references API key secrets in `kagent-system`. The Agent Deployment
+ runs in the user's namespace. kagent's controller handles secret
+ mounting in the Agent Deployment. The AgentRun controller does not
+ need to manage cross-namespace secrets directly.
+- **AgentConfig mutability during execution**: If AgentConfig is
+ updated while an AgentRun is in the Acting phase, the running agent
+ is not affected because the Agent CR was created with a snapshot of
+ the configuration at reconcile time. Subsequent AgentRuns will use
+ the updated AgentConfig.
+
+## Design Details
+
+### Execution Flow: Standalone AgentRun
+
+```
+AgentRun.Phase: Pending
+ ├── Validate AgentConfig exists
+ ├── Snapshot AgentConfig spec (immutable for this run)
+ ├── Resolve kagent ModelConfig via dynamic client GET
+ ├── Resolve kagent RemoteMCPServer(s) via dynamic client GET
+ ├── Validate: model ready, tools discovered, OPA policy exists
+ ├── Create ServiceAccount, Role, RoleBinding (owner-referenced)
+ └── Create NetworkPolicy if strict (owner-referenced)
+
+AgentRun.Phase: PreHooks (if configured)
+ ├── Create Tekton PipelineRun (owner-referenced)
+ └── Watch PipelineRun completion
+
+AgentRun.Phase: Acting
+ ├── Create kagent Agent CR (owner-referenced to AgentRun):
+ │ spec.type: Declarative
+ │ spec.declarative.modelConfig:
+ │ spec.declarative.systemMessage:
+ │ spec.declarative.tools:
+ │ spec.declarative.deployment.serviceAccountName:
+ ├── Wait for Agent Ready condition
+ ├── POST goal to Agent Service HTTP endpoint
+ └── Collect results from response
+
+AgentRun.Phase: PostHooks (if configured)
+ ├── Create Tekton PipelineRun with results as params
+ └── Watch PipelineRun completion
+
+AgentRun.Phase: Succeeded / Failed
+ ├── Update status with results and provenance
+ ├── Emit Kubernetes events
+ └── Per-run resources cleaned up via owner references
+```
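+
+The phase ordering above amounts to a small state machine in which the
+hook phases are skipped when they are not configured. The following
+is a minimal sketch. The `nextPhase` helper is hypothetical, and a
+real reconciler would persist the phase in `AgentRun.status` between
+reconciles.
+
+```go
+package main
+
+import "fmt"
+
+type Phase string
+
+const (
+	Pending   Phase = "Pending"
+	PreHooks  Phase = "PreHooks"
+	Acting    Phase = "Acting"
+	PostHooks Phase = "PostHooks"
+	Succeeded Phase = "Succeeded"
+	Failed    Phase = "Failed"
+)
+
+// nextPhase returns the phase that follows current, given whether the work
+// in current succeeded. Terminal phases never change.
+func nextPhase(current Phase, ok, hasPre, hasPost bool) Phase {
+	if current == Succeeded || current == Failed {
+		return current
+	}
+	if !ok {
+		return Failed
+	}
+	switch current {
+	case Pending:
+		if hasPre {
+			return PreHooks
+		}
+		return Acting
+	case PreHooks:
+		return Acting
+	case Acting:
+		if hasPost {
+			return PostHooks
+		}
+		return Succeeded
+	case PostHooks:
+		return Succeeded
+	default:
+		return Failed // unknown phase
+	}
+}
+
+func main() {
+	// No pre-hooks, post-hooks configured.
+	for p := Pending; p != Succeeded && p != Failed; {
+		next := nextPhase(p, true, false, true)
+		fmt.Printf("%s -> %s\n", p, next)
+		p = next
+	}
+}
+```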
+
+### Execution Flow: PipelineRun with Agent Steps
+
+```
+PipelineRun starts with agent steps (CustomTask references)
+
+First AgentRun with configRef "test-agent":
+ ├── Create Agent CR "test-agent-"
+ │ labels:
+ │ agent.tekton.dev/pipelinerun:
+ │ agent.tekton.dev/config: test-agent
+ │ ownerReferences: [{kind: PipelineRun}]
+ ├── Create RBAC resources (owner-referenced to PipelineRun)
+ ├── Wait for Agent Ready
+ ├── POST goal, collect results
+ └── AgentRun marked Succeeded
+
+Second AgentRun with configRef "test-agent" (same PipelineRun):
+ ├── Find existing Agent CR by labels:
+ │ agent.tekton.dev/pipelinerun:
+ │ agent.tekton.dev/config: test-agent
+ ├── Agent already Ready
+ ├── POST goal (agent has context from first call)
+ ├── Collect results
+ └── AgentRun marked Succeeded
+
+PipelineRun completes:
+ └── Agent CR garbage collected via owner reference to PipelineRun
+```
+
+### Agent CR Scoping and Reuse
+
+The following diagram shows how the controller decides whether to
+create a new Agent CR or reuse an existing one:
+
+```mermaid
+flowchart TD
+ Start["AgentRun reconciled"]
+ InPipeline{"Part of a PipelineRun?"}
+
+ Standalone["Standalone mode"]
+ CreateNew["Create new Agent CR (owner-ref: AgentRun)"]
+
+ Pipeline["PipelineRun mode"]
+ Search["Search for Agent CR with labels pipelinerun=UID, config=name"]
+ Found{"Agent CR exists?"}
+ Reuse["Reuse existing Agent CR and POST goal to running agent"]
+ CreatePR["Create new Agent CR (owner-ref: PipelineRun)"]
+
+ Start --> InPipeline
+ InPipeline -->|no| Standalone --> CreateNew
+ InPipeline -->|yes| Pipeline --> Search --> Found
+ Found -->|yes| Reuse
+ Found -->|no| CreatePR
+```
+
+The controller uses labels to track Agent CR ownership:
+
+| Label | Value | Purpose |
+|-------|-------|---------|
+| `agent.tekton.dev/config` | AgentConfig name | Identifies which config this agent uses |
+| `agent.tekton.dev/agentrun` | AgentRun name | Set for standalone AgentRuns |
+| `agent.tekton.dev/pipelinerun` | PipelineRun UID | Set for PipelineRun-scoped agents |
+
+Reuse logic:
+- Standalone: always create a new Agent CR
+- In PipelineRun: list Agent CRs with matching `pipelinerun` and
+ `config` labels. If found, reuse. If not, create.
+
+Owner references:
+- Standalone: Agent CR owner-referenced to AgentRun
+- In PipelineRun: Agent CR owner-referenced to PipelineRun (so it outlives
+ individual AgentRuns but is cleaned up when the PipelineRun ends)
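+
+The labels and reuse rule above can be sketched as follows. In the
+controller, the filtering would be a server-side label-selector `List`
+through the dynamic client. Here, an in-memory slice stands in for
+that call. The `agentCR` type, helper names, and Agent CR names are
+hypothetical.
+
+```go
+package main
+
+import "fmt"
+
+// Label keys from the table above.
+const (
+	labelConfig      = "agent.tekton.dev/config"
+	labelAgentRun    = "agent.tekton.dev/agentrun"
+	labelPipelineRun = "agent.tekton.dev/pipelinerun"
+)
+
+// agentCR is a minimal stand-in for an existing kagent Agent CR.
+type agentCR struct {
+	Name   string
+	Labels map[string]string
+}
+
+// agentLabels returns the labels set on a new Agent CR.
+// pipelineRunUID is empty for standalone AgentRuns.
+func agentLabels(config, agentRun, pipelineRunUID string) map[string]string {
+	l := map[string]string{labelConfig: config}
+	if pipelineRunUID == "" {
+		l[labelAgentRun] = agentRun
+	} else {
+		l[labelPipelineRun] = pipelineRunUID
+	}
+	return l
+}
+
+// findReusable: standalone runs never reuse; runs in a PipelineRun reuse an
+// Agent CR whose pipelinerun and config labels both match.
+func findReusable(existing []agentCR, config, pipelineRunUID string) (agentCR, bool) {
+	if pipelineRunUID == "" {
+		return agentCR{}, false
+	}
+	for _, a := range existing {
+		if a.Labels[labelPipelineRun] == pipelineRunUID && a.Labels[labelConfig] == config {
+			return a, true
+		}
+	}
+	return agentCR{}, false
+}
+
+func main() {
+	existing := []agentCR{{Name: "test-agent-abc12", Labels: agentLabels("test-agent", "step-1", "uid-1")}}
+	a, ok := findReusable(existing, "test-agent", "uid-1")
+	fmt.Println(a.Name, ok) // test-agent-abc12 true
+	_, ok = findReusable(existing, "review-agent", "uid-1")
+	fmt.Println(ok) // false
+}
+```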
+
+### kagent Resource Resolution
+
+The controller reads and creates kagent custom resources through
+`k8s.io/client-go/dynamic` rather than importing kagent's Go types.
+
+At reconcile time, the controller:
+
+1. GETs the referenced `kagent.dev/v1alpha2 ModelConfig` to validate
+ the model exists and is ready
+2. GETs each referenced `kagent.dev/v1alpha2 RemoteMCPServer` to
+ validate tools are discovered
+3. Constructs the kagent Agent CR spec with the resolved references
+
+The controller does not build `config.json` itself. kagent's own
+controller handles the translation from Agent CR to ADK configuration.
+
+If kagent CRDs are not installed in the cluster, the controller sets a
+`KagentNotInstalled` condition on the AgentRun with a clear message.
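+
+The readiness checks in steps 1 and 2 operate on unstructured objects.
+The following dependency-free sketch walks `status.conditions` on the
+object map that a dynamic client returns. A condition of type `Ready`
+with status `"True"` is an assumption about kagent's status schema.
+In practice, the apimachinery `unstructured.NestedSlice` helper would
+handle the traversal.
+
+```go
+package main
+
+import "fmt"
+
+// isReady inspects status.conditions on an object map such as the Object
+// field of an unstructured.Unstructured. Missing fields mean "not ready".
+func isReady(obj map[string]interface{}) bool {
+	status, _ := obj["status"].(map[string]interface{})
+	conds, _ := status["conditions"].([]interface{})
+	for _, c := range conds {
+		cond, _ := c.(map[string]interface{})
+		// Assumed condition type; adjust to kagent's actual status schema.
+		if cond["type"] == "Ready" && cond["status"] == "True" {
+			return true
+		}
+	}
+	return false
+}
+
+func main() {
+	modelConfig := map[string]interface{}{
+		"status": map[string]interface{}{
+			"conditions": []interface{}{
+				map[string]interface{}{"type": "Ready", "status": "True"},
+			},
+		},
+	}
+	fmt.Println(isReady(modelConfig))              // true
+	fmt.Println(isReady(map[string]interface{}{})) // false
+}
+```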
+
+### Security Implementation Details
+
+The five-gate security architecture is described in the
+[Security Layer](#security-layer) section of the Proposal. This
+section provides implementation-level details for Gates 1 and 2,
+which are implemented by the AgentRun controller. Gate 3
+(allowedTools/requireApproval) and Gate 4 (MCP-only access) are
+enforced by [kagent][kagent] inside the agent runtime. Gate 5
+(provenance) is detailed in
+[Provenance Recording](#provenance-recording).
+
+#### Gate 1: OPA Goal Admission
+
+```
+// Pseudocode: OPA evaluation at goal submission
+allowResult := opaEngine.Evaluate("data.agent.goals.allow", input)
+denyResults := opaEngine.Evaluate("data.agent.goals.deny", input)
+
+if !allowResult || len(denyResults) > 0 {
+ reject AgentRun with PolicyDenied condition
+}
+```
+
+Default policy when no ConfigMap is configured:
+
+```rego
+package agent.goals
+default allow = false
+```
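+
+The pseudocode and the default policy combine into a fail-closed
+admission rule. The following runnable Go sketch covers only the
+decision, and assumes the results of the two queries have already
+been obtained from an OPA engine. The `policyResult` type, the `admit`
+function, and the tool name are hypothetical.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// policyResult holds the evaluated queries. loaded is false when no policy
+// ConfigMap is configured.
+type policyResult struct {
+	loaded bool
+	allow  bool     // data.agent.goals.allow
+	deny   []string // messages from data.agent.goals.deny
+}
+
+// admit is fail-closed: no policy, allow=false, or any deny message rejects
+// the AgentRun; the reason feeds the PolicyDenied condition.
+func admit(r policyResult) (bool, string) {
+	switch {
+	case !r.loaded:
+		return false, "no OPA policy configured (fail-closed)"
+	case len(r.deny) > 0:
+		return false, strings.Join(r.deny, "; ")
+	case !r.allow:
+		return false, "goal not allowed by policy"
+	default:
+		return true, ""
+	}
+}
+
+func main() {
+	ok, reason := admit(policyResult{loaded: true, allow: true,
+		deny: []string{"write tool k8s_delete_pod must have requireApproval set"}})
+	fmt.Println(ok, reason)
+}
+```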
+
+#### Gate 2: RBAC Resources
+
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+ name: debug-api-server-sa
+ ownerReferences:
+ - apiVersion: agent.tekton.dev/v1alpha1
+ kind: AgentRun
+ name: debug-api-server
+ uid:
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+ name: debug-api-server-role
+ ownerReferences: # abbreviated, same as above
+ - apiVersion: agent.tekton.dev/v1alpha1
+ kind: AgentRun
+ name: debug-api-server
+ uid:
+rules: # from AgentConfig.spec.rbac.rules
+ - apiGroups: [""]
+ resources: [pods, services, events]
+ verbs: [get, list, watch]
+ - apiGroups: [apps]
+ resources: [deployments, replicasets]
+ verbs: [get, list, watch]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+ name: debug-api-server-binding
+ ownerReferences: # abbreviated
+ - apiVersion: agent.tekton.dev/v1alpha1
+ kind: AgentRun
+ name: debug-api-server
+ uid:
+roleRef:
+ apiGroup: rbac.authorization.k8s.io
+ kind: Role
+ name: debug-api-server-role
+subjects:
+ - kind: ServiceAccount
+ name: debug-api-server-sa
+```
+
+The kagent Agent CR is configured with
+`spec.declarative.deployment.serviceAccountName: debug-api-server-sa`.
+
+For PipelineRun-scoped agents, RBAC resources are owner-referenced to the
+PipelineRun. Since all AgentRuns sharing this agent use the same
+AgentConfig, there is no rule conflict.
+
+#### Gate 2: NetworkPolicy Resources
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+ name: debug-api-server-netpol
+ ownerReferences: # abbreviated
+ - apiVersion: agent.tekton.dev/v1alpha1
+ kind: AgentRun
+ name: debug-api-server
+ uid:
+spec:
+ podSelector:
+ matchLabels:
+ agent.tekton.dev/config: cluster-diagnostics
+ policyTypes: [Ingress, Egress]
+ ingress:
+ - from:
+ - podSelector:
+ matchLabels:
+ app.kubernetes.io/component: agentrun-controller
+ egress:
+ - to:
+ - ipBlock:
+ cidr: /32
+ ports: [{protocol: TCP, port: 443}]
+ - to:
+ - namespaceSelector:
+ matchLabels:
+ kubernetes.io/metadata.name: kube-system
+ ports: [{protocol: UDP, port: 53}]
+ # MCP tool server endpoints added dynamically
+```
+
+API server egress uses `ipBlock` with the resolved cluster IP, not
+`namespaceSelector`, to avoid allowing the agent to reach arbitrary
+HTTPS endpoints.
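+
+Building the single-host entry from the resolved address can be
+sketched as follows. The sketch assumes the controller reads the
+address from `KUBERNETES_SERVICE_HOST`, which Kubernetes sets in every
+pod, and that IPv6 clusters use `/128` instead of `/32`.
+`apiServerCIDR` is a hypothetical helper.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net"
+	"os"
+)
+
+// apiServerCIDR turns the resolved API server IP into a single-host CIDR for
+// the ipBlock rule: /32 for IPv4, /128 for IPv6.
+func apiServerCIDR(ip string) (string, error) {
+	parsed := net.ParseIP(ip)
+	if parsed == nil {
+		return "", fmt.Errorf("not an IP address: %q", ip)
+	}
+	if v4 := parsed.To4(); v4 != nil {
+		return v4.String() + "/32", nil
+	}
+	return parsed.String() + "/128", nil
+}
+
+func main() {
+	// Inside a pod, KUBERNETES_SERVICE_HOST holds the API server's service IP.
+	host := os.Getenv("KUBERNETES_SERVICE_HOST")
+	if host == "" {
+		host = "10.96.0.1" // illustrative value for running outside a cluster
+	}
+	cidr, err := apiServerCIDR(host)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(cidr)
+}
+```
+
+Whether an `ipBlock` on a Service ClusterIP matches may depend on the
+network plugin, since Kubernetes leaves undefined whether NetworkPolicy
+is evaluated before or after Service address rewriting. The endpoint
+IPs of the `kubernetes` Service may be needed instead.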
+
+### CustomTask Adapter
+
+When a Pipeline task references an AgentRun via `taskRef`, Tekton's
+PipelineRun controller creates a `CustomRun` object (not an AgentRun
+directly). The AgentRun controller watches `CustomRun` objects where
+`spec.customRef.apiVersion` is `agent.tekton.dev/v1alpha1` and
+`spec.customRef.kind` is `AgentRun`.
+
+```yaml
+# Pipeline author writes:
+taskRef:
+ apiVersion: agent.tekton.dev/v1alpha1
+ kind: AgentRun
+
+# Tekton creates a CustomRun. AgentRun controller reconciles it.
+```
+
+The controller reconciles the `CustomRun` directly (it does not
+create a separate `AgentRun` CR). The `CustomRun.spec.params` are
+mapped to AgentRun spec fields:
+
+| CustomRun param | Maps to |
+|-----------------|---------|
+| `configRef` | AgentConfig name |
+| `goal` | Goal text |
+| `hints` | Context hints |
+
+The controller discovers the owning PipelineRun by inspecting
+`CustomRun.metadata.ownerReferences` for a reference with
+`kind: PipelineRun`. This PipelineRun UID is used for Agent CR
+scoping and reuse.
+
+Results are written to `CustomRun.status.results` so downstream
+Pipeline tasks can reference them via `$(tasks..results.)`.
+Because these are standard Tekton results, downstream `when`
+expressions can use them without additional support.
+
+Timeout and cancellation are handled by watching the `CustomRun`
+spec: when Tekton sets `spec.status` to `RunCancelled`, or when the
+duration in `spec.timeout` elapses, the controller stops the agent
+execution and cleans up.
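+
+The adapter's three responsibilities (param mapping, owner discovery,
+and cancellation) can be sketched with simplified stand-ins for the
+CustomRun fields. These types are not Tekton's Go API. The sketch
+assumes `RunCancelled` as the `spec.status` value that Tekton's
+v1beta1 CustomRun API uses on cancellation. `hints` is modeled as an
+array param to match the list form used in AgentRun.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// Simplified, hypothetical stand-ins for the CustomRun fields the adapter reads.
+type param struct {
+	Name  string
+	Value string   // string params
+	Array []string // array params
+}
+
+type ownerRef struct {
+	Kind string
+	UID  string
+}
+
+// agentSpec holds the AgentRun fields mapped from CustomRun params.
+type agentSpec struct {
+	ConfigRef string
+	Goal      string
+	Hints     []string
+}
+
+// specFromParams applies the mapping table; configRef and goal are required.
+func specFromParams(params []param) (agentSpec, error) {
+	var s agentSpec
+	for _, p := range params {
+		switch p.Name {
+		case "configRef":
+			s.ConfigRef = p.Value
+		case "goal":
+			s.Goal = p.Value
+		case "hints":
+			s.Hints = p.Array
+		}
+	}
+	if s.ConfigRef == "" || s.Goal == "" {
+		return s, errors.New("configRef and goal params are required")
+	}
+	return s, nil
+}
+
+// owningPipelineRun returns the PipelineRun UID used for Agent CR scoping.
+func owningPipelineRun(refs []ownerRef) (string, bool) {
+	for _, r := range refs {
+		if r.Kind == "PipelineRun" {
+			return r.UID, true
+		}
+	}
+	return "", false
+}
+
+// cancelled reports whether Tekton has asked the run to stop.
+func cancelled(specStatus string) bool {
+	return specStatus == "RunCancelled"
+}
+
+func main() {
+	s, err := specFromParams([]param{{Name: "configRef", Value: "test-agent"}, {Name: "goal", Value: "Review the diff"}})
+	uid, inPipeline := owningPipelineRun([]ownerRef{{Kind: "PipelineRun", UID: "uid-1"}})
+	fmt.Println(s.ConfigRef, err, uid, inPipeline, cancelled("RunCancelled"))
+}
+```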
+
+### Provenance Recording
+
+The provenance struct captures the full execution trace of an
+agent run, mapped to SLSA predicate fields:
+
+```go
+type AgentRunProvenance struct {
+ BuildType string `json:"buildType"`
+ Reproducible bool `json:"reproducible"`
+ ReproducibilityNote string `json:"reproducibilityNote,omitempty"`
+ ExternalParameters ExternalParams `json:"externalParameters"`
+ InternalParameters InternalParams `json:"internalParameters"`
+ ResolvedDependencies []ResolvedDependency `json:"resolvedDependencies"`
+ ExecutionTrace ExecutionTrace `json:"executionTrace"`
+ TokenUsage TokenUsage `json:"tokenUsage"`
+ PolicyDecisions PolicyDecisions `json:"policyDecisions"`
+ Builder BuilderIdentity `json:"builder"`
+ ResultHash string `json:"resultHash"`
+}
+
+type ExternalParams struct {
+ Goal string `json:"goal"`
+ GoalHash string `json:"goalHash"`
+ Hints []string `json:"hints,omitempty"`
+ AgentConfigRef string `json:"agentConfigRef"`
+ AgentConfigHash string `json:"agentConfigHash"`
+}
+
+type InternalParams struct {
+ Model ModelIdentity `json:"model"`
+ SystemPromptHash string `json:"systemPromptHash"`
+ MaxIterations int `json:"maxIterations"`
+ Timeout string `json:"timeout"`
+ TokenBudget int `json:"tokenBudget,omitempty"`
+ OPAPolicyHash string `json:"opaPolicyHash,omitempty"`
+}
+
+type ModelIdentity struct {
+ Provider string `json:"provider"`
+ ModelID string `json:"modelId"`
+ APIVersion string `json:"apiVersion,omitempty"`
+ Temperature float64 `json:"temperature,omitempty"`
+ MaxTokens int `json:"maxTokens,omitempty"`
+}
+
+type ResolvedDependency struct {
+ Name string `json:"name"`
+ URI string `json:"uri"`
+ Digest string `json:"digest,omitempty"`
+ ToolsDiscovered []string `json:"toolsDiscovered,omitempty"`
+}
+
+type ExecutionTrace struct {
+ Iterations int `json:"iterations"`
+ ToolCalls []ToolCallRecord `json:"toolCalls"`
+ LLMInvocations []LLMInvocation `json:"llmInvocations"`
+}
+
+type ToolCallRecord struct {
+ Sequence int `json:"sequence"`
+ Iteration int `json:"iteration"`
+ Tool string `json:"tool"`
+ InputHash string `json:"inputHash"`
+ OutputHash string `json:"outputHash"`
+ Timestamp string `json:"timestamp"`
+ DurationMs int `json:"durationMs"`
+ PolicyVerdict string `json:"policyVerdict"`
+}
+
+type LLMInvocation struct {
+ Sequence int `json:"sequence"`
+ Iteration int `json:"iteration"`
+ RequestHash string `json:"requestHash"`
+ ResponseHash string `json:"responseHash"`
+ PromptTokens int `json:"promptTokens"`
+ CompletionTokens int `json:"completionTokens"`
+}
+
+type TokenUsage struct {
+ TotalPromptTokens int `json:"totalPromptTokens"`
+ TotalCompletionTokens int `json:"totalCompletionTokens"`
+ TotalTokens int `json:"totalTokens"`
+}
+
+type PolicyDecisions struct {
+ Layer1OPA OPADecisions `json:"layer1_opa"`
+ Layer2Kagent KagentDecisions `json:"layer2_kagent"`
+}
+
+type OPADecisions struct {
+ Engine string `json:"engine"`
+ Evaluated int `json:"evaluated"`
+ Allowed int `json:"allowed"`
+ Denied int `json:"denied"`
+}
+
+type KagentDecisions struct {
+ AllowedToolsEnforced bool `json:"allowedToolsEnforced"`
+ ToolsBlocked int `json:"toolsBlocked"`
+ ApprovalsPaused int `json:"approvalsPaused"`
+}
+
+type BuilderIdentity struct {
+ ControllerVersion string `json:"controllerVersion"`
+ KagentVersion string `json:"kagentVersion"`
+ ADKImageDigest string `json:"adkImageDigest"`
+}
+```
+
+The `ExecutionTrace` and `TokenUsage` fields depend on telemetry
+from the kagent ADK runtime (see [Telemetry Data Flow](#telemetry-data-flow)
+in the Proposal section). Until kagent exposes this telemetry in its
+HTTP response, the controller populates what it can observe directly:
+`ExternalParameters`, `InternalParameters`, `ResolvedDependencies`,
+`PolicyDecisions` (Gate 1), and `Builder`.
+
+### AgentConfig Snapshot
+
+When an AgentRun is reconciled, the controller snapshots the
+AgentConfig spec into the AgentRun status. This ensures that if the
+AgentConfig is modified during execution, the running agent is not
+affected and the provenance record reflects the actual configuration
+used.
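+
+The following is a minimal sketch of the snapshot and of the
+`agentConfigHash` recorded in provenance. It assumes the snapshot is
+the JSON serialization of the spec. The trimmed `agentConfigSpec` type
+and `snapshot` helper are hypothetical, and the controller would store
+the snapshot in `AgentRun.status` rather than as bytes. Because the
+bytes are captured at reconcile time, a later edit to the live object
+does not reach them.
+
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"encoding/json"
+	"fmt"
+)
+
+// agentConfigSpec is a pared-down, hypothetical AgentConfig spec.
+type agentConfigSpec struct {
+	SystemPrompt  string   `json:"systemPrompt"`
+	MaxIterations int      `json:"maxIterations"`
+	AllowedTools  []string `json:"allowedTools"`
+}
+
+// snapshot serializes the spec and returns the digest used as
+// agentConfigHash. encoding/json emits struct fields in declaration order,
+// so the same spec always yields the same bytes.
+func snapshot(spec agentConfigSpec) ([]byte, string, error) {
+	raw, err := json.Marshal(spec)
+	if err != nil {
+		return nil, "", err
+	}
+	sum := sha256.Sum256(raw)
+	return raw, "sha256:" + hex.EncodeToString(sum[:]), nil
+}
+
+func main() {
+	live := agentConfigSpec{SystemPrompt: "Diagnose failures.", MaxIterations: 5,
+		AllowedTools: []string{"k8s_get_logs"}}
+	raw, hash, err := snapshot(live)
+	if err != nil {
+		panic(err)
+	}
+	live.AllowedTools[0] = "k8s_delete_pod" // edited after the snapshot
+	var restored agentConfigSpec
+	if err := json.Unmarshal(raw, &restored); err != nil {
+		panic(err)
+	}
+	fmt.Println(restored.AllowedTools[0], hash) // snapshot still has k8s_get_logs
+}
+```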
+
+## Design Evaluation
+
+### Reusability
+
+This proposal follows Tekton's [design principles][design-principles]
+by reusing two existing projects:
+- **[kagent][kagent]** provides the [Agent CR][kagent-agents], ADK
+ runtime, [ModelConfig][kagent-models], and
+ [RemoteMCPServer][kagent-tools] CRDs
+- **Tekton Pipelines** provides Pipeline orchestration and the
+ [CustomTask][customtask] protocol
+
+The [AgentRun PoC][agentrun-poc] demonstrates the concept. The
+controller adds agent lifecycle management, security, and provenance.
+
+### Simplicity
+
+Users interact with two CRDs (`AgentRun` and `AgentConfig`). In a
+Pipeline, agent steps look like any other CustomTask reference. The
+agent lifecycle (create, reuse, cleanup) is managed by the controller.
+
+### Flexibility
+
+- kagent ModelConfig supports 8 LLM providers
+- MCP tool servers are pluggable
+- RBAC rules are configurable per AgentConfig
+- [OPA][opa] policies are user-defined via ConfigMaps
+- Tekton Pipelines integration is optional
+- Multiple agents with different configs can coexist in a PipelineRun
+
+### Conformance
+
+This proposal does not modify any existing Tekton APIs. New CRDs are
+under the `agent.tekton.dev` API group. The CustomTask adapter follows
+the existing CustomTask protocol. No changes to `Task`, `Pipeline`,
+`TaskRun`, or `PipelineRun` resources.
+
+This proposal introduces kagent and OPA as additional concepts users
+must understand. kagent CRDs are managed by cluster administrators.
+OPA policies are managed by security teams. Pipeline authors only
+interact with AgentRun and AgentConfig.
+
+### User Experience
+
+- **Cluster administrators** install kagent and configure ModelConfigs
+ and RemoteMCPServers
+- **Platform engineers** create AgentConfigs with RBAC rules and OPA
+ policies
+- **Pipeline authors** reference AgentRuns in Pipeline specs via
+ CustomTask
+- **Security teams** define OPA policies and review agent provenance
+
+### Performance
+
+- **Agent startup**: kagent Agent Deployments require pod scheduling.
+ Typical startup is 5-15 seconds with pre-pulled images. For
+ PipelineRun-scoped agents, this cost is paid once and amortized
+ across all agent steps.
+- **Controller footprint**: The controller creates Agent CRs and sends
+ HTTP requests. No LLM processing occurs in the controller.
+- **Cleanup**: Per-run and PipelineRun-scoped resources are
+ owner-referenced, so Kubernetes garbage collection removes them
+ together with their owner.
+
+### Risks and Mitigations
+
+| Risk | Mitigation |
+|------|------------|
+| kagent Agent CR spec changes | Dynamic client is resilient to field additions. Pin to kagent v1alpha2. Test against kagent releases in CI. |
+| kagent CRDs not installed | Controller checks at startup. Clear condition on AgentRun. |
+| Tekton Pipelines not installed | Hooks are optional. CustomTask adapter degrades gracefully. Hooks configuration rejected at validation when Tekton is absent. |
+| Agent Deployment startup latency | Pre-pulled images. PipelineRun-scoped agents amortize startup across steps. |
+| OPA policy misconfiguration | Default fail-closed. Deny all when no policy is configured. |
+| Per-tool-call OPA not available in Phase 1 | Five-gate model: OPA at goal level (Gate 1) + kagent allowedTools/requireApproval (Gate 3) provide meaningful security. Per-tool-call OPA is a future kagent contribution. |
+| LLM-controlled tool inputs spoofing OPA | Gate 1 OPA input is constructed by the controller, not the LLM. Gate 3 tool restrictions are set on the Agent CR, not controllable by the LLM. |
+| Token budget not enforced by ADK | Fallback to timeout. Acknowledge as best-effort. |
+| PipelineRun cancelled while agent is running | Owner reference on Agent CR triggers garbage collection. Agent Deployment receives SIGTERM. |
+| AgentConfig modified during execution | Config snapshotted at reconcile time. Running agent not affected. |
+
+### Drawbacks
+
+- **Dependency on kagent**: The proposal uses kagent's Agent CR and
+ CRDs. If kagent changes direction, the controller would need
+ adaptation. The dynamic client approach minimizes coupling.
+- **Two systems to install**: Users must install kagent and the
+ AgentRun controller. Mitigated by Helm charts that bundle both.
+- **Deployment overhead for standalone AgentRuns**: A single-goal
+ AgentRun creates a full Deployment + Service for one HTTP call. This
+ is the cost of using kagent's existing model. If kagent adds a
+ Job/batch mode ([kagent#1089][kagent-1089]), standalone AgentRuns
+ could switch to the lighter-weight model.
+
+## Alternatives
+
+### Build Agent Stack Inside Tekton
+
+Add an `agent` step type to `Task.spec.steps`, build MCPServerRef CRD,
+build agent runtime sidecar, model configuration, tool discovery.
+
+Rejected: Massive scope that duplicates kagent. Would require the
+Tekton community to build and maintain an agent runtime.
+
+### Pod-Per-Run Without Agent CR
+
+Create a Pod directly using kagent's ADK image for each AgentRun,
+bypassing the Agent CR entirely.
+
+Rejected: Requires building a config.json translator that replicates
+kagent's controller logic (model credential injection, MCP server
+resolution, TLS configuration). Also requires kagent to support a
+batch/one-shot mode ([kagent#1089][kagent-1089]) that does not exist
+today. Using the Agent CR avoids both issues.
+
+### Pipeline spec.agents Field
+
+Add a new `spec.agents` field to Tekton's Pipeline CRD, analogous to
+`spec.workspaces`, for declaring agent environments.
+
+Rejected: Requires changes to Tekton's core Pipeline CRD, which is a
+much larger scope and would need its own TEP. The CustomTask approach
+achieves the same result without modifying existing APIs.
+
+### Volume Mounts for Codebase Access
+
+Mount workspace PVCs directly into agent Deployments so agents can
+read codebases via the filesystem.
+
+Rejected: Volume mounts give the agent raw filesystem access outside
+the tool call protocol. OPA cannot restrict which files the agent
+reads. No audit trail for file access. MCP tools route all data access
+through the tool protocol, enabling uniform policy enforcement and
+provenance recording.
+
+### Convention-Based Container Wrapping
+
+Continue wrapping agents in container steps with ad-hoc Python scripts.
+
+Rejected: This is the status quo, and it is opaque, insecure, and unauditable.
+
+## Implementation Plan
+
+### Milestones
+
+**Phase 1: Core**
+- AgentRun and AgentConfig CRDs with `rbac.rules`, `tokenBudget`,
+ `policy.opa.defaultDeny` fields
+- kagent Agent CR creation with dynamic client (standalone lifecycle)
+- Per-run RBAC generation (ServiceAccount + Role + RoleBinding)
+- OPA goal admission (both `allow` and `deny`, namespaced
+ inputs, fail-closed default)
+- kagent ModelConfig and RemoteMCPServer validation via dynamic client
+- CustomTask adapter for Tekton Pipeline integration
+- PipelineRun-scoped Agent CR reuse (same configRef = same agent)
+- Provenance recording in AgentRun status
+- AgentConfig snapshot at reconcile time
+
+**Phase 2: Hardening**
+- Per-run NetworkPolicy generation (API server ipBlock, controller
+ ingress, MCP server egress)
+- Tekton PipelineRun-based pre/post hooks
+- Token budget enforcement (ADK config + timeout fallback)
+- Tekton Chains extension for agent provenance attestation
+
+**Phase 3: Advanced**
+- Per-tool-call OPA enforcement via kagent ADK hook (upstream
+ contribution to kagent)
+- Pipeline-level agent cost aggregation
+- Agent memory integration (kagent Memory CRD)
+- Prompt auditability alerting (hash comparison between runs)
+- Standalone AgentRun optimization via kagent batch mode
+ ([kagent#1089][kagent-1089]) when available
+
+### Test Plan
+
+- **Unit tests**: AgentConfig validation (rbac.rules required, OPA
+ configMapRef format), Agent CR construction (correct labels, owner
+ references, serviceAccountName), RBAC generation (rules from config,
+ not hardcoded), OPA input namespacing (verify key injection is
+ impossible), NetworkPolicy construction (ipBlock for API server, MCP
+ server egress), AgentConfig snapshot immutability
+- **Integration tests**: End-to-end AgentRun lifecycle with mock kagent
+ CRDs (fake dynamic client), Agent CR reuse with same configRef in
+ mock PipelineRun, CustomTask adapter with mock Pipeline controller
+- **E2E tests**: Full execution in Kind cluster with kagent installed.
+ Create ModelConfig, RemoteMCPServer, AgentConfig, AgentRun. Validate:
+ Agent CR created with correct SA, RBAC matches config rules, OPA
+ denies disallowed tools, results collected, provenance recorded.
+ Multi-step PipelineRun with shared agent context.
+- **Security tests**: OPA input injection (verify `input.params.tool`
+ cannot overwrite `input.tool`), RBAC isolation (agent cannot access
+ resources outside declared rules), fail-closed default (no policy =
+ all denied)
+- **Negative tests**: Non-existent AgentConfig reference, empty API
+ key secret, kagent CRDs not installed, PipelineRun cancellation
+ during agent execution, malformed Rego in OPA ConfigMap
+
+### Infrastructure Needed
+
+- Repository: `tektoncd/agentrun` (or initially
+ `waveywaves/tekton-agentrun`)
+- CI pipeline: Kind cluster with kagent + Tekton Pipelines installed
+- Helm chart for bundled installation
+
+### Upgrade and Migration Strategy
+
+This is a new feature with no existing behavior to migrate from. The
+`AgentRun` and `AgentConfig` CRDs are introduced at `v1alpha1`, and
+breaking changes are expected while they remain in alpha.
+
+### Implementation Pull Requests
+
+To be populated when implementation begins.
+
+## References
+
+- [kagent][kagent]
+- [kagent Agent CRD documentation][kagent-agents]
+- [kagent ModelConfig documentation][kagent-models]
+- [kagent RemoteMCPServer documentation][kagent-tools]
+- [kagent batch/Job mode request][kagent-1089]
+- [Model Context Protocol specification][mcp]
+- [AgentRun PoC][agentrun-poc]
+- [Pipeline comparison (with/without agents)][pipeline-comparison]
+- [Tekton Chains][chains]
+- [Tekton CustomTask specification][customtask]
+- [SLSA Provenance Framework][slsa]
+- [Open Policy Agent][opa]
+- [Tekton Design Principles][design-principles]
+- [PROV-AGENT: Unified Provenance for AI Agent Interactions][prov-agent]
+- [LLM Agents for Interactive Workflow Provenance][workflow-provenance]
+- [CoSAI Principles for Secure Agentic Systems][cosai]
+- [OWASP Top 10 for Agentic Applications][owasp-agentic]
+
+[kagent]: https://kagent.dev
+[kagent-agents]: https://kagent.dev/docs/kagent/concepts/agents
+[kagent-models]: https://kagent.dev/docs/kagent/concepts/model-providers
+[kagent-tools]: https://kagent.dev/docs/kagent/concepts/tool-servers
+[kagent-1089]: https://github.com/kagent-dev/kagent/issues/1089
+[mcp]: https://modelcontextprotocol.io/
+[agentrun-poc]: https://github.com/waveywaves/tekton-agentrun
+[pipeline-without-agents]: https://github.com/waveywaves/tektoncd-pipeline-mgmt/blob/main/demos/jira-test-lifecycle/pipeline-without-agents.yaml
+[pipeline-with-agents]: https://github.com/waveywaves/tektoncd-pipeline-mgmt/blob/main/demos/jira-test-lifecycle/pipeline-with-agents.yaml
+[pipeline-comparison]: https://github.com/waveywaves/tektoncd-pipeline-mgmt/tree/main/demos/jira-test-lifecycle
+[chains]: https://github.com/tektoncd/chains
+[customtask]: https://tekton.dev/docs/pipelines/runs/
+[slsa]: https://slsa.dev/
+[opa]: https://www.openpolicyagent.org/
+[design-principles]: https://github.com/tektoncd/community/blob/main/design-principles.md
+[prov-agent]: https://arxiv.org/abs/2508.02866
+[workflow-provenance]: https://arxiv.org/abs/2509.13978
+[cosai]: https://www.coalitionforsecureai.org/announcing-the-cosai-principles-for-secure-by-design-agentic-systems/
+[owasp-agentic]: https://www.practical-devsecops.com/owasp-top-10-agentic-applications/
diff --git a/teps/README.md b/teps/README.md
index 03e237a50..877646eb6 100644
--- a/teps/README.md
+++ b/teps/README.md
@@ -150,3 +150,4 @@ This is the complete list of Tekton TEPs:
|[TEP-0161](0161-resolver-caching.md) | Resolver Caching for Task and Pipeline Resolution | proposed | 2024-06-15 |
|[TEP-0162](0162-event-based-pruning-of-tekton-resources.md) | event based pruning of tekton resources | proposed | 2025-06-18 |
|[TEP-0163](0163-profilebased-dynamic-compute-resources-for-steps.md) | Profile-Based Dynamic Compute Resources for Steps | proposed | 2025-09-01 |
+|[TEP-0164](0164-agent-native-workflows.md) | Agent-Native Workflows | proposed | 2026-03-20 |