Skip to content

Add Knowledge Loop runtime scaffold#302

Merged
Svaag merged 6 commits into
mainfrom
feature/knowledge-loop-runtime
Jun 27, 2026
Merged

Add Knowledge Loop runtime scaffold#302
Svaag merged 6 commits into
mainfrom
feature/knowledge-loop-runtime

Conversation

@Svaag

@Svaag Svaag commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add disabled-by-default knowledge_loop role for the Knowledge Loop producer agent
  • add systemd service/timer, GitHub App token helper, git askpass, and wrapper
  • add dedicated Vault Agent templates and Vault policy for kv/knowledge-loop
  • let the engineering-loop playbook wire a separate vault-agent-knowledge-loop.service
  • extend the apply workflow to include knowledge_loop_apply=true and mint the Knowledge Loop AppRole bootstrap
  • add a bootstrap runbook and contract tests

Safety

  • timer remains disabled by default
  • OpenRouter live enrichment budget defaults to 0
  • runtime scope is separate from Engineering Loop and CI/CD PR-Agent credentials
  • no fleet SSH, Docker socket, wallet, app runtime, or broad Vault credentials are included

Validation

  • python -m pytest tests/iac/test_vault_and_runner_contracts.py -q
  • scripts/ci/render-all.sh
  • scripts/ci/deploy-preflight.sh --repo-only
  • ansible-playbook playbooks/engineering-loop.yml --tags validate --connection=local --limit loop --skip-tags snapshot

Dependency

Merge/deploy after AS215932/knowledge#18 so a follow-up deploy PR can pin knowledge_loop_version to a commit that contains hyrule-knowledge loop --once.

@Svaag Svaag requested a review from a team as a code owner June 23, 2026 17:58
@github-actions

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ❌

18 - Not compliant

Non-compliant requirements:

  • None of the ticket requirements are addressed by this PR.
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 80
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Unpinned Git checkout

knowledge_loop_version defaults to "main" instead of a pinned commit SHA. This means the first deploy (or any deploy before a follow-up PR pins the version) will check out the latest commit on main of the knowledge repo, bypassing the intended safety gate. Per the deploy runbook, all production checkouts should use an SHA-pinned ref to ensure reproducibility and prevent untested changes from reaching the loop VM. The PR description acknowledges this by stating a follow-up deploy PR will pin the version, but the current default introduces a window where a deploy could pull unreviewed code.

knowledge_loop_version: main

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Safely serialize list to env var

If learning_event_paths in Vault is a YAML list (e.g., ["path1", "path2"]), the
template will inject the raw list representation like ['path1', 'path2'] as the env
variable value, which is unlikely parseable by the application. Use a Jinja filter
to join the list into a comma-separated string, or convert to a safe format.

ansible/roles/vault_agent/templates/knowledge-loop.env.ctmpl.j2 [25]

-HYRULE_KNOWLEDGE_LOOP_LEARNING_EVENTS={{ or .Data.data.learning_event_paths "" }}
+HYRULE_KNOWLEDGE_LOOP_LEARNING_EVENTS={{ (or .Data.data.learning_event_paths []) | join(",") }}
Suggestion importance[1-10]: 7

__

Why: If learning_event_paths is stored as a list in Vault, the current template would inject an unparsable raw list representation, causing runtime failure. The proposed join filter correctly handles this case, making it a meaningful improvement.

Medium
Fix portable base64url encoding

The openssl base64 -A command is not available in all OpenSSL versions (it was added
in OpenSSL 1.1.0). On older systems this will fail silently or produce invalid
output. Replace with openssl enc -base64 -A or use a Python one-liner for more
reliable base64url encoding.

ansible/roles/knowledge_loop/templates/github-app-token.sh.j2 [20-22]

 b64url() {
-  openssl base64 -A | tr '+/' '-_' | tr -d '='
+  openssl enc -base64 -A | tr '+/' '-_' | tr -d '='
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that openssl base64 -A may not be available in older OpenSSL versions, but the impact is limited as the target environment likely runs modern OpenSSL. The fix enhances portability without addressing a critical bug.

Low
Security
Secure uv installation method

Downloading and executing a script directly from the internet (curl | sh) is a
security risk and bypasses package management. Consider installing uv via the system
package manager if available, or at minimum verify the script’s checksum and use
creates to avoid re-downloading on every run.

ansible/roles/knowledge_loop/tasks/apply.yml [9-13]

 - name: Install uv for Knowledge Loop
+  ansible.builtin.get_url:
+    url: https://astral.sh/uv/install.sh
+    dest: /tmp/uv-install.sh
+    checksum: sha256:EXPECTED_CHECKSUM_HERE
+    mode: '0755'
+  register: uv_install_script
+  changed_when: false
+- name: Execute uv installer
   ansible.builtin.shell: |
     set -euo pipefail
-    curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/local/bin sh
+    UV_INSTALL_DIR=/usr/local/bin sh /tmp/uv-install.sh
   args:
     creates: /usr/local/bin/uv
     executable: /bin/bash
+  when: uv_install_script.changed
Suggestion importance[1-10]: 7

__

Why: The suggestion addresses a security concern by recommending script download with checksum verification instead of piping directly into a shell. While the current approach is common, the proposed change adds a layer of trust and is a valid security enhancement.

Medium

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52b648ddc3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ansible/roles/knowledge_loop/defaults/main.yml
Comment thread docs/runbooks/bootstrap-knowledge-loop-vault.md
Address the two Codex findings on the Knowledge Loop runtime scaffold:

- Pin the loop checkout. apply.yml always passes knowledge_loop_apply=true for
  the engineering-loop playbook, so without a host-var override the first apply
  would deploy whatever AS215932/knowledge `main` points at — non-reproducible and
  able to pick up unreviewed changes. Pin knowledge_loop_version on `loop` to the
  merged 0b414ae (knowledge#18, ships `hyrule-knowledge loop --once`) with the
  timer left disabled.
- Document the runner policy refresh. The bootstrap runbook only wrote the
  knowledge-loop policy, but apply.yml mints the SecretID with the runner's Vault
  token, which needs the new auth/approle/role/knowledge-loop/* paths from
  github-runner.hcl first. Add the `vault policy write github-runner` step so the
  first apply does not fail permission denied.

Add an IaC contract test asserting the pin is a 40-char commit and the runbook
documents the github-runner policy refresh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Svaag

Svaag commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b48083dfbc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ansible/roles/knowledge_loop/templates/run-loop.sh.j2 Outdated
Codex flagged that the wrapper passed --repo-path = the pinned install_dir, so a
timer run with --create-pr would branch/commit in the Ansible-managed runtime
checkout. A successful run could leave /opt/knowledge-loop on a bot branch or
dirty instead of the pinned knowledge_loop_version, and the apply git task uses
force: false so it would not reliably repair it.

Mirror the engineering_loop split:

- Add knowledge_loop_workspace_dir / knowledge_loop_repo_workspace under the state
  dir and clone the Knowledge repo there at apply time.
- Point the loop wrapper --repo-path at the workspace clone; the CLI still runs
  from the pinned install_dir venv.
- Drop install_dir from the service ReadWritePaths so the reviewed runtime stays
  read-only; only the state dir (workspace + daemon state) is writable.
- Validate the new paths and add a contract test for the separation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Svaag

Svaag commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14113828b1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ansible/roles/knowledge_loop/tasks/apply.yml
The knowledge_loop role runs before the knowledge-loop vault_agent renders
/etc/knowledge-loop/knowledge-loop.env and the GitHub App key. With the timer's
Persistent=true, enabling it after the scheduled time fires the service
immediately on activation, so a canary apply that both enables the timer and
first renders secrets could trigger a run before the EnvironmentFile/key exist.

Gate the timer start on the rendered env file and private key existing: stat both
and only set state=started when present, otherwise enable the unit and let the
next apply (post-render) start it. Add a contract test for the guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Svaag

Svaag commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c37633eaf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ansible/playbooks/engineering-loop.yml
Comment thread ansible/roles/knowledge_loop/handlers/main.yml
…handler

Two ordering/handler findings from Codex:

- Per-instance Vault Agent restart handler (P1). engineering-loop.yml includes the
  vault_agent role twice (engineering-loop + knowledge-loop). Ansible handlers are
  a single play-level namespace, so the shared `restart vault agent` handler name
  was deduped and a notify could restart the wrong instance, leaving the other on
  stale config/secrets. Template the handler name and its notifies with
  vault_agent_name so each instance gets its own handler.
- Keep the secrets gate when restarting the timer (P2). The restart knowledge-loop
  timer handler bypassed the guard on the Set Knowledge Loop timer state task, so a
  changed checkout/template could restart the Persistent=true timer and fire the
  service before the env file/key exist. Apply the same knowledge_loop_secret_files
  existence guard in the handler.

Extend the IaC contract tests for both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Svaag

Svaag commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9da6cf1564

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ansible/roles/vault_agent/handlers/main.yml Outdated
Comment thread ansible/roles/knowledge_loop/templates/run-loop.sh.j2 Outdated
Two more Codex findings:

- Static/in-role Vault Agent restart. The previous templated handler name is still
  unreliable: Ansible resolves handler names at play load, so two vault_agent
  includes (engineering-loop + knowledge-loop) cannot each get their own handler.
  Replace the handler with an in-role restart: register the config/secret/unit
  tasks and have the existing "Enable and start" step restart when any changed, so
  vault_agent_name binds at task-execution time and the right instance restarts.
- Let Vault-rendered OpenRouter budget take effect. The wrapper always passed
  --max-openrouter-calls-per-day (Ansible default 0), which overrode the argparse
  default sourced from HYRULE_KNOWLEDGE_LOOP_MAX_OPENROUTER_CALLS_PER_DAY, so a
  higher budget in kv/knowledge-loop was ignored and live enrichment was rejected.
  Only pass the flag when that env var is unset.

Update the IaC contract tests for both.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Svaag

Svaag commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Delightful!

Reviewed commit: 5ae9a1f074

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Svaag Svaag merged commit ba95feb into main Jun 27, 2026
10 of 11 checks passed
@Svaag Svaag deleted the feature/knowledge-loop-runtime branch June 27, 2026 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant