Merged
40 changes: 20 additions & 20 deletions okf/README.md
@@ -1,4 +1,4 @@
# Enrichment Agent — an OKF proof of concept
# Reference Agent — an OKF proof of concept

### 📖 [Read the Open Knowledge Format v0.1 specification → SPEC.md](SPEC.md)

@@ -77,7 +77,7 @@ properties that are hard to get from a service-owned metadata store:
normal markdown links, expressing relationships richer than the
parent/child implied by the directory layout.

The net effect is that enrichment agents, consumption agents, and humans
The net effect is that reference agents, consumption agents, and humans
collaborate on the same artifacts in the same way they already collaborate
on source code.

@@ -97,20 +97,20 @@ python3.13 -m venv .venv
`GOOGLE_GENAI_USE_VERTEXAI=true`, `GOOGLE_CLOUD_PROJECT=<id>`, and
`GOOGLE_CLOUD_LOCATION=<region>`.

## How enrichment works

Enrichment runs in two passes. The **BQ pass** writes one OKF doc per
concept the source advertises, using BigQuery metadata alone. The **web
pass** then runs the LLM as its own crawler: it receives a list of seed
URLs (provided via `--web-seed` or `--web-seed-file`), fetches the seeds
via the `fetch_url` tool, and decides which outbound links are worth
following based on whether they look like authoritative documentation for
the existing concepts. For each page it fetches, the agent chooses to
(a) enrich one or more existing concept docs, (b) mint a standalone
`references/<slug>` doc, or (c) skip. A hard `--web-max-pages` cap and a
same-domain allowed-hosts filter (configurable via `--web-allowed-host`)
are enforced inside the tool, so the agent cannot overrun. Use `--no-web`
to skip the web pass.
## How the reference agent works

The reference agent runs in two passes. The **BQ pass** writes one OKF
doc per concept the source advertises, using BigQuery metadata alone.
The **web pass** then runs the LLM as its own crawler: it receives a
list of seed URLs (provided via `--web-seed` or `--web-seed-file`),
fetches the seeds via the `fetch_url` tool, and decides which outbound
links are worth following based on whether they look like authoritative
documentation for the existing concepts. For each page it fetches, the
agent chooses to (a) enrich one or more existing concept docs, (b) mint
a standalone `references/<slug>` doc, or (c) skip. A hard
`--web-max-pages` cap and a same-domain allowed-hosts filter
(configurable via `--web-allowed-host`) are enforced inside the tool,
so the agent cannot overrun. Use `--no-web` to skip the web pass.

## Run

@@ -119,7 +119,7 @@ directory. Seeds for the web pass are explicit; omit them (or pass
`--no-web`) to run BQ-only:

```
.venv/bin/python -m enrichment_agent enrich \
.venv/bin/python -m reference_agent enrich \
--source bq \
--dataset <project>.<dataset> \
--web-seed-file <path/to/seeds.txt> \
@@ -163,7 +163,7 @@ host it on a static file server, or commit it next to the bundle (as
this repo does).

The viewer is itself a proof-of-concept *consumer* of OKF, mirroring
the way the enrichment agent is a proof-of-concept *producer*. OKF
the way the reference agent is a proof-of-concept *producer*. OKF
bundles can be consumed by anything that reads markdown; this is just
one shape.

@@ -185,7 +185,7 @@ one shape.
### Generate

```
.venv/bin/python -m enrichment_agent visualize --bundle ./bundles/<name>
.venv/bin/python -m reference_agent visualize --bundle ./bundles/<name>
```

That writes `bundles/<name>/viz.html`. Flags:
@@ -199,7 +199,7 @@ That writes `bundles/<name>/viz.html`. Flags:
Example, writing the output somewhere else and overriding the header:

```
.venv/bin/python -m enrichment_agent visualize \
.venv/bin/python -m reference_agent visualize \
--bundle ./bundles/crypto_bitcoin \
--out /tmp/btc.html \
--name "Bitcoin OKF"
8 changes: 4 additions & 4 deletions okf/pyproject.toml
@@ -3,9 +3,9 @@ requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "enrichment-agent"
name = "reference-agent"
version = "0.1.0"
description = "Reference enrichment agent that produces Open Knowledge Format bundles"
description = "Reference agent that produces Open Knowledge Format bundles"
requires-python = ">=3.11"
dependencies = [
"google-adk>=2.0",
@@ -19,13 +19,13 @@ dependencies = [
dev = ["pytest>=7.0"]

[project.scripts]
enrichment-agent = "enrichment_agent.cli:main"
reference-agent = "reference_agent.cli:main"

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
enrichment_agent = ["prompts/*.md", "viewer/templates/*.html", "viewer/static/*"]
reference_agent = ["prompts/*.md", "viewer/templates/*.html", "viewer/static/*"]

[tool.pytest.ini_options]
testpaths = ["tests"]
4 changes: 2 additions & 2 deletions okf/samples/crypto_bitcoin/README.md
@@ -1,6 +1,6 @@
# Bitcoin public dataset sample

Runs the enrichment agent against the public
Runs the reference agent against the public
`bigquery-public-data.crypto_bitcoin` dataset (blocks, transactions,
inputs, outputs — produced by the open-source `bitcoin-etl` pipeline)
and seeds the web pass with the canonical schema source and the
@@ -35,7 +35,7 @@ how the agent surfaces cross-table foreign-key relationships in prose.
## Run

```
.venv/bin/python -m enrichment_agent enrich \
.venv/bin/python -m reference_agent enrich \
--source bq \
--dataset bigquery-public-data.crypto_bitcoin \
--web-seed-file samples/crypto_bitcoin/seeds.txt \
4 changes: 2 additions & 2 deletions okf/samples/ga4_merch_store/README.md
@@ -1,6 +1,6 @@
# GA4 Google Merchandise Store sample

Runs the enrichment agent against the public
Runs the reference agent against the public
`bigquery-public-data.ga4_obfuscated_sample_ecommerce` dataset (a GA4 export
from the Google Merchandise Store) and seeds the web pass with canonical GA4
BigQuery Export documentation URLs.
@@ -26,7 +26,7 @@ BigQuery Export documentation URLs.
## Run

```
.venv/bin/python -m enrichment_agent enrich \
.venv/bin/python -m reference_agent enrich \
--source bq \
--dataset bigquery-public-data.ga4_obfuscated_sample_ecommerce \
--web-seed-file samples/ga4_merch_store/seeds.txt \
4 changes: 2 additions & 2 deletions okf/samples/stackoverflow/README.md
@@ -1,6 +1,6 @@
# Stack Overflow public dataset sample

Runs the enrichment agent against the public
Runs the reference agent against the public
`bigquery-public-data.stackoverflow` dataset (a mirror of the Stack
Exchange Data Dump for Stack Overflow — `posts_questions`,
`posts_answers`, `users`, `votes`, `comments`, `badges`, `tags`,
@@ -34,7 +34,7 @@ updates more than one concept per fetched page.
## Run

```
.venv/bin/python -m enrichment_agent enrich \
.venv/bin/python -m reference_agent enrich \
--source bq \
--dataset bigquery-public-data.stackoverflow \
--web-seed-file samples/stackoverflow/seeds.txt \
11 changes: 0 additions & 11 deletions okf/src/enrichment_agent/bundle/__init__.py

This file was deleted.

3 changes: 0 additions & 3 deletions okf/src/enrichment_agent/sources/__init__.py

This file was deleted.

3 changes: 0 additions & 3 deletions okf/src/enrichment_agent/viewer/__init__.py

This file was deleted.

@@ -1,4 +1,4 @@
from enrichment_agent.cli import main
from reference_agent.cli import main

if __name__ == "__main__":
    raise SystemExit(main())
@@ -5,30 +5,30 @@
from google.adk import Agent
from google.adk.tools import FunctionTool

from enrichment_agent.tools.bundle_tools import read_existing_doc, write_concept_doc
from enrichment_agent.tools.source_tools import (
from reference_agent.tools.bundle_tools import read_existing_doc, write_concept_doc
from reference_agent.tools.source_tools import (
    list_concepts,
    read_concept_raw,
    sample_rows,
)
from enrichment_agent.tools.web_tools import fetch_url
from reference_agent.tools.web_tools import fetch_url

DEFAULT_MODEL = "gemini-flash-latest"


def _load_prompt(filename: str) -> str:
    return (
        resources.files("enrichment_agent.prompts")
        resources.files("reference_agent.prompts")
        .joinpath(filename)
        .read_text(encoding="utf-8")
    )


def build_bq_agent(model: str = DEFAULT_MODEL) -> Agent:
    return Agent(
        name="okf_bq_enrichment_agent",
        name="okf_bq_reference_agent",
        model=model,
        instruction=_load_prompt("enrichment_instruction.md"),
        instruction=_load_prompt("reference_instruction.md"),
        tools=[
            FunctionTool(list_concepts),
            FunctionTool(read_concept_raw),
11 changes: 11 additions & 0 deletions okf/src/reference_agent/bundle/__init__.py
@@ -0,0 +1,11 @@
from reference_agent.bundle.document import OKFDocument, REQUIRED_FRONTMATTER_KEYS
from reference_agent.bundle.index import regenerate_indexes
from reference_agent.bundle.paths import concept_id_to_path, path_to_concept_id

__all__ = [
    "OKFDocument",
    "REQUIRED_FRONTMATTER_KEYS",
    "concept_id_to_path",
    "path_to_concept_id",
    "regenerate_indexes",
]
@@ -4,8 +4,8 @@
from pathlib import Path
from typing import Callable

from enrichment_agent.bundle.document import OKFDocument
from enrichment_agent.bundle.synthesizer import synthesize_description
from reference_agent.bundle.document import OKFDocument
from reference_agent.bundle.synthesizer import synthesize_description

_INDEX_FILE = "index.md"
_FALLBACK_MODEL = "gemini-flash-latest"
@@ -6,10 +6,10 @@
from pathlib import Path
from urllib.parse import urlparse

from enrichment_agent.agent import DEFAULT_MODEL
from enrichment_agent.bundle.paths import parse_concept_id
from enrichment_agent.runner import EnrichmentRunner
from enrichment_agent.sources.bigquery import BigQuerySource
from reference_agent.agent import DEFAULT_MODEL
from reference_agent.bundle.paths import parse_concept_id
from reference_agent.runner import ReferenceRunner
from reference_agent.sources.bigquery import BigQuerySource

_SOURCES = ("bq",)

@@ -57,7 +57,7 @@ def _dedup_preserve_order(items: list[str]) -> list[str]:


def _parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="enrichment-agent")
    p = argparse.ArgumentParser(prog="reference-agent")
    sub = p.add_subparsers(dest="command", required=True)

    enrich = sub.add_parser(
@@ -168,13 +168,13 @@ def main(argv: list[str] | None = None) -> int:
        format="%(message)s",
    )
    if getattr(args, "verbose", False):
        logging.getLogger("enrichment_agent").setLevel(logging.DEBUG)
        logging.getLogger("reference_agent").setLevel(logging.DEBUG)
    # Quiet chatty third-party loggers regardless of mode.
    for noisy in ("google", "google_genai", "google_adk", "urllib3", "httpx"):
        logging.getLogger(noisy).setLevel(logging.WARNING)

    if args.command == "visualize":
        from enrichment_agent.viewer import generate_visualization
        from reference_agent.viewer import generate_visualization
        out = args.out or (args.bundle / "viz.html")
        stats = generate_visualization(args.bundle, out, bundle_name=args.name)
        print(
@@ -193,7 +193,7 @@ def main(argv: list[str] | None = None) -> int:
allowed_hosts = {urlparse(s).netloc for s in seeds if urlparse(s).netloc}
if args.web_allowed_host:
allowed_hosts |= set(args.web_allowed_host)
runner = EnrichmentRunner(
runner = ReferenceRunner(
source=source,
bundle_root=args.out,
model=args.model,
@@ -11,19 +11,19 @@
from google.adk.sessions import InMemorySessionService
from google.genai import types

from enrichment_agent.agent import DEFAULT_MODEL, build_bq_agent, build_web_agent
from enrichment_agent.bundle.index import regenerate_indexes
from enrichment_agent.sources.base import ConceptRef, Source
from enrichment_agent.tools.context import (
from reference_agent.agent import DEFAULT_MODEL, build_bq_agent, build_web_agent
from reference_agent.bundle.index import regenerate_indexes
from reference_agent.sources.base import ConceptRef, Source
from reference_agent.tools.context import (
    clear_web_state,
    set_context,
    set_web_state,
)

log = logging.getLogger(__name__)

_BQ_APP_NAME = "enrichment_agent_bq"
_WEB_APP_NAME = "enrichment_agent_web"
_BQ_APP_NAME = "reference_agent_bq"
_WEB_APP_NAME = "reference_agent_web"
_USER_ID = "enricher"

_COMPACT_STR_LIMIT = 120
@@ -152,7 +152,7 @@ def _build_web_user_message(
    return types.Content(role="user", parts=[types.Part(text=text)])


class EnrichmentRunner:
class ReferenceRunner:
    def __init__(
        self,
        source: Source,
3 changes: 3 additions & 0 deletions okf/src/reference_agent/sources/__init__.py
@@ -0,0 +1,3 @@
from reference_agent.sources.base import ConceptRef, Source

__all__ = ["ConceptRef", "Source"]
@@ -5,7 +5,7 @@

from google.cloud import bigquery

from enrichment_agent.sources.base import ConceptRef, Source
from reference_agent.sources.base import ConceptRef, Source

_SHARD_SUFFIX_RE = re.compile(r"^(?P<prefix>.+?_)(?P<shard>\d{6,8})$")

@@ -4,13 +4,13 @@
from datetime import datetime, timezone
from typing import Any

from enrichment_agent.bundle.document import (
from reference_agent.bundle.document import (
    REQUIRED_FRONTMATTER_KEYS,
    OKFDocument,
    OKFDocumentError,
)
from enrichment_agent.bundle.paths import concept_id_to_path, parse_concept_id
from enrichment_agent.tools.context import get_context, is_web_pass
from reference_agent.bundle.paths import concept_id_to_path, parse_concept_id
from reference_agent.tools.context import get_context, is_web_pass

_PREFERRED_KEY_ORDER = ("type", "resource", "title", "description", "tags", "timestamp")

@@ -3,7 +3,7 @@
from dataclasses import dataclass, field
from pathlib import Path

from enrichment_agent.sources.base import Source
from reference_agent.sources.base import Source


@dataclass
@@ -2,8 +2,8 @@

from typing import Any

from enrichment_agent.bundle.paths import parse_concept_id
from enrichment_agent.tools.context import get_context
from reference_agent.bundle.paths import parse_concept_id
from reference_agent.tools.context import get_context


def _ref_to_dict(ref) -> dict[str, Any]:
@@ -3,8 +3,8 @@
from typing import Any
from urllib.parse import urlparse

from enrichment_agent.tools.context import get_web_state
from enrichment_agent.web.fetcher import FetchError, fetch_and_parse
from reference_agent.tools.context import get_web_state
from reference_agent.web.fetcher import FetchError, fetch_and_parse


def fetch_url(url: str) -> dict[str, Any]:
3 changes: 3 additions & 0 deletions okf/src/reference_agent/viewer/__init__.py
@@ -0,0 +1,3 @@
from reference_agent.viewer.generator import generate_visualization

__all__ = ["generate_visualization"]