Skip to content

pronob1010/Aether

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Aether

A small Python framework for building AI-native applications, organized around classical software-engineering patterns: Strategy, Adapter, Decorator, Factory, Builder, Facade, and a generic Plugin Registry.

Status: pre-1.0, not yet on PyPI.

Why

Most LLM frameworks ship the features you need (streaming, retries, tool calling) wired together in ways you can't easily pull apart. Aether ships the primitives — small typed objects you compose — plus a sensible facade for the 90% case.

Three things you can do that most frameworks make hard:

  1. Swap the LLM provider with one env var, no code changes.
  2. Inspect cumulative cost (client.usage.total_cost_usd) without threading a tracker through your app.
  3. Write a Python function, decorate it with @register_tool, and it becomes available to any provider — schema generated for you.

Quick start

import asyncio
from aether import Aether

async def main():
    client = Aether()                 # reads LLM_PROVIDER, OPENAI_API_KEY, ...
    answer = await client.ask("What is the meaning of life?")
    print(answer)
    print(f"Spent ${client.usage.total_cost_usd:.6f}")

asyncio.run(main())
export LLM_PROVIDER=openai       # or gemini, fake
export OPENAI_API_KEY=sk-...
python try_it.py

Aether() builds a fully resilient client by default: retry → circuit breaker → cost tracking, in the correct nesting order. Opt out via Aether(with_retry=False, ...). For an explicit config, pass Aether(config=ProviderConfig(...)). For a pre-built provider (testing, custom wrapping), pass Aether(some_provider) positionally.

Features

Streaming

async for delta in client.stream_text("Tell me a story"):
    print(delta, end="", flush=True)

Rich-chunk variant if you need metadata:

async for chunk in client.stream("..."):
    chunk.text             # delta
    chunk.finish_reason    # only on final chunk
    chunk.output_tokens    # only on final chunk

Both Retry and CircuitBreaker decorators handle streaming. Retry only applies to the handshake (before the first chunk yields); errors mid-stream propagate so the caller never sees duplicate output.

Tool calling works during streaming too:

async for delta in client.stream_text(
    "What's 17+25?", tools=["add"],
):
    print(delta, end="", flush=True)
# "Let me add those." → [tool runs] → "The answer is 42."

Internally the framework runs multiple provider sessions — text → tool dispatch → text — but the user sees ONE logical stream. stream.start and stream.complete fire once across the whole stream; tool dispatches fire the normal tool.* events.

Tool calling

from aether import Aether, register_tool

@register_tool(description="Add two numbers")
def add(a: int, b: int) -> int:
    return a + b

client = Aether()
answer = await client.ask("What is 17 + 25?", tools=["add"])
# Aether runs the LLM↔tool loop and returns the final assistant text.

The framework auto-generates the JSON Schema from your function's signature and docstring. Async tools work too. Tool errors become content (the LLM sees the failure as a tool result) rather than exceptions that abort the conversation.

Three reference tools ship under aether.extensions.tools:

import aether.extensions.tools  # registers get_current_time, http_get, read_file

Cost tracking

On by default. Reports tokens and (where pricing is known) dollar cost:

client = Aether()
await client.ask("hi")
await client.ask("how are you")

client.usage.total_requests          # 2
client.usage.total_input_tokens      # 7
client.usage.total_output_tokens     # 23
client.usage.total_cost_usd          # 0.00002
client.usage.by_model                # {"gpt-4o-mini": TokenUsage(...)}

Cost tracking sits outermost in the decorator stack, so it only counts what actually billed — not retried attempts.

Multi-turn conversations

prompt is a string for the single-turn case, or a list[Message] for multi-turn:

from aether import Aether, Message

convo = [
    Message(role="system",    content="You are terse."),
    Message(role="user",      content="hi"),
    Message(role="assistant", content="hi"),
    Message(role="user",      content="say more"),
]
answer = await client.ask(convo)

Sessions (stateful conversations)

For chat apps, threading messages by hand gets old fast. Sessions remember the conversation for you:

client = Aether()
session = client.session("user_alice", system="You are helpful.")

await session.ask("My name is Alice.")
await session.ask("What's my name?")   # → "Your name is Alice."

# Streaming works too — final text auto-saved to history
async for delta in session.stream_text("Tell me a story"):
    print(delta, end="", flush=True)

Same session_id returns the same Session object within a client. Across processes, share the underlying store:

from aether.extensions.memory import InMemorySessionStore
store = InMemorySessionStore()              # or your own Redis/SQL store
client_a = Aether(memory_store=store)
client_b = Aether(memory_store=store)
# Both see the same `session("alice")` history

The default store is in-memory. For production, implement the SessionStore Protocol against Redis, SQLite, etc.:

from aether import SessionStore
from aether.llm.contracts import Message

class RedisSessionStore:
    async def load(self, session_id: str) -> list[Message]: ...
    async def save(self, session_id: str, messages: list[Message]) -> None: ...
    async def delete(self, session_id: str) -> None: ...
    async def exists(self, session_id: str) -> bool: ...

Three correctness guarantees baked in:

  • Tool dispatches stay out of session history. A session sees [system, user, assistant_final_text] — internal tool turns from the loop don't pollute it.
  • Failed calls roll back. If the LLM errors mid-turn, the dangling user message is removed; next turn starts clean.
  • Concurrent session.ask() is serialized via a per-session asyncio.Lock — no interleaved history corruption.

Resilience

Retry (with exponential backoff) and Circuit Breaker both ship as decorators. The builder composes them in the load-bearing order: retry inside circuit breaker, so one retry-exhausted call counts as one breaker failure, not N.

Observability hooks

Subscribe sync or async callbacks to 10 lifecycle events. Use them for logging, tracing, metrics, debugging — no need to thread loggers through your code.

from aether import Aether
from aether.events import REQUEST_COMPLETE, TOOL_ERROR

client = Aether()

@client.on(REQUEST_COMPLETE)
def log_latency(event):
    print(f"{event.request.model}: {event.duration_seconds:.2f}s")

@client.on(TOOL_ERROR)
async def alert(event):
    await send_alert(f"Tool {event.call.name} failed: {event.error}")

await client.ask("hi")

The 10 events:

Event Fires when
request.start / request.complete / request.error Around each provider.complete() call (including each tool-loop iteration)
stream.start / stream.chunk / stream.complete / stream.error Around streaming responses, one stream.chunk per delta
tool.start / tool.complete / tool.error Around each tool dispatch in the tool loop

Subscriber exceptions are caught and logged — observability never breaks the request path. Share an EventBus across multiple clients by passing events=bus to each Aether().

Architecture

aether/
├── client.py                ← Aether facade (the front door)
├── registry.py              ← generic plugin registry (any kind)
├── config.py                ← runtime config (env-driven, call-time)
├── events.py                ← EventBus + 10 lifecycle event types
├── llm/                     ← user-facing LLM API
│   ├── contracts.py         ← LLMProvider Protocol + Message, ToolCall, ...
│   └── ask.py               ← thin convenience
├── tools/                   ← user-facing tool API
│   ├── registry.py          ← @register_tool decorator
│   ├── schema.py            ← signature → JSON Schema
│   └── (dispatch_tool, get_tool, list_tools exposed via __init__)
├── memory/                  ← user-facing session API
│   ├── contracts.py         ← SessionStore Protocol
│   └── session.py           ← Session class
└── extensions/              ← all plugin implementations
    ├── llm/
    │   ├── openai.py, gemini.py, fake.py   ← adapters
    │   ├── retrying.py, circuit_breaker.py, cost_tracking.py  ← decorators
    │   ├── registry.py      ← LLM-provider registration helper
    │   ├── factory.py       ← name → instance
    │   └── builder.py       ← config → composed stack
    ├── tools/
    │   ├── time.py          ← get_current_time
    │   ├── http.py          ← http_get
    │   └── file.py          ← read_file
    └── memory/
        └── in_memory.py     ← InMemorySessionStore (default)

Each layer only knows the one below it. The generic registry at the top level (aether/registry.py) is the single source of truth for what's pluggable — providers, tools, and any future "kind" (vector stores, databases, ...) live in nested dicts keyed by kind.

Extending

Register a new LLM provider

from aether import register_provider
from aether.llm.contracts import LLMRequest, LLMResponse

@register_provider("ollama", api_key_env="OLLAMA_API_KEY", model_env="OLLAMA_MODEL")
class OllamaProvider:
    def __init__(self, api_key: str | None = None, default_model: str = "llama3"):
        self.default_model = default_model

    async def complete(self, request: LLMRequest) -> LLMResponse:
        ...

Now LLM_PROVIDER=ollama works with Aether() — no framework code changes. Retry, CircuitBreaker, and CostTracking wrap it automatically.

Register a new tool

from aether import register_tool

@register_tool(description="Look up a customer by ID")
def get_customer(customer_id: str) -> dict:
    """
    Args:
        customer_id: Internal customer ID (UUID).
    """
    return {"id": customer_id, "name": "..."}

Tool is now available as tools=["get_customer"] in any Aether.complete() call. JSON Schema is generated from the signature + docstring; the LLM sees the customer_id description verbatim.

Register any other kind of plugin

The same registry mechanism underlies both providers and tools. For future subsystems (vector stores, databases, ...) the pattern is register(kind, name, **metadata):

from aether import register

@register("vector_store", "pinecone", dimension=1536)
class PineconeStore:
    ...

Configuration

All runtime defaults live in aether/config.py and read from env vars at call time (not import time), so tests and live config reloads work cleanly.

Env var Default Affects
LLM_PROVIDER openai Which provider Aether() builds
OPENAI_API_KEY / GEMINI_API_KEY Per-provider API keys
OPENAI_MODEL / GEMINI_MODEL (provider default) Override default model
AETHER_DEFAULT_TEMPERATURE 0.7 Default sampling temp
AETHER_MAX_TOOL_ITERATIONS 10 Tool-loop cap before giving up
AETHER_HTTP_TOOL_TIMEOUT 10.0 Default timeout (seconds) for http_get
AETHER_HTTP_TOOL_MAX_BYTES 100000 Max body size before http_get truncates
AETHER_FILE_TOOL_MAX_BYTES 200000 Max bytes before read_file truncates

Precedence: per-call kwarg > env var > in-code fallback. Invalid env values silently fall back to the default rather than crashing.

Testing

.venv/bin/python -m pytest tests/

The FakeProvider lets you write end-to-end tests with no API calls:

from aether import Aether
from aether.extensions.llm.fake import FakeProvider

async def test_my_agent_logic():
    fake = FakeProvider(canned_response="hello")
    client = Aether(fake)
    assert await client.ask("hi") == "hello"

For scripted multi-turn flows (including tool calls):

from aether.llm.contracts import LLMResponse, ToolCall

fake = FakeProvider(responses=[
    LLMResponse(text="", model="...", input_tokens=1, output_tokens=1,
                tool_calls=[ToolCall(id="c1", name="add", arguments={"a": 2, "b": 3})]),
    LLMResponse(text="The answer is 5.", model="...", input_tokens=1, output_tokens=1),
])

Design principles

  1. Explicit over magical — no hidden globals; what you import is what runs.
  2. Composable over monolithic — decorators, providers, tools all opt-in.
  3. Observable by default — cost tracking on by default; usage exposed via client.usage.
  4. Failure-aware — retry + circuit breaker compose correctly.
  5. Cost & latency aware — tokens + dollar cost tracked per model.
  6. Typed contracts — Pydantic models for requests/responses; Protocol for providers.
  7. Async-firstcomplete() and stream() are both async; sync tools wrapped transparently.
  8. Testable without API callsFakeProvider plays both single-turn and scripted multi-turn flows.

Status & roadmap

Shipped: provider abstraction, resilience (retry + circuit breaker), cost tracking, streaming (including with tool calls), multi-turn messages, tool calling, reference tools, env-driven config, observability hooks (Observer pattern), sessions/memory subsystem.

On deck (not yet built):

  • Caching decorator
  • Anthropic provider
  • Redis-backed SessionStore
  • pyproject.toml and PyPI release

License

Not yet specified.

About

A Python framework for building AI-native applications, blending classical software engineering patterns with the demands of LLM-based execution.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages