SimpleAgent is primarily a local AI coding agent built around a simple idea: small models can become useful when the surrounding system gives them strong structure, good context, safe patching, and human-in-the-loop review.
Tested on a MacBook Pro M2 with 16 GB RAM. SimpleAgent is made for users who want to experience a local model with tool-like capabilities on their edge devices.
Work work. Ship ship. Poor man's Claude Code for tiny local models.
SimpleAgent is designed for local Ollama models such as nemotron-3-nano:4b, with a focus on practical code editing, workflow orchestration, file attachments, and safe patch application inside a terminal UI.
It also supports heavier workflows through Pollinations.ai models and Ollama cloud models.
Served in a fun flavour!
To install:
pipx install weirenong-simpleagent
Run SimpleAgent:
simpleagent
Set up the Pollinations API for quick usage:
/api-pollinations
- Designing for Small Local Models
- What SimpleAgent Does
- Core Features
- Coding Workflow
- Persona System
- Workflow System
- Patch Review and Apply System
- Attachment and Web Context System (Local RAG)
- Terminal UX
- Commands
- Keybindings
- Installation
- Requirements
- Recommended Models
- Configuration
- Project Structure
- Example Coding Flow
- Why This Project Matters
- Contributing
- License
- Contact
## Designing for Small Local Models
SimpleAgent was built around a practical discovery: a 4B local model may struggle as a freeform software engineer, but it can still perform useful code editing when the environment reduces ambiguity.
The project therefore focuses less on making the model magically smarter and more on building a system that compensates for small-model weaknesses.
Small models often fail in predictable ways:
- They understand the task but output malformed patches.
- They produce correct code but wrap it in unusable formatting.
- They forget exact whitespace, which breaks Python patches.
- They create widgets but forget framework-specific layout calls.
- They output partial code while pretending it is the entire file.
- They follow one instruction well but degrade when too many goals are mixed together.
SimpleAgent's design is a response to those failure modes.
Small models need clear file context. SimpleAgent uses /attach to load files into the conversation and embed their contents for retrieval.
When an attached file changes after /code applies an edit, SimpleAgent refreshes the attachment context before the next prompt. This is critical for iterative coding because the model must see the latest file, not stale context.
Design goals:
- Attach the exact file the user wants edited.
- Prefer current attachment content over vague conversation history.
- Refresh changed attachments automatically.
- Allow /workflow-debug to inspect the actual prompt messages sent.
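
The refresh behaviour described above can be sketched as follows. This is a minimal illustration, not SimpleAgent's actual code: `AttachmentTracker` and `file_digest` are hypothetical names, and comparing content hashes is only one assumed way to detect that an attached file changed.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a SHA-256 digest of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class AttachmentTracker:
    """Hypothetical helper that remembers one digest per attached file."""

    def __init__(self) -> None:
        self._digests = {}

    def attach(self, path: Path) -> None:
        self._digests[path] = file_digest(path)

    def changed(self) -> list:
        """Return attached files whose content no longer matches the stored digest."""
        return [
            p for p, digest in self._digests.items()
            if p.exists() and file_digest(p) != digest
        ]

    def refresh(self) -> list:
        """Update digests for changed files and return them so they can be re-embedded."""
        stale = self.changed()
        for p in stale:
            self._digests[p] = file_digest(p)
        return stale
```

Calling `refresh()` before building each prompt would give the caller the list of files whose context needs to be reloaded.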
Small models can generate dangerous edits. A model may output only one function inside a fenced code block, while the parser may initially treat it as a whole-file replacement.
That can produce a diff that appears to delete most of the file.
SimpleAgent therefore does not blindly apply model output. It always shows a review screen before writing files.
The /code flow is:
Model output
→ parse possible edits
→ generate diff
→ classify safe and risky regions
→ user reviews
→ F2 or F3 applies
One of SimpleAgent's most important design ideas is using the diff itself as a safety signal.
If the model outputs partial code that is surrounded by unchanged code, those unchanged lines act as safety bounds. Changes inside those bounds are likely intentional.
If the diff shows large unbounded deletions at the top or bottom of the file, those regions are risky.
SimpleAgent supports a safe/risky apply model:
| Key | Behaviour |
|---|---|
| `F2` | Apply only safe bounded changes |
| `F3` | Apply all changes, including risky highlighted regions |
| `Esc` | Cancel without changing files |
Risky diff regions are rendered in dark yellow where supported.
This allows SimpleAgent to recover useful edits from imperfect model output. Instead of rejecting the entire response or applying a destructive patch, it can apply only the parts that are structurally safe.
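
One way to picture the bounded/unbounded distinction is with Python's standard `difflib`. The sketch below is an assumption about how such a classifier could work, not the project's implementation: `classify_regions` is a hypothetical name, and the rule "safe only if unchanged lines sit on both sides" is a simplification of the idea described above.

```python
import difflib


def classify_regions(original: str, proposed: str) -> list:
    """Label each changed region of the original text as 'safe' or 'risky'.

    Returns (label, tag, start, end) tuples, where start/end index lines of
    the original. A change counts as safe only when it is bounded by
    unchanged lines both before and after it.
    """
    a = original.splitlines()
    b = proposed.splitlines()
    opcodes = difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    regions = []
    for idx, (tag, i1, i2, _j1, _j2) in enumerate(opcodes):
        if tag == "equal":
            continue
        bounded_before = idx > 0 and opcodes[idx - 1][0] == "equal"
        bounded_after = idx + 1 < len(opcodes) and opcodes[idx + 1][0] == "equal"
        label = "safe" if bounded_before and bounded_after else "risky"
        regions.append((label, tag, i1, i2))
    return regions
```

With this rule, a one-line replacement in the middle of a file is labelled safe, while a deletion that runs from the first line of the file is labelled risky because nothing unchanged precedes it.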
Small models do not always produce perfect Aider-style patches.
SimpleAgent therefore supports multiple edit formats:
| Format | Purpose |
|---|---|
| SEARCH/REPLACE | Precise Aider-style edits |
| Whole-file fenced code | Replace an entire file when appropriate |
| Partial fenced code | Recover function/class-level updates |
| Unified diff | Apply standard diff output |
| Context-only diff-like output | Recover useful intent from incomplete diff output |
The parser also normalises noisy filename lines such as:
hello.py
File: hello.py
File name: hello.py
File name:** hello.py
**File name:** `hello.py`
a/hello.py
b/hello.py
This matters because small models often produce path labels with markdown artefacts.
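
A minimal sketch of this kind of normalisation, assuming a regex-based approach (the `normalise_filename` helper and its pattern are illustrative, not taken from the project's parser):

```python
import re

# Optional "File:" / "File name:" label, tolerating stray markdown asterisks.
_LABEL = re.compile(r"^\**\s*file(?:\s*name)?\s*:\s*\**\s*", re.IGNORECASE)


def normalise_filename(line: str) -> str:
    """Strip labels, markdown artefacts, and diff prefixes from a path line."""
    text = _LABEL.sub("", line.strip())
    text = text.strip("*` \t")
    # Drop the a/ and b/ prefixes that unified diffs put in front of paths.
    if text.startswith(("a/", "b/")):
        text = text[2:]
    return text
```

Note that this sketch would also strip a genuine top-level directory named `a` or `b`; a real parser would need to check the result against the attached files.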
SimpleAgent is not designed to silently rewrite a project. It is designed to make a small model useful while keeping the developer in control.
The human reviews the diff before applying. The tool makes the model productive, but does not pretend the model is always correct.
This is especially important for local models because code quality can vary depending on prompt structure, temperature, context, and task complexity.
SimpleAgent workflows are plain markdown files. They are easy to edit, inspect, and version.
A workflow can define steps like:
prompt_start: "plan"
print: "Summoning the tiny intern to scribble a rough battle plan."
add_persona_context
add_attachment_context
add_original_user_prompt
prompt: "plan"
prompt_end
prompt_start: "coding"
print: "Handing the crayons to the serious robot now. Generating the proper solution."
add_persona_context
add_attachment_context
add_prompt_output: "plan"
add_original_user_prompt
add_user_prompt: "Output the file name first..."
prompt: "output"
prompt_end
This keeps the agent behaviour transparent. The model is not hidden behind a black-box orchestration framework.
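
To show how little machinery such a format needs, here is a small parser sketch for the step syntax shown above. It is an illustration under assumptions: `PromptBlock` and `parse_workflow` are hypothetical names, and the handling of quoted arguments (read until a line ends with a double quote) is a guess at the rules, not the project's actual grammar.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# A command is a bare word, optionally followed by: "argument
_COMMAND = re.compile(r'^(\w+)(?::\s*"(.*))?$')


@dataclass
class PromptBlock:
    name: str
    steps: list = field(default_factory=list)  # (command, argument-or-None) pairs


def parse_workflow(text: str) -> list:
    """Group workflow commands into prompt blocks (illustrative only)."""
    blocks = []
    current: Optional[PromptBlock] = None
    lines = iter(text.splitlines())
    for raw in lines:
        match = _COMMAND.match(raw.strip())
        if match is None:
            continue  # skip blank lines and prose
        command, arg = match.group(1), match.group(2)
        if arg is not None:
            # Quoted strings may span lines: read until a line ends with a quote.
            while not arg.rstrip().endswith('"'):
                extra = next(lines, None)
                if extra is None:
                    break
                arg += "\n" + extra
            arg = arg.rstrip()
            if arg.endswith('"'):
                arg = arg[:-1]
            arg = arg.strip()
        if command == "prompt_start":
            current = PromptBlock(name=arg or "")
        elif current is None:
            continue  # commands outside a block are ignored in this sketch
        elif command == "prompt_end":
            blocks.append(current)
            current = None
        else:
            current.steps.append((command, arg))
    return blocks
```

A limitation of this sketch is that a line inside a multiline string that happens to end with a double quote would close the string early.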
## What SimpleAgent Does
SimpleAgent TUI is a terminal-based local AI assistant for:
- chatting with models,
- attaching project files as context,
- running multi-step prompt workflows,
- generating code edits,
- reviewing diffs before applying changes,
- experimenting with small-model coding workflows.
It is especially focused on making small local models useful for controlled software development tasks.
## Core Features
- Local Ollama chat interface for small and medium local models.
- Persona system for switching between general and coding behaviour.
- Workflow engine for multi-prompt agent flows.
- File attachments with embedded context retrieval.
- Automatic attachment refresh when attached files change.
- Code editing mode with /code review and apply.
- SEARCH/REPLACE patch support for Aider-style edits.
- Whole-file and partial-file recovery from fenced code blocks.
- Unified diff parsing for standard patch output.
- Safe/risky diff application with F2/F3 controls.
- Pretty TUI rendering with markdown, code blocks, tables, and diff colouring.
- Raw output mode for parser debugging.
- Slash command completion with /attach file path autocomplete.
- Conversation memory and retrieval using embeddings.
- Web context loading for URL/search-based context.
## Coding Workflow
The coding workflow is designed around small-model reliability.
A typical coding flow looks like this:
- Attach a file.
- Ask for a small code change.
- The workflow prompt generates patch output.
- /code parses the last assistant reply.
- SimpleAgent shows a diff.
- The user presses F2, F3, or Esc.
Example:
/attach hello.py
edit hello.py to add a quit button
/code
The model output may contain SEARCH/REPLACE blocks:
hello.py
<<<<<<< SEARCH
old code
=======
new code
>>>>>>> REPLACE
SimpleAgent parses the raw response, generates a diff, and asks for confirmation before applying it.
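
As a rough illustration of the parsing step, the sketch below extracts SEARCH/REPLACE blocks with a regular expression and applies them to a source string. `apply_search_replace` is a hypothetical helper, and requiring the SEARCH text to match exactly once is an assumed safety rule rather than documented behaviour.

```python
import re

# Matches one block: the SEARCH marker, old text, separator, new text, REPLACE marker.
_BLOCK = re.compile(
    r"^<{7} SEARCH\n(.*?)^={7}\n(.*?)^>{7} REPLACE$",
    re.DOTALL | re.MULTILINE,
)


def apply_search_replace(source: str, reply: str) -> str:
    """Apply every SEARCH/REPLACE block found in reply to source.

    Raises ValueError when a SEARCH section does not match exactly once,
    so an ambiguous or stale edit is never applied silently.
    """
    for search, replace in _BLOCK.findall(reply):
        if source.count(search) != 1:
            raise ValueError("SEARCH text must match exactly once")
        source = source.replace(search, replace)
    return source
```

Because the match is exact, a block whose whitespace differs from the file is rejected, which is the failure mode that whitespace-sensitive languages such as Python make most visible.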
## Persona System
Personas are SimpleAgent's lightweight way to change both the model's behaviour and the workflow that runs for a given task.
A persona provides system context. This is the high-level instruction block that tells the model how it should behave, what style it should use, and what constraints it should follow.
For example, a coding persona can tell the model to:
- act as a software engineer,
- be concise and action-oriented,
- avoid inventing file contents or tool results,
- preserve the user's intent,
- prefer structured code-edit output,
- follow small-model-friendly patch instructions.
This matters because small models are highly sensitive to framing. A clear persona gives the model stable behaviour before the workflow adds task-specific instructions.
Personas also act as an easy way to switch workflows.
Instead of manually choosing a workflow for every request, SimpleAgent can map a persona to a workflow. This allows different modes of operation:
| Persona | Example Workflow | Purpose |
|---|---|---|
| `default` | General chat workflow | Normal conversation and lightweight assistance |
| `coding` | Plan → patch workflow | File-aware code editing with /code support |
| `review` | Review-only workflow | Inspect code without applying changes |
| `debug` | Error analysis workflow | Analyse logs/errors and propose fixes |
This design keeps the user experience simple:
Switch persona → get different system context + different workflow behaviour.
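
The persona-to-workflow mapping can be pictured as a simple lookup. The sketch below is purely illustrative: the `Persona` fields, the system context strings, and the workflow file names are assumptions, not values taken from SimpleAgent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    system_context: str
    workflow: str  # hypothetical workflow file name


# Illustrative entries only; real personas are user-configurable.
PERSONAS = {
    "default": Persona("default", "You are a helpful assistant.", "chat.md"),
    "coding": Persona("coding", "You are a concise software engineer.", "coding.md"),
}


def resolve(persona_name: str) -> Persona:
    """Return the named persona, falling back to the default one."""
    return PERSONAS.get(persona_name, PERSONAS["default"])
```

Selecting a persona then yields both the system context and the workflow to run in a single step.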
## Workflow System
Workflows are SimpleAgent's approach to automation and agentic tool usage. They are markdown files stored in either the project workflows directory or the user workflows directory, and they are intentionally simple, readable, and easy to edit.
| Command | Purpose |
|---|---|
| `prompt_start: "name"` | Begin a prompt block |
| `prompt_end` | End a prompt block |
| `add_persona_context` | Add current persona/system prompt |
| `add_recent_messages` | Add recent chat messages |
| `add_memory_context` | Add relevant memory context |
| `add_attachment_context` | Add relevant attached file context |
| `add_web_context` | Add relevant web context |
| `add_original_user_prompt` | Add the user's original prompt |
| `add_to_original_user_prompt: "..."` | Append text to the original user prompt |
| `add_system_context: "..."` | Add literal system context |
| `add_user_prompt: "..."` | Add literal user instruction |
| `add_prompt_output: "name"` | Add output from a previous prompt block |
| `print: "..."` | Print workflow status text to the interface |
| `prompt: "output_name"` | Run the model and store output under a name |
| `stage_code_changes: "name"` | Apply the model-suggested changes to a temporary working copy |
| `stage_diffs` | Show the diff between the original and the staged temporary copy |
Workflow commands can use multiline quoted strings:
add_user_prompt: "
Output the file name first.
Then output SEARCH/REPLACE blocks.
Preserve whitespace.
"
Example of a multi-prompt workflow with code staging and diff applying:
Coding SimpleAgent workflow
prompt_start: "plan"
print: "Summoning the intern to draft plans..."
add_persona_context
add_recent_messages
add_attachment_context
add_original_user_prompt
add_user_prompt: "
You are a senior engineer. Create a short coding plan.
Use EXACTLY these three headers. Nothing else.
CHANGE
One short sentence only.
FILES
One filename per line. Only files that need editing.
STEPS
Numbered list. Max 6 steps.
Start each step with a verb: Add, Remove, Replace, Update, Fix, Delete.
Do not add explanations, greetings, or extra text.
"
prompt: "plan"
prompt_end
prompt_start: "code"
print: "Tiny goblin engineer deployed. Work work. Generating first version..."
add_persona_context
add_attachment_context
add_original_user_prompt
add_prompt_output: "plan"
add_user_prompt: "
Implement the plan. Output ONLY file paths and unified diffs. No explanations. Give fast responses.
Rules (follow exactly):
1. File path on its own line.
2. Then a unified diff block.
3. Always show 1-2 lines of unchanged content
Do not write any text outside the file path + diff blocks.
Do not use python. Always use diff.
Keep changes small and precise.
"
prompt: "code"
prompt_end
prompt_start: "review"
print: "Doing some self checks. Generating the second version..."
add_persona_context
add_attachment_context
add_original_user_prompt
add_prompt_output: "plan"
add_prompt_output: "code"
add_user_prompt: "
You are a strict code reviewer.
Check the previous code output. Give fast responses.
Tasks:
- Fix any broken unified diff format
- Make sure it actually follows the goal of the original user prompt
- Fix wrong line numbers or bad context if needed
Output ONLY corrected unified diff blocks in the exact same format as the code stage (file path + ```diff block).
If no fixes needed, output the original blocks unchanged.
Do not add any explanations.
"
prompt: "output"
prompt_end
prompt_start: "apply"
print: "Review staged diffs carefully before accepting changes."
stage_code_changes: "review"
stage_diffs
prompt_end
Use /workflow-debug to inspect the exact prompt messages sent during the last workflow run.
This gives full transparency into token usage and into all the context, system, and workflow prompts that go into a run.
## Patch Review and Apply System
The /code command attempts to parse the last assistant reply into file edits.
Supported edit styles include:
### SEARCH/REPLACE
```text
hello.py
<<<<<<< SEARCH
old code
=======
new code
>>>>>>> REPLACE
```
### Whole-File Fenced Code
````text
hello.py
```python
full file content
```
````
### Partial Fenced Code
````text
hello.py
```python
def updated_function():
    ...
```
````
### Unified Diff
```diff
--- a/hello.py
+++ b/hello.py
@@ -1,3 +1,4 @@
 old line
+new line
```
SimpleAgent classifies changes using unchanged diff lines as safety bounds.
| Key | Action |
|---|---|
| `F2` | Apply safe bounded changes only |
| `F3` | Apply all proposed changes |
| `Esc` | Cancel |
This allows the user to accept useful edits from imperfect model output without applying obviously risky deletions.
## Attachment and Web Context System (Local RAG)
SimpleAgent treats local files and web pages as first-class context sources. This is SimpleAgent's lightweight local Retrieval-Augmented Generation (RAG) layer: /attach and /web retrieve relevant information, inject it into the prompt, and give small local models more grounded context before they answer or generate code.
Files can be attached with:
/attach path/to/file.py
Attached files are:
- read from disk,
- split into smaller text chunks,
- embedded using the configured embedding model,
- stored in an in-session vector index,
- retrieved as relevant context during prompts,
- refreshed when changed by /code.
SimpleAgent uses langchain_text_splitters to break larger files into manageable chunks before embedding them. This keeps the context system useful even when files are too large to send fully into every prompt.
The basic flow is:
attached file
→ read text
→ split into chunks with langchain_text_splitters
→ generate embeddings with the configured Ollama embedding model
→ store chunks in the attachment vector index
→ retrieve relevant chunks when building the model prompt
This is especially important for small models because they benefit from precise context instead of noisy full-project dumps. The model receives the most relevant chunks for the user's request, while full attachment context can still be included where appropriate for smaller files.
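
The split, embed, store, and retrieve steps can be sketched end to end without any external services. Everything here is a stand-in chosen so the example runs on its own: the fixed-size splitter replaces `langchain_text_splitters`, the bag-of-words `toy_embed` replaces the configured Ollama embedding model, and `AttachmentIndex` is a hypothetical name for the in-session index.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list:
    """Naive fixed-size character splitter with overlap (stand-in for a real splitter)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector, NOT a real embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AttachmentIndex:
    """Hypothetical in-session index of (chunk, vector) pairs."""

    def __init__(self) -> None:
        self._entries = []

    def add_file(self, text: str) -> None:
        for chunk in split_into_chunks(text):
            self._entries.append((chunk, toy_embed(chunk)))

    def retrieve(self, query: str, k: int = 2) -> list:
        """Return the k chunks most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

Swapping `toy_embed` for a call to a real embedding model keeps the rest of the flow unchanged, which is the main point of the sketch.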
The embedding model is configurable. A code-aware embedding model such as ordis/jina-embeddings-v2-base-code:latest is recommended because SimpleAgent often needs to retrieve function definitions, nearby code, and implementation details.
Typing /attach opens prompt-toolkit completions for files in the current workspace.
Current-directory files are prioritised before subdirectory files. Files ignored by .gitignore are skipped.
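
The ordering rule can be illustrated with a small sort key. This sketch covers only the prioritisation of current-directory files; `.gitignore` filtering is deliberately left out, and `order_candidates` is a hypothetical helper rather than the project's completer.

```python
from pathlib import PurePosixPath


def order_candidates(paths: list, prefix: str = "") -> list:
    """Return paths matching prefix, shallowest first, then alphabetically."""
    matches = [p for p in paths if p.startswith(prefix)]
    return sorted(matches, key=lambda p: (len(PurePosixPath(p).parts), p))
```

Sorting by the number of path components puts files in the current directory ahead of anything nested in subdirectories.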
The /web command can be used with either a direct URL or a search query:
/web https://example.com/article
/web latest ollama structured output examples
When given a URL, SimpleAgent fetches and extracts the page content. When given a search query, SimpleAgent uses DuckDuckGo search to find relevant pages, scrape useful information, and add it as web context.
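
Deciding between the two modes can be as simple as checking whether the argument parses as an HTTP(S) URL. The helper below is a sketch of that decision under this assumption; `classify_web_argument` is a hypothetical name and the actual rule used by /web may differ.

```python
from urllib.parse import urlparse


def classify_web_argument(arg: str) -> str:
    """Return 'url' for http(s) links with a host, otherwise 'search'."""
    parsed = urlparse(arg.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "search"
```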
Web context follows a similar pattern to file attachments:
URL or DuckDuckGo query
→ fetch/search web content
→ extract readable text
→ split content into chunks
→ embed chunks with the configured embedding model
→ store chunks in the web context index
→ retrieve relevant chunks when building the prompt
This lets /web serve the same role as /attach, but for external information. The model can use scraped web context as grounded reference material instead of relying only on its internal training data.
In short:
| Source | Command | Purpose |
|---|---|---|
| Local files | `/attach` | Give the model project/code/document context |
| Direct web pages | `/web <url>` | Add a specific page as context |
| Web search | `/web <query>` | Use DuckDuckGo to find and scrape selected relevant context |
The DuckDuckGo web search implementation was inspired by Google Gemini's notebook feature.
## Terminal UX
SimpleAgent is built as a terminal-first tool.
UX features include:
- slash command suggestions,
- file path autocomplete for /attach,
- loading messages between workflow prompts,
- formatted markdown output,
- diff code blocks with colour,
- /markup toggle for raw output,
- Esc to cancel command/apply states,
- F2 and F3 apply modes for code edits.
The TUI aims to make local model experimentation fast and inspectable.
## Commands
| Command | Description |
|---|---|
| `/attach <path...>` | Attach supported files by path |
| `/web <url or search query>` | Load URL/search context |
| `/paste` | Paste text or image from clipboard |
| `/clear` | Clear session history, memory, attachments, web context, and workflow debug |
| `/code` | Review/apply edits from the last assistant reply |
| `/model <name>` | Show or change the chat model; use the `pollinations/` prefix for Pollinations.ai models, e.g. `pollinations/qwen-coder` |
| `/embedding <name>` | Show or change the embedding model; use the `pollinations/` prefix for Pollinations.ai models, e.g. `pollinations/openai-3-small` |
| `/vision <name>` | Show or change the vision model; use the `pollinations/` prefix for Pollinations.ai models, e.g. `pollinations/mistral` |
| `/models` | List installed Ollama models and whitelisted Pollinations.ai models |
| `/api-pollinations` | Set up Pollinations.ai API usage |
| `/persona` | Open persona manager |
| `/workflow` | Show workflow help |
| `/workflow-install <path>` | Install a workflow markdown file |
| `/workflow-debug` | Print prompt messages from last workflow run |
| `/markup` | Toggle markdown-style rendering for agent replies |
| `/history` | Show session history |
| `/about` | Show app info |
| `/version` | Show version |
| `/help` | Show help menu |
| `/exit`, `/quit`, `/q` | Exit app |
## Keybindings
| Key | Action |
|---|---|
| `/` | Open slash command suggestions |
| `Enter` | Submit input or accept selected completion |
| `Esc` | Clear active slash command or cancel apply dialog |
| `Esc` then `Enter` | Insert a newline (fallback where needed) |
| `F1` | Toggle thinking display |
| `F2` | Apply safe bounded code changes in /code review |
| `F3` | Apply all changes in /code review |
## Installation
SimpleAgent is published on PyPI as:
weirenong-simpleagent
- Install pipx
If you do not already have pipx installed:
python -m pip install --user pipx
python -m pipx ensurepath
- Install SimpleAgent
pipx install weirenong-simpleagent
- Install and run Ollama:
ollama serve
- Pull recommended models:
ollama pull nemotron-3-nano:4b
ollama pull ordis/jina-embeddings-v2-base-code:latest
ollama pull granite3.2-vision:2b
- Start SimpleAgent:
simpleagent
Alternatively, you can use Pollinations.ai models instead of Ollama:
/api-pollinations
Follow the instructions shown in the screenshot to set up the Pollinations.ai API:

## Requirements
- Python 3.10+
- Ollama or the Pollinations.ai API
## Recommended Models
SimpleAgent was built and tested around small local models.
Using Pollinations.ai:
| Role | Suggested Model |
|---|---|
| General chat and coding workflows | qwen-safety or qwen-coder |
| Embeddings | openai-3-small |
| Vision | mistral |
Using Ollama:
| Role | Suggested Model |
|---|---|
| General chat and coding workflows | nemotron-3-nano:4b |
| Embeddings | ordis/jina-embeddings-v2-base-code:latest |
| Vision | granite3.2-vision:2b |
| Alternative coding model | Qwen Coder / DeepSeek Coder local variants |
Nemotron is especially interesting because it follows step-by-step procedural prompts well, which makes it suitable for structured local workflows. Its inference is also fast, which improves the user experience.
## Configuration
Configuration is saved to:
~/.simpleagent/config.json
Common configuration values include:
- selected chat model,
- embedding model,
- vision model,
- personas,
- persona-to-workflow mapping,
- markup formatting preference.
User workflows are stored in:
~/.simpleagent/workflows
Runtime attachments are stored under:
~/.simpleagent/attachments
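
Reading such a config file with sensible fallbacks might look like the sketch below. The key names in `DEFAULTS` are assumptions made for illustration (the real `config.json` may use different keys), and `load_config` is a hypothetical helper; only the file location and the model names come from this README.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".simpleagent" / "config.json"

# Hypothetical key names; the real file may use different ones.
DEFAULTS = {
    "chat_model": "nemotron-3-nano:4b",
    "embedding_model": "ordis/jina-embeddings-v2-base-code:latest",
    "vision_model": "granite3.2-vision:2b",
    "markup": True,
}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Merge saved values over defaults; a missing or invalid file yields defaults."""
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        saved = {}
    if not isinstance(saved, dict):
        return dict(DEFAULTS)
    return {**DEFAULTS, **saved}
```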
## Project Structure
High-level files:
| File | Purpose |
|---|---|
| `main.py` | TUI application, command handling, prompt flow |
| `editblock.py` | /code parsing, diff generation, patch application |
| `formatter.py` | Terminal markdown/code/diff rendering |
| `ollama.py` | Lightweight Ollama client |
| `utils.py` | Attachment reading, chunking, embeddings, retrieval helpers |
| `workflows/` | Markdown workflow definitions |
## Example Coding Flow
/attach hello.py
edit hello.py so the quit button closes the app
/code
SimpleAgent may run a workflow like:
Summoning the tiny intern to scribble a rough battle plan.
Handing the crayons to the serious robot now. Generating the proper solution.
Then it shows a diff:
--- a/hello.py
+++ b/hello.py
@@ -1,3 +1,3 @@
-quit_btn = tk.Button(root, text="Quit", command=quit_app)
+quit_btn = tk.Button(root, text="Quit", command=root.destroy)
Then the user chooses:
- F2 to apply safe changes,
- F3 to apply all changes,
- Esc to cancel.
## Why This Project Matters
SimpleAgent is a practical experiment in making small local models useful.
Instead of assuming only frontier models can perform code editing, the project explores what happens when a small model is paired with:
- explicit procedural prompting,
- multi-step workflows,
- file attachments,
- retrieval,
- conservative patch parsing,
- safe diff review,
- human approval.
This makes the project relevant to:
- local-first AI tooling,
- low-resource agent design,
- developer productivity tools,
- prompt orchestration,
- coding assistant UX,
- AI safety for file-editing agents.
The goal is not to beat large commercial coding agents. The goal is to show that careful systems design can make tiny local models surprisingly useful.
## Contributing
Contributions are welcome. Good areas to help with:
- parser robustness,
- workflow examples,
- model benchmarking,
- syntax/lint integrations,
- terminal UX improvements,
- documentation.
Please open an issue or submit a pull request.
## License
MIT
## Contact
For questions, feedback, or collaboration, please open an issue in this repository.






