Given a plain-English goal such as "merge a pull request" or "reply to an email thread", automatically discover the exact sequence of API calls required, annotated with what you provide and what each step produces.
Modern SaaS platforms expose hundreds of API endpoints. Using them correctly is non-trivial:
| Challenge | Reality |
|---|---|
| APIs have strict call-ordering requirements | You cannot merge a PR without first fetching it |
| Outputs of one call must feed inputs of the next | repo_id from GET flows into CREATE_ISSUE |
| Documentation is vast and scattered | GitHub alone has 600+ endpoints |
| Developers waste hours tracing dependencies manually | Especially painful for cross-service workflows |
API Dependency Planner solves this. Describe what you want to do in plain English, and the system maps the full dependency chain: which tools to call, in what order, with what data flowing between them.
- Two platforms supported: GitHub (issues, PRs, merges) and Google Workspace (Gmail, Calendar, Drive, Sheets)
- Hybrid planning engine: data-driven BFS graph first, LLM ranking second
- Field-level dependency tracking: knows exactly which field from step N feeds into step N+1
- User input annotation: tells you upfront what you need to provide vs what the API chain produces automatically
- Interactive dependency graph: visual left-to-right flow with color-coded edges
```
User Query (plain English)
          │
          ▼
┌─────────────────────┐
│   Keyword Filter    │  Narrows 500+ tools → ~20 relevant candidates
│     (tools.py)      │  using intent maps + slug/field scoring
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│   BFS Dependency    │  Builds a directed graph on the filtered subset.
│   Graph Builder     │  Matches produces_candidates → inputs by field name.
│   (dependency.py)   │  Returns up to 5 valid chains.
└─────────┬───────────┘
          │
    ┌──── chains found? ──── YES ──▶ rank_plan()  LLM picks the best chain
    │                                             from BFS candidates
    └──── chains found? ──── NO ───▶ llm_plan()   LLM builds chain from
          │                                       scratch using tool catalog
          ▼
┌─────────────────────┐
│   Chain Validator   │  Drops hallucinated slugs.
│    + Analyzer       │  Classifies each input as user-provided vs chained.
│     (tools.py)      │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│   Graphviz Render   │  Left-to-right dependency graph.
│     (graph.py)      │  Green dashed = user input. Amber solid = chained data.
└─────────────────────┘
```
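The control flow in the diagram can be sketched as one glue function. The stage implementations are passed in as callables here so the sketch stays self-contained; the function names mirror the modules above, but the signatures are hypothetical.

```python
def plan(query, catalog, keyword_filter, build_chains, rank_plan, llm_plan,
         validate_chain):
    """Hypothetical orchestration of the pipeline: filter -> BFS ->
    rank (or build from scratch) -> validate."""
    candidates = keyword_filter(query, catalog)   # 500+ tools -> ~20
    chains = build_chains(candidates)             # BFS, up to 5 chains
    if chains:
        chain = rank_plan(query, chains)          # LLM only chooses
    else:
        chain = llm_plan(query, candidates)       # LLM builds from scratch
    return validate_chain(chain, candidates)      # drop hallucinated slugs
```

The branch mirrors the diagram: the LLM is only asked to invent a chain when the deterministic BFS path comes up empty.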
| Layer | Technology |
|---|---|
| UI | Streamlit |
| LLM inference | Groq (llama-3.3-70b-versatile) |
| Graph algorithm | Custom BFS with field-name scoring |
| Visualisation | Graphviz (python-graphviz) |
| Tool catalogues | JSON (GitHub REST API + Google Workspace API) |
This is the core of the system. Two strategies work together:
Builds a directed graph purely from field-name matching across tool definitions:

```
Tool A produces_candidates: "repo_id, full_name"
Tool B inputs:              "repo_id, title, body"
        ↓
match → draw edge A → B
```
Match scoring:

| Match type | Score | Example |
|---|---|---|
| Input ends with `_<output>` | 3 (strong) | output=`id`, input=`issue_id` |
| Exact field name match | 2 | output=`repo_name`, input=`repo_name` |
| Output is `id`, input ends `_id` | 1 (weak) | generic id propagation |
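The scoring rules can be sketched as a small function. The check order and the exact predicate for the weak rule are my interpretation; the README only describes it as "generic id propagation".

```python
def match_score(output_field: str, input_field: str) -> int:
    """Score one produces_candidates -> inputs pairing using the
    3/2/1 rules from the table above (a sketch, not the real code)."""
    if input_field.endswith("_" + output_field):
        return 3   # strong: output=id feeds input=issue_id
    if input_field == output_field:
        return 2   # exact field-name match
    if output_field.endswith("id") and input_field.endswith("_id"):
        return 1   # weak: one *id field flowing into another *_id field
    return 0
```

A score of 0 means no edge is drawn between the two tools for that field pair.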
Domain filtering: tools are grouped by domain (ISSUE, PULL, GMAIL, CALENDAR, …). Edges are only drawn within the same domain by default, preventing nonsensical chains like LIST_COMMITS → CREATE_ISSUE. The filter relaxes automatically for cross-service goals (e.g. Calendar → Gmail for "daily agenda email").
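A minimal sketch of that guard, assuming a slug-to-domain map (the slugs and grouping below are illustrative, not the real catalog):

```python
DOMAINS = {  # hypothetical slug -> domain grouping
    "GITHUB_GET_AN_ISSUE": "ISSUE",
    "GITHUB_CREATE_AN_ISSUE": "ISSUE",
    "GITHUB_MERGE_A_PULL_REQUEST": "PULL",
    "GOOGLESUPER_REPLY_TO_THREAD": "GMAIL",
}

def edge_allowed(src: str, dst: str, cross_service: bool = False) -> bool:
    """Only draw dependency edges within one domain, unless the goal
    was detected as cross-service (e.g. Calendar -> Gmail)."""
    if cross_service:
        return True
    return DOMAINS.get(src) == DOMAINS.get(dst)
```

With `cross_service=False`, a PULL tool can never feed a GMAIL tool, which is exactly the class of nonsensical chain the filter exists to block.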
BFS explores the graph from the most-connected nodes and returns up to 5 valid chains.
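The chain enumeration can be sketched as a breadth-first search over the edge list. This is a simplification: the real builder also weights edges by match score, and the depth cap here is an assumed value.

```python
from collections import deque

def bfs_chains(edges, starts, max_chains=5, max_depth=4):
    """Collect up to max_chains simple paths by BFS, seeded at the
    given start nodes (the real system seeds at the most-connected ones)."""
    chains = []
    queue = deque([start] for start in starts)
    while queue and len(chains) < max_chains:
        path = queue.popleft()
        nexts = [b for a, b in edges if a == path[-1] and b not in path]
        if not nexts or len(path) >= max_depth:
            if len(path) > 1:
                chains.append(path)   # dead end or depth cap: record chain
            continue
        for b in nexts:
            queue.append(path + [b])
    return chains
```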
If BFS found chains, rank_plan() sends all candidates to llama-3.3-70b-versatile (Groq) and asks it to pick the best one for the user's goal. The LLM only chooses, it never invents, which makes this path much more reliable.
If BFS found nothing, llm_plan() sends the filtered tool catalog directly and asks the LLM to construct the chain from scratch, using strict rules about field satisfaction and fetch-before-act ordering.
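A sketch of how the ranking prompt might be assembled (the wording is hypothetical; the README does not show the real rank_plan() prompt). Constraining the model to answer with a candidate number is what keeps it choosing rather than inventing:

```python
def build_rank_prompt(goal: str, chains) -> str:
    """Build a ranking prompt that lists the BFS candidates and asks
    the LLM only to pick one by number."""
    lines = [f"Goal: {goal}", "Candidate chains:"]
    for i, chain in enumerate(chains, 1):
        lines.append(f"{i}. " + " -> ".join(chain))
    lines.append("Reply with the number of the best chain only.")
    return "\n".join(lines)
```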
Before BFS runs, the tool catalogue is narrowed from 500+ to ~20 using a two-pass filter:
Pass 1: intent matching. The user query is compared against a curated intent map:

```
"merge pr": ["pull", "merge", "repo", "get"]
"reply to email": ["reply", "message", "thread", "email", "gmail", "fetch"]
```

The best-matching intent's keywords are used to score tools.
Pass 2: slug + field scoring.

```
score += 3 if keyword in tool slug
score += 1 if keyword in tool inputs
score += 1 if keyword in tool produces_candidates
```
Tools that introduce create/delete/send actions are suppressed unless the user's query explicitly contains those verbs, preventing CREATE_REPOSITORY from appearing in a "create issue" chain.
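Pass 2 plus the destructive-verb guard can be sketched together. This assumes the catalog's comma-separated `inputs` / `produces_candidates` strings are scored by substring match; the actual heuristics in tools.py may differ.

```python
DESTRUCTIVE = ("create", "delete", "send")

def score_tool(tool: dict, keywords, query: str) -> int:
    """Pass-2 slug/field scoring with a guard that zeroes out
    destructive tools unless the query asked for such an action."""
    slug = tool["slug"].lower()
    if any(v in slug for v in DESTRUCTIVE) and \
            not any(v in query.lower() for v in DESTRUCTIVE):
        return 0   # suppress create/delete/send tools for read-only goals
    score = 0
    for kw in keywords:
        if kw in slug:
            score += 3
        if kw in tool.get("inputs", ""):
            score += 1
        if kw in tool.get("produces_candidates", ""):
            score += 1
    return score
```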
---
Rendered with Graphviz (rankdir=LR, splines=polyline).
```
[USER INPUT] ──(green dashed)──▶ [TOOL A] ──(amber solid)──▶ [TOOL B]
  owner                            repo_id                     issue_id
  repo_name                        full_name
  title
```
| Edge type | Color | Meaning |
|---|---|---|
| Green dashed | `#16a34a` | Field provided directly by the user |
| Amber solid | `#d97706` | Field produced by a previous tool in the chain |
| Grey solid | `#94a3b8` | Control-flow only (no explicit field match) |
Each tool node shows its short name and the fields it produces, color-coded by service for Google tools (Gmail = red, Calendar = blue, Drive = green, Sheets = teal).
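A sketch of the DOT output that graph.py ultimately hands to Graphviz, using the edge scheme from the table. This emits DOT source as plain strings rather than going through the python-graphviz API, purely for illustration.

```python
STYLES = {
    "user":    ("#16a34a", "dashed"),  # field supplied by the user
    "chained": ("#d97706", "solid"),   # field produced by an earlier tool
    "flow":    ("#94a3b8", "solid"),   # control-flow only
}

def edge_dot(src: str, dst: str, field: str, kind: str) -> str:
    """Render one labelled, color-coded DOT edge."""
    color, style = STYLES[kind]
    return f'"{src}" -> "{dst}" [label="{field}", color="{color}", style={style}];'

def render_dot(edges) -> str:
    """Wrap the edges in a left-to-right digraph, as in graph.py."""
    body = "\n".join("  " + edge_dot(*e) for e in edges)
    return "digraph plan {\n  rankdir=LR;\n  splines=polyline;\n" + body + "\n}"
```

The resulting string can be passed straight to Streamlit's `st.graphviz_chart`, which accepts DOT source.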
```
├── app.py
├── tools.py
├── dependency.py
├── planner.py
├── graph.py
├── ui.py
└── styles.py
```
Each file has a single responsibility. app.py imports from all modules but contains no business logic itself.
You provide: owner, repo_name, pr_number
```
GITHUB_GET_A_REPOSITORY
  → GITHUB_GET_A_PULL_REQUEST
  → GITHUB_CHECK_IF_PULL_REQUEST_HAS_BEEN_MERGED
  → GITHUB_MERGE_A_PULL_REQUEST
```
| Step | Needs from you | Gets from prev step | Produces |
|---|---|---|---|
| GET_A_REPOSITORY | owner, repo_name | – | repo_id, full_name |
| GET_A_PULL_REQUEST | pr_number | repo_id | sha, mergeable, mergeable_state |
| CHECK_IF_...MERGED | – | repo_id, pr_number | merged (bool) |
| MERGE_A_PULL_REQUEST | – | sha, repo_id, pr_number | merged, sha |
You provide: user_id, thread_id, message_body
```
GOOGLESUPER_FETCH_MESSAGE_BY_THREAD_ID
  → GOOGLESUPER_REPLY_TO_THREAD
```
| Step | Needs from you | Gets from prev step | Produces |
|---|---|---|---|
| FETCH_MESSAGE_BY_THREAD_ID | user_id, thread_id | – | message_id, headers, body |
| REPLY_TO_THREAD | message_body | thread_id, message_id | message_id, thread_id |
```bash
# Install dependencies
pip install streamlit groq graphviz

# Set your Groq API key
export GROQ_API_KEY=your_key_here

# Run
streamlit run app.py
```

Tool JSON schema (one entry per tool):
```json
{
  "slug": "GITHUB_CREATE_AN_ISSUE",
  "description": "Creates a new issue in a repository.",
  "inputs": "owner,repo,title,body",
  "output": "IssueResponse",
  "produces_candidates": "issue_number,issue_id,url"
}
```

**Why BFS first, LLM second?** BFS is deterministic and grounded in real field names from the tool catalog. The LLM is better at understanding intent but prone to hallucinating tool names. Using BFS to generate candidates and the LLM to rank them combines the strengths of both.
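Entries in that schema can be loaded and normalized with a few lines; the comma-separated strings are split into lists so the planner can match fields individually. The file name and helper names here are assumptions.

```python
import json

REQUIRED = ("slug", "description", "inputs", "output", "produces_candidates")

def parse_tool(entry: dict) -> dict:
    """Validate one catalog entry against the schema above and split
    its comma-separated field strings into lists."""
    missing = [k for k in REQUIRED if k not in entry]
    if missing:
        raise ValueError(f"tool entry missing keys: {missing}")
    return {
        **entry,
        "inputs": entry["inputs"].split(","),
        "produces_candidates": entry["produces_candidates"].split(","),
    }

def load_catalog(path: str) -> list:
    """Load a JSON array of tool entries, e.g. a hypothetical tools.json."""
    with open(path, encoding="utf-8") as f:
        return [parse_tool(e) for e in json.load(f)]
```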
**Why Groq with temperature=0.0?**
Dependency chains must be reproducible. Any randomness introduces hallucinated slugs or wrong ordering. Zero temperature makes the LLM behave as a deterministic ranker.
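Concretely, the request kwargs for Groq's chat-completions call would pin the temperature; a minimal sketch (the prompt content is illustrative, and only the kwargs dict is shown rather than a live `client.chat.completions.create(**kwargs)` call):

```python
def ranker_request(prompt: str) -> dict:
    """Build the kwargs for the Groq chat-completions call.
    temperature=0.0 makes ranking reproducible for a fixed prompt."""
    return {
        "model": "llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }
```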
**Why a keyword pre-filter before BFS?** Running BFS on all 500+ tools generates a graph with thousands of weak edges and meaningless chains. Filtering to ~20 relevant tools first makes BFS fast, precise, and scoped to the actual goal.
**Why Graphviz over a JS library?**
Streamlit's iframe sandbox blocks CDN-loaded JS libraries (vis-network, d3) at runtime. Graphviz renders to SVG natively via st.graphviz_chart with no network dependency.