GitHub - nayanguide/agent_capstone

**Project Overview**

This project develops a multi‑agent pipeline to explore community‑driven methods for treating the common cold. The system collects raw data from Reddit forums, normalizes and consolidates treatment methods, retrieves supporting or refuting evidence from PubMed, and generates a draft article intended for physician review. The goal is to demonstrate how agent architectures can bridge informal knowledge sources with scientific validation.

**Problem Statement**

Learners and practitioners often face a gap between different stages of AI workflows: one may understand how to build or run a model but struggle with data preprocessing, or conversely, have datasets without clarity on how to extract meaningful insights. In the medical context, this translates into difficulty connecting anecdotal treatments discussed online with validated scientific evidence. Additional challenges include resource requirements (local or cloud memory, storage) and execution time when external models are involved.

**Solution Statement**

The proposed solution is a sequential multi‑agent pipeline. Each agent performs a distinct role: collecting treatments from forums, normalizing them, searching PubMed for evidence, and drafting a structured article. This modular design makes the workflow easy to start, transparent to follow, and extensible with evaluation or human‑in‑the‑loop components. By separating tasks into agents, the system reduces complexity for learners and provides a clear path from raw data to actionable insights.

Architecture

ForumCollector: gathers raw treatment methods from Reddit posts.
MethodNormalizer: consolidates duplicates and standardizes terminology using Gemini.
ArticleFinder: queries PubMed for scientific articles linked to each method.
MythPostWriter: produces a draft article with structured sections and placeholders for physician comments.
Orchestrator: manages sequential execution of agents.
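The sequential hand-off between the agents listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual Orchestrator: the `run_pipeline` helper and the four placeholder stage functions are hypothetical stand-ins, and no ADK, Reddit, PubMed, or Gemini call is made.

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator")

# A stage is a name plus a function that maps the previous output to its own.
Stage = tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: list[Stage], initial_input: Any) -> dict[str, Any]:
    """Run stages in order, feeding each stage's output into the next one.

    Stops early when a stage returns an empty result, so later stages
    are skipped instead of running on nothing.
    """
    artifacts: dict[str, Any] = {}
    data = initial_input
    for name, stage in stages:
        log.info("running %s", name)
        data = stage(data)
        artifacts[name] = data
        if not data:
            log.warning("%s returned an empty result; stopping early", name)
            break
    return artifacts


# Hypothetical placeholder stages standing in for the real agents.
def forum_collector(query: dict) -> list[dict]:
    return [{"method": "Honey tea"}, {"method": "honey tea"}]


def method_normalizer(raw: list[dict]) -> list[str]:
    return sorted({item["method"].lower() for item in raw})


def article_finder(methods: list[str]) -> dict[str, list]:
    return {method: [] for method in methods}


def myth_post_writer(evidence: dict[str, list]) -> str:
    return "\n".join(f"## {method}" for method in evidence)


stages = [
    ("ForumCollector", forum_collector),
    ("MethodNormalizer", method_normalizer),
    ("ArticleFinder", article_finder),
    ("MythPostWriter", myth_post_writer),
]
artifacts = run_pipeline(stages, {"keywords": ["cold remedy"]})
print(artifacts["MythPostWriter"])
```

Keeping every intermediate result in `artifacts` makes it straightforward to write each one to its own file afterwards.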

Concepts: EvaluationAgent (metrics and coverage) and Loop Agent (human‑in‑the‑loop with physician feedback).

Outcome: The pipeline outputs structured JSON files (methods.json, normalized_methods.json, evidence.json) and a Markdown draft (draft.md). This demonstrates how multi‑agent systems can bridge community knowledge and scientific evidence, while leaving room for expert evaluation and iterative refinement.
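One way to picture the three JSON artifacts is as typed records. The sketch below is an assumption about their shape: the field names `method`, `context`, `url`, `timestamp`, `id`, `canonical_name`, `aliases`, and `notes` follow this README, while `abstract_url`, the choice to key evidence by method id, and the `unmatched_ids` helper are hypothetical.

```python
import json
from typing import TypedDict


class RawMethod(TypedDict):          # one entry in methods.json
    method: str
    context: str
    url: str
    timestamp: str


class NormalizedMethod(TypedDict):   # one entry in normalized_methods.json
    id: str
    canonical_name: str
    aliases: list[str]
    notes: str


class Article(TypedDict):            # one article inside evidence.json
    pmid: str
    title: str
    abstract_url: str                # assumed name for the "abstract link"


def unmatched_ids(normalized: list[NormalizedMethod],
                  evidence: dict[str, list[Article]]) -> list[str]:
    """Return evidence keys that do not match any normalized method id."""
    known = {method["id"] for method in normalized}
    return sorted(set(evidence) - known)


# Made-up sample data, used only to exercise the check.
normalized: list[NormalizedMethod] = [
    {"id": "m1", "canonical_name": "honey", "aliases": ["honey tea"], "notes": ""},
]
evidence: dict[str, list[Article]] = {"m1": [], "m9": []}

print(json.dumps(normalized, indent=2))
print(unmatched_ids(normalized, evidence))
```

A check like this is one simple way to keep IDs consistent across the files.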

**Pipeline overview**

ForumCollector (based on Conversational Bot):

Role: Collect community‑reported cold treatment methods from Reddit.
Changes from the template:
  Input redesign: Replace free‑text dialog with search parameters (keywords, subreddits, timeframe).
  Processing swap: Use Reddit API calls instead of LLM turns; define minimal fields (method, context, URL, timestamp).
  Output format: Emit structured JSON for downstream use.
Output: methods.json (raw methods with context and links).

MethodNormalizer (based on LLM Auditor):

Role: Consolidate duplicates, unify naming, and standardize the list of treatments.
Changes from the template:
  Prompting: Create normalization instructions (merge near‑duplicates, map slang to medical terms, drop non‑treatments).
  Schema enforcement: Require a stable JSON schema (id, canonical_name, aliases, notes).
  Determinism: Add simple heuristics (lowercasing, token dedup) to reduce LLM variance.
Output: normalized_methods.json (unique, canonicalized methods).

ArticleFinder (based on Conversational Bot):

Role: Retrieve PubMed evidence for each normalized method.
Changes from the template:
  Query building: Generate method‑specific queries (canonical name + synonyms, optional filters like review/clinical trial).
  API integration: Use PubMed eutils; paginate; cap results; store key metadata (PMID, title, abstract link).
  Mapping: Maintain method→articles mapping for traceability.
Output: evidence.json (articles grouped by method).

MythPostWriter (based on LLM Auditor):

Role: Produce a physician‑oriented draft article summarizing what works and what doesn’t.
Changes from the template:
  Structured sections: Introduction, methods overview, evidence summary per method, practical notes, limitations.
  Citation style: Inline PubMed links (PMIDs/URLs) with consistent formatting; placeholders for physician comments.
  Tone control: Neutral, clinician‑friendly, and concise, with bullet summaries per method.
Output: draft.md (Markdown draft with citations and “Physician comment: __” fields).

Orchestrator (ADK Graph / sequential runner):

Role: Define and execute the end‑to‑end workflow.
Changes from the template:
  Node definition: Wrap each agent as a node with clear inputs/outputs.
  Edge wiring: ForumCollector → MethodNormalizer → ArticleFinder → MythPostWriter.
  State passing: Enforce JSON/Markdown contracts, simple logging, and basic failure handling (skip empty results).
Output: Reproducible run producing all artifacts.

**Essential tools and data contracts**

APIs: Reddit (collection), PubMed eutils (evidence), Gemini (normalization + drafting).
Data formats:
  methods.json: list of raw items with context and URLs.
  normalized_methods.json: canonical method objects with aliases.
  evidence.json: method‑keyed arrays of article metadata.
  draft.md: structured Markdown with physician placeholders.
Standards: Stable schemas, consistent IDs across files, reproducible prompts, capped API calls for speed.

**Contribution highlights**

Template adaptation: Converted dialog‑first templates into data‑first agents with explicit inputs/outputs.
Schema design: Defined JSON contracts that make agents composable and testable.
Query strategy: Built synonym‑aware queries for PubMed to improve recall while limiting noise.
Draft structure: Designed clinician‑friendly Markdown with clear sections and embedded evidence.
Orchestration: Assembled a sequential pipeline ready for local runs and future ADK graph execution.

**Roadmap and planned extensions**

Physician verification (human‑in‑the‑loop):
  Plan: Add a review stage where a physician fills placeholders; iterate the draft based on feedback.
  Future: Loop Agent to re‑generate targeted sections until “approved”.
EvaluationAgent (concept initially):
  Plan: Report coverage (methods collected/normalized/evidenced), link checks, draft length, and citation presence.
  Future: Quality heuristics (e.g., minimum evidence per method).
Scalability:
  Parallelization: Batch ArticleFinder queries; cache results.
  Deployment: Move to Vertex AI pipelines; optional Notion integration for progress tracking.
Compliance and provenance:
  Plan: Track PMIDs, timestamps, and source URLs for auditability; mark speculative claims and limitations.

**Essential Tools and Utilities**

Reddit API (praw): Used in ForumCollector to gather posts from selected subreddits. Allows filtering by keywords and timeframe. Produces raw data about cold treatment methods, saved into methods.json.
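The keyword and timeframe filtering can be illustrated without network access. The sketch below assumes the posts have already been fetched and converted to plain dicts whose keys mirror praw submission attributes (`title`, `selftext`, `permalink`, `created_utc`); the `collect_methods` helper, the use of the post title as the `method` value, and the sample posts are all hypothetical.

```python
from datetime import datetime, timezone


def collect_methods(posts, keywords, since):
    """Keep posts created at or after `since` that mention any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    items = []
    for post in posts:
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        if created < since:
            continue  # outside the requested timeframe
        text = f'{post["title"]} {post["selftext"]}'.lower()
        if not any(keyword in text for keyword in lowered):
            continue  # no keyword match
        items.append({
            "method": post["title"],              # assumption: title stands in for the method
            "context": post["selftext"][:200],    # short excerpt for context
            "url": "https://www.reddit.com" + post["permalink"],
            "timestamp": created.isoformat(),
        })
    return items


# Made-up posts, used only to exercise the filter.
posts = [
    {"title": "Zinc lozenges helped me", "selftext": "Took them at the first sign of a cold.",
     "permalink": "/r/example/comments/abc123/zinc/", "created_utc": 1735689600},
    {"title": "Old thread about zinc", "selftext": "",
     "permalink": "/r/example/comments/def456/old/", "created_utc": 1704067200},
    {"title": "Unrelated post", "selftext": "Nothing here.",
     "permalink": "/r/example/comments/ghi789/other/", "created_utc": 1735689600},
]
methods = collect_methods(posts, ["zinc", "honey"], datetime(2024, 6, 1, tzinfo=timezone.utc))
print(methods)
```

Only the first sample post survives: the second is older than the cutoff and the third matches no keyword.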

PubMed API (NCBI eutils): Used in ArticleFinder to search scientific publications. Builds queries from normalized methods and retrieves metadata such as PMID, title, and abstract link. Produces evidence.json with articles grouped by method.
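A synonym-aware query for the eutils `esearch` endpoint might be assembled as follows. This sketch only builds the request URL and sends nothing; the `build_term` and `build_esearch_url` helpers, the added `"common cold"` clause, and the default cap of 20 results are assumptions rather than the project's actual query strategy.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(canonical_name, aliases, publication_types=()):
    """Combine a method name and its aliases into one PubMed search term.

    [tiab] restricts a phrase to title/abstract; [pt] filters by publication type.
    """
    names = [canonical_name, *aliases]
    name_clause = " OR ".join(f'"{name}"[tiab]' for name in names)
    term = f'({name_clause}) AND "common cold"[tiab]'
    if publication_types:
        type_clause = " OR ".join(f'"{ptype}"[pt]' for ptype in publication_types)
        term += f" AND ({type_clause})"
    return term


def build_esearch_url(term, retmax=20, retstart=0):
    """Build an esearch URL; retstart/retmax give simple capped pagination."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
        "retstart": retstart,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_term("zinc", ["zinc lozenges"], ["Review"])
url = build_esearch_url(term)
print(term)
print(url)
```

Requesting the next page would mean calling `build_esearch_url(term, retstart=20)` with the same term.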

Gemini (Vertex AI): Used in MethodNormalizer to consolidate and standardize methods, and in MythPostWriter to generate draft articles. Produces normalized_methods.json and draft.md.
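Before any Gemini call, the lowercasing and token-dedup heuristics mentioned for MethodNormalizer could collapse trivial duplicates deterministically. The sketch below is one possible reading of those heuristics; the `normalize_key` and `group_duplicates` helpers are hypothetical and no LLM is involved.

```python
import re


def normalize_key(text):
    """Lowercase, strip punctuation, and drop repeated tokens (order kept)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    unique = []
    for token in tokens:
        if token not in unique:
            unique.append(token)
    return " ".join(unique)


def group_duplicates(raw_names):
    """Group raw method names that share the same normalized key."""
    groups = {}
    for name in raw_names:
        groups.setdefault(normalize_key(name), []).append(name)
    return groups


groups = group_duplicates(["Honey tea", "honey, tea", "Honey honey TEA!", "Zinc"])
print(groups)
```

Each group could then be passed to the model as a single candidate with its aliases, which reduces the number of decisions left to the LLM.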

Jupyter Notebook: Development and testing environment for agents. Supports step‑by‑step execution and inspection of intermediate files.

Python libraries (requests, json, logging): Provide basic functionality for API calls, data processing, and saving results in JSON or Markdown formats.

ADK (Agent Development Kit): Used to describe the pipeline as a graph of agents. Defines nodes and edges, manages sequential execution, and ensures reproducibility.

Markdown: Format for the final draft article. Easy to read for physicians and suitable for publication.
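As an illustration of the draft format, the sketch below renders one Markdown section per method with inline PubMed links and a physician placeholder. The `render_method_section` and `render_draft` helpers are hypothetical, evidence is assumed to be keyed by canonical method name, and the PMID and title are obvious placeholders rather than real references.

```python
def render_method_section(name, articles):
    """Render one method as a Markdown section with citations and a placeholder."""
    lines = [f"## {name}", ""]
    if articles:
        for article in articles:
            link = f'https://pubmed.ncbi.nlm.nih.gov/{article["pmid"]}/'
            lines.append(f'- [{article["title"]}]({link}) (PMID {article["pmid"]})')
    else:
        lines.append("- No PubMed articles retrieved for this method.")
    lines += ["", "Physician comment: __", ""]
    return "\n".join(lines)


def render_draft(title, evidence):
    """Join a title and one section per method into a single Markdown string."""
    parts = [f"# {title}", ""]
    for name, articles in evidence.items():
        parts.append(render_method_section(name, articles))
    return "\n".join(parts)


# Placeholder evidence; the PMID and title below are not real references.
evidence = {
    "zinc": [{"pmid": "00000000", "title": "Example article title"}],
    "honey": [],
}
draft = render_draft("Common cold remedies: community claims and evidence", evidence)
print(draft)
```

Methods without retrieved articles still get a section, so the physician can comment on them as well.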

Notion: Used for progress tracking and documentation. Provides checklists, tables, and visual representation of architecture and tasks.

Planned extensions: EvaluationAgent to report metrics such as coverage and citation presence; Loop Agent to enable human‑in‑the‑loop physician review and iterative draft updates.
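Since EvaluationAgent is only a concept at this stage, the following is a speculative sketch of the kind of report it might produce. The `coverage_report` function, its metric names, and the assumption that evidence is keyed by method id are all hypothetical.

```python
def coverage_report(raw, normalized, evidence, draft):
    """Summarize how many methods survived each stage and whether the draft cites PubMed."""
    evidenced = [method for method in normalized if evidence.get(method["id"])]
    return {
        "methods_collected": len(raw),
        "methods_normalized": len(normalized),
        "methods_with_evidence": len(evidenced),
        "evidence_coverage": len(evidenced) / len(normalized) if normalized else 0.0,
        "draft_length_chars": len(draft),
        "draft_has_citations": "pubmed.ncbi.nlm.nih.gov" in draft,
    }


# Made-up inputs, used only to exercise the report.
raw = [{"method": "Honey tea"}, {"method": "honey tea"}, {"method": "Zinc"}]
normalized = [{"id": "m1", "canonical_name": "honey"}, {"id": "m2", "canonical_name": "zinc"}]
evidence = {"m1": [{"pmid": "00000000"}], "m2": []}
draft = "See https://pubmed.ncbi.nlm.nih.gov/00000000/"

report = coverage_report(raw, normalized, evidence, draft)
print(report)
```

A quality heuristic such as a minimum number of articles per method could be added as one more key in the same report.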

**Conclusion**

This project demonstrates how a modular, multi‑agent pipeline can bridge informal community knowledge with scientific validation. By adapting existing agent templates into data‑driven modules, the system shows a clear path from raw Reddit discussions to structured, evidence‑based drafts ready for physician review.

Template adaptation: transforming conversational and auditing agents into collectors, normalizers, evidence finders, and draft writers.
Schema design: enforcing consistent JSON/Markdown outputs for interoperability.
Pipeline orchestration: assembling agents into a reproducible sequential workflow.
Conceptual extensions: outlining EvaluationAgent for metrics and Loop Agent for human‑in‑the‑loop physician feedback.

The outcome is a set of artifacts (methods.json, normalized_methods.json, evidence.json, draft.md) and a documented architecture that can be run locally, published to GitHub, and extended to cloud environments. Future development will focus on physician verification, automated evaluation, and scaling the pipeline for broader medical or expert domains.

The same agent architecture can also be applied beyond cold treatment methods: collecting community discussions on other health topics, normalizing terminology, retrieving scientific evidence from PubMed or other databases, and generating structured drafts for expert review. This makes the system adaptable to use cases such as nutrition, mental health, or even non‑medical areas where community knowledge needs to be validated against authoritative sources.

https://github.com/nayanguide/agent_capstone.git

**License**

This writeup has been released under the Attribution 4.0 International (CC BY 4.0) license.

**Citation**

Eric Schmidt. Agent Shutton (sample submission). Kaggle, 2025. https://www.kaggle.com/competitions/agents-intensive-capstone-project/writeups/agent-shutton-sample-submission
