The first open-source, self-hosted voice agent that actually drives your iPhone, built for the Chinese app ecosystem.
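Hold the mic. Say "打开淘宝搜 AirPods" (open Taobao and search for AirPods) and your phone does it.
Ask "北京明天天气怎么样" (what's the weather in Beijing tomorrow?) and it answers. Follow up with "那后天呢" (and the day after?) and the context is preserved.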
Existing options either can't drive Chinese apps by voice (Apple Intelligence, Siri Shortcuts), require dedicated hardware (Rabbit R1, Humane), or are research demos with no voice loop (Mobile-Agent, AppAgent). This is the first end-to-end voice → LLM → iPhone loop tuned for the Chinese app ecosystem, running entirely on your own Azure OpenAI subscription, your own LAN, and your own keys.
| | Voice-in | Drives Chinese apps | Self-hosted | Open source | Multi-turn ctx | No new hardware |
|---|---|---|---|---|---|---|
| personal-assistant (this) | ✅ | ✅ 15+ | ✅ | ✅ MIT | ✅ | ✅ |
| Apple Intelligence / Siri | ✅ | ❌ | ❌ | ❌ | partial | ✅ |
| Rabbit R1 / Humane Pin | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ |
| Mobile-Agent / AppAgent (research) | ❌ text only | ✅ | ✅ | ✅ | ❌ | ✅ |
| Auto-GPT / OpenInterpreter | ❌ | ❌ | ✅ | ✅ | ✅ | n/a |
A complete voice → intent → device-action loop running on your own infra:
🎙 iPhone PWA (hold-to-talk)
↓ HTTPS
🧠 FastAPI server (LAN, mDNS pa-agent.local)
├─ Whisper (ASR)
├─ GPT-4.1 (chat with multi-turn memory + screen vision)
├─ Realtime TTS (Azure WS, alloy voice)
└─ Intent planner → action chain
↓
📱 Tethered iPhone
├─ devicectl (open any installed app by name)
├─ WebDriverAgent (tap / type / swipe / screenshot)
├─ URL schemes (tel:, taobao:, openjd:, iosamap:, …)
└─ Open-Meteo (weather lookups)
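
As an illustration of the weather lookup in the diagram above, here is a minimal sketch of building a request against Open-Meteo's public forecast endpoint. The function name `build_forecast_url`, the choice of daily variables, and the rounding of coordinates are assumptions for illustration; the project's actual weather adapter may request different fields.

```python
from urllib.parse import urlencode

# Open-Meteo's public forecast endpoint (no API key required).
OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float, days: int = 3) -> str:
    """Build a daily-forecast request URL for the given coordinates.

    Hypothetical helper: the variable list below is an assumption,
    not necessarily what the real adapter asks for.
    """
    params = {
        "latitude": f"{latitude:.4f}",
        "longitude": f"{longitude:.4f}",
        # Daily aggregates: high/low temperature and rain probability.
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_probability_max",
        # Resolve day boundaries in the location's local time zone.
        "timezone": "auto",
        "forecast_days": days,
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{OPEN_METEO_FORECAST}?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Approximate coordinates for Beijing, used only as sample input.
    print(build_forecast_url(39.9042, 116.4074, days=2))
```

Fetching the URL returns JSON whose `daily` object holds one array per requested variable, which is enough to answer follow-ups such as "那后天呢" by indexing a different day.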
| | Feature | Try saying |
|---|---|---|
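| 📱 | Open any app | "打开微信 / 京东 / 小红书" (open WeChat / JD / Xiaohongshu) |
| 🔍 | Search inside apps (15+ apps) | "淘宝搜 AirPods Pro" (search Taobao for AirPods Pro) / "B站搜原神" (search Bilibili for Genshin Impact) |
| 📞 | Phone calls | "打电话给 13812345678" (call 13812345678) |
| 🌦 | Weather (real data) | "上海周末会下雨吗" (will it rain in Shanghai this weekend?) |
| 👁 | Screen vision | "看看我屏幕上写的什么" (read what's on my screen) |
| 🗺 | Navigation | "导航去最近的星巴克" (navigate to the nearest Starbucks) |
| 🎵 | Music | "播放周杰伦" (play Jay Chou) |
| ⏱ | Timer / reminder | "5分钟后提醒我" (remind me in 5 minutes) |
| 🔗 | Multi-step chains | "打开淘宝然后搜 iPhone 壳" (open Taobao, then search for iPhone cases) |
| 💬 | Multi-turn context | "明天天气" (tomorrow's weather) → "那后天呢?" (and the day after?) |
| 🧠 | Personal RAG memory | "我女朋友几号生日?" (when is my girlfriend's birthday?) / "帮我推荐午餐 注意我的过敏" (recommend lunch, mind my allergies) |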
| 📲 | Mobile PWA | Add-to-Home-Screen, hold-to-talk, dark UI |
| 🌐 | Zero config networking | mDNS + auto LAN-IP detection |
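
To make the "multi-step chains" row concrete, the sketch below splits one utterance into ordered steps and carries the app name from an "open" step into a following "search" step. It is a rule-based stand-in written for illustration only: the real intent planner is not shown in this README and may rely on the LLM instead. The `Step` type, the `plan` function, and the connective list are all hypothetical, and the regexes will misparse app names that themselves contain "搜".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical connectives that separate steps inside one utterance.
_STEP_SPLIT = re.compile(r"然后|接着")
# "<app>搜<query>" or "<app>搜索<query>"; the app part may be empty.
_SEARCH = re.compile(r"(?P<app>.*?)搜索?\s*(?P<query>.+)")
_OPEN = re.compile(r"打开\s*(?P<app>.+)")
_CALL = re.compile(r"打电话给\s*(?P<number>\d{3,})")


@dataclass(frozen=True)
class Step:
    kind: str                  # "open_app", "search", "call" or "chat"
    argument: str              # app name, query, tel: URL, or raw text
    app: Optional[str] = None  # target app for "search" steps, if known


def plan(utterance: str) -> list[Step]:
    """Turn one utterance into an ordered list of steps."""
    steps: list[Step] = []
    last_app: Optional[str] = None
    for part in _STEP_SPLIT.split(utterance):
        part = part.strip()
        if not part:
            continue
        if m := _CALL.fullmatch(part):
            steps.append(Step("call", f"tel:{m['number']}"))
        elif m := _SEARCH.fullmatch(part):
            # Fall back to the most recently opened app when none is named.
            app = m["app"].removeprefix("打开").strip() or last_app
            steps.append(Step("search", m["query"].strip(), app))
            last_app = app
        elif m := _OPEN.fullmatch(part):
            last_app = m["app"].strip()
            steps.append(Step("open_app", last_app))
        else:
            # Anything unrecognised is handed to the chat model as-is.
            steps.append(Step("chat", part))
    return steps


if __name__ == "__main__":
    for step in plan("打开淘宝然后搜 iPhone 壳"):
        print(step)
```

Keeping each step as a small immutable record makes the chain easy to log and to replay against a different adapter.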
# 1. install
./scripts/bootstrap.sh
# 2. configure Azure OpenAI keys
cp .env.example .env && $EDITOR .env
# 3. start
PA_PORT=8780 ./scripts/start.sh
# → http://192.168.1.x:8780/ (LAN)
# → http://pa-agent.local:8780/ (mDNS)
# 4. open the URL on your iPhone Safari → Add to Home Screen
For phone control, set `PA_IOS_DEVICE_UDID` in `.env` and run WebDriverAgent.
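
The LAN URL printed in step 3 implies that the server works out its own address. How `start.sh` does this is not shown here; one common approach, sketched below under that assumption, is to open a UDP socket and ask the kernel which local address it would route from. The names `detect_lan_ip` and `lan_url` are hypothetical.

```python
import ipaddress
import socket


def detect_lan_ip(fallback: str = "127.0.0.1") -> str:
    """Return the local IPv4 address the OS would use for outbound traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only asks the
        # kernel to choose a route, which reveals the outgoing interface.
        sock.connect(("192.0.2.1", 9))  # TEST-NET-1 address, never contacted
        return sock.getsockname()[0]
    except OSError:
        # No usable route, for example on a machine with no network.
        return fallback
    finally:
        sock.close()


def lan_url(port: int) -> str:
    """Format the URL a phone on the same network would open."""
    return f"http://{detect_lan_ip()}:{port}/"


if __name__ == "__main__":
    address = detect_lan_ip()
    # Validate the result before printing it.
    ipaddress.ip_address(address)
    print(lan_url(8780))
```

On a machine with several interfaces this reports the one used by the default route, which may differ from the interface the iPhone can reach.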
uv run pytest # 18/18 passing
make lint type # ruff + mypy clean
Three layers; see `docs/ARCHITECTURE.md`.
src/pa/
agent/ # intent planner, multi-turn session memory
voice/ # Azure OpenAI: ASR, chat, TTS, vision
adapters/ # ios_devicectl, ios_wda, weather, echo
executor/ # Action / ActionResult contract
api/ # FastAPI routes + PWA static
memory/ # long-term memory bridge (claude-mem)
core/ # config, logging
cli.py # `pa` Typer CLI
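
The layout above mentions an Action / ActionResult contract between the planner and the adapters. The real field names are not given in this README, so the following is only a guess at what such a contract could look like, with an echo adapter standing in for a device. Every class, field, and function name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class Action:
    adapter: str  # which adapter should handle this, e.g. "echo"
    name: str     # operation name, e.g. "open_app"
    params: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ActionResult:
    ok: bool
    detail: str = ""


class Adapter(Protocol):
    def run(self, action: Action) -> ActionResult: ...


class EchoAdapter:
    """Test double: reports the action instead of touching a device."""

    def run(self, action: Action) -> ActionResult:
        return ActionResult(ok=True, detail=f"{action.name}({action.params})")


def execute(chain: list[Action], adapters: dict[str, Adapter]) -> list[ActionResult]:
    """Run actions in order and stop at the first failure."""
    results: list[ActionResult] = []
    for action in chain:
        adapter = adapters.get(action.adapter)
        if adapter is None:
            results.append(ActionResult(False, f"unknown adapter: {action.adapter}"))
            break
        result = adapter.run(action)
        results.append(result)
        if not result.ok:
            break
    return results


if __name__ == "__main__":
    chain = [Action("echo", "open_app", {"app": "微信"}), Action("echo", "tap")]
    for result in execute(chain, {"echo": EchoAdapter()}):
        print(result)
```

Stopping at the first failed step avoids, say, typing a search query into whatever app happens to be in the foreground when the "open" step did not succeed.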
- Phase 1 — knowledge ingestion (claude-mem batch pipeline)
- Phase 2 — voice loop + iPhone control (you are here)
- Phase 3 — Android adapter, calendar/email/tasks tools
- Phase 4 — proactive assistant (reminders, summaries, suggestions from memory)
- All audio/text round-trips through your own Azure OpenAI deployment.
- LAN-only by default; no cloud hosting required.
- iPhone control runs over USB-tethered WDA, so device-control traffic never leaves your network.
- Ingestion script excludes `*.key` / `*.pem` / `*.env` / `id_rsa*` by default.
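
The default exclusions in the last bullet can be expressed with shell-style patterns. The sketch below shows one way to apply them using the standard library's `fnmatch`; it assumes the patterns are matched against the file name only, which the README does not state, and `is_excluded` is a hypothetical helper name.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns taken from the list above.
DEFAULT_EXCLUDES = ("*.key", "*.pem", "*.env", "id_rsa*")


def is_excluded(path: str, patterns: tuple[str, ...] = DEFAULT_EXCLUDES) -> bool:
    """Return True if the file name matches any exclusion pattern."""
    name = Path(path).name
    # fnmatch does not treat leading dots specially, so "*.env"
    # also matches a bare ".env" file.
    return any(fnmatch(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ("notes/todo.md", ".env", "certs/server.pem", ".ssh/id_rsa.pub"):
        print(candidate, is_excluded(candidate))
```

Matching on the file name rather than the full path keeps the rule independent of where a secret happens to live in the tree.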
MIT © 2026 @wanmeng
Built with ❤️ as the world's most personal AI agent — one that knows you.