feat(app): mime-agnostic share-sheet ingest + vault-pit storage #1

Draft · Camaraarthur wants to merge 2 commits into develop from feat/data-ingest-vault-pit

@Camaraarthur

Owner

What

Turns the daemon Android app into a content-addressed data pit: any file shared into daemon from any other app, any mime type, gets streamed → AES-256-GCM encrypted → dedup'd → indexed in the SQLCipher vault.

Before: the share-sheet handler parsed inbound URIs but dropped the bytes (ChatScreen.kt:83-87 literally rendered "(file-bytes import lands in v0.2)").
After: bytes go all the way to disk, the vault knows about them, and the schema has the L1/L2 slots ready for future transcription / embedding / persona work to populate.

Two commits, separately reviewable

  1. e9c21bf app/v0.1: import Android orb baseline into git. First-time tracking of ~/daemon/app/ (the v0.1 orb architecture per app/SPEC.md), which had lived only as untracked working-tree files; this commit imports it as-is. Build artifacts (build/, .gradle/, *.apk), local.properties, and the parked aquarium/ TripoSR pipeline are excluded.

  2. ba2b847 feat(app): mime-agnostic share-sheet ingest + vault-pit storage. The actual feature work, scoped to:

    • AndroidManifest.xml — collapse 4 mime-specific filters to one catch-all */* per action with android:order=999; add <meta-data> pointing to res/xml/shortcuts.xml.
    • res/xml/shortcuts.xml (new) — <share-target> for direct-share hinting.
    • vault/Vault.kt — schema v1→v2 migration adding files, message_files, derivations tables + an internal fileBlobKey() derived from the SQLCipher passphrase via HMAC-SHA-256 with "daemon-file-blob-v1" for domain separation.
    • vault/FileStore.kt (new) — URI → encrypted sidecar at filesDir/blobs/<sha256>.enc. Content-addressed dedup. open(FileRow) returns a CipherInputStream for future readers.
    • ui/ChatScreen.kt — replaces the metadata-only stub with a real ingest pipeline. Renders 📎 name · mime · size · ✓ in vault and persists message_files linkage.
    • Confirm-on-huge: ≥200 MB total share triggers an "import N GB?" dialog. Does not reject — just asks. Per the "data pit, never lose user data" intent.
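
The fileBlobKey() derivation mentioned in the Vault.kt bullet can be sketched with the JDK's javax.crypto.Mac. This is a minimal illustration, not the code in vault/Vault.kt. It assumes the SQLCipher passphrase is available as raw bytes and is used as the HMAC key, with the "daemon-file-blob-v1" label as the message, which is how the commit message's HMAC-SHA-256(passphrase, "daemon-file-blob-v1") notation reads. The constant name FILE_BLOB_LABEL is hypothetical.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Domain-separation label quoted in the PR; the constant name is hypothetical.
const val FILE_BLOB_LABEL = "daemon-file-blob-v1"

/**
 * Sketch of fileBlobKey(): HMAC-SHA-256 keyed with the vault passphrase
 * (assumed here to be raw bytes) over a fixed label.
 * The 32-byte MAC output is used directly as an AES-256 key.
 */
fun fileBlobKey(passphrase: ByteArray): SecretKeySpec {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(passphrase, "HmacSHA256"))
    val keyBytes = mac.doFinal(FILE_BLOB_LABEL.toByteArray(Charsets.UTF_8))
    return SecretKeySpec(keyBytes, "AES")
}
```

A single HMAC call is a reasonable derivation when the passphrase is already a high-entropy secret (for example one produced from the Keystore master key); a human-chosen passphrase would instead call for a slow KDF. Changing the label (say, to a -v2 suffix) yields an unrelated key, which is what makes the versioned label useful.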

What this PR does NOT do

  • Transcription / OCR / diarization / embedding (separate PRs — they INSERT INTO derivations)
  • L2 synthesis (summary, persona, cross-doc thread)
  • Wiring the existing ScreenshotWatcher OCR output into derivations (legacy SYSTEM-message path keeps working untouched)
  • Retrieval / RAG / chat-time lookup over imported files
  • Audio playback / preview / Descript-style editor

Test plan

  • ./gradlew assembleDebug — ✓ passing (15s in the worktree).
  • Install on Pixel 8 Pro, fingerprint-unlock.
  • Share a podcast .m4a from Pocket Casts / Files / etc. — verify daemon appears high in share sheet, chip renders with ✓ in vault.
  • adb shell run-as dev.daemon.app ls -la files/blobs/ — confirm encrypted sidecar exists.
  • Share a 1 GB+ video — verify confirm dialog appears.
  • Share the same file twice — verify dedup tag ✓ already in vault.
  • Re-share image to verify ScreenshotWatcher OCR path is unbroken.

Notes for review

  • app/SPEC.md:51 says data ingestion is v0.3. This PR moves it forward — vault-as-pit is now treated as core, per chat alignment with @Camaraarthur. SPEC.md edit deliberately not included; vision specs are human-edited.
  • The ~/daemon/CLAUDE.md rules document the v0.1 app/ architecture; that file is not on develop and is not added here either — would be a separate doc PR.

🤖 Generated with Claude Code

Camaraarthur and others added 2 commits May 19, 2026 00:33

e9c21bf app/v0.1: import Android orb baseline into git

First-time tracking of ~/daemon/app/ — the v0.1 Android orb architecture
that has lived as untracked working-tree files until now.

  - SQLCipher-encrypted vault (vault/) + biometric-gated Keystore master key
  - LlmProvider abstraction (llm/) with Echo / Anthropic / Mistral /
    OpenRouter / Gemini Nano implementations
  - PII regex strip (privacy/) + egress audit log (net/)
  - Share-with-daemon intent parsing (share/) — currently metadata-only
  - ScreenshotWatcher OCR ingest (ingest/) gated to Pictures/Screenshots/
  - Compose UI (ui/): chat shell, settings, egress audit, biometric lock

Build artifacts (build/, .gradle/, *.apk), local.properties, and the
parked aquarium/ TripoSR pipeline are excluded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ba2b847 feat(app): mime-agnostic share-sheet ingest + vault-pit storage

Before this change the share-sheet handler in app/ parsed inbound
URIs but dropped the bytes on the floor — ChatScreen rendered a
metadata-only "(file-bytes import lands in v0.2)" stub. This wires
the bytes through end to end.

  Manifest
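  - Catch-all */* intent filters for ACTION_SEND + SEND_MULTIPLE
    with android:order=999 and no mime allow-list. Note that
    android:order only orders matching filters within daemon
    itself; where daemon lands in the system share sheet is up to
    the chooser.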
  - <share-target> declared in res/xml/shortcuts.xml as a Sharing
    Shortcuts hint so Android can surface daemon in the top
    direct-share row.

  Vault (schema v1 → v2)
  - files(id, sha256 UNIQUE, name, mime, size_bytes, blob_path,
    imported_at) — L0 raw bytes index, content-addressed.
  - message_files(msg_id, file_id, ord) — junction so messages can
    reference one or more imported files.
  - derivations(id, file_id, kind, model, text, blob, meta,
    created_at) — L1/L2 slot for future transcripts, OCR,
    embeddings, summaries, persona excerpts. Empty in this PR.
  - DerivationKind conventions exposed for future pipelines.
  - internal fileBlobKey() = HMAC-SHA-256(passphrase,
    "daemon-file-blob-v1") so file encryption is domain-separated
    from SQLCipher's use of the same passphrase.

  FileStore.kt (new)
  - Streams URI bytes via ContentResolver → AES-256-GCM encrypts
    with a fresh 12-byte IV → writes to
    filesDir/blobs/<sha256>.enc.
  - Plaintext sha256 is computed in-stream via DigestInputStream
    for content-addressed dedup (already-present sha → row is
    reused, new blob discarded).
  - open(FileRow) returns a CipherInputStream that decrypts the
    blob as it is read: the entry point for future transcription /
    embedding pipelines.
  - Mime-agnostic: anything that ContentResolver can openInputStream
    is ingestible. No size limit at the store level.
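
The FileStore pipeline above (stream, hash in-stream, encrypt with a fresh 12-byte IV, dedup by plaintext sha256) can be sketched in plain JVM Kotlin. This is a hypothetical illustration rather than the real vault/FileStore.kt. The names importBlob, openBlob, and ImportResult are made up. The sketch also assumes that the IV is stored as a 12-byte prefix of each .enc file and checks dedup against the file on disk instead of the files table. On Android the input stream would come from ContentResolver.openInputStream(uri), and blobDir would be filesDir/blobs.

```kotlin
import java.io.File
import java.io.InputStream
import java.security.DigestInputStream
import java.security.MessageDigest
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.CipherInputStream
import javax.crypto.CipherOutputStream
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Assumed on-disk layout: [12-byte IV][AES-256-GCM ciphertext][16-byte tag].
private const val IV_LEN = 12
private const val TAG_BITS = 128

data class ImportResult(val sha256: String, val blob: File, val sizeBytes: Long, val deduped: Boolean)

/** Streams [input] into blobDir/<sha256>.enc, hashing the plaintext on the way through. */
fun importBlob(input: InputStream, blobDir: File, key: SecretKey): ImportResult {
    blobDir.mkdirs()
    val iv = ByteArray(IV_LEN).also { SecureRandom().nextBytes(it) } // fresh IV per blob
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(TAG_BITS, iv))
    val digest = MessageDigest.getInstance("SHA-256")

    // The sha256 is only known after the last byte, so write to a temp file first.
    val tmp = File.createTempFile("import-", ".part", blobDir)
    var size = 0L
    DigestInputStream(input, digest).use { src ->
        tmp.outputStream().use { raw ->
            raw.write(iv)
            CipherOutputStream(raw, cipher).use { enc -> size = src.copyTo(enc) }
        }
    }
    val sha = digest.digest().joinToString("") { "%02x".format(it) }
    val target = File(blobDir, "$sha.enc")
    if (target.exists()) { // content-addressed dedup: keep the existing blob, drop the new one
        tmp.delete()
        return ImportResult(sha, target, size, deduped = true)
    }
    check(tmp.renameTo(target)) { "rename failed for $target" }
    return ImportResult(sha, target, size, deduped = false)
}

/** Counterpart of open(FileRow): returns a stream of decrypted plaintext. */
fun openBlob(blob: File, key: SecretKey): InputStream {
    val raw = blob.inputStream()
    try {
        val iv = ByteArray(IV_LEN)
        var off = 0
        while (off < IV_LEN) { // read the IV header written by importBlob
            val n = raw.read(iv, off, IV_LEN - off)
            require(n > 0) { "truncated blob header: $blob" }
            off += n
        }
        val cipher = Cipher.getInstance("AES/GCM/NoPadding")
        cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(TAG_BITS, iv))
        return CipherInputStream(raw, cipher) // GCM tag is checked at end of stream
    } catch (e: Exception) {
        raw.close()
        throw e
    }
}
```

One caveat worth checking against the real implementation: the JDK's default AES-GCM implementation buffers the whole ciphertext on decrypt until the tag is verified, and Android's provider may behave similarly. A blob written as a single GCM message may therefore not actually stream on read for multi-GB imports. Splitting blobs into independently authenticated chunks is the usual way around that.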

  ChatScreen
  - SharedPayload.Files now fans through ingestFiles() which:
    - calls FileStore.import for each item (in IO),
    - renders a single chat summary
      ("📎 name · mime · size · ✓ in vault"),
    - persists the summary as a SYSTEM message and links each
      imported file to it via attachFileToMessage().
  - SharedPayload.Files ≥ 200 MB total triggers a confirm dialog
    ("import N GB into daemon?") — per Arthur's "don't fail, ask".
  - Vault-locked path emits a useful "locked — couldn't store"
    note instead of dropping silently.
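
The two ChatScreen behaviors above, the 200 MB confirm gate and the one-line chip, are small enough to sketch as pure functions. Only the threshold and the chip format come from the PR. Several details are assumptions:

- the function names;
- reading "200 MB" as 200 × 2^20 bytes;
- treating unknown sizes, assumed to arrive as negative numbers, as 0;
- falling back to application/octet-stream for a missing mime.

```kotlin
import java.util.Locale

// Threshold from the PR; interpreting "MB" as 2^20 bytes is an assumption.
const val CONFIRM_THRESHOLD_BYTES = 200L * 1024 * 1024

// Unknown sizes are assumed to arrive as negative values and count as 0.
private fun total(sizes: List<Long>): Long = sizes.sumOf { maxOf(it, 0L) }

/** True when a SharedPayload.Files batch is large enough to ask before importing. */
fun needsConfirm(sizes: List<Long>): Boolean = total(sizes) >= CONFIRM_THRESHOLD_BYTES

/** Binary-prefix size label, e.g. 1536 -> "1.5 KB". */
fun humanSize(bytes: Long): String {
    val units = listOf("B", "KB", "MB", "GB", "TB")
    var value = bytes.toDouble()
    var i = 0
    while (value >= 1024 && i < units.lastIndex) {
        value /= 1024
        i++
    }
    return if (i == 0) "$bytes B" else "%.1f %s".format(Locale.US, value, units[i])
}

/** Dialog text in the PR's "import N GB into daemon?" shape. */
fun confirmPrompt(sizes: List<Long>): String = "import ${humanSize(total(sizes))} into daemon?"

/** One chip line: "📎 name · mime · size · ✓ in vault" (or "✓ already in vault" on dedup). */
fun chipLine(name: String, mime: String?, sizeBytes: Long, deduped: Boolean): String {
    val status = if (deduped) "✓ already in vault" else "✓ in vault"
    return "📎 $name · ${mime ?: "application/octet-stream"} · ${humanSize(sizeBytes)} · $status"
}
```

Because the gate is a prompt rather than a limit, nothing here rejects a share; it only decides whether to ask first, matching the "don't fail, ask" intent.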

End-to-end effect: any file (any mime) shared into daemon lands as
encrypted, dedup'd, indexed bytes in the vault, ready for future
RAG / transcription / persona work to populate the derivations
table on top.
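
The v2 tables that this future work would build on can be sketched as DDL run from a version-gated migration. Column names follow the commit message. The SQL types, the composite primary key on message_files, the index on derivations, and the migrate/exec shape are assumptions for illustration, not the contents of vault/Vault.kt. On device, exec would presumably wrap the SQLCipher database's execSQL.

```kotlin
// Hypothetical sketch of the v1 -> v2 migration; types and constraints are assumed.
const val SCHEMA_V2 = 2

val V2_MIGRATION: List<String> = listOf(
    """
    CREATE TABLE IF NOT EXISTS files (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        sha256      TEXT    NOT NULL UNIQUE,   -- plaintext digest, hex
        name        TEXT,
        mime        TEXT,
        size_bytes  INTEGER NOT NULL,
        blob_path   TEXT    NOT NULL,          -- filesDir/blobs/<sha256>.enc
        imported_at INTEGER NOT NULL           -- epoch millis
    )
    """.trimIndent(),
    """
    CREATE TABLE IF NOT EXISTS message_files (
        msg_id  INTEGER NOT NULL,
        file_id INTEGER NOT NULL REFERENCES files(id),
        ord     INTEGER NOT NULL DEFAULT 0,    -- position within the message
        PRIMARY KEY (msg_id, file_id)
    )
    """.trimIndent(),
    """
    CREATE TABLE IF NOT EXISTS derivations (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        file_id    INTEGER NOT NULL REFERENCES files(id),
        kind       TEXT    NOT NULL,           -- e.g. transcript, ocr, embedding
        model      TEXT,
        text       TEXT,
        blob       BLOB,
        meta       TEXT,
        created_at INTEGER NOT NULL
    )
    """.trimIndent(),
    "CREATE INDEX IF NOT EXISTS idx_derivations_file ON derivations(file_id, kind)",
)

/** Runs the v2 statements only for databases still below schema v2. */
fun migrate(fromVersion: Int, exec: (String) -> Unit) {
    if (fromVersion < SCHEMA_V2) V2_MIGRATION.forEach(exec)
}
```

Keeping L1/L2 output in derivations, keyed by file_id and kind, means a later pipeline such as transcription only needs to INSERT rows rather than add a schema migration per derivation type.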

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>