A local-first desktop archive for exported ChatGPT conversations. ChatArchive now uses a Tauri 2 shell, a React reader, a Rust OpenAI importer, durable SQLite app state, and a filesystem-backed library folder for normalized conversations, screenshots, attachments, exports, and manifests.
This project started as a static personal archive reader, but the direction is broader: a provider export goes in, normalized conversation data and durable indexes come out, and the same reader can eventually support ChatGPT, Claude, Gemini, local LLM tools, and other AI conversation sources without trusting browser storage as the permanent home.
- Runs as a Tauri desktop app on top of the existing Vite/React UI.
- Lets the user choose a visible ChatArchive/ library folder for backup clarity.
- Imports a ChatGPT/OpenAI export through the Rust backend.
- Normalizes conversation trees into ordered message threads.
- Separates visible chat messages from hidden/raw system, tool, and metadata messages.
- Extracts Markdown text, fenced code blocks, execution output, citations, references, and image pointers.
- Copies matching local image assets into the selected library folder.
- Tracks unresolved asset pointers and external image URLs in durable archive metadata.
- Writes normalized per-conversation JSON on disk.
- Stores archive, conversation, message, artifact, search, tag, bookmark, favorite, pin, read-state, recent-view, and scroll metadata in SQLite.
- Builds dedicated artifact records for code, assets, document-like Markdown, and links.
- Provides a React reader with search, month grouping, conversation outline, image lightbox, code copy, all-code copy, raw-message toggle, and Markdown export.
- Highlights code blocks with a locally bundled Prism build; no CDN or external runtime call is needed.
- Renders Mermaid and ZenUML fenced diagrams from local npm packages, with source fallback when a diagram cannot be parsed.
- Opens to a local dashboard with archive totals, first/latest chat dates, code block counts, unresolved asset counts, recently viewed conversations, favorites, pins, and read/unread totals.
- Supports richer client-side search with phrases, regex mode, field chips, typed operators such as type:code, language:python, type:document, type:link, and domain:github.com, date ranges, and conversation length filters.
- Migrates existing browser localStorage viewer state from chatArchive.viewerState.v1 once, then treats SQLite as authoritative.
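The tree normalization step in the list above can be illustrated with a small sketch. It assumes each conversation in conversations.json carries a `mapping` of nodes linked by `parent` pointers plus a `current_node` leaf id; the flattened `role`/`text` message shape and the names `ExportNode`, `ThreadMessage`, and `linearizeThread` are hypothetical simplifications, not the Rust importer's actual types.

```typescript
// Hypothetical, simplified node shape modeled on the export's `mapping` object.
interface ExportNode {
  id: string;
  parent: string | null;
  children: string[];
  message: { role: string; text: string } | null;
}

interface ThreadMessage {
  id: string;
  role: string;
  text: string;
}

// Walk from the active leaf back to the root, then reverse the result.
// Branches that are not ancestors of the leaf (edited or regenerated turns)
// never appear in the ordered thread.
function linearizeThread(
  mapping: Record<string, ExportNode>,
  currentNode: string,
): ThreadMessage[] {
  const thread: ThreadMessage[] = [];
  const seen = new Set<string>(); // guards against cyclic parent pointers
  let cursor: string | null = currentNode;

  while (cursor !== null && !seen.has(cursor)) {
    seen.add(cursor);
    const node: ExportNode | undefined = mapping[cursor];
    if (node === undefined) break; // dangling parent pointer
    if (node.message !== null) {
      thread.push({ id: node.id, role: node.message.role, text: node.message.text });
    }
    cursor = node.parent;
  }
  return thread.reverse();
}
```

Walking upward from the leaf rather than downward from the root avoids having to choose among sibling branches, since each node has only one parent.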
The current local archive build contains 448 conversations, 26,374 visible messages, 9,584 hidden/raw messages, 3,315 copied local assets, 25,006 code artifacts, 10,297 document artifacts, and 8,096 link artifacts. Those numbers come from the generated data currently in this working tree and will change whenever a different export is ingested.
Most chat exports are useful but awkward. They preserve data, not continuity. This project tries to make exported conversations browsable, inspectable, and reusable:
- Find old work without logging into a platform.
- Recover code snippets and decisions from long-running chats.
- Keep image-heavy conversations with local assets when possible.
- Export a single conversation to Markdown for notes, repos, documentation, or follow-up work.
- Build toward provider-neutral conversation archives that could include ChatGPT, Gemini, Claude, local LLM chats, and other tools.
- Node.js
- npm
- Rust toolchain compatible with Tauri 2
- A ChatGPT/OpenAI export folder containing conversations.json
The app is now a Tauri 2 + React + TypeScript + Rust project.
Install dependencies:
npm install
Run the desktop app in development:
npm run tauri:dev
Build the React frontend:
npm run build
Build the Windows desktop bundle:
npm run tauri:build
The app will ask for a library folder. A normal library layout looks like:
ChatArchive/
├── archives/
│   └── openai-2026-02/
│       ├── raw/
│       ├── conversations/
│       ├── assets/
│       ├── exports/
│       └── manifest.json
├── chatarchive.db
└── settings.json
The legacy static ingest path is still present for comparison and fallback development:
npm run ingest
npm run dev
npm run preview
In the Tauri app, choose the OpenAI export folder from the import dialog. The folder should contain conversations.json.
For the legacy Node ingest script, the default source is D:\Chat\openai-history. You can point it at another OpenAI export folder with OPENAI_HISTORY_DIR:
$env:OPENAI_HISTORY_DIR = "D:\Exports\openai-history"
npm run ingest
The legacy static archive is written into:
public/archive-data/
public/archive-assets/
D:\Chat
├── src-tauri/
│   ├── src/
│   │   ├── main.rs  # Tauri command registration
│   │   ├── commands.rs  # Frontend command boundary
│   │   ├── db.rs  # SQLite schema, settings, viewer state
│   │   ├── importer.rs  # Rust OpenAI importer
│   │   └── models.rs  # Shared archive/viewer models
│   ├── capabilities/  # Tauri permissions
│   └── tauri.conf.json
├── scripts/
│   └── ingest-openai-history.js  # Legacy static OpenAI normalizer
├── src/
│   ├── App.tsx  # Archive reader UI
│   ├── archiveApi.ts  # Tauri command/static fetch adapter
│   ├── main.tsx  # React entrypoint
│   ├── styles.css  # App styling
│   └── types.ts  # Archive data types
├── prism/
│   ├── prism.js  # Locally bundled Prism languages/plugins
│   └── prism.css  # Prism Okaidia theme
├── public/
│   ├── archive-data/
│   │   ├── index.json  # Legacy static search/list index
│   │   ├── artifacts.json  # Legacy static artifact index
│   │   ├── assets-manifest.json  # Copied, external, and missing asset records
│   │   └── conversations/  # One normalized JSON file per conversation
│   └── archive-assets/  # Legacy copied local image assets
├── openai-history/ # Source export folder, local/private
└── dist/ # Production build output
The app reads a small normalized model instead of rendering the raw provider export directly.
- ArchiveIndex contains generated metadata, totals, and conversation summaries.
- ConversationSummary powers search, grouping, counts, snippets, and selection.
- ConversationFile contains a full normalized conversation and its messages.
- ArchiveMessage stores role, author, time, content type, extracted blocks, assets, references, hidden/raw status, and original content type.
- MessageBlock supports Markdown, code, execution output, and notices.
- ArchiveAsset tracks local, external, and missing assets.
- ArtifactIndex/SQLite artifact tables power exact language, code, asset, document, and link search without loading every conversation file.
- SQLite owns user state: favorites, pins, read/unread status, recently viewed conversations, message bookmarks, tags, saved searches, and scroll positions.
This normalized layer is what makes future provider support realistic. Gemini, Claude, Ollama, Jan, or other sources do not need to match OpenAI's export format; they only need adapters that produce the same archive model.
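As a rough illustration of that adapter idea, the sketch below defines a trimmed-down normalized shape and an adapter for a made-up "toy-chat" export. Everything here is hypothetical: `NormalizedConversation`, `ProviderAdapter`, `toyAdapter`, and `importExport` are illustrative names, and the real model (ConversationFile, ArchiveMessage, and the rest) carries many more fields.

```typescript
// Hypothetical, heavily reduced version of the shared archive model.
interface NormalizedMessage {
  role: "user" | "assistant" | "system" | "tool";
  text: string;
  hidden: boolean;
}

interface NormalizedConversation {
  id: string;
  provider: string;
  title: string;
  messages: NormalizedMessage[];
}

// Hypothetical adapter contract: recognize an export, then normalize it.
interface ProviderAdapter {
  provider: string;
  canImport(raw: unknown): boolean;
  normalize(raw: unknown): NormalizedConversation[];
}

// A made-up export format, used only to exercise the contract.
interface ToyTurn { speaker: string; body: string }
interface ToyExport {
  format: "toy-chat";
  chats: { id: string; name: string; turns: ToyTurn[] }[];
}

function isToyExport(raw: unknown): raw is ToyExport {
  return (
    typeof raw === "object" &&
    raw !== null &&
    (raw as { format?: unknown }).format === "toy-chat" &&
    Array.isArray((raw as { chats?: unknown }).chats)
  );
}

const toyAdapter: ProviderAdapter = {
  provider: "toy",
  canImport: isToyExport,
  normalize(raw: unknown): NormalizedConversation[] {
    if (!isToyExport(raw)) return [];
    return raw.chats.map((chat) => ({
      id: `toy:${chat.id}`, // prefix ids so providers cannot collide
      provider: "toy",
      title: chat.name,
      messages: chat.turns.map((turn): NormalizedMessage => ({
        role: turn.speaker === "human" ? "user" : "assistant",
        text: turn.body,
        hidden: false,
      })),
    }));
  },
};

// Pick the first adapter that recognizes the export.
function importExport(adapters: ProviderAdapter[], raw: unknown): NormalizedConversation[] {
  const adapter = adapters.find((candidate) => candidate.canImport(raw));
  if (adapter === undefined) throw new Error("No adapter recognizes this export");
  return adapter.normalize(raw);
}
```

The reader only ever sees `NormalizedConversation` values, so adding a provider under this sketch means adding one adapter and nothing else.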
- Only OpenAI/ChatGPT export ingestion is implemented.
- Markdown rendering is intentionally lightweight and does not cover every Markdown extension.
- Asset recovery is best-effort. Some OpenAI pointers cannot be matched to local files, but unresolved pointers are recorded.
- Audio and video payloads are skipped by the current asset extractor.
- Search and listing are moving behind Tauri/SQLite. Some rich filtering still reuses the existing frontend filter layer over the loaded index while Phase 2 explorer views are built.
- Prism and Mermaid are bundled locally, so the production build is intentionally larger than a CDN-based version.
- Mermaid diagram rendering is limited to fenced mermaid, mmd, and zenuml code blocks.
- There is no built-in privacy scrubber yet. Treat generated archive files as sensitive.
- Provider-neutral import begins with the Rust ProviderImporter boundary, but only the OpenAI implementation exists right now.
Exports can contain personal data, private code, credentials, screenshots, attachments, and sensitive conversation history. This project keeps processing local, but the generated files are still readable static assets.
Before publishing or sharing:
- Review public/archive-data.
- Review public/archive-assets.
- Consider deleting or excluding openai-history.
- Consider adding a future redaction pass for secrets, emails, paths, API keys, and personal identifiers.
Add importers that map other platforms into the shared archive model.
- Claude export adapter.
- Gemini export adapter.
- ChatGPT shared-link or HTML export adapter.
- Open WebUI, Jan, LM Studio, and Ollama conversation adapters where export formats are available.
- Adapter test fixtures so provider support does not depend on private archives.
The goal is a plugin-like ingestion layer:
provider export -> provider adapter -> normalized archive JSON -> same reader UI
Use archived conversations as context packs for local models.
Potential paths:
- Export selected conversations as model-ready Markdown.
- Export condensed summaries plus important code/assets.
- Create Ollama-compatible prompt bundles.
- Create Jan/Open WebUI import bundles if those formats support it.
- Add "continue this conversation locally" actions that prepare a compact handoff file.
Local models may not handle giant conversation histories in one pass, so this likely needs context batching:
- Chunk long conversations by topic, time, or message boundaries.
- Preserve code blocks and decisions as high-priority context.
- Generate rolling summaries.
- Let users choose "full transcript", "working summary", "code only", or "decision log" handoff modes.
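One possible shape for the message-boundary chunking mentioned above is sketched here. It is an assumption-laden illustration, not planned behavior: `HandoffMessage` and `chunkByBudget` are hypothetical names, and a character count stands in for a real token budget.

```typescript
// Hypothetical minimal message shape for a handoff export.
interface HandoffMessage {
  role: string;
  text: string;
}

// Greedily pack whole messages into chunks that stay under `maxChars`.
// Messages are never split; a single message larger than the budget
// ends up alone in its own chunk.
function chunkByBudget(messages: HandoffMessage[], maxChars: number): HandoffMessage[][] {
  const chunks: HandoffMessage[][] = [];
  let current: HandoffMessage[] = [];
  let used = 0;

  for (const message of messages) {
    const size = message.text.length;
    if (current.length > 0 && used + size > maxChars) {
      chunks.push(current); // close the chunk before it overflows
      current = [];
      used = 0;
    }
    current.push(message);
    used += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Keeping messages whole means fenced code blocks are not cut in half, which matters if code is treated as high-priority context.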
Phase 1 added the first mature search pass: phrases, regex mode, field chips, typed operators, date ranges, conversation length filters, browser-local navigation state, and a dedicated artifact index for code, assets, documents, and links. The remaining work is deeper retrieval rather than basic viewer search.
- Full-text index with field weighting for title, user messages, assistant messages, code, assets, documents, and links.
- Conversation tags and manual notes.
- Saved searches.
- Semantic search using local embeddings.
- "Find related conversations" based on shared code, filenames, topics, or embeddings.
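For context on the Phase 1 search syntax, here is a minimal sketch of how quoted phrases and typed operators such as type:code or domain:github.com might be separated from free-text terms. The names `ParsedQuery`, `KNOWN_OPERATORS`, and `parseQuery` are hypothetical, and the operator set is limited to the three keys named in this README; the app's actual parser may differ.

```typescript
interface ParsedQuery {
  terms: string[];
  phrases: string[];
  operators: Record<string, string[]>;
}

// Only the operator keys mentioned in this README; an assumption, not the full set.
const KNOWN_OPERATORS = new Set(["type", "language", "domain"]);

function parseQuery(input: string): ParsedQuery {
  const parsed: ParsedQuery = { terms: [], phrases: [], operators: {} };
  // Either a double-quoted phrase or a run of non-whitespace characters.
  const tokenPattern = /"([^"]+)"|(\S+)/g;
  let match: RegExpExecArray | null;

  while ((match = tokenPattern.exec(input)) !== null) {
    const phrase: string | undefined = match[1];
    const word: string | undefined = match[2];
    if (phrase !== undefined) {
      parsed.phrases.push(phrase);
      continue;
    }
    if (word === undefined) continue;

    const colon = word.indexOf(":");
    const key = colon > 0 ? word.slice(0, colon).toLowerCase() : "";
    const value = colon > 0 ? word.slice(colon + 1).toLowerCase() : "";
    if (KNOWN_OPERATORS.has(key) && value.length > 0) {
      const values = parsed.operators[key] || [];
      values.push(value);
      parsed.operators[key] = values;
    } else {
      parsed.terms.push(word); // unknown prefixes stay ordinary search terms
    }
  }
  return parsed;
}
```

Treating unknown `key:value` tokens as plain terms keeps searches for things like URLs or timestamps from silently becoming empty filters.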
Make the archive useful as a personal knowledge base, not just a viewer.
- Rename conversations locally.
- Add tags, notes, and bookmarks.
- Mark important messages.
- Build collections across provider boundaries.
- Export curated bundles to Markdown, JSON, or static HTML.
- Generate repo-ready documentation from selected chats.
Chat platforms often make media export awkward. This framework can do more because it controls the local asset layer.
- Better pointer matching for OpenAI image assets.
- Support downloaded files, PDFs, audio transcripts, and generated images.
- Asset deduplication by hash.
- Attachment manifests with original names, content types, dimensions, and source messages.
- Missing-asset repair tools.
- Optional thumbnail generation.
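The hash-based deduplication item above could look roughly like the following sketch, which assumes a Node.js runtime and uses SHA-256 from the built-in crypto module. `AssetInput`, `DedupedAsset`, and `dedupeAssets` are hypothetical names; nothing here reflects how the Rust importer stores assets today.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input: an asset's original name and its raw bytes.
interface AssetInput {
  name: string;
  bytes: Uint8Array;
}

// One stored copy per distinct content hash, remembering every original name.
interface DedupedAsset {
  hash: string;
  bytes: Uint8Array;
  names: string[];
}

function dedupeAssets(assets: AssetInput[]): DedupedAsset[] {
  const byHash = new Map<string, DedupedAsset>();
  for (const asset of assets) {
    const hash = createHash("sha256").update(asset.bytes).digest("hex");
    const existing = byHash.get(hash);
    if (existing !== undefined) {
      existing.names.push(asset.name); // same content, another reference
    } else {
      byHash.set(hash, { hash, bytes: asset.bytes, names: [asset.name] });
    }
  }
  // Map iteration follows insertion order, so first-seen assets come first.
  return Array.from(byHash.values());
}
```

Recording every original name alongside the single stored copy would also feed the attachment manifest idea, since the mapping from source messages to files is preserved.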
Before this becomes generally useful for other people, privacy tooling should be first-class.
- Local redaction pass for API keys, tokens, emails, phone numbers, file paths, and custom patterns.
- Per-conversation exclusion rules.
- "Public export" mode that strips hidden/raw messages and risky metadata.
- Diffable redaction report.
- Secret scanner integration before static publishing.
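A local redaction pass with a simple report might start from something like this sketch. The three patterns are illustrative assumptions only (a loose email matcher, an `sk-` prefixed key shape, and Windows drive paths) and would miss many real secrets; `RedactionRule`, `DEFAULT_RULES`, and `redact` are hypothetical names.

```typescript
interface RedactionRule {
  label: string;
  pattern: RegExp; // must carry the global flag so every match is replaced
}

// Illustrative patterns only; a real pass would need a far broader rule set.
const DEFAULT_RULES: RedactionRule[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "API_KEY", pattern: /\bsk-[A-Za-z0-9_-]{20,}/g },
  { label: "WINDOWS_PATH", pattern: /\b[A-Za-z]:\\[^\s"']+/g },
];

interface RedactionResult {
  text: string;
  counts: Record<string, number>; // matches replaced, per rule label
}

function redact(text: string, rules: RedactionRule[] = DEFAULT_RULES): RedactionResult {
  const counts: Record<string, number> = {};
  let output = text;
  for (const rule of rules) {
    output = output.replace(rule.pattern, () => {
      counts[rule.label] = (counts[rule.label] || 0) + 1;
      return `[REDACTED:${rule.label}]`;
    });
  }
  return { text: output, counts };
}
```

The per-label counts are a starting point for a redaction report; custom patterns can be passed as the second argument without touching the defaults.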
The current app can already build to dist, but publishing needs guardrails.
- Public/private build modes.
- Optional password gate for personal hosting.
- GitHub Pages-compatible output.
- Cloudflare Pages-compatible output.
- Portable offline bundle.
- Single-conversation publish mode.
Once conversations are normalized, the archive can support workflows that platforms usually do not expose.
- Topic clustering.
- Decision extraction.
- Code snippet library.
- Timeline views.
- Project-specific conversation grouping.
- "What did I already try?" summaries.
- Cross-model comparison when the same task appears in multiple providers.
The ingest script is deliberately plain Node.js so it can run before the React app exists or without a server. It reads the provider export, writes static JSON, copies assets, and records anything it cannot resolve.
The UI is deliberately static. It fetches JSON from /archive-data, renders conversations in the browser, and does not require a backend service. That keeps the archive portable and makes it easier to host, zip, back up, or run locally.
- Split the current OpenAI ingest logic into a provider adapter shape.
- Add a small fixture-based test set for normalized archive output.
- Add a privacy scrubber before broader sharing.
- Add provider-neutral import documentation.
- Add a local-model handoff exporter for one selected conversation.
- Add Code Explorer, Document Explorer, Asset Explorer, and Link Explorer views on top of the artifact index.
Phase 1 archive viewer maturity is complete. The app is useful today for local OpenAI export browsing, dashboard review, Prism-highlighted code reading, local Mermaid/ZenUML diagram rendering, richer search/filtering, exact artifact-backed operators, and browser-local navigation state, with a clear path toward Phase 2 explorer views.