File attachments overhaul + message-files junction table #16

Open
1337hero wants to merge 14 commits into main from improvements/attachments

@1337hero

Summary

Two related bodies of work landed on this branch:

1. File attachments capability model (10 phases — specs/file-attachments/)

End-to-end overhaul of how uploads are validated, classified, sent to providers, and served back to the browser.

  • Phase 1–2 — Shared attachment capability model + MIME normalization/classification (packages/shared/src/constants/files.js)
  • Phase 3 — Text-like inline conversion
  • Phase 4 — PDF native preflight
  • Phase 5 — Office document extraction (server/src/lib/officeExtraction.js, .docx/.pptx/.xlsx fixtures + tests)
  • Phase 6 — Image validation and resizing (server/src/lib/imageValidation.js)
  • Phase 7 — Provider/model preflight (server/src/lib/providerErrors.js, providerFactory.js)
  • Phase 8 — Error propagation to UI (ErrorBanner.jsx, errorHandler.js)
  • Phase 9 — Safe download + active-content handling, SSRF guard (server/src/lib/ssrf.js)
  • Phase 10 — Frontend accept/copy alignment (FilePreviewList.jsx, useFileUploader.js, InputArea.jsx; replaces the old FileUpload.jsx)
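For reviewers unfamiliar with the SSRF guard mentioned in Phase 9, here is a minimal sketch of the kind of check such a guard usually performs. It is not the contents of server/src/lib/ssrf.js: the function names (`parseIpv4`, `isPrivateIpv4`, `isUrlAllowed`) and the blocked ranges are assumptions chosen for illustration. It only inspects literal hosts, so a real guard would also need to resolve DNS names and re-check the resolved address.

```javascript
// Hypothetical SSRF pre-check; names and ranges are illustrative assumptions,
// not the implementation in server/src/lib/ssrf.js.

// Parse a dotted-quad IPv4 literal into four octets, or return null.
function parseIpv4(host) {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => Number.isInteger(o) && o <= 255) ? octets : null;
}

// True for loopback, private, link-local, CGNAT, and multicast/reserved ranges.
function isPrivateIpv4([a, b]) {
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 100 && b >= 64 && b <= 127) ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    a >= 224
  );
}

// Allow only http(s) URLs whose literal host is not obviously internal.
function isUrlAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // this sketch rejects all IPv6 literals
  const octets = parseIpv4(host);
  if (octets) return !isPrivateIpv4(octets);
  return true; // a DNS name: a real guard must resolve it and re-check
}
```

Rejecting IPv6 literals outright is a deliberately conservative choice for the sketch; a production guard would more likely classify IPv6 ranges explicitly.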

2. Message↔Files junction table (specs/message-files-junction-table.md)

Replaces the JSON-array messages.file_ids column with a proper relational message_files junction table.

  • New migration 003_message_files_junction.js — creates table, backfills from JSON, drops file_ids column
  • ON DELETE CASCADE on files(id) and messages(id) — orphaned references can no longer accumulate
  • API response shape preserved (fileIds: string[]) — frontend untouched
  • Comprehensive test coverage in server/src/test/db.message-files.test.js
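The backfill step of the migration can be sketched as a pure function, independent of any database driver. Everything below is an assumption for illustration rather than the contents of 003_message_files_junction.js: the DDL string, the `position` column, the TEXT id types, and the helper names `junctionRowsFromLegacy` and `fileIdsForMessage` are all hypothetical.

```javascript
// Hypothetical DDL; column names and types are assumptions, not the real migration.
const CREATE_MESSAGE_FILES_SQL = `
CREATE TABLE IF NOT EXISTS message_files (
  message_id TEXT NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
  file_id    TEXT NOT NULL REFERENCES files(id)    ON DELETE CASCADE,
  position   INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (message_id, file_id)
)`;

// Turn legacy rows ({ id, file_ids: JSON string or null }) into junction rows.
// Ids that are malformed, duplicated, or no longer present in `knownFileIds`
// are skipped so the foreign keys cannot be violated during backfill.
function junctionRowsFromLegacy(messages, knownFileIds) {
  const rows = [];
  for (const message of messages) {
    let ids;
    try {
      ids = JSON.parse(message.file_ids ?? "[]");
    } catch {
      continue; // unparseable legacy value: nothing to backfill
    }
    if (!Array.isArray(ids)) continue;
    const seen = new Set();
    for (const id of ids) {
      if (typeof id !== "string" || seen.has(id) || !knownFileIds.has(id)) continue;
      seen.add(id);
      rows.push({ message_id: message.id, file_id: id, position: seen.size - 1 });
    }
  }
  return rows;
}

// Rebuild the preserved API shape (fileIds: string[]) from junction rows.
function fileIdsForMessage(rows, messageId) {
  return rows
    .filter((row) => row.message_id === messageId)
    .sort((a, b) => a.position - b.position)
    .map((row) => row.file_id);
}
```

Keeping the transform pure makes it possible to test the backfill against fixture rows before the column is dropped, since that last step is not reversible without a backup.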

Other improvements bundled in

  • DB layer split out of monolithic server/src/lib/db.js into per-domain modules under server/src/lib/db/ (audit, chats, files, folders, memory, models, providers, settings, users)
  • Migration runner introduced (migrations/index.js)
  • Server error handling refactor (lib/errorHandler.js, providerErrors.js)
  • Admin/login bug fix
  • .oxlintrc.json added
  • scripts/reset-admin-password.js

Test plan

  • bun run test — server suite passes (new files: db.message-files.test.js, files.test.js, imageValidation.test.js, officeExtraction.test.js)
  • Manual: upload PDF, image, .docx, .pptx, .xlsx — verify each renders/sends correctly
  • Manual: delete a file referenced by a message → verify junction row cascades and message no longer lists the file
  • Manual: delete a message with attachments → verify junction rows clean up
  • Manual: attempt SVG / oversize uploads → verify rejection with clear UI error
  • Manual: provider preflight error surfaces to ErrorBanner

1337hero added 12 commits April 28, 2026 13:22
- Drop unused PROVIDER_ATTACHMENT_CAPABILITIES (providerFactory is source of truth)
- Drop FILE_CONFIG.ALLOWED_TYPES + validateFileType; validateFile now requires filename and gates purely on classifier
- Remove orphan debug files at server/ root
- officeExtraction: decode XML entities (&amp;, &lt;, numeric/hex refs); shared extractTagText helper covers docx/xlsx/pptx
- useChat: surface submit-failure errors via sendError merged into chat error banner; preserve files on failure
- FileUpload: parallel uploads via Promise.allSettled instead of sequential await loop
- Drop dead code: formatErrorDetails, collectAttachmentIdsFromRequest, getMimeTypeFromExtension, asFileContentPart, lying error.response.body branch, unused warnings, double export
- Replace forwardRef/useImperativeHandle file-input ceremony with native <label htmlFor>; FileUpload becomes useFileUploader hook + FilePreviewList
- Split preflightAttachments into pure classifyForModel + small aggregator; drop hasImageDimensionIssue flag-leak
- Return category from POST /api/files; FilePreviewList renders from server-classified category instead of re-deriving
- Drop unused PROVIDER capability scaffold and unused ATTACHMENT_ACCEPT_* constants; ATTACHMENT_INPUT_ACCEPT now derives from FILE_CATEGORY_DEFINITIONS
- Strip phase-numbered + name-restating JSDoc across attachment files
- MessageAttachment query: staleTime/gcTime Infinity for immutable upload metadata
- useChat exposes discrete appendFiles/removeFile mutators instead of raw setInputFiles drilling
- ChatInterface passes onFilesUploaded/onRemoveFile into InputArea (no setter exposed across boundary)
- Split FileUpload.jsx into useFileUploader hook (frontend/src/hooks/useFileUploader.js) and FilePreviewList component (frontend/src/components/chat/FilePreviewList.jsx); each file is named after its single export
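The entity-decoding item above can be illustrated with a single-pass decoder. This is a sketch under assumptions, not the code in officeExtraction.js: `decodeXmlEntities` is a hypothetical name, and while `extractTagText` is named in the commit message, its signature and regex-based behavior here are guesses.

```javascript
// Hypothetical helpers; the real officeExtraction.js may differ.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode the five predefined XML entities plus decimal and hex character
// references in one pass, so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeXmlEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const code =
        body[1] === "x" ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match; // unknown named entity: keep as written
  });
}

// Collect the decoded text content of every <tagName ...>text</tagName> element.
// Assumes the element holds only text, as w:t (docx) and a:t (pptx) runs do.
function extractTagText(xml, tagName) {
  const pattern = new RegExp(`<${tagName}(?:\\s[^>]*)?>([^<]*)</${tagName}>`, "g");
  return Array.from(xml.matchAll(pattern), (m) => decodeXmlEntities(m[1]));
}
```

For example, `extractTagText(documentXml, "w:t")` would return the text runs of a .docx body in document order, assuming `documentXml` holds the contents of word/document.xml.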
Add security-ports.test.js covering:
- XSS/SQL injection/path traversal prevention
- Data isolation between users (chats, files, metadata, content, delete)
- Cross-user file attachment blocking (403)
- Cascade delete: chat → messages gone
- Input validation limits (100K message content, 200 char title,
  50 char username, duplicate username, pagination cap)
- Concurrent operations (parallel chat/file/message creation)
- Dangerous file type rejection (SVG, .exe, zero-byte)

Python test files (server/python_tests/) and manual rate-limit
test script (server/test-rate-limit-bypass.sh) removed; all
unique coverage is now in the Bun suite.
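The dangerous-file-type cases listed above (SVG, .exe, zero-byte) fall out naturally when validation gates purely on an allowlist classifier, as the earlier commit describes for validateFile. The sketch below assumes a hypothetical category table and an extension-only classifier; the real FILE_CATEGORY_DEFINITIONS shape is not shown in this PR, and the real classifier also normalizes MIME types.

```javascript
// Hypothetical category table; the real FILE_CATEGORY_DEFINITIONS may differ.
const CATEGORY_DEFINITIONS = {
  image: { extensions: [".png", ".jpg", ".jpeg", ".gif", ".webp"] },
  pdf: { extensions: [".pdf"] },
  office: { extensions: [".docx", ".pptx", ".xlsx"] },
  text: { extensions: [".txt", ".md", ".csv", ".json"] },
};

// Return the category for a filename, or null when nothing on the allowlist matches.
function classifyByFilename(filename) {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".env"
  const ext = filename.slice(dot).toLowerCase();
  for (const [category, definition] of Object.entries(CATEGORY_DEFINITIONS)) {
    if (definition.extensions.includes(ext)) return category;
  }
  return null;
}

// Require a filename and a non-empty body, then defer entirely to the classifier.
// SVG and .exe are rejected because they are absent from the allowlist,
// not because of a separate denylist.
function validateFile({ filename, size }) {
  if (typeof filename !== "string" || filename.trim() === "") {
    return { ok: false, reason: "missing_filename" };
  }
  if (!Number.isInteger(size) || size <= 0) {
    return { ok: false, reason: "empty_file" };
  }
  const category = classifyByFilename(filename.trim());
  if (category === null) return { ok: false, reason: "unsupported_type" };
  return { ok: true, category };
}

// Derive the <input accept="..."> value from the same table so the
// frontend picker and server validation cannot drift apart.
const INPUT_ACCEPT = Object.values(CATEGORY_DEFINITIONS)
  .flatMap((definition) => definition.extensions)
  .join(",");
```

Deriving the accept string from the category table mirrors the commit that makes ATTACHMENT_INPUT_ACCEPT derive from FILE_CATEGORY_DEFINITIONS, though the exact derivation used there is an assumption here.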