Feature: Lazy Loading Architecture for Large Repository Performance #73

Open
vovarbv wants to merge 2 commits into kleneway:main from vovarbv:feat/lazy-loading-performance

vovarbv commented Jun 19, 2025

Contributor

Description

This pull request introduces a fundamental shift in how PasteMax handles file systems by implementing a lazy loading architecture. The primary goal is to eliminate UI freezing and dramatically improve initial load times when working with large repositories, making the application scalable and responsive.

Previously, the app would read and tokenize every file in a selected folder upfront, which was not feasible for large codebases. Now, it performs a quick metadata scan and only processes files on-demand.
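
The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FileMeta`, `FileEntry`, `scanEntry`, `loadOnSelect`, and `formatTokens` are hypothetical names, and only the `isTokenEstimate` flag and the `~123 est` display format come from this description.

```typescript
// Hypothetical shapes; the PR's real types are not shown in this thread.
interface FileMeta {
  path: string;
  sizeBytes: number;
}

interface FileEntry extends FileMeta {
  tokenCount: number;
  isTokenEstimate: boolean; // flag named in the PR description
  content?: string; // only populated after the file is selected
}

// Phase 1: cheap scan. No file content is read; the caller supplies
// whatever estimation heuristic it wants to use.
function scanEntry(
  meta: FileMeta,
  estimateTokens: (meta: FileMeta) => number
): FileEntry {
  return { ...meta, tokenCount: estimateTokens(meta), isTokenEstimate: true };
}

// Phase 2: runs only when the user selects the file. The reader and
// tokenizer are injected so the sketch does not assume a specific library.
async function loadOnSelect(
  entry: FileEntry,
  readFile: (path: string) => Promise<string>,
  countTokens: (text: string) => number
): Promise<FileEntry> {
  if (!entry.isTokenEstimate) return entry; // already processed
  const content = await readFile(entry.path);
  return {
    ...entry,
    content,
    tokenCount: countTokens(content),
    isTokenEstimate: false,
  };
}

// Display helper mirroring the "~123 est" cue mentioned in Key Changes.
function formatTokens(entry: FileEntry): string {
  return entry.isTokenEstimate
    ? `~${entry.tokenCount} est`
    : `${entry.tokenCount}`;
}
```

Returning a new entry instead of mutating the old one is a design choice of this sketch; it keeps the estimated and accurate states easy to distinguish in React state updates.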

Key Changes

  • 🚀 Lazy Loading & Token Estimation:

    • The initial folder scan is now lightweight, gathering only file metadata and providing token estimations.
    • Actual file content and accurate token counts are only processed when a user selects a file.
    • The UI now displays visual cues for estimated tokens (~123 est) and a loading spinner for files being processed.
  • Modals for Large Folders:

    • A LargeFolderModal now warns the user if a selected directory is very large, giving them options to proceed, load with files deselected, or cancel.
    • A LargeSubfolderModal provides a similar warning when selecting a large subfolder from the file tree.
  • ✨ UI & UX Improvements:

    • A ProcessingOverlay now appears during batch operations to provide clear feedback.
    • The file tree (TreeItem.tsx) and file list (FileCard.tsx, FileList.tsx) have been updated to handle the new isTokenEstimate state and display loading indicators.
    • CopyButton.tsx logic has been refactored to be more flexible.
  • 🏗️ Architectural Refactoring:

    • Workspace management logic has been extracted from App.tsx into a new useWorkspaces.ts hook for better separation of concerns.
    • Dependencies have been cleaned up (e.g., removing gpt-3-encoder).
  • 📚 Documentation:

    • Added ARCHITECTURE.md to provide a high-level overview of the application structure.
    • Added docs/features/lazy-loading.md to detail the new performance architecture.
    • Updated README.md and CONTRIBUTING.md with the latest development workflow and project info.
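
As a rough illustration of the three options the large-folder modals offer (proceed, load with files deselected, or cancel), the decision could be modeled like this. The threshold value and every name below (`LargeFolderChoice`, `needsLargeFolderWarning`, `planLoad`) are assumptions for the sketch; the thread does not state the actual cutoff or API.

```typescript
// The three outcomes described for LargeFolderModal.
type LargeFolderChoice = "proceed" | "loadDeselected" | "cancel";

// Hypothetical cutoff; the real threshold is not given in this PR thread.
const ASSUMED_LARGE_FOLDER_THRESHOLD = 5000;

function needsLargeFolderWarning(
  fileCount: number,
  threshold: number = ASSUMED_LARGE_FOLDER_THRESHOLD
): boolean {
  return fileCount > threshold;
}

interface LoadPlan {
  loadedPaths: string[]; // files shown in the tree
  selectedPaths: string[]; // files initially selected
}

// Maps the user's choice to what gets loaded and selected.
// Returns null when the user cancels.
function planLoad(paths: string[], choice: LargeFolderChoice): LoadPlan | null {
  switch (choice) {
    case "cancel":
      return null;
    case "loadDeselected":
      return { loadedPaths: [...paths], selectedPaths: [] };
    case "proceed":
      return { loadedPaths: [...paths], selectedPaths: [...paths] };
  }
}
```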

How to Test

  1. Test with a small repository:

    • Select a small folder and verify that all existing functionality (file selection, sorting, copying) still works as expected.
    • Confirm that files show estimated tokens initially and then update to real counts upon selection.
  2. Test with a very large repository (e.g., >10,000 files):

    • Observe that the initial folder load is now very fast and the UI remains responsive.
    • Select a large subfolder from the tree view and verify the LargeSubfolderModal appears.
    • Select a few individual files and confirm they process on-demand without freezing the app.
  3. Verify Documentation:

    • Check the new and updated documentation files for clarity and accuracy.

This PR resolves major performance bottlenecks and sets a strong foundation for future scalability. Looking forward to your feedback!

vovarbv added 2 commits June 19, 2025 17:23
This commit introduces a lazy loading architecture to significantly improve performance and UI responsiveness when handling large codebases.

The previous implementation loaded all file contents and tokenized them upfront, causing severe slowdowns and freezing on large folders.

This new architecture addresses these issues by:
- Performing a lightweight initial scan that only gathers file metadata and provides token estimations based on file type and size.
- Deferring the expensive work of reading file content and performing accurate tokenization until a file is explicitly selected by the user.
- Introducing UI components to handle large folder warnings (LargeFolderModal, LargeSubfolderModal) and provide clear user feedback during on-demand processing (ProcessingOverlay).
- Refactoring App.tsx by extracting workspace logic into a dedicated useWorkspaces hook to improve state management and readability.
- Updating documentation to reflect the new architecture (ARCHITECTURE.md, lazy-loading.md).
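
For illustration, an estimate "based on file type and size" might look like the sketch below. The bytes-per-token ratios are made-up placeholders, not values from this commit, and `estimateTokens` and `extensionOf` are hypothetical helper names.

```typescript
// Assumed bytes-per-token ratios per extension. These numbers are
// placeholders for the sketch, not measurements from the PR.
const ASSUMED_RATIO_BY_EXTENSION: Record<string, number> = {
  ".ts": 3.5,
  ".tsx": 3.5,
  ".json": 3,
  ".md": 4.5,
};
const ASSUMED_DEFAULT_RATIO = 4;

// Returns the lowercase extension including the dot, or "" if none.
// Handles both "/" and "\" separators.
function extensionOf(path: string): string {
  const base = path.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// Estimates tokens from metadata only; no file content is read.
function estimateTokens(path: string, sizeBytes: number): number {
  const ratio =
    ASSUMED_RATIO_BY_EXTENSION[extensionOf(path)] ?? ASSUMED_DEFAULT_RATIO;
  return Math.ceil(sizeBytes / ratio);
}
```
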
@haikalllp haikalllp self-requested a review June 20, 2025 05:24
@haikalllp haikalllp added the Work In Progress Still need additional fixes and review label Jun 20, 2025

haikalllp commented Jun 20, 2025

Collaborator

Hey @vovarbv, this looks pretty good. I've done some manual testing both on my Linux system and my Windows system, and there seem to be some issues.

Issues With Sidebar and File Tree Handling:

  • The sidebar is completely broken: folders in the file tree cannot be collapsed or expanded.
  • Selecting and deselecting folders is a little buggy and sometimes doesn't work.
  • The Collapse Folder and Expand Folder buttons do not work. I think this is related to the sidebar issues.

Issues with loading Small Repository:

  • A small repository seems to load fine, but none of the loaded files are processed or read. The user must manually click the refresh button to actually trigger processing.
  • Only after that manual processing can they see the file preview and copy the file contents.

Note: There seems to be no issue with loading a very large repository, and I think it's working as you intended.

Issues With Binary Files Handling:

  • Binary file handling has been revamped and is now completely broken.
  • Binary files shouldn't be loaded or selected if the "binary as paths" toggle is off.
  • The flag styling for binary files is overridden by new CSS styling in both the Sidebar and FileCard/FileList (I don't think this is intended).
  • File cards for binary files should use the proper binary file card styling. They should not offer the 'Preview' or 'Copy' card actions, only 'Remove', showing just the X symbol on the card.
  • Toggling "binary as paths" should automatically select/deselect any binary files.
  • When toggled off, binary files should not be selectable and should be flagged as 'Excluded'.
  • When toggled on, binary files should be selectable and flagged as 'Binary'.
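
To make the expected behavior concrete, the rules above could be expressed roughly like this. It is only a sketch of the requirements as I've listed them, not the previous build's implementation; `TreeFile`, `describeFile`, and `applyBinaryToggle` are hypothetical names.

```typescript
// Hypothetical shapes for illustration only.
interface TreeFile {
  path: string;
  isBinary: boolean;
}

type FileFlag = "Binary" | "Excluded" | null;
type CardAction = "preview" | "copy" | "remove";

interface FileUiState {
  selectable: boolean;
  flag: FileFlag;
  actions: CardAction[];
}

// Derives how a file should appear given the "binary as paths" toggle.
function describeFile(file: TreeFile, binaryAsPaths: boolean): FileUiState {
  if (!file.isBinary) {
    return { selectable: true, flag: null, actions: ["preview", "copy", "remove"] };
  }
  return {
    selectable: binaryAsPaths,
    flag: binaryAsPaths ? "Binary" : "Excluded",
    actions: ["remove"], // binary cards can only be removed
  };
}

// Flipping the toggle selects (on) or deselects (off) every binary file,
// leaving the selection state of non-binary files untouched.
function applyBinaryToggle(
  files: TreeFile[],
  selected: Set<string>,
  binaryAsPaths: boolean
): Set<string> {
  const next = new Set(selected);
  for (const file of files) {
    if (!file.isBinary) continue;
    if (binaryAsPaths) next.add(file.path);
    else next.delete(file.path);
  }
  return next;
}
```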

What I recommend:

  • Review the previous builds first to fully understand how binary files were handled, then re-incorporate that handling into this new build.
  • For the binary file CSS styling on Sidebar, FileCard and FileList, you might need to review it and use the old styling instead of the new one.
  • For the sidebar collapsing/expanding issues, check sidebar.tsx, treeItem.tsx and App.tsx to see what's wrong. Right now, they don't work at all. You could check the PR "Add collapse/expand feature" #74 and implement its version on this build.

These issues occur on both systems, so I suspect they are not platform-specific.
Since they occur on Linux, I suspect they also occur on macOS.

Tested on:

  • Windows 11
  • Linux Fedora (RPM)


vovarbv commented Jun 23, 2025

Contributor Author

Hi @haikalllp - You’re absolutely right, that was my oversight. I added it for my own convenience and didn’t have the chance to test it thoroughly. I’ll put together a more robust solution shortly. It’s tough to nail every detail on the first pass, so please forgive any slip-ups. I really enjoy this project and use it often!

@RagingKore

This is excellent, because I'm getting issues with a project with 400+ folders and 2000+ files. It just locks up and then reverts to an empty result, so it never finishes its analysis.

Just some immediate feedback. But I can try the changes as soon as I have time.

Thanks sirs.

haikalllp commented Jun 25, 2025

Collaborator

> This is excellent, because I'm getting issues with a project with 400+ folders and 2000+ files. It just locks up and then reverts to an empty result, so it never finishes its analysis.
>
> Just some immediate feedback. But I can try the changes as soon as I have time.
>
> Thanks sirs.

@RagingKore For now, try adding folders to the ignore filter. Some binary files and large build files might be getting processed, which would cause the errors and empty results.
