Feature: Workspace Text Extraction & Full-Text Page Search

Currently, the search bar in the toolbar matches only document names, page labels, and page numbers. For professional workflows, users need to search for keywords inside the PDF pages themselves, then filter or highlight the pages that contain matches.

### Technical Proposal:
1. In `pdfjsReader.ts`, add a helper to extract text from a page using:
   ```typescript
   const textContent = await page.getTextContent();
   // Marked-content items carry no `str`, so map them to an empty string.
   const pageText = textContent.items.map(item => ("str" in item ? item.str : "")).join(" ");
   ```
2. Index this text content during document ingestion in `ingest.worker.ts` and add it to the page entity schema in `types.ts`.
3. Integrate a client-side search index (e.g., `flexsearch` or a simple keyword regex matcher) in the store selectors to filter the page grid.
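
As a rough illustration of the "simple keyword regex matcher" option in step 3, here is a minimal sketch. The `PageEntity` shape, its `text` field, and the `filterPagesByText` and `escapeRegExp` names are assumptions made for this example, not the existing schema in `types.ts` or an existing selector. A query matches a page only when every whitespace-separated keyword appears in that page's extracted text, case-insensitively.

```typescript
// Hypothetical page shape; the real entity in `types.ts` may differ.
interface PageEntity {
  id: string;
  label: string;
  text: string; // assumed field: text extracted during ingestion (step 2)
}

// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the pages whose extracted text contains every keyword in the query.
function filterPagesByText(pages: PageEntity[], query: string): PageEntity[] {
  const keywords = query.trim().split(/\s+/).filter(Boolean);
  if (keywords.length === 0) return pages; // empty query: no filtering
  // No `g` flag, so `test` keeps no state between calls.
  const matchers = keywords.map(k => new RegExp(escapeRegExp(k), "i"));
  return pages.filter(page => matchers.every(re => re.test(page.text)));
}

// Example data, invented for illustration only.
const samplePages: PageEntity[] = [
  { id: "a", label: "1", text: "Invoice total: $42.00" },
  { id: "b", label: "2", text: "Terms and conditions" },
];
const invoiceMatches = filterPagesByText(samplePages, "invoice total");
```

A linear scan like this is probably adequate for small workspaces. A dedicated index such as `flexsearch` would be the alternative if documents grow large or fuzzy matching is needed.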
 