feat: hybrid search implementation #82

Open
barbara-celi wants to merge 14 commits into main from feat/hybrid-search

@barbara-celi

Contributor

Description

This PR adds hybrid search backend support to @vtexdocs/components as an alternative to Algolia, enabling both Help Center and Dev Portal to use the new VTEX Docs Hybrid Search API.

Changes:

  • Extended SearchConfig with new backend option: { backend: 'hybrid', hybrid: {...} }
  • Implemented hybrid search adapter that translates InstantSearch queries to /api/search calls and transforms responses to Algolia-compatible format.
  • Exported new types: HybridSearchConfig and SearchBackendConfig.
  • Maintained full backward compatibility. Existing Algolia implementations work unchanged.
  • The hybrid backend is opt-in via configuration, with no breaking changes to component APIs.
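
For reviewers who want a concrete picture of the opt-in shape described in the list above, here is a minimal sketch. Only `backend: 'hybrid'`, the `hybrid` key, and the three `HybridSearchConfig` fields visible in this PR come from the source. The Algolia variant, `SearchConfigSketch`, `isHybrid`, `describeBackend`, and the example endpoint are hypothetical names for illustration and may not match the package's real types.

```typescript
// Fields as shown in this PR's diff excerpt (the real interface may have more).
interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
}

// ASSUMPTION: a stand-in for the existing Algolia config; real field names may differ.
interface AlgoliaConfigSketch {
  backend?: 'algolia'
  appId: string
  apiKey: string
  index: string
}

interface HybridConfigSketch {
  backend: 'hybrid'
  hybrid: HybridSearchConfig
}

// ASSUMPTION: a discriminated union on `backend`; not necessarily the exported SearchBackendConfig.
type SearchConfigSketch = AlgoliaConfigSketch | HybridConfigSketch

// Existing configs that omit `backend` fall through to the Algolia branch.
function isHybrid(config: SearchConfigSketch): config is HybridConfigSketch {
  return config.backend === 'hybrid'
}

function describeBackend(config: SearchConfigSketch): string {
  return isHybrid(config)
    ? `hybrid:${config.hybrid.source}@${config.hybrid.apiEndpoint}`
    : `algolia:${config.index}`
}

const hybridConfig: SearchConfigSketch = {
  backend: 'hybrid',
  hybrid: { apiEndpoint: 'https://example.invalid/api/search', source: 'help-center' },
}

const legacyConfig: SearchConfigSketch = { appId: 'app', apiKey: 'key', index: 'docs' }
```

Discriminating on a single `backend` field is one way to keep the hybrid path opt-in: a config with no `backend` key is still treated as Algolia.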

Related:

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Requires change to documentation, which has been updated accordingly.

This commit introduces a new hybrid search adapter for the `@vtexdocs/components` package, allowing integration with the Help Center's API while maintaining backward compatibility with Algolia. Key changes include the addition of a new `HybridSearchConfig` interface, updates to the `search-config.ts` file to support hybrid search, and modifications to the `SearchConfig` function to handle both Algolia and hybrid configurations. The implementation aims for minimal code changes and reuses existing components.
This update modifies the request selection logic in the `search-config.ts` file to prioritize requests with a non-empty query. If no such request is found, it defaults to the first request in the array. This change enhances the hybrid search functionality by ensuring more relevant queries are processed.
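
The selection rule in that last commit message can be sketched as follows. This is not the code from `search-config.ts`. `SearchRequestSketch` and `selectPrimaryRequest` are hypothetical names, the request shape is a simplified version of what an InstantSearch search client receives, and treating a whitespace-only query as empty is an added assumption.

```typescript
// Simplified stand-in for one entry of the `requests` array passed to a search client.
interface SearchRequestSketch {
  indexName: string
  params?: { query?: string; page?: number; hitsPerPage?: number }
}

// Prefer the first request with a non-empty query; otherwise fall back to the first request.
function selectPrimaryRequest(
  requests: SearchRequestSketch[]
): SearchRequestSketch | undefined {
  const withQuery = requests.find(
    // ASSUMPTION: whitespace-only queries count as empty.
    (request) => (request.params?.query ?? '').trim().length > 0
  )
  return withQuery ?? requests[0]
}

const picked = selectPrimaryRequest([
  { indexName: 'facets', params: { query: '' } },
  { indexName: 'docs', params: { query: 'sku' } },
])
```

Here `picked` is the `docs` request, because it is the first one carrying a non-empty query. An empty `requests` array yields `undefined`, which the caller would need to handle.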

barbara-celi self-assigned this on May 5, 2026
barbara-celi added the release-minor (Minor version bump) label on May 5, 2026
barbara-celi changed the title from "[EDU-17906] - feat: hybrid search implementation" to "feat: hybrid search implementation" on May 5, 2026

export interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number

Contributor

The helpcenter consumer (PR vtexdocs/helpcenter#456, src/utils/libraryConfig.ts:13) passes itemsPerPage: 10 into this config object, expecting it to set the default page size. With the field named defaultLimit here, that value is silently ignored — the destructure on line 77 falls back to the hardcoded 10.

Suggest aligning the names so the contract is consistent. Two options:

  1. Rename here to itemsPerPage?: number (matches the surrounding InstantSearch / Algolia vocabulary; no consumer change needed).
  2. Keep defaultLimit and update libraryConfig.ts on the helpcenter side to use the same name.

Either works; option 1 is friendlier to existing widget terminology, but option 2 keeps the components-side semantics explicit. Worth picking one before merge so the field actually has effect.
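
To make the failure mode concrete, here is a small sketch of how a mismatched field name ends up ignored. `resolveLimit` and `consumerConfig` are hypothetical. The destructure with a default of 10 mirrors what this comment says about line 77, not the actual code, and the value 25 is chosen only so the effect is visible.

```typescript
interface HybridSearchConfig {
  apiEndpoint: string
  source: 'help-center' | 'dev-portal'
  defaultLimit?: number
}

// Mirrors the described behavior: an absent `defaultLimit` falls back to 10.
function resolveLimit(config: HybridSearchConfig): number {
  const { defaultLimit = 10 } = config
  return defaultLimit
}

// The object is built separately, so TypeScript's excess property check (which only
// applies to fresh object literals) does not flag the unknown `itemsPerPage` key.
const consumerConfig = {
  apiEndpoint: '/api/search',
  source: 'help-center' as const,
  itemsPerPage: 25,
}

const effectiveLimit = resolveLimit(consumerConfig) // 10, not 25
```

Passing the same object as an inline literal would be a compile error, so the mismatch only stays silent when the config is assembled in a separate variable or in untyped code.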

Contributor

Another point in favor of option 2: it keeps the components package compatible with other portals.

Comment thread src/utils/config/search-config.ts Outdated
Comment on lines +337 to +365
  content: result.snippet || result.content || '',
  hierarchy,
  language: result.metadata?.locale || 'en',
  type: 'content',
  _highlightResult: {
    content: {
      value: result.snippet || result.content || '',
      matchLevel: 'full',
      fullyHighlighted: false,
      matchedWords: [],
    },
    hierarchy: {
      lvl0: {
        value: hierarchy.lvl0,
        matchLevel: 'none',
      },
      lvl1: {
        value: hierarchy.lvl1,
        matchLevel: result.title ? 'partial' : 'none',
      },
    },
  },
  _snippetResult: {
    content: {
      value: result.snippet || '',
      matchLevel: 'full',
    },
  },
}

Contributor

Issue: snippets render as raw markdown in the Help Center search results

A query like sku surfaces snippets such as | sku_manufacturer_code | character varying(65535) | Code used by merchant to reference the manufacturer. — literal markdown table syntax instead of plain text.

Why

The upstream /api/hybrid-search returns each hit's snippet as a raw substring of the indexed .md source. In transformHybridToAlgolia, that raw string is forwarded into the InstantSearch hit shape at three points:

  • content (line 337)
  • _highlightResult.content.value (line 343)
  • _snippetResult.content.value (line 361)

connectHighlight then renders _highlightResult.content.value as plain text inside SearchCard, so markdown characters appear verbatim. Algolia does not have this problem because its indexing pipeline strips markdown before storing the content attribute, so the search client only ever sees plain text.

Recommendation

Strip markdown inside transformHybridToAlgolia before assigning the snippet:

const cleanedSnippet = stripMarkdown(result.snippet || result.content || '')
// use `cleanedSnippet` for `content`, `_highlightResult.content.value`, `_snippetResult.content.value`

A small regex pass (headings, emphasis, links, code fences, table pipes) or a strip-markdown + remark round-trip is enough.

Doing it here — rather than in customHighlight.tsx — avoids corrupting InstantSearch's highlight boundaries, which by that point are already split fragments. The adapter is the single choke point through which every hybrid hit flows, so the fix stays isolated.
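
As a rough illustration of the regex pass suggested above, something along these lines could work. It is a sketch, not a vetted implementation: `stripMarkdownSketch`, `HybridResultSketch`, and `snippetFields` are hypothetical names, the patterns cover only the constructs listed in this comment, and underscores are deliberately left alone so identifiers like `sku_manufacturer_code` survive.

```typescript
// Best-effort Markdown removal for short snippets. Order matters: images before links.
function stripMarkdownSketch(input: string): string {
  return input
    .replace(/`{3}[\w-]*\n?/g, '')              // code fence markers (the code text stays)
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')   // images -> alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1')    // links -> link text
    .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, '')   // ATX heading markers
    .replace(/^[ \t]{0,3}>[ \t]?/gm, '')        // blockquote markers
    .replace(/^[ \t|:-]*-{3,}[ \t|:-]*$/gm, '') // table separator rows such as |---|:--:|
    .replace(/\|/g, ' ')                        // remaining table pipes
    .replace(/\*\*|\*|`/g, '')                  // asterisk emphasis and inline code ticks
    .replace(/\s+/g, ' ')                       // collapse whitespace left behind
    .trim()
}

// ASSUMPTION: only the two fields the adapter reads for snippet text.
interface HybridResultSketch {
  snippet?: string
  content?: string
}

// Clean once, then reuse the same string at all three assignment points.
function snippetFields(result: HybridResultSketch) {
  const cleanedSnippet = stripMarkdownSketch(result.snippet || result.content || '')
  return {
    content: cleanedSnippet,
    _highlightResult: { content: { value: cleanedSnippet } },
    _snippetResult: { content: { value: cleanedSnippet } },
  }
}

const tableRow =
  '| sku_manufacturer_code | character varying(65535) | Code used by merchant to reference the manufacturer.'
const cleanedRow = stripMarkdownSketch(tableRow)
// "sku_manufacturer_code character varying(65535) Code used by merchant to reference the manufacturer."
```

A regex pass like this will miss edge cases (nested emphasis, HTML, reference-style links); a `remark` plus `strip-markdown` round-trip would be more thorough at the cost of a dependency.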

Long-term

The proper fix is upstream in vtexdocs/vtexdocs-mcp-app: either index plain text or return a sanitized snippet. This is the same surface as vtexdocs-mcp-app#46 (server-side facet counts) and is tracked in EDU-18399. Once that lands, the adapter-level stripping can be removed.

barbara-celi and others added 8 commits June 8, 2026 14:06
…tion

This commit introduces a new utility function, `stripMarkdownForSnippet`, to remove Markdown syntax from text snippets, enhancing the readability of search results. Additionally, the `hitsPerPage` configuration is now dynamically set from the search configuration, improving pagination handling in the search results component.
- Enhanced stripMarkdownForSnippet to handle images, code blocks, blockquotes
- Fixed regex patterns for better markdown syntax removal
- Ensures search results display clean plain text without markdown formatting

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add special handling for tracks URLs to use only last segment
- Fix URL pattern: /pt/docs/tracks/modulos-da-vtex-i instead of full path
- Improve snippet handling to remove mid-word truncation
- Strip incomplete words at start of truncated snippets

Co-authored-by: Cursor <cursoragent@cursor.com>
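
The "strip incomplete words at start of truncated snippets" step might look roughly like this. The commit itself is not shown here, so this is only a sketch: `dropLeadingFragment` is a hypothetical helper, and it assumes the caller already knows whether the snippet was cut at the start (the `truncatedAtStart` flag is not from the source).

```typescript
// Removes a leading partial word when the snippet is known to start mid-text.
function dropLeadingFragment(snippet: string, truncatedAtStart: boolean): string {
  if (!truncatedAtStart) return snippet.trim()

  const firstWhitespace = snippet.search(/\s/)
  // No whitespace at all: the snippet is a single token, so keep it rather than return nothing.
  if (firstWhitespace === -1) return snippet.trim()

  // Drop everything before the first whitespace, i.e. the possibly cut-off word.
  return snippet.slice(firstWhitespace).trim()
}

const trimmed = dropLeadingFragment('nufacturer code used by merchant', true)
// "code used by merchant"
```

If the truncated snippet already begins with whitespace, the first whitespace is at index 0 and the first full word is kept.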
- Added special handling for tutorial URLs to use only the last segment
- Improved URL pattern for special doctypes (faq, known-issues, troubleshooting, announcements) to return only the slug
- Updated comments for clarity on URL construction logic

- Updated URL construction logic to handle multiple special document types (tracks, tutorials, faq, known-issues, troubleshooting, announcements) using a unified approach.
- Simplified the URL pattern to return only the slug for these document types.
- Removed redundant conditions and improved code clarity with updated comments.

- Updated URL construction logic to differentiate between document types (tracks, tutorials, faq, known-issues, troubleshooting, announcements).
- Implemented specific handling for tracks and tutorials to retain "docs" in the URL, while removing it for other types.
- Enhanced comments for clarity on the URL patterns and their respective handling.
…ration

- Added specific handling for 'faq', 'troubleshooting', 'announcements', and 'known-issues' document types to return only the slug in the URL.
- Enhanced comments to clarify the URL patterns and their handling logic.
- Ensured consistent URL construction for localized paths.
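
Read together, these commit messages suggest URL rules along the following lines. This is a guess at the end state, not the merged code: only the `/pt/docs/tracks/modulos-da-vtex-i` pattern is confirmed by the messages above. `buildDocUrlSketch`, the input path, the `/{locale}/{doctype}/{slug}` form for the slug-only types, and the fallback branch are all assumptions.

```typescript
// Doctypes named in the commit messages, grouped by how they are described.
const KEEP_DOCS_PREFIX = new Set<string>(['tracks', 'tutorials'])
const SLUG_ONLY = new Set<string>(['faq', 'known-issues', 'troubleshooting', 'announcements'])

function buildDocUrlSketch(locale: string, doctype: string, path: string): string {
  const segments = path.split('/').filter(Boolean)
  const slug = segments[segments.length - 1] ?? ''

  // Tracks and tutorials: keep "docs" and use only the last path segment.
  if (KEEP_DOCS_PREFIX.has(doctype)) return `/${locale}/docs/${doctype}/${slug}`

  // ASSUMPTION: "only the slug" means no "docs" segment and no intermediate folders.
  if (SLUG_ONLY.has(doctype)) return `/${locale}/${doctype}/${slug}`

  // ASSUMPTION: every other doctype keeps its full path under /docs.
  return `/${locale}/docs/${doctype}/${segments.join('/')}`
}

// The nested input path below is made up for illustration.
const trackUrl = buildDocUrlSketch('pt', 'tracks', 'some-track-folder/modulos-da-vtex-i')
// "/pt/docs/tracks/modulos-da-vtex-i"
```

Keeping the two doctype groups in named sets makes it easier to see which rule a given type falls under, which seems to be what the later "differentiate between document types" commit was after.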
Labels

release-minor Minor version bump

2 participants