AG Grid Table Content Not Being Extracted #1

@JKevinXu

Issue Description

The ChatBrowse extension is not properly extracting content from AG Grid tables when attempting to summarize pages that contain these data grids. This results in incomplete page summaries that miss important tabular data.

Problem Details

Current Behavior

  • When running "summarize the page" on pages containing AG Grid tables, the extracted content does not include the table data
  • The content extraction focuses on static HTML elements but misses dynamically rendered grid content
  • Users receive incomplete summaries that lack crucial tabular information

Expected Behavior

  • AG Grid table content should be extracted and included in page summaries
  • Table headers, row data, and structure should be captured
  • The AI should be able to analyze and summarize the tabular data

Technical Context

Current Content Extraction Strategy

The extension currently uses these selectors for content extraction:

const mainContentSelectors = [
  'main', 'article', '#content', '.content', 
  '#main', '.main', '.article', 'section',
  '[role="main"]', '[data-testid="post-content"]',
  '.post-content', '.article-content', '.entry-content'
];

AG Grid Specific Challenges

  1. Dynamic Rendering: AG Grid renders content dynamically using JavaScript
  2. Virtual Scrolling: Large datasets may use virtual scrolling, showing only visible rows
  3. Custom DOM Structure: AG Grid uses its own DOM structure that differs from standard HTML tables
  4. Timing Issues: Content may not be fully rendered when extraction occurs
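The virtual scrolling challenge could be approached by capturing the visible rows several times (for example after each scroll step) and merging the captures. The sketch below shows only the merge step. The `RowSnapshot` shape and the `mergeRowSnapshots` name are hypothetical, and it assumes each captured row carries a stable numeric index, such as the value of the `row-index` attribute that AG Grid appears to set on rendered rows.

```typescript
// Hypothetical shape for one rendered row captured from the DOM.
interface RowSnapshot {
  rowIndex: number; // assumed to come from the row element's `row-index` attribute
  cells: string[];
}

// Merge several capture passes into one list ordered by row index.
// When the same row appears in more than one pass, the latest capture wins.
function mergeRowSnapshots(passes: RowSnapshot[][]): RowSnapshot[] {
  const byIndex = new Map<number, RowSnapshot>();
  for (const pass of passes) {
    for (const row of pass) {
      byIndex.set(row.rowIndex, row);
    }
  }
  return Array.from(byIndex.values()).sort((a, b) => a.rowIndex - b.rowIndex);
}
```

Keying on the row index rather than on cell text avoids dropping rows that legitimately contain identical values.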

Reproduction Steps

  1. Navigate to a page containing AG Grid tables (e.g., GitHub's magentic-ui repository with data tables)
  2. Open ChatBrowse extension popup
  3. Type "summarize the page"
  4. Observe that the summary lacks table content

Affected Files

  • src/utils/content-extractor.ts - Main content extraction logic
  • src/content.ts - Content script that handles page info extraction
  • src/services/context-service.ts - Service that manages page context

Proposed Solutions

1. AG Grid Specific Selectors

Add AG Grid specific selectors to the content extraction:

const agGridSelectors = [
  '.ag-root-wrapper',
  '.ag-grid',
  '.ag-body-viewport',
  '.ag-row',
  '.ag-cell'
];
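As a rough sketch of how these selectors might be used for detection, the helper below counts grid roots under a given node. `QueryRoot`, `countAgGrids`, and `hasAgGrid` are hypothetical names, and the choice of `.ag-root-wrapper` as the single root selector is an assumption: the row and cell selectors match nested elements, so using them for detection would count the same grid many times.

```typescript
// Minimal structural type so the helpers accept `document`, an element,
// or a test double without depending on DOM typings.
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<unknown>;
}

// Assumption: each grid instance has exactly one `.ag-root-wrapper` element.
const AG_GRID_ROOT_SELECTOR = '.ag-root-wrapper';

// Number of grid roots found under `root`.
function countAgGrids(root: QueryRoot): number {
  return root.querySelectorAll(AG_GRID_ROOT_SELECTOR).length;
}

// True when at least one grid root is present.
function hasAgGrid(root: QueryRoot): boolean {
  return countAgGrids(root) > 0;
}
```

In the content script this would be called as `hasAgGrid(document)` before attempting any grid-specific extraction.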

2. Dynamic Content Detection

Implement detection for dynamically rendered content:

  • Wait for AG Grid initialization
  • Check for ag-grid-angular, ag-grid-react, or ag-grid-vue components
  • Monitor for grid ready events
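One simple way to wait for initialization, sketched below, is to poll a readiness check until it passes or a timeout expires. `waitUntil` and `WaitOptions` are hypothetical names, and polling is only one option; a `MutationObserver` on the grid root would be an alternative that avoids fixed intervals.

```typescript
interface WaitOptions {
  timeoutMs: number;  // give up after this long
  intervalMs: number; // delay between checks
}

// Resolves to true as soon as `predicate` returns true,
// or to false once the timeout has elapsed.
function waitUntil(predicate: () => boolean, options: WaitOptions): Promise<boolean> {
  const deadline = Date.now() + options.timeoutMs;
  return new Promise((resolve) => {
    const check = (): void => {
      if (predicate()) {
        resolve(true);
        return;
      }
      if (Date.now() >= deadline) {
        resolve(false);
        return;
      }
      setTimeout(check, options.intervalMs);
    };
    check();
  });
}
```

A content script could use this as `await waitUntil(() => document.querySelector('.ag-root-wrapper .ag-row') !== null, { timeoutMs: 3000, intervalMs: 100 })` and fall back to the normal extraction path when it resolves to `false`. The timeout and interval values here are illustrative, not tuned.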

3. Table Data Extraction

Create specialized extraction for tabular data:

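function extractAGGridData() {
  // Assumption: the page exposes its grid API instances at window.agGrid.gridApis.
  // This is not a documented AG Grid global, and content scripts run in an
  // isolated world, so this code would have to execute in the page context.
  const gridApis = window.agGrid?.gridApis || [];

  // Extract row data from each grid (one array of rows per grid)
  return gridApis.map(api => {
    const rowData = [];
    api.forEachNode(node => rowData.push(node.data));
    return rowData;
  });
}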

4. Fallback to DOM Parsing

If API access isn't available, parse the rendered DOM:

function parseAGGridDOM() {
  const grids = document.querySelectorAll('.ag-root-wrapper');
  return Array.from(grids).map(grid => {
    const headers = Array.from(grid.querySelectorAll('.ag-header-cell-text'))
      .map(cell => cell.textContent?.trim() ?? '');

    // Only rows currently rendered in the DOM are captured; with virtual
    // scrolling, off-screen rows are missing.
    const rows = Array.from(grid.querySelectorAll('.ag-row'))
      .map(row => Array.from(row.querySelectorAll('.ag-cell'))
        .map(cell => cell.textContent?.trim() ?? ''));

    return { headers, rows };
  });
}
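To get a `{ headers, rows }` result into the summary prompt, it has to be serialized to text. The following is a minimal sketch of one way to do that; `GridTable`, `gridToText`, and the default row cap of 50 are hypothetical choices, and it assumes all cell values have already been normalized to strings.

```typescript
interface GridTable {
  headers: string[];
  rows: string[][];
}

// Render a grid as pipe-separated lines, capped at `maxRows` data rows
// so very large grids do not dominate the extracted page content.
function gridToText(table: GridTable, maxRows = 50): string {
  // Collapse internal whitespace so each cell stays on one line.
  const clean = (value: string): string => value.replace(/\s+/g, ' ').trim();
  const lines = [table.headers.map(clean).join(' | ')];
  for (const row of table.rows.slice(0, maxRows)) {
    lines.push(row.map(clean).join(' | '));
  }
  const omitted = table.rows.length - maxRows;
  if (omitted > 0) {
    lines.push(`(${omitted} more row(s) not shown)`);
  }
  return lines.join('\n');
}
```

The cap is there because of the "Performance impact is minimal" concern: summarization input has a size budget, and an explicit note about omitted rows tells the model the table is truncated.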

Priority

High - This affects the core functionality of page summarization for data-heavy applications that commonly use AG Grid.

Labels

  • bug
  • enhancement
  • content-extraction
  • ag-grid
  • data-tables

Additional Context

AG Grid is widely used in enterprise applications for displaying large datasets. Many GitHub repositories, admin dashboards, and data visualization tools use AG Grid, making this a common issue for users trying to summarize such pages.

Related Technologies

  • AG Grid Community/Enterprise
  • Angular AG Grid
  • React AG Grid
  • Vue AG Grid

Testing Pages

Acceptance Criteria

  • AG Grid tables are detected on pages
  • Table headers are extracted correctly
  • Row data is captured (at least visible rows)
  • Extracted table data is included in page summaries
  • Performance impact is minimal
  • Works with different AG Grid configurations (Community/Enterprise)
  • Handles virtual scrolling scenarios
  • Graceful fallback when AG Grid API is not accessible
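The graceful-fallback criterion could be met by trying extraction strategies in order and moving on whenever one throws or returns nothing. The sketch below assumes each strategy is wrapped as a zero-argument function; `Extractor` and `firstSuccessful` are hypothetical names, not existing code in the extension.

```typescript
// A strategy returns extracted items, or null when it does not apply.
type Extractor<T> = () => T[] | null;

// Run strategies in order and return the first non-empty result.
// A strategy that throws is skipped so summarization can continue.
function firstSuccessful<T>(extractors: Extractor<T>[]): T[] {
  for (const extract of extractors) {
    try {
      const result = extract();
      if (result && result.length > 0) {
        return result;
      }
    } catch {
      // Ignore the failure and fall through to the next strategy.
    }
  }
  return [];
}
```

With this shape, the API-based extraction would be listed first and the DOM parser second, and an empty array signals that the page should be summarized without grid data.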
