manavsehgal/claude-code-analyst

Claude Code Analyst

Transform web content into structured knowledge with AI-powered analysis tools

Features • Quick Start • Documentation • Examples

📋 Overview

Claude Code Analyst is a comprehensive toolkit for capturing, converting, and visualizing web content. Built specifically for Claude Code integration, it provides powerful utilities to transform unstructured web content into clean Markdown documents, preserve complete HTML archives, and generate insightful Mermaid.js visualizations.

Why Claude Code Analyst?

  • 🌐 Complete Content Capture: Convert web articles to Markdown OR preserve as clean HTML archives
  • 📊 Content Intelligence: Extract structured data with comprehensive metadata preservation
  • 🎨 Visual Understanding: Automatically generate diagrams from text to reveal hidden patterns and relationships
  • 🚀 Production Quality: Respects robots.txt, handles edge cases, and produces clean, consistent output

✨ Features

🔄 Article to Markdown Converter

Transform web articles into clean, portable Markdown files:

  • Smart Extraction: Uses Mozilla's Readability algorithm to extract main content while filtering out ads, navigation, and clutter
  • Image Preservation: Downloads and organizes images with proper relative path references
  • Rich Metadata: Captures title, publication date, word count, and source attribution in YAML frontmatter
  • Dual Input Support: Works with both web URLs and local HTML files
  • Respectful Scraping: Checks robots.txt before processing any URL
  • Clean Output: Generates well-formatted Markdown with preserved text flow
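
A minimal sketch of a robots.txt check, assuming the rules have already been fetched as text. It uses Python's standard-library `urllib.robotparser`; the function names and the `ClaudeCodeAnalyst` user-agent string are hypothetical and not taken from `article_to_md.py`:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; the real tool may identify itself differently.
USER_AGENT = "ClaudeCodeAnalyst"


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(page_url: str, robots_lines: list[str], user_agent: str = USER_AGENT) -> bool:
    """Check page_url against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.set_url(robots_url_for(page_url))
    parser.parse(robots_lines)  # also marks the rules as freshly checked
    return parser.can_fetch(user_agent, page_url)


rules = ["User-agent: *", "Disallow: /private/"]
print(is_allowed("https://example.com/article", rules))       # True
print(is_allowed("https://example.com/private/draft", rules))  # False
```

Passing the lines in directly keeps the check testable without network access. In practice you would first download `robots_url_for(url)`.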

🌐 HTML Page Downloader

Create self-contained HTML archives of web pages:

  • Complete Preservation: Downloads entire web pages as clean, readable HTML documents
  • Smart Content Extraction: Uses advanced algorithms to identify and extract main content
  • Image Archiving: Downloads all referenced images with proper HTTP headers to bypass basic protection
  • Enhanced Substack Support: Properly handles Substack articles with anchor-wrapped images
  • Comprehensive Metadata: Preserves OpenGraph, Twitter cards, publication dates, and source attribution
  • Professional Styling: Generates clean HTML5 output with embedded responsive CSS
  • Offline Ready: Creates fully self-contained archives perfect for offline reading and research
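
The image-archiving step can be sketched with the standard library. This is an illustration, not the downloader's actual code: the header values, the hash-based `images/` filenames, and the `.jpg` fallback extension are all assumptions.

```python
import hashlib
import posixpath
from urllib.parse import urljoin, urlsplit
from urllib.request import Request

# Assumed header values, for illustration only.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ClaudeCodeAnalyst)",
    "Accept": "image/*,*/*;q=0.8",
}


def local_image_name(image_url: str) -> str:
    """Derive a stable filename under images/ from an absolute image URL."""
    path = urlsplit(image_url).path
    extension = posixpath.splitext(path)[1].lower() or ".jpg"  # assumed fallback
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()[:12]
    return f"images/{digest}{extension}"


def image_request(src: str, page_url: str) -> Request:
    """Resolve a possibly relative <img src> and attach the headers plus a Referer."""
    absolute_url = urljoin(page_url, src)
    return Request(absolute_url, headers=dict(DEFAULT_HEADERS, Referer=page_url))
```

To finish the download, pass the request to `urllib.request.urlopen` and write the bytes to `local_image_name(request.full_url)`. That same relative path is what the archived `<img src>` would point to.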

📊 Mermaid Visualization Generator

Create intelligent visualizations from Markdown content:

  • Auto-Analysis: Identifies concepts, workflows, timelines, and relationships from text
  • Multiple Diagram Types: Generates flowcharts, timelines, mind maps, Sankey diagrams, and more
  • Contextual Output: Each visualization includes relevant source text and explanations
  • Batch Processing: Creates comprehensive visualization sets from single documents
  • Claude Code Integration: Available as a custom /mermaid command

🖼️ Mermaid to Image Converter

Convert Mermaid diagrams to high-quality images:

  • Professional Quality: Uses official Mermaid CLI for production-grade rendering
  • Multiple Formats: Export as PNG, SVG, or PDF with configurable themes and dimensions
  • Batch Processing: Convert multiple diagrams from a single markdown file
  • Organized Output: Sequential naming and proper folder structure
  • Theme Support: Default, dark, forest, neutral, and base themes available
  • Custom Configuration: Configurable via config.yml for dimensions, themes, and output settings
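
Batch conversion starts by finding every Mermaid block in a Markdown file. Below is a minimal sketch, assuming each diagram is fenced with a `mermaid` info string and that the fences start at column 0. The regex approach and the function name are hypothetical and may not match what `mermaid_to_image.py` does:

```python
import re

# Build the fence marker at runtime so this snippet can itself sit inside a Markdown fence.
FENCE = "`" * 3
# An opening fence with the "mermaid" info string, its body, and the next bare closing fence.
MERMAID_BLOCK = re.compile(
    rf"^{FENCE}mermaid[ \t]*\n(.*?)^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE
)


def extract_mermaid_blocks(markdown: str) -> list[str]:
    """Return the source of every mermaid fenced block, in document order."""
    return [match.group(1).rstrip("\n") for match in MERMAID_BLOCK.finditer(markdown)]
```

Fences with other info strings, such as `python`, are skipped. Each returned string can then be rendered on its own.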

🚀 Quick Start

Prerequisites

  • Python 3.13+
  • uv package manager
  • Optional: Mermaid CLI for image conversion

Installation

# Clone the repository
git clone https://github.com/manavsehgal/claude-code-analyst.git
cd claude-code-analyst

# Install dependencies with uv
uv sync

Basic Usage

1️⃣ Convert Web Article to Markdown

# Convert any web article
uv run python scripts/article_to_md.py https://example.com/article

# Convert local HTML file
uv run python scripts/article_to_md.py /path/to/local/file.html

# Specify custom output directory
uv run python scripts/article_to_md.py https://example.com/article --output-dir my-articles

Output Structure:

markdown/
└── article-title-kebab-case/
    ├── article.md        # Clean Markdown with YAML frontmatter
    └── images/           # Preserved images
        ├── image1.jpg
        └── image2.png
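
The folder name is the article title in kebab case. Here is one way to derive it, as a sketch only. The ASCII folding, the 80-character cap, and the `untitled` fallback are assumptions, not documented behavior of `article_to_md.py`:

```python
import re
import unicodedata


def to_kebab_case(title: str, max_length: int = 80) -> str:
    """Turn an article title into a lowercase, hyphen-separated folder name."""
    # Fold accented characters to ASCII (e.g. "é" -> "e") and drop anything else non-ASCII.
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    # Trim long titles without leaving a dangling hyphen.
    return slug[:max_length].rstrip("-") or "untitled"
```

For example, `to_kebab_case("Understanding Neural Networks")` gives `understanding-neural-networks`.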

2️⃣ Download HTML Archive

# Download complete HTML archive
uv run python scripts/html_downloader.py https://example.com/article

# Custom output directory
uv run python scripts/html_downloader.py https://example.com/article --output-dir archives

# Skip robots.txt check (use responsibly)
uv run python scripts/html_downloader.py https://example.com/article --skip-robots

Output Structure:

html/
└── article-title-kebab-case/
    ├── index.html        # Self-contained HTML document
    └── images/           # All downloaded images
        ├── diagram1.png
        └── chart2.svg

3️⃣ Generate Visualizations (Claude Code)

# In Claude Code, use the custom command
/mermaid markdown/article-title/article.md

Output Structure:

mermaid/
└── article-title/
    ├── 01-timeline.md
    ├── 02-flowchart.md
    ├── 03-relationships.md
    └── README.md

4️⃣ Convert Visualizations to Images

# Convert Mermaid diagrams to high-quality images
uv run python scripts/mermaid_to_image.py mermaid/article-title/01-timeline.md --format png --theme dark

# Batch convert all diagrams in a file
uv run python scripts/mermaid_to_image.py mermaid/article-title/workflow.md --format svg

Output Structure:

visualizations/
└── article-title/
    ├── 01-timeline-01.png
    ├── 02-flowchart-01.svg
    └── 03-relationships-01.pdf
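
The naming pattern above (`<source stem>-<diagram number>.<format>`) and the call to the Mermaid CLI might look like the sketch below. It assumes the script shells out to `mmdc`, whose `-i`, `-o`, and `-t` options set the input file, output file, and theme. The helper names are hypothetical:

```python
from pathlib import Path

SUPPORTED_FORMATS = {"png", "svg", "pdf"}


def output_path(source_md: str, index: int, fmt: str, out_root: str = "visualizations") -> Path:
    """Map mermaid/<article>/<name>.md plus a 1-based diagram index to an image path."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    source = Path(source_md)
    return Path(out_root) / source.parent.name / f"{source.stem}-{index:02d}.{fmt}"


def mmdc_command(input_file: str, output_file: Path, theme: str = "default") -> list[str]:
    """Build an mmdc invocation; mmdc picks the image format from the output extension."""
    return ["mmdc", "-i", input_file, "-o", str(output_file), "-t", theme]
```

Running the command requires `mmdc` on your PATH, for example `subprocess.run(mmdc_command(...), check=True)`.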

🔗 Complete Workflow Examples

Research & Analysis Workflow

# Step 1: Create HTML archive for clean reading
uv run python scripts/html_downloader.py https://research-paper.com/ai-study

# Step 2: Create Markdown for text analysis  
uv run python scripts/article_to_md.py https://research-paper.com/ai-study

# Step 3: Generate visualizations (in Claude Code)
/mermaid markdown/ai-study/article.md

# Step 4: Convert diagrams to presentation-ready images
uv run python scripts/mermaid_to_image.py mermaid/ai-study/01-workflow.md --format png --theme dark

# Result: Complete research package with readable archive, processable text, 
# and visual insights with presentation-ready images

Documentation Preservation

# For offline documentation that preserves original styling
uv run python scripts/html_downloader.py https://docs.example.com/api-guide --output-dir documentation

# For portable markdown documentation
uv run python scripts/article_to_md.py https://docs.example.com/api-guide --output-dir documentation

📚 Documentation

| Guide | Description |
|---|---|
| Article Converter Guide | Complete guide for Markdown conversion tool |
| HTML Downloader Guide | Comprehensive HTML archiving tool documentation |
| Mermaid Generator Guide | Creating visualizations with Claude Code |
| CLAUDE.md | Claude Code configuration and development settings |
| Documentation Index | All available documentation |

🎯 Examples

Article Metadata Output (Markdown)

---
title: "Understanding Neural Networks"
source_url: https://example.com/neural-networks
article_date: 2024-12-15
date_scraped: 2024-12-20
word_count: 2847
image_count: 12
---

# Understanding Neural Networks

Article content with preserved formatting and ![relative image links](images/diagram.png)...

HTML Archive Features

<!DOCTYPE html>
<html lang="en">
<head>
    <!-- Comprehensive metadata preservation -->
    <meta name="source-url" content="https://original-url.com">
    <meta property="og:title" content="Article Title">
    <meta name="twitter:card" content="summary_large_image">
    
    <!-- Embedded responsive styling -->
    <style>
        body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI'... }
        img { max-width: 100%; height: auto; }
    </style>
</head>
<body>
    <!-- Clean, readable content with local image references -->
    <img src="images/local-diagram.png" alt="Diagram">
</body>
</html>
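
Attribute values copied from the source page must be escaped before they are written into `<meta>` tags. Here is a minimal sketch using the standard library's `html.escape`; the `meta_tags` helper is hypothetical:

```python
from html import escape


def meta_tags(source_url: str, og: dict[str, str], twitter: dict[str, str]) -> str:
    """Render <meta> tags like those in the archive header above, escaping each value."""
    tags = [f'<meta name="source-url" content="{escape(source_url)}">']
    tags += [f'<meta property="og:{key}" content="{escape(value)}">' for key, value in og.items()]
    tags += [f'<meta name="twitter:{key}" content="{escape(value)}">' for key, value in twitter.items()]
    return "\n".join(tags)
```

`escape` converts `&`, `<`, `>`, and both quote characters, so a title containing quotes cannot break out of the `content` attribute.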

Generated Mermaid Visualization

timeline
    title Evolution of AI Models
    
    2017 : Transformer Architecture
         : Attention Mechanism
    
    2020 : GPT-3 Release
         : 175B Parameters
    
    2022 : ChatGPT Launch
         : Consumer AI Era
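
The `/mermaid` command has Claude write diagrams like this one directly, so no code is involved. To produce the same `timeline` syntax programmatically, a hypothetical helper could look like the one below. It follows Mermaid's rule that extra events for one period go on continuation lines starting with `:`:

```python
def build_timeline(title: str, events: dict[str, list[str]]) -> str:
    """Render Mermaid timeline syntax with one period per key, in insertion order."""
    lines = ["timeline", f"    title {title}"]
    for period, items in events.items():
        if not items:
            continue  # a period with no events has nothing to render
        lines.append(f"    {period} : {items[0]}")
        # Align continuation events under the first event's colon.
        pad = " " * (len(period) + 5)
        lines += [f"{pad}: {item}" for item in items[1:]]
    return "\n".join(lines)
```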

🏗️ Project Structure

claude-code-analyst/
├── scripts/                    # Python tools and utilities
│   ├── article_to_md.py       # Web article to Markdown converter
│   ├── html_downloader.py     # HTML page archiving tool
│   └── mermaid_to_image.py    # Mermaid diagram to image converter
├── docs/                       # User guides and documentation
│   ├── README.md              # Documentation index
│   ├── article-to-md-guide.md
│   ├── html-downloader-guide.md
│   └── mermaid-visualization-guide.md
├── html/                       # HTML archives (generated)
│   └── article-title/
│       ├── index.html
│       └── images/
├── markdown/                   # Converted articles (generated)
│   └── article-title/
│       ├── article.md
│       └── images/
├── mermaid/                   # Visualizations (generated)
│   └── article-title/
│       └── *.md
├── visualizations/            # Generated images (from mermaid_to_image.py)
│   └── article-title/
│       ├── diagram-01.png
│       ├── chart-02.svg
│       └── flow-03.pdf
├── projects/                  # Analysis projects
├── transcripts/              # Video transcripts
├── backlog/                  # Project planning
│   └── active-backlog.md
├── tests/                    # Test suite
├── .claude/                  # Claude Code custom commands
│   └── commands/
│       ├── mermaid.md        # Mermaid visualization generator
│       └── readme.md         # README generation command
├── config.yml                # Configuration file
├── CLAUDE.md                 # Claude Code configuration
├── pyproject.toml           # Project dependencies
└── README.md                # This file

🛠️ Development

Setup Development Environment

# Install all dependencies
uv sync

# Install development dependencies
uv sync --dev

# Run tests
uv run pytest tests/

# Code quality checks
uv run ruff check .
uv run black .
uv run mypy .

Code Style

  • Follow PEP 8 guidelines
  • Use type hints for all functions
  • Write comprehensive docstrings
  • Maintain test coverage
  • Respect robots.txt and website terms of service

📦 Dependencies

| Package | Purpose |
|---|---|
| requests | Web fetching and HTTP handling |
| beautifulsoup4 | HTML parsing and manipulation |
| markdownify | HTML to Markdown conversion |
| readability-lxml | Article content extraction |
| mermaid-mcp | Mermaid diagram processing and image conversion |
| pyyaml | Configuration file handling |
| ruff | Fast Python linting |
| black | Code formatting |
| mypy | Static type checking |
| pytest | Testing framework |

🎯 Use Cases

📚 Research & Academia

  • Academic Papers: Archive research papers as HTML for citation and clean Markdown for analysis
  • Literature Reviews: Convert multiple sources to consistent formats for comparative analysis
  • Reference Management: Build structured knowledge bases with metadata preservation

📖 Documentation & Knowledge Management

  • Technical Documentation: Convert API docs to portable Markdown or preserve as styled HTML
  • Team Knowledge Base: Archive important articles and resources for offline access
  • Competitive Intelligence: Analyze competitor content and track changes over time

📰 Content Analysis & Journalism

  • News Archiving: Preserve news articles before they change or disappear
  • Content Migration: Move content between platforms while maintaining formatting
  • Fact Checking: Create timestamped archives of web content for verification

🏢 Business Intelligence

  • Market Research: Archive industry reports and analysis
  • Competitive Analysis: Track competitor announcements and strategy documents
  • Compliance: Maintain records of regulatory content and policy changes
  • Strategic Planning: Visualize business processes and strategies from archived content

🚦 Roadmap

Completed Features ✅

  • Article to Markdown conversion with metadata (web URLs and local HTML files)
  • HTML page archiving with image preservation and enhanced Substack support
  • Mermaid visualization generation (Claude Code integration)
  • Mermaid to image conversion (PNG, SVG, PDF export)
  • Comprehensive documentation and user guides
  • Professional development tooling (ruff, black, mypy, pytest)

Planned Enhancements 🔄

  • PDF article processing support
  • Batch processing multiple URLs with progress tracking
  • Custom CSS themes for HTML archives
  • Export to additional formats (JSON, CSV, EPUB)
  • Enhanced metadata extraction (author detection, category classification)
  • API endpoint for programmatic access
  • Video/audio content transcription and processing
  • Archive compression (ZIP/TAR formats)
  • Integration with more visualization formats beyond Mermaid
  • Chrome/Firefox browser extension for one-click archiving
  • Cloud storage integration (S3, Google Drive, Dropbox)

🤝 Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository and create a feature branch
  2. Follow PEP 8 and add comprehensive type hints
  3. Write tests for new functionality (aim for >80% coverage)
  4. Update documentation for any user-facing changes
  5. Respect ethical guidelines - ensure tools are used responsibly
  6. Test thoroughly with various website types and edge cases

Development Workflow

# 1. Setup development environment
git clone https://github.com/manavsehgal/claude-code-analyst.git
cd claude-code-analyst
uv sync --dev

# 2. Create feature branch
git checkout -b feature/amazing-feature

# 3. Make changes and test
uv run pytest tests/
uv run ruff check .
uv run black .

# 4. Stage, commit, and push
git add .
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature

# 5. Open Pull Request

See CLAUDE.md for detailed development guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔒 Ethical Usage

This toolkit is designed for legitimate research, documentation, and analysis purposes. Please use responsibly:

  • Respect robots.txt and website terms of service
  • Don't overload servers - use reasonable delays between requests
  • Respect copyright - maintain proper attribution and don't republish without permission
  • Be transparent - the tools identify themselves with appropriate User-Agent strings
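
The README asks for reasonable delays between requests but does not say how the scripts pace them. If you script many downloads yourself, a per-host throttle like this hypothetical one keeps requests spaced out. The 1-second default is an assumption, not a documented setting:

```python
import time
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to the same host."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Sleep if needed before requesting url; return the number of seconds slept."""
        host = urlsplit(url).netloc
        last = self._last_request.get(host)
        elapsed = 0.0 if last is None else time.monotonic() - last
        delay = 0.0 if last is None else max(0.0, self.min_interval - elapsed)
        if delay:
            time.sleep(delay)
        self._last_request[host] = time.monotonic()
        return delay
```

Call `throttle.wait(url)` immediately before each fetch. Different hosts are tracked separately, so a batch spanning several sites is only slowed per site.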

🙏 Acknowledgments

📮 Support

  • 📖 Documentation: Check the comprehensive guides for detailed instructions
  • 🐛 Bug Reports: Use the GitHub issue tracker
  • 💡 Feature Requests: Join discussions in the community forum
  • 🚀 Claude Code: Integrated custom commands for seamless workflow

Built with ❤️ for the Claude Code community

About

Combine agents, vibe coding, custom tools, and desktop apps to create a privacy-first Analyst AI using Claude Code and Amazon Bedrock
