LifeTrace - Intelligent Life Recording System

Project Overview

LifeTrace is an AI-powered life recording system that helps users record and retrieve their daily activities through automatic screenshot capture, OCR text recognition, and multimodal search. It supports traditional keyword search, semantic search, and multimodal (image plus text) search over the captured history.

Core Features

Automatic Screenshot Recording: Captures the screen at a configurable interval to record user activities
Intelligent OCR Recognition: Uses RapidOCR to extract text content from screenshots
Multimodal Search: Supports text, image, and semantic search
Vector Database: Efficient vector storage and retrieval based on ChromaDB
Web API Service: Provides complete RESTful API interfaces
Frontend Integration: Supports integration with various frontend frameworks

Backend Architecture

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                   LifeTrace Backend Architecture            │
├─────────────────────────────────────────────────────────────┤
│                                                            │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐     │
│ │  Web API   │   │ Frontend UI │   │ Admin Tools │     │
│ │(FastAPI)   │   │            │   │            │     │
│ └─────────────┘   └─────────────┘   └─────────────┘     │
│        │                  │                  │          │
│        └───────────────────┼───────────────────┘          │
│                            │                              │
│ ┌─────────────────────────────────────────────────────────┐│
│ │                  Core Services                         ││
│ ├─────────────────────────────────────────────────────────┤│
│ │                                                        ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     ││
│ │ │Screenshot   │ │File         │ │OCR          │     ││
│ │ │Recorder     │ │Processor    │ │Service      │     ││
│ │ └─────────────┘ └─────────────┘ └─────────────┘     ││
│ │        │               │               │            ││
│ │        └────────────────┼────────────────┘            ││
│ │                         │                             ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     ││
│ │ │Vector       │ │Multimodal   │ │Storage      │     ││
│ │ │Service      │ │Service      │ │Manager      │     ││
│ │ └─────────────┘ └─────────────┘ └─────────────┘     ││
│ └─────────────────────────────────────────────────────────┘│
│                            │                              │
│ ┌─────────────────────────────────────────────────────────┐│
│ │                  Data Storage                          ││
│ ├─────────────────────────────────────────────────────────┤│
│ │                                                        ││
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     ││
│ │ │SQLite DB    │ │Vector DB    │ │File Storage │     ││
│ │ │Metadata     │ │ChromaDB     │ │Screenshots  │     ││
│ │ └─────────────┘ └─────────────┘ └─────────────┘     ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘

Core Module Details

1. Web API Service (`lifetrace_backend/server.py`)

RESTful API service built on FastAPI, providing the following main endpoints:

Screenshot Management

- GET /api/screenshots - Get screenshot list
- GET /api/screenshots/{id} - Get single screenshot details
- GET /api/screenshots/{id}/image - Get screenshot image file

Search Services

- POST /api/search - Traditional keyword search
- POST /api/semantic-search - Semantic search
- POST /api/multimodal-search - Multimodal search

System Management

- GET /api/statistics - Get system statistics
- GET /api/config - Get system configuration
- GET /api/health - Health check
- POST /api/cleanup - Clean old data

Vector Database Management

- GET /api/vector-stats - Vector database statistics
- POST /api/vector-sync - Sync vector database
- POST /api/vector-reset - Reset vector database
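
As a rough illustration of how a client might call the search endpoints listed above, the sketch below builds a POST request with the Python standard library. The JSON field names `query` and `limit` are assumptions for illustration and are not taken from the LifeTrace source; the Swagger UI served by the running service is the place to check the actual request schema.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8840"  # default host and port of the web service


def build_search_request(query, endpoint="/api/search", limit=10):
    """Build (but do not send) a POST request for one of the search endpoints.

    The payload fields `query` and `limit` are hypothetical.
    """
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send(build_search_request("meeting notes", "/api/semantic-search"))` would issue a semantic search; keeping request construction separate from sending makes the payload easy to inspect without a live server.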

2. Data Models (`lifetrace_backend/models.py`)

Defines the core data models of the system:

Screenshot: Screenshot record model
OCRResult: OCR recognition result model
SearchIndex: Search index model
ProcessingQueue: Processing queue model

3. Configuration Management (`lifetrace_backend/config.py`)

Unified configuration management system:

Supports YAML configuration files
Environment variable override
Default configuration
Configuration validation and type conversion
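
The layering described above (defaults, then file values, then environment overrides with type conversion) could look roughly like the following. This is a minimal sketch, not the actual `config.py`: the `LIFETRACE_SECTION_KEY` variable naming scheme and the helper names are assumptions, and `file_values` stands in for whatever a YAML loader such as `yaml.safe_load` returns.

```python
import copy
import os

DEFAULTS = {"server": {"host": "127.0.0.1", "port": 8840, "debug": False}}


def deep_merge(base, override):
    """Return a new dict where values in `override` replace those in `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def convert(raw, reference):
    """Convert an environment string to the type of the existing value."""
    if isinstance(reference, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in ("1", "true", "yes", "on")
    if isinstance(reference, int):
        return int(raw)
    if isinstance(reference, float):
        return float(raw)
    return raw


def load_config(file_values, environ=None, prefix="LIFETRACE_"):
    """Layer defaults, file values, then environment overrides.

    The LIFETRACE_SECTION_KEY naming scheme is hypothetical.
    """
    environ = os.environ if environ is None else environ
    # Deep copy so that overrides never mutate DEFAULTS or the caller's dict.
    config = copy.deepcopy(deep_merge(DEFAULTS, file_values))
    for section, values in config.items():
        if not isinstance(values, dict):
            continue
        for key, current in values.items():
            env_name = f"{prefix}{section}_{key}".upper()
            if env_name in environ:
                values[key] = convert(environ[env_name], current)
    return config
```

Using the type of the current value to drive conversion keeps the environment layer free of a separate schema, at the cost of not being able to override keys that have no default or file value.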

4. Storage Management (`lifetrace_backend/storage.py`)

Database management and storage services:

DatabaseManager: SQLite database management
Transaction management support
Automatic database migration
Connection pool management
Data cleanup and maintenance
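
To make the transaction management point concrete, here is a minimal sketch using the standard `sqlite3` module directly. The project lists SQLAlchemy as its ORM, so the real `DatabaseManager` will look different; the `screenshots` table and its columns below are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(connection):
    """Commit on success, roll back if the body raises."""
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise


def init_schema(connection):
    # Hypothetical minimal table; the real Screenshot model has its own columns.
    with transaction(connection) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS screenshots ("
            "id INTEGER PRIMARY KEY, "
            "file_path TEXT UNIQUE NOT NULL, "
            "created_at TEXT NOT NULL)"
        )


def add_screenshot(connection, file_path, created_at):
    """Insert one record and return its row id."""
    with transaction(connection) as conn:
        cursor = conn.execute(
            "INSERT INTO screenshots (file_path, created_at) VALUES (?, ?)",
            (file_path, created_at),
        )
        return cursor.lastrowid
```

A failed insert (for example a duplicate `file_path`) rolls back and re-raises, so callers never observe a half-applied write.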

5. OCR Processing Module (`lifetrace_backend/simple_ocr.py`)

Image text recognition service:

SimpleOCRProcessor: Text recognition based on RapidOCR
Supports multiple image formats
Batch processing capability
Result caching mechanism
Integration with vector services

6. Vector Services

6.1 Text Vector Service (`lifetrace_backend/vector_service.py`)

VectorService: Text semantic search service
Text embedding based on sentence-transformers
ChromaDB vector database storage
Supports reranking
Automatic synchronization mechanism
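
The core of semantic search is ranking stored vectors by similarity to a query vector. The sketch below shows that idea in plain Python with hand-written vectors; in LifeTrace the embeddings come from sentence-transformers and the nearest-neighbor lookup is delegated to ChromaDB, so the function names here are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=3):
    """Rank (doc_id, vector) pairs by similarity to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A reranking step, as mentioned above, would take the `top_k` candidates and rescore them with a slower but more accurate model before returning results.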

6.2 Multimodal Vector Service (`lifetrace_backend/multimodal_vector_service.py`)

MultimodalVectorService: Image + text joint search
Multimodal embedding based on CLIP model
Separate text and image vector storage
Weight fusion search algorithm
Cross-modal semantic understanding
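
One plausible reading of "weight fusion" is a weighted sum of the per-item text and image similarity scores. The sketch below assumes exactly that, using the 0.6 and 0.4 weights from the default configuration; the actual algorithm in `multimodal_vector_service.py` may differ, and `fuse_scores` is a hypothetical name.

```python
def fuse_scores(text_scores, image_scores, text_weight=0.6, image_weight=0.4):
    """Combine per-item text and image similarity scores into one ranking.

    Items missing from one modality contribute 0.0 for that modality.
    Returns (item_id, fused_score) pairs sorted best first.
    """
    fused = {}
    for item_id in set(text_scores) | set(image_scores):
        fused[item_id] = (
            text_weight * text_scores.get(item_id, 0.0)
            + image_weight * image_scores.get(item_id, 0.0)
        )
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
```

For example, an item scoring 0.2 on text and 0.9 on image ends up at 0.6 × 0.2 + 0.4 × 0.9 = 0.48, which still ranks below a text-only match of 0.9 (fused score 0.54).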

7. File Processing Module (`lifetrace_backend/processor.py`)

File system monitoring and processing:

FileProcessor: File monitoring and processing
ScreenshotHandler: Screenshot file event handling
Asynchronous processing queue
File change monitoring
Batch processing optimization
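
The combination of a processing queue and batch processing can be sketched with the standard library alone. In this illustration a file-event handler (for instance a watchdog `FileSystemEventHandler.on_created` callback) would put new screenshot paths on the queue; the function names and batch size are assumptions, not the actual `processor.py` code.

```python
import queue
import threading


def drain_batch(task_queue, max_batch=8, timeout=0.5):
    """Wait for the first task, then collect up to max_batch without waiting."""
    try:
        batch = [task_queue.get(timeout=timeout)]
    except queue.Empty:
        return []
    while len(batch) < max_batch:
        try:
            batch.append(task_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def worker(task_queue, handle_batch, stop_event):
    """Process batches until asked to stop and the queue is empty."""
    while not (stop_event.is_set() and task_queue.empty()):
        batch = drain_batch(task_queue)
        if batch:
            handle_batch(batch)


def start_worker(task_queue, handle_batch):
    """Run the worker in a background thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=worker, args=(task_queue, handle_batch, stop_event), daemon=True
    )
    thread.start()
    return thread, stop_event
```

Blocking only for the first item keeps latency low when screenshots arrive one at a time, while bursts are still handed to the OCR step in groups.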

8. Screen Recording Module (`lifetrace_backend/recorder.py`)

Automatic screenshot functionality:

ScreenRecorder: Screen recording management
Multi-screen support
Intelligent deduplication mechanism
Configurable screenshot interval
Active window information acquisition
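
The simplest form of screenshot deduplication is to skip a frame whose bytes are identical to the previous frame from the same screen. The sketch below shows that baseline; how the "intelligent" mechanism in `recorder.py` actually compares frames (for example with a perceptual hash) is not stated here, and `Deduplicator` is a hypothetical name.

```python
import hashlib


def content_hash(image_bytes):
    """Return a hex digest identifying the exact byte content."""
    return hashlib.sha256(image_bytes).hexdigest()


class Deduplicator:
    """Remember the last hash per screen and report whether a frame is new."""

    def __init__(self):
        self._last_hash = {}

    def is_new(self, screen_id, image_bytes):
        digest = content_hash(image_bytes)
        if self._last_hash.get(screen_id) == digest:
            return False  # identical to the previous frame on this screen
        self._last_hash[screen_id] = digest
        return True
```

Tracking the hash per screen matters for multi-screen setups: an unchanged secondary display should not block saving a changed primary display.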

9. Utility Module (`lifetrace_backend/utils.py`)

Common utility functions:

Log configuration management
File hash calculation
Active window information acquisition
Cross-platform compatibility
File cleanup tools

Data Flow Architecture

Screenshot → File Monitor → OCR Process → Vector → Storage
    ↓          ↓           ↓          ↓       ↓
Scheduled   File Events  Text Extract Embedding Database
    ↓          ↓           ↓          ↓       ↓
Multi-screen Queue Process RapidOCR   CLIP    SQLite
                                      ↓       ↓
                                   Vector DB
                                  (ChromaDB)

Search Architecture

User Query
    ↓
┌─────────────┬─────────────┬─────────────┐
│Keyword      │ Semantic    │Multimodal   │
│Search       │ Search      │Search       │
├─────────────┼─────────────┼─────────────┤
│SQL LIKE     │Vector       │Image-Text   │
│Full-text    │Similarity   │Fusion       │
│Exact Match  │Semantic     │CLIP Model   │
│             │Understanding│Cross-modal  │
└─────────────┴─────────────┴─────────────┘
    ↓            ↓            ↓
Result Ranking → Reranking → Weight Fusion
    ↓
Unified Result Format
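
For the keyword branch of the diagram above, a `LIKE` query over the OCR text is enough to show the idea. This is a sketch against a hypothetical `ocr_results(screenshot_id, text)` table using `sqlite3` directly; the real schema and query in LifeTrace may differ.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def keyword_search(connection, term, limit=20):
    """Return (screenshot_id, text) rows whose OCR text contains `term`."""
    pattern = f"%{escape_like(term)}%"
    rows = connection.execute(
        "SELECT screenshot_id, text FROM ocr_results "
        "WHERE text LIKE ? ESCAPE '\\' "
        "ORDER BY screenshot_id LIMIT ?",
        (pattern, limit),
    )
    return rows.fetchall()
```

Escaping `%` and `_` keeps a query such as `100%` from matching every row, and passing the pattern as a bound parameter avoids SQL injection.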

Technology Stack

Backend Core

FastAPI: Web framework and API service
SQLAlchemy: ORM and database operations
SQLite: Main database
ChromaDB: Vector database

AI/ML Components

RapidOCR: Text recognition engine
sentence-transformers: Text embedding models
CLIP: Multimodal embedding model
transformers: Transformer model library

System Tools

Pillow: Image processing
watchdog: File system monitoring
psutil: System information acquisition
pydantic: Data validation

Deployment and Configuration

Environment Requirements

Python 3.8+
Supported OS: Windows, macOS, Linux
Optional: CUDA support (for GPU acceleration)

Install Dependencies

pip install -r requirements.txt

Configuration File

Main configuration file: config/default_config.yaml

server:
  host: 127.0.0.1
  port: 8840
  debug: false

vector_db:
  enabled: true
  collection_name: "lifetrace_ocr"
  embedding_model: "shibing624/text2vec-base-chinese"
  rerank_model: "BAAI/bge-reranker-base"
  persist_directory: "vector_db"

multimodal:
  enabled: true
  text_weight: 0.6
  image_weight: 0.4

Starting Services

Start All Services

python start_all_services.py

Start Web Service Only

python -m lifetrace_backend.server --port 8840

Start Individual Services

# Start recorder
python -m lifetrace_backend.recorder

# Start processor
python -m lifetrace_backend.processor

# Start OCR service
python -m lifetrace_backend.simple_ocr

API Documentation

After starting the service, access API documentation at:

Swagger UI: http://localhost:8840/docs
ReDoc: http://localhost:8840/redoc

Development Guide

Project Structure

LifeTrace/
├── lifetrace_backend/      # Core modules
│   ├── server.py           # Web API service
│   ├── models.py           # Data models
│   ├── config.py           # Configuration management
│   ├── storage.py          # Storage management
│   ├── simple_ocr.py       # OCR processing
│   ├── vector_service.py   # Vector service
│   ├── multimodal_*.py     # Multimodal services
│   ├── processor.py        # File processing
│   ├── recorder.py         # Screen recording
│   └── utils.py            # Utility functions
├── config/                 # Configuration files
├── doc/                    # Documentation
├── data/                   # Data directory
├── logs/                   # Log directory
└── requirements.txt        # Dependencies

Extension Development

Add new search algorithms: Extend vector_service.py
Support new OCR engines: Modify simple_ocr.py
Add new API endpoints: Extend server.py
Custom data models: Modify models.py

Performance Optimization

Vector Database Optimization

Regular index rebuilding
Batch insert optimization
Memory usage monitoring

OCR Processing Optimization

Image preprocessing
Batch processing
Result caching

Search Performance Optimization

Result pagination
Query caching
Index optimization

Monitoring and Maintenance

Log Management

Log files: logs/lifetrace_YYYYMMDD.log
Log levels: DEBUG, INFO, WARNING, ERROR
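
A date-stamped log file of the form shown above can be produced with the standard `logging` module. The following is a minimal sketch; the logger name `lifetrace`, the message format, and the helper names are assumptions rather than the project's actual logging setup.

```python
import logging
from datetime import date
from pathlib import Path


def daily_log_path(log_dir="logs", today=None):
    """Build the path logs/lifetrace_YYYYMMDD.log for the given day."""
    today = today or date.today()
    return Path(log_dir) / f"lifetrace_{today:%Y%m%d}.log"


def configure_logging(level="INFO", log_dir="logs"):
    """Attach a file handler for today's log file and set the log level."""
    path = daily_log_path(log_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("lifetrace")
    logger.setLevel(getattr(logging, level.upper()))
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Note that this picks the file name once at startup; a long-running service that must roll over at midnight would need a rotating handler instead.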

Database Maintenance

Regular cleanup of old data
Database backup
Index rebuilding

System Monitoring

Service health check: GET /api/health
System statistics: GET /api/statistics
Queue status: GET /api/queue/status

Troubleshooting

Common Issues

Vector database initialization failure
- Check ChromaDB dependency installation
- Verify data directory permissions

Poor OCR recognition quality
- Adjust image preprocessing parameters
- Check RapidOCR model files

Multimodal search unavailable
- Install CLIP-related dependencies
- Check model download status

Debug Mode

python -m lifetrace_backend.server --debug

Contributing

Fork the project
Create a feature branch
Commit changes
Create a Pull Request

License

This project is licensed under the MIT License.

Documentation

For detailed documentation, refer to the doc/ directory.
