AI Skin Doctor - Project Documentation

Overview

AI Skin Doctor is an interactive Streamlit-based application that combines speech recognition, computer vision, and text-to-speech technologies to provide medical advice for skin conditions. Users can record their voice describing their symptoms and upload an image of their skin condition, and the AI will analyze the information and provide a voice response.

Project Structure

LLMProject/
├── src/
│   ├── app.py                    # Main Streamlit application
│   ├── llm_brain.py              # Image analysis with Groq LLM
│   ├── patient_voice.py          # Speech-to-text transcription
│   ├── doctors_voice.py          # Text-to-speech generation
│   ├── utils.py                  # Utility functions
│   └── main.py                   # Entry point
├── test/                         # Test files
├── doc/                          # Documentation
├── audio_records/
│   ├── inputs/                   # User audio recordings
│   └── outputs/                  # Generated doctor responses
├── images/                       # Uploaded images
├── requirements.txt              # Python dependencies
├── pyproject.toml               # Project configuration
├── .env                         # Environment variables (not in repo)
└── README.md                    # Project overview

Features

1. Voice Input

Record audio through web browser
Automatic audio format conversion (to MP3)
Speech-to-text transcription using Groq's Whisper model

2. Image Analysis

Support for JPG, JPEG, and PNG formats
Medical image analysis using Meta's Llama 4 Scout vision model
Base64 encoding for secure image transmission

3. AI Response

Context-aware medical advice
Natural language responses
Differential diagnosis suggestions

4. Voice Output

Text-to-speech conversion using ElevenLabs
Natural-sounding voice responses
Audio playback in browser

Technology Stack

Core Technologies

Python 3.10+: Programming language
Streamlit: Web application framework
Groq API: LLM and STT services
ElevenLabs API: Text-to-speech services

Key Libraries

groq==0.15.0: Groq API client
streamlit: Web UI framework
SpeechRecognition: Audio processing
pydub: Audio format conversion
elevenlabs: Text-to-speech API
python-dotenv: Environment variable management
gtts: Google Text-to-Speech (alternative)

AI Models Used

Whisper Large V3: Speech-to-text transcription
Llama 4 Scout 17B: Multimodal image analysis
ElevenLabs Multilingual V2: Natural voice synthesis

Installation

Prerequisites

Python 3.10 or higher
FFmpeg (for audio processing)
API Keys:
- Groq API Key
- ElevenLabs API Key

Step-by-Step Installation

Clone the repository

git clone <repository-url>
cd LLMProject

Create virtual environment (recommended)

python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate

Install dependencies
```
pip install -r requirements.txt
```
Install FFmpeg
- Windows: Download from ffmpeg.org and add to PATH
- Mac: brew install ffmpeg
- Linux: sudo apt-get install ffmpeg

Set up environment variables Create a .env file in the root directory:

GROQ_API_KEY=your_groq_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Configuration

Environment Variables

Variable	Description	Required	Default
`GROQ_API_KEY`	Groq API authentication key	Yes	None
`ELEVENLABS_API_KEY`	ElevenLabs API authentication key	Yes	None

Model Configuration

Models can be configured in the respective Python files:

llm_brain.py:

model = "meta-llama/llama-4-scout-17b-16e-instruct"

patient_voice.py:

stt_model = "whisper-large-v3"

doctors_voice.py:

voice_id = "21m00Tcm4TlvDq8ikWAM"  # Rachel voice
model_id = "eleven_multilingual_v2"

System Prompt Customization

The system prompt in app.py can be modified to change the AI's behavior:

system_prompt = """You have to act as a professional doctor..."""

Usage

Running the Application

Navigate to the src directory
```
cd src
```
Start the Streamlit app
```
streamlit run app.py
```
Access the application
- Open your browser to http://localhost:8501

User Workflow

Record Audio: Click the audio input button and describe your skin condition
Upload Image: Select an image file showing your skin condition
Submit: The app processes both inputs automatically
View Results:
- Transcribed text of your audio
- AI doctor's response
- Audio playback of the response

Example Use Case

User Audio: "I have red bumps on my face that are painful and won't go away"
User Image: [Photo of facial acne]

AI Response: "With what I see, I think you have inflammatory acne. 
              Try using benzoyl peroxide and consider consulting a 
              dermatologist if symptoms persist."

API Documentation

Module: llm_brain.py

`encode_image(image_path: str) -> str`

Encodes an image file to base64 format.

Parameters:

image_path (str): Path to the image file

Returns:

str: Base64 encoded image string

Raises:

FileNotFoundError: If image file doesn't exist
Exception: For other encoding errors

Example:

encoded = encode_image("path/to/image.jpg")

`analyze_image_with_query(query: str, model: str, encoded_image: str) -> str`

Analyzes an image using Groq's multimodal LLM.

Parameters:

query (str): Text prompt for analysis
model (str): Model identifier
encoded_image (str): Base64 encoded image

Returns:

str: AI-generated analysis

Raises:

Exception: If API call fails

Example:

response = analyze_image_with_query(
    query="What skin condition is this?",
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    encoded_image=encoded_img
)

Module: patient_voice.py

`convert_audio_to_mp3(input_file: str) -> str`

Converts audio file to MP3 format.

Parameters:

input_file (str): Path to input audio file

Returns:

str: Path to converted MP3 file

Raises:

ValueError: If no audio file provided
FileNotFoundError: If FFmpeg not found
Exception: For conversion errors

`transcribe_with_groq(audio_wav_file: str, stt_model: str) -> str`

Transcribes audio to text using Groq's Whisper model.

Parameters:

audio_wav_file (str): Path to audio file
stt_model (str): Model identifier

Returns:

str: Transcribed text

Raises:

ValueError: If no audio file provided
Exception: For transcription errors

Example:

text = transcribe_with_groq("audio.wav", "whisper-large-v3")

Module: doctors_voice.py

`text_to_speech_with_elevenlabs(input_text: str, output_filepath: str) -> str`

Converts text to speech using ElevenLabs API.

Parameters:

input_text (str): Text to convert
output_filepath (str): Path to save audio file

Returns:

str: Path to saved audio file

Example:

audio_path = text_to_speech_with_elevenlabs(
    "Hello patient",
    "output.mp3"
)

Module: utils.py

`createDirIfNotExists(directory_path: str) -> Path`

Creates directory if it doesn't exist.

Parameters:

directory_path (str): Path to create

Returns:

Path: Path object of created directory

Example:

from pathlib import Path
dir_path = createDirIfNotExists("audio_records/inputs")

Module: app.py

`process_inputs(audio_filepath: str, image_filepath: str) -> tuple`

Main processing function that orchestrates the entire workflow.

Parameters:

audio_filepath (str): Path to user's audio recording
image_filepath (str): Path to uploaded image

Returns:

tuple: (transcribed_text, doctor_response, audio_response_path)

Raises:

FileNotFoundError: If files not found
Exception: For processing errors

Architecture

System Flow

User Input (Audio + Image)
    ↓
[Streamlit UI (app.py)]
    ↓
[File Storage] → audio_records/inputs/, images/
    ↓
[Audio Processing] → patient_voice.py
    ├── Format Conversion (MP3)
    └── Speech-to-Text (Groq Whisper)
    ↓
[Image Processing] → llm_brain.py
    ├── Base64 Encoding
    └── Image Analysis (Llama 4 Scout)
    ↓
[Response Generation] → doctors_voice.py
    └── Text-to-Speech (ElevenLabs)
    ↓
[Output Storage] → audio_records/outputs/
    ↓
[Display Results] → Streamlit UI

Data Flow

Input Phase:
- User records audio via browser
- User uploads image file
- Files saved to local storage
Processing Phase:
- Audio converted to MP3
- Audio transcribed to text
- Image encoded to base64
- Combined query sent to Llama model
Response Phase:
- AI generates text response
- Text converted to speech
- Audio file saved
Output Phase:
- Display transcription
- Display AI response
- Play audio response

Error Handling

Exception Types

The application implements comprehensive error handling:

File Errors:

FileNotFoundError: Missing audio/image files
IOError: File read/write errors

API Errors:

ValueError: Missing API keys
Exception: API request failures

Processing Errors:

Exception: Transcription/analysis failures

Error Recovery

Missing Files: User-friendly error messages displayed in UI
API Failures: Logged with detailed error messages
Processing Errors: Graceful degradation with informative feedback

Logging

The application uses Python's logging module:

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

Log Levels:

INFO: Normal operation events
ERROR: Error conditions

Log Locations:

Console output (stdout/stderr)
Can be configured to file output

Best Practices

For Developers

API Keys: Never commit .env file to version control
Testing: Test audio/image inputs before deployment
Error Handling: Always wrap API calls in try-except blocks
Logging: Use appropriate log levels
Path Handling: Use Path objects for cross-platform compatibility

For Users

Audio Quality: Record in quiet environment
Image Quality: Use clear, well-lit photos
File Formats: Use supported formats (MP3, WAV for audio; JPG, PNG for images)
Privacy: Don't share real medical images without consent

Troubleshooting

Common Issues

1. FFmpeg Not Found

Error: FFmpeg or the input file was not found
Solution: Install FFmpeg and add to system PATH

2. API Key Missing

ValueError: GROQ_API_KEY not found in environment variables
Solution: Create .env file with valid API keys

3. Audio Recording Issues

Solution: Check browser permissions for microphone access

4. Image Upload Fails

Solution: Ensure file size < 200MB and correct format

Contributing

Development Setup

Fork the repository
Create a feature branch
Make changes with tests
Submit pull request

Code Style

Follow PEP 8 guidelines
Add docstrings to functions
Include type hints where appropriate
Write descriptive commit messages

License

This project is for educational purposes. Consult license file for details.

Disclaimer

⚠️ Medical Disclaimer: This application is for educational purposes only and should not replace professional medical advice. Always consult with qualified healthcare providers for medical concerns.

Support

For issues or questions:

Open an issue on GitHub
Check existing documentation
Review error logs

Last Updated: February 2026 Version: 1.0.0

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

AI Skin Doctor - Project Documentation

Overview

Table of Contents

Project Structure

Features

1. Voice Input

2. Image Analysis

3. AI Response

4. Voice Output

Technology Stack

Core Technologies

Key Libraries

AI Models Used

Installation

Prerequisites

Step-by-Step Installation

Configuration

Environment Variables

Model Configuration

System Prompt Customization

Usage

Running the Application

User Workflow

Example Use Case

API Documentation

Module: llm_brain.py

encode_image(image_path: str) -> str

analyze_image_with_query(query: str, model: str, encoded_image: str) -> str

Module: patient_voice.py

convert_audio_to_mp3(input_file: str) -> str

transcribe_with_groq(audio_wav_file: str, stt_model: str) -> str

Module: doctors_voice.py

text_to_speech_with_elevenlabs(input_text: str, output_filepath: str) -> str

Module: utils.py

createDirIfNotExists(directory_path: str) -> Path

Module: app.py

process_inputs(audio_filepath: str, image_filepath: str) -> tuple

Architecture

System Flow

Data Flow

Error Handling

Exception Types

Error Recovery

Logging

Best Practices

For Developers

For Users

Troubleshooting

Common Issues

Contributing

Development Setup

Code Style

License

Disclaimer

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

`encode_image(image_path: str) -> str`

`analyze_image_with_query(query: str, model: str, encoded_image: str) -> str`

`convert_audio_to_mp3(input_file: str) -> str`

`transcribe_with_groq(audio_wav_file: str, stt_model: str) -> str`

`text_to_speech_with_elevenlabs(input_text: str, output_filepath: str) -> str`

`createDirIfNotExists(directory_path: str) -> Path`

`process_inputs(audio_filepath: str, image_filepath: str) -> tuple`

Packages