Capture and GPT Assistant

This project provides a tool to capture a selected area of the screen, perform OCR (Optical Character Recognition) on the captured image, and then use the OpenAI GPT API to get intelligent responses based on the extracted text.

Prerequisites

Python 3.x
pip (Python package installer)

Installation

Clone the repository:

git clone https://github.com/your-repo/capture-gpt-assistant.git
cd capture-gpt-assistant

Set up a virtual environment:

python3 -m venv myenv
source myenv/bin/activate  # On Windows, use `myenv\Scripts\activate`

Install the dependencies:

pip install pytesseract pillow openai screeninfo python-dotenv flask

Install Tkinter:
- Windows: Tkinter is usually included with Python on Windows. No additional installation should be needed.
- macOS: Tkinter is also included with Python on macOS, but if you encounter issues, you can install it via Homebrew:
```
brew install python-tk
```
- Linux (Debian/Ubuntu): Install Tkinter using apt-get:
```
sudo apt-get install python3-tk
```
Install Tesseract:
- On Debian/Ubuntu:
```
sudo apt-get install tesseract-ocr
```
- On macOS:
```
brew install tesseract
```
- On Windows: Download and install Tesseract from this link.
Set up the environment variables:

Create a .env file in the project directory and add the following environment variables:
```
OPENAI_API_KEY=your_openai_api_key
TESSERACT_CMD_PATH=/
```
Replace your_openai_api_key with your actual OpenAI API key.
Set up the Tesseract path:

Make sure the Tesseract executable is in your system's PATH. If not, update the path in the script:
```
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Update if necessary
```
Configure the OpenAI API key:

Obtain your API key from OpenAI and set it in the script:
```
openai.api_key = 'YOUR_API_KEY'
```

Usage

Run the script:
```
python main.py
```
Using the tool:
- Select Area: Click on "Select Area" to define the area of the screen to capture. Click and drag to create a rectangle over the area you want to capture.
- Capture and Get Response: After selecting the area, click on "Capture and Get Response" to capture the selected area, extract text using OCR, and get a response from GPT.

Customizing the Pre-Prompt

The script includes a pre-prompt to help the GPT model understand the context of the captured text. You can customize this pre-prompt to better suit your needs.

Locate the pre-prompt definition in the script:

pre_prompt = "You are a smart assistant. Answer the following questions clearly and concisely:\n\n"

Modify the pre-prompt to fit your specific context:

For example, if you're asking technical questions, you might change it to:
```
pre_prompt = "You are a technical expert. Provide detailed and accurate answers to the following questions:\n\n"
```
Save the script after making your changes.

By customizing the pre-prompt, you can guide the GPT model to provide more relevant and accurate responses based on the specific context of your captured text.

Script Overview

Main Functions:

capture_screen_area(x1, y1, x2, y2): Captures the specified area of the screen and saves it as an image.
ocr_image(image_path): Performs OCR on the captured image to extract text.
get_gpt_response(question): Sends the extracted text to the OpenAI GPT API to get a response.
select_area(): Opens a transparent window to allow the user to select an area of the screen.
on_area_selected(x1, y1, x2, y2): Callback function that stores the selected area.
capture_and_process(): Captures the stored area and processes the text using OCR and GPT.

Classes:

AreaSelector: Handles the selection of the screen area with a transparent overlay.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Capture and GPT Assistant

Prerequisites

Installation

Usage

Customizing the Pre-Prompt

Script Overview

Main Functions:

Classes:

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Capture and GPT Assistant

Prerequisites

Installation

Usage

Customizing the Pre-Prompt

Script Overview

Main Functions:

Classes:

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages