This project provides a tool to capture a selected area of the screen, perform OCR (Optical Character Recognition) on the captured image, and then use the OpenAI GPT API to get intelligent responses based on the extracted text.
- Python 3.x
- pip (Python package installer)

- Clone the repository:

      git clone https://github.com/your-repo/capture-gpt-assistant.git
      cd capture-gpt-assistant

- Set up a virtual environment:

      python3 -m venv myenv
      source myenv/bin/activate  # On Windows, use `myenv\Scripts\activate`

- Install the dependencies:

      pip install pytesseract pillow openai screeninfo python-dotenv flask
- Install Tkinter:
  - Windows: Tkinter is usually included with Python on Windows. No additional installation should be needed.
  - macOS: Tkinter is also included with Python on macOS, but if you encounter issues, you can install it via Homebrew:

        brew install python-tk

  - Linux (Debian/Ubuntu): Install Tkinter using apt-get:

        sudo apt-get install python3-tk

- Install Tesseract:
  - On Debian/Ubuntu:

        sudo apt-get install tesseract-ocr

  - On macOS:

        brew install tesseract

  - On Windows: Download and install Tesseract from this link.

- Set up the environment variables:
  Create a `.env` file in the project directory and add the following environment variables:

      OPENAI_API_KEY=your_openai_api_key
      TESSERACT_CMD_PATH=/

  Replace `your_openai_api_key` with your actual OpenAI API key.

- Set up the Tesseract path:
  Make sure the Tesseract executable is in your system's PATH. If not, update the path in the script:

      pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Update if necessary

- Configure the OpenAI API key:

  Obtain your API key from OpenAI and set it in the script:

      openai.api_key = 'YOUR_API_KEY'
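
The two values configured above can also be read from the environment instead of being hardcoded. The following is a minimal sketch, assuming the script takes `OPENAI_API_KEY` and `TESSERACT_CMD_PATH` from the process environment (for example after `load_dotenv()` from python-dotenv has loaded `.env`). The helper name `load_settings` is hypothetical and is not part of the project.

```python
import os
import shutil
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the API key and Tesseract path from environment-style settings.

    Hypothetical helper: the variable names come from the .env example,
    everything else is an illustrative assumption.
    """
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")

    # Prefer the explicit path; otherwise look for `tesseract` on PATH.
    tesseract_cmd = env.get("TESSERACT_CMD_PATH", "").strip() or shutil.which("tesseract")

    return {"api_key": api_key, "tesseract_cmd": tesseract_cmd}


if __name__ == "__main__":
    # With python-dotenv installed, calling load_dotenv() first would copy
    # the entries of .env into os.environ.
    try:
        settings = load_settings()
        print("Tesseract command:", settings["tesseract_cmd"])
    except RuntimeError as exc:
        print("Configuration error:", exc)
```

The returned values could then be assigned to `openai.api_key` and `pytesseract.pytesseract.tesseract_cmd`, which keeps the key out of the source file.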
- Run the script:

      python main.py

- Using the tool:
  - Select Area: Click on "Select Area" to define the area of the screen to capture. Click and drag to create a rectangle over the area you want to capture.
  - Capture and Get Response: After selecting the area, click on "Capture and Get Response" to capture the selected area, extract text using OCR, and get a response from GPT.
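
Because a click-and-drag can run in any direction, the start and end points are not necessarily the top-left and bottom-right corners. Below is a minimal sketch of normalizing the two points into `(x1, y1, x2, y2)` order, assuming the capture step expects top-left and bottom-right corners (as, for example, Pillow's `ImageGrab.grab(bbox=...)` does). The name `normalize_selection` and the `min_size` threshold are hypothetical and not taken from the project.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]


def normalize_selection(start: Tuple[int, int], end: Tuple[int, int],
                        min_size: int = 2) -> Optional[Box]:
    """Return (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2, or None if too small."""
    (sx, sy), (ex, ey) = start, end
    x1, x2 = sorted((sx, ex))
    y1, y2 = sorted((sy, ey))
    # Treat a bare click or a thin sliver as "no selection".
    if x2 - x1 < min_size or y2 - y1 < min_size:
        return None
    return (x1, y1, x2, y2)


# Dragging from bottom-right to top-left yields the same box as the reverse drag.
print(normalize_selection((400, 300), (100, 50)))  # (100, 50, 400, 300)
print(normalize_selection((10, 10), (10, 10)))     # None
```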
The script includes a pre-prompt to help the GPT model understand the context of the captured text. You can customize this pre-prompt to better suit your needs.
- Locate the pre-prompt definition in the script:

      pre_prompt = "You are a smart assistant. Answer the following questions clearly and concisely:\n\n"

- Modify the pre-prompt to fit your specific context. For example, if you're asking technical questions, you might change it to:

      pre_prompt = "You are a technical expert. Provide detailed and accurate answers to the following questions:\n\n"

- Save the script after making your changes.

By customizing the pre-prompt, you can guide the GPT model to provide more relevant and accurate responses based on the specific context of your captured text.
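
How the pre-prompt and the OCR output are combined is not shown here. One plausible approach, sketched below as an assumption, is to prepend the pre-prompt to the cleaned-up text before sending it. The function name `build_prompt` is hypothetical.

```python
pre_prompt = "You are a technical expert. Provide detailed and accurate answers to the following questions:\n\n"


def build_prompt(pre_prompt: str, ocr_text: str) -> str:
    """Prepend the pre-prompt to OCR text after tidying whitespace."""
    # OCR output often carries stray spaces and blank lines; drop them.
    lines = [line.strip() for line in ocr_text.splitlines()]
    cleaned = "\n".join(line for line in lines if line)
    if not cleaned:
        raise ValueError("no text was extracted from the capture")
    return pre_prompt + cleaned


print(build_prompt(pre_prompt, "  What is a mutex?  \n\n"))
```

Rejecting empty text up front avoids spending an API call on a capture that contained nothing readable.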
- capture_screen_area(x1, y1, x2, y2): Captures the specified area of the screen and saves it as an image.
- ocr_image(image_path): Performs OCR on the captured image to extract text.
- get_gpt_response(question): Sends the extracted text to the OpenAI GPT API to get a response.
- select_area(): Opens a transparent window to allow the user to select an area of the screen.
- on_area_selected(x1, y1, x2, y2): Callback function that stores the selected area.
- capture_and_process(): Captures the stored area and processes the text using OCR and GPT.
- AreaSelector: Handles the selection of the screen area with a transparent overlay.
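
To show how these pieces might fit together, here is a minimal sketch of the capture, OCR, and GPT flow. It is an assumption about the control flow, not the project's code: the three steps are passed in as callables so the sketch runs without a screen, Tesseract, or an API key, and the `process_area` name, the stub functions, and the idea that the capture step returns an image path are all hypothetical.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]


def process_area(area: Optional[Box],
                 capture: Callable[[int, int, int, int], str],
                 ocr: Callable[[str], str],
                 ask: Callable[[str], str]) -> str:
    """Run capture -> OCR -> GPT for a stored area, in the spirit of capture_and_process()."""
    if area is None:
        return "No area selected. Use \"Select Area\" first."
    image_path = capture(*area)      # e.g. capture_screen_area(x1, y1, x2, y2)
    text = ocr(image_path).strip()   # e.g. ocr_image(image_path)
    if not text:
        return "No text found in the captured area."
    return ask(text)                 # e.g. get_gpt_response(question)


# Stand-ins so the flow can be exercised without external services.
def fake_capture(x1, y1, x2, y2):
    return "capture.png"


def fake_ocr(image_path):
    return " What is OCR? \n"


def fake_ask(question):
    return f"(answer to: {question})"


print(process_area((0, 0, 200, 100), fake_capture, fake_ocr, fake_ask))
# (answer to: What is OCR?)
```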
This project is licensed under the MIT License.