Merged
89 changes: 89 additions & 0 deletions .dockerignore
@@ -0,0 +1,89 @@
# Git
.git
.gitignore
.gitattributes


# CI
.codeclimate.yml
.travis.yml
.taskcluster.yml

# Docker
docker-compose.yml
Dockerfile
.docker
.dockerignore

# Byte-compiled / optimized / DLL files
**/__pycache__/
**/*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Virtual environment
.env
.venv/
venv/

# PyCharm
.idea

# Python mode for VIM
.ropeproject
**/.ropeproject

# Vim swap files
**/*.swp

# VS Code
.vscode/
55 changes: 55 additions & 0 deletions Dockerfile
@@ -0,0 +1,55 @@
# -------------------------------------------------------
# 1. Base image
# -------------------------------------------------------
FROM python:3.12-slim AS base

# Prevent Python from writing .pyc files and using stdout buffering
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Allow slow downloads inside the image; on a local shell, also run: export UV_HTTP_TIMEOUT=120
ENV UV_HTTP_TIMEOUT=300
ENV PYTHONPATH="/app/src"



# -------------------------------------------------------
# 2. Install system dependencies
# -------------------------------------------------------
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*

# -------------------------------------------------------
# 3. Install uv (fast Python package manager)
# -------------------------------------------------------
RUN pip install --no-cache-dir uv

# -------------------------------------------------------
# 4. Copy project metadata first (layer caching)
# -------------------------------------------------------
WORKDIR /app
COPY pyproject.toml uv.lock ./

# -------------------------------------------------------
# 5. Install dependencies using uv
# -------------------------------------------------------
RUN uv sync
# Append --no-dev to skip development dependencies

# -------------------------------------------------------
# 6. Copy the actual application
# -------------------------------------------------------
COPY src/ ./src/

# -------------------------------------------------------
# 7. Expose API port and run app
# -------------------------------------------------------
# EXPOSE does not open the port or publish anything to the host.
# It documents for other developers that the container listens on port 5000.
# The port becomes reachable only when published with `docker run -p` (or `-P`).
# 5000 is Flask's default port.
EXPOSE 5000

# Run from the project root (/app); bind to all interfaces so the published port is reachable
CMD ["uv", "run", "flask", "--app", "src.app.main", "run", "--host=0.0.0.0", "--port=5000"]
56 changes: 51 additions & 5 deletions README.md
@@ -56,19 +56,30 @@ This automatically:

------------------------------------------------------------------------

## 🏃 3. Running the Transcription Script
## 🏃 3. Running the Transcription Script and Flask API

Run from the **root directory**:

``` bash
uv run src/app/convertor/service/transcription_service.py
source .env
uv run -m src.app.convertor.service.transcription
```

``` bash
uv run flask --app src.app.main run --debug
```

### Important

Running from the project root ensures that relative paths like
`data/inputs/...` resolve correctly.
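
A minimal sketch of why the working directory matters: a relative path is resolved against wherever the process was started, so the same string points at different files from the project root and from a subdirectory. The helper names and the file name `example.ogg` are hypothetical and used only for illustration.

```python
from pathlib import Path


def resolve_from_cwd(relative_path: str) -> Path:
    """Resolve a path against the current working directory, as open() would."""
    return (Path.cwd() / relative_path).resolve()


def resolve_from_root(project_root: Path, relative_path: str) -> Path:
    """Resolve a path against an explicit project root, regardless of the cwd."""
    return (Path(project_root) / relative_path).resolve()


if __name__ == "__main__":
    # "example.ogg" is a hypothetical file name used only for illustration.
    candidate = resolve_from_cwd("data/inputs/example.ogg")
    print(f"{candidate} exists: {candidate.exists()}")
```

Anchoring paths to an explicit root, as in `resolve_from_root`, is one way to make a script independent of the directory it is launched from.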

The `.env` file sets the Python path (`PYTHONPATH`):

``` bash
export PYTHONPATH=$(pwd)/src
```
This puts the `src` directory on `PYTHONPATH` so Python can find the project modules. For this project, `src` should be treated as the root of the package.
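
To illustrate the effect of `PYTHONPATH`, the sketch below checks whether a top-level package can be found with and without an extra directory on `sys.path` (which is where `PYTHONPATH` entries end up at interpreter start-up). The helper `can_import` is hypothetical and not part of this project.

```python
import importlib
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def can_import(module_name: str, extra_path: Optional[Path] = None) -> bool:
    """Return True if a top-level module is findable, optionally with extra_path on sys.path."""
    original = list(sys.path)
    try:
        if extra_path is not None:
            # Equivalent in effect to listing this directory in PYTHONPATH.
            sys.path.insert(0, str(extra_path))
        importlib.invalidate_caches()
        return importlib.util.find_spec(module_name) is not None
    finally:
        # Restore sys.path so the check has no lasting side effects.
        sys.path[:] = original


if __name__ == "__main__":
    # Run from the project root: "app" should only be found once src/ is on the path.
    print(can_import("app"))
    print(can_import("app", Path.cwd() / "src"))
```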

## 🧪 5. Running Tests (if applicable)

@@ -95,6 +106,43 @@ uv sync

------------------------------------------------------------------------

## 🚀 Docker


1. Create an image from the Dockerfile

``` bash
docker build -t myflaskapp .
```
2. Create, start, and attach to a Docker container

``` bash
docker run -t myflaskapp
```

3. To access Flask from localhost

Flask must listen on all interfaces, not just localhost, or it will be unreachable from your machine. The command used for local runs therefore needs an explicit host in the Dockerfile `CMD`:

``` bash
# ❌ Local command: Flask binds to 127.0.0.1 only
CMD ["uv", "run", "flask", "--app", "src.app.main", "run"]
# ✅ Required change: add the host
CMD ["uv", "run", "flask", "--app", "src.app.main", "run", "--host=0.0.0.0", "--port=5000"]

```

Then run the container and publish the container port on the host (port binding):
``` bash
docker run --rm -p <host_port>:<container_port> myflaskapp

docker run --rm -p 5001:5000 myflaskapp
```

The running Flask application is now reachable from a local browser at the published host port (5001 in the example above):

[http://localhost:5001/](http://localhost:5001/)
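
As a quick check that the port binding works, a small script can probe the published port from the host. This is a sketch using only the standard library; the helper `is_reachable` is hypothetical, and the URL assumes the `-p 5001:5000` mapping from the `docker run` example above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a 4xx/5xx status, so the port is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors.
        return False


if __name__ == "__main__":
    # Assumes the container was started with: docker run --rm -p 5001:5000 myflaskapp
    print(is_reachable("http://localhost:5001/"))
```

If this prints `False` while the container is running, the usual suspects are a missing `-p` mapping or Flask listening on `127.0.0.1` inside the container.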

------------------------------------------------------------------------

## ❗ Troubleshooting

### **FileNotFoundError for audio inputs**
@@ -112,6 +160,4 @@ Incorrect:
``` bash
cd src/app/convertor/service/
uv run transcription_service.py # ❌ breaks relative paths
```

------------------------------------------------------------------------
```
13 changes: 13 additions & 0 deletions pyproject.toml
@@ -15,3 +15,16 @@ dependencies = [
"ruff>=0.14.6",
"torch>=2.9.1",
]

# Add the PyTorch CPU index as an additional index
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
# explicit = true is recommended: the index is then used only for packages pinned to it (here, torch)
explicit = true

# Point torch at the pytorch-cpu index; only torch is resolved from it
[tool.uv.sources]
torch = [
{ index = "pytorch-cpu" }
]
Empty file added src/__init__.py
Empty file.
Empty file added src/app/__init__.py
Empty file.
Empty file added src/app/convertor/__init__.py
Empty file.
5 changes: 3 additions & 2 deletions src/app/convertor/service/convertor_service.py
@@ -1,12 +1,13 @@
from convertor.service.transcription import Transcription
from app.convertor.service.transcription import Transcription


class ConvertorService:

@classmethod
def create_text(cls):
# data_dir = "data"
input_file_name = "./convertor/service/data/inputs/5846093734223028963.ogg"
input_file_name = "./src/app/convertor/service/data/inputs/5846093734223028963.ogg"
# input_file_name = "./convertor/service/data/inputs/5846093734223028963.ogg"
# output_file_name = "./data/outputs/5846093734223028963"
model_id = "tiny"
show_text = True
1 change: 1 addition & 0 deletions src/app/convertor/service/transcription.py
@@ -98,6 +98,7 @@ def get_transcription(self):

result = model.transcribe(
self.input_file_name,
fp16=False
)

if self.show_text:
2 changes: 1 addition & 1 deletion src/app/main.py
@@ -1,5 +1,5 @@
from flask import Flask
from convertor.service.convertor_service import ConvertorService
from app.convertor.service.convertor_service import ConvertorService

app = Flask(__name__)
