Repository for the software developed as part of the thesis project:
Sistema de generación de corpus lingüísticos para el análisis del inglés como lengua extranjera en México (System for generating linguistic corpora for the analysis of English as a foreign language in Mexico)
Developed by Rodrigo Iván González Valenzuela
Supervised by Olivia Carolina Gutú Ocampo
You can cite this work as:
Rodrigo Iván González Valenzuela (2025) "Sistema de generación de corpus lingüísticos para el análisis del inglés como lengua extranjera en México", Master's thesis in Data Science (Maestría en Ciencia de Datos), Universidad de Sonora
MXESCO-DOCKER is a robust and modular application designed for processing audio files. It includes features for transcription, phonemization, metadata generation, and storage in a MongoDB database. Built with FastAPI, it leverages advanced libraries such as OpenAI's Whisper and Hugging Face's Wav2Vec2 for speech and phoneme recognition.
- Audio Transcription: Extracts text from audio files with word-level timestamps.
- Phonemization: Converts audio data into phonemes with detailed character offsets.
- Metadata Generation: Includes information about the transcriber model, phonemizer, and timestamps.
- Data Storage: Stores processed data and raw audio in MongoDB using GridFS.
- REST API: Exposes endpoints for uploading and processing audio files.
- Containerization: Dockerized for ease of deployment.
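To make the features above more concrete, the following is a minimal sketch of what one processed-audio record might look like before it is stored. All class and field names (`WordSpan`, `PhonemeSpan`, `AudioRecord`, `start_offset`, and so on) are hypothetical illustrations, not the schema this project actually uses; check `audio_processing.py` and `database.py` for the real structure.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class WordSpan:
    """One transcribed word with its start/end time in seconds."""
    word: str
    start: float
    end: float


@dataclass
class PhonemeSpan:
    """One recognized phoneme with its character offsets (hypothetical fields)."""
    phoneme: str
    start_offset: int
    end_offset: int


@dataclass
class AudioRecord:
    """Hypothetical shape of one processed-audio document."""
    filename: str
    transcriber_model: str
    phonemizer_model: str
    words: list = field(default_factory=list)
    phonemes: list = field(default_factory=list)
    # UTC timestamp of when the record was created, as an ISO 8601 string.
    processed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


record = AudioRecord(
    filename="example_audio.mp3",
    transcriber_model="whisper",   # placeholder label, not a real model id
    phonemizer_model="wav2vec2",   # placeholder label, not a real model id
    words=[WordSpan("hello", 0.0, 0.42)],
    phonemes=[PhonemeSpan("h", 0, 4)],
)

# asdict() converts nested dataclasses into plain dicts, which is the
# kind of structure a document database can store directly.
document = asdict(record)
print(json.dumps(document, indent=2))
```

Keeping the record as plain dictionaries and lists makes it straightforward to serialize to JSON or insert into a MongoDB collection.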
MXESCO-DOCKER/
├── app/
│ ├── routes/
│ │ ├── __init__.py
│ │ ├── audio_routes.py
│ ├── services/
│ │ ├── __init__.py
│ │ ├── audio_processing.py
│ │ ├── corpus_app.py
│ │ ├── database.py
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── phonemization.py
│ │ ├── timestamps.py
│ │ ├── transcription.py
│ ├── main.py
├── docker-compose.yml
├── Dockerfile
├── LICENSE
├── README.md
├── requirements.txt
- main.py: Entry point for the FastAPI application.
- audio_routes.py: Defines API endpoints for processing audio files.
- audio_processing.py: Handles transcription, phonemization, and metadata generation.
- database.py: Saves metadata and audio files to MongoDB.
- corpus_app.py: Processes word and phoneme data for enriched metadata.
- utils/: Utility functions for timestamps, transcription, and phonemization.
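One way to picture how word and phoneme data can be combined into enriched metadata is to attach each phoneme to the word whose time interval contains it. The sketch below is only an illustration of that idea, not the logic in `corpus_app.py`: the function name `attach_phonemes`, the dictionary keys, and the midpoint rule are all assumptions, and it presumes phoneme offsets have already been converted to seconds.

```python
from typing import Dict, List


def attach_phonemes(words: List[Dict], phonemes: List[Dict]) -> List[Dict]:
    """Return copies of `words`, each with a "phonemes" list.

    A phoneme is assigned to a word when the phoneme's midpoint falls
    inside the word's [start, end) interval (an assumed rule).
    """
    enriched = []
    for word in words:
        inside = [
            p["phoneme"]
            for p in phonemes
            if word["start"] <= (p["start"] + p["end"]) / 2 < word["end"]
        ]
        enriched.append({**word, "phonemes": inside})
    return enriched


# Made-up timings, in seconds, for illustration only.
words = [
    {"word": "hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.6, "end": 1.1},
]
phonemes = [
    {"phoneme": "h", "start": 0.0, "end": 0.1},
    {"phoneme": "ə", "start": 0.1, "end": 0.2},
    {"phoneme": "l", "start": 0.2, "end": 0.3},
    {"phoneme": "oʊ", "start": 0.3, "end": 0.5},
    {"phoneme": "w", "start": 0.6, "end": 0.7},
]

enriched = attach_phonemes(words, phonemes)
for item in enriched:
    print(item["word"], item["phonemes"])
```

Using the midpoint avoids assigning a phoneme to two words when its boundaries slightly overlap a word edge; phonemes that fall in the silence between words are simply left unassigned.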
- Docker and Docker Compose installed
- Python 3.9+
- Clone the repository:
git clone <repository_url>
cd mxesco-docker
- Build and run the Docker containers:
docker compose up --build
- The API will be available at http://localhost:8000.
- Install dependencies:
pip install -r requirements.txt
- Start the FastAPI server:
uvicorn app.main:app --reload
- Endpoint: /api/process-audio/
- Method: POST
- Description: Uploads an audio file for processing.
- Example Request:
curl -X POST "http://127.0.0.1:8000/api/process-audio/" -F "file=@example_audio.mp3"
- Response:
{ "status": "success", "message": "Audio processed and saved successfully." }
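The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl example: the URL and the form field name `file` come from that example, while the helper names `build_multipart` and `process_audio` and the `audio/mpeg` content type are assumptions made for illustration.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name: str, filename: str, payload: bytes,
                    content_type: str = "application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns a tuple: (body bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_audio(path: str,
                  url: str = "http://127.0.0.1:8000/api/process-audio/") -> dict:
    """POST the file at `path` to the endpoint and return the parsed JSON."""
    with open(path, "rb") as fh:
        body, header = build_multipart(
            "file", os.path.basename(path), fh.read(), "audio/mpeg"
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the API to be running locally):
# result = process_audio("example_audio.mp3")
# print(result["status"], result["message"])
```

Building the multipart body by hand keeps the example dependency-free; a third-party HTTP client would shorten it considerably if one is already installed.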
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
- FastAPI: Web framework for building APIs.
- PyTorch: For handling audio data and Wav2Vec2 model inference.
- Whisper: OpenAI's speech-to-text library.
- MongoDB & GridFS: For data persistence.
- Docker: For containerized deployment.
- Pydub: For audio file manipulation.
- Phonemizer: For generating phonemes from text.
docker-compose.yml:
- Defines two services:
  - app: The FastAPI application.
  - mongo: MongoDB database.
- Exposes ports 8000 for the application and 27017 for MongoDB.
To customize settings, modify the environment variables in the docker-compose.yml file or create a .env file.
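As an illustration of how environment-based settings are typically consumed on the Python side, here is a minimal sketch. The variable names `MONGO_URI`, `MONGO_DB`, and `API_PORT`, the default database name, and the `Settings` class are all hypothetical; the actual variables this project reads are defined in docker-compose.yml and the application code. The defaults only reuse the `mongo` service name and the ports listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical application settings."""
    mongo_uri: str
    mongo_db: str
    api_port: int


def load_settings(env=None) -> Settings:
    """Read settings from `env` (os.environ by default), with fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        # "mongo" is the Compose service name; 27017 is its exposed port.
        mongo_uri=env.get("MONGO_URI", "mongodb://mongo:27017"),
        mongo_db=env.get("MONGO_DB", "mxesco"),   # assumed default name
        api_port=int(env.get("API_PORT", "8000")),
    )


# Passing a plain dict makes the function easy to try without touching
# the real environment; unset keys fall back to the defaults.
settings = load_settings({"MONGO_DB": "corpus_dev"})
print(settings)
```

Reading every setting in one place means a value changed in docker-compose.yml or a .env file takes effect without editing the code that uses it.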
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch.
- Commit your changes.
- Submit a pull request.
- OpenAI for Whisper
- Hugging Face for Wav2Vec2
- MongoDB for efficient data handling
- Maestría en Ciencia de Datos, Universidad de Sonora (GitHub Repository)