llm-eval-on-test-data

A tool for evaluating and comparing large language model (LLM) performance on translation tasks using a Chinese-English test dataset.

Overview

This project evaluates various LLMs on translation tasks, scores their output with the COMET metric, and generates visualizations of the results.

Requirements

Rye package manager

Installation

Install dependencies:

rye sync

Configuration

Create a .env file with the required API keys and settings (see .env.example).
Prepare your test dataset or use the provided passage_pairs_test_dataset.json
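
As a rough illustration of the two configuration steps above, the sketch below reads a required setting from the environment and loads a JSON dataset. It is not the project's actual loader: the function names load_passage_pairs and require_env are hypothetical, and the assumption that the dataset is a top-level JSON list is a guess, so check passage_pairs_test_dataset.json and .env.example for the real layout and key names.

```python
import json
import os
from pathlib import Path


def load_passage_pairs(path):
    """Read a JSON dataset and return its entries.

    Assumption: the file holds a top-level JSON list of passage pairs.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list in {path}")
    return data


def require_env(name):
    """Return a setting from the environment, failing early if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required setting: {name}")
    return value
```

Failing early on a missing key tends to be friendlier than letting the first API call fail partway through an evaluation run.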

Usage

Run the evaluation script:

rye run python src/llm_eval_on_test_data/__init__.py

The script will:

Load test data from passage_pairs_test_dataset.json
Fetch translations using the configured LLMs via the translation_fetcher.py module
Store results in translations.db
Generate performance comparisons and visualizations using the plot.py module
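
The fetch-and-store steps above can be sketched as follows. This is a minimal illustration, not the code in translation_fetcher.py: the table schema, the open_store and run_models names, and the idea of passing each model as a plain callable are all assumptions made for the example. It uses an in-memory SQLite database by default so it can be run without touching translations.db.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database and create a (hypothetical) translations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS translations (
               model TEXT NOT NULL,
               source TEXT NOT NULL,
               translation TEXT NOT NULL,
               PRIMARY KEY (model, source)
           )"""
    )
    return conn


def run_models(conn, sources, translators):
    """Translate each source with each model, skipping rows already stored.

    translators maps a model name to a callable taking a source string and
    returning its translation (a stand-in for a real LLM API call).
    """
    for model, translate in translators.items():
        for source in sources:
            cached = conn.execute(
                "SELECT 1 FROM translations WHERE model = ? AND source = ?",
                (model, source),
            ).fetchone()
            if cached:
                continue  # avoid paying for the same request twice
            conn.execute(
                "INSERT INTO translations (model, source, translation) VALUES (?, ?, ?)",
                (model, source, translate(source)),
            )
    conn.commit()
```

Keying rows on the model and source pair means a rerun only requests translations that are not yet stored, which is one plausible reason to keep results in a database between runs.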

Results

Performance comparisons are visualized and saved as model_performance_comparison.png.
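
A comparison like this typically starts from one average score per model. The sketch below shows only that aggregation step, under the assumption that scores are available as (model, score) pairs; the function name mean_score_by_model is hypothetical, and the actual chart is produced by plot.py, which may work differently.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_model(rows):
    """Average the scores per model and rank models from best to worst.

    rows is an iterable of (model_name, score) pairs.
    """
    grouped = defaultdict(list)
    for model, score in rows:
        grouped[model].append(score)
    averages = [(model, mean(scores)) for model, scores in grouped.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)
```

The ranked list can then be passed to whatever plotting library is in use to draw one bar per model.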

Project Structure

src/llm_eval_on_test_data: Core source code
- __init__.py: Entry point
- translation_fetcher.py: Handles translation requests to LLMs
- plot.py: Generates visualizations of performance metrics

