From abeacda152f503bfdb1b95606949cd918be5cc9e Mon Sep 17 00:00:00 2001 From: Chaoyue He Date: Sun, 1 Jun 2025 23:57:08 +0800 Subject: [PATCH] Resolve README merge conflicts and add figures --- README.md | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 75 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 66bb962..ec1bab3 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,75 @@ -# ESGenius -The code and dataset are coming soon... +# ESGenius Benchmark + +This repository contains the dataset and evaluation code released with the **ESGenius** paper. The benchmark focuses on question answering within the domains of Environmental, Social and Governance (ESG) and sustainability. The full paper is provided as [`ESGenius_arxiv_v1.pdf`](ESGenius_arxiv_v1.pdf). + +## Dataset + +The dataset consists of 1,136 multiple-choice questions covering a range of ESG frameworks such as IPCC, GRI, SASB, ISO, IFRS/ISSB, TCFD and CDP. Each question provides four answer options (A–D) plus a fallback choice `Z` for "Not sure". The plain question set is stored in [`data/ESGenius_1136q.csv`](data/ESGenius_1136q.csv) and [`data/ESGenius_1136q.json`](data/ESGenius_1136q.json). A version with references is available as [`data/ESGenius_w_ref_1136q.csv`](data/ESGenius_w_ref_1136q.csv) and includes page numbers, document names and the original text snippet. + +Example header from the reference file: + +```csv +"query_id","new_id","query","answer","A","B","C","D","Z","ref_page","ref_doc","source_text" +``` +【F:data/ESGenius_w_ref_1136q.csv†L1-L1】 + +## Repository Layout + +- **`evaluation_utils.py`** – common utilities for loading data, building prompts, computing metrics and saving results. It also loads API keys from a `.env` file: + + ```python + load_dotenv() + HF_TOKEN = os.getenv("HF_TOKEN") + DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY") + DASHSCOPE_API_KEY = os.getenv("DASHSCOPE_API_KEY") + OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") + ``` + 【F:evaluation_utils.py†L36-L46】 + +- **`eval_opensource.py`** – evaluates open‑source Hugging Face models locally using GPU acceleration. +- **`eval_opensource_rag.py`** – adds a minimal retrieval‑augmented generation (RAG) step before generation. +- **`eval_qwen_api.py`** – evaluates Qwen models through the Dashscope API. +- **`index.html`** – interactive heatmap visualising results for 1,136 questions: + + ```html +

ESGenius_1136q - Interactive Evaluation Heatmap

+

+ This report visualizes the evaluation results for 50 models tested on 1136 questions from the ESGenius_1136q benchmark dataset. + ``` + 【F:index.html†L27-L35】 + +- **`figures/`** – images of word clouds and summary plots used in the paper. + +## Figures + +The following figures summarise the dataset and evaluation results: + +![Question word cloud](figures/ESGenius_QA_question_wordcloud.png) +![Answer option word cloud](figures/ESGenius_QA_option_wordcloud.png) +![Source text word cloud](figures/ESGenius_Source_Text_wordcloud.png) + +![Question distribution](figures/num_questions_distribution_pie.png) +![Page distribution](figures/pages_distribution_pie.png) + +![Main results](figures/main_results.png) +![Accuracy vs model size](figures/acc_vs_model_size.png) + +## Running Evaluations + +1. Install dependencies (PyTorch, Transformers, pandas, etc.). +2. Create a `.env` file with the required API tokens (`HF_TOKEN`, `DASHSCOPE_API_KEY`, etc.). +3. Run one of the evaluation scripts: + +```bash +python eval_opensource.py # evaluate local open‑source models +python eval_opensource_rag.py # evaluate with RAG retrieval +python eval_qwen_api.py # evaluate Qwen via Dashscope API +``` + +Results are written to the `results/` directory as Excel workbooks containing a summary sheet and detailed predictions. + +## License + +This project is licensed under the [Apache 2.0 License](LICENSE). + +For full details of the benchmark and experimental setup, please refer to the accompanying paper `ESGenius_arxiv_v1.pdf`.