E-Test-package

This artifact contains the environment, code, and data required to fully replicate the results of our ICSE 2026 paper.

📄 Paper Details

E-Test: E'er-Improving Test Suites
Accepted at the 48th International Conference on Software Engineering (ICSE 2026)

Authors: Ketai Qiu, Luca Di Grazia, Leonardo Mariani, and Mauro Pezzè.

🔗 Resources

Paper PDF: Read here
Source Code: GitHub Repository

Repository Structure

AutonomicTester: The Python application that implements E-Test.
DataAnalysis: A set of Jupyter notebooks that analyze the results and compute the evaluation metrics.
Archives: Tar archives containing the datasets of prompts and LLM responses.

Getting Started

Environment Setup

Create a Hugging Face user access token by following the instructions at https://huggingface.co/docs/hub/security-tokens.
Install Docker.
Run the following commands from the project root directory in a Unix-compatible shell (e.g., Bash or Zsh). You can build an image from scratch and then switch to another LLM by setting OLLAMA_MODEL to any model available on Ollama. To use a different LLM, you must also pass -e HUGGING_FACE_API_KEY="hf_xxxxxx" when starting the Docker container.

Step 1. Prepare the Docker image

# Choice 1: Pull the pre-built image from Docker Hub
docker pull ketaiq/e-test:v1-llama3-1b
docker tag ketaiq/e-test:v1-llama3-1b e-test-llama3-1b

# Choice 2: Load the pre-built image for Linux AMD64 platform
docker load -i e-test-llama3-1b-amd64.tar
docker tag e-test-llama3-1b-amd64 e-test-llama3-1b

# Choice 3: Build the image locally
docker build -f ./E-Test.Dockerfile -t e-test-llama3-1b .

# Maintainers only: build a multi-platform image and push it to Docker Hub
docker buildx build -f E-Test.Dockerfile --platform linux/amd64,linux/arm64 -t ketaiq/e-test:v1-llama3-1b --push .

Step 2. Run the Docker container

docker run -it --rm \
  -p 20268:8888 \
  -e OLLAMA_MODEL="llama3.2:1b" \
  -v "$(pwd)":/app \
  e-test-llama3-1b
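As an illustration of switching models, the sketch below reuses the Step 2 command with a different OLLAMA_MODEL and a Hugging Face token. The model tag llama3.2:3b and the token hf_xxxxxx are example placeholders, not values taken from the paper, and the run helper with its DRY_RUN switch is a hypothetical convenience added here so the command can be inspected before it is executed.

```shell
# Example values only (assumptions): any model tag available on Ollama should work.
MODEL="${MODEL:-llama3.2:3b}"
HF_TOKEN="${HF_TOKEN:-hf_xxxxxx}"

# Hypothetical helper: with DRY_RUN=1 (the default here) the command is only printed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same options as in Step 2, plus the Hugging Face token.
run docker run -it --rm \
  -p 20268:8888 \
  -e OLLAMA_MODEL="$MODEL" \
  -e HUGGING_FACE_API_KEY="$HF_TOKEN" \
  -v "$(pwd)":/app \
  e-test-llama3-1b
```

Set DRY_RUN=0 to actually start the container once the printed command looks right.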

Data Analysis

To reproduce the evaluation results shown in the paper, run the notebooks in the DataAnalysis folder. While the container is running, you can open http://localhost:20268 to run and edit the notebooks directly.

Dataset Stats.ipynb and GH Dataset Stats.ipynb compute dataset statistics, corresponding to the Dataset paragraph of Section 2.2 and Table 1 in the paper.
RQ1 Impact of LLMs.ipynb computes the evaluation metrics (precision, recall, and F1-score) for each scenario and the average F1-scores, corresponding to Section 3.1, Table 3, Figure 3, and Figure 4 in the paper.
RQ2 Comparative Evaluation.ipynb computes the evaluation metrics of two state-of-the-art approaches (FAST++ and Field-ready testing), corresponding to Section 3.2 and Table 3 in the paper.
RQ3 Impact of Queries.ipynb computes the evaluation metrics for different combinations of queries, corresponding to Section 3.3 and Figure 5 in the paper.
RQ4 Efficiency.ipynb measures the efficiency of E-Test in terms of response time and token consumption, corresponding to Section 3.4 and Figure 6 in the paper.
RQ5 Test Case Generation.ipynb analyzes the JUnit test cases generated by E-Test, corresponding to Section 3.5 and Figure 7 in the paper.
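
For reference, the precision, recall, and F1-score mentioned for the RQ1 notebook are usually defined per class as shown below. This is a sketch that assumes the notebooks follow these standard definitions, with each scenario c treated as a class and TP, FP, and FN denoting its true positives, false positives, and false negatives. How the F1-scores are averaged across scenarios is not restated here; see the notebook and the paper for that.

```latex
\[
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
\mathrm{F1}_c = \frac{2 \cdot \mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}
\]
```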

E-Test Program

In the Docker interactive shell, run the following commands to launch experiments. Results are written to the folder AutonomicTester/experiment_results. Test case generation additionally writes its output to the folder Defects4jDataset.

# Test Llama 3.2 1B with prompts generated from error-prone scenarios in Defects4J
python AutonomicTester/main.py prompt -v 4 -d Defects4J -m LLama3_2_1B -s BUGGY
# Test Llama 3.2 1B with prompts generated from need-test scenarios in Defects4J
python AutonomicTester/main.py prompt -v 4 -d Defects4J -m LLama3_2_1B -s FIXED
# Test Llama 3.2 1B with prompts generated from already-tested scenarios in Defects4J
python AutonomicTester/main.py prompt -v 4 -d Defects4J -m LLama3_2_1B -s SIMILAR

# Test Llama 3.2 1B with test case generation for error-prone scenarios in Defects4J
python AutonomicTester/main.py prompt -v 4 -d Defects4J -m LLama3_2_1B -s BUGGY -tcg
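
The three scenario runs above can also be scripted as a loop. The sketch below reuses exactly the flags shown in the commands above. The run helper and its DRY_RUN switch are hypothetical additions for previewing the commands and are not part of AutonomicTester.

```shell
# Hypothetical helper: with DRY_RUN=1 (the default here) each command is only printed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Iterate over the three scenarios used in the commands above.
for scenario in BUGGY FIXED SIMILAR; do
  run python AutonomicTester/main.py prompt -v 4 -d Defects4J -m LLama3_2_1B -s "$scenario" || break
done
```

Inside the container, set DRY_RUN=0 to execute the runs one after another. The loop stops at the first run that fails.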

For other settings mentioned in the paper, please check the help message via python AutonomicTester/main.py -h.

Run exit to stop the Docker container.
