EB-ReFine: Improving Suffix Prediction Accuracy for Long and Non-Pareto Event Logs through Evidence-Based Refinement

This repository provides the official PyTorch implementation of the paper EB-ReFine: Improving Suffix Prediction Accuracy for Long and Non-Pareto Event Logs through Evidence-Based Refinement.

Details

Predictive Process Monitoring (PPM) seeks to anticipate the future behavior of ongoing process instances, an area that has seen rapid progress with deep learning techniques. Yet, accurately predicting future activity sequences (suffix) remains difficult, particularly on long and Non-Pareto event logs. In long-tail suffix prediction, stepwise errors accumulate, significantly reducing sequence-level accuracy. This challenge is amplified in Non-Pareto logs, where activity frequencies are more evenly distributed and dominant behavioral patterns are absent, limiting statistical cues for generalization. To address these limitations, we propose EB-ReFine, a refinement framework that integrates deep learning with evidence-based refinement mechanism to improve suffix prediction, especially on long and Non-Pareto event logs. EB-ReFine employs a two-stage design: (1) fine-tune a BERT-based model to generate an initial suffix prediction; (2) this prediction is then refined by comparing it against frequency-indexed database(FDB) through the proposed evidence-based refinement mechanism. By combining learned predictions with historical evidence, EB-ReFine efficiently retrieves the most relevant historical trace to guide the final revision. Crucially, the refinement operates solely at inference, introducing no additional training overhead while yielding substantial accuracy gains. Extensive experiments demonstrate that EB-ReFine consistently outperforms state-of-the-art baselines by 10%–30% across diverse public benchmark datasets. Additional evaluations on synthetic datasets simulating varying noise types in a wastewater treatment process further confirm its robustness to data imperfections.

Datasets

We utilized six public event logs from various domains: BPI15_1, BPI17O, BPI20R, Helpdesk, MIP and SEPSIS. To assess noise resilience, we construct a synthetic dataset simulating wastewater treatment.

Environment Setup(python=3.9)

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install transformers
pip install -U scikit-learn scipy matplotlib
pip install pandas
pip install faiss-cpu
pip install jellyfish

Data preparation

from preprocessing.log_to_history import Log
Log(csv_log, TYPE)

Training

python train.py --csv_log helpdesk --type all --multi_gpu

Models

You can directly download our trained models via this link and put it under models/ for evaluation and testing.

Evaluation

python evaluation.py --csv_log helpdesk --type all --beta 0.8 --threshold 0.4

Test

In this notebook, you can input either:

prefix as IDs: "12 5 9"
prefix as activity names

Hyperparameters Sensitivity (beta × threshold)

On the Helpdesk dataset, the heatmap shows that performance is more sensitive to β than to the threshold θ: higher β (around 0.8–0.9) consistently yields better and more stable mean DL similarity. In contrast, θ has a relatively small effect within 0.1–0.8, while setting θ too high (0.9) noticeably degrades performance, likely due to overly conservative overriding. Therefore, we choose β = 0.8 and θ = 0.4 as the default configuration.

Inference Time Comparison

Run multiple iterations (2000 runs) and report: total time and avg time per run

Notebook: test.ipynb

Method	Total time	Avg time per run (s)
BERT	6.410728s	0.003205s
EB-ReFine	6.447285s	0.003224s

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
event_log		event_log
figures		figures
images		images
neural_network		neural_network
preprocessing		preprocessing
utility		utility
README.md		README.md
evaluation.py		evaluation.py
test.ipynb		test.ipynb
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EB-ReFine: Improving Suffix Prediction Accuracy for Long and Non-Pareto Event Logs through Evidence-Based Refinement

Details

Datasets

Environment Setup(python=3.9)

Data preparation

Training

Models

Evaluation

Test

Hyperparameters Sensitivity (beta × threshold)

Inference Time Comparison

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

EB-ReFine: Improving Suffix Prediction Accuracy for Long and Non-Pareto Event Logs through Evidence-Based Refinement

Details

Datasets

Environment Setup(python=3.9)

Data preparation

Training

Models

Evaluation

Test

Hyperparameters Sensitivity (beta × threshold)

Inference Time Comparison

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages