This repository contains the official implementation of the experiments presented in our paper.
It provides tools for:
- Generating dissimilarity-based datasets using prototypical signatures
- Training and testing writer-independent classifiers (SVM or SGD)
- Reproducing validation and test experiments reported in the paper
- Automatically producing prediction files and EER metrics for all datasets
This repository requires Python ≥ 3.8.
Install the required dependency:
pip install git+https://github.com/kdmoura/stream_hsv.git
Clone this repository:
git clone https://github.com/kdmoura/proto_hsv.git
cd proto_hsv
proto_hsv/
│
├── main_process.py # Main training/validation/test pipeline
├── prototype_model.py # Prototype model
├── reproduce.py # Full reproduction of paper results
├── util.py # Helper functions
└── README.md # This file
Experiments are performed on the three datasets used in the paper: GPDS-S, CEDAR, and MCYT.
Deep features must be extracted from each dataset with the feature extractor described in the paper. For each dataset:
- Preprocess the signatures as outlined in https://github.com/tallesbrito/contrastive_sigver.
- Extract features using the SigNet-S model available in the same repository.
- Save the resulting data as a single .npz file containing:
  - features: shape (samples, features)
  - y: writer ID
  - yforg: 1 for forgery, 0 otherwise
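Before running any experiment, it can help to confirm that a feature file matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `check_feature_file` and the demo file are hypothetical, and it assumes that `y` and `yforg` are one-dimensional arrays with one entry per sample.

```python
import os
import tempfile

import numpy as np

REQUIRED_KEYS = ("features", "y", "yforg")


def check_feature_file(path):
    """Load an .npz file and verify it holds the three expected arrays.

    Hypothetical helper: assumes y and yforg have one entry per sample.
    """
    with np.load(path) as data:
        missing = [key for key in REQUIRED_KEYS if key not in data.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        features, y, yforg = data["features"], data["y"], data["yforg"]

    if features.ndim != 2:
        raise ValueError("features must have shape (samples, features)")
    n_samples = features.shape[0]
    if y.shape != (n_samples,) or yforg.shape != (n_samples,):
        raise ValueError("y and yforg must each have one entry per sample")
    if not np.isin(yforg, (0, 1)).all():
        raise ValueError("yforg must contain only 0 (genuine) or 1 (forgery)")
    return features.shape


# Tiny synthetic file: 6 samples, 4-dimensional features, 2 writers.
rng = np.random.default_rng(0)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_features.npz")
np.savez(
    demo_path,
    features=rng.normal(size=(6, 4)),
    y=np.array([0, 0, 0, 1, 1, 1]),
    yforg=np.array([0, 0, 1, 0, 0, 1]),
)
demo_shape = check_feature_file(demo_path)
```

Running the same check on the real GPDS-S, CEDAR, and MCYT feature files should catch a missing array or a shape mismatch before a long training run starts.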
Use main_process.py to:
- compute prototypes
- generate dissimilarity training/validation/test data
- train the chosen classifier
- produce prediction files
- compute EER (global + user thresholds)
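To make the last step concrete, here is a rough sketch of how an equal error rate (EER) can be computed with a single global threshold versus writer-specific thresholds. It is not the repository's implementation, which may differ (for example by interpolating between thresholds). The function names and the demo scores are hypothetical, and the sketch assumes that a higher score means "more likely genuine".

```python
def eer(genuine_scores, forgery_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: a higher score means "more likely genuine".
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores) | set(forgery_scores)):
        # False rejection: genuine signatures scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: forgeries scored at or above the threshold.
        far = sum(s >= t for s in forgery_scores) / len(forgery_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def eer_global_threshold(scores_by_user):
    """Pool all writers and search for one shared threshold."""
    genuine = [s for g, _ in scores_by_user.values() for s in g]
    forgery = [s for _, f in scores_by_user.values() for s in f]
    return eer(genuine, forgery)


def eer_user_thresholds(scores_by_user):
    """Search a threshold per writer and average the per-writer EERs."""
    per_user = [eer(g, f) for g, f in scores_by_user.values()]
    return sum(per_user) / len(per_user)


# Hypothetical scores: writer ID -> (genuine scores, forgery scores).
demo_scores = {
    1: ([0.9, 0.8, 0.7], [0.4, 0.5, 0.6]),
    2: ([0.5, 0.4, 0.3], [0.0, 0.1, 0.2]),
}
global_eer = eer_global_threshold(demo_scores)  # about 0.333
user_eer = eer_user_thresholds(demo_scores)     # 0.0
```

In the demo, each writer is perfectly separable on their own, so the writer-specific EER is 0, while the score ranges of the two writers overlap and push the global-threshold EER to about 33%.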
Example (validation run):
python main_process.py \
--cluster-algo kmeans \
--n-clusters 150 \
--model-choice svm \
--dist-type poscentroid \
--f-pred-path /path/to/pred_val_folder \
--f-metric-path /path/to/metric_val_folder \
--input-feat-path /path/to/features.npz \
--dev-users 300 581 \
--perform-validation
Example (test run):
python main_process.py \
--cluster-algo kmeans \
--n-clusters 100 \
--model-choice sgd \
--dist-type poscentroid \
--f-pred-path /path/to/pred_test_folder \
--f-metric-path /path/to/metric_test_folder \
--input-feat-path /path/to/features.npz \
--exp-users 0 300 \
--dev-users 300 581
The complete reproduction of results (validation and test for all datasets and all model choices) is handled by:
python reproduce.py
This script:
- Creates the required folders: pred_test/, pred_val/, metric_test/, metric_val/
- Runs the validation protocol for all datasets
- Runs the test protocol with the selected best-k values
- Produces prediction files and metrics as reported in the paper
To use it, update the paths in reproduce.py:
gpdss_npz_path = "path/to/gpdss_features.npz"
mcyt_npz_path = "path/to/mcyt_features.npz"
cedar_npz_path = "path/to/cedar_features.npz"
The script is fully configured and can be run as-is, but it executes every experiment sequentially, which takes a significant amount of time. We recommend adapting it to run the experiments in parallel.
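One possible way to parallelize is to launch several main_process.py runs as separate processes. The sketch below is an illustration only and is not how reproduce.py works: the helpers `build_validation_command` and `run_all` are hypothetical, the flag values are copied from the validation example above (the `--dev-users` range may differ per dataset), and the grid of `--n-clusters` values is illustrative. It also assumes that concurrent runs can safely write into the same output folders, which should be verified first.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_validation_command(npz_path, n_clusters, model_choice, out_root):
    """Assemble one validation call using the flags from the example above."""
    return [
        sys.executable, "main_process.py",
        "--cluster-algo", "kmeans",
        "--n-clusters", str(n_clusters),
        "--model-choice", model_choice,
        "--dist-type", "poscentroid",
        "--f-pred-path", f"{out_root}/pred_val",
        "--f-metric-path", f"{out_root}/metric_val",
        "--input-feat-path", npz_path,
        "--dev-users", "300", "581",  # copied from the example; dataset-specific
        "--perform-validation",
    ]


def run_all(commands, max_workers=2, runner=subprocess.run):
    """Run each command as its own process, at most max_workers at a time.

    Threads are enough here because each one only waits on a child process.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cmd: runner(cmd, check=False), commands))
    return [result.returncode for result in results]


# Illustrative grid: two model choices and two prototype counts.
commands = [
    build_validation_command("path/to/gpdss_features.npz", k, model, "results")
    for model in ("svm", "sgd")
    for k in (100, 150)
]

# To launch the runs, call for example:
#     return_codes = run_all(commands, max_workers=4)
```

Set `max_workers` according to the available CPU cores and memory, since each run loads the feature file and trains its own classifier.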