The project is managed with uv. To use it, first install uv (https://docs.astral.sh/uv/getting-started/installation/).
To set up the environment, go to the root of the project and run the following command:
uv sync
This installs the correct Python version and all dependencies (specified in pyproject.toml) into a virtual environment (.venv).
To execute a Python program with uv, run uv run python <path to file>. For example, to run a Python file called main.py, execute:
uv run python main.py
The RAISE-1k dataset contains one incomplete image: r0bf7f938t.tif. The image can be downloaded, but opening it or loading it in code results in errors and/or crashes. The error is a TIFFFillStrip error and looks like the following:
TIFFFillStrip: Read error on strip 4899; got 18446744073705070833 bytes, expected 4396.
As a result, only 999 of the 1,000 RAISE-1k images can be used.
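If a CSV listing RAISE-1k images still contains the incomplete file, one option is to drop that row before running any script on it. The sketch below is a hypothetical helper, not part of the repository; it assumes the CSV has a header row and that the image path lives in a column called "image" (an assumed name, adjust it to the real header).

```python
import csv
from pathlib import PurePath

# File name of the incomplete RAISE-1k image described above.
CORRUPTED_IMAGES = {"r0bf7f938t.tif"}


def drop_corrupted_rows(in_path, out_path, path_column="image"):
    """Copy a CSV, skipping rows whose image file is known to be corrupted.

    `path_column` is a hypothetical column name; adjust it to the header
    used by your CSV files. Returns the number of rows kept.
    """
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Compare only the file name so relative and absolute paths both match.
            if PurePath(row[path_column]).name in CORRUPTED_IMAGES:
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

Matching on the file name alone keeps the check independent of where the dataset is stored on disk.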
To compute the predicted scores for a set of images, place them under a directory, list them in a CSV file, and run the following command:
uv run python -m spai infer \
--input <input_csv> \
--output <output_dir> \
--batch-size 8
where:
input_csv: the path to the CSV file listing all the input images. For the original data, the CSV files are in the directory data. For the modified datasets, the CSV files are in the directories data_with_filter, data_sr, meme-generatorV3/msc-dl2/data_with_memes, and instagram_simulation/screenshot_simulation/.
output_dir: the directory where the CSV file with the predictions is written. For the original data, the output directory was inference. For the modified datasets, the output directories were filter_inference, inference_sr, meme_inference, and ss_inference.
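Running inference on every dataset means repeating the command once per CSV file. The sketch below builds those commands; it is a hypothetical helper, not part of the repository. It assumes each input directory listed above pairs with the output directory of matching name (for example data_sr with inference_sr), and that every *.csv file directly inside an input directory should be processed.

```python
import shlex
from pathlib import Path

# Input directory -> output directory, as listed above (the pairing is
# assumed from the directory names).
DIR_PAIRS = {
    "data": "inference",
    "data_with_filter": "filter_inference",
    "data_sr": "inference_sr",
    "meme-generatorV3/msc-dl2/data_with_memes": "meme_inference",
    "instagram_simulation/screenshot_simulation": "ss_inference",
}


def build_infer_commands(root=".", batch_size=8):
    """Return one `spai infer` argv list per CSV file found in each input directory."""
    commands = []
    for in_dir, out_dir in DIR_PAIRS.items():
        # Only top-level CSV files are picked up; missing directories yield nothing.
        for csv_path in sorted((Path(root) / in_dir).glob("*.csv")):
            commands.append([
                "uv", "run", "python", "-m", "spai", "infer",
                "--input", str(csv_path),
                "--output", str(Path(root) / out_dir),
                "--batch-size", str(batch_size),
            ])
    return commands


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to execute them.
    for cmd in build_infer_commands():
        print(shlex.join(cmd))
```

Returning argument lists rather than running them directly makes it easy to inspect the planned runs first.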
To compute the average AUC of a fake image set over several real image sets, run the following command:
uv run python evaluate.py --metric auc --input_dir <input_dir>
where:
input_dir: the directory containing all the CSV files produced by inference. For the original data, the input directory was inference. For the modified datasets, the input directories were filter_inference, inference_sr, meme_inference, and ss_inference.
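For intuition about what "average AUC of a fake set over several real sets" means, the sketch below pairs one fake set with each real set, computes a ROC AUC per pair, and averages the results. It is an illustration, not the code in evaluate.py: it assumes a higher predicted score means "more likely fake", and uses the Mann-Whitney formulation of AUC (quadratic in set size, so only suitable for small examples).

```python
from statistics import mean


def auc(real_scores, fake_scores):
    """ROC AUC with fake as the positive class, via the Mann-Whitney U statistic.

    Equals the probability that a random fake image scores higher than a random
    real image, counting ties as one half.
    """
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def average_auc(fake_scores, real_sets):
    """Average the AUC of one fake set against each real set in `real_sets`."""
    return mean(auc(real, fake_scores) for real in real_sets.values())


# Fake scores [0.9, 0.8] separate perfectly from set "a" (AUC 1.0) and win
# 3 of 4 pairs against set "b" (AUC 0.75), so the average is 0.875.
print(average_auc([0.9, 0.8], {"a": [0.1, 0.2], "b": [0.85, 0.1]}))
```

For real-sized sets, a library implementation such as scikit-learn's roc_auc_score computes the same quantity efficiently.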
To generate Instagram screenshot data, first go to the instagram_simulation directory and then run the following for each required CSV file:
python generate_posts.py --csv_path <./data/input_csv> --avatars_dir ./assets/avatars --output_dir ./screenshot_simulation
where:
input_csv: the path to the CSV file, under the data directory, listing all the input images.
To create the Instagram filter data, run the following command:
python instagram_filter.py filter --input_csv <./data/input_csv>
To simulate meme filters, first go to the meme-generatorV3/msc-dl2/ directory, then run the following:
python meme-python.py <./data/input_csv>
To modify images by applying super-resolution, run the following command:
python super_resolution.py \
-f <./data/input_csv> \
--root_dir data \
-o <output directory> \
--batch_size=1
where:
input_csv: the path to the CSV file, under the data directory, listing all the input images.
After downloading the latent_diffusion_trainingset and COCO, we used the following command to create a CSV file listing all images:
python -m spai.tools.modified_create_dmid_ldm_train_val_csv \
--train_dir "./datasets/latent_diffusion_trainingset/train" \
--val_dir "./datasets/latent_diffusion_trainingset/valid" \
--coco_dir "./datasets/COCO" \
-o "./datasets/ldm_train_val.csv"
To reduce the training set to 17,997 real and 17,997 generated images, we first ran reduce_training_set.py under the datasets directory and then ran the following command from the main directory:
python ./datasets/training_dataset.py trainset --input_csv ./datasets/reduced_training_data.csv
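The reduction to an equal number of real and generated images is a class-balanced random subsample. The sketch below shows that idea only; it is not the logic of reduce_training_set.py, which is not described here. The label column name and the fixed seed are assumptions for illustration.

```python
import random
from collections import defaultdict


def balanced_subsample(rows, label_key, per_class, seed=0):
    """Randomly keep `per_class` rows for every label value (sampling without replacement)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed makes the reduction reproducible
    reduced = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < per_class:
            raise ValueError(f"label {label!r} has only {len(group)} rows")
        reduced.extend(rng.sample(group, per_class))
    return reduced
```

With rows read via csv.DictReader, a call such as balanced_subsample(rows, "class", 17997) would keep 17,997 rows per label, where "class" is a hypothetical column name.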
To create the embeddings for the training and validation data, use the create_multi_embedding.py script.
The training embeddings are created with the following command:
python create_multi_embedding.py \
--dataPath datasets/reduced_training_data.csv \
--batchSize 64 \
--root_dir datasets \
--data_split train \
--embedding_file train_embeddings.pkl
The validation embeddings are created with the following command:
python create_multi_embedding.py \
--dataPath datasets/reduced_training_data.csv \
--batchSize 64 \
--root_dir datasets \
--data_split val \
--embedding_file embeddings/val_embeddings.pkl
To create the embeddings for the test datasets, use the create_multi_embedding.py script again. The following command is an example of how to create the embeddings of the COCO dataset:
python create_multi_embedding.py \
--dataPath data/real_coco.csv \
--batchSize 32 \
--root_dir data \
--data_split test \
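--embedding_file coco_test_embeddings.pkl
To create the embeddings for a different test dataset, change the --dataPath argument (and --embedding_file, so that earlier outputs are not overwritten).
To train the model, run the following command: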
python train.py \
--lr 3e-6 \
--batch_size 256 \
--epochs 50 \
--model_name SplitMLP
To create the predictions, the embeddings must already be created and stored in the folder specified by --eval_data_path. The predictions are stored per input file in the directory specified by --output_dir.
python eval.py \
--checkpoint_path checkpoints/SplitMLP-v1.ckpt \
--eval_data_path data/test_data_modifications/sr \
--output_dir eval_splitmlp_sr