This repository provides the code implementing ActionPiece, as described in our ICML 2025 Spotlight paper "ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation".
Unlike existing generative recommendation (GR) models that tokenize each action independently, we propose ActionPiece, a method that explicitly incorporates context into action sequence tokenization. In ActionPiece, each action is initially represented as a set of item features. Using a corpus of action sequences, we construct a vocabulary by merging frequently co-occurring feature patterns, both within individual sets and across adjacent sets.
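To make the merging idea concrete, here is a toy, BPE-style sketch of vocabulary construction over feature sets, counting candidate pairs both within a set and across adjacent sets. This is an illustrative simplification and not the repository's implementation (see `genrec/models/ActionPiece/core.py` below); in particular, rewriting pairs that span adjacent sets is elided here.

```python
from collections import Counter
from itertools import combinations

def count_pairs(corpus):
    """corpus: list of action sequences; each action is a frozenset of feature tokens."""
    counts = Counter()
    for seq in corpus:
        for action in seq:  # co-occurrence within a single feature set
            for a, b in combinations(sorted(action), 2):
                counts[(a, b)] += 1
        for prev, nxt in zip(seq, seq[1:]):  # co-occurrence across adjacent sets
            for a in prev:
                for b in nxt:
                    counts[tuple(sorted((a, b)))] += 1
    return counts

def build_vocab(corpus, n_merges, next_token):
    """Greedily merge the most frequent pair, recording each merge rule."""
    merges = []
    for _ in range(n_merges):
        counts = count_pairs(corpus)
        if not counts:
            break
        (a, b), _ = counts.most_common(1)[0]
        merges.append((next_token, a, b))  # cf. the [-1, u, v] entries cached by the repo
        # Simplification: only rewrite pairs co-occurring inside one set; the
        # real algorithm also rewrites pairs spanning adjacent sets.
        corpus = [[frozenset({next_token} | (s - {a, b})) if {a, b} <= s else s
                   for s in seq] for seq in corpus]
        next_token += 1
    return merges

toy_corpus = [[frozenset({1, 2, 3}), frozenset({1, 2, 4})]]
print(build_vocab(toy_corpus, n_merges=2, next_token=100))
```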
To train and evaluate ActionPiece on a dataset (e.g., Sports_and_Outdoors), run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py \
    --category=Sports_and_Outdoors
```

Note that:

- The datasets will be automatically downloaded once the `category` argument is specified.
- All hyperparameters can be specified via command-line arguments (see the sketch after this list for the typical override order). Please refer to:
  - `genrec/default.yaml`
  - `genrec/datasets/AmazonReviews2014/config.yaml`
  - `genrec/models/ActionPiece/config.yaml`
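Below is a hedged sketch of that layered-override pattern. The YAML paths follow the list above, but the merging helper here is hypothetical; genrec's actual config-resolution logic may differ.

```python
import argparse
import yaml  # requires pyyaml

def load_config(yaml_paths, cli_overrides):
    """Merge YAML files in order (later files override earlier ones),
    then apply command-line overrides on top."""
    config = {}
    for path in yaml_paths:
        with open(path) as f:
            config.update(yaml.safe_load(f) or {})
    config.update(cli_overrides)  # command-line flags take highest precedence
    return config

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float)
parser.add_argument("--weight_decay", type=float)
args, _ = parser.parse_known_args()
overrides = {k: v for k, v in vars(args).items() if v is not None}
# Hypothetical usage with the config files listed above:
# config = load_config(["genrec/default.yaml",
#                       "genrec/datasets/AmazonReviews2014/config.yaml",
#                       "genrec/models/ActionPiece/config.yaml"], overrides)
```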
Below are the scripts to reproduce the results reported in our paper.
```bash
for seed in {2024..2028}; do
    CUDA_VISIBLE_DEVICES=0 python main.py \
        --category=Sports_and_Outdoors \
        --rand_seed=${seed} \
        --weight_decay=0.15 \
        --lr=0.005 \
        --n_hash_buckets=128
done
```

```bash
for seed in {2024..2028}; do
    CUDA_VISIBLE_DEVICES=0 python main.py \
        --category=Beauty \
        --rand_seed=${seed} \
        --weight_decay=0.15 \
        --lr=0.001 \
        --n_hash_buckets=64
done
```

```bash
for seed in {2024..2028}; do
    CUDA_VISIBLE_DEVICES=0 python main.py \
        --category=CDs_and_Vinyl \
        --rand_seed=${seed} \
        --weight_decay=0.07 \
        --lr=0.001 \
        --d_model=256 \
        --d_ff=2048 \
        --n_hash_buckets=256
done
```

The core pieces of the implementation can be found at:

- Vocabulary construction: `genrec/models/ActionPiece/core.py` - `ActionPieceCore.train()`
- Set permutation regularization (SPR, see the sketch after this list) during
  - training: `genrec/models/ActionPiece/tokenizer.py` - `ActionPieceTokenizer.collate_fn_train()`
  - inference: `genrec/models/ActionPiece/tokenizer.py` - `ActionPieceTokenizer.collate_fn_test()` and `genrec/models/ActionPiece/model.py` - `ActionPiece.generate()`
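Since the features within each set are unordered, SPR randomly permutes how each set is serialized, so one action sequence yields many equally valid token sequences; at inference, several permuted views can be scored and ensembled. Below is a toy sketch of this idea only; the function names are made up, and in the actual pipeline (see the pointers above) the permuted sequence is subsequently tokenized with the learned merges.

```python
import random

def permuted_view(action_seq, rng):
    """Serialize each (unordered) feature set in a random order."""
    tokens = []
    for feature_set in action_seq:
        feats = list(feature_set)
        rng.shuffle(feats)  # any order is an equally valid serialization
        tokens.extend(feats)
    return tokens

def inference_views(action_seq, n_views=5, seed=0):
    """Draw several permuted serializations; model scores can then be
    ensembled (e.g., averaged) across the views."""
    rng = random.Random(seed)
    return [permuted_view(action_seq, rng) for _ in range(n_views)]

seq = [frozenset({1, 2, 3}), frozenset({4, 5})]
print(inference_views(seq, n_views=2))
```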
After the first run on each dataset, the provided code automatically constructs and caches the tokenizer vocabulary. The cached tokenizer follows a structure like:

```
{
    "n_categories": 5,
    "n_init_feats": 1153,
    "token2feat": [
        [-1, -1],  # placeholder
        [0, 0], [0, 1], ...,
        [1, 0], [1, 1], ...,
        [4, 126], [4, 127],  # [a, b] denotes the b-th choice of each item's a-th feature
        [-1, 363, 763], [-1, 269, 515], ...,
        [-1, 241, 1040], [-1, 30314, 39998]  # [-1, u, v] denotes tokens u and v are merged into a new token
    ],
    "priority": [
        0, 0, ...,
        723.6, 670.04, ...
    ]
}
```
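As one illustration of how this table can be used, a token id can be unfolded back into its initial item features by recursively following the `[-1, u, v]` merge entries. This is a minimal sketch assuming the structure above, not a function from the repo:

```python
def expand_token(token_id, token2feat):
    """Recursively unfold a token into its initial (feature_id, value) pairs.
    Token 0 is a placeholder and is never expanded."""
    entry = token2feat[token_id]
    if entry[0] != -1:          # [a, b]: the b-th choice of feature field a
        return [tuple(entry)]
    _, u, v = entry             # [-1, u, v]: tokens u and v were merged
    return expand_token(u, token2feat) + expand_token(v, token2feat)

# Toy table: tokens 1-4 are initial features; 5 merges (1, 2); 6 merges (5, 3).
token2feat = [[-1, -1], [0, 0], [0, 1], [1, 0], [1, 1], [-1, 1, 2], [-1, 5, 3]]
print(expand_token(6, token2feat))  # -> [(0, 0), (0, 1), (1, 0)]
```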
Please cite the following paper if you find our code, processed datasets, or tokenizers helpful.

```bibtex
@inproceedings{hou2025actionpiece,
  title={{ActionPiece}: Contextually Tokenizing Action Sequences for Generative Recommendation},
  author={Yupeng Hou and Jianmo Ni and Zhankui He and Noveen Sachdeva and Wang-Cheng Kang and Ed H. Chi and Julian McAuley and Derek Zhiyuan Cheng},
  booktitle={{ICML}},
  year={2025}
}
```
Copyright 2025 Google LLC
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.
