GPTQ for RWKV #98

Draft
3outeille wants to merge 20 commits into BlinkDL:main from 3outeille:quantize

Conversation

@3outeille

This is a work in progress and serves as the main thread for any questions related to this topic.

@3outeille

3outeille commented Apr 19, 2023

Author

@BlinkDL Do I have to quantize blocks.1.att.* as well? (I am thinking of the key, value, and receptance weights.)

@BlinkDL

BlinkDL commented Apr 20, 2023

Owner

@3outeille Yes, do it for all matrix weights (ignore time_xxx).
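
As a rough illustration of the rule above (quantize every matrix weight, leave the time_xxx parameters alone), here is a minimal sketch that filters parameter names by shape and name. The function `select_quantizable` and the example names and shapes are hypothetical, chosen only for illustration; they are not taken from the PR. Whether the embedding and output-head matrices should also be quantized is not settled in this thread, so the example only lists per-block parameters.

```python
from typing import Dict, List, Tuple


def select_quantizable(shapes: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Return the names of 2-D weight matrices, skipping time_* parameters."""
    selected = []
    for name, shape in shapes.items():
        if "time_" in name:
            # time_mix_*, time_decay, time_first, ... stay in full precision
            continue
        if len(shape) != 2:
            # 1-D tensors (e.g. LayerNorm weights) are not matrices
            continue
        selected.append(name)
    return selected


# Hypothetical parameter names and shapes, for illustration only.
example = {
    "blocks.1.att.key.weight": (768, 768),
    "blocks.1.att.value.weight": (768, 768),
    "blocks.1.att.receptance.weight": (768, 768),
    "blocks.1.att.time_decay": (768,),
    "blocks.1.att.time_mix_k": (1, 1, 768),
    "blocks.1.ln1.weight": (768,),
    "blocks.1.ffn.key.weight": (3072, 768),
}

for name in select_quantizable(example):
    print(name)
```

With a real checkpoint, the same filter could be applied to the shapes of the tensors in the model's state dict.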

@3outeille

Author

@BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?

@BlinkDL

BlinkDL commented Apr 25, 2023

Owner

> @BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?

Use the LAMBADA perplexity (ppl) from https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark.py
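
For readers unfamiliar with the metric, perplexity is the exponential of the negative mean log-probability that the model assigns to the target items. The sketch below shows that generic definition only; it is not the code from benchmark.py, and the function name `perplexity` and the choice of averaging over the supplied items are assumptions made for illustration.

```python
import math
from typing import Sequence


def perplexity(log_probs: Sequence[float]) -> float:
    """exp(-mean(log p)) over the natural-log probabilities of the targets."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


# If the model gives every target a probability of 0.5,
# the perplexity is approximately 2 (as uncertain as a fair coin flip).
print(perplexity([math.log(0.5)] * 4))
```

Lower is better: a quantized model whose perplexity stays close to the full-precision baseline has lost little predictive quality.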

@meditans

Question: would we expect a large improvement in perplexity if we did quantization-aware training (QAT)?

@3outeille

3outeille commented Apr 27, 2023

Author

@meditans QAT would probably yield a large improvement, but it implies re-training your model, whereas GPTQ is a post-training quantization strategy (no re-training involved).
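
To make "post-training" concrete, the sketch below quantizes an already-trained weight vector to 4 bits with plain round-to-nearest and then dequantizes it, with no gradient steps involved. This is deliberately not GPTQ: GPTQ additionally uses calibration data to compensate for the rounding error, which this example omits. The function name `quantize_rtn` and the sample weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_rtn(weights: Sequence[float], bits: int = 4) -> Tuple[List[int], List[float]]:
    """Asymmetric round-to-nearest quantization of one weight vector.

    Returns the integer codes and the dequantized approximation.
    """
    qmax = 2 ** bits - 1                      # largest code, 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized


# Hypothetical trained weights, for illustration only.
weights = [-1.0, -0.2, 0.0, 0.35, 1.0]
codes, approx = quantize_rtn(weights, bits=4)
print(codes)
print(approx)
```

Each reconstructed weight differs from the original by at most half a quantization step; QAT would instead let training adapt the weights to that rounding.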

3outeille force-pushed the quantize branch 3 times, most recently from f4584b4 to 76d937b (April 28, 2023 20:29)

@BlinkDL

BlinkDL commented May 8, 2023

Owner

How's it going? :) Are you on Discord?

@3outeille

Author

Yep, I sent a message on Discord in the quantization channel.

@Evilran

Evilran commented May 19, 2023

Hi. Is it available now?

@3outeille

Author

@Evilran Hi, making it work with ChatRWKV is too much of a hassle because it requires changing the RWKV class too much, so the PR will not be accepted. However, I made it work with the HuggingFace version of RWKV, if you want: https://github.com/3outeille/GPTQ-for-RWKV

4 participants