Commit 1d77b03

Update on "Use unfused SDPA for short sequences (q_len <= 128 or kv_len <= 128)"
ATT Differential Revision: [D96044308](https://our.internmc.facebook.com/intern/diff/D96044308/) [ghstack-poisoned]
2 parents 8e6f0c3 + 2809edd commit 1d77b03

1 file changed

Lines changed: 4 additions & 3 deletions

.ci/scripts/test_lora.sh

@@ -137,12 +137,13 @@ EXPECTED_QUANT_PREFIX="<|im_start|>user Calculate 15% of 80?<|im_end|><|im_start
 Okay, so I need to calculate 15% of 80."
 EXPECTED_QUANT_LORA_PREFIX="
 <|im_start|>user Calculate 15% of 80?<|im_end|><|im_start|>assistant
-To calculate 15% of 80, we can multiply 80 by 15/100 and then simplify the fraction.
-So, 15% of 80 is equal to (80 * 15) / 100 = 1200 / 100 = 12.
+To calculate 15% of 80, we can multiply 80 by 15/100.
+So, 15% of 80 is equal to 80 * 15/100 = 12.
 #### 12
 The answer is: 12<|im_end|>"
 
 
+
 # Export Quantized PTE, PTD file, no LoRA.
 # override base.lora_config=null to avoid creating a lora model
 # and loading lora weights.
@@ -202,7 +203,7 @@ fi
 NOW=$(date +"%H:%M:%S")
 echo "Test 4: Quantized, program-data separation lora. Starting to run llama runner at ${NOW}"
 # shellcheck source=/dev/null
-cmake-out/examples/models/llama/llama_main --model_path=qwen_lora_math_q.pte --data_paths="qwen_foundation_q.ptd,qwen_lora_math_q.ptd" --prompt="${PROMPT}" ${RUNTIME_ARGS} --seq_len=104 > result.txt
+cmake-out/examples/models/llama/llama_main --model_path=qwen_lora_math_q.pte --data_paths="qwen_foundation_q.ptd,qwen_lora_math_q.ptd" --prompt="${PROMPT}" ${RUNTIME_ARGS} > result.txt
 NOW=$(date +"%H:%M:%S")
 echo "Finished at ${NOW}"
 
