I found that a masked vslideup.vx with a large slide offset uses the wrong mask window on Ara with VLEN = 4096.
The testcase is:
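li t0, 300
vsetvli x0, t0, e8, m1        # vl = 300, SEW = 8, LMUL = 1
li t1, 0xaa
vmv.v.x v1, t1                # destination: every element = 0xaa
vid.v v2                      # source: element i = i
la t2, mask_data
vlm.v v0, (t2)                # load the mask bits into v0
li t3, 256
vslideup.vx v1, v2, t3, v0.t  # masked slide up by 256 elements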
with:
- vl = 300
- destination v1 preinitialized to 0xaa
- source v2 initialized with vid.v, so element i contains i (truncated to 8 bits at SEW = 8; only source elements 0..43 are read here)
- slide offset (stride) = 256
- mask_data keeps mask bits 0..255 inactive and mask bits 256..299 active
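For reference, the mask_data layout described above can be derived with a small Python sketch. It assumes the standard RVV mask layout (mask bit i sits in byte i / 8 at bit position i % 8) and that vlm.v transfers ceil(vl / 8) bytes; the helper name mask_bytes is only illustrative and is not taken from the attached testcase.

```python
def mask_bytes(vl, active_from):
    """Pack mask bits for vlm.v: bit i lives in byte i // 8 at position i % 8."""
    nbytes = (vl + 7) // 8  # vlm.v transfers ceil(vl / 8) bytes
    data = bytearray(nbytes)
    for i in range(active_from, vl):
        data[i // 8] |= 1 << (i % 8)
    return bytes(data)


# Bits 0..255 inactive, bits 256..299 active, as in this report.
MASK_DATA = mask_bytes(vl=300, active_from=256)
print(len(MASK_DATA), MASK_DATA[32:].hex())  # 38 ffffffffff0f
```

So the first 32 bytes are zero, followed by five 0xff bytes and a final 0x0f.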
So the expected architectural behavior is:
- elements 0..255 lie below the slide offset, are never written, and should remain 0xaa
- elements in the post-stride region (256..299) should use mask bits starting at index 256, i.e. destination element i is controlled by mask bit i
- the testcase probes elements 255, 256, 257, and 299
Spike reports:
mem 0x0000000080021400 0x000000aa
mem 0x0000000080021404 0x00000000
mem 0x0000000080021408 0x00000001
mem 0x000000008002140c 0x0000002b
mem 0x0000000080021410 0x0000000000000000 # vstart
mem 0x0000000080021418 0x0000000000000000 # exit code
This matches the expected vslideup result:
- element 255 stays 0xaa
- element 256 becomes source element 0
- element 257 becomes source element 1
- element 299 becomes source element 43 (0x2b)
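The expected values can also be cross-checked with a minimal Python reference model of a masked vslideup. This is a sketch under stated assumptions, not Spike's implementation: it assumes inactive and below-offset elements keep their old value (which is all the probed elements need), and it models the registers as plain lists of VLMAX = 512 bytes.

```python
def vslideup_masked(vd, vs2, offset, mask, vl):
    """vd[i] = vs2[i - offset] for offset <= i < vl where mask[i] is set.

    All other elements keep their previous value (assumed undisturbed).
    """
    out = list(vd)
    for i in range(offset, vl):
        if mask[i]:  # destination element i is gated by mask bit i
            out[i] = vs2[i - offset]
    return out


VLMAX = 512  # VLEN = 4096, SEW = 8, LMUL = 1
VL, OFFSET = 300, 256
vd = [0xAA] * VLMAX
vs2 = [i & 0xFF for i in range(VLMAX)]  # vid.v pattern; only 0..43 are read
mask = [OFFSET <= i < VL for i in range(VLMAX)]

result = vslideup_masked(vd, vs2, OFFSET, mask, VL)
probes = {i: result[i] for i in (255, 256, 257, 299)}
print({i: hex(v) for i, v in probes.items()})
# {255: '0xaa', 256: '0x0', 257: '0x1', 299: '0x2b'}
```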
Ara completes the testcase and exits successfully, but the probed bytes differ:
mem 0x0000000080021400 0x00000000000000aa
mem 0x0000000080021404 0x00000000000000aa
mem 0x0000000080021408 0x00000000000000aa
mem 0x000000008002140c 0x00000000000000aa
mem 0x0000000080021410 0x0000000000000000 # vstart
mem 0x0000000080021418 0x0000000000000000 # exit code
Ara leaves elements 256, 257, and 299 at their old value 0xaa instead of updating them. This looks like the post-stride mask bits are not being selected correctly for a large vslideup offset.
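One hypothetical failure mode that would produce exactly this output is sketched below in Python: if destination element i were gated by mask bit i - offset instead of mask bit i, the hardware would consult mask bits 0..43, which are all inactive here. This is only an illustration of the suspicion above. It is not derived from Ara's RTL, and other causes (for example, the instruction writing nothing at all) would give the same probe values.

```python
def slideup(vd, vs2, offset, mask, vl, mask_index):
    """Masked slide-up where mask_index(i) picks the mask bit for element i."""
    out = list(vd)
    for i in range(offset, vl):
        if mask[mask_index(i)]:
            out[i] = vs2[i - offset]
    return out


VL, OFFSET = 300, 256
PROBES = (255, 256, 257, 299)
vd = [0xAA] * VL
vs2 = [i & 0xFF for i in range(VL)]
mask = [i >= OFFSET for i in range(VL)]  # bits 256..299 active

# Architectural behavior: element i uses mask bit i.
correct = slideup(vd, vs2, OFFSET, mask, VL, lambda i: i)
# Hypothetical bug: element i uses mask bit i - OFFSET.
shifted = slideup(vd, vs2, OFFSET, mask, VL, lambda i: i - OFFSET)

print([hex(correct[i]) for i in PROBES])  # ['0xaa', '0x0', '0x1', '0x2b']
print([hex(shifted[i]) for i in PROBES])  # ['0xaa', '0xaa', '0xaa', '0xaa']
```

A follow-up test that also activates mask bits 0..43 could help tell this hypothesis apart from the alternatives.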
Run commands:
riscv64-unknown-elf-gcc -nostdlib -static \
  -march=rv64imfdcv_zicsr_zifencei_zfh \
  -mabi=lp64d \
  -T /home/user/common.ld \
  -o /home/user/masked_vslideup_large_stride_mask_probe.elf \
  /home/user/masked_vslideup_large_stride_mask_probe.S

/home/user/riscv-isa-sim/install/bin/spike \
  -p1 \
  --isa=RV64IMAFDCV_ZICSR_ZIFENCEI_ZFH_ZVL4096B \
  --log-commits \
  /home/user/masked_vslideup_large_stride_mask_probe.elf \
  > masked_vslideup_large_stride_mask_probe.spike.log 2>&1

/home/user/ara/hardware/build-rvfi9/verilator/Vara_tb_verilator \
  -l ram,/home/user/masked_vslideup_large_stride_mask_probe.elf,elf \
  > masked_vslideup_large_stride_mask_probe.ara.log 2>&1
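To compare the probe values from the two runs, a small Python sketch like the following may help. It assumes the probe lines have the `mem <address> <value>` shape quoted in this report (the raw logs may need filtering first), and it compares values as integers because Spike and Ara print them with different widths. The function names and embedded sample text are illustrative only.

```python
import re

MEM_LINE = re.compile(r"^mem\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)")


def parse_mem(text):
    """Map address -> value for every 'mem <addr> <value>' line."""
    values = {}
    for line in text.splitlines():
        match = MEM_LINE.match(line.strip())
        if match:
            values[int(match.group(1), 16)] = int(match.group(2), 16)
    return values


def diff_mem(spike_text, ara_text):
    """Return {addr: (spike_value, ara_value)} for addresses that differ."""
    spike, ara = parse_mem(spike_text), parse_mem(ara_text)
    common = sorted(spike.keys() & ara.keys())
    return {a: (spike[a], ara[a]) for a in common if spike[a] != ara[a]}


# Probe lines copied from this report.
SPIKE = """mem 0x0000000080021400 0x000000aa
mem 0x0000000080021404 0x00000000
mem 0x0000000080021408 0x00000001
mem 0x000000008002140c 0x0000002b"""
ARA = """mem 0x0000000080021400 0x00000000000000aa
mem 0x0000000080021404 0x00000000000000aa
mem 0x0000000080021408 0x00000000000000aa
mem 0x000000008002140c 0x00000000000000aa"""

DIFF = diff_mem(SPIKE, ARA)
for addr, (s, a) in DIFF.items():
    print(f"{addr:#x}: spike={s:#x} ara={a:#x}")
```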
program.zip