Would there be interest in defining a pdep method for UInt128 in terms of pdep(::UInt64)?
using SmallCollections
using BenchmarkTools
s32 = SmallBitSet{UInt32}(1:32)
s64 = SmallBitSet{UInt64}(1:64)
s128 = SmallBitSet{UInt128}(1:128)
@btime Iterators.nth(s32, 30) # 23.046 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s64, 30) # 23.972 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s128, 30) # 51.874 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s128, 1) # 24.598 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s128, 64) # 87.891 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s128, 128) # 156.787 ns (0 allocations: 0 bytes)
After defining pdep for UInt128
@btime Iterators.nth(s128, 1) # 27.236 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s128, 64) # 27.583 ns (0 allocations: 0 bytes)
@btime Iterators.nth(s128, 128) # 28.442 ns (0 allocations: 0 bytes)
Here's an implementation:
using SmallCollections
using SmallCollections: pdep, unsafe_shl, bits
function SmallCollections.pdep(src::UInt128, mask::UInt128)
# Split the mask into high and low 64-bit parts
mask_lo = UInt64(mask & 0xFFFFFFFFFFFFFFFF)
mask_hi = UInt64(mask >> 64)
# Determine the number of bits consumed in the low half
bits_lo = count_ones(mask_lo)
# Split the input value into high and low 64-bit parts
src_lo = UInt64(src & 0xFFFFFFFFFFFFFFFF)
src_hi = UInt64((src >> bits_lo) & 0xFFFFFFFFFFFFFFFF)
res_lo = pdep(src_lo, mask_lo)
res_hi = pdep(src_hi, mask_hi)
# Combine the two results
return UInt128(res_lo) | (UInt128(res_hi) << 64)
end
@inline function Iterators.nth(s::SmallBitSet{UInt128}, n::Integer)
@boundscheck 1 <= n <= length(s) || boundserror(s, n)
b = pdep(unsafe_shl(one(UInt128), n-1), bits(s))
return trailing_zeros(b) + 1
end
Would there be interest in defining a
pdepmethod forUInt128in terms ofpdep(::UInt64)?After defining
pdepforUInt128Here's an implementation: