Skip to content

Foundations for BList#825

Draft
loladenney wants to merge 36 commits into
typelevel:l/faster-immutable-list-634from
loladenney:l/faster-immutable-list-634
Draft

Foundations for BList#825
loladenney wants to merge 36 commits into
typelevel:l/faster-immutable-list-634from
loladenney:l/faster-immutable-list-634

Conversation

@loladenney

Copy link
Copy Markdown
Contributor

building out the framework for the new datatype

Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
@johnynek

Copy link
Copy Markdown
Contributor

This is the same issue as #809 and #634 right?

A lot of the same discussion from #809 would be relevant here I think.

loladenney and others added 2 commits May 28, 2026 13:27
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
loladenney and others added 3 commits June 2, 2026 10:13
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
}

// (maybe impl will be covariant or not)
private case class Impl[A](offset: Int, block: Array[A], tail: BList[A]) extends NonEmpty[A] {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this covariant if block: Array[_ <: A] that said, since we don't have a classtag for the type A and we may not want to require it, we may want to instead use Array[Any] and cast when we get out, which would require boxing primitives, but List also boxes primitives.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be Array[Any], but it ended up requiring so many uses of .asInstanceOf, and I didn't like how cluttered and confusing it made everything. Is there a good reason why we would want Impl to be covariant? If its worth it then I will try to find ways to contain the typecasts in a maintainable way.

Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
}
def getUnsafe(idx: Long): A = {
if (idx < 0)
throw new IndexOutOfBoundsException

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are throwing a different kind of exception for < 0 vs too large. Does List also do that? I would comment and follow List behavior here so it is fully compliant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked and it seems to throw the same index out of bounds exception for index too small or too large!

Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
}

def empty[A]: BList[A] = Empty
def unapply[A](l: BList[A]): Option[(A, BList[A])] = l.uncons

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be on the object NonEmpty inside BList and we should also have an apply method that builds a NonEmpty by prepending onto a BList. So BList.NonEmpty(h, t) can be used in construction and deconstruction.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense. I think something like this should work:

object BList {
  object NonEmpty {
    def apply[A](h: A, t: BList[A]): NonEmpty[B] = ...
    def unapply[A](l: BList.NonEmpty[A]): Some[(A, BList[A])] = l.uncons
  }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinds of things do in the object NonEmpty vs the object Impl?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should the return type of unapply still keep the Option?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kinds of things do in the object NonEmpty vs the object Impl?

NonEmpty is a part of the public API, whereas Impl hosts implementation details which we don't want to reveal.

Also, should the return type of unapply still keep the Option?

That is a part of the trick: if we returned Option[...], then Scala compiler might struggle on exhausiveness checks in some cases – because it doesn't know whether it resolves to Some or None at runtime.

However, when we return Some, then we reassure the compiler that the result does always exist:

val fooBList: BList[Int] = ...

fooBList match {
  case BList.NonEmpty(head, tail) => // only matches if our `fooBList` is a `NonEmpty` instance.
  case BList.Empty =>
  // or `case _ : BList.Empty.type =>` or just `case _ =>` – the `match` should be exhaustive in any case.
}

Minimal-ish example: https://scastie.scala-lang.org/smsfk232TM6C01PJonKThQ

Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala
Comment thread core/src/main/scala/cats/collections/BList.scala
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment thread core/src/main/scala/cats/collections/BList.scala Outdated

private class Impl[A](val offset: Int, val block: Array[A], val tailBList: BList[A]) extends NonEmpty[A] {

override def equals(other: Any): Boolean = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this implementation would probably work in general, it makes a lot of temporary heap allocations (for tuples and options). I’d suggest implementing it block‑by‑block rather than element‑by‑element, and avoiding allocations as much as possible. Something like this:

// Helper recursive class-level function.
@tailrec
private def isEqualTo(that: BList[A]): Boolean = {
  // We always know that `this` is `Impl` already.
  // Therefore we can safely short-circut to `false` if `that` is not.
  that match {
    case _: Empty.type => false
    case thatImpl: Impl[A] =>
        if (offset != thatImpl.offset || block.lengh != thatImpl.block.length)
          false
        else {
           // The offsets and block sizes are the same. Consequently,
           // we can use a boring yet efficient `while` loop over the array
           // with just a single
           var i = offset; // ..or `i = block.length - 1` if you will
           while { ... } // exits with `false` on the first mistmatch
           
           // If both blocks are equal, we can try to proceed with the next block:
           tailBList match {
              case _: Empty.type => thatImpl.isEmpty
              case nextImpl: Impl[A] =>
                nextImpl.isEqualTo(thatImpl.tailBList)
           }
        }
  }
}

Now, override def equals(other: Any) just needs to call our isEqualTo properly.

NOTES:

  1. Instead of multi-level nesting, short-circuiting with return false would probably look better and cleaner.
  2. case _: Empty.type => can be more efficient compared to case Empty =>
  3. Checks for block.length aren’t strictly necessary right now, but they will be if we ever decide to implement some kind of “adaptive” block sizes.
  4. do ... while would fit even better our case, because every Impl is always non-empty. However, Scala 3 dropped do ... while in favor of while { do side effects here then emit true or false }() kind of thing.

@loladenney loladenney Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the issue I have with this implementation is that it checks for equality of blocks, which is stricter than the equality we want on ordered elements. Corresponding blocks of two BLists do not need to have the same offsets for the two lists to be equal.
Consider the lists BList(Block(1,2,3), Block(4,5), Empty) and BList(Block(1,2), Block(3,4,5), Empty). We want to say these are equal and I don't think isEqualTo will help with that, unless while checking equality we shift all the element to the left most spots as we go. That could be nice for reducing fragmentation, but can get expensive.

I could make isEqualTo compare arrays from a start index to an end index and do that where the blocks overlap...

@loladenney loladenney Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I made a new equals that greedily compares within blocks. I think once I have some benchmarking up I might try to compare the performance between the two, but before then if anyone has thoughts on which might be faster I'd love to hear them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue I have with this implementation is that it checks for equality of blocks, which is stricter than the equality we want on ordered elements.

Good point. In that case I'd personally go with the Iterator-based approach.

I believe we do need to get Iterator here anyway, and having that, the equals and hashCode can be implemented in a very simple and straightforward way.

Comment on lines +169 to +171
for (i <- this.offset until BlockSize) {
res = res * 31 + this.block(i).hashCode()
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while would probably be more efficient that for:

var res = 31
var i = offset
while ( ... )

Besides, you already have var res here anyway.

import org.scalacheck.Gen

trait ArbitraryBList {
def bListGen[A](implicit arb: Arbitrary[A], cogen: Cogen[A]): Gen[BList[A]] = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, Gen functions take Gen parameters, not Arbitrary. Moreover, Cogen param here looks a little strange – for a list-like structure it should not be necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based my implementation off the other arbitrary datatype implementations in the same folder, and it seems like none of their Gen functions have a Gen parameter, but they do have implicit Arbitrary parameters.
As for the Cogen param, I will remove it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. It's somewhat surprising that *Gen functions are implemented that way in CatsCollections. The ScalaCheck itself follows a slightly different approach.

For example,
stringOf takes Gen and returns Gen, whereas arbString is an implicit function that takes implicit Arbitrary (and uses stringOf under the hood).

Likewise,
buildableOf takes Gen[T] explicitly and returns Gen as well, whereas arbContainer is an implicit that takes Arbitrary[T] implicitly and returns Arbitrary.

So I'd say – that was the original idea of ScalaCheck, whereas the current implementation in CatsCollections deviates from there quite a bit.

Comment on lines +49 to +51
Gen.resize(n / 2, bListGen[A]).flatMap { l =>
arbitrary[A => A].map(fn => l.map(fn))
}

@satorg satorg Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place that requires Cogen instance, but I don't think we really need it:

// map     
Gen.resize(n / 2, bListGen[A]).map { l =>
  l.map(identity)
}

We can simply pass identity here, because it won't affect how map function works.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I made this change.

def concat[B >: A](l2: BList[B]): BList[B]
def toList: List[A]
def isEmpty: Boolean
def toIterator: Iterator[A]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, .toIterator is deprecated in Scala Library in favor of .iterator starting with v2.13:
https://github.com/scala/scala/blob/b5e60a0853bbd53f9975050240199483d41bda92/src/library/scala/collection/IterableOnce.scala#L204-L205

So perhaps, BList should use name iterator here as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made this change!

Comment thread core/src/main/scala/cats/collections/BList.scala Outdated
Comment on lines +141 to +146
while (iterX.hasNext && iterY.hasNext) {
if (iterX.next() != iterY.next()) { // not sure if i could be comparing with != here...
return false
}
}
iterX.hasNext == iterY.hasNext

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, it can simply call iterX.sameElements(iterY) (after casting to the same base element type):
https://github.com/scala/scala/blob/b5e60a0853bbd53f9975050240199483d41bda92/src/library/scala/collection/Iterator.scala#L841-L855

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems cleaner, thanks!

Comment on lines +368 to +371
final class BListIterator[A](from: BList[A]) extends Iterator[A] {
private var curNode: BList[A] = from
private var curOffset: Int = if (!from.isEmpty) curNode.asInstanceOf[Impl[A]].offset else BlockSize

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BListIterator can be private. In that case, no need to copy init params into private variables. Moreover, since Empty.iterator can simply return Iterator.empty, perhaps no need to pass common base BList here and simply work with Impl all along:

private final class BListIterator[A](
  var curNode: Impl[A],
  var curOffset: Int) extends Iterator[A] {

  def hasNext: Boolean =
    // Leave `curOffset` at `curNode.block.length` if `curNode` is the last one.
    curOffset < curNode.block.length

  def next: A = ???

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will do.

loladenney and others added 2 commits June 29, 2026 14:48
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants