Foundations for BList#825
Conversation
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
| } | ||
|
|
||
| // (maybe impl will be covariant or not) | ||
| private case class Impl[A](offset: Int, block: Array[A], tail: BList[A]) extends NonEmpty[A] { |
There was a problem hiding this comment.
I think you can make this covariant if block: Array[_ <: A] that said, since we don't have a classtag for the type A and we may not want to require it, we may want to instead use Array[Any] and cast when we get out, which would require boxing primitives, but List also boxes primitives.
There was a problem hiding this comment.
It used to be Array[Any], but it ended up requiring so many uses of .asInstanceOf, and I didn't like how cluttered and confusing it made everything. Is there a good reason why we would want Impl to be covariant? If its worth it then I will try to find ways to contain the typecasts in a maintainable way.
| } | ||
| def getUnsafe(idx: Long): A = { | ||
| if (idx < 0) | ||
| throw new IndexOutOfBoundsException |
There was a problem hiding this comment.
we are throwing a different kind of exception for < 0 vs too large. Does List also do that? I would comment and follow List behavior here so it is fully compliant.
There was a problem hiding this comment.
Just checked and it seems to throw the same index out of bounds exception for index too small or too large!
| } | ||
|
|
||
| def empty[A]: BList[A] = Empty | ||
| def unapply[A](l: BList[A]): Option[(A, BList[A])] = l.uncons |
There was a problem hiding this comment.
I think this should be on the object NonEmpty inside BList and we should also have an apply method that builds a NonEmpty by prepending onto a BList. So BList.NonEmpty(h, t) can be used in construction and deconstruction.
There was a problem hiding this comment.
Yes, makes sense. I think something like this should work:
object BList {
object NonEmpty {
def apply[A](h: A, t: BList[A]): NonEmpty[B] = ...
def unapply[A](l: BList.NonEmpty[A]): Some[(A, BList[A])] = l.uncons
}
}There was a problem hiding this comment.
What kinds of things do in the object NonEmpty vs the object Impl?
There was a problem hiding this comment.
Also, should the return type of unapply still keep the Option?
There was a problem hiding this comment.
What kinds of things do in the object NonEmpty vs the object Impl?
NonEmpty is a part of the public API, whereas Impl hosts implementation details which we don't want to reveal.
Also, should the return type of unapply still keep the Option?
That is a part of the trick: if we returned Option[...], then Scala compiler might struggle on exhausiveness checks in some cases – because it doesn't know whether it resolves to Some or None at runtime.
However, when we return Some, then we reassure the compiler that the result does always exist:
val fooBList: BList[Int] = ...
fooBList match {
case BList.NonEmpty(head, tail) => // only matches if our `fooBList` is a `NonEmpty` instance.
case BList.Empty =>
// or `case _ : BList.Empty.type =>` or just `case _ =>` – the `match` should be exhaustive in any case.
}Minimal-ish example: https://scastie.scala-lang.org/smsfk232TM6C01PJonKThQ
Co-authored-by: Arman Bilge <arman@typelevel.org>
|
|
||
| private class Impl[A](val offset: Int, val block: Array[A], val tailBList: BList[A]) extends NonEmpty[A] { | ||
|
|
||
| override def equals(other: Any): Boolean = { |
There was a problem hiding this comment.
While this implementation would probably work in general, it makes a lot of temporary heap allocations (for tuples and options). I’d suggest implementing it block‑by‑block rather than element‑by‑element, and avoiding allocations as much as possible. Something like this:
// Helper recursive class-level function.
@tailrec
private def isEqualTo(that: BList[A]): Boolean = {
// We always know that `this` is `Impl` already.
// Therefore we can safely short-circut to `false` if `that` is not.
that match {
case _: Empty.type => false
case thatImpl: Impl[A] =>
if (offset != thatImpl.offset || block.lengh != thatImpl.block.length)
false
else {
// The offsets and block sizes are the same. Consequently,
// we can use a boring yet efficient `while` loop over the array
// with just a single
var i = offset; // ..or `i = block.length - 1` if you will
while { ... } // exits with `false` on the first mistmatch
// If both blocks are equal, we can try to proceed with the next block:
tailBList match {
case _: Empty.type => thatImpl.isEmpty
case nextImpl: Impl[A] =>
nextImpl.isEqualTo(thatImpl.tailBList)
}
}
}
}Now, override def equals(other: Any) just needs to call our isEqualTo properly.
NOTES:
- Instead of multi-level nesting, short-circuiting with
return falsewould probably look better and cleaner. case _: Empty.type =>can be more efficient compared tocase Empty =>- Checks for
block.lengtharen’t strictly necessary right now, but they will be if we ever decide to implement some kind of “adaptive” block sizes. do ... whilewould fit even better our case, because everyImplis always non-empty. However, Scala 3 droppeddo ... whilein favor ofwhile { do side effects here then emit true or false }()kind of thing.
There was a problem hiding this comment.
Ok, so the issue I have with this implementation is that it checks for equality of blocks, which is stricter than the equality we want on ordered elements. Corresponding blocks of two BLists do not need to have the same offsets for the two lists to be equal.
Consider the lists BList(Block(1,2,3), Block(4,5), Empty) and BList(Block(1,2), Block(3,4,5), Empty). We want to say these are equal and I don't think isEqualTo will help with that, unless while checking equality we shift all the element to the left most spots as we go. That could be nice for reducing fragmentation, but can get expensive.
I could make isEqualTo compare arrays from a start index to an end index and do that where the blocks overlap...
There was a problem hiding this comment.
Update: I made a new equals that greedily compares within blocks. I think once I have some benchmarking up I might try to compare the performance between the two, but before then if anyone has thoughts on which might be faster I'd love to hear them.
There was a problem hiding this comment.
the issue I have with this implementation is that it checks for equality of blocks, which is stricter than the equality we want on ordered elements.
Good point. In that case I'd personally go with the Iterator-based approach.
I believe we do need to get Iterator here anyway, and having that, the equals and hashCode can be implemented in a very simple and straightforward way.
| for (i <- this.offset until BlockSize) { | ||
| res = res * 31 + this.block(i).hashCode() | ||
| } |
There was a problem hiding this comment.
while would probably be more efficient that for:
var res = 31
var i = offset
while ( ... )Besides, you already have var res here anyway.
| import org.scalacheck.Gen | ||
|
|
||
| trait ArbitraryBList { | ||
| def bListGen[A](implicit arb: Arbitrary[A], cogen: Cogen[A]): Gen[BList[A]] = { |
There was a problem hiding this comment.
Usually, Gen functions take Gen parameters, not Arbitrary. Moreover, Cogen param here looks a little strange – for a list-like structure it should not be necessary.
There was a problem hiding this comment.
I based my implementation off the other arbitrary datatype implementations in the same folder, and it seems like none of their Gen functions have a Gen parameter, but they do have implicit Arbitrary parameters.
As for the Cogen param, I will remove it.
There was a problem hiding this comment.
Thanks for the clarification. It's somewhat surprising that *Gen functions are implemented that way in CatsCollections. The ScalaCheck itself follows a slightly different approach.
For example,
stringOf takes Gen and returns Gen, whereas arbString is an implicit function that takes implicit Arbitrary (and uses stringOf under the hood).
Likewise,
buildableOf takes Gen[T] explicitly and returns Gen as well, whereas arbContainer is an implicit that takes Arbitrary[T] implicitly and returns Arbitrary.
So I'd say – that was the original idea of ScalaCheck, whereas the current implementation in CatsCollections deviates from there quite a bit.
| Gen.resize(n / 2, bListGen[A]).flatMap { l => | ||
| arbitrary[A => A].map(fn => l.map(fn)) | ||
| } |
There was a problem hiding this comment.
This is the only place that requires Cogen instance, but I don't think we really need it:
// map
Gen.resize(n / 2, bListGen[A]).map { l =>
l.map(identity)
}We can simply pass identity here, because it won't affect how map function works.
There was a problem hiding this comment.
Thank you! I made this change.
| def concat[B >: A](l2: BList[B]): BList[B] | ||
| def toList: List[A] | ||
| def isEmpty: Boolean | ||
| def toIterator: Iterator[A] |
There was a problem hiding this comment.
Actually, .toIterator is deprecated in Scala Library in favor of .iterator starting with v2.13:
https://github.com/scala/scala/blob/b5e60a0853bbd53f9975050240199483d41bda92/src/library/scala/collection/IterableOnce.scala#L204-L205
So perhaps, BList should use name iterator here as well.
There was a problem hiding this comment.
made this change!
| while (iterX.hasNext && iterY.hasNext) { | ||
| if (iterX.next() != iterY.next()) { // not sure if i could be comparing with != here... | ||
| return false | ||
| } | ||
| } | ||
| iterX.hasNext == iterY.hasNext |
There was a problem hiding this comment.
Perhaps, it can simply call iterX.sameElements(iterY) (after casting to the same base element type):
https://github.com/scala/scala/blob/b5e60a0853bbd53f9975050240199483d41bda92/src/library/scala/collection/Iterator.scala#L841-L855
There was a problem hiding this comment.
This seems cleaner, thanks!
| final class BListIterator[A](from: BList[A]) extends Iterator[A] { | ||
| private var curNode: BList[A] = from | ||
| private var curOffset: Int = if (!from.isEmpty) curNode.asInstanceOf[Impl[A]].offset else BlockSize | ||
|
|
There was a problem hiding this comment.
BListIterator can be private. In that case, no need to copy init params into private variables. Moreover, since Empty.iterator can simply return Iterator.empty, perhaps no need to pass common base BList here and simply work with Impl all along:
private final class BListIterator[A](
var curNode: Impl[A],
var curOffset: Int) extends Iterator[A] {
def hasNext: Boolean =
// Leave `curOffset` at `curNode.block.length` if `curNode` is the last one.
curOffset < curNode.block.length
def next: A = ???There was a problem hiding this comment.
Thanks! Will do.
Co-authored-by: Sergey Torgashov <satorg@users.noreply.github.com>
building out the framework for the new datatype