GuardiANN by normen662 · Pull Request #4083 · FoundationDB/fdb-record-layer

normen662 · 2026-04-14T19:19:48Z

GuardiANN

Guided Updatable cluster Assignment and Routing via Distance-based Indexing for ANN searches.

Summary

Introduces GuardiANN, a new approximate nearest neighbor (ANN) vector structure built on FoundationDB.

GuardiANN partitions vectors into clusters with centroids indexed in an HNSW graph, supporting transactional insert, search, and delete operations with lazy background maintenance.

Architecture

Vectors are organized into clusters. Each cluster maintains:

primary vector references — the authoritative copies, with running statistics (mean, variance, max distance to centroid) tracked via RunningStats (Welford's online algorithm)
replicated vector references — copies from neighboring clusters kept for improved search recall, scored by a replication priority metric
collapsed references — deduplicated representatives for groups of identical vectors (>100 duplicates by default), with individual IDs stored in a separate subspace

Cluster centroids are indexed in an HNSW graph for fast nearest-cluster lookup. Cluster metadata tracks vector counts, replication state, and running distance statistics.
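
As a rough illustration of the running statistics mentioned above, here is a minimal sketch of Welford's online algorithm with an extra "max ever seen" field. The record name `RunningStatsSketch` and its methods are hypothetical and not the PR's `RunningStats`; the sketch also assumes the tracked values are non-negative distances, so `0.0` works as the initial maximum.

```java
/**
 * Illustrative sketch only (hypothetical names, not the PR's RunningStats):
 * running mean/variance via Welford's online algorithm plus a "max ever seen" value.
 * Assumes the tracked values are non-negative distances.
 */
record RunningStatsSketch(long count, double mean, double m2, double maxEver) {
    static RunningStatsSketch empty() {
        return new RunningStatsSketch(0L, 0.0d, 0.0d, 0.0d);
    }

    /** Welford update; count, mean, m2 and maxEver all stay exact. */
    RunningStatsSketch add(final double x) {
        final long newCount = count + 1;
        final double delta = x - mean;
        final double newMean = mean + delta / newCount;
        final double newM2 = m2 + delta * (x - newMean);
        return new RunningStatsSketch(newCount, newMean, newM2, Math.max(maxEver, x));
    }

    /**
     * Reverse Welford update. The maximum cannot be recomputed without the data,
     * so maxEver is carried over and becomes a (possibly stale) upper bound.
     */
    RunningStatsSketch remove(final double x) {
        if (count <= 1) {
            return new RunningStatsSketch(0L, 0.0d, 0.0d, maxEver);
        }
        final long newCount = count - 1;
        final double newMean = (mean * count - x) / newCount;
        // Clamp at zero to absorb floating-point round-off.
        final double newM2 = Math.max(0.0d, m2 - (x - mean) * (x - newMean));
        return new RunningStatsSketch(newCount, newMean, newM2, maxEver);
    }

    /** Population variance; 0 for an empty set. */
    double variance() {
        return count == 0 ? 0.0d : m2 / count;
    }
}
```

For example, `RunningStatsSketch.empty().add(1.0).add(2.0)` yields a mean of 1.5; removing a value afterwards updates mean and variance but leaves `maxEver` untouched, which matches the "stale upper bound after remove" behavior described under Other Changes.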

Operations

Insert (Insert.java) — finds the nearest cluster(s), writes primary and replicated references, optionally samples vectors for RaBitQ centroid computation
Search (Search.java) — five-stage pipeline: candidate cluster selection → distance-ratio pruning → vector retrieval with maxEver-based ball pruning → collapsed reference expansion → metadata validation and enrichment. Supports both top-k search (kNearestNeighborsSearch) and resumable outward search (searchOrderedByDistance)
Delete (Delete.java) — locates references by probing candidate clusters, reads actual vector references to determine primary vs. replicated status, handles collapsed vectors by cleaning up the collapsed vector ID subspace
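
The description names the query-time parameters but does not spell out how distance-ratio pruning uses them. The following sketch shows one plausible reading, stated as an assumption: candidates are sorted by centroid distance, at least `searchMinClustersBeforePruning` are always kept, at most `searchMaxClusters` are ever kept, and in between a candidate is dropped once its distance exceeds the nearest distance times `searchDistanceRatioCutoff`. The class, record, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the pruning rule below is an assumed reading, not the PR's Search code. */
final class ClusterPruningSketch {
    /** Hypothetical pair of a cluster id and the distance from its centroid to the query. */
    record Candidate(String clusterId, double distance) { }

    /**
     * Assumed semantics: {@code sortedCandidates} is ordered by ascending distance.
     * Pruning only starts once the minimum number of clusters has been kept.
     */
    static List<Candidate> prune(final List<Candidate> sortedCandidates,
                                 final int searchMaxClusters,
                                 final int searchMinClustersBeforePruning,
                                 final double searchDistanceRatioCutoff) {
        final List<Candidate> kept = new ArrayList<>();
        if (sortedCandidates.isEmpty()) {
            return kept;
        }
        final double nearest = sortedCandidates.get(0).distance();
        for (final Candidate candidate : sortedCandidates) {
            if (kept.size() >= searchMaxClusters) {
                break; // hard upper bound on probed clusters
            }
            final boolean pruningActive = kept.size() >= searchMinClustersBeforePruning;
            if (pruningActive && candidate.distance() > nearest * searchDistanceRatioCutoff) {
                break; // input is sorted, so every later candidate is at least as far away
            }
            kept.add(candidate);
        }
        return kept;
    }
}
```

With distances 1.0, 1.2, 1.9, 5.0, a maximum of 4, a minimum of 2, and a cutoff of 2.0, this keeps the first three candidates and prunes the fourth.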

Maintenance Tasks

Structural maintenance runs lazily, piggy-backed on insert/delete transactions:

SplitMergeTask — when a cluster exceeds primaryClusterMax or drops below primaryClusterMin, evaluates candidate repartitionings (1→2 / 2→3 splits, or 2→1 / 3→2 merges) using bounded k-means, scores them via SplitMergeEvaluator (composite quality metric over SSE gain, separation, balance, low-margin rate), and executes the best candidate
ReassignTask — recomputes vector-to-cluster assignments and replication for a target cluster, fixing underreplicated primaries and cleaning up excess replicas
CollapseTask — detects and collapses large groups of identical vectors into single representatives
BounceTask — coordinates dependent task execution order (executes one dependent per bounce, re-enqueues itself until all dependents complete)
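
The "one dependent per bounce" behavior can be pictured with a small in-memory queue. This is only a sketch under simplifying assumptions: the real tasks run inside FoundationDB transactions and are persisted, whereas here the queue is a plain `Deque` and all type names (`BounceSketch`, `Task`, `NamedTask`, `Bounce`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative sketch only: an in-memory stand-in for a deferred-task queue. */
final class BounceSketch {
    interface Task {
        void execute(Deque<Task> queue, List<String> log);
    }

    /** A leaf task that just records that it ran. */
    record NamedTask(String name) implements Task {
        @Override
        public void execute(final Deque<Task> queue, final List<String> log) {
            log.add(name);
        }
    }

    /** Runs exactly one dependent per execution and re-enqueues itself for the rest. */
    record Bounce(List<Task> dependents) implements Task {
        @Override
        public void execute(final Deque<Task> queue, final List<String> log) {
            if (dependents.isEmpty()) {
                return;
            }
            dependents.get(0).execute(queue, log);
            final List<Task> remaining = dependents.subList(1, dependents.size());
            if (!remaining.isEmpty()) {
                queue.addLast(new Bounce(List.copyOf(remaining)));
            }
        }
    }

    /** Executes tasks from the front of the queue until it is empty and returns the run order. */
    static List<String> drain(final Deque<Task> queue) {
        final List<String> log = new ArrayList<>();
        while (!queue.isEmpty()) {
            queue.removeFirst().execute(queue, log);
        }
        return log;
    }
}
```

Draining a queue that holds a `Bounce` with dependents "split" and "reassign" followed by an unrelated task "other" runs them in the order split, other, reassign: the dependents keep their relative order, but other queued work can interleave between bounces.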

Configuration

All algorithm parameters are configurable via Config (record class with builder):

Cluster size bounds, replication thresholds, replication priority minimum
Per-operation knobs: search concurrency, insert candidate clusters, delete candidate clusters, split neighborhood size, merge neighborhood sizes, k-means iterations/restarts, reassign neighborhood sizes, collapse duplicate threshold
Search parameters (searchMaxClusters, searchMinClustersBeforePruning, searchDistanceRatioCutoff) are query-time parameters passed per call, not stored in config
RaBitQ quantization support with configurable extra bits per dimension

Other Changes

HNSW Config — converted from final class to record (same pattern as GuardiANN Config), updated all accessor call sites
RunningStats (renamed from RunningStandardDeviation) — added maxEver tracking (exact on add/combine, stale upper bound after remove/subtract), converted to record
TopK / DistinctTopK — separated into two classes with min()/max() factory methods; fixed worstElement() and simplified toSortedList() in DistinctTopK
VectorId — made Comparable for use in sorted collections
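
For readers unfamiliar with the bounded top-K pattern, here is a minimal sketch of a collector with `min()`/`max()` factories and a best-to-worst `toSortedList()`. It uses only `java.util` and is not the PR's `TopK`: the class name `TopKSketch` and the internal layout (a heap whose head is the worst retained element) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative sketch only: keeps the k best elements seen so far. */
final class TopKSketch<T> {
    private final int k;
    private final Comparator<T> bestFirst;
    private final PriorityQueue<T> queue; // head of the heap = worst retained element

    private TopKSketch(final int k, final Comparator<T> bestFirst) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.bestFirst = bestFirst;
        this.queue = new PriorityQueue<>(bestFirst.reversed());
    }

    /** Retains the k smallest elements. */
    static <T extends Comparable<T>> TopKSketch<T> min(final int k) {
        return new TopKSketch<>(k, Comparator.<T>naturalOrder());
    }

    /** Retains the k largest elements. */
    static <T extends Comparable<T>> TopKSketch<T> max(final int k) {
        return new TopKSketch<>(k, Comparator.<T>reverseOrder());
    }

    void add(final T element) {
        if (queue.size() < k) {
            queue.add(element);
        } else if (bestFirst.compare(element, queue.peek()) < 0) {
            queue.poll(); // evict the current worst
            queue.add(element);
        }
    }

    /** Returns the retained elements from best to worst without disturbing the collector. */
    List<T> toSortedList() {
        final PriorityQueue<T> copy = new PriorityQueue<>(queue);
        final List<T> result = new ArrayList<>(copy.size());
        while (!copy.isEmpty()) {
            result.add(copy.poll()); // drained worst-first
        }
        Collections.reverse(result);
        return result;
    }
}
```

Feeding 5, 1, 4, 2, 3 into `TopKSketch.<Integer>min(3)` leaves 1, 2, 3 in `toSortedList()`.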

Test Plan

SiftTest — insert 100k SIFT vectors, verify search recall
DebugIndexTest — end-to-end insert + search + assignment quality checks
TopKTest — unit tests for TopK/DistinctTopK correctness
Verify split/merge/reassign/collapse task execution via task listener counters
Verify delete removes both regular and collapsed vector references

hatyo · 2026-06-18T13:18:12Z

+     * @return a future completing with the retained top-K elements, sorted from best to worst
+     */
+    @Nonnull
+    public CompletableFuture<List<T>> collect(final AsyncIterable<T> iterable,


This method and collectRemaining do not seem to be used anywhere. Can we remove them?

It's a pattern used throughout MoreAsyncUtil, and I am using it here as well to provide the helpers. I actually needed them temporarily during the development of this PR. Since TopK and DistinctTopK have been moved to async/common for general usage, I am keeping these methods here as well. I will, however, provide proper testing, so these methods will not technically be unused.

hatyo · 2026-06-18T13:21:10Z

+ * copy and its replicated copies) and provides a stable secondary ordering. Instances order by primary key first,
+ * then by UUID.
+ */
+class VectorId implements Comparable<VectorId> {


This looks like a textbook record to me, can you convert it to a record?

hatyo · 2026-06-18T13:28:21Z

+    }
+
+    @Override
+    public boolean equals(final Object o) {


This takes additionalValues into account. At the same time, the class inherits Comparable indirectly from VectorId, including the parent's compareTo implementation, which does not even know about additionalValues. That breaks the Comparable contract.

https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/Comparable.html

"The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C."

I think this might be fine, but definitely worth some documentation.

Please take a look at how this is solved now -- I changed the scheme to prefer composition over inheritance. The problem went away. The whole thing looks cleaner now.
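
To make the issue and the composition-based fix concrete, here is a minimal sketch. It is not the PR's code: `VectorIdSketch` and `VectorMetadataSketch` are hypothetical stand-ins, and the primary key is simplified to a `String`.

```java
import java.util.List;
import java.util.UUID;

/**
 * Illustrative sketch only. The id is a record whose compareTo looks at exactly the
 * components that equals() looks at, so the ordering is consistent with equals.
 */
record VectorIdSketch(String primaryKey, UUID uuid) implements Comparable<VectorIdSketch> {
    @Override
    public int compareTo(final VectorIdSketch other) {
        final int byKey = primaryKey.compareTo(other.primaryKey);
        return byKey != 0 ? byKey : uuid.compareTo(other.uuid);
    }
}

/**
 * Composition instead of inheritance: the metadata holds an id but is not itself an id
 * and is not Comparable, so its equals() may include additionalValues without
 * conflicting with any inherited compareTo.
 */
record VectorMetadataSketch(VectorIdSketch vectorId, List<Object> additionalValues) { }
```

Two metadata objects with the same id but different additional values are unequal, while their ids still compare as equal; sorted collections would be keyed on `vectorId()` explicitly.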

hatyo · 2026-06-25T10:26:48Z

+
+        return primitives.fetchClusterMetadata(transaction, getTargetClusterId())
+                .thenCompose(clusterMetadata -> {
+                    if (clusterMetadata == null) {


same as reassign task, does this warrant a logger warning?

We could warn on it but it's something that can occur naturally. The cluster might have gotten merged in the meantime (between enqueueing the task and executing it). In that case this can happen and is benign.

hatyo · 2026-06-25T10:28:09Z

+                    }
+
+                    final EnumSet<ClusterMetadata.State> states = clusterMetadata.states();
+                    if (!states.contains(ClusterMetadata.State.COLLAPSE)) {


is it possible (and ok) to have a state that is, for example, COLLAPSE + SPLIT or COLLAPSE + REASSIGN?

ReassignTask and SplitMergeTask will both check before they do any mutations and abort in the presence of COLLAPSE. While splitting, if we decide to attempt a collapse, we actually enqueue COLLAPSE and BOUNCE[SPLIT] and leave the state in COLLAPSE only. Checking for COLLAPSE alone in CollapseTask is sufficient, as it will enqueue a SPLIT afterwards anyway. That means that if something else was already waiting to run a task on a cluster that is also in the COLLAPSE state, it will just get clobbered, and that is OK.

hatyo · 2026-06-25T10:46:38Z

+                // If this is a replica we may as well subsample it like reassign would normally do it.
+                // Collapsing a cluster should also work in an identical way if we didn't do that here.
+                //
+                replicatedTopK.add(vectorReference);


It feels like this is not related, or at least orthogonal, to collapse task logic?

IIUC this causes lower priority replicas to be removed until the number of replicas is less than specific threshold (replicatedClusterTarget?).

If I am right, this seems like a cleanup task that should have its own identity instead of piggybacking on collapse here? If you feel this is justified, could you please add this behavior to the documentation of CollapseTask?

I am doing this here because I have all the info needed, and reading the vectors is somewhat expensive. I will document it.

hatyo · 2026-06-25T12:27:50Z

+        //
+        final ImmutableSetMultimap<UUID, UUID> vectorReferenceToVectorSignatureMap =
+                vectorReferenceToSignatureMap(vectorReferences);
+        final ImmutableSetMultimap<UUID, UUID> vectorSignatureToVectorUuidMap =


this is a multimap holding all vectors sharing the same signature, so they are .... identical and collapsible?

They use the same signature (the same vector), they are collapsible.
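
The grouping discussed here can be sketched as follows. This is an illustration under assumptions, not the PR's code: the signature is stood in for by `Arrays.toString` of the vector (the PR uses a 128-bit signature held in a UUID), the result is a plain `Map` rather than a Guava multimap, and `CollapseGroupingSketch`, `VectorRef`, and `collapsibleGroups` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Illustrative sketch only: group references by vector signature and keep the large groups. */
final class CollapseGroupingSketch {
    /** Hypothetical reference: an id plus the raw vector components. */
    record VectorRef(UUID id, double[] vector) { }

    /**
     * Returns signature -> ids for every group of identical vectors whose size
     * exceeds {@code duplicateThreshold}. Arrays.toString is a stand-in signature.
     */
    static Map<String, List<UUID>> collapsibleGroups(final List<VectorRef> references,
                                                     final int duplicateThreshold) {
        final Map<String, List<UUID>> bySignature = new LinkedHashMap<>();
        for (final VectorRef reference : references) {
            bySignature.computeIfAbsent(Arrays.toString(reference.vector()), ignored -> new ArrayList<>())
                    .add(reference.id());
        }
        // Only groups with more duplicates than the threshold are worth collapsing.
        bySignature.values().removeIf(ids -> ids.size() <= duplicateThreshold);
        return bySignature;
    }
}
```

Each surviving group would then be represented by a single collapsed reference, with the individual ids kept separately, as the Architecture section describes.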


alecgrieser

By file, I'm only about 1/4 of the way through this, but I figured I'd give a preliminary round first. Overall, I think this makes sense, and most of this is about documentation or clarification.

alecgrieser · 2026-06-26T17:51:46Z

+    public List<T> toSortedList() {
+        final PriorityQueue<T> copy = new PriorityQueue<>(queue);
+        final int size = copy.size();
+        final T[] array = (T[]) new Object[size];


Given that you then copy this into an ImmutableList anyway, I think you could alternatively use ImmutableList.builderWithExpectedSize() and then return builder.build().reverse(), which returns an in-place reversed view of the list rather than making a copy.

alecgrieser · 2026-06-26T17:55:37Z

+                if (assignedVectorReference.isPrimaryCopy() != vectorReference.isPrimaryCopy() ||
+                        assignedVectorReference.isUnderreplicated() != vectorReference.isUnderreplicated()) {


Does it need to be rewritten if the replicationPriority changes? Should this just be turned into a .equals check on the two objects?

Yes, it should incorporate replicationPriority as suggested, and I added that. To mask out noise from SIMD operations, the comparison has to use a range rather than bit-equality. However, we should not do a general equals() here, as we don't want to also compare the vector; that equality is already guaranteed when we get here.
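
A minimal sketch of such a check, assuming an absolute tolerance: the flags are compared exactly and the priority within a range. The `Ref` record, the `needsRewrite` method, and the tolerance value are hypothetical and only illustrate the idea.

```java
/** Illustrative sketch only: decide whether a stored reference must be rewritten. */
final class ReferenceRewriteSketch {
    /** Hypothetical view of the fields that matter for the comparison (the vector is left out). */
    record Ref(boolean isPrimaryCopy, boolean isUnderreplicated, double replicationPriority) { }

    /** Assumed absolute tolerance that absorbs floating-point noise in the priority. */
    static final double PRIORITY_TOLERANCE = 1e-6;

    static boolean needsRewrite(final Ref stored, final Ref assigned) {
        return stored.isPrimaryCopy() != assigned.isPrimaryCopy()
                || stored.isUnderreplicated() != assigned.isUnderreplicated()
                || Math.abs(stored.replicationPriority() - assigned.replicationPriority())
                        > PRIORITY_TOLERANCE;
    }
}
```

Whether an absolute or a relative tolerance is the better fit depends on the scale of the priority values, which the thread does not state.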

alecgrieser · 2026-06-29T09:58:48Z

 *       stable post-quiescence shape to inject into),</li>
 *   <li>find the cluster a fixed query vector would naturally belong to, using the exact
 *       neighborhood-discovery logic {@code Insert} uses (stable across reseeds — cluster UUIDs
- *       change with every split, but the "nearest cluster to vector v" choice does not), and</li>


Was this line deliberately removed? I think the list items don't parse any more without it

alecgrieser

Another additional set of comments. I'm now about 1/3 of the way through. Sorry if some of these have become outdated since I left them; I haven't gone through all of them to validate that the latest set of concurrent changes didn't make them obsolete

alecgrieser · 2026-06-29T10:28:56Z

+                                       final int numInnerNeighborhood,
+                                       final int numOuterNeighborhood) {


There's some singular/plural confusion in both the comments and also variable names in this method. I think both the inner and outer neighborhood lists can contain multiple elements, so these should probably all be "neighborhoods"

did a big renaming, please check.

alecgrieser · 2026-06-29T10:34:29Z

+        if (foundPrimaryCluster) {
+            final int cappedNumInnerNeighborhood = Math.min(numInnerNeighborhood, clusterMetadataWithDistances.size());
+            return new Neighborhoods(clusterMetadataWithDistances.subList(0, cappedNumInnerNeighborhood),
+                    clusterMetadataWithDistances.subList(cappedNumInnerNeighborhood, clusterMetadataWithDistances.size()));


Why does this return the whole tail as the outer neighborhoods instead of incorporating the numOuterNeighborhood limit? Is it assuming that that limit is baked into the input?

This is indeed an oversight, which didn't actually matter as every caller implicitly or explicitly capped it already. So that we do not have to rely on the callers for that in the future, I now cap it here as well.
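
A sketch of the capped split, assuming the input list is already sorted by distance. The generic `Neighborhoods` record and the `split` method are hypothetical simplifications of the code under review.

```java
import java.util.List;

/** Illustrative sketch only: split a distance-sorted list into inner and outer neighborhoods. */
final class NeighborhoodsSketch {
    record Neighborhoods<T>(List<T> inner, List<T> outer) { }

    /**
     * The first {@code numInner} elements form the inner neighborhoods, the next
     * {@code numOuter} elements the outer ones; both counts are capped by the list size.
     */
    static <T> Neighborhoods<T> split(final List<T> sortedByDistance,
                                      final int numInner,
                                      final int numOuter) {
        final int size = sortedByDistance.size();
        final int innerEnd = Math.min(numInner, size);
        // Widen to long so that a very large numOuter cannot overflow the sum.
        final int outerEnd = (int) Math.min((long) innerEnd + numOuter, size);
        return new Neighborhoods<>(List.copyOf(sortedByDistance.subList(0, innerEnd)),
                List.copyOf(sortedByDistance.subList(innerEnd, outerEnd)));
    }
}
```

With five clusters, `numInner = 2`, and `numOuter = 2`, the last cluster is dropped rather than silently ending up in the outer neighborhoods.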

alecgrieser · 2026-06-29T11:04:55Z

+ *        enqueued
+ * @param underreplicatedPrimaryClusterMax maximum number of under-replicated primary vectors in a cluster, overflow
+ *        will result in a reassign task to be enqueued
+ * @param replicatedClusterMaxWrites maximum number of writes of replicated vectors to a cluster


"writes" here is an interesting clarification. Is this supposed to be the maximum number of replicated vectors in a cluster, or is this a limit on the amount of I/O (implied by "writes"--like per transaction)? Or is this the maximum number of clusters that a single vector will be written to (as opposed to vectors per cluster)?

alecgrieser · 2026-06-29T13:32:33Z

+ * @param replicationStatsMinSampleSize minimum number of primary vectors in a cluster before its distance statistics
+ *        (mean/standard deviation) are trusted for the replication priority z-score term


What's the "trust" here about? The actual stats are accurate, right, so is this more about drift or about small sets having artificially high variance or something?

alecgrieser · 2026-06-29T13:36:54Z

+        }
+
+        @Nonnull
+        public ConfigBuilder setMetric(@Nonnull final Metric metric) {


minor: We're a bit inconsistent on this in our codebase, but it may be good practice to label these setters as @CanIgnoreReturnValue

alecgrieser · 2026-06-29T18:11:55Z

+        }
+        final long hi = readLongBigEndian(keyAsBytes, 0);
+        final long lo = readLongBigEndian(keyAsBytes, 8);
+        return new UUID(hi, lo);


I get that at some level, we're just using this as a signature of 128 bits in a convenient form, but should this clear out the appropriate bits so that it conforms with UUID type v4? I guess that takes us down to 126 bits of entropy, but it may avoid confusion with other types of UUIDs
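
For reference, a sketch of what stamping the version and variant bits would look like with `java.util.UUID`. The helper names are hypothetical, and whether the PR should do this at all is exactly the open question above. Note that the RFC 4122 layout fixes four version bits and two variant bits, so 122 bits of the signature would survive.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Illustrative sketch only: turn 16 signature bytes into a UUID, optionally shaped like a v4 UUID. */
final class SignatureUuidSketch {
    /** Reads two big-endian longs, as the code under review does, without touching any bits. */
    static UUID rawUuid(final byte[] sixteenBytes) {
        if (sixteenBytes.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        final ByteBuffer buffer = ByteBuffer.wrap(sixteenBytes); // big-endian by default
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    /** Same bytes, but with the version nibble set to 4 and the variant bits set to binary 10. */
    static UUID asVersion4(final byte[] sixteenBytes) {
        final UUID raw = rawUuid(sixteenBytes);
        final long hi = (raw.getMostSignificantBits() & ~0xF000L) | 0x4000L;
        final long lo = (raw.getLeastSignificantBits() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(hi, lo);
    }
}
```

`asVersion4(bytes).version()` then reports 4 and `variant()` reports 2, whatever the input bytes were.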

alecgrieser · 2026-06-29T18:17:05Z

+        return new ClusterReference(valueTuple.getUUID(0),
+                storageTransform.transform(StorageHelpers.vectorFromBytes(config, valueTuple.getBytes(1))));


I think I left a similar comment on one of the HNSW PRs, but it is a bit of a shame that using Tuples is the path of least resistance here, rather than, say, Protobuf, for all of the things that don't need to be in FDB keys (and therefore we don't care about sacrificing space for maintaining serialization order). I think the easiest path might be to split this (and/or the HNSW) out into its own subproject, which could also depend on Protobuf. We could theoretically change that later, though not too much later, as once we start serializing stuff in real indexes, this becomes effectively impossible to change.

Understood. I was talking to @ScottDugas the other day about moving all vector-related stuff into its own subcomponent. Then we could also do that.

alecgrieser · 2026-06-29T18:38:02Z

+        return Tuple.from(vectorId.uuid());
+    }
+
+    static double replicationPriority(@Nonnull final Config config,


I'm trying to understand what this value is actually encapsulating. It's something like a weighted average of the number of standard deviations from the centroid as well as the actual distance itself. It's a bit of an odd metric, as it's combining data from two different unit systems. Is this one of those things that's taken out of a paper?

It might be good to add a bit more docs on the "why" here, or at least a reference to where this comes from

alecgrieser · 2026-06-29T18:44:36Z

+    static boolean isOccluded(@Nonnull final DistanceEstimator estimator,
+                              @Nonnull final ClusterMetadataWithDistance replicationCandidate,
+                              @Nonnull final List<ClusterMetadataWithDistance> selectedReplicationClusters) {


This could probably use Javadoc. From the implementation, I take it that, for a given vector v being replicated into a cluster c, it's trying to figure out whether there's another cluster c′ such that |v - c| > |c - c′|; that is, the two clusters are closer to each other than the vector is to the cluster it's trying to replicate into. It's not immediately obvious to me why that would automatically mean that we want to reject c or c′, though I could see why, from the triangle inequality, we'd only want to pick the closest cluster to v.
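
The reading described in this comment can be written down as a small predicate. This sketch encodes only the reviewer's interpretation, not the confirmed behavior of `isOccluded`: it uses plain Euclidean distance on `double[]` instead of the PR's `DistanceEstimator`, and the class and parameter names are hypothetical.

```java
import java.util.List;

/** Illustrative sketch only: the occlusion test as read from the review comment. */
final class OcclusionSketch {
    static double euclidean(final double[] a, final double[] b) {
        double sum = 0.0d;
        for (int i = 0; i < a.length; i++) {
            final double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * A candidate centroid c counts as occluded for vector v if some already selected
     * centroid c' satisfies |v - c| > |c - c'|, that is, the candidate is closer to
     * a selected cluster than the vector is to the candidate.
     */
    static boolean isOccluded(final double[] vector,
                              final double[] candidateCentroid,
                              final List<double[]> selectedCentroids) {
        final double vectorToCandidate = euclidean(vector, candidateCentroid);
        for (final double[] selected : selectedCentroids) {
            if (vectorToCandidate > euclidean(candidateCentroid, selected)) {
                return true;
            }
        }
        return false;
    }
}
```

With v at the origin and an already selected centroid at (1, 0), a candidate at (2, 0) is occluded (distance 2 to v, distance 1 to the selected centroid), while a candidate at (-1, 0) is not.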

normen662 added 30 commits February 6, 2026 18:33

some preliminaries

6693899

added fetchClusters

bf7c535

more stuff

d32dc38

insert almost done

0ffef82

insert almost done; added basic tasks

cf3743a

insert code complete

3016c74

deferred tasks code-complete

4bde30c

added mutable vectors and kmeans computation for SplitMergeTask

7adb468

fetch clusters

f377fdd

in the midst of SplitMergeTask; project compiles

ac3f555

save point

6a38424

save point

763f154

save point -- split almost working

ab990dd

save point

2763dff

split code complete

aacbba1

split code complete

fa13f64

code complete merge

1d511a7

states as enumset

aea3ac7

Merge remote-tracking branch 'upstream/main' into guardiann-poc

e01bb4a

fix bug to always update isPrimaryCopy during split/merge

62f06c6

code complete reassign task

a5d9fbb

some minimal changes and introduction of the Search class

875e6bf

tons of fixes around vector replication

2d3edf0

save point; search almost done

1fedc3c

insert/search code complete

943c643

save point

4ab54ad

cause clusters in reassign taks

cb0512f

bounce reassign code complete

c169db0

preparations done

6fdc6e0

reimplementation to avoid reassignment storm done

24c0c08

normen662 added 2 commits June 19, 2026 15:27

restarting the CI pipeline

191ac0d

addressing team scale critical findings

e7edab9

normen662 force-pushed the guardiann-poc branch from 9aa9cdb to e7edab9 Compare June 20, 2026 14:56

normen662 added 3 commits June 20, 2026 18:44

adding ConfigTest for GuardiANN

337a48e

fix for the collapse/reassign bug

0e8e008

Merge remote-tracking branch 'upstream/main' into guardiann-poc

411f831

normen662 force-pushed the guardiann-poc branch from a7961d6 to 411f831 Compare June 22, 2026 18:31

normen662 added 4 commits June 25, 2026 12:25

Merge remote-tracking branch 'upstream/main' into guardiann-poc-2

2ddee3a

test cases for searchOrderedByDistance

dcf819e

imrpoving tests

564390d

fix diff computation of big PRs

5fb739a

hatyo requested changes Jun 25, 2026


normen662 and others added 2 commits June 26, 2026 17:16

Reset .idea/misc.xml to main (drop stray IDE FrameworkDetectionExclud…

6b752b9

…es config) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'upstream/main' into guardiann-poc

6c556c1

alecgrieser requested changes Jun 26, 2026


normen662 added 3 commits June 27, 2026 10:46

addressing some comments

369044a

refactoring VectorId <-> VectorMetadata

d333292

addressing more comments

8251732

alecgrieser reviewed Jun 29, 2026


VectorReference as sealead interface

fd7f51d

alecgrieser requested changes Jun 29, 2026


normen662 added 9 commits June 30, 2026 10:39

run some tests is parallel

f2ac537

refactored neighborhood namings + tons pf addressed comments

dfcebee

Merge remote-tracking branch 'upstream/main' into guardiann-poc

7bede2b

increase timeout for tests in CI

1c7d7d4

more comments addressed

b9861d7

addressing more comments

8111486

addressing comments in BounceTask

e33c846

make RaBitQ bit-exact

051cf50

bounce task random-seed bug

9288758
