GuardiANN #4083

Open
normen662 wants to merge 158 commits into FoundationDB:main from normen662:guardiann-poc
normen662 commented Apr 14, 2026

Contributor

GuardiANN

Guided Updatable cluster Assignment and Routing via Distance-based Indexing for ANN searches.

Summary

Introduces GuardiANN, a new approximate nearest neighbor (ANN) vector structure built on FoundationDB.

GuardiANN partitions vectors into clusters with centroids indexed in an HNSW graph, supporting transactional insert, search, and delete operations with lazy background maintenance.

Architecture

Vectors are organized into clusters. Each cluster maintains:

  • primary vector references — the authoritative copies, with running statistics
    (mean, variance, max distance to centroid) tracked via RunningStats (Welford's online algorithm)
  • replicated vector references — copies from neighboring clusters kept for improved search recall,
    scored by a replication priority metric
  • collapsed references — deduplicated representatives for groups of identical vectors (>100 duplicates by default),
    with individual IDs stored in a separate subspace

Cluster centroids are indexed in an HNSW graph for fast nearest-cluster lookup. Cluster metadata tracks vector counts, replication state, and running distance statistics.
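
As a rough illustration of the running statistics mentioned above, the following is a minimal sketch of Welford-style updates with a max tracker. The name `RunningStatsSketch` and its fields are hypothetical and are not taken from the PR's `RunningStats`; the sketch only assumes that distances are non-negative.

```java
/** Hypothetical sketch of Welford-style running statistics; not the PR's RunningStats. */
record RunningStatsSketch(long count, double mean, double m2, double maxEver) {
    static RunningStatsSketch empty() {
        return new RunningStatsSketch(0L, 0.0d, 0.0d, 0.0d);
    }

    /** Welford update: folds one distance into count, mean, and the sum of squared deviations. */
    RunningStatsSketch add(final double x) {
        final long n = count + 1;
        final double delta = x - mean;
        final double newMean = mean + delta / n;
        final double newM2 = m2 + delta * (x - newMean);
        return new RunningStatsSketch(n, newMean, newM2, Math.max(maxEver, x));
    }

    /** Reverse update. maxEver is left untouched, so afterwards it is only an upper bound. */
    RunningStatsSketch remove(final double x) {
        if (count <= 1) {
            return new RunningStatsSketch(0L, 0.0d, 0.0d, maxEver);
        }
        final long n = count - 1;
        final double newMean = (mean * count - x) / n;
        // Clamp at zero to absorb floating-point round-off.
        final double newM2 = Math.max(0.0d, m2 - (x - mean) * (x - newMean));
        return new RunningStatsSketch(n, newMean, newM2, maxEver);
    }

    double populationVariance() {
        return count == 0 ? 0.0d : m2 / count;
    }
}
```

Keeping `maxEver` unchanged on removal avoids rescanning the cluster; the price is that the bound may be looser than the true maximum until the statistics are rebuilt.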

Operations

  • Insert (Insert.java) — finds the nearest cluster(s), writes primary and replicated references, optionally
    samples vectors for RaBitQ centroid computation
  • Search (Search.java) — five-stage pipeline: candidate cluster selection → distance-ratio pruning → vector retrieval with maxEver-based ball pruning → collapsed reference expansion → metadata validation and enrichment. Supports both top-k search (kNearestNeighborsSearch) and resumable outward search (searchOrderedByDistance)
  • Delete (Delete.java) — locates references by probing candidate clusters, reads actual vector references to determine primary vs. replicated status, handles collapsed vectors by cleaning up the collapsed vector ID subspace
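
The ball-pruning stage of the search pipeline can be sketched as follows. This is an assumption-laden illustration, not the code in `Search.java`: it assumes a distance that satisfies the triangle inequality (for example Euclidean), and the names `ClusterBall`, `lowerBound`, and `survivors` are hypothetical.

```java
import java.util.List;

final class BallPruningSketch {
    /** Hypothetical cluster summary: query-to-centroid distance and the cluster's maxEver radius. */
    record ClusterBall(String id, double distanceToCentroid, double maxEver) {
        /** By the triangle inequality, no member of the cluster can be closer to the query than this. */
        double lowerBound() {
            return Math.max(0.0d, distanceToCentroid - maxEver);
        }
    }

    /** Keeps only clusters that could still contain a vector at least as close as the current k-th best. */
    static List<ClusterBall> survivors(final List<ClusterBall> candidates, final double kthBestDistance) {
        return candidates.stream()
                .filter(c -> c.lowerBound() <= kthBestDistance)
                .toList();
    }
}
```

Because a stale `maxEver` can only be too large, it makes the lower bound smaller and the pruning more conservative, so under these assumptions a stale radius costs extra reads but never drops a valid result.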

Maintenance Tasks

Structural maintenance runs lazily, piggy-backed on insert/delete transactions:

  • SplitMergeTask — when a cluster exceeds primaryClusterMax or drops below primaryClusterMin, evaluates candidate repartitionings (1→2 / 2→3 splits, or 2→1 / 3→2 merges) using bounded k-means, scores them via SplitMergeEvaluator (composite quality metric over SSE gain, separation, balance, low-margin rate), and executes the best candidate
  • ReassignTask — recomputes vector-to-cluster assignments and replication for a target cluster, fixing
    underreplicated primaries and cleaning up excess replicas
  • CollapseTask — detects and collapses large groups of identical vectors into single representatives
  • BounceTask — coordinates dependent task execution order (executes one dependent per bounce, re-enqueues
    itself until all dependents complete)
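
The bounce behavior described in the last bullet (one dependent per bounce, then re-enqueue) might look roughly like the sketch below. It models the task queue as an in-memory `Deque`, which is an assumption made for illustration; the actual task queue in the PR lives in FoundationDB, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BounceSketch {
    interface Task {
        void run(Deque<Task> queue, List<String> log);
    }

    /** Runs exactly one dependent per invocation, then re-enqueues itself while dependents remain. */
    record Bounce(Deque<Task> dependents) implements Task {
        @Override
        public void run(final Deque<Task> queue, final List<String> log) {
            final Task next = dependents.poll();
            if (next != null) {
                next.run(queue, log);
            }
            if (!dependents.isEmpty()) {
                queue.addLast(this);
            }
        }
    }

    /** A trivial task that only records its name when executed. */
    static Task named(final String name) {
        return (queue, log) -> log.add(name);
    }

    /** Executes queued tasks in FIFO order until the queue is empty and returns the execution log. */
    static List<String> drain(final Deque<Task> queue) {
        final List<String> log = new ArrayList<>();
        while (!queue.isEmpty()) {
            queue.pollFirst().run(queue, log);
        }
        return log;
    }
}
```

In this model, other queued tasks get a turn between two dependents of the same bounce, which is one plausible reason for bouncing instead of running all dependents back to back.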

Configuration

All algorithm parameters are configurable via Config (record class with builder):

  • Cluster size bounds, replication thresholds, replication priority minimum
  • Per-operation knobs: search concurrency, insert candidate clusters, delete candidate clusters, split neighborhood size,
    merge neighborhood sizes, k-means iterations/restarts, reassign neighborhood sizes, collapse duplicate threshold
  • Search parameters (searchMaxClusters, searchMinClustersBeforePruning, searchDistanceRatioCutoff) are query-time
    parameters passed per call, not stored in config
  • RaBitQ quantization support with configurable extra bits per dimension
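
For readers unfamiliar with the record-plus-builder pattern, here is a minimal sketch. Only the parameter names `primaryClusterMin` and `primaryClusterMax` and the collapse threshold of 100 come from the description above; the other defaults, the validation rule, and the class names are assumptions for illustration.

```java
final class ConfigSketch {
    /** Hypothetical subset of the configuration; the real Config has many more components. */
    record Config(int primaryClusterMin, int primaryClusterMax, int collapseDuplicateThreshold) {
        Config {
            // Assumed sanity check, not taken from the PR.
            if (primaryClusterMin < 0 || primaryClusterMin >= primaryClusterMax) {
                throw new IllegalArgumentException("need 0 <= primaryClusterMin < primaryClusterMax");
            }
        }

        static Builder builder() {
            return new Builder();
        }
    }

    static final class Builder {
        private int primaryClusterMin = 50;            // illustrative default, not from the PR
        private int primaryClusterMax = 500;           // illustrative default, not from the PR
        private int collapseDuplicateThreshold = 100;  // the description mentions >100 duplicates by default

        Builder setPrimaryClusterMin(final int value) {
            this.primaryClusterMin = value;
            return this;
        }

        Builder setPrimaryClusterMax(final int value) {
            this.primaryClusterMax = value;
            return this;
        }

        Builder setCollapseDuplicateThreshold(final int value) {
            this.collapseDuplicateThreshold = value;
            return this;
        }

        Config build() {
            return new Config(primaryClusterMin, primaryClusterMax, collapseDuplicateThreshold);
        }
    }
}
```

A call such as `ConfigSketch.Config.builder().setPrimaryClusterMax(1000).build()` yields an immutable value whose accessors are generated by the record.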

Other Changes

  • HNSW Config — converted from final class to record (same pattern as GuardiANN Config), updated all accessor call sites
  • RunningStats (renamed from RunningStandardDeviation) — added maxEver tracking (exact on add/combine, stale upper bound
    after remove/subtract), converted to record
  • TopK / DistinctTopK — separated into two classes with min()/max() factory methods; fixed worstElement() and simplified
    toSortedList() in DistinctTopK
  • VectorId — made Comparable for use in sorted collections
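
A bounded top-k collector with `min()`/`max()` factories can be sketched with a `PriorityQueue` whose head is the worst retained element. This is a generic illustration under that assumption; it is not the PR's `TopK`, and the meaning given to `min()` and `max()` here (keep the k smallest or the k largest by natural order) is a guess.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopKSketch<T> {
    private final int k;
    private final Comparator<T> bestFirst;
    private final PriorityQueue<T> queue; // head is the worst retained element

    private TopKSketch(final int k, final Comparator<T> bestFirst) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.bestFirst = bestFirst;
        this.queue = new PriorityQueue<>(bestFirst.reversed());
    }

    /** Retains the k smallest elements by natural order. */
    static <T extends Comparable<T>> TopKSketch<T> min(final int k) {
        return new TopKSketch<>(k, Comparator.<T>naturalOrder());
    }

    /** Retains the k largest elements by natural order. */
    static <T extends Comparable<T>> TopKSketch<T> max(final int k) {
        return new TopKSketch<>(k, Comparator.<T>reverseOrder());
    }

    /** Returns true if the element was retained, possibly evicting the current worst. */
    boolean add(final T element) {
        if (queue.size() < k) {
            return queue.add(element);
        }
        if (bestFirst.compare(element, queue.peek()) < 0) {
            queue.poll();
            return queue.add(element);
        }
        return false;
    }

    /** Returns the retained elements sorted from best to worst. */
    List<T> toSortedList() {
        final List<T> result = new ArrayList<>(queue);
        result.sort(bestFirst);
        return result;
    }
}
```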

Test Plan

  • SiftTest — insert 100k SIFT vectors, verify search recall
  • DebugIndexTest — end-to-end insert + search + assignment quality checks
  • TopKTest — unit tests for TopK/DistinctTopK correctness
  • Verify split/merge/reassign/collapse task execution via task listener counters
  • Verify delete removes both regular and collapsed vector references

Comment thread .idea/GrepConsole.xml Outdated
Comment thread fdb-extensions/src/main/java/com/apple/foundationdb/async/guardiann/TopK.java Outdated
* @return a future completing with the retained top-K elements, sorted from best to worst
*/
@Nonnull
public CompletableFuture<List<T>> collect(final AsyncIterable<T> iterable,

Contributor

This method and collectRemaining do not seem to be used anywhere. Can we remove them?

Contributor Author

It's a pattern used throughout MoreAsyncUtil. I am using it here as well to provide the helpers. I actually had a need for them temporarily during the development of this PR. Since TopK and DistinctTopK have been moved to async/common for general usage, I am keeping these methods here as well. I will, however, provide proper testing, so these methods are not technically unused.

* copy and its replicated copies) and provides a stable secondary ordering. Instances order by primary key first,
* then by UUID.
*/
class VectorId implements Comparable<VectorId> {

Contributor

This looks like a textbook record to me, can you convert it to a record?

Contributor Author

done.

}

@Override
public boolean equals(final Object o) {

Contributor

This takes additionalValues into account. At the same time, this inherits Comparable indirectly from VectorId, including the parent's implementation of compareTo, which does not even know about additionalValues, which breaks the Comparable contract.

https://docs.oracle.com/en/java/javase/25/docs/api/java.base/java/lang/Comparable.html

"The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C."

I think this might be fine, but definitely worth some documentation.
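
To make the concern concrete, here is a small self-contained sketch of an ordering that is inconsistent with equals. The `Entry` record is a hypothetical stand-in and not the class under review: its equality includes `extra`, while its ordering looks only at `key`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

final class ComparableConsistencySketch {
    /** Hypothetical stand-in: equals() (generated by the record) includes extra, compareTo() does not. */
    record Entry(String key, String extra) implements Comparable<Entry> {
        @Override
        public int compareTo(final Entry other) {
            return key.compareTo(other.key);
        }
    }

    /** Sorted sets decide membership via compareTo. */
    static int sizeInTreeSet(final List<Entry> entries) {
        return new TreeSet<>(entries).size();
    }

    /** Hash sets decide membership via equals/hashCode. */
    static int sizeInHashSet(final List<Entry> entries) {
        return new HashSet<>(entries).size();
    }
}
```

For two entries with the same key but different `extra`, the `TreeSet` keeps one element while the `HashSet` keeps two, which is the kind of surprise the Javadoc quote warns about.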

Contributor Author

Please take a look at how this is solved now -- I changed the scheme to prefer composition over inheritance. The problem went away. The whole thing looks cleaner now.


return primitives.fetchClusterMetadata(transaction, getTargetClusterId())
.thenCompose(clusterMetadata -> {
if (clusterMetadata == null) {

Contributor

same as reassign task, does this warrant a logger warning?

@normen662 normen662 Jun 25, 2026

Contributor Author

We could warn on it but it's something that can occur naturally. The cluster might have gotten merged in the meantime (between enqueueing the task and executing it). In that case this can happen and is benign.

}

final EnumSet<ClusterMetadata.State> states = clusterMetadata.states();
if (!states.contains(ClusterMetadata.State.COLLAPSE)) {

Contributor

is it possible (and ok) to have a state that is, for example, COLLAPSE + SPLIT or COLLAPSE + REASSIGN?

Contributor Author

ReassignTask and SplitMergeTask both check before they do any mutations and abort in the presence of COLLAPSE. While splitting, if we decide to attempt a collapse, we actually enqueue COLLAPSE and BOUNCE[SPLIT] and leave the state in COLLAPSE only. For CollapseTask, encountering COLLAPSE is sufficient, as it will enqueue a SPLIT afterwards anyway. That means that if something else was already waiting to run a task on a cluster that is also in the COLLAPSE state, it will just get clobbered, and that is ok.

// If this is a replica we may as well subsample it like reassign would normally do it.
// Collapsing a cluster should also work in an identical way if we didn't do that here.
//
replicatedTopK.add(vectorReference);

Contributor

It feels like this is not related, or at least orthogonal, to collapse task logic?

IIUC this causes lower-priority replicas to be removed until the number of replicas is less than a specific threshold (replicatedClusterTarget?).

If I am right, this seems like a cleanup task that should have its own identity instead of piggybacking on collapse here? If you feel this is justified, could you please add this behavior to the documentation of CollapseTask?

Contributor Author

I am doing this here because I have all the info needed, and reading the vectors is somewhat expensive. I will document it.

//
final ImmutableSetMultimap<UUID, UUID> vectorReferenceToVectorSignatureMap =
vectorReferenceToSignatureMap(vectorReferences);
final ImmutableSetMultimap<UUID, UUID> vectorSignatureToVectorUuidMap =

Contributor

this is a multimap holding all vectors sharing the same signature, so they are .... identical and collapsible?

Contributor Author

They use the same signature (the same vector), they are collapsible.

@alecgrieser alecgrieser left a comment

Collaborator

By file, I'm only about 1/4 of the way through this, but I figured I'd give a preliminary round first. Overall, I think this makes sense, and most of this is about documentation or clarification.

Comment thread .idea/misc.xml Outdated
Comment thread fdb-extensions/src/main/java/com/apple/foundationdb/async/common/ResultEntry.java Outdated
Comment thread fdb-extensions/src/main/java/com/apple/foundationdb/async/guardiann/TopK.java Outdated
public List<T> toSortedList() {
final PriorityQueue<T> copy = new PriorityQueue<>(queue);
final int size = copy.size();
final T[] array = (T[]) new Object[size];

Collaborator

Given that you then copy this into an ImmutableList anyway, I think you could alternatively use ImmutableList.builderWithExpectedSize() and then return builder.build().reverse(), which returns a reversed view of the list rather than making a copy.

Contributor Author

done.

Comment on lines +300 to +301
if (assignedVectorReference.isPrimaryCopy() != vectorReference.isPrimaryCopy() ||
assignedVectorReference.isUnderreplicated() != vectorReference.isUnderreplicated()) {

Collaborator

Does it need to be rewritten if the replicationPriority changes? Should this just be turned into a .equals check on the two objects?

Contributor Author

Yes, it should incorporate replicationPriority as suggested, and I added that. To mask out noise from SIMD operations, the comparison has to use a tolerance range rather than bit equality. However, we should not do a general equals() here, as we don't want to also compare the vector; that equality is already guaranteed when we get here.
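
A comparison along these lines could be sketched as follows. The `Ref` record, the method name `needsRewrite`, and the explicit `tolerance` parameter are hypothetical; the thread does not say how the range is chosen in the actual change.

```java
final class RewriteCheckSketch {
    /** Hypothetical projection of a vector reference onto the fields that matter for a rewrite. */
    record Ref(boolean primaryCopy, boolean underreplicated, double replicationPriority) { }

    /** True if a flag differs or the priorities differ by more than the given tolerance. */
    static boolean needsRewrite(final Ref stored, final Ref assigned, final double tolerance) {
        return stored.primaryCopy() != assigned.primaryCopy()
                || stored.underreplicated() != assigned.underreplicated()
                || Math.abs(stored.replicationPriority() - assigned.replicationPriority()) > tolerance;
    }
}
```

The vector itself is deliberately left out of the comparison, matching the point that its equality is already established by the time this check runs.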

* stable post-quiescence shape to inject into),</li>
* <li>find the cluster a fixed query vector would naturally belong to, using the exact
* neighborhood-discovery logic {@code Insert} uses (stable across reseeds — cluster UUIDs
* change with every split, but the "nearest cluster to vector v" choice does not), and</li>

Collaborator

Was this line deliberately removed? I think the list items don't parse any more without it

@alecgrieser alecgrieser left a comment

Collaborator

Another additional set of comments. I'm now about 1/3 of the way through. Sorry if some of these have become outdated since I left them; I haven't gone through all of them to validate that the latest set of concurrent changes didn't make them obsolete

Comment on lines +424 to +425
final int numInnerNeighborhood,
final int numOuterNeighborhood) {

Collaborator

There's some singular/plural confusion in both the comments and also variable names in this method. I think both the inner and outer neighborhood lists can contain multiple elements, so these should probably all be "neighborhoods"

Contributor Author

did a big renaming, please check.

if (foundPrimaryCluster) {
final int cappedNumInnerNeighborhood = Math.min(numInnerNeighborhood, clusterMetadataWithDistances.size());
return new Neighborhoods(clusterMetadataWithDistances.subList(0, cappedNumInnerNeighborhood),
clusterMetadataWithDistances.subList(cappedNumInnerNeighborhood, clusterMetadataWithDistances.size()));

Collaborator

Why does this return the whole tail as the outer neighborhoods instead of incorporating the numOuterNeighborhoods limit? Is it assuming that that limit is baked into the input?

Contributor Author

This is indeed an oversight, which didn't actually matter because every caller implicitly or explicitly capped it already. So that we do not depend on that in the future, I capped it here as well.
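
Capping both neighborhoods inside the method could look like the sketch below. It is a generic illustration with hypothetical names (`Neighborhoods`, `split`), and it assumes the input is already sorted by distance and both limits are non-negative.

```java
import java.util.List;

final class NeighborhoodsSketch {
    record Neighborhoods<T>(List<T> inner, List<T> outer) { }

    /** Splits a distance-sorted list into at most numInner inner and at most numOuter outer entries. */
    static <T> Neighborhoods<T> split(final List<T> sortedByDistance, final int numInner, final int numOuter) {
        final int size = sortedByDistance.size();
        final int innerEnd = Math.min(numInner, size);
        // Cap the outer count against what is left; this also avoids int overflow for huge limits.
        final int outerEnd = innerEnd + Math.min(numOuter, size - innerEnd);
        return new Neighborhoods<>(sortedByDistance.subList(0, innerEnd),
                sortedByDistance.subList(innerEnd, outerEnd));
    }
}
```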

* enqueued
* @param underreplicatedPrimaryClusterMax maximum number of under-replicated primary vectors in a cluster, overflow
* will result in a reassign task to be enqueued
* @param replicatedClusterMaxWrites maximum number of writes of replicated vectors to a cluster

Collaborator

"writes" here is an interesting clarification. Is this supposed to be the maximum number of replicated vectors in a cluster, or is this a limit on the amount of I/O (implied by "writes"--like per transaction)? Or is this the maximum number of clusters that a single vector will be written to (as opposed to vectors per cluster)?

Comment on lines +48 to +49
* @param replicationStatsMinSampleSize minimum number of primary vectors in a cluster before its distance statistics
* (mean/standard deviation) are trusted for the replication priority z-score term

Collaborator

What's the "trust" here about? The actual stats are accurate, right? So is this more about drift, or about small sets having artificially high variance, or something else?

}

@Nonnull
public ConfigBuilder setMetric(@Nonnull final Metric metric) {

Collaborator

minor: We're a bit inconsistent on this in our codebase, but it may be good practice to label these setters as @CanIgnoreReturnValue

}
final long hi = readLongBigEndian(keyAsBytes, 0);
final long lo = readLongBigEndian(keyAsBytes, 8);
return new UUID(hi, lo);

Collaborator

I get that at some level, we're just using this as a signature of 128 bits in a convenient form, but should this clear out the appropriate bits so that it conforms with UUID type v4? I guess that takes us down to 122 bits of entropy, but it may avoid confusion with other types of UUIDs.
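
For reference, forcing the version and variant fields could be done as in this sketch. It uses only `java.util.UUID` and `java.nio.ByteBuffer`; the class and method names are hypothetical, and whether the PR should adopt this is exactly the open question in this comment.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class SignatureUuidSketch {
    /** Builds a UUID from the first 16 signature bytes, overwriting the version and variant fields. */
    static UUID asVersion4(final byte[] signature) {
        if (signature.length < 16) {
            throw new IllegalArgumentException("need at least 16 bytes");
        }
        final ByteBuffer buffer = ByteBuffer.wrap(signature); // big-endian by default
        long hi = buffer.getLong();
        long lo = buffer.getLong();
        hi = (hi & ~0xF000L) | 0x4000L;                        // 4 version bits := 0100 (version 4)
        lo = (lo & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // 2 variant bits := 10 (RFC 4122 variant)
        return new UUID(hi, lo);
    }
}
```

The masking fixes the four version bits and the two variant bits, so those positions no longer carry signature information.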

Comment on lines +374 to +375
return new ClusterReference(valueTuple.getUUID(0),
storageTransform.transform(StorageHelpers.vectorFromBytes(config, valueTuple.getBytes(1))));

Collaborator

I think I left a similar comment on one of the HNSW PRs, but it is a bit of a shame that using Tuples is the path of least resistance here, rather than, say, Protobuf, for all of the things that don't need to be in FDB keys (and where we therefore don't need to sacrifice space to maintain serialization order). I think the easiest path might be to split this (and/or the HNSW) out into its own subproject, which could also depend on Protobuf. We could theoretically change that later, though not too much later, as once we start serializing stuff in real indexes, this becomes effectively impossible to change.

Contributor Author

Understood. I was talking to @ScottDugas the other day about moving all vector-related stuff into its own subcomponent. Then we could also do that.

return Tuple.from(vectorId.uuid());
}

static double replicationPriority(@Nonnull final Config config,

Collaborator

I'm trying to understand what this value is actually encapsulating. It's something like a weighted average of the number of standard deviations from the centroid as well as the actual distance itself. It's a bit of an odd metric, as it's combining data from two different unit systems. Is this one of those things that's taken out of a paper?

It might be good to add a bit more docs on the "why" here, or at least a reference to where this comes from

Comment on lines +445 to +447
static boolean isOccluded(@Nonnull final DistanceEstimator estimator,
@Nonnull final ClusterMetadataWithDistance replicationCandidate,
@Nonnull final List<ClusterMetadataWithDistance> selectedReplicationClusters) {

Collaborator

This could probably use Javadoc. From the implementation, I take it it's trying to figure out, for a given vector v that is being replicated into a cluster c, whether there's another cluster c' such that |v - c| > |c - c'|, that is, the two clusters are closer to each other than the vector is to the cluster it's trying to replicate into. It's not immediately obvious to me why that would automatically mean that we want to reject c or c', though I could see why, from the triangle inequality, we'd only want to pick the closest cluster to v.
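
The reading of the check given in this comment can be written down as the sketch below. It follows that reading only, which the thread itself does not confirm; it uses plain Euclidean distance on `double[]` instead of the PR's `DistanceEstimator`, and the signature is hypothetical.

```java
import java.util.List;

final class OcclusionSketch {
    /** Plain Euclidean distance; assumes both arrays have the same length. */
    static double distance(final double[] a, final double[] b) {
        double sum = 0.0d;
        for (int i = 0; i < a.length; i++) {
            final double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** The candidate is occluded if an already selected centroid is closer to it than the vector is. */
    static boolean isOccluded(final double[] vector,
                              final double[] candidateCentroid,
                              final List<double[]> selectedCentroids) {
        final double vectorToCandidate = distance(vector, candidateCentroid);
        for (final double[] selected : selectedCentroids) {
            if (distance(candidateCentroid, selected) < vectorToCandidate) {
                return true;
            }
        }
        return false;
    }
}
```

Under this reading, a candidate that sits right next to an already selected cluster is skipped, so replicas tend to spread over different directions around the vector.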

Labels

enhancement New feature or request

3 participants