Skip to content

Batch aggregation (two-level batching)#522

Open
karasikov wants to merge 72 commits into
masterfrom
mk/query
Open

Batch aggregation (two-level batching)#522
karasikov wants to merge 72 commits into
masterfrom
mk/query

Conversation

@karasikov
Copy link
Copy Markdown
Member

@karasikov karasikov commented May 16, 2025

No description provided.

@karasikov karasikov requested a review from hmusta May 16, 2025 16:38
Copy link
Copy Markdown
Collaborator

@hmusta hmusta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, but I'm gonna change it in #521 since it'll cause race conditions there

Comment thread metagraph/src/annotation/binary_matrix/multi_brwt/clustering.cpp Outdated
Comment thread metagraph/src/annotation/binary_matrix/multi_brwt/clustering.cpp Outdated
@karasikov karasikov changed the title hide progress bar in extracting contigs from the query graph Batch query improvements May 19, 2025
karasikov and others added 30 commits May 26, 2025 00:32
Also changed processing: always map to the full dbg contigs pulled from batches
Resolved conflict in src/cli/load/load_annotated_graph.cpp:
kept mk/query's split (load_annotation + load_coord_to_header
free functions for parallel graph/annotation loading) and
adopted master's removal of redundant high-level error logs
(now reported by the lower-level loaders, per #611).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	metagraph/src/cli/load/load_annotated_graph.cpp
#	metagraph/src/cli/query.cpp
When assembling with annotation-based masks, the graph and annotation
are independent loads. Use load_graph_with_async_annotation (from
#621) so they overlap instead of running serially.

Graph-only assembly (no annotators) keeps load_critical_dbg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ion API

Replace the two ad-hoc ThreadPool(1, 1) wrappers (one for graph, one
for annotation in the batched-query path) with the existing async
loaders. Both graph and AnnotatedDBG are kicked off together at the
top of query_graph; queries can begin as soon as the graph is ready,
with the annotation arriving in the background.

Hoist coord_to_header_exists out of the per-file loop -- it's
file-independent. Move check_annotation onto the stub annotation
(temp_anno) so type validation runs immediately rather than waiting
for the full data load. As a side effect, non-batched runs are now
also validated, where they previously weren't.

Expose async_load_critical_dbg as public API. Drop load_annotation
from the public header now that no external caller uses it -- it
remains as an internal helper in the anonymous namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The extraction was driven by the merge resolution, but no external
caller needs it after query.cpp was switched to use
load_graph_with_async_annotation. Inlining it back into
build_annotated_dbg restores master's row_diff/CTH parallelism: CTH
load now runs concurrently with the graph await again, instead of
serially after it.

Net diff vs master: just the public exposure of async_load_critical_dbg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No callers in the codebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flag returned by sequence_to_kmers indicates whether the k-mer
is well-formed (no invalid characters), not whether it's already in
the graph. The latter check happens later via insert(). Rename for
accuracy.

Also drops two unused additions from earlier work-in-progress: a
parallel add_sequence overload that exposed k-mer position to the
on_insertion callback (no external callers), and a
VectorRowBinMat<Vector<uint32_t>> template instantiation (also no
callers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflicts with the query/refactoring PR that landed on master.
mk/query went further than #625 (split construct_query_graph into a
construct_contigs helper, made the annotation lazy via a getter, switched
batched_query_fasta to take the file path directly so the caller doesn't
own a FastaParser), so kept HEAD's signatures everywhere they diverged.

Key resolution choices:
  * query.cpp/hpp: kept HEAD's batched-query and async-annotation flow;
    took master's relocation of score_kmer_presence_mask out of the file
    (it now lives in graph/alignment/score_kmer_presence_mask.cpp); kept
    master's `align::score_kmer_presence_mask` qualified call sites.
  * benchmark_query.cpp: kept HEAD's getter-based construct_query_graph
    forward declaration so the benchmark links against the new signature.
  * tests/annotation/test_annotated_dbg.cpp: dropped the local
    score_kmer_presence_mask test since it now lives in
    tests/graph/test_score_kmer_presence_mask.cpp.
  * Removed a stray graph.reset() leftover from the pre-async loader.

Verified: unit_tests (3,643) and integration_tests (3,594) pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants