Skip to content

[Low/Nit] Repo-wide review: minor findings tracking issue #41

Description

@gratus907

Severity: Low / Nit (tracking issue — bundles minor findings from a full repo-wide review)

These are lower-severity items from a repo-wide multi-vector review (see the Critical/High/Medium findings filed as separate issues). Grouped here since each is small in isolation; split out into its own issue before starting work on any one of them.

Spaces

  • TupleSpace/RecordSpace use identity equality while every sibling space (RealSpace, IntegerSpace, CategoricalSpace, PermutationSpace, ArraySpace) is a frozen dataclass with value equality/hash — TupleSpace(RealSpace(0,1)) == TupleSpace(RealSpace(0,1)) is False. Degrades Problem equality for composite-space problems to identity. (spaces/composites/tuple_space.py:22-55, record_space.py:23-65)
  • RealSpace.__post_init__ normalizes bounds via bare float(), accepting invalid definitions like RealSpace("0", "1") and RealSpace(True, True). (spaces/scalar.py:59-88)
  • Compiled vs. generic geometry disagree on the permutation distance law — compiled PermutationSpaceGeometry scores 0/1 positional mismatch, generic fallback scores magnitude (|l-r|/(size-1))^2. A third-party space wrapping a PermutationSpace without a geometry provider gets materially different diversity distances. (spaces/geometry/compile.py:126-172 vs geometry/permutation.py:50-60)
  • ArraySpace permits length=0 while TupleSpace/RecordSpace/PermutationSpace require non-empty declarations; a zero-length array poisons StructuredSpaceDiversityMetric.distance (raises on first pairwise distance). (spaces/composites/array_space.py:81-83)
  • Duplicated/dead helpers: is_builtin_structured_space/is_builtin_child_space are byte-identical (geometry/leaf.py:174-229); require_real_candidate/require_integer_candidate/require_record_candidate exist in both composites/adapters.py and geometry/leaf.py with divergent strictness; StructuredSearchSpace.is_active_leaf_path has zero callers; CategoricalSpace.sample allocates a throwaway IntegerSpace per draw instead of using the already-imported random_state_randint.
  • When a direction-free EvaluationProtocol is supplied, direction is stored but not meaningfully bound — downstream consumers read problem.direction as authoritative regardless, a silent dual source of truth. (problem.py:229-236)

Study / execution / evaluators

  • Stale-async skips the max_evaluations < 0 guard the sync path and fast path both have — silently returns an empty report instead of raising. (study/stale_async.py:287-331)
  • store_completion_group is duplicated verbatim across two modules; the drain-to-ordered-outcomes loop is triplicated across AsyncEvaluator.evaluate, evaluate_batch_exact_async, and StudyExactAsyncStepSession.finish; the records/refinements/budget/trace assimilation block is duplicated between execution.run and run_stale_async. (study/common.py:61-106 vs evaluators/async_evaluator/artifacts.py:194-235)
  • Joblib infrastructure retry/suspend-resume silently re-executes every not-yet-consumed request on resume, including ones already completed in workers whose results were discarded — undocumented at-least-once semantics with no accounting; a wall-time/cost-metered objective double-bills. (evaluators/joblib/asynchronous.py:279-321,479-515)
  • _classify_joblib_failure matches infrastructure failures by exception class name ("BrokenProcessPool", "TerminatedWorkerError"); a coincidentally-named user exception is misclassified as infrastructure and silently retried. (evaluators/joblib/asynchronous.py:56-72)

CSA

  • Checkpoints silently drop trace_state on restore — a traced run resumed from checkpoint stops tracing with no error. (csa/engine/state.py:329 / optimizer.state_from_dict)
  • ask() crashes on a structured seed whose active_leaf_paths() is empty (all-conditional branches inactive) instead of falling back to the plain operator path. (csa/engine/ask.py:119-130 with operators/editing.py:48-50)
  • With biased potential enabled and a degenerate bank (all identical candidates → distance scale 0), _resolve_biased_sigma2 raises ValueError inside tell() instead of degrading gracefully. (csa/scoring/model_state.py:513-515)
  • Dead code: module-level infer_score_gap is imported nowhere, shadowed by same-named keyword params, duplicates optimizer.infer_score_gap_for_entries (csa/banking/update/logic.py:577-595); the full-bank branch of admit_observation is unreachable from the engine and duplicates admit_full_bank_observation with divergent semantics (csa/banking/update/admission.py:64-119).
  • newcomer_mask name is inverted — actually contains preserved-elite indices, masking them so newcomers seed first; misleading for future edits. (csa/engine/boundary.py:184-196)
  • CSAPendingProposals.get is a linear scan over a tuple — O(p·m) tell-side validation, quadratic for large async batches. (csa/engine/state.py:79-83)

Artifacts

  • RunReport/RunResult/NondominatedRunSurface/Trace direct constructors don't canonicalize sequence fields to tuples (records/observations/events, unlike refinements which is normalized) — externally mutable, unhashable "frozen" artifacts whose construction-time invariants can be invalidated after the fact. (artifacts/terminal.py:248,386-387,623-624,213)
  • terminal_surface_setstate silently fills unknown/future fields with None on unpickling older payloads instead of failing loudly; unpickling then replace-ing with new refinements can raise a surprising TypeError (the _candidate_equal field is dropped by design on pickle). (artifacts/terminal.py:818-819)
  • The shared require_json_* validators in json_types.py are used only by progression/*; every other codec hand-rolls isinstance checks with inconsistent semantics (e.g. accepting True as an int/float where the shared helper wouldn't). Already-drifted duplicated boundary validation.

Population algorithms

  • ~350 lines of byte-identical generational scaffolding (ask/tell/_ask_initial_population/_materialize_generation/_generate_child/tournament selection/_sort_population) duplicated across ga, clearing_ga, species_ga, restricted_tournament_ga, and (partially) de optimizers — the species-GA bugs filed separately live exactly in the block that diverged. Consider extracting a shared base parametrized by member type + survival-policy hook.
  • ScipyMinimizeResult.__post_init__ raises ValueError on any non-finite scipy result, aborting the whole proposal batch instead of degrading to the original evaluated proposal. (local_search/scipy/results.py:36-56)
  • sample_exchange_count's documented fractional cap is never attainable due to an off-by-one exclusive randint bound (e.g. max_segment_fraction=0.5, size 10 → max segment length 4, not 5). (population/permutation/operators.py:95-99)

Packaging / docs / tests

  • CHANGELOG's 0.1.0 "Added" list omits TupleSpace and PermutationSpace (both exported and documented as 0.1.0 features), and cites StructuredSearchSpace where the root facade exports SearchSpacestability.md calls the changelog "the authoritative list" of supported surface. (CHANGELOG.md:19-22)
  • tests/surface/test_packaging_metadata.py unconditionally import tomli, making the whole module a collection error (not a skip) on any env without the test extra, even on 3.11+ where stdlib tomllib would suffice; also uses a CWD-relative Path("pyproject.toml").
  • No __version__ attribute on the package (only in build metadata); no PyPI classifiers/project.urls/authors in pyproject.toml.
  • docs.yml only runs mkdocs build --strict — nothing ever deploys the site (no gh-pages/Pages job, no site_url). Build-only is safe but may not be intended if docs are meant to be published.
  • The documented checkpoint save pattern (open("checkpoint.json", "w") + json.dump) overwrites the previous checkpoint in place — a crash mid-write leaves a truncated, unloadable file with the prior good checkpoint gone. Docs-only; recommend temp-file + os.replace.

Fix direction: triage into individual issues as capacity allows; several items above (species-GA scaffolding duplication, CSA aligned-bank-views abstraction) are natural follow-ups once their corresponding High-severity issues are fixed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions