
Bug: CitationRegistry global state causes cross-request citation contamination #394

@CrepuscularIRIS


Bug Description

CitationRegistry stores its state in a class-level dictionary (_instances) that is shared by all requests. init_citation_registry() calls CitationRegistry.reset(), which replaces that shared dict and so wipes the state of every concurrent session. When two requests run the init_citation_registry → assign_citation_ids_stateful pipeline at the same time, one request's reset destroys the other's in-progress state, producing corrupted or swapped citation IDs in responses.

Location

servers/custom/src/custom.py, ~line 405:

class CitationRegistry:
    _instances: Dict[int, Dict[str, Any]] = {}  # class-level, shared across all requests

    @classmethod
    def reset(cls):
        cls._instances = {}                      # wipes state for ALL concurrent sessions

~line 435:

@app.tool(output="q_ls->q_ls")
def init_citation_registry(q_ls: List[str]) -> Dict[str, Any]:
    CitationRegistry.reset()                     # global reset triggered per request
    return {"q_ls": q_ls}

Reproduction

  1. Send two concurrent requests that both invoke init_citation_registry followed by assign_citation_ids_stateful.
  2. Request A calls reset() while Request B is mid-way through assign_citation_ids_stateful.
  3. Request B's accumulated citations are wiped, so its numbering restarts at 1 and documents it had already cited under higher IDs are renumbered.
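The interleaving in the steps above can be replayed deterministically on a single thread. This is a simplified, hypothetical model: only reset() and the class-level _instances dict follow the excerpt; assign_ids() is an invented stand-in for assign_citation_ids_stateful, whose real signature and storage layout are not shown in this issue.

```python
from typing import Dict, List


class CitationRegistry:
    # Same shape as the excerpt: one class-level dict shared by every request.
    _instances: Dict[int, Dict[str, int]] = {}

    @classmethod
    def reset(cls) -> None:
        # Rebinds the shared dict, discarding every session's state.
        cls._instances = {}


def assign_ids(docs: List[str]) -> List[int]:
    """Hypothetical stand-in for assign_citation_ids_stateful.

    New documents get the next integer ID; documents seen earlier reuse
    their ID. Key 0 is a placeholder: the real key is not shown in the issue.
    """
    state = CitationRegistry._instances.setdefault(0, {})
    ids = []
    for doc in docs:
        if doc not in state:
            state[doc] = len(state) + 1
        ids.append(state[doc])
    return ids


# Replay the interleaving deterministically on one thread.
CitationRegistry.reset()                   # Request B: init_citation_registry
b_first = assign_ids(["doc-x", "doc-y"])   # Request B, first batch
CitationRegistry.reset()                   # Request A: init_citation_registry
b_second = assign_ids(["doc-y", "doc-z"])  # Request B, second batch
print(b_first, b_second)  # [1, 2] [1, 2]; without A's reset B would get [1, 2] [2, 3]
```

doc-y was cited as 2 in B's first batch and comes back as 1 after A's reset, which is the renumbering described in step 3.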

Impact

Users receive incorrect citation numbers in answers, causing documents to be cited under wrong IDs. In multi-tenant deployments this also constitutes a cross-session information leak (one user's citation state can be reset by another user's request).

Suggested Fix

Scope registry state per request using a unique session/request ID rather than global class state:

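def init_citation_registry(q_ls: List[str], request_id: str) -> Dict[str, Any]:
    # request_id must be unique per request (e.g. a UUID); only this request's entry is (re)initialized.
    # _instances is currently annotated Dict[int, ...] and would need a str key type.
    CitationRegistry._instances[request_id] = {}
    return {"q_ls": q_ls, "request_id": request_id}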
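A minimal sketch of the request-ID approach, assuming the caller can generate a unique ID per request (uuid4 here). start(), release() and assign_ids() are hypothetical names, not existing methods. release() goes beyond the one-line fix: per-request entries that are never removed would otherwise accumulate in the class-level dict for the life of the process.

```python
import uuid
from typing import Dict, List


class CitationRegistry:
    # Still class-level, but each entry is keyed by a request ID, so a
    # request only ever touches its own entry.
    _instances: Dict[str, Dict[str, int]] = {}

    @classmethod
    def start(cls, request_id: str) -> None:
        cls._instances[request_id] = {}

    @classmethod
    def release(cls, request_id: str) -> None:
        # Drop the entry when the request finishes so the dict does not grow forever.
        cls._instances.pop(request_id, None)


def assign_ids(request_id: str, docs: List[str]) -> List[int]:
    """Hypothetical stand-in for assign_citation_ids_stateful, scoped by request_id."""
    state = CitationRegistry._instances[request_id]
    ids = []
    for doc in docs:
        if doc not in state:
            state[doc] = len(state) + 1
        ids.append(state[doc])
    return ids


req_a, req_b = str(uuid.uuid4()), str(uuid.uuid4())
CitationRegistry.start(req_b)
b_first = assign_ids(req_b, ["doc-x", "doc-y"])
CitationRegistry.start(req_a)                    # A starting no longer affects B
b_second = assign_ids(req_b, ["doc-y", "doc-z"])
print(b_first, b_second)                         # [1, 2] [2, 3]
CitationRegistry.release(req_a)
CitationRegistry.release(req_b)
```

Mutating the dict in place (rather than rebinding it, as reset() does) keeps one shared object, and each request reads and writes only the inner dict under its own key.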

Or pass a fresh CitationRegistry instance through the pipeline context instead of using class-level storage.
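A sketch of the instance-based alternative, in which the registry's state lives on an object created per request instead of on the class. handle_request() is a hypothetical handler; the issue does not show whether the @app.tool pipeline can carry a Python object between steps. If it only passes JSON-like values, the registry's ids dict could be threaded through instead.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CitationRegistry:
    # Instance attribute: every request builds its own registry object.
    ids: Dict[str, int] = field(default_factory=dict)

    def assign(self, docs: List[str]) -> List[int]:
        out = []
        for doc in docs:
            if doc not in self.ids:
                self.ids[doc] = len(self.ids) + 1
            out.append(self.ids[doc])
        return out


def handle_request(batches: List[List[str]]) -> List[List[int]]:
    # Hypothetical handler: the registry exists only inside this call,
    # so concurrent handlers have nothing shared to reset.
    registry = CitationRegistry()
    return [registry.assign(batch) for batch in batches]


# Two requests running concurrently each get consistent numbering.
with ThreadPoolExecutor(max_workers=2) as pool:
    a = pool.submit(handle_request, [["doc-p"], ["doc-q", "doc-p"]])
    b = pool.submit(handle_request, [["doc-x", "doc-y"], ["doc-y", "doc-z"]])
    print(a.result(), b.result())  # [[1], [2, 1]] [[1, 2], [2, 3]]
```

Because no state is stored on the class, there is no global reset() to call and nothing to clean up once the handler returns.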


Found via automated codebase analysis. Happy to submit a PR if this is confirmed.
