A1-B08: MCP Server Logging #1069

@MitchellShiell

B8: MCP Server Logging

Every execute_query invocation can produce a structured NDJSON record capturing three things:

  • Input: the researcher's natural-language question, plus the session, LLM model, and catalogue it ran in.
  • Outcome: the SQON the LLM produced, its validation result, the researcher's confirmation, and Arranger's response (status, record count, content digest).
  • Provenance: when the query ran, which Arranger instance served it, and the catalogue's index state at that moment.

Logging is profile-driven: a single LOG_PROFILE environment variable selects what is captured and where it is written. Three profiles are defined:

  1. off (no logs)
  2. operational (system metadata only, written to an admin-owned file)
  3. testing (operational fields plus researcher-content fields, server-stored, for dev environments and consented user testing).

The Arranger response payload (the rows returned by a query) is never written by this logger. The provenance fields plus responseDigest (a SHA-256 hash of the records returned) are sufficient to re-execute a query and verify the result.
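One way the digest could be computed is sketched below. The canonicalisation rules are an assumption, since this issue does not define them: each row is serialised as JSON with sorted keys and compact separators, and the serialised rows are then sorted so that row order does not change the hash. The function name `response_digest` is hypothetical.

```python
import hashlib
import json


def response_digest(rows: list) -> str:
    """Return a SHA-256 hex digest over a canonical form of the result rows.

    Assumed canonicalisation (not specified in the issue):
      * each row is dumped as JSON with sorted keys and compact separators
      * the dumped rows are sorted, so row order does not affect the hash
    """
    canonical_rows = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for row in rows
    )
    hasher = hashlib.sha256()
    for line in canonical_rows:
        hasher.update(line.encode("utf-8"))
        hasher.update(b"\n")  # delimiter so row boundaries are unambiguous
    return hasher.hexdigest()
```

If row order is meaningful for a given query (for example, a sorted result set), the outer `sorted(...)` would need to be dropped; that choice should be settled before digests are used as regression baselines.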

These records serve two primary uses in Aim 1:

  • User testing: Records produced under the testing profile during user testing drive KPI collection. Researcher consent to researcher-content capture will be obtained at enrollment.
  • Regression baselines: Records produced under the operational profile show how system behaviour changes over time. The responseDigest field lets us detect when the same query starts returning different rows without ever storing the rows themselves.

The schema is versioned (schemaVersion: "1"). Field additions are non-breaking; removals or renames require a version increment.

Note

Two further requirements may overlap with and extend this in Aim 2: conversational session memory and reproducible research packages (RRP).

Proposed Log Schema

The Min. profile column is the lowest profile under which the field is written; fields captured by operational are also captured by testing.

| Field | Min. profile | Description |
| --- | --- | --- |
| schemaVersion | operational | Record format version. |
| timestamp | operational | ISO 8601 time of query execution. |
| sessionId | operational | Links related queries within a single research session. |
| model | operational | LLM model name and version. |
| catalogue | operational | Which catalogue was queried. |
| validationResult.kind | operational | valid or invalid. Error text requires the testing profile. |
| responseStatus | operational | GraphQL response status. |
| recordCount | operational | Number of records returned. |
| responseDigest | operational | SHA-256 of the canonicalised result set. Lets drift checks detect when the same query returns different rows without persisting the rows themselves. |
| dataIndexedAt | operational | Timestamp of the catalogue's last Elasticsearch index update at query time. |
| dataRelease | operational | Operator-configured release version string, if set; null otherwise. |
| arrangerUrl | operational | The Arranger instance the query was executed against. |
| naturalLanguageInput | testing | The researcher's original question. |
| sqon | testing | The complete SQON object as constructed by the LLM. |
| validationResult.errors | testing | Validation error detail (may echo SQON contents). |

Together, dataIndexedAt, dataRelease, catalogue, arrangerUrl, and responseDigest form the minimum provenance record needed to cite a query: the query ran against data indexed on a specific date, from a specific release, on a specific platform, and returned the rows that hash to this digest.
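The profile split in the schema table could be enforced with a single projection step before anything is written. The sketch below is illustrative only: `project_record` and the two field tuples are hypothetical names, and the field lists are copied from the table above.

```python
from typing import Optional

# Fields written under both operational and testing (from the schema table).
OPERATIONAL_FIELDS = (
    "schemaVersion", "timestamp", "sessionId", "model", "catalogue",
    "responseStatus", "recordCount", "responseDigest",
    "dataIndexedAt", "dataRelease", "arrangerUrl",
)
# Researcher-content fields, written only under testing.
TESTING_ONLY_FIELDS = ("naturalLanguageInput", "sqon")


def project_record(full: dict, profile: str) -> Optional[dict]:
    """Return only the fields the given profile may write, or None for off."""
    if profile == "off":
        return None
    if profile not in ("operational", "testing"):
        raise ValueError(f"unknown profile: {profile!r}")

    # Missing fields become None, matching "null otherwise" for dataRelease.
    record = {name: full.get(name) for name in OPERATIONAL_FIELDS}
    validation = full.get("validationResult") or {}
    record["validationResult"] = {"kind": validation.get("kind")}

    if profile == "testing":
        for name in TESTING_ONLY_FIELDS:
            record[name] = full.get(name)
        # Error text may echo SQON contents, so it is testing-only.
        record["validationResult"]["errors"] = validation.get("errors", [])
    return record
```

Building the record from an allow-list, rather than deleting researcher-content fields from a full record, means a field added later is excluded from operational output unless it is explicitly listed.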

Logging profiles

Behaviour is controlled by the LOG_PROFILE environment variable:

| Profile | Fields written | Destination | Use case |
| --- | --- | --- | --- |
| off (default) | None | n/a | Production where no logs are required. |
| operational | System metadata only (see schema) | SQON_LOG_FILE | Production with operational telemetry for regression baselines (E01). |
| testing | Operational + researcher-content fields | SQON_LOG_FILE | Dev environments + user testing, with researcher consent obtained at enrollment. |

Default is off. Any other profile must be set explicitly, and the active profile is logged at server startup for operator audit.
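A minimal sketch of how the profile might be resolved at startup follows. Two behaviours here are assumptions rather than requirements from this issue: the value is trimmed and lower-cased before matching, and an unrecognised value raises an error instead of silently falling back to off. The function and logger names are hypothetical.

```python
import logging
import os

VALID_PROFILES = ("off", "operational", "testing")
logger = logging.getLogger("mcp.query_log")  # hypothetical logger name


def resolve_log_profile(env=None) -> str:
    """Read LOG_PROFILE, default to off, and log the active profile."""
    env = os.environ if env is None else env
    raw = env.get("LOG_PROFILE", "").strip().lower()

    if raw == "":
        profile = "off"  # unset or empty: logging stays disabled
    elif raw in VALID_PROFILES:
        profile = raw
    else:
        # Assumption: fail fast so a typo cannot silently disable logging.
        raise ValueError(
            f"LOG_PROFILE must be one of {VALID_PROFILES}, got {raw!r}"
        )

    logger.info("LOG_PROFILE=%s", profile)  # startup line for operator audit
    return profile
```

Passing a plain dict as `env` keeps the function testable without touching the process environment, for example `resolve_log_profile({"LOG_PROFILE": "testing"})`.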

Note

Records are written as NDJSON (one JSON object per line). The operational and testing profiles write to the path set by SQON_LOG_FILE.
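For illustration, appending one record as an NDJSON line could look like the sketch below. `append_record` is a hypothetical helper, and the path would come from SQON_LOG_FILE; no default path is assumed here.

```python
import json


def append_record(path: str, record: dict) -> None:
    """Append one record to an NDJSON file as a single line.

    json.dumps escapes newlines inside string values, so each record
    occupies exactly one line regardless of its content.
    """
    line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

Opening in append mode on every call keeps the sketch simple; a long-running server would more likely hold the file handle open and decide how to handle rotation and concurrent writers.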

Acceptance criteria

  • LOG_PROFILE accepts off, operational, and testing; default is off.
  • Server logs the active LOG_PROFILE at startup for operator audit.
  • Each execute_query invocation produces a structured NDJSON record under any non-off profile.
  • operational profile writes only the operational fields listed in the schema table; researcher-content fields are excluded.
  • testing profile writes both operational and researcher-content fields.
  • testing profile must be set explicitly; it never becomes active by default.
  • Schema includes a schemaVersion field set to "1".
  • SQON_LOG_FILE is configurable with a sensible default.
  • sessionId is generated such that records can be reliably attributed to a single researcher session across multiple execute_query invocations.
  • responseDigest is computed deterministically from a canonicalised form of the Arranger result set so the same rows always produce the same hash.
  • The logger never writes Arranger response data (only the digest).
