Validator rejects unit/ucumUnit on numeric+null unions and on int8/uint8/int16/uint16/float8 #175

@clemensv

unit / ucumUnit wrongly rejected on numeric+null unions and on int8/uint8/int16/uint16/float8

SchemaValidator(extended=True) rejects the unit keyword on schema elements
whose underlying type is numeric when the declared type is either (a) a union
that includes "null", or (b) one of the smaller numeric primitives. Both
rejections contradict the Units extension spec and the validator's own numeric
handling elsewhere.

Spec basis

The Units draft ("The unit Keyword"): "the unit keyword MAY be used as an
annotation on any schema element whose underlying type is numeric." A
["double","null"] union's underlying value type is numeric ("null" only
encodes explicit-null on the wire), so unit is valid on it. The normative
Units meta-schema (UnitsPropertyAddIn) adds unit to every Property with
no numeric gate at all.

Reproducer (verified)

from json_structure import SchemaValidator

base = {
    "$schema": "https://json-structure.org/meta/extended/v0",
    "$id": "https://example.com/repro",
    "$uses": ["JSONStructureUnits"],
    "type": "object", "name": "Repro", "properties": {},
}

def run(label, prop):
    # A shallow copy suffices: only the top-level "properties" key is replaced.
    s = dict(base)
    s["properties"] = {"f": prop}
    errs = SchemaValidator(extended=True).validate(s)
    hits = [f"{e.code} {e.path}" for e in errs if (e.path or "").endswith("/unit")]
    print(f"[{label}] type={prop.get('type')} -> {hits or 'OK'}")

run("nullable-double+unit", {"type": ["double", "null"], "unit": "kilometer"})
run("bare-double+unit",     {"type": "double",           "unit": "kilometer"})
run("int8+unit",            {"type": "int8",             "unit": "celsius"})
run("string+null+unit",     {"type": ["string", "null"], "unit": "kilometer"})  # SHOULD fail

Output:

[nullable-double+unit] type=['double', 'null'] -> ['SCHEMA_CONSTRAINT_TYPE_MISMATCH #/properties/f/unit']   <-- BUG
[bare-double+unit]     type=double             -> OK
[int8+unit]            type=int8               -> ['SCHEMA_CONSTRAINT_TYPE_MISMATCH #/properties/f/unit']   <-- BUG
[string+null+unit]     type=['string', 'null'] -> ['SCHEMA_CONSTRAINT_TYPE_MISMATCH #/properties/f/unit']   <-- correct

Reproduced on 0.6.3.dev2 and on published 0.7.0.

Root cause — sdk/python/src/json_structure/schema_validator.py

_check_units_keywords (unit gate, ~L833):

type_name = obj.get("type")
if not isinstance(type_name, str) or type_name not in numeric_types:
    self._err("'unit' can only appear in numeric schemas.", f"{path}/unit",
              ErrorCodes.SCHEMA_CONSTRAINT_TYPE_MISMATCH)

_check_ucum_unit_keyword (~L803) has the identical shape.

Two problems:

  1. Unions are rejected. A union type is a list, so
    isinstance(type_name, str) is False and every numeric+null field fails.
    By contrast, the Validation gate at ~L738 only applies its constraint checks
    if isinstance(tval, str) — it skips (tolerates) unions. So minimum/
    maximum on ["double","null"] is accepted while unit on the same field
    is rejected — internally inconsistent.
  2. The numeric set is under-inclusive. The units gates define
    numeric_types = {number, integer, float, double, decimal, int32, uint32,
    int64, uint64, int128, uint128} (L810-813 and L798-801), omitting
    int8, uint8, int16, uint16, float8, which are present in the canonical
    numeric set at L740-742 used by minimum/maximum. So unit on an int8
    is rejected even though minimum on the same int8 is accepted.
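
Both problems can be seen without the SDK by mirroring the shape of the gate in isolation. This is a minimal sketch, not the validator's code: UNITS_GATE_TYPES copies the set quoted above, CANONICAL_NUMERIC_TYPES is assumed to be that set plus the five omitted primitives (the real L740-742 set may contain more), and current_gate_accepts is a hypothetical name.

```python
# Set used by the units gates, as quoted in this issue.
UNITS_GATE_TYPES = {
    "number", "integer", "float", "double", "decimal",
    "int32", "uint32", "int64", "uint64", "int128", "uint128",
}

# Assumption: the canonical set is at least the units set plus the five
# primitives listed as missing; the real set may be larger.
CANONICAL_NUMERIC_TYPES = UNITS_GATE_TYPES | {
    "int8", "uint8", "int16", "uint16", "float8",
}


def current_gate_accepts(type_value):
    """Mirror of the existing check: a str that is in the units set."""
    return isinstance(type_value, str) and type_value in UNITS_GATE_TYPES


if __name__ == "__main__":
    for t in ("double", ["double", "null"], "int8", ["string", "null"]):
        print(f"{t!r:22} -> accepted={current_gate_accepts(t)}")
    # Types the canonical set knows about but the units gate rejects.
    print(sorted(CANONICAL_NUMERIC_TYPES - UNITS_GATE_TYPES))
```

Problem 1 shows up as the list-typed input failing the isinstance test before membership is ever checked; problem 2 shows up as the set difference on the last line.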

Proposed fix

Use the canonical numeric set and accept a union whose non-null members are
all numeric, in both _check_units_keywords and _check_ucum_unit_keyword:

def _unit_eligible(type_value, numeric_types):
    """Return True if 'unit' may annotate a schema with this 'type' value."""
    if isinstance(type_value, str):
        return type_value in numeric_types
    if isinstance(type_value, list):
        # A union qualifies when it has at least one non-null member and
        # every non-null member is a numeric primitive name.
        non_null = [m for m in type_value if m != "null"]
        return bool(non_null) and all(
            isinstance(m, str) and m in numeric_types for m in non_null
        )
    return False

# unit gate
if not _unit_eligible(obj.get("type"), NUMERIC_TYPES):
    self._err("'unit' can only appear in numeric schemas.", f"{path}/unit",
              ErrorCodes.SCHEMA_CONSTRAINT_TYPE_MISMATCH)

where NUMERIC_TYPES is the full L740-742 set (ideally hoisted to a module
constant and shared with the Validation gate). This is strictly more permissive
than today, so it cannot break any schema that already validates; the
string|null control above still correctly fails.

Impact

This blocks the entire clemensv/real-time-sources feeder fleet: 524 fields
across 54 feeders use the spec-valid {"type":["<numeric>","null"],"unit":…}
pattern (the bridges emit JSON null for absent measurements). Tracked
downstream as clemensv/real-time-sources#1391; we are carrying a temporary
in-repo compat shim that drops only these false-positive errors until a fixed
SDK release is available, after which we will bump the pin and remove the shim.
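
For anyone else who needs a stopgap, a filter of this kind can be sketched as follows. This is not the shim from real-time-sources; it is an illustrative sketch under stated assumptions: error objects expose code and path as in the reproducer, the ucumUnit error path ends in /ucumUnit, and Err, resolve_pointer and drop_false_positives are hypothetical names. The eligibility predicate is passed in by the caller.

```python
from dataclasses import dataclass


@dataclass
class Err:
    """Hypothetical stand-in for the validator's error objects."""
    code: str
    path: str


def resolve_pointer(schema, pointer):
    """Resolve a '#/a/b' style JSON Pointer fragment against a schema."""
    node = schema
    for token in pointer.lstrip("#").split("/"):
        if not token:
            continue
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def drop_false_positives(schema, errors, is_eligible):
    """Drop unit/ucumUnit type-mismatch errors whose owning schema has a
    'type' that is_eligible() accepts; keep every other error."""
    kept = []
    for e in errors:
        path = e.path or ""
        if (f"{e.code}" == "SCHEMA_CONSTRAINT_TYPE_MISMATCH"
                and path.endswith(("/unit", "/ucumUnit"))):
            try:
                owner = resolve_pointer(schema, path.rsplit("/", 1)[0])
            except (KeyError, IndexError, ValueError, TypeError):
                owner = None  # unresolvable path: keep the error
            if isinstance(owner, dict) and is_eligible(owner.get("type")):
                continue
        kept.append(e)
    return kept


if __name__ == "__main__":
    schema = {"properties": {
        "f": {"type": ["double", "null"], "unit": "kilometer"},
        "g": {"type": ["string", "null"], "unit": "kilometer"},
    }}
    errors = [
        Err("SCHEMA_CONSTRAINT_TYPE_MISMATCH", "#/properties/f/unit"),
        Err("SCHEMA_CONSTRAINT_TYPE_MISMATCH", "#/properties/g/unit"),
    ]

    def eligible(t):
        # Deliberately tiny stand-in predicate, for demonstration only.
        return isinstance(t, list) and [m for m in t if m != "null"] == ["double"]

    print([e.path for e in drop_false_positives(schema, errors, eligible)])
```

Because the filter only drops errors whose owning schema passes the predicate, genuine mismatches such as string|null with unit still surface.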
