Skip to content

Temporal range key-bracket binding: foo[t1...t2], foo[...t2], foo[t1...] (grammar/AST/parse only) #74

Description

@jtnelson

Summary

I propose extending the key-bracket binding grammar so a key can bind to a
temporal range rather than a single instant. Today foo[t] binds a key
to the value(s) that existed at one point in time (parsed into a
TemporalKeySymbol wrapping a single TimestampSymbol, or folded onto a
NavigationKeyStop). This ticket adds three new forms:

  • foo[t1...t2] binds the key to a half-open interval [t1, t2)
    (start-inclusive, end-exclusive).
  • foo[...t2] is the open-start form, bounded only on the end side
    (exclusive).
  • foo[t1...] is the open-end form, bounded only on the start side
    (inclusive).

This is the grammar, AST, and parse side only. It defines the syntax,
the new AST node, and the canonical, re-parseable string form. The AST
records the two endpoints; it does not decide what a range read returns.
Server-side evaluation (resolving a range binding against stored revisions)
is a separate concourse change and is explicitly out of scope here, and the
precise read semantics described below under "Evaluation semantics are
deferred" are part of that change, not this one.

The motivating use case is a data-sync capability in cinchapi-server, a
load-balanced service whose instances scale up and down on Kubernetes.
Connector configuration (sources, credentials, schedule, and a lock field)
lives in a shared Concourse instance with no external coordinator: each
instance claims an available connection, records a lock, runs the sync, and
releases the lock. Because an instance can die mid-sync and never release
its lock, a lock must be treated as expired when it was last asserted longer
ago than a timeout window, and that staleness must be decided at query time
without mutating stored state. A range binding on the lock key is the
primitive that lets a caller scope a lock read to a freshness window in a
single key expression. Whether that window is interpreted as "values
present at some moment in the window" or "values whose write instant falls
in the window" is an evaluation-semantics decision that belongs to the
concourse change; see "Evaluation semantics are deferred" below.

Proposed API / Syntax

I propose three new bracket-content shapes, recognized only where the
existing single-timestamp bracket binding is already accepted. The
range separator is the literal token ... (three dots).

key[<start>...<end>]   closed range:  [start, end)
key[...<end>]          open start:    bounded on the end side (exclusive)
key[<start>...]        open end:      bounded on the start side (inclusive)

Precise parse-side semantics:

  • Inclusivity. The interval is half-open, start-inclusive and
    end-exclusive, matching Concourse's existing BETWEEN operator
    convention (verified in IndexRecord: the BETWEEN branch scans
    subSet(start, true, end, false) and guards historical matches with
    >= start and < end). An endpoint equal to start is included; an
    endpoint equal to end is excluded.
  • Endpoint values. Each endpoint is a timestamp in exactly the forms
    the single-timestamp bracket already accepts: a microsecond epoch, a
    date string, or a natural-language phrase. Each endpoint is parsed
    independently by NaturalLanguage.parseMicros
    (src/main/java/com/cinchapi/ccl/util/NaturalLanguage.java:78), which
    cannot parse a range string, so the two endpoints must be split on
    ... and parsed one at a time.
  • At least one endpoint required. key[...] with both endpoints
    omitted is rejected at parse time; a range must pin at least one side.
  • Ordering. When both endpoints are present and resolve to micros, I
    recommend the parser reject start > end at parse time with a
    SyntaxException (a backwards interval is always a user error and is
    cheap to catch). start == end denotes an empty interval under the
    half-open rule; I recommend accepting it at parse time and letting the
    evaluator return nothing, so the parser does not need to special-case
    it. Note that natural-language endpoints are resolved relative to the
    current instant, so the ordering check applies to the resolved micros at
    parse time.
  • At most one binding per key. A key carries at most one
    bracket-timestamp binding, whether single or range. foo[t1][t2...t3]
    and any other doubled bracket remain rejected, consistent with the
    current invariant.
  • Single-instant form unchanged. foo[t] continues to parse to a
    TemporalKeySymbol. The range forms parse to a new sibling node so the
    single-instant contract and its toString() are untouched.

New AST node (sibling to TemporalKeySymbol, e.g.
TemporalRangeKeySymbol):

  • It wraps a KeyTokenSymbol plus an optional start TimestampSymbol and
    an optional end TimestampSymbol, with the invariant that at least one
    is present.
  • It mirrors the existing TemporalKeySymbol constructor guards
    (src/main/java/com/cinchapi/ccl/grammar/TemporalKeySymbol.java:61-74):
    it rejects wrapping a NavigationKeySymbol (navigation timestamps live
    on stops) and rejects wrapping an already-temporal key.
  • isParameterized() returns true; stripParameters() and baseKey()
    delegate to the wrapped key, exactly as TemporalKeySymbol does
    (TemporalKeySymbol.java:108-122).
  • toString() emits a canonical, re-parseable string using the resolved
    micros of whichever endpoints are present: key[<m1>...<m2>],
    key[...<m2>], or key[<m1>...]. This canonical form is a cross-repo
    contract: concourse's Keys.parse must re-parse the exact same string
    (see the related concourse change in Scope and dependencies). The
    existing single form emits key[<micros>]
    (TemporalKeySymbol.java:104-106); the range form follows the same
    bracket-of-micros style.

No new programmatic builder API is required. Per the concourse-driver-java
Criteria contract, the binding is expressed inside the key string (for
example .key("lock[t1...]")) and parsed by CCL; the Criteria Javadoc
documents this where().key("name[t]") pattern directly. The
operation-level .at(Timestamp) builder
(concourse-driver-java/.../lang/Criteria.java:111) is an unrelated
point-in-time fallback and is not touched.

Evaluation semantics are deferred

This ticket records two endpoints on the AST; it does not decide what a
range read returns. There are two coherent interpretations, and choosing
between them is an evaluation-semantics decision that belongs to the
concourse change:

  • Present-during: the values that existed at some moment in the
    interval. This is the natural generalization of the single foo[t]
    form, which reads the state as of t (per CCL_REFERENCE.md section 8.1,
    a leaf bracket pins the leaf's evaluation timestamp). Under this
    interpretation a value written before start and never changed is still
    present throughout the interval and is in range.
  • Asserted-during: the values whose write instant falls in the
    interval. Under this interpretation a value written before start is
    not in range, even if it is still the current value.

These differ precisely for the lock-freshness case. A lock written once,
long before now - timeout, and never re-asserted is present-during the
freshness window but was not asserted-during it; freshness wants the
asserted-during answer. The parser does not need to take a position to
record the endpoints, and it must not, because evaluation is out of scope.
The concourse evaluation change is where this is pinned down and where the
endpoints described here are translated into a revision scan. I am calling
this out so the cross-repo follow-up does not silently inherit the wrong
interpretation.

Examples

Connector lock freshness (the motivating case). The caller scopes a lock
read to a freshness window in a single key expression; the exact
freshness rule is decided by the concourse evaluation change (see
"Evaluation semantics are deferred"):

lock[1718380800000000...] = "claimed"

Read commands with a range-bound key:

select lock[1718380800000000...] from 1
get status[...1718380800000000] from 1
find lock[1718380800000000...1718384400000000] = "claimed"

Endpoint value forms (each endpoint independently, as in the single form):

score["2024-01-01"..."2024-02-01"] >= 90
name["last week"...] = "Jeff"
status[..."yesterday"] = "active"

Canonical round-trip (what toString() re-emits, all-micros):

input:  status["2024-01-01"...]
output: status[1704067200000000...]

Implementation pointers

This is a planning ticket; the following is the recommended approach. The
existing single-timestamp binding lives in two parallel parse paths that
must change in lockstep
, exactly as the single form does.

1. Token: add an explicit ... separator. Add a RANGE_SEPARATOR
token for the literal ... rather than leaning on the NON_ALPHANUMERIC
catch-all (grammar.jjt:316). Hazard: . is shared across the
NON_ALPHANUMERIC catch-all, SIGNED_DECIMAL (grammar.jjt:302), and the
dotted key tokens, so an unquoted run such as 123...456 is lexically
ambiguous and must be disambiguated deliberately rather than left to the
default token competition. (Note that 123...456 does not itself match
SIGNED_DECIMAL, which requires exactly one period with digits on both
sides; the risk is mis-tokenization across the shared . rules, not a
single-token collision.) Guard the new production with LOOKAHEAD, and to
sidestep the lexer hazard entirely for a first cut I recommend requiring
numeric-micros or quoted endpoints around ... (the single form's tests
already exercise both numeric and quoted endpoints). Note that the
lexer-path #KEY_BRACKET fragment (grammar.jjt:310) already admits any
character except ], [, \n, \r inside brackets, so ... is already
lexically legal inside a lexer-captured bracket; the work there is in the
content parser, not the token.

2. Parser path (flat / leaf and scope-prefix keys). Key()
(grammar.jjt:543-573) calls BracketedTimestamp() (grammar.jjt:771-780),
which returns a single TimestampSymbol. Extend the bracket production (or
add a sibling, e.g. BracketedTimestampRange(), selected by LOOKAHEAD on
the presence of ...) to return either a single timestamp or a start/end
pair. In Key(), when the bracket yields a range and result is not a
NavigationKeySymbol, construct the new TemporalRangeKeySymbol instead of
TemporalKeySymbol (grammar.jjt:572). The SearchKey() path
(grammar.jjt:529-541) reaches flat search keys via the lexer-captured
bracket and does not call BracketedTimestamp(), so it is covered by the
lexer-path work below.

3. Lexer path (bracket content captured inside key tokens). For keys
captured as PERIOD_SEPARATED_STRING / NAVIGATION_SCOPE_OPEN /
ASTERISK_SUFFIXED_STRING, the bracket content is parsed by
NavigationKeyStop.parseBracketContent
(src/main/java/com/cinchapi/ccl/grammar/NavigationKeyStop.java:208-230),
which today returns one TimestampSymbol and splits on whitespace. Teach
it to detect the ... separator, split into endpoints, parse each via
NaturalLanguage.parseMicros, and carry the result.

4. AST node and per-stop carrier. Add TemporalRangeKeySymbol modeled
on TemporalKeySymbol (file cited above), including equals/hashCode
over the wrapped key plus both endpoints, and the parameter-aware hooks
(isParameterized, stripParameters, baseKey) defined on
KeyTokenSymbol
(src/main/java/com/cinchapi/ccl/grammar/KeyTokenSymbol.java:106-132).

5. Scope. I recommend implementing leaf keys first. The per-stop
navigation carrier (NavigationKeyStop, which currently holds a single
@Nullable TimestampSymbol at NavigationKeyStop.java:265-266, with
withTimestamp at line 323 and segment() at line 364) and the
scope-prefix forms multiply the work across NavigationKeyStop and
NavigationKeySymbol, so range support on navigation stops and scope
prefixes is a sensible follow-up rather than part of the first cut.

6. Command acceptance. Range bindings should be accepted on the same
commands that accept the single bracket binding and rejected on the same
ones that reject it, via the existing requireNotParameterized guard
(KeyTokenSymbol.java:34-57), which is already invoked from write-command and
range-history grammar actions (grammar.jjt:1059, 1081, 1099, 1124, 1136,
1595, 1612, 1696, 1710, 1780, and others). Because the new node returns
isParameterized() == true, writes and the history-range commands
(audit, chronicle, diff) reject it for free; those commands already
take a from ... to window, as noted in CCL_REFERENCE.md section 8.2. For
a first cut I recommend accepting the range binding wherever the single form
is accepted (read and find paths).

7. No grammar-visitor change. As with the single form, the binding
lives on the key symbol and the expression node merely wraps it, so the
compiler visitor needs no change.

8. Regenerate the parser. The grammar is JavaCC/JJTree at
grammar/grammar.jjt (not ANTLR). After editing the .jjt, regenerate the
checked-in sources under src/main/java/com/cinchapi/ccl/generated by
running ./javacc-parser-generator.sh (a self-contained jar in the repo
root). Edit the .jjt, never the generated files.

Performance

  • Parse cost. Parsing is unchanged in order of growth: one extra
    LOOKAHEAD to detect ... and at most two NaturalLanguage.parseMicros
    calls per bracket instead of one. There is no full scan and no nested
    iteration introduced at parse time.
  • No O(n^2) risk in the grammar. The LOOKAHEAD guard that
    disambiguates ... from SIGNED_DECIMAL and the dotted-key tokens must
    be bounded (fixed-token lookahead), not a syntactic lookahead over the
    whole bracket, so it does not turn parsing super-linear.
  • Evaluation is out of scope, but the AST must enable push-down. The
    node should expose start and end as resolved micros so the concourse
    evaluator can translate a range binding into an index-backed range scan
    over a key's revisions rather than materializing and filtering full
    history. Carrying the endpoints as plain micros (not as opaque phrases)
    is what keeps that push-down possible on large datasets. The parser
    resolving natural-language endpoints once, at parse time, also avoids
    repeated re-resolution downstream.

Testing

This repo writes tests but does not run them (a live server is required);
reproduction and behavior tests still belong in the ticket. Follow the
existing conventions: JUnit 4, and the mandatory Goal / Start state /
Workflow / Expected Javadoc block that the project guidelines require on
every @Test. BracketTimestampMatrixTest already follows this block on
every method and is the model for the new tests; the new tests must carry
the same block.

  • Matrix tests in
    src/test/java/com/cinchapi/ccl/BracketTimestampMatrixTest.java (the
    Bracket-timestamp syntax for key annotations #58 matrix). Add range rows alongside the existing flat cases:
    • closed foo[t1...t2], open-start foo[...t2], open-end foo[t1...]
      each parse to the new node with the expected resolved endpoints;
    • half-open boundary intent is documented in the Goal (an endpoint equal
      to start is in range, an endpoint equal to end is not), even though
      the parser only records endpoints;
    • foo[...] (both endpoints omitted) is rejected;
    • backwards foo[t2...t1] with t2 > t1 is rejected;
    • double binding foo[t1][t2...t3] is rejected, mirroring the existing
      testF7_DoubleBracketOnLeafRejected flat double-bracket case;
    • keyword-equivalence is unaffected (the at/on/during keyword is a
      single-instant concept and does not combine with ...);
    • round-trip: extend the assertRoundTrip helper coverage so each
      new form survives tokenize then toString then re-parse to an equal
      AST. This is the gate that proves the canonical toString() is
      re-parseable.
  • Node unit tests mirroring
    src/test/java/com/cinchapi/ccl/grammar/TemporalKeySymbolTest.java for the
    new TemporalRangeKeySymbol: null-key and null-endpoint rejection,
    rejection of a fully-open (both-null) range, rejection of wrapping a
    NavigationKeySymbol or an already-temporal key, accessor behavior,
    equals/hashCode, and the three toString() shapes.
  • Command tests in
    src/test/java/com/cinchapi/ccl/BracketTimestampCommandTest.java:
    range binding accepted on the read/find commands chosen for the first
    cut, and rejected on writes and on audit/chronicle/diff with the
    context-naming SyntaxException the existing assertCommandRejected
    helper checks for.
  • Docs. Update CCL_REFERENCE.md section 8.4 (bracket annotation, in
    detail) to document the three range forms and the half-open semantics,
    and confirm section 8.2 (where brackets are accepted) still describes the
    accept/reject split correctly for the range forms.

Scope and dependencies

In scope (first cut):

  • The ... range separator token and the three range forms on leaf
    keys, in both the parser path and the lexer path.
  • The new TemporalRangeKeySymbol AST node with a canonical,
    re-parseable toString().
  • Parse-time validation: at least one endpoint, reject backwards ranges,
    reject double bindings.
  • Command acceptance via the existing requireNotParameterized guard, and
    the matrix, node, command, and reference-doc updates above.

Out of scope:

  • Range bindings on navigation stops (a[t1...t2].b) and on scope
    prefixes
    (a[t1...].(...)). These multiply the work across
    NavigationKeyStop and NavigationKeySymbol and are a natural
    follow-up.
  • Server-side evaluation of a range binding, including the choice
    between the present-during and asserted-during interpretations
    documented above. That is a separate concourse change.

Cross-repo dependency. Concourse pins ccl 4.0.0
(concourse-driver-java/build.gradle); the local ccl base version is 4.1.0
(.version). This grammar change must ship in a ccl release, after which
concourse bumps its ccl dependency before the matching concourse
range-evaluation change can land. That concourse change is where Keys.parse
must be taught to re-parse the canonical range toString() produced here,
making this node's canonical string form a cross-repo contract. I am
naming the related repo and feature in prose deliberately and am not
inventing an issue number for it.

Related

Part of the connector data-sync locking initiative (cinchapi-server).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions