Skip to content

fix feature property order to follow the provider schema#538

Merged
cportele merged 9 commits into
masterfrom
fix-feature-property-order
Jun 26, 2026
Merged

fix feature property order to follow the provider schema#538
cportele merged 9 commits into
masterfrom
fix-feature-property-order

Conversation

@cportele

Copy link
Copy Markdown
Contributor

NOTE: This is a change in the core of the feature pipeline and needs broad testing before the release of ldproxy 4.8.

Properties backed by a joined table (objects, object arrays, value arrays, feature references) were emitted after the main-table columns even when declared before them, because the per-feature event buffer placed each property where its tokens first arrive - the provider's per-table order, not the schema order. For XML/GML this produced output that did not match the application schema's element order.

FeatureEventBuffer now re-sorts each buffered feature at flush into the declared schema order: object (and feature-root) children are ordered by their schema position; array elements keep their data order, and the children inside each element are ordered like any other object. The pass runs after the slice transformers, so transform behaviour is unchanged.

Object and array element children are consequently emitted in schema order too (previously source/arrival order); the affected token fixtures are updated. Adds FeatureEventBufferOrderSpec and re-enables the previously disabled "joined value array between main columns" mapping case.

Properties backed by a joined table (objects, object arrays, value arrays, feature references) were emitted after the main-table columns even when declared before them, because the per-feature event buffer placed each property where its tokens first arrive - the provider's per-table order, not the schema order. For XML/GML this produced output that did not match the application schema's element order.

FeatureEventBuffer now re-sorts each buffered feature at flush into the declared schema order: object (and feature-root) children are ordered by their schema position; array elements keep their data order, and the children inside each element are ordered like any other object. The pass runs after the slice transformers, so transform behaviour is unchanged.

Object and array element children are consequently emitted in schema order too (previously source/arrival order); the affected token fixtures are updated. Adds FeatureEventBufferOrderSpec and re-enables the previously disabled "joined value array between main columns" mapping case.
@cportele cportele requested a review from azahnen as a code owner June 18, 2026 18:09
@azahnen

azahnen commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

@cportele
I am hesitating to approve this since it seems to me we would be adding a second sorting logic on top of the (dysfunctional) existing sorting logic. I think we should either fix the existing logic or remove/tighten it and then introduce the new one.

I tried to remember and understand how the existing logic is supposed to work. The basic idea is that sorting happens as part of the buffer inserts using cursors:

  • FeatureEventBuffer has cursors current and currentEnclosing
  • SchemaMapping contains desired property positions (computed in SchemaMappingBase)
  • onObjectStart, onValue etc. in FeatureTokenTransformerMappings call FeatureEventBuffer.next with property positions
  • this moves the cursors to the desired position
  • the following call to FeatureEventBuffer.append inserts the tokens at cursor position
  • FeatureEventBuffer holds tokens in desired order at any time

I did not dig any deeper yet to find the underlying issue. Questions:

  1. Since the new logic uses mapping.getPositionsForTargetPath(path) as well, I assume that part is working as expected? In my memory that was the most tricky and unstable part.
  2. Do you see any inherent flaw in the design of the existing logic, or are we just looking for an implementation error?
  3. I do not see any dependency on insert order regarding joined tables etc in the design idea. If there is such an unwanted dependency, it would have to come from getPositionsForTargetPath, right?
  4. I am not opposed to using the new logic if it works better and is easier to understand. Do you see any performance hits?
  5. That being said, we cannot remove the cursor logic since it is used by slice transformers as well, right? And most likely it is also still needed on inserts for nesting?

@cportele

Copy link
Copy Markdown
Contributor Author

@azahnen - I agree that the right approach is not to put new logic on top of the current, incomplete logic. I just had a hard time to understand the current logic (and its limitations). I will take a deeper look at your questions and come back with answers.

cportele added 2 commits June 23, 2026 14:16
The per-feature event buffer ordered properties in two places: an
incremental, cursor-driven placement in append, and a flush-time pass
that re-sorted the feature into schema order. The cursor placement only
got top-level properties right - a property nested in another object and
backed by a joined table was left in production order - so the second
pass was layered on top, leaving two overlapping ordering mechanisms.

Keep a single mechanism. append now stores tokens in the order the
provider produces them, and orderedBySchema is the only pass that applies
schema order. It runs in place, lazily and once per feature: before the
in-buffer slice transformers read the buffer, and at the latest at flush
when no transformer ran.

Ordering before the slice transformers is required, not incidental: a
property produced from several tables arrives as several fragments, and a
transformer reads a property as a contiguous buffer range. Only once the
feature is in schema order are a property's fragments contiguous; without
it, an unrelated property emitted between two fragments is swallowed into
the slice (e.g. a separate array nested inside a concatenated object
array). The transformers rewrite within a property and preserve schema
order, so the pass is not repeated after they run.

The position-addressable slice index that getSlice/replaceSlice need is
built once per feature by a single pass over the buffered tokens, only
when a slice is first accessed, and then updated in place as slices are
rewritten - because the buffer is in schema-position order, a slice that
changes size simply shifts every later position by the same amount, so
the index is not rebuilt per rewrite.

The schema-order pass also coalesces the per-table fragments of a
single-valued object into one object: a provider produces an object
backed by more than one table as several OBJECT[path]..OBJECT_END[path]
blocks at the same position, which the previous per-position accrual
merged implicitly. Object-array elements are wrapped in an array and are
never affected.

Remove the now-unused cursor plumbing (next, the current/currentEnclosing
state, the increase/propagate accounting) and the dead source-path
reorder transformer FeatureTokenTransformerSorting, which was only
referenced from a commented-out, empty getDecoderTransformers override.

Regression specs cover a nested joined object declared before its scalar
siblings, an object produced as two per-table fragments coalescing into
one, and a property whose fragments are split around an unrelated property
keeping that property out of its slice.
@cportele

Copy link
Copy Markdown
Contributor Author

@azahnen

Short version: append no longer sorts — it stores tokens in the provider's order. A single pass (orderedBySchema) applies schema order and is the only thing that orders.

Please have a look, whether the latest commit addresses your questions and concerns.

I tested it also with configurations that use embed, concat and various nested objects/arrays.

Your questions:

  1. Yes — getPositionsForTargetPath works as expected. Positions come from a declaration-order walk of the target schema, no insert-order input. It's correct and stable; the old cursor placement and the new pass read the same map.
  2. The position math was fine; the old append tried to maintain order incrementally and only got top-level properties right — a property nested in another object and backed by a joined table was left in production order, which is why initially a second pass got layered on top.
  3. There is no insert-order dependency. It was entirely in append's incremental slot accounting, not in the position lookup.
  4. Performance: The schema-order pass is O(n) per feature (build tree, sort + coalesce each object's children, flatten), ~30M tokens/s once warm — about 6 µs for a typical feature, ~0.3 ms for a large one (a 500-element object array), low-ms only for pathological features. It runs once per feature. The slice index that getSlice/replaceSlice need is built once per feature (only if a transformer reads a slice) and then updated in place as slices are rewritten — so a feature with many transformed properties (e.g. several concatenated arrays) pays the index cost once, not once per property. append is now cheaper than before (plain append vs. position-indexed insert + offset propagation). All off the fetch/encode/transfer path.
  5. The cursor logic can be and was removed. getSlice/replaceSlice stay (the slice transformers depend on them), but the cursor placement, the next calls, and the current/currentEnclosing state are gone. The index is derived from the buffered tokens instead of maintained during the (now order-agnostic) append.

Why one pass rather than fixing the incremental sort in place: the incremental accounting conflated two jobs — slice bookkeeping for the transformers and emission order — and the propagation that made the first work is exactly what mis-ordered nested joined-table properties in the second. Attempts to fix it in append scrambled object-array grouping. A single pass over the completed feature is simpler and safer: build the token tree, order each object's children by schema position.

When the pass runs: it can't only run at flush. A property produced from several tables arrives as several fragments, and a slice transformer (concat, etc.) reads a property as a contiguous buffer range. The fragments are only contiguous once the feature is in schema order — so the pass has to run before the slice transformers read the buffer, otherwise a transformer's slice can swallow an unrelated property the provider emitted between two fragments. So ensureOrdered() applies the pass in place, lazily, exactly once per feature: before the transformers run, and at the latest at flush when none did. The old append-sort happened to provide this contiguity implicitly; the single pass now provides it explicitly.

Net: one ordering mechanism. The slice machinery is tightened to just slice-indexing, a dead source-path reorder transformer is removed, and regression specs cover the cases that surfaced — a nested joined object declared before its scalar siblings; a single-valued object split into per-table fragments coalescing into one; and a property whose fragments are split around an unrelated property, keeping that property out of its slice.

azahnen and others added 3 commits June 25, 2026 08:22
…lice

getSlice threw IndexOutOfBoundsException (fromIndex < 0) on features
where an in-buffer slice transformer shrinks a property's slice and a
later property is absent from the feature.

computeIndex fills the slice index from scratch and recorded a span only
for positions that have tokens, leaving every absent property at start 0.
replaceSlice shrinking a present slice shifts the start of every later
position by the negative size delta, driving an absent position after it
below zero; the next getSlice then called buffer.subList with a negative
fromIndex. The previous incremental index maintenance kept a valid offset
for every position, including empty ones; the single-pass rebuild dropped
that.

After computing spans, a forward scan now stamps every empty position
with a valid buffer offset (the boundary between its occupied
neighbours). The buffer is in schema-position order at this point, so the
offsets are monotonic and non-negative, and only top-level enclosing
positions carry a non-zero length.
azahnen
azahnen previously approved these changes Jun 26, 2026

@azahnen azahnen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good besides the one little nitpick.

cportele added 2 commits June 26, 2026 11:30
After the schema-order rework, the per-token position in
FeatureTokenTransformerMappings was used only by the `if (pos > -1)`
guard, which filtered exactly one thing: the path-less feature-root
object. A path-less marker resolves to the root schema (an object) but
has no schema position, so pos() returns -1 while schema() is present;
for every other token pos and schema agree.

Drop pos from all handlers. The object handlers now exclude the root
explicitly with `!schema.isFeature()` (isObject and empty parent path) -
self-documenting and provably the same set as pos > -1, since the only
token with pos == -1 and a present schema is that root. The array, value
and geometry handlers need no such check: the object root fails their
type checks already.
@cportele cportele merged commit 0b5b52a into master Jun 26, 2026
3 checks passed
@cportele cportele deleted the fix-feature-property-order branch June 26, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants