You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
#403 made tableau's proto regeneration wire-compatible by:
Preserving stable field numbers for surviving fields.
Emitting reserved <numbers>; for deleted fields and pre-existing gaps.
This protects binary (binpb) compatibility. However, protobuf has a second golden rule that sits next to "never reuse a deleted field
number":
Never reuse a deleted field name either, otherwise text-based
encodings (JSON / text_format / txtpb) can silently mis-decode
old payloads against a new schema.
protobuf supports this via:
reserved "old_field", "another_old_field";
Today tableau does not emit reserved <names>; on regeneration, so
field name reuse is still possible. This issue is to discuss whether
we should close that gap.
Proposal
When a field is deleted or renamed during regeneration, in addition to
reserving its number, also reserve its old name:
Text-format compatibility (JSON / txtpb / text_format). Without
reserved names, a future field that happens to reuse a deleted name
with a different type will silently parse old JSON/txtpb payloads
into the wrong field, producing data corruption that is
hard to detect (no wire-format error, no compile error).
Aligns with protobuf's official guidance. The Google protobuf
style guide explicitly recommends reserving both number and name on
field deletion.
Catches accidental rename-as-rewrite. If a user renames a field
in the workbook expecting it to be a "logical rename" but actually
changes the type, a reserved name will turn it into a hard
generation/build error instead of a silent runtime bug.
Cons
Forces awkward renames on legitimate type changes. A user may
legitimately want to change a field's type while keeping the name.
Example: a BeginTime column was originally typed as int64 because
the user didn't know tableau has a built-in datetime. Switching to datetime while keeping the old name is a reasonable evolution, but
with reserved names the user is forced into BeginTimeV2 / BeginTimeNew, hurting readability and producing version-suffix
noise across the codebase.
No "rename" primitive in tableau today. Reserving names assumes
that "field gone from sheet" == "field deleted forever". In practice
workbook edits often mix delete + add of the same logical concept.
Without an explicit rename / type-change primitive, reserved names
make natural editing flows fail loudly.
Migration cost for existing projects. Older generated .proto
files have no reserved-name history. Auto-reserving on the next
regeneration may declare a large batch of names as reserved at once,
surprising users.
Harder to opt out per case. Reserved numbers are abstract (just
ints) and rarely conflict with intent. Reserved names directly
collide with human-meaningful identifiers, so escape hatches will be
requested more often.
Other angles worth discussing
Opt-in vs opt-out. Should this be governed by a new option
(e.g. --preserve-field-names, mirroring the existing PreserveFieldNumbers)? Default on or off?
Explicit rename annotation. Could we add a workbook-level marker
(e.g. a sheet-meta directive rename: BeginTime -> BeginTimeAt) so
that legitimate renames bypass the reserved-name rule, while
unannotated removals still get the protection?
Per-message escape hatch. Allow a // tableau:allow-name-reuse
comment on a message to opt out, for users who do not consume
JSON / txtpb at all and only care about wire format.
Scope. Do we want the same treatment for enum value names
(which have the same JSON-name reuse hazard) or limit this to
message fields for now?
Background
#403 made tableau's proto regeneration wire-compatible by:
reserved <numbers>;for deleted fields and pre-existing gaps.This protects binary (
binpb) compatibility. However, protobuf has asecond golden rule that sits next to "never reuse a deleted field
number":
protobuf supports this via:
Today tableau does not emit
reserved <names>;on regeneration, sofield name reuse is still possible. This issue is to discuss whether
we should close that gap.
Proposal
When a field is deleted or renamed during regeneration, in addition to
reserving its number, also reserve its old name:
Pros
reserved names, a future field that happens to reuse a deleted name
with a different type will silently parse old JSON/txtpb payloads
into the wrong field, producing data corruption that is
hard to detect (no wire-format error, no compile error).
reservedstatements on regeneration #403 already protects numbers; namesare the other half of the same invariant. Doing only one half leaves
a footgun.
style guide explicitly recommends reserving both number and name on
field deletion.
in the workbook expecting it to be a "logical rename" but actually
changes the type, a reserved name will turn it into a hard
generation/build error instead of a silent runtime bug.
Cons
legitimately want to change a field's type while keeping the name.
Example: a
BeginTimecolumn was originally typed asint64becausethe user didn't know tableau has a built-in
datetime. Switching todatetimewhile keeping the old name is a reasonable evolution, butwith reserved names the user is forced into
BeginTimeV2/BeginTimeNew, hurting readability and producing version-suffixnoise across the codebase.
that "field gone from sheet" == "field deleted forever". In practice
workbook edits often mix delete + add of the same logical concept.
Without an explicit rename / type-change primitive, reserved names
make natural editing flows fail loudly.
.protofiles have no reserved-name history. Auto-reserving on the next
regeneration may declare a large batch of names as reserved at once,
surprising users.
ints) and rarely conflict with intent. Reserved names directly
collide with human-meaningful identifiers, so escape hatches will be
requested more often.
Other angles worth discussing
(e.g.
--preserve-field-names, mirroring the existingPreserveFieldNumbers)? Default on or off?(e.g. a sheet-meta directive
rename: BeginTime -> BeginTimeAt) sothat legitimate renames bypass the reserved-name rule, while
unannotated removals still get the protection?
// tableau:allow-name-reusecomment on a message to opt out, for users who do not consume
JSON / txtpb at all and only care about wire format.
enumvalue names(which have the same JSON-name reuse hazard) or limit this to
message fields for now?
reservedstatements on regeneration #403's auto-reserve of historical gaps. Fornumbers we can synthesize reservations from gaps; for names we
cannot recover deleted names from an old
.protoif previousversions never wrote them. Is "best-effort, only reserve names we
observe being deleted from now on" acceptable?
detector, codegen for other languages) that we should sanity-check
before turning this on.
Decision needed
regeneration guarantees?
Discussion welcome. cc @wenchy