Skip to content

protogen: should we also reserve field names on regeneration? #408

@Kybxd

Description

@Kybxd

Background

#403 made tableau's proto regeneration wire-compatible by:

  • Preserving stable field numbers for surviving fields.
  • Emitting reserved <numbers>; for deleted fields and pre-existing gaps.

This protects binary (binpb) compatibility. However, protobuf has a
second golden rule that sits next to "never reuse a deleted field
number":

Never reuse a deleted field name either, otherwise text-based
encodings (JSON / text_format / txtpb) can silently mis-decode
old payloads against a new schema.

protobuf supports this via:

reserved "old_field", "another_old_field";

Today tableau does not emit reserved <names>; on regeneration, so
field name reuse is still possible. This issue is to discuss whether
we should close that gap.

Proposal

When a field is deleted or renamed during regeneration, in addition to
reserving its number, also reserve its old name:

message Item {
  int64 id = 1;
  reserved 2;            // already done by #403
  reserved "begin_time"; // proposed
}

Pros

  • Text-format compatibility (JSON / txtpb / text_format). Without
    reserved names, a future field that happens to reuse a deleted name
    with a different type will silently parse old JSON/txtpb payloads
    into the wrong field, producing data corruption that is
    hard to detect (no wire-format error, no compile error).
  • Symmetry & defense in depth. feat(protogen): preserve stable field numbers and auto-emit reserved statements on regeneration #403 already protects numbers; names
    are the other half of the same invariant. Doing only one half leaves
    a footgun.
  • Aligns with protobuf's official guidance. The Google protobuf
    style guide explicitly recommends reserving both number and name on
    field deletion.
  • Catches accidental rename-as-rewrite. If a user renames a field
    in the workbook expecting it to be a "logical rename" but actually
    changes the type, a reserved name will turn it into a hard
    generation/build error instead of a silent runtime bug.

Cons

  • Forces awkward renames on legitimate type changes. A user may
    legitimately want to change a field's type while keeping the name.
    Example: a BeginTime column was originally typed as int64 because
    the user didn't know tableau has a built-in datetime. Switching to
    datetime while keeping the old name is a reasonable evolution, but
    with reserved names the user is forced into BeginTimeV2 /
    BeginTimeNew, hurting readability and producing version-suffix
    noise across the codebase.
  • No "rename" primitive in tableau today. Reserving names assumes
    that "field gone from sheet" == "field deleted forever". In practice
    workbook edits often mix delete + add of the same logical concept.
    Without an explicit rename / type-change primitive, reserved names
    make natural editing flows fail loudly.
  • Migration cost for existing projects. Older generated .proto
    files have no reserved-name history. Auto-reserving on the next
    regeneration may declare a large batch of names as reserved at once,
    surprising users.
  • Harder to opt out per case. Reserved numbers are abstract (just
    ints) and rarely conflict with intent. Reserved names directly
    collide with human-meaningful identifiers, so escape hatches will be
    requested more often.

Other angles worth discussing

  • Opt-in vs opt-out. Should this be governed by a new option
    (e.g. --preserve-field-names, mirroring the existing
    PreserveFieldNumbers)? Default on or off?
  • Explicit rename annotation. Could we add a workbook-level marker
    (e.g. a sheet-meta directive rename: BeginTime -> BeginTimeAt) so
    that legitimate renames bypass the reserved-name rule, while
    unannotated removals still get the protection?
  • Per-message escape hatch. Allow a // tableau:allow-name-reuse
    comment on a message to opt out, for users who do not consume
    JSON / txtpb at all and only care about wire format.
  • Scope. Do we want the same treatment for enum value names
    (which have the same JSON-name reuse hazard) or limit this to
    message fields for now?
  • Interaction with feat(protogen): preserve stable field numbers and auto-emit reserved statements on regeneration #403's auto-reserve of historical gaps. For
    numbers we can synthesize reservations from gaps; for names we
    cannot recover deleted names from an old .proto if previous
    versions never wrote them. Is "best-effort, only reserve names we
    observe being deleted from now on" acceptable?
  • Tooling impact. Anything downstream (BSR breaking-change
    detector, codegen for other languages) that we should sanity-check
    before turning this on.

Decision needed

  1. Do we agree text-format compatibility is in scope for tableau's
    regeneration guarantees?
  2. If yes, opt-in flag or default-on?
  3. Do we need an explicit rename primitive before this can ship safely?

Discussion welcome. cc @wenchy

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions