feat: populate Glue table description from HMS comment parameter #139

Merged
jamespfaulkner merged 5 commits into main from feat/gluesync-table-description-from-comment
Jul 1, 2026
@jamespfaulkner

Contributor

Summary

  • Populates the Glue TableInput.description field from the HMS comment TBLPROPERTY during sync, making table documentation visible in the AWS Glue Data Catalog UI
  • Consistent with existing behaviour for databases, which already map Database.description to DatabaseInput.description
  • Extends GlueMetadataStringCleaner.cleanTable() to sanitise the description field so the ValidationException retry path cannot fail on a non-Unicode table comment
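
For readers unfamiliar with the mapping, here is a minimal sketch of the idea, not the code from this PR. `TableInputSketch` and `CommentToDescription` are hypothetical stand-ins (the real change works with the AWS SDK `TableInput` inside the transformer), and the `comment` key name is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two Glue TableInput fields discussed here.
// This is NOT the AWS SDK class.
final class TableInputSketch {
    final String description;
    final Map<String, String> parameters;

    TableInputSketch(String description, Map<String, String> parameters) {
        this.description = description;
        this.parameters = parameters;
    }
}

// Hypothetical helper illustrating the comment -> description mapping.
final class CommentToDescription {
    // Key name assumed from the PR text (the HMS "comment" TBLPROPERTY).
    static final String COMMENT_KEY = "comment";

    static TableInputSketch transform(Map<String, String> hmsParameters) {
        // A missing parameter map or a missing key both yield a null description.
        String description = hmsParameters == null ? null : hmsParameters.get(COMMENT_KEY);
        // The parameters are copied through unchanged, so the comment ends up
        // in both the description and parameters["comment"].
        Map<String, String> copied =
                hmsParameters == null ? new HashMap<String, String>() : new HashMap<>(hmsParameters);
        return new TableInputSketch(description, copied);
    }
}
```

The sketch leaves the `comment` entry in the parameters rather than removing it, which matches the duplication this PR describes as harmless.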

Notes

The comment value will appear in both description and parameters["comment"] on the Glue table — Glue treats these as separate fields and the redundancy is harmless.

Test plan

  • HiveToGlueTransformerTest — table with comment param maps to TableInput.description; table without comment param produces null description
  • GlueMetadataStringCleanerTest — non-Unicode chars stripped from description in retry path; null description handled safely
  • Full module test suite passes (mvn test -pl hive-event-listeners/apiary-gluesync-listener)

🤖 Generated with Claude Code

TableInput.description was never set during HMS→Glue sync, leaving table
documentation invisible in the Glue Data Catalog UI. Now reads the
table-level 'comment' TBLPROPERTY and maps it to the Glue description
field, consistent with how database descriptions are already handled.

Also extends GlueMetadataStringCleaner.cleanTable() to sanitise the
description field so the ValidationException retry path is safe if the
comment contains non-Unicode characters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jamespfaulkner jamespfaulkner requested a review from a team as a code owner July 1, 2026 10:45
jamespfaulkner and others added 4 commits July 1, 2026 12:22
Extend GlueMetadataStringCleaner to also clean the TableInput description
field — removing non-Unicode characters and truncating to the Glue API
limit of 2048 chars — so the ValidationException retry path is safe if a
table comment contains invalid characters. Extracts cleanDescription() as
a private method consistent with cleanColumns(). Also adds a test covering
tables with parameters but no comment key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
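
A rough sketch of what a description-cleaning step like the one in this commit might look like. The class name, constant name, and the exact set of permitted characters are assumptions for illustration; only the 2048-character limit and the strip-then-truncate behaviour come from the commit message. As a simplification, this version also drops supplementary (surrogate-pair) characters, which the real cleaner may treat differently.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the GlueMetadataStringCleaner from this repository.
final class DescriptionCleanerSketch {
    // Limit quoted in the commit message above.
    static final int MAX_DESCRIPTION_LENGTH = 2048;

    // Assumed allow-list: tab, LF, CR, and the BMP ranges U+0020..U+D7FF and
    // U+E000..U+FFFD. Anything else (control chars, surrogates) is removed.
    private static final Pattern DISALLOWED =
            Pattern.compile("[^\\t\\n\\r\\u0020-\\uD7FF\\uE000-\\uFFFD]");

    static String cleanDescription(String description) {
        if (description == null) {
            return null; // a table without a comment stays without a description
        }
        String cleaned = DISALLOWED.matcher(description).replaceAll("");
        // Truncate after stripping so removed characters do not count toward the limit.
        return cleaned.length() > MAX_DESCRIPTION_LENGTH
                ? cleaned.substring(0, MAX_DESCRIPTION_LENGTH)
                : cleaned;
    }
}
```

Stripping before truncating means a comment that is over the limit only because of invalid characters is not cut short unnecessarily.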
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tants for limits

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
String.repeat() requires Java 11+; GHA runs Java 8. Also fixes the
pre-existing bug in generateCharString() which ignored its length param.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
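
For context, one Java 8 compatible way to build a fixed-length test string without `String.repeat()` is shown below. This is a sketch only: the real `generateCharString()` signature is not shown in this PR, so the parameters and the `TestStrings` class name here are assumptions.

```java
import java.util.Arrays;

// Hypothetical test helper; the signature is assumed, not taken from the repository.
final class TestStrings {
    // Returns a string made of `length` copies of `c`, honouring the length argument.
    static String generateCharString(char c, int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be non-negative: " + length);
        }
        char[] chars = new char[length];
        Arrays.fill(chars, c); // available since Java 1.2, so safe on Java 8
        return new String(chars);
    }
}
```

A helper like this makes it straightforward to build an input one character over a limit, for example `TestStrings.generateCharString('a', 2049)`, when testing truncation.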
@jamespfaulkner jamespfaulkner merged commit 2679fd6 into main Jul 1, 2026
1 check passed
@jamespfaulkner jamespfaulkner deleted the feat/gluesync-table-description-from-comment branch July 1, 2026 13:06

2 participants