feat(timeline-ui): [HUDI-9315] Add Hudi Timeline UI #13147
@voonhous Nice feature, do you think it deserves an RFC then?
Sure, will create a new RFC for this. Note: Calling out that this feature is not to be confused with RFC-05:
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Style & Readability Review — Code is generally clean and readable. One naming convention suggestion: the variable suffix "Obj" is non-standard in Java.
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for contributing! The timeline UI is a nice addition for debugging and understanding Hudi timelines. A few functional issues to address: the hardcoded InstantComparatorV1 in getInstantDetails could cause lookup failures on V2-layout tables, the broad catch (Exception) silently swallows errors returning null with HTTP 200, and Integer.parseInt on the user-provided limit param can throw uncaught exceptions. See inline comments for details.
This is pretty amazing @voonhous 👏
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
The new change in this update is a single pom.xml addition that excludes **/resources/public/lib/** from the Apache RAT license-header check, with a comment noting that the bundled third-party UI libraries (Bootstrap, renderjson, vis-timeline) ship with their own licenses. The change looks appropriate and well-scoped. The prior inline findings (hardcoded InstantComparatorV1 in getInstantDetails, broad catch (Exception) returning null, unsafe Integer.parseInt on the limit param, default metadata config flip in buildFileSystemViewManager, ungated v2 API endpoints, and per-request HoodieTableMetaClient construction) do not appear to have been addressed in this update — please take another look at those when you get a chance. Please take a look at any inline comments, and this should be ready for a Hudi committer or PMC member to take it from here.
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
CodeRabbit Walkthrough: This PR introduces a web-based timeline explorer UI for the Hudi Timeline Service. It adds v2 API endpoints to expose timeline and instant data as DTOs, implements backend handlers to serve timeline/config/schema information, and provides a complete frontend with Bootstrap/vis-timeline for interactive timeline visualization, filtering, and instant detail inspection.
Sequence Diagram (CodeRabbit):
sequenceDiagram
participant Browser as Browser / UI
participant TimelineService as Timeline Service
participant FileSystemView as FileSystem View<br/>(Metadata Cache)
participant HoodieTimeline as HoodieTimeline
Browser->>TimelineService: GET /v2/hoodie/view/timeline/instants
TimelineService->>FileSystemView: getTimeline()
FileSystemView->>HoodieTimeline: getInstantsAsStream()
HoodieTimeline-->>FileSystemView: stream of HoodieInstant
FileSystemView-->>TimelineService: HoodieTimeline
TimelineService->>TimelineService: TimelineDTO.fromTimeline()
TimelineService-->>Browser: TimelineDTO (JSON)
Browser->>Browser: vis.Timeline render<br/>with instants
Browser->>TimelineService: GET /v2/hoodie/view/instants/details?instant=...&action=COMMIT
TimelineService->>FileSystemView: getTimeline()
FileSystemView->>HoodieTimeline: readCommitted() or readPending()
HoodieTimeline-->>FileSystemView: SpecificRecordBase (metadata/plan)
FileSystemView-->>TimelineService: Avro record
TimelineService->>TimelineService: JsonUtils.convert to Map
TimelineService-->>Browser: Detail JSON
Browser->>Browser: Render detail card<br/>+ compute clean range<br/>if KEEP_LATEST_COMMITS
CodeRabbit: hudi-agent#24 (review)
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for working on this! This PR adds a Timeline UI feature and supporting v2 endpoints (timeline, instant details, table config, schema history) to the timeline service. Prior review rounds have already flagged the key issues (parameter validation, error handling, script-fallback race, license exclusion). No new critical correctness issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review. A few naming and consistency suggestions below — two hard-coded URL strings that should be constants, a fully-qualified return type caused by a class name collision, and a minor variable naming nit.
cc @yihua
Add a read-only Timeline UI, served by hudi-timeline-service, for visualizing a Hudi table's active timeline.

Backend:
- New v2 endpoints (timeline, instant details, table config, schema history) in TimelineHandler, registered via registerTimelineV2API() and gated behind the --enable-ui flag.
- Per-basepath HoodieTableMetaClient caching; layout-aware InstantGenerator when reading instant details; request-parameter validation (400 for missing or malformed params, bounded schema-history limit) and proper error responses instead of returning null.

Frontend:
- Static HTML/JS/CSS (vis-timeline, Bootstrap, renderjson) served from bundled assets only, with no external/CDN calls.

Integration:
- Embedded-mode startup and a Spark UI "Hudi Timeline" tab.
- Exclude bundled UI libraries from the apache-rat check.
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for iterating on this! At this point the timeline UI plumbing — v2 DTOs, the gated /v2 routes, the new /ui static handler, instant-detail/table-config/schema-history endpoints, and the metaClient cache — looks consistent with how it's exercised by the bundled timeline.js. I traced the timeline source (viewManager.getFileSystemView(basePath).getTimeline() resolves to visibleActiveTimeline from IncrementalTimelineSyncFileSystemView, which is filterCompletedAndCompactionInstants() — so completed clean/rollback/restore/savepoint actions plus all compaction states are visible to the UI as expected), the completion-time lookup path in getInstantFileName, the ConcurrentHashMap.computeIfAbsent semantics for the metaClient cache, and the Thread.currentThread().join() shutdown pattern, and didn't surface anything new beyond what prior rounds already flagged and the author addressed. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A few naming and simplification suggestions below.
cc @yihua
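For readers following the computeIfAbsent point above, here is a minimal sketch of a per-basepath cache. It is not the PR's code: PerKeyCache, its loader function, and the generic value type V (standing in for an expensive object such as a HoodieTableMetaClient) are hypothetical names chosen for illustration. It only shows the documented ConcurrentHashMap behavior that the mapping function is applied at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical per-key cache, not the PR's implementation.
 * V stands in for an expensive-to-build object keyed by table base path.
 */
final class PerKeyCache<V> {
  private final ConcurrentMap<String, V> cache = new ConcurrentHashMap<>();
  private final Function<String, V> loader;

  PerKeyCache(Function<String, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value for basePath, building it on first access. */
  V get(String basePath) {
    // ConcurrentHashMap.computeIfAbsent is atomic per key: concurrent callers
    // for the same basePath share one loader invocation and one cached value.
    return cache.computeIfAbsent(basePath, loader);
  }

  /** Number of distinct base paths currently cached. */
  int size() {
    return cache.size();
  }
}
```

Two caveats apply to any cache of this shape: the loader should stay short and must not touch the same map, and entries are never evicted here, so a long-running service that sees many base paths would need an eviction policy on top.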
}

public static HoodieInstant toInstant(InstantDTO dto, InstantGenerator factory) {
  if (null == dto) {
🤖 nit: could you rename factory to instantGenerator here? The same type is referred to as instantGenerator in TimelineDTOV2.toTimeline and in TimelineHandler, so calling it factory in this one spot breaks the naming convention and makes readers wonder whether it's a different object.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
app.get(RemoteHoodieTableFileSystemView.SCHEMA_HISTORY_V2_URL, new ViewHandler(ctx -> {
  metricsRegistry.add("SCHEMA_HISTORY", 1);
  int limit;
🤖 nit: the limit parsing and validation is done inline here, while every other query param in this class is extracted to a private helper (getInstantParam, getBasePathParam, etc.). It might be worth pulling this out to something like getLimitParam(Context ctx) to keep the pattern consistent and make the lambda body easier to scan.
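To make the suggestion concrete, the following is a rough sketch of what an extracted limit helper could look like. It is an assumption-laden illustration rather than the PR's code: LimitParam, parse, defaultLimit, and maxLimit are hypothetical names, and the helper takes the raw string instead of a Javalin Context so that it stays self-contained. A handler could translate the IllegalArgumentException into an HTTP 400 response.

```java
/**
 * Hypothetical helper for a bounded "limit" query parameter.
 * Not taken from the PR; names and bounds are illustrative.
 */
final class LimitParam {
  private LimitParam() {
  }

  /**
   * Returns defaultLimit when the parameter is absent or blank.
   * Throws IllegalArgumentException when the value is not an integer
   * or falls outside the range [1, maxLimit].
   */
  static int parse(String raw, int defaultLimit, int maxLimit) {
    if (raw == null || raw.trim().isEmpty()) {
      return defaultLimit;
    }
    final int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      // Surface malformed input as a client error instead of letting it escape.
      throw new IllegalArgumentException("limit must be an integer, got: " + raw, e);
    }
    if (value < 1 || value > maxLimit) {
      throw new IllegalArgumentException(
          "limit must be between 1 and " + maxLimit + ", got: " + value);
    }
    return value;
  }
}
```

Keeping the parsing in one place also gives the validation a single unit-test target, which is harder to achieve while the logic lives inside the route lambda.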
Map<String, Object> result = new HashMap<>();

// Get current schema
🤖 nit: throws Exception is broader than needed here — every checked exception in the method body is caught internally (either swallowed or rethrown as HoodieException/BadRequestResponse). Could you narrow this to throws IOException or remove the throws clause entirely? As-is, callers see throws Exception and have to decide what to catch unnecessarily.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #13147     +/-   ##
============================================
- Coverage     68.25%    67.94%   -0.31%
- Complexity    29477     30365    +888
============================================
  Files          2542      2565     +23
  Lines        142541    147754   +5213
  Branches      17798     18973   +1175
============================================
+ Hits          97293    100395   +3102
- Misses        37242     39061   +1819
- Partials       8006      8298    +292
Describe the issue this Pull Request addresses
Visualizing the timeline through a console or a file browser (Finder/Explorer) is often challenging because files are sorted by naming convention. With completion time added to completed instants, grouping instants by eye in a console becomes even harder.
A timeline visualizer therefore makes the timeline easier to understand, especially for concurrent or long-running writes.
The timeline-ui page is accessible via http://timeline_server__host:timeline_server__port/ui after starting the timeline service.
Summary and Changelog
Below is an example of how it currently looks. The functionality is there, but it is still a little rough:
Main page
The main page allows users to input the table paths, and the timeline is then visualised.
Hovering over instants
Hovering over the instant bars shows additional details, such as how long an action took.
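As a rough illustration of how such a duration could be derived, the sketch below subtracts an instant's requested time from its completion time. This is an assumption, not the UI's confirmed logic: it presumes both values use the 17-digit yyyyMMddHHmmssSSS layout, and InstantDurations and between are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

/** Hypothetical helper; assumes 17-digit yyyyMMddHHmmssSSS timestamps. */
final class InstantDurations {
  // Built with appendValue for the millisecond part so that the formatter
  // also parses correctly on Java 8, where a bare "SSS" suffix in an
  // all-digit pattern is known to fail.
  private static final DateTimeFormatter INSTANT_TIME = new DateTimeFormatterBuilder()
      .appendPattern("yyyyMMddHHmmss")
      .appendValue(ChronoField.MILLI_OF_SECOND, 3)
      .toFormatter();

  private InstantDurations() {
  }

  /** Elapsed time from when an action was requested to when it completed. */
  static Duration between(String requestedTime, String completionTime) {
    LocalDateTime start = LocalDateTime.parse(requestedTime, INSTANT_TIME);
    LocalDateTime end = LocalDateTime.parse(completionTime, INSTANT_TIME);
    return Duration.between(start, end);
  }
}
```

For example, a requested time of 20240101120000000 and a completion time of 20240101120001500 yield a duration of 1500 milliseconds.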
Selecting an instant
Since commit details are saved to the filesystem as Avro binaries, the commit information is not easily readable. Selecting a commit/deltacommit returns the details in JSON format.
Clean range
Upon selection of a clean commit/plan that uses KEEP_LATEST_COMMITS, the clean range will be displayed.
Configs page
Table configurations are displayed in the Table Config tab:
Schema History
Schema-related changelogs are also displayed under the Schema History tab (only when schema operations are done using hoodie.schema.on.read.enable=true).
The schema changelog can also be expanded to see which column changed at which schema commit.
Impact
Describe any public API or user-facing feature change or any performance impact.
Added 4 API endpoints on the /v2/hoodie/view path:
- timeline/instants/all?basepath={basepath} - all instants (v2 format)
- timeline/instant?basepath={basepath}&instant={instant}&instantaction={action}&instantstate={state} - instant details
- table/config?basepath={basepath} - table configuration
- table/schema/history?basepath={basepath}&limit={limit} - schema evolution history

The UI is served at /ui (previously redirected to /index.html).
Risk Level
None
Documentation Update
We do not have any documentation for the timeline service; hence, I suppose no documentation changes are required.
Contributor's checklist