
docs(cookbooks): add PDF pipeline cookbook #3796

Merged

abelanger5 merged 10 commits into hatchet-dev:main from BloggerBust:docs/pdf-pipeline-cookbook on May 5, 2026



@BloggerBust
Contributor

Description

Adds a new PDF processing pipeline cookbook showing how to model a fixed document-processing workflow as a Hatchet DAG.

The cookbook includes Python and TypeScript examples that:

  • accept a PDF as base64 workflow input
  • extract text from the PDF
  • run classification, summarization, and keyword extraction as independent DAG branches
  • combine the parent task outputs in a final formatting step
  • include trigger/run scripts and e2e tests
  • use generated snippets in the docs page
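The DAG shape described above can be illustrated in plain Python, outside Hatchet, to show the data flow. One root step decodes the base64 input and extracts text. Three branches consume that text independently. A final step combines their outputs. This is a hypothetical, SDK-free sketch. The function names (`extract_text`, `classify`, `summarize`, `extract_keywords`, `format_result`, `run_pipeline`) are assumptions and are not the cookbook's or Hatchet's API. The "extraction" step only decodes UTF-8 text and does not parse a real PDF. In the cookbook itself, these steps are Hatchet tasks wired together as a DAG. Here, a thread pool stands in for the scheduler.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


# Hypothetical root task. A real pipeline would parse the PDF with a library
# or a parsing service; this placeholder just decodes the payload as UTF-8.
def extract_text(pdf_base64: str) -> str:
    raw = base64.b64decode(pdf_base64)
    return raw.decode("utf-8", errors="replace")


# Hypothetical branch 1: a naive keyword-based classifier.
def classify(text: str) -> str:
    return "invoice" if "invoice" in text.lower() else "other"


# Hypothetical branch 2: "summarize" by keeping the first few words.
def summarize(text: str, max_words: int = 8) -> str:
    return " ".join(text.split()[:max_words])


# Hypothetical branch 3: the most frequent words longer than three characters.
def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    counts: dict[str, int] = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?")
        if len(word) > 3:
            counts[word] = counts.get(word, 0) + 1
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Final step: combine the three parent outputs into one result.
def format_result(category: str, summary: str, keywords: list[str]) -> dict:
    return {"category": category, "summary": summary, "keywords": keywords}


def run_pipeline(pdf_base64: str) -> dict:
    text = extract_text(pdf_base64)  # root of the DAG
    # The three branches depend only on `text`, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=3) as pool:
        category = pool.submit(classify, text)
        summary = pool.submit(summarize, text)
        keywords = pool.submit(extract_keywords, text)
        # The formatting step waits for all three parents before running.
        return format_result(category.result(), summary.result(), keywords.result())


if __name__ == "__main__":
    sample = b"Invoice for consulting services. Invoice total due within thirty days."
    payload = base64.b64encode(sample).decode("ascii")  # base64 workflow input
    print(run_pipeline(payload))
```

The fan-out/fan-in shape is the point of the sketch. The branches share one parent and do not depend on each other, so a scheduler is free to run them in parallel. Only the final formatting step has to wait for all of them.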

This is a documentation/example-only change. It does not add product functionality or require product dependency changes.

Type of change

  • Documentation change (pure documentation change)

What's Changed

  • Add a PDF pipeline cookbook under frontend/docs/pages/cookbooks
  • Add Python and TypeScript PDF pipeline examples
  • Add Python and TypeScript e2e tests for the example pipeline
  • Add generated snippets and example mirrors
  • Add the cookbook to the cookbooks index and navigation

@vercel

vercel Bot commented Apr 30, 2026

@BloggerBust is attempting to deploy a commit to the Hatchet Team on Vercel.

A member of the Team first needs to authorize it.

@BloggerBust force-pushed the docs/pdf-pipeline-cookbook branch from 0c781b0 to 51dd077 on April 30, 2026 22:23
@BloggerBust changed the title from "Docs/pdf pipeline cookbook" to "docs(cookbooks): add PDF pipeline cookbook" on Apr 30, 2026
@BloggerBust force-pushed the docs/pdf-pipeline-cookbook branch from 51dd077 to bfdb378 on April 30, 2026 22:40
@vercel

vercel Bot commented Apr 30, 2026

The latest updates on your projects.

hatchet-docs: deployment Ready (actions: Preview, Comment), updated May 5, 2026 3:18pm UTC


@BloggerBust marked this pull request as draft on May 1, 2026 16:07
@BloggerBust marked this pull request as ready for review on May 1, 2026 22:37
Contributor

@abelanger5 left a comment


Nice! A few comments here, but overall looks great 💯

Comment thread frontend/docs/pages/cookbooks/pdf-pipeline.mdx

- a working local Hatchet environment or access to [Hatchet Cloud](https://cloud.onhatchet.run)
- a Hatchet SDK example environment (see the [Quickstart](/v1/quickstart))
- optionally, a PDF text extraction library for real PDF parsing:
Contributor


Let's mention that we'll switch to Reducto at the end here?

Contributor Author


I decided to introduce Reducto in the introduction so readers see the production parsing option before starting the walkthrough. I am happy to move it into the Setup if you prefer.

Comment thread frontend/docs/pages/cookbooks/pdf-pipeline.mdx (outdated)
@abelanger5 merged commit 4efd1ab into hatchet-dev:main on May 5, 2026
29 checks passed

2 participants