Merged
5 changes: 5 additions & 0 deletions .changeset/bright-llamas-pay.md
@@ -0,0 +1,5 @@
---
"@loveholidays/eval-kit": patch
---

Migrate evaluator structured output generation to AI SDK v6.
12 changes: 6 additions & 6 deletions docs/EVALUATOR.md
@@ -10,13 +10,13 @@ The Evaluator enables LLM-powered content evaluation with flexible prompt templa

1. **Evaluator** - Main evaluator class that orchestrates evaluation
2. **TemplateRenderer** - Handlebars-style template engine for prompts
-3. **Vercel AI SDK** - Handles LLM API calls with structured output via generateObject
+3. **Vercel AI SDK** - Handles LLM API calls with structured output via generateText
4. **Zod Schemas** - Dynamic schema generation based on score configuration

### Data Flow

```
-User Input → Template Rendering → Vercel AI SDK generateObject → Structured Result with Stats
+User Input → Template Rendering → Vercel AI SDK generateText → Structured Result with Stats
```

## Template Engine
@@ -64,7 +64,7 @@ This enables automatic detection of required inputs based on the template.

## Structured Output with Vercel AI SDK

-The evaluator uses Vercel AI SDK's `generateObject` function to ensure structured, validated responses from the LLM.
+The evaluator uses Vercel AI SDK's `generateText` function with an output schema to ensure structured, validated responses from the LLM.

### How It Works

@@ -192,7 +192,7 @@ Any provider compatible with Vercel AI SDK, including:

### Model Settings

-Optional settings passed to `generateObject`:
+Optional settings passed to `generateText`:

```typescript
{
@@ -394,7 +394,7 @@ The evaluator dynamically creates Zod schemas based on scoreConfig:
- All score configurations (numeric, categorical, default)
- Error conditions (API failures, undefined usage)
- Processing stats tracking (execution time, token usage)
-- Model settings passthrough to generateObject
+- Model settings passthrough to generateText

## Performance Considerations

@@ -408,6 +408,6 @@ Templates are rendered on every evaluation. For high-frequency evaluations, cons
- Caching rendered templates if variables don't change
- Using simpler templates without conditionals

-### Vercel AI SDK generateObject
+### Vercel AI SDK generateText

The Vercel AI SDK handles response parsing efficiently with structured output. The Zod schema validation ensures type-safe responses without manual parsing overhead.
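
The documentation change above swaps `generateObject` for `generateText` with an output schema. A minimal sketch of that call shape follows, assuming the `Output.object` helper and the `output` result field behave the way the mocks in `evaluator-telemetry.spec.ts` suggest. Everything here is a local stand-in so the snippet runs without the `ai` package: `GenerateTextFn`, `evaluateWith`, and `fakeGenerateText` are hypothetical names, and `schema` is an opaque placeholder rather than a real Zod schema.

```typescript
// Hypothetical local stand-ins; in the real code these come from the "ai" package.
type Usage = { inputTokens?: number; outputTokens?: number; totalTokens?: number };
type OutputSpec = { type: "object"; schema: unknown };
type GenerateTextResult<T> = { output: T; usage?: Usage };
type GenerateTextFn = <T>(args: {
  prompt: string;
  output: OutputSpec;
}) => Promise<GenerateTextResult<T>>;

type Evaluation = { score: number | string; feedback: string };

// Mirrors the mocked Output.object from the spec file: it tags the spec as an object output.
const Output = {
  object: (spec: { schema: unknown }): OutputSpec => ({ type: "object" as const, ...spec }),
};

// Calls the injected generateText-like function and returns the parsed value.
async function evaluateWith(
  generateText: GenerateTextFn,
  prompt: string,
  schema: unknown,
): Promise<Evaluation> {
  const result = await generateText<Evaluation>({
    prompt,
    output: Output.object({ schema }),
  });
  // With generateObject, the parsed value was read from `result.object` instead.
  return result.output;
}

// Fake model call for demonstration only; it ignores its arguments.
async function fakeGenerateText<T>(_args: {
  prompt: string;
  output: OutputSpec;
}): Promise<GenerateTextResult<T>> {
  const output = { score: 85, feedback: "Good quality" } as unknown as T;
  return { output, usage: { inputTokens: 100, outputTokens: 20, totalTokens: 120 } };
}
```

Injecting the function keeps the sketch independent of the SDK version. In the evaluator itself the argument would presumably be the SDK's `generateText`, and the read-side change visible in this PR's tests is `object` becoming `output` on the result.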
2 changes: 1 addition & 1 deletion package.json
@@ -87,7 +87,7 @@
},
"dependencies": {
"@xenova/transformers": "^2.17.2",
-"ai": "^5.0.52",
+"ai": "^6.0.175",
"csv-parse": "^6.1.0",
"csv-stringify": "^6.6.0",
"fastest-levenshtein": "^1.0.16",
69 changes: 38 additions & 31 deletions pnpm-lock.yaml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 11 additions & 8 deletions src/evaluators/evaluator-telemetry.spec.ts
@@ -14,9 +14,12 @@ provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();

// Mock the ai module
-const mockGenerateObject = jest.fn();
+const mockGenerateText = jest.fn();
jest.unstable_mockModule("ai", () => ({
-generateObject: mockGenerateObject,
+generateText: mockGenerateText,
+Output: {
+object: jest.fn((output) => ({ type: "object", ...output })),
+},
}));

const { Evaluator } = await import("./evaluator.js");
@@ -34,12 +37,12 @@ describe("Evaluator telemetry", () => {
exporter.reset();
_resetTracer();
enableTelemetry(true);
-mockGenerateObject.mockClear();
+mockGenerateText.mockClear();
});

it("should create a span with correct name and initial attributes on success", async () => {
-mockGenerateObject.mockResolvedValue({
-object: { score: 85, feedback: "Good quality" },
+mockGenerateText.mockResolvedValue({
+output: { score: 85, feedback: "Good quality" },
usage: { inputTokens: 100, outputTokens: 20, totalTokens: 120 },
});

@@ -79,7 +82,7 @@ describe("Evaluator telemetry", () => {
});

it("should record error attributes when evaluation fails", async () => {
-mockGenerateObject.mockRejectedValue(new Error("API rate limited"));
+mockGenerateText.mockRejectedValue(new Error("API rate limited"));

const evaluator = new Evaluator({
name: "accuracy",
@@ -112,8 +115,8 @@ describe("Evaluator telemetry", () => {
});

it("should not break existing behavior when OTel is present", async () => {
-mockGenerateObject.mockResolvedValue({
-object: { score: "excellent", feedback: "Top quality" },
+mockGenerateText.mockResolvedValue({
+output: { score: "excellent", feedback: "Top quality" },
usage: undefined,
});
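
The last test resolves with `usage: undefined`, and the docs list "undefined usage" among the covered error conditions. One possible way to turn an optional `usage` object into processing stats is sketched below. This is an illustration only: `ProcessingStats` and `toProcessingStats` are hypothetical names, not the evaluator's actual types, and the field names simply follow the mocked `usage` object in this spec.

```typescript
// Hypothetical shapes; field names follow the mocked `usage` in the spec.
type Usage = { inputTokens?: number; outputTokens?: number; totalTokens?: number };
type ProcessingStats = {
  executionTimeMs: number;
  tokenUsage?: { inputTokens: number; outputTokens: number; totalTokens: number };
};

// Builds stats from timestamps and an optional usage report.
function toProcessingStats(
  usage: Usage | undefined,
  startedAtMs: number,
  finishedAtMs: number,
): ProcessingStats {
  const stats: ProcessingStats = { executionTimeMs: finishedAtMs - startedAtMs };
  // The provider reported no usage: keep the timing and omit token counts.
  if (usage === undefined) return stats;

  const inputTokens = usage.inputTokens ?? 0;
  const outputTokens = usage.outputTokens ?? 0;
  stats.tokenUsage = {
    inputTokens,
    outputTokens,
    // Fall back to the sum when the provider omits the total.
    totalTokens: usage.totalTokens ?? inputTokens + outputTokens,
  };
  return stats;
}
```

Omitting `tokenUsage` rather than filling it with zeros keeps "no usage reported" distinguishable from "zero tokens used", which is a design choice for this sketch and not something the PR states.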
