Merged

dev #181

25 commits
1da8b33
docs: add instructions for commenting generated code in promptpex spe…
pelikhan Jul 1, 2025
15fd054
chore: bump version to 0.0.14
github-actions[bot] Jul 1, 2025
6e882bd
chore: update dependencies to latest versions
pelikhan Jul 1, 2025
cdd117c
[chore] upgrade image in action.yml
github-actions[bot] Jul 1, 2025
78d10d8
Merge branch 'dev' of https://github.com/microsoft/promptpex into dev
pelikhan Jul 1, 2025
2fcf09d
chore: specify Dockerfile for building image in release script
pelikhan Jul 1, 2025
e50b3c9
chore: bump version to 0.0.15
github-actions[bot] Jul 1, 2025
840ca7a
[chore] upgrade image in action.yml
github-actions[bot] Jul 1, 2025
fa558d0
docs: add usage instructions for running PromptPex in Docker
pelikhan Jul 1, 2025
453384a
Merge branch 'dev' of https://github.com/microsoft/promptpex into dev
pelikhan Jul 1, 2025
198d009
chore: update Docker image reference in action.yml and comment out ve…
pelikhan Jul 1, 2025
01be7a0
chore: bump version to 0.0.16
github-actions[bot] Jul 1, 2025
bfc08d6
chore: update dependencies and remove release script
pelikhan Jul 2, 2025
4ff8541
chore: bump version to 0.0.17
github-actions[bot] Jul 2, 2025
8d925fb
chore: add Azure environment configuration for GENAISCRIPT models
pelikhan Jul 2, 2025
973d1c7
Merge branch 'dev' of https://github.com/microsoft/promptpex into dev
pelikhan Jul 2, 2025
0f5e0af
chore: bump version to 0.0.18
github-actions[bot] Jul 2, 2025
624273b
chore: add .promptpex.env to .gitignore and update package.json for d…
pelikhan Jul 3, 2025
687ce79
updated instructions
pelikhan Jul 17, 2025
dabb9dc
getting old samples to work after updates
bzorn Jul 23, 2025
fe81d4f
handle missing "score" field in metric
bzorn Jul 23, 2025
d43f42b
fixed NaN bug
bzorn Jul 23, 2025
cdb99a1
chore: update dependencies to latest versions
pelikhan Aug 4, 2025
2fc4352
refactor: remove unused dependencies and update test ID generation me…
pelikhan Aug 4, 2025
58dd824
refactor: update video formatting and improve GitHub Models integrati…
pelikhan Aug 4, 2025
5 changes: 5 additions & 0 deletions .azure.env
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
GENAISCRIPT_MODEL_LARGE="azure:gpt-4o"
GENAISCRIPT_MODEL_SMALL="azure:gpt-4o-mini"
GENAISCRIPT_MODEL_EVAL="azure:gpt-4o"
GENAISCRIPT_MODEL_RULES="azure:gpt-4o"
GENAISCRIPT_MODEL_BASELINE="azure:gpt-4o"
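Reviewer note: the new `.azure.env` file maps five model aliases to Azure deployments. The sketch below only illustrates the file format; it assumes each alias is the lowercased suffix after `GENAISCRIPT_MODEL_` and that values have the form `provider:model`. The function name `parse_model_aliases` is hypothetical and is not part of PromptPex or GenAIScript.

```python
# Hypothetical helper: parse KEY="provider:model" lines into an alias table.
# Assumption: alias name = lowercased suffix after GENAISCRIPT_MODEL_.
AZURE_ENV = """\
GENAISCRIPT_MODEL_LARGE="azure:gpt-4o"
GENAISCRIPT_MODEL_SMALL="azure:gpt-4o-mini"
GENAISCRIPT_MODEL_EVAL="azure:gpt-4o"
GENAISCRIPT_MODEL_RULES="azure:gpt-4o"
GENAISCRIPT_MODEL_BASELINE="azure:gpt-4o"
"""

PREFIX = "GENAISCRIPT_MODEL_"


def parse_model_aliases(text: str) -> dict:
    """Return {alias: (provider, model)} for every GENAISCRIPT_MODEL_* line."""
    aliases = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if not key.startswith(PREFIX):
            continue
        # Drop surrounding quotes, then split "provider:model" once.
        value = value.strip().strip('"').strip("'")
        provider, _, model = value.partition(":")
        aliases[key[len(PREFIX):].lower()] = (provider, model)
    return aliases


if __name__ == "__main__":
    for alias, (provider, model) in parse_model_aliases(AZURE_ENV).items():
        print(f"{alias}: provider={provider} model={model}")
```

With this file, only the `small` alias points at a different deployment (`gpt-4o-mini`); the other four all resolve to `gpt-4o`.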
9 changes: 8 additions & 1 deletion .github/instructions/implementation.instructions.md
@@ -17,14 +17,17 @@ at runtime, rather than hardcoding them in your codebase.

Whenever possible, use the original `.prompty` files from the `src/prompts` directory.

Always try to make minimal changes to the existing source code, and make sure the generated code is compatible with the existing codebase.
Make changes in such a way that a developer will be able to understand and review the updates.

## Phase 1: Test Generation

PromptPex is a test generation framework for prompts. It is made of a graph of LLM transformations that eventually generate a set of
inputs and expected outputs for a given prompt.

- The core of the framework is documented in [Test Generation](docs/src/content/docs/reference/test-generation.md).

## Phase: Validate Test Generation
## Phase 2: Validate Test Generation

Once you have implemented the test generation, you should validate it on a prompt.

@@ -79,3 +82,7 @@ It is implemented using [GenAIScript](https://microsoft.github.io/genaiscript/).

**Follow the patterns and habits of the target framework/language you are generating**.
The reference implementation is a good starting point but you should adapt it to the target framework/language you are generating.

## Instructions

- Add comments in generated code explaining the source of the code in the promptpex specification.
10 changes: 5 additions & 5 deletions .github/workflows/release.sh
@@ -25,7 +25,7 @@ IMAGE_NAME="ghcr.io/microsoft/promptpex"
echo "Building Docker image: $IMAGE_NAME:$NEW_VERSION"

# Build the Docker image with version tag
docker build -t "$IMAGE_NAME:$NEW_VERSION" .
docker build -t "$IMAGE_NAME:$NEW_VERSION" . -f Dockerfile.serve

# Tag with major version
docker tag "$IMAGE_NAME:$NEW_VERSION" "$IMAGE_NAME:$MAJOR"
@@ -40,10 +40,10 @@ docker logout ghcr.io
echo "✅ Docker image pushed to GHCR: $IMAGE_NAME:$NEW_VERSION and $IMAGE_NAME:$MAJOR"

# Update action.yml with new version
sed -i "s|image: .*|image: docker://$IMAGE_NAME:$NEW_VERSION|" action.yml
git add action.yml
git commit -m "[chore] upgrade image in action.yml"
git push origin HEAD
#sed -i "s|image: .*|image: docker://$IMAGE_NAME:$NEW_VERSION|" action.yml
#git add action.yml
#git commit -m "[chore] upgrade image in action.yml"
#git push origin HEAD

# Step 4: Create GitHub release
gh release create "$NEW_VERSION" --title "$NEW_VERSION" --notes "Patch release $NEW_VERSION"
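Reviewer note: the `sed` line that this hunk comments out rewrote every `image:` line in `action.yml` to point at the freshly pushed tag. The sketch below reproduces that substitution in Python to make its effect easier to review. The function name and the sample YAML and version strings are hypothetical, chosen only for illustration.

```python
import re


def update_action_image(action_yml: str, image_name: str, new_version: str) -> str:
    """Mimic: sed "s|image: .*|image: docker://$IMAGE_NAME:$NEW_VERSION|"."""
    replacement = f"image: docker://{image_name}:{new_version}"
    # "." does not match newlines, so each match runs to the end of its line,
    # just like the sed pattern. A callable avoids backslash escaping issues.
    return re.sub(r"image: .*", lambda _match: replacement, action_yml)


if __name__ == "__main__":
    # Hypothetical action.yml fragment and version numbers.
    sample = (
        "runs:\n"
        "  using: docker\n"
        "  image: docker://ghcr.io/microsoft/promptpex:0.0.14\n"
    )
    print(update_action_image(sample, "ghcr.io/microsoft/promptpex", "0.0.15"))
```

Note that the pattern is not anchored, so any line containing `image: ` would be rewritten, not only the one under `runs:`.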
1 change: 1 addition & 0 deletions .gitignore
@@ -33,3 +33,4 @@ evals/explainer/
samples/github-models/**.prompty
evals/demo/
evals/summarizer/
.promptpex.env
22 changes: 13 additions & 9 deletions README.md
@@ -3,20 +3,12 @@
> Test Generation for Prompts

- [Read the documentation](https://microsoft.github.io/promptpex/)
- [PromptPex technical paper](http://arxiv.org/abs/2503.05070)

**Prompts** are an important part of any software project that incorporates
the power of AI models. As a result, tools to help developers create and maintain
effective prompts are increasingly important.

- [Prompts Are Programs - ACM Blog Post](https://blog.sigplan.org/2024/10/22/prompts-are-programs/)

**PromptPex** is a tool for exploring and testing AI model prompts. PromptPex is
intended to be used by developers who have prompts as part of their code base.
PromptPex treats a prompt as a function and automatically generates test inputs
to the function to support unit testing.

- [PromptPex technical paper](http://arxiv.org/abs/2503.05070)

<https://github.com/user-attachments/assets/c9198380-3e8d-4a71-91e0-24d6b7018949>

PromptPex provides the following capabilities:
@@ -32,6 +24,18 @@ PromptPex provides the following capabilities:
- PromptPex uses an LLM to automatically determine whether model outputs meet the specified requirements.
- Automatically export the generated tests and rule-based evaluations to the OpenAI Evals API.

## Integrations

- [GitHub Models Extension](https://github.com/github/gh-models/releases/tag/v0.0.25)

## Running PromptPex

PromptPex can be run in Docker with the following command:

```sh
docker run -p 8003:8003 ghcr.io/microsoft/promptpex:v0
```

## Responsible AI Transparency Note

Please reference [responsible-ai-transparency-note.md](./docs/src/content/docs/responsible-ai-transparency-note.md) for more information.
26 changes: 9 additions & 17 deletions action.yml
@@ -356,63 +356,55 @@ inputs:
.prompty,.md,.txt,.json,.prompt.yml
required: false
debug:
description: Enable debug logging
(https://microsoft.github.io/genaiscript/reference/scripts/logging/).
description: Enable [debug
logging](https://microsoft.github.io/genaiscript/reference/scripts/logging/).
required: false
model_alias:
description: "A YAML-like list of model aliases and model id: `translation:
github:openai/gpt-4o`"
required: false
openai_api_key:
description: OpenAI API key
required: false
default: ${{ secrets.OPENAI_API_KEY }}
openai_api_base:
description: OpenAI API base URL
required: false
default: ${{ env.OPENAI_API_BASE }}
azure_openai_api_endpoint:
description: Azure OpenAI endpoint. In the Azure Portal, open your Azure OpenAI
resource, Keys and Endpoints, copy Endpoint.
required: false
default: ${{ env.AZURE_OPENAI_API_ENDPOINT }}
azure_openai_api_key:
description: Azure OpenAI API key. **You do NOT need this if you are using
Microsoft Entra ID.**
required: false
default: ${{ secrets.AZURE_OPENAI_API_KEY }}
azure_openai_subscription_id:
description: Azure OpenAI subscription ID to list available deployments
(Microsoft Entra only).
required: false
default: ${{ env.AZURE_OPENAI_SUBSCRIPTION_ID }}
azure_openai_api_version:
description: Azure OpenAI API version.
required: false
default: ${{ env.AZURE_OPENAI_API_VERSION }}
azure_openai_api_credentials:
description: Azure OpenAI API credentials type. Leave as 'default' unless you
have a special Azure setup.
required: false
default: ${{ env.AZURE_OPENAI_API_CREDENTIALS }}
azure_ai_inference_api_key:
description: Azure AI Inference key
required: false
default: ${{ secrets.AZURE_AI_INFERENCE_API_KEY }}
azure_ai_inference_api_endpoint:
description: Azure Serverless OpenAI endpoint
required: false
default: ${{ env.AZURE_AI_INFERENCE_API_ENDPOINT }}
azure_ai_inference_api_version:
description: Azure Serverless OpenAI API version
required: false
default: ${{ env.AZURE_AI_INFERENCE_API_VERSION }}
azure_ai_inference_api_credentials:
description: Azure Serverless OpenAI API credentials type
required: false
default: ${{ env.AZURE_AI_INFERENCE_API_CREDENTIALS }}
github_token:
description: "GitHub token with `models: read` permission at least
(https://microsoft.github.io/genaiscript/reference/github-actions/#github\
-models-permissions)."
description: "GitHub token with [models:
read](https://microsoft.github.io/genaiscript/reference/github-actions/#g\
ithub-models-permissions) permission at least."
required: false
default: ${{ secrets.GITHUB_TOKEN }}
outputs:
text:
description: The generated text output.
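Reviewer note: this hunk drops the `default:` expressions from the credential inputs, so a workflow must now pass each value explicitly through `with:`. For Docker container actions, GitHub exposes each input to the container as an environment variable named `INPUT_<NAME>` (uppercased, spaces replaced by underscores). The sketch below shows that lookup; the helper names are hypothetical and this is not PromptPex's own code.

```python
import os


def input_env_name(input_name: str) -> str:
    """GitHub Actions convention: INPUT_ + name, uppercased, spaces -> '_'."""
    return "INPUT_" + input_name.replace(" ", "_").upper()


def get_input(name: str, env=None, default: str = "") -> str:
    """Read an action input from the environment, falling back to `default`."""
    env = os.environ if env is None else env
    return env.get(input_env_name(name), default).strip()


if __name__ == "__main__":
    # Simulated container environment; values are placeholders.
    fake_env = {"INPUT_OPENAI_API_BASE": "https://example.invalid/v1"}
    print(get_input("openai_api_base", fake_env))
    # With no default in action.yml, an unset input reads as empty.
    print(repr(get_input("azure_openai_api_key", fake_env)))
```

Without the defaults, an input that the workflow does not set simply reads as an empty string inside the container.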
16 changes: 10 additions & 6 deletions docs/src/content/docs/index.mdx
@@ -30,7 +30,10 @@ intended to be used by developers who have prompts as part of their code base.
PromptPex treats a prompt as a function and automatically generates test inputs
to the function to support unit testing.

<video src="https://github.com/user-attachments/assets/0a81f506-ca1c-42f3-b876-9ba52e047493" controls />
<video
src="https://github.com/user-attachments/assets/0a81f506-ca1c-42f3-b876-9ba52e047493"
controls
/>

<hr />

@@ -48,16 +51,17 @@ to the function to support unit testing.
</Card>
<Card title="Groundtruth" icon="check-circle">
Generate expected outputs for tests using an AI model, and evaluate the
output from the groundtruth model using a list of models.
[Learn more](/promptpex/reference/groundtruth).
output from the groundtruth model using a list of models. [Learn
more](/promptpex/reference/groundtruth).
</Card>
<Card title="Integrated in the GitHub Models CLI" icon="github">
Generate test data for [GitHub Models
Evals](/promptpex/integrations/github-models-evals).
</Card>
<Card title="Export to OpenAI Evals" icon="add-document">
Export generated tests and metrics using (Azure) [OpenAI
Evals](/promptpex/integrations/openai-evals).
</Card>
<Card title="Export to GitHub Models Evals" icon="github">
Generate test data for [GitHub Models Evals](/promptpex/integrations/github-models-evals).
</Card>
<Card title="Azure OpenAI Store Completions" icon="cloud-download">
Use generated tests to distill to smaller models using [Azure OpenAI
Stored
19 changes: 5 additions & 14 deletions docs/src/content/docs/integrations/github-models-evals.md
@@ -4,15 +4,15 @@ sidebar:
order: 28.5
---

[GitHub Models](https://github.com/marketplace/models) is a service that lets you run inference through your GitHub
subscription. Recently, GitHub Models added support for running evals.
[GitHub Models](https://github.com/marketplace/models) is a service that lets you run inference through your GitHub
subscription. PromptPex was integrated as the [generate](https://github.com/github/gh-models/tree/main/cmd/generate) command.

## .prompt.yml support
## gh models generate

PromptPex supports the GitHub Models `.prompt.yml` prompt format.
PromptPex is integrated in the [models extension](https://github.com/github/gh-models) for the GitHub CLI.

```sh
promptex summarizer.prompt.yml
gh models generate summarizer.prompt.yml
```

## Install the runner
@@ -24,12 +24,3 @@ promptex summarizer.prompt.yml
```bash wrap
gh extension install https://github.com/github/gh-models
```

## Generated eval file

For each model under test, PromptPex will generate a `.prompt.yml` file that contains the model under test, the test data and the metrics.
This file can be executed through the `gh models eval` command.

```bash
gh models eval <modelname>.prompt.yml
```