
PromptForge

A CLI tool for versioning, testing, and sharing AI prompts across different LLM providers. Treat prompts as code with git integration, A/B testing, templating, and a shared prompt registry.

Features

  • Version Control: Track prompt changes with Git, create branches for A/B testing variations
  • Multi-Provider Support: Unified API for OpenAI, Anthropic, and Ollama
  • Prompt Templating: Jinja2-based variable substitution with type validation
  • A/B Testing Framework: Compare prompt variations with statistical analysis
  • Output Validation: Validate LLM responses against JSON schemas or regex patterns
  • Prompt Registry: Share prompts via local and remote registries

Installation

pip install promptforge

Or from source:

git clone https://github.com/yourusername/promptforge.git
cd promptforge
pip install -e .

Quick Start

  1. Initialize a new PromptForge project:
pf init
  2. Create your first prompt:
pf prompt create "Summarizer" -c "Summarize the following text: {{text}}"
  3. Run the prompt:
pf run Summarizer -v text="Your long text here..."

Configuration

Create a configs/promptforge.yaml file:

providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    model: gpt-4
    temperature: 0.7
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    model: claude-3-sonnet-20240229
  ollama:
    base_url: http://localhost:11434
    model: llama2

defaults:
  provider: openai
  output_format: text
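The `${OPENAI_API_KEY}`-style references are presumably expanded from the environment when the config is loaded. A minimal sketch of how that expansion could work (function names here are hypothetical, not PromptForge's actual API):

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")

def expand_env(value):
    """Recursively replace ${VAR} placeholders with environment values."""
    if isinstance(value, str):
        return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    return value

def load_config(path="configs/promptforge.yaml"):
    import yaml  # PyYAML; imported lazily so expand_env stays dependency-free
    with open(path) as f:
        return expand_env(yaml.safe_load(f))
```

Keeping secrets out of the YAML file this way means the config can be committed alongside the prompts themselves.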

Creating Prompts

Prompts are text files with a YAML front-matter header followed by the template body:

---
name: Code Explainer
description: Explain code snippets
version: "1.0.0"
provider: openai
tags: [coding, education]
variables:
  - name: language
    type: choice
    required: true
    choices: [python, javascript, rust, go]
  - name: code
    type: string
    required: true
validation:
  - type: regex
    pattern: "(def|function|fn|func)"
---
Explain this {{language}} code:

{{code}}

Focus on:
- What the code does
- Key functions/classes used
- Any potential improvements
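The `variables` block drives validation before the template is rendered. A rough sketch of how the `required` and `choice` checks might behave (the helper name is an assumption, not PromptForge's real internals):

```python
def validate_variables(spec, values):
    """Check user-supplied values against a prompt's `variables` spec."""
    errors = []
    for var in spec:
        name = var["name"]
        if name not in values:
            if var.get("required"):
                errors.append(f"missing required variable: {name}")
            continue
        if var.get("type") == "choice" and values[name] not in var.get("choices", []):
            errors.append(f"{name!r} must be one of {var['choices']}")
    return errors
```

With the Code Explainer spec above, passing `language=java` would fail the `choice` check, and omitting `code` would fail the `required` check.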

Running Prompts

# Run with variables
pf run "Code Explainer" -v language=python -v code="def hello(): print('world')"

# Use a different provider
pf run "Code Explainer" -p anthropic -v language=rust -v code="..."

# Output as JSON
pf run "Code Explainer" -o json -v ...

Version Control

# Create a version commit
pf version create "Added validation rules"

# View history
pf version history

# Create a branch for A/B testing
pf version branch test-variation-a

# List all branches
pf version list

A/B Testing

Compare prompt variations:

# Test a single prompt
pf test "Code Explainer" --iterations 5

# Compare multiple prompts
pf test "Prompt A" "Prompt B" --iterations 3

# Run in parallel
pf test "Prompt A" "Prompt B" --parallel
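Under the hood, `pf test` presumably collects a per-iteration score for each variant and compares the aggregates. A stdlib-only sketch of one plausible comparison (function name and output keys are hypothetical):

```python
import statistics

def compare_variants(scores_a, scores_b):
    """Summarize two variants' per-iteration scores and pick a winner."""
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    return {
        "mean_a": mean_a,
        "mean_b": mean_b,
        "stdev_a": statistics.stdev(scores_a),  # needs >= 2 iterations
        "stdev_b": statistics.stdev(scores_b),
        "delta": mean_a - mean_b,
        "winner": "A" if mean_a >= mean_b else "B",
    }
```

A real run would also want a significance test rather than a raw mean comparison, which is presumably what the "statistical analysis" feature provides.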

Output Validation

Add validation rules to your prompts:

validation:
  - type: regex
    pattern: "^\\d+\\. .+"
    message: "Response must be a numbered list"
  
  - type: json
    schema:
      type: object
      properties:
        summary:
          type: string
          minLength: 10
        keywords:
          type: array
          items:
            type: string
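A regex rule like the one above takes only a few lines to evaluate; a sketch of the idea (the real `Validator` class may differ):

```python
import re

def check_regex_rule(rule, response):
    """Return an error message if `response` fails a regex validation rule, else None."""
    if re.search(rule["pattern"], response, re.MULTILINE):
        return None
    return rule.get("message", f"response does not match {rule['pattern']!r}")
```

JSON-schema rules would follow the same shape: parse the response, validate it against `rule["schema"]`, and return `None` or a message.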

Prompt Registry

Local Registry

# List local prompts
pf registry list

# Add prompt to registry
pf registry add "Code Explainer" --author "Your Name"

# Search registry
pf registry search "python"

Remote Registry

# Pull a prompt from remote
pf registry pull <entry_id>

# Publish your prompt
pf registry publish "Code Explainer"

API Reference

Core Classes

  • Prompt: Main prompt model with YAML serialization
  • TemplateEngine: Jinja2-based template rendering
  • GitManager: Git integration for version control
  • ProviderBase: Abstract interface for LLM providers

Providers

  • OpenAIProvider: OpenAI GPT models
  • AnthropicProvider: Anthropic Claude models
  • OllamaProvider: Local Ollama models
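The provider layer above can be pictured as one abstract base class that each backend implements. A hedged sketch (method and field names are assumptions, not the actual PromptForge API):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Completion:
    text: str
    model: str
    tokens_used: int = 0

class ProviderBase(ABC):
    """Abstract interface each provider (OpenAI, Anthropic, Ollama) implements."""

    @abstractmethod
    def complete(self, prompt: str, **params) -> Completion:
        """Send a rendered prompt and return the model's completion."""

class EchoProvider(ProviderBase):
    """Toy provider for offline testing: echoes the prompt back."""

    def complete(self, prompt: str, **params) -> Completion:
        return Completion(text=prompt, model="echo")
```

Keeping the interface this narrow is what lets `pf run -p anthropic` swap backends without touching the prompt itself.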

Testing

  • ABTest: A/B test runner
  • Validator: Response validation framework
  • MetricsCollector: Metrics aggregation

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: pytest tests/ -v
  5. Submit a pull request

License

MIT License - see LICENSE file for details.
