# Pytest Integration
Integrate FluxLoop experiments into your pytest test suite for CI/CD pipelines.
## Overview
Starting with FluxLoop 0.2.29, you can run FluxLoop experiments as part of your pytest tests using specialized fixtures. This enables:
- **Familiar Testing Workflow**: Use `pytest` commands developers already know
- **CI/CD Integration**: Run experiments in GitHub Actions, GitLab CI, etc.
- **Assertion Support**: Use pytest assertions on experiment results
- **Test Discovery**: Automatically discover and run FluxLoop tests
- **Parallel Execution**: Run multiple experiment tests in parallel (see the sketch below)
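For parallel execution, one option is the third-party `pytest-xdist` plugin. This is a sketch, not a FluxLoop requirement: the plugin is not bundled with FluxLoop and must be installed separately.

```bash
# Assumes the third-party pytest-xdist plugin; install it first
pip install pytest-xdist

# Distribute FluxLoop tests across available CPU cores
pytest -k fluxloop -n auto
```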
## Quick Start

### 1. Install with Dev Dependencies

```bash
pip install -e packages/cli[dev]
# or for the published package
pip install fluxloop-cli[dev]
```
### 2. Generate Test Template

```bash
# Generate pytest template in the default location (tests/)
fluxloop init pytest-template

# Custom location
fluxloop init pytest-template --tests-dir integration_tests

# Custom filename
fluxloop init pytest-template --filename test_agent_smoke.py
```
**What gets created:**

```python
# tests/test_fluxloop_smoke.py
import pytest
from pathlib import Path

from fluxloop_cli.testing.pytest_plugin import fluxloop_runner

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def test_fluxloop_smoke(fluxloop_runner):
    """Smoke test: verify the agent runs without errors."""
    result = fluxloop_runner(
        project_root=PROJECT_ROOT,
        simulation_config=PROJECT_ROOT / "configs" / "simulation.yaml",
        overrides={"iterations": 1},
        env={"PYTHONPATH": str(PROJECT_ROOT)},
    )

    # Assert on results
    assert result.total_runs > 0
    assert result.success_rate >= 0.8

    # Or use the convenience method
    result.require_success(threshold=0.8)
```
### 3. Run Tests

```bash
# Run FluxLoop tests only
pytest -k fluxloop_smoke

# Run with verbose output
pytest -k fluxloop -v

# Stop on the first failure
pytest -k fluxloop --maxfail=1

# Run all tests, including FluxLoop
pytest
```
## Adapter Workflow for Existing Agents
Use the pytest bridge as an adapter layer when your AI agent already lives inside another repository (e.g., a LangGraph tutorial project) and FluxLoop needs to drive that code end-to-end.
### 1. Expose a FluxLoop Runner Entry Point

Create a module such as `customer_support/runner.py` and define a `run` function. The FluxLoop `python-function` runner imports and calls this entry point directly.
"""FluxLoop simulation runner entry point."""
from __future__ import annotations
import uuid
from typing import Any, Dict, Optional
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from customer_support import prepare_database
from customer_support.data.travel_db import get_default_storage_dir
from customer_support.graphs import build_part4_graph
from customer_support.tracing import init_tracing, trace_graph_execution
DEFAULT_PROVIDER = "anthropic"
@trace_graph_execution
def run(input_payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
load_dotenv()
init_tracing()
payload = input_payload or {}
user_message = payload.get("message", "Hello, I need help with my flight.")
passenger_id = payload.get("passenger_id", "3442 587242")
provider = payload.get("provider", DEFAULT_PROVIDER).lower()
thread_id = payload.get("thread_id") or str(uuid.uuid4())
data_dir = get_default_storage_dir()
data_dir.mkdir(parents=True, exist_ok=True)
db_path = prepare_database(target_dir=data_dir, overwrite=False)
if provider == "openai":
llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)
else:
llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=1)
graph = build_part4_graph(str(db_path), llm=llm)
result = graph.invoke({"messages": [("user", user_message)]}, config={
"configurable": {"passenger_id": passenger_id, "thread_id": thread_id}
})
messages = result.get("messages", [])
response = ""
if messages and hasattr(messages[-1], "content"):
response = messages[-1].content
return {
"response": response,
"messages": [
{"role": getattr(m, "type", "unknown"), "content": getattr(m, "content", str(m))}
for m in messages
],
"thread_id": thread_id,
}
### 2. Align `simulation.yaml`

Point the `runner` section at the entry point you just created. If you need additional import paths, add them to `python_path` and FluxLoop will extend `sys.path` for you.
```yaml
runner:
  module_path: "customer_support.runner"
  function_name: "run"
  working_directory: /Users/you/projects/customer-support
  python_path:
    - src
  timeout_seconds: 120
  max_retries: 3
```
Even if the tutorial config omits metrics such as `task_completion`, the FluxLoop CLI now injects default thresholds so the evaluation report keeps rendering.
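If you would rather pin thresholds explicitly instead of relying on the injected defaults, you can declare them in `simulation.yaml`. The sketch below is hypothetical: the `evaluation`/`metrics` key names and the threshold shape are assumptions, so verify them against your FluxLoop version's config schema.

```yaml
# Hypothetical sketch -- key names are assumptions, not the confirmed schema
evaluation:
  metrics:
    task_completion:
      threshold: 0.8   # assumed shape; check your FluxLoop version
```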
### 3. Write a Pytest Adapter Test
The pytest file itself acts as the adapter that launches the FluxLoop experiment.
```python
# tests/test_customer_support.py
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def test_customer_support_agent(fluxloop_runner):
    result = fluxloop_runner(
        project_root=PROJECT_ROOT,
        simulation_config=PROJECT_ROOT / "configs" / "simulation.yaml",
        overrides={
            "iterations": 1,
            "runner.working_directory": str(PROJECT_ROOT / "langgraph" / "customer-support"),
            "runner.python_path.0": "src",
        },
        env={
            "PYTHONPATH": str(PROJECT_ROOT / "langgraph" / "customer-support"),
            "OPENAI_API_KEY": "...",
            "ANTHROPIC_API_KEY": "...",
        },
        timeout=180,
    )

    result.require_success("customer support agent")
```
Before running the test for the first time, install the SDK locally (`pip install -e packages/sdk`) and verify the import with `python -c "import fluxloop; print(fluxloop.__version__)"`. Use `env={"PYTHONPATH": ...}` to append extra agent source folders whenever imports would otherwise fail.
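The same setup as a shell snippet; the paths assume you are running from the FluxLoop monorepo root:

```bash
# Install the FluxLoop SDK in editable mode (run from the repo root)
pip install -e packages/sdk

# Confirm the SDK is importable before invoking pytest
python -c "import fluxloop; print(fluxloop.__version__)"
```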
### 4. Run from Pytest or CI

```bash
pytest tests/test_customer_support.py -k customer_support_agent -v
```

Use the same command inside GitHub Actions or GitLab CI (`pytest -k customer_support_agent --maxfail=1`). Upload `result.experiment_dir / "report.html"` as an artifact to share the regression report with the rest of your team.
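As a concrete starting point, here is a minimal GitHub Actions sketch. The workflow file name, Python version, install command, and artifact path are assumptions about your repository layout, not FluxLoop requirements:

```yaml
# .github/workflows/fluxloop.yml -- hypothetical example; adjust to your repo
name: fluxloop-regression
on: [push]

jobs:
  fluxloop:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install fluxloop-cli[dev]
      - run: pytest -k customer_support_agent --maxfail=1
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: fluxloop-report
          path: "**/report.html"
```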
## Available Fixtures

### `fluxloop_runner`

Executes experiments using FluxLoop's `ExperimentRunner` directly (SDK mode).
**Signature:**

```python
def fluxloop_runner(
    project_root: Path,
    simulation_config: Path,
    overrides: dict | None = None,
    env: dict | None = None,
    timeout: int = 600,
) -> FluxLoopTestResult:
    ...
```
**Parameters:**

| Parameter | Type | Description | Default |
|---|---|---|---|
| `project_root` | `Path` | Project root directory | Required |
| `simulation_config` | `Path` | Path to `simulation.yaml` | Required |
| `overrides` | `dict` | Override config values | `None` |
| `env` | `dict` | Environment variables | `None` |
| `timeout` | `int` | Timeout in seconds | `600` |
**Example:**

```python
def test_basic_run(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 5},
        env={"PYTHONPATH": str(Path.cwd())},
    )

    assert result.total_runs == 5
    result.require_success()
```
### `fluxloop_runner_multi_turn`

Convenience fixture for multi-turn experiments (auto-enables multi-turn mode).
**Example:**

```python
def test_multi_turn(fluxloop_runner_multi_turn):
    result = fluxloop_runner_multi_turn(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={
            "iterations": 2,
            "multi_turn": {
                "max_turns": 8,
                "auto_approve_tools": True,
            },
        },
    )

    assert result.total_runs == 2
    result.require_success(threshold=0.7)
```
### `fluxloop_cli` (Advanced)

Executes experiments by calling `fluxloop test` as a subprocess (CLI mode).

**When to Use:**

- Testing actual CLI commands
- Verifying command-line behavior
- Debugging CLI output
- Integration testing with the full CLI stack
**Example:**

```python
def test_cli_execution(fluxloop_cli):
    result = fluxloop_cli(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        cli_args=["--iterations", "3", "--yes"],
    )

    # Check CLI execution
    assert result.success_rate > 0.8

    # Access CLI-specific info
    print(f"Command: {result.cli_command}")
    print(f"Stdout: {result.stdout_path}")
    print(f"Stderr: {result.stderr_path}")
```
## FluxLoopTestResult API

All fixtures return a `FluxLoopTestResult` object with test metrics and paths.

### Properties
```python
class FluxLoopTestResult:
    # Metrics
    total_runs: int              # Total experiment runs
    success_rate: float          # Success rate (0.0-1.0)
    avg_duration_ms: float       # Average duration in ms

    # File paths
    experiment_dir: Path         # Experiment output directory
    trace_summary_path: Path     # Path to trace_summary.jsonl
    per_trace_path: Path | None  # per_trace.jsonl (if parsed)

    # CLI-specific (fluxloop_cli fixture only)
    cli_command: str | None      # Full CLI command
    stdout_path: Path | None     # Stdout log file
    stderr_path: Path | None     # Stderr log file
```
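A short sketch of consuming these properties in a test. The duration budget and the assumption that `trace_summary.jsonl` is standard JSONL (one JSON object per line, one per run) are illustrative, not part of the documented API:

```python
import json
from pathlib import Path


def test_inspect_result(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 3},
    )

    # Metrics documented above
    assert result.total_runs == 3
    assert result.avg_duration_ms < 30_000  # illustrative budget, not a FluxLoop default

    # Assumes trace_summary.jsonl holds one JSON object per line
    with result.trace_summary_path.open() as fh:
        traces = [json.loads(line) for line in fh if line.strip()]
    assert len(traces) == result.total_runs
```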