# Pytest Integration
Integrate FluxLoop experiments into your pytest test suite for CI/CD pipelines.
## Overview
Starting with FluxLoop 0.2.29, you can run FluxLoop experiments as part of your pytest tests using specialized fixtures. This enables:
- **Familiar Testing Workflow**: Use `pytest` commands developers already know
- **CI/CD Integration**: Run experiments in GitHub Actions, GitLab CI, etc.
- **Assertion Support**: Use pytest assertions on experiment results
- **Test Discovery**: Automatically discover and run FluxLoop tests
- **Parallel Execution**: Run multiple experiment tests in parallel (see the sketch below)
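For parallel execution, one option is the third-party `pytest-xdist` plugin. This is a sketch, not a FluxLoop requirement: the plugin is not bundled with FluxLoop and must be installed separately.

```bash
# Assumes the third-party pytest-xdist plugin; install it first
pip install pytest-xdist

# Distribute FluxLoop tests across available CPU cores
pytest -k fluxloop -n auto
```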
## Quick Start

### 1. Install with Dev Dependencies

```bash
pip install -e packages/cli[dev]
# or for the published package
pip install fluxloop-cli[dev]
```
### 2. Generate Test Template

```bash
# Generate pytest template in the default location (tests/)
fluxloop init pytest-template

# Custom location
fluxloop init pytest-template --tests-dir integration_tests

# Custom filename
fluxloop init pytest-template --filename test_agent_smoke.py
```
**What gets created:**

```python
# tests/test_fluxloop_smoke.py
import pytest
from pathlib import Path

from fluxloop_cli.testing.pytest_plugin import fluxloop_runner

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def test_fluxloop_smoke(fluxloop_runner):
    """Smoke test: verify the agent runs without errors."""
    result = fluxloop_runner(
        project_root=PROJECT_ROOT,
        simulation_config=PROJECT_ROOT / "configs" / "simulation.yaml",
        overrides={"iterations": 1},
        env={"PYTHONPATH": str(PROJECT_ROOT)},
    )

    # Assert on results
    assert result.total_runs > 0
    assert result.success_rate >= 0.8

    # Or use the convenience method
    result.require_success(threshold=0.8)
```
### 3. Run Tests

```bash
# Run FluxLoop tests only
pytest -k fluxloop_smoke

# Run with verbose output
pytest -k fluxloop -v

# Stop on the first failure
pytest -k fluxloop --maxfail=1

# Run all tests, including FluxLoop
pytest
```
## Adapter Workflow for Existing Agents
Use the pytest bridge as an adapter layer when your AI agent already lives inside another repository (e.g., a LangGraph tutorial project) and FluxLoop needs to drive that code end-to-end.
### 1. Expose a FluxLoop Runner Entry Point

Create a module such as `customer_support/runner.py` and define a `run` function. The FluxLoop `python-function` runner imports and calls this entry point directly.
"""FluxLoop simulation runner entry point."""
from __future__ import annotations
import uuid
from typing import Any, Dict, Optional
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from customer_support import prepare_database
from customer_support.data.travel_db import get_default_storage_dir
from customer_support.graphs import build_part4_graph
from customer_support.tracing import init_tracing, trace_graph_execution
DEFAULT_PROVIDER = "anthropic"
@trace_graph_execution
def run(input_payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
load_dotenv()
init_tracing()
payload = input_payload or {}
user_message = payload.get("message", "Hello, I need help with my flight.")
passenger_id = payload.get("passenger_id", "3442 587242")
provider = payload.get("provider", DEFAULT_PROVIDER).lower()
thread_id = payload.get("thread_id") or str(uuid.uuid4())
data_dir = get_default_storage_dir()
data_dir.mkdir(parents=True, exist_ok=True)
db_path = prepare_database(target_dir=data_dir, overwrite=False)
if provider == "openai":
llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)
else:
llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=1)
graph = build_part4_graph(str(db_path), llm=llm)
result = graph.invoke({"messages": [("user", user_message)]}, config={
"configurable": {"passenger_id": passenger_id, "thread_id": thread_id}
})
messages = result.get("messages", [])
response = ""
if messages and hasattr(messages[-1], "content"):
response = messages[-1].content
return {
"response": response,
"messages": [
{"role": getattr(m, "type", "unknown"), "content": getattr(m, "content", str(m))}
for m in messages
],
"thread_id": thread_id,
}
### 2. Align `simulation.yaml`

Point the `runner` section at the entry point you just created. If you need additional import paths, add them to `python_path` and FluxLoop will extend `sys.path` for you.
```yaml
runner:
  module_path: "customer_support.runner"
  function_name: "run"
  working_directory: /Users/you/projects/customer-support
  python_path:
    - src
  timeout_seconds: 120
  max_retries: 3
```
Even if the tutorial config omits metrics such as `task_completion`, the FluxLoop CLI now injects default thresholds so the evaluation report keeps rendering.
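If you would rather pin thresholds explicitly instead of relying on the injected defaults, you can declare them in `simulation.yaml`. The sketch below is hypothetical: the `evaluation`/`metrics` key names and the threshold shape are assumptions, so verify them against your FluxLoop version's config schema.

```yaml
# Hypothetical sketch -- key names are assumptions, not the confirmed schema
evaluation:
  metrics:
    task_completion:
      threshold: 0.8   # assumed shape; check your FluxLoop version
```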
### 3. Write a Pytest Adapter Test
The pytest file itself acts as the adapter that launches the FluxLoop experiment.
```python
# tests/test_customer_support.py
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def test_customer_support_agent(fluxloop_runner):
    result = fluxloop_runner(
        project_root=PROJECT_ROOT,
        simulation_config=PROJECT_ROOT / "configs" / "simulation.yaml",
        overrides={
            "iterations": 1,
            "runner.working_directory": str(PROJECT_ROOT / "langgraph" / "customer-support"),
            "runner.python_path.0": "src",
        },
        env={
            "PYTHONPATH": str(PROJECT_ROOT / "langgraph" / "customer-support"),
            "OPENAI_API_KEY": "...",
            "ANTHROPIC_API_KEY": "...",
        },
        timeout=180,
    )

    result.require_success("customer support agent")
```
Before running the test for the first time, install the SDK locally (`pip install -e packages/sdk`) and verify the import with `python -c "import fluxloop; print(fluxloop.__version__)"`. Use `env={"PYTHONPATH": ...}` to append extra agent source folders whenever imports would otherwise fail.
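The same setup as a shell snippet; the paths assume you are running from the FluxLoop monorepo root:

```bash
# Install the FluxLoop SDK in editable mode (run from the repo root)
pip install -e packages/sdk

# Confirm the SDK is importable before invoking pytest
python -c "import fluxloop; print(fluxloop.__version__)"
```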
### 4. Run from Pytest or CI

```bash
pytest tests/test_customer_support.py -k customer_support_agent -v
```

Use the same command inside GitHub Actions or GitLab CI (`pytest -k customer_support_agent --maxfail=1`). Upload `result.experiment_dir / "report.html"` as an artifact to share the regression report with the rest of your team.
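As a concrete starting point, here is a minimal GitHub Actions sketch. The workflow file name, Python version, install command, and artifact path are assumptions about your repository layout, not FluxLoop requirements:

```yaml
# .github/workflows/fluxloop.yml -- hypothetical example; adjust to your repo
name: fluxloop-regression
on: [push]

jobs:
  fluxloop:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install fluxloop-cli[dev]
      - run: pytest -k customer_support_agent --maxfail=1
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: fluxloop-report
          path: "**/report.html"
```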
## Available Fixtures

### `fluxloop_runner`

Executes experiments using FluxLoop's `ExperimentRunner` directly (SDK mode).
**Signature:**

```python
def fluxloop_runner(
    project_root: Path,
    simulation_config: Path,
    overrides: dict | None = None,
    env: dict | None = None,
    timeout: int = 600,
) -> FluxLoopTestResult:
    ...
```
**Parameters:**

| Parameter | Type | Description | Default |
|---|---|---|---|
| `project_root` | `Path` | Project root directory | Required |
| `simulation_config` | `Path` | Path to `simulation.yaml` | Required |
| `overrides` | `dict` | Override config values | `None` |
| `env` | `dict` | Environment variables | `None` |
| `timeout` | `int` | Timeout in seconds | `600` |
**Example:**

```python
def test_basic_run(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 5},
        env={"PYTHONPATH": str(Path.cwd())},
    )

    assert result.total_runs == 5
    result.require_success()
```
### `fluxloop_runner_multi_turn`

Convenience fixture for multi-turn experiments (auto-enables multi-turn mode).
**Example:**

```python
def test_multi_turn(fluxloop_runner_multi_turn):
    result = fluxloop_runner_multi_turn(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={
            "iterations": 2,
            "multi_turn": {
                "max_turns": 8,
                "auto_approve_tools": True,
            },
        },
    )

    assert result.total_runs == 2
    result.require_success(threshold=0.7)
```
### `fluxloop_cli` (Advanced)

Executes experiments by calling `fluxloop test` as a subprocess (CLI mode).

**When to Use:**

- Testing actual CLI commands
- Verifying command-line behavior
- Debugging CLI output
- Integration testing with the full CLI stack
**Example:**

```python
def test_cli_execution(fluxloop_cli):
    result = fluxloop_cli(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        cli_args=["--iterations", "3", "--yes"],
    )

    # Check CLI execution
    assert result.success_rate > 0.8

    # Access CLI-specific info
    print(f"Command: {result.cli_command}")
    print(f"Stdout: {result.stdout_path}")
    print(f"Stderr: {result.stderr_path}")
```
## FluxLoopTestResult API

All fixtures return a `FluxLoopTestResult` object with test metrics and paths.

### Properties
```python
class FluxLoopTestResult:
    # Metrics
    total_runs: int              # Total experiment runs
    success_rate: float          # Success rate (0.0-1.0)
    avg_duration_ms: float       # Average duration in ms

    # File paths
    experiment_dir: Path         # Experiment output directory
    trace_summary_path: Path     # Path to trace_summary.jsonl
    per_trace_path: Path | None  # per_trace.jsonl (if parsed)

    # CLI-specific (fluxloop_cli fixture only)
    cli_command: str | None      # Full CLI command
    stdout_path: Path | None     # Stdout log file
    stderr_path: Path | None     # Stderr log file
```
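A short sketch of consuming these properties in a test. The duration budget and the assumption that `trace_summary.jsonl` is standard JSONL (one JSON object per line, one per run) are illustrative, not part of the documented API:

```python
import json
from pathlib import Path


def test_inspect_result(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 3},
    )

    # Metrics documented above
    assert result.total_runs == 3
    assert result.avg_duration_ms < 30_000  # illustrative budget, not a FluxLoop default

    # Assumes trace_summary.jsonl holds one JSON object per line
    with result.trace_summary_path.open() as fh:
        traces = [json.loads(line) for line in fh if line.strip()]
    assert len(traces) == result.total_runs
```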