test Command

Run agent tests with synthetic inputs from bundles or scenarios.

Usage

fluxloop test [OPTIONS]

Options

Option | Description | Default
--bundle, -b | Bundle ID or path to use | Latest bundle
--scenario, -s | Scenario ID to test against | None
--iterations, -i | Number of runs per input | 1
--config, -c | Path to configuration file | fluxloop.yaml
--output-dir | Output directory for results | ./fluxloop/results
--yes, -y | Skip confirmation prompt | false
--skip-upload / --no-skip-upload | Skip or force upload after test | auto_upload (default: true)

Examples

Basic Test

Run test with the latest local bundle:

fluxloop test

Test Specific Scenario

Run against a scenario from the Web Platform:

fluxloop test --scenario prod-regression-v2

Multiple Iterations

Run each input multiple times to test consistency:

fluxloop test --iterations 10

Test and Upload Results

Run tests and automatically upload results to the Web Platform (this is the default behavior unless uploads are disabled in your config):

fluxloop test

To explicitly ensure upload or override a config that skips it:

fluxloop test --no-skip-upload

CI/CD Mode

Skip confirmation prompts for automation:

fluxloop test --yes --no-skip-upload

How It Works

1. Load Test Bundle

The command loads synthetic inputs from:

  • Local bundle: Generated with fluxloop bundles create
  • Scenario: Pulled from Web Platform with fluxloop scenarios pull
# Example bundle structure
bundle_id: bundle_20241101_143022
inputs:
  - id: input_001
    text: "How do I get started?"
    persona: novice_user
    context:
      user_type: beginner
      goal: onboarding
  - id: input_002
    text: "What are the advanced features?"
    persona: expert_user

2. Execute Agent

For each input, the command runs your agent according to the runner configuration:

# fluxloop.yaml
runner:
  target: "src.agent:run"
  type: python-function

test:
  iterations: 1
  timeout: 30
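
The target must resolve to an importable Python callable. Below is a minimal sketch of what src/agent.py could look like, assuming the runner calls the function once per synthetic input and passes the input text as a string; the exact signature depends on your runner configuration and SDK setup.

# src/agent.py (illustrative only; the real signature depends on your runner config)

def run(input_text: str) -> str:
    """Handle one synthetic input and return the agent's reply."""
    # Replace this placeholder with your actual agent logic
    # (LLM calls, tool use, retrieval, etc.).
    if "started" in input_text.lower():
        return "Install the SDK, create a bundle, then run fluxloop test."
    return f"You asked: {input_text}"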

3. Collect Results

Captures execution data:

  • Input/output pairs
  • Execution time
  • Trace data (if using FluxLoop SDK)
  • Errors and exceptions

4. Save Results

Outputs to ./fluxloop/results/run_YYYYMMDD_HHMMSS/:

results/run_20241101_143022/
├── summary.json # Aggregate statistics
├── results.jsonl # Per-input results
├── traces.jsonl # Detailed traces (if SDK used)
└── config.yaml # Test configuration snapshot
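
Because run directories are timestamped, the latest run sorts last. A small sketch for locating it, assuming the default ./fluxloop/results layout shown above:

from pathlib import Path

# run_YYYYMMDD_HHMMSS names sort chronologically as strings, so max() is the newest run
latest = max(Path("fluxloop/results").glob("run_*"))
print("Latest run:", latest)
print((latest / "summary.json").read_text())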

Output Files

summary.json

Aggregate test statistics:

{
  "test_id": "run_20241101_143022",
  "bundle_id": "bundle_20241101_140000",
  "total_inputs": 50,
  "total_runs": 500,
  "successful": 495,
  "failed": 5,
  "avg_duration_ms": 245.5,
  "started_at": "2024-11-01T14:30:22Z",
  "completed_at": "2024-11-01T14:35:10Z"
}
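
These fields make it easy to gate a CI/CD pipeline on test health. A minimal sketch, assuming the field names above and the run path from the example; the 95% threshold is an example, not a FluxLoop feature.

# gate_on_summary.py (illustrative CI check, not part of the CLI)
import json
import sys
from pathlib import Path

summary = json.loads(Path("fluxloop/results/run_20241101_143022/summary.json").read_text())
rate = summary["successful"] / summary["total_runs"]
print(f"Success rate: {rate:.1%} ({summary['failed']} failed)")
sys.exit(0 if rate >= 0.95 else 1)  # fail the build below the example 95% threshold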

results.jsonl

One line per test run:

{"input_id": "input_001", "iteration": 0, "persona": "novice_user", "input": "How do I start?", "output": "...", "duration_ms": 245, "success": true}
{"input_id": "input_002", "iteration": 0, "persona": "expert_user", "input": "What are options?", "output": "...", "duration_ms": 267, "success": true}

Testing Against Scenarios

Scenarios are curated test suites managed on the Web Platform. They include:

  • Predefined synthetic inputs
  • Expected behavior criteria
  • Regression test cases
  • Production edge cases

Pull and Test Scenario

# Pull scenario from Web Platform
fluxloop sync pull --scenario prod-regression-v2

# Run tests against it
fluxloop test --scenario prod-regression-v2

Automatic Evaluation

When testing against scenarios with evaluation criteria:

# Run test and evaluate against criteria
fluxloop test --scenario prod-regression-v2

Results are automatically evaluated on the Web Platform.

Integration with Web Platform

Upload Results

Test results are uploaded to the Web Platform automatically by default. Use the flag below to force an upload when your config disables it:

fluxloop test --no-skip-upload

This enables:

  • Visual result analysis
  • Team collaboration
  • Historical tracking
  • Automated evaluation

View Results

After uploading, you'll get a URL to view results:

[INFO] Test completed successfully
[INFO] Results uploaded to Web Platform
[INFO] View results at: https://results.fluxloop.ai/run/run_20241101_143022

See Viewing Results for details.

Error Handling

Graceful Failures

If an agent run fails:

  • Error is captured in the result
  • Test continues with next input
  • Summary shows failure count

Critical Failures

If configuration is invalid:

  • Test stops immediately
  • Error message shows what to fix

Example:

$ fluxloop test
[ERROR] Runner target not found: src.agent:run
[ERROR] Please check your fluxloop.yaml configuration

Performance Tips

Parallel Execution

Configure parallel execution in your config:

test:
  parallel: 4  # Run 4 tests in parallel

Batch Testing

For large test suites, consider batching:

# Test specific subset
fluxloop test --bundle bundle_batch1 --output-dir results/batch1
fluxloop test --bundle bundle_batch2 --output-dir results/batch2

Best Practices

1. Use Multiple Iterations

Test consistency by running multiple iterations:

fluxloop test --iterations 10

This helps identify:

  • Non-deterministic behavior
  • Edge case failures
  • Performance variance
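
One way to surface non-deterministic behavior is to compare outputs for the same input across iterations. A sketch over results.jsonl, assuming the per-run fields shown earlier:

import json
from collections import defaultdict
from pathlib import Path

outputs = defaultdict(set)
for line in Path("fluxloop/results/run_20241101_143022/results.jsonl").read_text().splitlines():
    if not line.strip():
        continue
    run = json.loads(line)
    outputs[run["input_id"]].add(run["output"])

for input_id, distinct in sorted(outputs.items()):
    if len(distinct) > 1:
        print(f"{input_id}: {len(distinct)} distinct outputs across iterations")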

2. Regular Scenario Testing

Pull and test against scenarios regularly:

# In CI/CD pipeline
fluxloop scenarios pull production
fluxloop test --scenario production --yes --no-skip-upload

3. Automatic Upload

Results are automatically uploaded for team visibility and historical tracking.

fluxloop test

Next Steps

  • bundles - Create and manage test bundles
  • scenarios - Pull scenarios from Web Platform
  • criteria - View evaluation criteria
  • sync - Sync results with Web Platform