test Command

Run agent tests with synthetic inputs from bundles or scenarios.

Usage

fluxloop test [OPTIONS]

Options

Option | Description | Default
--bundle, -b | Bundle ID or path to use | Latest bundle
--scenario, -s | Scenario ID to test against | None
--iterations, -i | Number of runs per input | 1
--config, -c | Path to configuration file | fluxloop.yaml
--output-dir | Output directory for results | ./fluxloop/results
--yes, -y | Skip confirmation prompt | false
--skip-upload / --no-skip-upload | Skip or force upload after test | auto_upload (default: true)

Examples

Basic Test

Run test with the latest local bundle:

fluxloop test

Test Specific Scenario

Run against a scenario from the Web Platform:

fluxloop test --scenario prod-regression-v2

Multiple Iterations

Run each input multiple times to test consistency:

fluxloop test --iterations 10

Test and Upload Results

Run tests and automatically upload results to the Web Platform (this is the default behavior unless uploads are disabled in your config):

fluxloop test

To explicitly ensure upload or override a config that skips it:

fluxloop test --no-skip-upload

CI/CD Mode

Skip confirmation prompts for automation:

fluxloop test --yes --no-skip-upload

How It Works

1. Load Test Bundle

The command loads synthetic inputs from:

  • Local bundle: Generated with fluxloop bundles create
  • Scenario: Pulled from Web Platform with fluxloop scenarios pull
# Example bundle structure
bundle_id: bundle_20241101_143022
inputs:
  - id: input_001
    text: "How do I get started?"
    persona: novice_user
    context:
      user_type: beginner
      goal: onboarding
  - id: input_002
    text: "What are the advanced features?"
    persona: expert_user

2. Execute Agent

For each input, the command runs your agent according to the runner configuration:

# fluxloop.yaml
runner:
  target: "src.agent:run"
  type: python-function

test:
  iterations: 1
  timeout: 30
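
The target must resolve to an importable Python callable. Below is a minimal sketch of what src/agent.py could look like, assuming the runner calls the function once per synthetic input and passes the input text as a string; the exact signature depends on your runner configuration and SDK setup.

# src/agent.py (illustrative only; the real signature depends on your runner config)

def run(input_text: str) -> str:
    """Handle one synthetic input and return the agent's reply."""
    # Replace this placeholder with your actual agent logic
    # (LLM calls, tool use, retrieval, etc.).
    if "started" in input_text.lower():
        return "Install the SDK, create a bundle, then run fluxloop test."
    return f"You asked: {input_text}"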

3. Collect Results

Captures execution data:

  • Input/output pairs
  • Execution time
  • Trace data (if using FluxLoop SDK)
  • Errors and exceptions

4. Save Results

Outputs to ./fluxloop/results/run_YYYYMMDD_HHMMSS/:

results/run_20241101_143022/
├── summary.json # Aggregate statistics
├── results.jsonl # Per-input results
├── traces.jsonl # Detailed traces (if SDK used)
└── config.yaml # Test configuration snapshot
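
Because run directories are timestamped, the latest run sorts last. A small sketch for locating it, assuming the default ./fluxloop/results layout shown above:

from pathlib import Path

# run_YYYYMMDD_HHMMSS names sort chronologically as strings, so max() is the newest run
latest = max(Path("fluxloop/results").glob("run_*"))
print("Latest run:", latest)
print((latest / "summary.json").read_text())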

Output Files

summary.json

Aggregate test statistics:

{
  "test_id": "run_20241101_143022",
  "bundle_id": "bundle_20241101_140000",
  "total_inputs": 50,
  "total_runs": 500,
  "successful": 495,
  "failed": 5,
  "avg_duration_ms": 245.5,
  "started_at": "2024-11-01T14:30:22Z",
  "completed_at": "2024-11-01T14:35:10Z"
}
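
These fields make it easy to gate a CI/CD pipeline on test health. A minimal sketch, assuming the field names above and the run path from the example; the 95% threshold is an example, not a FluxLoop feature.

# gate_on_summary.py (illustrative CI check, not part of the CLI)
import json
import sys
from pathlib import Path

summary = json.loads(Path("fluxloop/results/run_20241101_143022/summary.json").read_text())
rate = summary["successful"] / summary["total_runs"]
print(f"Success rate: {rate:.1%} ({summary['failed']} failed)")
sys.exit(0 if rate >= 0.95 else 1)  # fail the build below the example 95% threshold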

results.jsonl

One line per test run:

{"input_id": "input_001", "iteration": 0, "persona": "novice_user", "input": "How do I start?", "output": "...", "duration_ms": 245, "success": true}
{"input_id": "input_002", "iteration": 0, "persona": "expert_user", "input": "What are options?", "output": "...", "duration_ms": 267, "success": true}

Testing Against Scenarios

Scenarios are curated test suites managed on the Web Platform. They include:

  • Predefined synthetic inputs
  • Expected behavior criteria
  • Regression test cases
  • Production edge cases

Pull and Test Scenario

# Pull scenario from Web Platform
fluxloop sync pull --scenario prod-regression-v2

# Run tests against it
fluxloop test --scenario prod-regression-v2

Automatic Evaluation

When testing against scenarios with evaluation criteria:

# Run test and evaluate against criteria
fluxloop test --scenario prod-regression-v2

Results are automatically evaluated on the Web Platform.

Integration with Web Platform

Upload Results

Test results are uploaded to the Web Platform automatically by default. Use the flag below to force an upload when your config disables it:

fluxloop test --no-skip-upload

This enables:

  • Visual result analysis
  • Team collaboration
  • Historical tracking
  • Automated evaluation

View Results

After uploading, you'll get a URL to view results:

[INFO] Test completed successfully
[INFO] Results uploaded to Web Platform
[INFO] View results at: https://results.fluxloop.ai/run/run_20241101_143022

See Viewing Results for details.

Error Handling

Graceful Failures

If an agent run fails:

  • Error is captured in the result
  • Test continues with next input
  • Summary shows failure count

Critical Failures

If configuration is invalid:

  • Test stops immediately
  • Error message shows what to fix

Example:

$ fluxloop test
[ERROR] Runner target not found: src.agent:run
[ERROR] Please check your fluxloop.yaml configuration

Performance Tips

Parallel Execution

Configure parallel execution in your config:

test:
  parallel: 4  # Run 4 tests in parallel

Batch Testing

For large test suites, consider batching:

# Test specific subset
fluxloop test --bundle bundle_batch1 --output-dir results/batch1
fluxloop test --bundle bundle_batch2 --output-dir results/batch2

Best Practices

1. Use Multiple Iterations

Test consistency by running multiple iterations:

fluxloop test --iterations 10

This helps identify:

  • Non-deterministic behavior
  • Edge case failures
  • Performance variance
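
One way to surface non-deterministic behavior is to compare outputs for the same input across iterations. A sketch over results.jsonl, assuming the per-run fields shown earlier:

import json
from collections import defaultdict
from pathlib import Path

outputs = defaultdict(set)
for line in Path("fluxloop/results/run_20241101_143022/results.jsonl").read_text().splitlines():
    if not line.strip():
        continue
    run = json.loads(line)
    outputs[run["input_id"]].add(run["output"])

for input_id, distinct in sorted(outputs.items()):
    if len(distinct) > 1:
        print(f"{input_id}: {len(distinct)} distinct outputs across iterations")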

2. Regular Scenario Testing

Pull and test against scenarios regularly:

# In CI/CD pipeline
fluxloop scenarios pull production
fluxloop test --scenario production --yes --no-skip-upload

3. Automatic Upload

Results are automatically uploaded for team visibility and historical tracking.

fluxloop test

Next Steps

  • bundles - Create and manage test bundles
  • scenarios - Pull scenarios from Web Platform
  • criteria - View evaluation criteria
  • sync - Sync results with Web Platform