test Command
Run agent tests with synthetic inputs from bundles or scenarios.
Usage
```bash
fluxloop test [OPTIONS]
```
Options
| Option | Description | Default |
|---|---|---|
| `--bundle`, `-b` | Bundle ID or path to use | Latest bundle |
| `--scenario`, `-s` | Scenario ID to test against | None |
| `--iterations`, `-i` | Number of runs per input | 1 |
| `--config`, `-c` | Path to configuration file | `fluxloop.yaml` |
| `--output-dir` | Output directory for results | `./fluxloop/results` |
| `--yes`, `-y` | Skip confirmation prompt | `false` |
| `--skip-upload` / `--no-skip-upload` | Skip or force the result upload after the test | Follows the `auto_upload` config setting (default: `true`) |
Examples
Basic Test
Run a test with the latest local bundle:
```bash
fluxloop test
```
Test Specific Scenario
Run against a scenario from the Web Platform:
```bash
fluxloop test --scenario prod-regression-v2
```
Multiple Iterations
Run each input multiple times to test consistency:
```bash
fluxloop test --iterations 10
```
Test and Upload Results
Run tests and automatically upload the results to the Web Platform (the default behavior unless disabled in your config):
```bash
fluxloop test
```
To force the upload, or to override a config that skips it:
```bash
fluxloop test --no-skip-upload
```
CI/CD Mode
Skip confirmation prompts for automation:
```bash
fluxloop test --yes --no-skip-upload
```
How It Works
1. Load Test Bundle
The command loads synthetic inputs from:
- Local bundle: generated with `fluxloop bundles create`
- Scenario: pulled from the Web Platform with `fluxloop scenarios pull`
```yaml
# Example bundle structure
bundle_id: bundle_20241101_143022
inputs:
  - id: input_001
    text: "How do I get started?"
    persona: novice_user
    context:
      user_type: beginner
      goal: onboarding
  - id: input_002
    text: "What are the advanced features?"
    persona: expert_user
```
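To sanity-check a bundle before running it, a short script can list its inputs. The sketch below assumes the bundle is stored as a YAML file with the structure shown above; the file path is illustrative, so point it at your actual generated bundle.

```python
# Illustrative only: list the inputs in a bundle file.
# The path is a placeholder; use the location of your generated bundle.
import yaml

with open("fluxloop/bundles/bundle_20241101_143022.yaml") as f:
    bundle = yaml.safe_load(f)

print(f"Bundle {bundle['bundle_id']}: {len(bundle['inputs'])} inputs")
for item in bundle["inputs"]:
    persona = item.get("persona", "default")
    print(f"  {item['id']} [{persona}]: {item['text']}")
```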
2. Execute Agent
For each input, the command runs your agent according to the runner configuration:
```yaml
# fluxloop.yaml
runner:
  target: "src.agent:run"
  type: python-function
test:
  iterations: 1
  timeout: 30
```
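The `target` above is a `module:function` reference. As a rough sketch (the exact signature your runner type expects may differ, so check your runner documentation), a minimal Python target could look like this:

```python
# src/agent.py -- illustrative sketch of a runner target.
# Assumes the runner calls the function with the synthetic input text;
# adapt the signature to what your runner type actually passes.
def run(input_text: str) -> str:
    """Handle one synthetic input and return the agent's reply."""
    # Replace with your real agent logic (LLM calls, tools, retrieval, ...).
    if "get started" in input_text.lower():
        return "Install the CLI, then follow the quickstart guide."
    return f"Received: {input_text}"
```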
3. Collect Results
Captures execution data:
- Input/output pairs
- Execution time
- Trace data (if using FluxLoop SDK)
- Errors and exceptions
4. Save Results
Outputs to `./fluxloop/results/run_YYYYMMDD_HHMMSS/`:
```
results/run_20241101_143022/
├── summary.json     # Aggregate statistics
├── results.jsonl    # Per-input results
├── traces.jsonl     # Detailed traces (if SDK used)
└── config.yaml      # Test configuration snapshot
```
Output Files
summary.json
Aggregate test statistics:
```json
{
  "test_id": "run_20241101_143022",
  "bundle_id": "bundle_20241101_140000",
  "total_inputs": 50,
  "total_runs": 500,
  "successful": 495,
  "failed": 5,
  "avg_duration_ms": 245.5,
  "started_at": "2024-11-01T14:30:22Z",
  "completed_at": "2024-11-01T14:35:10Z"
}
```
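These fields make it easy to gate a CI job on the outcome. A minimal sketch, assuming the run directory shown above (adjust the path to your own run):

```python
# Illustrative CI gate: exit non-zero when summary.json reports failed runs.
import json
import sys

with open("fluxloop/results/run_20241101_143022/summary.json") as f:
    summary = json.load(f)

print(f"{summary['successful']}/{summary['total_runs']} runs succeeded, "
      f"avg {summary['avg_duration_ms']:.1f} ms")
if summary["failed"] > 0:
    sys.exit(1)
```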
results.jsonl
One line per test run:
{"input_id": "input_001", "iteration": 0, "persona": "novice_user", "input": "How do I start?", "output": "...", "duration_ms": 245, "success": true}
{"input_id": "input_002", "iteration": 0, "persona": "expert_user", "input": "What are options?", "output": "...", "duration_ms": 267, "success": true}
Testing Against Scenarios
Scenarios are curated test suites managed on the Web Platform. They include:
- Predefined synthetic inputs
- Expected behavior criteria
- Regression test cases
- Production edge cases
Pull and Test Scenario
```bash
# Pull the scenario from the Web Platform
fluxloop scenarios pull prod-regression-v2

# Run tests against it
fluxloop test --scenario prod-regression-v2
```
Automatic Evaluation
When testing against scenarios with evaluation criteria:
```bash
# Run test and evaluate against criteria
fluxloop test --scenario prod-regression-v2
```
Results are automatically evaluated on the Web Platform.
Integration with Web Platform
Upload Results
Test results are uploaded to the Web Platform automatically by default. To force the upload when your config disables it:
```bash
fluxloop test --no-skip-upload
```
This enables:
- Visual result analysis
- Team collaboration
- Historical tracking
- Automated evaluation
View Results
After uploading, you'll get a URL to view results:
```
[INFO] Test completed successfully
[INFO] Results uploaded to Web Platform
[INFO] View results at: https://results.fluxloop.ai/run/run_20241101_143022
```
See Viewing Results for details.
Error Handling
Graceful Failures
If an agent run fails:
- Error is captured in the result
- Test continues with next input
- Summary shows failure count
Critical Failures
If configuration is invalid:
- Test stops immediately
- Error message shows what to fix
Example:
```
$ fluxloop test
[ERROR] Runner target not found: src.agent:run
[ERROR] Please check your fluxloop.yaml configuration
```
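A quick preflight check can catch this before a long run. The helper below is a sketch that only assumes the `module:function` convention from the runner configuration shown earlier:

```python
# Illustrative preflight: verify the runner target resolves to a callable.
import importlib

def check_target(target: str = "src.agent:run") -> None:
    module_name, func_name = target.split(":")
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise SystemExit(f"Runner target not found: {target} ({exc})")
    if not callable(getattr(module, func_name, None)):
        raise SystemExit(f"Runner target not found: {target}")

check_target()
```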
Performance Tips
Parallel Execution
Configure parallel execution in your config:
```yaml
test:
  parallel: 4  # Run 4 tests in parallel
```
Batch Testing
For large test suites, consider batching:
```bash
# Test specific subsets
fluxloop test --bundle bundle_batch1 --output-dir results/batch1
fluxloop test --bundle bundle_batch2 --output-dir results/batch2
```
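If you drive batches from a script, the same commands can be invoked programmatically. A sketch using only the flags documented above (bundle IDs and output directories are placeholders):

```python
# Illustrative: run several bundles as separate batches via the CLI.
import subprocess

for bundle in ["bundle_batch1", "bundle_batch2"]:
    subprocess.run(
        ["fluxloop", "test", "--bundle", bundle,
         "--output-dir", f"results/{bundle}", "--yes"],
        check=True,  # stop if a batch fails
    )
```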
Best Practices
1. Use Multiple Iterations
Test consistency by running multiple iterations:
```bash
fluxloop test --iterations 10
```
This helps identify:
- Non-deterministic behavior
- Edge case failures
- Performance variance
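One way to spot these is to group results.jsonl by input and compare iterations. A sketch using the per-run fields shown earlier (path illustrative):

```python
# Illustrative: measure output consistency and timing spread per input.
import json
from collections import defaultdict
from statistics import mean, pstdev

by_input = defaultdict(list)
with open("fluxloop/results/run_20241101_143022/results.jsonl") as f:
    for line in f:
        record = json.loads(line)
        by_input[record["input_id"]].append(record)

for input_id, runs in by_input.items():
    distinct = len({r["output"] for r in runs})
    durations = [r["duration_ms"] for r in runs]
    print(f"{input_id}: {distinct} distinct outputs over {len(runs)} runs, "
          f"{mean(durations):.0f} ms avg (±{pstdev(durations):.0f} ms)")
```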
2. Regular Scenario Testing
Pull and test against scenarios regularly:
```bash
# In CI/CD pipeline
fluxloop scenarios pull production
fluxloop test --scenario production --yes --no-skip-upload
```
3. Automatic Upload
Results are uploaded automatically by default, which gives the team visibility and keeps a historical record:
```bash
fluxloop test
```
Related Commands
- `bundles` - Create and manage test bundles
- `scenarios` - Pull scenarios from the Web Platform
- `criteria` - View evaluation criteria
- `sync` - Sync results with the Web Platform
Next Steps
- Basic Workflow - End-to-end testing guide
- Testing Best Practices - Tips and patterns
- Evaluation Guide - Understanding test results