fluxloop criteria

Manage evaluation criteria.

Synopsis

fluxloop criteria [command] [options]

Description

The criteria command manages evaluation criteria for test scenarios. Criteria define what makes a test pass or fail.

Commands

fluxloop criteria list

List all evaluation criteria.

Usage:

fluxloop criteria list [options]

Options:

  • --scenario <name>: List criteria for a specific scenario
  • --type <type>: Filter by criterion type
  • --json: Output in JSON format

Examples:

# List all criteria
fluxloop criteria list

# List criteria for specific scenario
fluxloop criteria list --scenario password-reset

# List criteria of specific type
fluxloop criteria list --type contains

Output:

Evaluation Criteria

Scenario: password-reset
┌────────────────────────────────┐
│ contains-reset-link            │
│ Type: contains                 │
│ Field: response                │
│ Value: "reset link"            │
│ Required: Yes | Weight: 0.3    │
│ Pass rate: 95% (last 30 days)  │
└────────────────────────────────┘

┌────────────────────────────────┐
│ response-time                  │
│ Type: response_time            │
│ Threshold: < 3000ms            │
│ Required: Yes | Weight: 0.2    │
│ Pass rate: 82% (last 30 days)  │
└────────────────────────────────┘

┌────────────────────────────────┐
│ empathy-check                  │
│ Type: sentiment                │
│ Min score: 0.6                 │
│ Required: No | Weight: 0.3     │
│ Pass rate: 78% (last 30 days)  │
└────────────────────────────────┘

Total: 3 criteria

fluxloop criteria create

Create a new evaluation criterion.

Usage:

fluxloop criteria create [options]

Options:

  • --scenario <name>: Scenario to add criterion to (required)
  • --type <type>: Criterion type (required)
  • --name <name>: Criterion name/ID
  • --required: Mark as required (default: true)
  • --weight <weight>: Weight for scoring (0.0-1.0, default: 1.0)

Type-specific options:

For contains type:

  • --field <field>: Field to check (response, metadata, etc.)
  • --value <value>: Value to search for

For response_time type:

  • --threshold <ms>: Max response time in milliseconds

For sentiment type:

  • --min-score <score>: Minimum sentiment score (0.0-1.0)

For regex type:

  • --pattern <regex>: Regular expression pattern

Examples:

# Create "contains" criterion
fluxloop criteria create \
  --scenario password-reset \
  --type contains \
  --name "mentions-email" \
  --field response \
  --value "email"

# Create response time criterion
fluxloop criteria create \
  --scenario password-reset \
  --type response_time \
  --threshold 3000 \
  --required

# Create sentiment criterion
fluxloop criteria create \
  --scenario password-reset \
  --type sentiment \
  --min-score 0.6 \
  --weight 0.3

# Create regex criterion (single quotes keep the shell from touching the pattern)
fluxloop criteria create \
  --scenario order-tracking \
  --type regex \
  --name "order-number-format" \
  --pattern '^#[0-9]{6}$'

Interactive Flow:

$ fluxloop criteria create --scenario password-reset

Create Evaluation Criterion

Scenario: password-reset

Criterion types:
1. contains - Response contains specific text
2. not_contains - Response doesn't contain text
3. response_time - Response time under threshold
4. sentiment - Sentiment analysis score
5. regex - Regular expression match
6. json_schema - JSON schema validation
7. custom - Custom evaluation function

Select type (1-7): 1

Field to check:
1. response - Agent response text
2. metadata - Response metadata
3. tool_calls - Tool invocations
4. all - All fields

Select field (1-4): 1

Value to search for: reset link

Criterion name (optional): contains-reset-link

Required? (y/n): y

Weight (0.0-1.0, default 1.0): 0.3

✅ Criterion created: contains-reset-link

Criterion added to: scenarios/password-reset.yaml

Next steps:
• Run test to validate: fluxloop test --scenario password-reset
• View criteria: fluxloop criteria list --scenario password-reset

fluxloop criteria show

Display details for a specific criterion.

Usage:

fluxloop criteria show <criterion-id> [options]

Options:

  • --scenario <name>: Scenario containing the criterion (required)
  • --json: Output in JSON format

Examples:

# Show criterion details
fluxloop criteria show contains-reset-link \
  --scenario password-reset

Output:

Criterion: contains-reset-link

Scenario: password-reset
Type: contains
Field: response
Value: "reset link"

Configuration:
Required: Yes
Weight: 0.3
Case sensitive: No

Performance (Last 30 days):
Tests run: 156
Passed: 148 (95%)
Failed: 8 (5%)

Avg check time: 12ms

Recent failures:
• Jan 15 14:30 - "password reset" (missing "link")
• Jan 14 11:20 - "sent reset email" (missing "link")
• Jan 13 16:45 - "check your inbox" (missing "reset link")

Suggestions:
• Consider accepting variations: "reset email", "reset message"
• Current strict matching may be too rigid
• 5% failure rate is within acceptable range

fluxloop criteria update

Update an existing criterion.

Usage:

fluxloop criteria update <criterion-id> [options]

Options:

  • --scenario <name>: Scenario containing the criterion (required)
  • --required <bool>: Update required flag
  • --weight <weight>: Update weight
  • --value <value>: Update value (for contains/regex types)
  • --threshold <ms>: Update threshold (for response_time type)

Examples:

# Update criterion weight
fluxloop criteria update contains-reset-link \
  --scenario password-reset \
  --weight 0.5

# Make criterion optional
fluxloop criteria update empathy-check \
  --scenario password-reset \
  --required false

# Update threshold
fluxloop criteria update response-time \
  --scenario password-reset \
  --threshold 5000

fluxloop criteria delete

Delete a criterion.

Usage:

fluxloop criteria delete <criterion-id> [options]

Options:

  • --scenario <name>: Scenario containing the criterion (required)
  • --force: Skip confirmation prompt

Examples:

# Delete criterion (with confirmation)
fluxloop criteria delete old-criterion \
  --scenario password-reset

# Delete without confirmation
fluxloop criteria delete old-criterion \
  --scenario password-reset \
  --force

fluxloop criteria pull

Pull criteria from the cloud.

Usage:

fluxloop criteria pull [options]

Options:

  • --scenario <name>: Pull criteria for a specific scenario
  • --all: Pull criteria for all scenarios (default)

Examples:

# Pull all criteria
fluxloop criteria pull

# Pull criteria for specific scenario
fluxloop criteria pull --scenario password-reset

fluxloop criteria push

Push criteria to the cloud.

Usage:

fluxloop criteria push [options]

Options:

  • --scenario <name>: Push criteria for a specific scenario
  • --all: Push criteria for all scenarios (default)

Examples:

# Push all criteria
fluxloop criteria push

# Push criteria for specific scenario
fluxloop criteria push --scenario password-reset

Criterion Types

1. Contains

Check if response contains specific text:

- id: contains-reset-link
  type: contains
  field: response
  value: "reset link"
  required: true
  weight: 0.3
  case_sensitive: false
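As a rough illustration of what a contains check with case_sensitive amounts to, the sketch below does a plain substring test. The helper name contains_check is hypothetical and not part of fluxloop; the tool's actual matching rules may differ.

```python
def contains_check(response: str, value: str, case_sensitive: bool = False) -> bool:
    """Return True if `value` occurs anywhere in `response` (hypothetical helper)."""
    if not case_sensitive:
        # Fold case on both sides so "Reset Link" matches "reset link".
        response, value = response.casefold(), value.casefold()
    return value in response


print(contains_check("Here is your Reset Link.", "reset link"))        # True
print(contains_check("Here is your Reset Link.", "reset link", True))  # False
```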

2. Not Contains

Check if response doesn't contain specific text:

- id: no-error-messages
  type: not_contains
  field: response
  value: ["error", "failed", "cannot"]
  required: true
  weight: 0.2
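The source does not spell out how a list-valued not_contains is evaluated. A minimal sketch, assuming the criterion passes only when none of the listed strings appears (case-insensitively) in the response; not_contains_check is a hypothetical name used for illustration only:

```python
def not_contains_check(response: str, values: list[str]) -> bool:
    """Pass only when none of `values` occurs in `response` (assumed semantics)."""
    text = response.casefold()
    return not any(v.casefold() in text for v in values)


forbidden = ["error", "failed", "cannot"]
print(not_contains_check("Your reset link is on its way.", forbidden))  # True
print(not_contains_check("Sorry, the request failed.", forbidden))      # False
```

Because this is substring matching, a word such as "terror" would also trip the "error" entry; a regex criterion with word boundaries is the usual way around that.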

3. Response Time

Check if response time is under threshold:

- id: response-time
  type: response_time
  threshold_ms: 3000
  required: true
  weight: 0.2
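To make the threshold concrete, here is a small sketch that times a call and compares the elapsed milliseconds against threshold_ms using a strict "less than", matching the "< 3000ms" shown in the list output. Both timed_call and fake_agent are hypothetical stand-ins; how fluxloop measures response time internally is not stated in this page.

```python
import time


def timed_call(fn, *args, threshold_ms: float = 3000):
    """Call fn(*args) and return (result, elapsed_ms, passed)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms, elapsed_ms < threshold_ms


def fake_agent(prompt: str) -> str:
    """Stand-in for a real agent call; sleeps briefly to simulate latency."""
    time.sleep(0.05)
    return "We've sent you a reset link."


result, elapsed_ms, passed = timed_call(fake_agent, "Forgot my password")
print(f"{elapsed_ms:.0f} ms, passed={passed}")
```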

4. Sentiment

Analyze sentiment of response:

- id: empathy-check
  type: sentiment
  min_score: 0.6
  max_score: 1.0
  required: false
  weight: 0.3

5. Regex

Match response against regular expression:

- id: order-number-format
  type: regex
  field: response
  pattern: "#[0-9]{6}"
  required: true
  weight: 0.3
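Whether the pattern is anchored matters a great deal for this criterion type. The sketch below uses Python's re module to compare the unanchored pattern with an anchored form, assuming the criterion searches anywhere in the response (an assumption; the page does not say whether fluxloop uses search or full-match semantics).

```python
import re

response = "Your order #123456 has shipped."

unanchored = re.compile(r"#[0-9]{6}")    # finds an order number anywhere in the text
anchored = re.compile(r"^#[0-9]{6}$")    # requires the whole text to be the order number

print(bool(unanchored.search(response)))  # True
print(bool(anchored.search(response)))    # False
print(bool(anchored.search("#123456")))   # True
```

For free-form agent responses the unanchored form is usually what is wanted; the anchored form only suits fields that contain nothing but the value.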

6. JSON Schema

Validate JSON response structure:

- id: api-response-schema
  type: json_schema
  schema:
    type: object
    required: ["status", "data"]
    properties:
      status:
        type: string
        enum: ["success", "error"]
      data:
        type: object
  required: true
  weight: 0.4
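To show what this particular schema accepts and rejects, the following hand-rolled sketch checks only the constraints listed above using the standard library. It is not a general JSON Schema validator and is not how fluxloop validates; check_api_response is a hypothetical name.

```python
import json


def check_api_response(raw: str) -> tuple[bool, str]:
    """Check only the constraints of the example schema; returns (passed, reason)."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(doc, dict):
        return False, "top level must be an object"
    for key in ("status", "data"):
        if key not in doc:
            return False, f"missing required key: {key}"
    if doc["status"] not in ("success", "error"):
        return False, "status must be 'success' or 'error'"
    if not isinstance(doc["data"], dict):
        return False, "data must be an object"
    return True, "ok"


print(check_api_response('{"status": "success", "data": {"id": 1}}'))
print(check_api_response('{"status": "pending", "data": {}}'))
print(check_api_response('{"status": "error"}'))
```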

7. Custom Function

Custom evaluation logic:

- id: custom-validation
  type: custom
  function: |
    def evaluate(response, context):
        # Custom validation logic
        if "password" in response.lower():
            if "reset" in response.lower():
                return True, "Mentions password reset"
            else:
                return False, "Mentions password but not reset"
        return False, "Doesn't mention password"
  required: true
  weight: 0.3
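A custom function like this can be exercised on its own before it goes into a scenario file. The sketch below repeats the evaluate function from the example and runs it against three sample responses; passing an empty dict for context is an assumption, since this page does not describe what fluxloop supplies there.

```python
def evaluate(response, context):
    """Same logic as the example criterion: returns (passed, reason)."""
    text = response.lower()
    if "password" in text:
        if "reset" in text:
            return True, "Mentions password reset"
        return False, "Mentions password but not reset"
    return False, "Doesn't mention password"


samples = [
    "I've sent a password reset email.",
    "Please enter your password again.",
    "Check your inbox.",
]
for sample in samples:
    # context={} is a placeholder; the real context contents are not documented here.
    print(evaluate(sample, context={}))
```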

Criteria File Format

Criteria are typically embedded in scenario files:

# scenarios/password-reset.yaml
name: password-reset
description: Test password reset flow

personas:
  - frustrated-user
  - tech-savvy-user

inputs:
  - "I can't login"
  - "Forgot my password"

criteria:
  - id: contains-reset-link
    type: contains
    field: response
    value: "reset link"
    required: true
    weight: 0.3
    description: "Agent mentions sending a reset link"

  - id: response-time
    type: response_time
    threshold_ms: 3000
    required: true
    weight: 0.2
    description: "Response within 3 seconds"

  - id: empathy-check
    type: sentiment
    min_score: 0.6
    required: false
    weight: 0.3
    description: "Empathetic and helpful tone"

  - id: follow-up-offered
    type: contains
    field: response
    value: ["help", "contact", "support", "assist"]
    required: false
    weight: 0.2
    description: "Offers additional help"
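Before committing a scenario file, a quick sanity pass over its criteria can catch simple mistakes. The sketch below works on an already-parsed list of dicts mirroring the file above and checks two things: that ids are unique (an assumption, suggested by the fact that commands address criteria by id within a scenario) and that weights fall in the documented 0.0-1.0 range. lint_criteria is a hypothetical helper, not a fluxloop feature.

```python
criteria = [
    {"id": "contains-reset-link", "type": "contains", "required": True, "weight": 0.3},
    {"id": "response-time", "type": "response_time", "required": True, "weight": 0.2},
    {"id": "empathy-check", "type": "sentiment", "required": False, "weight": 0.3},
    {"id": "follow-up-offered", "type": "contains", "required": False, "weight": 0.2},
]


def lint_criteria(items: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    seen = set()
    for item in items:
        cid = item.get("id", "<missing id>")
        if cid in seen:
            problems.append(f"duplicate id: {cid}")
        seen.add(cid)
        weight = item.get("weight", 1.0)  # 1.0 is the documented CLI default
        if not 0.0 <= weight <= 1.0:
            problems.append(f"{cid}: weight {weight} outside 0.0-1.0")
    return problems


print(lint_criteria(criteria))  # []
```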

Best Practices

1. Balance Required and Optional

criteria:
  # Required criteria (must pass)
  - id: core-functionality
    required: true
    weight: 0.5

  # Optional criteria (nice to have)
  - id: extra-feature
    required: false
    weight: 0.2

2. Use Appropriate Weights

criteria:
  # Critical (high weight)
  - id: security-check
    weight: 0.4

  # Important (medium weight)
  - id: functionality-check
    weight: 0.3

  # Nice-to-have (low weight)
  - id: tone-check
    weight: 0.1
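This page does not state how fluxloop combines weights and required flags into a final result. One plausible reading, offered purely as an assumption, is that a test passes only if every required criterion passes, while the score is the weight-normalized share of passed criteria. The sketch below applies that reading to the four criteria from the password-reset scenario; Result and summarize are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Result:
    id: str
    passed: bool
    required: bool
    weight: float


def summarize(results: list[Result]) -> tuple[bool, float]:
    """Assumed aggregation: (all required passed, weighted share of passed criteria)."""
    total = sum(r.weight for r in results)
    earned = sum(r.weight for r in results if r.passed)
    score = earned / total if total else 0.0
    all_required_passed = all(r.passed for r in results if r.required)
    return all_required_passed, score


results = [
    Result("contains-reset-link", True, True, 0.3),
    Result("response-time", True, True, 0.2),
    Result("empathy-check", False, False, 0.3),
    Result("follow-up-offered", True, False, 0.2),
]
passed, score = summarize(results)
print(passed, round(score, 2))  # True 0.7
```

Under this reading, a failing optional criterion lowers the score without failing the test, which is the practical difference between required and optional criteria.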

3. Provide Clear Descriptions

- id: mentions-timeline
  type: contains
  value: ["minutes", "hours", "shortly", "soon"]
  description: |
    Agent provides a timeline for resolution.
    Helps set user expectations.
    Pass examples: "within 10 minutes", "shortly"
    Fail examples: no timeline mentioned

4. Test Criteria Independently

# Test single criterion
fluxloop test \
  --scenario password-reset \
  --criterion contains-reset-link

5. Monitor Criterion Performance

# View criterion statistics
fluxloop criteria show contains-reset-link \
  --scenario password-reset

Troubleshooting

Criterion Always Fails

⚠️  Warning: Criterion 'contains-reset-link' has 0% pass rate

Scenario: password-reset
Tests run: 10
Passed: 0 (0%)

Recent responses:
• "I've sent a password reset email"
• "Check your inbox for reset instructions"
• "You'll receive a reset message shortly"

Suggestion:
• Value "reset link" is too specific
• Consider accepting variations: ["reset", "password reset"]
• Or use regex: "reset.*(link|email|message)"
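It is worth trying a suggested replacement against the recent responses before adopting it. The sketch below runs the literal value and the suggested regex over the three responses from the warning, using Python's re module with case-insensitive search (the matching mode is an assumption about how the criterion would be applied).

```python
import re

responses = [
    "I've sent a password reset email",
    "Check your inbox for reset instructions",
    "You'll receive a reset message shortly",
]
pattern = re.compile(r"reset.*(link|email|message)", re.IGNORECASE)

literal_hits = ["reset link" in text.lower() for text in responses]
regex_hits = [bool(pattern.search(text)) for text in responses]

for text, literal, regex in zip(responses, literal_hits, regex_hits):
    print(f"literal={literal} regex={regex} :: {text}")
```

The literal value matches none of the three, and the regex matches the first and third but still misses "reset instructions", so the alternation would need another entry (or the broader "reset" value) to cover all of them.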

Criterion Too Lenient

⚠️  Warning: Criterion 'mentions-help' has 100% pass rate

This criterion might be too lenient and not providing value.

Suggestion:
• Review criterion requirements
• Make it more specific
• Or remove if not needed

See Also