Viewing Results

Learn how to view and analyze test results on the FluxLoop Web Platform.

Accessing Results

Via Web Dashboard

1. Log in to results.fluxloop.ai
2. Select a project
3. Select a scenario
4. View the list of recent test runs

Direct Links

When you run tests locally, the result URL is automatically displayed:

$ fluxloop test

✅ Test completed successfully

📊 View results at:
https://results.fluxloop.ai/project/customer-support/run/abc123

Claude Code also provides the link:

/fluxloop test

✅ Test completed

View results: https://results.fluxloop.ai/...
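You may want to capture the printed link in a script, for example to log the run ID in CI. The example links on this page follow a `/project/<project>/run/<run_id>` path, and public share links use `/share/<token>`. The sketch below assumes those shapes, which are inferred only from the examples here and are not a documented URL format. `parse_result_url` is a hypothetical helper and is not part of the FluxLoop CLI.

```python
from urllib.parse import urlparse

def parse_result_url(url: str) -> dict:
    """Split a result URL into its parts.

    Assumption: paths look like /project/<project>/run/<run_id> (team link)
    or /share/<token> (public link), as in this page's examples.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) == 4 and parts[0] == "project" and parts[2] == "run":
        return {"kind": "team", "project": parts[1], "run_id": parts[3]}
    if len(parts) == 2 and parts[0] == "share":
        return {"kind": "public", "token": parts[1]}
    raise ValueError(f"Unrecognized result URL: {url}")

print(parse_result_url("https://results.fluxloop.ai/project/customer-support/run/abc123"))
```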

Result Overview

Test Run Summary

Summary information for each test run:

┌─────────────────────────────────────────┐
│ Test Run: password-reset-test           │
│ Status: ✅ Passed                        │
│ Duration: 12.3s                         │
│ Started: 2024-01-15 14:30:00            │
│ Scenarios: 5/5 passed                   │
└─────────────────────────────────────────┘

Key Metrics:

Status: Passed / Failed / Partial
Duration: Total execution time
Scenarios: Number of scenarios run
Success Rate: Percentage of passed scenarios
Token Usage: Total tokens consumed
Cost: Estimated API cost

Scenario Results

Detailed results for each scenario:

Scenario	Status	Duration	Turns	Tokens	Cost
New User	✅ Pass	3.2s	4	1,245	$0.02
Frustrated User	✅ Pass	4.1s	6	2,103	$0.03
Technical User	❌ Fail	5.5s	8	3,421	$0.05
Mobile User	✅ Pass	2.8s	3	987	$0.01
Returning User	✅ Pass	3.5s	5	1,654	$0.02
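Several of the Key Metrics are roll-ups of the per-scenario rows. The sketch below derives them from the table above. The `ScenarioResult` fields are hypothetical names chosen for illustration. Total duration is left out because it depends on whether scenarios run one after another or concurrently, and this page does not say which.

```python
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    name: str
    passed: bool
    duration_s: float
    turns: int
    tokens: int
    cost_usd: float

def summarize(results: list[ScenarioResult]) -> dict:
    """Roll per-scenario rows up into run-level metrics."""
    passed = sum(r.passed for r in results)           # True counts as 1
    return {
        "scenarios": f"{passed}/{len(results)} passed",
        "success_rate_pct": 100 * passed / len(results),
        "total_tokens": sum(r.tokens for r in results),
        "total_cost_usd": round(sum(r.cost_usd for r in results), 2),
    }

rows = [
    ScenarioResult("New User", True, 3.2, 4, 1245, 0.02),
    ScenarioResult("Frustrated User", True, 4.1, 6, 2103, 0.03),
    ScenarioResult("Technical User", False, 5.5, 8, 3421, 0.05),
    ScenarioResult("Mobile User", True, 2.8, 3, 987, 0.01),
    ScenarioResult("Returning User", True, 3.5, 5, 1654, 0.02),
]
print(summarize(rows))
```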

Conversation View

Interactive Transcript

View the conversation content for each scenario:

┌─────────────────────────────────────────┐
│ Conversation: Frustrated User           │
│ Status: ✅ Passed                        │
│ Duration: 4.1s                          │
│ Turns: 6                                │
└─────────────────────────────────────────┘

👤 User (0.0s)
I can't login! Forgot my password

🤖 Agent (1.2s) [1,245 tokens, $0.02]
I understand how frustrating that can be. I can help
you reset your password right away. Let me send a
reset link to your registered email address.

What email address did you use to register?

👤 User (0.0s)
user@example.com

🤖 Agent (0.9s) [654 tokens, $0.01]
Perfect! I've sent a password reset link to
user@example.com. You should receive it within a
few minutes.

The link will be valid for 24 hours. Once you click it,
you'll be able to create a new password.

Is there anything else I can help you with?

👤 User (0.0s)
No, thanks

🤖 Agent (0.5s) [321 tokens, $0.01]
You're welcome! If you don't receive the email within
10 minutes, please check your spam folder or contact
us again. Have a great day!

✅ Criteria Check: Response contains "reset link" ✓
✅ Criteria Check: Response time < 3s ✓
✅ Criteria Check: Empathetic tone ✓

Trace Details

Detailed information for each message:

Agent Response Details:

Latency: Time taken to respond
Tokens: Input/Output token counts
Cost: Estimated cost
Model: Model used (e.g., claude-3-opus-20240229)
Context: Size of the context passed to the model

Metadata:

Tool Calls: Tools called and their parameters
Documents Retrieved: Documents retrieved for RAG (retrieval-augmented generation)
Embeddings: Embedding vector information
Custom Metadata: User-defined metadata

Filtering & Search

Filter conversation content:

Search in conversation...

Filters:
☑ Show user messages
☑ Show agent messages
☑ Show tool calls
☑ Show errors
□ Show debug info

Sort by:
○ Time (ascending)
● Time (descending)
○ Token count
○ Cost
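The filter panel combines three steps: keep the checked message types, match the search text, then sort. Here is a minimal sketch of that logic, assuming each message carries a role, a timestamp offset, a token count, and a cost. The role names, the `offset_s` field, and the sample offsets are hypothetical. The default roles mirror the panel's defaults, so debug messages are excluded.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str            # "user", "agent", "tool", "error" or "debug" (assumed names)
    text: str
    offset_s: float      # hypothetical: seconds since the conversation started
    tokens: int = 0
    cost_usd: float = 0.0

SORT_KEYS = {
    "time": lambda m: m.offset_s,
    "tokens": lambda m: m.tokens,
    "cost": lambda m: m.cost_usd,
}

def filter_messages(messages, query="", roles=("user", "agent", "tool", "error"),
                    sort_by="time", descending=True):
    """Keep messages whose role is checked and whose text contains the
    search query (case-insensitive), then sort by the chosen key."""
    kept = [m for m in messages
            if m.role in roles and query.lower() in m.text.lower()]
    return sorted(kept, key=SORT_KEYS[sort_by], reverse=descending)

conversation = [
    Message("user", "I can't login! Forgot my password", 0.0),
    Message("agent", "I can help you reset your password right away.", 1.2, 1245, 0.02),
    Message("user", "user@example.com", 5.0),
    Message("agent", "I've sent a password reset link to user@example.com.", 5.9, 654, 0.01),
    Message("user", "No, thanks", 9.0),
    Message("agent", "You're welcome! Have a great day!", 9.5, 321, 0.01),
]

for m in filter_messages(conversation, query="reset", roles=("agent",)):
    print(f"{m.offset_s:>4}s  {m.text}")
```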

Performance Analysis

Time Breakdown

Time spent at each stage:

Total Duration: 4.1s

┌─────────────────────────────┐
│ ██████ Model Inference 2.5s │ 61%
│ ███ Context Retrieval 1.2s  │ 29%
│ █ Tool Execution 0.3s       │  7%
│ ▌ Other 0.1s                │  3%
└─────────────────────────────┘
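The percentages in the chart add up to exactly 100, even though 0.1s of 4.1s is only about 2.4%. The largest-remainder method is one rounding scheme that reproduces these numbers: floor every share, then give the leftover points to the stages with the largest fractional parts. FluxLoop's actual rounding rule is not documented here, so treat this as an illustration.

```python
import math

def percent_shares(parts: dict[str, float]) -> dict[str, int]:
    """Integer percentages that always sum to 100 (largest-remainder method)."""
    total = sum(parts.values())
    raw = {k: 100 * v / total for k, v in parts.items()}
    shares = {k: math.floor(x) for k, x in raw.items()}
    leftover = 100 - sum(shares.values())
    # Hand the remaining points to the entries with the largest fractional parts.
    for k in sorted(raw, key=lambda k: raw[k] - shares[k], reverse=True)[:leftover]:
        shares[k] += 1
    return shares

stages = {"Model Inference": 2.5, "Context Retrieval": 1.2,
          "Tool Execution": 0.3, "Other": 0.1}
print(percent_shares(stages))
# {'Model Inference': 61, 'Context Retrieval': 29, 'Tool Execution': 7, 'Other': 3}
```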

Token Usage

Token usage analysis:

Total Tokens: 2,103

Input:  1,234 tokens ($0.018)
Output:   869 tokens ($0.013)
Total:  2,103 tokens ($0.031)

┌─────────────────────────────┐
│ Breakdown by Turn:          │
│ Turn 1:  645 tokens         │
│ Turn 2:  521 tokens         │
│ Turn 3:  456 tokens         │
│ Turn 4:  301 tokens         │
│ Turn 5:  180 tokens         │
└─────────────────────────────┘
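Input and output tokens are usually priced differently, which is why the page lists separate dollar amounts for each. The sketch below checks that the per-turn breakdown reconciles with the total and estimates cost from per-million-token prices. The prices in the example call are placeholders. They are not FluxLoop's rates or any provider's actual rates.

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    def cost(self, input_per_mtok: float, output_per_mtok: float) -> float:
        """Estimated cost in USD, given prices per million tokens."""
        return (self.input_tokens * input_per_mtok
                + self.output_tokens * output_per_mtok) / 1_000_000

usage = TokenUsage(input_tokens=1234, output_tokens=869)
per_turn = [645, 521, 456, 301, 180]
assert usage.total == sum(per_turn) == 2103   # breakdown should reconcile with the total

# Placeholder prices: substitute the current rates for your model.
print(round(usage.cost(input_per_mtok=10.0, output_per_mtok=20.0), 4))
```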

Cost Analysis

Cost breakdown:

Total Cost: $0.031

By Model:
Claude Opus:    $0.025 (80%)
Claude Sonnet:  $0.004 (13%)
Embeddings:     $0.002 ( 7%)

Daily Spend:    $2.45
Monthly Spend:  $67.20
Projected:      $85.00
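The "By Model" breakdown is a group-by-and-sum over individual API calls. The sketch below uses `Decimal` so that small dollar amounts add up exactly. The per-call records are hypothetical and were chosen so that the totals match the example above.

```python
from collections import defaultdict
from decimal import Decimal

def cost_by_model(calls):
    """Sum per-call costs by model; Decimal avoids float rounding drift."""
    totals = defaultdict(Decimal)          # missing models start at Decimal("0")
    for model, cost in calls:
        totals[model] += Decimal(cost)
    return dict(totals)

# Hypothetical per-call records: (model label, cost in USD as a string).
calls = [
    ("Claude Opus", "0.015"), ("Claude Opus", "0.010"),
    ("Claude Sonnet", "0.004"),
    ("Embeddings", "0.001"), ("Embeddings", "0.001"),
]

by_model = cost_by_model(calls)
total = sum(by_model.values())
for model, cost in by_model.items():
    print(f"{model:<15} ${cost}")
print(f"{'Total':<15} ${total}")
```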

Evaluation Results

Criteria Checks

Evaluation criteria results:

┌─────────────────────────────────────────┐
│ Evaluation Criteria                     │
└─────────────────────────────────────────┘

✅ Response contains "reset link"
   Expected: "reset link"
   Found: "I've sent a password reset link"

✅ Response time < 3s
   Expected: < 3000ms
   Actual: 2,850ms

✅ Empathetic tone detected
   Confidence: 0.92
   Phrases: "I understand", "help you", "right away"

❌ Offered alternative contact method
   Expected: phone/chat mention
   Found: none

   Suggestion: Consider offering phone support for
   urgent password reset requests
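The first two criteria above are deterministic: a phrase match and a latency threshold. Both can be expressed as simple predicates, as in the sketch below. Tone detection needs a classifier or an LLM judge, so it is omitted. `Criterion`, `contains`, and `faster_than` are hypothetical names and do not reflect FluxLoop's criteria configuration format.

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[str, float], bool]   # (response_text, latency_ms) -> passed

@dataclass
class Criterion:
    name: str
    check: Check

def contains(phrase: str) -> Check:
    """Pass if the response mentions the phrase (case-insensitive)."""
    return lambda text, _latency_ms: phrase.lower() in text.lower()

def faster_than(limit_ms: float) -> Check:
    """Pass if the agent answered within the latency limit."""
    return lambda _text, latency_ms: latency_ms < limit_ms

criteria = [
    Criterion('Response contains "reset link"', contains("reset link")),
    Criterion("Response time < 3s", faster_than(3000)),
]

response = "Perfect! I've sent a password reset link to user@example.com."
for c in criteria:
    print("✅" if c.check(response, 2850) else "❌", c.name)
```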

Pass/Fail Status

How a scenario's status is determined:

Passed: All required criteria met
Failed: One or more required criteria not met
Partial: Only some optional criteria met

Comparison View

Multi-Run Comparison

Compare multiple test runs:

Select runs to compare:
☑ Run #123 (Jan 15, 2:30 PM) - Current
☑ Run #122 (Jan 15, 1:15 PM) - Baseline
☑ Run #121 (Jan 14, 11:00 AM) - Previous

                  #123    #122    #121    Δ from #122
Success Rate      80%     100%    60%     -20%
Avg Duration      3.5s    3.2s    4.1s    +0.3s
Avg Tokens        1,856   1,654   2,103   +202
Avg Cost          $0.025  $0.022  $0.031  +$0.003
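Each value in the Δ column is an absolute difference (current run minus baseline). The sketch below computes those differences for #123 against #122. It rounds each delta so that binary floating-point noise does not leak into the report. The metric keys are hypothetical names.

```python
runs = {
    "#123": {"success_rate_pct": 80, "avg_duration_s": 3.5,
             "avg_tokens": 1856, "avg_cost_usd": 0.025},
    "#122": {"success_rate_pct": 100, "avg_duration_s": 3.2,
             "avg_tokens": 1654, "avg_cost_usd": 0.022},
}

def deltas(current: dict, baseline: dict, ndigits: int = 3) -> dict:
    """Absolute difference (current - baseline) for every metric.

    Rounding hides float noise: 3.5 - 3.2 evaluates to 0.2999999999999998.
    """
    return {k: round(current[k] - baseline[k], ndigits) for k in current}

print(deltas(runs["#123"], runs["#122"]))
```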

Scenario-Level Comparison

Compare different runs of the same scenario:

Scenario: Frustrated User

           Run #123   Run #122   Changes
Status     ✅ Pass    ✅ Pass    -
Duration   4.1s       3.8s       +0.3s
Turns      6          6          -
Tokens     2,103      1,987      +116
Cost       $0.031     $0.029     +$0.002

Differences:
• Turn 2: Added empathy phrase "I understand"
• Turn 3: More detailed explanation (+78 tokens)
• Turn 4: Same response

A/B Testing

Compare two versions of an agent:

A/B Test: Password Reset Flow
Version A (Current) vs Version B (Experimental)

                Version A   Version B   Winner
Success Rate    80%         90%         B ⭐
Avg Duration    3.5s        4.2s        A ⭐
User Sentiment  0.75        0.82        B ⭐
Cost per Test   $0.025      $0.031      A ⭐

Overall: Split decision (2/4 metrics each). Version B has a higher success rate and better sentiment; Version A is faster and cheaper.
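Which version wins a metric depends on whether a higher or a lower value is better. The sketch below encodes that direction per metric and counts the wins. Applied to the table, it awards two metrics to each version, so an overall verdict has to weigh which metrics matter most. The metric names and the simple win count are illustrative assumptions, not FluxLoop's scoring method.

```python
# Direction per metric: True if a higher value is better.
HIGHER_IS_BETTER = {"success_rate": True, "avg_duration_s": False,
                    "user_sentiment": True, "cost_per_test": False}

a = {"success_rate": 0.80, "avg_duration_s": 3.5, "user_sentiment": 0.75, "cost_per_test": 0.025}
b = {"success_rate": 0.90, "avg_duration_s": 4.2, "user_sentiment": 0.82, "cost_per_test": 0.031}

def winners(a: dict, b: dict) -> dict:
    """Return "A", "B" or "tie" for each metric."""
    result = {}
    for metric, higher_better in HIGHER_IS_BETTER.items():
        if a[metric] == b[metric]:
            result[metric] = "tie"
        elif (b[metric] > a[metric]) == higher_better:
            result[metric] = "B"
        else:
            result[metric] = "A"
    return result

w = winners(a, b)
print(w)
print("A wins:", list(w.values()).count("A"), "| B wins:", list(w.values()).count("B"))
```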

Exporting Results

Export Formats

Export results in various formats:

Export Results:

Format:
○ JSON (API-compatible)
● CSV (Spreadsheet-ready)
○ PDF (Report)
○ Markdown (Documentation)

Include:
☑ Conversation transcripts
☑ Performance metrics
☑ Evaluation results
□ Debug information
□ Raw API responses

[Export]

CSV Export

CSV for spreadsheets:

run_id,scenario,status,duration,turns,tokens,cost
123,new-user,passed,3.2,4,1245,0.02
123,frustrated-user,passed,4.1,6,2103,0.03
123,technical-user,failed,5.5,8,3421,0.05
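The CSV export can be read with Python's standard `csv` module. The sketch below parses the sample rows above, lists the failed scenarios, and totals the tokens. It assumes `duration` is in seconds and `cost` is in USD, as in the scenario table. The filename in the comment is hypothetical.

```python
import csv
import io

csv_text = """run_id,scenario,status,duration,turns,tokens,cost
123,new-user,passed,3.2,4,1245,0.02
123,frustrated-user,passed,4.1,6,2103,0.03
123,technical-user,failed,5.5,8,3421,0.05
"""

# With a real export, use: open("results.csv", newline="") instead of StringIO.
rows = list(csv.DictReader(io.StringIO(csv_text)))
failed = [r["scenario"] for r in rows if r["status"] == "failed"]
total_tokens = sum(int(r["tokens"]) for r in rows)   # CSV fields are strings
print("Failed:", failed)
print("Total tokens:", total_tokens)
```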

JSON Export

JSON for programmatic analysis:

{
  "run_id": "123",
  "project": "customer-support",
  "timestamp": "2024-01-15T14:30:00Z",
  "scenarios": [
    {
      "name": "frustrated-user",
      "status": "passed",
      "duration_ms": 4100,
      "turns": 6,
      "conversation": [
        {
          "role": "user",
          "content": "I can't login! Forgot my password",
          "timestamp": "2024-01-15T14:30:00.000Z"
        },
        {
          "role": "agent",
          "content": "I understand...",
          "timestamp": "2024-01-15T14:30:01.200Z",
          "tokens": {
            "input": 234,
            "output": 156
          },
          "cost": 0.02
        }
      ],
      "evaluation": {
        "criteria_passed": 3,
        "criteria_total": 4,
        "details": [...]
      }
    }
  ]
}
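The `[...]` placeholder in the sample above is not valid JSON, so the sketch below loads a trimmed copy without the `details` field. It then summarizes each scenario's agent token usage and criteria results. The field names come from the sample. A real export may contain more fields.

```python
import json

# Trimmed copy of the sample export; "details" is dropped because "[...]" isn't valid JSON.
export = json.loads("""
{
  "run_id": "123",
  "scenarios": [
    {
      "name": "frustrated-user",
      "status": "passed",
      "conversation": [
        {"role": "user", "content": "I can't login! Forgot my password"},
        {"role": "agent", "content": "I understand...",
         "tokens": {"input": 234, "output": 156}, "cost": 0.02}
      ],
      "evaluation": {"criteria_passed": 3, "criteria_total": 4}
    }
  ]
}
""")

summaries = []
for s in export["scenarios"]:
    agent_msgs = [m for m in s["conversation"] if m["role"] == "agent"]
    tokens = sum(m["tokens"]["input"] + m["tokens"]["output"] for m in agent_msgs)
    ev = s["evaluation"]
    summaries.append(f'{s["name"]}: {s["status"]}, {tokens} agent tokens, '
                     f'{ev["criteria_passed"]}/{ev["criteria_total"]} criteria met')
print("\n".join(summaries))
```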

Sharing Results

Share Link

Share results with team members:

Share this test run:

🔗 Public Link (anyone with link):
https://results.fluxloop.ai/share/abc123def456

🔒 Team-Only Link (requires login):
https://results.fluxloop.ai/project/customer-support/run/123

⏱️ Link expires: Never | 1 day | 1 week | 1 month

[Copy Link]

Email Report

Send report via email:

Send Report:

To: team@company.com
Subject: FluxLoop Test Results - Password Reset Flow

Include:
☑ Summary
☑ Failed scenarios
☑ Performance charts
□ Full transcripts

[Send]

Slack Integration

Automatic notifications to Slack:

Connected to: #agent-testing

Notify on:
☑ Test completed
☑ Test failed
□ Test passed
□ Performance degraded

Message format:
"✅ Test completed: {scenario} - {status}"

[Update Settings]

Advanced Features

Custom Dashboards

Create custom dashboards:

Widgets for specific metrics
Time-based trend charts
Success rate heatmaps
Cost tracking graphs

Alerts & Monitoring

Set up automatic alerts:

Alert Rules:

1. Success rate drops below 80%
   → Notify: team@company.com
   → Severity: High

2. Avg response time > 5s
   → Notify: #performance-alerts
   → Severity: Medium

3. Daily cost > $100
   → Notify: finance@company.com
   → Severity: Low
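Each alert rule is a comparison of one metric against a threshold, paired with a notification target and a severity. The sketch below encodes the three rules above and returns the ones that fire for a set of current metrics. The metric keys, the `AlertRule` type, and the sample metric values are hypothetical.

```python
import operator
from dataclasses import dataclass

OPS = {"<": operator.lt, ">": operator.gt}

@dataclass
class AlertRule:
    metric: str
    op: str          # "<" or ">"
    threshold: float
    notify: str
    severity: str

rules = [
    AlertRule("success_rate_pct", "<", 80, "team@company.com", "High"),
    AlertRule("avg_response_time_s", ">", 5, "#performance-alerts", "Medium"),
    AlertRule("daily_cost_usd", ">", 100, "finance@company.com", "Low"),
]

def triggered(rules, metrics):
    """Return the rules whose condition holds for the current metrics."""
    return [r for r in rules if OPS[r.op](metrics[r.metric], r.threshold)]

# Hypothetical current metrics.
current = {"success_rate_pct": 75, "avg_response_time_s": 3.5, "daily_cost_usd": 2.45}
for r in triggered(rules, current):
    print(f"[{r.severity}] {r.metric} {r.op} {r.threshold} -> notify {r.notify}")
```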

Historical Analysis

Long-term trend analysis:

Time Range: Last 30 days

Success Rate Trend:
Jan 1:  85% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Jan 15: 90% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Jan 30: 92% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Improvement: +7 percentage points over 30 days

Next Steps

API Keys: Create and manage API keys
CLI Sync Command: Upload results with the CLI
Platform Overview: Web platform overview
