Agent Test Skill ⭐

The most powerful way to use FluxLoop. Just talk naturally, and the skill takes care of the steps.

Why Use the Skill?

| Approach | Experience |
| --- | --- |
| Skill (Recommended) | "test my agent" → The skill works out the steps and runs them once you confirm |
| Slash Commands | Manual: /fluxloop:pull → /fluxloop:test → check results |

The skill is context-aware. It:

  • Knows your current setup state
  • Guides you through missing steps
  • Asks before executing anything
  • Handles errors gracefully

How to Use

Just ask Claude naturally:

"test my agent"
"generate test data"
"run a simulation"
"set up fluxloop"
"evaluate my chatbot"

That's it. The skill activates automatically.


Activation Keywords

The skill activates when you mention:

| Category | Keywords |
| --- | --- |
| Testing | test, testing, test my agent, run test |
| Simulation | simulation, run simulation, simulate |
| Data Generation | generate test data, synthesize, create inputs |
| Evaluation | evaluate, evaluation, assess, check quality |
| Setup | set up fluxloop, configure, initialize |
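
As a rough illustration, the keyword table above can be written as data with a simple lookup. This is only a sketch: the case-insensitive substring rule and the `matching_categories` helper are assumptions made for this example, not a description of how the skill decides to activate.

```python
# Illustrative only: the activation keywords from the table, expressed as data.
# The matching rule (case-insensitive substring) is an assumption.
ACTIVATION_KEYWORDS = {
    "Testing": ["test", "testing", "test my agent", "run test"],
    "Simulation": ["simulation", "run simulation", "simulate"],
    "Data Generation": ["generate test data", "synthesize", "create inputs"],
    "Evaluation": ["evaluate", "evaluation", "assess", "check quality"],
    "Setup": ["set up fluxloop", "configure", "initialize"],
}


def matching_categories(message: str) -> list:
    """Return every category with at least one keyword found in the message."""
    text = message.lower()
    return [
        category
        for category, keywords in ACTIVATION_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    for msg in ["test this agent", "generate test data", "check this"]:
        print(f"{msg!r} -> {matching_categories(msg)}")
```

A message can fall into more than one category: "generate test data" contains both a Testing keyword and a Data Generation keyword.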

Skill Workflow

Core Principle: Context-First

The skill always follows this pattern:

1. Check context (fluxloop context show)

2. Summarize current state to you

3. Present options (NO auto-execution)

4. Execute after your confirmation

5. Show results and next steps

You're always in control. Nothing runs without your approval.
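
The five steps above can be sketched as a single function. This is a minimal model of the pattern, not the skill's implementation: `Option`, `context_first_turn`, and the three callables passed in are hypothetical names introduced for illustration. The point it shows is that no action runs unless the user picks one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Option:
    label: str                # text shown to the user
    run: Callable[[], str]    # action that runs only after confirmation


def context_first_turn(
    check_context: Callable[[], dict],
    build_options: Callable[[dict], list],
    ask_user: Callable[[str, list], Optional[int]],
) -> str:
    """One turn of check -> summarize -> offer -> confirm -> execute."""
    state = check_context()                                   # 1. check context
    summary = ", ".join(f"{key}: {value}" for key, value in state.items())  # 2. summarize
    options = build_options(state)                            # 3. present options
    choice = ask_user(summary, [option.label for option in options])
    if choice is None:                                        # user declined: run nothing
        return "Nothing was executed."
    return options[choice].run()                              # 4-5. execute, report result
```

Passing the three callables in keeps the sketch testable: a stub `ask_user` that returns `None` verifies that declining really does leave every action untouched.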


Example Conversations

First-Time Setup

You: "I want to test my agent with fluxloop"

Claude: I'll help you set up FluxLoop for agent testing.

Let me check your current state...

📊 Current State:
- FluxLoop CLI: ✗ Not installed
- Authentication: -
- Project: -

To get started, I need to:
1. Install FluxLoop CLI
2. Log you in
3. Create or select a project

Should I proceed with the installation?

You: "yes"

Claude: Installing FluxLoop CLI...
[Executes: pip install fluxloop-cli]

✓ FluxLoop CLI installed!

Now let's log you in. I'll run the login command; follow the browser prompt:

[Executes: fluxloop auth login]

...

Generate Test Data

You: "generate test data for my customer support bot"

Claude: I'll generate test data for your agent.

📊 Current State:
- Project: customer-support (proj_abc)
- Scenario: support-bot (scn_123)
- Existing bundles: 2 found
  - v2 (stress-test, 50 inputs, 1 day ago)
  - v1 (basic, 10 inputs, 1 week ago)

Would you like to:
1. Use existing bundle v2 (50 inputs)
2. Use existing bundle v1 (10 inputs)
3. Generate new test data

You: "3"

Claude: I'll generate new test data. First, let me create personas...

[Executes: fluxloop personas suggest --scenario-id scn_123]

✓ 5 personas generated:
- frustrated_customer
- new_user
- power_user
- confused_elderly
- impatient_professional

Now synthesizing test inputs...

[Executes: fluxloop inputs synthesize --scenario-id scn_123]

✓ 30 test inputs generated!

Publishing as bundle v3...

[Executes: fluxloop bundles publish ...]

✓ Bundle v3 published! Ready to test.

Run tests now?

Run Tests

You: "run the tests"

Claude: Running tests with bundle v3...

[Executes: fluxloop sync pull --bundle-version-id bv_xyz]
[Executes: fluxloop test --scenario support-bot]

📊 Test Results:
✓ Passed: 27/30 (90%)
⚠️ Warnings: 3

Failed cases:
1. [frustrated_customer] "This is ridiculous!"
   → Agent used "sorry" (forbidden word)

2. [power_user] "What's the API rate limit?"
   → Response too long (2500 chars > 2000 limit)

3. [impatient_professional] "Just fix it now"
   → Empty response

🔗 Detailed analysis: https://app.fluxloop.ai/runs/...

Would you like me to analyze these failures and suggest fixes?
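
To make the three failure reasons in the sample output concrete, here is a hypothetical sketch of checks that would produce them. The `check_response` function, its parameters, and the default values (the forbidden word "sorry", the 2000-character limit) are taken from the example transcript or invented for illustration; they are not FluxLoop's evaluation logic.

```python
def check_response(response: str, forbidden_words=("sorry",), max_chars=2000) -> list:
    """Return a list of problems found in one agent response (empty list = pass)."""
    if not response.strip():
        # An empty reply fails outright; the other checks would be meaningless.
        return ["Empty response"]

    problems = []
    lowered = response.lower()
    for word in forbidden_words:
        if word in lowered:
            problems.append(f'Agent used "{word}" (forbidden word)')
    if len(response) > max_chars:
        problems.append(
            f"Response too long ({len(response)} chars > {max_chars} limit)"
        )
    return problems


if __name__ == "__main__":
    print(check_response("I'm sorry about that."))
    print(check_response("Happy to help."))
```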

State-Based Guidance

The skill adapts to your current state:

| Your State | Skill Response |
| --- | --- |
| No CLI installed | Offers to install the CLI |
| Not logged in | Helps with authentication |
| No project | Lists projects or creates a new one |
| No scenario | Creates a local scenario folder |
| No test data | Generates personas and inputs |
| Ready to test | Runs tests once you confirm |
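
One way to read the table above is as an ordered checklist in which the first unmet prerequisite decides the next step. The sketch below assumes that strict ordering; `SetupState` and `next_step` are hypothetical names, and the real skill reads its state from the CLI rather than from a dataclass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetupState:
    cli_installed: bool = False
    logged_in: bool = False
    project: Optional[str] = None
    scenario: Optional[str] = None
    has_test_data: bool = False


def next_step(state: SetupState) -> str:
    """Return the first unmet prerequisite, checked in the table's order."""
    if not state.cli_installed:
        return "install the CLI"
    if not state.logged_in:
        return "log in"
    if state.project is None:
        return "select or create a project"
    if state.scenario is None:
        return "create a scenario folder"
    if not state.has_test_data:
        return "generate personas and inputs"
    return "run tests"


if __name__ == "__main__":
    print(next_step(SetupState()))
    print(next_step(SetupState(cli_installed=True, logged_in=True)))
```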

Decision Tree

When you ask to test, the skill follows this logic:

Check bundles list
├─ Multiple bundles → "Which bundle?" (shows list)
├─ One bundle → "Use existing or create new?"
└─ No bundles → Check inputs
    ├─ Has inputs → "Publish as bundle?"
    └─ No inputs → "Generate new data?"
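
The same branching can be written as a small function. This is a sketch of the logic shown in the tree only; `bundle_prompt` and its arguments are hypothetical names chosen for this example.

```python
def bundle_prompt(bundles: list, has_inputs: bool) -> str:
    """Return the question the decision tree leads to."""
    if len(bundles) > 1:
        return "Which bundle?"
    if len(bundles) == 1:
        return "Use existing or create new?"
    # No bundles: fall back to checking for unpublished inputs.
    if has_inputs:
        return "Publish as bundle?"
    return "Generate new data?"


if __name__ == "__main__":
    print(bundle_prompt(["v2", "v1"], has_inputs=True))
    print(bundle_prompt([], has_inputs=False))
```

Note that `has_inputs` only matters when there are no bundles, which matches the tree: inputs are checked on the "No bundles" branch alone.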

Key Commands Used by Skill

The skill orchestrates these CLI commands for you:

| Phase | Commands |
| --- | --- |
| Setup | fluxloop auth login, fluxloop projects select |
| Scenario | fluxloop init scenario, fluxloop scenarios create |
| Data Gen | fluxloop personas suggest, fluxloop inputs synthesize |
| Publish | fluxloop bundles publish |
| Test | fluxloop sync pull, fluxloop test |
| Results | fluxloop test results |

You don't need to remember any of these. Just talk naturally.
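
If you are curious what the data-generation and test phases look like end to end, the sketch below assembles the commands in the order the example conversations on this page run them. It is a dry run that only prints; nothing is executed. The flags are copied from those examples, the arguments to `fluxloop bundles publish` are elided in the source and left out here, and `command_plan` is a hypothetical helper. Check the CLI's own help output before relying on any of these invocations.

```python
import shlex


def command_plan(scenario_id: str, scenario_name: str, bundle_version_id: str) -> list:
    """Commands in the order the example conversations run them (dry run only)."""
    return [
        ["fluxloop", "personas", "suggest", "--scenario-id", scenario_id],
        ["fluxloop", "inputs", "synthesize", "--scenario-id", scenario_id],
        ["fluxloop", "bundles", "publish"],  # arguments are elided in the source
        ["fluxloop", "sync", "pull", "--bundle-version-id", bundle_version_id],
        ["fluxloop", "test", "--scenario", scenario_name],
    ]


if __name__ == "__main__":
    # Placeholder IDs taken from the example conversations.
    for argv in command_plan("scn_123", "support-bot", "bv_xyz"):
        print(shlex.join(argv))
```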


Best Practices

1. Be Specific About Your Agent

Good: "test my customer support chatbot"
Good: "generate edge cases for my order cancellation flow"

Less helpful: "test it"

2. Let the Skill Guide You

Don't try to run commands manually. Say what you want to achieve:

Good: "I want to test with more difficult scenarios"
Less helpful: "/fluxloop:synthesis" (manual command)

3. Ask for Analysis

After tests run, ask:

"Why did test #3 fail?"
"How can I fix these warnings?"
"What patterns do you see in the failures?"

Troubleshooting

Skill Not Activating?

Make sure your message includes activation keywords:

✗ "check this"
✓ "test this agent"

✗ "make some data"
✓ "generate test data"

Wrong Context?

If the skill is checking the wrong project:

"switch to project customer-support"
"use scenario order-bot"

Need Manual Control?

For edge cases, you can use slash commands:

/fluxloop:status → Check current state
/fluxloop:pull   → Pull specific bundle
/fluxloop:test   → Run with specific options

In most cases, though, just talk naturally!