
CI/CD Integration

Integrate FluxLoop into CI/CD pipelines for automated agent testing and regression detection.

Overview

FluxLoop can be integrated into CI/CD pipelines to:

  • Automate regression testing after code changes

  • Validate agent quality before deployment

  • Track performance trends over time

  • Generate test reports for stakeholders

This guide covers GitHub Actions, GitLab CI, Jenkins, and Docker-based setups.


Quick Setup

Prerequisites

  1. FluxLoop project with configuration in configs/

  2. Agent code instrumented with FluxLoop SDK

  3. Base inputs defined in configs/input.yaml

  4. API keys (OpenAI, Anthropic, etc.) stored as secrets

Key Principles

For CI/CD environments, apply these principles (a condensed script sketch follows the list):

  1. Use deterministic mode for input generation (or commit pre-generated inputs)
  2. Set fixed seed in configs/simulation.yaml for reproducibility
  3. Cache dependencies (Python packages, MCP index)
  4. Store API keys as secrets, not in code
  5. Generate artifacts (reports, traces) for review
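
The same principles condense into a short driver script. The sketch below simply chains the CLI commands used throughout this guide and stops on the first failure; the ci_run.py file name, the flag values, and the scripts/check_quality.py quality-gate script (shown under Best Practices) are illustrative and should be adapted to your project.

# ci_run.py (sketch): run the FluxLoop CI steps in order, stopping on the first failure.
import subprocess
import sys

STEPS = [
    ["fluxloop", "doctor"],
    ["fluxloop", "generate", "inputs", "--limit", "20", "--mode", "deterministic"],
    ["fluxloop", "test", "--iterations", "1"],
    ["python3", "scripts/check_quality.py"],  # quality gate (see Best Practices)
]

for step in STEPS:
    print("$ " + " ".join(step))
    result = subprocess.run(step, cwd="fluxloop/my-agent")
    if result.returncode != 0:
        sys.exit(result.returncode)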

GitHub Actions

Basic Workflow

Create .github/workflows/fluxloop-test.yml:

name: FluxLoop Agent Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test-agent:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install fluxloop-cli fluxloop
          pip install -r requirements.txt

      - name: Verify setup
        run: fluxloop doctor

      - name: Generate inputs (deterministic)
        run: |
          cd fluxloop/my-agent
          fluxloop generate inputs --limit 20 --mode deterministic

      - name: Run agent tests
        env:
          FLUXLOOP_API_KEY: ${{ secrets.FLUXLOOP_API_KEY }}
        run: |
          cd fluxloop/my-agent
          fluxloop sync pull
          fluxloop test

      - name: Parse results
        run: |
          cd fluxloop/my-agent
          LATEST_EXP=$(ls -td results/*/ | head -1)
          fluxloop parse experiment "$LATEST_EXP"

      - name: Evaluate results
        run: |
          cd fluxloop/my-agent
          LATEST_EXP=$(ls -td results/*/ | head -1)
          # Run your evaluation command against "$LATEST_EXP" here

      - name: Check evaluation threshold
        run: |
          cd fluxloop/my-agent
          LATEST_EXP=$(ls -td results/*/ | head -1)
          export LATEST_EXP
          python3 << 'EOF'
          import json
          import os
          import sys

          # Assumes the parsed summary is written to <experiment>/evaluation/summary.json
          with open(os.path.join(os.environ["LATEST_EXP"], "evaluation", "summary.json")) as f:
              summary = json.load(f)

          score = summary.get("overall_score", 0)
          threshold = 0.7

          print(f"Score: {score:.2f}, Threshold: {threshold}")

          if score < threshold:
              print(f"FAIL: Score {score:.2f} below threshold {threshold}")
              sys.exit(1)
          else:
              print(f"PASS: Score {score:.2f} meets threshold {threshold}")
          EOF

      - name: Upload test artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: fluxloop-results
          path: |
            fluxloop/my-agent/results/
            fluxloop/my-agent/inputs/
          retention-days: 30

      - name: Comment on PR (if PR)
        if: github.event_name == 'pull_request' && always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const expDir = 'fluxloop/my-agent/results/';
            const latest = fs.readdirSync(expDir)
              .filter(f => fs.statSync(expDir + f).isDirectory())
              .sort()
              .reverse()[0];

            // Assumes the parsed summary is written to <experiment>/evaluation/summary.json
            const summary = JSON.parse(
              fs.readFileSync(expDir + latest + '/evaluation/summary.json', 'utf8')
            );

            const body = `
            ## FluxLoop Evaluation Results

            **Overall Score:** ${summary.overall_score.toFixed(2)} / 1.00
            **Status:** ${summary.pass_fail_status}
            **Traces:** ${summary.total_traces}

            ### Evaluator Scores
            ${Object.entries(summary.by_evaluator || {}).map(([name, data]) =>
              `- **${name}**: ${data.score.toFixed(2)}`
            ).join('\n')}

            [View Full Report](../actions/runs/${context.runId})
            `;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

Advanced: Comparison with Baseline

Compare results against a baseline:

      - name: Download baseline results
        uses: dawidd6/action-download-artifact@v3
        with:
          workflow: fluxloop-test.yml
          branch: main
          name: baseline-summary
          path: baseline/
        continue-on-error: true

      - name: Compare with baseline
        run: |
          cd fluxloop/my-agent
          LATEST_EXP=$(ls -td results/*/ | head -1)

          if [ -f "baseline/summary.json" ]; then
            # Hypothetical helper; see the comparison sketch below
            python3 scripts/compare_baseline.py "$LATEST_EXP" --baseline baseline/summary.json
          fi

      - name: Save new baseline (on main)
        if: github.ref == 'refs/heads/main'
        uses: actions/upload-artifact@v4
        with:
          name: baseline-summary
          # Assumed location of the latest evaluation summary; adjust to your layout
          path: fluxloop/my-agent/results/*/evaluation/summary.json
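
The comparison command itself is project-specific. One option is a small script like the sketch below; the scripts/compare_baseline.py name, the evaluation/summary.json location, and the overall_score field are assumptions to adapt. It exits non-zero when the new score regresses past a tolerance, which fails the step above.

# scripts/compare_baseline.py (sketch): fail the build when the overall score
# regresses against the baseline by more than a tolerance.
import argparse
import json
import os
import sys


def load_score(path: str) -> float:
    # Read overall_score from a summary.json file
    with open(path) as f:
        return json.load(f).get("overall_score", 0.0)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("experiment_dir", help="Path to the latest experiment directory")
    parser.add_argument("--baseline", required=True, help="Path to the baseline summary.json")
    parser.add_argument("--tolerance", type=float, default=0.05)
    args = parser.parse_args()

    current = load_score(os.path.join(args.experiment_dir, "evaluation", "summary.json"))
    baseline = load_score(args.baseline)

    print(f"Current: {current:.2f}, Baseline: {baseline:.2f}")
    if current < baseline - args.tolerance:
        print(f"FAIL: score dropped by more than {args.tolerance:.2f}")
        sys.exit(1)
    print("PASS: no regression beyond tolerance")


if __name__ == "__main__":
    main()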

Scheduled Regression Testing

Run tests on a schedule:

on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM UTC
  workflow_dispatch:      # Manual trigger

GitLab CI

Basic Pipeline

Create .gitlab-ci.yml:

image: python:3.11

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip
    - venv/

stages:
  - setup
  - test
  - report

before_script:
  - python -m venv venv
  - source venv/bin/activate
  - pip install fluxloop-cli fluxloop
  - pip install -r requirements.txt

setup:
  stage: setup
  script:
    - cd fluxloop/my-agent
    - fluxloop doctor
    - fluxloop config validate
  artifacts:
    reports:
      dotenv: build.env

generate_inputs:
  stage: setup
  script:
    - cd fluxloop/my-agent
    - fluxloop generate inputs --limit 20 --mode deterministic
  artifacts:
    paths:
      - fluxloop/my-agent/inputs/
    expire_in: 1 week

run_experiment:
  stage: test
  dependencies:
    - generate_inputs
  script:
    - cd fluxloop/my-agent
    - fluxloop test --iterations 1
  artifacts:
    paths:
      - fluxloop/my-agent/results/
    expire_in: 1 month

parse_results:
  stage: test
  dependencies:
    - run_experiment
  script:
    - cd fluxloop/my-agent
    - LATEST_EXP=$(ls -td results/*/ | head -1)
    - fluxloop parse experiment "$LATEST_EXP"
  artifacts:
    paths:
      - fluxloop/my-agent/results/*/per_trace_analysis/
    expire_in: 1 month

generate_report:
  stage: report
  dependencies:
    - parse_results
  script:
    - cd fluxloop/my-agent
    - LATEST_EXP=$(ls -td results/*/ | head -1)
    # Generate your report into public/ here; the report command is project-specific
  artifacts:
    paths:
      - public
  only:
    - main

Jenkins

Jenkinsfile

pipeline {
    agent any

    environment {
        OPENAI_API_KEY = credentials('openai-api-key')
        FLUXLOOP_ENABLED = 'true'
    }

    stages {
        stage('Setup') {
            steps {
                sh '''
                    python3 -m venv venv
                    . venv/bin/activate
                    pip install fluxloop-cli fluxloop
                    pip install -r requirements.txt
                '''
            }
        }

        stage('Verify') {
            steps {
                sh '''
                    . venv/bin/activate
                    cd fluxloop/my-agent
                    fluxloop doctor
                    fluxloop config validate
                '''
            }
        }

        stage('Generate Inputs') {
            steps {
                sh '''
                    . venv/bin/activate
                    cd fluxloop/my-agent
                    fluxloop generate inputs --limit 20 --mode deterministic
                '''
            }
        }

        stage('Run Experiment') {
            steps {
                sh '''
                    . venv/bin/activate
                    cd fluxloop/my-agent
                    fluxloop test --iterations 1
                '''
            }
        }

        stage('Evaluate') {
            steps {
                sh '''
                    . venv/bin/activate
                    cd fluxloop/my-agent
                    LATEST_EXP=$(ls -td results/*/ | head -1)
                    fluxloop parse experiment "$LATEST_EXP"
                '''
            }
        }

        stage('Quality Gate') {
            steps {
                script {
                    // Locate the latest experiment; assumes the parsed summary is
                    // written to <experiment>/evaluation/summary.json
                    def latestExp = sh(
                        script: 'ls -td fluxloop/my-agent/results/*/ | head -1',
                        returnStdout: true
                    ).trim()
                    // readJSON requires the Pipeline Utility Steps plugin
                    def summary = readJSON file: "${latestExp}evaluation/summary.json"
                    def score = summary.overall_score

                    if (score < 0.7) {
                        error("Quality gate failed: score ${score} below 0.7")
                    }
                }
            }
        }
}

    post {
        always {
            archiveArtifacts artifacts: 'fluxloop/my-agent/results/**/*',
                             allowEmptyArchive: true

            publishHTML([
                reportDir: 'fluxloop/my-agent/results/*/evaluation/',
                reportFiles: 'report.html',
                reportName: 'FluxLoop Evaluation Report'
            ])
        }
    }
}

Docker Integration

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir \
        fluxloop-cli \
        fluxloop \
        -r requirements.txt

# Copy project
COPY . .

# Set up FluxLoop
RUN cd fluxloop/my-agent && \
    fluxloop doctor

# Default command
CMD ["bash", "-c", "cd fluxloop/my-agent && fluxloop test"]

Docker Compose for Testing

version: '3.8'

services:
  fluxloop-test:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - FLUXLOOP_ENABLED=true
    volumes:
      - ./fluxloop:/app/fluxloop
      - test-results:/app/fluxloop/my-agent/results
    command: |
      bash -c "
      cd fluxloop/my-agent &&
      fluxloop generate inputs --limit 20 --mode deterministic &&
      fluxloop test --iterations 1 &&
      fluxloop parse experiment results/*/
      "

volumes:
  test-results:

Run tests:

docker-compose run fluxloop-test

Best Practices

1. Use Deterministic Mode in CI

# configs/input.yaml (for CI)
input_generation:
  mode: deterministic  # or commit pre-generated inputs

# configs/simulation.yaml
seed: 42 # Fixed seed for reproducibility

2. Separate Test Configs from Production

fluxloop/my-agent/
├── configs/               # Production configs
│   ├── project.yaml
│   ├── input.yaml
│   ├── simulation.yaml
│   └── evaluation.yaml
└── configs-ci/            # CI-specific configs
    ├── input.yaml         # Deterministic, fewer inputs
    ├── simulation.yaml    # Lower iterations, fixed seed
    └── evaluation.yaml    # Stricter thresholds

Run with CI configs:

# Override config directory
cp -r configs-ci/* configs/
fluxloop test

3. Cache Wisely

Cache these:

  • Python packages (pip cache)
  • Generated inputs (if deterministic)
  • MCP index (~/.fluxloop/mcp/index/)

Don't cache:

  • Experiment outputs
  • Traces
  • Evaluation results

4. Store Secrets Securely

GitHub Actions:

env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

GitLab CI:

# Define OPENAI_API_KEY as a masked CI/CD variable in
# Settings > CI/CD > Variables; it is then available to all jobs:
variables:
  OPENAI_API_KEY: $OPENAI_API_KEY

Jenkins:

environment {
    OPENAI_API_KEY = credentials('openai-api-key')
}

5. Set Quality Gates

# check_quality.py
import glob
import json
import os
import sys

# Assumes the parsed summary is written to <experiment>/evaluation/summary.json
latest_summary = max(glob.glob("results/*/evaluation/summary.json"), key=os.path.getmtime)
with open(latest_summary) as f:
    summary = json.load(f)

score = summary["overall_score"]
threshold = 0.7

# Can also check individual evaluators
intent_score = summary["by_evaluator"]["intent_recognition"]["score"]
token_budget_score = summary["by_evaluator"]["token_budget"]["score"]

if score < threshold:
    print(f"❌ FAIL: Overall score {score:.2f} < {threshold}")
    sys.exit(1)

if intent_score < 0.8:
    print(f"❌ FAIL: Intent recognition {intent_score:.2f} < 0.8")
    sys.exit(1)

print("✅ PASS: All quality gates passed")

Monitoring and Reporting

Track Metrics Over Time

Store evaluation results in a time-series database:

# upload_metrics.py
import glob
import json
import os
from datetime import datetime

import requests

# Assumes the parsed summary is written to <experiment>/evaluation/summary.json
latest_summary = max(glob.glob("results/*/evaluation/summary.json"), key=os.path.getmtime)
with open(latest_summary) as f:
    summary = json.load(f)

# Send to monitoring system
requests.post("https://metrics.example.com/fluxloop", json={
    "timestamp": datetime.now().isoformat(),
    "project": "my-agent",
    "branch": os.getenv("CI_COMMIT_BRANCH"),
    "overall_score": summary["overall_score"],
    "by_evaluator": summary["by_evaluator"],
    "total_traces": summary["total_traces"],
})

Generate Trend Reports

# Compare last N runs
python3 << 'EOF'
import glob
import json

# Assumes each experiment writes evaluation/summary.json; adjust to your layout
experiments = sorted(glob.glob("results/*/evaluation/summary.json"))

for exp in experiments:
    with open(exp) as f:
        summary = json.load(f)
    print(f"{exp}: {summary['overall_score']:.2f}")
EOF

Troubleshooting CI/CD

Tests Pass Locally but Fail in CI

Common causes:

  1. Missing environment variables (a preflight sketch follows this list)

    # Check what's set
    fluxloop config env
  2. Different Python version

    # Pin Python version
    python-version: '3.11'
  3. Non-deterministic inputs

    # Use fixed seed
    seed: 42
    mode: deterministic
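
For the first cause, a small preflight check can fail the job early with a clear message. The sketch below is illustrative; the ci_preflight.py name and the required variable list are examples, not something FluxLoop mandates.

# ci_preflight.py (sketch): fail fast when required environment variables are missing.
import os
import sys

REQUIRED = ["OPENAI_API_KEY", "FLUXLOOP_ENABLED"]  # example list; adjust to your agent

missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    print("Missing environment variables: " + ", ".join(missing))
    sys.exit(1)
print("All required environment variables are set")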

Slow CI Runs

Optimizations:

  1. Reduce test scope

    # Fewer inputs
    fluxloop generate inputs --limit 10

    # Single iteration
    fluxloop test --iterations 1
  2. Sample LLM evaluations

    # configs/evaluation.yaml
    limits:
      sample_rate: 0.2 # Only 20% of traces
      max_llm_calls: 10
  3. Cache dependencies

    - uses: actions/cache@v4
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

Example: Full GitHub Actions Workflow

Comprehensive example with all best practices:

name: FluxLoop CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Cache MCP index
        uses: actions/cache@v4
        with:
          path: ~/.fluxloop/mcp/index
          key: mcp-index-v1

      - name: Install dependencies
        run: |
          pip install fluxloop-cli fluxloop fluxloop-mcp
          pip install -r requirements.txt

      - name: Verify installation
        run: fluxloop doctor

      - name: Prepare test environment
        run: |
          cd fluxloop/my-agent
          cp -r configs-ci/* configs/
          fluxloop config validate

      - name: Run tests
        env:
          FLUXLOOP_API_KEY: ${{ secrets.FLUXLOOP_API_KEY }}
        run: |
          cd fluxloop/my-agent
          fluxloop sync pull
          fluxloop test

      - name: Quality gate
        run: |
          cd fluxloop/my-agent
          python3 scripts/check_quality.py

      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: results-py${{ matrix.python-version }}
          path: fluxloop/my-agent/results/

      - name: Report to PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            // ... (PR comment script from earlier)

See Also