Best Practices

This guide covers best practices for designing, implementing, and maintaining StepFlow workflows that are reliable, maintainable, and performant.

Workflow Design Principles

Minimize Dependencies

Design workflows with minimal coupling between steps to maximize parallelism and maintainability:

# ✅ Good - minimal dependencies
steps:
  - id: load_user
    component: /user/load
    input:
      user_id: { $from: { workflow: input }, path: "user_id" }

  # These can run in parallel
  - id: load_permissions
    component: /auth/permissions
    input:
      user_id: { $from: { step: load_user }, path: "id" }

  - id: load_preferences
    component: /user/preferences
    input:
      user_id: { $from: { step: load_user }, path: "id" }

# ❌ Avoid - unnecessary dependencies
steps:
  - id: load_user
    component: /user/load
    input:
      user_id: { $from: { workflow: input }, path: "user_id" }

  - id: load_permissions
    component: /auth/permissions
    input:
      user_id: { $from: { step: load_user }, path: "id" }

  - id: load_preferences
    component: /user/preferences
    input:
      user_id: { $from: { step: load_user }, path: "id" }
      # Unnecessary dependency creates false sequence
      permissions: { $from: { step: load_permissions } }

Use Appropriate Granularity

Balance between too many small steps and too few large steps:

# ✅ Good - appropriate granularity
steps:
  - id: validate_and_parse_input
    component: /data/validate_parse
    input:
      raw_data: { $from: { workflow: input } }

  - id: enrich_data
    component: /data/enrich
    input:
      parsed_data: { $from: { step: validate_and_parse_input } }

  - id: process_and_format
    component: /data/process_format
    input:
      enriched_data: { $from: { step: enrich_data } }

# ❌ Avoid - too granular
steps:
  - id: validate_input
    component: /validation/check
  - id: parse_input
    component: /parsing/parse
  - id: extract_field_1
    component: /data/extract
  - id: extract_field_2
    component: /data/extract
  # ... many tiny steps

Handle Errors Gracefully

Design workflows to handle failures elegantly:

steps:
  # Critical operation - must succeed
  - id: authenticate_user
    component: /auth/verify
    input:
      token: { $from: { workflow: input }, path: "auth_token" }

  # Optional enhancement - can fail gracefully
  - id: load_user_preferences
    component: /user/preferences
    on_error:
      action: use_default
      default_value:
        theme: "default"
        notifications: true
    input:
      user_id: { $from: { step: authenticate_user }, path: "user_id" }

  # Main processing - uses preferences if available
  - id: generate_response
    component: /response/create
    input:
      user_data: { $from: { step: authenticate_user } }
      preferences:
        $from: { step: load_user_preferences }
        $on_skip: "use_default"
        $default: { theme: "default", notifications: true }

Component Usage Best Practices

Choose the Right Components

Select components based on your specific needs:

# For simple operations, use builtin components
- id: store_data
  component: /builtin/put_blob
  input:
    data: { $from: { step: process_data } }

# For AI operations, use OpenAI components
- id: generate_summary
  component: /builtin/openai
  input:
    messages: { $from: { step: create_messages } }

# For complex business logic, use custom components
- id: complex_analysis
  component: /custom/business_analyzer
  input:
    data: { $from: { step: load_data } }
    rules: { $from: { step: load_rules } }

Optimize Component Configuration

Configure components appropriately for your use case:

steps:
  # Fast, deterministic AI responses
  - id: quick_classification
    component: /builtin/openai
    input:
      messages: { $from: { step: create_simple_prompt } }
      model: "gpt-3.5-turbo"      # Faster model
      temperature: 0.1            # Low temperature for consistency
      max_tokens: 50              # Short responses

  # Creative AI responses
  - id: creative_writing
    component: /builtin/openai
    input:
      messages: { $from: { step: create_creative_prompt } }
      model: "gpt-4"             # Better model for creativity
      temperature: 0.8           # Higher temperature for variety
      max_tokens: 500            # Longer responses

Validate Inputs Early

Catch errors before expensive operations:

steps:
  # Fast validation first
  - id: validate_request
    component: /validation/request
    input:
      request: { $from: { workflow: input } }

  # Expensive operations only run on valid input
  - id: process_with_ai
    component: /builtin/openai
    input:
      messages: { $from: { step: create_messages_from_valid_request } }
      validated_request: { $from: { step: validate_request } }

Data Management Best Practices

Use Blob Storage Effectively

Store large or reusable data in blobs:

steps:
  # Store large dataset once
  - id: store_dataset
    component: /builtin/put_blob
    input:
      data: { $from: { step: load_large_dataset } }

  # Multiple analyses reference the same blob
  - id: statistical_analysis
    component: /analytics/statistics
    input:
      data_blob: { $from: { step: store_dataset }, path: "blob_id" }

  - id: ml_analysis
    component: /analytics/machine_learning
    input:
      data_blob: { $from: { step: store_dataset }, path: "blob_id" }

Reference Specific Fields

Avoid copying entire large objects:

# ✅ Good - reference specific fields
- id: create_user_summary
  component: /user/summarize
  input:
    user_id: { $from: { step: load_user }, path: "id" }
    user_name: { $from: { step: load_user }, path: "profile.name" }
    user_email: { $from: { step: load_user }, path: "contact.email" }

# ❌ Avoid - copying entire object
- id: create_user_summary
  component: /user/summarize
  input:
    user_data: { $from: { step: load_user } }  # Copies entire user object

Batch Operations When Possible

Process multiple items efficiently:

# ✅ Good - batch processing
- id: process_all_users
  component: /user/batch_process
  input:
    users: { $from: { step: load_users } }
    batch_size: 50

# ❌ Avoid - individual processing (unless parallelism is needed)
- id: process_user_1
  component: /user/process
  input:
    user: { $from: { step: load_users }, path: "users[0]" }
# ... repeated for each user

Schema and Validation Best Practices

Define Clear Schemas

Use comprehensive input and output schemas:

name: "User Data Processor"

input_schema:
  type: object
  properties:
    user_id:
      type: string
      pattern: "^[a-zA-Z0-9]{8,}$"
      description: "Unique user identifier"
    processing_options:
      type: object
      properties:
        include_analytics:
          type: boolean
          default: true
        output_format:
          type: string
          enum: ["json", "xml", "csv"]
          default: "json"
      additionalProperties: false
  required: ["user_id"]
  additionalProperties: false

output_schema:
  type: object
  properties:
    processed_data:
      type: object
      description: "Processed user data"
    metadata:
      type: object
      properties:
        processing_time_ms:
          type: integer
          minimum: 0
        version:
          type: string
      required: ["processing_time_ms", "version"]
  required: ["processed_data", "metadata"]

Validate at Step Level

Add validation to individual steps when needed:

steps:
  - id: process_user_data
    component: /user/process
    input_schema:
      type: object
      properties:
        user_data:
          type: object
          properties:
            id: { type: string, minLength: 1 }
            email: { type: string, format: email }
          required: ["id", "email"]
      required: ["user_data"]
    input:
      user_data: { $from: { step: load_user } }

Testing Best Practices

Comprehensive Test Coverage

Test different scenarios and edge cases:

test:
  cases:
    # Happy path
    - name: successful_processing
      description: "Test normal operation with valid input"
      input:
        user_id: "user123"
        processing_options:
          include_analytics: true
          output_format: "json"
      output:
        outcome: success
        result:
          processed_data: "*"
          metadata:
            processing_time_ms: "*"
            version: "1.0"

    # Error cases
    - name: invalid_user_id
      description: "Test handling of invalid user ID"
      input:
        user_id: ""  # Invalid empty ID
      output:
        outcome: failed
        error:
          code: "VALIDATION_ERROR"

    # Edge cases
    - name: minimal_input
      description: "Test with minimal required input"
      input:
        user_id: "user123"
      output:
        outcome: success
        result:
          processed_data: "*"

Use Test-Specific Configuration

Create dedicated test configurations:

test:
  stepflow_config: "test/test-config.yml"
  cases:
    # Test cases using mocked components

# test/test-config.yml
plugins:
  - name: builtin
    type: builtin
  - name: mock_external_apis
    type: stepflow
    transport: stdio
    command: "test/mock-server.py"

state_store:
  type: in_memory

Documentation Best Practices

Descriptive Names and Documentation

Use clear, self-documenting names:

name: "Customer Order Processing Pipeline"
description: |
  Processes customer orders through validation, inventory checking,
  payment processing, and fulfillment scheduling. Handles both
  standard and priority orders with appropriate error handling.

steps:
  - id: validate_order_details
    description: "Validate order format, customer info, and product availability"
    component: /order/validate
    input:
      order: { $from: { workflow: input }, path: "order" }

  - id: check_inventory_availability
    description: "Verify all ordered items are in stock"
    component: /inventory/check
    input:
      items: { $from: { step: validate_order_details }, path: "validated_items" }

Include Examples

Provide examples in your workflow documentation:

examples:
  - name: standard_order
    description: "Example of a standard order processing"
    input:
      order:
        customer_id: "cust_12345"
        items:
          - product_id: "prod_001"
            quantity: 2
          - product_id: "prod_002"
            quantity: 1
        shipping_address:
          street: "123 Main St"
          city: "Anytown"
          state: "CA"
          zip: "12345"

  - name: priority_order
    description: "Example of a priority order with expedited processing"
    input:
      order:
        customer_id: "cust_vip"
        priority: true
        items:
          - product_id: "prod_premium"
            quantity: 1

Security Best Practices

Handle Sensitive Data Carefully

Never include sensitive data in workflow definitions:

# ❌ Avoid - hardcoded secrets
steps:
  - id: api_call
    component: /http/request
    input:
      url: "https://api.example.com/data"
      headers:
        Authorization: "Bearer sk-1234567890abcdef"  # Don't do this!

# ✅ Good - use environment variables
steps:
  - id: api_call
    component: /http/request
    input:
      url: "https://api.example.com/data"
      headers:
        Authorization: { $env: "API_TOKEN" }

Validate External Input

Always validate data from external sources:

steps:
  - id: validate_external_data
    component: /validation/external
    input:
      data: { $from: { step: fetch_external_data } }
      schema: { $from: { step: load_validation_schema } }

  - id: sanitize_data
    component: /security/sanitize
    input:
      validated_data: { $from: { step: validate_external_data } }

Performance Best Practices

Monitor and Optimize

Track performance metrics and optimize bottlenecks:

steps:
  - id: performance_critical_step
    component: /analytics/heavy_computation
    input:
      data: { $from: { step: load_data } }
    # Consider adding performance monitoring
    metadata:
      performance_critical: true
      expected_duration_ms: 5000

Use Caching When Appropriate

Cache expensive computations:

steps:
  - id: expensive_computation
    component: /analytics/complex
    input:
      data: { $from: { step: prepare_data } }
      cache_key: { $from: { step: generate_cache_key } }
      use_cache: true

Maintenance Best Practices

Version Your Workflows

Use semantic versioning for workflows:

name: "User Processing Pipeline"
version: "2.1.0"
description: |
  Version 2.1.0: Added optional analytics processing
  Version 2.0.0: Redesigned with new component architecture
  Version 1.x.x: Legacy processing pipeline

Keep Dependencies Updated

Regularly review and update component dependencies:

# Document component versions or requirements
metadata:
  component_requirements:
    openai_component: ">=1.2.0"
    data_processor: "^2.0.0"
  last_updated: "2024-01-15"
  updated_by: "dev-team"

Refactor When Needed

Regularly review and refactor workflows:

Remove unused steps: Clean up workflows periodically
Combine related operations: Merge steps that always run together
Split complex steps: Break down overly complex operations
Update deprecated patterns: Migrate to newer best practices

Common Anti-Patterns to Avoid

Don't Create Monolithic Workflows

# ❌ Avoid - one giant workflow doing everything
name: "Everything Processor"
steps:
  # 50+ steps doing unrelated things

# ✅ Good - focused, composable workflows
name: "User Data Processor"
steps:
  # 5-10 related steps for user processing

Don't Ignore Error Handling

# ❌ Avoid - no error handling
steps:
  - id: critical_operation
    component: /external/api
    input:
      data: { $from: { workflow: input } }
    # What happens if the API is down?

# ✅ Good - comprehensive error handling
steps:
  - id: critical_operation
    component: /external/api
    on_error:
      action: retry
      max_attempts: 3
      fallback:
        action: use_default
        default_value: { status: "unavailable" }
    input:
      data: { $from: { workflow: input } }

Don't Overcomplicate Simple Operations

# ❌ Avoid - overengineering simple tasks
steps:
  - id: extract_field_setup
    component: /config/setup
  - id: extract_field_validate
    component: /validation/check
  - id: extract_field_execute
    component: /data/extract
  - id: extract_field_cleanup
    component: /cleanup/finalize

# ✅ Good - simple extraction
steps:
  - id: extract_field
    component: /extract
    input:
      data: { $from: { step: load_data } }
      path: "user.email"

Following these best practices will help you create robust, maintainable, and efficient StepFlow workflows that scale well and are easy to debug and modify.

Workflow Design Principles​

Minimize Dependencies​

Use Appropriate Granularity​

Handle Errors Gracefully​

Component Usage Best Practices​

Choose the Right Components​

Optimize Component Configuration​

Validate Inputs Early​

Data Management Best Practices​

Use Blob Storage Effectively​

Reference Specific Fields​

Batch Operations When Possible​

Schema and Validation Best Practices​

Define Clear Schemas​

Validate at Step Level​

Testing Best Practices​

Comprehensive Test Coverage​

Use Test-Specific Configuration​

Documentation Best Practices​

Descriptive Names and Documentation​

Include Examples​

Security Best Practices​

Handle Sensitive Data Carefully​

Validate External Input​

Performance Best Practices​

Monitor and Optimize​

Use Caching When Appropriate​

Maintenance Best Practices​

Version Your Workflows​

Keep Dependencies Updated​

Refactor When Needed​

Common Anti-Patterns to Avoid​

Don't Create Monolithic Workflows​

Don't Ignore Error Handling​

Don't Overcomplicate Simple Operations​