Best Practices
This guide covers best practices for designing, implementing, and maintaining StepFlow workflows that are reliable, maintainable, and performant.
Workflow Design Principles
Minimize Dependencies
Design workflows with minimal coupling between steps to maximize parallelism and maintainability:
# ✅ Good - minimal dependencies
steps:
  - id: load_user
    component: /user/load
    input:
      user_id: { $from: { workflow: input }, path: "user_id" }

  # These can run in parallel
  - id: load_permissions
    component: /auth/permissions
    input:
      user_id: { $from: { step: load_user }, path: "id" }

  - id: load_preferences
    component: /user/preferences
    input:
      user_id: { $from: { step: load_user }, path: "id" }

# ❌ Avoid - unnecessary dependencies
steps:
  - id: load_user
    component: /user/load
    input:
      user_id: { $from: { workflow: input }, path: "user_id" }

  - id: load_permissions
    component: /auth/permissions
    input:
      user_id: { $from: { step: load_user }, path: "id" }

  - id: load_preferences
    component: /user/preferences
    input:
      user_id: { $from: { step: load_user }, path: "id" }
      # Unnecessary dependency creates false sequence
      permissions: { $from: { step: load_permissions } }
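
To see why the extra reference matters, the sketch below groups steps into "waves" in which every step has all of its dependencies satisfied. It is an illustration only: the `execution_waves` helper and the dictionaries are hypothetical and do not describe how StepFlow schedules work internally.

```python
from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; all steps in one wave could run in parallel."""
    remaining = {step: set(d) for step, d in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # A step is ready once every step it depends on has finished.
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves


# Dependencies taken from the "Good" example above.
good = {
    "load_user": set(),
    "load_permissions": {"load_user"},
    "load_preferences": {"load_user"},
}

# Same steps, plus the unnecessary reference to load_permissions.
avoid = {
    "load_user": set(),
    "load_permissions": {"load_user"},
    "load_preferences": {"load_user", "load_permissions"},
}

good_waves = execution_waves(good)
avoid_waves = execution_waves(avoid)
print(len(good_waves), len(avoid_waves))
```

The first layout needs two waves, while the second needs three because `load_preferences` can no longer start alongside `load_permissions`.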
Use Appropriate Granularity
Balance step size: many tiny steps add noise and overhead, while a few very large steps are hard to reuse and debug:
# ✅ Good - appropriate granularity
steps:
  - id: validate_and_parse_input
    component: /data/validate_parse
    input:
      raw_data: { $from: { workflow: input } }

  - id: enrich_data
    component: /data/enrich
    input:
      parsed_data: { $from: { step: validate_and_parse_input } }

  - id: process_and_format
    component: /data/process_format
    input:
      enriched_data: { $from: { step: enrich_data } }

# ❌ Avoid - too granular
steps:
  - id: validate_input
    component: /validation/check
  - id: parse_input
    component: /parsing/parse
  - id: extract_field_1
    component: /data/extract
  - id: extract_field_2
    component: /data/extract
  # ... many tiny steps
Handle Errors Gracefully
Design workflows to handle failures elegantly:
steps:
  # Critical operation - must succeed
  - id: authenticate_user
    component: /auth/verify
    input:
      token: { $from: { workflow: input }, path: "auth_token" }

  # Optional enhancement - can fail gracefully
  - id: load_user_preferences
    component: /user/preferences
    on_error:
      action: use_default
      default_value:
        theme: "default"
        notifications: true
    input:
      user_id: { $from: { step: authenticate_user }, path: "user_id" }

  # Main processing - uses preferences if available
  - id: generate_response
    component: /response/create
    input:
      user_data: { $from: { step: authenticate_user } }
      preferences:
        $from: { step: load_user_preferences }
        $on_skip: "use_default"
        $default: { theme: "default", notifications: true }
Component Usage Best Practices
Choose the Right Components
Select components based on your specific needs:
# For simple operations, use builtin components
- id: store_data
  component: /builtin/put_blob
  input:
    data: { $from: { step: process_data } }

# For AI operations, use OpenAI components
- id: generate_summary
  component: /builtin/openai
  input:
    messages: { $from: { step: create_messages } }

# For complex business logic, use custom components
- id: complex_analysis
  component: /custom/business_analyzer
  input:
    data: { $from: { step: load_data } }
    rules: { $from: { step: load_rules } }
Optimize Component Configuration
Configure components appropriately for your use case:
steps:
  # Fast, deterministic AI responses
  - id: quick_classification
    component: /builtin/openai
    input:
      messages: { $from: { step: create_simple_prompt } }
      model: "gpt-3.5-turbo"  # Faster model
      temperature: 0.1        # Low temperature for consistency
      max_tokens: 50          # Short responses

  # Creative AI responses
  - id: creative_writing
    component: /builtin/openai
    input:
      messages: { $from: { step: create_creative_prompt } }
      model: "gpt-4"          # Better model for creativity
      temperature: 0.8        # Higher temperature for variety
      max_tokens: 500         # Longer responses
Validate Inputs Early
Catch errors before expensive operations:
steps:
  # Fast validation first
  - id: validate_request
    component: /validation/request
    input:
      request: { $from: { workflow: input } }

  # Expensive operations only run on valid input
  - id: process_with_ai
    component: /builtin/openai
    input:
      messages: { $from: { step: create_messages_from_valid_request } }
      validated_request: { $from: { step: validate_request } }
Data Management Best Practices
Use Blob Storage Effectively
Store large or reusable data in blobs:
steps:
  # Store large dataset once
  - id: store_dataset
    component: /builtin/put_blob
    input:
      data: { $from: { step: load_large_dataset } }

  # Multiple analyses reference the same blob
  - id: statistical_analysis
    component: /analytics/statistics
    input:
      data_blob: { $from: { step: store_dataset }, path: "blob_id" }

  - id: ml_analysis
    component: /analytics/machine_learning
    input:
      data_blob: { $from: { step: store_dataset }, path: "blob_id" }
Reference Specific Fields
Avoid copying entire large objects:
# ✅ Good - reference specific fields
- id: create_user_summary
  component: /user/summarize
  input:
    user_id: { $from: { step: load_user }, path: "id" }
    user_name: { $from: { step: load_user }, path: "profile.name" }
    user_email: { $from: { step: load_user }, path: "contact.email" }

# ❌ Avoid - copying entire object
- id: create_user_summary
  component: /user/summarize
  input:
    user_data: { $from: { step: load_user } }  # Copies entire user object
Batch Operations When Possible
Process multiple items efficiently:
# ✅ Good - batch processing
- id: process_all_users
  component: /user/batch_process
  input:
    users: { $from: { step: load_users } }
    batch_size: 50

# ❌ Avoid - individual processing (unless parallelism is needed)
- id: process_user_1
  component: /user/process
  input:
    user: { $from: { step: load_users }, path: "users[0]" }
# ... repeated for each user
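
Inside a custom batch component, the `batch_size` input could be honored with a small chunking helper. The sketch below is a hypothetical illustration (the `batches` function is not part of StepFlow) of how 120 users would be split with a batch size of 50.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical input: 120 user identifiers processed 50 at a time.
users = [f"user_{i}" for i in range(120)]
sizes = [len(batch) for batch in batches(users, 50)]
print(sizes)
```

The last batch is simply shorter than the others, so callers do not need special handling for a remainder.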
Schema and Validation Best Practices
Define Clear Schemas
Use comprehensive input and output schemas:
name: "User Data Processor"

input_schema:
  type: object
  properties:
    user_id:
      type: string
      pattern: "^[a-zA-Z0-9]{8,}$"
      description: "Unique user identifier"
    processing_options:
      type: object
      properties:
        include_analytics:
          type: boolean
          default: true
        output_format:
          type: string
          enum: ["json", "xml", "csv"]
          default: "json"
      additionalProperties: false
  required: ["user_id"]
  additionalProperties: false

output_schema:
  type: object
  properties:
    processed_data:
      type: object
      description: "Processed user data"
    metadata:
      type: object
      properties:
        processing_time_ms:
          type: integer
          minimum: 0
        version:
          type: string
      required: ["processing_time_ms", "version"]
  required: ["processed_data", "metadata"]
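
When writing a `pattern` constraint, it helps to try it against a few sample values before relying on it. The following sketch checks the `user_id` pattern from the schema above with Python's standard `re` module; the `is_valid_user_id` helper is hypothetical and only mirrors this one constraint, not full JSON Schema validation.

```python
import re

# Pattern copied from the input_schema above: 8 or more ASCII letters or digits.
USER_ID_PATTERN = re.compile(r"^[a-zA-Z0-9]{8,}$")


def is_valid_user_id(value: str) -> bool:
    """Return True if value satisfies the user_id pattern."""
    # fullmatch rejects a trailing newline, which a bare `$` would allow in Python.
    return USER_ID_PATTERN.fullmatch(value) is not None


samples = ["user1234", "user123", "", "user-1234"]
results = {sample: is_valid_user_id(sample) for sample in samples}
print(results)
```

Only `user1234` passes: `user123` has seven characters, the empty string has none, and `user-1234` contains a hyphen.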
Validate at Step Level
Add validation to individual steps when needed:
steps:
  - id: process_user_data
    component: /user/process
    input_schema:
      type: object
      properties:
        user_data:
          type: object
          properties:
            id: { type: string, minLength: 1 }
            email: { type: string, format: email }
          required: ["id", "email"]
      required: ["user_data"]
    input:
      user_data: { $from: { step: load_user } }
Testing Best Practices
Comprehensive Test Coverage
Test different scenarios and edge cases:
test:
  cases:
    # Happy path
    - name: successful_processing
      description: "Test normal operation with valid input"
      input:
        user_id: "user1234"  # 8+ alphanumeric characters, matching the input schema pattern
        processing_options:
          include_analytics: true
          output_format: "json"
      output:
        outcome: success
        result:
          processed_data: "*"
          metadata:
            processing_time_ms: "*"
            version: "1.0"

    # Error cases
    - name: invalid_user_id
      description: "Test handling of invalid user ID"
      input:
        user_id: ""  # Invalid empty ID
      output:
        outcome: failed
        error:
          code: "VALIDATION_ERROR"

    # Edge cases
    - name: minimal_input
      description: "Test with minimal required input"
      input:
        user_id: "user1234"
      output:
        outcome: success
        result:
          processed_data: "*"
Use Test-Specific Configuration
Create dedicated test configurations:
test:
  stepflow_config: "test/test-config.yml"
  cases:
    # Test cases using mocked components

# test/test-config.yml
plugins:
  - name: builtin
    type: builtin
  - name: mock_external_apis
    type: stepflow
    transport: stdio
    command: "test/mock-server.py"

state_store:
  type: in_memory
Documentation Best Practices
Descriptive Names and Documentation
Use clear, self-documenting names:
name: "Customer Order Processing Pipeline"
description: |
  Processes customer orders through validation, inventory checking,
  payment processing, and fulfillment scheduling. Handles both
  standard and priority orders with appropriate error handling.

steps:
  - id: validate_order_details
    description: "Validate order format, customer info, and product availability"
    component: /order/validate
    input:
      order: { $from: { workflow: input }, path: "order" }

  - id: check_inventory_availability
    description: "Verify all ordered items are in stock"
    component: /inventory/check
    input:
      items: { $from: { step: validate_order_details }, path: "validated_items" }
Include Examples
Provide examples in your workflow documentation:
examples:
  - name: standard_order
    description: "Example of a standard order processing"
    input:
      order:
        customer_id: "cust_12345"
        items:
          - product_id: "prod_001"
            quantity: 2
          - product_id: "prod_002"
            quantity: 1
        shipping_address:
          street: "123 Main St"
          city: "Anytown"
          state: "CA"
          zip: "12345"

  - name: priority_order
    description: "Example of a priority order with expedited processing"
    input:
      order:
        customer_id: "cust_vip"
        priority: true
        items:
          - product_id: "prod_premium"
            quantity: 1
Security Best Practices
Handle Sensitive Data Carefully
Never include sensitive data in workflow definitions:
# ❌ Avoid - hardcoded secrets
steps:
  - id: api_call
    component: /http/request
    input:
      url: "https://api.example.com/data"
      headers:
        Authorization: "Bearer sk-1234567890abcdef"  # Don't do this!

# ✅ Good - use environment variables
steps:
  - id: api_call
    component: /http/request
    input:
      url: "https://api.example.com/data"
      headers:
        Authorization: { $env: "API_TOKEN" }
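
The same rule applies inside custom components: read secrets from the environment at run time and fail early when they are missing. The sketch below assumes a hypothetical helper named `bearer_header` and assumes that `API_TOKEN` holds only the raw token, without the `Bearer ` prefix.

```python
import os
from typing import Dict


def bearer_header(var_name: str = "API_TOKEN") -> Dict[str, str]:
    """Build an Authorization header from an environment variable."""
    token = os.environ.get(var_name)
    if not token:
        # Fail fast with the variable name only; never log the secret itself.
        raise RuntimeError(f"environment variable {var_name} is not set")
    return {"Authorization": f"Bearer {token}"}


# Usage (the value here is a placeholder set only for demonstration):
os.environ["API_TOKEN"] = "example-token"
headers = bearer_header()
print(sorted(headers))
```

Raising on a missing variable surfaces configuration problems at startup instead of as an authentication failure deep inside a run.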
Validate External Input
Always validate data from external sources:
steps:
  - id: validate_external_data
    component: /validation/external
    input:
      data: { $from: { step: fetch_external_data } }
      schema: { $from: { step: load_validation_schema } }

  - id: sanitize_data
    component: /security/sanitize
    input:
      validated_data: { $from: { step: validate_external_data } }
Performance Best Practices
Monitor and Optimize
Track performance metrics and optimize bottlenecks:
steps:
  - id: performance_critical_step
    component: /analytics/heavy_computation
    input:
      data: { $from: { step: load_data } }
    # Consider adding performance monitoring
    metadata:
      performance_critical: true
      expected_duration_ms: 5000
Use Caching When Appropriate
Cache expensive computations:
steps:
  - id: expensive_computation
    component: /analytics/complex
    input:
      data: { $from: { step: prepare_data } }
      cache_key: { $from: { step: generate_cache_key } }
      use_cache: true
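
A cache is only useful if equal inputs always map to the same key. One way a step such as `generate_cache_key` might derive a stable key is to hash a canonical JSON encoding of the input. This is a sketch under that assumption; the `cache_key` function is hypothetical, not a StepFlow API.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(component: str, payload: Dict[str, Any]) -> str:
    """Derive a deterministic key from the component path and its input."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    material = f"{component}\n{canonical}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


key_a = cache_key("/analytics/complex", {"region": "eu", "year": 2024})
key_b = cache_key("/analytics/complex", {"year": 2024, "region": "eu"})
key_c = cache_key("/analytics/complex", {"region": "us", "year": 2024})
print(key_a == key_b, key_a == key_c)
```

Including the component path in the hashed material keeps two components with identical inputs from sharing a cache entry.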
Maintenance Best Practices
Version Your Workflows
Use semantic versioning for workflows:
name: "User Processing Pipeline"
version: "2.1.0"
description: |
  Version 2.1.0: Added optional analytics processing
  Version 2.0.0: Redesigned with new component architecture
  Version 1.x.x: Legacy processing pipeline
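
Under semantic versioning, a change in the major number signals a breaking change, which tooling can detect by comparing parsed versions rather than raw strings. The helpers below are hypothetical and handle only plain `MAJOR.MINOR.PATCH` strings, with no pre-release or build suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_breaking_upgrade(old: str, new: str) -> bool:
    """Return True when the major version increases between old and new."""
    return parse_version(new)[0] > parse_version(old)[0]


# Tuples compare numerically, so 2.10.0 sorts after 2.9.0 (strings would not).
newer = parse_version("2.10.0") > parse_version("2.9.0")
breaking = is_breaking_upgrade("1.4.2", "2.0.0")
compatible = is_breaking_upgrade("2.0.0", "2.1.0")
print(newer, breaking, compatible)
```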
Keep Dependencies Updated
Regularly review and update component dependencies:
# Document component versions or requirements
metadata:
  component_requirements:
    openai_component: ">=1.2.0"
    data_processor: "^2.0.0"
  last_updated: "2024-01-15"
  updated_by: "dev-team"
Refactor When Needed
Regularly review and refactor workflows:
- Remove unused steps: Clean up workflows periodically
- Combine related operations: Merge steps that always run together
- Split complex steps: Break down overly complex operations
- Update deprecated patterns: Migrate to newer best practices
Common Anti-Patterns to Avoid
Don't Create Monolithic Workflows
# ❌ Avoid - one giant workflow doing everything
name: "Everything Processor"
steps:
  # 50+ steps doing unrelated things

# ✅ Good - focused, composable workflows
name: "User Data Processor"
steps:
  # 5-10 related steps for user processing
Don't Ignore Error Handling
# ❌ Avoid - no error handling
steps:
  - id: critical_operation
    component: /external/api
    input:
      data: { $from: { workflow: input } }
    # What happens if the API is down?

# ✅ Good - comprehensive error handling
steps:
  - id: critical_operation
    component: /external/api
    on_error:
      action: retry
      max_attempts: 3
      fallback:
        action: use_default
        default_value: { status: "unavailable" }
    input:
      data: { $from: { workflow: input } }
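
The retry-then-fallback behavior described by `on_error` above can be pictured as a simple loop. The sketch below is a generic illustration, not StepFlow's implementation: `call_with_retry` and `flaky_api` are hypothetical names, and real retry logic would usually add a delay between attempts.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T], max_attempts: int, default: T) -> T:
    """Try operation up to max_attempts times, then fall back to default."""
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Swallow the failure and try again until attempts run out.
            continue
    return default


# Hypothetical API call that fails twice before succeeding.
calls: Dict[str, int] = {"count": 0}


def flaky_api() -> Dict[str, str]:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("API is down")
    return {"status": "ok"}


result = call_with_retry(flaky_api, max_attempts=3, default={"status": "unavailable"})
print(result, calls["count"])
```

With `max_attempts=3` the third call succeeds. With a lower limit the function would return the default value instead.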
Don't Overcomplicate Simple Operations
# ❌ Avoid - overengineering simple tasks
steps:
  - id: extract_field_setup
    component: /config/setup
  - id: extract_field_validate
    component: /validation/check
  - id: extract_field_execute
    component: /data/extract
  - id: extract_field_cleanup
    component: /cleanup/finalize

# ✅ Good - simple extraction
steps:
  - id: extract_field
    component: /extract
    input:
      data: { $from: { step: load_data } }
      path: "user.email"
Following these best practices will help you create robust, maintainable, and efficient StepFlow workflows that scale well and are easy to debug and modify.