Model Evaluation
Find more details about how to choose the right model on our Model Evaluation page.
Why LLM Choice Matters
- Accuracy: Better models provide more reliable element detection and action planning
- Speed: Faster models reduce automation latency
- Cost: Different providers offer varying pricing structures
- Reliability: Structured output support ensures consistent automation behavior
Small models on Ollama struggle with consistent structured outputs. While technically supported, we don’t recommend them for production Stagehand workflows.
Environment Variables Setup
Set up your API keys before configuring Stagehand:
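A minimal sketch of loading these keys from a local `.env` file, assuming you use the `dotenv` package (the variable names shown are conventional examples; match them to your provider):

```typescript
// Sketch: load API keys from a local .env file before configuring Stagehand.
// The variable names below are conventional examples, not requirements.
import "dotenv/config";

const apiKey =
  process.env.GOOGLE_API_KEY ??    // Google Gemini models
  process.env.OPENAI_API_KEY ??    // OpenAI models
  process.env.ANTHROPIC_API_KEY;   // Anthropic models

if (!apiKey) {
  throw new Error("Set the API key for your chosen LLM provider before running Stagehand");
}
```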
Supported Providers
Stagehand supports major LLM providers with structured output capabilities:
Production-Ready Providers
| Provider | Best Models | Strengths | Use Case |
| --- | --- | --- | --- |
| OpenAI | `gpt-4.1`, `gpt-4.1-mini` | High accuracy, reliable | Production, complex sites |
| Anthropic | `claude-3-7-sonnet-latest` | Excellent reasoning | Complex automation tasks |
| Google | `gemini-2.5-flash`, `gemini-2.5-pro` | Fast, cost-effective | High-volume automation |
Additional Providers
Basic Configuration
Model Name Format
Stagehand uses the format `provider/model-name` for model specification.
Examples:
- OpenAI: `openai/gpt-4.1`
- Anthropic: `anthropic/claude-3-7-sonnet-latest`
- Google: `google/gemini-2.5-flash` (Recommended)
Quick Start Examples
- Google (Recommended)
- OpenAI
- Anthropic
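For the providers listed above, a quick-start sketch looks roughly like this (shown for the recommended Google model; the option values are illustrative, and the other providers follow the same pattern with their own model string and API key):

```typescript
import { Stagehand } from "@browserbasehq/stagehand";

// Sketch of a Google (recommended) configuration; swap the model string and
// API key to use OpenAI or Anthropic instead.
const stagehand = new Stagehand({
  env: "LOCAL",
  modelName: "google/gemini-2.5-flash", // or "openai/gpt-4.1", "anthropic/claude-3-7-sonnet-latest"
  modelClientOptions: {
    apiKey: process.env.GOOGLE_API_KEY, // OPENAI_API_KEY / ANTHROPIC_API_KEY for the other providers
  },
});

await stagehand.init();
```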
Custom LLM Integration
Custom LLMs are currently only supported in TypeScript.
Vercel AI SDK
The Vercel AI SDK is a popular library for interacting with LLMs. You can use any provider supported by the Vercel AI SDK to create a client for your model, as long as it supports structured outputs. The Vercel AI SDK includes providers for OpenAI, Anthropic, and Google, along with support for Amazon Bedrock and Azure OpenAI.
To get started, install the `ai` package and the provider package you want to use. For example, to use Amazon Bedrock, install the `@ai-sdk/amazon-bedrock` package. You'll also need to import the Vercel AI SDK external client, which is exposed as `AISdkClient`, to create a client for your model.
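Putting this together, a sketch of wiring Stagehand to a Bedrock model through the Vercel AI SDK might look like the following, after installing the `ai` and `@ai-sdk/amazon-bedrock` packages (the `AISdkClient` import path and the Bedrock model ID are assumptions; check the Stagehand examples for the current export):

```typescript
// Assumption: AISdkClient is exported from the main package in your Stagehand version.
import { Stagehand, AISdkClient } from "@browserbasehq/stagehand";
import { bedrock } from "@ai-sdk/amazon-bedrock";

// Wrap a Vercel AI SDK model in the AISdkClient and hand it to Stagehand.
const stagehand = new Stagehand({
  env: "LOCAL",
  llmClient: new AISdkClient({
    // Any AI SDK model with structured output support works here;
    // this Bedrock model ID is illustrative.
    model: bedrock("anthropic.claude-3-5-sonnet-20240620-v1:0"),
  }),
});

await stagehand.init();
```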
Troubleshooting
Common Issues
Model doesn't support structured outputs
Error: `Model does not support structured outputs`
Solution: Use models that support function calling/structured outputs. The minimum requirements are:
- Model must support JSON/structured outputs
- Model must have strong reasoning capabilities
- Model must be able to handle complex instructions
Recommended models:
- OpenAI: GPT-4 series or newer
- Anthropic: Claude 3 series or newer
- Google: Gemini 2 series or newer
- Other providers: Latest models with structured output support
Authentication errors
Error: `Invalid API key` or `Unauthorized`
Solution:
- Verify your environment variables are set correctly
- Check API key permissions and quotas
- Ensure you’re using the correct API key for the provider
- For Anthropic, make sure you have access to the Claude API
Inconsistent automation results
Symptoms: Actions work sometimes but fail other times
Causes & Solutions:
- Weak models: Use more capable models - check our Model Evaluation page for current recommendations
- High temperature: Set temperature to 0 for deterministic outputs
- Complex pages: Switch to models with higher accuracy scores on our Model Evaluation page
- Rate limits: Implement retry logic with exponential backoff (see the sketch after this list)
- Context limits: Reduce page complexity or use models with larger context windows
- Prompt clarity: Ensure your automation instructions are clear and specific
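For the rate-limit case, a generic retry wrapper with exponential backoff can be applied around individual actions; this is a sketch, not a built-in Stagehand feature, and the attempt count and delays are arbitrary:

```typescript
// Sketch: retry a flaky LLM-backed action with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage (assumes an initialized Stagehand page):
// await withRetry(() => page.act("Click the 'Sign in' button"));
```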
Slow performance
Issue: Automation takes too long to respond
Solutions:
- Use fast models: Choose models optimized for speed
  - Any model with < 1s response time
  - Models with “fast” or “flash” variants
- Optimize settings (see the configuration sketch after this list):
  - Use `verbose: 0` to minimize token usage
  - Set temperature to 0 for fastest processing
  - Keep max tokens as low as possible
- Consider local deployment: Local models can provide the lowest latency
- Batch operations: Group multiple actions when possible
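These settings map onto the Stagehand constructor roughly as follows; `verbose` is a real option, while the commented temperature and token-limit fields are assumptions that depend on the LLM client you use:

```typescript
import { Stagehand } from "@browserbasehq/stagehand";

// Sketch of latency-oriented settings.
const stagehand = new Stagehand({
  env: "LOCAL",
  modelName: "google/gemini-2.5-flash", // a "flash"-style model for lower latency
  verbose: 0,                           // minimize logging overhead
  modelClientOptions: {
    apiKey: process.env.GOOGLE_API_KEY,
    // temperature: 0,                  // assumption: only if your client accepts it here
    // maxTokens: 1024,                 // assumption: keep responses short if supported
  },
});
```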
High costs
Issue: LLM usage costs are too high
Cost Optimization Strategies:
- Switch to cost-effective models:
  - Check our Model Evaluation page for current cost-performance benchmarks
  - Choose models with lower cost per token that still meet accuracy requirements
  - Consider models optimized for speed to reduce total runtime costs
- Optimize token usage:
  - Set `verbose: 0` to reduce logging overhead
  - Use concise prompts and limit response length
- Smart model selection: Start with cheaper models and fall back to premium ones only when needed (see the sketch after this list)
- Cache responses: Implement LLM response caching for repeated automation patterns
- Monitor usage: Set up billing alerts and track costs per automation run
- Batch processing: Process multiple similar tasks together
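One way to implement the cheaper-model-first strategy is to catch failures from a low-cost model and retry with a stronger one; this sketch assumes API keys are already available via environment variables, and the model choices are illustrative:

```typescript
import { Stagehand } from "@browserbasehq/stagehand";

// Sketch: try a low-cost model first, then fall back to a stronger model on failure.
async function runWithFallback(task: (stagehand: Stagehand) => Promise<void>) {
  const models = ["google/gemini-2.5-flash", "openai/gpt-4.1"]; // cheap first, premium second
  for (const modelName of models) {
    const stagehand = new Stagehand({ env: "LOCAL", modelName });
    try {
      await stagehand.init();
      await task(stagehand);
      return; // success, no need to escalate to the premium model
    } catch (error) {
      console.warn(`Model ${modelName} failed, trying the next option`, error);
    } finally {
      await stagehand.close();
    }
  }
  throw new Error("All configured models failed");
}
```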