Tasks
A task is a coding session. You describe what you want, an agent builds it in a VM, and you review the diff and create a PR. Each task gets its own branch (capy/<task-slug>-<id>). Your main branch stays clean. You can run the same task with different models to compare approaches - each run has its own conversation history and VM.
Tasks move through four states: backlog (planned, editable), in progress (agent working), needs review (work complete), and archived (done).
The task interface has a chat panel for talking to the agent, a file browser for navigating changes, a diff view for reviewing code, terminal output for command results, and a todo list showing the agent’s progress.
Captain vs Build
Captain is for planning. It reads your codebase, creates detailed task specs, and delegates to Build agents. Use it for complex features, multi-step projects, or when you’re not sure how to approach something. Captain can read code, browse the web, view PRs and issues, and create tasks - but it doesn’t edit files or run commands. That’s by design.
Build is for coding. It has full VM access - it edits files, runs commands, installs packages, browses the web, and commits code. It works in an isolated Ubuntu VM with Python, Node.js, TypeScript, Rust, Go, Java, Docker, and common tools pre-installed. Use it when you know exactly what needs to be done.
Start with Captain if you’re not sure - it’ll read your code and figure out the plan. If you already know exactly what you want, go straight to Build.
Prompting tips
The biggest factor in getting good results is how you describe the task. Vague prompts get vague results. Specific prompts get working code. Be explicit. “Add error handling to the payment flow in src/payments/checkout.ts for network failures and invalid card responses” beats “improve the payment flow.”
Reference files and patterns. If your codebase already does something well, point to it: “Follow the pattern in src/services/order.ts.” The agent will read that file and mirror the approach.
Break big things up. One task per feature or fix. “Implement auth, add the dashboard, and set up email notifications” is three tasks. Let Captain break it down, or do it yourself.
Give feedback directly. If the agent gets something wrong, tell it specifically what to change. “Move the validation to a middleware” is better than “this doesn’t look right.”
Here are some strong prompts:
“The useAuth hook in src/hooks/useAuth.ts doesn’t handle token refresh. Add automatic refresh when the token expires, following the error handling pattern in src/hooks/useApi.ts.” - Specific file, specific problem, reference to a pattern.
“Add pagination to the /api/posts endpoint. Use cursor-based pagination like we do in /api/comments. Return 20 items per page with nextCursor in the response.” - Clear requirement, points to existing implementation, specifies the interface.
“The user profile page at src/pages/Profile.tsx crashes when user.avatar is null. Add a fallback avatar and handle the null case in the UserHeader component.” - Describes the bug, names the file, explains the fix.
Handoffs
For long tasks, Build and Captain can use the handoff tool to continue work in a fresh context. Handoff carries forward a concise summary, progress, and next steps so the agent can keep moving without hitting context limits.
As context grows, the agent receives progressive reminders:
- Info: consider handoff soon
- Warning: handoff very soon
- Force: handoff required on the next turn
| Model | Info threshold | Warning threshold | Force threshold |
|---|---|---|---|
| Claude Opus 4.6 | 360K | 380K | 400K |
| Claude Opus 4.6 Fast | 360K | 380K | 400K |
| Claude Opus 4.5 | 160K | 180K | 200K |
| Claude Sonnet 4.6 | 360K | 380K | 400K |
| Claude Sonnet 4.5 | 160K | 180K | 200K |
| Claude Haiku 4.5 | 120K | 140K | 160K |
| GPT-5.4 | 220K | 240K | 260K |
| GPT-5.4-Fast | 160K | 180K | 200K |
| GPT-5.4 Mini | 160K | 180K | 200K |
| GPT-5.3-Codex | 160K | 180K | 200K |
| GPT-5.3-Codex-Fast | 160K | 180K | 200K |
| GPT-5.2-Codex | 160K | 180K | 200K |
| GPT-5.2-Codex-Fast | 160K | 180K | 200K |
| GPT-5.2 | 160K | 180K | 200K |
| GPT-5.2-Fast | 160K | 180K | 200K |
| GPT-5.2 Pro | 160K | 180K | 200K |
| GPT-5.1 | 160K | 180K | 200K |
| GPT-5.1-Codex | 160K | 180K | 200K |
| GPT-5.1-Codex-Max | 160K | 180K | 200K |
| GPT-5 | 160K | 180K | 200K |
| GPT-5-Codex | 160K | 180K | 200K |
| Gemini 3.1 Pro | 160K | 180K | 200K |
| Gemini 3 Pro | 160K | 180K | 200K |
| Gemini 3 Flash | 160K | 180K | 200K |
| Grok 4.1 Fast | 160K | 180K | 200K |
| Grok 4 | 120K | 140K | 160K |
| GLM 5 | 120K | 140K | 160K |
| GLM 5 Turbo | 120K | 140K | 160K |
| GLM 4.7 | 80K | 100K | 120K |
| Kimi K2 | 100K | 130K | 160K |
| Kimi K2.5 | 160K | 180K | 200K |
| Qwen 3 Coder | 120K | 140K | 160K |
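The tiering above can be sketched as a simple threshold check. This is an illustrative sketch only - the type and function names below are hypothetical, not part of Capy’s API:

```typescript
// Hypothetical sketch: map a task's current context size to the handoff
// reminder tier, using the thresholds from the table above.
type Thresholds = { info: number; warning: number; force: number };

// Example: the Claude Opus 4.5 row (values in tokens).
const opus45: Thresholds = { info: 160_000, warning: 180_000, force: 200_000 };

function reminderFor(contextTokens: number, t: Thresholds): string | null {
  if (contextTokens >= t.force) return "force";     // handoff required next turn
  if (contextTokens >= t.warning) return "warning"; // handoff very soon
  if (contextTokens >= t.info) return "info";       // consider handoff soon
  return null;                                      // no reminder yet
}

console.log(reminderFor(170_000, opus45)); // "info"
```

At 170K tokens, an Opus 4.5 task has crossed the info threshold but not the warning one, so the agent sees the gentle “consider handoff soon” reminder.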
Models
Capy supports models from Anthropic, OpenAI, Google, xAI, and more. You can switch models mid-task or run the same task with different models to compare their implementations.
Available models
Prices below show provider API rates. Capy billing adds a markup (see Pricing). Cached token rates are shown in parentheses when available.
| Model | Provider | Context | Input (per 1M, API) | Output (per 1M, API) |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1M | $5.00 (cached $0.5) | $25.00 (cached $6.25) |
| Claude Opus 4.6 Fast | Anthropic | 1M | $30.00 (cached $3.00) | $150.00 (cached $37.50) |
| Claude Opus 4.5 | Anthropic | 200K | $5.00 (cached $0.5) | $25.00 (cached $6.25) |
| Claude Sonnet 4.6 | Anthropic | 1M | $3.00 (cached $0.3) | $15.00 (cached $3.75) |
| Claude Sonnet 4.5 | Anthropic | 1M | ≤200K: $3.00 (cached $0.3) >200K: $6.00 (cached $0.6) | ≤200K: $15.00 (cached $3.75) >200K: $22.50 (cached $7.50) |
| Claude Haiku 4.5 | Anthropic | 200K | $1.00 (cached $0.1) | $5.00 (cached $1.25) |
| GPT-5.4 | OpenAI | 1M | ≤272K: $2.50 (cached $0.25) >272K: $5.00 (cached $0.5) | ≤272K: $15.00 >272K: $22.50 |
| GPT-5.4-Fast | OpenAI | 1M | $5.00 (cached $0.5) | $30.00 |
| GPT-5.4 Mini | OpenAI | 400K | $0.75 (cached $0.075) | $4.50 |
| GPT-5.3-Codex | OpenAI | 400K | $1.75 (cached $0.175) | $14.00 |
| GPT-5.3-Codex-Fast | OpenAI | 400K | $3.50 (cached $0.35) | $28.00 |
| GPT-5.2-Codex | OpenAI | 400K | $1.75 (cached $0.175) | $14.00 |
| GPT-5.2-Codex-Fast | OpenAI | 400K | $3.50 (cached $0.35) | $28.00 |
| GPT-5.2 | OpenAI | 400K | $1.75 (cached $0.175) | $14.00 |
| GPT-5.2-Fast | OpenAI | 400K | $3.50 (cached $0.35) | $28.00 |
| GPT-5.2 Pro | OpenAI | 400K | $21.00 (cached $2.10) | $168.00 |
| GPT-5.1 | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5.1-Codex | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5.1-Codex-Max | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5 | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5-Codex | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| Gemini 3.1 Pro | Google | 1M | ≤200K: $2.00 (cached $0.2) >200K: $4.00 (cached $0.4) | ≤200K: $12.00 >200K: $18.00 |
| Gemini 3 Pro | Google | 1M | ≤200K: $2.00 (cached $0.2) >200K: $4.00 (cached $0.4) | ≤200K: $12.00 >200K: $18.00 |
| Gemini 3 Flash | Google | 1M | $0.5 (cached $0.05) | $3.00 |
| Grok 4.1 Fast | xAI | 2M | ≤128K: $0.2 (cached $0.05) >128K: $0.4 (cached $0.05) | ≤128K: $0.5 >128K: $1.00 |
| Grok 4 | xAI | 256K | ≤128K: $3.00 (cached $0.75) >128K: $6.00 (cached $0.75) | ≤128K: $15.00 >128K: $30.00 |
| GLM 5 | Fireworks | 200K | $1.00 (cached $0.2) | $3.20 |
| GLM 5 Turbo | Z.AI | 203K | $0.96 (cached $0.192) | $3.20 |
| GLM 4.7 | Google Vertex | 131K | $0.6 | $2.20 |
| Kimi K2 | Fireworks | 262K | $0.6 (cached $0.3) | $2.50 |
| Kimi K2.5 | Fireworks | 262K | $0.6 (cached $0.1) | $3.00 |
| Qwen 3 Coder | Google Vertex | 262K | $0.22 (cached $0.022) | $1.80 |
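To see how the tiered rows work, here is a sketch of the input-side arithmetic for the GPT-5.4 row, which charges $2.50 per 1M input tokens at or below 272K and $5.00 per 1M beyond it. The function name is illustrative, and this covers the provider API rate only - Capy’s billing adds a markup on top:

```typescript
// Hypothetical sketch of tiered per-token pricing, using the GPT-5.4 row:
// input tokens cost $2.50/1M when the request is ≤272K tokens, $5.00/1M above.
function gpt54InputCostUsd(inputTokens: number): number {
  const ratePerMillion = inputTokens <= 272_000 ? 2.5 : 5.0;
  return (inputTokens / 1_000_000) * ratePerMillion;
}

// 100K input tokens fall in the lower tier: 0.1M x $2.50 ≈ $0.25.
// 300K input tokens cross into the upper tier: 0.3M x $5.00 ≈ $1.50.
```

Output tokens follow the same shape with their own rates, and cached input tokens are billed at the discounted rate shown in parentheses in the table.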
Quick recommendations
- Quick edits: Haiku 4.5, Gemini 3 Flash, Qwen 3 Coder - fast and cheap for simple changes
- Standard work: Sonnet 4.6, GPT-5.2, GPT-5.2-Codex - good balance of speed and capability
- Complex tasks: Opus 4.6, GPT-5.2 Pro, Grok 4 - best reasoning for architectural decisions
- Large codebases: Sonnet 4.5, Gemini 3 Pro, Grok 4.1 Fast - 1M+ context windows
Extended thinking
Some models support extended thinking (also called reasoning). This makes the model “think longer” before responding - it uses more tokens but produces better results for complex tasks. You can toggle it in the model selector. Models with adaptive thinking adjust reasoning effort automatically. Models with level-based thinking let you set the effort explicitly (minimal, low, medium, high).
Model preferences
Manage your model settings at Settings → Models:
- Default model - set which model new tasks use by default
- Visible models - hide models you don’t use from the model selector
- Reasoning level - configure the default thinking effort