Tasks
A task is a coding session. You describe what you want, an agent builds it in a VM, and you review the diff and create a PR. Each task gets its own branch (capy/<task-slug>-<id>). Your main branch stays clean. You can run the same task with different models to compare approaches - each run has its own conversation history and VM.
Tasks move through four states: backlog (planned, editable), in progress (agent working), needs review (work complete), and archived (done).
The task interface has a chat panel for talking to the agent, a file browser for navigating changes, a diff view for reviewing code, terminal output for command results, and a todo list showing the agent’s progress.
Captain vs Build
Captain is for planning. It reads your codebase, creates detailed task specs, and delegates to Build agents. Use it for complex features, multi-step projects, or when you’re not sure how to approach something. Captain can read code, browse the web, view PRs and issues, and create tasks - but it doesn’t edit files or run commands. That’s by design.
Build is for coding. It has full VM access - it edits files, runs commands, installs packages, browses the web, and commits code. It works in an isolated Ubuntu VM with Python, Node.js, TypeScript, Rust, Go, Java, Docker, and common tools pre-installed. Use it when you know exactly what needs to be done.
Start with Captain if you’re not sure - it’ll read your code and figure out the plan. If you already know exactly what you want, go straight to Build.
Prompting tips
The biggest factor in getting good results is how you describe the task. Vague prompts get vague results. Specific prompts get working code. Be explicit. “Add error handling to the payment flow in src/payments/checkout.ts for network failures and invalid card responses” beats “improve the payment flow.”
Reference files and patterns. If your codebase already does something well, point to it: “Follow the pattern in src/services/order.ts.” The agent will read that file and mirror the approach.
Break big things up. One task per feature or fix. “Implement auth, add the dashboard, and set up email notifications” is three tasks. Let Captain break it down, or do it yourself.
Give feedback directly. If the agent gets something wrong, tell it specifically what to change. “Move the validation to a middleware” is better than “this doesn’t look right.”
Here are some strong prompts:
“The useAuth hook in src/hooks/useAuth.ts doesn’t handle token refresh. Add automatic refresh when the token expires, following the error handling pattern in src/hooks/useApi.ts.” - Specific file, specific problem, reference to a pattern.
“Add pagination to the /api/posts endpoint. Use cursor-based pagination like we do in /api/comments. Return 20 items per page with nextCursor in the response.” - Clear requirement, points to existing implementation, specifies the interface.
“The user profile page at src/pages/Profile.tsx crashes when user.avatar is null. Add a fallback avatar and handle the null case in the UserHeader component.” - Describes the bug, names the file, explains the fix.
Handoffs
For long tasks, Build and Captain can use the handoff tool to continue work in a fresh context. Handoff carries forward a concise summary, progress, and next steps so the agent can keep moving without hitting context limits.
As context grows, the agent receives progressive reminders:
- Info: consider handoff soon
- Warning: handoff very soon
- Force: handoff required on the next turn
| Model | Info threshold | Warning threshold | Force threshold |
|---|---|---|---|
| Claude Opus 4.6 | 360K | 380K | 400K |
| Claude Opus 4.6 Fast | 360K | 380K | 400K |
| Claude Opus 4.5 | 160K | 180K | 200K |
| Claude Sonnet 4.6 | 360K | 380K | 400K |
| Claude Sonnet 4.5 | 160K | 180K | 200K |
| Claude Haiku 4.5 | 120K | 140K | 160K |
| GPT-5.4 | 220K | 240K | 260K |
| GPT-5.4-Fast | 160K | 180K | 200K |
| GPT-5.4 Mini | 160K | 180K | 200K |
| GPT-5.3-Codex | 160K | 180K | 200K |
| GPT-5.3-Codex-Fast | 160K | 180K | 200K |
| GPT-5.2-Codex | 160K | 180K | 200K |
| GPT-5.2-Codex-Fast | 160K | 180K | 200K |
| GPT-5.2 | 160K | 180K | 200K |
| GPT-5.2-Fast | 160K | 180K | 200K |
| GPT-5.2 Pro | 160K | 180K | 200K |
| GPT-5.1 | 160K | 180K | 200K |
| GPT-5.1-Codex | 160K | 180K | 200K |
| GPT-5.1-Codex-Max | 160K | 180K | 200K |
| GPT-5 | 160K | 180K | 200K |
| GPT-5-Codex | 160K | 180K | 200K |
| Gemini 3.1 Pro | 160K | 180K | 200K |
| Gemini 3 Pro | 160K | 180K | 200K |
| Gemini 3 Flash | 160K | 180K | 200K |
| Grok 4.1 Fast | 160K | 180K | 200K |
| Grok 4 | 120K | 140K | 160K |
| GLM 5 | 120K | 140K | 160K |
| GLM 5 Turbo | 120K | 140K | 160K |
| GLM 4.7 | 80K | 100K | 120K |
| Kimi K2 | 100K | 130K | 160K |
| Kimi K2.5 | 160K | 180K | 200K |
| Qwen 3 Coder | 120K | 140K | 160K |
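The tiering above can be sketched as a simple threshold check. This is an illustrative sketch only - the type and function names below are hypothetical, not part of Capy’s API:

```typescript
// Hypothetical sketch: map a task's current context size to the handoff
// reminder tier, using the thresholds from the table above.
type Thresholds = { info: number; warning: number; force: number };

// Example: the Claude Opus 4.5 row (values in tokens).
const opus45: Thresholds = { info: 160_000, warning: 180_000, force: 200_000 };

function reminderFor(contextTokens: number, t: Thresholds): string | null {
  if (contextTokens >= t.force) return "force";     // handoff required next turn
  if (contextTokens >= t.warning) return "warning"; // handoff very soon
  if (contextTokens >= t.info) return "info";       // consider handoff soon
  return null;                                      // no reminder yet
}

console.log(reminderFor(170_000, opus45)); // "info"
```

At 170K tokens, an Opus 4.5 task has crossed the info threshold but not the warning one, so the agent sees the gentle “consider handoff soon” reminder.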
Models
Capy supports models from Anthropic, OpenAI, Google, xAI, and more. You can switch models mid-task or run the same task with different models to compare their implementations.
Available models
Prices below show provider API rates. Capy billing adds a markup (see Pricing). Cached token rates are shown in parentheses when available.
| Model | Provider | Context | Input (per 1M, API) | Output (per 1M, API) |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1M | $5.00 (cached $0.5) | $25.00 (cached $6.25) |
| Claude Opus 4.6 Fast | Anthropic | 1M | $30.00 (cached $3.00) | $150.00 (cached $37.50) |
| Claude Opus 4.5 | Anthropic | 200K | $5.00 (cached $0.5) | $25.00 (cached $6.25) |
| Claude Sonnet 4.6 | Anthropic | 1M | $3.00 (cached $0.3) | $15.00 (cached $3.75) |
| Claude Sonnet 4.5 | Anthropic | 1M | ≤200K: $3.00 (cached $0.3) >200K: $6.00 (cached $0.6) | ≤200K: $15.00 (cached $3.75) >200K: $22.50 (cached $7.50) |
| Claude Haiku 4.5 | Anthropic | 200K | $1.00 (cached $0.1) | $5.00 (cached $1.25) |
| GPT-5.4 | OpenAI | 1M | ≤272K: $2.50 (cached $0.25) >272K: $5.00 (cached $0.5) | ≤272K: $15.00 >272K: $22.50 |
| GPT-5.4-Fast | OpenAI | 1M | $5.00 (cached $0.5) | $30.00 |
| GPT-5.4 Mini | OpenAI | 400K | $0.75 (cached $0.075) | $4.50 |
| GPT-5.3-Codex | OpenAI | 400K | $1.75 (cached $0.175) | $14.00 |
| GPT-5.3-Codex-Fast | OpenAI | 400K | $3.50 (cached $0.35) | $28.00 |
| GPT-5.2-Codex | OpenAI | 400K | $1.75 (cached $0.175) | $14.00 |
| GPT-5.2-Codex-Fast | OpenAI | 400K | $3.50 (cached $0.35) | $28.00 |
| GPT-5.2 | OpenAI | 400K | $1.75 (cached $0.175) | $14.00 |
| GPT-5.2-Fast | OpenAI | 400K | $3.50 (cached $0.35) | $28.00 |
| GPT-5.2 Pro | OpenAI | 400K | $21.00 (cached $2.10) | $168.00 |
| GPT-5.1 | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5.1-Codex | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5.1-Codex-Max | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5 | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| GPT-5-Codex | OpenAI | 400K | $1.25 (cached $0.125) | $10.00 |
| Gemini 3.1 Pro | Google | 1M | ≤200K: $2.00 (cached $0.2) >200K: $4.00 (cached $0.4) | ≤200K: $12.00 >200K: $18.00 |
| Gemini 3 Pro | Google | 1M | ≤200K: $2.00 (cached $0.2) >200K: $4.00 (cached $0.4) | ≤200K: $12.00 >200K: $18.00 |
| Gemini 3 Flash | Google | 1M | $0.5 (cached $0.05) | $3.00 |
| Grok 4.1 Fast | xAI | 2M | ≤128K: $0.2 (cached $0.05) >128K: $0.4 (cached $0.05) | ≤128K: $0.5 >128K: $1.00 |
| Grok 4 | xAI | 256K | ≤128K: $3.00 (cached $0.75) >128K: $6.00 (cached $0.75) | ≤128K: $15.00 >128K: $30.00 |
| GLM 5 | Fireworks | 200K | $1.00 (cached $0.2) | $3.20 |
| GLM 5 Turbo | Z.AI | 203K | $0.96 (cached $0.192) | $3.20 |
| GLM 4.7 | Google Vertex | 131K | $0.6 | $2.20 |
| Kimi K2 | Fireworks | 262K | $0.6 (cached $0.3) | $2.50 |
| Kimi K2.5 | Fireworks | 262K | $0.6 (cached $0.1) | $3.00 |
| Qwen 3 Coder | Google Vertex | 262K | $0.22 (cached $0.022) | $1.80 |
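To see how the tiered rows work, here is a sketch of the input-side arithmetic for the GPT-5.4 row, which charges $2.50 per 1M input tokens at or below 272K and $5.00 per 1M beyond it. The function name is illustrative, and this covers the provider API rate only - Capy’s billing adds a markup on top:

```typescript
// Hypothetical sketch of tiered per-token pricing, using the GPT-5.4 row:
// input tokens cost $2.50/1M when the request is ≤272K tokens, $5.00/1M above.
function gpt54InputCostUsd(inputTokens: number): number {
  const ratePerMillion = inputTokens <= 272_000 ? 2.5 : 5.0;
  return (inputTokens / 1_000_000) * ratePerMillion;
}

// 100K input tokens fall in the lower tier: 0.1M x $2.50 ≈ $0.25.
// 300K input tokens cross into the upper tier: 0.3M x $5.00 ≈ $1.50.
```

Output tokens follow the same shape with their own rates, and cached input tokens are billed at the discounted rate shown in parentheses in the table.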
Quick recommendations
- Quick edits: Haiku 4.5, Gemini 3 Flash, Qwen 3 Coder - fast and cheap for simple changes
- Standard work: Sonnet 4.6, GPT-5.2, GPT-5.2-Codex - good balance of speed and capability
- Complex tasks: Opus 4.6, GPT-5.2 Pro, Grok 4 - best reasoning for architectural decisions
- Large codebases: Sonnet 4.5, Gemini 3 Pro, Grok 4.1 Fast - 1M+ context windows
Extended thinking
Some models support extended thinking (also called reasoning). This makes the model “think longer” before responding - it uses more tokens but produces better results for complex tasks. You can toggle it in the model selector. Models with adaptive thinking adjust reasoning effort automatically. Models with level-based thinking let you set the effort explicitly (minimal, low, medium, high).
Model preferences
Manage your model settings at Settings → Models:
- Default model - set which model new tasks use by default
- Visible models - hide models you don’t use from the model selector
- Reasoning level - configure the default thinking effort