This document explains how token limits for Claude 3.7 models are handled in Cursor, where the relevant code is located, and how to modify these limits if needed.
```mermaid
flowchart TD
A[User starts chat with Claude] --> B{Is model limit cached?}
B -->|Yes| C[Use cached token limit]
B -->|No| D[Call getEffectiveTokenLimit API]
D --> E[Cache result for 24 hours]
E --> F[Apply token limit]
G[User sends message] --> H[Calculate token count]
H --> I{Token count > limit?}
I -->|Yes| J[Show warning or error]
I -->|No| K[Process normally]
```
```mermaid
flowchart LR
A[workbench.desktop.main.js] --> B[Token limit UI warnings]
A --> C[Default fallback limits]
A --> D[Token counting logic]
E[extensionHostProcess.js] --> F[API definitions]
E --> G[Model metadata]
H[extensionHostWorkerMain.js] --> I[Worker process limits]
```
- Default Token Limit: `resources/app/out/vs/workbench/workbench.desktop.main.js` (line ~2586). Sets the default context token limit to 30,000 tokens.
- Token Limit API: `resources/app/out/vs/workbench/workbench.desktop.main.js` (line ~612). Contains the `getEffectiveTokenLimit` function.
- Claude Model Definitions: `resources/app/out/vs/workbench/workbench.desktop.main.js` (lines ~2587-2589). Contains references to Claude 3.7 models and variants.
- Model Response Handlers: `resources/app/out/vs/workbench/api/node/extensionHostProcess.js` (line ~139). Handles model metadata, including token limits.
The code analysis reveals that the primary difference between regular Claude 3.7 Sonnet and the "Max" variants is the context window size. This is intentionally configured in the application:
```mermaid
flowchart TD
A[Claude 3.7 Models] --> B{Max Variant?}
B -->|No| C[Standard Context Window]
B -->|Yes| D[Large 200K Context Window]
D --> E[Special flag: isLongContextOnly]
E --> F[Higher getEffectiveTokenLimit value]
```
- Model Configuration Flags: the code at `resources/app/out/vs/workbench/workbench.desktop.main.js` (~line 572) shows other Claude models with 200K context using the `isLongContextOnly` flag:

  ```js
  {name:"claude-3-5-sonnet-200k", defaultOn:!0, isLongContextOnly:!0, supportsAgent:!0}
  ```

- Context Window Access Control: in `workbench.desktop.main.js` (~line 74), there is a reference to `longContextOpenAIModel:"claude-3-5-sonnet-200k"`, showing that some Claude models are specifically flagged for long context
- Token Limit Resolution Flow: the `getEffectiveTokenLimit` function (line ~612) determines the token limit for a model by:
  - Checking a cache first
  - If not cached, calling the server API
  - Falling back to 200,000 tokens if the API fails

  This system allows token limits for different models to be configured dynamically without hardcoding them.
- No Hard-Coded Distinction: there is no explicit client code setting different token limits for `claude-3.7-sonnet` vs. `claude-3.7-sonnet-max`; instead, the `getEffectiveTokenLimit` API call retrieves these limits from the server
Based on the codebase patterns, when the app calls `getEffectiveTokenLimit` for a "Max" variant:
- The server recognizes the model as premium
- The server returns a higher token limit value (likely 200K)
- The client stores this value in cache for 24 hours
- All subsequent token limit checks use this cached higher value
The regular Claude 3.7 variant follows the same pattern but receives a lower token limit from the server.
The actual token limit values are determined server-side, not in the client code. This allows Cursor to:
- Update token limits without client changes
- Configure different limits for different user tiers
- Apply model-specific restrictions
- Handle API-level token limit enforcement
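The resolution flow described above can be sketched as follows. This is an illustrative reconstruction, not the minified implementation: the cache shape and the `fetchLimitFromServer` callback are assumptions.

```javascript
// Illustrative sketch of the token-limit resolution flow: cache first,
// then server, then a hard fallback. The cache shape and
// fetchLimitFromServer are hypothetical stand-ins for the minified
// logic around getEffectiveTokenLimit in workbench.desktop.main.js.
const CACHE_TTL_MS = 864e5; // 24 hours, matching the constant seen in the code
const FALLBACK_LIMIT = 200000; // used when the server request fails

const limitCache = new Map(); // modelName -> { limit, fetchedAt }

async function getEffectiveTokenLimit(modelName, fetchLimitFromServer, now = Date.now()) {
  const cached = limitCache.get(modelName);
  if (cached && now - cached.fetchedAt < CACHE_TTL_MS) {
    return cached.limit; // 1. check the cache first
  }
  try {
    const limit = await fetchLimitFromServer(modelName); // 2. ask the server
    limitCache.set(modelName, { limit, fetchedAt: now });
    return limit;
  } catch {
    return FALLBACK_LIMIT; // 3. fall back to 200,000 tokens
  }
}
```

Under this scheme the Max variant simply receives a larger number from the server; the client-side logic is identical for both variants.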
To give regular Claude 3.7 access to the same 200K context window as the Max variant, you would need to:
- Server-side change (ideal): modify the server API to return the same token limit for regular Claude 3.7 as it does for the Max variant
- Client-side hack: override the limit directly in the client bundle:

  ```js
  // Find the getEffectiveTokenLimit function in workbench.desktop.main.js (~line 612)
  async getEffectiveTokenLimit(e) {
    const n = e.modelName;
    // Add this condition to override for Claude 3.7
    if (n === "claude-3.7-sonnet") {
      return 200000; // Same limit as Max
    }
    // Rest of the original function...
  }
  ```

- Cache manipulation: manipulate the client-side token limit cache so that it always returns the Max limit value for the regular variant
```mermaid
flowchart LR
A[Find default limit] --> B[Modify hard-coded value]
B --> C[Rebuild application]
```
To increase the default fallback limit (used when the server doesn't provide a specific value):
- Locate the line in `workbench.desktop.main.js` where the default limit is set (~line 2586)
- Change `3e4` (30,000) to your desired value
- Rebuild the application
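For reference, `3e4` is ordinary JavaScript exponent notation, so the patch amounts to swapping one numeric literal. The variable names below are purely illustrative; the minified bundle uses the bare literal.

```javascript
// 3e4 is exponent notation for 30,000; raising the default fallback to,
// say, 200,000 means replacing it with 2e5. Names here are illustrative.
const oldDefaultLimit = 3e4; // 30,000 tokens
const newDefaultLimit = 2e5; // 200,000 tokens
```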
```mermaid
flowchart LR
A[Intercept API response] --> B[Modify token limit value]
B --> C[Return modified response]
```
If you have access to the server implementation:
- Find the server-side implementation of `getEffectiveTokenLimit`
- Adjust the returned token limit for Claude 3.7 models
- Deploy the server changes
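Since the server code is not visible from the client bundle, the exact shape is unknown, but the change could be as simple as the model-to-limit mapping sketched below. The map, values, and function name are assumptions for illustration.

```javascript
// Hypothetical server-side mapping; the real implementation is not
// visible from the client bundle, so names and values are assumptions.
const MODEL_TOKEN_LIMITS = {
  "claude-3.7-sonnet": 200000, // raised to match the Max variant
  "claude-3.7-sonnet-max": 200000,
  "claude-3.7-sonnet-thinking": 200000,
  "claude-3.7-sonnet-thinking-max": 200000,
};

function getEffectiveTokenLimit(modelName) {
  // Unknown models fall back to a conservative default
  return MODEL_TOKEN_LIMITS[modelName] ?? 30000;
}
```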
```mermaid
flowchart LR
A[Locate token limit cache] --> B[Inject custom values]
B --> C[Ensure cache persistence]
```
To override the client-side cache:
- Find the token limit cache storage in `workbench.desktop.main.js` (~line 612)
- Modify how the application stores or retrieves these values
- Ensure your changes persist across application restarts
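Assuming the cache is reachable as a plain map-like object (which would need verifying against the minified bundle), the override could be a thin wrapper around its lookup. Everything below, including the cache entry shape, is a hypothetical sketch.

```javascript
// Illustrative cache-override shim: wrap whatever lookup the client uses
// so the regular variant always resolves to the Max limit. The cache
// object and entry shape here are assumptions, not the actual bundle code.
function overrideTokenLimitCache(cache) {
  const originalGet = cache.get.bind(cache);
  cache.get = (modelName) => {
    if (modelName === "claude-3.7-sonnet") {
      // Pretend the Max limit is freshly cached for the regular variant
      return { limit: 200000, fetchedAt: Date.now() };
    }
    return originalGet(modelName);
  };
  return cache;
}
```

Note that such a shim lives only in memory; it would have to be re-applied on every application start to persist.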
```mermaid
flowchart TD
A[Claude 3.7 Sonnet Models] --> B[claude-3.7-sonnet]
A --> C[claude-3.7-sonnet-max]
A --> D[claude-3.7-sonnet-thinking]
A --> E[claude-3.7-sonnet-thinking-max]
B -->|Base model| F[Standard token limit]
C -->|Max version| G[Potentially higher limit]
D -->|Thinking version| H[Similar to base model]
E -->|Thinking + Max| I[Highest potential limit]
```
Based on code analysis from the Cursor codebase, here are the key differences between Claude 3.7 variants:
```mermaid
flowchart LR
A[Claude 3.7 Standard] --> B[Base limits]
A --> C[Default pricing]
D[Claude 3.7 Max] --> E[Higher token limits]
D --> F[Premium pricing]
D --> G[Advanced capabilities]
```
The codebase reveals:
- Max vs. Standard: the code at `workbench.desktop.main.js` (~lines 2587-2589) shows "Max" variants receive special UI treatment with gradient styling, suggesting premium status
- Visual Indicators: Max models use a `"continuous-gradient-container gradient-high"` CSS class for visual distinction
- Different Default Usage: the background composer model defaults to `"claude-3.7-sonnet-thinking-max"` (~line 572), suggesting it is considered the most capable variant
```mermaid
flowchart TB
A[Regular Mode] --> B[Standard processing]
A --> C[Focused on final answers]
D[Thinking Mode] --> E[Shows reasoning process]
D --> F[Thinking level configurable]
D --> G[Supports agent-based workflows]
```
From the codebase:
- Thinking Level Configuration: the `getModeThinkingLevel` function (~line 592) indicates thinking modes can be configured with different levels: "none", "medium", or "high"
- Thinking Time Tracking: the app tracks and displays thinking time metrics (~line 2295)
- UI for Thinking: there is dedicated UI for thinking states: "Planning next moves" vs. "Thought for X seconds" vs. "Stopped thinking"
- Thinking Mode Toggle: code (~line 2788) shows direct conversion between regular and thinking modes:

  ```js
  function Ut(qt) {
    return qt === "claude-3.7-sonnet" ? "claude-3.7-sonnet-thinking"
      : qt === "claude-3.7-sonnet-max" ? "claude-3.7-sonnet-thinking-max"
      : null;
  }
  ```
```mermaid
flowchart TD
A[Model Configuration] --> B{Supports Agents?}
B -->|Yes| C[claude-3.7-sonnet-thinking-max]
B -->|Yes| D[claude-3.7-sonnet-thinking]
B -->|Limited| E[claude-3.7-sonnet]
B -->|Limited| F[claude-3.7-sonnet-max]
```
Key findings:
- Agent Support: the `doesModelSupportAgent` function (~line 2007) is used to check if a model supports agent capabilities
- Tool Use: the code suggests that thinking variants may have better integration with tools and agents
- Background Processing: Max variants, especially "claude-3.7-sonnet-thinking-max", are used for background processing, indicating they may have better performance for complex tasks
The code at workbench.desktop.main.js (~line 2587) shows how model defaults are managed:
```js
c7s = (i, e, t, s, n) => {
  if (i)
    if (e)
      if (i === "claude-3.7-sonnet")
        t("claude-3.7-sonnet-thinking");
      else if (i === "claude-3.7-sonnet-max")
        t("claude-3.7-sonnet-thinking-max");
      // ... other models
    else if (i === "claude-3.7-sonnet-thinking")
      t("claude-3.7-sonnet");
    else if (i === "claude-3.7-sonnet-thinking-max")
      t("claude-3.7-sonnet-max");
  // ... other logic
}
```

This logic shows the close relationship between standard and thinking variants, with easy toggling between them.
```mermaid
sequenceDiagram
participant User
participant UI as Cursor UI
participant API as Cursor API
participant Claude as Claude Model
User->>UI: Types message
UI->>UI: Calculate token count
UI->>UI: Check against limit
alt Token count > limit
UI->>User: Show warning
User->>UI: Send anyway
UI->>API: Submit with token count
API->>Claude: Forward request
Claude->>API: MAX_TOKENS error
API->>UI: Display error
else Token count <= limit
UI->>API: Submit normally
API->>Claude: Forward request
Claude->>API: Process successfully
API->>UI: Return response
UI->>User: Display response
end
```
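The pre-send check in the diagram above can be sketched as follows. The token heuristic and function names are illustrative stand-ins; the real counter and warning UI live in `workbench.desktop.main.js`.

```javascript
// Illustrative pre-send check mirroring the sequence diagram above.
// countTokens is a crude stand-in for the real tokenizer.
function countTokens(text) {
  // Rough heuristic: roughly 4 characters per token
  return Math.ceil(text.length / 4);
}

function checkMessage(text, limit) {
  const tokens = countTokens(text);
  if (tokens > limit) {
    // In the real UI this triggers a warning; the user may still send
    return { ok: false, warning: `Message is ~${tokens} tokens, over the ${limit}-token limit` };
  }
  return { ok: true, tokens };
}
```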
While the exact pricing isn't hardcoded in the client code, there are indications of cost differences:
- Thinking Modes Cost Tracking: the code tracks `processedTokens` and `thinkingTokens` separately, along with their associated costs (`processedCost` and `thinkingCost`)
- Max Variants Premium Status: the UI treatment of Max variants (with gradient styling) suggests they are premium offerings
- API Key Requirements: there are references to Claude API keys, suggesting some models may require specific authentication or be available at different pricing tiers
- Token limits are dynamically fetched from the server to allow for updates without client changes
- The client caches these limits for 24 hours (`864e5`, i.e. 86,400,000 milliseconds)
- If the server request fails, a default of 200,000 tokens is used
- The UI displays warnings when approaching the token limit
- The system includes specific handling for MAX_TOKENS errors
- Each model variant may have different token limits
Based on analysis of the codebase, here's how Cursor integrates with Claude 3.7 APIs and implements tool usage:
```mermaid
sequenceDiagram
participant User
participant Renderer as Renderer Process
participant ExtHost as Extension Host Process
participant API as Cursor API Servers
participant Claude as Anthropic API
User->>Renderer: Enters request with tool context
Renderer->>ExtHost: IPC message with request
ExtHost->>API: Forward request to API server
API->>Claude: Format request for Anthropic
Claude->>API: Stream response with tool calls
API->>ExtHost: Forward tool call requests
ExtHost->>ExtHost: Execute tool
ExtHost->>API: Send tool results back
API->>Claude: Continue with tool results
Claude->>API: Complete response
API->>ExtHost: Forward final response
ExtHost->>Renderer: Render result to user
Renderer->>User: Display response and tool outputs
```
- Server Endpoints:
  - Primary API endpoints: `https://api2.cursor.sh`, `https://api3.cursor.sh`, and `https://api4.cursor.sh`
  - These servers act as proxies to Anthropic's API at `https://api.anthropic.com/v1/messages`
- Process Communication:
  - Cursor uses a multi-process architecture based on Electron
  - Inter-Process Communication (IPC) connects the Renderer process and the Extension Host process
  - The Extension Host handles direct communication with Claude via the API servers
- Tool Definition System:
  - Tools are defined using the VSCode extension API
  - Each tool has a schema, name, and invoke function
  - Interfaces such as `LanguageModelTool` and `LanguageModelToolCallPart` are defined in `vscode.d.ts`
- Tool Call Processing:
  - When Claude makes a tool call, it returns a `LanguageModelToolCallPart` in the response stream
  - The Extension Host process executes the tool using `invokeTool`
  - Results are returned as `LanguageModelToolResultPart` objects
  - Tool execution can be synchronous or asynchronous
- Tool Types Available:
  - File System Tools: reading, writing, and navigating files
  - Search Tools: code search, file search, and grep functionality
  - Terminal Command Tools: executing shell commands
  - Web Search Tools: external web search integration
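The tool round trip described above can be mimicked in plain JavaScript. The registry shape, tool names, and dispatch function below are hypothetical; the real flow goes through the VSCode `LanguageModelTool` / `LanguageModelToolCallPart` interfaces and `invokeTool`.

```javascript
// Minimal sketch of the tool-call round trip: the model emits a tool
// call, the host looks up and executes the tool, and the result is
// returned for the next model turn. The registry shape is an assumption.
const toolRegistry = new Map();

function registerTool(name, invoke) {
  toolRegistry.set(name, invoke);
}

async function handleToolCall(toolCall) {
  const invoke = toolRegistry.get(toolCall.name);
  if (!invoke) throw new Error(`Unknown tool: ${toolCall.name}`);
  const result = await invoke(toolCall.input);
  // In Cursor this would be wrapped as a LanguageModelToolResultPart
  return { toolCallId: toolCall.id, result };
}
```

A hypothetical usage: `registerTool("grep", async ({ pattern }) => searchCodebase(pattern))`, after which any `{ name: "grep", ... }` call from the model is dispatched through `handleToolCall`.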
Cursor formats requests to Claude using the Anthropic API format:
```js
{
  messages: [
    { role: "user", content: [{ type: "text", text: userMessage }] },
    // Previous messages in conversation
  ],
  model: "claude-3.7-sonnet", // Or variant
  max_tokens: tokenLimit, // From getEffectiveTokenLimit
  system: systemPrompt, // Contains Cursor-specific instructions
  tools: [
    // Tool definitions based on available tools
    {
      name: "toolName",
      description: "Tool description",
      input_schema: { /* JSON schema */ },
    }
  ]
}
```

- Token Limit Errors:
  - Code includes specific handling for `CLAUDE_IMAGE_TOO_LARGE` (error code 31)
  - MAX_TOKENS errors trigger UI warnings and suggestions
- API Communication Errors:
  - Connection failures are handled with retry logic
  - Error codes are mapped to user-friendly messages
  - Custom fallbacks exist for common failure scenarios
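The retry behavior mentioned above might look roughly like the exponential-backoff wrapper below. This is entirely illustrative; the actual retry counts and delays are not visible in the bundle.

```javascript
// Illustrative retry-with-backoff wrapper; the real retry parameters
// used by Cursor are assumptions here.
async function withRetry(requestFn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await requestFn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Exponential backoff: 500 ms, 1 s, 2 s, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError; // all attempts failed
}
```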
- Key Management:
  - Users can provide their own Anthropic API keys through the `useClaudeKey` setting
  - Keys are stored securely using the application storage service
  - Sample API key usage appears in code snippets for both PowerShell and curl
- Key-based Features:
  - When a personal key is used, different error handling may apply
  - Premium features may be available based on the user's Anthropic account tier
This comprehensive analysis shows how Cursor integrates Claude 3.7 models using a sophisticated multi-process architecture, with detailed tool support and robust error handling capabilities.