TL;DR
Claude Opus 4.7 (`claude-opus-4-7`) is Anthropic’s most capable GA model. It supports a 1M-token context window, 128K max output tokens, adaptive thinking, a new `xhigh` effort level, task budgets, high-resolution vision up to 3.75 MP, and tool use. This guide shows how to set up the API and implement the main capabilities in Python, TypeScript, and cURL.
Introduction
Anthropic released Claude Opus 4.7 on April 16, 2026. It is the most powerful model in the Claude family and is designed for complex reasoning, autonomous agents, and vision-heavy workflows.
If you already use the Claude API, the Messages API will look familiar. The main code changes are:
- Extended thinking budgets are no longer supported.
- Sampling parameters such as `temperature`, `top_p`, and `top_k` are no longer supported.
- Thinking now uses only adaptive thinking.
- Thinking is off by default.
- `display: "summarized"` is required if you want thinking content returned.
This guide walks through API setup, authentication, basic requests, adaptive thinking, high-resolution images, tool use, task budgets, streaming, prompt caching, and multi-turn conversations. It also shows how to test these payloads with Apidog.
Getting Started
1. Get your API key
Create an API key from Anthropic Console:
- Sign up at console.anthropic.com
- Open API Keys
- Click Create Key
- Copy the key
Store it as an environment variable:
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
2. Install the SDK
Python:
pip install anthropic
TypeScript / Node.js:
npm install @anthropic-ai/sdk
3. Use the Messages API endpoint
All requests go to:
POST https://api.anthropic.com/v1/messages
Required headers:
x-api-key: YOUR_API_KEY
anthropic-version: 2023-06-01
content-type: application/json
Basic Text Request
Use this as your smoke test before adding tools, images, streaming, or thinking.
Python
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
messages=[
{
"role": "user",
"content": "Explain how HTTP/2 server push works in three sentences."
}
]
)
print(message.content[0].text)
TypeScript
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const message = await client.messages.create({
model: "claude-opus-4-7",
max_tokens: 1024,
messages: [
{
role: "user",
content: "Explain how HTTP/2 server push works in three sentences.",
},
],
});
console.log(message.content[0].text);
cURL
curl https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d '{
"model": "claude-opus-4-7",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "Explain how HTTP/2 server push works in three sentences."
}
]
}'
Adaptive Thinking
Adaptive thinking lets Claude allocate reasoning tokens dynamically based on task complexity.
It is not enabled by default. Add a thinking object to the request:
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=16384,
thinking={
"type": "adaptive",
"display": "summarized"
},
messages=[
{
"role": "user",
"content": """Analyze this algorithm's time complexity and suggest optimizations:
def find_pairs(arr, target):
result = []
for i in range(len(arr)):
for j in range(i+1, len(arr)):
if arr[i] + arr[j] == target:
result.append((arr[i], arr[j]))
return result"""
}
]
)
for block in message.content:
if block.type == "thinking":
print("Thinking:", block.thinking)
elif block.type == "text":
print("Response:", block.text)
Key implementation notes:
- Use `thinking={"type": "adaptive"}` to enable adaptive thinking.
- Do not set `budget_tokens`; it returns a `400` error.
- Use `display: "summarized"` if you want thinking content in the response.
- If `display` is omitted, thinking is not returned.
- Use `output_config.effort` to influence reasoning depth.
Control reasoning depth with effort
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=16384,
thinking={"type": "adaptive"},
output_config={"effort": "xhigh"},
messages=[
{
"role": "user",
"content": "Review this pull request for security vulnerabilities..."
}
]
)
Supported effort levels:
| Level | Best for |
|---|---|
| `xhigh` | Coding, agentic tasks, complex reasoning |
| `high` | Most intelligence-sensitive work |
| `medium` | Balanced speed vs. quality |
| `low` | Simple tasks and fast responses |
High-Resolution Vision
Opus 4.7 accepts images up to 2,576 pixels on the long edge, or 3.75 megapixels. Coordinates map 1:1 to actual pixels.
Analyze an image from a URL
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "url",
"url": "https://example.com/architecture-diagram.png"
}
},
{
"type": "text",
"text": "Describe this architecture diagram. List every service and the connections between them."
}
]
}
]
)
print(message.content[0].text)
Analyze a local image with base64
import base64
import anthropic
client = anthropic.Anthropic()
with open("screenshot.png", "rb") as f:
image_data = base64.standard_b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": image_data
}
},
{
"type": "text",
"text": "What UI bugs do you see in this screenshot?"
}
]
}
]
)
print(message.content[0].text)
Higher-resolution images consume more tokens. Resize images before sending them if you do not need full visual fidelity.
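For example, a small helper with Pillow (`pip install pillow`) can cap the long edge at the 2,576 px limit before you base64-encode the file; the file paths here are placeholders:

```python
from PIL import Image

MAX_LONG_EDGE = 2576  # long-edge limit described above

def resize_for_claude(src: str, dst: str) -> None:
    """Downscale an image so its longest edge fits within the model limit."""
    with Image.open(src) as img:
        scale = MAX_LONG_EDGE / max(img.size)
        if scale < 1:  # only shrink, never upscale
            new_size = (round(img.width * scale), round(img.height * scale))
            img = img.resize(new_size, Image.Resampling.LANCZOS)
        img.save(dst)

resize_for_claude("screenshot.png", "screenshot_small.png")
```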
Tool Use
Tool use lets Claude call functions you define. Opus 4.7 tends to make fewer tool calls by default and may prefer to reason its way to an answer instead; increase effort when you want stronger tool-use behavior.
Define a tool
tools = [
{
"name": "get_weather",
"description": "Get current weather for a city. Returns temperature, conditions, and humidity.",
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name, e.g. 'San Francisco'"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["city"]
}
}
]
Run a tool-use request
import json
import anthropic
client = anthropic.Anthropic()
tools = [
{
"name": "get_weather",
"description": "Get current weather for a city. Returns temperature, conditions, and humidity.",
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name, e.g. 'San Francisco'"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["city"]
}
}
]
messages = [
{
"role": "user",
"content": "What's the weather like in Tokyo right now?"
}
]
# First call: Claude requests a tool
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
tools=tools,
messages=messages,
)
if response.stop_reason == "tool_use":
messages.append({
"role": "assistant",
"content": response.content
})
tool_results = []
for block in response.content:
if block.type == "tool_use":
# Execute your real function here.
result = {
"temperature": 22,
"conditions": "Partly cloudy",
"humidity": 65
}
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": json.dumps(result)
})
messages.append({
"role": "user",
"content": tool_results
})
# Second call: Claude uses the tool result
final_response = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
tools=tools,
messages=messages,
)
print(final_response.content[0].text)
Agentic Loop Pattern
For autonomous agents, keep calling the model until it stops requesting tools.
def run_agent(system_prompt: str, tools: list, user_message: str) -> str:
messages = [
{
"role": "user",
"content": user_message
}
]
while True:
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=16384,
system=system_prompt,
tools=tools,
thinking={"type": "adaptive"},
output_config={"effort": "xhigh"},
messages=messages,
)
messages.append({
"role": "assistant",
"content": response.content
})
if response.stop_reason != "tool_use":
return "".join(
block.text
for block in response.content
if hasattr(block, "text")
)
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
})
messages.append({
"role": "user",
"content": tool_results
})
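The loop calls an `execute_tool` helper that this guide does not define. One possible sketch of that dispatcher, with a placeholder `get_weather` implementation:

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    # Placeholder; call your real weather backend here.
    return {"temperature": 22, "conditions": "Partly cloudy", "humidity": 65}

TOOL_HANDLERS = {
    "get_weather": get_weather,
}

def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool_use block to the matching local function."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**tool_input))
```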
Task Budgets (Beta)
Task budgets give Claude a token allowance for an entire agentic loop. The model sees a running countdown and can wrap up work as the budget is consumed.
response = client.beta.messages.create(
model="claude-opus-4-7",
max_tokens=128000,
output_config={
"effort": "high",
"task_budget": {
"type": "tokens",
"total": 128000
},
},
messages=[
{
"role": "user",
"content": "Review the codebase and propose a refactor plan."
}
],
betas=["task-budgets-2026-03-13"],
)
Important constraints:
- Minimum budget: 20,000 tokens
- Advisory, not a hard cap
- Claude may overshoot the budget
- Different from `max_tokens`, which is a hard ceiling the model cannot see
- Requires the beta header `task-budgets-2026-03-13`
Streaming Responses
Use streaming for chat UIs, CLIs, and long-running responses.
Python
with client.messages.stream(
model="claude-opus-4-7",
max_tokens=4096,
messages=[
{
"role": "user",
"content": "Write a Python function to parse CSV files with error handling."
}
]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
TypeScript
const stream = await client.messages.stream({
model: "claude-opus-4-7",
max_tokens: 4096,
messages: [
{
role: "user",
content: "Write a Python function to parse CSV files with error handling.",
},
],
});
for await (const event of stream) {
if (
event.type === "content_block_delta" &&
event.delta.type === "text_delta"
) {
process.stdout.write(event.delta.text);
}
}
If adaptive thinking is enabled with `display: "summarized"`, thinking blocks stream before the final text response. If `display` is omitted, users may see a pause while the model reasons, followed by the text response.
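If you want to render the two phases separately in a UI, branch on the delta type in the event stream. Here is a minimal Python sketch; it assumes thinking summaries arrive as `thinking_delta` content-block deltas, mirroring the extended-thinking streaming format:

```python
with client.messages.stream(
    model="claude-opus-4-7",
    max_tokens=16384,
    thinking={"type": "adaptive", "display": "summarized"},
    messages=[
        {"role": "user", "content": "Plan a zero-downtime database migration."}
    ],
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                # Reasoning summary streams first.
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                # Final answer follows.
                print(event.delta.text, end="", flush=True)
```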
Prompt Caching
Use prompt caching for repeated context, such as long system prompts, codebase summaries, or documents.
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
system=[
{
"type": "text",
"text": "You are a senior code reviewer. Review code for security vulnerabilities, performance issues, and best practices violations...",
"cache_control": {
"type": "ephemeral"
}
}
],
messages=[
{
"role": "user",
"content": """Review this function:
def process_user_input(data):
return eval(data)"""
}
]
)
Cache pricing for Opus 4.7:
| Operation | Cost |
|---|---|
| 5-minute cache write | $6.25 / MTok, 1.25x base |
| 1-hour cache write | $10 / MTok, 2x base |
| Cache read / hit | $0.50 / MTok, 0.1x base |
Each cache read replaces a $5/MTok input charge with a $0.50 one, saving $4.50/MTok, so a single read more than covers the 5-minute write premium of $1.25/MTok over base, and two reads cover the 1-hour premium of $5/MTok.
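The arithmetic behind that claim, taken straight from the table (base input is $5/MTok, per the pricing reference below):

```python
BASE_INPUT = 5.00   # $/MTok, base input price
CACHE_READ = 0.50   # $/MTok per cache read
WRITE_5M = 6.25     # $/MTok, 5-minute cache write
WRITE_1H = 10.00    # $/MTok, 1-hour cache write

savings_per_read = BASE_INPUT - CACHE_READ   # 4.50 saved per cached MTok
premium_5m = WRITE_5M - BASE_INPUT           # 1.25 -> one read covers it
premium_1h = WRITE_1H - BASE_INPUT           # 5.00 -> two reads cover it
print(savings_per_read, premium_5m, premium_1h)
```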
Multi-Turn Conversations
Maintain conversation state by appending each user and assistant turn to the messages array.
messages = []
# Turn 1
messages.append({
"role": "user",
"content": "I need to build a REST API for a todo app."
})
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
messages=messages,
)
messages.append({
"role": "assistant",
"content": response.content
})
# Turn 2
messages.append({
"role": "user",
"content": "Add authentication with JWT tokens."
})
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
messages=messages,
)
print(response.content[0].text)
Testing Your API Calls with Apidog
Building a Claude API integration usually involves complex payloads: multi-turn messages, tool definitions, tool results, base64 images, beta headers, and streaming responses. Apidog can help you inspect and debug those requests visually.
Set up a Claude API request in Apidog:
- Create a new project in Apidog.
- Add the Claude Messages API endpoint.
- Store `ANTHROPIC_API_KEY` as an environment variable.
- Add the required headers: `x-api-key`, `anthropic-version`, and `content-type`.
- Save reusable request bodies for basic text, vision, tool use, and streaming scenarios.
Test tool-use flows
Tool use usually requires at least two API calls:
- Send the initial user message.
- Inspect Claude’s `tool_use` block.
- Execute your function outside the model.
- Send a `tool_result` block back.
- Read Claude’s final answer.
Apidog lets you chain these requests so you can simulate the full loop and inspect each payload.
Compare models
Run the same request against `claude-opus-4-6` and `claude-opus-4-7` to compare:
- Token counts
- Response quality
- Latency
- Tool-use behavior
Apidog’s test runner makes these comparisons repeatable.
Validate schemas
Define JSON schemas for expected response formats and validate responses automatically. This helps catch regressions when you change prompts, tools, or model versions.
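You can replicate the same check in code. A sketch with the `jsonschema` package (`pip install jsonschema`); the schema shape here is hypothetical, standing in for whatever structure you prompt the model to return:

```python
from jsonschema import ValidationError, validate

# Hypothetical schema for a structured code-review response.
review_schema = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "findings": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["severity", "findings"],
}

def check_response(payload: dict) -> bool:
    """Return True if the model's JSON output matches the expected schema."""
    try:
        validate(instance=payload, schema=review_schema)
        return True
    except ValidationError as err:
        print(f"Schema violation: {err.message}")
        return False
```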
Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| `400: thinking.budget_tokens not supported` | Using extended thinking syntax | Switch to `thinking: {"type": "adaptive"}` |
| `400: temperature not supported` | Setting unsupported sampling parameters | Remove `temperature`, `top_p`, and `top_k` |
| `400: max_tokens exceeded` | New tokenizer produces more tokens | Increase `max_tokens`, up to 128,000 |
| `429: Rate limited` | Too many requests | Implement exponential backoff and check your tier limits |
| Blank thinking blocks | Thinking display defaults to omitted | Add `display: "summarized"` to the thinking config |
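For the `429` row, a minimal retry sketch with exponential backoff and jitter, using the Python SDK’s `RateLimitError`:

```python
import random
import time

import anthropic

client = anthropic.Anthropic()

def create_with_backoff(max_retries: int = 5, **kwargs):
    """Call messages.create, retrying rate-limited requests with backoff."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus jitter before retrying.
            time.sleep(2 ** attempt + random.random())
```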
Pricing Reference
| Usage | Cost |
|---|---|
| Input tokens | $5 / MTok |
| Output tokens | $25 / MTok |
| Batch input | $2.50 / MTok |
| Batch output | $12.50 / MTok |
| Cache reads | $0.50 / MTok |
| 5-minute cache writes | $6.25 / MTok |
| 1-hour cache writes | $10 / MTok |
Opus 4.7’s new tokenizer may use up to 35% more tokens for the same text than Opus 4.6 did. Use the `/v1/messages/count_tokens` endpoint to estimate costs before deploying to production.
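The Python SDK wraps that endpoint as `client.messages.count_tokens`. A quick sketch:

```python
import anthropic

client = anthropic.Anthropic()

# Estimate input tokens before sending the real request.
count = client.messages.count_tokens(
    model="claude-opus-4-7",
    messages=[
        {"role": "user", "content": "Explain how HTTP/2 server push works."}
    ],
)
print(count.input_tokens)
```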
Conclusion
Claude Opus 4.7 keeps the familiar Messages API shape but changes how reasoning is configured. Remove extended thinking budgets and unsupported sampling parameters, then use adaptive thinking, effort, task budgets, high-resolution vision, and tool use where they fit your workflow.
A practical implementation path:
- Start with a basic text request.
- Add adaptive thinking for complex reasoning.
- Add tool use for external actions and data retrieval.
- Use task budgets for long-running agentic loops.
- Stream responses for better UX.
- Use prompt caching for repeated context.
- Test requests, tool loops, and schemas with Apidog before shipping.