deploy --production
Everything you need to ship Claude-powered applications that are reliable, cost-efficient, and ready for real users.
cat architecture.md
The most robust pattern for production Claude apps is a server-side proxy: your frontend calls your own backend, which calls the Anthropic API. This keeps your API key off the client and gives you one place to add auth, rate limiting, logging, and caching.
For simple use cases, a Next.js API route or a lightweight Express server works well. For higher scale, consider a dedicated microservice that handles all LLM calls with its own queue and retry logic.
# Request flow
User Browser
↓ POST /api/chat
Your Server (Next.js API Route)
↓ Validates request, checks auth
↓ Adds system prompt
↓ POST /v1/messages
Anthropic API
↓ Response / Stream
Your Server
↓ Logs usage, handles errors
User Browser
// Next.js App Router API route
import { NextRequest, NextResponse } from "next/server";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

export async function POST(req: NextRequest) {
  const { message } = await req.json();
  if (typeof message !== "string" || message.length === 0 || message.length > 4000) {
    return NextResponse.json({ error: "Invalid input" }, { status: 400 });
  }

  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: process.env.SYSTEM_PROMPT,
    messages: [{ role: "user", content: message }],
  });

  // content is a union of block types — narrow to a text block before reading .text
  const block = response.content[0];
  const reply = block.type === "text" ? block.text : "";
  return NextResponse.json({ reply });
}
try {} catch { handle() }
The Anthropic API can return several error types. Handle each specifically rather than catching everything generically. The SDK exports typed error classes for each case.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// delay, calculateBackoff, truncateMessages, and logger are your own helpers

try {
  const response = await client.messages.create({ /* ... */ });
} catch (err) {
  // Rate limited — retry with backoff
  if (err instanceof Anthropic.RateLimitError) {
    await delay(calculateBackoff(attempt));
  }
  // Auth failed — check API key
  else if (err instanceof Anthropic.AuthenticationError) {
    logger.error("API key invalid or missing");
  }
  // Invalid request, e.g. context window exceeded — shorten the conversation
  else if (err instanceof Anthropic.BadRequestError) {
    messages = await truncateMessages(messages);
  }
  // Transient server error — safe to retry
  else if (err instanceof Anthropic.InternalServerError) {
    await delay(5000); // wait longer before retrying
  }
  // Unknown — rethrow so the caller can log it and surface an error to the user
  else throw err;
}
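The rate-limit and server-error branches above both call for retries, so it is worth centralizing that logic in a wrapper. A minimal sketch — the `withRetry` and `backoffMs` names, the attempt cap, and the `isRetryable` predicate are choices for this example, not part of the SDK:

```typescript
// Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
export function backoffMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` while `isRetryable(err)` is true, up to `maxAttempts` tries;
// rethrow anything non-retryable immediately.
export async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      await delay(backoffMs(attempt));
    }
  }
}
```

You would call it as `withRetry(() => client.messages.create(...), (err) => err instanceof Anthropic.RateLimitError || err instanceof Anthropic.InternalServerError)`.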
cat rate-limits.md
Rate limits apply per API key, not per user. If you have many concurrent users, you will hit limits faster than you expect. Strategies to manage this:
- Queue requests — use a queue like BullMQ to serialize API calls and prevent burst spikes
- Cache aggressively — identical prompts get identical responses; cache them in Redis for 1–24 hours
- Use Haiku for high volume — switch to claude-3-5-haiku-latest for classification, tagging, and other simple tasks
- Monitor rate-limit headers — check the anthropic-ratelimit-tokens-remaining response header to detect approaching limits early
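The caching strategy can be sketched end to end: hash the request inputs into a key, check the cache before calling the API, and store the reply afterwards. A minimal in-memory version, assuming `cacheKey` and `ResponseCache` as illustrative names (in production you would swap the Map for Redis with a TTL):

```typescript
import { createHash } from "node:crypto";

// Identical prompt inputs produce identical keys.
export function cacheKey(model: string, system: string, message: string): string {
  return createHash("sha256").update(JSON.stringify([model, system, message])).digest("hex");
}

// Minimal in-memory TTL cache; replace with Redis (SET key val EX ttl) at scale.
export class ResponseCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): string | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expiresAt < Date.now()) return undefined;
    return hit.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Before calling the API, check `cache.get(cacheKey(model, system, message))`; on a miss, make the call and `cache.set` the reply.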
vim system-prompt.txt
Your system prompt is the single highest-leverage variable in a production Claude app. A well-crafted system prompt produces consistent, predictable output; a vague one produces unreliable behavior that is hard to debug.
Key principles for production system prompts:
- Be specific about format — tell Claude exactly what output format you need (JSON, markdown, plain text)
- Define the persona — describe who Claude is in this context, its tone, and its limitations
- Handle edge cases explicitly — what should Claude do if the user asks something out of scope?
- Version your prompts — treat system prompts like code; track changes, test before deploying
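Putting those principles together, a versioned system prompt might look like the sketch below. The persona ("Duka", a hypothetical e-commerce store) and the exact format rules are made up for illustration; the point is the structure: explicit format, persona, out-of-scope behavior, and a version constant tracked alongside the prompt.

```typescript
// v3 — changelog: v2 added out-of-scope handling; v3 tightened the JSON format.
export const SYSTEM_PROMPT_VERSION = "v3";

export const SYSTEM_PROMPT = `You are a customer-support assistant for Duka, an e-commerce store.

Format: respond with JSON only: {"answer": string, "confidence": "high" | "low"}.
Tone: concise and friendly; no marketing language.
Out of scope: if asked about anything other than orders, shipping, or returns,
set "answer" to "I can only help with orders, shipping, and returns."`;
```

Log `SYSTEM_PROMPT_VERSION` with every request so you can correlate behavior changes with prompt changes.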
claude --cost-optimize
API costs scale with tokens (input + output). On a free side project this is negligible, but at scale it becomes significant. These techniques reduce cost without sacrificing quality:
- Right-size your model — use claude-3-5-haiku-latest for classification, extraction, and simple Q&A. Reserve claude-sonnet-4-5 for tasks that need deeper reasoning.
- Limit max_tokens — set it to the expected output length, not the maximum. A 200-token limit for a product description costs far less than 4096.
- Truncate conversation history — only pass the last N messages in multi-turn chats. Old context is usually irrelevant and burns tokens.
- Use prompt caching — for long, repeated system prompts, Anthropic's prompt caching feature can reduce input token costs by up to 90%.
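History truncation has one subtlety: after slicing off old turns, the window must still begin with a user message so roles alternate correctly. A sketch, with `truncateHistory` as a hypothetical helper:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Keep only the last `n` messages, then drop leading assistant turns so the
// window starts with a user message and roles still alternate.
export function truncateHistory(messages: Msg[], n: number): Msg[] {
  let window = messages.slice(-n);
  while (window.length > 0 && window[0].role !== "user") {
    window = window.slice(1);
  }
  return window;
}
```

For the prompt-caching bullet, the API accepts the system prompt as an array of blocks, and marking a block with `cache_control: { type: "ephemeral" }` makes repeated calls reuse the cached prefix.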
chmod 600 .env
A compromised API key means someone else burns your credits and potentially your reputation. Treat it with the same care as a database password.
# API Key Security
✓ Store in environment variables only
✓ Never in source code or git history
✓ Use secrets manager in production (AWS Secrets, GCP Secret Manager)
✓ Rotate keys periodically
✓ Set up usage alerts in Anthropic Console
# Prompt Injection Defense
✓ Validate and sanitize user input
✓ Never concatenate user input directly into system prompt
✓ Use separate message roles (system vs user)
✓ Rate-limit per user, not just per API key
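The last item, per-user rate limiting, can be as simple as a fixed-window counter keyed by user ID. A minimal in-process sketch (`UserRateLimiter` is an illustrative name; a multi-instance deployment would keep the counters in Redis instead):

```typescript
// Fixed-window limiter: at most `limit` requests per user per `windowMs`.
export class UserRateLimiter {
  private counts = new Map<string, { count: number; windowStart: number }>();
  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(userId: string, now = Date.now()): boolean {
    const entry = this.counts.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}
```

In the API route, reject with a 429 when `allow(userId)` returns false, before any tokens are spent.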
tail -f production.log
You cannot improve what you do not measure. Log these metrics for every API call:
- Input and output token counts (for cost tracking)
- Response latency (p50, p95, p99)
- stop_reason — flag any non-end_turn stops for investigation
- Error rate by type (rate limit, auth, server error)
- User feedback signals if you surface thumbs up/down
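Most of these fields come straight off the Messages API response (`response.usage.input_tokens`, `response.usage.output_tokens`, and `response.stop_reason` are real SDK fields); a small helper keeps the log entries structured. The `usageLogEntry` name and `UsageLog` shape are this example's own:

```typescript
type UsageLog = {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  stopReason: string;
};

// Build a structured log entry from the object returned by client.messages.create(...).
export function usageLogEntry(
  model: string,
  response: {
    usage: { input_tokens: number; output_tokens: number };
    stop_reason: string | null;
  },
  latencyMs: number,
): UsageLog {
  return {
    model,
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    latencyMs,
    stopReason: response.stop_reason ?? "unknown",
  };
}
```

Emit one entry per call to your logger, and alert on any `stopReason` other than `end_turn`.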
./pre-launch-checklist.sh
- API key stored in environment variables — never in code
- Retry logic with exponential backoff on rate limit errors
- Request timeout set (recommended: 30–120 seconds)
- Input validation before sending to API
- Response validation — check stop_reason and content type
- Token usage logged for cost monitoring
- Error states handled gracefully in the UI
- Streaming used for long responses (>500 tokens)
- System prompt tuned and versioned
- Load tested before launching to users
cat ./next-steps.md
Claude API Guide →
Authentication, models, streaming, and tool use — the complete API reference.
Advanced Workflows →
Agentic patterns, plan mode, git worktrees, and more Claude Code techniques.
Attend a Workshop →
Build with Claude alongside other Kenyan developers at our in-person events.
Stuck on something in production? Join our Discord — community members share production tips and debug issues together. Also check the community blog for real-world lessons from Kenyan developers shipping Claude apps.