deploy --production
Everything you need to ship Claude-powered applications that are reliable, cost-efficient, and ready for real users.
cat architecture.md
The most robust pattern for production Claude apps is a server-side proxy: your frontend calls your own backend, which calls the Anthropic API. This keeps your API key off the client and gives you one place to add auth, rate limiting, logging, and caching.
For simple use cases, a Next.js API route or a lightweight Express server works well. For higher scale, consider a dedicated microservice that handles all LLM calls with its own queue and retry logic.
# Request flow
User Browser
↓ POST /api/chat
Your Server (Next.js API Route)
↓ Validates request, checks auth
↓ Adds system prompt
↓ POST /v1/messages
Anthropic API
↓ Response / Stream
Your Server
↓ Logs usage, handles errors
User Browser
// Next.js App Router API route
import { NextRequest, NextResponse } from "next/server";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

export async function POST(req: NextRequest) {
  const { message } = await req.json();
  if (typeof message !== "string" || message.length === 0 || message.length > 4000) {
    return NextResponse.json({ error: "Invalid input" }, { status: 400 });
  }

  const response = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    system: process.env.SYSTEM_PROMPT,
    messages: [{ role: "user", content: message }],
  });

  // content is a union of block types — narrow to a text block before reading .text
  const block = response.content[0];
  const reply = block.type === "text" ? block.text : "";
  return NextResponse.json({ reply });
}
try {} catch { handle() }
The Anthropic API can return several error types. Handle each specifically rather than catching everything generically. The SDK exports typed error classes for each case.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// delay, calculateBackoff, truncateMessages, and logger are your own helpers

try {
  const response = await client.messages.create({ /* ... */ });
} catch (err) {
  // Rate limited — retry with backoff
  if (err instanceof Anthropic.RateLimitError) {
    await delay(calculateBackoff(attempt));
  }
  // Auth failed — check API key
  else if (err instanceof Anthropic.AuthenticationError) {
    logger.error("API key invalid or missing");
  }
  // Invalid request, e.g. context window exceeded — shorten the conversation
  else if (err instanceof Anthropic.BadRequestError) {
    messages = await truncateMessages(messages);
  }
  // Transient server error — safe to retry
  else if (err instanceof Anthropic.InternalServerError) {
    await delay(5000); // wait longer before retrying
  }
  // Unknown — rethrow so the caller can log it and surface an error to the user
  else throw err;
}
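The rate-limit and server-error branches above both call for retries, so it is worth centralizing that logic in a wrapper. A minimal sketch — the `withRetry` and `backoffMs` names, the attempt cap, and the `isRetryable` predicate are choices for this example, not part of the SDK:

```typescript
// Exponential backoff: 1s, 2s, 4s, ... capped at 30s.
export function backoffMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}

const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` while `isRetryable(err)` is true, up to `maxAttempts` tries;
// rethrow anything non-retryable immediately.
export async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      await delay(backoffMs(attempt));
    }
  }
}
```

You would call it as `withRetry(() => client.messages.create(...), (err) => err instanceof Anthropic.RateLimitError || err instanceof Anthropic.InternalServerError)`.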
cat rate-limits.md
Rate limits apply per API key, not per user. If you have many concurrent users, you will hit limits faster than you expect. Strategies to manage this:
- Queue requests — use a queue like BullMQ to serialize API calls and prevent burst spikes
- Cache aggressively — identical prompts get identical responses; cache them in Redis for 1–24 hours
- Use Haiku for high volume — switch to claude-3-5-haiku-latest for classification, tagging, and other simple tasks
- Monitor rate-limit headers — check the anthropic-ratelimit-tokens-remaining response header to detect approaching limits early
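The caching strategy can be sketched end to end: hash the request inputs into a key, check the cache before calling the API, and store the reply afterwards. A minimal in-memory version, assuming `cacheKey` and `ResponseCache` as illustrative names (in production you would swap the Map for Redis with a TTL):

```typescript
import { createHash } from "node:crypto";

// Identical prompt inputs produce identical keys.
export function cacheKey(model: string, system: string, message: string): string {
  return createHash("sha256").update(JSON.stringify([model, system, message])).digest("hex");
}

// Minimal in-memory TTL cache; replace with Redis (SET key val EX ttl) at scale.
export class ResponseCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  get(key: string): string | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expiresAt < Date.now()) return undefined;
    return hit.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Before calling the API, check `cache.get(cacheKey(model, system, message))`; on a miss, make the call and `cache.set` the reply.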
vim system-prompt.txt
Your system prompt is the single highest-leverage variable in a production Claude app. A well-crafted system prompt produces consistent, predictable output; a vague one produces unreliable behavior that is hard to debug.
Key principles for production system prompts:
- Be specific about format — tell Claude exactly what output format you need (JSON, markdown, plain text)
- Define the persona — describe who Claude is in this context, its tone, and its limitations
- Handle edge cases explicitly — what should Claude do if the user asks something out of scope?
- Version your prompts — treat system prompts like code; track changes, test before deploying
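Putting those principles together, a versioned system prompt might look like the sketch below. The persona ("Duka", a hypothetical e-commerce store) and the exact format rules are made up for illustration; the point is the structure: explicit format, persona, out-of-scope behavior, and a version constant tracked alongside the prompt.

```typescript
// v3 — changelog: v2 added out-of-scope handling; v3 tightened the JSON format.
export const SYSTEM_PROMPT_VERSION = "v3";

export const SYSTEM_PROMPT = `You are a customer-support assistant for Duka, an e-commerce store.

Format: respond with JSON only: {"answer": string, "confidence": "high" | "low"}.
Tone: concise and friendly; no marketing language.
Out of scope: if asked about anything other than orders, shipping, or returns,
set "answer" to "I can only help with orders, shipping, and returns."`;
```

Log `SYSTEM_PROMPT_VERSION` with every request so you can correlate behavior changes with prompt changes.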
claude --cost-optimize
API costs scale with tokens (input + output). On a free side project this is negligible, but at scale it becomes significant. These techniques reduce cost without sacrificing quality:
- Right-size your model — use claude-3-5-haiku-latest for classification, extraction, and simple Q&A. Reserve claude-sonnet-4-5 for tasks that need deeper reasoning.
- Limit max_tokens — set it to the expected output length, not the maximum. A 200-token limit for a product description costs far less than 4096.
- Truncate conversation history — only pass the last N messages in multi-turn chats. Old context is usually irrelevant and burns tokens.
- Use prompt caching — for long, repeated system prompts, Anthropic's prompt caching feature can reduce input token costs by up to 90%.
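History truncation has one subtlety: after slicing off old turns, the window must still begin with a user message so roles alternate correctly. A sketch, with `truncateHistory` as a hypothetical helper:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Keep only the last `n` messages, then drop leading assistant turns so the
// window starts with a user message and roles still alternate.
export function truncateHistory(messages: Msg[], n: number): Msg[] {
  let window = messages.slice(-n);
  while (window.length > 0 && window[0].role !== "user") {
    window = window.slice(1);
  }
  return window;
}
```

For the prompt-caching bullet, the API accepts the system prompt as an array of blocks, and marking a block with `cache_control: { type: "ephemeral" }` makes repeated calls reuse the cached prefix.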
chmod 600 .env
A compromised API key means someone else burns your credits and potentially your reputation. Treat it with the same care as a database password.
# API Key Security
✓ Store in environment variables only
✓ Never in source code or git history
✓ Use secrets manager in production (AWS Secrets, GCP Secret Manager)
✓ Rotate keys periodically
✓ Set up usage alerts in Anthropic Console
# Prompt Injection Defense
✓ Validate and sanitize user input
✓ Never concatenate user input directly into system prompt
✓ Use separate message roles (system vs user)
✓ Rate-limit per user, not just per API key
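The last item, per-user rate limiting, can be as simple as a fixed-window counter keyed by user ID. A minimal in-process sketch (`UserRateLimiter` is an illustrative name; a multi-instance deployment would keep the counters in Redis instead):

```typescript
// Fixed-window limiter: at most `limit` requests per user per `windowMs`.
export class UserRateLimiter {
  private counts = new Map<string, { count: number; windowStart: number }>();
  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(userId: string, now = Date.now()): boolean {
    const entry = this.counts.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}
```

In the API route, reject with a 429 when `allow(userId)` returns false, before any tokens are spent.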
tail -f production.log
You cannot improve what you do not measure. Log these metrics for every API call:
- Input and output token counts (for cost tracking)
- Response latency (p50, p95, p99)
- stop_reason — flag any non-end_turn stops for investigation
- Error rate by type (rate limit, auth, server error)
- User feedback signals if you surface thumbs up/down
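Most of these fields come straight off the Messages API response (`response.usage.input_tokens`, `response.usage.output_tokens`, and `response.stop_reason` are real SDK fields); a small helper keeps the log entries structured. The `usageLogEntry` name and `UsageLog` shape are this example's own:

```typescript
type UsageLog = {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  stopReason: string;
};

// Build a structured log entry from the object returned by client.messages.create(...).
export function usageLogEntry(
  model: string,
  response: {
    usage: { input_tokens: number; output_tokens: number };
    stop_reason: string | null;
  },
  latencyMs: number,
): UsageLog {
  return {
    model,
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    latencyMs,
    stopReason: response.stop_reason ?? "unknown",
  };
}
```

Emit one entry per call to your logger, and alert on any `stopReason` other than `end_turn`.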
./pre-launch-checklist.sh
- API key stored in environment variables — never in code
- Retry logic with exponential backoff on rate limit errors
- Request timeout set (recommended: 30–120 seconds)
- Input validation before sending to API
- Response validation — check stop_reason and content type
- Token usage logged for cost monitoring
- Error states handled gracefully in the UI
- Streaming used for long responses (>500 tokens)
- System prompt tuned and versioned
- Load tested before launching to users
cat ./next-steps.md
Claude API Guide →
Authentication, models, streaming, and tool use — the complete API reference.
Advanced Workflows →
Agentic patterns, plan mode, git worktrees, and more Claude Code techniques.
Attend a Workshop →
Build with Claude alongside other Kenyan developers at our in-person events.
Stuck on something in production? Join our Discord — community members share production tips and debug issues together. Also check the community blog for real-world lessons from Kenyan developers shipping Claude apps.