ATLAS LLM GATEWAY

Stop treating LLM cost control like five separate projects.

Atlas LLM Gateway is a hosted BYOK API for production teams that need cost-aware model access without building the platform layer themselves. Cache repeat work, route async traffic through batch, reconcile against provider billing, enforce budgets, and track usage per account.

Start with Claude. Keep your provider key. Pay Atlas for the gateway layer: API keys, plan gates, exact and semantic cache, batch execution, usage visibility, reconciliation, budget guards, and idempotency.

WEDGE: Integrated LLM cost control
MODEL: BYOK + flat monthly tier
SURFACE: Hosted API + cost rollups
REQUEST PATH
01 Your app
02 Atlas API key
03 BYOK resolver
04 Sync or batch route
05 Usage ledger

Keep synchronous calls for interactive UX. Route the rest through cache, batch, reconciliation, and budget controls.

THE PROBLEM

The waste is not just model price. It is missing cost-control plumbing.

Production LLM teams usually discover the same backlog: cache duplicate work, move offline jobs to batch, reconcile local usage against provider invoices, stop calls before budgets blow up, and route traffic as providers change.

Each item is solvable. The problem is that solving them one at a time turns a model integration into an internal platform project:

skip duplicate prompts, catch semantic near-duplicates, submit batch jobs, poll terminal status, reconcile provider bills, enforce runtime budgets, route between providers, retry safely, track per-account spend, gate by plan, store customer keys, avoid cross-tenant leakage.

THE WEDGE

One gateway for the five controls teams keep rebuilding.

The first conversation can still be simple: you are already paying Anthropic, and part of that traffic can run cheaper through batch. But the product wedge is bigger than batch alone: the cost controls work together instead of living in five disconnected tools.

01 Exact + semantic cache
Skip paid calls when the same prompt or a near-duplicate request has already been answered.

02 Anthropic batch
Move async traffic like evals, enrichment, backfills, and report jobs onto lower-cost batch execution.

03 Provider reconciliation
Compare the gateway ledger against actual Anthropic and OpenRouter billing instead of trusting local counters blindly.

04 Runtime budget guards
Block calls before they push an account or workload over its configured spend ceiling.

05 Multi-provider routing
Start Claude-first, then route through Anthropic and OpenRouter as traffic and policy needs mature.
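The cache layer described in 01 can be sketched roughly as below. The function names, the injected embedding helper, and the 0.95 similarity threshold are illustrative assumptions for this sketch, not the shipped implementation.

```python
import hashlib

def exact_key(model: str, prompt: str) -> str:
    # Exact cache: hash the model + normalized prompt pair.
    return hashlib.sha256(f"{model}\x00{prompt.strip()}".encode()).hexdigest()

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def lookup(model, prompt, exact_store, semantic_store, embed, threshold=0.95):
    # 1. Exact hit: an identical prompt was already answered.
    key = exact_key(model, prompt)
    if key in exact_store:
        return exact_store[key]
    # 2. Semantic hit: a near-duplicate prompt above the similarity threshold.
    vec = embed(prompt)
    best = max(semantic_store, key=lambda e: cosine(vec, e["vec"]), default=None)
    if best and cosine(vec, best["vec"]) >= threshold:
        return best["answer"]
    # 3. Miss: fall through to a paid provider call.
    return None
```

The key design point is the ordering: the exact check is a cheap hash lookup, so it always runs before the embedding call that the semantic check needs.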

MVP API SURFACE

A small gateway surface around the calls teams already make.

The MVP is Claude-first and built for Anthropic plus OpenRouter routing. It gives production scripts a stable Atlas API key, resolves the customer's provider key server-side, writes account-scoped usage, and keeps retries safe.

METHOD  PATH                     PURPOSE
POST    /api/v1/llm/chat         Synchronous Claude chat proxy using the customer provider key stored server-side.
POST    /api/v1/llm/chat/stream  SSE streaming for user-facing requests that cannot wait on batch completion.
POST    /api/v1/llm/batch        Anthropic Message Batches behind a normal gateway surface with idempotency-key retries.
GET     /api/v1/llm/batch/{id}   Status polling and terminal usage writeback when batch results settle.
GET     /api/v1/llm/usage        Per-account token and cost rollups across cache hits, synchronous calls, and batch traffic.
POST    /api/v1/byok/keys        Customer Anthropic keys encrypted at rest and resolved per request.
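A client call against this surface can be sketched as below. The paths and the Idempotency-Key header come from the table; the base URL, the request body field names, and the response shape are assumptions for the sketch, since the MVP does not publish a schema here.

```python
import json
import uuid

ATLAS_BASE = "https://api.atlas.example"  # placeholder host, not a real endpoint

def build_chat_request(atlas_key: str, model: str, messages: list) -> dict:
    # Synchronous path: POST /api/v1/llm/chat with the long-lived Atlas key.
    return {
        "method": "POST",
        "url": f"{ATLAS_BASE}/api/v1/llm/chat",
        "headers": {
            "Authorization": f"Bearer {atlas_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

def build_batch_request(atlas_key: str, model: str, requests_: list) -> dict:
    # Async path: POST /api/v1/llm/batch with an Idempotency-Key, so a worker
    # that dies mid-submit can retry without creating a duplicate batch.
    return {
        "method": "POST",
        "url": f"{ATLAS_BASE}/api/v1/llm/batch",
        "headers": {
            "Authorization": f"Bearer {atlas_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": str(uuid.uuid4()),
        },
        "body": json.dumps({"model": model, "requests": requests_}),
    }
```

A worker would then poll GET /api/v1/llm/batch/{id} until the batch reaches a terminal status, at which point usage is written back to the ledger.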
THE SUBSTRATE

The boring infrastructure is the product.

Batch savings get the conversation. The reason teams stay is that cache, routing, usage, reconciliation, and budget enforcement are bundled into the same gateway.

Production API keys

Long-lived atls_live_* keys for scripts and services, separate from dashboard JWT sessions.

BYOK provider keys

Customers keep their Anthropic relationship; Atlas encrypts keys at rest and injects them server-side.

Per-account cost ledger

Token, cost, provider, cache, and batch rollups are scoped to the account that made the call.

Plan and rate gates

Trial, starter, growth, and pro tiers control batch access, key count, and request limits.

Safe retries

Idempotency-Key support keeps retry behavior controlled when clients or workers fail mid-request.

Cost-control substrate

Exact cache, semantic cache, batch execution, reconciliation, budget guards, and routing live together.
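The runtime budget guard in this substrate can be sketched as a pre-call check plus a usage writeback. The function names and data shapes are assumptions for illustration; the real ledger lives server-side, not in a dict.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push an account past its spend ceiling."""

def guard(account_id: str, estimated_cost: float,
          ledger: dict, ceilings: dict) -> None:
    # Ledger holds spend-to-date per account; ceilings hold configured limits.
    spent = ledger.get(account_id, 0.0)
    ceiling = ceilings.get(account_id)
    # Block the call *before* it runs, not after the bill arrives.
    if ceiling is not None and spent + estimated_cost > ceiling:
        raise BudgetExceeded(
            f"{account_id}: {spent + estimated_cost:.2f} exceeds ceiling {ceiling:.2f}"
        )

def record(account_id: str, actual_cost: float, ledger: dict) -> None:
    # Write actual usage back so the next guard check sees current spend.
    ledger[account_id] = ledger.get(account_id, 0.0) + actual_cost
```

Checking an estimate before the call and recording the actual cost after it is what lets the guard stop spend mid-month instead of reporting the overrun at reconciliation time.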

Good fit

You already use Claude or plan to use Claude in production.

You have repeat prompts, async jobs, or provider bills large enough for cache and batch savings to matter.

You want provider-key ownership without building key storage, usage tables, budget gates, reconciliation, and billing plumbing.

You need cost visibility per customer, workspace, account, or product area.

Not the first wedge

Every request is unique, user-facing, and must stream immediately.

You only need a thin provider proxy without cache, reconciliation, or budget controls.

You need SOC 2, custom deployment, or enterprise procurement before an MVP trial.

You are trying to avoid having your own provider account or BYOK setup.

PRIVATE ACCESS

Flat monthly tiers, with no token markup to argue about.

The MVP is sold as gateway access. You keep provider billing with Anthropic or OpenRouter, then pay Atlas for the gateway layer that makes cache, batch, usage, BYOK, reconciliation, budgets, and plan controls operational.

VALIDATE FIT

Trial

Confirm the API shape, BYOK setup, cache behavior, usage visibility, and batch workflow on a limited tier.

Request trial access

FIRST PRODUCTION JOBS

Starter

For small teams moving duplicate prompts and async jobs off full-price synchronous calls.

Discuss Starter

HIGHER VOLUME

Growth

For teams with steady LLM traffic that need account-level usage, cache, budget, and reconciliation controls.

Discuss Growth

PLATFORM TEAM

Pro

For production teams that need higher limits, stronger review, and deeper provider-routing intelligence.

Discuss Pro

WHAT COMES NEXT

Batch opens the door. Cost intelligence expands the account.

Once your LLM traffic flows through Atlas, the next layer is automatic batch-vs-sync arbitrage, prompt-level observability, smarter cache policies, and provider routing by cost, latency, and reliability. The MVP starts where ROI is easiest to prove, then grows into routing intelligence.

Request Gateway Access

FAQ

Questions before you route traffic through it.

Is this a model provider?

No. Atlas LLM Gateway starts as a hosted BYOK gateway. You keep your provider relationship; Atlas adds the API surface, cache, batch execution, plan gates, usage tracking, reconciliation, and budget controls around it.

Where do the savings come from?

The first visible wedge is Anthropic Message Batches for traffic that can wait. The broader savings surface also includes exact cache, semantic cache, provider-billing reconciliation, and runtime budget guards that stop spend before it drifts.
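As a back-of-envelope illustration of how these savings stack, consider the sketch below. Every rate in it is an assumption for the example, not quoted pricing; the 50% batch figure reflects Anthropic's published Message Batches discount, and the cache hit rate depends entirely on your traffic.

```python
# Illustrative monthly spend model; all rates are assumptions, not quoted pricing.
sync_spend = 10_000.00     # current full-price synchronous spend ($/month)
cache_hit_rate = 0.20      # share of calls answered from exact/semantic cache
batchable_share = 0.40     # share of remaining traffic that can wait on batch
batch_discount = 0.50      # assumed relative discount for batch execution

cache_savings = sync_spend * cache_hit_rate            # calls never made
after_cache = sync_spend * (1 - cache_hit_rate)        # traffic still paid for
batch_savings = after_cache * batchable_share * batch_discount

total_savings = cache_savings + batch_savings
print(f"estimated monthly savings: ${total_savings:,.2f}")
```

With these assumed rates, a $10,000/month synchronous bill sheds $2,000 to cache and $1,600 to batch before budget guards and reconciliation catch anything at all.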

Do I have to rewrite my whole app?

No. Real-time calls can stay real-time through the chat and streaming endpoints. The biggest early win is moving non-interactive work to the batch endpoint first.

Is OpenRouter supported?

The product is Claude-first and designed around Anthropic plus OpenRouter routing. Together AI, Groq, and other providers are later expansion paths when customer traffic justifies them.

How is Atlas paid if customers bring their own keys?

The MVP is a flat monthly subscription tier for the gateway infrastructure. Provider token billing stays with the customer, which makes the batch-savings wedge easy to verify.

There is no hidden provider-key markup: BYOK means provider billing stays yours. Atlas charges for the gateway infrastructure that makes cache savings, batch savings, reconciliation, budgets, and routing practical.
