
[ MWS API ]

Sonnet-level AI for ⅕ the cost.

One API, both Anthropic and OpenAI shapes. Drop-in: change your baseURL and your existing SDK works unchanged. Powered by frontier open-weight models, so we're still cheaper than the providers we replace.

One-line drop-in

Already using Anthropic or OpenAI? Add a baseURL override and an MWS API key. Your existing tool calls, streaming, and message shapes work unchanged.

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://api.mws.run/v1",
  apiKey: process.env.MWS_API_KEY,
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});
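
For the OpenAI shape, the sketch below builds the equivalent request by hand so the wire format is visible. It assumes the OpenAI-compatible route sits at /chat/completions under the same base URL and uses Bearer auth, as OpenAI's own API does; neither detail is confirmed on this page. The helper name buildChatCompletionRequest and the model string "gpt-4o" are illustrative.

```typescript
// Assumption: the OpenAI-shaped endpoint is `${MWS_BASE_URL}/chat/completions`
// with Bearer auth, mirroring OpenAI's conventions. Verify before relying on it.
const MWS_BASE_URL = "https://api.mws.run/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assembles an OpenAI-style chat completion request
// without sending it.
function buildChatCompletionRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  maxTokens: number = 1024,
): PreparedRequest {
  return {
    url: `${MWS_BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, max_tokens: maxTokens, messages }),
  };
}

// Same "Hello" prompt as the Anthropic example, in the OpenAI shape.
const helloRequest = buildChatCompletionRequest("YOUR_MWS_API_KEY", "gpt-4o", [
  { role: "user", content: "Hello" },
]);
// To send it:
// await fetch(helloRequest.url, { method: helloRequest.method, headers: helloRequest.headers, body: helloRequest.body });
```

In practice the official OpenAI SDK with a baseURL override should produce the same request; the hand-built version is only meant to show what goes over the wire.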

Models

Two profiles are live today: Sonnet-class (default), a drop-in for claude-sonnet-* model strings at much lower cost, and Pro (Think), for extended reasoning with a 1M-token context window. Pass any Claude or OpenAI model name and we'll route it to the right profile.

Sonnet-class

164K context · 143 tok/s

$0.60 in / $0.90 out per 1M tokens

Pro (Think)

1048K context · 178 tok/s

$3.48 in / $6.96 out per 1M tokens
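
To turn the per-token prices above into a budget figure, a small estimator along these lines may help. The prices are copied from the two profiles listed here; the profile keys "sonnet-class" and "pro-think" and the function name estimateCostUsd are illustrative labels, not API identifiers.

```typescript
// Prices in USD per 1M tokens, copied from the Models section.
// The keys are illustrative labels, not model IDs accepted by the API.
type Profile = "sonnet-class" | "pro-think";

const PRICES: Record<Profile, { inputPerM: number; outputPerM: number }> = {
  "sonnet-class": { inputPerM: 0.6, outputPerM: 0.9 },
  "pro-think": { inputPerM: 3.48, outputPerM: 6.96 },
};

// Estimate the cost of a workload in USD from its input and output token counts.
function estimateCostUsd(
  profile: Profile,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICES[profile];
  return (
    (inputTokens / 1_000_000) * price.inputPerM +
    (outputTokens / 1_000_000) * price.outputPerM
  );
}

// Example workload: 2M input tokens and 500K output tokens.
// Sonnet-class: 2 * 0.60 + 0.5 * 0.90 = about 1.65 USD
const sonnetCost = estimateCostUsd("sonnet-class", 2_000_000, 500_000);
// Pro (Think):  2 * 3.48 + 0.5 * 6.96 = about 10.44 USD
const proCost = estimateCostUsd("pro-think", 2_000_000, 500_000);
```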

Rate limits auto-scale

Your rate limit grows automatically as your account spends and as you upgrade plans. There's no support ticket for higher throughput, and creating extra API keys doesn't increase your capacity — every key under your account draws from the same pool. Use additional keys to separate environments and rotate credentials, not to scale.

Every response carries the six standard x-ratelimit-* headers (compatible with OpenAI SDK retry logic), so your client can self-regulate without guessing or burning a request to discover the ceiling.
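
A client could use those headers roughly as follows. This is a sketch under two assumptions not stated on this page: that the header names match OpenAI's convention (x-ratelimit-limit-requests, x-ratelimit-remaining-requests, x-ratelimit-reset-requests, and the three -tokens counterparts), and that reset values are durations such as "1s" or "6m0s". Check a real response before depending on either. All function names are illustrative.

```typescript
interface RateLimitState {
  limitRequests: number;
  remainingRequests: number;
  resetRequestsMs: number;
  limitTokens: number;
  remainingTokens: number;
  resetTokensMs: number;
}

// Parse a duration such as "250ms", "1s", or "6m0s" into milliseconds.
// Returns 0 when the value is missing or contains no recognizable parts.
function parseDurationMs(value: string | undefined): number {
  if (!value) return 0;
  const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60_000, h: 3_600_000 };
  const pattern = /(\d+(?:\.\d+)?)(ms|s|m|h)/g;
  let total = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(value)) !== null) {
    total += parseFloat(match[1]) * unitMs[match[2]];
  }
  return total;
}

// Read the six headers from a lowercase-keyed header map.
// Header names are assumed to follow OpenAI's convention.
function readRateLimits(headers: Record<string, string | undefined>): RateLimitState {
  const num = (name: string): number => Number(headers[name] ?? 0);
  return {
    limitRequests: num("x-ratelimit-limit-requests"),
    remainingRequests: num("x-ratelimit-remaining-requests"),
    resetRequestsMs: parseDurationMs(headers["x-ratelimit-reset-requests"]),
    limitTokens: num("x-ratelimit-limit-tokens"),
    remainingTokens: num("x-ratelimit-remaining-tokens"),
    resetTokensMs: parseDurationMs(headers["x-ratelimit-reset-tokens"]),
  };
}

// How long to pause before a request expected to use `tokensNeeded` tokens:
// 0 if both budgets have headroom, otherwise the longer of the two reset waits.
function suggestedDelayMs(state: RateLimitState, tokensNeeded: number): number {
  const requestWait = state.remainingRequests < 1 ? state.resetRequestsMs : 0;
  const tokenWait = state.remainingTokens < tokensNeeded ? state.resetTokensMs : 0;
  return Math.max(requestWait, tokenWait);
}
```

Calling readRateLimits on each response and sleeping for suggestedDelayMs before the next call lets a client slow down before it hits the ceiling, rather than reacting to a rejected request.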

Frequently asked

How do I pay?

Pay-as-you-go. Buy credits in dollar amounts ($25 minimum). Credits are denominated in tokens at the cheapest profile's rate, so you always get at least the implied capacity. We charge via Stripe.

What latency / throughput?

Sonnet-class (default): ~143 tok/s, sub-second TTFT in normal conditions. Pro (Think): ~178 tok/s with extended reasoning. Each profile's measured TPS is shown in the Models section above.

Can I keep my code on Anthropic SDK?

Yes, that's the whole point. We translate Anthropic's Messages format (system, content blocks, tool_use, tool_result) to the OpenAI-shaped upstream and back. Streaming, tool calls, and stop reasons all round-trip correctly.
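
To make the translation concrete, here is a minimal sketch of how Anthropic-style messages could map onto OpenAI-style chat messages. It is not the gateway's actual code: the types are simplified (tool_result content is treated as a plain string, and images and streaming are omitted) and the function name toOpenAIMessages is hypothetical.

```typescript
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface AnthropicMessage {
  role: "user" | "assistant";
  content: string | AnthropicBlock[];
}

interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

type OpenAIMessage =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: OpenAIToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Illustrative mapping only; the real gateway may differ in details.
function toOpenAIMessages(
  system: string | undefined,
  messages: AnthropicMessage[],
): OpenAIMessage[] {
  const out: OpenAIMessage[] = [];
  // Anthropic's top-level `system` field becomes a leading system message.
  if (system) out.push({ role: "system", content: system });

  for (const msg of messages) {
    const blocks: AnthropicBlock[] =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content;

    // Concatenate all text blocks into one string.
    let text = "";
    for (const b of blocks) if (b.type === "text") text += b.text;

    if (msg.role === "assistant") {
      // tool_use blocks become OpenAI tool_calls with JSON-encoded arguments.
      const toolCalls: OpenAIToolCall[] = [];
      for (const b of blocks) {
        if (b.type === "tool_use") {
          toolCalls.push({
            id: b.id,
            type: "function",
            function: { name: b.name, arguments: JSON.stringify(b.input) },
          });
        }
      }
      if (toolCalls.length > 0) {
        out.push({ role: "assistant", content: text || null, tool_calls: toolCalls });
      } else {
        out.push({ role: "assistant", content: text });
      }
    } else {
      // tool_result blocks become separate role "tool" messages.
      for (const b of blocks) {
        if (b.type === "tool_result") {
          out.push({ role: "tool", tool_call_id: b.tool_use_id, content: b.content });
        }
      }
      if (text) out.push({ role: "user", content: text });
    }
  }
  return out;
}
```

The key structural difference is that Anthropic carries tool calls and tool results as content blocks inside user and assistant turns, while the OpenAI shape uses a tool_calls field and separate tool-role messages.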

What about prompt caching?

Currently not supported — the underlying providers don't expose an Anthropic-equivalent caching API. cache_control blocks are silently ignored. Even without caching the cost difference vs Sonnet is large enough that most workloads come out ahead.

Do credits expire?

Credits stay valid for 12 months from purchase. After that, unused credits may be removed from your account. We'll email you before that happens.

Can I get a refund?

Unused credits within 30 days of purchase: yes. Consumed tokens: no — we've already paid the upstream for them. Email support@vellora.ai if you have a billing dispute.

Is this stable?

The same gateway has been in production powering our IDE for months. Public API access is in early access — expect occasional incidents and the odd rough edge during the first weeks. Email support@vellora.ai if anything breaks.

Do I need to create multiple API keys to get more throughput?

No. Rate limits are per-account, not per-key, and they scale automatically based on your plan and cumulative spend. Extra keys are useful for separating environments (prod / staging / sandbox) and revoking access independently — they don't add capacity.