
[ API documentation ]

Build on MWS

Everything you need to drop the MWS gateway into your stack. If something here is unclear, email support@vellora.ai.

Base URL

All endpoints live under:

https://api.mws.run/v1

Authentication

Every request needs a Bearer token in the Authorization header. Generate one in the dashboard after you've added at least $25 in credits or subscribed to a plan.

Authorization: Bearer mws_live_<your_api_key>
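As a rough illustration of the header format above, the sketch below assembles a request for POST /v1/messages. The helper name buildMessagesRequest and the MwsRequest shape are hypothetical, and the Content-Type header is an assumption (the usual choice for JSON bodies); only the base URL, the path, and the Bearer format come from this page.

```typescript
// Hypothetical helper: only BASE_URL, the /messages path, and the
// "Authorization: Bearer ..." format are taken from the documentation.
const BASE_URL = "https://api.mws.run/v1";

interface MwsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMessagesRequest(apiKey: string, prompt: string): MwsRequest {
  return {
    url: `${BASE_URL}/messages`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      // Assumption: the gateway expects a JSON body with this content type.
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "dragon",
      max_tokens: 256,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

To send it, split off the URL and pass the rest to fetch, for example: const { url, ...init } = buildMessagesRequest(key, "Hello"); const res = await fetch(url, init);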

Endpoints

  • POST /v1/messages — Anthropic Messages format (recommended for new code).
  • POST /v1/messages/count_tokens — Approximate token count (rough estimate; Anthropic-equivalent tokenizer not used).
  • POST /v1/chat/completions — OpenAI Chat Completions format. Streaming and tool-use supported.
  • GET /v1/models — List available Dragon profiles.
  • POST /v1/embeddings — Embedding generation.
  • POST /v1/rerank — Document reranking.

Streaming

Set stream: true in the request body. Anthropic-shape responses use Anthropic's SSE event types (message_start, content_block_delta, etc.). OpenAI-shape responses use standard data: {...}\n\n chunks, terminated by data: [DONE].

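// Assumes `client` is an Anthropic SDK client already configured for MWS (see the Migration guide).
const stream = client.messages.stream({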
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Stream me a poem." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
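For the OpenAI-shape stream, the chunk framing described above can be decoded by hand when no SDK is in use. The following is a minimal sketch, not the gateway's reference parser: parseOpenAiSse is a hypothetical name, and it assumes the whole response body has already been buffered into one string. A real streaming consumer would also need to handle chunks that arrive split across network reads.

```typescript
// Hypothetical helper: decodes a fully buffered OpenAI-shape SSE body.
// Events are separated by a blank line; each carries one "data:" payload.
function parseOpenAiSse(raw: string): unknown[] {
  const chunks: unknown[] = [];
  for (const event of raw.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue; // skip blanks and non-data lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    chunks.push(JSON.parse(payload));
  }
  return chunks;
}
```

Anything after the [DONE] sentinel is ignored, which matches the termination rule stated above.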

Tool use

Both API shapes support function/tool calling. Tool-call IDs are preserved verbatim across turns, so the standard request → tool_use → tool_result → end_turn flow works unchanged from the underlying SDK.

const tools = [
  {
    name: "get_weather",
    description: "Look up the weather for a city.",
    input_schema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];

const first = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  tools,
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
});

// first.stop_reason === "tool_use"
// Reply with a tool_result and call again to get the final answer.
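The follow-up turn mentioned in the comment above could be assembled roughly as follows. This is a sketch under assumptions: appendToolResults and the runTool callback are hypothetical names, and the block types are a simplified subset of the Anthropic Messages format (text, tool_use, tool_result) written out locally so the example stands alone.

```typescript
// Simplified local types covering only the blocks used in this flow.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type AssistantBlock = TextBlock | ToolUseBlock;
type Message =
  | { role: "user"; content: string | ToolResultBlock[] }
  | { role: "assistant"; content: AssistantBlock[] };

// Hypothetical helper: echoes the assistant turn back, then answers every
// tool_use block with a tool_result carrying the same ID.
function appendToolResults(
  history: Message[],
  assistantContent: AssistantBlock[],
  runTool: (name: string, input: unknown) => string,
): Message[] {
  const results: ToolResultBlock[] = [];
  for (const block of assistantContent) {
    if (block.type === "tool_use") {
      results.push({
        type: "tool_result",
        tool_use_id: block.id, // must match the ID from the tool_use block
        content: runTool(block.name, block.input),
      });
    }
  }
  return [
    ...history,
    { role: "assistant", content: assistantContent },
    { role: "user", content: results },
  ];
}
```

The returned array would be passed as messages in the second client.messages.create call, alongside the same tools list.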

Models & pricing

Pass a Claude model name (claude-sonnet-4-6, claude-haiku-4, claude-opus-4) or pass a Dragon profile slug directly. Unknown model strings default to Dragon.

Profile   Slug                 Context   Throughput   Input / 1M tokens   Output / 1M tokens
Dragon    dragon               262K      189 tok/s    $0.48               $1.08
Flash     dragon-flash         131K      1400 tok/s   $1.20               $2.40
Flash 2   dragon-blitz         131K      698 tok/s    $0.30               $1.20
Flash 3   dragon-pro           128K      447 tok/s    $0.22               $0.68
Flash 4   dragon-spark         164K      143 tok/s    $0.60               $0.90
Think     dragon-think         131K      383 tok/s    $0.30               $2.40
Think 2   dragon-thinkmaxing   1048K     178 tok/s    $3.48               $6.96
Think 3   dragon-reason        262K      172 tok/s    $0.40               $1.20
Vision    dragon-seer          262K      383 tok/s    $1.00               $5.00
Code      dragon-coder         262K      189 tok/s    $0.48               $1.08
Big       dragon-nova          262K      310 tok/s    $1.20               $7.20
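To turn the per-million prices into a per-request figure, a small calculation like the one below may help. It is only a sketch: estimateCostUsd and the PRICES map are hypothetical, the map copies just two profiles from the listing above, and it assumes billing is linear in token counts with no minimum charge or rounding, which this page does not state.

```typescript
interface Price {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Two entries copied from the pricing listing; extend as needed.
const PRICES: Record<string, Price> = {
  dragon: { inputPerMillion: 0.48, outputPerMillion: 1.08 },
  "dragon-pro": { inputPerMillion: 0.22, outputPerMillion: 0.68 },
};

// Assumption: cost scales linearly with tokens, with no minimums or rounding.
function estimateCostUsd(slug: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[slug];
  if (!price) throw new Error(`No price listed for ${slug}`);
  return (
    (inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion) / 1_000_000
  );
}
```

For example, 200,000 input tokens and 50,000 output tokens on dragon come to about $0.096 + $0.054, or roughly $0.15.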

Errors

  • 400 — Invalid request body (e.g. missing max_tokens, malformed tool schema).
  • 401 — Missing, invalid, or revoked API key.
  • 402 — Out of credits or quota. Top up at /dashboard/credits.
  • 429 — Rate limit exceeded (per-account capacity ceiling, or per-IP for unauthenticated traffic).
  • 5xx — Upstream provider unavailable. Retry with backoff.
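One way to act on the status codes above is a capped exponential backoff. The sketch below is illustrative only: isRetryable, backoffDelayMs, and sendWithRetry are hypothetical names, the 500 ms base and 30 s cap are arbitrary choices, and retrying on 429 is an assumption (the list above only prescribes retries for 5xx). Codes such as 400, 401, and 402 are returned immediately because retrying cannot fix them.

```typescript
// Assumption: 429 is treated as retryable alongside 5xx.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Delay doubles per attempt (500 ms, 1 s, 2 s, ...) up to a fixed cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Calls `send` up to maxAttempts times, sleeping between retryable failures.
// Network errors thrown by `send` are not caught here and propagate.
async function sendWithRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxAttempts = 5,
): Promise<T> {
  let response = await send();
  for (let attempt = 0; attempt + 1 < maxAttempts && isRetryable(response.status); attempt++) {
    await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    response = await send();
  }
  return response;
}
```

In production code, adding random jitter to each delay is a common refinement so that many clients do not retry in lockstep.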

Rate limits

Rate limits scale automatically based on your plan and lifetime spend — no support emails, no manual review.

  • Per-account, not per-key. Creating extra keys does not increase capacity. Every key on your account draws from the same pool.
  • Two ceilings: requests per minute (RPM) and tokens per minute (TPM).
  • Two paths up: upgrading your plan moves you to a higher tier instantly; cumulative paid spend also moves you up automatically, even if you stay on the same plan.

Every API response (200 or 429) includes six standard headers compatible with the OpenAI SDK: x-ratelimit-limit-requests, x-ratelimit-limit-tokens, x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, x-ratelimit-reset-requests, x-ratelimit-reset-tokens.
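The six headers listed above can be collected into a single object for logging or client-side throttling. This is a sketch with hypothetical names (readRateLimits, RateLimitSnapshot, shouldPause). The reset values are kept as raw strings because their format is not specified here, and the limit and remaining values are assumed to be plain integers.

```typescript
// Minimal interface satisfied by the standard fetch Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitSnapshot {
  limitRequests: number | null;
  limitTokens: number | null;
  remainingRequests: number | null;
  remainingTokens: number | null;
  resetRequests: string | null; // format not specified; kept verbatim
  resetTokens: string | null;
}

// Returns null for missing, empty, or non-numeric header values.
function parseCount(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function readRateLimits(headers: HeaderReader): RateLimitSnapshot {
  return {
    limitRequests: parseCount(headers.get("x-ratelimit-limit-requests")),
    limitTokens: parseCount(headers.get("x-ratelimit-limit-tokens")),
    remainingRequests: parseCount(headers.get("x-ratelimit-remaining-requests")),
    remainingTokens: parseCount(headers.get("x-ratelimit-remaining-tokens")),
    resetRequests: headers.get("x-ratelimit-reset-requests"),
    resetTokens: headers.get("x-ratelimit-reset-tokens"),
  };
}

// True when the account has fewer than minRequests left in the current window.
function shouldPause(snapshot: RateLimitSnapshot, minRequests = 1): boolean {
  return snapshot.remainingRequests !== null && snapshot.remainingRequests < minRequests;
}
```

With fetch, readRateLimits(res.headers) works directly, since Headers exposes a compatible get method.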

Need to throttle a single key below your account ceiling (for example, a sandbox key)? Set the throttle from /dashboard/keys.

SDK compatibility notes

  • Prompt caching (cache_control) is silently ignored. Underlying providers don't expose an equivalent.
  • Vision works on profiles that support it (Dragon Seer). Other profiles silently drop image blocks.
  • Token counting via messages.count_tokens uses a 4-char-per-token estimate. Treat as approximate.
  • anthropic-version header is accepted but currently a no-op.
  • Stop sequences work but may behave differently than they do on Anthropic, because they are passed through to the upstream tokenizer.
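The 4-character-per-token estimate mentioned in the notes above is easy to reproduce locally for quick budgeting. The sketch below is an approximation of that rule, not the gateway's code: estimateTokens is a hypothetical name, and both the rounding direction (up) and the use of JavaScript string length (UTF-16 code units) are assumptions.

```typescript
// Assumption: round up, and count characters as UTF-16 code units.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

For instance, "Stream me a poem." is 17 characters, so the estimate is 5 tokens. Real tokenizer counts can differ noticeably, especially for code and non-English text.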

Migration guide

From Anthropic

  1. Set baseURL: "https://api.mws.run/v1" on your Anthropic client.
  2. Replace your Anthropic key with an MWS key.
  3. That's it. Model names, message format, and tool calls are all unchanged.

From OpenAI

  1. Set baseURL: "https://api.mws.run/v1" on your OpenAI client.
  2. Replace your OpenAI key with an MWS key.
  3. Change model: "gpt-4o" to model: "dragon" (or another profile). If you leave the name as is, models like gpt-4o route to Dragon by default.