Dragon
dragon · 262K ctx · 189 tok/s
$0.48 in / $1.08 out per 1M
[ API documentation ]
Everything you need to drop the MWS gateway into your stack. If something here is unclear, email support@vellora.ai.
All endpoints live under:
https://api.mws.run/v1
Every request needs a Bearer token in the Authorization header. Generate one in the dashboard after you've added at least $25 in credits or subscribed to a plan.
Authorization: Bearer mws_live_<your_api_key>
POST /v1/messages: Anthropic Messages format (recommended for new code).
POST /v1/messages/count_tokens: Approximate token count (a rough estimate; it does not use an Anthropic-equivalent tokenizer).
POST /v1/chat/completions: OpenAI Chat Completions format. Streaming and tool use are supported.
GET /v1/models: List available Dragon profiles.
POST /v1/embeddings: Embedding generation.
POST /v1/rerank: Document reranking.
To stream a response, set stream: true. Anthropic-shape responses use Anthropic's SSE event types (message_start, content_block_delta, etc.). OpenAI-shape responses use standard data: {...}\n\n chunks terminated by [DONE].
// Assumes `client` is an Anthropic SDK client pointed at the MWS base URL.
const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Stream me a poem." }],
});
// Print each text delta as it arrives.
for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
Both API shapes support function/tool calling. Tool-call IDs are preserved verbatim across turns, so the standard request → tool_use → tool_result → end_turn flow works unchanged from the underlying SDK.
// Assumes `client` is an Anthropic SDK client pointed at the MWS base URL.
const tools = [
  {
    name: "get_weather",
    description: "Look up the weather for a city.",
    input_schema: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];
const first = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  tools,
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
});
// first.stop_reason === "tool_use"
// Reply with a tool_result and call again to get the final answer.
Pass a Claude model name (claude-sonnet-4-6, claude-haiku-4, claude-opus-4) or pass a Dragon profile slug directly. Unknown model strings default to Dragon.
dragon · 262K ctx · 189 tok/s
$0.48 in / $1.08 out per 1M
dragon-flash · 131K ctx · 1400 tok/s
$1.20 in / $2.40 out per 1M
dragon-blitz · 131K ctx · 698 tok/s
$0.30 in / $1.20 out per 1M
dragon-pro · 128K ctx · 447 tok/s
$0.22 in / $0.68 out per 1M
dragon-spark · 164K ctx · 143 tok/s
$0.60 in / $0.90 out per 1M
dragon-think · 131K ctx · 383 tok/s
$0.30 in / $2.40 out per 1M
dragon-thinkmaxing · 1048K ctx · 178 tok/s
$3.48 in / $6.96 out per 1M
dragon-reason · 262K ctx · 172 tok/s
$0.40 in / $1.20 out per 1M
dragon-seer · 262K ctx · 383 tok/s
$1.00 in / $5.00 out per 1M
dragon-coder · 262K ctx · 189 tok/s
$0.48 in / $1.08 out per 1M
dragon-nova · 262K ctx · 310 tok/s
$1.20 in / $7.20 out per 1M
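To compare profiles, it can help to turn the per-1M prices above into a per-request figure. The sketch below is illustrative only: `estimateCostUsd` and the `PRICING` table are hypothetical names, only a few profiles are copied in, and it assumes billing is strictly linear in token count with no minimums or rounding.

```typescript
// Per-1M-token prices (USD) copied from the profile list above.
// Only a few profiles are included to keep the sketch short.
interface ProfilePricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICING: Record<string, ProfilePricing> = {
  "dragon": { inputPerMillion: 0.48, outputPerMillion: 1.08 },
  "dragon-flash": { inputPerMillion: 1.2, outputPerMillion: 2.4 },
  "dragon-pro": { inputPerMillion: 0.22, outputPerMillion: 0.68 },
  "dragon-thinkmaxing": { inputPerMillion: 3.48, outputPerMillion: 6.96 },
};

// Hypothetical helper: estimated USD cost of a single request.
// Assumption: cost scales linearly with input and output token counts.
function estimateCostUsd(
  profile: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICING[profile];
  if (price === undefined) {
    throw new Error(`No pricing entry for profile: ${profile}`);
  }
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// 200K input tokens and 50K output tokens on "dragon":
// 0.2 * 0.48 + 0.05 * 1.08 = 0.096 + 0.054, roughly 0.15 USD.
console.log(estimateCostUsd("dragon", 200_000, 50_000).toFixed(4));
```

The same request on dragon-pro would come to about 0.078 USD under the same assumptions, which is the kind of comparison the helper is meant for.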
400: Invalid request body (e.g. missing max_tokens, malformed tool schema).
401: Missing, invalid, or revoked API key.
402: Out of credits or quota. Top up at /dashboard/credits.
429: Rate limit exceeded (per-account capacity ceiling, or per-IP for unauthenticated traffic).
5xx: Upstream provider unavailable. Retry with backoff.
Rate limits scale automatically based on your plan and lifetime spend, with no support emails or manual review.
Every API response (200 or 429) includes six standard headers compatible with the OpenAI SDK: x-ratelimit-limit-requests, x-ratelimit-limit-tokens, x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, x-ratelimit-reset-requests, x-ratelimit-reset-tokens.
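A client that calls the API directly can read these headers after each response and slow down before it hits a 429. The following is a minimal sketch: `readRateLimit`, `shouldSlowDown`, and the threshold values are hypothetical, and because this page does not specify the format of the two reset headers, they are kept as raw strings rather than parsed.

```typescript
// Anything with a Headers-like get(); a fetch Response's `headers` fits.
type HeaderSource = { get(name: string): string | null };

interface RateLimitSnapshot {
  limitRequests: number | null;
  limitTokens: number | null;
  remainingRequests: number | null;
  remainingTokens: number | null;
  resetRequests: string | null; // format not documented here; kept raw
  resetTokens: string | null;   // format not documented here; kept raw
}

// Parse an integer header value; missing or malformed values become null.
function parseCount(value: string | null): number | null {
  if (value === null) return null;
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) ? null : parsed;
}

function readRateLimit(headers: HeaderSource): RateLimitSnapshot {
  return {
    limitRequests: parseCount(headers.get("x-ratelimit-limit-requests")),
    limitTokens: parseCount(headers.get("x-ratelimit-limit-tokens")),
    remainingRequests: parseCount(headers.get("x-ratelimit-remaining-requests")),
    remainingTokens: parseCount(headers.get("x-ratelimit-remaining-tokens")),
    resetRequests: headers.get("x-ratelimit-reset-requests"),
    resetTokens: headers.get("x-ratelimit-reset-tokens"),
  };
}

// Hypothetical policy: back off when the remaining budget is nearly gone.
// The default thresholds are illustrative, not values from this page.
function shouldSlowDown(
  snapshot: RateLimitSnapshot,
  minRequests = 1,
  minTokens = 1000,
): boolean {
  const lowRequests =
    snapshot.remainingRequests !== null && snapshot.remainingRequests < minRequests;
  const lowTokens =
    snapshot.remainingTokens !== null && snapshot.remainingTokens < minTokens;
  return lowRequests || lowTokens;
}
```

With fetch, usage would look like `shouldSlowDown(readRateLimit(response.headers))`.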
Need to throttle a single key below your account ceiling (for example, a sandbox key)? Set the throttle from /dashboard/keys.
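The error table above recommends retrying 5xx responses with backoff. One way to sketch that is shown below. It is an assumption of this sketch, not something this page states, that 429 responses are also worth retrying after a delay; the helper names, attempt count, and delay values are all illustrative.

```typescript
// Retry only on 429 (assumed retryable here) and 5xx.
// 400, 401, and 402 indicate problems a retry will not fix.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff with a cap. attempt 0 -> baseMs, 1 -> 2x, 2 -> 4x, ...
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

type MinimalResponse = { status: number };

// Calls `send` until it returns a non-retryable status or attempts run out.
// `sleep` is injectable so the delay can be replaced in tests.
async function withRetry<T extends MinimalResponse>(
  send: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let response = await send();
  for (
    let attempt = 0;
    attempt < maxAttempts - 1 && isRetryable(response.status);
    attempt++
  ) {
    await sleep(backoffDelayMs(attempt));
    response = await send();
  }
  return response;
}
```

Wrapping a fetch call would look like `withRetry(() => fetch(url, init))`. The last response is returned even if it is still an error, so the caller decides how to surface it.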
Prompt caching (cache_control) is silently ignored. Underlying providers don't expose an equivalent.
messages.count_tokens uses a 4-char-per-token estimate. Treat it as approximate.
Set baseURL: "https://api.mws.run/v1" on your Anthropic client.
Set baseURL: "https://api.mws.run/v1" on your OpenAI client.
Change model: "gpt-4o" to model: "dragon" (or another profile). Models like gpt-4o route to Dragon by default.
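For callers who are not using an SDK, the base URL, Bearer header, and OpenAI-shape endpoint described on this page combine roughly as follows. This is a sketch under stated assumptions: `buildChatRequest` is a hypothetical helper, the request body carries only the fields needed for a basic call, and the response is assumed to follow the usual Chat Completions shape.

```typescript
const BASE_URL = "https://api.mws.run/v1";

interface ChatRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds the URL and fetch options for one
// OpenAI-shape chat completion. No network call is made here.
function buildChatRequest(
  apiKey: string,
  prompt: string,
  model = "dragon",
): ChatRequest {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        // Bearer token as described in the authentication notes.
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (not executed here; needs a real key and network access):
//   const { url, init } = buildChatRequest(myKey, "Hello");
//   const res = await fetch(url, init);
//   const data = await res.json();
```

Keeping the builder separate from the network call makes it easy to inspect exactly what would be sent before pointing an existing integration at the gateway.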