ngrok ai

One gateway for every AI model.

Route, secure, and manage traffic to any LLM—cloud or local—with one unified platform.


Any LLM, same API

Connect to any LLM or provider, cloud or self-hosted, with the same API.

OpenAI
Anthropic
GMI Cloud
Google Vertex
Fireworks
Open Router
Azure Foundry
z.ai
Groq
Moonshot

Always send requests to the best model

Automatically direct each request to the fastest, most reliable, or most affordable model, with no manual intervention required.
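As an illustration of what this kind of routing decision looks like, here is a minimal sketch in TypeScript. The model names, latency figures, and prices are hypothetical sample data, and the selection logic is a simplification, not ngrok's actual implementation:

```typescript
// Hypothetical per-model stats the gateway might track.
type ModelStats = {
  name: string;
  p50LatencyMs: number;
  usdPer1kTokens: number;
  healthy: boolean;
};

// Pick the best healthy model for the chosen strategy.
function pickModel(models: ModelStats[], strategy: "fastest" | "cheapest"): string {
  const candidates = models.filter((m) => m.healthy);
  if (candidates.length === 0) throw new Error("no healthy models");
  const ranked = [...candidates].sort((a, b) =>
    strategy === "fastest"
      ? a.p50LatencyMs - b.p50LatencyMs
      : a.usdPer1kTokens - b.usdPer1kTokens
  );
  return ranked[0].name;
}

// Sample fleet with made-up numbers.
const fleet: ModelStats[] = [
  { name: "openai/gpt-4o", p50LatencyMs: 900, usdPer1kTokens: 0.005, healthy: true },
  { name: "groq/llama-3.1-8b", p50LatencyMs: 250, usdPer1kTokens: 0.0001, healthy: true },
];

console.log(pickModel(fleet, "fastest")); // "groq/llama-3.1-8b" with these sample numbers
```

A real gateway would feed live latency and cost telemetry into the same kind of ranking on every request.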


Manage Spend Without Lifting a Finger

Monitor usage and costs in real time to avoid overspending on expensive models and stay within budget. Keep your costs predictable and under control.

Keep Your Product Online

If a provider is slow or unavailable, instantly route traffic to healthy models so your users never experience downtime.
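The failover pattern described above can be sketched in a few lines of TypeScript. The provider list and stubbed calls below are hypothetical, stand-ins for real provider SDK calls:

```typescript
// A provider call takes a prompt and resolves to a completion.
type CallFn = (prompt: string) => Promise<string>;

// Try each provider in order; return the first success.
async function withFailover(
  providers: { name: string; call: CallFn }[],
  prompt: string
): Promise<string> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return await p.call(prompt);
    } catch (err) {
      lastError = err; // provider slow or unavailable: fall through to the next
    }
  }
  throw new Error(`all providers failed: ${String(lastError)}`);
}

// Demo with stubbed providers: the primary is "down".
const demoProviders = [
  { name: "primary", call: async () => { throw new Error("503"); } },
  { name: "backup", call: async (p: string) => `backup says: ${p}` },
];
```

A production gateway would additionally track provider health so known-bad providers are skipped without paying the timeout each time.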

Faster responses, lower costs

Cache common prompts and responses to improve speed and reduce unnecessary calls.
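A minimal sketch of this caching idea, assuming an in-memory map keyed by the exact prompt (real gateways typically hash the full request and handle expiry, which is omitted here):

```typescript
// Hypothetical in-memory cache: identical prompts skip the upstream call.
const responseCache = new Map<string, string>();

async function cachedCompletion(
  prompt: string,
  call: (p: string) => Promise<string>
): Promise<string> {
  const hit = responseCache.get(prompt);
  if (hit !== undefined) return hit; // served from cache: faster, no provider cost
  const answer = await call(prompt);
  responseCache.set(prompt, answer);
  return answer;
}
```

The second identical request returns instantly and never reaches the provider, which is where both the speed and the cost savings come from.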

Stay in the Know

See exactly how requests are being routed, so you can feel confident that your system is working as intended.

Protect your users' data

Redact sensitive information and choose which providers can access your data, keeping you in control of privacy.
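To make the redaction step concrete, here is a simplified sketch that masks emails and card-like numbers before a prompt leaves your infrastructure. The patterns are illustrative only; production redaction uses far more thorough detectors than two regexes:

```typescript
// Hypothetical redaction pass applied to prompts before forwarding upstream.
function redact(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]")      // email addresses
    .replace(/\b(?:\d[ -]?){13,16}\b/g, "[CARD]");       // card-like digit runs
}
```

Running the sensitive text through `redact` before it reaches any provider means the provider only ever sees the placeholders.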


Stay Compliant Wherever You Operate

Ensure requests and data are only routed to trusted models and approved regions, meeting your privacy and regulatory requirements.

Scale Seamlessly, Every Time

Distribute requests across multiple providers and keys to avoid rate limits and maintain high performance, even as you grow.
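Spreading requests across multiple keys can be as simple as round-robin rotation. A minimal sketch, with hypothetical key names:

```typescript
// Hypothetical round-robin over API keys to spread load and stay under
// per-key rate limits.
function makeKeyRotator(keys: string[]): () => string {
  let i = 0;
  return () => keys[i++ % keys.length];
}

const nextKey = makeKeyRotator(["key-A", "key-B", "key-C"]);
```

Each call to `nextKey()` returns the next key in the pool, so sustained traffic is divided evenly and no single key absorbs the whole load.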

Mitigate abuse

Prevent abuse and unexpected spikes with easy-to-set rate limits, so you can scale safely.
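Rate limits like these are commonly implemented as a token bucket: each request spends a token, and tokens refill at a steady rate. A minimal sketch (not ngrok's implementation; capacity and refill rate are example parameters):

```typescript
// Token-bucket limiter: up to `capacity` requests in a burst, refilling
// at `refillPerSec` tokens per second.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private capacity: number, private refillPerSec: number, now = Date.now()) {
    this.tokens = capacity;
    this.last = now;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  allow(now = Date.now()): boolean {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A burst up to `capacity` goes through immediately; beyond that, requests are rejected until refill catches up, which is what keeps a spike from overwhelming the backend.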


How does it work?

1. Configure your endpoint

on_http_request:
  - actions:
      - type: ai-router
        config: {}
2. Update your SDK

import OpenAI from "openai";

const ngrokClient = new OpenAI({
  baseURL: 'https://your_endpoint.ngrok.dev',
  apiKey: 'YOUR_PROVIDER_API_KEY'
});
3. Prompt and send traffic

const completion = await ngrokClient.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'system', content: 'Talk like a pirate.' },
    { role: 'user', content: 'Are semicolons optional in JavaScript?' },
  ],
  stream: true,
});
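With `stream: true`, the response arrives as a sequence of chunks rather than a single object. A sketch of consuming such a stream, stubbed with an async generator so it runs without network access (the chunk shape mirrors the streaming responses of OpenAI-compatible APIs, but the content here is made up):

```typescript
// Minimal chunk shape for a streamed chat completion.
type Chunk = { choices: { delta: { content?: string } }[] };

// Stand-in for the streamed response; real chunks arrive from the gateway.
async function* fakeStream(): AsyncGenerator<Chunk> {
  for (const piece of ["Arr, ", "aye ", "they be!"]) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

// Accumulate the delta content from each chunk into the full reply.
async function collect(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

In an application you would typically render each delta as it arrives instead of waiting for the full text.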

Build smarter, ship faster.

Get early access, help shape the platform, and never fight AI traffic headaches again.

Privacy policy