Quota-Aware LLM Gateway

Run application traffic through Console with project-scoped quotas, model routing, and request-level visibility for cost control.

Console
Console SDK

Overview

The most practical Console use case for product teams is not just calling a model. It is putting every AI request behind a controlled gateway so teams can separate traffic, enforce quotas, and see where cost goes.

This is useful when you have multiple internal apps or customer-facing modules sharing the same AI platform but operating under different budgets and reliability rules.

Architecture

Console acts as the LLM gateway and control plane. Each project can have its own API key, model catalog, quota boundaries, and tracing visibility.

Console SDK keeps the app-side integration simple while still exposing request IDs, model access, and typed responses.

1. Separate Traffic By Project And Quota

Quota and rate policies are configured in Console, while each app only receives the project-scoped key it should use.

// Example project setup in Console:
//
// Project: support-assistant
// - apiKey: cp_support_xxx
// - monthly quota: 2M input tokens
// - allowed models: gpt-4o-mini, claude-3-7-sonnet
//
// Project: sales-enablement
// - apiKey: cp_sales_xxx
// - monthly quota: 500K input tokens
// - allowed models: gpt-4o-mini

import { ConsoleClient } from '@cognipeer/console-sdk';

const client = new ConsoleClient({
  apiKey: process.env.SUPPORT_PROJECT_API_KEY!,
  baseURL: 'https://console.example.com',
});
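Behind the project key, Console enforces the quota server-side; your app never counts tokens itself. As a mental model only (the names and shapes below are illustrative assumptions, not Console's API), the admission check amounts to:

```typescript
// Illustrative model of a per-project monthly quota check. Console performs
// this enforcement server-side; this is not app code you need to write.
interface ProjectQuota {
  monthlyInputTokens: number; // e.g. 2_000_000 for support-assistant
  usedInputTokens: number;    // tokens consumed so far this month
}

// A request is admitted only if it fits within the remaining allowance.
function admits(quota: ProjectQuota, requestTokens: number): boolean {
  return quota.usedInputTokens + requestTokens <= quota.monthlyInputTokens;
}
```

Because each app holds only its own project-scoped key, one team exhausting its allowance cannot consume another project's budget.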

2. Route Requests Through A Stable Model Key

Your application uses one model key while Console decides routing, fallback, and resiliency behind the scenes.

const response = await client.chat.completions.create({
  model: 'support-primary',
  messages: [
    { role: 'system', content: 'You are the support assistant for premium customers.' },
    { role: 'user', content: 'Summarize the open tickets for account A-104.' },
  ],
  temperature: 0.2,
});

console.log(response.choices[0].message.content);
console.log('request_id:', response.request_id);
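The stable key decouples app code from provider choice. Conceptually, routing tables and health tracking live in Console; the structures below are an illustrative sketch, not Console's internals:

```typescript
// Hypothetical sketch of server-side routing: a stable model key maps to an
// ordered candidate list, and the first healthy upstream model is chosen.
const routingTable: Record<string, string[]> = {
  'support-primary': ['gpt-4o-mini', 'claude-3-7-sonnet'], // primary, then fallback
};

function resolveModel(modelKey: string, healthy: Set<string>): string | undefined {
  return (routingTable[modelKey] ?? []).find((m) => healthy.has(m));
}
```

The app keeps sending `model: 'support-primary'`; reordering fallbacks or swapping providers becomes a Console configuration change rather than a code deploy.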

3. Use Request IDs For Operational Follow-Up

When a team hits a quota wall or sees cost spikes, request-level correlation becomes operationally important.

async function askSupportAssistant(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'support-primary',
    messages: [{ role: 'user', content: prompt }],
  });

  // auditLog is the application's structured logger (e.g. pino or winston).
  auditLog.info({
    requestId: response.request_id,
    prompt,
    model: 'support-primary',
  });

  return response.choices[0].message.content;
}
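When a request fails, it also helps to distinguish a hard quota stop from a transient upstream error before deciding whether to retry. The error shape and `quota_exceeded` code below are hypothetical placeholders, not Console's documented error format:

```typescript
// Hypothetical gateway error payload — check Console's actual error format.
interface GatewayError {
  status: number; // HTTP status returned by the gateway
  code?: string;  // machine-readable error code, if provided
}

// Quota exhaustion is not transient: retrying will not help until the quota
// resets or is raised, so surface it to the team instead of retrying.
function classifyFailure(err: GatewayError): 'quota_exhausted' | 'retryable' | 'fatal' {
  if (err.status === 429 && err.code === 'quota_exceeded') return 'quota_exhausted';
  if (err.status === 429 || err.status >= 500) return 'retryable';
  return 'fatal';
}
```

Logging the classification alongside the `request_id` makes quota incidents and provider outages separable in the audit trail.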

4. Add Semantic Cache Through Console

Console can also serve repeated or semantically similar prompts from cache. That is especially useful for high-volume support, catalog, and policy lookup scenarios where the same intent appears with slightly different wording.

// Example Console model configuration:
//
// Model key: support-primary
// semanticCache:
//   enabled: true
//   vectorProviderKey: qdrant-main
//   vectorIndexKey: support-semantic-cache
//   embeddingModelKey: text-embedding-3-small
//   similarityThreshold: 0.93
//   ttlSeconds: 86400

const response = await client.chat.completions.create({
  model: 'support-primary',
  messages: [{ role: 'user', content: 'How do I reset my workspace password?' }],
});

console.log(response.request_id);
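The `similarityThreshold` above controls how close a new prompt's embedding must be to a cached one to count as a hit. Assuming cosine similarity (a common choice for this comparison; Console's actual metric may differ), the decision reduces to:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A cached answer is served only when the query embedding is at least as
// similar as the configured threshold (0.93 in the example config above).
function isCacheHit(query: number[], cached: number[], threshold = 0.93): boolean {
  return cosineSimilarity(query, cached) >= threshold;
}
```

Raising the threshold toward 1.0 trades hit rate for precision: fewer cached answers are served, but each one is a closer semantic match to the original prompt.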

Result

You get a gateway pattern that:

- Separates AI traffic by project and budget
- Controls which models each team can use
- Applies routing, fallback, and semantic caching without app-side complexity
- Improves cost and quota investigations through request-level visibility