Overview

The most practical Console use case for product teams is not just calling a model. It is putting every AI request behind a controlled gateway so teams can separate traffic, enforce quotas, and see where cost goes.

This is useful when you have multiple internal apps or customer-facing modules sharing the same AI platform but operating under different budgets and reliability rules.

When to reach for this recipe

Reach for this recipe when several apps or teams share one AI platform, each needs its own budget, model access, and reliability rules, and you would rather build on proven primitives than wire up a gateway from scratch.

Architecture

Console acts as the LLM gateway and control plane. Each project can have its own API key, model catalog, quota boundaries, and tracing visibility.

Console SDK keeps the app-side integration simple while still exposing request IDs, model access, and typed responses.
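To make the per-project boundaries concrete, the sketch below models one project's policy as plain data and checks a request against it. The `ProjectPolicy` shape and the `checkRequest` helper are hypothetical and exist only for illustration; they are not Console's schema or API, and in practice the gateway enforces these rules server-side.

```typescript
// Hypothetical shape for illustration only; Console's real project schema may differ.
interface ProjectPolicy {
  name: string;
  monthlyInputTokenQuota: number;
  allowedModels: string[];
}

type Decision =
  | { allowed: true; remainingTokens: number }
  | { allowed: false; reason: 'model_not_allowed' | 'quota_exceeded' };

// Decide whether a request fits inside the project's model catalog and quota.
function checkRequest(
  policy: ProjectPolicy,
  usedTokens: number,
  model: string,
  requestTokens: number,
): Decision {
  if (!policy.allowedModels.includes(model)) {
    return { allowed: false, reason: 'model_not_allowed' };
  }
  const remaining = policy.monthlyInputTokenQuota - usedTokens - requestTokens;
  if (remaining < 0) {
    return { allowed: false, reason: 'quota_exceeded' };
  }
  return { allowed: true, remainingTokens: remaining };
}

// Example values chosen for illustration.
const supportPolicy: ProjectPolicy = {
  name: 'support-assistant',
  monthlyInputTokenQuota: 2_000_000,
  allowedModels: ['gpt-4o-mini', 'claude-3-7-sonnet'],
};

console.log(checkRequest(supportPolicy, 1_999_000, 'gpt-4o-mini', 500));
```

Keeping the policy as data rather than code is what lets each project have different limits without touching the applications that call the gateway.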

1. Separate Traffic By Project And Quota

Quota and rate policies are configured in Console, while each app only receives the project-scoped key it should use.

// Example project setup in Console:
//
// Project: support-assistant
// - apiKey: cp_support_xxx
// - monthly quota: 2M input tokens
// - allowed models: gpt-4o-mini, claude-3-7-sonnet
//
// Project: sales-enablement
// - apiKey: cp_sales_xxx
// - monthly quota: 500K input tokens
// - allowed models: gpt-4o-mini

import { ConsoleClient } from '@cognipeer/console-sdk';

const client = new ConsoleClient({
  apiKey: process.env.SUPPORT_PROJECT_API_KEY!,
  baseURL: 'https://console.example.com',
});
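Because each app should only ever see its own project-scoped key, it can help to resolve that key in one place and fail fast when it is missing. The following is a minimal sketch under assumptions: `loadGatewayConfig`, the `<PROJECT>_PROJECT_API_KEY` naming convention, and the `CONSOLE_BASE_URL` variable are hypothetical and not part of the Console SDK.

```typescript
// Hypothetical helper; the env var naming convention is an assumption.
interface GatewayConfig {
  apiKey: string;
  baseURL: string;
}

function loadGatewayConfig(
  project: string,
  env: Record<string, string | undefined>,
): GatewayConfig {
  // 'support' -> 'SUPPORT_PROJECT_API_KEY'
  const varName = `${project.toUpperCase().replace(/-/g, '_')}_PROJECT_API_KEY`;
  const apiKey = env[varName];
  if (!apiKey) {
    // Fail at startup instead of sending unauthenticated requests later.
    throw new Error(`Missing ${varName}: this app has no project-scoped key`);
  }
  const baseURL = env['CONSOLE_BASE_URL'] ?? 'https://console.example.com';
  return { apiKey, baseURL };
}

// In an app you would pass process.env:
// const config = loadGatewayConfig('support', process.env);
const exampleConfig = loadGatewayConfig('support', {
  SUPPORT_PROJECT_API_KEY: 'cp_support_xxx',
});
console.log(exampleConfig.baseURL);
```

Passing the environment in as an argument keeps the helper easy to test and makes it obvious which key a given service is allowed to read.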

2. Route Requests Through A Stable Model Key

Your application uses one model key while Console decides routing, fallback, and resiliency behind the scenes.

const response = await client.chat.completions.create({
  model: 'support-primary',
  messages: [
    { role: 'system', content: 'You are the support assistant for premium customers.' },
    { role: 'user', content: 'Summarize the open tickets for account A-104.' },
  ],
  temperature: 0.2,
});

console.log(response.choices[0].message.content);
console.log('request_id:', response.request_id);
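To illustrate what a stable model key buys you, here is a conceptual sketch of ordered fallback behind a single key. It is not Console's routing implementation: the `Route` type, `routeWithFallback`, and the synchronous stub providers are hypothetical, and real provider calls would be asynchronous.

```typescript
// Conceptual illustration only; not how Console implements routing.
type Provider = (prompt: string) => string; // throws on failure

interface Route {
  modelKey: string;
  targets: { name: string; call: Provider }[]; // tried in order
}

function routeWithFallback(
  route: Route,
  prompt: string,
): { servedBy: string; output: string } {
  const errors: string[] = [];
  for (const target of route.targets) {
    try {
      return { servedBy: target.name, output: target.call(prompt) };
    } catch (err) {
      // Remember why this target failed, then try the next one.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${target.name}: ${message}`);
    }
  }
  throw new Error(`All targets failed for ${route.modelKey}: ${errors.join('; ')}`);
}

// Stub providers: the first always fails, the second echoes the prompt.
const supportPrimary: Route = {
  modelKey: 'support-primary',
  targets: [
    { name: 'gpt-4o-mini', call: () => { throw new Error('rate limited'); } },
    { name: 'claude-3-7-sonnet', call: (p) => `echo: ${p}` },
  ],
};

const routed = routeWithFallback(supportPrimary, 'hello');
console.log(routed.servedBy);
```

The application only ever names `support-primary`, so the target list can change on the gateway side without a code deploy.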

3. Use Request IDs For Operational Follow-Up

When a team hits a quota wall or sees a cost spike, you need to tie individual application calls back to what the gateway recorded. Logging the request ID with every call makes that correlation possible.

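// `client` is the ConsoleClient created in step 1.
// `auditLog` is assumed to be your application's own structured logger.
async function askSupportAssistant(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'support-primary',
    messages: [{ role: 'user', content: prompt }],
  });

  // Record the request ID so this call can be found again during an investigation.
  auditLog.info({
    requestId: response.request_id,
    prompt,
    model: 'support-primary',
  });

  return response.choices[0].message.content;
}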
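Once request IDs are in your logs, a cost investigation is mostly grouping and sorting. The sketch below assumes you also store a token count per request; the `RequestRecord` shape, the `totalTokens` field, and the sample IDs are hypothetical, and where the token count comes from (for example a usage field on the response) is an assumption to verify against the SDK.

```typescript
// Hypothetical log record; field names are assumptions for illustration.
interface RequestRecord {
  requestId: string;
  project: string;
  model: string;
  totalTokens: number;
}

// Sum token usage per project to see where spend concentrates.
function tokensByProject(records: RequestRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.project, (totals.get(r.project) ?? 0) + r.totalTokens);
  }
  return totals;
}

// Return the most expensive requests so they can be looked up by request ID.
function heaviestRequests(records: RequestRecord[], limit: number): RequestRecord[] {
  return [...records].sort((a, b) => b.totalTokens - a.totalTokens).slice(0, limit);
}

const sampleRecords: RequestRecord[] = [
  { requestId: 'req_001', project: 'support-assistant', model: 'support-primary', totalTokens: 1200 },
  { requestId: 'req_002', project: 'sales-enablement', model: 'support-primary', totalTokens: 300 },
  { requestId: 'req_003', project: 'support-assistant', model: 'support-primary', totalTokens: 9800 },
];

console.log(tokensByProject(sampleRecords));
console.log(heaviestRequests(sampleRecords, 1).map((r) => r.requestId));
```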

4. Add Semantic Cache Through Console

Console can also serve repeated or semantically similar prompts from cache. That is especially useful for high-volume support, catalog, and policy lookup scenarios where the same intent appears with slightly different wording.

// Example Console model configuration:
//
// Model key: support-primary
// semanticCache:
//   enabled: true
//   vectorProviderKey: qdrant-main
//   vectorIndexKey: support-semantic-cache
//   embeddingModelKey: text-embedding-3-small
//   similarityThreshold: 0.93
//   ttlSeconds: 86400

const response = await client.chat.completions.create({
  model: 'support-primary',
  messages: [{ role: 'user', content: 'How do I reset my workspace password?' }],
});

console.log(response.request_id);
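To show how a similarity threshold and a TTL interact, here is a small in-memory sketch of a semantic cache lookup using cosine similarity. It is an illustration under assumptions, not Console's implementation: `CacheEntry`, `cosineSimilarity`, and `lookupCache` are hypothetical, the embeddings are toy two-dimensional vectors, and a real deployment would query the configured vector index instead of scanning an array.

```typescript
// Hypothetical in-memory cache entry for illustration.
interface CacheEntry {
  embedding: number[];
  response: string;
  storedAtMs: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best unexpired entry at or above the threshold, if any.
function lookupCache(
  entries: CacheEntry[],
  query: number[],
  nowMs: number,
  similarityThreshold = 0.93,
  ttlSeconds = 86400,
): string | undefined {
  let best: CacheEntry | undefined;
  let bestScore = -Infinity;
  for (const entry of entries) {
    if (nowMs - entry.storedAtMs > ttlSeconds * 1000) continue; // expired
    const score = cosineSimilarity(entry.embedding, query);
    if (score >= similarityThreshold && score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best?.response;
}

const cacheEntries: CacheEntry[] = [
  { embedding: [1, 0], response: 'Use Settings > Security > Reset password.', storedAtMs: 0 },
];

// A near-identical direction is a hit; an orthogonal one is a miss.
console.log(lookupCache(cacheEntries, [0.99, 0.05], 1000));
console.log(lookupCache(cacheEntries, [0, 1], 1000));
```

A higher threshold reduces the chance of returning an answer for a different intent at the cost of fewer cache hits, so it is worth tuning against real traffic rather than copying the example value.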

Result

You get a gateway pattern that:

- Separates AI traffic by project and budget
- Controls which models each team can use
- Applies routing, fallback, and semantic caching without app-side complexity
- Improves cost and quota investigations through request-level visibility
