Vector RAG Operations Control Plane

Overview

Many RAG projects stall after the first demo because document ingestion, vector operations, and retrieval quality are spread across unrelated tools. Console brings those tasks into one platform, so teams can run RAG as an operational system rather than a one-off script.

This scenario fits internal knowledge assistants, support knowledge bases, compliance search, and document-grounded copilots.

When to reach for this recipe

Reach for this recipe when your team needs file ingestion, vector index management, and grounded answer generation on one platform, and you would rather build on proven primitives than wire the pipeline together from scratch.

Architecture

Console provides the files pipeline, the vector provider abstraction, and the model gateway. The Console SDK lets you wire the whole flow from a Node service or an internal admin tool.

The key advantage is operational consistency: the same platform handles uploads, embeddings, indexes, retrieval, and the final grounded answer.
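To make the idea of a vector provider abstraction concrete, here is a minimal sketch in which application code depends only on a small interface, so the backend can be swapped without touching retrieval logic. The `VectorStore` interface, `InMemoryVectorStore` class, and `retrieveTexts` helper are hypothetical names invented for this illustration; they are not part of the Console SDK, and the in-memory store is only a stand-in for a real backend.

```typescript
// Hypothetical types for illustration; not the Console SDK's real types.
interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, unknown>;
}

interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Application code depends on this interface, not on a specific backend.
interface VectorStore {
  upsert(records: VectorRecord[]): Promise<void>;
  query(vector: number[], topK: number): Promise<VectorMatch[]>;
}

// Cosine similarity: dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stand-in backend for local tests: a brute-force scan over all records.
class InMemoryVectorStore implements VectorStore {
  private records = new Map<string, VectorRecord>();

  async upsert(records: VectorRecord[]): Promise<void> {
    for (const record of records) {
      this.records.set(record.id, record); // same id overwrites
    }
  }

  async query(vector: number[], topK: number): Promise<VectorMatch[]> {
    return Array.from(this.records.values())
      .map((r) => ({
        id: r.id,
        score: cosineSimilarity(vector, r.values),
        metadata: r.metadata,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

// Retrieval logic written once against the interface.
async function retrieveTexts(
  store: VectorStore,
  queryVector: number[],
  topK: number,
): Promise<string[]> {
  const matches = await store.query(queryVector, topK);
  return matches
    .map((m) => m.metadata?.text)
    .filter((t): t is string => typeof t === 'string');
}
```

In a setup like this, a thin adapter that forwards `upsert` and `query` to the platform's vector API could implement the same interface, and `retrieveTexts` would not need to change.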

1. Upload Source Files And Create The Index

Start by loading documents and provisioning the index that will serve retrieval.

import fs from 'node:fs';
import { ConsoleClient } from '@cognipeer/console-sdk';
 
const client = new ConsoleClient({
  apiKey: process.env.COGNIPEER_API_KEY!,
  baseURL: 'https://console.example.com',
});
 
await client.files.upload({
  file: fs.createReadStream('./docs/expense-policy.pdf'),
  purpose: 'assistants',
});
 
await client.vectors.indexes.create('qdrant-main', {
  name: 'policy-knowledge-base',
  dimension: 1536,
  metric: 'cosine',
});
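The index above is created with `dimension: 1536`, which matches the default output size of OpenAI's `text-embedding-3-small`. A mismatch between index dimension and embedding size typically only surfaces at upsert time, so a small guard can help catch it earlier. The sketch below is an assumption-laden illustration: `EMBEDDING_DIMENSIONS`, `assertIndexMatchesModel`, and `assertVectorDimensions` are hypothetical helpers, and the lookup table assumes your gateway exposes these models under their upstream names and default dimensions.

```typescript
// Hypothetical lookup table of default embedding sizes.
// Assumes the gateway uses upstream model names; adjust for your aliases.
const EMBEDDING_DIMENSIONS: Record<string, number> = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
};

// Throws if the index dimension does not match the model's output size.
function assertIndexMatchesModel(model: string, indexDimension: number): void {
  const expected = EMBEDDING_DIMENSIONS[model];
  if (expected === undefined) {
    throw new Error('Unknown embedding model: ' + model);
  }
  if (expected !== indexDimension) {
    throw new Error(
      'Index dimension ' + indexDimension +
        ' does not match ' + model + ' (' + expected + ')',
    );
  }
}

// Throws if any vector in a batch has the wrong length.
function assertVectorDimensions(vectors: number[][], dimension: number): void {
  vectors.forEach((vector, i) => {
    if (vector.length !== dimension) {
      throw new Error(
        'Vector at position ' + i + ' has length ' + vector.length +
          ', expected ' + dimension,
      );
    }
  });
}
```

Calling `assertIndexMatchesModel('text-embedding-3-small', 1536)` before creating the index, and `assertVectorDimensions` before each upsert, keeps the failure close to its cause.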

2. Embed And Upsert Chunks

Once files are parsed into chunks, embed them and upsert them through the vector API.

const chunks = [
  'Taxi receipts are reimbursable when attached within 10 business days.',
  'International hotel expenses require manager approval above 300 EUR per night.',
];
 
const embeddingResponse = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});
 
await client.vectors.upsert('qdrant-main', 'policy-knowledge-base', {
  vectors: chunks.map((text, index) => ({
    id: 'policy-' + (index + 1),
    values: embeddingResponse.data[index].embedding,
    metadata: { text, source: 'expense-policy.pdf' },
  })),
});
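The `chunks` array above is hard-coded for brevity. How files are actually parsed into chunks is not shown in this recipe, so the following is only a minimal sketch of one common approach: fixed-size character windows with a small overlap so that sentences cut at a boundary still appear in full in one chunk. `chunkText` and its parameters are hypothetical, and real pipelines often split on sentences or tokens instead.

```typescript
// Splits text into fixed-size character windows with overlap.
// Hypothetical helper; not part of the Console SDK.
function chunkText(text: string, maxChars = 500, overlap = 50): string[] {
  if (maxChars <= 0) {
    throw new Error('maxChars must be positive');
  }
  if (overlap < 0 || overlap >= maxChars) {
    throw new Error('overlap must be in the range [0, maxChars)');
  }

  // Collapse runs of whitespace so window sizes are predictable.
  const normalized = text.replace(/\s+/g, ' ').trim();
  const chunks: string[] = [];

  let start = 0;
  while (start < normalized.length) {
    const end = Math.min(start + maxChars, normalized.length);
    chunks.push(normalized.slice(start, end));
    if (end === normalized.length) break; // last window reached
    start = end - overlap; // step back so adjacent windows share text
  }
  return chunks;
}
```

The resulting array can be passed as `input` to the embeddings call in place of the hard-coded `chunks`.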

3. Run Retrieval And Grounded Answer Generation

Query the vector index first, then pass the retrieved context into chat completion through the same Console surface.

const question = 'Can I expense a 340 EUR hotel room during an international event?';
 
const queryEmbedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: question,
});
 
const matches = await client.vectors.query('qdrant-main', 'policy-knowledge-base', {
  query: {
    vector: queryEmbedding.data[0].embedding,
    topK: 3,
  },
});
 
const context = matches.result.matches
  .map((match) => match.metadata?.text)
  .filter(Boolean)
  .join('\n\n');
 
const answer = await client.chat.completions.create({
  model: 'rag-answer-model',
  messages: [
    {
      role: 'system',
      content: 'Answer only with the provided policy context:\n\n' + context,
    },
    { role: 'user', content: question },
  ],
});
 
console.log(answer.choices[0].message.content);
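The context string in the code above joins every returned match. For traceability, it can help to drop weak matches, cap the context size, and keep track of which source documents contributed. The sketch below shows one way to do that. It assumes each match carries a numeric `score` where higher means more similar (as with a cosine metric) and the `text` and `source` metadata written during upsert; the `score` field, `RetrievedMatch`, and `buildGroundingContext` are assumptions for illustration, not confirmed SDK shapes.

```typescript
// Assumed match shape; the `score` field is not confirmed by this recipe.
interface RetrievedMatch {
  id: string;
  score?: number;
  metadata?: { text?: string; source?: string };
}

interface GroundingContext {
  context: string; // text to place in the system prompt
  sources: string[]; // distinct source documents that contributed
}

// Filters matches by score, caps total characters, and collects sources.
function buildGroundingContext(
  matches: RetrievedMatch[],
  minScore = 0,
  maxChars = 4000,
): GroundingContext {
  const parts: string[] = [];
  const sources = new Set<string>();
  let used = 0;

  for (const match of matches) {
    const text = match.metadata?.text;
    if (!text) continue; // nothing to ground on
    // A missing score is treated as 0, so it only passes when minScore <= 0.
    if ((match.score ?? 0) < minScore) continue;
    if (used + text.length > maxChars) break; // stay within the budget

    parts.push(text);
    used += text.length;

    const source = match.metadata?.source;
    if (source) sources.add(source);
  }

  return { context: parts.join('\n\n'), sources: Array.from(sources) };
}
```

The `sources` array can be logged next to the generated answer or returned to the caller, so each grounded response can be traced back to the documents it drew on.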

Result

You get a RAG operations pattern that:

- Unifies files, embeddings, vector indexes, and chat in one platform
- Lets platform teams manage vector backends without rewriting app code
- Improves traceability for grounded answers and data sources
- Fits policy assistants, knowledge search, and document-heavy internal workflows

