Overview

Many RAG projects stall after the first demo because document ingestion, vector operations, and retrieval quality are spread across unrelated tools. Console brings those tasks into one platform so teams can run RAG like an operational system rather than a script.

This scenario fits internal knowledge assistants, support knowledge bases, compliance search, and document-grounded copilots.

When to reach for this recipe

If your team is building one of the use cases above and would rather build on proven primitives than wire the pipeline together from scratch, this is the shape to start from.

Architecture

Console provides the files pipeline, vector provider abstraction, and model gateway. Console SDK lets you wire the whole flow from a Node service or an internal admin tool.

The key advantage is operational consistency: the same platform handles uploads, embeddings, indexes, retrieval, and the final grounded answer.

1. Upload Source Files And Create The Index

Start by loading documents and provisioning the index that will serve retrieval.

import fs from 'node:fs';
import { ConsoleClient } from '@cognipeer/console-sdk';

// One client is reused for files, vectors, embeddings, and chat.
const client = new ConsoleClient({
  apiKey: process.env.COGNIPEER_API_KEY!,
  baseURL: 'https://console.example.com',
});

// Upload the source document.
await client.files.upload({
  file: fs.createReadStream('./docs/expense-policy.pdf'),
  purpose: 'assistants',
});

// Provision the index that will serve retrieval.
// The dimension must match the output size of the embedding model used later.
await client.vectors.indexes.create('qdrant-main', {
  name: 'policy-knowledge-base',
  dimension: 1536,
  metric: 'cosine',
});
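
The index above is created with `dimension: 1536`, so every vector written to it needs exactly that length. As a hedged illustration, a small guard like the one below could run before any upsert. `assertDimension` is a hypothetical helper written for this recipe, not part of Console SDK.

```typescript
// Hypothetical helper (not part of Console SDK): fail fast when a vector's
// length does not match the dimension the index was created with.
function assertDimension(vectors: number[][], expected: number): void {
  vectors.forEach((vector, index) => {
    if (vector.length !== expected) {
      throw new Error(
        `vector ${index} has dimension ${vector.length}, expected ${expected}`,
      );
    }
  });
}
```

Calling `assertDimension(embeddings, 1536)` right before the upsert turns a dimension mismatch into a readable local error instead of a failure reported by the vector backend.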

2. Embed And Upsert Chunks

Once files are parsed into chunks, embed them and upsert them through the vector API.

// Example chunks; in practice these come from parsing the uploaded files.
const chunks = [
  'Taxi receipts are reimbursable when attached within 10 business days.',
  'International hotel expenses require manager approval above 300 EUR per night.',
];

// Embed all chunks in a single request; results come back in input order.
const embeddingResponse = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});

// Store each vector with its text and source so retrieval can return both.
await client.vectors.upsert('qdrant-main', 'policy-knowledge-base', {
  vectors: chunks.map((text, index) => ({
    id: 'policy-' + (index + 1),
    values: embeddingResponse.data[index].embedding,
    metadata: { text, source: 'expense-policy.pdf' },
  })),
});
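
The example above uses two hand-written chunks. For longer documents, one simple way to produce chunks is a fixed-size character window with overlap, sketched below. This is an assumption for illustration only: `chunkText`, `maxChars`, and `overlap` are hypothetical names, the default sizes are arbitrary, and the recipe does not say how Console itself parses files into chunks.

```typescript
// Hypothetical helper (not part of Console SDK): split text into
// fixed-size character windows that overlap by `overlap` characters.
function chunkText(text: string, maxChars = 800, overlap = 100): string[] {
  if (maxChars <= 0) {
    throw new RangeError('maxChars must be positive');
  }
  if (overlap < 0 || overlap >= maxChars) {
    throw new RangeError('overlap must be in the range [0, maxChars)');
  }

  // Collapse whitespace runs so window sizes reflect real content.
  const normalized = text.replace(/\s+/g, ' ').trim();
  const chunks: string[] = [];
  const step = maxChars - overlap;

  for (let start = 0; start < normalized.length; start += step) {
    chunks.push(normalized.slice(start, start + maxChars));
    // Stop once the current window has reached the end of the text.
    if (start + maxChars >= normalized.length) break;
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.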

3. Run Retrieval And Grounded Answer Generation

Query the vector index first, then pass the retrieved context into chat completion through the same Console surface.

const question = 'Can I expense a 340 EUR hotel room during an international event?';

// Embed the question with the same model used for the chunks.
const queryEmbedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: question,
});

// Fetch the three nearest chunks from the index.
const matches = await client.vectors.query('qdrant-main', 'policy-knowledge-base', {
  query: {
    vector: queryEmbedding.data[0].embedding,
    topK: 3,
  },
});

// Join the retrieved chunk texts, separated by blank lines.
const context = matches.result.matches
  .map((match) => match.metadata?.text)
  .filter(Boolean)
  .join('\n\n');

// Ground the answer by placing the retrieved context in the system prompt.
const answer = await client.chat.completions.create({
  model: 'rag-answer-model',
  messages: [
    {
      role: 'system',
      content: 'Answer only with the provided policy context:\n\n' + context,
    },
    { role: 'user', content: question },
  ],
});

console.log(answer.choices[0].message.content);
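
The retrieval step above joins only the chunk text, which drops the `source` metadata stored at upsert time. A minimal sketch of a context builder that keeps each chunk's id and source and caps the total size is shown below. `RetrievedMatch` and `buildContext` are hypothetical names, the match shape is assumed from the fields used in this recipe, and the character budget is an arbitrary illustration.

```typescript
// Assumed shape of one retrieved match, based only on the fields this
// recipe reads and writes; the real SDK type may differ.
interface RetrievedMatch {
  id: string;
  metadata?: { text?: string; source?: string };
}

// Hypothetical helper (not part of Console SDK): label each chunk with its
// id and source, and stop adding chunks once the character budget is used.
function buildContext(matches: RetrievedMatch[], maxChars = 4000): string {
  const parts: string[] = [];
  let used = 0;

  for (const match of matches) {
    const text = match.metadata?.text;
    if (!text) continue; // skip matches that carry no stored text

    const source = match.metadata?.source ?? 'unknown source';
    const entry = `[${match.id} | ${source}] ${text}`;
    if (used + entry.length > maxChars) break;

    parts.push(entry);
    used += entry.length + 2; // include the blank-line separator
  }
  return parts.join('\n\n');
}
```

Passing the labeled context to the model lets the system prompt ask for answers that cite the bracketed ids, so each claim in the response can be traced back to a stored chunk.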

Result

You get a RAG operations pattern that:

- Unifies files, embeddings, vector indexes, and chat in one platform
- Lets platform teams manage vector backends without rewriting app code
- Improves traceability for grounded answers and data sources
- Fits policy assistants, knowledge search, and document-heavy internal workflows
