Overview

Retrieval-Augmented Generation (RAG) combines the power of LLMs with your own data. Cognipeer Console acts as the unified gateway — managing vector databases, generating embeddings, and serving chat completions. Console SDK gives you type-safe access to all these capabilities.

This guide shows how to build a document Q&A system over your internal knowledge base.

When to reach for this recipe

If your team needs the capabilities described above and would rather build on proven primitives than wire everything up from scratch, this is the shape to start from.

Architecture

Console provides the infrastructure layer: OpenAI-compatible API for embeddings, vector orchestration across multiple providers (Pinecone, Qdrant, Chroma, etc.), and the files pipeline for document ingestion.

Console SDK is the TypeScript client that ties everything together with full type safety and streaming support.

1. Set Up Console SDK Client

Install the SDK and create a client pointing to your Console instance.

import { ConsoleClient } from "@cognipeer/console-sdk";

const client = new ConsoleClient({
  apiKey: process.env.COGNIPEER_API_KEY!,
  baseURL: "https://your-console.example.com",
});
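The non-null assertion on `process.env.COGNIPEER_API_KEY` silences the type checker but does not catch a missing key at runtime. A minimal sketch of a guard you could put in front of the client constructor is shown here; `requireEnv` is a hypothetical helper and not part of Console SDK.

```typescript
// Hypothetical helper (not part of Console SDK): read a required setting
// from an environment-like record and fail early when it is absent or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in Node.js:
//   const apiKey = requireEnv(process.env, "COGNIPEER_API_KEY");
```

Passing the environment in as an argument keeps the helper easy to test and avoids a hard dependency on Node.js globals.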

2. Upload and Process Documents

Use the files pipeline to upload documents. Console automatically converts PDFs and other formats to Markdown for RAG-ready ingestion.

import * as fs from "node:fs";

// Upload a document
const file = await client.files.upload({
  file: fs.createReadStream("./knowledge-base/product-docs.pdf"),
  purpose: "rag",
  convertToMarkdown: true,
});

console.log("Uploaded:", file.id, file.filename);

// List uploaded files
const files = await client.files.list();
console.log(`${files.data.length} files ready for processing`);
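Before embedding, the converted Markdown has to be split into chunks. This guide does not show how to retrieve the converted text or which chunking strategy Console expects, so the following is only a sketch of one common approach: pack whole paragraphs into chunks up to a character limit. `chunkMarkdown` and its `maxChars` default are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical helper: split Markdown into chunks of at most maxChars
// characters, keeping paragraphs together where they fit.
function chunkMarkdown(markdown: string, maxChars = 1000): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");

  // Paragraphs are separated by one or more blank lines.
  const paragraphs = markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    // Hard-split any paragraph that exceeds the limit on its own.
    const pieces: string[] = [];
    for (let i = 0; i < paragraph.length; i += maxChars) {
      pieces.push(paragraph.slice(i, i + maxChars));
    }

    for (const piece of pieces) {
      const candidate = current === "" ? piece : `${current}\n\n${piece}`;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits, keep packing
      } else {
        chunks.push(current); // flush and start a new chunk
        current = piece;
      }
    }
  }

  if (current !== "") chunks.push(current);
  return chunks;
}
```

Character counts are only a rough proxy for token counts; a token-aware splitter may be a better fit if your embedding model has a tight input limit.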

3. Generate Embeddings

Generate vector embeddings for your text chunks using the Console embeddings API. This supports multiple model providers through a single interface.

// Generate embeddings for text chunks
const chunks = [
  "Cognipeer Console is a self-hosted AI gateway...",
  "The vector orchestration layer manages multiple databases...",
  "Files pipeline supports PDF to Markdown conversion...",
];

const embeddings = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: chunks,
});

console.log(`Generated ${embeddings.data.length} embeddings`);
console.log(`Dimensions: ${embeddings.data[0].embedding.length}`);
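For a real knowledge base the chunk list can grow well beyond three entries, and embedding providers commonly cap how many inputs a single request may carry. A sketch of a batching helper follows; `toBatches` is a hypothetical name, and the batch size of 100 in the usage comment is an arbitrary assumption rather than a documented Console limit.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most
// batchSize items, preserving order.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage with the embeddings call from this step (batch size is an assumption):
//   for (const batch of toBatches(chunks, 100)) {
//     const res = await client.embeddings.create({
//       model: "text-embedding-3-small",
//       input: batch,
//     });
//   }
```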

4. Store Vectors

Create a vector index and upsert your embeddings. Console provides a unified API across Pinecone, Qdrant, Chroma, and other providers.

// Create a vector index
const index = await client.vectors.createIndex("my-provider", {
  name: "knowledge-base",
  dimension: 1536,
  metric: "cosine",
});

// Upsert vectors with metadata
await client.vectors.upsert("my-provider", index.externalId, {
  vectors: chunks.map((text, i) => ({
    id: `chunk-${i}`,
    values: embeddings.data[i].embedding,
    metadata: { text, source: "product-docs.pdf" },
  })),
});
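The `metric: "cosine"` setting tells the index to rank vectors by the angle between them rather than by their magnitude. To make that concrete, here is a small standalone sketch of cosine similarity. It illustrates the metric itself and is not how Console or any particular vector database implements it; `cosineSimilarity` is a hypothetical helper.

```typescript
// Illustrative only: cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The dimension check mirrors a constraint of the index itself: the `dimension` passed to `createIndex` has to match the length of the vectors your embedding model produces.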

5. Query with RAG

Combine vector search with chat completions for retrieval-augmented generation.

async function ragQuery(question: string) {
  // 1. Embed the question
  const qEmbed = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });

  // 2. Find relevant chunks
  const results = await client.vectors.query("my-provider", "knowledge-base", {
    query: { vector: qEmbed.data[0].embedding, topK: 5 },
  });

  // 3. Build context from results
  const context = results.matches
    .map((m) => m.metadata?.text)
    .join("\n\n");

  // 4. Generate answer with context
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `Answer using the following context:\n\n${context}`,
      },
      { role: "user", content: question },
    ],
  });

  return response.choices[0].message.content;
}

const answer = await ragQuery("How does vector orchestration work?");
console.log(answer);
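Step 3 of `ragQuery` joins every match into the prompt as-is. In practice you may want to skip matches without text, drop duplicates, label each chunk with its source, and cap the total size. The sketch below shows one way to do that. The `Match` shape is an assumption inferred from how `ragQuery` reads `results.matches` and from the metadata upserted earlier, not the SDK's documented type, and `buildContext` with its `maxChars` default is hypothetical.

```typescript
// Assumed shape of a query match, inferred from the example above.
interface Match {
  id: string;
  score?: number;
  metadata?: { text?: string; source?: string };
}

// Hypothetical helper: build a prompt context from matches, skipping empty
// and duplicate chunks and stopping before the character budget is exceeded.
function buildContext(matches: readonly Match[], maxChars = 4000): string {
  const seen = new Set<string>();
  const parts: string[] = [];
  let used = 0;

  for (const match of matches) {
    const text = match.metadata?.text?.trim();
    if (!text || seen.has(text)) continue;

    const source = match.metadata?.source ?? "unknown";
    const part = `[source: ${source}]\n${text}`;
    const separator = parts.length > 0 ? 2 : 0; // length of "\n\n"

    if (used + separator + part.length > maxChars) break;

    seen.add(text);
    parts.push(part);
    used += separator + part.length;
  }

  return parts.join("\n\n");
}

// Usage inside ragQuery, replacing the map/join in step 3:
//   const context = buildContext(results.matches);
```

Because matches arrive ordered by relevance, stopping at the first chunk that does not fit keeps the most relevant material and drops the tail.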

Result

You now have a complete RAG pipeline that:

- Ingests documents through the files pipeline with automatic Markdown conversion
- Embeds text chunks through a unified API across any model provider
- Stores vectors in your choice of vector database (Pinecone, Qdrant, Chroma, etc.)
- Queries semantically and generates context-aware answers
- Scales across providers with Console routing and fallback
