RAG & Semantic Search

Overview

Retrieval-Augmented Generation (RAG) combines the power of LLMs with your own data. Cognipeer Console acts as the unified gateway: it manages vector databases, generates embeddings, and serves chat completions. The Console SDK gives you type-safe access to all of these capabilities.

This guide shows how to build a document Q&A system over your internal knowledge base.

When to reach for this recipe

Reach for this recipe when your team needs question answering or semantic search over its own documents and would rather build on Console's existing primitives than wire up ingestion, embedding, and vector storage from scratch.

Architecture

Console provides the infrastructure layer: OpenAI-compatible API for embeddings, vector orchestration across multiple providers (Pinecone, Qdrant, Chroma, etc.), and the files pipeline for document ingestion.

Console SDK is the TypeScript client that ties everything together with full type safety and streaming support.

1. Set Up Console SDK Client

Install the SDK and create a client pointing to your Console instance.

import { ConsoleClient } from "@cognipeer/console-sdk";
 
const client = new ConsoleClient({
  apiKey: process.env.COGNIPEER_API_KEY!,
  baseURL: "https://your-console.example.com",
});

2. Upload and Process Documents

Use the files pipeline to upload documents. Console automatically converts PDFs and other formats to Markdown for RAG-ready ingestion.

import * as fs from "node:fs";

// Upload a document
const file = await client.files.upload({
  file: fs.createReadStream("./knowledge-base/product-docs.pdf"),
  purpose: "rag",
  convertToMarkdown: true,
});
 
console.log("Uploaded:", file.id, file.filename);
 
// List uploaded files
const files = await client.files.list();
console.log(`${files.data.length} files ready for processing`);
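
The next step works on text chunks, but the guide does not show how the converted Markdown is split. A minimal sketch of one common approach, a fixed-size sliding window with overlap, is shown below. The helper name `chunkText` and its default sizes are assumptions for illustration and are not part of the Console SDK; sizes are counted in characters, not tokens.

```typescript
// Hypothetical helper (not part of the Console SDK): split text into
// fixed-size chunks where each chunk repeats the tail of the previous one.
function chunkText(text: string, maxChars = 1000, overlapChars = 200): string[] {
  if (maxChars <= 0 || overlapChars < 0 || overlapChars >= maxChars) {
    throw new RangeError("expected maxChars > 0 and 0 <= overlapChars < maxChars");
  }
  const normalized = text.trim();
  const result: string[] = [];
  const step = maxChars - overlapChars; // how far the window advances each time
  for (let start = 0; start < normalized.length; start += step) {
    result.push(normalized.slice(start, start + maxChars));
    // Stop once the current window has reached the end of the text.
    if (start + maxChars >= normalized.length) break;
  }
  return result;
}
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk. Splitting on headings or paragraphs instead would likely give cleaner chunks for Markdown input.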

3. Generate Embeddings

Generate vector embeddings for your text chunks using the Console embeddings API. This supports multiple model providers through a single interface.

// Generate embeddings for text chunks
const chunks = [
  "Cognipeer Console is a self-hosted AI gateway...",
  "The vector orchestration layer manages multiple databases...",
  "Files pipeline supports PDF to Markdown conversion...",
];
 
const embeddings = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: chunks,
});
 
console.log(`Generated ${embeddings.data.length} embeddings`);
console.log(`Dimensions: ${embeddings.data[0].embedding.length}`);
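
A real knowledge base produces far more than three chunks, and sending them all in one request may run into provider limits. The sketch below batches the inputs before embedding. `toBatches`, `embedInBatches`, the `EmbedFn` type, and the default batch size of 64 are assumptions for illustration; the guide does not state what limits Console or the underlying provider applies.

```typescript
// Hypothetical helpers (not part of the Console SDK).
// EmbedFn abstracts "embed these strings, return one vector per string".
type EmbedFn = (inputs: string[]) => Promise<number[][]>;

// Split an array into consecutive slices of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed texts batch by batch, preserving the input order.
async function embedInBatches(
  texts: string[],
  embed: EmbedFn,
  batchSize = 64, // assumed value; tune to your provider's limits
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(texts, batchSize)) {
    const result = await embed(batch);
    if (result.length !== batch.length) {
      throw new Error("embedding count does not match input count");
    }
    vectors.push(...result);
  }
  return vectors;
}
```

With the client from step 1, `embed` could wrap the call shown above, for example `(inputs) => client.embeddings.create({ model: "text-embedding-3-small", input: inputs }).then((r) => r.data.map((d) => d.embedding))`.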

4. Store Vectors

Create a vector index and upsert your embeddings. Console provides a unified API across Pinecone, Qdrant, Chroma, and other providers.

// Create a vector index
const index = await client.vectors.createIndex("my-provider", {
  name: "knowledge-base",
  dimension: 1536,
  metric: "cosine",
});
 
// Upsert vectors with metadata
await client.vectors.upsert("my-provider", index.externalId, {
  vectors: chunks.map((text, i) => ({
    id: `chunk-${i}`,
    values: embeddings.data[i].embedding,
    metadata: { text, source: "product-docs.pdf" },
  })),
});
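
The `metric: "cosine"` setting means results are ranked by the angle between vectors rather than their magnitude. As a rough illustration of what the vector database computes, here is a standalone cosine similarity function. It is a sketch for understanding only; `cosineSimilarity` is a hypothetical name and the function is not needed when querying through Console.

```typescript
// Illustrative only: cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The dimension check mirrors a constraint that applies to the index as well: the `dimension` given at index creation has to match the length of the vectors the embedding model returns, which is why step 3 logs it.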

5. Query with RAG

Combine vector search with chat completions for retrieval-augmented generation.

async function ragQuery(question: string) {
  // 1. Embed the question
  const qEmbed = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
 
  // 2. Find relevant chunks
  const results = await client.vectors.query("my-provider", "knowledge-base", {
    query: { vector: qEmbed.data[0].embedding, topK: 5 },
  });
 
  // 3. Build context from results
  const context = results.matches
    .map((m) => m.metadata?.text)
    .filter(Boolean) // skip matches that have no stored text
    .join("\n\n");
 
  // 4. Generate answer with context
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: `Answer using the following context:\n\n${context}`,
      },
      { role: "user", content: question },
    ],
  });
 
  return response.choices[0].message.content;
}
 
const answer = await ragQuery("How does vector orchestration work?");
console.log(answer);
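
Step 3 of `ragQuery` joins every match into the prompt. In practice it can help to drop weak matches and cap the context size. The sketch below shows one way to do that. The `Match` shape, in particular the `score` field, and the default thresholds are assumptions; check the actual type returned by `client.vectors.query` before relying on them. It also assumes matches arrive sorted from most to least relevant.

```typescript
// Assumed result shape; only metadata.text appears in the guide above.
interface Match {
  id: string;
  score?: number; // assumption: similarity score, higher is better
  metadata?: { text?: string };
}

// Hypothetical helper: build a prompt context from matches, skipping
// low-scoring or empty entries and stopping at a character budget.
function buildContext(matches: Match[], minScore = 0.3, maxChars = 4000): string {
  const parts: string[] = [];
  let used = 0;
  for (const m of matches) {
    const text = m.metadata?.text?.trim();
    if (!text) continue; // nothing to add
    if (m.score !== undefined && m.score < minScore) continue; // too weak
    const separator = parts.length > 0 ? 2 : 0; // length of "\n\n"
    if (used + separator + text.length > maxChars) break; // budget reached
    parts.push(text);
    used += separator + text.length;
  }
  return parts.join("\n\n");
}
```

Inside `ragQuery`, `buildContext(results.matches)` could replace the map-and-join step. Matches without a score are kept, so the helper degrades to the original behavior if the field is absent.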

Result

You now have a complete RAG pipeline that:

- Ingests documents through the files pipeline with automatic Markdown conversion
- Embeds text chunks through a unified API across any model provider
- Stores vectors in your choice of vector database (Pinecone, Qdrant, Chroma, etc.)
- Queries semantically and generates context-aware answers
- Scales across providers with Console routing and fallback

Related recipes

Vector RAG Operations Control Plane