SRE Incident Response Agent

Overview

An SRE agent is a strong cross-product scenario because incidents require live reasoning, operational guardrails, tool usage, and grounded answers from runbooks. The agent should not answer from model knowledge alone. It should read operational knowledge, search incident history, and update the ticketing system with structured findings.

This pattern combines Agent SDK for runtime control with Console for RAG, vector search, config, and tool connectivity.

When to reach for this recipe

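Reach for this recipe if your team needs an incident agent that grounds its answers in runbooks, searches operational knowledge during analysis, and updates tickets through tool calls, and you would rather build on existing primitives than wire everything from scratch.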

Architecture

Console stores the Confluence-derived knowledge in RAG and vector indexes, hosts config and tools, and can expose Jira or internal APIs through OpenAPI and MCP. Agent SDK runs the SRE agent with handoff and guardrail support.

The result is an incident agent that can pull runbook context from Confluence, search vector knowledge during analysis, and post a summary back to Jira through a tool call.

1. Ingest Confluence Runbooks Into Console RAG

A sync job can pull pages from Confluence, normalize them, and write them into a Console RAG module so the agent does not rely on stale prompts alone.

import { ConsoleClient } from '@cognipeer/console-sdk';
 
const client = new ConsoleClient({
  apiKey: process.env.COGNIPEER_API_KEY!,
  baseURL: 'https://console.example.com',
});
 
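// fetchConfluenceRunbooks is an application-level helper (not shown here) that
// returns pages with id, title, spaceKey, and markdown fields.
const pages = await fetchConfluenceRunbooks();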
 
for (const page of pages) {
  await client.rag.ingest('sre-runbooks', {
    documents: [
      {
        id: page.id,
        content: page.markdown,
        metadata: {
          source: 'confluence',
          title: page.title,
          space: page.spaceKey,
        },
      },
    ],
  });
}
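
The "normalize" step above is not shown in the recipe. A minimal sketch of what it could look like follows, assuming the sync job receives raw page HTML. The `RawConfluencePage` and `RagDocument` shapes and the `htmlToPlainText` and `toRagDocument` helpers are hypothetical names for illustration, not part of the Console SDK, and the output is plain text rather than the true Markdown implied by `page.markdown`.

```typescript
// Hypothetical shape of a raw page returned by the sync job (assumption).
interface RawConfluencePage {
  id: string;
  title: string;
  spaceKey: string;
  bodyHtml: string;
}

// Mirrors the document object passed to rag.ingest in the snippet above.
interface RagDocument {
  id: string;
  content: string;
  metadata: { source: string; title: string; space: string };
}

// Very small HTML-to-text pass: block-level closers become newlines,
// remaining tags are dropped, and a handful of entities are decoded.
// A real sync job would likely use a proper HTML parser instead.
function htmlToPlainText(html: string): string {
  return html
    .replace(/<\/(p|div|li|h[1-6]|tr)>/gi, '\n')
    .replace(/<br\s*\/?>/gi, '\n')
    .replace(/<[^>]+>/g, '')
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&') // decoded last so "&amp;lt;" is not double-decoded
    .replace(/[ \t]+/g, ' ')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Build one ingestible document per page, with the title as a heading line.
function toRagDocument(page: RawConfluencePage): RagDocument {
  return {
    id: page.id,
    content: `# ${page.title}\n\n${htmlToPlainText(page.bodyHtml)}`,
    metadata: { source: 'confluence', title: page.title, space: page.spaceKey },
  };
}
```

Keeping the title inside `content` as well as in `metadata` is a design choice, not a requirement: it gives retrieval a chance to match on the runbook name even if metadata is not indexed.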

2. Bind Jira And Confluence Access As Tools

The agent uses tools for live systems and Console RAG for grounded knowledge. Jira can come from Console tools or MCP-backed actions; Confluence can be a sync tool or refresh tool depending on your workflow.

import { createTool, createSmartAgent, createRegexGuardrail } from '@cognipeer/agent-sdk';
import { z } from 'zod';
 
const jiraCommentTool = createTool({
  name: 'jira_comment',
  description: 'Post an incident update comment to a Jira issue',
  schema: z.object({
    issueKey: z.string(),
    comment: z.string(),
  }),
  func: async ({ issueKey, comment }) => {
    // jiraApi is your own Jira client wrapper (not shown in this recipe).
    return jiraApi.addComment(issueKey, comment);
  },
});
 
const searchRunbooks = createTool({
  name: 'search_runbooks',
  description: 'Search SRE runbooks through Console vector and RAG infrastructure',
  schema: z.object({
    query: z.string(),
  }),
  func: async ({ query }) => {
    return client.rag.query('sre-runbooks', {
      query,
      topK: 5,
    });
  },
});
 
const refreshConfluencePage = createTool({
  name: 'refresh_confluence_page',
  description: 'Pull the latest Confluence page content when the runbook may be outdated',
  schema: z.object({
    pageId: z.string(),
  }),
  // fetchConfluencePage is an application-level helper (not shown in this recipe).
  func: async ({ pageId }) => fetchConfluencePage(pageId),
});
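
Because `jira_comment` writes to a live system, it can help to reject malformed issue keys before calling Jira. The sketch below is one way to do that, assuming the default Jira key format (an uppercase project key, a hyphen, and a positive number). `isJiraIssueKey` is a hypothetical helper, and projects with a customized key pattern would need a different expression.

```typescript
// Assumed default Jira key format: uppercase project key of two or more
// characters (letters, digits, underscore after the first letter),
// then a hyphen and a positive issue number.
const JIRA_KEY_PATTERN = /^[A-Z][A-Z0-9_]+-[1-9]\d*$/;

// Returns true only when the whole string looks like an issue key.
function isJiraIssueKey(value: string): boolean {
  return JIRA_KEY_PATTERN.test(value);
}

// Example: guard the tool body before touching the live system.
function assertIssueKey(value: string): string {
  if (!isJiraIssueKey(value)) {
    throw new Error(`Not a valid Jira issue key: ${value}`);
  }
  return value;
}
```

The same check could instead live in the tool schema as a string pattern constraint, so that invalid keys are rejected before `func` runs.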

3. Build The SRE Agent With Guardrails

The agent should be grounded, concise, and policy-aware. It should never recommend destructive actions without explicit approval or verified runbook support.

const sreAgent = createSmartAgent({
  name: 'SREAgent',
  model, // chat model instance created elsewhere (not shown in this recipe)
  tools: [searchRunbooks, refreshConfluencePage, jiraCommentTool],
  systemPrompt: 'You are an SRE incident response agent. Use runbook evidence before making recommendations and produce clear incident updates.',
  guardrails: [
    createRegexGuardrail(/drop\s+database|delete\s+cluster/i, {
      guardrailTitle: 'Destructive Action Guardrail',
    }),
  ],
  useTodoList: true,
  tracing: { enabled: true },
});
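
To make the regex guardrail's behavior concrete, here is a standalone sketch of the kind of check it expresses. This is not the Agent SDK's implementation; `GuardrailVerdict` and `checkDestructiveAction` are hypothetical names, and only the pattern and title are taken from the snippet above.

```typescript
interface GuardrailVerdict {
  allowed: boolean;
  title: string;
  matches: string[];
}

// Same expression as the recipe, with the global flag added so that
// every occurrence is reported rather than only the first.
const DESTRUCTIVE_PATTERN = /drop\s+database|delete\s+cluster/gi;

// Flags text that contains a destructive phrase and lists what matched.
function checkDestructiveAction(text: string): GuardrailVerdict {
  const matches = text.match(DESTRUCTIVE_PATTERN) ?? [];
  return {
    allowed: matches.length === 0,
    title: 'Destructive Action Guardrail',
    matches: matches.slice(),
  };
}

const blocked = checkDestructiveAction('Quick fix: DROP   DATABASE orders');
const passed = checkDestructiveAction('Scale checkout-service to 5 replicas');
console.log(blocked.allowed, passed.allowed);
```

Note how narrow the pattern is: a phrase such as "drop the database" does not match, because only whitespace is allowed between the two words. A regex guardrail is a backstop for known phrasings and works best alongside the approval and runbook-evidence policy described above.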

4. Search Vector Knowledge And Update Jira

During the incident, the agent searches the runbook corpus, produces a grounded diagnosis, and posts a structured comment back to the Jira ticket.

const result = await sreAgent.invoke({
  messages: [
    {
      role: 'user',
      content: 'Investigate elevated API latency for checkout-service, use the runbooks, and post an update to JIRA-1842.',
    },
  ],
});
 
console.log(result.content);
 
// Typical flow:
// 1. search_runbooks -> Console RAG / vector search
// 2. refresh_confluence_page if the runbook seems stale
// 3. jira_comment -> post a grounded incident update
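
The recipe does not define what a "structured comment" looks like. One possible layout is sketched below, assuming the update carries a summary, the runbooks used as evidence, and next steps. The `IncidentUpdate` fields and the `formatIncidentUpdate` helper are hypothetical, and the output is plain text rather than any Jira-specific markup.

```typescript
// Hypothetical structure for an incident update (field names are assumptions).
interface IncidentUpdate {
  service: string;
  summary: string;
  runbookTitles: string[];
  nextSteps: string[];
}

// Render the update as a plain-text comment with fixed sections, so every
// ticket update has the same shape regardless of how the model phrased it.
function formatIncidentUpdate(update: IncidentUpdate): string {
  const evidence =
    update.runbookTitles.length > 0
      ? update.runbookTitles.map((title) => `- ${title}`)
      : ['- none found (treat recommendations as unverified)'];

  const lines = [
    `Incident update: ${update.service}`,
    '',
    `Summary: ${update.summary}`,
    '',
    'Runbook evidence:',
    ...evidence,
    '',
    'Next steps:',
    ...update.nextSteps.map((step, index) => `${index + 1}. ${step}`),
  ];
  return lines.join('\n');
}

const comment = formatIncidentUpdate({
  service: 'checkout-service',
  summary: 'Elevated API latency traced to connection pool saturation.',
  runbookTitles: ['Checkout latency runbook'],
  nextSteps: ['Raise pool size per runbook', 'Monitor p99 for 30 minutes'],
});
console.log(comment);
```

Listing the runbook evidence explicitly, including the case where none was found, keeps the ticket honest about how well grounded the recommendation is.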

Result

You get an SRE incident workflow that:

- Pulls operational knowledge from Confluence into Console RAG
- Searches vector knowledge during live incident handling
- Uses Agent SDK tools for Jira comments and refresh actions
- Applies guardrails before dangerous or unsupported recommendations
- Fits incident triage, runbook automation, and ops assistant scenarios


Related recipes

Agent Development With Tracing

Autonomous AI Agents

Multi-Agent Orchestration