DevOps & SRE Playbook Suite

Deploy SRE workflows, blameless postmortems, and SLO frameworks to improve system reliability and team response. Installs in under a minute.

Category: AI & Development
Deliverable: 1 .skill bundle
Last updated: 13 Jun 2026

$12.99 One-time · lifetime updates

Get it on Agensi

Works in Claude Pro, Team, and Enterprise
Lifetime access to updates
Refundable for 30 days via the marketplace

StrategistKit Affiliate. Purchase happens on the marketplace, which handles payment, delivery and refunds.

Overview

What DevOps & SRE Playbook Suite does.

This skill covers the full SRE lifecycle in a single installation: blameless postmortems structured to drive behaviour change rather than blame, SLO and error budget frameworks tied to business impact, on-call runbooks built for 2am clarity, CI/CD pipeline risk audits, SAST configuration, and migration plans with staged rollback gates. Tell it your stack, team size, and current reliability maturity and it calibrates depth accordingly, then tells you precisely what to tackle next.

A typical session might start with: 'We had a database failover incident last night affecting 12,000 users. Stack is Kubernetes on GCP, team of eight engineers, no formal postmortem process yet.' The skill leads with the highest-impact action for that context, asks four short calibration questions, then produces a draft postmortem, a suggested SLO for the affected service, and a prioritised list of process gaps — ready to paste into your incident management system or engineering handbook.
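
To make the "suggested SLO" concrete, here is a minimal sketch of the error-budget arithmetic that such a draft typically rests on. It is an illustration only, not the skill's actual output: the 99.9% target, the 30-day window, the 25 bad minutes, and the `ErrorBudget` class are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: availability SLO measured in minutes."""

    slo_target: float    # e.g. 0.999 means 99.9% of minutes must be good
    window_minutes: int  # length of the SLO window

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no budget, so reject it up front.
        if not 0.0 < self.slo_target < 1.0:
            raise ValueError("slo_target must be strictly between 0 and 1")

    @property
    def allowed_bad_minutes(self) -> float:
        # The budget is the share of the window the SLO permits to fail.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_fraction(self, bad_minutes: float) -> float:
        # Share of the budget still unspent; a negative value means overspent.
        return 1.0 - bad_minutes / self.allowed_bad_minutes


# Assumed example values: 99.9% over 30 days, with 25 minutes of downtime so far.
budget = ErrorBudget(slo_target=0.999, window_minutes=30 * 24 * 60)
print(round(budget.allowed_bad_minutes, 1))     # 43.2
print(round(budget.remaining_fraction(25), 2))  # 0.42
```

Under these assumed numbers, a single 25-minute failover would spend more than half of the month's budget, which is the kind of signal a prioritised list of process gaps can be built around.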

Who it's for

Engineering managers and SREs building or maturing a reliability practice — particularly those running small-to-mid-size teams without a dedicated platform org, who need production-ready postmortems, runbooks, and SLO definitions without starting from a blank page at 3am.

How it works

Three steps. About two minutes.

Install

Add the .skill file to your Claude app. ~10 seconds.

Run it on your work

Invoke the skill and describe your incident or the deliverable you need, along with your tech stack and team size.

Apply the output

Review the draft, keep what fits, and paste it into your incident management system or engineering handbook.

In depth

Why a Claude skill beats a prompt template.

A copy-paste prompt runs one static pass and stops. A skill is a bundle of instructions, examples, and a workflow that Claude runs as a unit: it asks for the right input, applies the same pattern every time, and returns structured documents such as postmortems, runbooks, and SLO definitions.

FAQ

Common questions.

What inputs does the skill need to produce useful output?

At minimum: a description of your incident or the deliverable you need, your tech stack, and your team size. Maturity context (e.g. 'no formal on-call process yet') sharpens calibration further. The skill will prompt for anything missing before generating.

What formats does it return and can I paste them directly into existing tools?

Outputs default to structured documents with summary, findings, and action items — designed to drop into incident management systems, Confluence, Notion, or a plain engineering handbook. It also produces copy-paste runbook templates and checklist formats with owner and timeline fields on request.
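
As a rough picture of that structure, the sketch below models a document with a summary, findings, and action items carrying owner and timeline fields, then renders it as Markdown for pasting into a wiki. The class names, field names, and sample content are hypothetical and chosen for illustration; the skill's real templates may be laid out differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ActionItem:
    task: str
    owner: str
    due: str  # free-form timeline, e.g. an ISO date


@dataclass
class Postmortem:
    title: str
    summary: str
    findings: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def to_markdown(self) -> str:
        # Render the three sections in a fixed order so every report looks alike.
        lines = [f"# {self.title}", "", "## Summary", self.summary, "", "## Findings"]
        lines += [f"- {finding}" for finding in self.findings]
        lines += ["", "## Action items"]
        lines += [
            f"- [ ] {item.task} (owner: {item.owner}, due: {item.due})"
            for item in self.action_items
        ]
        return "\n".join(lines)


# Placeholder content, invented for the example.
doc = Postmortem(
    title="Database failover incident",
    summary="Failover affected roughly 12,000 users.",
    findings=["No formal postmortem process existed."],
    action_items=[ActionItem("Draft failover runbook", "on-call lead", "2026-07-01")],
)
print(doc.to_markdown())
```

Keeping owner and due date on every action item is what lets the same record double as a checklist.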

Does the skill cover the full SRE scope or just postmortems?

It covers ten distinct areas: incident classification, blameless postmortems, migration risk audits, changelog generation, SAST pipeline configuration, on-call runbooks, SLO and error budget design, deployment checklists, disaster recovery plans, and reliability metrics. You can work through one area per session or chain them.

Can it handle a team that has almost no SRE process in place yet?

Yes. The skill explicitly calibrates to maturity level. For early-stage teams it prioritises foundational quick-wins and explains the reasoning; for more mature organisations it goes deeper on error budget policy, SAST rule tuning, and multi-stage rollback gate design.

Will it work for any tech stack or is it biased toward specific tooling?

Stack-agnostic by design. You specify your tooling at the start of the session and all output — pipeline configs, runbook commands, SLO metric queries — references your actual environment rather than a generic example.