
Skill · AI & Development
DevOps & SRE Playbook Suite
Build SRE workflows, blameless postmortems, and SLO frameworks that improve system reliability and incident response. Install in 30 seconds.
- Category
- AI & Development
- Deliverable
- 1 .skill bundle
- Outputs
- —
- Last updated
- 13 Jun 2026
- Works in Claude Pro, Team, and Enterprise
- Lifetime access to updates
- Refundable for 30 days via the marketplace
StrategistKit affiliate listing. Purchase happens on the marketplace, which handles payment, delivery, and refunds.
Overview
What DevOps & SRE Playbook Suite does.
This skill covers the full SRE lifecycle in a single installation: blameless postmortems structured to drive behaviour change rather than blame, SLO and error budget frameworks tied to business impact, on-call runbooks built for 2am clarity, CI/CD pipeline risk audits, SAST configuration, and migration plans with staged rollback gates. Tell it your stack, team size, and current reliability maturity, and it calibrates the depth of its output accordingly, then tells you precisely what to tackle next.
A typical session might start with: 'We had a database failover incident last night affecting 12,000 users. Stack is Kubernetes on GCP, team of eight engineers, no formal postmortem process yet.' The skill leads with the highest-impact action for that context, asks four short calibration questions, then produces a draft postmortem, a suggested SLO for the affected service, and a prioritised list of process gaps — ready to paste into your incident management system or engineering handbook.
Sample output excerpt (postmortem section):

| Field | Content |
| --- | --- |
| Incident title | Database failover: 47-minute user-facing degradation |
| Timeline | 02:14 alert fired; 02:31 root cause identified; 03:01 traffic restored |
| Contributing factors | Missing read-replica promotion runbook; alert threshold set above SLO breach point |
| Action items | 1. Write promotion runbook (owner: SRE lead, due: Friday); 2. Lower alert threshold to 99.5% over a 5-minute rolling window |
| What went well | Rollback executed without data loss |
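To make the error budget arithmetic behind an excerpt like this concrete, here is a minimal Python sketch. It assumes a hypothetical 99.9% availability SLO over a 30-day window and treats the full 47 minutes as unavailability; neither figure comes from the excerpt. The helpers `error_budget`, `budget_consumed`, and `rolling_breaches` are illustrative names, not part of the skill, and `rolling_breaches` assumes per-minute (good, total) request counts as a stand-in for the 99.5%, 5-minute rule in action item 2.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a compliance window."""
    return window * (1 - slo)


def budget_consumed(downtime: timedelta, slo: float, window: timedelta) -> float:
    """Fraction of the error budget used by the given downtime (1.0 = fully spent)."""
    return downtime / error_budget(slo, window)


def rolling_breaches(samples, window=5, threshold=0.995):
    """Return indices of minutes where the rolling success ratio drops below threshold.

    samples: list of (good_requests, total_requests) tuples, one per minute
    (a hypothetical input shape chosen for illustration).
    """
    breaches = []
    for end in range(window, len(samples) + 1):
        recent = samples[end - window:end]
        good = sum(g for g, _ in recent)
        total = sum(t for _, t in recent)
        if total and good / total < threshold:
            breaches.append(end - 1)  # index of the last minute in the window
    return breaches


if __name__ == "__main__":
    month = timedelta(days=30)
    budget = error_budget(0.999, month)
    print(f"Budget: {budget.total_seconds() / 60:.1f} min")  # 43.2 min
    used = budget_consumed(timedelta(minutes=47), 0.999, month)
    print(f"47-minute incident consumes {used:.0%} of the budget")  # 109%
    # Five healthy minutes, then five minutes at a 98% success ratio.
    samples = [(1000, 1000)] * 5 + [(980, 1000)] * 5
    print("Alert fires at minutes:", rolling_breaches(samples))  # [6, 7, 8, 9]
```

Under these assumptions a single 47-minute outage overspends a month's budget, which is the kind of signal an error budget policy would act on. Whether the real alert uses request ratios, latency, or another indicator depends on your stack.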
Who it's for
Engineering managers and SREs building or maturing a reliability practice — particularly those running small-to-mid-size teams without a dedicated platform org, who need production-ready postmortems, runbooks, and SLO definitions without starting from a blank page at 3am.
How it works
Three steps. About two minutes.
Install
Add the .skill file to your Claude app. ~10 seconds.
Run it on your work
Invoke the skill and paste in your material.
Apply the output
Review the output, keep what fits, and paste it into your incident management system, wiki, or engineering handbook.
In depth
Why a Claude skill beats a prompt template.
A copy-paste prompt runs one static pass and stops. A skill bundles instructions, examples, and a workflow that Claude runs as a unit: it asks for the right input, applies the same pattern every time, and returns the structured outputs described above.
FAQ
Common questions.
What inputs does the skill need to produce useful output?
At minimum: a description of your incident or the deliverable you need, your tech stack, and your team size. Maturity context (e.g. 'no formal on-call process yet') sharpens calibration further. The skill will prompt for anything missing before generating.
What formats does it return and can I paste them directly into existing tools?
Outputs default to structured documents with a summary, findings, and action items, designed to drop into incident management systems, Confluence, Notion, or a plain engineering handbook. On request, it also produces copy-paste runbook templates and checklists with owner and timeline fields.
Does the skill cover the full SRE scope or just postmortems?
It covers ten distinct areas: incident classification, blameless postmortems, migration risk audits, changelog generation, SAST pipeline configuration, on-call runbooks, SLO and error budget design, deployment checklists, disaster recovery plans, and reliability metrics. You can work through one area per session or chain them.
Can it handle a team that has almost no SRE process in place yet?
Yes. The skill explicitly calibrates to maturity level. For early-stage teams it prioritises foundational quick wins and explains the reasoning; for more mature organisations it goes deeper on error budget policy, SAST rule tuning, and multi-stage rollback gate design.
Will it work for any tech stack or is it biased toward specific tooling?
Stack-agnostic by design. You specify your tooling at the start of the session and all output — pipeline configs, runbook commands, SLO metric queries — references your actual environment rather than a generic example.
More in AI & Development
Skills used with this one.


Verification-Before-Done

UI Design Taste Critic

Technical Spec Writer