
RAG Failure Diagnostics & Architect
Diagnoses why a RAG system underperforms and architects the fix, with an evaluation harness and remediation plan.
- Category: AI & Development
- Deliverable: 1 .skill bundle
- Outputs: 5
- Last updated: 19 Jun 2026
- Works in Claude Pro, Team, and Enterprise
- Lifetime access to updates
- Refundable for 30 days via the marketplace
StrategistKit Affiliate. Purchase happens on the marketplace, which handles payment, delivery and refunds.
Overview
What RAG Failure Diagnostics & Architect does.
This skill applies a structured debugging framework to RAG systems that return confident but wrong answers. You describe your setup, including the corpus, chunking strategy, embedding model, retrieval pipeline, and a failing query or question type, and the skill classifies the failure first (retrieval miss, generation hallucination, structural incompatibility with vector search) before recommending anything. It operates in three modes: DIAGNOSE for a specific bad query, ARCHITECT for choosing the right retrieval shape for a given question type, and SCHEMA for designing the institutional-memory layer that holds the context embeddings discard.
A typical input: 'Our RAG system answers questions about internal engineering decisions. When users ask why a particular architecture was chosen two years ago, it retrieves plausible documents but stitches together a confident, fabricated rationale. Chunking is paragraph-level, embeddings are OpenAI text-embedding-3-small, no reranker.' The skill identifies this as a causal-chain failure — the answer requires decision provenance, not semantic proximity — and routes it to DIAGNOSE and ARCHITECT modes rather than chunk-size tuning.
The output for that input would include:
- Failure mode: 'Causal/provenance miss: the rationale was never stored as a traversable relationship; top-k retrieval cannot reconstruct it.'
- Structural cause: 'Vector search flattens decision context into token proximity; reranking cannot recover what was never indexed.'
- Architect recommendation: 'Decision-provenance graph with typed edges (decision → option → rationale → author → date); plain RAG is a structural mismatch for this question class.'
- Remediation plan: ranked steps from schema design to query routing, with effort estimates and the explicit 80% boundary beyond which tuning alone stops helping.
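The typed edges named in the architect recommendation can be pictured with a small in-memory graph. This is only a sketch under stated assumptions: the class name `ProvenanceGraph`, the edge-type names, and every node label below are hypothetical illustrations, not output of the skill or the schema it would design for a real corpus.

```python
from dataclasses import dataclass, field

# Hypothetical edge types for the chain decision -> option -> rationale -> author -> date.
EDGE_TYPES = {"considered", "chosen_because", "authored_by", "dated"}


@dataclass
class ProvenanceGraph:
    # Adjacency map: source node -> list of (edge_type, target node).
    edges: dict = field(default_factory=dict)

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.edges.setdefault(source, []).append((edge_type, target))

    def follow(self, start: str, path: list) -> list:
        """Follow a sequence of edge types from start and return the nodes reached."""
        frontier = [start]
        for edge_type in path:
            frontier = [
                target
                for node in frontier
                for (etype, target) in self.edges.get(node, [])
                if etype == edge_type
            ]
        return frontier


# Invented example data, for illustration only.
graph = ProvenanceGraph()
graph.add_edge("decision:message-bus", "considered", "option:kafka")
graph.add_edge("decision:message-bus", "considered", "option:rabbitmq")
graph.add_edge("option:kafka", "chosen_because", "rationale:replayable-log")
graph.add_edge("rationale:replayable-log", "authored_by", "author:example-engineer")
graph.add_edge("rationale:replayable-log", "dated", "date:2024-03-01")

# "Why was this chosen?" becomes an explicit traversal instead of a similarity search.
why = graph.follow("decision:message-bus", ["considered", "chosen_because"])
print(why)
```

The point of the sketch is that the rationale is reachable only because it was stored as an edge; if the edge was never written, no amount of retrieval tuning can recover it.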
Who it's for
ML engineers and AI architects who built a RAG system that is underperforming in production and need to know whether the problem is tunable or structurally wrong. Also useful for technical leads scoping a new retrieval system who want to avoid defaulting to pure vector search for question types it cannot reliably answer.
What you get
One skill. 5 outputs.
One .skill bundle. Run it on your material and it returns:
- Failure-mode diagnosis
- Retrieval vs generation isolation
- Chunking/embedding/rerank review
- Eval harness design
- Prioritized remediation plan
How it works
Three steps. About two minutes.
- Install: Add the .skill file to your Claude app. ~10 seconds.
- Run it on your work: Invoke the skill and paste in your material.
- Apply the output: Review, keep what works, and use it.
In depth
Why a Claude skill beats a prompt template.
A copy-paste prompt runs one static pass and stops. A skill is a bundled program — instructions, examples, and a workflow Claude runs as a unit: it asks for the right input, applies the same pattern every time, and returns the structured outputs above.
FAQ
Common questions.
What do I need to provide as input for the skill to be useful?
Describe your retrieval pipeline (corpus type, chunking approach, embedding model, any reranker or hybrid search in use) and give at least one concrete failing query with the wrong answer it returned. The more specific the failure description, the more targeted the diagnosis.
Will it just tell me to fix my chunk size?
No — the skill explicitly classifies the failure before recommending any parameter tuning. If the query requires multi-hop reasoning, temporal ordering, causal explanation, or aggregation, the skill names that as a structural mismatch and recommends the appropriate retrieval architecture instead of tuning advice that cannot fix the root cause.
What does the evaluation harness output look like?
It designs a harness matched to your failure mode: for retrieval failures it specifies recall metrics against a labeled query set; for generation failures it specifies faithfulness checks against retrieved context. It does not produce runnable code; it produces the test design, metric selection, and labeling criteria that you hand to an engineer to implement.
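To make the retrieval side concrete, here is a minimal sketch of what an engineer might implement from such a design: recall@k averaged over a labeled query set. It is not produced by the skill. The function names, the `retrieve` callable, and the sample queries and document ids are all assumptions made for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the labeled relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant document")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(labeled_queries, retrieve, k):
    """Average recall@k over (query, relevant_ids) pairs.

    `retrieve` is any callable that maps a query string to a ranked list of ids.
    """
    scores = [recall_at_k(retrieve(query), relevant, k) for query, relevant in labeled_queries]
    return sum(scores) / len(scores)


# Invented labeled set and a stand-in retriever backed by a dict.
labeled = [
    ("why kafka?", {"d1", "d2"}),
    ("who approved the migration?", {"d3"}),
]
fake_results = {
    "why kafka?": ["d1", "d9", "d2"],
    "who approved the migration?": ["d7", "d8", "d3"],
}

score_k2 = mean_recall_at_k(labeled, fake_results.__getitem__, k=2)  # 0.25
score_k3 = mean_recall_at_k(labeled, fake_results.__getitem__, k=3)  # 1.0
print(score_k2, score_k3)
```

A low score here points at retrieval; a high score combined with wrong answers points at generation, which is where the faithfulness checks would apply instead.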
Does the skill cover knowledge graphs and structured retrieval, or only vector RAG?
It covers the full decision space — vector RAG with hybrid search and reranking, knowledge graph and GraphRAG patterns, temporal and event-sourced indexes, structured query layers, and hybrid routers that dispatch by query type. It recommends the cheapest architecture that actually answers the question class, including cases where the right answer is a SQL query rather than RAG at all.
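A hybrid router that dispatches by query type can be sketched in a few lines. This is a deliberately naive illustration: the keyword patterns, the question-class names, and the backend labels are hypothetical, and a production router would more plausibly use a trained classifier or an LLM call than regular expressions.

```python
import re

# Hypothetical routing table: (question class, pattern, backend). Order matters:
# the first matching pattern wins.
ROUTES = [
    ("aggregation", re.compile(r"\b(how many|count|total|average)\b"), "sql"),
    ("causal", re.compile(r"\b(why|what caused|rationale)\b"), "graph"),
    ("temporal", re.compile(r"\b(before|after|when|timeline)\b"), "temporal_index"),
]
DEFAULT_ROUTE = ("lookup", "vector")


def route(query):
    """Return (question_class, backend) for a query, defaulting to vector search."""
    text = query.lower()
    for question_class, pattern, backend in ROUTES:
        if pattern.search(text):
            return question_class, backend
    return DEFAULT_ROUTE


print(route("Why was Kafka chosen?"))
print(route("How many incidents were filed last quarter?"))
print(route("What does the retry policy say?"))
```

Defaulting to vector search only when no structural question class matches mirrors the idea of choosing the cheapest architecture that actually answers the question class.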
Can I use this skill while designing a RAG system, before I have a failing query?
Yes. The ARCHITECT mode takes a question type or system description and maps it to the right retrieval shape, explaining what breaks if you default to pure vector search for that workload. The SCHEMA mode designs the institutional-memory layer for organizations that need agents to answer 'why' and 'what caused' questions that embeddings cannot support.
More in AI & Development
Skills used with this one.


API Contract Tester

AI Automation QA & UAT Pack
