# Quality Standards
This document defines the automated quality checks that run on every skill PR.
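## Quality Score
Every skill receives a composite score from 0 to 100, calculated across five dimensions: Structure (15%), Description Quality (20%), Instruction Quality (25%), Test Coverage (25%), and Security (15%). Each dimension contributes its weight as a share of the total.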
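The per-check points in the tables that follow add up to 18 (Structure), 22 (Description Quality), 28 (Instruction Quality), 29 (Test Coverage), and 15 (Security), so raw points alone do not reproduce the stated percentages. The sketch below assumes, without confirmation from this document, that each dimension's earned points are scaled to its weight. The `DIMENSIONS` table and `composite_score` function are hypothetical names.

```python
from __future__ import annotations

# Hypothetical weights table: (percentage weight, maximum raw points).
# Maximums are the per-check point totals from the tables in this document.
DIMENSIONS = {
    "structure": (15, 18),
    "description": (20, 22),
    "instructions": (25, 28),
    "tests": (25, 29),
    "security": (15, 15),
}


def composite_score(earned: dict[str, int]) -> float:
    """Scale each dimension's raw points to its weight and sum (assumed model)."""
    total = 0.0
    for name, (weight, max_points) in DIMENSIONS.items():
        points = earned.get(name, 0)
        if not 0 <= points <= max_points:
            raise ValueError(f"{name}: {points} is outside 0..{max_points}")
        total += weight * points / max_points
    return round(total, 1)
```

Under this assumption, a skill that earns 9 of the 18 Structure points gains 7.5 of the 15 percentage points Structure can contribute.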
### Scoring Breakdown
#### Structure (15%)
| Check | Points | Criteria |
|---|---|---|
| `SKILL.md` exists | 3 | Filename matches `SKILL.md` exactly, including case |
| Valid YAML frontmatter | 3 | Parses without error and is enclosed in `---` delimiters |
| No unexpected keys | 1 | Frontmatter uses only allowed keys |
| Name field valid | 2 | kebab-case, matches folder, at most 64 characters |
| Name matches folder | 1 | Frontmatter `name` equals the directory name |
| Description field valid | 2 | Present and longer than 10 characters |
| No angle brackets | 1 | No `<` or `>` in frontmatter |
| Folder naming | 1 | kebab-case, no spaces or capital letters |
| No README.md in skill | 1 | `README.md` belongs at the repo level only |
| Test directory exists | 2 | `tests/test-cases.yml` present and valid |
| Status field valid | 1 | If present, must be `active`, `beta`, `deprecated`, or `archived` |
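Several Structure checks reduce to simple string rules. The following is a minimal sketch of three of them, assuming "kebab-case" means lowercase letters and digits joined by single hyphens. The helper names are hypothetical, and a real checker would also parse the extracted frontmatter with a YAML library to confirm it is valid YAML.

```python
from __future__ import annotations

import re

# Assumed meaning of kebab-case: lowercase letters/digits joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
ALLOWED_STATUSES = {"active", "beta", "deprecated", "archived"}


def split_frontmatter(text: str) -> tuple[str, str] | None:
    """Return (frontmatter, body) if text opens with a block delimited by ---."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None  # opening delimiter with no closing delimiter


def name_problems(name: str, folder: str) -> list[str]:
    """List Structure-check failures for the frontmatter name field."""
    problems = []
    if not KEBAB_CASE.fullmatch(name):
        problems.append("name is not kebab-case")
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if name != folder:
        problems.append("name does not match folder")
    return problems


def status_ok(status: str | None) -> bool:
    """Status is optional; when present it must be one of four values."""
    return status is None or status in ALLOWED_STATUSES
```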
#### Description Quality (20%)
| Check | Points | Criteria |
|---|---|---|
| Contains action verbs | 4 | “Creates”, “Analyzes”, “Manages”, etc. |
| Contains trigger phrases | 5 | “Use when user says…”, “Use for…” |
| Specific, not vague | 4 | Detailed, actionable descriptions |
| Mentions file types | 3 | If applicable |
| Under 1024 chars | 2 | Hard limit |
| Includes negative triggers | 2 | “Do NOT use for…” |
| Owner in metadata | 2 | `metadata.owner` or `metadata.author` present |
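Most Description Quality checks can be approximated with keyword matching. This sketch is illustrative only: the verb list and phrase matching are assumptions, not the checker's documented rules, and "Specific, not vague" and "Mentions file types" are omitted because they do not reduce cleanly to a keyword test. Negative triggers are stripped before looking for positive ones, because “Do NOT use for” itself contains “use for”. The function name `description_checks` is hypothetical.

```python
from __future__ import annotations

import re

# Illustrative verb list; the checker's actual list is not documented.
ACTION_VERBS = ("creates", "analyzes", "manages", "generates", "converts")


def description_checks(description: str, metadata: dict) -> dict[str, bool]:
    """Return pass/fail flags for keyword-based Description Quality checks."""
    text = description.lower()
    # Strip negative triggers first so "do not use for" is not read as "use for".
    positive = text.replace("do not use for", "")
    return {
        "action_verbs": any(re.search(rf"\b{verb}\b", text) for verb in ACTION_VERBS),
        "trigger_phrases": "use when" in positive or "use for" in positive,
        "under_1024_chars": len(description) < 1024,  # "under" read as strictly less
        "negative_triggers": "do not use for" in text,
        "owner_in_metadata": bool(metadata.get("owner") or metadata.get("author")),
    }
```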
#### Instruction Quality (25%)
| Check | Points | Criteria |
|---|---|---|
| Non-empty body | 3 | Content after frontmatter |
| Has step structure | 4 | Numbered steps, ## headers, or clear sequence |
| Includes examples | 5 | Code blocks, user scenarios, expected outputs |
| Includes error handling | 4 | “If X fails…”, troubleshooting section |
| Uses progressive disclosure | 4 | Points to files in `references/` or `scripts/` for detail |
| Actionable language | 3 | “Run X”, “Call Y”, “Check Z” |
| Word count under 5000 | 2 | Encourages conciseness |
| Instruction coherence | 3 | All referenced paths must actually exist |
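The Instruction coherence check can be sketched as extracting every `references/` or `scripts/` path mentioned in the body and confirming that each exists relative to the skill folder. The regular expression is an assumption about how paths are written, word count is assumed to mean whitespace-delimited tokens, and `missing_references` and `word_count_ok` are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed convention: references are relative paths under references/ or scripts/.
PATH_REF = re.compile(r"\b(?:references|scripts)/[\w./-]+")


def missing_references(skill_dir: Path, body: str) -> list[str]:
    """Return referenced paths that do not exist under skill_dir."""
    # Trailing periods are usually sentence punctuation, not part of the path.
    refs = {match.rstrip(".") for match in PATH_REF.findall(body)}
    return sorted(ref for ref in refs if not (skill_dir / ref).exists())


def word_count_ok(body: str, limit: int = 5000) -> bool:
    """Whitespace-delimited word count must stay under the limit."""
    return len(body.split()) < limit
```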
#### Test Coverage (25%)
| Check | Points | Criteria |
|---|---|---|
| test-cases.yml exists | 3 | Valid YAML structure |
| >=3 should-trigger tests | 4 | Including paraphrased variations |
| >=2 should-not-trigger tests | 3 | Unrelated topics |
| >=2 functional tests | 5 | Input → expected-behavior pairs |
| >=1 negative test | 3 | What the skill should refuse |
| >=1 edge case test | 3 | Special characters, empty inputs |
| Performance baseline | 2 | Before/after comparison |
| Functional tests have assertions | 2 | Each functional test has >=2 `expected_behavior` items |
| Trigger diversity | 2 | No near-duplicate triggers |
| Assertion specificity | 2 | No vague assertions |
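The Trigger diversity check needs some working definition of "near-duplicate". One simple option, assumed here rather than taken from the checker, is to normalize case and whitespace and flag pairs whose `difflib.SequenceMatcher` similarity ratio is at or above 0.9. Both the threshold and the function name `near_duplicates` are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(triggers: list[str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return pairs of triggers whose similarity ratio is at least threshold."""
    # Normalize case and collapse whitespace so trivial variants compare equal.
    normalized = [" ".join(t.lower().split()) for t in triggers]
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(normalized), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((triggers[i], triggers[j]))
    return pairs
```

A character-level ratio catches copy-paste variants but not true paraphrases; that limitation is acceptable here because paraphrased triggers are what the should-trigger tests are meant to include.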
#### Security (15%)
| Check | Points | Criteria |
|---|---|---|
| No secrets detected | 5 | API keys, tokens, passwords |
| No angle brackets in frontmatter | 3 | Prevents prompt injection |
| No reserved names | 3 | Skill name does not contain “claude” or “anthropic” |
| No suspicious patterns | 2 | No `eval()`, `exec()`, or `system()` calls |
| No hardcoded URLs | 2 | Unless documented |
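A scanner for the Security checks can combine regular expressions with plain string tests. The patterns below are illustrative assumptions: the `AKIA` prefix followed by 16 uppercase alphanumerics is the format of AWS access key IDs, while the key/token/password assignment pattern is a generic heuristic that will miss many real secrets. The hardcoded-URL check is omitted because "unless documented" depends on project policy. The function name `security_findings` is hypothetical.

```python
from __future__ import annotations

import re

# Illustrative patterns only; dedicated secret scanners use far larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID format
    re.compile(r"(?i)\b(?:api[_-]?key|token|password)\s*[:=]\s*['\"]?[^\s'\"]{8,}"),
]
SUSPICIOUS_CALLS = re.compile(r"\b(?:eval|exec|system)\s*\(")
RESERVED_WORDS = ("claude", "anthropic")


def security_findings(name: str, frontmatter: str, content: str) -> list[str]:
    """Return a list of Security-check failures for one skill."""
    findings = []
    if any(pattern.search(content) for pattern in SECRET_PATTERNS):
        findings.append("possible secret")
    if "<" in frontmatter or ">" in frontmatter:
        findings.append("angle bracket in frontmatter")
    if any(word in name.lower() for word in RESERVED_WORDS):
        findings.append("reserved name")
    if SUSPICIOUS_CALLS.search(content):
        findings.append("suspicious call")
    return findings
```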
### Score Thresholds
| Score | Result | Action |
|---|---|---|
| 90-100 | Excellent | Auto-approved for maintainer review |
| 70-89 | Good | Approved with suggestions |
| 50-69 | Needs Work | Blocked — must address issues |
| 0-49 | Rejected | Major issues — significant rework needed |
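Mapping a composite score to its band is a chain of lower-bound comparisons. This sketch assumes fractional scores are compared as-is rather than rounded, so 89.9 falls in "Good"; the function name `classify` is hypothetical.

```python
from __future__ import annotations


def classify(score: float) -> tuple[str, str]:
    """Map a 0-100 composite score to its (result, action) pair."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Lower bounds from the thresholds table; fractional scores are not rounded.
    if score >= 90:
        return "Excellent", "Auto-approved for maintainer review"
    if score >= 70:
        return "Good", "Approved with suggestions"
    if score >= 50:
        return "Needs Work", "Blocked: must address issues"
    return "Rejected", "Major issues: significant rework needed"
```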