Creating Checks
KubeBuddy checks are now authored for the native Go runtime.
The supported model is:
- YAML for check metadata and rule definitions
- Prometheus blocks for metric-driven checks
- native Go handlers for checks that need procedural logic
The old PowerShell Script: model is no longer part of the supported runtime.
Check Locations
Use these directories:
- Kubernetes checks: checks/kubernetes/*.yaml
- AKS checks: checks/aks/*.yaml
The CLI defaults already point at those paths.
Supported Check Styles
Declarative checks
Use declarative checks when the result can be derived from:
- resource fields
- simple comparisons
- array membership
- counts
- existence checks
These are the preferred default.
Example:
checks:
  - id: POD004
    name: Pending Pods
    section: Pods
    category: Workloads
    resource_kind: Pod
    severity: Warning
    weight: 3
    description: Detects pods stuck in a Pending state due to scheduling or dependency issues.
    fail_message: Some pods are stuck in Pending.
    recommendation: Inspect scheduling constraints, missing dependencies, and cluster capacity.
    recommendation_html: |
      <div class="recommendation-content">
        <ul>
          <li>Run <code>kubectl describe pod &lt;pod&gt; -n &lt;namespace&gt;</code> to inspect scheduling events.</li>
          <li>Check node resources, taints, tolerations, and affinity rules.</li>
          <li>Verify required PVCs, Secrets, and ConfigMaps exist and are bound.</li>
        </ul>
      </div>
    speech_bubble:
      - Some pods are stuck in Pending.
      - Check scheduling events, cluster capacity, and missing dependencies.
    url: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
    value:
      path: status.phase
    operator: not_equals
    expected: Pending
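The rule above reads as "resolve status.phase on each Pod and flag the Pod when the value equals Pending". The following Go sketch illustrates that idea only; the helper names lookup and passesNotEquals are hypothetical and are not taken from the KubeBuddy runtime, and the handling of a missing field is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup walks a dotted path such as "status.phase" through nested maps.
// It returns the value and true only if every path segment exists.
func lookup(obj map[string]any, path string) (any, bool) {
	var cur any = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// passesNotEquals reports whether a resource satisfies a not_equals rule.
// Assumption: a resource that violates the rule becomes a finding, and a
// missing field counts as "not equal" to the expected value.
func passesNotEquals(obj map[string]any, path string, expected any) bool {
	got, ok := lookup(obj, path)
	if !ok {
		return true
	}
	// Compare printed forms so values of any type can be compared safely.
	return fmt.Sprint(got) != fmt.Sprint(expected)
}

func main() {
	pods := []map[string]any{
		{"metadata": map[string]any{"name": "web-1"}, "status": map[string]any{"phase": "Running"}},
		{"metadata": map[string]any{"name": "web-2"}, "status": map[string]any{"phase": "Pending"}},
	}
	for _, pod := range pods {
		name, _ := lookup(pod, "metadata.name")
		if !passesNotEquals(pod, "status.phase", "Pending") {
			fmt.Printf("POD004 finding: %v\n", name)
		}
	}
}
```

With the sample data, only web-2 is reported, which matches the intent of the POD004 rule.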
Prometheus checks
Use a prometheus: block when the check is based on PromQL and threshold comparison.
These are still YAML-defined, but the runtime executes the Prometheus query in Go.
Example:
checks:
  - id: PROM001
    name: High CPU Pods (Prometheus)
    category: Performance
    section: Pods
    resource_kind: Pod
    severity: Warning
    weight: 3
    description: Checks for pods with sustained high CPU usage over the last 24 hours.
    fail_message: Some pods show high sustained CPU usage.
    recommendation: Investigate high CPU usage and adjust requests, limits, or scaling.
    recommendation_html: |
      <div class="recommendation-content">
        <ul>
          <li>Confirm whether the CPU profile is expected for the workload.</li>
          <li>Review container requests and limits.</li>
          <li>Consider autoscaling or workload tuning if CPU remains persistently high.</li>
        </ul>
      </div>
    speech_bubble:
      - Some pods are showing sustained high CPU usage.
      - Check requests, limits, scaling, and workload behavior.
    url: https://kubernetes.io/docs/concepts/cluster-administration/monitoring/
    prometheus:
      query: sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)
      range:
        step: 5m
        duration: 24h
    operator: greater_than
    expected: cpu_critical
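One plausible reading of the threshold comparison above is that the named value cpu_critical resolves to a number and each pod's series is reduced to a single value before comparing. The sketch below shows that reading only. The threshold value 0.80, the use of a mean over the range, and the names thresholds, mean, and exceeds are all assumptions for illustration, not KubeBuddy's actual behavior.

```go
package main

import "fmt"

// thresholds maps named thresholds to numbers. The value is a placeholder
// chosen for this example, not a KubeBuddy default.
var thresholds = map[string]float64{"cpu_critical": 0.80}

// mean returns the arithmetic mean of the samples, or 0 when there are none.
func mean(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// exceeds implements a greater_than comparison between the averaged series
// and a named threshold. Unknown threshold names are reported as errors.
func exceeds(samples []float64, name string) (bool, error) {
	limit, ok := thresholds[name]
	if !ok {
		return false, fmt.Errorf("unknown threshold %q", name)
	}
	return mean(samples) > limit, nil
}

func main() {
	// Hypothetical per-pod samples, as a range query might return them.
	series := map[string][]float64{
		"api-7d9f":    {0.95, 0.90, 0.85},
		"worker-5c2a": {0.10, 0.20, 0.30},
	}
	for pod, samples := range series {
		hot, err := exceeds(samples, "cpu_critical")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if hot {
			fmt.Printf("PROM001 finding: %s\n", pod)
		}
	}
}
```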
Native handler checks
Use native_handler: when the logic is too complex for a clean declarative rule.
Examples:
- cross-resource correlation
- workload ownership resolution
- storage/network consistency checks
- richer rightsizing or recommendation logic
In that model:
- YAML still defines the check id, name, severity, docs, and report content
- Go implements the handler logic
Example:
checks:
  - id: NET001
    name: Services Without Endpoints
    category: Networking
    section: Networking
    resource_kind: Service
    severity: High
    weight: 2
    description: Identifies services that have no backing endpoints.
    fail_message: Service has no endpoints.
    recommendation: Check selectors, pod readiness, and EndpointSlice generation.
    recommendation_html: |
      <div class="recommendation-content">
        <ul>
          <li>Verify the Service selector matches live pod labels.</li>
          <li>Check pod readiness and EndpointSlice generation.</li>
          <li>Confirm the backing workload is healthy before sending traffic.</li>
        </ul>
      </div>
    speech_bubble:
      - This service has no endpoints.
      - Check selectors, pod readiness, and EndpointSlices.
    url: https://kubernetes.io/docs/concepts/services-networking/service/
    native_handler: NET001
    value:
      path: metadata.name
    operator: exists
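The split described above, where YAML names the handler and Go implements it, can be pictured as a lookup table keyed by the native_handler value. This is a minimal sketch under stated assumptions: the Service, Cluster, and Handler types and the handlers registry are hypothetical and do not reflect KubeBuddy's internal packages.

```go
package main

import (
	"fmt"
	"sort"
)

// Service and Cluster are simplified stand-ins for real cluster data.
type Service struct {
	Namespace string
	Name      string
}

type Cluster struct {
	Services []Service
	// ReadyEndpoints counts ready endpoint addresses per "namespace/name".
	ReadyEndpoints map[string]int
}

// Handler returns the items that should be reported as findings.
type Handler func(c Cluster) []string

// handlers maps the native_handler value from YAML to Go logic.
var handlers = map[string]Handler{
	"NET001": servicesWithoutEndpoints,
}

// servicesWithoutEndpoints correlates Services with endpoint counts, a join
// that would be awkward to express as a single declarative rule.
func servicesWithoutEndpoints(c Cluster) []string {
	var findings []string
	for _, svc := range c.Services {
		key := svc.Namespace + "/" + svc.Name
		if c.ReadyEndpoints[key] == 0 {
			findings = append(findings, key)
		}
	}
	sort.Strings(findings)
	return findings
}

// run looks up a handler by id and executes it against the cluster data.
func run(id string, c Cluster) ([]string, error) {
	h, ok := handlers[id]
	if !ok {
		return nil, fmt.Errorf("no native handler registered for %q", id)
	}
	return h(c), nil
}

func main() {
	c := Cluster{
		Services: []Service{
			{Namespace: "default", Name: "web"},
			{Namespace: "default", Name: "orphan"},
		},
		ReadyEndpoints: map[string]int{"default/web": 2},
	}
	findings, err := run("NET001", c)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(findings)
}
```

Keeping the registry keyed by check id means the YAML file stays the single place where report text lives, while the Go function only decides which items are findings.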
YAML Shape
Current native checks use lower-case field names.
Common fields:
| Field | Required | Notes |
|---|---|---|
| id | yes | Unique check id such as SEC004 or AKSSEC001 |
| name | yes | Human-readable check name |
| category | yes | Broad grouping used in reports |
| section | yes | Report/tab grouping |
| resource_kind | yes for Kubernetes checks | Resource type used by the runtime |
| severity | yes | Example values: Low, Warning, High |
| weight | yes | Used in report weighting and ordering |
| description | yes | What the check detects |
| fail_message | yes | Message shown when findings exist |
| recommendation | yes | Plain-text remediation guidance |
| recommendation_html | expected | Rich HTML recommendation block for report parity |
| speech_bubble | expected | Short Buddy/TUI recommendation text |
| url | yes | Primary docs link |
| value | usually | Path or expression to evaluate |
| operator | usually | Comparison operator |
| expected | usually | Comparison target |
| native_handler | optional | Use for procedural Go checks |
| prometheus | optional | Use for Prometheus-backed checks |
The YAML keeps the user-facing definition. The runtime resolves the native_handler value in Go.
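When reviewing a new check, it can help to confirm that the fields marked "yes" among the common fields listed above are filled in. The sketch below is a hypothetical lint helper, not part of the KubeBuddy loader: the Check struct and missingRequired function are illustrative names, and weight and resource_kind are left out because their validity rules are not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// Check holds the always-required string fields of a check definition.
type Check struct {
	ID             string
	Name           string
	Category       string
	Section        string
	Severity       string
	Description    string
	FailMessage    string
	Recommendation string
	URL            string
}

// missingRequired returns the YAML names of required fields that are empty
// or contain only whitespace, in the order they are checked.
func missingRequired(c Check) []string {
	required := []struct{ field, value string }{
		{"id", c.ID},
		{"name", c.Name},
		{"category", c.Category},
		{"section", c.Section},
		{"severity", c.Severity},
		{"description", c.Description},
		{"fail_message", c.FailMessage},
		{"recommendation", c.Recommendation},
		{"url", c.URL},
	}
	var missing []string
	for _, r := range required {
		if strings.TrimSpace(r.value) == "" {
			missing = append(missing, r.field)
		}
	}
	return missing
}

func main() {
	c := Check{
		ID:       "NET001",
		Name:     "Services Without Endpoints",
		Category: "Networking",
		Section:  "Networking",
		Severity: "High",
	}
	fmt.Println(missingRequired(c))
}
```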
Recommendation Variants
Every check should be authored for three output surfaces:
- recommendation: plain text for TXT, CSV, and JSON consumers
- recommendation_html: richer HTML for the report
- speech_bubble: short TUI/Buddy wording
The loader can synthesize recommendation_html and speech_bubble when only recommendation is present, but that is now a fallback only.
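To show why the fallback is a weaker substitute for hand-written variants, here is a rough sketch of what such synthesis could look like. It is an assumption about the general approach, not the loader's actual code: synthesizeHTML and synthesizeSpeechBubble are hypothetical names, and the "first sentence only" rule is invented for the example.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// synthesizeHTML wraps escaped plain text in the container div used by the
// hand-written recommendation_html examples.
func synthesizeHTML(rec string) string {
	return `<div class="recommendation-content"><p>` + html.EscapeString(rec) + `</p></div>`
}

// synthesizeSpeechBubble keeps only the first sentence of the plain-text
// recommendation. Empty input yields no bubble lines.
func synthesizeSpeechBubble(rec string) []string {
	rec = strings.TrimSpace(rec)
	if rec == "" {
		return nil
	}
	if i := strings.Index(rec, ". "); i >= 0 {
		rec = rec[:i+1]
	}
	return []string{rec}
}

func main() {
	rec := "Check selectors, pod readiness, and EndpointSlice generation. Confirm the workload is healthy."
	fmt.Println(synthesizeHTML(rec))
	fmt.Println(synthesizeSpeechBubble(rec))
}
```

A generated block like this has no list structure and no inline code formatting for commands, which is the gap that explicitly authored variants close.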
Preferred standard:
- new checks should define all three explicitly
- existing checks should keep or restore richer variants where possible
- commands and flags in recommendation_html should use inline <code>
- speech_bubble should be shorter than recommendation, not just copied verbatim
Operators
The native evaluator supports operators such as:
- equals
- not_equals
- contains
- not_contains
- exists
- missing
- greater_than
- greater_than_or_equal
- less_than
- less_than_or_equal
- matches
- not_matches
Complex rules can also use composed values such as:
- all
- any
- coalesce
- count_where
For examples, inspect the existing catalog under:
- checks/kubernetes
- checks/aks
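As a mental model for how named operators and a composed value such as count_where might fit together, consider the sketch below. The semantics shown (contains as substring match, matches as a regular expression, count_where as a count of matching items) are assumptions for illustration; the names operators, apply, and countWhere are hypothetical, and the catalog remains the reference for real syntax.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// operators maps a few operator names to string comparisons.
// The chosen semantics are assumptions, not the evaluator's definition.
var operators = map[string]func(got, expected string) (bool, error){
	"equals":     func(g, e string) (bool, error) { return g == e, nil },
	"not_equals": func(g, e string) (bool, error) { return g != e, nil },
	"contains":   func(g, e string) (bool, error) { return strings.Contains(g, e), nil },
	"matches":    func(g, e string) (bool, error) { return regexp.MatchString(e, g) },
}

// apply dispatches to the named operator and rejects unknown names.
func apply(op, got, expected string) (bool, error) {
	fn, ok := operators[op]
	if !ok {
		return false, fmt.Errorf("unsupported operator %q", op)
	}
	return fn(got, expected)
}

// countWhere counts the items for which the operator holds, so the result
// can then be compared with a numeric operator such as greater_than.
func countWhere(items []string, op, expected string) (int, error) {
	n := 0
	for _, item := range items {
		ok, err := apply(op, item, expected)
		if err != nil {
			return 0, err
		}
		if ok {
			n++
		}
	}
	return n, nil
}

func main() {
	images := []string{"nginx:latest", "redis:7.2", "busybox:latest"}
	n, err := countWhere(images, "matches", `:latest$`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // prints 2
}
```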
When To Use A Native Handler
Use a handler when YAML would become harder to understand than the code.
Good reasons:
- joining multiple resource types
- resolving owners or related workloads
- deduplicating compound findings
- formatting special item payloads
- complex AKS or Prometheus logic
Do not force every check into a large declarative expression just because it is possible.
Authoring Rules
- Keep one check focused on one concern.
- Keep ids stable once published.
- Prefer declarative YAML first.
- Author recommendation_html and speech_bubble explicitly for new checks.
- Keep recommendation_html readable in the dark report theme.
- Keep speech_bubble direct and brief.
- Keep URLs authoritative and current.
- Match existing naming and severity patterns in the catalog.
Validation
Useful validation commands:
go run ./cmd/kubebuddy checks
go test ./internal/checks ./internal/scan
To inspect the AKS catalog:
go run ./cmd/kubebuddy checks --checks-dir checks/aks