Kubernetes Checks Reference¶
KubeBuddy runs checks to find issues and misconfigurations in your Kubernetes cluster. These checks power the health report and help you fix problems, reduce risk, and improve stability. This page lists all checks by category, with their ID, name, description, severity, and score weight.
Overview¶
Each check targets a specific part of your cluster—nodes, pods, workloads, security, etc. Tables group checks by category. Use them to understand what’s being evaluated, how serious the issue is, and how much it affects your overall health score.
Section Mapping: YAML to Report Tabs¶
Each check includes a Section value in its YAML. This table shows how those values map to the tabs in the HTML report:
| YAML Section Value | Report Tab Name |
|---|---|
Nodes |
Nodes |
Namespaces |
Namespaces |
Workloads |
Workloads |
Pods |
Pods |
Jobs |
Jobs |
Networking |
Networking |
Storage |
Storage |
Configuration |
Configuration Hygiene |
Security |
Security |
Kubernetes Events |
Kubernetes Events |
Use this when defining or updating checks to control where they appear in the report.
Checks by Category¶
Each table includes:
- ID – Identifier for the check
- Name – Short label
- Description – What it checks and why it matters
- Severity – Low / Medium / High / Warning / Info
- Weight – Contribution to health score
Performance¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| PROM001 | High CPU Pods (Prometheus) | Checks for pods with sustained high CPU usage over the last 24 hours using Prometheus. | Warning | 3 |
| PROM002 | High Memory Usage Pods (Prometheus) | Detects pods with high memory usage over the last 24 hours based on Prometheus metrics. | Warning | 3 |
Configuration¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| CFG001 | Orphaned ConfigMaps | Unused ConfigMaps that can be removed. | Medium | 1 |
| CFG002 | Duplicate ConfigMap Names | Same name used in different namespaces—creates confusion. | Medium | 1 |
| CFG003 | Large ConfigMaps | Oversized ConfigMaps that may affect performance. | Medium | 2 |
Events¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| EVENT001 | Grouped Warning Events | Groups frequent warnings to help identify recurring issues. | Low | 1 |
| EVENT002 | Full Warning Event Log | Lists all recent warning events. | Low | 1 |
Jobs¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| JOB001 | Stuck Kubernetes Jobs | Jobs stuck in start or finish states due to controller issues. | High | 2 |
| JOB002 | Failed Kubernetes Jobs | Jobs that failed or hit backoff limits. | High | 2 |
Networking¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| NET001 | Services Without Endpoints | Identifies services that have no backing endpoints, which means no pods are matched. | critical | 2 |
| NET002 | Publicly Accessible Services | Detects services of type LoadBalancer or NodePort that are potentially exposed to the internet. | critical | 4 |
| NET003 | Ingress Health Validation | Validates ingress definitions for missing classes, invalid backends, missing TLS secrets, duplicate host/path entries, and incorrect path types. | critical | 3 |
| NET004 | Namespace Missing Network Policy | Detects namespaces that have running pods but no associated NetworkPolicy resources. This could allow unrestricted pod-to-pod communication. | warning | 3 |
| NET005 | Ingress Host/Path Conflicts | Identifies Ingress resources that define conflicting host and path combinations, leading to unpredictable routing. | critical | 5 |
| NET006 | Ingress Using Wildcard Hosts | Identifies Ingress resources that utilize wildcard hosts (e.g., '*https://www.google.com/search?q=.example.com'), which may offer broader exposure than intended. | medium | 2 |
| NET007 | Service TargetPort Mismatch | Identifies services whose 'targetPort' does not match any 'containerPort' in the backing pods, preventing traffic delivery. | critical | 4 |
| NET008 | ExternalName Service to Internal IP | Identifies 'ExternalName' type services pointing to private IP ranges, which might indicate a misconfiguration or an unusual routing pattern. | medium | 2 |
| NET009 | Overly Permissive Network Policy | Identifies NetworkPolicies that define 'policyTypes' but have no rules, effectively allowing all traffic for that type, or containing overly broad 'ipBlock' rules. | high | 4 |
| NET010 | Network Policy Overly Permissive IPBlock | Flags NetworkPolicies that include '0.0.0.0/0' in their 'ipBlock' rules, effectively allowing traffic to/from all IPs for that rule, which can be a security risk. | high | 5 |
| NET011 | Network Policy Missing PolicyTypes | Detects NetworkPolicies that do not explicitly define 'policyTypes'. While defaulting to Ingress in some older versions, explicit definition improves clarity and future compatibility across different CNI plugins and Kubernetes versions. | low | 1 |
| NET012 | Pod HostNetwork Usage | Identifies pods configured to use 'hostNetwork: true', which allows direct access to the node's network interfaces, bypassing Kubernetes networking. | high | 4 |
| NET013 | Ingress Present Without Gateway API Adoption | Detects Ingress usage where Gateway API resources are not yet adopted, helping teams plan migration to Gateway API-based ingress. | warning | 2 |
| NET014 | HTTPRoute Missing or Unaccepted Parent | Detects HTTPRoutes with no parentRefs or routes not Accepted by any parent Gateway. | critical | 3 |
| NET015 | Gateways Without Attached HTTPRoutes | Detects Gateway resources that currently have no HTTPRoutes attached via parentRefs. | warning | 2 |
| NET016 | Gateway API Readiness Conditions | Detects GatewayClass/Gateway resources that are not Accepted or Programmed and may not route traffic. | critical | 3 |
| NET017 | Gateway TLS Secret and Cross-Namespace ReferenceGrant Validation | Validates Gateway listener certificateRefs and required ReferenceGrants for cross-namespace Secret usage. | critical | 3 |
| NET018 | Duplicate Service Selectors | Detects multiple Services in the same namespace using the same selector set. | warning | 3 |
| PROM003 | High Network Receive Rate (Prometheus) | Detects pods receiving large amounts of network traffic over the last 24 hours. | Medium | 2 |
Nodes¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| NODE001 | Node Readiness | Nodes not ready or with critical conditions. | High | 3 |
| NODE002 | Node Resource Pressure | High usage of CPU, memory, or disk. | High | 3 |
| NODE003 | Max Pods per Node | Node pod count exceeds configured threshold. | Warning | 2 |
Control Plane¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| PROM004 | API Server High Latency | Detects high latency in Kubernetes API server requests over 24h. | High | 5 |
Capacity¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| PROM005 | Overcommitted CPU (Prometheus) | Checks if CPU requests on nodes exceed allocatable capacity over the last 24 hours. | Info | 2 |
| PROM006 | Node Sizing Insights (Prometheus) | Uses Prometheus p95 CPU and memory usage to identify underutilized or saturated nodes and suggest sizing actions. | Info | 3 |
| PROM007 | Pod Sizing Insights (Prometheus) | Uses 7-day p95 per-container CPU/memory usage to recommend right-sized requests and memory limits. CPU limit recommendation defaults to none. |
Info | 4 |
Namespaces¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| NS001 | Empty Namespaces | No resources; can be cleaned up. | Low | 1 |
| NS002 | Weak or Missing ResourceQuotas | No quotas or soft limits; risks resource overuse. | Medium | 2 |
| NS003 | Missing LimitRanges | No resource caps; enables excessive use. | Medium | 2 |
Pods¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| POD001 | High Restart Count | Pods restarting too often; indicates instability. | Medium | 2 |
| POD002 | Long Running Pods | Pods running longer than expected. | Medium | 2 |
| POD003 | Failed Pods | Pods in failed state. | High | 3 |
| POD004 | Pending Pods | Pods stuck Pending—often resource/scheduling issues. | Medium | 2 |
| POD005 | CrashLoopBackOff | Pods stuck restarting in CrashLoopBackOff. | High | 3 |
| POD006 | Leftover Debug Pods | Debug pods not cleaned up. | Medium | 2 |
| POD007 | Images Using latest Tag |
Risk of inconsistent deployments due to floating tags. | Low | 1 |
RBAC¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| RBAC001 | Misconfigurations | Missing or incorrect role bindings. | High | 3 |
| RBAC002 | Overexposed Roles | Roles with overly broad permissions. | High | 3 |
| RBAC003 | Orphaned ServiceAccounts | Not in use; can be removed. | Medium | 2 |
| RBAC004 | Ineffective Roles | Unused roles cluttering the system. | Medium | 2 |
Security¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| SEC001 | Orphaned Secrets | Not used; safe to delete. | Medium | 2 |
| SEC002 | hostPID/hostNetwork Usage | Shared host namespaces increase risk. | High | 3 |
| SEC003 | Pods Running as Root | Containers should avoid root for security. | High | 3 |
| SEC004 | Privileged Containers | Grants unnecessary access. | High | 3 |
| SEC005 | hostIPC Usage | Sharing IPC namespace with host is a security risk. | Medium | 2 |
| SEC006 | Pods Missing Secure Defaults | Missing recommended securityContext fields (e.g. runAsNonRoot). |
Medium | 3 |
| SEC007 | Missing Pod Security Admission Labels | Namespaces lacking pod-security.kubernetes.io/enforce labels. |
Low | 1 |
| SEC008 | Secrets in Environment Variables | Exposed via env.valueFrom.secretKeyRef; can leak via logs or /proc. |
High | 4 |
| SEC009 | Missing Capabilities Drop | Containers not dropping all capabilities. | Medium | 3 |
| SEC010 | HostPath Volume Usage | Use of hostPath volumes exposes host filesystem. |
High | 3 |
| SEC011 | Containers Running as UID 0 | Explicit runAsUser: 0 even with securityContext. |
High | 3 |
| SEC012 | Added Linux Capabilities | Use of extra Linux capabilities via securityContext.capabilities.add. |
Medium | 2 |
| SEC013 | EmptyDir Volume Usage | emptyDir volumes are non-persistent and cleared on restart. |
Low | 1 |
| SEC014 | Untrusted Image Registries | Pulling from unapproved registries. | High | 3 |
| SEC015 | Host Ports in Pod Specs | Detects containers that bind host ports directly on the node. | High | 4 |
| SEC016 | Unconfined Seccomp Profiles | Detects pod or container seccomp profiles explicitly set to Unconfined. |
High | 4 |
| SEC017 | Non-Default ProcMount | Detects containers that set a procMount value other than Default. |
High | 3 |
| SEC018 | Disallowed Sysctls | Detects sysctls outside the Kubernetes baseline allowlist. | High | 3 |
| SEC019 | Unsupported AppArmor Values | Detects unsupported AppArmor annotations or structured profile types. | High | 2 |
| SEC020 | Seccomp Profile Not Configured | Detects pods and containers without an explicit seccomp profile. | Warning | 2 |
Storage¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| PVC001 | Unused PVCs | Not mounted or bound; can be deleted. | Medium | 2 |
| PV001 | Orphaned Persistent Volumes | Detects Persistent Volumes that are not bound to any Persistent Volume Claim. | Warning | 3 |
| PVC001 | Unused Persistent Volume Claims | Detects PVCs not attached to any pod. | Warning | 2 |
| PVC002 | PVCs Using Default StorageClass | Detects PVCs that do not explicitly specify a storageClassName. | Low | 1 |
| PVC003 | ReadWriteMany PVCs on Incompatible Storage | Detects PVCs requesting ReadWriteMany access mode where the underlying storage is typically block-based and does not support concurrent writes from multiple nodes. | High | 5 |
| PVC004 | Unbound Persistent Volume Claims | Detects Persistent Volume Claims that are in a Pending phase and have not been bound to a Persistent Volume. | High | 3 |
| SC001 | Deprecated StorageClass Provisioners | Detects StorageClasses using deprecated or legacy in-tree provisioners, which should be migrated to CSI drivers. | High | 4 |
| SC002 | StorageClass Prevents Volume Expansion | Identifies StorageClasses that do not permit volume expansion, which can limit dynamic scaling of stateful applications. | Medium | 2 |
| SC003 | High Cluster Storage Usage | Monitors the overall percentage of used storage across the cluster. | Warning | 4 |
Workloads¶
| ID | Name | Description | Severity | Weight |
|---|---|---|---|---|
| WRK001 | DaemonSets Not Fully Running | Some pods unscheduled or not ready. | High | 2 |
| WRK002 | Deployment Missing Replicas | Fewer replicas than specified. | High | 2 |
| WRK003 | Incomplete StatefulSet Rollout | Rollout not finished; may cause issues. | Medium | 2 |
| WRK004 | HPA Misconfig or Inactivity | HPA not working or pointing to nothing. | Medium | 2 |
| WRK005 | Missing Resource Requests | Missing CPU or memory requests on one or more containers. | High | 3 |
| WRK006 | PodDisruptionBudget Coverage | Missing or misconfigured PDBs. | Medium | 2 |
| WRK007 | Missing Health Probes | No liveness or readiness probes; risks silent failures. | Medium | 2 |
| WRK008 | Deployment Selector Without Matching Pods | Selectors that don’t match any pods, leading to 0 replicas. | Medium | 2 |
| WRK009 | Deployment, Pod, and Service Label Consistency | Mismatched labels between Deployments, Pods, or Services; breaks routing. | Medium | 3 |
| WRK010 | HPA Metrics Without Matching Resource Requests | HPAs scale on CPU/memory while target containers miss matching requests. | Warning | 3 |
| WRK011 | VPA Update Mode and Declarative Resource Conflict Risk | Flags VPA Auto/Recreate targets likely to conflict with declarative ownership or HPA. | Warning | 2 |
| WRK012 | PodDisruptionBudget Adequacy for Replicated Workloads | Detects missing/overly strict/overly permissive PDB settings on 2+ replica workloads. | Warning | 2 |
| WRK013 | CrashLoopBackOff and OOMKilled Guardrail | Highlights unstable pods to guard against unsafe right-sizing decisions. | Critical | 3 |
| WRK014 | Missing Memory Limits | Detects workloads whose containers do not define a memory limit. | Warning | 2 |
| WRK015 | Replicated Workloads Missing Spread Constraints | Detects Deployments or StatefulSets with 2+ replicas that define neither anti-affinity nor topology spread constraints. | Warning | 3 |
Usage Notes¶
-
Severity
-
Low: Cosmetic or cleanup
- Medium: May affect performance or reliability
- High: Causes downtime or poses security risk
-
Warning/Info: Advisory thresholds
-
Weight
-
Scores range 1 (low impact) to 5 (high impact)
-
Higher weight = greater effect on cluster score
-
Interpreting Reports
- Passed: no items listed
- Failed: lists affected resources + suggested fixes
- Click IDs to jump to detailed recommendations in the report
AKS Automatic Readiness Metadata¶
Some shared checks now also carry AKS Automatic-specific metadata used to derive the AKS Automatic migration readiness view in reports.
AutomaticRelevanceblockerwarningAutomaticScopeworkloadclusterplatformAutomaticReason- used to group remediation actions in the AKS Automatic action plan
AutomaticAdmissionBehaviordenies_on_enforcewarns_onlymutates_on_enforceAutomaticMutationOutcome- optional observed/documented AKS Automatic behavior note shown in HTML, text, and JSON outputs
This metadata does not create a separate check engine. It allows KubeBuddy to derive AKS Automatic readiness from the same shared checks used elsewhere in the product.
In the AKS Automatic standalone action plan, this metadata is also used to:
- split actions into blocker-driven and warning-driven migration sections
- build per-action migration cards instead of a single wide table
- group structured affected resources into namespace/workload/resource/Helm tables
- attach manifest examples for common remediation themes