# Prometheus Integration
KubeBuddy can enrich its cluster health reports by querying Prometheus directly, whether running in-cluster or as an external endpoint.
## Why Integrate Prometheus?
By pulling time-series data you can detect:
- API server latency (p99)
- Node/pod CPU & memory usage
- Pod restart patterns
- Disk, network and capacity pressure
- Node sizing opportunities (underutilized vs saturated nodes using p95 trends)
- Pod/container sizing opportunities (p95-based request and memory limit recommendations)
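These signals map onto standard kube-state-metrics, cAdvisor, and node_exporter series. A few illustrative PromQL queries (generic examples, not necessarily the exact queries KubeBuddy issues):

```promql
# Pod restart count over the last hour
increase(kube_pod_container_status_restarts_total[1h])

# Per-container memory working set (bytes)
container_memory_working_set_bytes{container!="",pod!=""}

# Node CPU utilization (%)
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```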
## Supported Prometheus Modes

| Mode | Description | Auth Required | Typical Use Case |
|---|---|---|---|
| `local` | In-cluster Prometheus (e.g. kube-prometheus-stack) | No | No auth needed inside the cluster |
| `basic` | External Prometheus with HTTP Basic auth | Yes | Behind an ingress or firewall |
| `bearer` | External Prometheus secured by a bearer token | Yes | OAuth proxy, API gateway, etc. |
| `azure` | Azure Monitor Managed Prometheus (AKS + Monitor) | Yes (AAD token) | AKS + Azure Monitor workspace |
## How to Authenticate

### Local (no auth)

```powershell
Invoke-KubeBuddy `
-HtmlReport `
-IncludePrometheus `
-PrometheusUrl "http://prometheus.monitoring.svc:9090" `
-PrometheusMode local
```
### Basic Auth

```powershell
Invoke-KubeBuddy `
-IncludePrometheus `
-PrometheusUrl "https://prom.example.com" `
-PrometheusMode basic `
-PrometheusUsername "admin" `
-PrometheusPassword "s3cr3t"
```

### Bearer Token

```powershell
$env:PROMETHEUS_TOKEN = "<your-token>"
Invoke-KubeBuddy `
-IncludePrometheus `
-PrometheusUrl "https://prom.example.com" `
-PrometheusMode bearer `
-PrometheusBearerTokenEnv PROMETHEUS_TOKEN
```
### Azure Monitor (AAD)

```powershell
# Ensure AZURE_CLIENT_ID / SECRET / TENANT_ID are set
Invoke-KubeBuddy `
-IncludePrometheus `
-PrometheusUrl "https://<workspace>.prometheus.monitor.azure.com" `
-PrometheusMode azure
```
## Example Query

p99 API-server latency over the last hour:

```promql
histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket[5m]))
```
## Time-Window Configuration

The look-back window is not fixed; it is driven by your YAML's `Range.Duration`. You can specify minutes (`m`), hours (`h`), or days (`d`):

```yaml
Prometheus:
  Query: 'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
  Range:
    Step: "5m"
    Duration: "24h" # supports "m" = minutes, "h" = hours, "d" = days
```

KubeBuddy translates this into `start = now - 24h` (or `30m`, `2d`, etc.) automatically.
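The translation from a duration string to a query start time can be sketched as follows (a minimal illustration of the rule above, not KubeBuddy's actual code; `window_start` is a hypothetical helper):

```python
import re
from datetime import datetime, timedelta, timezone

# Map the supported unit suffixes to timedelta keyword arguments.
UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def window_start(duration: str, now: datetime) -> datetime:
    """Translate a Range.Duration string like '24h' into start = now - 24h."""
    match = re.fullmatch(r"(\d+)([mhd])", duration)
    if not match:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{UNITS[unit]: value})

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(window_start("24h", now))  # 2024-01-01 00:00:00+00:00
```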
## CLI Usage

Use any combination of report outputs:

```powershell
# HTML report with Prometheus
Invoke-KubeBuddy `
-HtmlReport `
-IncludePrometheus `
-PrometheusUrl "https://prometheus.example.com" `
-PrometheusMode basic `
-PrometheusUsername "admin" `
-PrometheusPassword "s3cr3t" `
-OutputPath "C:\reports\cluster.html"

# Text report with Prometheus
Invoke-KubeBuddy `
-txtReport `
-IncludePrometheus `
-PrometheusUrl "http://prometheus.monitoring.svc:9090" `
-PrometheusMode local `
-OutputPath "/home/user/kube.txt"

# JSON report, Azure Monitor mode
Invoke-KubeBuddy `
-jsonReport `
-IncludePrometheus `
-PrometheusUrl "https://<workspace>.prometheus.monitor.azure.com" `
-PrometheusMode azure `
-OutputPath "/reports/cluster.json"
```
## Node Sizing Insights

When Prometheus integration is enabled, KubeBuddy runs PROM006 and classifies each node using fixed 7-day p95 CPU/memory usage:

- Underutilized: candidate for a smaller SKU or scale-in
- Right-sized: keep current sizing
- Saturated: candidate for a larger SKU or scale-out
PROM006 now also includes:
- Current Allocatable (vCPU/Gi) from node allocatable capacity
- Suggested Target Capacity (vCPU/Gi) estimated from p95 utilization with safety headroom
Minimum data rule:

- KubeBuddy requires at least 7 days of Prometheus history before emitting node sizing recommendations.
- If history is below 7 days, reports include an explicit "Insufficient Prometheus history" row instead of recommendations.
The check surfaces in the Nodes tab and in JSON/text output like any other check.
In HTML reports, Overview now includes a Rightsizing at a Glance section that summarizes:
- Node sizing distribution (Underutilized / Saturated / Right-sized)
- Pod sizing action counts from PROM007
- Impact buckets and quick links to PROM006 and PROM007
### Optional Threshold Overrides

You can tune the classification in `~/.kube/kubebuddy-config.yaml`:

```yaml
thresholds:
  node_sizing_downsize_cpu_p95: 35
  node_sizing_downsize_mem_p95: 40
  node_sizing_upsize_cpu_p95: 80
  node_sizing_upsize_mem_p95: 85
```
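The classification logic implied by these thresholds can be sketched as follows (an illustrative reading of the override keys, not KubeBuddy's internal implementation):

```python
# Default override values from the YAML above.
THRESHOLDS = {
    "node_sizing_downsize_cpu_p95": 35,
    "node_sizing_downsize_mem_p95": 40,
    "node_sizing_upsize_cpu_p95": 80,
    "node_sizing_upsize_mem_p95": 85,
}

def classify_node(cpu_p95: float, mem_p95: float, t=THRESHOLDS) -> str:
    """Classify a node from its 7-day p95 CPU/memory utilization percentages."""
    # Either resource running hot suggests a larger SKU or scale-out.
    if cpu_p95 >= t["node_sizing_upsize_cpu_p95"] or mem_p95 >= t["node_sizing_upsize_mem_p95"]:
        return "Saturated"
    # Both resources must be idle before suggesting a smaller SKU.
    if cpu_p95 <= t["node_sizing_downsize_cpu_p95"] and mem_p95 <= t["node_sizing_downsize_mem_p95"]:
        return "Underutilized"
    return "Right-sized"

print(classify_node(20, 30))  # Underutilized
print(classify_node(90, 50))  # Saturated
print(classify_node(60, 60))  # Right-sized
```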
## Pod Sizing Insights
When Prometheus integration is enabled, KubeBuddy also runs PROM007 for per-container recommendations using fixed 7-day p95 usage:
- CPU request recommendation (millicores)
- Memory request recommendation (MiB)
- Memory limit recommendation (MiB)
- CPU limit recommendation defaults to `none`
Minimum data rule:

- KubeBuddy requires at least 7 days of Prometheus history before emitting pod sizing recommendations.
- If history is below 7 days, reports include an explicit "Insufficient Prometheus history" row instead of recommendations.
### Why CPU limit defaults to none
By default, KubeBuddy recommends no CPU limit because:
- CPU is compressible; requests already control fair scheduling.
- Hard CPU limits can trigger CFS throttling and add latency jitter.
- In many production workloads, setting requests (without limits) gives better tail latency.
Set CPU limits only when strict tenant caps are required.
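Applied to a container spec, this guidance means setting requests plus a memory limit while leaving the CPU limit off (illustrative values):

```yaml
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    memory: "320Mi" # memory limit set; CPU limit intentionally omitted
```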
### Optional Pod Sizing Threshold Overrides

```yaml
thresholds:
  pod_sizing_profile: balanced # conservative|balanced|aggressive
  pod_sizing_compare_profiles: true # HTML/JSON include all 3 profiles by default
  pod_sizing_target_cpu_utilization: 65
  pod_sizing_target_mem_utilization: 75
  pod_sizing_cpu_request_floor_mcores: 25
  pod_sizing_mem_request_floor_mib: 128
  pod_sizing_mem_limit_buffer_percent: 20
```
Profile behavior:
- conservative: higher requests/floors (more headroom)
- balanced: default behavior (CPU floor: 25m)
- aggressive: lower requests/floors (higher packing efficiency, CPU floor: 10m)
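One plausible way the threshold keys above combine into a recommendation: size each request so that p95 usage sits at the target utilization, apply the floor, and add the buffer to derive the memory limit. This is an assumed sketch of that arithmetic, not KubeBuddy's actual formulas:

```python
def recommend(cpu_p95_mcores, mem_p95_mib,
              target_cpu=65, target_mem=75,
              cpu_floor=25, mem_floor=128, mem_limit_buffer=20):
    """Turn 7-day p95 usage into request/limit suggestions (illustrative)."""
    # Request = p95 / target utilization, never below the configured floor.
    cpu_request = max(cpu_floor, round(cpu_p95_mcores * 100 / target_cpu))
    mem_request = max(mem_floor, round(mem_p95_mib * 100 / target_mem))
    # Memory limit = request plus the configured buffer percentage.
    mem_limit = round(mem_request * (1 + mem_limit_buffer / 100))
    return {
        "cpu_request_mcores": cpu_request,
        "mem_request_mib": mem_request,
        "mem_limit_mib": mem_limit,
        "cpu_limit": None,  # CPU limit defaults to none
    }

print(recommend(130, 300))  # 200m CPU request, 400Mi request, 480Mi limit
```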
Comparison mode:
- pod_sizing_compare_profiles is enabled by default to emit all three profile results in JSON and HTML.
- Set pod_sizing_compare_profiles: false if you want only the active profile.
- HTML report includes a profile selector on PROM007 findings so you can switch between profiles.
- Text/CLI remain focused on the single active profile.
## Docker Usage with Prometheus

For full Docker details, see the Docker Usage guide. Here's a minimal Prometheus-enabled example:
```bash
export tagId="v0.0.19"
docker run -it --rm \
  -e KUBECONFIG="/home/kubeuser/.kube/config" \
  -e HTML_REPORT="true" \
  -e INCLUDE_PROMETHEUS="true" \
  -e PROMETHEUS_URL="https://prom.example.com" \
  -e PROMETHEUS_MODE="basic" \
  -e PROMETHEUS_USERNAME="admin" \
  -e PROMETHEUS_PASSWORD="s3cr3t" \
  -v $HOME/.kube/config:/tmp/kubeconfig-original:ro \
  -v $HOME/kubebuddy-report:/app/Reports \
  ghcr.io/kubedeckio/kubebuddy:$tagId
```