Instrumentation
- 1: Kubernetes Component SLI Metrics
- 2: CRI Pod & Container Metrics
- 3: Node metrics data
- 4: Kubernetes Metrics Reference
1 - Kubernetes Component SLI Metrics
Kubernetes v1.27 [beta]
By default, Kubernetes 1.27 publishes Service Level Indicator (SLI) metrics for each Kubernetes component binary. This metric endpoint is exposed on the serving HTTPS port of each component, at the path /metrics/slis. The ComponentSLIs feature gate defaults to enabled for each Kubernetes component as of v1.27.
SLI Metrics
With SLI metrics enabled, each Kubernetes component exposes two metrics, labeled per healthcheck:
- a gauge (which represents the current state of the healthcheck)
- a counter (which records the cumulative counts observed for each healthcheck state)
You can use the metric information to calculate per-component availability statistics. For example, the API server checks the health of etcd, so you can work out and report how available or unavailable etcd has been, as reported by its client, the API server.
The Prometheus gauge data looks like this:
# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="autoregister-completion",type="healthz"} 1
kubernetes_healthcheck{name="autoregister-completion",type="readyz"} 1
kubernetes_healthcheck{name="etcd",type="healthz"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="etcd-readiness",type="readyz"} 1
kubernetes_healthcheck{name="informer-sync",type="readyz"} 1
kubernetes_healthcheck{name="log",type="healthz"} 1
kubernetes_healthcheck{name="log",type="readyz"} 1
kubernetes_healthcheck{name="ping",type="healthz"} 1
kubernetes_healthcheck{name="ping",type="readyz"} 1
The counter data looks like this:
# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="etcd-readiness",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="log",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="log",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="readyz"} 15
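A minimal Python sketch of the availability calculation described earlier, assuming the counter lines are available as a string in the Prometheus text format shown above. It sums the success and error counts per (name, type) pair and reports success / (success + error). The function name healthcheck_availability is hypothetical. Because these counters are cumulative since the component started, the result covers the component's whole lifetime; for a bounded window, apply the same ratio to the difference between two scrapes.

```python
import re
from collections import defaultdict

# Matches samples such as:
#   kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
SAMPLE_RE = re.compile(r'^kubernetes_healthchecks_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def healthcheck_availability(exposition: str) -> dict[tuple[str, str], float]:
    """Return success / (success + error) for each (name, type) healthcheck.

    Hypothetical helper; assumes only the "success" and "error" statuses
    seen in the sample output above.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # some other metric
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels["name"], labels["type"])
        counts[key][labels["status"]] += float(match.group("value"))

    result = {}
    for key, by_status in counts.items():
        total = by_status["success"] + by_status["error"]
        if total > 0:
            result[key] = by_status["success"] / total
    return result


sample = """\
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
"""
for (name, check_type), ratio in sorted(healthcheck_availability(sample).items()):
    print(f"{name} ({check_type}): {ratio:.3f}")
# etcd (readyz): 1.000
# informer-sync (readyz): 0.933
```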
Using this data
The component SLIs metrics endpoint is intended to be scraped at a high frequency. Scraping at a high frequency gives you greater granularity of the gauge's signal, which you can then use to calculate SLOs. The /metrics/slis endpoint provides the raw data necessary to calculate an availability SLO for the respective Kubernetes component.
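As an illustration of why scrape frequency matters, here is a minimal Python sketch that estimates availability from a series of kubernetes_healthcheck gauge samples: the fraction of scrapes in a window where the gauge read 1. It assumes a fixed scrape interval, so every sample carries equal weight; the GaugeSample type and the function names are hypothetical. With a 5-second interval, a 10-second failure shows up in about two samples, whereas a 60-second interval can miss it entirely.

```python
from dataclasses import dataclass


@dataclass
class GaugeSample:
    timestamp: float  # scrape time in seconds
    value: float      # kubernetes_healthcheck value: 1 = passing, 0 = failing


def availability(samples: list[GaugeSample], start: float, end: float) -> float:
    """Fraction of samples in [start, end) where the healthcheck was passing."""
    window = [s for s in samples if start <= s.timestamp < end]
    if not window:
        raise ValueError("no samples in the requested window")
    passing = sum(1 for s in window if s.value == 1)
    return passing / len(window)


def meets_slo(samples: list[GaugeSample], start: float, end: float, target: float) -> bool:
    """True if measured availability is at least the SLO target, e.g. 0.99."""
    return availability(samples, start, end) >= target


# Ten minutes of scrapes every 5 seconds, with one failing scrape at t=300.
samples = [GaugeSample(t, 0 if t == 300 else 1) for t in range(0, 600, 5)]
print(availability(samples, 0, 600))     # 119 of 120 samples passing
print(meets_slo(samples, 0, 600, 0.99))  # True
```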
2 - CRI Pod & Container Metrics
Kubernetes v1.23 [alpha]
The kubelet collects pod and container metrics via cAdvisor. As an alpha feature, Kubernetes lets you configure the collection of pod and container metrics via the Container Runtime Interface (CRI). To use the CRI-based collection mechanism, you must enable the PodAndContainerStatsFromCRI feature gate and use a compatible CRI implementation (containerd >= 1.6.0, CRI-O >= 1.23.0).
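The runtime requirement above is a minimum-version comparison. The following Python sketch, with a hypothetical supports_cri_stats helper, encodes the two minimums listed on this page. It compares only the numeric major.minor.patch part and ignores pre-release or build suffixes, which is a simplification. Runtimes not listed here return False, so check their own documentation.

```python
# Minimum versions for CRI-based pod and container stats, as listed above.
MIN_VERSIONS = {
    "containerd": (1, 6, 0),
    "cri-o": (1, 23, 0),
}


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn 'v1.6.8', '1.23' or '1.6.0-rc.1' into (major, minor, patch)."""
    core = version.strip().lstrip("v").split("-")[0].split("+")[0]
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '1.23' to (1, 23, 0)
    return parts[0], parts[1], parts[2]


def supports_cri_stats(runtime: str, version: str) -> bool:
    """True if the runtime meets the documented minimum version."""
    minimum = MIN_VERSIONS.get(runtime.lower())
    if minimum is None:
        return False  # not listed on this page
    return parse_version(version) >= minimum


print(supports_cri_stats("containerd", "v1.6.8"))  # True
print(supports_cri_stats("containerd", "1.5.13"))  # False
print(supports_cri_stats("CRI-O", "1.23"))         # True
```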
CRI Pod & Container Metrics
With PodAndContainerStatsFromCRI enabled, the kubelet polls the underlying container runtime for pod and container stats instead of inspecting the host system directly using cAdvisor. The benefits of relying on the container runtime for this information as opposed to direct collection with cAdvisor include:
- Potentially improved performance if the container runtime already collects this information during normal operations. In this case, the data can be reused instead of being aggregated again by the kubelet.
- Further decoupling of the kubelet and the container runtime. This allows collecting metrics for container runtimes that don't run processes directly on the host alongside the kubelet, where cAdvisor could observe them (for example, container runtimes that use virtualization).
3 - Node metrics data
The kubelet gathers metric statistics at the node, volume, pod and container level, and emits this information in the Summary API.
You can send a proxied request to the stats summary API via the Kubernetes API server.
Here is an example of a Summary API request for a node named minikube:
kubectl get --raw "/api/v1/nodes/minikube/proxy/stats/summary"
Here is the same API call using curl:
# You need to run "kubectl proxy" first
# Change 8080 to the port that "kubectl proxy" assigns
curl http://localhost:8080/api/v1/nodes/minikube/proxy/stats/summary
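The Summary API response is JSON. As a sketch of consuming it programmatically, the Python below fetches the summary through kubectl proxy (assumed to be listening on localhost:8080, as in the curl example above) and lists per-pod CPU and memory use. It relies on the podRef, cpu.usageNanoCores and memory.workingSetBytes fields of the kubelet's stats response; treat those field names as assumptions to verify against your kubelet version. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: "kubectl proxy" is listening here (see the curl example).
PROXY = "http://localhost:8080"


def fetch_summary(node: str, proxy: str = PROXY) -> dict:
    """GET /api/v1/nodes/<node>/proxy/stats/summary via kubectl proxy."""
    url = f"{proxy}/api/v1/nodes/{node}/proxy/stats/summary"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def pod_usage(summary: dict) -> list[tuple[str, float, int]]:
    """Return (namespace/name, CPU cores, working-set bytes) per pod, busiest first.

    CPU and memory fields can be missing from the response, so absent
    values are treated as 0.
    """
    rows = []
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        cores = pod.get("cpu", {}).get("usageNanoCores", 0) / 1e9
        working_set = pod.get("memory", {}).get("workingSetBytes", 0)
        rows.append((f"{ref['namespace']}/{ref['name']}", cores, working_set))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Example (requires a running "kubectl proxy"):
#   for name, cores, wss in pod_usage(fetch_summary("minikube")):
#       print(f"{name:50} {cores:6.3f} cores {wss / 2**20:8.1f} MiB")
```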
Beginning with metrics-server 0.6.x, metrics-server queries the /metrics/resource kubelet endpoint, and not /stats/summary.
Summary metrics API source
By default, Kubernetes fetches node summary metrics data using an embedded cAdvisor that runs within the kubelet. If you enable the PodAndContainerStatsFromCRI feature gate in your cluster, and you use a container runtime that supports statistics access via Container Runtime Interface (CRI), then the kubelet fetches Pod- and container-level metric data using CRI, and not via cAdvisor.
What's next
The task pages for Troubleshooting Clusters discuss how to use a metrics pipeline that relies on these data.
4 - Kubernetes Metrics Reference
Metrics (v1.27)
This page details the metrics that different Kubernetes components export. You can query the metrics endpoint for these components using an HTTP scrape, and fetch the current metrics data in Prometheus format.
List of Stable Kubernetes Metrics
Stable metrics observe strict API contracts and no labels can be added or removed from stable metrics during their lifetime.
Name | Stability Level | Type | Help | Labels | Const Labels | Deprecated Version |
---|---|---|---|---|---|---|
apiserver_admission_controller_admission_duration_seconds | STABLE | Histogram | Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). | name operation rejected type | | |
apiserver_admission_step_admission_duration_seconds | STABLE | Histogram | Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). | operation rejected type | | |
apiserver_admission_webhook_admission_duration_seconds | STABLE | Histogram | Admission webhook latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). | name operation rejected type | | |
apiserver_current_inflight_requests | STABLE | Gauge | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. | request_kind | | |
apiserver_longrunning_requests | STABLE | Gauge | Gauge of all active long-running apiserver requests broken out by verb, group, version, resource, scope and component. Not all requests are tracked this way. | component group resource scope subresource verb version | | |
apiserver_request_duration_seconds | STABLE | Histogram | Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. | component dry_run group resource scope subresource verb version | | |
apiserver_request_total | STABLE | Counter | Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code. | code component dry_run group resource scope subresource verb version | | |
apiserver_requested_deprecated_apis | STABLE | Gauge | Gauge of deprecated APIs that have been requested, broken out by API group, version, resource, subresource, and removed_release. | group removed_release resource subresource version | | |
apiserver_response_sizes | STABLE | Histogram | Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. | component group resource scope subresource verb version | | |
apiserver_storage_objects | STABLE | Gauge | Number of stored objects at the time of last check split by kind. | resource | | |
cronjob_controller_job_creation_skew_duration_seconds | STABLE | Histogram | Time between when a cronjob is scheduled to be run, and when the corresponding job is created | | | |
job_controller_job_pods_finished_total | STABLE | Counter | The number of finished Pods that are fully tracked | completion_mode result | | |
job_controller_job_sync_duration_seconds | STABLE | Histogram | The time it took to sync a job | action completion_mode result | | |
job_controller_job_syncs_total | STABLE | Counter | The number of job syncs | action completion_mode result | | |
job_controller_jobs_finished_total | STABLE | Counter | The number of finished jobs | completion_mode reason result | | |
kube_pod_resource_limit | STABLE | Custom | Resources limit for workloads on the cluster, broken down by pod. This shows the resource usage the scheduler and kubelet expect per pod for resources along with the unit for the resource if any. | namespace pod node scheduler priority resource unit | | |
kube_pod_resource_request | STABLE | Custom | Resources requested by workloads on the cluster, broken down by pod. This shows the resource usage the scheduler and kubelet expect per pod for resources along with the unit for the resource if any. | namespace pod node scheduler priority resource unit | | |
node_collector_evictions_total | STABLE | Counter | Number of Node evictions that happened since current instance of NodeController started. | zone | | |
scheduler_framework_extension_point_duration_seconds | STABLE | Histogram | Latency for running all plugins of a specific extension point. | extension_point profile status | | |
scheduler_pending_pods | STABLE | Gauge | Number of pending pods, by the queue type. 'active' means number of pods in activeQ; 'backoff' means number of pods in backoffQ; 'unschedulable' means number of pods in unschedulablePods that the scheduler attempted to schedule and failed; 'gated' is the number of unschedulable pods that the scheduler never attempted to schedule because they are gated. | queue | | |
scheduler_pod_scheduling_attempts | STABLE | Histogram | Number of attempts to successfully schedule a pod. | | | |
scheduler_pod_scheduling_duration_seconds | STABLE | Histogram | E2e latency for a pod being scheduled which may include multiple scheduling attempts. | attempts | | |
scheduler_preemption_attempts_total | STABLE | Counter | Total preemption attempts in the cluster till now | | | |
scheduler_preemption_victims | STABLE | Histogram | Number of selected preemption victims | | | |
scheduler_queue_incoming_pods_total | STABLE | Counter | Number of pods added to scheduling queues by event and queue type. | event queue | | |
scheduler_schedule_attempts_total | STABLE | Counter | Number of attempts to schedule pods, by the result. 'unschedulable' means a pod could not be scheduled, while 'error' means an internal scheduler problem. | profile result | | |
scheduler_scheduling_attempt_duration_seconds | STABLE | Histogram | Scheduling attempt latency in seconds (scheduling algorithm + binding) | profile result | | |
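Several stable metrics above, such as apiserver_request_duration_seconds, are Histograms. In the Prometheus format a histogram is exposed as cumulative _bucket series (one per le upper bound, ending with le="+Inf"), plus _sum and _count series, so the mean is simply _sum divided by _count. Percentiles have to be estimated from the buckets. In practice you would use PromQL's histogram_quantile() for this; the Python below is a simplified sketch of the same linear-interpolation idea, assuming non-negative bucket bounds (true for latencies), and its results may differ from Prometheus in edge cases.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from (le, cumulative_count) bucket pairs.

    The pairs must include the +Inf bucket. The value is interpolated
    linearly within the bucket containing the target rank, and the lowest
    bucket is assumed to start at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                return prev_le  # rank lies above the last finite bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return le
            return prev_le + (le - prev_le) * (rank - prev_count) / in_bucket
        prev_le, prev_count = le, count
    return prev_le


buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.50, buckets))  # ≈ 0.1
print(histogram_quantile(0.90, buckets))  # ≈ 0.5
print(histogram_quantile(0.95, buckets))  # 0.75: halfway through the 0.5-1.0 bucket
```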
List of Beta Kubernetes Metrics
Beta metrics observe a looser API contract than their stable counterparts. No labels can be removed from beta metrics during their lifetime; however, labels can be added while the metric is in the beta stage. This offers the assurance that beta metrics will honor existing dashboards and alerts, while allowing for amendments in the future.
Name | Stability Level | Type | Help | Labels | Const Labels | Deprecated Version |
---|---|---|---|---|---|---|
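The label rules on this page amount to a set comparison between a metric's label names in an old and a new release: stable metrics must keep exactly the same labels, beta metrics may gain labels but not lose any, and alpha metrics carry no guarantee. The hypothetical label_change_allowed function below is a minimal Python sketch of that check; it is not part of any Kubernetes tooling.

```python
def label_change_allowed(stability: str, old_labels: set[str], new_labels: set[str]) -> bool:
    """Check a label-set change against the metric stability rules.

    STABLE: labels may be neither added nor removed.
    BETA:   labels may be added, but none may be removed.
    ALPHA:  no guarantees, so any change is allowed.
    """
    level = stability.upper()
    if level == "STABLE":
        return new_labels == old_labels
    if level == "BETA":
        return old_labels <= new_labels  # every existing label must survive
    if level == "ALPHA":
        return True
    raise ValueError(f"unknown stability level: {stability!r}")


# Labels of apiserver_request_total, from the stable table above.
old = {"code", "component", "dry_run", "group", "resource",
       "scope", "subresource", "verb", "version"}
# "example_label" is a hypothetical new label used only for illustration.
print(label_change_allowed("STABLE", old, old | {"example_label"}))  # False
print(label_change_allowed("BETA", old, old | {"example_label"}))    # True
print(label_change_allowed("BETA", old, old - {"code"}))             # False
```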
List of Alpha Kubernetes Metrics
Alpha metrics do not have any API guarantees. Use these metrics at your own risk: subsequent versions of Kubernetes may remove these metrics altogether, or change the API in ways that break existing dashboards and alerts.
Name | Stability Level | Type | Help | Labels | Const Labels | Deprecated Version |
---|---|---|---|---|---|---|
aggregator_discovery_aggregation_count_total | ALPHA | Counter | Counter of number of times discovery was aggregated | | | |
aggregator_openapi_v2_regeneration_count | ALPHA | Counter | Counter of OpenAPI v2 spec regeneration count broken down by causing APIService name and reason. | apiservice reason | | |
aggregator_openapi_v2_regeneration_duration | ALPHA | Gauge | Gauge of OpenAPI v2 spec regeneration duration in seconds. | reason | | |
aggregator_unavailable_apiservice | ALPHA | Custom | Gauge of APIServices which are marked as unavailable broken down by APIService name. | name | | |
aggregator_unavailable_apiservice_total | ALPHA | Counter | Counter of APIServices which are marked as unavailable broken down by APIService name and reason. | name reason | | |
apiextensions_openapi_v2_regeneration_count | ALPHA | Counter | Counter of OpenAPI v2 spec regeneration count broken down by causing CRD name and reason. | crd reason | | |
apiextensions_openapi_v3_regeneration_count | ALPHA | Counter | Counter of OpenAPI v3 spec regeneration count broken down by group, version, causing CRD and reason. | crd group reason version | | |
apiserver_admission_admission_match_condition_evaluation_errors_total | ALPHA | Counter | Admission match condition evaluation errors count, identified by name of resource containing the match condition and broken out for each admission type (validating or mutating). | name type | | |
apiserver_admission_step_admission_duration_seconds_summary | ALPHA | Summary | Admission sub-step latency summary in seconds, broken out for each operation and API resource and step type (validate or admit). | operation rejected type | | |
apiserver_admission_webhook_fail_open_count | ALPHA | Counter | Admission webhook fail open count, identified by name and broken out for each admission type (validating or mutating). | name type | | |
apiserver_admission_webhook_rejection_count | ALPHA | Counter | Admission webhook rejection count, identified by name and broken out for each admission type (validating or admit) and operation. Additional labels specify an error type (calling_webhook_error or apiserver_internal_error if an error occurred; no_error otherwise) and optionally a non-zero rejection code if the webhook rejects the request with an HTTP status code (honored by the apiserver when the code is greater or equal to 400). Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded. | error_type name operation rejection_code type | | |
apiserver_admission_webhook_request_total | ALPHA | Counter | Admission webhook request total, identified by name and broken out for each admission type (validating or mutating) and operation. Additional labels specify whether the request was rejected or not and an HTTP status code. Codes greater than 600 are truncated to 600, to keep the metrics cardinality bounded. | code name operation rejected type | | |
apiserver_audit_error_total | ALPHA | Counter | Counter of audit events that failed to be audited properly. Plugin identifies the plugin affected by the error. | plugin | | |
apiserver_audit_event_total | ALPHA | Counter | Counter of audit events generated and sent to the audit backend. | | | |
apiserver_audit_level_total | ALPHA | Counter | Counter of policy levels for audit events (1 per request). | level | | |
apiserver_audit_requests_rejected_total | ALPHA | Counter | Counter of apiserver requests rejected due to an error in audit logging backend. | | | |
apiserver_cache_list_fetched_objects_total | ALPHA | Counter | Number of objects read from watch cache in the course of serving a LIST request | index resource_prefix | | |
apiserver_cache_list_returned_objects_total | ALPHA | Counter | Number of objects returned for a LIST request from watch cache | resource_prefix | | |
apiserver_cache_list_total | ALPHA | Counter | Number of LIST requests served from watch cache | index resource_prefix | | |
apiserver_cel_compilation_duration_seconds | ALPHA | Histogram | CEL compilation time in seconds. | | | |
apiserver_cel_evaluation_duration_seconds | ALPHA | Histogram | CEL evaluation time in seconds. | | | |
apiserver_certificates_registry_csr_honored_duration_total | ALPHA | Counter | Total number of issued CSRs with a requested duration that was honored, sliced by signer (only kubernetes.io signer names are specifically identified) | signerName | | |
apiserver_certificates_registry_csr_requested_duration_total | ALPHA | Counter | Total number of issued CSRs with a requested duration, sliced by signer (only kubernetes.io signer names are specifically identified) | signerName | | |
apiserver_client_certificate_expiration_seconds | ALPHA | Histogram | Distribution of the remaining lifetime on the certificate used to authenticate a request. | | | |
apiserver_crd_webhook_conversion_duration_seconds | ALPHA | Histogram | CRD webhook conversion duration in seconds | crd_name from_version succeeded to_version | | |
apiserver_current_inqueue_requests | ALPHA | Gauge | Maximal number of queued requests in this apiserver per request kind in last second. | request_kind | | |
apiserver_delegated_authn_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by status code. | code | | |
apiserver_delegated_authn_request_total | ALPHA | Counter | Number of HTTP requests partitioned by status code. | code | | |
apiserver_delegated_authz_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by status code. | code | | |
apiserver_delegated_authz_request_total | ALPHA | Counter | Number of HTTP requests partitioned by status code. | code | | |
apiserver_egress_dialer_dial_duration_seconds | ALPHA | Histogram | Dial latency histogram in seconds, labeled by the protocol (http-connect or grpc), transport (tcp or uds) | protocol transport | | |
apiserver_egress_dialer_dial_failure_count | ALPHA | Counter | Dial failure count, labeled by the protocol (http-connect or grpc), transport (tcp or uds), and stage (connect or proxy). The stage indicates at which stage the dial failed | protocol stage transport | | |
apiserver_egress_dialer_dial_start_total | ALPHA | Counter | Dial starts, labeled by the protocol (http-connect or grpc) and transport (tcp or uds). | protocol transport | | |
apiserver_envelope_encryption_dek_cache_fill_percent | ALPHA | Gauge | Percent of the cache slots currently occupied by cached DEKs. | | | |
apiserver_envelope_encryption_dek_cache_inter_arrival_time_seconds | ALPHA | Histogram | Time (in seconds) of inter arrival of transformation requests. | transformation_type | | |
apiserver_envelope_encryption_invalid_key_id_from_status_total | ALPHA | Counter | Number of times an invalid keyID is returned by the Status RPC call split by error. | error provider_name | | |
apiserver_envelope_encryption_key_id_hash_last_timestamp_seconds | ALPHA | Gauge | The last time in seconds when a keyID was used. | key_id_hash provider_name transformation_type | | |
apiserver_envelope_encryption_key_id_hash_status_last_timestamp_seconds | ALPHA | Gauge | The last time in seconds when a keyID was returned by the Status RPC call. | key_id_hash provider_name | | |
apiserver_envelope_encryption_key_id_hash_total | ALPHA | Counter | Number of times a keyID is used split by transformation type and provider. | key_id_hash provider_name transformation_type | | |
apiserver_envelope_encryption_kms_operations_latency_seconds | ALPHA | Histogram | KMS operation duration with gRPC error code status total. | grpc_status_code method_name provider_name | | |
apiserver_flowcontrol_current_executing_requests | ALPHA | Gauge | Number of requests in initial (for a WATCH) or any (for a non-WATCH) execution stage in the API Priority and Fairness subsystem | flow_schema priority_level | | |
apiserver_flowcontrol_current_inqueue_requests | ALPHA | Gauge | Number of requests currently pending in queues of the API Priority and Fairness subsystem | flow_schema priority_level | | |
apiserver_flowcontrol_current_limit_seats | ALPHA | Gauge | current derived number of execution seats available to each priority level | priority_level | | |
apiserver_flowcontrol_current_r | ALPHA | Gauge | R(time of last change) | priority_level | | |
apiserver_flowcontrol_demand_seats | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of (the number of seats each priority level could use) / (nominal number of seats for that level) | priority_level | | |
apiserver_flowcontrol_demand_seats_average | ALPHA | Gauge | Time-weighted average, over last adjustment period, of demand_seats | priority_level | | |
apiserver_flowcontrol_demand_seats_high_watermark | ALPHA | Gauge | High watermark, over last adjustment period, of demand_seats | priority_level | | |
apiserver_flowcontrol_demand_seats_smoothed | ALPHA | Gauge | Smoothed seat demands | priority_level | | |
apiserver_flowcontrol_demand_seats_stdev | ALPHA | Gauge | Time-weighted standard deviation, over last adjustment period, of demand_seats | priority_level | | |
apiserver_flowcontrol_dispatch_r | ALPHA | Gauge | R(time of last dispatch) | priority_level | | |
apiserver_flowcontrol_dispatched_requests_total | ALPHA | Counter | Number of requests executed by API Priority and Fairness subsystem | flow_schema priority_level | | |
apiserver_flowcontrol_epoch_advance_total | ALPHA | Counter | Number of times the queueset's progress meter jumped backward | priority_level success | | |
apiserver_flowcontrol_latest_s | ALPHA | Gauge | S(most recently dispatched request) | priority_level | | |
apiserver_flowcontrol_lower_limit_seats | ALPHA | Gauge | Configured lower bound on number of execution seats available to each priority level | priority_level | | |
apiserver_flowcontrol_next_discounted_s_bounds | ALPHA | Gauge | min and max, over queues, of S(oldest waiting request in queue) - estimated work in progress | bound priority_level | | |
apiserver_flowcontrol_next_s_bounds | ALPHA | Gauge | min and max, over queues, of S(oldest waiting request in queue) | bound priority_level | | |
apiserver_flowcontrol_nominal_limit_seats | ALPHA | Gauge | Nominal number of execution seats configured for each priority level | priority_level | | |
apiserver_flowcontrol_priority_level_request_utilization | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of number of requests (as a fraction of the relevant limit) waiting or in any stage of execution (but only initial stage for WATCHes) | phase priority_level | | |
apiserver_flowcontrol_priority_level_seat_utilization | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of utilization of seats for any stage of execution (but only initial stage for WATCHes) | priority_level | phase:executing | |
apiserver_flowcontrol_read_vs_write_current_requests | ALPHA | TimingRatioHistogram | Observations, at the end of every nanosecond, of the number of requests (as a fraction of the relevant limit) waiting or in regular stage of execution | phase request_kind | | |
apiserver_flowcontrol_rejected_requests_total | ALPHA | Counter | Number of requests rejected by API Priority and Fairness subsystem | flow_schema priority_level reason | | |
apiserver_flowcontrol_request_concurrency_in_use | ALPHA | Gauge | Concurrency (number of seats) occupied by the currently executing (initial stage for a WATCH, any stage otherwise) requests in the API Priority and Fairness subsystem | flow_schema priority_level | | |
apiserver_flowcontrol_request_concurrency_limit | ALPHA | Gauge | Shared concurrency limit in the API Priority and Fairness subsystem | priority_level | | |
apiserver_flowcontrol_request_dispatch_no_accommodation_total | ALPHA | Counter | Number of times a dispatch attempt resulted in a non accommodation due to lack of available seats | flow_schema priority_level | | |
apiserver_flowcontrol_request_execution_seconds | ALPHA | Histogram | Duration of initial stage (for a WATCH) or any (for a non-WATCH) stage of request execution in the API Priority and Fairness subsystem | flow_schema priority_level type | | |
apiserver_flowcontrol_request_queue_length_after_enqueue | ALPHA | Histogram | Length of queue in the API Priority and Fairness subsystem, as seen by each request after it is enqueued | flow_schema priority_level | | |
apiserver_flowcontrol_request_wait_duration_seconds | ALPHA | Histogram | Length of time a request spent waiting in its queue | execute flow_schema priority_level | | |
apiserver_flowcontrol_seat_fair_frac | ALPHA | Gauge | Fair fraction of server's concurrency to allocate to each priority level that can use it | | | |
apiserver_flowcontrol_target_seats | ALPHA | Gauge | Seat allocation targets | priority_level | | |
apiserver_flowcontrol_upper_limit_seats | ALPHA | Gauge | Configured upper bound on number of execution seats available to each priority level | priority_level | | |
apiserver_flowcontrol_watch_count_samples | ALPHA | Histogram | count of watchers for mutating requests in API Priority and Fairness | flow_schema priority_level | | |
apiserver_flowcontrol_work_estimated_seats | ALPHA | Histogram | Number of estimated seats (maximum of initial and final seats) associated with requests in API Priority and Fairness | flow_schema priority_level | | |
apiserver_init_events_total | ALPHA | Counter | Counter of init events processed in watch cache broken by resource type. | resource | | |
apiserver_kube_aggregator_x509_insecure_sha1_total | ALPHA | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment) | | | |
apiserver_kube_aggregator_x509_missing_san_total | ALPHA | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate OR the number of connection failures due to the lack of x509 certificate SAN extension missing (either/or, based on the runtime environment) | | | |
apiserver_request_aborts_total | ALPHA | Counter | Number of requests which apiserver aborted possibly due to a timeout, for each group, version, verb, resource, subresource and scope | group resource scope subresource verb version | | |
apiserver_request_body_sizes | ALPHA | Histogram | Apiserver request body sizes broken out by size. | resource verb | | |
apiserver_request_filter_duration_seconds | ALPHA | Histogram | Request filter latency distribution in seconds, for each filter type | filter | | |
apiserver_request_post_timeout_total | ALPHA | Counter | Tracks the activity of the request handlers after the associated requests have been timed out by the apiserver | source status | | |
apiserver_request_sli_duration_seconds | ALPHA | Histogram | Response latency distribution (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component. | component group resource scope subresource verb version | | |
apiserver_request_slo_duration_seconds | ALPHA | Histogram | Response latency distribution (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component. | component group resource scope subresource verb version | | 1.27.0 |
apiserver_request_terminations_total | ALPHA | Counter | Number of requests which apiserver terminated in self-defense. | code component group resource scope subresource verb version | | |
apiserver_request_timestamp_comparison_time | ALPHA | Histogram | Time taken for comparison of old vs new objects in UPDATE or PATCH requests | code_path | | |
apiserver_selfrequest_total | ALPHA | Counter | Counter of apiserver self-requests broken out for each verb, API resource and subresource. | resource subresource verb | | |
apiserver_storage_data_key_generation_duration_seconds | ALPHA | Histogram | Latencies in seconds of data encryption key(DEK) generation operations. | | | |
apiserver_storage_data_key_generation_failures_total | ALPHA | Counter | Total number of failed data encryption key(DEK) generation operations. | | | |
apiserver_storage_db_total_size_in_bytes | ALPHA | Gauge | Total size of the storage database file physically allocated in bytes. | endpoint | | |
apiserver_storage_decode_errors_total | ALPHA | Counter | Number of stored object decode errors split by object type | resource | | |
apiserver_storage_envelope_transformation_cache_misses_total | ALPHA | Counter | Total number of cache misses while accessing key decryption key(KEK). | | | |
apiserver_storage_events_received_total | ALPHA | Counter | Number of etcd events received split by kind. | resource | | |
apiserver_storage_list_evaluated_objects_total | ALPHA | Counter | Number of objects tested in the course of serving a LIST request from storage | resource | | |
apiserver_storage_list_fetched_objects_total | ALPHA | Counter | Number of objects read from storage in the course of serving a LIST request | resource | | |
apiserver_storage_list_returned_objects_total | ALPHA | Counter | Number of objects returned for a LIST request from storage | resource | | |
apiserver_storage_list_total | ALPHA | Counter | Number of LIST requests served from storage | resource | | |
apiserver_storage_transformation_duration_seconds | ALPHA | Histogram | Latencies in seconds of value transformation operations. | transformation_type transformer_prefix | | |
apiserver_storage_transformation_operations_total | ALPHA | Counter | Total number of transformations. | status transformation_type transformer_prefix | | |
apiserver_terminated_watchers_total | ALPHA | Counter | Counter of watchers closed due to unresponsiveness broken by resource type. | resource | | |
apiserver_tls_handshake_errors_total | ALPHA | Counter | Number of requests dropped with 'TLS handshake error from' error | |||
apiserver_validating_admission_policy_check_duration_seconds | ALPHA | Histogram | Validation admission latency for individual validation expressions in seconds, labeled by policy and further including binding, state and enforcement action taken. | enforcement_action policy policy_binding state |
||
apiserver_validating_admission_policy_check_total | ALPHA | Counter | Validation admission policy check total, labeled by policy and further identified by binding, enforcement action taken, and state. | enforcement_action policy policy_binding state |
||
apiserver_validating_admission_policy_definition_total | ALPHA | Counter | Validation admission policy count total, labeled by state and enforcement action. | enforcement_action state |
||
apiserver_watch_cache_events_dispatched_total | ALPHA | Counter | Counter of events dispatched in watch cache broken by resource type. | resource |
||
apiserver_watch_cache_events_received_total | ALPHA | Counter | Counter of events received in watch cache broken by resource type. | resource |
||
apiserver_watch_cache_initializations_total | ALPHA | Counter | Counter of watch cache initializations broken by resource type. | resource |
||
apiserver_watch_events_sizes | ALPHA | Histogram | Watch event size distribution in bytes | group kind version |
||
apiserver_watch_events_total | ALPHA | Counter | Number of events sent in watch clients | group kind version |
||
apiserver_webhooks_x509_insecure_sha1_total | ALPHA | Counter | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment) | |||
apiserver_webhooks_x509_missing_san_total | ALPHA | Counter | Counts the number of requests to servers missing SAN extension in their serving certificate OR the number of connection failures due to the missing x509 certificate SAN extension (either/or, based on the runtime environment) | |||
attachdetach_controller_forced_detaches | ALPHA | Counter | Number of times the A/D Controller performed a forced detach | |||
attachdetach_controller_total_volumes | ALPHA | Custom | Number of volumes in A/D Controller | plugin_name state |
||
authenticated_user_requests | ALPHA | Counter | Counter of authenticated requests broken out by username. | username |
||
authentication_attempts | ALPHA | Counter | Counter of authentication attempts. | result |
||
authentication_duration_seconds | ALPHA | Histogram | Authentication duration in seconds broken out by result. | result |
||
authentication_token_cache_active_fetch_count | ALPHA | Gauge | | status |
||
authentication_token_cache_fetch_total | ALPHA | Counter | | status |
||
authentication_token_cache_request_duration_seconds | ALPHA | Histogram | | status |
||
authentication_token_cache_request_total | ALPHA | Counter | | status |
||
cloud_provider_webhook_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by status code. | code webhook |
||
cloud_provider_webhook_request_total | ALPHA | Counter | Number of HTTP requests partitioned by status code. | code webhook |
||
cloudprovider_azure_api_request_duration_seconds | ALPHA | Histogram | Latency of an Azure API call | request resource_group source subscription_id |
||
cloudprovider_azure_api_request_errors | ALPHA | Counter | Number of errors for an Azure API call | request resource_group source subscription_id |
||
cloudprovider_azure_api_request_ratelimited_count | ALPHA | Counter | Number of rate limited Azure API calls | request resource_group source subscription_id |
||
cloudprovider_azure_api_request_throttled_count | ALPHA | Counter | Number of throttled Azure API calls | request resource_group source subscription_id |
||
cloudprovider_azure_op_duration_seconds | ALPHA | Histogram | Latency of an Azure service operation | request resource_group source subscription_id |
||
cloudprovider_azure_op_failure_count | ALPHA | Counter | Number of failed Azure service operations | request resource_group source subscription_id |
||
cloudprovider_gce_api_request_duration_seconds | ALPHA | Histogram | Latency of a GCE API call | region request version zone |
||
cloudprovider_gce_api_request_errors | ALPHA | Counter | Number of errors for an API call | region request version zone |
||
cloudprovider_vsphere_api_request_duration_seconds | ALPHA | Histogram | Latency of a vSphere API call | request |
||
cloudprovider_vsphere_api_request_errors | ALPHA | Counter | vSphere API errors | request |
||
cloudprovider_vsphere_operation_duration_seconds | ALPHA | Histogram | Latency of a vSphere operation call | operation |
||
cloudprovider_vsphere_operation_errors | ALPHA | Counter | vSphere operation errors | operation |
||
cloudprovider_vsphere_vcenter_versions | ALPHA | Custom | Versions for connected vSphere vCenters | hostname version build |
||
container_cpu_usage_seconds_total | ALPHA | Custom | Cumulative CPU time consumed by the container in core-seconds | container pod namespace |
||
container_memory_working_set_bytes | ALPHA | Custom | Current working set of the container in bytes | container pod namespace |
||
container_start_time_seconds | ALPHA | Custom | Start time of the container since the Unix epoch in seconds | container pod namespace |
||
csi_operations_seconds | ALPHA | Histogram | Container Storage Interface operation duration in seconds, broken down by gRPC status code | driver_name grpc_status_code method_name migrated |
||
endpoint_slice_controller_changes | ALPHA | Counter | Number of EndpointSlice changes | operation |
||
endpoint_slice_controller_desired_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices that would exist with perfect endpoint allocation | |||
endpoint_slice_controller_endpoints_added_per_sync | ALPHA | Histogram | Number of endpoints added on each Service sync | |||
endpoint_slice_controller_endpoints_desired | ALPHA | Gauge | Number of endpoints desired | |||
endpoint_slice_controller_endpoints_removed_per_sync | ALPHA | Histogram | Number of endpoints removed on each Service sync | |||
endpoint_slice_controller_endpointslices_changed_per_sync | ALPHA | Histogram | Number of EndpointSlices changed on each Service sync | topology |
||
endpoint_slice_controller_num_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices | |||
endpoint_slice_controller_syncs | ALPHA | Counter | Number of EndpointSlice syncs | result |
||
endpoint_slice_mirroring_controller_addresses_skipped_per_sync | ALPHA | Histogram | Number of addresses skipped on each Endpoints sync due to being invalid or exceeding MaxEndpointsPerSubset | |||
endpoint_slice_mirroring_controller_changes | ALPHA | Counter | Number of EndpointSlice changes | operation |
||
endpoint_slice_mirroring_controller_desired_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices that would exist with perfect endpoint allocation | |||
endpoint_slice_mirroring_controller_endpoints_added_per_sync | ALPHA | Histogram | Number of endpoints added on each Endpoints sync | |||
endpoint_slice_mirroring_controller_endpoints_desired | ALPHA | Gauge | Number of endpoints desired | |||
endpoint_slice_mirroring_controller_endpoints_removed_per_sync | ALPHA | Histogram | Number of endpoints removed on each Endpoints sync | |||
endpoint_slice_mirroring_controller_endpoints_sync_duration | ALPHA | Histogram | Duration of syncEndpoints() in seconds | |||
endpoint_slice_mirroring_controller_endpoints_updated_per_sync | ALPHA | Histogram | Number of endpoints updated on each Endpoints sync | |||
endpoint_slice_mirroring_controller_num_endpoint_slices | ALPHA | Gauge | Number of EndpointSlices | |||
ephemeral_volume_controller_create_failures_total | ALPHA | Counter | Number of failed PersistentVolumeClaim creation requests | |||
ephemeral_volume_controller_create_total | ALPHA | Counter | Number of PersistentVolumeClaim creation requests | |||
etcd_bookmark_counts | ALPHA | Gauge | Number of etcd bookmarks (progress notify events) split by kind. | resource |
||
etcd_lease_object_counts | ALPHA | Histogram | Number of objects attached to a single etcd lease. | |||
etcd_request_duration_seconds | ALPHA | Histogram | Etcd request latency in seconds for each operation and object type. | operation type |
||
etcd_version_info | ALPHA | Gauge | Etcd server's binary version | binary_version |
||
field_validation_request_duration_seconds | ALPHA | Histogram | Response latency distribution in seconds for each field validation value | field_validation |
||
force_cleaned_failed_volume_operation_errors_total | ALPHA | Counter | The number of volumes that failed force cleanup after their reconstruction failed during kubelet startup. | |||
force_cleaned_failed_volume_operations_total | ALPHA | Counter | The number of volumes that were force cleaned after their reconstruction failed during kubelet startup. This includes both successful and failed cleanups. | |||
garbagecollector_controller_resources_sync_error_total | ALPHA | Counter | Number of garbage collector resources sync errors | |||
get_token_count | ALPHA | Counter | Counter of total Token() requests to the alternate token source | |||
get_token_fail_count | ALPHA | Counter | Counter of failed Token() requests to the alternate token source | |||
horizontal_pod_autoscaler_controller_metric_computation_duration_seconds | ALPHA | Histogram | The time (in seconds) that the HPA controller takes to calculate one metric. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. The label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type | action error metric_type |
||
horizontal_pod_autoscaler_controller_metric_computation_total | ALPHA | Counter | Number of metric computations. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type | action error metric_type |
||
horizontal_pod_autoscaler_controller_reconciliation_duration_seconds | ALPHA | Histogram | The time (in seconds) that the HPA controller takes to reconcile once. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in the `error` label. | action error |
||
horizontal_pod_autoscaler_controller_reconciliations_total | ALPHA | Counter | Number of reconciliations of HPA controller. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in the `error` label. | action error |
||
job_controller_pod_failures_handled_by_failure_policy_total | ALPHA | Counter | The number of failed Pods handled by failure policy with respect to the failure policy action applied based on the matched rule. Possible values of the action label correspond to the possible values for the failure policy rule action, which are: "FailJob", "Ignore" and "Count". | action |
||
job_controller_terminated_pods_tracking_finalizer_total | ALPHA | Counter | The number of terminated pods (`phase=Failed|Succeeded`) that have the finalizer batch.kubernetes.io/job-tracking. The event label can be "add" or "delete". | event |
||
kube_apiserver_clusterip_allocator_allocated_ips | ALPHA | Gauge | Gauge measuring the number of allocated IPs for Services | cidr |
||
kube_apiserver_clusterip_allocator_allocation_errors_total | ALPHA | Counter | Number of errors trying to allocate Cluster IPs | cidr scope |
||
kube_apiserver_clusterip_allocator_allocation_total | ALPHA | Counter | Number of Cluster IP allocations | cidr scope |
||
kube_apiserver_clusterip_allocator_available_ips | ALPHA | Gauge | Gauge measuring the number of available IPs for Services | cidr |
||
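The ClusterIP allocator reports used and free addresses as two separate gauges, so the utilization of a Service CIDR has to be derived from both. A minimal Python sketch, assuming both gauges have already been scraped into dictionaries keyed by the `cidr` label and that allocated plus available equals the pool size; the function name `ip_pool_utilization` and the sample CIDR are hypothetical:

```python
def ip_pool_utilization(allocated: dict, available: dict) -> dict:
    """Fraction of Service ClusterIPs in use, per `cidr` label value.

    Assumes allocated + available is the total size of each pool.
    """
    utilization = {}
    for cidr, used in allocated.items():
        free = available.get(cidr, 0.0)
        total = used + free
        # Avoid dividing by zero for an empty or unknown pool.
        utilization[cidr] = used / total if total else 0.0
    return utilization


# Hypothetical sample values for the two gauges.
print(ip_pool_utilization({"10.96.0.0/22": 256.0}, {"10.96.0.0/22": 768.0}))
# {'10.96.0.0/22': 0.25}
```

The same pattern would apply to the NodePort allocator gauges, which carry no `cidr` label and therefore describe a single pool.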
kube_apiserver_nodeport_allocator_allocated_ports | ALPHA | Gauge | Gauge measuring the number of allocated NodePorts for Services | |||
kube_apiserver_nodeport_allocator_allocation_errors_total | ALPHA | Counter | Number of errors trying to allocate NodePorts | scope |
||
kube_apiserver_nodeport_allocator_allocation_total | ALPHA | Counter | Number of NodePort allocations | scope |
||
kube_apiserver_nodeport_allocator_available_ports | ALPHA | Gauge | Gauge measuring the number of available NodePorts for Services | |||
kube_apiserver_pod_logs_backend_tls_failure_total | ALPHA | Counter | Total number of requests for pods/logs that failed due to kubelet server TLS verification | |||
kube_apiserver_pod_logs_insecure_backend_total | ALPHA | Counter | Total number of requests for pods/logs sliced by usage type: enforce_tls, skip_tls_allowed, skip_tls_denied | usage |
||
kube_apiserver_pod_logs_pods_logs_backend_tls_failure_total | ALPHA | Counter | Total number of requests for pods/logs that failed due to kubelet server TLS verification | | | 1.27.0 |
kube_apiserver_pod_logs_pods_logs_insecure_backend_total | ALPHA | Counter | Total number of requests for pods/logs sliced by usage type: enforce_tls, skip_tls_allowed, skip_tls_denied | usage | | 1.27.0 |
kubelet_active_pods | ALPHA | Gauge | The number of pods the kubelet considers active and which are being considered when admitting new pods. The static label is true if the pod is not from the API server. | static |
||
kubelet_certificate_manager_client_expiration_renew_errors | ALPHA | Counter | Counter of certificate renewal errors. | |||
kubelet_certificate_manager_client_ttl_seconds | ALPHA | Gauge | Gauge of the TTL (time-to-live) of the Kubelet's client certificate. The value is in seconds until certificate expiry (negative if already expired). If client certificate is invalid or unused, the value will be +INF. | |||
kubelet_certificate_manager_server_rotation_seconds | ALPHA | Histogram | Histogram of the number of seconds the previous certificate lived before being rotated. | |||
kubelet_certificate_manager_server_ttl_seconds | ALPHA | Gauge | Gauge of the shortest TTL (time-to-live) of the Kubelet's serving certificate. The value is in seconds until certificate expiry (negative if already expired). If serving certificate is invalid or unused, the value will be +INF. | |||
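The certificate TTL gauges encode three states in one number: positive seconds remaining, a negative value once the certificate has expired, and +Inf when it is invalid or unused. A minimal Python sketch of turning a scraped sample into an alert state; the seven-day warning threshold and the state names are hypothetical choices, not values defined by Kubernetes:

```python
import math

# Hypothetical alerting threshold: warn when less than 7 days remain.
WARN_SECONDS = 7 * 24 * 3600


def classify_cert_ttl(ttl_seconds: float, warn_seconds: float = WARN_SECONDS) -> str:
    """Interpret a kubelet_certificate_manager_*_ttl_seconds sample."""
    if math.isinf(ttl_seconds) and ttl_seconds > 0:
        # +Inf means the certificate is invalid or unused.
        return "invalid-or-unused"
    if ttl_seconds < 0:
        # Negative values mean the certificate has already expired.
        return "expired"
    if ttl_seconds < warn_seconds:
        return "expiring-soon"
    return "ok"


for sample in (float("+Inf"), -60.0, 3600.0, 30 * 24 * 3600.0):
    print(sample, classify_cert_ttl(sample))
```

Checking +Inf first matters: treating it as an ordinary large number would silently report an unused certificate as healthy.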
kubelet_cgroup_manager_duration_seconds | ALPHA | Histogram | Duration in seconds for cgroup manager operations. Broken down by operation type. | operation_type |
||
kubelet_container_log_filesystem_used_bytes | ALPHA | Custom | Bytes used by the container's logs on the filesystem. | uid namespace pod container |
||
kubelet_containers_per_pod_count | ALPHA | Histogram | The number of containers per pod. | |||
kubelet_cpu_manager_pinning_errors_total | ALPHA | Counter | The number of CPU core allocations that required pinning and failed. | |||
kubelet_cpu_manager_pinning_requests_total | ALPHA | Counter | The number of CPU core allocations that required pinning. | |||
kubelet_credential_provider_plugin_duration | ALPHA | Histogram | Duration of execution in seconds for credential provider plugin | plugin_name |
||
kubelet_credential_provider_plugin_errors | ALPHA | Counter | Number of errors from credential provider plugin | plugin_name |
||
kubelet_desired_pods | ALPHA | Gauge | The number of pods the kubelet is being instructed to run. The static label is true if the pod is not from the API server. | static |
||
kubelet_device_plugin_alloc_duration_seconds | ALPHA | Histogram | Duration in seconds to serve a device plugin Allocation request. Broken down by resource name. | resource_name |
||
kubelet_device_plugin_registration_total | ALPHA | Counter | Cumulative number of device plugin registrations. Broken down by resource name. | resource_name |
||
kubelet_evented_pleg_connection_error_count | ALPHA | Counter | The number of errors encountered during the establishment of streaming connection with the CRI runtime. | |||
kubelet_evented_pleg_connection_latency_seconds | ALPHA | Histogram | The latency of streaming connection with the CRI runtime, measured in seconds. | |||
kubelet_evented_pleg_connection_success_count | ALPHA | Counter | The number of times a streaming client was obtained to receive CRI Events. | |||
kubelet_eviction_stats_age_seconds | ALPHA | Histogram | Time between when stats are collected and when a pod is evicted based on those stats, broken down by eviction signal | eviction_signal |
||
kubelet_evictions | ALPHA | Counter | Cumulative number of pod evictions by eviction signal | eviction_signal |
||
kubelet_graceful_shutdown_end_time_seconds | ALPHA | Gauge | Last graceful shutdown end time since the Unix epoch in seconds | |||
kubelet_graceful_shutdown_start_time_seconds | ALPHA | Gauge | Last graceful shutdown start time since the Unix epoch in seconds | |||
kubelet_http_inflight_requests | ALPHA | Gauge | Number of in-flight HTTP requests | long_running method path server_type |
||
kubelet_http_requests_duration_seconds | ALPHA | Histogram | Duration in seconds to serve HTTP requests | long_running method path server_type |
||
kubelet_http_requests_total | ALPHA | Counter | Number of HTTP requests received since the server started | long_running method path server_type |
||
kubelet_lifecycle_handler_http_fallbacks_total | ALPHA | Counter | The number of times lifecycle handlers successfully fell back to HTTP from HTTPS. | |||
kubelet_managed_ephemeral_containers | ALPHA | Gauge | Current number of ephemeral containers in pods managed by this kubelet. | |||
kubelet_mirror_pods | ALPHA | Gauge | The number of mirror pods the kubelet will try to create (one per admitted static pod) | |||
kubelet_node_name | ALPHA | Gauge | The node's name. The count is always 1. | node |
||
kubelet_orphan_pod_cleaned_volumes | ALPHA | Gauge | The total number of orphaned Pods whose volumes were cleaned in the last periodic sweep. | |||
kubelet_orphan_pod_cleaned_volumes_errors | ALPHA | Gauge | The number of orphaned Pods whose volumes failed to be cleaned in the last periodic sweep. | |||
kubelet_orphaned_runtime_pods_total | ALPHA | Counter | Number of pods that have been detected in the container runtime without being already known to the pod worker. This typically indicates the kubelet was restarted while a pod was force deleted in the API or in the local configuration, which is unusual. | |||
kubelet_pleg_discard_events | ALPHA | Counter | The number of discard events in PLEG. | |||
kubelet_pleg_last_seen_seconds | ALPHA | Gauge | Timestamp in seconds when PLEG was last seen active. | |||
kubelet_pleg_relist_duration_seconds | ALPHA | Histogram | Duration in seconds for relisting pods in PLEG. | |||
kubelet_pleg_relist_interval_seconds | ALPHA | Histogram | Interval in seconds between relisting in PLEG. | |||
kubelet_pod_resources_endpoint_errors_get | ALPHA | Counter | Number of requests to the PodResource Get endpoint that returned an error. Broken down by server API version. | server_api_version |
||
kubelet_pod_resources_endpoint_errors_get_allocatable | ALPHA | Counter | Number of requests to the PodResource GetAllocatableResources endpoint that returned an error. Broken down by server API version. | server_api_version |
||
kubelet_pod_resources_endpoint_errors_list | ALPHA | Counter | Number of requests to the PodResource List endpoint that returned an error. Broken down by server API version. | server_api_version |
||
kubelet_pod_resources_endpoint_requests_get | ALPHA | Counter | Number of requests to the PodResource Get endpoint. Broken down by server API version. | server_api_version |
||
kubelet_pod_resources_endpoint_requests_get_allocatable | ALPHA | Counter | Number of requests to the PodResource GetAllocatableResources endpoint. Broken down by server API version. | server_api_version |
||
kubelet_pod_resources_endpoint_requests_list | ALPHA | Counter | Number of requests to the PodResource List endpoint. Broken down by server API version. | server_api_version |
||
kubelet_pod_resources_endpoint_requests_total | ALPHA | Counter | Cumulative number of requests to the PodResource endpoint. Broken down by server API version. | server_api_version |
||
kubelet_pod_start_duration_seconds | ALPHA | Histogram | Duration in seconds from kubelet seeing a pod for the first time to the pod starting to run | |||
kubelet_pod_start_sli_duration_seconds | ALPHA | Histogram | Duration in seconds to start a pod, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch | |||
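Histogram metrics such as `kubelet_pod_start_sli_duration_seconds` are exposed as cumulative `_bucket` series with an `le` (less than or equal) label. A quantile can be estimated by finding the bucket that contains the target rank and interpolating linearly inside it, which is the general approach PromQL's `histogram_quantile()` takes. The following is a simplified Python sketch of that idea, not the Prometheus implementation; it assumes the buckets are sorted, have non-negative bounds, and end with `+Inf`, and the bucket counts are hypothetical:

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (le, count) bucket pairs.

    Simplified linear interpolation: buckets must be sorted by `le`,
    have non-negative bounds, and end with +Inf.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0.0  # upper bound and cumulative count of previous bucket
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Cannot interpolate into +Inf; fall back to the last finite bound.
                return lower
            in_bucket = count - below
            if in_bucket == 0:
                return le
            return lower + (le - lower) * (rank - below) / in_bucket
        lower, below = le, count
    return lower


# Hypothetical cumulative counts for kubelet_pod_start_sli_duration_seconds.
buckets = [(0.5, 10), (1.0, 30), (2.0, 40), (math.inf, 40)]
print(histogram_quantile(0.5, buckets))  # 0.75
```

The estimate is only as precise as the bucket layout: any value inside a bucket is assumed to be spread evenly between its bounds.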
kubelet_pod_status_sync_duration_seconds | ALPHA | Histogram | Duration in seconds to sync a pod status update. Measures time from detection of a change to pod status until the API is successfully updated for that pod, even if multiple intervening changes to pod status occur. | |||
kubelet_pod_worker_duration_seconds | ALPHA | Histogram | Duration in seconds to sync a single pod. Broken down by operation type: create, update, or sync | operation_type |
||
kubelet_pod_worker_start_duration_seconds | ALPHA | Histogram | Duration in seconds from kubelet seeing a pod to starting a worker. | |||
kubelet_preemptions | ALPHA | Counter | Cumulative number of pod preemptions by preemption resource | preemption_signal |
||
kubelet_restarted_pods_total | ALPHA | Counter | Number of pods that have been restarted because they were deleted and recreated with the same UID while the kubelet was watching them (common for static pods, extremely uncommon for API pods) | static |
||
kubelet_run_podsandbox_duration_seconds | ALPHA | Histogram | Duration in seconds of the run_podsandbox operations. Broken down by RuntimeClass.Handler. | runtime_handler |
||
kubelet_run_podsandbox_errors_total | ALPHA | Counter | Cumulative number of the run_podsandbox operation errors by RuntimeClass.Handler. | runtime_handler |
||
kubelet_running_containers | ALPHA | Gauge | Number of containers currently running | container_state |
||
kubelet_running_pods | ALPHA | Gauge | Number of pods that have a running pod sandbox | |||
kubelet_runtime_operations_duration_seconds | ALPHA | Histogram | Duration in seconds of runtime operations. Broken down by operation type. | operation_type |
||
kubelet_runtime_operations_errors_total | ALPHA | Counter | Cumulative number of runtime operation errors by operation type. | operation_type |
||
kubelet_runtime_operations_total | ALPHA | Counter | Cumulative number of runtime operations by operation type. | operation_type |
||
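`kubelet_runtime_operations_total` and `kubelet_runtime_operations_errors_total` share the `operation_type` label, so an error ratio per operation can be derived from how much each counter grew between two scrapes. A minimal Python sketch, assuming two scrapes already parsed into dictionaries keyed by `operation_type`; the helper names and sample values are hypothetical, and the reset handling is a simplified two-point version of what PromQL's `increase()` does:

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a counter between two scrapes.

    A drop in value is treated as a counter reset (for example, a kubelet
    restart), in which case the current value is the increase since the reset.
    """
    return current if current < previous else current - previous


def runtime_error_ratio(prev_total: dict, curr_total: dict,
                        prev_errors: dict, curr_errors: dict) -> dict:
    """Per-operation_type error ratio over the interval between two scrapes."""
    ratios = {}
    for op, current in curr_total.items():
        ops = counter_increase(prev_total.get(op, 0.0), current)
        errs = counter_increase(prev_errors.get(op, 0.0), curr_errors.get(op, 0.0))
        ratios[op] = errs / ops if ops else 0.0
    return ratios


# Hypothetical scrapes of kubelet_runtime_operations_total and
# kubelet_runtime_operations_errors_total, keyed by operation_type.
before_total = {"create_container": 100.0, "pull_image": 50.0}
after_total = {"create_container": 150.0, "pull_image": 60.0}
before_errors = {"create_container": 1.0}
after_errors = {"create_container": 6.0, "pull_image": 2.0}

print(runtime_error_ratio(before_total, after_total, before_errors, after_errors))
# {'create_container': 0.1, 'pull_image': 0.2}
```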
kubelet_server_expiration_renew_errors | ALPHA | Counter | Counter of certificate renewal errors. | |||
kubelet_started_containers_errors_total | ALPHA | Counter | Cumulative number of errors when starting containers | code container_type |
||
kubelet_started_containers_total | ALPHA | Counter | Cumulative number of containers started | container_type |
||
kubelet_started_host_process_containers_errors_total | ALPHA | Counter | Cumulative number of errors when starting hostprocess containers. This metric will only be collected on Windows and requires the WindowsHostProcessContainers feature gate to be enabled. | code container_type |
||
kubelet_started_host_process_containers_total | ALPHA | Counter | Cumulative number of hostprocess containers started. This metric will only be collected on Windows and requires the WindowsHostProcessContainers feature gate to be enabled. | container_type |
||
kubelet_started_pods_errors_total | ALPHA | Counter | Cumulative number of errors when starting pods | |||
kubelet_started_pods_total | ALPHA | Counter | Cumulative number of pods started | |||
kubelet_topology_manager_admission_duration_ms | ALPHA | Histogram | Duration in milliseconds to serve a pod admission request. | |||
kubelet_topology_manager_admission_errors_total | ALPHA | Counter | The number of admission request failures where resources could not be aligned. | |||
kubelet_topology_manager_admission_requests_total | ALPHA | Counter | The number of admission requests where resources have to be aligned. | |||
kubelet_volume_metric_collection_duration_seconds | ALPHA | Histogram | Duration in seconds to calculate volume stats | metric_source |
||
kubelet_volume_stats_available_bytes | ALPHA | Custom | Number of available bytes in the volume | namespace persistentvolumeclaim |
||
kubelet_volume_stats_capacity_bytes | ALPHA | Custom | Capacity in bytes of the volume | namespace persistentvolumeclaim |
||
kubelet_volume_stats_health_status_abnormal | ALPHA | Custom | Abnormal volume health status. The value is either 1 or 0: 1 indicates the volume is unhealthy, 0 indicates the volume is healthy | namespace persistentvolumeclaim |
||
kubelet_volume_stats_inodes | ALPHA | Custom | Maximum number of inodes in the volume | namespace persistentvolumeclaim |
||
kubelet_volume_stats_inodes_free | ALPHA | Custom | Number of free inodes in the volume | namespace persistentvolumeclaim |
||
kubelet_volume_stats_inodes_used | ALPHA | Custom | Number of used inodes in the volume | namespace persistentvolumeclaim |
||
kubelet_volume_stats_used_bytes | ALPHA | Custom | Number of used bytes in the volume | namespace persistentvolumeclaim |
||
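The `kubelet_volume_stats_*` series are labeled by `namespace` and `persistentvolumeclaim`, so used/capacity ratios for one claim can be computed by matching those two labels. A minimal Python sketch that reads a few lines in the Prometheus text format; the regular expressions are a deliberately simplified parser (no escaped quotes, timestamps ignored), and the function name, namespace, claim name and values are hypothetical:

```python
import re

# Simplified matcher for `name{labels} value` lines in the Prometheus text
# format. A trailing timestamp is ignored; escaped quotes are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def pvc_usage(text: str, namespace: str, pvc: str) -> dict:
    """Return used/capacity ratios for bytes and inodes of one PVC."""
    values = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skips comments, blank lines and unlabeled samples
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        if labels.get("namespace") == namespace and labels.get("persistentvolumeclaim") == pvc:
            values[name] = float(raw_value)
    return {
        "bytes": values["kubelet_volume_stats_used_bytes"]
        / values["kubelet_volume_stats_capacity_bytes"],
        "inodes": values["kubelet_volume_stats_inodes_used"]
        / values["kubelet_volume_stats_inodes"],
    }


SAMPLE = """
# TYPE kubelet_volume_stats_used_bytes gauge
kubelet_volume_stats_capacity_bytes{namespace="default",persistentvolumeclaim="data"} 1e+10
kubelet_volume_stats_used_bytes{namespace="default",persistentvolumeclaim="data"} 2.5e+09
kubelet_volume_stats_inodes{namespace="default",persistentvolumeclaim="data"} 1000
kubelet_volume_stats_inodes_used{namespace="default",persistentvolumeclaim="data"} 100
"""

print(pvc_usage(SAMPLE, "default", "data"))  # {'bytes': 0.25, 'inodes': 0.1}
```

Tracking inodes alongside bytes is worthwhile: a volume holding many small files can run out of inodes while plenty of byte capacity remains.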
kubelet_working_pods | ALPHA | Gauge | Number of pods the kubelet is actually running, broken down by lifecycle phase, whether the pod is desired, orphaned, or runtime only (also orphaned), and whether the pod is static. An orphaned pod has been removed from local configuration or force deleted in the API and consumes resources that are not otherwise visible. | config lifecycle static |
||
kubeproxy_network_programming_duration_seconds | ALPHA | Histogram | In-cluster network programming latency in seconds | |||
kubeproxy_sync_proxy_rules_duration_seconds | ALPHA | Histogram | SyncProxyRules latency in seconds | |||
kubeproxy_sync_proxy_rules_endpoint_changes_pending | ALPHA | Gauge | Pending proxy rules Endpoint changes | |||
kubeproxy_sync_proxy_rules_endpoint_changes_total | ALPHA | Counter | Cumulative proxy rules Endpoint changes | |||
kubeproxy_sync_proxy_rules_iptables_partial_restore_failures_total | ALPHA | Counter | Cumulative proxy iptables partial restore failures | |||
kubeproxy_sync_proxy_rules_iptables_restore_failures_total | ALPHA | Counter | Cumulative proxy iptables restore failures | |||
kubeproxy_sync_proxy_rules_iptables_total | ALPHA | Gauge | Number of proxy iptables rules programmed | table |
||
kubeproxy_sync_proxy_rules_last_queued_timestamp_seconds | ALPHA | Gauge | The last time a sync of proxy rules was queued | |||
kubeproxy_sync_proxy_rules_last_timestamp_seconds | ALPHA | Gauge | The last time proxy rules were successfully synced | |||
kubeproxy_sync_proxy_rules_no_local_endpoints_total | ALPHA | Gauge | Number of services with a Local traffic policy and no endpoints | traffic_policy |
||
kubeproxy_sync_proxy_rules_service_changes_pending | ALPHA | Gauge | Pending proxy rules Service changes | |||
kubeproxy_sync_proxy_rules_service_changes_total | ALPHA | Counter | Cumulative proxy rules Service changes | |||
kubernetes_build_info | ALPHA | Gauge | A metric with a constant '1' value labeled by major, minor, git version, git commit, git tree state, build date, Go version, and compiler from which Kubernetes was built, and platform on which it is running. | build_date compiler git_commit git_tree_state git_version go_version major minor platform |
||
kubernetes_feature_enabled | ALPHA | Gauge | This metric records the data about the stage and enablement of a k8s feature. | name stage |
||
kubernetes_healthcheck | ALPHA | Gauge | This metric records the result of a single healthcheck. | name type |
||
kubernetes_healthchecks_total | ALPHA | Counter | This metric records the results of all healthchecks. | name status type |
||
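Because `kubernetes_healthchecks_total` counts results per `name`, `status` and `type`, availability of a single check can be expressed as successful results divided by all results. A minimal Python sketch, assuming the samples are already parsed into a dictionary keyed by `(name, status, type)` and that a successful result carries `status="success"`; the function name and counter values are hypothetical. The counters are cumulative since the component started, so for a time window you would apply this to the differences between two scrapes (or use PromQL's `increase()`):

```python
def healthcheck_availability(samples: dict, name: str, check_type: str) -> float:
    """Fraction of successful results for one healthcheck and check type.

    `samples` maps (name, status, type) label tuples to counter values
    scraped from kubernetes_healthchecks_total.
    """
    total = 0.0
    success = 0.0
    for (check_name, status, kind), value in samples.items():
        if check_name == name and kind == check_type:
            total += value
            if status == "success":
                success += value
    # NaN signals that no results were recorded for this check.
    return success / total if total else float("nan")


# Hypothetical counter values scraped from the API server's /metrics/slis.
samples = {
    ("etcd", "success", "readyz"): 999.0,
    ("etcd", "error", "readyz"): 1.0,
    ("etcd", "success", "healthz"): 500.0,
}
print(healthcheck_availability(samples, "etcd", "readyz"))  # 0.999
```

Summing every status value into the denominator, rather than only `success` and `error`, keeps the ratio correct if other status values appear.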
leader_election_master_status | ALPHA | Gauge | Gauge of whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name. | name |
||
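Grouping `leader_election_master_status` by its `name` label shows which replica currently holds each lease, and flags leases with zero or several masters. A minimal Python sketch, assuming each sample has been reduced to `(instance, lease_name, value)` where `instance` is the scrape target label Prometheus attaches; the function names, instance names and values are hypothetical:

```python
def leaders_by_lease(samples: list) -> dict:
    """Group leader_election_master_status samples by the `name` label.

    Each sample is (instance, lease_name, value); value 1 means master.
    """
    leaders = {}
    for instance, lease, value in samples:
        holders = leaders.setdefault(lease, [])
        if value == 1:
            holders.append(instance)
    return leaders


def leases_without_single_leader(leaders: dict) -> list:
    """Lease names that report zero or more than one master."""
    return sorted(lease for lease, holders in leaders.items() if len(holders) != 1)


samples = [
    ("cp-1", "kube-controller-manager", 1),
    ("cp-2", "kube-controller-manager", 0),
    ("cp-1", "kube-scheduler", 0),
    ("cp-2", "kube-scheduler", 0),
]
leaders = leaders_by_lease(samples)
print(leaders)
print(leases_without_single_leader(leaders))
```

Without grouping by `name`, a component that runs several leases would appear to have multiple masters even when each lease has exactly one.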
node_authorizer_graph_actions_duration_seconds | ALPHA | Histogram | Histogram of duration of graph actions in node authorizer. | operation |
||
node_collector_unhealthy_nodes_in_zone | ALPHA | Gauge | Gauge measuring the number of not Ready Nodes per zone. | zone |
||
node_collector_update_all_nodes_health_duration_seconds | ALPHA | Histogram | Duration in seconds for NodeController to update the health of all nodes. | |||
node_collector_update_node_health_duration_seconds | ALPHA | Histogram | Duration in seconds for NodeController to update the health of a single node. | |||
node_collector_zone_health | ALPHA | Gauge | Gauge measuring percentage of healthy nodes per zone. | zone |
||
node_collector_zone_size | ALPHA | Gauge | Gauge measuring the number of registered Nodes per zone. | zone |
||
node_cpu_usage_seconds_total | ALPHA | Custom | Cumulative CPU time consumed by the node in core-seconds | |||
node_ipam_controller_cidrset_allocation_tries_per_request | ALPHA | Histogram | Histogram measuring CIDR allocation tries per request. | clusterCIDR |
||
node_ipam_controller_cidrset_cidrs_allocations_total | ALPHA | Counter | Counter measuring total number of CIDR allocations. | clusterCIDR |
||
node_ipam_controller_cidrset_cidrs_releases_total | ALPHA | Counter | Counter measuring total number of CIDR releases. | clusterCIDR |
||
node_ipam_controller_cidrset_usage_cidrs | ALPHA | Gauge | Gauge measuring percentage of allocated CIDRs. | clusterCIDR |
||
node_ipam_controller_cirdset_max_cidrs | ALPHA | Gauge | Maximum number of CIDRs that can be allocated. | clusterCIDR |
||
node_ipam_controller_multicidrset_allocation_tries_per_request | ALPHA | Histogram | Histogram measuring CIDR allocation tries per request. | clusterCIDR |
||
node_ipam_controller_multicidrset_cidrs_allocations_total | ALPHA | Counter | Counter measuring total number of CIDR allocations. | clusterCIDR |
||
node_ipam_controller_multicidrset_cidrs_releases_total | ALPHA | Counter | Counter measuring total number of CIDR releases. | clusterCIDR |
||
node_ipam_controller_multicidrset_usage_cidrs | ALPHA | Gauge | Gauge measuring percentage of allocated CIDRs. | clusterCIDR |
||
node_ipam_controller_multicirdset_max_cidrs | ALPHA | Gauge | Maximum number of CIDRs that can be allocated. | clusterCIDR |
||
node_memory_working_set_bytes | ALPHA | Custom | Current working set of the node in bytes | |||
number_of_l4_ilbs | ALPHA | Gauge | Number of L4 ILBs | feature |
||
plugin_manager_total_plugins | ALPHA | Custom | Number of plugins in Plugin Manager | socket_path state |
||
pod_cpu_usage_seconds_total | ALPHA | Custom | Cumulative CPU time consumed by the pod in core-seconds | pod namespace |
||
pod_gc_collector_force_delete_pod_errors_total | ALPHA | Counter | Number of errors encountered when forcefully deleting the pods since the Pod GC Controller started. | |||
pod_gc_collector_force_delete_pods_total | ALPHA | Counter | Number of pods that are being forcefully deleted since the Pod GC Controller started. | |||
pod_memory_working_set_bytes | ALPHA | Custom | Current working set of the pod in bytes | pod namespace |
||
pod_security_errors_total | ALPHA | Counter | Number of errors preventing normal evaluation. Non-fatal errors may result in the latest restricted profile being used for evaluation. | fatal request_operation resource subresource |
||
pod_security_evaluations_total | ALPHA | Counter | Number of policy evaluations that occurred, not counting ignored or exempt requests. | decision mode policy_level policy_version request_operation resource subresource |
||
pod_security_exemptions_total | ALPHA | Counter | Number of exempt requests, not counting ignored or out of scope requests. | request_operation resource subresource |
||
prober_probe_duration_seconds | ALPHA | Histogram | Duration in seconds for a probe response. | container namespace pod probe_type |
||
prober_probe_total | ALPHA | Counter | Cumulative number of liveness, readiness, or startup probes for a container, by result. | container namespace pod pod_uid probe_type result |
||
pv_collector_bound_pv_count | ALPHA | Custom | Gauge measuring the number of persistent volumes currently bound | storage_class |
||
pv_collector_bound_pvc_count | ALPHA | Custom | Gauge measuring the number of persistent volume claims currently bound | namespace |
||
pv_collector_total_pv_count | ALPHA | Custom | Gauge measuring the total number of persistent volumes | plugin_name volume_mode |
||
pv_collector_unbound_pv_count | ALPHA | Custom | Gauge measuring the number of persistent volumes currently unbound | storage_class |
||
pv_collector_unbound_pvc_count | ALPHA | Custom | Gauge measuring the number of persistent volume claims currently unbound | namespace |
||
reconstruct_volume_operations_errors_total | ALPHA | Counter | The number of volumes that failed reconstruction from the operating system during kubelet startup. | |||
reconstruct_volume_operations_total | ALPHA | Counter | The number of volumes that were attempted to be reconstructed from the operating system during kubelet startup. This includes both successful and failed reconstruction. | |||
replicaset_controller_sorting_deletion_age_ratio | ALPHA | Histogram | The ratio of the chosen deleted pod's age to the current youngest pod's age (at the time). Should be <2. The intent of this metric is to measure the rough efficacy of the LogarithmicScaleDown feature gate's effect on the sorting (and deletion) of pods when a replicaset scales down. This only considers Ready pods when calculating and reporting. | |||
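Since this ratio "should be <2", one way to check it is the share of observations at or below 2, read directly from the cumulative histogram buckets. A minimal Python sketch, assuming the histogram has a bucket boundary exactly at 2 (the bucket layout and counts below are hypothetical). Note that a bucket with `le="2"` counts values less than or equal to 2, so this slightly over-approximates "<2":

```python
import math


def fraction_at_or_below(buckets: list, bound: float) -> float:
    """Share of observations with value <= bound, from cumulative buckets.

    `buckets` is a sorted list of (le, cumulative_count) pairs ending with
    +Inf; `bound` must be one of the bucket boundaries.
    """
    total = buckets[-1][1]
    for le, count in buckets:
        if le == bound:
            return count / total if total else math.nan
    raise ValueError(f"no bucket boundary at {bound}")


# Hypothetical cumulative bucket counts for
# replicaset_controller_sorting_deletion_age_ratio.
buckets = [(0.5, 3), (1.0, 8), (2.0, 18), (4.0, 20), (math.inf, 20)]
print(fraction_at_or_below(buckets, 2.0))  # 0.9
```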
resourceclaim_controller_create_attempts_total | ALPHA | Counter | Number of ResourceClaim creation requests | |||
resourceclaim_controller_create_failures_total | ALPHA | Counter | Number of ResourceClaim creation request failures | |||
rest_client_exec_plugin_call_total | ALPHA | Counter | Number of calls to an exec plugin, partitioned by the type of event encountered (no_error, plugin_execution_error, plugin_not_found_error, client_internal_error) and an optional exit code. The exit code will be set to 0 if and only if the plugin call was successful. | call_status code |
||
rest_client_exec_plugin_certificate_rotation_age | ALPHA | Histogram | Histogram of the number of seconds the last auth exec plugin client certificate lived before being rotated. If auth exec plugin client certificates are unused, the histogram will contain no data. | |||
rest_client_exec_plugin_ttl_seconds | ALPHA | Gauge | Gauge of the shortest TTL (time-to-live) of the client certificate(s) managed by the auth exec plugin. The value is in seconds until certificate expiry (negative if already expired). If auth exec plugins are unused or manage no TLS certificates, the value will be +INF. | |||
rest_client_rate_limiter_duration_seconds | ALPHA | Histogram | Client side rate limiter latency in seconds. Broken down by verb and host. | host verb |
||
rest_client_request_duration_seconds | ALPHA | Histogram | Request latency in seconds. Broken down by verb and host. | host verb |
||
rest_client_request_retries_total | ALPHA | Counter | Number of request retries, partitioned by status code, verb, and host. | code host verb |
||
rest_client_request_size_bytes | ALPHA | Histogram | Request size in bytes. Broken down by verb and host. | host verb |
||
rest_client_requests_total | ALPHA | Counter | Number of HTTP requests, partitioned by status code, method, and host. | code host method |
||
rest_client_response_size_bytes | ALPHA | Histogram | Response size in bytes. Broken down by verb and host. | host verb |
||
retroactive_storageclass_errors_total | ALPHA | Counter | Total number of failed retroactive StorageClass assignments to persistent volume claims | |||
retroactive_storageclass_total | ALPHA | Counter | Total number of retroactive StorageClass assignments to persistent volume claims | |||
root_ca_cert_publisher_sync_duration_seconds | ALPHA | Histogram | Duration of namespace syncs in the root CA cert publisher. | code |
||
root_ca_cert_publisher_sync_total | ALPHA | Counter | Number of namespace syncs that happened in the root CA cert publisher. | code |
||
running_managed_controllers | ALPHA | Gauge | Indicates where instances of a controller are currently running | manager name |
||
scheduler_goroutines | ALPHA | Gauge | Number of running goroutines split by the work they do such as binding. | operation |
||
scheduler_permit_wait_duration_seconds | ALPHA | Histogram | Duration of waiting on permit. | result |
||
scheduler_plugin_evaluation_total | ALPHA | Counter | Number of attempts to schedule pods by each plugin and the extension point (available only in PreFilter and Filter). | extension_point plugin profile |
||
scheduler_plugin_execution_duration_seconds | ALPHA | Histogram | Duration for running a plugin at a specific extension point. | extension_point plugin status |
||
scheduler_scheduler_cache_size | ALPHA | Gauge | Number of nodes, pods, and assumed (bound) pods in the scheduler cache. | type |
||
scheduler_scheduler_goroutines | ALPHA | Gauge | Number of running goroutines split by the work they do such as binding. This metric is replaced by the scheduler_goroutines metric. | work |
1.26.0 | |
scheduler_scheduling_algorithm_duration_seconds | ALPHA | Histogram | Scheduling algorithm latency in seconds | |||
scheduler_unschedulable_pods | ALPHA | Gauge | The number of unschedulable pods broken down by plugin name. A pod will increment the gauge for all plugins that caused it to not schedule, so this metric has meaning only when broken down by plugin. | plugin profile |
||
scheduler_volume_binder_cache_requests_total | ALPHA | Counter | Total number of requests to the volume binding cache | operation |
||
scheduler_volume_scheduling_stage_error_total | ALPHA | Counter | Volume scheduling stage error count | operation |
||
scrape_error | ALPHA | Custom | 1 if there was an error while getting container metrics, 0 otherwise | |||
service_controller_loadbalancer_sync_total | ALPHA | Counter | A metric counting the number of times any load balancer has been configured, as an effect of service/node changes on the cluster | |||
service_controller_nodesync_error_total | ALPHA | Counter | A metric counting the number of times any load balancer has been configured and errored, as an effect of node changes on the cluster | |||
service_controller_nodesync_latency_seconds | ALPHA | Histogram | A metric measuring the latency for nodesync, which updates load balancer hosts on cluster node updates. | |||
service_controller_update_loadbalancer_host_latency_seconds | ALPHA | Histogram | A metric measuring the latency for updating each load balancer's hosts. | |||
serviceaccount_legacy_tokens_total | ALPHA | Counter | Cumulative legacy service account tokens used | |||
serviceaccount_stale_tokens_total | ALPHA | Counter | Cumulative stale projected service account tokens used | |||
serviceaccount_valid_tokens_total | ALPHA | Counter | Cumulative valid projected service account tokens used | |||
storage_count_attachable_volumes_in_use | ALPHA | Custom | Measure number of volumes in use | node volume_plugin |
||
storage_operation_duration_seconds | ALPHA | Histogram | Storage operation duration | migrated operation_name status volume_plugin |
||
ttl_after_finished_controller_job_deletion_duration_seconds | ALPHA | Histogram | The time it took to delete the job since it became eligible for deletion | |||
volume_manager_selinux_container_errors_total | ALPHA | Gauge | Number of errors when the kubelet cannot compute the SELinux context for a container. The kubelet cannot start such a Pod and will retry, so the value of this metric may not represent the actual number of containers. | |||
volume_manager_selinux_container_warnings_total | ALPHA | Gauge | Number of ignored errors when the kubelet cannot compute the SELinux context for a container. They will become real errors when the SELinuxMountReadWriteOncePod feature is expanded to all volume access modes. | |||
volume_manager_selinux_pod_context_mismatch_errors_total | ALPHA | Gauge | Number of errors when a Pod defines different SELinux contexts for its containers that use the same volume. The kubelet cannot start such a Pod and will retry, so the value of this metric may not represent the actual number of Pods. | |||
volume_manager_selinux_pod_context_mismatch_warnings_total | ALPHA | Gauge | Number of cases when a Pod defines different SELinux contexts for its containers that use the same volume. These are not errors yet, but they will become real errors when the SELinuxMountReadWriteOncePod feature is expanded to all volume access modes. | |||
volume_manager_selinux_volume_context_mismatch_errors_total | ALPHA | Gauge | Number of errors when a Pod uses a volume that is already mounted with a different SELinux context than the Pod needs. The kubelet cannot start such a Pod and will retry, so the value of this metric may not represent the actual number of Pods. | |||
volume_manager_selinux_volume_context_mismatch_warnings_total | ALPHA | Gauge | Number of cases when a Pod uses a volume that is already mounted with a different SELinux context than the Pod needs. These are not errors yet, but they will become real errors when the SELinuxMountReadWriteOncePod feature is expanded to all volume access modes. | |||
volume_manager_selinux_volumes_admitted_total | ALPHA | Gauge | Number of volumes whose SELinux context was fine and will be mounted with mount -o context option. | |||
volume_manager_total_volumes | ALPHA | Custom | Number of volumes in Volume Manager | plugin_name state |
||
volume_operation_total_errors | ALPHA | Counter | Total volume operation errors | operation_name plugin_name |
||
volume_operation_total_seconds | ALPHA | Histogram | Storage operation end to end duration in seconds | operation_name plugin_name |
||
watch_cache_capacity | ALPHA | Gauge | Total capacity of watch cache broken down by resource type. | resource |
||
watch_cache_capacity_decrease_total | ALPHA | Counter | Total number of watch cache capacity decrease events broken down by resource type. | resource |
||
watch_cache_capacity_increase_total | ALPHA | Counter | Total number of watch cache capacity increase events broken down by resource type. | resource |
||
workqueue_adds_total | ALPHA | Counter | Total number of adds handled by workqueue | name |
||
workqueue_depth | ALPHA | Gauge | Current depth of workqueue | name |
||
workqueue_longest_running_processor_seconds | ALPHA | Gauge | How many seconds the longest running processor for workqueue has been running. | name |
||
workqueue_queue_duration_seconds | ALPHA | Histogram | How long in seconds an item stays in workqueue before being requested. | name |
||
workqueue_retries_total | ALPHA | Counter | Total number of retries handled by workqueue | name |
||
workqueue_unfinished_work_seconds | ALPHA | Gauge | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. | name |
||
workqueue_work_duration_seconds | ALPHA | Histogram | How long in seconds processing an item from workqueue takes. | name |