Monitoring and Observability
Lesson 7 of 8
Observability Stack (kube-prometheus-stack)
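# Install Prometheus + Grafana via Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=admin   # lab-only password; change it outside a local cluster
# Access (Grafana on http://localhost:3000, Prometheus on http://localhost:9090)
kubectl port-forward -n monitoring svc/kube-prometheus-grafana 3000:80
# prometheus-operated is the Service the operator creates for Prometheus instances;
# the chart's own Service name depends on the release name (list them with: kubectl get svc -n monitoring)
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090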
Cluster Metrics
# Exposed by kube-state-metrics (state of Kubernetes objects):
# - kube_deployment_status_replicas
# - kube_pod_status_ready
# - kube_node_status_condition
# Exposed by the kubelet's cAdvisor endpoint (container resource usage):
# - container_cpu_usage_seconds_total
# - container_memory_working_set_bytes
# Running PromQL queries
kubectl port-forward -n monitoring svc/prometheus-operated 9090
# PromQL: rate(container_cpu_usage_seconds_total[5m])
# PromQL: sum(container_memory_working_set_bytes) by (namespace)
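To make rate(container_cpu_usage_seconds_total[5m]) less abstract, here is a minimal sketch of the arithmetic behind it: the increase of a counter divided by the elapsed seconds. The function name counter_rate and the sample values are illustrative assumptions; the real rate() also handles counter resets and extrapolates to the edges of the window, which this sketch ignores.

```shell
# Per-second increase between two samples of a counter.
# Usage: counter_rate <first_value> <last_value> <seconds_between_samples>
counter_rate() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.2f\n", (b - a) / dt }'
}

# Hypothetical samples: 1200 CPU-seconds at the start of the window,
# 1350 CPU-seconds five minutes (300 s) later.
counter_rate 1200 1350 300   # prints 0.50, i.e. half a CPU core on average
```

Because the counter is measured in CPU-seconds, the result reads directly as "cores in use" over the window.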
Custom Metrics and HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Resource metrics need metrics-server (or another resource metrics API);
    # utilization is a percentage of the pods' resource requests
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Per-pod custom metric; needs a custom metrics API adapter (e.g. prometheus-adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 1000
kubectl get hpa -w
kubectl describe hpa api-hpa
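The replica counts shown by kubectl get hpa follow the documented HPA formula: desired = ceil(currentReplicas * currentMetric / targetMetric), clamped between minReplicas and maxReplicas. The sketch below works through that arithmetic; hpa_desired is a hypothetical helper, and it ignores the controller's tolerance band and stabilization window. With several metrics, as in api-hpa, the controller computes a value per metric and uses the largest.

```shell
# Usage: hpa_desired <current_replicas> <current_metric> <target_metric> <min> <max>
hpa_desired() {
  # Integer ceiling of (replicas * current / target)
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}

# 4 replicas at 90% average CPU against the 70% target, limits 2..20:
hpa_desired 4 90 70 2 20   # prints 6
```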
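Loki: Centralized Logs
# Install Loki via Helm (single-binary mode)
helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  --set deploymentMode=SingleBinary \
  --set loki.commonConfig.replication_factor=1
# Recent chart versions may need more values for this mode (storage, schema, replica counts); check the chart's values.yaml
# Promtail: collects logs from the nodes
# (older chart versions used config.lokiAddress instead of config.clients)
helm upgrade --install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"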
# LogQL (Loki query language)
{namespace="prod", app="api"} |= "ERROR"
{namespace="prod"} |= "timeout" | json | duration > 5s
rate({app="nginx"} |= "404" [5m])
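As a rough local analogy for the queries above: the |= operator keeps lines that contain a literal, case-sensitive substring (much like grep -F), and rate(... [5m]) divides the number of matching lines by the window length in seconds. The helper line_rate and the sample log lines below are made up for illustration; this is not how Loki evaluates queries internally.

```shell
# Count stdin lines containing a literal substring and divide by a window in seconds.
# Usage: line_rate <substring> <window_seconds> < logfile
line_rate() {
  # grep -c exits non-zero when nothing matches, but still prints 0
  count=$(grep -F -c -- "$1" || true)
  awk -v c="$count" -v w="$2" 'BEGIN { printf "%.2f\n", c / w }'
}

# Six hypothetical access-log lines observed over a 60-second window:
printf '%s\n' \
  'GET /a 200' \
  'GET /missing 404' \
  'GET /b 200' \
  'GET /gone 404' \
  'GET /c 500' \
  'GET /old 404' | line_rate 404 60   # prints 0.05 (3 matching lines / 60 s)
```

Note that a plain substring filter also matches "404" inside a path or a byte count, which is one reason to parse the line (for example with | json) and filter on a field when the format allows it.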
Alertmanager
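# values-alertmanager.yaml
alertmanager:
  config:
    global:
      slack_api_url: 'https://hooks.slack.com/services/xxx'
    route:
      group_by: ['namespace', 'alertname']
      receiver: 'slack-prod'
      routes:
        - matchers:
            - severity = "critical"
          receiver: 'pagerduty'
    receivers:
      - name: 'slack-prod'
        slack_configs:
          - channel: '#alerts'
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ .CommonAnnotations.description }}'
      # Every receiver named in a route must be defined; the key below is a placeholder
      - name: 'pagerduty'
        pagerduty_configs:
          - routing_key: 'xxx'
# --reuse-values keeps values set at install time (e.g. grafana.adminPassword)
helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --reuse-values \
  -f values-alertmanager.yaml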
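The routing tree in values-alertmanager.yaml can be read as a simple decision: an alert goes to the first child route whose labels match, and otherwise falls back to the top-level receiver. A minimal sketch of that decision for this specific config follows; pick_receiver is a hypothetical helper, not part of Alertmanager.

```shell
# Usage: pick_receiver <severity_label_value>
pick_receiver() {
  case "$1" in
    critical) echo "pagerduty" ;;    # child route: severity is critical
    *)        echo "slack-prod" ;;   # no child route matched: top-level receiver
  esac
}

pick_receiver critical   # prints pagerduty
pick_receiver warning    # prints slack-prod
```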
Custom Resource Definitions (CRDs)
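# PodMonitor: scrapes metrics from specific pods
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: api-monitor
  labels:
    # By default the chart's Prometheus only selects monitors labeled with its release name
    release: kube-prometheus
spec:
  selector:
    matchLabels:
      app: api
  podMetricsEndpoints:
    - port: metrics   # name of the container port that serves the metrics
      interval: 15s
kubectl get prometheus -n monitoring
kubectl get podmonitors -A
kubectl get servicemonitors -A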
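A ServiceMonitor is the Service-based counterpart of the PodMonitor above: it selects Services by label and scrapes the named port on their endpoints. The sketch below renders one from two parameters; render_servicemonitor is a hypothetical helper, and it assumes the Service carries an app label, exposes a port named metrics, and that Prometheus selects monitors by a release label (the chart default).

```shell
# Usage: render_servicemonitor <app_label> <release_label>
render_servicemonitor() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: $1-monitor
  labels:
    release: $2
spec:
  selector:
    matchLabels:
      app: $1
  endpoints:
    - port: metrics
      interval: 15s
EOF
}

render_servicemonitor api kube-prometheus
```

To apply it, the output can be piped into kubectl apply -f - in the namespace where the Service lives, since a ServiceMonitor looks for Services in its own namespace unless a namespaceSelector is set.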
Observability on K8s = Prometheus (metrics) + Loki (logs) + Grafana (dashboards) + Alertmanager (alerts). kube-prometheus-stack installs Prometheus, Grafana, and Alertmanager in a single release; Loki and Promtail are installed separately. PodMonitors and ServiceMonitors configure metric scraping declaratively.