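Deploying Prometheus Operator to Monitor a Kubernetes Cluster
What is Prometheus Operator?
Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and its related monitoring components, covering node status and resource usage. It handles both metrics collection and alerting.
It simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.
Installing Prometheus Operator
★ Preparation ★
Download: https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.14.0.tar.gz
1. Download the release package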
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.14.0.tar.gz
tar xf v0.14.0.tar.gz
cd kube-prometheus-0.14.0/
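Alternatively, clone the GitHub repository, pinning the v0.14.0 tag so that the manifests match the tarball above:
git clone --branch v0.14.0 https://github.com/prometheus-operator/kube-prometheus.git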
2. Switch to images that can be pulled
Check which image addresses the manifests currently use:
grep "image:" manifests/*.yaml
manifests/alertmanager-alertmanager.yaml: image: quay.io/prometheus/alertmanager:v0.27.0
manifests/blackboxExporter-deployment.yaml: image: quay.io/prometheus/blackbox-exporter:v0.25.0
manifests/blackboxExporter-deployment.yaml: image: ghcr.io/jimmidyson/configmap-reload:v0.14.0
manifests/blackboxExporter-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/grafana-deployment.yaml: image: grafana/grafana:11.3.1
manifests/kubeStateMetrics-deployment.yaml: image: registry.k8s.io/kube-state-metrics/kube-state-metrics:2.14.0
manifests/kubeStateMetrics-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/kubeStateMetrics-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/nodeExporter-daemonset.yaml: image: quay.io/prometheus/node-exporter:v1.8.2
manifests/nodeExporter-daemonset.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/prometheus-prometheus.yaml: image: quay.io/prometheus/prometheus:v3.0.1
manifests/prometheusAdapter-deployment.yaml: image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
manifests/prometheusOperator-deployment.yaml: image: quay.io/prometheus-operator/prometheus-operator:v0.78.2
manifests/prometheusOperator-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.18.2
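The grep output above lists some images several times. As a small sketch, the helper below reduces it to one line per distinct image, which makes it easier to see which registries are involved. The function name list_images is made up for this example and is not part of kube-prometheus; it assumes every image reference sits on the same line as its image: key.

```shell
# list_images [DIR]
# Prints each distinct container image referenced by the YAML files in DIR
# (default: manifests). Hypothetical helper, not shipped with kube-prometheus.
list_images() {
  dir="${1:-manifests}"
  # -h drops the file-name prefix; awk keeps the last field (the image
  # reference); sort -u removes the duplicates.
  grep -h 'image:' "$dir"/*.yaml | awk '{print $NF}' | sort -u
}
```

Running list_images from the kube-prometheus-0.14.0 directory should show the two registry.k8s.io entries that the next step replaces.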
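Only two images need to be changed: the ones in kubeStateMetrics-deployment.yaml and prometheusAdapter-deployment.yaml, both of which are pulled from registry.k8s.io, a registry that may be unreachable from some networks.
Change registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0 to v5cn/prometheus-adapter:v0.12.0
Change registry.k8s.io/kube-state-metrics/kube-state-metrics:2.14.0 to bitnami/kube-state-metrics:2.14.0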
sed -i 's@registry.k8s.io/prometheus-adapter/prometheus-adapter@v5cn/prometheus-adapter@' manifests/prometheusAdapter-deployment.yaml
sed -i 's@registry.k8s.io/kube-state-metrics/kube-state-metrics@bitnami/kube-state-metrics@' manifests/kubeStateMetrics-deployment.yaml
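sed exits successfully even when its pattern matches nothing, so a typo in the old image path goes unnoticed. The sketch below wraps the same substitution in a check. The helper name swap_image is an assumption for illustration, and it relies on GNU sed's -i option as used above.

```shell
# swap_image FILE OLD_REPO NEW_REPO
# Rewrites the repository part of an image reference in place (GNU sed) and
# leaves the ":tag" suffix untouched. Returns non-zero if NEW_REPO does not
# appear on an image line afterwards. Hypothetical helper for illustration.
swap_image() {
  file="$1"; old="$2"; new="$3"
  # "@" is the sed delimiter, so the slashes in image paths need no escaping.
  sed -i "s@${old}@${new}@" "$file"
  # Confirm the substitution actually took effect.
  grep -q "image: ${new}" "$file"
}
```

For example, swap_image manifests/prometheusAdapter-deployment.yaml registry.k8s.io/prometheus-adapter/prometheus-adapter v5cn/prometheus-adapter performs the first replacement and reports failure if the manifest still points elsewhere.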
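3. Expose the grafana, prometheus, and alertmanager services
Edit manifests/grafana-service.yaml, manifests/prometheus-service.yaml, and manifests/alertmanager-service.yaml and set the service type to NodePort so that the services can be reached from outside the cluster. If no nodePort is set on a port entry, Kubernetes picks a free port from its NodePort range (30000-32767 by default); fixed ports such as the 30030, 30090, and 30093 shown in the testing section require an explicit nodePort on the corresponding port entry.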
apiVersion: v1
kind: Service
metadata:
  ......
  ......
spec:
  type: NodePort
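Editing three files by hand is easy to get wrong, so here is a minimal sketch that makes the same change with awk. The helper name set_nodeport_type is hypothetical, and the sketch assumes two-space indentation and a manifest that has no existing type field under spec.

```shell
# set_nodeport_type FILE
# Adds "type: NodePort" directly under the top-level "spec:" key of a Service
# manifest. Hypothetical helper; assumes two-space indentation and that the
# manifest does not already define a different "type:" under spec.
set_nodeport_type() {
  file="$1"
  # Do nothing if the field is already present, so the helper can be re-run.
  grep -q '^  type: NodePort$' "$file" && return 0
  # Print every line and emit the new field right after "spec:".
  awk '{ print } /^spec:$/ { print "  type: NodePort" }' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

It could be called once per file, for example set_nodeport_type manifests/grafana-service.yaml, and then repeated for the prometheus and alertmanager service manifests.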
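4. Deploy Prometheus Operator
Create the required namespace and CRDs:
kubectl create -f manifests/setup
# or
kubectl apply --server-side -f manifests/setup/
A plain client-side kubectl apply fails here with: The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes. Client-side apply stores the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation, and these CRDs are too large to fit; kubectl create and server-side apply do not write that annotation.
Then deploy the remaining manifests: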
kubectl apply -f manifests/
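The second apply depends on the CRDs from manifests/setup already being registered. The sketch below chains the steps and adds the kubectl wait call from the upstream kube-prometheus quickstart between them. The function name and the KUBECTL override variable are assumptions added for this example; setting KUBECTL=echo prints the commands instead of running them.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands
# without touching a cluster; it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# deploy_kube_prometheus: hypothetical wrapper, run from the repository root.
deploy_kube_prometheus() {
  # 1. Namespace and CRDs; server-side apply avoids the oversized annotation.
  "$KUBECTL" apply --server-side -f manifests/setup/ || return 1
  # 2. Block until every CRD reports the Established condition.
  "$KUBECTL" wait --for condition=Established --all CustomResourceDefinition \
    --namespace=monitoring || return 1
  # 3. The remaining manifests, which depend on those CRDs.
  "$KUBECTL" apply -f manifests/
}
```

Stopping at the first failing step keeps a CRD problem from being buried under the errors the final apply would otherwise produce.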
Testing
Once the deployment has finished, check the status and details of the running services.
master~# kubectl -n monitoring get pods
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          28h
alertmanager-main-1                   2/2     Running   0          28h
alertmanager-main-2                   2/2     Running   0          28h
blackbox-exporter-8495dbc9cc-q98h7    3/3     Running   0          28h
grafana-57ff8f54b7-z77q9              1/1     Running   0          28h
kube-state-metrics-556775b68-58ddd    3/3     Running   0          27h
node-exporter-6wjlg                   2/2     Running   0          28h
prometheus-adapter-6f45cbc95c-mdc86   1/1     Running   0          27h
prometheus-k8s-0                      2/2     Running   0          28h
prometheus-k8s-1                      2/2     Running   0          28h
prometheus-operator-d4c4ff67b-qfg9k   2/2     Running   0          28h
master~# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                         AGE
alertmanager-main       NodePort    192.168.9.96      <none>        9093:30093/TCP,8080:30342/TCP   28h
alertmanager-operated   ClusterIP   None              <none>        9093/TCP,9094/TCP,9094/UDP      28h
blackbox-exporter       ClusterIP   192.168.254.249   <none>        9115/TCP,19115/TCP              28h
grafana                 NodePort    192.168.244.208   <none>        3000:30030/TCP                  28h
kube-state-metrics      ClusterIP   None              <none>        8443/TCP,9443/TCP               28h
node-exporter           ClusterIP   None              <none>        9100/TCP                        28h
prometheus-adapter      ClusterIP   192.168.87.49     <none>        443/TCP                         28h
prometheus-k8s          NodePort    192.168.49.175    <none>        9090:30090/TCP,8080:32268/TCP   28h
prometheus-operated     ClusterIP   None              <none>        9090/TCP                        28h
prometheus-operator     ClusterIP   None              <none>        8443/TCP                        28h
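With this many pods it is easy to miss one that has not come up. As a rough sketch, the filter below reads the pod listing and prints only the pods that are not fully ready. The name not_ready_pods is made up for this example, and it treats any status other than Running as not ready, which suits the long-running pods in this namespace.

```shell
# not_ready_pods: reads `kubectl get pods --no-headers` output on stdin and
# prints the name of every pod whose READY count is incomplete or whose
# STATUS is not "Running". Hypothetical helper for illustration.
not_ready_pods() {
  awk '{
    split($2, r, "/")   # READY column, e.g. "2/2" -> r[1]="2", r[2]="2"
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}
```

A typical call would be kubectl -n monitoring get pods --no-headers | not_ready_pods; empty output means every pod is ready.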
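The web UIs of Grafana, Prometheus, and Alertmanager can be reached on any node through ports 30030, 30090, and 30093 respectively (as shown in the figure below).

Many monitoring items are already configured out of the box, so they can be used directly.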
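The three UIs can also be probed from a terminal before opening a browser. The sketch below only builds the health-check URLs from the NodePorts listed earlier; the helper name monitoring_urls and the node IP used in the usage note are placeholders, while the paths are the standard health endpoints of Grafana (/api/health), Prometheus, and Alertmanager (/-/healthy).

```shell
# monitoring_urls NODE_IP
# Prints the health-check URL of each web UI, using the NodePorts from the
# service listing (30030, 30090, 30093). Hypothetical helper for illustration.
monitoring_urls() {
  node_ip="$1"
  echo "http://${node_ip}:30030/api/health"   # Grafana
  echo "http://${node_ip}:30090/-/healthy"    # Prometheus
  echo "http://${node_ip}:30093/-/healthy"    # Alertmanager
}
```

For example, monitoring_urls 10.0.0.10 | xargs -n1 curl -fsS would query each endpoint in turn, with 10.0.0.10 standing in for the address of one of the cluster nodes.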