Deploying Prometheus Operator to Monitor a Kubernetes Cluster


What is Prometheus Operator?

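Prometheus Operator provides Kubernetes-native deployment and management of Prometheus and its related monitoring components, covering node status and resource usage. It handles both monitoring and alerting.
Prometheus Operator simplifies deploying, managing and running Prometheus and Alertmanager clusters on Kubernetes.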

Installing Prometheus Operator

★Preparation★

Download URL: https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.14.0.tar.gz

1. Download the release tarball

wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.14.0.tar.gz
tar xf v0.14.0.tar.gz
cd kube-prometheus-0.14.0/

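Alternatively, clone the GitHub repository directly (note that this checks out the default branch rather than the v0.14.0 tag)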

git clone https://github.com/prometheus-operator/kube-prometheus.git
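
A plain clone may drift away from the v0.14.0 manifests shown in the rest of this post. A minimal sketch of pinning the clone to the same release tag follows; the function name `clone_kube_prometheus` and the target directory naming are illustrative assumptions, not part of the original steps.

```shell
# Hypothetical helper: shallow-clone kube-prometheus at a specific release tag
# so the manifests match the tarball of the same version.
clone_kube_prometheus() {
  tag=${1:-v0.14.0}                      # release tag to check out
  dest=${2:-kube-prometheus-${tag#v}}    # target directory (assumed naming)
  git clone --depth 1 --branch "$tag" \
    https://github.com/prometheus-operator/kube-prometheus.git "$dest"
}

# Usage: clone_kube_prometheus v0.14.0 && cd kube-prometheus-0.14.0/
```

Passing a tag to `--branch` leaves the checkout on a detached HEAD, which is fine for applying manifests but not for committing changes.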

2. Switch to images that can be pulled

Check which image addresses the manifests currently use

grep "image:" manifests/*.yaml
manifests/alertmanager-alertmanager.yaml:  image: quay.io/prometheus/alertmanager:v0.27.0
manifests/blackboxExporter-deployment.yaml:        image: quay.io/prometheus/blackbox-exporter:v0.25.0
manifests/blackboxExporter-deployment.yaml:        image: ghcr.io/jimmidyson/configmap-reload:v0.14.0
manifests/blackboxExporter-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/grafana-deployment.yaml:        image: grafana/grafana:11.3.1
manifests/kubeStateMetrics-deployment.yaml:        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:2.14.0
manifests/kubeStateMetrics-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/kubeStateMetrics-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/nodeExporter-daemonset.yaml:        image: quay.io/prometheus/node-exporter:v1.8.2
manifests/nodeExporter-daemonset.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.18.2
manifests/prometheus-prometheus.yaml:  image: quay.io/prometheus/prometheus:v3.0.1
manifests/prometheusAdapter-deployment.yaml:        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
manifests/prometheusOperator-deployment.yaml:        image: quay.io/prometheus-operator/prometheus-operator:v0.78.2
manifests/prometheusOperator-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.18.2
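
The raw grep output repeats the same image many times. As a rough sketch, the list can be reduced to distinct image references, which makes it easier to spot the registries that need replacing. The helper name `list_images` is an assumption for illustration.

```shell
# Hypothetical helper: print each distinct image referenced under a manifests directory.
list_images() {
  dir=${1:-manifests}
  # -h drops the file-name prefix; the image reference is the last field on the line.
  grep -h 'image:' "$dir"/*.yaml | awk '{ print $NF }' | sort -u
}

# Usage: list_images manifests | grep '^registry.k8s.io/'
```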

Only two images need to be changed: those in kubeStateMetrics-deployment.yaml and prometheusAdapter-deployment.yaml.
Change registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0 to v5cn/prometheus-adapter:v0.12.0
Change registry.k8s.io/kube-state-metrics/kube-state-metrics:2.14.0 to bitnami/kube-state-metrics:2.14.0

sed -i 's@registry.k8s.io/prometheus-adapter/prometheus-adapter@v5cn/prometheus-adapter@' manifests/prometheusAdapter-deployment.yaml
sed -i 's@registry.k8s.io/kube-state-metrics/kube-state-metrics@bitnami/kube-state-metrics@' manifests/kubeStateMetrics-deployment.yaml
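
Because `sed` exits successfully even when its pattern matches nothing, it may be worth confirming that the substitutions actually took effect. The sketch below is one way to do that; `check_no_k8s_registry` is a hypothetical name, and it assumes every image that needs replacing lives under registry.k8s.io.

```shell
# Hypothetical check: succeed only if no manifest still pulls from registry.k8s.io.
check_no_k8s_registry() {
  dir=${1:-manifests}
  # grep prints any offending lines (with file names) and returns 0 when it finds one.
  if grep -H 'image: *registry\.k8s\.io/' "$dir"/*.yaml; then
    echo "some images still point at registry.k8s.io" >&2
    return 1
  fi
  echo "no registry.k8s.io images left"
}

# Usage: check_no_k8s_registry manifests
```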

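3. Change how the grafana, prometheus and alertmanager Services are exposed

Edit manifests/grafana-service.yaml, manifests/prometheus-service.yaml and manifests/alertmanager-service.yaml and set the Service type to NodePort so that the UIs can be reached from outside the cluster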

apiVersion: v1
kind: Service
metadata:
  ......
  ......
spec:
  type: NodePort
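
The same edit can be scripted instead of done by hand. The following is a minimal sketch, assuming GNU sed and the two-space indentation used by the shipped manifests; `set_nodeport` is a hypothetical helper name. The service listing in the testing section shows ports 30030, 30090 and 30093, which suggests a fixed `nodePort` was also set on each port entry; this sketch does not do that, so Kubernetes would assign ports from its NodePort range.

```shell
# Hypothetical helper: set "type: NodePort" directly under the top-level "spec:"
# key of a single-Service manifest. Relies on GNU sed (-i, "\n" in the replacement).
set_nodeport() {
  file=$1
  if grep -q '^  type:' "$file"; then
    # A Service type is already declared: overwrite it.
    sed -i 's/^  type:.*/  type: NodePort/' "$file"
  else
    # No type yet: add one right after "spec:".
    sed -i 's/^spec:$/spec:\n  type: NodePort/' "$file"
  fi
}

# Usage:
#   for svc in grafana prometheus alertmanager; do
#     set_nodeport "manifests/$svc-service.yaml"
#   done
```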

4. Deploy Prometheus Operator

Create the required namespace and CRDs

kubectl create -f manifests/setup
# or
kubectl apply --server-side -f manifests/setup/

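A plain client-side kubectl apply fails here with: The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes. Client-side apply stores the whole object in the last-applied-configuration annotation, and these CRDs are too large for it, so use kubectl create or server-side apply instead.
Next, deploy the manifests for each component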

kubectl apply -f manifests/
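
The two apply steps can be chained into one function. This is a sketch rather than the only valid order of operations: the `kubectl wait` step is an addition not present in the steps above, included so that the second apply does not race against CRDs that are still being registered. The function name `deploy_kube_prometheus` is hypothetical.

```shell
# Sketch of the full deployment sequence, with a wait between the two applies.
deploy_kube_prometheus() {
  dir=${1:-manifests}
  # Server-side apply avoids the oversized last-applied-configuration annotation.
  kubectl apply --server-side -f "$dir/setup" || return 1
  # Block until every CRD reports the Established condition.
  kubectl wait --for condition=Established --all CustomResourceDefinition \
    --namespace=monitoring || return 1
  # Apply the remaining component manifests.
  kubectl apply -f "$dir/"
}

# Usage (from the kube-prometheus directory): deploy_kube_prometheus manifests
```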

Testing

Once the deployment finishes, check that the pods and services are running as expected.

master~# kubectl -n monitoring get pods
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0                   2/2     Running   0          28h
alertmanager-main-1                   2/2     Running   0          28h
alertmanager-main-2                   2/2     Running   0          28h
blackbox-exporter-8495dbc9cc-q98h7    3/3     Running   0          28h
grafana-57ff8f54b7-z77q9              1/1     Running   0          28h
kube-state-metrics-556775b68-58ddd    3/3     Running   0          27h
node-exporter-6wjlg                   2/2     Running   0          28h
prometheus-adapter-6f45cbc95c-mdc86   1/1     Running   0          27h
prometheus-k8s-0                      2/2     Running   0          28h
prometheus-k8s-1                      2/2     Running   0          28h
prometheus-operator-d4c4ff67b-qfg9k   2/2     Running   0          28h
master~# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                         AGE
alertmanager-main       NodePort    192.168.9.96      <none>        9093:30093/TCP,8080:30342/TCP   28h
alertmanager-operated   ClusterIP   None              <none>        9093/TCP,9094/TCP,9094/UDP      28h
blackbox-exporter       ClusterIP   192.168.254.249   <none>        9115/TCP,19115/TCP              28h
grafana                 NodePort    192.168.244.208   <none>        3000:30030/TCP                  28h
kube-state-metrics      ClusterIP   None              <none>        8443/TCP,9443/TCP               28h
node-exporter           ClusterIP   None              <none>        9100/TCP                        28h
prometheus-adapter      ClusterIP   192.168.87.49     <none>        443/TCP                         28h
prometheus-k8s          NodePort    192.168.49.175    <none>        9090:30090/TCP,8080:32268/TCP   28h
prometheus-operated     ClusterIP   None              <none>        9090/TCP                        28h
prometheus-operator     ClusterIP   None              <none>        8443/TCP                        28h
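
Scanning the pod listing by eye works for a handful of pods. As a rough sketch, the check can also be filtered so that only problem pods are printed. `pods_not_ready` is a hypothetical helper that assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the names of
# pods that are not Running or whose ready count differs from the container count.
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")              # READY column, e.g. "2/2"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage: kubectl -n monitoring get pods | pods_not_ready
```

An empty result means every pod in the listing is Running with all of its containers ready.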

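The web UIs of Grafana, Prometheus and Alertmanager can be reached on node ports 30030, 30090 and 30093 respectively (see the figure below).
[Figure: prometheus-operator]
As the figure shows, many monitoring items come preconfigured and can be used as-is.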
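
Before opening a browser, the three endpoints can be probed from a shell. The sketch below only builds the URLs; the NodePorts are taken from the service listing above, the paths are the health endpoints that Grafana, Prometheus and Alertmanager expose, and the helper name `ui_health_urls` and the node address are placeholders.

```shell
# Hypothetical helper: print the health-check URL of each web UI for a given node address.
ui_health_urls() {
  node=$1
  printf 'http://%s:30030/api/health\n' "$node"   # Grafana
  printf 'http://%s:30090/-/healthy\n' "$node"    # Prometheus
  printf 'http://%s:30093/-/healthy\n' "$node"    # Alertmanager
}

# Usage (replace <node-ip> with the address of any cluster node):
#   ui_health_urls <node-ip> | while read -r url; do curl -fsS "$url"; echo; done
```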