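1. Version History
Version | Changes | Date |
---|---|---|
v1.0 | Initial release | 6/25/2023 |
v1.1 | Updated some formatting and content | 4/8/2024 |
v1.2 | Updated some formatting | 4/9/2024 |
v1.3 | Updated some content | 5/17/2024 |
v1.4 | Updated some content and formatting | 7/17/2024 |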
2. Introduction
Node Feature Discovery (NFD) is a Kubernetes add-on, deployed on a k8s cluster, that detects hardware features and system configuration.
3. Deployment Example
3.1. Requirements
Docker is installed.
The k8s cluster version is higher than 1.8.
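The version requirement above can be checked before going further. The following is a minimal sketch and not part of the release package: the helper name `k8s_version_ok` is hypothetical, it compares only the major and minor numbers, and it treats any 1.8.x release as not satisfying "higher than 1.8", which is an assumption about how the requirement should be read.

```shell
# Hypothetical helper: return 0 when the given Kubernetes version
# (for example "v1.23.4") is higher than 1.8, comparing major.minor only.
k8s_version_ok() {
  ver="${1#v}"               # strip an optional leading "v"
  major="${ver%%.*}"         # text before the first dot
  rest="${ver#*.}"           # text after the first dot
  minor="${rest%%[!0-9]*}"   # leading digits of the minor part
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "${minor:-0}" -gt 8 ]; }
}
```

For example, `command -v docker` confirms that Docker is on the PATH, and the server version reported by `kubectl version` can be passed to `k8s_version_ok`.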
3.2. Building the NFD Component Image
In the topscloud release package, open the NFD directory:
node-feature-discovery_<VERSION>/
├── bin
│   ├── build-from-source.sh
│   ├── nfd-master
│   ├── nfd-topology-updater
│   └── nfd-worker
├── build-image.sh
├── delete.sh
├── deploy.sh
├── docker
│   └── Dockerfile.ubuntu
├── README.md
└── yaml
    └── nfd.yaml
Run the build-image.sh script to build the NFD component image in one step:
node-feature-discovery_<VERSION> # ./build-image.sh
1. Clear old image if exist
Untagged: artifact.enflame.cn/enflame_docker_images/enflame/ \
node-feature-discovery:v0.11.3
Deleted: sha256:a88067635e8ec9f5535e06c26e74d4c1b0e45558d195c446b6cd79df7d7725c5
artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3
2. Build image start...
image name:artifact.enflame.cn/enflame_docker_images/enflame/ \
node-feature-discovery, image version:v0.11.3
Sending build context to Docker daemon 87.2MB
Step 1/6 : FROM ubuntu:18.04
---> f9a80a55f492
Step 2/6 : WORKDIR .
---> Running in 6fbf881948e7
Removing intermediate container 6fbf881948e7
---> a8fb7fec5d0c
Step 3/6 : ENV GRPC_GO_LOG_SEVERITY_LEVEL="INFO"
---> Running in f1eb62b438c4
Removing intermediate container f1eb62b438c4
---> 144ccf94017e
Step 4/6 : COPY ./bin/nfd-master /usr/bin/
---> 0c8b53b99841
Step 5/6 : COPY ./bin/nfd-topology-updater /usr/bin/
---> 98deeef25ef1
Step 6/6 : COPY ./bin/nfd-worker /usr/bin/
---> 0deb80d00ff6
Successfully built 0deb80d00ff6
Successfully tagged artifact.enflame.cn/enflame_docker_images/ \
enflame/node-feature-discovery:v0.11.3
build image success
3. save image to ./images
unpacking artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3 \
(sha256:b06cbaf13cb5dd566535244ac6a5e973ee8a9f40e783c93041b69ca714c12111)...done
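3.3. Configuring the YAML File to Filter GFD Labels
The --extra-label-ns=xxx argument in the NFD YAML file controls which GFD labels are allowed through. For example, to expose labels that begin with enflame.com, set the nfd-master argument --extra-label-ns to --extra-label-ns=enflame.com.
To allow labels from both the enflame.com namespace and the tke.cloud.tencent.com namespace, separate the two namespaces with a comma, as follows: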
.................
image: artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3
name: nfd-master
command:
- "nfd-master"
args:
# - "--extra-label-ns=enflame.com"
- "--extra-label-ns=enflame.com,tke.cloud.tencent.com"
- env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
..................
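To make the effect of the comma-separated value concrete, the sketch below models the namespace check in shell. It is an illustration only and not nfd-master's actual implementation: it assumes that labels in the default feature.node.kubernetes.io namespace are always allowed, that a label's namespace is the text before the first "/", and that matching is exact. The function name `label_ns_allowed` and the label key `enflame.com/example` are hypothetical.

```shell
# Illustrative model only (not NFD source code).
# $1: comma-separated value passed to --extra-label-ns
# $2: label key, for example "enflame.com/example" (hypothetical key)
label_ns_allowed() {
  extra_ns="$1"
  label_ns="${2%%/*}"   # namespace is the part before the first "/"
  # Assumption: the default NFD namespace is always allowed.
  case ",feature.node.kubernetes.io,${extra_ns}," in
    *",${label_ns},"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Under this model, `label_ns_allowed "enflame.com,tke.cloud.tencent.com" "enflame.com/example"` succeeds, while the same label key is rejected when the first argument lists only `tke.cloud.tencent.com`.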
3.4. Deploying the NFD Component
node-feature-discovery_<VERSION># ./deploy.sh
Install node-feature-discovery start...
serviceaccount/nfd-master created
clusterrole.rbac.authorization.k8s.io/nfd-master created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
daemonset.apps/nfd created
3.5. Verifying the NFD Component
Check that the pods are running normally:
node-feature-discovery_<VERSION> # kubectl get pod -A
NAMESPACE NAME \
READY STATUS RESTARTS AGE
kube-system etcd-sse-lg-112-32 \
1/1 Running 2 (58d ago) 64d
kube-system gcu-feature-discovery-cnm4h \
1/1 Running 0 19s
kube-system kube-apiserver-sse-lg-112-32 \
1/1 Running 2 (58d ago) 64d
kube-system kube-controller-manager-sse-lg-112-32 \
1/1 Running 2 (58d ago) 64d
kube-system kube-proxy-j2gbj \
1/1 Running 2 (58d ago) 64d
kube-system kube-scheduler-sse-lg-112-32 \
1/1 Running 2 (58d ago) 63d
kube-system nfd-fblrn \
2/2 Running 0 7m13s
.........
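On a busy cluster the pod list is long, so it can help to filter it. The sketch below reads `kubectl get pod -A` output on standard input and prints any NFD pod that is not fully ready. It assumes the default column order shown above (NAMESPACE, NAME, READY, STATUS) and that NFD pod names start with `nfd-`, as in the sample output; the function name `nfd_pods_not_ready` is hypothetical.

```shell
# Hypothetical helper: list NFD pods whose STATUS is not "Running"
# or whose READY column (for example "2/2") shows missing containers.
nfd_pods_not_ready() {
  awk 'NR > 1 && $2 ~ /^nfd-/ {
         split($3, ready, "/")
         if ($4 != "Running" || ready[1] != ready[2]) print $2
       }'
}
```

Typical use would be `kubectl get pod -A | nfd_pods_not_ready`; empty output means every NFD pod in the list is running with all containers ready.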
Run kubectl describe node to confirm that the node labels have been updated:
Name: xxxxxxxx
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
...
feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
feature.node.kubernetes.io/cpu-rdt.RDTMON=true
feature.node.kubernetes.io/custom-rdma.available=true
feature.node.kubernetes.io/custom-rdma.capable=true
feature.node.kubernetes.io/kernel-version.full=5.4.0-131-generic
feature.node.kubernetes.io/kernel-version.major=5
feature.node.kubernetes.io/kernel-version.minor=4
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/memory-numa=true
feature.node.kubernetes.io/pci-1a03.present=true
feature.node.kubernetes.io/pci-1e36.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
...
......................
Alternatively, run kubectl get nodes -o yaml:
...
feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
feature.node.kubernetes.io/cpu-rdt.RDTMON=true
feature.node.kubernetes.io/custom-rdma.available=true
feature.node.kubernetes.io/custom-rdma.capable=true
feature.node.kubernetes.io/kernel-version.full=5.4.0-131-generic
feature.node.kubernetes.io/kernel-version.major=5
feature.node.kubernetes.io/kernel-version.minor=4
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/memory-numa=true
feature.node.kubernetes.io/pci-1a03.present=true
feature.node.kubernetes.io/pci-1e36.present=true
feature.node.kubernetes.io/storage-nonrotationaldisk=true
...
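Either command prints many unrelated labels. As a small sketch, the filter below keeps only the labels under one namespace from text piped into it. It assumes labels appear as whitespace-separated `key=value` tokens, as in the kubectl describe node output above; the function name `labels_in_ns` is hypothetical.

```shell
# Hypothetical helper: print the whitespace-separated tokens on stdin
# that start with "<namespace>/", where <namespace> is $1.
labels_in_ns() {
  awk -v prefix="$1/" '{
    for (i = 1; i <= NF; i++)
      if (index($i, prefix) == 1) print $i
  }'
}
```

For example, `kubectl describe node | labels_in_ns feature.node.kubernetes.io` would list only the labels in the feature.node.kubernetes.io namespace, and passing `enflame.com` instead would show the labels admitted through --extra-label-ns.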
3.6. Uninstalling the NFD Component
node-feature-discovery_<VERSION># ./delete.sh
Uninstall node-feature-discovery start...
serviceaccount "nfd-master" deleted
clusterrole.rbac.authorization.k8s.io "nfd-master" deleted
clusterrolebinding.rbac.authorization.k8s.io "nfd-master" deleted
daemonset.apps "nfd" deleted
4. FAQ
1) How do I change the default image path and name?
Customizing the node-feature-discovery image name
The default image path and name in build-image.sh is "artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3", set as follows:
ORIGIN_NAME="node-feature-discovery"
VERSION="v0.11.3"
REPO="artifact.enflame.cn/enflame_docker_images/enflame"
You can customize this image path and name to suit your needs.
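The three variables appear to combine into the full image reference as REPO/ORIGIN_NAME:VERSION, which matches the default quoted above. A small sketch of that composition follows, assuming this is how build-image.sh joins them; the function name `image_ref` and the registry `registry.example.com/infra` are hypothetical.

```shell
# Hypothetical helper: compose "<repo>/<name>:<version>".
# $1: repository path, $2: image name, $3: version tag
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Default values from build-image.sh
ORIGIN_NAME="node-feature-discovery"
VERSION="v0.11.3"
REPO="artifact.enflame.cn/enflame_docker_images/enflame"
image_ref "$REPO" "$ORIGIN_NAME" "$VERSION"

# Same image name and version under a hypothetical private registry
image_ref "registry.example.com/infra" "$ORIGIN_NAME" "$VERSION"
```

If the image name changes, the image: field in yaml/nfd.yaml presumably needs to reference the same name so that the DaemonSet pulls the rebuilt image.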
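2) Where can I find more documentation?
The node-feature-discovery shipped in topscloud is 100% compatible with the open-source version. For further details, see the official documentation:
# Copy into a browser to view
https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html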