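1. Version History

Version  Changes                              Date
v1.0     Initial version                      6/25/2023
v1.1     Updated some formatting and content  4/8/2024
v1.2     Updated some formatting              4/9/2024
v1.3     Updated some content                 5/17/2024
v1.4     Updated some content and formatting  7/17/2024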

2. Introduction

Node Feature Discovery (NFD) is a Kubernetes add-on that is deployed on a k8s cluster to detect hardware features and system configuration.

3. Deployment Example

3.1. Requirements

  • Docker is installed

  • The k8s cluster version is higher than 1.8
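
The version requirement is easy to get wrong when versions are compared as plain text, because "1.24" sorts before "1.8" character by character. The sketch below compares major and minor numerically. The function name version_gt and the sample version "v1.24.0" are illustrative assumptions and are not part of the release package.

```shell
# Succeeds (exit status 0) when version $1 is strictly higher than version $2.
# Only major and minor are compared; inputs are expected in the form
# [v]MAJOR.MINOR[.PATCH].
version_gt() {
    a=${1#v}
    b=${2#v}
    a_major=${a%%.*}
    b_major=${b%%.*}
    a_rest=${a#*.}
    b_rest=${b#*.}
    a_minor=${a_rest%%.*}
    b_minor=${b_rest%%.*}
    if [ "$a_major" -ne "$b_major" ]; then
        [ "$a_major" -gt "$b_major" ]
    else
        [ "$a_minor" -gt "$b_minor" ]
    fi
}

# "v1.24.0" is a made-up sample, not a value taken from this guide.
if version_gt "v1.24.0" "1.8"; then
    echo "cluster version is higher than 1.8"
else
    echo "cluster version is too old for this guide"
fi
```

On a real cluster, the first argument would typically come from the serverVersion.gitVersion field of kubectl version -o json.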

3.2. Building the NFD Component Image

In the topscloud release package, open the NFD directory:

node-feature-discovery_<VERSION>/
├── bin
│   ├── build-from-source.sh
│   ├── nfd-master
│   ├── nfd-topology-updater
│   └── nfd-worker
├── build-image.sh
├── delete.sh
├── deploy.sh
├── docker
│   └── Dockerfile.ubuntu
├── README.md
└── yaml
    └── nfd.yaml

Run the build-image.sh script to build the NFD component image in one step:

node-feature-discovery_<VERSION> # ./build-image.sh
1. Clear old image if exist
Untagged: artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3
Deleted: sha256:a88067635e8ec9f5535e06c26e74d4c1b0e45558d195c446b6cd79df7d7725c5
artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3
2. Build image start...
image name:artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery, image version:v0.11.3
Sending build context to Docker daemon   87.2MB
Step 1/6 : FROM ubuntu:18.04
 ---> f9a80a55f492
Step 2/6 : WORKDIR .
 ---> Running in 6fbf881948e7
Removing intermediate container 6fbf881948e7
 ---> a8fb7fec5d0c
Step 3/6 : ENV GRPC_GO_LOG_SEVERITY_LEVEL="INFO"
 ---> Running in f1eb62b438c4
Removing intermediate container f1eb62b438c4
 ---> 144ccf94017e
Step 4/6 : COPY ./bin/nfd-master /usr/bin/
 ---> 0c8b53b99841
Step 5/6 : COPY ./bin/nfd-topology-updater /usr/bin/
 ---> 98deeef25ef1
Step 6/6 : COPY ./bin/nfd-worker /usr/bin/
 ---> 0deb80d00ff6
Successfully built 0deb80d00ff6
Successfully tagged artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3
build image success
3. save image to ./images
unpacking artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3 (sha256:b06cbaf13cb5dd566535244ac6a5e973ee8a9f40e783c93041b69ca714c12111)...done

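3.3. Configuring the YAML File to Filter GFD Labels

The --extra-label-ns=xxx argument in the NFD yaml controls which GFD labels are let through. For example, to allow labels that start with enflame.com, set --extra-label-ns=enflame.com on nfd-master. To allow labels from both the enflame.com namespace and the tke.cloud.tencent.com namespace, separate the two namespaces with a comma, as follows: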

.................
          image: artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3
          name: nfd-master
          command:
            - "nfd-master"
          args:
          #  - "--extra-label-ns=enflame.com"
            - "--extra-label-ns=enflame.com,tke.cloud.tencent.com"
        - env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
..................
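
The effect of the comma-separated list can be pictured with a small shell model. This is a simplified sketch under stated assumptions, not NFD source code: it assumes a label is kept when its namespace (the part before "/") is either the built-in feature.node.kubernetes.io namespace or one of the entries passed to --extra-label-ns. The function name label_allowed and the label enflame.com/example-feature are hypothetical.

```shell
# Simplified model of namespace filtering (an assumption, not NFD's code).
# $1: label key, optionally followed by =value
# $2: comma-separated namespace list, as passed to --extra-label-ns
label_allowed() {
    label_ns=${1%%/*}
    extra_ns=$2
    # Labels in the built-in namespace are always kept in this model.
    if [ "$label_ns" = "feature.node.kubernetes.io" ]; then
        return 0
    fi
    old_ifs=$IFS
    IFS=,
    for ns in $extra_ns; do
        if [ "$ns" = "$label_ns" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# enflame.com/example-feature is a hypothetical label used only for illustration.
if label_allowed "enflame.com/example-feature" "enflame.com,tke.cloud.tencent.com"; then
    echo "label is kept"
else
    echo "label is filtered out"
fi
```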

3.4. Deploying the NFD Component

node-feature-discovery_<VERSION># ./deploy.sh
Install node-feature-discovery start...
serviceaccount/nfd-master created
clusterrole.rbac.authorization.k8s.io/nfd-master created
clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
daemonset.apps/nfd created

3.5. Verifying the NFD Component

Check that the pods are running normally:

node-feature-discovery_<VERSION> # kubectl get pod -A
NAMESPACE     NAME                                    READY   STATUS    RESTARTS      AGE
kube-system   etcd-sse-lg-112-32                      1/1     Running   2 (58d ago)   64d
kube-system   gcu-feature-discovery-cnm4h             1/1     Running   0             19s
kube-system   kube-apiserver-sse-lg-112-32            1/1     Running   2 (58d ago)   64d
kube-system   kube-controller-manager-sse-lg-112-32   1/1     Running   2 (58d ago)   64d
kube-system   kube-proxy-j2gbj                        1/1     Running   2 (58d ago)   64d
kube-system   kube-scheduler-sse-lg-112-32            1/1     Running   2 (58d ago)   63d
kube-system   nfd-fblrn                               2/2     Running   0             7m13s
.........
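
Reading the pod table by eye works for a single node; for scripted checks, the READY and STATUS columns can be parsed instead. The following is a minimal sketch that assumes the column layout of kubectl get pod -A (NAMESPACE, NAME, READY, STATUS, ...). The function name pods_ready is an illustrative choice and is not part of the release package.

```shell
# Reads `kubectl get pod -A` style output on stdin.
# Succeeds only if at least one pod whose name starts with $1 is listed and
# every such pod is Running with all of its containers ready.
pods_ready() {
    awk -v prefix="$1" '
        index($2, prefix) == 1 {
            found = 1
            split($3, ready, "/")
            if (ready[1] != ready[2] || $4 != "Running") bad = 1
        }
        END { exit (found && !bad) ? 0 : 1 }
    '
}

# Sample rows copied from the output above.
if pods_ready "nfd-" <<'EOF'
NAMESPACE     NAME        READY   STATUS    RESTARTS   AGE
kube-system   nfd-fblrn   2/2     Running   0          7m13s
EOF
then
    echo "NFD pods are ready"
else
    echo "NFD pods are not ready"
fi
```

Against a live cluster, the same function could be fed with: kubectl get pod -A | pods_ready nfd-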

Run kubectl describe node to confirm that the node labels have been updated:

Name:         xxxxxxxx
Roles:        control-plane
Labels:       beta.kubernetes.io/arch=amd64
              ...
              feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
              feature.node.kubernetes.io/cpu-rdt.RDTMON=true
              feature.node.kubernetes.io/custom-rdma.available=true
              feature.node.kubernetes.io/custom-rdma.capable=true
              feature.node.kubernetes.io/kernel-version.full=5.4.0-131-generic
              feature.node.kubernetes.io/kernel-version.major=5
              feature.node.kubernetes.io/kernel-version.minor=4
              feature.node.kubernetes.io/kernel-version.revision=0
              feature.node.kubernetes.io/memory-numa=true
              feature.node.kubernetes.io/pci-1a03.present=true
              feature.node.kubernetes.io/pci-1e36.present=true
              feature.node.kubernetes.io/storage-nonrotationaldisk=true
              ...
......................

Alternatively, run kubectl get nodes -o yaml:

    ...
    feature.node.kubernetes.io/cpu-rdt.RDTMBM=true
    feature.node.kubernetes.io/cpu-rdt.RDTMON=true
    feature.node.kubernetes.io/custom-rdma.available=true
    feature.node.kubernetes.io/custom-rdma.capable=true
    feature.node.kubernetes.io/kernel-version.full=5.4.0-131-generic
    feature.node.kubernetes.io/kernel-version.major=5
    feature.node.kubernetes.io/kernel-version.minor=4
    feature.node.kubernetes.io/kernel-version.revision=0
    feature.node.kubernetes.io/memory-numa=true
    feature.node.kubernetes.io/pci-1a03.present=true
    feature.node.kubernetes.io/pci-1e36.present=true
    feature.node.kubernetes.io/storage-nonrotationaldisk=true
    ...
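
When a node carries many labels, it can help to list only those written by NFD. The sketch below assumes the labels arrive as one comma-separated key=value string, which is the form of the LABELS column printed by kubectl get nodes --show-labels. The function name nfd_labels is an illustrative choice.

```shell
# Reads a comma-separated label list on stdin and prints only the labels in
# the feature.node.kubernetes.io namespace, one per line, sorted.
nfd_labels() {
    tr ',' '\n' | grep '^feature\.node\.kubernetes\.io/' | sort
}

# Sample input assembled from labels shown in the output above.
printf '%s\n' 'beta.kubernetes.io/arch=amd64,feature.node.kubernetes.io/memory-numa=true,feature.node.kubernetes.io/kernel-version.major=5' | nfd_labels
```

On a live cluster, the input could come from the last column of kubectl get nodes --show-labels --no-headers.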

3.6. Uninstalling the NFD Component

node-feature-discovery_<VERSION># ./delete.sh
Uninstall node-feature-discovery start...
serviceaccount "nfd-master" deleted
clusterrole.rbac.authorization.k8s.io "nfd-master" deleted
clusterrolebinding.rbac.authorization.k8s.io "nfd-master" deleted
daemonset.apps "nfd" deleted

4. FAQ

1) How do I change the default image path and name?

Customizing the node-feature-discovery image name

The default image path and name in build-image.sh is "artifact.enflame.cn/enflame_docker_images/enflame/node-feature-discovery:v0.11.3", defined as follows:

ORIGIN_NAME="node-feature-discovery"
VERSION="v0.11.3"
REPO="artifact.enflame.cn/enflame_docker_images/enflame"

You can change this image path and name to suit your needs.
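
As an illustration of how the three variables relate to the final image reference, the sketch below assumes they are combined as REPO/ORIGIN_NAME:VERSION, which matches the default path quoted above. The helper image_ref and the registry registry.example.com/my-team are hypothetical; build-image.sh itself may assemble the name differently.

```shell
# Compose an image reference from repository path, image name and version tag.
# Assumption: the layout is REPO/ORIGIN_NAME:VERSION, as in the default path.
image_ref() {
    printf '%s/%s:%s\n' "$1" "$2" "$3"
}

ORIGIN_NAME="node-feature-discovery"
VERSION="v0.11.3"
REPO="registry.example.com/my-team"   # hypothetical private registry path

image_ref "$REPO" "$ORIGIN_NAME" "$VERSION"
```

If the image path is changed, the image: field in yaml/nfd.yaml presumably needs to reference the same path so that the deployment pulls the rebuilt image.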

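2) Where can I find more documentation?

The node-feature-discovery shipped in topscloud is 100% compatible with the open-source version. For further information, see the official documentation:

# Copy into a browser to view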
https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html