Deploying Node Feature Discovery on a Kubernetes Cluster to Detect Node Features
Source: cnblogs  Author: 人艰不拆_zmc  Date: 2024/3/15 9:04:15

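1. Overview

Node Feature Discovery (NFD) is a project originally created by Intel that helps a Kubernetes cluster manage node resources more intelligently. It detects the hardware and system features of each node (for example CPU capabilities, kernel version, and PCI devices such as GPUs) and publishes them through the Kubernetes API server (kube-apiserver) as node labels. Pods can then reference these labels in a nodeSelector or node affinity rule, so that the scheduler (kube-scheduler) places them on the nodes best suited to a particular workload.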
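As a rough sketch of how these labels are consumed once NFD is running, the helper below lists the nodes that carry a given NFD label. The function name `nodes_with_feature` and the `KUBECTL` override variable are illustrative assumptions, not part of NFD; the prefix `feature.node.kubernetes.io/` is NFD's default label prefix, and `kubectl get nodes -l` is the standard label-selector query.

```shell
# Allow the kubectl binary to be overridden (e.g. KUBECTL=echo for a dry run).
KUBECTL="${KUBECTL:-kubectl}"

# nodes_with_feature FEATURE [VALUE]
# Hypothetical helper: list nodes whose NFD label FEATURE equals VALUE
# (VALUE defaults to "true"). FEATURE is the label name without the prefix.
nodes_with_feature() {
  $KUBECTL get nodes -l "feature.node.kubernetes.io/$1=${2:-true}"
}
```

For example, `nodes_with_feature cpu-cpuid.AVX2` would list nodes labeled as supporting AVX2, and `KUBECTL=echo nodes_with_feature cpu-cpuid.AVX2` only prints the arguments that would be passed to kubectl.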

GitHub: https://github.com/kubernetes-sigs/node-feature-discovery
Docs: https://kubernetes-sigs.github.io/node-feature-discovery/master/get-started/index.html

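2. Component architecture

NFD consists of two components, NFD-Master and NFD-Worker:

NFD-Master: a Deployment Pod responsible for communicating with the Kubernetes API server. It receives node features from NFD-Worker and updates the Node objects (labels, annotations) accordingly.

NFD-Worker: a DaemonSet Pod that detects the features of the node it runs on and passes the results to NFD-Master. An NFD-Worker should run on every node.

The feature sources that can be detected include: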

  • CPU
  • IOMMU
  • Kernel
  • Memory
  • Network
  • PCI
  • Storage
  • System
  • USB
  • Custom (rule-based custom features)
  • Local (hooks for user-specific features)

3. Installation

(1) Check the cluster node status before installation

    [root@master-10 ~]# kubectl get nodes
    NAME                  STATUS   ROLES                         AGE   VERSION
    master-10.20.31.105   Ready    control-plane,master,worker   31h   v1.21.5

Detailed node information; the labels and annotations are the parts to watch.

    [root@master-10 ~]# kubectl describe nodes master-10.20.31.105
    Name:               master-10.20.31.105
    Roles:              control-plane,master,worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master-10.20.31.105
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/control-plane=
                        node-role.kubernetes.io/master=
                        node-role.kubernetes.io/worker=
                        node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 10.20.31.105
                        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
    Taints:             <none>
    ........

(2) Install the component

    [root@master-10 opt]# kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.14.2
    namespace/node-feature-discovery created
    customresourcedefinition.apiextensions.k8s.io/nodefeaturerules.nfd.k8s-sigs.io created
    customresourcedefinition.apiextensions.k8s.io/nodefeatures.nfd.k8s-sigs.io created
    serviceaccount/nfd-master created
    serviceaccount/nfd-worker created
    role.rbac.authorization.k8s.io/nfd-worker created
    clusterrole.rbac.authorization.k8s.io/nfd-master created
    rolebinding.rbac.authorization.k8s.io/nfd-worker created
    clusterrolebinding.rbac.authorization.k8s.io/nfd-master created
    configmap/nfd-master-conf created
    configmap/nfd-worker-conf created
    service/nfd-master created
    deployment.apps/nfd-master created
    daemonset.apps/nfd-worker created

(3) Check component status

    [root@master-10 opt]# kubectl get pods -n=node-feature-discovery
    NAME                          READY   STATUS    RESTARTS   AGE
    nfd-master-5c4684f5cb-hvjjb   1/1     Running   0          4m11s
    nfd-worker-cpwx6              1/1     Running   0          4m11s

(4) Check component logs

The log shows that nfd-worker runs feature discovery once a minute by default (SleepInterval is 60000000000 ns, i.e. 60 s).

    [root@master-10 ~]# kubectl logs -f -n=node-feature-discovery nfd-worker-rlf5t
    I0314 06:30:32.003264 1 main.go:66] "-server is deprecated, will be removed in a future release along with the deprecated gRPC API"
    I0314 06:30:32.003372 1 nfd-worker.go:219] "Node Feature Discovery Worker" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
    I0314 06:30:32.003589 1 nfd-worker.go:520] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-worker.conf"
    I0314 06:30:32.004500 1 nfd-worker.go:552] "configuration successfully updated" configuration={"Core":{"Klog":{},"LabelWhiteList":{},"NoPublish":false,"FeatureSources":["all"],"Sources":null,"LabelSources":["all"],"SleepInterval":{"Duration":60000000000}},"Sources":{"cpu":{"cpuid":{"attributeBlacklist":["BMI1","BMI2","CLMUL","CMOV","CX16","ERMS","F16C","HTT","LZCNT","MMX","MMXEXT","NX","POPCNT","RDRAND","RDSEED","RDTSCP","SGX","SGXLC","SSE","SSE2","SSE3","SSE4","SSE42","SSSE3","TDX_GUEST"]}},"custom":[],"fake":{"labels":{"fakefeature1":"true","fakefeature2":"true","fakefeature3":"true"},"flagFeatures":["flag_1","flag_2","flag_3"],"attributeFeatures":{"attr_1":"true","attr_2":"false","attr_3":"10"},"instanceFeatures":[{"attr_1":"true","attr_2":"false","attr_3":"10","attr_4":"foobar","name":"instance_1"},{"attr_1":"true","attr_2":"true","attr_3":"100","name":"instance_2"},{"name":"instance_3"}]},"kernel":{"KconfigFile":"","configOpts":["NO_HZ","NO_HZ_IDLE","NO_HZ_FULL","PREEMPT"]},"local":{},"pci":{"deviceClassWhitelist":["03","0b40","12"],"deviceLabelFields":["class","vendor"]},"usb":{"deviceClassWhitelist":["0e","ef","fe","ff"],"deviceLabelFields":["class","vendor","device"]}}}
    I0314 06:30:32.004796 1 metrics.go:70] "metrics server starting" port=8081
    I0314 06:30:32.019135 1 nfd-worker.go:562] "starting feature discovery..."
    I0314 06:30:32.019364 1 nfd-worker.go:577] "feature discovery completed"
    I0314 06:31:32.021520 1 nfd-worker.go:562] "starting feature discovery..."
    I0314 06:31:32.021695 1 nfd-worker.go:577] "feature discovery completed"
    I0314 06:32:32.027970 1 nfd-worker.go:562] "starting feature discovery..."
    I0314 06:32:32.028141 1 nfd-worker.go:577] "feature discovery completed"

The nfd-master log shows that it updates the Node object (labels, annotations) shortly after startup (about one second later in this log) and then once an hour by default (ResyncPeriod is 3600000000000 ns). In other words, if a user accidentally modifies the node's feature labels or annotations by hand, it can take up to an hour before nfd-master corrects them automatically.

    [root@master-10 ~]# kubectl logs -n=node-feature-discovery nfd-master-5c4684f5cb-hvjjb
    I0314 06:23:08.190218 1 nfd-master.go:213] "Node Feature Discovery Master" version="v0.14.2" nodeName="master-10.20.31.105" namespace="node-feature-discovery"
    I0314 06:23:08.190356 1 nfd-master.go:1214] "configuration file parsed" path="/etc/kubernetes/node-feature-discovery/nfd-master.conf"
    I0314 06:23:08.190912 1 nfd-master.go:1274] "configuration successfully updated" configuration=<
        DenyLabelNs: {}
        EnableTaints: false
        ExtraLabelNs: {}
        Klog: {}
        LabelWhiteList: {}
        LeaderElection:
          LeaseDuration:
            Duration: 15000000000
          RenewDeadline:
            Duration: 10000000000
          RetryPeriod:
            Duration: 2000000000
        NfdApiParallelism: 10
        NoPublish: false
        ResourceLabels: {}
        ResyncPeriod:
          Duration: 3600000000000
     >
    I0314 06:23:08.190928 1 nfd-master.go:1338] "starting the nfd api controller"
    I0314 06:23:08.191105 1 node-updater-pool.go:79] "starting the NFD master node updater pool" parallelism=10
    I0314 06:23:08.860810 1 metrics.go:115] "metrics server starting" port=8081
    I0314 06:23:08.861033 1 component.go:36] [core][Server #1] Server created
    I0314 06:23:08.861050 1 nfd-master.go:347] "gRPC server serving" port=8080
    I0314 06:23:08.861084 1 component.go:36] [core][Server #1 ListenSocket #2] ListenSocket created
    I0314 06:23:09.860886 1 nfd-master.go:694] "will process all nodes in the cluster"
    I0314 06:23:09.923362 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
    I0314 07:23:09.224254 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
    I0314 08:23:09.081362 1 nfd-master.go:1086] "node updated" nodeName="master-10.20.31.105"
(5) View the node feature information

NFD has now written the node feature information into the node's labels and annotations. The default label prefix is feature.node.kubernetes.io/.

    [root@master-10 opt]# kubectl describe node master-10.20.31.105
    Name:               master-10.20.31.105
    Roles:              control-plane,master,worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        feature.node.kubernetes.io/cpu-cpuid.ADX=true
                        feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512BW=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512CD=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512DQ=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512F=true
                        feature.node.kubernetes.io/cpu-cpuid.AVX512VL=true
                        feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
                        feature.node.kubernetes.io/cpu-cpuid.FMA3=true
                        feature.node.kubernetes.io/cpu-cpuid.FXSR=true
                        feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true
                        feature.node.kubernetes.io/cpu-cpuid.HLE=true
                        feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
                        feature.node.kubernetes.io/cpu-cpuid.LAHF=true
                        feature.node.kubernetes.io/cpu-cpuid.MOVBE=true
                        feature.node.kubernetes.io/cpu-cpuid.MPX=true
                        feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true
                        feature.node.kubernetes.io/cpu-cpuid.RTM=true
                        feature.node.kubernetes.io/cpu-cpuid.SYSCALL=true
                        feature.node.kubernetes.io/cpu-cpuid.SYSEE=true
                        feature.node.kubernetes.io/cpu-cpuid.X87=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVE=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVEC=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT=true
                        feature.node.kubernetes.io/cpu-cpuid.XSAVES=true
                        feature.node.kubernetes.io/cpu-hardware_multithreading=false
                        feature.node.kubernetes.io/cpu-model.family=6
                        feature.node.kubernetes.io/cpu-model.id=85
                        feature.node.kubernetes.io/cpu-model.vendor_id=Intel
                        feature.node.kubernetes.io/kernel-config.NO_HZ=true
                        feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true
                        feature.node.kubernetes.io/kernel-version.full=3.10.0-1160.105.1.el7.x86_64
                        feature.node.kubernetes.io/kernel-version.major=3
                        feature.node.kubernetes.io/kernel-version.minor=10
                        feature.node.kubernetes.io/kernel-version.revision=0
                        feature.node.kubernetes.io/pci-0300_15ad.present=true
                        feature.node.kubernetes.io/system-os_release.ID=centos
                        feature.node.kubernetes.io/system-os_release.VERSION_ID=7
                        feature.node.kubernetes.io/system-os_release.VERSION_ID.major=7
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master-10.20.31.105
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/control-plane=
                        node-role.kubernetes.io/master=
                        node-role.kubernetes.io/worker=
                        node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"c6:fb:4b:8a:bb:12"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 10.20.31.105
                        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                        nfd.node.kubernetes.io/feature-labels:
                          cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.AVX512BW,cpu-cpuid.AVX512CD,cpu-cpuid.AVX512DQ,cpu-cpuid.AVX512F,cpu-...
                        nfd.node.kubernetes.io/master.version: v0.14.2
                        nfd.node.kubernetes.io/worker.version: v0.14.2
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Tue, 12 Mar 2024 21:01:31 -0400
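With this many labels, it helps to pull out only the NFD ones and see which feature source each came from. The sketch below filters text on stdin for the default prefix and groups by the part of the label name before the first "-" (cpu, kernel, pci, system in the output above). The function names `nfd_labels` and `nfd_label_sources` are illustrative assumptions, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
# nfd_labels
# Read text on stdin (for example `kubectl describe node` output) and print
# every label that uses the default NFD prefix, with the prefix stripped.
nfd_labels() {
  grep -o 'feature\.node\.kubernetes\.io/[^[:space:],]*' |
    sed 's|^feature\.node\.kubernetes\.io/||'
}

# nfd_label_sources
# Count NFD labels per feature source, i.e. the text before the first "-".
nfd_label_sources() {
  nfd_labels |
    awk -F- '{ count[$1]++ } END { for (s in count) print s, count[s] }' |
    sort
}
```

A typical use would be `kubectl describe node master-10.20.31.105 | nfd_label_sources`, which prints one line per source with the number of labels it produced.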

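4. Use cases

The main use of Node Feature Discovery (NFD) is to make node scheduling in a Kubernetes cluster hardware-aware. Some common use cases:

  1. Smarter node scheduling: NFD gives the scheduler more information about node features and resources, so it can pick the node best suited to a particular workload. For example, a Pod that needs a GPU can reference NFD labels in a nodeSelector or node affinity rule so that it is placed only on nodes with a suitable device.

  2. Resource constraints and optimization: exposing node features as labels gives cluster administrators a clearer view of what each node offers, which helps when defining resource constraints and tuning placement.

  3. Hardware-aware workload scheduling: some workloads need a specific type or configuration of hardware. NFD labels let the scheduler place them on nodes that have it.

  4. Cluster scalability and performance: placing workloads on appropriate nodes can improve overall performance and efficiency, avoid wasted resources, and let workloads make full use of the available hardware.

  5. Cluster automation: NFD can be integrated into automated workflows such as automated deployment or scaling. Automation that reads NFD labels has a better picture of node features and resources when deciding what to do.

In short, NFD makes a Kubernetes cluster better at matching workloads to node features, which can improve performance, reliability, and efficiency.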
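To make the scheduling use case concrete, here is a minimal sketch that writes a Pod manifest pinned to nodes with AVX-512F support through a nodeSelector on an NFD label. The label key is one of those shown in the describe output earlier; the Pod name `avx512-demo`, the `busybox:1.36` image, and the output path are hypothetical placeholders chosen for illustration.

```shell
# Write a Pod manifest that is only schedulable onto nodes carrying the
# NFD label for AVX-512F. Pod name, image, and path are placeholders.
manifest="${TMPDIR:-/tmp}/avx512-demo-pod.yaml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: avx512-demo
spec:
  nodeSelector:
    feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"
  containers:
    - name: main
      image: busybox:1.36
      command: ["sleep", "3600"]
EOF

echo "wrote $manifest"
```

The manifest could then be submitted with `kubectl apply -f "$manifest"`. If no node carries the label, the Pod stays Pending, which is the expected behavior for a nodeSelector that matches nothing.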

 

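5. Summary

If your Kubernetes cluster needs to schedule workloads according to node hardware features, or workloads need to detect and use specific hardware resources, installing Node Feature Discovery (NFD) is worthwhile. If all nodes in the cluster have similar hardware and hardware differences do not matter, NFD is not needed.

Original article: https://www.cnblogs.com/zhangmingcheng/p/18072751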
