Adding a GPU Node to Kubernetes (2) - Adding the GPU Worker Node

천재보단범재 2024. 6. 5. 18:04

Adding the GPU Worker Node

In the previous post, I prepared the GPU node with the setup it needs before it can join a Kubernetes cluster as a Worker node.

Previous post: Adding a GPU Node to Kubernetes (1) - GPU Node Setup (developnote-blog.tistory.com)

In this post, I join the node to the cluster as a Worker node and then apply the additional configuration it needs.

Adding the Worker Node

On a Master node, create a token with the kubeadm command.

Passing the --print-join-command option prints the result as a ready-to-run join command.

$ kubeadm token create --print-join-command
kubeadm join ***.***.***:6443 --token th2fzc.3azjls60bfq879rp --discovery-token-ca-cert-hash sha256:6001a79736375bc109cc6d8933c173f1fa47b1dafc59ed2750ecf1d6b7a773f8

Copy the join command from the output and run it on the Worker node.

Run it as root or with sudo.

More details on adding a node are in the post below.

Adding a Worker Node to a Kubernetes Cluster (developnote-blog.tistory.com)

After the command finishes, check the new node from a Master node with kubectl.

$ kubectl get no
NAME             STATUS   ROLES           AGE    VERSION
k8sm01.***.***   Ready    control-plane   70d    v1.30.0
k8sm02.***.***   Ready    control-plane   327d   v1.30.0
k8sm03.***.***   Ready    control-plane   327d   v1.30.0
k8sw01.***.***   Ready    <none>          327d   v1.30.0
k8sw02.***.***   Ready    <none>          255d   v1.30.0
k8sw03.***.***   Ready    <none>          327d   v1.30.0
k8sw04.***.***   Ready    <none>          1d     v1.30.0

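Configuring Taints

Once the GPU node has joined as a Worker node, set a taint on it.

To make the most of the GPU resources, the node should run only GPU workloads, not general ones.

The taint ensures that only GPU workload Pods can be scheduled on the GPU node.

Taints and tolerations together control which Pods may be scheduled on a node.

Taints

"Taint" literally means contamination.

Ordinary Pods cannot be placed on a tainted node.

Only Pods that can tolerate the taint can be scheduled there.

Taints are applied to reserve a node for specific Pods.

The command below applies a taint to a node.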

$ kubectl taint nodes node1 key1=value1:NoSchedule

$ kubectl describe nodes node1
...
...
Taints:             key1=value1:NoSchedule
...
...

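Ordinary Pods are no longer scheduled on a node tainted this way. The NoSchedule effect applies only to new scheduling decisions; Pods already running on the node are not evicted.

Tolerations

"Toleration" literally means tolerance or allowance.

Tolerations are applied to Pods.

A Pod whose toleration matches a node's taint can be scheduled on that node even though it is tainted.

Tolerations are set in the Pod spec.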

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
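To make the matching rule concrete, here is a minimal Python sketch of how a toleration is compared with a taint. It is a simplified model based on the behavior described in the Kubernetes documentation, not the scheduler's actual code. The class and function names (Taint, Toleration, tolerates, schedulable) are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes treats an omitted operator as Equal
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint (simplified model)."""
    # A non-empty effect must be identical to the taint's effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # A non-empty key must be identical to the taint's key.
    # (The API server only accepts an empty key together with Exists;
    # this sketch does not validate that.)
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True                       # value is ignored
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value   # values must be equal
    return False


def schedulable(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A Pod fits a node only if every hard taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


node_taints = [Taint(key="key1", value="value1", effect="NoSchedule")]
equal_tol = [Toleration(key="key1", operator="Equal", value="value1", effect="NoSchedule")]
exists_tol = [Toleration(key="key1", operator="Exists")]

print(schedulable(equal_tol, node_taints))   # True
print(schedulable(exists_tol, node_taints))  # True
print(schedulable([], node_taints))          # False
```

The last call shows why an ordinary Pod without any toleration stays off the tainted node.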

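That is a brief summary of taints and tolerations.

Since this post is about adding a GPU node, I will cover them in more detail another time.

Set a taint on the GPU node and add the matching toleration to the GPU workload Pods.

Then only the GPU workload Pods that carry the toleration can be scheduled on the GPU node.

I applied the taint as follows.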

$ kubectl taint nodes k8sw04.***.*** nvidia.com/gpu:NoSchedule

$ kubectl describe no k8sw04.***.*** | grep Taint
Taints:             nvidia.com/gpu:NoSchedule
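As an illustration of the Pod side, the sketch below builds a Pod manifest that tolerates the nvidia.com/gpu:NoSchedule taint and requests one GPU. The Pod name and the image example.com/gpu-app:latest are hypothetical placeholders, and gpu_pod_manifest is a helper invented for this example. The nvidia.com/gpu resource limit can only be satisfied once nvidia-device-plugin, installed later in this post, advertises that resource on the node.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest for a GPU workload (illustrative helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The taint above has no value, so Exists is the natural operator.
            "tolerations": [
                {
                    "key": "nvidia.com/gpu",
                    "operator": "Exists",
                    "effect": "NoSchedule",
                }
            ],
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image, replace with your own
                    # Extended resources such as GPUs are requested via limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("gpu-test", "example.com/gpu-app:latest")
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed manifest could be saved to a file and passed to kubectl apply -f.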

Installing nvidia-device-plugin

The taint now ensures that only GPU workload Pods are scheduled on the node.

The next step is installing nvidia-device-plugin.

This plugin is required to use GPUs in Kubernetes.

It runs as a DaemonSet, exposes the number of GPUs on each node, and makes those GPUs available to Pods.

GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes (github.com)

The plugin can also be installed with Helm, but here I install it from the DaemonSet YAML file.

The file can be downloaded from the nvidia-device-plugin GitHub repository.

$ wget https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
--2024-07-02 08:45:03--  https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1894 (1.8K) [text/plain]
Saving to: ‘nvidia-device-plugin.yml’

nvidia-device-plugin.yml                                        100%[===============================================================================================================================================================>]   1.85K  --.-KB/s    in 0s

2024-07-02 08:45:03 (23.9 MB/s) - ‘nvidia-device-plugin.yml’ saved [1894/1894]


$ cat nvidia-device-plugin.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
        name: nvidia-device-plugin-ctr
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins

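The YAML file already includes a toleration in its spec.

One more setting needs to be added here.

The Kubernetes cluster I am installing on has 3 Master nodes and 4 Worker nodes.

Only one of the Worker nodes is a GPU node; the rest are CPU nodes.

The nvidia-device-plugin DaemonSet only needs to run on the GPU node.

I will use an affinity setting so that the DaemonSet is scheduled only on the GPU node.

Affinity Settings

To use affinity, first put a label on the GPU node.

The command below adds the label.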

$ kubectl label node k8sw04.***.*** nvidia.com/gpu=true 
node/k8sw04.***.*** labeled

Add the affinity below to the nvidia-device-plugin DaemonSet YAML, under spec.template.spec.

With this affinity, the DaemonSet Pods are scheduled only on nodes that carry the label.

affinity: 
  nodeAffinity: 
    requiredDuringSchedulingIgnoredDuringExecution: 
      nodeSelectorTerms: 
      - matchExpressions: 
        - key: nvidia.com/gpu 
          operator: In 
          values: 
          - "true"
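The following Python sketch models how a required node affinity like the one above is evaluated against node labels. It is a simplified illustration of the documented semantics (terms are ORed, expressions inside a term are ANDed) and covers only the In, NotIn, Exists, and DoesNotExist operators. The function names and the node label sets are hypothetical.

```python
def expression_matches(expr: dict, labels: dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {op}")


def node_matches(terms: list[dict], labels: dict[str, str]) -> bool:
    """nodeSelectorTerms are ORed; expressions within one term are ANDed."""
    for term in terms:
        exprs = term.get("matchExpressions", [])
        # An empty term selects nothing, so it must not count as a match.
        if exprs and all(expression_matches(e, labels) for e in exprs):
            return True
    return False


# Same rule as the affinity above: label nvidia.com/gpu must be "true".
terms = [
    {
        "matchExpressions": [
            {"key": "nvidia.com/gpu", "operator": "In", "values": ["true"]}
        ]
    }
]

gpu_node_labels = {"kubernetes.io/hostname": "gpu-node", "nvidia.com/gpu": "true"}
cpu_node_labels = {"kubernetes.io/hostname": "cpu-node"}

print(node_matches(terms, gpu_node_labels))  # True
print(node_matches(terms, cpu_node_labels))  # False
```

Only the node that carries the nvidia.com/gpu=true label satisfies the rule, which is why the label has to be added before the DaemonSet is deployed.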

With the affinity in place, deploy the nvidia-device-plugin DaemonSet.

$ kubectl apply -f nvidia-device-plugin.yml

$ kubectl get po -n kube-system -o wide
NAME                                          READY   STATUS        RESTARTS           AGE   IP               NODE                  NOMINATED NODE   READINESS GATES
coredns-76f75df574-5kthg                      1/1     Running       0                  51d   10.244.1.50      k8sm02.***.***        <none>           <none>
coredns-76f75df574-zg9qk                      1/1     Running       0                  68d   10.244.8.3       k8sm01.***.***        <none>           <none>
etcd-k8sm01.***.***                           1/1     Running       1 (68d ago)        68d   192.168.218.61   k8sm01.***.***        <none>           <none>
etcd-k8sm02.***.***                           1/1     Running       1 (68d ago)        68d   192.168.218.62   k8sm02.***.***        <none>           <none>
etcd-k8sm03.***.***                           1/1     Running       1 (50d ago)        68d   192.168.218.63   k8sm03.***.***        <none>           <none>
fluentd-72rfs                                 1/1     Running       746 (3h54m ago)    50d   10.244.3.152     k8sw02.***.***        <none>           <none>
fluentd-b8gsq                                 1/1     Running       1359 (3h54m ago)   50d   10.244.5.182     k8sw01.***.***        <none>           <none>
fluentd-cbx62                                 1/1     Running       19 (3h54m ago)     50d   10.244.8.15      k8sm01.***.***        <none>           <none>
fluentd-md92z                                 1/1     Running       19 (3h54m ago)     50d   10.244.1.51      k8sm02.***.***        <none>           <none>
fluentd-vd6hx                                 1/1     Running       17 (3h52m ago)     49d   10.244.0.2       k8sw04.***.***        <none>           <none>
fluentd-wg5lp                                 1/1     Running       33 (3h53m ago)     50d   10.244.4.79      k8sw03.***.***        <none>           <none>
fluentd-xpcwg                                 1/1     Running       17                 50d   10.244.2.17      k8sm03.***.***        <none>           <none>
kube-apiserver-k8sm01.***.***                 1/1     Running       1 (68d ago)        68d   192.168.218.61   k8sm01.***.***        <none>           <none>
kube-apiserver-k8sm02.***.***                 1/1     Running       1 (68d ago)        68d   192.168.218.62   k8sm02.***.***        <none>           <none>
kube-apiserver-k8sm03.***.***                 1/1     Running       2 (50d ago)        68d   192.168.218.63   k8sm03.***.***        <none>           <none>
kube-controller-manager-k8sm01.***.***        1/1     Running       2 (13d ago)        68d   192.168.218.61   k8sm01.***.***        <none>           <none>
kube-controller-manager-k8sm02.***.***        1/1     Running       1 (68d ago)        68d   192.168.218.62   k8sm02.***.***        <none>           <none>
kube-controller-manager-k8sm03.***.***        1/1     Running       2 (50d ago)        68d   192.168.218.63   k8sm03.***.***        <none>           <none>
kube-proxy-cbpl5                              1/1     Running       1 (50d ago)        68d   192.168.218.63   k8sm03.***.***        <none>           <none>
kube-proxy-hm2h2                              1/1     Running       3 (28d ago)        68d   192.168.218.64   k8sw01.***.***        <none>           <none>
kube-proxy-lfc2l                              1/1     Running       1 (50d ago)        68d   192.168.218.66   k8sw03.***.***        <none>           <none>
kube-proxy-ln2r9                              1/1     Running       0                  68d   192.168.218.62   k8sm02.***.***        <none>           <none>
kube-proxy-m8d7d                              1/1     Running       2 (50d ago)        68d   192.168.218.65   k8sw02.***.***        <none>           <none>
kube-proxy-s5ngr                              1/1     Running       0                  49d   192.168.218.67   k8sw04.***.***        <none>           <none>
kube-proxy-z5csm                              1/1     Running       0                  68d   192.168.218.61   k8sm01.***.***        <none>           <none>
kube-scheduler-k8sm01.***.***                 1/1     Running       1 (68d ago)        68d   192.168.218.61   k8sm01.***.***        <none>           <none>
kube-scheduler-k8sm02.***.***                 1/1     Running       2 (13d ago)        68d   192.168.218.62   k8sm02.***.***        <none>           <none>
kube-scheduler-k8sm03.***.***                 1/1     Running       2 (50d ago)        68d   192.168.218.63   k8sm03.***.***        <none>           <none>
metrics-server-64b67757d9-8svcc               0/1     Terminating   4 (18d ago)        50d   192.168.218.65   k8sw02.***.***        <none>           <none>
metrics-server-64b67757d9-9n5g8               1/1     Running       0                  18d   192.168.218.64   k8sw01.***.***        <none>           <none>
metrics-server-75bf74fc9-88fc8                0/1     Running       3 (15h ago)        30d   192.168.218.66   k8sw03.***.***        <none>           <none>
nvidia-device-plugin-daemonset-bxglp          1/1     Running       0                  49d   10.244.0.11      k8sw04.***.***        <none>           <none>

The DaemonSet Pod was scheduled only on the labeled GPU node.

The logs of the DaemonSet Pod look like this.

$ kubectl logs -n kube-system nvidia-device-plugin-daemonset-bxglp 
I0514 06:15:21.157807       1 main.go:178] Starting FS watcher. 
I0514 06:15:21.157970       1 main.go:185] Starting OS watcher. 
I0514 06:15:21.158407       1 main.go:200] Starting Plugins. 
I0514 06:15:21.158459       1 main.go:257] Loading configuration. 
I0514 06:15:21.159637       1 main.go:265] Updating config with default resource matching patterns. 
I0514 06:15:21.160131       1 main.go:276] 
Running with config: 
{ 
  "version": "v1", 
  "flags": { 
    "migStrategy": "none", 
    "failOnInitError": false, 
    "mpsRoot": "", 
    "nvidiaDriverRoot": "/", 
    "gdsEnabled": false, 
    "mofedEnabled": false, 
    "useNodeFeatureAPI": null, 
    "plugin": { 
      "passDeviceSpecs": false, 
      "deviceListStrategy": [ 
        "envvar" 
      ], 
      "deviceIDStrategy": "uuid", 
      "cdiAnnotationPrefix": "cdi.k8s.io/", 
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk", 
      "containerDriverRoot": "/driver-root" 
    } 
  }, 
  "resources": { 
    "gpus": [ 
      { 
        "pattern": "*", 
        "name": "nvidia.com/gpu" 
      } 
    ] 
  }, 
  "sharing": { 
    "timeSlicing": {} 
  } 
} 
I0514 06:15:21.160151       1 main.go:279] Retrieving plugins. 
I0514 06:15:21.161290       1 factory.go:104] Detected NVML platform: found NVML library 
I0514 06:15:21.161352       1 factory.go:104] Detected non-Tegra platform: /sys/devices/soc0/family file not found 
I0514 06:15:21.205326       1 server.go:216] Starting GRPC server for 'nvidia.com/gpu' 
I0514 06:15:21.206230       1 server.go:147] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock 
I0514 06:15:21.208545       1 server.go:154] Registered device plugin for 'nvidia.com/gpu' with Kubelet

With nvidia-device-plugin running, you can also check the GPU resources currently available on each node.

$ kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" 
NAME             GPU 
k8sm01.***.***   <none> 
k8sm02.***.***   <none> 
k8sm03.***.***   <none> 
k8sw01.***.***   <none> 
k8sw02.***.***   <none> 
k8sw03.***.***   <none> 
k8sw04.***.***   1
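If you want to use this output in a script, a small parser is enough. The sketch below assumes the two-column NAME/GPU layout shown above. The function name gpu_nodes and the node names in the sample are hypothetical.

```python
def gpu_nodes(table: str) -> dict[str, int]:
    """Map node name to allocatable GPU count, skipping nodes without GPUs."""
    result: dict[str, int] = {}
    rows = [line.split() for line in table.strip().splitlines()]
    for row in rows[1:]:              # rows[0] is the NAME/GPU header
        if len(row) != 2:
            continue                  # ignore lines that do not fit the layout
        name, gpu = row
        if gpu.isdigit() and int(gpu) > 0:
            result[name] = int(gpu)   # "<none>" is not a digit and is skipped
    return result


# Sample text in the same layout as the kubectl output (hypothetical names).
sample = """\
NAME       GPU
master-1   <none>
worker-1   <none>
worker-4   1
"""

print(gpu_nodes(sample))  # {'worker-4': 1}
```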

Summary

I added a GPU node to a Kubernetes cluster.

Before joining, I set up nvidia-driver and NVIDIA Container Toolkit on the GPU node; after joining, I configured a taint and installed nvidia-device-plugin.

In the next post, I will look at ways to make use of GPUs in Kubernetes.

[References]

Taints and Tolerations (kubernetes.io)

Integrating GPU Telemetry into Kubernetes - NVIDIA GPU Telemetry 1.0.0 documentation (docs.nvidia.com)

Assigning Pods to Nodes (kubernetes.io)

GitHub - NVIDIA/k8s-device-plugin: NVIDIA device plugin for Kubernetes (github.com)
