Kubernetes GPU 指南教程

2024-09-01 08:59:06作者：齐添朝

This guide should help fellow researchers and hobbyists to easily automate and accelerate there deep leaning training with their own Kubernetes GPU cluster.

项目地址：https://gitcode.com/gh_mirrors/ku/Kubernetes-GPU-Guide

1. 项目的目录结构及介绍

Kubernetes-GPU-Guide/
├── README.md
├── deployments/
│   └── example-gpu-deployment.yaml
└── ...

README.md: 项目的主文档，包含项目的概述、安装指南和使用说明。
deployments/: 包含示例的部署文件，用于在 Kubernetes 集群中部署 GPU 应用。

2. 项目的启动文件介绍

`deployments/example-gpu-deployment.yaml`

该文件是一个示例的 Kubernetes 部署文件，用于在集群中启动一个使用 GPU 的 TensorFlow Jupyter 服务。

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tf-jupyter
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tf-jupyter
    spec:
      volumes:
        - hostPath:
            path: /usr/lib/nvidia-375/bin
          name: bin
        - hostPath:
            path: /usr/lib/nvidia-375
          name: lib
      containers:
        - name: tensorflow
          image: tensorflow/tensorflow:0.11.0rc0-gpu
          ports:
            - containerPort: 8888
          resources:
            limits:
              alpha.kubernetes.io/nvidia-gpu: 1
          volumeMounts:
            - mountPath: /usr/local/nvidia/bin
              name: bin
            - mountPath: /usr/local/nvidia/lib
              name: lib
---
apiVersion: v1
kind: Service
metadata:
  name: tf-jupyter-service
  labels:
    app: tf-jupyter
spec:
  selector:
    app: tf-jupyter
  ports:
    - port: 8888

Deployment: 定义了一个名为 tf-jupyter 的部署，包含一个使用 GPU 的 TensorFlow 容器。
Service: 定义了一个名为 tf-jupyter-service 的服务，用于暴露 TensorFlow Jupyter 服务。

3. 项目的配置文件介绍

`deployments/example-gpu-deployment.yaml`

该文件包含以下主要配置：

volumes: 定义了两个卷，分别挂载主机的 /usr/lib/nvidia-375/bin 和 /usr/lib/nvidia-375 目录。
containers: 定义了一个名为 tensorflow 的容器，使用 tensorflow/tensorflow:0.11.0rc0-gpu 镜像，并配置了 GPU 资源限制。
volumeMounts: 将定义的卷挂载到容器的 /usr/local/nvidia/bin 和 /usr/local/nvidia/lib 目录。

通过这些配置，可以在 Kubernetes 集群中启动一个使用 GPU 的 TensorFlow Jupyter 服务。

Kubernetes-GPU-Guide

This guide should help fellow researchers and hobbyists to easily automate and accelerate there deep leaning training with their own Kubernetes GPU cluster.

项目地址：https://gitcode.com/gh_mirrors/ku/Kubernetes-GPU-Guide

登录后查看全文

Kubernetes GPU 指南教程

1. 项目的目录结构及介绍

2. 项目的启动文件介绍

deployments/example-gpu-deployment.yaml

3. 项目的配置文件介绍

deployments/example-gpu-deployment.yaml

项目优选

`deployments/example-gpu-deployment.yaml`

`deployments/example-gpu-deployment.yaml`