Platform for AI: GPU sharing

Last Updated: Jul 04, 2025

If you use dedicated resource groups or Lingjun resource quotas to deploy services in Elastic Algorithm Service (EAS) of Platform for AI (PAI), you can enable GPU sharing to increase resource utilization. With GPU sharing, EAS allocates resources to each instance based on the percentage of computing power and the amount of GPU memory that you specify when you deploy or update a service. This topic describes how to configure GPU sharing in the EAS console or by using the EASCMD client, including the key parameters gpu_memory (required) and gpu_core_percentage (optional), so that multiple instances can efficiently share a single GPU within the applicable GPU type and resource allocation constraints.

Configure GPU sharing when you deploy a service

Use the console

  1. Log in to the PAI console, select the target region in the top navigation bar, choose the target workspace from the right sidebar, and then click Enter Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. In the Resource Information section, configure the following key parameters. For more information about other parameters, see Parameters for custom deployment in the console.

    Parameter

    Description

    Resource Type

    Select EAS Resource Group or Resource Quota.

    GPU Sharing

    Select GPU Sharing.

    Note

    The GPU Sharing option becomes available only after you select a dedicated EAS resource group, a virtual resource group, or a resource quota.

    Deployment Resources

    Configure the following parameters:

    • Single-GPU Memory (GB): (Required) the single-GPU memory required by each instance. The value is an integer. Unit: GB. PAI allows memory resources of one GPU to be allocated to multiple instances. The GPU memory of multiple instances is not strictly isolated. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount.

    • Computing Power per GPU (%): (Optional) the computing power of a single GPU required by each instance. The value must be an integer from 1 to 100. For example, if you enter 10, the system allocates 10% computing power of a single GPU to an instance. This facilitates flexible scheduling of computing power and allows multiple instances to share a single GPU.

    Single-GPU Memory and Computing Power per GPU take effect together. For example, if you set Single-GPU Memory to 48 GB and Computing Power per GPU to 10%, each instance can use at most 48 GB of GPU memory and, at the same time, at most 10% of the computing power of a single GPU.
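
    As a worked example, assume a hypothetical GPU with 96 GB of memory: a Single-GPU Memory value of 48 GB allows at most two instances to share that GPU's memory, while a Computing Power per GPU value of 10% would by itself allow up to ten. The number of instances that can share the GPU is bounded by the stricter of the two limits, which is two in this case.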

  4. After you configure the parameters, click Deploy.

Use the on-premises client (EASCMD)

  1. Download the EASCMD client and complete identity authentication. In this example, the Windows 64-bit version is used.

  2. Create a service configuration file named service.json in the directory in which the client is located. Sample content of the configuration file:

    {
        "containers": [
            {
                "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.4",
                "port": 8000,
                "script": "python webui/webui_server.py --port=8000 --model-path=Qwen/Qwen1.5-7B-Chat"
            }
        ],
        "metadata": {
            "cpu": 8,
            "enable_webservice": true,
            "gpu_core_percentage": 5,
            "gpu_memory": 20,
            "instance": 1,
            "memory": 20000,
            "name": "testchatglm",
            "resource": "eas-r-fky7kxiq4l2zzt****",
            "resource_burstable": false
        },
        "name": "test"
    }

    Take note of the following parameters. For information about other parameters, see All Parameters of model services.

    Parameter

    Description

    gpu_memory

    The amount of GPU memory required by each instance. The value must be an integer. Unit: GB.

    PAI allows the memory of one GPU to be allocated to multiple instances. To schedule GPU memory, set the gpu field to 0. If you set the gpu field to 1, the instance occupies the entire GPU and the gpu_memory field is ignored. A combined metadata sketch appears after this table.

    Important

    The GPU memory of multiple instances is not strictly isolated. To prevent out-of-memory (OOM) errors, make sure that the GPU memory used by each instance does not exceed the requested amount.

    gpu_core_percentage

    The percentage of the computing power of a single GPU that is required by each instance. The value must be an integer from 1 to 100. For example, if you set this parameter to 10, the system allocates 10% of the computing power of a single GPU to each instance.

    This facilitates flexible scheduling of computing power and allows multiple instances to share a single GPU. If you configure this parameter, you must also configure the gpu_memory parameter. Otherwise, this parameter does not take effect.
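
    To make the combination concrete, the following minimal metadata sketch pairs the two fields with gpu set to 0 so that memory-based scheduling takes effect. The values are illustrative, and other required service fields, such as name and containers, are omitted:

    {
        "metadata": {
            "gpu": 0,
            "gpu_memory": 20,
            "gpu_core_percentage": 5
        }
    }

    With this configuration, each instance can use at most 20 GB of GPU memory and at most 5% of the computing power of a single GPU.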

  3. Run the following command in the directory in which the JSON file is located to deploy the service. For more information, see Run commands to use the EASCMD client.

    eascmdwin64.exe create service.json
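
    After the command returns, you can verify the deployment. Assuming that your EASCMD version supports the desc subcommand, a command such as the following describes the service state:

    eascmdwin64.exe desc testchatglm

    In this example, testchatglm is the service name specified in the metadata section of the sample configuration file.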

Configure GPU sharing when you update a service

If you did not enable GPU sharing when you deployed a service by using dedicated resource groups, you can enable it by updating the service configuration.

Use the console

  1. On the Elastic Algorithm Service (EAS) page, find the service that you want to update and click Update Service in the Actions column.

  2. In the Resource Deployment section of the Update Service page, configure the Resource Type, GPU Sharing, and Deployment Resources parameters.

  3. After you configure the parameters, click Deploy.

For more information, see the "Use the console" section of this topic.

Use the on-premises client (EASCMD)

  1. Download the EASCMD client and complete identity authentication. In this example, the Windows 64-bit version is used.

  2. Create a file named instances.json in the directory in which the client is located. Sample content of the file:

    "metadata": {
            "gpu_memory": 2,
            "gpu_core_percentage": 5
        }

    For more information about these parameters, see the parameter descriptions in the "Use the on-premises client (EASCMD)" section under "Configure GPU sharing when you deploy a service".

  3. Open a terminal. In the directory in which the JSON file is located, run the following command to enable GPU sharing for the EAS service:

    eascmdwin64.exe modify <service_name> -s <instances.json>

    Replace <service_name> with the name of the EAS service and <instances.json> with the name of the JSON file you created.
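
    For example, to enable GPU sharing for the testchatglm service deployed earlier in this topic by using the file above, you would run:

    eascmdwin64.exe modify testchatglm -s instances.json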