
Platform For AI: Deploy a service using a JSON configuration file

Last Updated: Aug 15, 2025

In EAS, you can define and deploy online services using a JSON configuration file. After you prepare the JSON configuration file, you can deploy the service using the EAS console, the EASCMD client, or an SDK.

Prepare a JSON configuration file

To deploy a service, create a JSON file that contains all the required configurations. If you are a first-time user, you can specify the basic configurations on the service deployment page in the console. The system automatically generates the corresponding JSON content, which you can then modify and extend.

The following code shows an example of a service.json file. For more information about the parameters and their descriptions, see Appendix: JSON parameter description.

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c7a.large"
                }
            ]
        }
    },
    "containers": [
        {
            "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
            "port": 8000,
            "script": "python app.py"
        }
    ],
    "metadata": {
        "cpu": 2,
        "instance": 1,
        "memory": 4000,
        "name": "demo"
    }
}

Deploy a service using a JSON file

Console

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. On the Deploy Service page, choose Custom Model Deployment > JSON On-Premises Deployment.

  3. Paste the JSON configuration and click Deploy. Wait for the service status to change to Running, which indicates that the service is deployed.

EASCMD

The EASCMD client tool is used to manage model services on your server. You can use it to create, view, delete, and update services. The following procedure shows how to deploy a service using the EASCMD client on a 64-bit Linux system.

  1. Download and authenticate the client

    If you use a Data Science Workshop (DSW) development environment and an official image, the EASCMD client is pre-installed at /etc/dsw/eascmd64. Otherwise, you must download and authenticate the client.

  2. Run the deployment command

    In the directory that contains the JSON file, run the following command to deploy the service. For more information about the available operations, see Command reference.

    eascmd64 create <service.json>

    Replace <service.json> with the name of your JSON file.

    Note

    If you use a DSW development environment and need to upload the JSON configuration file, see Upload and download files.

    The system returns a result similar to the following code.

    [RequestId]: 1651567F-8F8D-4A2B-933D-F8D3E2DD****
    +-------------------+----------------------------------------------------------------------------------+
    | Intranet Endpoint | http://166233998075****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/test_eascmd |
    |             Token | YjhjOWQ2ZjNkYzdiYjEzMDZjOGEyNGY5MDIxMzczZWUzNGEyMzhi****                         |
    +-------------------+----------------------------------------------------------------------------------+
    [OK] Creating api gateway
    [OK] Building image [registry-vpc.cn-shanghai.aliyuncs.com/eas/test_eascmd_cn-shanghai:v0.0.1-20221122114614]
    [OK] Pushing image [registry-vpc.cn-shanghai.aliyuncs.com/eas/test_eascmd_cn-shanghai:v0.0.1-20221122114614]
    [OK] Waiting [Total: 1, Pending: 1, Running: 0]
    [OK] Waiting [Total: 1, Pending: 1, Running: 0]
    [OK] Service is running

Appendix: JSON parameter description

Parameter

Required

Description

name

Yes

The service name. The name must be unique within a region.

token

No

The token string for access authentication. If you do not specify this parameter, the system automatically generates a token.

model_path

Yes

This parameter is required when you deploy a service using a processor. model_path and processor_path specify the source data addresses of the model and the processor, respectively. The following address formats are supported:

  • OSS address: The address can be a specific file path or a folder path.

  • HTTP address: The required file must be a compressed package, such as TAR.GZ, TAR, BZ2, or ZIP.

  • Local path: You can use a local path if you use the test command for local debugging.

oss_endpoint

No

The OSS endpoint. Example: oss-cn-beijing.aliyuncs.com. For other values, see OSS regions and endpoints.

Note

By default, you do not need to specify this parameter. The system uses the internal OSS endpoint of the current region to download model files or processor files. You must specify this parameter for cross-region access to OSS. For example, if you deploy a service in the China (Hangzhou) region and specify an OSS address in the China (Beijing) region for model_path, you must use this parameter to specify the public endpoint of OSS in the China (Beijing) region.
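The following fragment is a minimal sketch of a processor-based deployment whose model is stored in an OSS bucket in the China (Beijing) region while the service itself runs in another region. The bucket path and processor code are placeholders; adapt them to your own resources.

"processor": "tensorflow_cpu_1.12",
"model_path": "oss://examplebucket/exampledir/",
"oss_endpoint": "oss-cn-beijing.aliyuncs.com"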

model_entry

No

The entry file of the model. The file can be of any type. If you do not specify this parameter, the file name in model_path is used. The path of the entry file is passed to the initialize() function in the processor.

model_config

No

The model configuration. Any text is supported. The parameter value is passed as the second argument to the initialize() function in the processor.

processor

No

  • If you use a built-in processor provided by EAS, specify the processor code. For the codes of processors used in eascmd, see Built-in processors.

  • If you use a custom processor, you do not need to configure this parameter. You only need to configure the processor_path, processor_entry, processor_mainclass, and processor_type parameters.

processor_path

No

The path of the file package related to the processor. For more information, see the description of the model_path parameter.

processor_entry

No

The main file of the processor. Examples: libprocessor.so or app.py. The file contains the implementation of the initialize() and process() functions required for prediction.

This parameter is required when processor_type is set to cpp or python.

processor_mainclass

No

The main class in the JAR package of the processor. Example: com.aliyun.TestProcessor.

This parameter is required when processor_type is set to java.

processor_type

No

The language in which the processor is implemented. Valid values:

  • cpp

  • java

  • python
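For example, a custom Python processor might be configured with a fragment similar to the following; the package path and file name are placeholders.

"processor_path": "oss://examplebucket/processor/processor.tar.gz",
"processor_entry": "app.py",
"processor_type": "python"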

warm_up_data_path

No

The path of the request file used for model prefetch. For more information about the model prefetch feature, see Prefetch a model service.

runtime.enable_crash_block

No

Specifies whether a service instance automatically restarts after it crashes due to an exception in the processor code. Valid values:

  • true: The service instance does not automatically restart. This lets you retain the context for troubleshooting.

  • false: Default value. The service instance automatically restarts.
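For example, the following fragment is a minimal sketch that keeps a crashed instance from restarting so that you can troubleshoot it:

"runtime": {
    "enable_crash_block": true
}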

cloud

No

For more information, see Table 1. cloud parameter description.

autoscaler

No

The configuration for automatic scaling of the model service. For more information about parameter configurations, see Auto scaling.

containers

No

For more information, see Table 2. containers parameter description.

dockerAuth

No

If the image is from a private repository, you must configure dockerAuth. The value is the Base64-encoded string of username:password of the image repository.
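For example, the following fragment is a sketch for an image pulled from a private repository. The dockerAuth value here is the Base64-encoded form of username:password and is a placeholder.

"containers": [
    {
        "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
        "port": 8000,
        "script": "python app.py"
    }
],
"dockerAuth": "dXNlcm5hbWU6cGFzc3dvcmQ="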

storage

No

The information about service storage mounts. For detailed configuration instructions, see Storage configuration.

metadata

Yes

The metadata of the service. For more information about parameter configurations, see Table 3. metadata parameter description.

features

No

The configuration of special features for the service. For more information about parameter configurations, see Table 4. features parameter description.

networking

No

The call configuration of the service. For more information about parameter configurations, see Table 5. networking parameter description.

labels

No

The tags for the EAS service. The format is key:value.
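For example (the keys and values below are illustrative):

"labels": {
    "env": "test",
    "owner": "algo-team"
}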

unit.size

No

The number of machines deployed for a single instance in a distributed inference configuration. The default value is 2.

sinker

No

You can persist all requests and responses of a service to MaxCompute or Simple Log Service (SLS). The following provides configuration examples. For more information about parameter configurations, see Table 6. sinker parameter description.

Store in MaxCompute

"sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    }

Store in Simple Log Service (SLS)

"sinker": {
        "type": "sls",
        "config": {
            "sls": {
                "project": "k8s-log-****",
                "logstore": "d****"
            }
        }
    }

confidential

No

By configuring the system trust management service, you can ensure that information such as data, models, and code is securely encrypted during service deployment and invocation. This implements a secure and verifiable inference service. The format is as follows:

Note

The secure encryption environment is mainly for your mounted storage files. Mount the storage files before you enable this feature.

"confidential": {
        "trustee_endpoint": "xxxx",
        "decryption_key": "xxxx"
    }

The following table describes the parameters.

  • trustee_endpoint: The URI of the system trust management service Trustee.

  • decryption_key: The KBS URI of the decryption key. Example: kbs:///default/key/test-key.

Table 1. cloud parameter description

Parameter

Required

Description

computing

instances

No

This parameter must be set when you deploy a service in a public resource group. It specifies the list of instance types to be used. If a spot instance fails to be created due to a bid failure or insufficient inventory, the system attempts to create an instance of the next type in the configured order.

  • type: The instance type.

  • spot_price_limit: Optional.

    • If you configure this parameter, it specifies that spot instances are used for the corresponding instance type, and the parameter value sets the maximum pay-as-you-go price in USD.

    • If this parameter is not configured, it indicates that the corresponding instance type is a regular pay-as-you-go instance.

  • capacity: The upper limit on the number of instances of this type. The value can be a number, such as "500", or a string, such as "20%". After configuration, if the number of instances of the type reaches the upper limit, the type will not be used even if there are available resources. 

    For example, if the total number of instances for the service is 200 and the capacity of instance type A is set to 20%, the service will use instance type A to start up to 40 instances. The remaining instances will be started with other instance types.

disable_spot_protection_period

No

This parameter takes effect when you use spot instances. Valid values:

  • false (default): A 1-hour protection period is provided by default after a spot instance is created. During the protection period, the instance is not released even if the market price exceeds your bid.

  • true: The protection period is disabled. Instances without a protection period are always about 10% cheaper than instances with a protection period.

networking

vpc_id

No

The VPC, vSwitch, and security group that are bound to the EAS service.

vswitch_id

No

security_group_id

No

Example:

"cloud":{
      "computing":{
          "instances": [
                {
                    "type": "ecs.c8i.2xlarge",
                    "spot_price_limit": 1
                },
                {
                    "type": "ecs.c8i.xlarge",
                     "capacity": "20%"
                }
            ],
            "disable_spot_protection_period": false
      },
      "networking": {
        "vpc_id": "vpc-bp1oll7xawovg9*****",
        "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
        "security_group_id": "sg-bp1ej061cnyfn0b*****"
      }
  }

Table 2. containers parameter description

If you deploy a service using a custom image, see Custom image.

Parameter

Required

Description

image

Yes

This parameter is required when you deploy a service using an image. It specifies the address of the image used to deploy the model service.

env

name

No

The name of the environment variable for image execution.

value

No

The value of the environment variable for image execution.

command

One of command and script is required.

The entry command for the image. Only a single command is supported. Complex scripts, such as `cd xxx && python app.py`, are not supported; use the `script` parameter for those. The `command` field is suitable for scenarios where the `/bin/sh` command is not available in the image.

script

One of command and script is required.

The script to be executed at the entry of the image. Complex scripts are supported. Use `\n` or a semicolon to separate multiple lines.
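For example, the following sketch uses script for a multi-command entry point. The image address and working directory are placeholders.

"containers": [
    {
        "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
        "script": "cd /app && python app.py",
        "port": 8000
    }
]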

port

No

The container port.

Important
  • The EAS DPI engine listens on fixed ports 8080 and 9090. Therefore, the container port must not be 8080 or 9090.

  • This port must be the same as the port that the application started by the command or script (for example, xxx.py) listens on.

prepare

pythonRequirements

No

A list of Python requirements to be installed before the instance starts. The image must have the `python` and `pip` commands in the system path. The format is a list. Example:

"prepare": {
  "pythonRequirements": [
    "numpy==1.16.4",
    "absl-py==0.11.0"
  ]
}

pythonRequirementsPath

No

The path of the `requirements.txt` file to be installed before the instance starts. The image must have the Python and pip commands in the system path. The `requirements.txt` file can be directly included in the image or mounted to the service instance from external storage. Example:

"prepare": {
  "pythonRequirementsPath": "/data_oss/requirements.txt"
}

Table 3. metadata parameter description

Parameter

Required

Description

General parameters

instance

Yes

The number of instances to start for the service.

workspace_id

No

After you set this parameter, the service can be used only in the specified PAI workspace. Example: 1405**.

cpu

No

The number of CPUs required for each instance.

memory

No

The amount of memory required for each instance. The value must be an integer. Unit: MB. For example, "memory": 4096 indicates that each instance requires 4 GB of memory.

gpu

No

The number of GPUs required for each instance.

gpu_memory

No

The amount of GPU memory required for each instance. The value must be an integer. Unit: GB.

The system supports instance scheduling based on GPU memory, which allows multiple instances to share a single GPU. If you use GPU memory-based scheduling, you must set the gpu field to 0. If the gpu field is set to 1, the instance exclusively occupies the entire GPU, and the gpu_memory field is ignored.

Important

Strict isolation of GPU memory is not currently enabled. You must control the GPU memory usage of each instance to ensure it does not exceed the requested amount and to prevent GPU memory overflow.

gpu_core_percentage

No

The percentage of the computing power of a single GPU required by each instance. The value is an integer from 1 to 100, in percent. For example, a value of 10 indicates 10% of the computing power of a single GPU.

The system supports instance scheduling based on computing power to share a single GPU card among multiple instances. When you specify this parameter, you must also specify the gpu_memory parameter. Otherwise, this parameter does not take effect.
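The following fragment is a minimal sketch of sharing a single GPU through GPU memory-based scheduling. The instance count and resource amounts are illustrative.

"metadata": {
    "instance": 2,
    "cpu": 4,
    "memory": 16000,
    "gpu": 0,
    "gpu_memory": 8,
    "gpu_core_percentage": 30
}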

qos

No

The quality of service (QoS) of the instance. The value can be empty or BestEffort. If qos is set to BestEffort, CPU sharing mode is enabled: instances are scheduled based only on GPU memory and memory, are no longer limited by the number of CPUs on the node, and all instances on the node share the CPUs. In this case, the cpu field specifies the maximum CPU quota that a single instance can use in CPU sharing mode.
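For example, the following fragment is a sketch that enables CPU sharing mode; the resource amounts are illustrative, and cpu here is the per-instance quota.

"metadata": {
    "qos": "BestEffort",
    "cpu": 4,
    "memory": 8000,
    "gpu_memory": 8
}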

resource

No

The resource group ID. The configuration policy is as follows:

  • If the service is deployed in a public resource group, you can ignore this parameter. The service is pay-as-you-go.

  • If the service is deployed in a dedicated resource group, set this parameter to the resource group ID. Example: eas-r-6dbzve8ip0xnzt****.

cuda

No

The CUDA version required by the service. When the service is running, the specified CUDA version is automatically mounted to the /usr/local/cuda directory of the instance.

Supported CUDA versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

rdma

No

Specifies whether to enable the RDMA network for distributed inference. Set the value to 1 to enable the RDMA network. If this parameter is not configured, the RDMA network is disabled.

Note

Currently, only services deployed using Lingjun resources can use the RDMA network.

enable_grpc

No

Specifies whether to enable the GRPC connection for the service gateway. Valid values:

  • false: Default value. The gateway does not enable GRPC connections. HTTP requests are supported by default.

  • true: The gateway enables GRPC connections.

Note

If you deploy a service using a custom image and the server in the image is implemented with GRPC, you must use this parameter to switch the gateway protocol to GRPC.

enable_webservice

No

Specifies whether to enable a web server to deploy an AI-Web application:

  • false: Default value. The web server is not enabled.

  • true: The web server is enabled.

type

No

Set this parameter to LLMGatewayService to deploy an LLM intelligent router service. For information about how to configure the JSON file, see Deploy an LLM intelligent router service.

Advanced parameters

Important

Adjust these parameters with caution.

rpc.batching

No

Specifies whether to enable server-side batching to accelerate GPU models. This feature is supported only in built-in processor mode. Valid values:

  • false: Default value. Server-side batching is disabled.

  • true: Server-side batching is enabled.

rpc.keepalive

No

The maximum processing time for a single request. If the processing time exceeds this value, the server returns a 408 timeout error and closes the connection. The default value is 5000. Unit: milliseconds.

Note

When you use a custom processor, you must also configure the allspark parameter in the code. For more information, see Develop a custom processor in Python.

rpc.io_threads

No

The number of threads used by each instance to process network I/O. The default value is 4.

rpc.max_batch_size

No

The maximum size of each batch. The default value is 16. This feature is supported only in built-in processor mode. This parameter takes effect only when rpc.batching is set to true.

rpc.max_batch_timeout

No

The maximum timeout period for each batch. The default value is 50. Unit: milliseconds. This feature is supported only in built-in processor mode. This parameter takes effect only when rpc.batching is set to true.

rpc.max_queue_size

No

When you create an asynchronous inference service, this parameter specifies the maximum length of the queue. The default value is 64. If the queue is full, the server returns a 450 error and closes the connection, which notifies the client in advance to retry on another instance and prevents the server from being overloaded. For services with a long response time (RT), you can reduce the queue length to prevent many requests from timing out due to accumulation in the queue.

rpc.worker_threads

No

The number of threads in each instance used for concurrent request processing. The default value is 5. This feature is supported only in built-in processor mode.

rpc.rate_limit

No

Enables queries per second (QPS) throttling and limits the maximum QPS that an instance can process. The default value is 0, which disables QPS throttling.

For example, if this parameter is set to 2000, requests are rejected and a 429 (Too Many Requests) error is returned when the QPS exceeds 2000.

rolling_strategy.max_surge

No

The maximum number of extra instances that can be created beyond the specified number during a rolling update. The value can be a positive integer, which indicates the number of instances, or a percentage, such as 2%. The default value is 2%. Increasing this value can speed up service updates.

For example, if the number of service instances is 100 and this parameter is set to 20, 20 new instances are created immediately after the service update starts.

rolling_strategy.max_unavailable

No

The maximum number of unavailable instances during a rolling update. This parameter can release resources for new instances during a service update to prevent the update from getting stuck due to insufficient idle resources. Currently, the default value is 1 for dedicated resource groups and 0 for public resource groups.

For example, if this parameter is set to N, N instances are stopped immediately when the service update starts.

Note

If idle resources are sufficient, you can set this parameter to 0. A large value may affect service stability because the number of available instances decreases at the moment of the update, which increases the traffic load on a single instance. You need to balance service stability and resource availability when you configure this parameter.
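For example, the following fragment is a sketch that speeds up updates while keeping at most one instance unavailable at a time. The values are illustrative.

"metadata": {
    "rolling_strategy": {
        "max_surge": "20%",
        "max_unavailable": 1
    }
}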

eas.termination_grace_period

No

The graceful shutdown period for an instance. Unit: seconds. The default value is 30.

EAS uses a rolling update strategy. An instance first enters the Terminating state, and the service redirects traffic away from the instance that is about to shut down. The instance then waits for the configured grace period (30 seconds by default) to finish processing the requests it has received before it shuts down. If the request processing time is long, you can increase this value to ensure that all in-progress requests are processed during the service update.

Important

Decreasing this value may affect service stability. A large value may slow down the service update. Do not configure this parameter unless necessary.

scheduling.spread.policy

No

The scheduling policy for service instances. The following policies are supported:

  • host: Spreads instances across different nodes.

  • zone: Spreads instances across different zones.

  • default: Schedules instances based on the default policy without active spreading.

rpc.enable_sigterm

No

Valid values:

  • false (default): A SIGTERM signal is not sent when an instance enters the shutdown state.

  • true: When a service instance enters the shutdown state, the system immediately sends a SIGTERM signal to the main process. After the process in the service receives the signal, it must perform a custom graceful shutdown operation in the signal handler. If the signal is not handled, the main process may exit directly after receiving the signal, causing the graceful shutdown to fail.

resource_rebalancing

No

Valid values:

  • false (default): This feature is disabled.

  • true: EAS periodically creates probe instances on high-priority resources. If a probe instance is successfully scheduled, more probe instances are created exponentially until scheduling fails. After a successfully scheduled probe instance is initialized and enters the ready state, it replaces an instance on a low-priority resource.

This feature can solve the following problems:

  • During a rolling update, terminating instances still occupy resources, causing new instances to be started in a public resource group. Due to public resource limitations, the new instances are rescheduled back to the dedicated resource group.

  • When you use both spot instances and regular instances, the system periodically checks whether spot instances are available. If they are, regular instances are migrated to spot instances.

workload_type

No

If you want to deploy an EAS service as a task, you can set this parameter to elasticjob. For more information about how to use the elastic job service, see Elastic Job service.

resource_burstable

No

Enables the Elastic Resource Pool feature for EAS services deployed in dedicated resource groups:

  • true: Enables the feature.

  • false: Disables the feature.

shm_size

No

Configures the shared memory for an instance. This allows direct memory read and write operations without data replication or transfer. Unit: GB.

Table 4. features parameter description

Parameter

Required

Description

eas.aliyun.com/extra-ephemeral-storage

No

The size of the extra system disk that you need to configure when the free quota for the system disk cannot meet your business requirements. The value must be an integer from 0 to 2000. Unit: GB.

eas.aliyun.com/gpu-driver-version

No

Specifies the GPU driver version. Example: tesla=550.127.08.

Table 5. networking parameter description

Parameter

Required

Description

gateway

No

You can configure a dedicated gateway for an EAS service.

Table 6. sinker parameter description

Parameter

Required

Description

type

No

The storage destination type. The following types are supported:

  • maxcompute: MaxCompute.

  • sls: Simple Log Service (SLS).

config

maxcompute.project

No

The MaxCompute project name.

maxcompute.table

No

The MaxCompute data table.

sls.project

No

The SLS project name.

sls.logstore

No

The SLS Logstore.

Appendix: JSON configuration example

The following code provides an example of a JSON configuration file that uses the parameters described in this topic:

{
  "name": "test_eascmd",
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
        "size": 2
    },
  "sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9t8****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b****"
        }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
        "trustee_endpoint": "xx",
        "decryption_key": "xx"
    },
  "metadata": {
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": 1405**,
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}