
Platform For AI: Deploy a service using a JSON configuration file

Last Updated: Aug 15, 2025

In EAS, you can define and deploy online services using a JSON configuration file. After you prepare the JSON configuration file, you can deploy the service using the EAS console, the EASCMD client, or an SDK.

Prepare a JSON configuration file

To deploy a service, create a JSON file that contains all the required configurations. If you are a first-time user, you can specify the basic configurations on the service deployment page in the console. The system automatically generates the corresponding JSON content, which you can then modify and extend.

The following code shows an example of a service.json file. For more information about the parameters and their descriptions, see Appendix: JSON parameter description.

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c7a.large"
                }
            ]
        }
    },
    "containers": [
        {
            "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
            "port": 8000,
            "script": "python app.py"
        }
    ],
    "metadata": {
        "cpu": 2,
        "instance": 1,
        "memory": 4000,
        "name": "demo"
    }
}

Deploy a service using a JSON file

Console

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. On the Deploy Service page, choose Custom Model Deployment > JSON On-Premises Deployment.

  3. Paste the JSON configuration and click Deploy. Wait for the service status to change to Running, which indicates that the service is deployed.

EASCMD

The EASCMD client tool is used to manage model services on your server. You can use it to create, view, delete, and update services. The following procedure shows how to deploy a service using the EASCMD client on a 64-bit Linux system.

  1. Download and authenticate the client

    If you use a Data Science Workshop (DSW) development environment and an official image, the EASCMD client is pre-installed at /etc/dsw/eascmd64. Otherwise, you must download and authenticate the client.

  2. Run the deployment command

    In the directory that contains the JSON file, run the following command to deploy the service. For more information about the available operations, see Command reference.

    eascmd64 create <service.json>

    Replace <service.json> with the name of your JSON file.

    Note

    If you use a DSW development environment and need to upload the JSON configuration file, see Upload and download files.

    The system returns a result similar to the following code.

    [RequestId]: 1651567F-8F8D-4A2B-933D-F8D3E2DD****
    +-------------------+----------------------------------------------------------------------------------+
    | Intranet Endpoint | http://166233998075****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/test_eascmd |
    |             Token | YjhjOWQ2ZjNkYzdiYjEzMDZjOGEyNGY5MDIxMzczZWUzNGEyMzhi****                         |
    +-------------------+----------------------------------------------------------------------------------+
    [OK] Creating api gateway
    [OK] Building image [registry-vpc.cn-shanghai.aliyuncs.com/eas/test_eascmd_cn-shanghai:v0.0.1-20221122114614]
    [OK] Pushing image [registry-vpc.cn-shanghai.aliyuncs.com/eas/test_eascmd_cn-shanghai:v0.0.1-20221122114614]
    [OK] Waiting [Total: 1, Pending: 1, Running: 0]
    [OK] Waiting [Total: 1, Pending: 1, Running: 0]
    [OK] Service is running

Appendix: JSON parameter description

Parameter

Required

Description

name

Yes

The service name. The name must be unique within a region.

token

No

The token string for access authentication. If you do not specify this parameter, the system automatically generates a token.

model_path

Yes

This parameter is required when you deploy a service using a processor. model_path and processor_path specify the source data addresses of the model and the processor, respectively. The following address formats are supported:

  • OSS address: The address can be a specific file path or a folder path.

  • HTTP address: The required file must be a compressed package, such as TAR.GZ, TAR, BZ2, or ZIP.

  • Local path: You can use a local path if you use the test command for local debugging.

oss_endpoint

No

The OSS endpoint. Example: oss-cn-beijing.aliyuncs.com. For other values, see OSS regions and endpoints.

Note

By default, you do not need to specify this parameter. The system uses the internal OSS endpoint of the current region to download model files or processor files. You must specify this parameter for cross-region access to OSS. For example, if you deploy a service in the China (Hangzhou) region and specify an OSS address in the China (Beijing) region for model_path, you must use this parameter to specify the public endpoint of OSS in the China (Beijing) region.
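The following fragment is a minimal sketch of a processor-based deployment whose model is stored in an OSS bucket in the China (Beijing) region while the service itself runs in another region. The bucket path and processor code are placeholders; adapt them to your own resources.

"processor": "tensorflow_cpu_1.12",
"model_path": "oss://examplebucket/exampledir/",
"oss_endpoint": "oss-cn-beijing.aliyuncs.com"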

model_entry

No

The entry file of the model. The file can be of any type. If you do not specify this parameter, the file name in model_path is used. The path of the entry file is passed to the initialize() function in the processor.

model_config

No

The model configuration. Any text is supported. The parameter value is passed as the second argument to the initialize() function in the processor.

processor

No

  • If you use a built-in processor provided by EAS, specify the processor code. For the codes of processors used in eascmd, see Built-in processors.

  • If you use a custom processor, you do not need to configure this parameter. You only need to configure the processor_path, processor_entry, processor_mainclass, and processor_type parameters.

processor_path

No

The path of the file package related to the processor. For more information, see the description of the model_path parameter.

processor_entry

No

The main file of the processor. Examples: libprocessor.so or app.py. The file contains the implementation of the initialize() and process() functions required for prediction.

This parameter is required when processor_type is set to cpp or python.

processor_mainclass

No

The main class in the JAR package of the processor. Example: com.aliyun.TestProcessor.

This parameter is required when processor_type is set to java.

processor_type

No

The language in which the processor is implemented. Valid values:

  • cpp

  • java

  • python
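For example, a custom Python processor might be configured with a fragment similar to the following; the package path and file name are placeholders.

"processor_path": "oss://examplebucket/processor/processor.tar.gz",
"processor_entry": "app.py",
"processor_type": "python"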

warm_up_data_path

No

The path of the request file used for model prefetch. For more information about the model prefetch feature, see Prefetch a model service.

runtime.enable_crash_block

No

Specifies whether a service instance automatically restarts after it crashes due to an exception in the processor code. Valid values:

  • true: The service instance does not automatically restart. This lets you retain the context for troubleshooting.

  • false: Default value. The service instance automatically restarts.
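For example, the following fragment is a minimal sketch that keeps a crashed instance from restarting so that you can troubleshoot it:

"runtime": {
    "enable_crash_block": true
}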

cloud

No

For more information, see Table 1. cloud parameter description.

autoscaler

No

The configuration for automatic scaling of the model service. For more information about parameter configurations, see Auto scaling.

containers

No

For more information, see Table 2. containers parameter description.

dockerAuth

No

If the image is from a private repository, you must configure dockerAuth. The value is the Base64-encoded string of username:password of the image repository.
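For example, the following fragment is a sketch for an image pulled from a private repository. The dockerAuth value here is the Base64-encoded form of username:password and is a placeholder.

"containers": [
    {
        "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
        "port": 8000,
        "script": "python app.py"
    }
],
"dockerAuth": "dXNlcm5hbWU6cGFzc3dvcmQ="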

storage

No

The information about service storage mounts. For detailed configuration instructions, see Storage configuration.

metadata

Yes

The metadata of the service. For more information about parameter configurations, see Table 3. metadata parameter description.

features

No

The configuration of special features for the service. For more information about parameter configurations, see Table 4. features parameter description.

networking

No

The call configuration of the service. For more information about parameter configurations, see Table 5. networking parameter description.

labels

No

The tags for the EAS service. The format is key:value.
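For example (the keys and values below are illustrative):

"labels": {
    "env": "test",
    "owner": "algo-team"
}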

unit.size

No

The number of machines deployed for a single instance in a distributed inference configuration. The default value is 2.

sinker

No

You can persist all requests and responses of a service to MaxCompute or Simple Log Service (SLS). The following provides configuration examples. For more information about parameter configurations, see Table 6. sinker parameter description.

Store in MaxCompute

"sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    }

Store in Simple Log Service (SLS)

"sinker": {
        "type": "sls",
        "config": {
            "sls": {
                "project": "k8s-log-****",
                "logstore": "d****"
            }
        }
    }

confidential

No

By configuring the system trust management service, you can ensure that information such as data, models, and code is securely encrypted during service deployment and invocation. This implements a secure and verifiable inference service. The format is as follows:

Note

The secure encryption environment is mainly for your mounted storage files. Mount the storage files before you enable this feature.

"confidential": {
        "trustee_endpoint": "xxxx",
        "decryption_key": "xxxx"
    }

The following table describes the parameters.

  • trustee_endpoint: The URI of the system trust management service Trustee.

  • decryption_key: The KBS URI of the decryption key. Example: kbs:///default/key/test-key.

Table 1. cloud parameter description

Parameter

Required

Description

computing

instances

No

This parameter must be set when you deploy a service in a public resource group. It specifies the list of instance types to be used. If a spot instance fails to be created due to a bid failure or insufficient inventory, the system attempts to create an instance of the next type in the configured order.

  • type: The instance type.

  • spot_price_limit: Optional.

    • If you configure this parameter, it specifies that spot instances are used for the corresponding instance type, and the parameter value sets the maximum pay-as-you-go price in USD.

    • If this parameter is not configured, it indicates that the corresponding instance type is a regular pay-as-you-go instance.

  • capacity: The upper limit on the number of instances of this type. The value can be a number, such as "500", or a string, such as "20%". After configuration, if the number of instances of the type reaches the upper limit, the type will not be used even if there are available resources. 

    For example, if the total number of instances for the service is 200 and the capacity of instance type A is set to 20%, the service will use instance type A to start up to 40 instances. The remaining instances will be started with other instance types.

disable_spot_protection_period

No

This parameter takes effect when you use spot instances. Valid values:

  • false (default): A 1-hour protection period is provided by default after a spot instance is created. During the protection period, the instance is not released even if the market price exceeds your bid.

  • true: The protection period is disabled. Instances without a protection period are always about 10% cheaper than instances with a protection period.

networking

vpc_id

No

The VPC, vSwitch, and security group that are bound to the EAS service.

vswitch_id

No

security_group_id

No

Example:

"cloud":{
      "computing":{
          "instances": [
                {
                    "type": "ecs.c8i.2xlarge",
                    "spot_price_limit": 1
                },
                {
                    "type": "ecs.c8i.xlarge",
                     "capacity": "20%"
                }
            ],
            "disable_spot_protection_period": false
      },
      "networking": {
        "vpc_id": "vpc-bp1oll7xawovg9*****",
        "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
        "security_group_id": "sg-bp1ej061cnyfn0b*****"
      }
  }

Table 2. containers parameter description

If you deploy a service using a custom image, see Custom image.

Parameter

Required

Description

image

Yes

This parameter is required when you deploy a service using an image. It specifies the address of the image used to deploy the model service.

env

name

No

The name of the environment variable for image execution.

value

No

The value of the environment variable for image execution.

command

One of command and script is required.

The entry command for the image. Only a single command is supported. Complex scripts, such as `cd xxx && python app.py`, are not supported; use the `script` parameter for those. The `command` field is suitable for scenarios where the `/bin/sh` command is not available in the image.

script

One of command and script is required.

The script to be executed at the entry of the image. Complex scripts are supported. Use `\n` or a semicolon to separate multiple lines.
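For example, the following sketch uses script for a multi-command entry point. The image address and working directory are placeholders.

"containers": [
    {
        "image": "****-registry.cn-beijing.cr.aliyuncs.com/***/***:latest",
        "script": "cd /app && python app.py",
        "port": 8000
    }
]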

port

No

The container port.

Important
  • The EAS DPI engine listens on fixed ports 8080 and 9090. Therefore, the container port must not be 8080 or 9090.

  • This port must be the same as the port that the application started by the command or script (for example, xxx.py) listens on.

prepare

pythonRequirements

No

A list of Python requirements to be installed before the instance starts. The image must have the `python` and `pip` commands in the system path. The format is a list. Example:

"prepare": {
  "pythonRequirements": [
    "numpy==1.16.4",
    "absl-py==0.11.0"
  ]
}

pythonRequirementsPath

No

The path of the `requirements.txt` file to be installed before the instance starts. The image must have the Python and pip commands in the system path. The `requirements.txt` file can be directly included in the image or mounted to the service instance from external storage. Example:

"prepare": {
  "pythonRequirementsPath": "/data_oss/requirements.txt"
}

Table 3. metadata parameter description

Parameter

Required

Description

General parameters

instance

Yes

The number of instances to start for the service.

workspace_id

No

After you set this parameter, the service can be used only in the specified PAI workspace. Example: 1405**.

cpu

No

The number of CPUs required for each instance.

memory

No

The amount of memory required for each instance. The value must be an integer. Unit: MB. For example, "memory": 4096 indicates that each instance requires 4 GB of memory.

gpu

No

The number of GPUs required for each instance.

gpu_memory

No

The amount of GPU memory required for each instance. The value must be an integer. Unit: GB.

The system supports instance scheduling based on GPU memory, which allows multiple instances to share a single GPU. If you use GPU memory-based scheduling, you must set the gpu field to 0. If the gpu field is set to 1, the instance exclusively occupies the entire GPU, and the gpu_memory field is ignored.

Important

Strict isolation of GPU memory is not currently enabled. You must control the GPU memory usage of each instance to ensure it does not exceed the requested amount and to prevent GPU memory overflow.

gpu_core_percentage

No

The percentage of the computing power of a single GPU required by each instance. The value is an integer from 1 to 100, in percent. For example, a value of 10 indicates 10% of the computing power of a single GPU.

The system supports instance scheduling based on computing power to share a single GPU card among multiple instances. When you specify this parameter, you must also specify the gpu_memory parameter. Otherwise, this parameter does not take effect.
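The following fragment is a minimal sketch of sharing a single GPU through GPU memory-based scheduling. The instance count and resource amounts are illustrative.

"metadata": {
    "instance": 2,
    "cpu": 4,
    "memory": 16000,
    "gpu": 0,
    "gpu_memory": 8,
    "gpu_core_percentage": 30
}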

qos

No

The quality of service (QoS) of the instance. The value can be empty or BestEffort. If qos is set to BestEffort, CPU sharing mode is enabled: instances are scheduled based only on GPU memory and memory, are no longer limited by the number of CPUs on the node, and all instances on the node share the CPUs. In this case, the cpu field specifies the maximum CPU quota that a single instance can use in CPU sharing mode.
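For example, the following fragment is a sketch that enables CPU sharing mode; the resource amounts are illustrative, and cpu here is the per-instance quota.

"metadata": {
    "qos": "BestEffort",
    "cpu": 4,
    "memory": 8000,
    "gpu_memory": 8
}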

resource

No

The resource group ID. The configuration policy is as follows:

  • If the service is deployed in a public resource group, you can ignore this parameter. The service is pay-as-you-go.

  • If the service is deployed in a dedicated resource group, set this parameter to the resource group ID. Example: eas-r-6dbzve8ip0xnzt****.

cuda

No

The CUDA version required by the service. When the service is running, the specified CUDA version is automatically mounted to the /usr/local/cuda directory of the instance.

Supported CUDA versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

rdma

No

Specifies whether to enable the RDMA network for distributed inference. Set the value to 1 to enable the RDMA network. If this parameter is not configured, the RDMA network is disabled.

Note

Currently, only services deployed using Lingjun resources can use the RDMA network.

enable_grpc

No

Specifies whether to enable the GRPC connection for the service gateway. Valid values:

  • false: Default value. The gateway does not enable GRPC connections. HTTP requests are supported by default.

  • true: The gateway enables GRPC connections.

Note

If you deploy a service using a custom image and the server in the image is implemented with GRPC, you must use this parameter to switch the gateway protocol to GRPC.

enable_webservice

No

Specifies whether to enable a web server to deploy an AI-Web application:

  • false: Default value. The web server is not enabled.

  • true: The web server is enabled.

type

No

Set this parameter to LLMGatewayService to deploy an LLM intelligent router service. For information about how to configure the JSON file, see Deploy an LLM intelligent router service.

Advanced parameters

Important

Adjust these parameters with caution.

rpc.batching

No

Specifies whether to enable server-side batching to accelerate GPU models. This feature is supported only in built-in processor mode. Valid values:

  • false: Default value. Server-side batching is disabled.

  • true: Server-side batching is enabled.

rpc.keepalive

No

The maximum processing time for a single request. If the processing time exceeds this value, the server returns a 408 timeout error and closes the connection. The default value is 5000. Unit: milliseconds.

Note

When you use a custom processor, you must also configure the allspark parameter in the code. For more information, see Develop a custom processor in Python.

rpc.io_threads

No

The number of threads used by each instance to process network I/O. The default value is 4.

rpc.max_batch_size

No

The maximum size of each batch. The default value is 16. This feature is supported only in built-in processor mode. This parameter takes effect only when rpc.batching is set to true.

rpc.max_batch_timeout

No

The maximum timeout period for each batch. The default value is 50. Unit: milliseconds. This feature is supported only in built-in processor mode. This parameter takes effect only when rpc.batching is set to true.

rpc.max_queue_size

No

When you create an asynchronous inference service, this parameter specifies the maximum length of the queue. The default value is 64. If the queue is full, the server returns a 450 error and closes the connection, which notifies the client in advance to retry on another instance and prevents the server from being overloaded. For services with a long response time (RT), you can reduce the queue length to prevent many requests from timing out due to accumulation in the queue.

rpc.worker_threads

No

The number of threads in each instance used for concurrent request processing. The default value is 5. This feature is supported only in built-in processor mode.

rpc.rate_limit

No

Enables queries per second (QPS) throttling and limits the maximum QPS that an instance can process. The default value is 0, which disables QPS throttling.

For example, if this parameter is set to 2000, requests are rejected and a 429 (Too Many Requests) error is returned when the QPS exceeds 2000.

rolling_strategy.max_surge

No

The maximum number of extra instances that can be created beyond the specified number during a rolling update. The value can be a positive integer, which indicates the number of instances, or a percentage, such as 2%. The default value is 2%. Increasing this value can speed up service updates.

For example, if the number of service instances is 100 and this parameter is set to 20, 20 new instances are created immediately after the service update starts.

rolling_strategy.max_unavailable

No

The maximum number of unavailable instances during a rolling update. This parameter can release resources for new instances during a service update to prevent the update from getting stuck due to insufficient idle resources. Currently, the default value is 1 for dedicated resource groups and 0 for public resource groups.

For example, if this parameter is set to N, N instances are stopped immediately when the service update starts.

Note

If idle resources are sufficient, you can set this parameter to 0. A large value may affect service stability because the number of available instances decreases at the moment of the update, which increases the traffic load on a single instance. You need to balance service stability and resource availability when you configure this parameter.
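For example, the following fragment is a sketch that speeds up updates while keeping at most one instance unavailable at a time. The values are illustrative.

"metadata": {
    "rolling_strategy": {
        "max_surge": "20%",
        "max_unavailable": 1
    }
}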

eas.termination_grace_period

No

The graceful shutdown period for an instance. Unit: seconds. The default value is 30.

EAS uses a rolling update strategy. An instance first enters the Terminating state, and the service redirects traffic away from the instance that is about to shut down. The instance then waits for the configured grace period (30 seconds by default) to finish processing the requests it has received before it shuts down. If the request processing time is long, you can increase this value to ensure that all in-progress requests are processed during the service update.

Important

Decreasing this value may affect service stability. A large value may slow down the service update. Do not configure this parameter unless necessary.

scheduling.spread.policy

No

The scheduling policy for service instances. The following policies are supported:

  • host: Spreads instances across different nodes.

  • zone: Spreads instances across different zones.

  • default: Schedules instances based on the default policy without active spreading.

rpc.enable_sigterm

No

Valid values:

  • false (default): A SIGTERM signal is not sent when an instance enters the shutdown state.

  • true: When a service instance enters the shutdown state, the system immediately sends a SIGTERM signal to the main process. After the process in the service receives the signal, it must perform a custom graceful shutdown operation in the signal handler. If the signal is not handled, the main process may exit directly after receiving the signal, causing the graceful shutdown to fail.

resource_rebalancing

No

Valid values:

  • false (default): This feature is disabled.

  • true: EAS periodically creates probe instances on high-priority resources. If a probe instance is successfully scheduled, more probe instances are created exponentially until scheduling fails. After a successfully scheduled probe instance is initialized and enters the ready state, it replaces an instance on a low-priority resource.

This feature can solve the following problems:

  • During a rolling update, terminating instances still occupy resources, causing new instances to be started in a public resource group. Due to public resource limitations, the new instances are rescheduled back to the dedicated resource group.

  • When you use both spot instances and regular instances, the system periodically checks whether spot instances are available. If they are, regular instances are migrated to spot instances.

workload_type

No

If you want to deploy an EAS service as a task, you can set this parameter to elasticjob. For more information about how to use the elastic job service, see Elastic Job service.

resource_burstable

No

Enables the Elastic Resource Pool feature for EAS services deployed in dedicated resource groups:

  • true: Enables the feature.

  • false: Disables the feature.

shm_size

No

Configures the shared memory for an instance. This allows direct memory read and write operations without data replication or transfer. Unit: GB.

Table 4. features parameter description

Parameter

Required

Description

eas.aliyun.com/extra-ephemeral-storage

No

The size of the extra system disk that you need to configure when the free quota for the system disk cannot meet your business requirements. The value must be an integer from 0 to 2000. Unit: GB.

eas.aliyun.com/gpu-driver-version

No

Specifies the GPU driver version. Example: tesla=550.127.08.

Table 5. networking parameter description

Parameter

Required

Description

gateway

No

You can configure a dedicated gateway for an EAS service.

Table 6. sinker parameter description

Parameter

Required

Description

type

No

The storage destination type. The following types are supported:

  • maxcompute: MaxCompute.

  • sls: Simple Log Service (SLS).

config

maxcompute.project

No

The MaxCompute project name.

maxcompute.table

No

The MaxCompute data table.

sls.project

No

The SLS project name.

sls.logstore

No

The SLS Logstore.

Appendix: JSON configuration example

The following code provides an example of a JSON configuration file that uses the parameters described in this topic:

{
  "name": "test_eascmd",
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
        "size": 2
    },
  "sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9t8****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b****"
        }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
        "trustee_endpoint": "xx",
        "decryption_key": "xx"
    },
  "metadata": {
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": 1405**,
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}