Platform for AI: Create a DSW instance

Last Updated: Aug 09, 2025

Data Science Workshop (DSW) provides a cloud-based integrated development environment (IDE) for AI development. It includes multiple built-in development environments. If you are familiar with Notebook or VS Code, you can quickly start developing models. This topic describes the parameters that you can configure when you create an instance and provides solutions to common issues that might occur when you start or release an instance.

Prerequisites

You must activate Platform for AI (PAI) and create a workspace. To do this, log on to the PAI console with your Alibaba Cloud account. In the upper-left corner, select a region, and then authorize and activate PAI. For more information, see Activate PAI and create a workspace.

Create an instance using the console

Important

When you create an instance that uses public resources, you are charged for the instance's running time. Billing stops after you stop or delete the instance. For more information about billing, see Billing of Data Science Workshop (DSW).

  1. Go to the DSW page.

    1. Log on to the PAI console.

    2. On the Overview page, select a region.

    3. In the navigation pane on the left, click Workspaces. On the Workspaces page, click the name of the workspace that you want to use.

    4. In the navigation pane on the left of the workspace, choose Model Development And Training > Data Science Workshop (DSW).

  2. Click Create Instance.

  3. On the Configure Instance page, configure the following key parameters.

    Basic information

    Parameter

    Description

    Instance Name

    Configure the name of the DSW instance based on the on-screen prompts.

    Tag

    Add tags to the instance as needed. Tags help you find and manage resources and split bills across different dimensions.

    Resource information

    Parameter

    Description

    Resource Type

    • Public Resources: pay-as-you-go. This billing method cannot be changed to subscription.

      Note

      If you use public resources, each Alibaba Cloud account has a quota of two GPUs per region. An error may occur if your resource usage exceeds the quota. To increase the quota, submit a ticket.

      • Resource Specification: You can select GPU, CPU, or free trial resources. For more information about instance types, see Instance families.

      • Spot Purchase: Select spot instances to reduce running costs. For more information, see Purchase a DSW spot instance.

        This parameter is available only in the China (Hangzhou), China (Shanghai), China (Beijing), China (Ulanqab), China (Shenzhen), China (Guangzhou), Japan (Tokyo), and Singapore regions.

    • Resource Quota: subscription billing method.

      • Resource Quota: You can select general computing resources or Lingjun resources. If no resources are available, click Associate Resource Quota to configure one.

      • Resource Specification: Set the GPU, CPU, and memory specifications as needed.

      • Priority: The priority ranges from 1 to 9. A larger value indicates a higher priority.

      • CPU Affinity: Binds processes in a container or pod to specific CPU cores for execution. This method reduces CPU cache misses and context switches, which improves CPU utilization and application performance. It is suitable for scenarios that are performance-sensitive and have high real-time requirements.

        This parameter is available only in the China (Beijing) and China (Shenzhen) regions.

    Environment context

    Parameter

    Description

    Image

    In addition to Official Image, the following image types are supported:

    • Custom Image: You can use a custom image that is added to PAI. The image repository must be set to public pull, or the image must be stored in Container Registry (ACR). For more information, see Custom images.

    • Image URL: You can configure the URL of a custom or official image that can be accessed over the internet.

      • If it is a private image URL, click Enter Username And Password and configure the username and password for the image repository.

      • To accelerate image pulling, see Image acceleration.

    System Disk

    Used to store files during development. If you set Resource Type to Public Resources, or if you set Resource Quota to subscription general computing resources (with ≥ 2 CPU cores and ≥ 4 GB of memory, or with a GPU configured), each instance receives a free system disk quota of 100 GiB. You can expand the disk. The price for expansion is displayed on the console page.

    Warning

    • If you use only the free system disk quota and the instance stays stopped for more than 15 days, the content on the disk is deleted.

    • You cannot shrink a disk after it has been expanded. Expand the disk only as much as you need.

    • After you expand the system disk beyond the free quota, its content is no longer deleted after 15 days in the stopped state. However, the expanded disk continues to incur charges.

    • When an instance is deleted, its system disk is released at the same time. Back up all necessary data before you delete the instance.

    If you need persistent storage, you can configure Mount Dataset or Mount Storage Path.
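The free quota rule for the system disk can be expressed as a small check. The following is a minimal sketch of the rule as this topic states it, not a PAI API: the `InstanceSpec` class and its field names are hypothetical, and a return value of 0 only means that this topic does not promise a free quota for that configuration.

```python
from dataclasses import dataclass

FREE_SYSTEM_DISK_GIB = 100  # free quota stated in this topic


@dataclass
class InstanceSpec:
    # Hypothetical field names chosen for this sketch; they are not PAI API fields.
    resource_type: str        # "public" or "quota"
    quota_kind: str = ""      # "general" or "lingjun"; only meaningful for "quota"
    cpu_cores: int = 0
    memory_gb: float = 0.0
    gpus: int = 0


def free_system_disk_gib(spec: InstanceSpec) -> int:
    """Return the free system disk quota (GiB) implied by the rule above."""
    if spec.resource_type == "public":
        return FREE_SYSTEM_DISK_GIB
    if spec.resource_type == "quota" and spec.quota_kind == "general":
        big_enough = spec.cpu_cores >= 2 and spec.memory_gb >= 4
        if big_enough or spec.gpus > 0:
            return FREE_SYSTEM_DISK_GIB
    return 0  # no free quota is documented for this configuration
```

For example, `free_system_disk_gib(InstanceSpec("quota", "general", cpu_cores=1, memory_gb=2))` returns 0 because the specification is below the documented minimum and has no GPU.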

    Mount Dataset

    You can use this to store datasets that need to be read or to persistently store files from the development process. The following two dataset types are supported:

    • Custom Dataset: You can create a custom dataset to store data files required for training. You can set whether the dataset is Read-only and select a dataset version from the Version List.

    • Public Dataset: PAI provides pre-configured public datasets that only support read-only mounting.

    Mount Path: The path where the dataset is mounted to DSW, such as /mnt/data. You can retrieve the dataset from this path in your code.

    Note
    • The mount paths for multiple datasets cannot be the same.

    • If you configure a CPFS dataset, you must configure the network settings and select the same virtual private cloud (VPC) as the CPFS file system. Otherwise, the DSW instance may fail to be created.

    • When you select a dedicated resource group, the first dataset must be a NAS dataset. It will be mounted to both your specified path and the default DSW working directory /mnt/workspace/.

    For more information about mounting, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system.
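Once a dataset is mounted, code in the instance reads it like any local directory. The following is a minimal sketch that lists the files under a mount path. It assumes the example path /mnt/data from the Mount Path description; the function name `list_dataset_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def list_dataset_files(mount_path: str = "/mnt/data", suffix: str = "") -> List[str]:
    """Return sorted relative paths of files under the mount path.

    Raises FileNotFoundError if the mount path does not exist, which usually
    means the dataset was not mounted at the expected location.
    """
    root = Path(mount_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Mount path not found: {mount_path}")
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )
```

Calling `list_dataset_files("/mnt/data", ".csv")` would return only the CSV files, which is a quick way to confirm that the mount succeeded before starting a training job.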

    Mount Storage Path

    You can also use a storage type mount to store datasets that need to be read or to persistently store files from the development process.

    • Supported types: OSS, General-purpose NAS file system, Extreme NAS file system, CPFS, and AI-Computing CPFS.

    • Mount Path: The path where the storage is mounted in DSW, such as /mnt/data. You can read the data from this path in your code.

    For more information about mounting, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system.

    Working Directory

    The startup path for Notebook and WebIDE, mounted to /mnt/workspace.

    Show More Configurations

    Parameter

    Description

    Custom Startup Script

    Used to customize the environment or perform initialization tasks during instance startup. The custom script runs after the image and resources are ready, but before development applications such as JupyterLab and Code Server start.

    Note

    Running a custom script increases the instance startup time. The timeout period for the custom script is 3 minutes. Do not run long-running tasks, such as downloading images, in the custom script.

    Environment Variable

    Used for main container startup, system processes, and user processes. You can add custom environment variables or overwrite default system variables as needed.

    Note: Do not modify the following environment variables:

    # Modifications will not take effect
    USER_NAME # Will be overwritten by service logic
    
    # Do not modify these system variables, as it may affect normal use
    JUPYTER_NAME: Constructed from instance information by default. Can be used to modify the JupyterLab URL access path.
    JUPYTER_COMMAND: Jupyter startup instruction, set to 'lab' by default to start JupyterLab.
    JUPYTER_SERVER_ADDR: JupyterLab service listener address, defaults to 0.0.0.0.
    JUPYTER_SERVER_PORT: JupyterLab service listener port, defaults to 8088.
    JUPYTER_SERVER_AUTH: JupyterLab access password, empty by default.
    JUPYTER_SERVER_ROOT: Jupyter working directory, has lower priority than WORKSPACE_DIR.
    CODE_SERVER_ADDR: code-server service listener address, defaults to 0.0.0.0.
    CODE_SERVER_PORT: code-server service listener port, defaults to 8082.
    CODE_SERVER_AUTH: code-server access password, empty by default.
    WORKSPACE_DIR: The system sets this environment variable based on the working directory parameter set when the instance was created. It can change the startup directory for Jupyter and code-server. An error may occur if the path does not exist.
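To see which values are in effect without modifying them, you can read the variables from inside the instance. The following is a minimal read-only sketch. The defaults are the ones listed above; the function name `service_settings` and the result key `JUPYTER_START_DIR` are hypothetical names used only for this example.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as listed in this topic.
DEFAULTS = {
    "JUPYTER_SERVER_ADDR": "0.0.0.0",
    "JUPYTER_SERVER_PORT": "8088",
    "CODE_SERVER_ADDR": "0.0.0.0",
    "CODE_SERVER_PORT": "8082",
}


def service_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Report listener settings and the startup directory without changing anything."""
    env = os.environ if env is None else env
    merged: Dict[str, Optional[str]] = {
        name: env.get(name, default) for name, default in DEFAULTS.items()
    }
    # WORKSPACE_DIR has higher priority than JUPYTER_SERVER_ROOT, per the list above.
    merged["JUPYTER_START_DIR"] = env.get("WORKSPACE_DIR") or env.get("JUPYTER_SERVER_ROOT")
    return merged
```

Passing an explicit mapping, such as `service_settings({"WORKSPACE_DIR": "/mnt/workspace"})`, makes the lookup easy to check outside a DSW instance.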

    Advanced Configuration

    Allows users to adjust certain secure kernel parameters required by their services. Currently, only instances in Lingjun resource groups support this setting. For parameter details, see the table below.

    Advanced Configuration Parameter

    Default Value

    Description

    Notes

    VmMaxMapCount

    65530

    Sets the maximum number of memory map areas a process can have.

    For example, you can set it to 1024000.

    Values less than 65530 will not take effect. Excessively high values may lead to wasted memory resources.
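The note on minimum values can be sketched as follows. This example assumes that VmMaxMapCount corresponds to the Linux kernel parameter `vm.max_map_count`, which is exposed at /proc/sys/vm/max_map_count; this topic does not confirm that mapping, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

DEFAULT_VM_MAX_MAP_COUNT = 65530  # default stated in the table above


def effective_vm_max_map_count(requested: int) -> int:
    """Values below the default do not take effect, so the default applies."""
    if requested >= DEFAULT_VM_MAX_MAP_COUNT:
        return requested
    return DEFAULT_VM_MAX_MAP_COUNT


def read_current_max_map_count(
    proc_file: str = "/proc/sys/vm/max_map_count",
) -> Optional[int]:
    """Read the live kernel value on Linux; return None if the file is unavailable.

    Assumption: the setting maps to the Linux vm.max_map_count parameter.
    """
    path = Path(proc_file)
    if not path.is_file():
        return None
    return int(path.read_text().strip())
```

After the instance starts, comparing `read_current_max_map_count()` with `effective_vm_max_map_count(1024000)` is one way to check whether the configured value was applied.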

    Network information

    Parameter

    Description

    VPC Configuration

    This parameter is supported only when Resource Type is set to Public Resources.

    If you want to use a DSW instance within a VPC, create a VPC in the same region as the DSW instance and configure this parameter. You also need to configure a VSwitch and a Security Group. For more information about configuration policies for different scenarios, see Network configurations.

    Public Access Gateway

    The following configuration methods are supported:

    • Public Gateway: The network bandwidth is limited. The network speed may not meet your needs during high concurrency or when downloading large files.

    • Private Gateway: To address the bandwidth limitations of the public gateway, you can create an Internet NAT gateway in the DSW instance's VPC, attach an Elastic IP Address (EIP), and configure an SNAT entry. For more information, see Improve public network access speed using a private gateway.

    The following parameters can be configured only when a CPFS dataset is selected for Mount Configuration:

    • Enable All Options: Disabled by default. The system disables VPCs that are not connected to the CPFS dataset.

    Note

    If you select a CPFS dataset for the mount configuration, you must configure a VPC, and the selected VPC must be the same as the one used by the CPFS file system.

    Access configuration

    Parameter

    Description

    Enable SSH

    Used to remotely connect to the instance. You can configure this after selecting a VPC. If you have configured a custom image, make sure that sshd is installed in the custom image.

    SSH Public Key

    You can configure this parameter after you turn on the SSH Configuration switch.

    Note

    To support both logon from within a VPC and logon from the internet, you must add the public keys of multiple clients. Add the public keys one by one, separated by line breaks. You can add up to 10 public keys.
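Before pasting several keys into the console, it can help to check them locally. The following is a minimal sketch that splits a newline-separated block into keys and enforces the limit of 10 from the note above. The function name `split_public_keys` is hypothetical, and the format check only confirms the usual `<type> <base64-data> [comment]` layout of an OpenSSH public key line; it does not verify that a key is valid.

```python
from typing import List

MAX_PUBLIC_KEYS = 10  # limit stated in the note above


def split_public_keys(text: str) -> List[str]:
    """Split a newline-separated block into keys, dropping blank lines."""
    keys = [line.strip() for line in text.splitlines() if line.strip()]
    if len(keys) > MAX_PUBLIC_KEYS:
        raise ValueError(
            f"{len(keys)} keys given; at most {MAX_PUBLIC_KEYS} are allowed"
        )
    for key in keys:
        # An OpenSSH public key line is "<type> <base64-data> [comment]".
        if len(key.split()) < 2:
            raise ValueError(f"Not a public key line: {key[:30]!r}")
    return keys
```

A key that wraps onto a second line when copied would fail the layout check, which catches a common paste error early.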

    SSH Access Method

    You can configure this parameter after you turn on the SSH Configuration switch.

    • Access Within VPC: This access method is supported by default. You can remotely connect to the DSW instance using Secure Shell (SSH) from another terminal within the VPC, such as an ECS instance.

    • Public Access: Select this option to add public access. You can then remotely connect to the instance using SSH from a local command line or another terminal.

      • NAT Gateway: Select the Internet NAT gateway created for the VPC.

      • Elastic IP Address: Select the EIP that has been created in the Internet NAT gateway.

    Custom Service

    Used to access services running in DSW from the internet. For more information, see Access services in an instance over the internet.

    Create VPC Internal Access Domain Name

    Creates a built-in authoritative domain name (Private Zone). You can use this domain name within the VPC to access the SSH service or other custom services of the current instance, avoiding the inconvenience of using a changing instance IP address. Note that creating a built-in authoritative domain name will incur charges. For more information, see Billing of Alibaba Cloud DNS.

    Roles and permissions

    Parameter

    Description

    Visibility

    You can select Visible Only To Instance Owner or Publicly Visible Within Workspace.

    Instance Owner

    Only workspace administrators can change the instance owner.

    Show More Configurations

    Parameter

    Description

    Instance RAM Role

    To access other cloud resources from a DSW instance, you can associate a Resource Access Management (RAM) role with the instance. The instance then uses temporary credentials from Security Token Service (STS) to access those resources, which removes the need to configure a long-term AccessKey and reduces the risk of key leakage.

    You can configure the instance RAM role as follows:

    • PAI Default Role: Has permissions to access internal PAI products, MaxCompute, and Object Storage Service (OSS). Temporary credentials issued for the PAI default role have the same permissions as the DSW instance owner when accessing internal PAI products and MaxCompute tables. When accessing OSS, they can access only the bucket of the default storage path configured for the current workspace.

    • Custom Role: If you want customized or more fine-grained permission management, you can configure a custom role.

    • Do Not Associate Role: If you want to access other cloud products directly using an AccessKey, you can choose not to associate a role.

    For more information about configuring an instance RAM role, see Configure an instance RAM role for a DSW instance.

  4. After you confirm that the configurations are correct, click OK.

FAQ about instance startup and release

Instance startup

1. DSW instance fails to start

Troubleshooting: Click the DSW instance name and view the error message on the Events tab.

Common errors that cause an instance to fail to start include the following:

  • Your requested resource type [ecs.******] is not enough currently, please try other regions or other resource types

    • Cause: The selected resource type is in high demand in the current region, which prevents instance creation.

    • Solution: Try again later, or switch to a different resource type or region.

  • Your resource usage has exceeded the default limitation. Please contact us via ticket system to raise the limitation.

    • Cause: Each Alibaba Cloud account can create instances with a maximum of two GPUs per region. The creation fails if the selected specification exceeds this limit.

    • Solution: To increase the quota, submit a ticket.

  • Sales of this resource are temporarily suspended in the specified zone. We recommend that you use the multi-zone creation function to avoid the risk of insufficient resource.

    • Cause: Sales of this resource are temporarily suspended in the specified zone.

    • Solution: Try one of the following operations to reduce the risk of insufficient resources:

      • Switch to another region.

      • Adjust the resource specification of the instance.

      • Try to start the instance during off-peak hours.

  • CommodityInstanceNotAvailableError: Commodity instance has been released due to prolonged arrears at past. Please create a new instance for use

    • Cause: The system automatically reclaimed the resources and released the instance because of prolonged overdue payments.

    • Solution: Create a new instance.

  • The charge of current ECI instance has been stopped, but the related resources are still being cleaned.

    • Cause: Trial resources are public resources. If you start a DSW instance during peak hours, it may take more than 30 minutes to start. If the resources cannot be pulled within one hour, the system prompts that the selected specification is unavailable in the current region.

    • Solution: Try one of the following operations:

      • Switch the region.

      • Change the resource specification of the instance. You cannot modify the specification of a pending instance. You must manually stop the instance before you can change the specification.

      • Start the instance during off-peak hours, such as outside of business hours.

      • If none of these methods resolve the issue, contact your account manager for assistance.

  • The cluster resources are fully utilized. Please try later or other regions.

    • Cause: The computing resources are fully occupied.

    • Solution: Try one of the following operations:

      • Switch the region.

      • Change the resource specification of the instance. You cannot modify the specification of a pending instance. You must manually stop the instance before you can change the specification.

      • Start the instance during off-peak hours, such as outside of business hours.

      • If none of these methods resolve the issue, contact your account manager for assistance.

  • Create ECI failed because the specified instance is out of stock. It is recommended to use the multi-zone creation function to avoid the risk of stockout.

    • Cause: The specified computing resource is out of stock.

    • Solution: Try one of the following operations:

      • Switch the region.

      • Change the resource specification of the instance. You cannot modify the specification of a pending instance. You must manually stop the instance before you can change the specification.

      • Start the instance during off-peak hours, such as outside of business hours.

      • If none of these methods resolve the issue, contact your account manager for assistance.

  • back-off 10s restarting failed container=dsw-notebook pod

    • Cause: The system disk is full. You need to expand the system disk.

      To check system disk usage: [image: system disk usage]

    • Solution: Click Change Configuration and expand the system disk.

      Important

      After the system disk is expanded, it is billed continuously, regardless of whether the instance is running. To stop all billing for the DSW instance, you must delete it. Before you delete the instance, make sure to back up all necessary data.

  • the available zone with vSwitch is out of stock

    • Cause: A VPC was configured when the DSW instance was created. Because the vSwitch in the VPC has a zone attribute, configuring the vSwitch limits the search for computing resources to that zone, which can lead to resource shortages.

    • Solution: Change the configuration of the DSW instance to leave the VPC field empty.

      Note

      To use a VPC, we recommend switching to another zone and creating a new vSwitch and DSW instance. This expands the range of available resources and helps avoid stock shortages.

  • Startup failed with the message "Workspace member not found"

    This error indicates that the account you are using is not a member of the target workspace. Contact the workspace administrator to add your account as a member.
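For the disk-full case in the list above (the `back-off 10s restarting failed container=dsw-notebook pod` error), you can also watch disk usage from a notebook or terminal while the instance is still running, so that you can expand the disk before a restart fails. The following is a minimal sketch that uses the standard library. The default path `/` and the 90% threshold are assumptions for illustration; pass whichever directory is stored on the system disk of your instance.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return the used share of the file system that contains `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def needs_attention(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (assumed) warning threshold."""
    return disk_usage_percent(path) >= threshold_percent
```

Running `needs_attention()` before you stop an instance gives an early warning that the next start might fail because the disk is full.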

Other reasons for startup failure:

  • Instance creation fails due to overdue payments

    If your account has an overdue payment, you cannot create a DSW instance. Vouchers cannot be used to offset the overdue amount. You can log on to the Expenses and Costs console to check for overdue payments.

2. Can I execute a Python file when a DSW instance starts?

No. DSW does not currently support executing Python files on startup.

3. What do I do if I cannot find my DSW instance?

If you cannot find your instance, try switching to different regions and workspaces.

4. What should I do if the DSW page is abnormal or cannot be operated?

If you encounter a blank page, a Notebook that is stuck loading, or a Terminal that does not accept commands, the issue is usually related to your local environment. Try the following troubleshooting steps:

  1. Clear your browser cache and try again.

  2. Use your browser's incognito or private mode to access the page.

  3. Change your network environment. For example, switch from your company's internal network to a mobile hotspot to rule out firewall restrictions.

  4. Try using a different browser, such as Chrome or Firefox.

5. Will data be lost if I stop, restart, change the specification, or change the image of a DSW instance?

  • Stopping or restarting an instance: No data is lost. After an instance is stopped or restarted, all packages installed using pip, code files, and other data stored on the instance disk are retained.

  • Changing the instance specification: No data is lost. Adjusting the instance specification, such as the CPU, memory, or GPU, does not affect the disk data.

  • Changing the instance image: Some data might be lost. Changing the image does not affect mounted datasets or data in OSS, but the content on the system disk might be reset. Therefore, when you change the instance image, you must save your instance data. For example, you can copy or move the data to a dataset or to OSS. For more information, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system.

Instance stop/delete/release

1. How do I release a DSW instance?

On the DSW instance list page, click Stop or Delete for the target instance.

Note: If the system disk was expanded when the DSW instance was created, the system disk is billed continuously, regardless of whether the instance is running. To stop all billing for the DSW instance, you must delete the instance.

2. Why can't I find my DSW instance?

If you cannot find your instance, try switching to different regions and workspaces.

3. How do I release a free trial resource plan?

You do not need to stop or delete free trial resource plans.

4. How do I completely stop billing for a DSW instance? What is the difference between the "Stop" and "Delete" operations?

  • Stop instance: This operation releases the instance's computing resources (CPU and GPU) and pauses billing for them. Note: The expanded system disk continues to be billed.

  • Delete instance: This operation permanently deletes the instance and all its resources, including the system disk. All related billing stops completely.

When to choose which operation:

  • Stop: If you are not using the instance temporarily but want to keep the data and environment for future restarts.

  • Delete: If you no longer need the instance and want to stop all billing. Back up your data before you perform this operation.

5. Why is my DSW instance stuck in the "Stopping" or "Deleting" state and the operation cannot be completed?

Stopping or deleting an instance takes time because the system needs to safely terminate tasks, save the instance state, and reclaim resources. If an instance is unresponsive for a long time, the common reasons include the following:

  • Processes in the instance have not terminated properly.

  • High memory usage prevents the instance from responding to the shutdown command.

If you encounter this situation, wait for a few minutes and then refresh the page. The instance should then show a normal stopped status.

6. Will my data and code be lost after stopping or deleting a DSW instance?

Whether data is retained depends on the operation and the resource group type of the instance.

  • Stop instance:

    Data retention policies vary by resource group type.

    • Public resource group instance: Data is retained on the mounted disk. Note: If the instance is stopped for more than 15 consecutive days, the disk and its data are deleted.

    • Dedicated resource group instance: Data is stored on the instance's system disk. Stopping the instance deletes the data, and it cannot be recovered.

  • Delete instance:

    All data on the system disk is permanently erased and cannot be recovered. Therefore, you must back up all important data before you delete the instance.

7. Why does my running DSW instance stop automatically?

The instance is configured with an idle auto-shutdown policy. This policy is designed to save resources and is enabled by default for free trial instances.

  • Trigger condition: The instance's CPU and GPU usage is below the set threshold for three consecutive hours.

  • Recommended action:

    • Manual stop: To save resources, manually stop the instance when it is not in use. The auto-shutdown policy is not guaranteed to trigger every time.

    • Modify policy: To run long-term tasks, modify or disable this policy. The steps are as follows:

      Modify DSW auto-shutdown policy

      1. Go to the workspace details page and click Workspace Configurations > Scheduling Configurations.

      2. Find the DSW configuration section, where you can modify the DSW shutdown policy and exclusion policy.
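The trigger condition for idle auto-shutdown in this FAQ can be illustrated with a small sketch. This is not how PAI implements the policy: the one-sample-per-hour input, the function name `should_auto_stop`, and the threshold arguments are all assumptions made for the example. Only the three-hour window and the requirement that both CPU and GPU usage stay below the threshold come from this topic.

```python
from typing import Sequence, Tuple

IDLE_WINDOW_HOURS = 3  # consecutive idle time stated in this FAQ


def should_auto_stop(
    hourly_samples: Sequence[Tuple[float, float]],
    cpu_threshold: float,
    gpu_threshold: float,
) -> bool:
    """Decide whether the idle condition holds.

    hourly_samples holds (cpu_percent, gpu_percent) pairs, oldest first,
    one pair per hour (an assumed sampling interval).
    """
    if len(hourly_samples) < IDLE_WINDOW_HOURS:
        return False
    recent = hourly_samples[-IDLE_WINDOW_HOURS:]
    # Both CPU and GPU usage must stay below their thresholds for the whole window.
    return all(cpu < cpu_threshold and gpu < gpu_threshold for cpu, gpu in recent)
```

A single busy hour inside the window resets the condition, which is why a long-running job with steady usage is not stopped under this model.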


8. I have stopped or deleted all my DSW instances, so why does it still show "Running" or why do I receive billing notifications?

This may be due to one of the following common reasons:

  • You may be confusing resource plans with instances. The 'Running' status that you see might refer to a resource plan, such as '250 billable hours per month', not an instance. A resource plan is always active during its validity period, and its status is independent of any instance.

  • The expanded system disk is still being billed. Stopping an instance only pauses billing for computing resources. An expanded system disk continues to incur storage fees.

  • There is a delay in billing. Billing is not in real-time, and a bill might be generated several hours after you use a resource. For example, charges incurred in the morning might not appear on the bill until the afternoon.