
Platform for AI: PS-SMART Multiclass Classification

Last Updated: Feb 26, 2025

A parameter server (PS) processes large numbers of offline and online training jobs. Scalable Multiple Additive Regression Tree (SMART) is an iterative algorithm that implements gradient boosting decision trees (GBDT) on a PS architecture. The PS-SMART Multiclass Classification component of Platform for AI (PAI) supports training jobs with tens of billions of samples and hundreds of thousands of features, and can run training jobs on thousands of nodes. The component also supports multiple data formats and optimization techniques, such as histogram-based approximation.

Limits

The input data of the PS-SMART Multiclass Classification component must meet the following requirements:

  • Data in the target columns must be of a numeric data type. If the data is stored as STRING in the MaxCompute table, it must be converted to numeric values. For example, if the classification target is a string such as Good/Medium/Bad, you must map the strings to 0/1/2.

  • If the data is in the key-value format, feature IDs must be positive integers and feature values must be real numbers. If the feature IDs are of the STRING type, you must use the serialization component to serialize the data. If the feature values are categorical strings, you must perform feature engineering, such as feature discretization, to process the values.

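The label-encoding and key-value requirements described above can be sketched in Python. The mapping and function names below are illustrative, not part of the component's API:

```python
# Map string class labels (e.g. Good/Medium/Bad) to the integer IDs
# {0, 1, ..., n-1} that the component expects in the label column.
LABEL_MAP = {"Good": 0, "Medium": 1, "Bad": 2}  # illustrative mapping

def encode_label(raw: str) -> int:
    return LABEL_MAP[raw]

def parse_kv_features(line: str) -> dict:
    """Validate one sparse feature string such as '1:0.3 3:0.9'.

    Key-value pairs are separated by spaces; a key and its value are
    separated by a colon. Feature IDs must be positive integers and
    feature values must be real numbers.
    """
    features = {}
    for pair in line.split():
        key, value = pair.split(":")
        fid = int(key)
        if fid <= 0:
            raise ValueError(f"feature ID must be a positive integer: {key}")
        features[fid] = float(value)
    return features

print(encode_label("Medium"))            # 1
print(parse_kv_features("1:0.3 3:0.9"))  # {1: 0.3, 3: 0.9}
```

If feature IDs are strings or feature values are categorical, this kind of local validation fails, which is exactly the case where the serialization and feature engineering components mentioned above are needed.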

Usage notes

When you use the PS-SMART Multiclass Classification component, take note of the following items:

  • The PS-SMART Multiclass Classification component supports training jobs that involve hundreds of thousands of features, but such jobs are resource-intensive and time-consuming. GBDT algorithms are best suited to training on continuous features. You can perform one-hot encoding on categorical features to filter out low-frequency features. We recommend that you do not discretize continuous numeric features.

  • The PS-SMART algorithm introduces randomness in several places: data and feature sampling based on data_sample_ratio and fea_sample_ratio, histogram-based approximation, and the merging of local sketches into a global sketch. Tree structures can also vary when a job runs on multiple worker nodes in distributed mode, although the trained models are theoretically equivalent in quality. As a result, you may obtain different results even if you train with the same data and parameters.

  • If you want to accelerate training, you can increase the value of the Cores parameter. The PS-SMART algorithm starts a training job only after all requested resources are provided, so the waiting period increases with the amount of requested resources.

Configure the component

Method 1: Configure the component in the PAI console

Add the PS-SMART Multiclass Classification component on the pipeline page of Machine Learning Designer. Configure the following parameters:

Fields Setting

  • Use Sparse Format: Specifies whether the input data is in the sparse key-value format. If it is, separate key-value pairs with spaces and separate keys and values with colons (:). Example: 1:0.3 3:0.9.

  • Feature Columns: The feature columns selected from the input table for training. If the input data is in the dense format, only columns of the BIGINT and DOUBLE types are supported. If the input data is key-value pairs in the sparse format with numeric keys and values, only columns of the STRING type are supported.

  • Label Column: The label column in the input table. Columns of the STRING type and numeric data types are supported, but the columns must contain only numeric values. For example, in multiclass classification with n classes, valid values are {0,1,2,...,n-1}.

  • Weight Column: The column that contains the weight of each row of samples. Columns of numeric data types are supported.

Parameters Setting

  • Classes: The number of classes for multiclass classification. If you set this parameter to n, valid label values are {0,1,2,...,n-1}.

  • Evaluation Indicator Type: The evaluation metric. Valid values: Multiclass Negative Log Likelihood and Multiclass Classification Error.

  • Trees: The number of trees. The value must be a positive integer. The training duration is proportional to this value.

  • Maximum Decision Tree Depth: The maximum depth of a tree. The default value is 5, which allows up to 32 (2^5) leaf nodes.

  • Data Sampling Ratio: The ratio of data sampled when each tree is built. The sampled data is used to build a weak learner, which accelerates training.

  • Feature Sampling Fraction: The ratio of features sampled when each tree is built. The sampled features are used to build a weak learner, which accelerates training.

  • L1 Penalty Coefficient: Controls the size of leaf-node outputs. A larger value indicates a more even distribution of leaf nodes. Increase this value if overfitting occurs.

  • L2 Penalty Coefficient: Controls the size of leaf-node outputs. A larger value indicates a more even distribution of leaf nodes. Increase this value if overfitting occurs.

  • Learning Rate: The learning rate. Valid values: (0,1).

  • Sketch-based Approximate Precision: The threshold for selecting quantiles when a sketch is built. A smaller value yields more bins. In most cases, the default value 0.03 is used.

  • Minimum Split Loss Change: The minimum loss change required to split a node. A larger value makes node splitting less likely.

  • Features: The number of features or the maximum feature ID. Configure this parameter if you want to assess resource usage.

  • Global Offset: The initial prediction value of all samples.

  • Random Seed: The random seed. The value must be an integer.

  • Feature Importance Type: The type of feature importance. Valid values: Weight (the number of times the feature is used to split a node), Gain (the information gain provided by the feature), and Cover (the number of samples covered by the feature at split nodes).

Tuning

  • Cores: The number of cores. By default, the system determines the value.

  • Memory Size per Core (MB): The memory size of each core. Unit: MB. In most cases, the system determines the value.
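As a quick sanity check on the sizing rules quoted above (a tree of maximum depth d has at most 2^d leaf nodes, and the histogram sketch produces on the order of 1/sketchEps bins), a small Python sketch:

```python
# Sizing rules from the parameter descriptions above: a tree of maximum
# depth d has at most 2**d leaf nodes, and the sketch-based approximation
# produces roughly 1/sketch_eps candidate bins per feature.

def max_leaf_nodes(max_depth: int) -> int:
    return 2 ** max_depth

def approx_bin_count(sketch_eps: float) -> int:
    return round(1.0 / sketch_eps)

print(max_leaf_nodes(5))       # 32, matching the default depth of 5
print(approx_bin_count(0.03))  # roughly 33 bins at the default precision
```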

Method 2: Configure the component by using PAI commands

Use PAI commands to configure the PS-SMART Multiclass Classification component. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

--Training 
PAI -name ps_smart
    -project algo_public
    -DinputTableName="smart_multiclass_input"
    -DmodelName="xlab_m_pai_ps_smart_bi_545859_v0"
    -DoutputTableName="pai_temp_24515_545859_2"
    -DoutputImportanceTableName="pai_temp_24515_545859_3"
    -DlabelColName="label"
    -DfeatureColNames="features"
    -DenableSparse="true"
    -DclassNum="3"
    -Dobjective="multi:softprob"
    -Dmetric="mlogloss"
    -DfeatureImportanceType="gain"
    -DtreeCount="5"
    -DmaxDepth="5"
    -Dshrinkage="0.3"
    -Dl2="1.0"
    -Dl1="0"
    -Dlifecycle="3"
    -DsketchEps="0.03"
    -DsampleRatio="1.0"
    -DfeatureRatio="1.0"
    -DbaseScore="0.5"
    -DminSplitLoss="0";
--Prediction 
PAI -name prediction
    -project algo_public
    -DinputTableName="smart_multiclass_input"
    -DmodelName="xlab_m_pai_ps_smart_bi_545859_v0"
    -DoutputTableName="pai_temp_24515_545860_1"
    -DfeatureColNames="features"
    -DappendColNames="label,features"
    -DenableSparse="true"
    -DkvDelimiter=":"
    -Dlifecycle="28";
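If you generate such commands from a script, it can help to assemble the -D parameters from a dictionary. The helper below is an illustrative sketch, not part of any PAI SDK; it only builds the command string, and submitting it (for example, through the SQL Script component) is a separate step:

```python
# Illustrative helper that assembles a PAI command string from a parameter
# dict, mirroring the shape of the training command above. Parameter names
# and values are taken from this document's example; the helper itself is
# an assumption, not a PAI API.

def build_pai_command(name: str, project: str, params: dict) -> str:
    lines = [f"PAI -name {name}", f"    -project {project}"]
    for key, value in params.items():
        lines.append(f'    -D{key}="{value}"')
    return "\n".join(lines) + ";"

cmd = build_pai_command(
    "ps_smart",
    "algo_public",
    {
        "inputTableName": "smart_multiclass_input",
        "labelColName": "label",
        "featureColNames": "features",
        "enableSparse": "true",
        "classNum": "3",
        "objective": "multi:softprob",
        "treeCount": "5",
    },
)
print(cmd)
```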

Data parameters

  • featureColNames (required): The feature columns selected from the input table for training. If the input data is in the dense format, only columns of the BIGINT and DOUBLE types are supported. If the input data is sparse key-value data with numeric keys and values, only columns of the STRING type are supported.

  • labelColName (required): The label column in the input table. Columns of the STRING type and numeric data types are supported, but the columns must contain only numeric values. For example, valid values are {0,1,2,...,n-1} in multiclass classification, where n is the number of classes.

  • weightCol (optional): The column that contains the weight of each row of samples. Columns of numeric data types are supported.

  • enableSparse (optional; default false): Specifies whether the input data is in the sparse format. Valid values: true and false. If the input data is sparse key-value data, separate key-value pairs with spaces and separate keys and values with colons (:). Example: 1:0.3 3:0.9.

  • inputTableName (required): The name of the input table.

  • modelName (required): The name of the output model.

  • outputImportanceTableName (optional): The name of the table that stores feature importance.

  • inputTablePartitions (optional): The partitions selected from the input table for training. Format: ds=1/pt=1.

  • outputTableName (optional): The generated MaxCompute table. The table stores the model in a binary format that cannot be read directly and can be parsed only by the PS-SMART prediction component.

  • lifecycle (optional; default 3): The lifecycle of the output table.

Algorithm parameters

  • classNum (required): The number of classes for multiclass classification. If you set this parameter to n, valid label values are {0,1,2,...,n-1}.

  • objective (required): The type of the objective function. For multiclass classification training, specify multi:softprob.

  • metric (optional): The evaluation metric type of the training set, which is written to the stdout of the coordinator in the LogView. Valid values: mlogloss, which corresponds to Multiclass Negative Log Likelihood in the console, and merror, which corresponds to Multiclass Classification Error in the console.

  • treeCount (optional; default 1): The number of trees. The training duration is proportional to this value.

  • maxDepth (optional; default 5): The maximum depth of a tree. Valid values: 1 to 20.

  • sampleRatio (optional; default 1.0): The data sampling ratio. Valid values: (0,1]. If you set this parameter to 1.0, no data is sampled.

  • featureRatio (optional; default 1.0): The feature sampling ratio. Valid values: (0,1]. If you set this parameter to 1.0, no features are sampled.

  • l1 (optional; default 0): The L1 penalty coefficient. A larger value indicates a more even distribution of leaf nodes. Increase this value if overfitting occurs.

  • l2 (optional; default 1.0): The L2 penalty coefficient. A larger value indicates a more even distribution of leaf nodes. Increase this value if overfitting occurs.

  • shrinkage (optional; default 0.3): The learning rate. Valid values: (0,1).

  • sketchEps (optional; default 0.03): The threshold for selecting quantiles when a sketch is built. The number of bins is O(1.0/sketchEps). A smaller value yields more bins. In most cases, the default value is used. Valid values: (0,1).

  • minSplitLoss (optional; default 0): The minimum loss change required to split a node. A larger value makes node splitting less likely.

  • featureNum (optional): The number of features or the maximum feature ID. Configure this parameter if you want to assess resource usage.

  • baseScore (optional; default 0.5): The initial prediction value of all samples.

  • randSeed (optional): The random seed. The value must be an integer.

  • featureImportanceType (optional; default gain): The type of feature importance. Valid values: weight (the number of times the feature is used to split a node), gain (the information gain provided by the feature), and cover (the number of samples covered by the feature at split nodes).

Tuning parameters

  • coreNum (optional; default automatically allocated): The number of cores used in computing. The computing speed increases with the value of this parameter.

  • memSizePerCore (optional; default automatically allocated): The memory size of each core. Unit: MB.

PS-SMART model deployment

If you want to deploy the model generated by the PS-SMART Multiclass Classification component to EAS as an online service, you must add the Model export component as a downstream node of the PS-SMART Multiclass Classification component and configure it. For more information, see Model export.

After the Model export component runs successfully, you can deploy the generated model to EAS as an online service on the EAS-Online Model Services page. For more information, see Deploy a model service in the PAI console.