
Platform for AI: Overview of Designer components

Last Updated: Jun 05, 2025

Recommended algorithm components

Recommended algorithm components include common general algorithms (such as data reading algorithms, SQL scripts, and Python scripts), large model data processing algorithms (such as LLM data processing and LVM data processing), and LLM training and inference algorithms. We recommend DLC-based algorithm components, which support heterogeneous resources and user-defined environments for more flexible usage.

Type

Component

Description

Custom components

Custom component

You can create custom components in AI computing asset management and then use them together with official components in Designer.

Data source/target

Read File Data

Reads files or directories from Object Storage Service (OSS) buckets.

Read CSV File

Reads CSV files from OSS, HTTP, and HDFS.

Read Table

Reads data from MaxCompute tables. By default, it reads tables in the current project.

Write Table

Writes upstream data to MaxCompute.

User-defined script

SQL Script

A custom SQL component that allows you to write SQL statements in an editor and submit them to MaxCompute for execution.

Python Script

Defines dependencies and runs custom Python functions.
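
For orientation, the following is a minimal sketch of the kind of custom function such a script might run. The pandas-based I/O and the main entry point are illustrative assumptions, not the component's actual interface.

```python
# Hypothetical custom script: read upstream data, transform it, write it downstream.
# The file paths and pandas-based I/O are illustrative assumptions.
import pandas as pd

def main(input_path: str, output_path: str) -> None:
    df = pd.read_csv(input_path)         # read the upstream data
    df = df.dropna()                     # example transformation: drop rows with missing values
    df.to_csv(output_path, index=False)  # write results for downstream components

if __name__ == "__main__":
    main("input.csv", "output.csv")
```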

Tools

Dataset Register

Registers datasets to AI asset management.

Model Register

Registers models to AI asset management.

Update EAS Service (Beta)

Calls eascmd to update the specified EAS service. The service to be updated must be in the running state. Each update creates a new service version.

Large model data preprocessing

Data conversion

Export MaxCompute Table to OSS

Exports MaxCompute tables to OSS.

Export OSS Data to MaxCompute Table

Imports data from OSS to MaxCompute tables.

LLM data processing (DLC)

LLM-MD5 Deduplicator (DLC)

Calculates the MD5 hash values of text and deduplicates text based on those hash values.
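
As an illustration of the technique (not the component's implementation), MD5-based deduplication can be sketched in a few lines of Python:

```python
import hashlib

def md5_dedup(texts):
    """Keep the first occurrence of each distinct text, compared by MD5 hash."""
    seen, kept = set(), []
    for text in texts:
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept

print(md5_dedup(["hello", "world", "hello"]))  # ['hello', 'world']
```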

LLM-Text Normalizer (DLC)

Normalizes Unicode text and converts traditional Chinese to simplified Chinese.

LLM-Clean Special Content (DLC)

Removes URLs from text. It can also remove HTML format characters and parse HTML text.

LLM-Special Characters Ratio Filter (DLC)

Filters samples based on the proportion of special characters, keeping samples within the specified ratio range.

LLM-Clean Copyright Information (DLC)

Deletes copyright information from text, often used to remove header copyright comments from code text.

LLM-Count Filter (DLC)

Filters samples based on the ratio of numbers and alphabetic characters.

LLM-Length Filter (DLC)

Filters samples based on text length, average length, maximum line length, etc.

LLM-Quality Predict and Language Recognition-FastText (DLC)

Identifies the language of a text and calculates a score, then filters samples based on language and score.

LLM-Sensitive Keywords Filter (DLC)

Filters out samples containing sensitive words.

LLM-Sensitive Content Mask (DLC)

Masks sensitive information, such as replacing email addresses with [EMAIL], phone/telephone numbers with [TELEPHONE] or [MOBILEPHONE], and ID card numbers with [IDNUM].
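
The masking technique can be sketched as follows; the regular expressions are simplified assumptions, and the component's actual patterns and coverage may differ:

```python
import re

# Simplified, illustrative patterns; real-world masking needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[- ]?\d{4}[- ]?\d{4}\b")

def mask_sensitive(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[TELEPHONE]", text)
    return text

print(mask_sensitive("Contact me at a.b@example.com or 138 1234 5678."))
# Contact me at [EMAIL] or [TELEPHONE].
```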

LLM-Document Deduplicator (DLC)

Calculates similarity between texts using the SimHash algorithm to achieve text deduplication.
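
A minimal sketch of the SimHash idea, assuming character 3-gram features (the component's feature extraction and similarity threshold may differ). Near-duplicate texts yield fingerprints with a small Hamming distance:

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint from character 3-grams."""
    v = [0] * bits
    grams = [text[i:i + 3] for i in range(max(len(text) - 2, 1))]
    for gram in grams:
        h = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % (1 << bits)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumped over the lazy dog")
print(hamming(a, b))  # small distance for near-duplicates
```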

LLM-N-Gram Repetition Filter (DLC)

Keeps samples with character-level or word-level N-Gram repetition ratios within the specified range.
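
One common way to compute a character-level repetition ratio is sketched below; the component's exact formula may differ:

```python
from collections import Counter

def ngram_repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of n-gram occurrences that belong to n-grams appearing more than once."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)

# A sample is kept only if this ratio falls inside the configured range.
print(ngram_repetition_ratio("ababababab"))  # 1.0 -- highly repetitive
```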

LLM-LaTeX Expand Macro (DLC)

Used for TeX-format documents. It performs inline expansion of all parameterless macros: if a macro consists of letters and numbers and takes no parameters, each occurrence of the macro name is replaced with the macro value.

LLM-LaTeX Remove Bibliography (DLC)

Used for TeX-format documents. It deletes the bibliography at the end of LaTeX-formatted text.

LLM-LaTeX Remove Comments (DLC)

Used for TeX-format documents. It deletes comment lines and inline comments from LaTeX-formatted text.

LLM-LaTeX Remove Header (DLC)

Used for TeX-format documents. It finds the first string matching the <section-type>[optional-args]{name} heading pattern and deletes all content before it, keeping everything from the first matched heading onward, including the heading itself.
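
The header-removal idea can be sketched with a regular expression; the pattern below is a simplified assumption, not the component's actual matching rule:

```python
import re

# Match the first \part/\chapter/\section-style heading and drop everything before it.
HEADING = re.compile(r"\\(?:part|chapter|section|subsection)\*?(?:\[[^\]]*\])?\{[^}]*\}")

def remove_header(tex: str) -> str:
    m = HEADING.search(tex)
    return tex[m.start():] if m else tex

doc = r"\documentclass{article}\begin{document}Preamble text.\section{Intro}Body."
print(remove_header(doc))  # \section{Intro}Body.
```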

LLM data processing (MaxCompute)

LLM-MD5 Deduplicator (MaxCompute)

Used for text data preprocessing for LLMs. It calculates MD5 hash values of text and deduplicates text based on hash values.

LLM-Text Normalizer (MaxCompute)

Used for text data preprocessing for LLMs. It normalizes Unicode text and converts traditional Chinese to simplified Chinese.

LLM-Clean Special Content (MaxCompute)

Used for text data preprocessing for LLMs. It removes special content from text, such as navigation information, author information, article source information, URL links, and invisible characters. It can also remove HTML formatting characters and parse HTML text.

LLM-Special Character Ratio Filter (MaxCompute)

Used for text data preprocessing for LLMs. It filters samples based on special character ratio, keeping samples where the proportion of special characters to total text length is within the specified range.

LLM-Clean Copyright Information (MaxCompute)

Used for text data preprocessing for LLMs. It deletes copyright information from text, often used to remove header copyright comments from code text.

LLM-Count Filter (MaxCompute)

Used for text data preprocessing for LLMs. It filters samples based on the count of letters, numbers, and separators.

LLM-Length Filter (MaxCompute)

Used for text data preprocessing for LLMs. It filters samples based on text length, average length, maximum line length, and so on. By default, average-length and maximum-line-length filtering split the text into lines before calculating statistics.

LLM-Text Quality Predict and Language Identification-FastText (MaxCompute)

Used for text data preprocessing for LLMs. It identifies the language of text and calculates scores, and can filter samples based on language and score.

LLM-Sensitive Keywords Filter (MaxCompute)

Used for text data preprocessing for LLMs. It filters out samples containing sensitive words.

LLM-Sensitive Content Mask (MaxCompute)

Used for text data preprocessing for LLMs. It masks sensitive information, such as replacing email addresses with [EMAIL], phone/telephone numbers with [TELEPHONE] or [MOBILEPHONE], and ID card numbers with [IDNUM].

LLM-Sentence Deduplicator

Used for text data preprocessing for LLMs. It deduplicates sentences within an article.

LLM-N-Gram Repetition Filter (MaxCompute)

Used for text data preprocessing for LLMs. It keeps samples with character-level or word-level N-Gram repetition ratios within the specified range.

LLM-LaTeX Expand Macro (MaxCompute)

Used for text data preprocessing for LLMs, suitable for TeX-format documents. It performs inline expansion of all parameterless macros: if a macro consists of letters and numbers and takes no parameters, each occurrence of the macro name is replaced with the macro value.

LLM-LaTeX Remove Bibliography (MaxCompute)

Used for text data preprocessing for LLMs, suitable for TeX-format documents. It deletes the bibliography at the end of LaTeX-formatted text.

LLM-LaTeX Remove Comments (MaxCompute)

Used for text data preprocessing for LLMs, suitable for TeX-format documents. It deletes comment lines and inline comments from LaTeX-formatted text.

LLM-LaTeX Remove Header (MaxCompute)

Used for text data preprocessing for LLMs, suitable for TeX-format documents. It finds the first string matching the <section-type>[optional-args]{name} heading pattern and deletes all content before it, keeping everything from the first matched heading onward, including the heading itself.

LVM data processing (DLC)

Video data preprocessing

LVM-Text-Ratio Filter (DLC)

Filters video data with excessive text. It is particularly suitable for video editing and content moderation scenarios, helping you automatically identify and process video segments that contain too much text.

LVM-Motion Filter (DLC)

Filters video data with too fast or too slow motion.

LVM-Aesthetic Filter (DLC)

Filters video data with low aesthetic scores.

LVM-Aspect-Ratio Filter (DLC)

Filters video data with aspect ratios that are too large or too small.

LVM-Duration Filter (DLC)

Filters video data with durations that are too long or too short.

LVM-Text-Frame-Similarity Filter (DLC)

Filters video data with low text-frame similarity scores.

LVM-NSFW Filter (DLC)

Filters video data with high NSFW scores.

LVM-Resolution Filter (DLC)

Filters video data with resolutions that are too high or too low.

LVM-Watermark Filter (DLC)

Filters video data with watermarks.

LVM-Tag Filter (DLC)

Filters video data that does not match specified tags.

LVM-Tag Mapper (DLC)

Calculates tags for video frames.

LVM-Caption-Frames Mapper (DLC)

Generates text for videos.

LVM-Caption-Video Mapper (DLC)

Generates text for videos.

Image data preprocessing

LVM-Image-Aesthetic Filter (DLC)

Filters image data with low aesthetic scores.

LVM-Image-Aspect-Ratio Filter (DLC)

Filters image data with aspect ratios that are too large or too small.

LVM-Image-Face-Ratio Filter (DLC)

Filters image data with face proportions that are too large or too small.

LVM-Image-NSFW Filter (DLC)

Filters image data with high NSFW scores.

LVM-Image-Shape Filter (DLC)

Filters image data with resolutions that are too high or too low.

LVM-Image-Size Filter (DLC)

Filters image data that is too large or too small.

LVM-Image-Text-Matching Filter (DLC)

Filters image data with low text-image match scores.

LVM-Image-Text-Similarity Filter (DLC)

Filters image data with low text-image similarity scores.

LVM-Image-Watermark Filter (DLC)

Filters image data with watermarks.

LVM-Image-Caption Mapper (DLC)

Generates natural language descriptions for input images.

Large model training and inference

LLM Model Training

Supports training some of the LLMs from PAI-Model Gallery.

LLM Model Inference

Supports some of the LLMs from PAI-Model Gallery, converting online inference to offline batch inference.

PAI BERT Model Inference

Used for BERT model offline inference, utilizing trained BERT classification models to classify text in the input table.

Traditional algorithm components

Important

Traditional algorithm components are early-generation algorithms that have not been updated for a long time, and their stability cannot be guaranteed. If you must use them in a production environment, evaluate their applicability first. If they are already used in production, replace them with the recommended components as soon as possible.

Type

Component

Description

Data preprocessing

Random Sampling

Performs random independent sampling on the input according to a given proportion or number.

Weighted Sampling

Generates sampling data based on the values of weighted columns.

Filtering and Mapping

Filters data based on expressions, and you can modify the output field names.

Stratified Sampling

Given one or more grouping columns, it divides the input data into groups based on the values of those columns and performs random sampling separately within each group.

JOIN

Merges two tables by associating the columns in the tables and determines the output fields. It works like the JOIN statement of SQL.

Merge Columns

Merges two tables by column. The two tables must have the same number of rows; otherwise, an error occurs. If only one of the two tables is partitioned, the partitioned table must be connected to the second input port.

Merge Rows (UNION)

Merges two tables by row. The numbers and data types of the output fields selected from the left and right tables must be the same. This component integrates the features of UNION and UNION ALL.

Data Type Conversion

Converts features of any data type to STRING, DOUBLE, and INT features, and supports missing value filling when conversion exceptions occur.

Append ID Column

Appends an ID column to the first column of a data table.

Split

Randomly splits data to generate training and test datasets.

Missing Data Imputation

Handles missing data in datasets. You can configure the parameters of this component in the console or PAI commands.

Normalization

Normalizes dense data or sparse data.

Standardization

Standardizes data. You can configure this component in the console or by running PAI commands.

KV to Table

Converts a table in KV (Key:Value) format into a standard table format.
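
For illustration, a minimal sketch of the expansion, assuming `,` separates pairs and `:` separates keys from values (the component's separators are configurable and may differ):

```python
import pandas as pd

def kv_to_table(rows, sep=",", kv_sep=":"):
    """Expand KV strings such as 'f1:1,f2:0.5' into one column per key."""
    records = []
    for row in rows:
        record = {}
        for pair in row.split(sep):
            key, value = pair.split(kv_sep, 1)
            record[key] = float(value)
        records.append(record)
    return pd.DataFrame(records).fillna(0)

print(kv_to_table(["f1:1,f2:0.5", "f2:2,f3:7"]))
```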

Table to KV

Converts a standard table into a KV (Key:Value) format table. You can configure this component in the console or by running PAI commands.

Feature engineering

Feature Importance Filtering

Provides filtering functionality for components such as linear feature importance, GBDT feature importance, and random forest feature importance, supporting the filtering of TopN features.

Principal Component Analysis (PCA)

A multivariate statistical method that studies how to reveal the internal structure between multiple variables through a small number of principal components, examining the correlation among multiple variables.

Feature Scaling

Performs common scaling transformations on numeric features in dense or sparse format.

Feature Discretization

Discretizes continuous features based on a specific rule.

Feature Softening

Smooths anomalous data in input features to a specific interval, supporting both sparse and dense data formats.

Singular Value Decomposition (SVD)

An important matrix decomposition in linear algebra, which is a generalization of the diagonalization of normal matrices in matrix analysis.

Anomaly Detection

Detects data with continuous and enumeration features. It helps you identify anomalous points in your data.

Linear Model Feature Importance

Includes linear regression and binary logistic regression, and supports both sparse and dense data formats.

Discrete Feature Analysis

Analyzes the distribution of discrete features.

Random Forest Feature Importance Evaluation

Calculates feature importance.

Feature Selection (Filter Method)

Selects and filters the top N feature data from all sparse or dense format feature data based on the different feature selection methods you use.

Feature Encoding

Encodes nonlinear features into linear features through GBDT.

One Hot Encoding

Converts data into sparse data, and the output result is also in a sparse key-value structure.

Statistical analysis

Data Pivoting

Helps you visually understand the distribution of features and label columns along with the characteristics of features, which facilitates subsequent data analysis.

Covariance

Measures the joint variability of two variables.

Empirical Probability Density Chart

Uses empirical distribution and kernel distribution algorithms to estimate the probability density of sample data.

Whole Table Statistics

Collects statistics about all data in a table or only the selected columns.

Chi-square Goodness of Fit Test

Used in scenarios where variables are categorical variables. It aims to test whether the actual observed frequency and theoretical frequency are consistent across classifications of a single multinomial categorical variable. The null hypothesis is that there is no difference between the observed frequency and theoretical frequency.

Box Plot

A box plot is a statistical chart used to display the dispersion of a set of data. It mainly reflects the distribution characteristics of the original data and can also be used to compare the distribution characteristics of multiple data sets.

Scatter Plot

In regression analysis, a scatter plot shows the distribution of data points in a Cartesian coordinate system.

Correlation Coefficient Matrix

The correlation coefficient algorithm calculates the correlation coefficient between each pair of columns in a matrix, with values in the range [-1,1]. For each column pair, the calculation uses only the rows where both columns are non-empty, so the effective sample size may differ between column pairs.
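
This pairwise behavior mirrors how pandas computes correlations, as the small example below shows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, np.nan, 6.0, 8.0],  # one missing value
    "c": [1.0, 0.0, np.nan, 1.0],
})

# Each pairwise coefficient is computed over the rows where both columns are
# non-null, so different column pairs may use different numbers of rows.
print(df.corr())
```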

Two Sample T Test

Based on statistical principles, it tests whether there is a significant difference between the means of two samples.

One Sample T Test

Tests whether there is a significant difference between the overall mean of a variable and a specified value. The sample being tested must follow a normal distribution overall.

Normality Test

Determines whether the population follows normal distribution by using observations. It is an important special type of goodness-of-fit hypothesis test in statistical decision-making.

Lorenz Curve

Helps you see the income distribution of a country or region.

Percentile

A statistical term used to calculate the percentile of column data in a data table.

Pearson Coefficient

A linear correlation coefficient that measures the linear correlation between two variables.

Histogram

Also known as a mass distribution chart, a histogram is a statistical graph that uses a series of vertical bars or line segments of varying heights to represent the data distribution.

Machine learning

Prediction

Uses the training model and prediction data as input and generates prediction results as output.

XGBoost Train

An extension and upgrade of the boosting algorithm with better usability and robustness, widely used in machine learning production systems and competitions. It currently supports classification and regression.

XGBoost Predict

An extension and upgrade of the boosting algorithm with better usability and robustness, widely used in machine learning production systems and competitions. It currently supports classification and regression.

Linear SVM

A machine learning method based on statistical learning theory. It improves the generalization ability of the learning machine by seeking structural risk minimization, thereby achieving minimization of empirical risk and confidence range.

Logistic Regression for Binary Classification

A binary classification algorithm that supports both sparse and dense data formats.

GBDT Binary Classification

This component works by setting a threshold. If the feature value is greater than the threshold, it is classified as a positive sample. Otherwise, it is classified as a negative sample.

PS-SMART Binary Classification Training

PS (Parameter Server) is dedicated to solving large-scale offline and online training tasks. SMART (Scalable Multiple Additive Regression Tree) is an iterative GBDT (Gradient Boosting Decision Tree) algorithm implemented on PS.

PS Logistic Regression for Binary Classification

A classic binary classification algorithm widely used in advertising and search scenarios.

PS-SMART Multiclass Classification

PS is dedicated to solving large-scale offline and online training tasks. SMART is an iterative GBDT algorithm implemented on PS.

K-NN

For each row in the prediction table, selects the K closest records from the training table and assigns the majority class among those K records as the class of that row.

Logistic Regression for Multiclass Classification

A classification algorithm. The logistic regression provided by PAI supports multiclass classification and both sparse and dense data formats.

Random Forest

A classifier that consists of multiple decision trees. The classification result is determined by the mode of output classes of individual trees.

Naive Bayes

A probabilistic classification algorithm based on Bayes' theorem with independence assumptions.

K-means Clustering

Randomly selects K objects as the initial cluster centers, calculates the distance between each remaining object and each cluster center, assigns each object to the nearest cluster, and then recalculates the center of each cluster, repeating until the assignment stabilizes.
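
The procedure described above is Lloyd's algorithm; a minimal NumPy sketch (illustrative, not the component's implementation):

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned points.
        centers = np.array([points[labels == j].mean(0) for j in range(k)])
    return labels, centers

pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(pts, k=2)
print(labels)  # the two tight groups fall into separate clusters
```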

DBSCAN

Builds clustering models.

GMM Training

Trains a Gaussian mixture model (GMM) for clustering.

DBSCAN Prediction

Predicts the cluster to which new point data belongs based on the DBSCAN training model.

GMM Prediction

Performs clustering prediction based on trained Gaussian mixture models.

GBDT Regression

An iterative decision tree algorithm, suitable for linear and nonlinear regression scenarios.

Linear Regression

A model that analyzes the linear relationship between a dependent variable and multiple independent variables.

PS-SMART Regression

Solves large-scale offline and online training tasks. SMART is an iterative algorithm implemented based on PS for GBDT.

PS Linear Regression

A model that analyzes the linear relationship between a dependent variable and multiple independent variables. The PS is dedicated to solving large-scale offline and online training tasks.

Binary Classification Evaluation

Calculates AUC, KS, and F1 score metrics, and generates KS curves, PR curves, ROC curves, lift charts, and gain charts.
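
As an illustration, the KS statistic is the maximum gap between the cumulative score distributions of positive and negative samples (a sketch, not the component's implementation):

```python
import numpy as np

def ks_statistic(labels, scores):
    """KS = max gap between cumulative score distributions of the two classes."""
    order = np.argsort(scores)[::-1]  # sort samples by descending score
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / labels.sum()            # cumulative positive rate
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()  # cumulative negative rate
    return float(np.max(np.abs(tpr - fpr)))

print(ks_statistic([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]))  # ~0.667
```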

Regression Model Evaluation

Evaluates the quality of regression algorithm models based on prediction results and original results, and outputs evaluation metrics and residual histograms.

Clustering Model Evaluation

Evaluates the quality of clustering models based on the original data and clustering results, and outputs evaluation metrics.

Confusion Matrix

Suitable for supervised learning and corresponds to the matching matrix in unsupervised learning.

Multiclass Classification Evaluation

Evaluates the quality of multiclass classification models based on their prediction results and the original results, and outputs evaluation metrics (such as Accuracy, Kappa, and F1-Score).

Deep learning

Enable deep learning

PAI supports deep learning frameworks. You can use these frameworks and hardware resources to implement deep learning algorithms.

Time series

x13_arima

An ARIMA algorithm for seasonal adjustment based on the open-source X-13ARIMA-SEATS package.

x13_auto_arima

Includes an automatic ARIMA model selection procedure, largely based on the procedure by Gomez and Maravall (1998) implemented in TRAMO (1996) and subsequent revisions.

Prophet

Performs Prophet time series prediction on each row of MTable data and provides prediction results for the next time period.

MTable Assembler

Aggregates a table into an MTable based on grouping columns.

MTable Expander

Expands an MTable into a table.

Recommendation

FM algorithm

The FM (Factorization Machine) algorithm takes into account the interactions between features. It is a nonlinear model suitable for recommendation scenarios in e-commerce, advertising, and live streaming.

ALS Matrix Factorization

The Alternating Least Squares (ALS) algorithm factorizes sparse matrices, estimates the values of missing entries, and obtains the basic training model.

Swing Train

An item recall algorithm. You can use it to measure item similarity based on the User-Item-User principle.

Swing Recommendation

A batch processing prediction component for Swing. You can use it to perform offline prediction based on the Swing training model and prediction data.

Collaborative Filtering (etrec)

etrec is an item-based collaborative filtering algorithm that takes two columns as input and outputs the top N similar items.

Vector-based Recall Evaluation

Calculates the hit rate of recall results. Hit rate measures result quality: a higher hit rate indicates that the vectors produced by training yield more accurate recall results.

Outlier detection

LOF Outlier

Determines whether samples are abnormal based on the Local Outlier Factor (LOF) values of data samples.

iForest Outlier

Uses a sub-sampling algorithm that reduces computational complexity. It identifies anomalous points in data and is widely applied in anomaly detection.

One-Class SVM Outlier

Unlike traditional SVM, it is an unsupervised learning algorithm. You can use it to detect anomalous points by learning a boundary around normal samples.

Natural Language Processing

Text Summarization Predict

Extracts, refines, or summarizes key information from lengthy and repetitive text sequences. News headline summarization is a special case of text summarization. You can use it to call a specified pre-trained model to predict news text, thereby generating news headlines.

Machine Reading Comprehension Predict

Performs offline prediction with the generated machine reading comprehension training model.

Text Summarization

Extracts, refines, or summarizes key information from lengthy and repetitive text sequences. News headline summarization is a special case of text summarization. You can use it to train models that generate news headlines, which summarize the central ideas and key information of news articles.

Machine Reading Comprehension Training

Trains a machine reading comprehension model that quickly understands and answers questions based on given documents.

Word Splitting

Based on the AliWS (Alibaba Word Segmenter) lexical analysis system, it performs word segmentation on the content of specified columns, separating the segmented words with spaces.

Convert Row, Column, and Value to KV Pair

Converts a triple table (row, col, value) into a KV table (row, [col_id:value]).

String Similarity

A basic operation in the field of machine learning, mainly used in information retrieval, natural language processing, and bioinformatics.

String Similarity - top N

Calculates string similarity and filters out the top N most similar data.

Deprecated Word Filter

A preprocessing method in text analysis used to filter noise words in word segmentation results (such as "of", "is", or "ah").

N-gram Counting

A step in language model training. It generates n-grams based on words and counts the number of corresponding n-grams across the entire corpus.

Text Summarization

A summary is a short, coherent text that comprehensively and accurately reflects the central idea of a document. Automatic text summarization uses computers to extract such summary content from original documents.

Keyword Extraction

An important technique in natural language processing that extracts words strongly relevant to the meaning of an article.

Sentence Splitting

Splits text into sentences based on punctuation marks. It is primarily used for pre-processing before text summarization, converting a paragraph of text into a format where each sentence appears on a separate line.

Semantic Vector Distance

Based on semantic vector results from algorithms (such as word embeddings generated by Word2Vec), it calculates extension words (or extension sentences) for given words (or sentences) by finding the set of vectors with the closest distance to a particular vector. One application is to return a list of the most similar words based on word embeddings generated by Word2Vec, according to the input word.

Doc2Vec

Maps documents to vectors. The input is a vocabulary; the output is a document vector table, a word vector table, or a vocabulary.

Conditional Random Field

A probability distribution model of a group of output random variables under the condition of a given group of input random variables. Its characteristic is that it assumes the output random variables constitute a Markov random field.

Document Similarity

Calculates the similarity between pairs of articles or sentences based on words, building upon string similarity.

PMI

Counts the co-occurrence of all words across multiple articles and calculates the PMI (pointwise mutual information) between each pair of words.
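
PMI between words x and y is defined as log(p(x, y) / (p(x) * p(y))). A small sketch that estimates it from document-level co-occurrence counts (the component's counting granularity may differ):

```python
import math
from collections import Counter
from itertools import combinations

docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["bird"]]
n = len(docs)

word_counts = Counter(w for doc in docs for w in set(doc))
pair_counts = Counter(p for doc in docs for p in combinations(sorted(set(doc)), 2))

def pmi(x, y):
    """PMI(x, y) = log( p(x, y) / (p(x) * p(y)) )."""
    pxy = pair_counts[tuple(sorted((x, y)))] / n
    return math.log(pxy / ((word_counts[x] / n) * (word_counts[y] / n)))

print(round(pmi("cat", "dog"), 3))  # 0.288: they co-occur more than chance predicts
```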

Conditional Random Field Prediction

An algorithm component based on the linearCRF online prediction model, mainly used for processing sequence labeling problems.

Word Splitting (Generate Models)

Developed based on AliWS, it generates a word segmentation model based on parameters and custom dictionaries.

Word Frequency Statistics

Based on input strings (manually entered or read from a specified file), it counts the total number of words and the number of occurrences of each word.

TF-IDF

A commonly used weighting technique for information retrieval and text mining. It is typically applied in search engines and can be used as a measure or rating of the relevance between documents and user queries.
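
The standard weighting can be sketched as follows; the component's exact normalization may differ:

```python
import math
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)
doc_freq = Counter(w for doc in tokenized for w in set(doc))

def tf_idf(word, doc_tokens):
    """tf-idf = term frequency * log(N / document frequency)."""
    tf = doc_tokens.count(word) / len(doc_tokens)
    idf = math.log(n_docs / doc_freq[word])
    return tf * idf

print(round(tf_idf("cat", tokenized[0]), 3))  # rare across docs -> higher weight
print(round(tf_idf("the", tokenized[0]), 3))  # common across docs -> lower weight
```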

PLDA

Abstracts topics from each document. You set the topic parameter of the PLDA component to control the number of topics.

Word2Vec

Uses neural networks to map words to vectors in a K-dimensional space through training, and supports semantic operations on the resulting word vectors. The input is a word column or vocabulary, and the output is a word vector table and a vocabulary.

Network analysis

Tree Depth

Outputs the depth and tree ID of each node.

K-Core

Finds closely associated subgraph structures in a graph that meet the specified core degree. The maximum value of a node's core number is called the core number of the graph.

Single-source Shortest Path

Uses the Dijkstra algorithm to generate the shortest paths between a given node and all other nodes.

Page Rank

Originated from web search ranking, it uses the link structure of web pages to calculate the ranking of each web page.

Label Propagation Clustering

A graph-based semi-supervised learning method. Its basic principle is that a node's label (community) depends on the label information of its adjacent nodes, with the degree of influence determined by node similarity, and stability is achieved through propagation and iterative updates.

Label Propagation Classification

A semi-supervised classification algorithm that uses the label information of labeled nodes to predict the label information of unlabeled nodes.

Modularity

A metric used to evaluate community network structures, which assesses the cohesiveness of communities divided within the network structure. Generally, values above 0.3 indicate a relatively distinct community structure.

Maximum Connected Subgraph

In an undirected graph G, vertices A and B are connected if there is a path between them. Graph G may contain several subgraphs in which all vertices are connected to one another, while no vertices in different subgraphs are connected; these subgraphs are the maximal connected subgraphs of G.

Vertex Clustering Coefficient

Calculates the density around each node in an undirected graph G. The density of a star network is 0, and the density of a fully connected network is 1.

Edge Clustering Coefficient

Calculates the density around each edge in an undirected graph G.

Counting Triangle

Outputs all triangles in an undirected graph G.

Financials

Data Conversion Module

Performs normalization, discretization, indexation, or WOE conversion on data.

Scorecard Training

A commonly used modeling tool in credit risk assessment. It discretizes raw variables through binning, then trains a linear model (such as logistic regression or linear regression). It supports features such as feature selection and score transformation.

Scorecard Prediction

Scores the raw data based on the model results produced by the scorecard training component.

Binning

Performs feature discretization, which segments continuous data into multiple discrete intervals. It supports equal-frequency binning, equal-width binning, and automatic binning.

Population Stability Index

An important indicator for measuring the shift caused by sample changes, commonly used to measure the stability of samples.
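
The usual PSI formula sums (actual% - expected%) * ln(actual% / expected%) over score bins; a minimal sketch is shown below. A common rule of thumb reads PSI below 0.1 as stable, though thresholds vary by use case.

```python
import math

def psi(expected, actual):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    e_total, a_total = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        e_pct, a_pct = e / e_total, a / a_total
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total

# Bin counts of a score distribution at training time vs. in production today.
print(round(psi([200, 300, 500], [250, 300, 450]), 4))  # 0.0164 -> stable
```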

Visual algorithms

Image classification (torch)

Trains image classification models for inference.

Video classification

Trains video classification models for inference.

Object detection (easycv)

Builds object detection models that identify and frame high-risk entities in images.

Image self-supervised learning

Directly trains raw unlabeled images to obtain a model for image feature extraction.

Image metric learning (raw)

Builds a metric learning model for model inference.

pose detection

If your business scenario involves human-related keypoint detection, you can use this component to build a keypoint model for inference.

model quantize

Provides mainstream model quantization algorithms. You can use it to compress and accelerate models, achieving high-performance inference.

model prune

Provides the mainstream model pruning algorithm AGP (taylorfo). You can use it to compress and accelerate models, achieving high-performance inference.

Tools

OfflineModel components

A data structure stored in MaxCompute. Models generated by traditional machine learning algorithms based on the PAICommand framework are stored in offline model format in the corresponding MaxCompute project. You can use offline model-related components to obtain offline models for offline prediction.

Model export

Exports models trained in MaxCompute to a specified OSS path.

Custom scripts

PyAlink Script

Calls Alink's classification algorithms for classification, regression algorithms for regression, recommendation algorithms for recommendation, and more. The PyAlink Script component also integrates seamlessly with other algorithm components in Designer, so you can build workflows and verify their effectiveness.

Time Window SQL

Adds multi-date loop execution functionality on top of the regular SQL script component, used for parallel execution of daily SQL tasks within a specific time period.

Beta components

Lasso Regression Training

A shrinkage estimation algorithm.

Lasso Regression Prediction

Supports both sparse and dense data formats. You can use this component to predict numeric variables, such as loan amount prediction, temperature prediction, etc.

Ridge Regression Prediction

Predicts numeric variables, including housing price prediction, sales volume prediction, humidity prediction, etc.

Ridge Regression Training

The most commonly used regularization method for regression analysis of ill-posed problems.