Build and debug a model

Machine Learning Designer provides a variety of modeling components for building and debugging models visually and flexibly. This topic uses a heart disease prediction model as an example to demonstrate how to build and debug a model in a Machine Learning Designer pipeline.

Prerequisites

You must create a pipeline. For more information, see Create a pipeline.

Build a model

Model building typically involves splitting the overall process into multiple smaller tasks, where each task is represented by a node (component). You can orchestrate these nodes in a pipeline to achieve the desired outcome. As a best practice, each node should perform a single, simple task. The general process for model building is as follows:

In the component list on the left, search for or browse to the component you want, and then drag it to the canvas.
In the component list, components marked in purple are Alink components, such as the Read CSV File component in the following figure. In addition to their standard usage, Alink components also support aggregation into groups. Configuring resources for a group can improve execution efficiency and resource utilization. For more information, see Alink components.
Click the target node and configure its parameters in the pane on the right.
Connect the nodes to form a pipeline with upstream and downstream relationships.
Each node has one or more input/output ports. You can hover the mouse pointer over a component's port to view its data type. Connect the nodes based on their data types.

When the model runs, upstream nodes execute first. A downstream node is triggered only after all its upstream nodes have completed.
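The execution rule above is a topological ordering of the pipeline graph. The following Python sketch (3.9 or later, standard library only) illustrates the idea; it is not how Machine Learning Designer is implemented, and the node names are hypothetical labels chosen to mirror the task modules in this topic.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of upstream nodes it waits for.
# The prediction node has two upstream nodes (the model and the pre-processed data).
upstream = {
    "read_data_table": set(),
    "sql_script": {"read_data_table"},
    "logistic_regression": {"sql_script"},
    "prediction": {"logistic_regression", "sql_script"},
    "evaluation": {"prediction"},
}


def run_order(upstream_map):
    """Return one valid execution order: a node runs only after all of its upstream nodes."""
    sorter = TopologicalSorter(upstream_map)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # get_ready() yields every node whose upstream nodes have all completed.
        for node in sorted(sorter.get_ready()):
            order.append(node)   # "run" the node
            sorter.done(node)    # mark it complete so downstream nodes can start
    return order


order = run_order(upstream)
print(order)
```

In this sketch, prediction cannot start until both logistic_regression and sql_script are marked done, which matches the rule that a downstream node waits for all of its upstream nodes.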

Model building usually includes the following task modules:

Read data

In a pipeline, you can add a Source/Destination component to read data. You can read data from sources such as MaxCompute and OSS. For more information, see the documentation for the specific components in Component reference: Source/Destination. This topic uses reading data from MaxCompute as an example.

Create a table in MaxCompute and import data. For more information, see Create and use MaxCompute tables.

For this example, a table named heartdisease is created in the test project and test data is imported into the table.

-- Create a table.
CREATE TABLE IF NOT EXISTS heartdisease(
  age STRING COMMENT 'The age of the object.',
  sex STRING COMMENT 'The gender of the object. Valid values are female and male.',
  cp STRING COMMENT 'The type of chest pain. The pain level from severe to mild is typical, atypical, non-anginal, and asymptomatic.',
  trestbps STRING COMMENT 'Blood pressure.',
  chol STRING COMMENT 'Cholesterol.',
  fbs STRING COMMENT 'Fasting blood sugar. If the blood sugar level is greater than 120 mg/dl, the value is true. Otherwise, the value is false.',
  restecg STRING COMMENT 'The resting ECG result. The severity from mild to severe is norm and hyp.',
  thalach STRING COMMENT 'Maximum heart rate.',
  exang STRING COMMENT 'Indicates whether the object has exercise-induced angina. true indicates yes, and false indicates no.',
  oldpeak STRING COMMENT 'ST depression induced by exercise relative to rest.',
  slop STRING COMMENT 'The slope of the peak exercise ST segment. Valid values include down, flat, and up.',
  ca STRING COMMENT 'The number of major vessels found by fluoroscopy.',
  thal STRING COMMENT 'The type of defect. The severity from mild to severe is norm, fix, and rev.',
  `status` STRING COMMENT 'Indicates whether the object has the disease. buff indicates healthy, and sick indicates diseased.',
  style STRING);
-- This is for demonstration purposes only. Directly import the public test data from PAI.
INSERT INTO heartdisease SELECT * FROM pai_online_project.heart_disease_prediction;

Drag the Read Data Table component to the canvas on the right to read data from the MaxCompute table.
A pipeline node named Read Data Table-1 is automatically generated on the canvas. The number in the name is incremented based on the order in which components of the same type are dragged to the canvas. The first component is numbered 1.
In the node configuration pane, configure the source table name. For more information about the parameters, see Read Data Table.
On the canvas, select the Read Data Table-1 node. In the node configuration pane on the right, enter the MaxCompute table name in the Table Name field. In this example, enter heartdisease.
Note
To read data from a table in a different MaxCompute project, use the ProjectName.TableName format, for example, test2.heartdisease. Ensure that you have the required project permissions.
Switch to the Field Information tab in the node configuration pane on the right to view the field details of the data.

Data pre-processing

After the data is read, you typically need to preprocess it to meet the input requirements for model training or prediction. Machine Learning Designer provides a wide range of data pre-processing and large model data processing components.

You can also use the SQL Script component to run a custom SQL script. For example, the following script encodes the string-valued input features as numeric values:

SELECT age,
       (CASE sex WHEN 'male' THEN 1 ELSE 0 END) AS sex,
       (CASE cp WHEN 'angina' THEN 0 WHEN 'notang' THEN 1 ELSE 2 END) AS cp,
       trestbps,
       chol,
       (CASE fbs WHEN 'true' THEN 1 ELSE 0 END) AS fbs,
       (CASE restecg WHEN 'norm' THEN 0 WHEN 'abn' THEN 1 ELSE 2 END) AS restecg,
       thalach,
       (CASE exang WHEN 'true' THEN 1 ELSE 0 END) AS exang,
       oldpeak,
       (CASE slop WHEN 'up' THEN 0 WHEN 'flat' THEN 1 ELSE 2 END) AS slop,
       ca,
       (CASE thal WHEN 'norm' THEN 0 WHEN 'fix' THEN 1 ELSE 2 END) AS thal,
       (CASE status WHEN 'sick' THEN 1 ELSE 0 END) AS ifHealth
FROM ${t1};
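To see what the CASE expressions in the script above do to a single record, the following Python sketch applies the same mappings to one row. The sample row is made up for illustration and is not taken from the test data; the helper names are hypothetical. Note that any value not listed in a WHEN branch falls through to the ELSE value.

```python
def case(value, mapping, default):
    """Mimic SQL `CASE value WHEN k THEN v ... ELSE default END`."""
    return mapping.get(value, default)


def encode_row(row):
    """Apply the same encodings as the SQL script to one row (a dict of strings)."""
    return {
        "age": row["age"],
        "sex": case(row["sex"], {"male": 1}, 0),
        "cp": case(row["cp"], {"angina": 0, "notang": 1}, 2),
        "trestbps": row["trestbps"],
        "chol": row["chol"],
        "fbs": case(row["fbs"], {"true": 1}, 0),
        "restecg": case(row["restecg"], {"norm": 0, "abn": 1}, 2),
        "thalach": row["thalach"],
        "exang": case(row["exang"], {"true": 1}, 0),
        "oldpeak": row["oldpeak"],
        "slop": case(row["slop"], {"up": 0, "flat": 1}, 2),
        "ca": row["ca"],
        "thal": case(row["thal"], {"norm": 0, "fix": 1}, 2),
        # As in the script, 1 means 'sick' even though the column is named ifHealth.
        "ifHealth": case(row["status"], {"sick": 1}, 0),
    }


# Hypothetical sample row for illustration only.
sample = {
    "age": "63", "sex": "male", "cp": "angina", "trestbps": "145",
    "chol": "233", "fbs": "true", "restecg": "hyp", "thalach": "150",
    "exang": "false", "oldpeak": "2.3", "slop": "down", "ca": "0",
    "thal": "fix", "status": "buff",
}

encoded = encode_row(sample)
print(encoded)
```

The pass-through columns (age, trestbps, and so on) remain strings here, just as the script leaves them unchanged.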

Model training

A model component typically takes pre-processed data from an upstream node as input and connects to downstream components, such as prediction or inference components. A model component can have one or more input/output ports. You can hover the mouse pointer over a component's port to view its port type and then connect the components based on the type.

For example, the Logistic Regression for Binary Classification component has two output ports:

Logistic regression model: The output port for the trained model. It can be used as the model input for components such as prediction components.
PMML: Model deployment usually relies on Predictive Model Markup Language (PMML) models. For example, to deploy the generated model using a built-in processor such as the PMML Processor, you must select Generate PMML in the parameters of a component that supports model generation, and then run the component.

Model prediction or inference

After you train the model, you can connect prediction or inference components to test the model's performance.

For example, the Prediction component has two input ports:

Model result input: Accepts the model trained in the model training section as input.
Prediction data input: Accepts the pre-processed test data as input.
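The two input ports can be pictured as the two arguments of a scoring function: a trained model and the rows to score. The following Python sketch shows how a binary logistic regression model would score rows in general. It is an illustration under assumed values, not the Prediction component's implementation; the weights, bias, threshold, and rows are all hypothetical.

```python
import math


def predict(model, rows, threshold=0.5):
    """Score each row with a logistic regression model and assign a 0/1 label."""
    results = []
    for row in rows:
        # Linear combination of the features the model has weights for.
        z = model["bias"] + sum(
            weight * row[name] for name, weight in model["weights"].items()
        )
        score = 1.0 / (1.0 + math.exp(-z))  # sigmoid maps z to a value in (0, 1)
        results.append({"score": score, "label": 1 if score >= threshold else 0})
    return results


# Hypothetical model and pre-processed rows, for illustration only.
model = {"bias": -1.0, "weights": {"exang": 2.0, "sex": 0.5}}
rows = [
    {"exang": 1, "sex": 1},
    {"exang": 0, "sex": 0},
]

predictions = predict(model, rows)
print(predictions)
```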

Model evaluation

Some models provide evaluation components. You can use the corresponding evaluation components to analyze the model's performance based on relevant metrics.

For example, Machine Learning Designer provides evaluation components that you can connect downstream of the Prediction component as needed.
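For a binary classifier such as the heart disease model, the evaluation typically boils down to comparing predicted labels with actual labels. The following Python sketch computes a confusion matrix and a few common metrics by hand. The label lists are made up for illustration, and the metrics that a particular evaluation component reports may differ.

```python
def binary_metrics(actual, predicted):
    """Compute confusion-matrix counts and common metrics for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # sick, predicted sick
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # healthy, predicted sick
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # sick, predicted healthy
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # healthy, predicted healthy
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
    }


# Hypothetical labels: 1 = sick, 0 = healthy.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(actual, predicted)
print(metrics)
```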

Debug a model

Debug and run

Entire pipeline: Click the Run icon in the upper-left corner of the canvas to run the entire pipeline. If the pipeline is complex, you can run a single node or a group of nodes by module to make debugging easier.
Single or partial components: Right-click the target component to run a single node or a group of nodes. Several run methods are available.

After a component runs successfully, a check mark appears on it. If it fails, an exclamation mark appears. You can right-click the component to view logs and results.

View logs and results

View data and perform visual analytics:
After a component runs successfully, right-click the component and select View Data to view the output data.
For some components, Machine Learning Designer can convert data into graphs and charts. This helps you understand complex data and analysis results, quickly obtain key information, and identify trends and patterns to facilitate more efficient analysis and decision-making. You can also click Visual Analysis or the visualization icon above the canvas to perform visual analytics. For more information, see Visual analytics.
View logs: If a component fails to run, right-click the component and select View Log to find the cause of the failure.

View running tasks

Click View All Tasks in the upper-right corner of the canvas to view the run details of all historical tasks. During model building, each run is recorded as a historical task. The nodes involved, their configurations, and the output results of each run are saved in the historical task.

Note

To roll back to a previous version, first view the details of the historical task to confirm that it is the version you want. Before you roll back, save and run the latest task so that, if an error occurs after the rollback, you can return to the latest task status.

References

After you debug a model, you can register the trained model as a new model and manage it. For more information, see Register and manage models.
After you debug a model, you can deploy it for online prediction. For more information, see Model prediction and deployment.
Machine Learning Designer provides the Update EAS Service (Beta) component to update model services. For more information, see Periodically update online model services.
You can use DataWorks to schedule pipelines for offline, periodic model updates. For more information, see Use DataWorks to schedule Machine Learning Designer pipelines offline.
For more information about components, see Component reference.
For information about billing, see Machine Learning Designer billing.