DataWorks: Data Quality

Last Updated: Aug 12, 2025

DataWorks Data Quality helps you maintain high data quality by detecting changes in source data and identifying dirty data generated during the extract, transform, and load (ETL) process. It can automatically block problematic tasks to prevent dirty data from spreading to downstream nodes. This prevents unexpected data issues that could affect your operations and business decisions, reducing the time and resource costs associated with rerunning tasks and correcting data.

Billing

Data Quality checks the quality of your data by using monitoring rules. The fees generated by these checks consist of the following two parts:

  • Fees included in your DataWorks bills

    You are charged by DataWorks based on the number of Data Quality checks. For more information, see Billing of Data Quality.

  • Fees not included in your DataWorks bills

    You are charged by the compute engines that are associated with your DataWorks workspace. When monitoring rules are triggered, SQL statements are generated and executed by specific compute engines.

    In this case, you are charged for the computing resources consumed by the compute engines. For more information, see the topic about billing for each type of compute engine. For example, if you associate a pay-as-you-go MaxCompute project with your DataWorks workspace, you are charged when the generated SQL statements are executed, and the fees are included in your MaxCompute bills instead of your DataWorks bills.

Features

You can configure quality monitoring rules across multiple dimensions, including completeness, accuracy, validity, consistency, uniqueness, and timeliness. These rules can be associated with scheduling nodes so that, after a task finishes running, the quality checks are automatically triggered. This allows you to detect problematic data at the earliest opportunity and set rule severity levels to control whether a task fails and stops. This approach helps prevent the spread of dirty data and significantly reduces both the time and cost required for data recovery.
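As an illustration of what such a rule checks, the sketch below implements a simple completeness check: it builds the kind of SQL a null-rate rule might run against a table partition and compares the observed null rate with a threshold. The table, column, and partition names are hypothetical, and the SQL that DataWorks actually generates is engine-specific; this is only a minimal sketch of the idea.

```python
# Minimal sketch of a completeness (null-rate) check. All identifiers are
# illustrative; the real SQL generated by Data Quality depends on the rule
# template and the associated compute engine.

def build_null_rate_sql(table: str, column: str, partition: str) -> str:
    """Return a SQL statement that counts rows and NULLs in one column."""
    return (
        f"SELECT COUNT(*) AS total, "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS nulls "
        f"FROM {table} WHERE ds = '{partition}'"
    )

def check_completeness(total: int, nulls: int, max_null_rate: float) -> bool:
    """A rule passes when the observed null rate stays at or below the threshold."""
    if total == 0:  # an empty partition is treated as a failure in this sketch
        return False
    return nulls / total <= max_null_rate

sql = build_null_rate_sql("dwd_orders", "order_id", "20250812")
print(check_completeness(total=1000, nulls=3, max_null_rate=0.01))  # True
```

A real rule would run the generated statement on the compute engine and feed the counts into the threshold comparison.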

The features of each Data Quality module are described as follows:

  • Dashboard: The Dashboard page displays an overview of data quality in your workspace. It includes key data quality metrics, trends and distribution of rule check instances, tables with the most data quality issues, issue owners, and the coverage status of monitoring rules. This helps data quality owners understand the overall data quality status of the workspace and promptly handle issues to improve data quality.

  • Quality Assets

      • Rules: View all configured monitoring rules.

      • Rule Template Library: Manage user-defined rule templates to improve the efficiency of rule configuration.

  • Configure Rules

      • Configure by Table and Configure by Template: Configure a monitoring rule for a single table, or configure rules for multiple tables based on a rule template.

  • Quality O&M

      • Monitor: View all monitors created in the current workspace.

      • Running Records: View the results of monitors. After a monitor runs, you can view its details on this page.

  • Quality Analysis

      • Quality Reports: Create report templates and add metrics related to rule configuration and execution. Reports are generated and sent regularly based on the defined reporting period, dispatch time, and subscription details.

Usage notes

  • The following list describes the supported data source types and the regions in which they are supported.

      • MaxCompute, StarRocks, and MySQL: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).

      • E-MapReduce: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), and US (Silicon Valley).

      • Hologres: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

      • AnalyticDB for PostgreSQL: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and Japan (Tokyo).

      • AnalyticDB for MySQL: China (Shenzhen), Singapore, and US (Silicon Valley).

      • CDH: China (Shanghai), China (Beijing), China (Zhangjiakou), China (Hong Kong), and Germany (Frankfurt).

  • Before you configure monitoring rules for E-MapReduce, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, CDH, StarRocks, or MySQL data sources, you must first collect their metadata. For more information, see Collect metadata from an EMR data source.

  • For a monitoring rule on a table from an E-MapReduce, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, CDH, StarRocks, or MySQL data source to be triggered, the scheduling node that generates the data must run on a resource group that is connected to that data source.

  • You can configure multiple monitoring rules for a table.

Scenarios

In offline data validation scenarios, you configure a monitoring rule for a table by specifying a partition filter expression and associating the rule with the scheduling node that generates the table's data. After the node runs, the monitoring rule is triggered to check the data in the partition that matches the filter expression. Note that dry-run tasks do not trigger monitoring rules. You can configure the rule as a strong or weak rule to determine whether to cause the task to fail if an anomaly is detected, which prevents dirty data from spreading downstream. On the rule configuration page, you can also specify notification methods to receive prompt alert notifications.
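The partition filter expression in this scenario is resolved against the scheduling date before the check runs. The sketch below handles only a simplified day-offset pattern; the actual DataWorks expression syntax is richer, and the function name is illustrative.

```python
import re
from datetime import date, timedelta

# Simplified sketch of resolving a partition filter expression such as
# "dt=$[yyyymmdd-1]" against a scheduling date. Only a day-offset pattern
# is handled here; the real DataWorks syntax supports more forms.

def resolve_partition(expr: str, bizdate: date) -> str:
    m = re.fullmatch(r"(\w+)=\$\[yyyymmdd(?:-(\d+))?\]", expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    key, offset = m.group(1), int(m.group(2) or 0)
    day = bizdate - timedelta(days=offset)  # shift back by the offset in days
    return f"{key}={day.strftime('%Y%m%d')}"

print(resolve_partition("dt=$[yyyymmdd-1]", date(2025, 8, 12)))  # dt=20250811
```

The resolved value determines which partition the triggered rule checks after the node finishes running.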


Configure a monitoring rule

  • Create a monitoring rule: You can create a rule for a single table, or create rules for multiple tables in bulk by using a template. For more information, see Configure a monitoring rule for a single table and Configure a monitoring rule for multiple tables based on a template.

  • Subscribe to a monitoring rule: After a rule is created, you can subscribe to it to receive alert notifications for data quality checks. The notification methods include Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Chatbot, Custom Webhook, and Telephone.

    Note

    The Custom Webhook notification method is supported only in DataWorks Enterprise Edition.
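For the Custom Webhook method, your endpoint receives an alert payload over HTTP. The actual DataWorks payload schema is not described in this topic, so every field name in the sketch below is an assumption; it only illustrates the general shape of posting a JSON alert to a webhook endpoint.

```python
import json
from urllib import request

# Hypothetical alert payload for a custom webhook receiver. The field
# names ("table", "rule", "status") are assumptions, not DataWorks' schema.

def build_alert_payload(table: str, rule: str, status: str) -> bytes:
    return json.dumps(
        {"table": table, "rule": rule, "status": status}, sort_keys=True
    ).encode("utf-8")

def send_alert(url: str, payload: bytes) -> None:
    """POST the JSON payload to the webhook endpoint (URL is your own)."""
    req = request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    request.urlopen(req)  # raises URLError if the endpoint is unreachable

payload = build_alert_payload("dwd_orders", "null-rate <= 1%", "failed")
```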

Trigger the monitoring rule

After the scheduling node runs in Operation Center, the associated monitoring rule is triggered to check the quality of the data that the node generates. An SQL statement is generated and executed on the relevant compute engine. Based on the rule's strength (strong or weak) and its check result, DataWorks determines whether to cause the task to fail. This blocks downstream nodes from running and prevents dirty data from spreading.
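The blocking behavior described above can be summarized as a small decision rule: a failed check blocks the task only when the rule is strong, while a failed weak rule only raises an alert. The names below are illustrative, not the product's API.

```python
from enum import Enum

# Sketch of how rule strength and check result combine to decide whether
# the task fails and downstream nodes are blocked.

class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"

def decide(strength: Strength, check_passed: bool) -> str:
    if check_passed:
        return "continue"  # downstream nodes run normally
    if strength is Strength.STRONG:
        return "block"     # fail the task and stop downstream nodes
    return "alert"         # notify subscribers, but keep the task running

print(decide(Strength.STRONG, False))  # block
```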

View validation results

You can view validation results on the Monitor page. On the Running Records page, search by table or node to view the validation details of data quality monitoring. For more information, see View the details of a monitor.