You can use Alibaba Cloud Data Online Migration or the ossimport tool to migrate data from Amazon Redshift to AnalyticDB for PostgreSQL.
Prerequisites
An Amazon Redshift cluster is created.
Amazon S3 that is used to export Amazon Redshift data is activated.
Alibaba Cloud Object Storage Service (OSS) is activated. For more information, see What is OSS?
An OSS bucket is created. For more information, see Create a bucket.
Note: To facilitate subsequent data import, we recommend that you create the OSS bucket in the same region as the AnalyticDB for PostgreSQL instance.
An AnalyticDB for PostgreSQL instance is created. For information about how to select instance specifications, see the "Specification selection" section of this topic.
Specification selection
This section describes how to select specifications for an AnalyticDB for PostgreSQL instance based on the specifications of the source Amazon Redshift cluster.
An Amazon Redshift cluster consists of a leader node and multiple compute nodes.
Leader node: corresponds to the coordinator node of an AnalyticDB for PostgreSQL instance. The leader node communicates with clients, parses queries, develops query execution plans, and coordinates database operations.
Compute node: corresponds to the compute nodes of an AnalyticDB for PostgreSQL instance. Each compute node in an Amazon Redshift cluster is partitioned into node slices, and each node slice corresponds to a compute node of an AnalyticDB for PostgreSQL instance that stores data and performs query computing.
When you create an AnalyticDB for PostgreSQL instance, you can select the node specifications of the instance based on specifications of each node slice in the source Amazon Redshift cluster.
Example
An Amazon Redshift cluster consists of four compute nodes. Each compute node consists of two node slices. The specifications of each node slice are 2 cores, 16 GB memory, and 1 TB storage.
When you create an AnalyticDB for PostgreSQL instance, you can set the number of compute nodes to 8, node specifications to 2 cores and 16 GB memory, and single-node storage to 1,000 GB.
For information about how to create an AnalyticDB for PostgreSQL instance, see Create an instance.
We recommend that you set the storage type to Enhanced SSD (ESSD) because ESSDs provide better I/O performance than ultra disks.
Procedure
Step 1: Export data from an Amazon Redshift cluster to an Amazon S3 bucket
You can execute the UNLOAD statement to export data. For more information, see UNLOAD.
Syntax:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
AUTHORIZATION
[ option [ ... ] ]
where option is
{ [ FORMAT [ AS ] ] CSV | PARQUET
| PARTITION BY ( column_name [, ... ] ) [ INCLUDE ]
| MANIFEST [ VERBOSE ]
| HEADER
| DELIMITER [ AS ] 'delimiter-char'
| FIXEDWIDTH [ AS ] 'fixedwidth-spec'
| ENCRYPTED [ AUTO ]
| BZIP2
| GZIP
| ZSTD
| ADDQUOTES
| NULL [ AS ] 'null-string'
| ESCAPE
| ALLOWOVERWRITE
| CLEANPATH
| PARALLEL [ { ON | TRUE } | { OFF | FALSE } ]
| MAXFILESIZE [AS] max-size [ MB | GB ]
| REGION [AS] 'Amazon-region' }
We recommend that you use the FORMAT AS PARQUET or CSV parameter to specify the format of the exported data.
To improve data export efficiency, we recommend that you specify the PARALLEL ON parameter so that data is exported in parallel and more files are generated.
To generate more files, we recommend that you specify the MAXFILESIZE parameter to limit the maximum size of each exported file. We recommend that you choose a value that makes the number of generated files an integer multiple of the number of compute nodes in the AnalyticDB for PostgreSQL instance. This improves efficiency when the data is subsequently imported in parallel from OSS foreign tables into the AnalyticDB for PostgreSQL instance.
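The following statement is a minimal sketch of such an export. The schema1.table1 table, the S3 path, and the IAM role ARN are placeholders that you must replace with your own values.
-- Export schema1.table1 to Amazon S3 as CSV files, in parallel, with each file capped at 256 MB.
-- The S3 path and the IAM role ARN are placeholders.
UNLOAD ('SELECT * FROM schema1.table1')
TO 's3://example-bucket/redshift-export/table1_'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-unload-role'
FORMAT AS CSV
PARALLEL ON
MAXFILESIZE 256 MB
ALLOWOVERWRITE;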
Step 2: Synchronize data from Amazon S3 to OSS
You can use Alibaba Cloud Data Online Migration or the ossimport tool to synchronize data from Amazon S3 to OSS.
Data Online Migration
Log on to the Data Transport console.
Create the source and destination data addresses for data synchronization.
In the left-side navigation pane, go to the data address management page.
Click Create Data Address to create a source data address.
In the Create Data Address panel, configure the parameters that are described in the following table and click OK.
Parameter
Required
Description
Name
Yes
The name of the source data address. The name must meet the following requirements:
The name is 3 to 63 characters in length.
The name can contain only lowercase letters, digits, hyphens (-), and underscores (_).
The name is encoded in the UTF-8 format and cannot start with a hyphen (-) or an underscore (_).
Type
Yes
The type of the source data address. Select AWS S3.
Domain Name
Yes
The endpoint of the source data address. For more information, see the "Amazon S3 endpoints" section of the Amazon Simple Storage Service endpoints and quotas topic.
AccessKeyId and SecretAccessKey
Yes
The access key pair of the Identity and Access Management (IAM) user that is used to read data from the source data address. Amazon S3 uses the access key pair to check whether the IAM user has the permissions to read data from the source data address.
Bucket
Yes
The name of the Amazon S3 bucket in which the data to be migrated is stored.
Note: The bucket name cannot contain invalid characters, such as spaces, line breaks, or tab characters.
Prefix
No
The prefix of the source data address. You can specify a prefix to migrate specific data.
If you specify a prefix, only the data in the directory that is specified by the prefix is migrated. A prefix cannot start with a forward slash (/) and must end with a forward slash (/).
If you do not specify a prefix, all data in the Amazon S3 bucket is migrated.
Tunnel
No
The name of the tunnel that you want to use.
Important: This parameter is required only when you migrate data to the cloud by using Express Connect circuits or VPN gateways or migrate data from self-managed databases to the cloud.
If data at the destination data address is stored in a local file system or you need to migrate data over an Express Connect circuit in an environment such as Alibaba Finance Cloud or Apsara Stack, you must create and deploy an agent.
Agent
No
The name of the agent that you want to use.
Important: This parameter is required only when you migrate data to the cloud by using Express Connect circuits or VPN gateways or migrate data from self-managed databases to the cloud.
You can select up to 30 agents at a time for a specific tunnel.
Click Create Data Address again to create a destination data address.
In the Create Data Address panel, configure the parameters that are described in the following table and click OK.
Parameter
Required
Description
Data Type
Yes
The type of the destination data address. Select OSS.
Data Name
Yes
The name of the destination data address. The name must be 3 to 63 characters in length. The name cannot contain special characters except for hyphens (-) and underscores (_).
Data Region
Yes
The region in which the destination data address resides.
Enable Transfer Acceleration
No
If you want to use Data Online Migration together with the transfer acceleration feature of Object Storage Service (OSS), enable the transfer acceleration feature for buckets. The transfer acceleration feature takes effect within 30 minutes after it is enabled. After you enable the feature, wait for 30 minutes before you create a migration job.
Important: If you use the transfer acceleration feature to access your OSS buckets, you are charged additional transfer acceleration fees. For more information, see Transfer acceleration.
OSS Endpoint
Yes
The endpoint of the destination data address. Select an endpoint based on the region in which the destination data address resides. For more information, see Regions and endpoints.
Note: When you migrate data from third-party cloud services to OSS, you must access OSS from a public endpoint over the Internet.
Access Key Id and Access Key Secret
Yes
The AccessKey pair of the RAM user that is used to write data to the destination data address. The AccessKey pair is used by OSS to check whether the RAM user has the permissions to write data to the destination data address.
OSS Bucket
Yes
The OSS bucket to which the data is migrated.
OSS Prefix
No
The prefix of the destination data address.
If you specify a prefix, the source data is migrated to the specified directory. The value cannot start with a forward slash (/) and must end with a forward slash (/). Example: data/to/oss/.
If you do not specify a prefix, the source data is migrated to the root directory of the destination bucket.
Important: If the name of a source file starts with a forward slash (/), you must specify an OSS prefix when you specify the destination data address. If no OSS prefix is specified, the migration job fails. For example, if the name of a file to be migrated is /test/test.png, you must specify an OSS prefix, such as oss/. After the migration is complete, the name of the file changes from /test/test.png to oss//test/test.png.
Create an online migration job.
In the left-side navigation pane, go to the migration job management page.
Click Create Job.
In the Create Job panel, read the terms of the migration service and select I agree to the above terms and activate the Data Transport service. Then, click Next.
In the Fee Reminder message, click Yes, Go Ahead.
In the Create Job panel, configure the parameters that are described in the following table and click Next.
Parameter
Required
Description
Job Name
Yes
The name of the migration job. The name must be 3 to 63 characters in length. The name cannot contain special characters except for hyphens (-) and underscores (_).
Source Data Address
Yes
The source data address that you created.
Destination Data Address
Yes
The destination data address that you created.
Specified Directory
No
The directories in which files and subdirectories are migrated or not migrated. Valid values:
Do not filter: migrates all data from the source data address.
Exclude: does not migrate the files or subdirectories in the specified directories.
For example, if you want to migrate all the directories in the root_dir/ directory except root_dir/bad_sub1/ and root_dir/bad_sub2/, you can select this method and specify bad_sub1/ and bad_sub2/.
Contain: migrates only the files and subdirectories in the specified directories.
For example, if you want to migrate only root_dir/good_sub1/ and root_dir/good_sub2/ in the root_dir/ directory, you can select this method and specify good_sub1/ and good_sub2/.
Note: A directory name can contain only digits and letters. If you use special characters, the migration job may fail.
A directory name cannot start with a forward slash (/) or a backslash (\), and cannot contain double forward slashes (//), double periods (..), or double quotation marks ("). The character string that consists of all the specified directory names cannot exceed 10 KB in length.
A directory name must end with a forward slash (/). For example, you can specify docs/ as the directory name.
You can specify up to 20 directories of the Exclude or Contain type.
Migration Type
Yes
The data migration mode. Valid values:
Full: The system migrates all data from the source data address to the destination data address. You must configure the Start Time Point of File parameter. The system migrates all data whose last modification time is later than the specified time. After all data is migrated, the migration job ends.
If the data at the source data address changes after the migration job is complete, you can submit another full migration job. In this case, the system migrates only the changed data.
Incremental: The system runs a migration job based on the values that you specified for the Migration Interval and Migration Times parameters.
You must configure the Start Time Point of File parameter. During the first migration, the system migrates all data whose last modification time is later than the specified time. After the first migration is complete, incremental migration is performed based on the value that you specified for the Migration Interval parameter. For each incremental migration, the system migrates only the data that is created or modified after the previous migration starts and before the current migration starts.
If you set the Migration Times parameter to N, full migration is performed once and then incremental migration is performed (N - 1) times.
For example, the current time is 08:00, March 10, 2019 and you set the Migration Interval parameter to 1, the Migration Times parameter to 5, and the Start Time Point of File parameter to 08:00, March 5, 2019. When the first migration starts, the system migrates all files whose last modification time is between 08:00, March 5, 2019 and 08:00, March 10, 2019. In this example, the first migration requires one hour to complete and the second migration starts at 10:00, March 10, 2019. The duration from 08:00 to 10:00 consists of one hour for the first migration process and one hour for the migration interval. During the second migration, the system migrates the files whose last modification time is between 08:00, March 10, 2019 and 10:00, March 10, 2019. The migration job consists of one full migration and four incremental migrations.
Important: Before full or incremental migration starts, the system compares the files at the source data address with the files at the destination data address. If a source file has the same name as a destination file, the destination file is overwritten when one of the following conditions is met:
The Content-Type values of the files at the source data address and the destination data address are different.
The last modification time of the file at the source data address is later than that of the file at the destination data address.
The size of the file at the source data address is different from that of the file at the destination data address.
Start Time Point of File
Yes
The point in time that is used to filter the data to be migrated. The system migrates the data that is created or modified after the specified time. Valid values:
All: migrates all files.
Assign: migrates only the files that are created or modified after the specified time.
For example, if you set the time to 08:00:00, November 1, 2018, only the files that are created or modified after 08:00:00, November 1, 2018 are migrated.
Migration Interval
This parameter is required if you set the Migration Type parameter to Incremental.
The default value is 1, and the maximum value is 24. Unit: hours.
Migration Times
This parameter is required if you set the Migration Type parameter to Incremental.
The default value is 1, and the maximum value is 30.
File Overwrite Method
Yes
The method that is used to process the file at the source data address whose name is the same as that of a file at the destination data address. Valid values:
LastModified: If a source file has the same name as a destination file, the system compares the last modified time indicated by the LastModified property of the files.
If the last modification time of the file at the source data address is later than that of the file at the destination data address, the file at the source data address is migrated and the file at the destination data address is overwritten.
If the last modification time of the file at the source data address is earlier than that of the file at the destination data address, the file at the source data address is not migrated and the file at the destination data address is retained.
If the two files have the same last modification time, the system checks the values of the Size and Content-Type properties of the files.
If the two files have the same Size and Content-Type values, the file at the source data address is not migrated. If the two files have different values of at least one property, the file at the source data address is migrated and the file at the destination data address is overwritten.
Condition: If a source file has the same name as a destination file, the system compares the values of the LastModified, Size, and Content-Type properties of the files.
If all property values of the two files are the same, the file at the source data address is not migrated.
If the two files have different values of at least one property, the file at the source data address is migrated and the file at the destination data address is overwritten.
All: If a source file has the same name as a destination file, the system performs no comparison and directly overwrites the destination file with the source file.
No: If a source file has the same name as a destination file, the system performs no comparison and directly skips the source file in the migration.
Warning: If you select Condition or LastModified, earlier files may overwrite later files.
If you select Condition or LastModified, make sure that the source files have LastModified, Size, and Content-Type values. Otherwise, the overwrite policy may become ineffective and data is not migrated as expected.
In the Performance step, go to the Data Prediction section and configure the Data Size and File Count parameters.
Note: To ensure that the migration job is successful, we recommend that you estimate the amount of data that you want to migrate as accurately as possible. For more information, see Prepare for a migration job.
(Optional) In the Performance step, go to the Flow Control section, specify the Time Range and Max Flow(MB/s) parameters, and then click Add.
Note: To ensure business continuity, we recommend that you configure the Time Range and Max Flow parameters based on traffic fluctuations.
Click Create and wait until the migration job is complete.
ossimport
Download and install ossimport in standalone mode. For more information, see Overview.
The ossimport tool in standalone mode has the following file structure:
ossimport
├── bin
│   └── ossimport2.jar    # The JAR package that contains the Master, Worker, TaskTracker, and Console modules.
├── conf
│   ├── local_job.cfg     # The job configuration file.
│   └── sys.properties    # The configuration file that contains system parameters.
├── console.bat           # The Windows command line utility that is used to run tasks in a distributed manner.
├── console.sh            # The Linux command line utility that is used to run tasks in a distributed manner.
├── import.bat            # The script that automatically imports files based on the conf/local_job.cfg configuration file in Windows. The configuration file contains parameters that specify data migration operations, such as start, migration, verification, and retry.
├── import.sh             # The script that automatically imports files based on the conf/local_job.cfg configuration file in Linux. The configuration file contains parameters that specify data migration operations, such as start, migration, verification, and retry.
├── logs                  # The directory that contains logs.
└── README.md             # The file that provides a description of ossimport. We recommend that you read the file before you use ossimport.
Configure ossimport in standalone mode.
You need to modify only the following parameters in the conf/local_job.cfg configuration file:
srcType=s3
srcAccessKey=<Your AWS access key>
srcSecretKey=<Your AWS secret key>
srcDomain=<The domain name that corresponds to the region in which the Amazon S3 bucket is deployed>
srcBucket=<The name of the Amazon S3 bucket>
destAccessKey=<Your Alibaba Cloud AccessKey ID>
destSecretKey=<Your Alibaba Cloud AccessKey secret>
destDomain=<The endpoint that corresponds to the region in which the OSS bucket is deployed>
destBucket=<The name of the OSS bucket>
destPrefix=
isSkipExistFile=true
For more information, see Overview.
Run ossimport to synchronize data to OSS. For information about how to use ossimport in standalone mode, see Standalone deployment.
Step 3: Create a destination table
Create a destination table in the AnalyticDB for PostgreSQL instance to load data from Amazon Redshift. The destination table must have the same schema as the source table. For more information, see CREATE TABLE.
Convert the DDL statements of database objects such as schemas, tables, functions, and views to the AnalyticDB for PostgreSQL syntax. For more information, see the "Convert DDL statements" section of this topic.
Step 4: Import data from OSS to the AnalyticDB for PostgreSQL instance
You can use the COPY statement or an OSS foreign table to import data to AnalyticDB for PostgreSQL.
For information about how to use the COPY statement to import OSS data, see Use the COPY or UNLOAD statement to import or export data between OSS foreign tables and AnalyticDB for PostgreSQL tables.
For information about how to use an OSS foreign table to import OSS data, see the "Use OSS foreign tables for data lake analysis" section of the Use OSS foreign tables for data lake analysis topic.
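The following statements are a minimal sketch of the foreign table approach. They assume the oss_ext external table syntax that is supported by earlier AnalyticDB for PostgreSQL versions; if your instance uses the oss_fdw foreign tables described in the linked topic, adapt the definition to that syntax. The endpoint, dir, id, key, and bucket values, as well as the table definition, are placeholders.
-- Create a readable OSS external table whose columns match the destination table schema1.table1.
-- The endpoint, dir, id, key, and bucket values are placeholders.
CREATE READABLE EXTERNAL TABLE oss_external_table.table1 (
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
LOCATION('oss://oss-cn-hangzhou.aliyuncs.com
    dir=redshift-export/ id=<AccessKey ID> key=<AccessKey secret> bucket=example-oss-bucket')
FORMAT 'csv'
ENCODING 'utf8';

-- Load the OSS data into the destination table in parallel.
INSERT INTO schema1.table1 SELECT * FROM oss_external_table.table1;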
Convert DDL statements
The DDL statements of Amazon Redshift are slightly different from the DDL statements of AnalyticDB for PostgreSQL and must be converted before you migrate data.
CREATE SCHEMA
Execute the CREATE SCHEMA statement of AnalyticDB for PostgreSQL to create a schema. Example:
CREATE SCHEMA schema1 AUTHORIZATION xxxpoc;
GRANT ALL ON SCHEMA schema1 TO xxxpoc;
GRANT ALL ON SCHEMA schema1 TO public;
COMMENT ON SCHEMA model IS 'for xxx migration poc test';
CREATE SCHEMA oss_external_table AUTHORIZATION xxxpoc;
CREATE FUNCTION
AnalyticDB for PostgreSQL does not support specific SQL functions of Amazon Redshift. You must modify or rewrite the SQL functions. Examples:
Replace the CONVERT_TIMEZONE(a,b,c) function with timezone(b, timezone(a,c)).
Replace the GETDATE() function with current_timestamp(0)::timestamp. For sample rewrites, see the statements at the end of this section.
Replace or optimize user-defined functions (UDFs). The following sample code provides an example of a UDF of Amazon Redshift:
CREATE OR REPLACE FUNCTION public.f_jdate(dt timestamp WITHOUT time zone)
RETURNS character varying AS '
from datetime import timedelta, datetime
if dt.hour < 4:
    d = timedelta(days=-1)
    dt = dt + d
return str(dt.date())'
LANGUAGE plpythonu IMMUTABLE;
COMMIT;
Replace the preceding function with the following expression: to_char(a - interval '4 hour', 'yyyy-mm-dd')
Replace other Amazon Redshift functions.
For information about standard PostgreSQL functions, see Functions and Operators. Modify or rewrite the SQL functions of Amazon Redshift that are not supported by AnalyticDB for PostgreSQL.
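The following statements are a minimal sketch of the CONVERT_TIMEZONE and GETDATE rewrites that are described in this section. The time zone names and the timestamp literal are only for illustration.
-- Amazon Redshift:
SELECT CONVERT_TIMEZONE('UTC', 'Asia/Shanghai', '2023-01-01 00:00:00');
SELECT GETDATE();

-- AnalyticDB for PostgreSQL equivalents:
SELECT timezone('Asia/Shanghai', timezone('UTC', '2023-01-01 00:00:00'::timestamp));
SELECT current_timestamp(0)::timestamp;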
CREATE TABLE
Modify the name of a table.
The name of an Amazon Redshift table can be up to 127 characters in length, whereas the name of an AnalyticDB for PostgreSQL table can be up to 63 characters in length. If the names of database objects such as tables, functions, and views are more than 63 characters in length, you must remove the excess characters.
Modify the compression algorithm.
Replace ENCODE XXX in a CREATE TABLE statement of Amazon Redshift with the following clause: WITH (COMPRESSTYPE={ZLIB|ZSTD|RLE_TYPE|NONE})
For information about the compression algorithms supported by AnalyticDB for PostgreSQL, see the "Data compression" section of the Define storage models for tables topic.
Modify distribution keys.
Amazon Redshift supports three distribution keys. For more information, see Distribution styles. You must modify distribution keys based on the following rules:
EVEN distribution: Use DISTRIBUTED RANDOMLY.
KEY distribution: Use DISTRIBUTED BY (column, [ ... ]).
ALL distribution: Use DISTRIBUTED REPLICATED.
Modify sort keys.
Replace the COMPOUND or INTERLEAVED option in the [ COMPOUND | INTERLEAVED ] SORTKEY (column_name [, ...]) clause of Amazon Redshift with the following clause: ORDER BY (column, [ ... ])
Examples:
Example 1:
The following sample code provides an example of a CREATE TABLE statement of Amazon Redshift:
CREATE TABLE schema1.table1 (
    filed1 VARCHAR(100) ENCODE lzo,
    filed2 INTEGER DISTKEY,
    filed3 INTEGER,
    filed4 BIGINT ENCODE lzo,
    filed5 INTEGER
)
INTERLEAVED SORTKEY (
    filed1,
    filed2
);
Convert the preceding statement into the following CREATE TABLE statement of AnalyticDB for PostgreSQL:
CREATE TABLE schema1.table1 (
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
DISTRIBUTED BY (filed2)
ORDER BY (filed1, filed2);

-- Sort
SORT schema1.table1;
MULTISORT schema1.table1;
Example 2:
The following sample code provides an example of a CREATE TABLE statement of Amazon Redshift, which includes the ENCODE and SORTKEY options:
CREATE TABLE schema2.table2 (
    filed1 VARCHAR(50) ENCODE lzo,
    filed2 VARCHAR(50) ENCODE lzo,
    filed3 VARCHAR(20) ENCODE lzo
)
DISTSTYLE EVEN
INTERLEAVED SORTKEY (
    filed1
);
Convert the preceding statement into the following CREATE TABLE statement of AnalyticDB for PostgreSQL:
CREATE TABLE schema2.table2 (
    filed1 VARCHAR(50),
    filed2 VARCHAR(50),
    filed3 VARCHAR(20)
)
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
DISTRIBUTED RANDOMLY
ORDER BY (filed1);

-- Sort
SORT schema2.table2;
MULTISORT schema2.table2;
Note: For more information about sort keys, see Use sort keys and rough set indexes to accelerate queries in column-oriented tables.
CREATE VIEW
Convert a CREATE VIEW statement of Amazon Redshift into an SQL statement that complies with the AnalyticDB for PostgreSQL syntax. The conversion rules are similar to the conversion rules of CREATE TABLE statements.
The WITH NO SCHEMA BINDING clause is not supported.
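The following statements are a minimal sketch of such a conversion. The view name and the query are placeholders.
-- Amazon Redshift: the view is created with the WITH NO SCHEMA BINDING clause.
CREATE VIEW schema1.v_table1 AS
SELECT filed1, filed2 FROM schema1.table1
WITH NO SCHEMA BINDING;

-- AnalyticDB for PostgreSQL: remove the WITH NO SCHEMA BINDING clause.
CREATE VIEW schema1.v_table1 AS
SELECT filed1, filed2 FROM schema1.table1;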
References
For information about automatic data migration, see Configure automatic data migration from an Amazon Redshift cluster to an AnalyticDB for PostgreSQL instance.