How to migrate data from Amazon DynamoDB to Alibaba Cloud using NimoShake - ApsaraDB for MongoDB

NimoShake (also known as DynamoShake) is a data synchronization tool developed by Alibaba Cloud. It allows you to migrate Amazon DynamoDB databases to Alibaba Cloud.

Prerequisites

An ApsaraDB for MongoDB instance is created. For more information, see Create a replica set instance or Create a sharded cluster instance.

Background information

NimoShake is primarily designed for migrations from an Amazon DynamoDB database. The destination must be an ApsaraDB for MongoDB database.For more information, see NimoShake overview.

Important considerations

Resource consumption. A full data migration consumes resources in both the source and destination databases. This may increase the load of your database servers. If your database experiences high traffic or if the server specifications are excesssively low, this could lead to increased database pressure. We strongly recommend carefully evaluating the potential impact on performance before migrating data, and performing the migration during off-peak hours.
Storage capacity. Make sure that the storage space of the ApsaraDB for MongoDB instance is larger than that of the Amazon DynamoDB database.

Key terms

Resumable transmission: This feature divides a task into multiple parts for transmission. If transmission is interrupted due to network failures or other reasons, the task can resume from where it left off, rather than restarting from the beginning.
Note
- Full migration does not support resumable transmission.
- Incremental migration does support resumable transmission. If an incremental synchronization connection is lost and recovered within a short timeframe, the synchronization can continue. However, in certain situations, such as prolonged disconnection or loss of the previous checkpoint, a full synchronization may be triggered again.
Checkpoint: Resumable transmission for incremental synchronization is achieved through checkpoints. By default, checkpoints are written to the destination MongoDB database, specifically to a database named nimo-shake-checkpoint. Each collection records its own checkpoint table, and a status_table records whether the current synchronization is a full or incremental task.

NimoShake features

NimoShake currently supports a separated synchronization mechanism, carried out in two steps:

Full data migration.
Incremental data migration.

Full migration

Full migration consists of two parts: data migration and index migration. The following figure shows the basic architecture: 全量同步基本架构

Data migration: NimoShake uses multiple concurrent threads to pull source data, as shown in the following figure: 数据同步

Thread	Description
Fetcher	Calls the protocol conversion driver provided by Amazon to batch-retrieve data from the source table and place it in queues until all source data is pulled. Note Only one fetcher thread is provided.
Parser	Reads data from queues and parses it into a BSON structure. After data is parsed, the parser writes data to the executor's thread. Multiple parser threads can be started. The default value is 2. Adjust the number of parser threads using the `FullDocumentParser` parameter .
Executor	Pulls data from queues, then aggregates and writes data to the destination ApsaraDB for MongoDB database. Up to 16 MB of data or 1,024 entries can be aggregated. Multiple executor threads can be started. The default value is 4. Adjust the number of executor threads using the `FullDocumentConcurrency` parameter.

Index migration: NimoShake creates indexes after data migration is complete. Indexes are categorized into auto-generated and user-created indexes:
- Auto-generated indexes:
  - If you have a partition key and a sort key, NimoShake will create a unique composite index and write it to MongoDB.
  - NimoShake will also create a hash index for the partition key and write it to MongoDB.
  - If you have only a partition key, NimoShake will create a hash index and a unique index in MongoDB.
- User-created indexes: If you have a user-created index, NimoShake creates a hash index based on the primary key and writes the index to the destination ApsaraDB for MongoDB database.

Incremental migration

Incremental migration only synchronizes data; it does not synchronize indexes generated during the incremental synchronization process. The basic architecture is as follows: 增量同步架构图

Thread	Description
Fetcher	Monitors shard changes in the stream.
Manager	Manages message notification and dispatcher creation; each shard corresponds to one Dispatcher.
Dispatcher	Retrieves incremental data from the source. For resumable transmission, data retrieval resumes from the last Checkpoint instead of the beginning.
Batcher	Parses, packages, and aggregates incremental data retrieved by the Dispatcher thread.
Executor	Writes the aggregated data to the destination ApsaraDB for MongoDB database and updates the checkpoint.

Steps: Migrate Amazon DynamoDB to Alibaba Cloud (Ubuntu example)

This section demonstrates how to use NimoShake to migrate an Amazon DynamoDB database to ApsaraDB for MongoDB, using an Ubuntu system as an example.

Download NimoShake: Run the following command to download the NimoShake package:
```
wget https://github.com/alibaba/NimoShake/releases/download/release-v1.0.14-20250704/nimo-shake-v1.0.14.tar.gz
```
Note
We recommend downloading the latest version of the NimoShake package.
Decompress the package: Run the following command to decompress the NimoShake package:
```
tar zxvf nimo-shake-v1.0.14.tar.gz
```
Access the directory: After decompression, run the cd nimo-shake-v1.0.14 command to enter the nimo folder.
Open configuration file: Run the vi nimo-shake.conf command to open the NimoShake configuration file.

Configure NimoShake: Configure the parameters in the nimo-shake.conf file. The following table describes each configuration item:

Parameter	Description	Example
id	The ID of the migration task. This is customizable and used for outputting PID files, log names, the database name for checkpoint storage, and the destination database name.	`id = nimo-shake`
log.file	The path of the log file. If this parameter is not configured, logs are displayed in `stdout`.	`log.file = nimo-shake.log`
log.level	The logging level. Valid values: `none`: No logs `error`: Error messages `warn`: Warning information `info`: System status `debug`: Debugging information Default value: `info`.	`log.level = info`
log.buffer	Specifies whether to enable log buffering. Valid values: `true`: Log buffering is enabled. Log buffering ensures high performance but may lose a few of the latest log entries upon exit. `false`: Log buffering is disabled. If disabled, performance may be degraded. However, all log entries are displayed upon exit. Default value: `true`.	`log.buffer = true`
system_profile	The PPROF port, used for debugging and displaying stackful coroutine information.	`system_profile = 9330`
full_sync.http_port	The RESTful port for the full migration phase. Use `curl` to view internal monitoring statistics. For more information, see the wiki.	`full_sync.http_port = 9341`
incr_sync.http_port	The RESTful port for the incremental migration phase. Use `curl` to view internal monitoring statistics. For more information, see the wiki.	`incr_sync.http_port = 9340`
sync_mode	The type of data migration. Valid values: `all`: Full migration and incremental migration `full`: Only full migration Default value: `all`. Note Only `full` is supported when the source is an ApsaraDB for MongoDB instance compatible with the DynamoDB protocol.	`sync_mode = all`
incr_sync_parallel	Specifies whether to perform parallel incremental migration. Valid values: `true`: Parallel incremental migration is enabled. This consumes more memory. `false`: Parallel incremental migration is disabled. Default value: `false`.	`incr_sync_parallel = false`
source.access_key_id	The AccessKey ID for the Amazon DynamoDB database.	`source.access_key_id = xxxxxxxxxxx`
source.secret_access_key	The AccessKey secret for the Amazon DynamoDB database.	`source.secret_access_key = xxxxxxxxxx`
source.session_token	The temporary key for accessing the Amazon DynamoDB database. Optional if no temporary key is used.	`source.session_token = xxxxxxxxxx`
source.region	The region of the Amazon DynamoDB database. Optional if region is not applicable or auto-detected.	`source.region = us-east-2`
source.endpoint_url	Configurable if the source is an endpoint type. Important Enabling this parameter overrides the preceding source-related parameters.	`source.endpoint_url = "http://192.168.0.1:1010"`
source.session.max_retries	The maximum number of retries after a session failure.	`source.session.max_retries = 3`
source.session.timeout	The session timeout period. `0` indicates that the session timeout is disabled. Unit: milliseconds.	`source.session.timeout = 3000`
filter.collection.white	A whitelist of collection names to migrate. For example, `filter.collection.white = c1;c2` indicates that the `c1` and `c2` collections are migrated and other collections are filtered out.	`filter.collection.white = c1;c2`
filter.collection.black	The names of collections to be filtered out. For example, `filter.collection.black = c1;c2` indicates that the `c1` and `c2` collections are filtered out and other collections are migrated. Important Cannot be used with `filter.collection.white` simultaneously. If both are specified, all collections are migrated.	`filter.collection.black = c1;c2`
qps.full	Limits the execution frequency of the `Scan` command during full migration (maximum calls per second). Default value: 1000.	`qps.full = 1000`
qps.full.batch_num	The number of data entries to pull per second during full migration. Default value: 128.	`qps.full.batch_num = 128`
qps.incr	Limits the execution frequency of the `GetRecords` command during incremental migration (maximum calls per second). Default value: 1000.	`qps.incr = 1000`
qps.incr.batch_num	The number of data entries to pull per second in incremental migration. Default value: 128.	`qps.incr.batch_num = 128`
target.type	The type of the destination database. Valid values: `mongodb`: an ApsaraDB for MongoDB instance. `aliyun_dynamo_proxy`: a DynamoDB-compatible ApsaraDB for MongoDB instance.	`target.type = mongodb`
target.address	The connection string of the destination database. Supports MongoDB connection strings and DynamoDB-compatible connection addresses. For more MongoDB addresses, see Connect to a replica set instance or Connect to a sharded cluster instance.	`target.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717`
target.mongodb.type	The type of the destination ApsaraDB for MongoDB instance. Valid values: `replica`: replica set instance. `sharding`: sharded cluster instance.	`target.mongodb.type = sharding`
target.db.exist	Specifies how to handle existing collections with the same name at the destination. Valid values: `rename`: NimoShake renames existing collections by adding a timestamp suffix to the name. For example, NimoShake changes c1 to c1.2019-07-01Z12:10:11. Warning This may affect your business. Ensure prior preparation. `drop`: Deletes the existing collection at the destination. If not configured, the migration will terminate with an error if a collection with the same name already exists in the destination.	`target.db.exist = drop`
sync_schema_only	Specifies whether to migrate only the table schema. Valid values: `true`: Only the table schema is migrated. `false`: False. Default value: `false`.	`sync_schema_only = false`
full.concurrency	The maximum number of collections that can be migrated concurrently in full migration. Default value: 4.	`full.concurrency = 4`
full.read.concurrency	The document-level concurrency within a table during full migration. This parameter indicates the maximum number of threads that can concurrently read from the source for a single table, corresponding to the TotalSegments parameter of the Scan interface. The number of concurrent threads to read documents from a single table at the source during full migration. Corresponds to the Scan interface's TotalSegments parameter.	`full.read.concurrency = 1`
full.document.concurrency	A parameter for full migration. The number of concurrent threads to write documents from a single table to the destination during full migration. Default: 4 Default value: 4.	`full.document.concurrency = 4`
full.document.write.batch	The number of data entries to be aggregated and written at a time. If the destination is a DynamoDB protocol-compatible database, the maximum value is 25.	`full.document.write.batch = 25`
full.document.parser	A parameter for full migration. The number of concurrent parser threads to convert DynamoDB protocol data to the corresponding protocol for the destination. Default value: 2.	`full.document.parser = 2`
full.enable_index.user	A parameter for full migration. Specifies whether to migrate user-defined indexes. Valid values: `true`: Yes. `false`: No. Default value: `true`.	`full.enable_index.user = true`
full.executor.insert_on_dup_update	A parameter for full migration. Specifies whether to change the `INSERT` operation to the `UPDATE` operation if a duplicate key is encountered on the destination. Valid values: `true`: Yes. `false`: No. Default value: `true`.	`full.executor.insert_on_dup_update = true`
increase.concurrency	A parameter for incremental migration. The maximum number of shards that can be captured concurrently. Default value: 16.	`increase.concurrency = 16`
increase.executor.insert_on_dup_update	A parameter for incremental migration. Specifies whether to change the `INSERT` operation to the `UPDATE` operation if the same keys exist on the destination. Valid values: `true`: Yes. `false`: False. Default value: `true`.	`increase.executor.insert_on_dup_update = true`
increase.executor.upsert	A parameter for incremental migration. Specifies whether to change the `UPDATE` operation to the `UPSERT` operation if no keys are found at the destination. Valid values: `true`: Yes `false`: False Note An `UPSERT` operation checks whether the specified keys exist. If they do, the `UPDATE` operation is performed. Otherwise, the `INSERT` operation is performed.	`increase.executor.upsert = true`
checkpoint.type	The storage type for resumable transmission (checkpoint) information. Valid values: `mongodb`: Checkpoint information is stored in the ApsaraDB for MongoDB database. This value is available only when the `target.type` parameter is set to `mongodb`. `file`: Checkpoint information is stored in your computer.	`checkpoint.type = mongodb`
checkpoint.address	The address for storing checkpoint information. If the `checkpoint.type` parameter is set to `mongodb`, enter the connection string of the ApsaraDB for MongoDB database. If not configured, checkpoint information will be stored in the destination ApsaraDB for MongoDB database. See Connect to a replica set instance or Connect to a sharded cluster instance for details. If the `checkpoint.type` parameter is set to `file`, enter a relative path (such as a checkpoint). Defaults to the checkpoint folder relative to the NimoShake executable if not configured.	`checkpoint.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717`
checkpoint.db	The name of the database for checkpoint information. If not configured, the database name will be in the `<id>-checkpoint` format. Example: `nimo-shake-checkpoint`.	`checkpoint.db = nimo-shake-checkpoint`
convert._id	Adds a prefix to the `_id` field in DynamoDB to avoid conflicts with the `_id` field in MongoDB.	`convert._id = pre`
full.read.filter_expression	The DynamoDB expression used for filtering during the full migration . `:begin` and `:end` are variables that start with a colon. The actual values are specified in `filter_attributevalues`.	`full.read.filter_expression = create_time > :begin AND create_time < :end`
full.read.filter_attributevalues	The values corresponding to the variables in `filter_expression` for full migration filtering. `N` represents Number and `S` represents String.	full.read.filter_attributevalues = begin```N```1646724207280~~~end```N```1646724207283

Start migration: Run the following command to start data migration by using the configured nimo-shake.conf file:
```
./nimo-shake.linux -conf=nimo-shake.conf
```
Note
Upon completion of the full migration, full sync done! will be displayed. If the migration is terminated due to an error, the program will automatically close and print the corresponding error message to assist you in troubleshooting.