
ApsaraDB for MongoDB:NimoShake - Migrate Amazon DynamoDB to Alibaba Cloud

Last Updated:Jul 07, 2025

NimoShake (also known as DynamoShake) is a data synchronization tool developed by Alibaba Cloud. It allows you to migrate Amazon DynamoDB databases to Alibaba Cloud.

Prerequisites

An ApsaraDB for MongoDB instance is created. For more information, see Create a replica set instance or Create a sharded cluster instance.

Background information

NimoShake is designed specifically for migrations from an Amazon DynamoDB database. The destination must be an ApsaraDB for MongoDB database. For more information, see NimoShake overview.

Important considerations

  • Resource consumption. A full data migration consumes resources on both the source and destination databases and may increase the load on your database servers. If your database handles heavy traffic or runs on low server specifications, this added pressure can be significant. We strongly recommend that you evaluate the potential performance impact before you migrate data and that you perform the migration during off-peak hours.

  • Storage capacity. Make sure that the storage space of the ApsaraDB for MongoDB instance is larger than that of the Amazon DynamoDB database.
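
Before you migrate, you can roughly compare the size of a source table with the current storage usage of the destination instance. The following commands are a minimal sketch, assuming that the AWS CLI and mongosh are installed; the table name c1 and the connection string are placeholders:

    # Approximate item count and size of a source DynamoDB table.
    # (DynamoDB updates these statistics periodically, so treat them as estimates.)
    aws dynamodb describe-table --table-name c1 \
        --query '{Items: Table.ItemCount, SizeBytes: Table.TableSizeBytes}'

    # Total storage currently used on the destination ApsaraDB for MongoDB instance, in bytes.
    mongosh "mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717" \
        --quiet --eval 'db.getSiblingDB("admin").runCommand({ listDatabases: 1 }).totalSize'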

Key terms

  • Resumable transmission: This feature divides a task into multiple parts for transmission. If transmission is interrupted due to network failures or other reasons, the task can resume from where it left off, rather than restarting from the beginning.

    Note
    • Full migration does not support resumable transmission.

    • Incremental migration supports resumable transmission. If an incremental synchronization connection is lost and then recovered within a short time, the synchronization resumes from the last checkpoint. However, in some cases, such as a prolonged disconnection or the loss of the previous checkpoint, a full synchronization is triggered again.

  • Checkpoint: Resumable transmission for incremental synchronization is implemented by using checkpoints. By default, checkpoints are written to a database named nimo-shake-checkpoint in the destination MongoDB database. Each migrated collection has its own checkpoint collection, and a status_table collection records whether the current task is in the full or incremental migration phase.
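
If you use the default settings, you can inspect the checkpoint state with mongosh. The following is a minimal sketch, assuming that checkpoints are stored in the destination instance under the default nimo-shake-checkpoint database and that the connection string is a placeholder:

    # List the checkpoint collections: one per migrated source table, plus status_table.
    mongosh "mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717/nimo-shake-checkpoint" \
        --quiet --eval 'db.getCollectionNames()'

    # Check whether the task is currently in the full or incremental phase.
    mongosh "mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717/nimo-shake-checkpoint" \
        --quiet --eval 'db.status_table.find().toArray()'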

NimoShake features

NimoShake performs synchronization in two consecutive stages:

  1. Full data migration.

  2. Incremental data migration.

Full migration

Full migration consists of two parts: data migration and index migration. The following figure shows the basic architecture of full migration.

  • Data migration: NimoShake uses multiple types of concurrent threads to pull source data, as shown in the following figure (data synchronization). The threads are described as follows:

    • Fetcher: Calls the protocol conversion driver provided by Amazon to retrieve data from the source table in batches and place it in queues until all source data is pulled. Only one fetcher thread is used.

    • Parser: Reads data from the queues and parses it into the BSON structure. After the data is parsed, the parser writes it to the executor queues. Multiple parser threads can run concurrently; the default is 2. Adjust the number of parser threads by using the full.document.parser parameter.

    • Executor: Pulls data from the queues, then aggregates and writes it to the destination ApsaraDB for MongoDB database. Up to 16 MB of data or 1,024 entries are aggregated per batch. Multiple executor threads can run concurrently; the default is 4. Adjust the number of executor threads by using the full.document.concurrency parameter.

  • Index migration: NimoShake creates indexes after data migration is complete. Indexes fall into two categories: auto-generated indexes and user-created indexes.

    • Auto-generated indexes:

      • If a table has both a partition key and a sort key, NimoShake creates a unique composite index on the two keys and writes it to MongoDB. NimoShake also creates a hash index on the partition key and writes it to MongoDB.

      • If a table has only a partition key, NimoShake creates both a hash index and a unique index on it in MongoDB.

    • User-created indexes: If a table has a user-created index, NimoShake creates a hash index on the table's primary key and writes it to the destination ApsaraDB for MongoDB database.
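
To make the auto-generated index rules concrete, the following mongosh sketch shows roughly equivalent commands for a hypothetical table with partition key pk and sort key sk. The database, collection, and field names are placeholders; NimoShake issues these writes for you:

    mongosh "mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717" --quiet --eval '
        const coll = db.getSiblingDB("nimo-shake").c1;        // placeholder destination database and collection
        coll.createIndex({ pk: 1, sk: 1 }, { unique: true }); // unique composite index on partition + sort key
        coll.createIndex({ pk: "hashed" });                   // hash index on the partition key
    '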

Incremental migration

Incremental migration synchronizes only data; it does not synchronize indexes created during the incremental synchronization process. The following figure shows the basic architecture of incremental migration. The threads are described as follows:

  • Fetcher: Monitors shard changes in the stream.

  • Manager: Handles message notification and creates dispatchers. Each shard corresponds to one dispatcher.

  • Dispatcher: Retrieves incremental data from the source. For resumable transmission, retrieval resumes from the last checkpoint instead of from the beginning.

  • Batcher: Parses, packages, and aggregates the incremental data retrieved by the dispatcher thread.

  • Executor: Writes the aggregated data to the destination ApsaraDB for MongoDB database and updates the checkpoint.
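
The behavior of this pipeline is controlled by parameters in nimo-shake.conf, which are described in the table in the steps below. As a quick orientation, a typical incremental section looks like the following sketch (the values shown are the defaults from that table):

    sync_mode = all                # full migration first, then incremental migration
    increase.concurrency = 16      # maximum number of shards captured concurrently
    qps.incr = 1000                # cap on GetRecords calls per second
    qps.incr.batch_num = 128       # number of entries pulled in each batch
    checkpoint.type = mongodb      # store checkpoints in MongoDB so an interrupted sync can resume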

Steps: Migrate Amazon DynamoDB to Alibaba Cloud (Ubuntu example)

This section demonstrates how to use NimoShake to migrate an Amazon DynamoDB database to ApsaraDB for MongoDB, using an Ubuntu system as an example.

  1. Download NimoShake: Run the following command to download the NimoShake package:

    wget https://github.com/alibaba/NimoShake/releases/download/release-v1.0.14-20250704/nimo-shake-v1.0.14.tar.gz
    Note

    We recommend downloading the latest version of the NimoShake package.

  2. Decompress the package: Run the following command to decompress the NimoShake package:

    tar zxvf nimo-shake-v1.0.14.tar.gz
  3. Access the directory: After decompression, run the cd nimo-shake-v1.0.14 command to go to the nimo-shake-v1.0.14 directory.

  4. Open configuration file: Run the vi nimo-shake.conf command to open the NimoShake configuration file.

  5. Configure NimoShake: Configure the parameters in the nimo-shake.conf file. The following table describes each configuration item:

    Parameter

    Description

    Example

    id

    The ID of the migration task. The ID is customizable and is used in the PID file name, the log file name, the checkpoint database name, and the destination database name.

    id = nimo-shake

    log.file

    The path of the log file. If this parameter is not configured, logs are displayed in stdout.

    log.file = nimo-shake.log

    log.level

    The logging level. Valid values:

    • none: No logs

    • error: Error messages

    • warn: Warning information

    • info: System status

    • debug: Debugging information

    Default value: info.

    log.level = info

    log.buffer

    Specifies whether to enable log buffering. Valid values:

    • true: Log buffering is enabled. Log buffering ensures high performance but may lose a few of the latest log entries upon exit.

    • false: Log buffering is disabled. If disabled, performance may be degraded. However, all log entries are displayed upon exit.

    Default value: true.

    log.buffer = true

    system_profile

    The PPROF port, used for debugging and displaying stackful coroutine information.

    system_profile = 9330

    full_sync.http_port

    The RESTful port for the full migration phase. Use curl to view internal monitoring statistics. For more information, see the wiki.

    full_sync.http_port = 9341

    incr_sync.http_port

    The RESTful port for the incremental migration phase. Use curl to view internal monitoring statistics. For more information, see the wiki.

    incr_sync.http_port = 9340

    sync_mode

    The type of data migration. Valid values:

    • all: Full migration and incremental migration

    • full: Only full migration

    Default value: all.

    Note

    Only full is supported when the source is an ApsaraDB for MongoDB instance compatible with the DynamoDB protocol.

    sync_mode = all

    incr_sync_parallel

    Specifies whether to perform parallel incremental migration. Valid values:

    • true: Parallel incremental migration is enabled. This consumes more memory.

    • false: Parallel incremental migration is disabled.

    Default value: false.

    incr_sync_parallel = false

    source.access_key_id

    The AccessKey ID for the Amazon DynamoDB database.

    source.access_key_id = xxxxxxxxxxx

    source.secret_access_key

    The AccessKey secret for the Amazon DynamoDB database.

    source.secret_access_key = xxxxxxxxxx

    source.session_token

    The temporary key for accessing the Amazon DynamoDB database. Optional if no temporary key is used.

    source.session_token = xxxxxxxxxx

    source.region

    The region of the Amazon DynamoDB database. Optional if region is not applicable or auto-detected.

    source.region = us-east-2

    source.endpoint_url

    The endpoint URL of the source database. Configure this parameter if the source is accessed through an endpoint.

    Important

    If this parameter is specified, it takes precedence over the preceding source-related parameters.

    source.endpoint_url = "http://192.168.0.1:1010"

    source.session.max_retries

    The maximum number of retries after a session failure.

    source.session.max_retries = 3

    source.session.timeout

    The session timeout period. 0 indicates that the session timeout is disabled. Unit: milliseconds.

    source.session.timeout = 3000

    filter.collection.white

    A whitelist of collection names to migrate. For example, filter.collection.white = c1;c2 indicates that the c1 and c2 collections are migrated and other collections are filtered out.

    filter.collection.white = c1;c2

    filter.collection.black

    The names of collections to be filtered out. For example, filter.collection.black = c1;c2 indicates that the c1 and c2 collections are filtered out and other collections are migrated.

    Important

    This parameter cannot be specified together with filter.collection.white. If neither parameter is specified, all collections are migrated.

    filter.collection.black = c1;c2

    qps.full

    Limits the execution frequency of the Scan command during full migration (maximum calls per second).

    Default value: 1000.

    qps.full = 1000

    qps.full.batch_num

    The number of data entries to pull in each batch during full migration.

    Default value: 128.

    qps.full.batch_num = 128

    qps.incr

    Limits the execution frequency of the GetRecords command during incremental migration (maximum calls per second).

    Default value: 1000.

    qps.incr = 1000

    qps.incr.batch_num

    The number of data entries to pull in each batch during incremental migration.

    Default value: 128.

    qps.incr.batch_num = 128

    target.type

    The type of the destination database. Valid values:

    • mongodb: an ApsaraDB for MongoDB instance.

    • aliyun_dynamo_proxy: a DynamoDB-compatible ApsaraDB for MongoDB instance.

    target.type = mongodb

    target.address

    The connection string of the destination database. MongoDB connection strings and DynamoDB-compatible connection addresses are supported.

    For information about how to obtain the connection string of an ApsaraDB for MongoDB instance, see Connect to a replica set instance or Connect to a sharded cluster instance.

    target.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717

    target.mongodb.type

    The type of the destination ApsaraDB for MongoDB instance. Valid values:

    • replica: replica set instance.

    • sharding: sharded cluster instance.

    target.mongodb.type = sharding

    target.db.exist

    Specifies how to handle existing collections with the same name at the destination. Valid values:

    • rename: NimoShake renames an existing collection by appending a timestamp suffix to its name. For example, NimoShake changes c1 to c1.2019-07-01Z12:10:11.

      Warning

      Renaming may affect your business. Make sure that you are prepared before the migration.

    • drop: NimoShake deletes the existing collection at the destination.

    If this parameter is not configured, the migration terminates with an error when a collection with the same name already exists at the destination.

    target.db.exist = drop

    sync_schema_only

    Specifies whether to migrate only the table schema. Valid values:

    • true: Only the table schema is migrated.

    • false: Both the table schema and the data are migrated.

    Default value: false.

    sync_schema_only = false

    full.concurrency

    The maximum number of collections that can be migrated concurrently in full migration.

    Default value: 4.

    full.concurrency = 4

    full.read.concurrency

    A parameter for full migration. The number of threads that concurrently read documents from a single table at the source. This corresponds to the TotalSegments parameter of the DynamoDB Scan operation.

    full.read.concurrency = 1

    full.document.concurrency

    A parameter for full migration. The number of threads that concurrently write documents from a single table to the destination.

    Default value: 4.

    full.document.concurrency = 4

    full.document.write.batch

    The number of data entries to be aggregated and written at a time. If the destination is a DynamoDB protocol-compatible database, the maximum value is 25.

    full.document.write.batch = 25

    full.document.parser

    A parameter for full migration. The number of concurrent parser threads to convert DynamoDB protocol data to the corresponding protocol for the destination.

    Default value: 2.

    full.document.parser = 2

    full.enable_index.user

    A parameter for full migration. Specifies whether to migrate user-defined indexes. Valid values:

    • true: Yes.

    • false: No.

    Default value: true.

    full.enable_index.user = true

    full.executor.insert_on_dup_update

    A parameter for full migration. Specifies whether to change the INSERT operation to the UPDATE operation if a duplicate key is encountered on the destination. Valid values:

    • true: Yes.

    • false: No.

    Default value: true.

    full.executor.insert_on_dup_update = true

    increase.concurrency

    A parameter for incremental migration. The maximum number of shards that can be captured concurrently.

    Default value: 16.

    increase.concurrency = 16

    increase.executor.insert_on_dup_update

    A parameter for incremental migration. Specifies whether to change the INSERT operation to the UPDATE operation if the same keys exist on the destination. Valid values:

    • true: Yes.

    • false: No.

    Default value: true.

    increase.executor.insert_on_dup_update = true

    increase.executor.upsert

    A parameter for incremental migration. Specifies whether to change the UPDATE operation to the UPSERT operation if no keys are found at the destination. Valid values:

    • true: Yes.

    • false: No.

    Note

    An UPSERT operation checks whether the specified keys exist. If they do, the UPDATE operation is performed. Otherwise, the INSERT operation is performed.

    increase.executor.upsert = true

    checkpoint.type

    The storage type for resumable transmission (checkpoint) information. Valid values:

    • mongodb: Checkpoint information is stored in the ApsaraDB for MongoDB database. This value is available only when the target.type parameter is set to mongodb.

    • file: Checkpoint information is stored in a local file on the machine that runs NimoShake.

    checkpoint.type = mongodb

    checkpoint.address

    The address for storing checkpoint information.

    • If the checkpoint.type parameter is set to mongodb, enter the connection string of an ApsaraDB for MongoDB database. If this parameter is not configured, checkpoint information is stored in the destination ApsaraDB for MongoDB database. For more information, see Connect to a replica set instance or Connect to a sharded cluster instance.

    • If the checkpoint.type parameter is set to file, enter a relative path, such as checkpoint. If this parameter is not configured, checkpoint information is stored in the checkpoint folder relative to the NimoShake executable.

    checkpoint.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717

    checkpoint.db

    The name of the database for checkpoint information. If not configured, the database name will be in the <id>-checkpoint format.

    Example: nimo-shake-checkpoint.

    checkpoint.db = nimo-shake-checkpoint

    convert._id

    Adds a prefix to the _id field in DynamoDB to avoid conflicts with the _id field in MongoDB.

    convert._id = pre

    full.read.filter_expression

    The DynamoDB expression used for filtering during full migration. :begin and :end are variables that start with a colon; their actual values are specified in the full.read.filter_attributevalues parameter.

    full.read.filter_expression = create_time > :begin AND create_time < :end

    full.read.filter_attributevalues

    The values corresponding to the variables in filter_expression for full migration filtering.

    N represents Number and S represents String.

    full.read.filter_attributevalues = begin```N```1646724207280~~~end```N```1646724207283
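
    Putting the table together, a minimal configuration for a full plus incremental migration might look like the following sketch, which reuses the example values above. Replace the credentials and the connection string with your own:

    id = nimo-shake
    log.file = nimo-shake.log
    sync_mode = all
    source.access_key_id = xxxxxxxxxxx
    source.secret_access_key = xxxxxxxxxx
    source.region = us-east-2
    target.type = mongodb
    target.address = mongodb://username:password@s-*****-pub.mongodb.rds.aliyuncs.com:3717
    target.mongodb.type = sharding
    target.db.exist = drop
    checkpoint.type = mongodb
    checkpoint.db = nimo-shake-checkpoint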

  6. Start migration: Run the following command to start data migration by using the configured nimo-shake.conf file:

    ./nimo-shake.linux -conf=nimo-shake.conf
    Note

    When the full migration is complete, full sync done! is displayed. If the migration terminates because of an error, the program automatically exits and prints an error message to help you troubleshoot.
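
    For a long-running migration, you may prefer to run NimoShake in the background and follow its log file. The following is a minimal sketch, assuming that log.file = nimo-shake.log is set in the configuration:

    nohup ./nimo-shake.linux -conf=nimo-shake.conf > nimo-shake.stdout 2>&1 &
    tail -f nimo-shake.log                  # follow the migration progress
    grep "full sync done" nimo-shake.log    # confirm that the full phase has finished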