In Valkey Cluster, you can use a
process known as slot migration to scale your cluster in or out. During
slot migration, one or more of the 16384 hash slots are moved from a
source node to a target node. Valkey 9.0 introduced an option for
migrating hash slots known as atomic slot migration,
which is faster, more reliable, and has less impact on client
applications than the legacy CLUSTER SETSLOT-based
migration.
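Every key belongs to exactly one of the 16384 hash slots, so migrating a slot moves every key in it. The sketch below follows the key-hashing rule from the cluster specification (CRC16 of the key, or of its non-empty {hash tag}, modulo 16384) to compute a key's slot locally. On a live node, CLUSTER KEYSLOT returns the same value. The function names are illustrative and are not part of any client library.

```python
NUM_SLOTS = 16384

def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XModem): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % NUM_SLOTS
```

Keys that share a hash tag, such as user{42}.name and user{42}.email, always land in the same slot and therefore always migrate together.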
Valkey 9.0 does not remove the legacy slot migration option, but it does introduce atomic slot migration as a second option. To perform an atomic slot migration, an operator performs the following steps:
1. Send CLUSTER MIGRATESLOTS SLOTSRANGE <start-slot> <end-slot> NODE <target> to the source node.
2. Poll for progress using CLUSTER GETSLOTMIGRATIONS.
CLUSTER MIGRATESLOTS initiates a migration of the designated slot range to the specified target node. The slot migration itself then proceeds asynchronously.
For more details on CLUSTER MIGRATESLOTS, see the command
documentation.
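When the slots to move are not a single contiguous block, it can help to collapse them into inclusive ranges first. The sketch below does that and builds the argument list for the command form shown above, one command per range. Issuing one command per range is an assumption made for simplicity, and the target argument is assumed to be the target node's ID. The helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

NUM_SLOTS = 16384

def contiguous_ranges(slots: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse slot numbers into sorted, inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for slot in sorted(set(slots)):
        if not 0 <= slot < NUM_SLOTS:
            raise ValueError(f"slot {slot} is outside 0..{NUM_SLOTS - 1}")
        if ranges and slot == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], slot)  # extend the current range
        else:
            ranges.append((slot, slot))  # start a new range
    return ranges

def migrateslots_args(start: int, end: int, target: str) -> List[str]:
    """Arguments for CLUSTER MIGRATESLOTS SLOTSRANGE <start> <end> NODE <target>."""
    if not 0 <= start <= end < NUM_SLOTS:
        raise ValueError(f"invalid slot range {start}-{end}")
    return ["CLUSTER", "MIGRATESLOTS", "SLOTSRANGE", str(start), str(end), "NODE", target]
```

Each argument list can be sent to the source node with any client that accepts raw commands, for example the execute_command method of the Python valkey client.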
The CLUSTER GETSLOTMIGRATIONS command allows you to poll
the status of your migration. It can be executed on either the source
node or the target node. In-progress migrations are always shown, and
recently completed migrations remain visible up to a configurable limit.
If a migration fails, its entry also includes a short description of the
failure to help you decide whether to retry.
For more details on CLUSTER GETSLOTMIGRATIONS, see the command
documentation.
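A polling loop around CLUSTER GETSLOTMIGRATIONS might look like the sketch below. The reply format is not described here, so the caller supplies two hypothetical callables: poll, which runs the command and returns its reply, and is_finished, which decides from that reply whether the migration has completed or failed. The sleep and clock parameters are injectable only so that the loop can be exercised without a real cluster.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def wait_for_migration(
    poll: Callable[[], T],
    is_finished: Callable[[T], bool],
    timeout_s: float = 300.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call poll() until is_finished(reply) is true or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        reply = poll()
        if is_finished(reply):
            return reply  # caller inspects it for success or a failure description
        if clock() >= deadline:
            raise TimeoutError(
                f"migration not finished after {timeout_s}s; last reply: {reply!r}"
            )
        sleep(interval_s)
```

Returning the final reply instead of a boolean leaves the retry decision to the caller, who can read the failure description that CLUSTER GETSLOTMIGRATIONS includes for failed migrations.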
If you need to cancel a slot migration after it has started, Valkey
provides the CLUSTER CANCELSLOTMIGRATIONS command, which cancels all
active atomic slot migrations for which the receiving node is the source
node. To cancel all slot migrations across the cluster, send the command
to every node.
For more details on CLUSTER CANCELSLOTMIGRATIONS, see the command
documentation.
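Because the command only cancels migrations whose source is the receiving node, a cluster-wide cancel means sending it to each node. A minimal sketch, assuming a hypothetical send(node, *args) callable that delivers one command to one node: failures are recorded per node so that a single unreachable node does not stop the others from being cancelled.

```python
from typing import Callable, Dict, Iterable, Optional

CANCEL_ARGS = ("CLUSTER", "CANCELSLOTMIGRATIONS")

def cancel_on_all_nodes(
    nodes: Iterable[str],
    send: Callable[..., object],
) -> Dict[str, Optional[Exception]]:
    """Send CLUSTER CANCELSLOTMIGRATIONS to every node; record failures per node."""
    results: Dict[str, Optional[Exception]] = {}
    for node in nodes:
        try:
            send(node, *CANCEL_ARGS)
            results[node] = None  # the node accepted the command
        except Exception as exc:  # e.g. a connection error; keep going
            results[node] = exc
    return results
```

Nodes that map to an exception can be retried individually once they are reachable again.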
Atomic slot migration uses a completely different process than
CLUSTER SETSLOT-based migration:
1. When the source node receives CLUSTER MIGRATESLOTS, it initiates a
connection to the target node and performs authentication, similar to
how a replication link is initialized.
2. The source node sends CLUSTER SYNCSLOTS to the target to inform it of
the migration.
…
Finally, the source node answers commands for the migrated slots with
MOVED redirections to the target node, which now owns the hash slots,
and the slot migration is complete.
Since slot ownership is not moved until the very end of the
migration, commands targeting migrating hash slots on the target node
receive MOVED redirections back to the source node, which still owns
them, per the cluster specification. However, some commands operate on
the entire database:
- KEYS/SCAN: These commands allow a client to list all keys on a shard.
- DBSIZE/INFO: These commands provide statistics about how many keys are
on a shard.
- FLUSHDB/FLUSHALL: These commands allow a client to drop all data in one
database, or in all databases, on a node.
To handle this, all importing hash slots are specially marked and hidden from read operations on both the target primary and the target replica.
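The effect of hiding importing slots can be pictured with a toy model of one node's keyspace. This is an illustration of the observable behavior only, not Valkey's implementation: keys are mapped directly to slot numbers, and ToyShard, keys_cmd, and dbsize are hypothetical names standing in for KEYS and DBSIZE.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class ToyShard:
    """Hypothetical model of one node's keyspace: key name -> hash slot."""
    keys: Dict[str, int] = field(default_factory=dict)
    importing_slots: Set[int] = field(default_factory=set)

    def _visible(self, key: str) -> bool:
        # Keys in slots still being imported are hidden from whole-database reads.
        return self.keys[key] not in self.importing_slots

    def keys_cmd(self) -> List[str]:
        """Stand-in for KEYS *: only keys in slots this node serves."""
        return sorted(k for k in self.keys if self._visible(k))

    def dbsize(self) -> int:
        """Stand-in for DBSIZE: counts only visible keys."""
        return sum(1 for k in self.keys if self._visible(k))
```

For example, a target holding keys in slots 1 and 2 while slot 2 is still importing reports only the slot-1 keys; once the import finishes and slot 2 is no longer marked, all keys become visible.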
FLUSHDB and FLUSHALL are a special case: executing either command on the
source node or the target node causes the slot migration to fail.
Operators are expected to retry the migration after flushing; the retry
should complete almost instantly because the database is now empty.
Some configurations may be worth tuning based on your workload:
- client-output-buffer-limit: Because atomic slot migration uses the
replication process to migrate the slots, the mutations accumulated while
the snapshot is being taken can exceed the configured replication output
buffer limit. Configure both the hard and soft limits of the replica
client output buffer large enough to hold those accumulated mutations.
- slot-migration-max-failover-repl-bytes: By default, atomic slot
migration only proceeds to pausing mutations on the source node once all
in-flight mutations have been sent to the target node. For workloads with
persistently high write throughput, where that backlog may never fully
drain, atomic slot migration can instead be configured to pause as long
as the in-flight mutations are under a given threshold.
- cluster-slot-migration-log-max-len: Atomic slot migration keeps track
of all in-progress migrations and of recently completed or failed
migrations, which can be viewed with CLUSTER GETSLOTMIGRATIONS. Use this
configuration to increase the number of recently completed migrations
that are retained.
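To size the replica output buffer, a rough back-of-the-envelope estimate is the write rate multiplied by how long the snapshot takes, plus some headroom for bursts. The sketch below encodes that estimate and builds CONFIG SET arguments using the client-output-buffer-limit value format (<class> <hard> <soft> <soft-seconds>). The headroom factor of 2 and both helper names are assumptions for illustration; measure your own write rate and snapshot duration before choosing limits.

```python
import math
from typing import List

def estimate_accumulated_bytes(
    write_bytes_per_sec: float, snapshot_seconds: float, headroom: float = 2.0
) -> int:
    """Rough size of the mutations that pile up while the snapshot is taken.

    headroom is an arbitrary safety factor for write bursts (an assumption).
    """
    if write_bytes_per_sec < 0 or snapshot_seconds < 0 or headroom < 1:
        raise ValueError("rate and duration must be non-negative, headroom >= 1")
    return math.ceil(write_bytes_per_sec * snapshot_seconds * headroom)

def replica_buffer_config_args(hard_bytes: int, soft_bytes: int, soft_seconds: int) -> List[str]:
    """CONFIG SET arguments for the replica class: <hard> <soft> <soft-seconds>."""
    value = f"replica {hard_bytes} {soft_bytes} {soft_seconds}"
    return ["CONFIG", "SET", "client-output-buffer-limit", value]
```

With about 10 MiB/s of writes and a 30-second snapshot, the estimate comes to 600 MiB (629145600 bytes), which could then be used for both the hard and soft limits.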