#ifndef __CLUSTER_H
#define __CLUSTER_H

/*-----------------------------------------------------------------------------
 * Cluster exported API.
 *----------------------------------------------------------------------------*/

#define CLUSTER_SLOT_MASK_BITS 14 /* Number of bits used for slot id. */
#define CLUSTER_SLOTS (1 << CLUSTER_SLOT_MASK_BITS) /* Total number of slots in cluster mode, which is 16384. */

Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot. The main idea is to split the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot-specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be … The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.
## Important changes
* Incremental rehashing - one big change here is that there is not one, but up to 16k dictionaries that can be rehashing at the same time. In order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also, instead of rehashing a single dictionary, the cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to select not only a random key from a random bucket, but also a random dictionary. Fairness is a major concern here, as keys can be unevenly distributed across the slots. To address this, we introduced a binary index tree (Fenwick tree) over the per-slot key counts. With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating a dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using the same binary index tree that is used for random key selection; it allows us to find the slot that holds a specific key index. For example, if there are 10 keys in slot 0, we can quickly find the slot that contains the 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to save not only the position within the dictionary but also the slot id. In this change we append the slot id into the LSB of the cursor so it can be passed around between client and server. This has an interesting side effect: you can now start scanning a specific slot by simply providing the slot id as the cursor value. The plan is not to document this as defined behavior, however. It's also worth noting that the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - during command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross-slot scripts and modules). We don't want to compute the slot checksum multiple times, hence we rely on the slot id cached in the client during command execution. All operations that access random keys must either pass in the known slot or recompute it.
* Slot info in RDB - in order to resize individual dictionaries correctly while loading an RDB, it's not enough to know the total number of keys (we could approximate the number of keys per slot, but it wouldn't be precise). To address this, we've added metadata to the RDB that contains the number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places. In order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way the DBSIZE operation stays O(1). The same is done for expires, so the expires count is also computed in O(1).
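The getRandomKey and iteration bullets above both reduce to one query: given per-slot key counts, find the slot that holds the k-th key. Below is a minimal sketch of a binary index (Fenwick) tree answering that query with a binary search over prefix sums, which gives the O(log^2(slot count)) bound mentioned above. The type and function names (`slotIndex`, `slotIndexAdd`, `slotIndexPrefix`, `slotIndexFind`, `pickRandomSlot`) are hypothetical, not the actual Redis/Valkey API, and `SLOT_COUNT` is assumed to mirror `CLUSTER_SLOTS`.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOT_COUNT 16384 /* assumption: mirrors CLUSTER_SLOTS */

/* Hypothetical Fenwick (binary index) tree over per-slot key counts.
 * tree[] is 1-based; tree[0] is unused. */
typedef struct {
    long long tree[SLOT_COUNT + 1];
} slotIndex;

/* Add delta (may be negative) to the key count of a 0-based slot. */
static void slotIndexAdd(slotIndex *ix, int slot, long long delta) {
    assert(slot >= 0 && slot < SLOT_COUNT);
    for (int i = slot + 1; i <= SLOT_COUNT; i += i & -i) ix->tree[i] += delta;
}

/* Total number of keys stored in slots 0..slot (inclusive). */
static long long slotIndexPrefix(const slotIndex *ix, int slot) {
    long long sum = 0;
    for (int i = slot + 1; i > 0; i -= i & -i) sum += ix->tree[i];
    return sum;
}

/* Smallest slot whose prefix count is >= target (target is 1-based), i.e.
 * the slot holding the target-th key. Empty slots are skipped implicitly
 * because they do not change the prefix sum.
 * O(log n) probes, each an O(log n) prefix query: O(log^2 n) overall. */
static int slotIndexFind(const slotIndex *ix, long long target) {
    int lo = 0, hi = SLOT_COUNT - 1;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (slotIndexPrefix(ix, mid) >= target)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Pick a slot with probability proportional to its key count, so every key
 * is equally likely. Returns -1 if there are no keys. rand() is used only
 * for illustration: it has modulo bias and may have a small range. */
static int pickRandomSlot(const slotIndex *ix) {
    long long total = slotIndexPrefix(ix, SLOT_COUNT - 1);
    if (total == 0) return -1;
    long long target = 1 + (long long)rand() % total;
    return slotIndexFind(ix, target);
}
```

With 10 keys in slot 0 and 3 in slot 5, `slotIndexFind(&ix, 11)` returns 5, skipping the empty slots 1 to 4, which matches the iteration example. A Fenwick tree also supports an O(log n) top-down descent for the same query; the binary search is used here because it is the variant the bullet describes.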
## Performance
This change improves SET performance in cluster mode by ~5%; most of the gains come from not having to maintain linked lists for the keys in a slot. Non-cluster mode has the same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict.
RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.
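Computing a key's slot is cheap but not free, which is why it shows up in RDB loading and why the slot is cached during command execution. In Redis Cluster the slot is the CRC16 (XMODEM variant) of the key modulo 16384; if the key contains a `{` followed later by a `}` with at least one character between them (considering only the first `{` and the first `}` after it), only that substring is hashed. Below is a minimal bitwise sketch of that rule. It is illustrative rather than the table-driven code behind the header's `keyHashSlot()`, and the names `crc16Xmodem` and `sketchKeyHashSlot` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC16-XMODEM: polynomial 0x1021, initial value 0, no reflection. */
static uint16_t crc16Xmodem(const char *buf, int len) {
    uint16_t crc = 0;
    for (int i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Slot = CRC16(key or hash tag) mod 16384; the mask 0x3FFF equals CLUSTER_SLOTS - 1. */
static unsigned int sketchKeyHashSlot(const char *key, int keylen) {
    assert(keylen >= 0);
    int s, e;
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen) return crc16Xmodem(key, keylen) & 0x3FFF; /* no '{': hash whole key */
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1) /* no '}' or empty "{}": hash whole key */
        return crc16Xmodem(key, keylen) & 0x3FFF;
    return crc16Xmodem(key + s + 1, e - s - 1) & 0x3FFF; /* hash only the tag */
}
```

Keys such as `foo{hash_tag}` and `bar{hash_tag}` land in the same slot because only `hash_tag` is hashed, which is what makes multi-key commands on them legal in cluster mode.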
## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the cursor also stores the slot information.
* New RDB version to support the new op code for SLOT information.
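The SCAN-related notes above (slot id in the cursor's LSB, and the cursor needing 64 bits) follow from packing a 14-bit slot id underneath the per-dictionary cursor. A minimal sketch, assuming the dictionary cursor occupies the bits above the slot field, as the `CLUSTER_SLOT_MASK` comment ("slot id stored in LSB") suggests; the helpers `cursorEncode`, `cursorSlot` and `cursorDictPart` are hypothetical and the real code may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BITS 14                               /* assumption: mirrors CLUSTER_SLOT_MASK_BITS */
#define SLOT_MASK ((UINT64_C(1) << SLOT_BITS) - 1) /* assumption: mirrors CLUSTER_SLOT_MASK */

/* Pack the per-dictionary cursor and the slot id into one 64-bit cursor. */
static uint64_t cursorEncode(uint64_t dict_cursor, int slot) {
    assert(slot >= 0 && (uint64_t)slot <= SLOT_MASK);
    return (dict_cursor << SLOT_BITS) | (uint64_t)slot;
}

/* The slot id lives in the low SLOT_BITS bits. */
static int cursorSlot(uint64_t cursor) {
    return (int)(cursor & SLOT_MASK);
}

/* Everything above the slot field is the cursor inside that slot's dictionary. */
static uint64_t cursorDictPart(uint64_t cursor) {
    return cursor >> SLOT_BITS;
}
```

Because `cursorEncode(0, slot)` equals `slot`, passing a bare slot id as the cursor starts the scan at the beginning of that slot, which is the undocumented side effect described earlier. A 32-bit dictionary cursor shifted left by 14 bits no longer fits in 32 bits, hence the 64-bit cursor requirement.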
---------
Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>

#define CLUSTER_SLOT_MASK ((unsigned long long)(CLUSTER_SLOTS - 1)) /* Bit mask for slot id stored in LSB. */
#define CLUSTER_OK 0 /* Everything looks ok */
#define CLUSTER_FAIL 1 /* The cluster can't work */
#define CLUSTER_NAMELEN 40 /* sha1 hex length */

/* Redirection errors returned by getNodeByQuery(). */
#define CLUSTER_REDIR_NONE 0 /* Node can serve the request. */
#define CLUSTER_REDIR_CROSS_SLOT 1 /* -CROSSSLOT request. */
#define CLUSTER_REDIR_UNSTABLE 2 /* -TRYAGAIN redirection required */
#define CLUSTER_REDIR_ASK 3 /* -ASK redirection required. */
#define CLUSTER_REDIR_MOVED 4 /* -MOVED redirection required. */
#define CLUSTER_REDIR_DOWN_STATE 5 /* -CLUSTERDOWN, global state. */
#define CLUSTER_REDIR_DOWN_UNBOUND 6 /* -CLUSTERDOWN, unbound slot. */
#define CLUSTER_REDIR_DOWN_RO_STATE 7 /* -CLUSTERDOWN, allow reads. */

typedef struct _clusterNode clusterNode;
struct clusterState;

/* Flags that a module can set in order to prevent certain Cluster
 * features from being enabled. Useful when implementing a different distributed
 * system on top of Cluster message bus, using modules. */
#define CLUSTER_MODULE_FLAG_NONE 0
#define CLUSTER_MODULE_FLAG_NO_FAILOVER (1 << 1)
#define CLUSTER_MODULE_FLAG_NO_REDIRECTION (1 << 2)

/* ---------------------- API exported outside cluster.c -------------------- */
/* functions requiring mechanism specific implementations */
void clusterInit(void);
void clusterInitLast(void);
void clusterCron(void);
void clusterBeforeSleep(void);
int verifyClusterConfigWithData(void);

int clusterSendModuleMessageToTarget(const char *target,
                                     uint64_t module_id,
                                     uint8_t type,
                                     const char *payload,
                                     uint32_t len);

void clusterUpdateMyselfFlags(void);
void clusterUpdateMyselfIp(void);
void clusterUpdateMyselfHostname(void);
void clusterUpdateMyselfAnnouncedPorts(void);
void clusterUpdateMyselfHumanNodename(void);

void clusterPropagatePublish(robj *channel, robj *message, int sharded);

unsigned long getClusterConnectionsCount(void);
int isClusterHealthy(void);

Support TLS service when "tls-cluster" is not enabled and persist both plain and TLS port in nodes.conf (#12233)
Originally, when "tls-cluster" is enabled, `port` is set to the TLS port. In order to support non-TLS clients, `pport` is used to propagate the TCP port across cluster nodes. However, when "tls-cluster" is disabled, `port` is set to the TCP port and `pport` is not used, which means the cluster cannot provide TLS service unless "tls-cluster" is on.
```
typedef struct {
// ...
uint16_t port; /* Latest known clients port (TLS or plain). */
uint16_t pport; /* Latest known clients plaintext port. Only used if the main clients port is for TLS. */
// ...
} clusterNode;
```
```
typedef struct {
// ...
uint16_t port; /* TCP base port number. */
uint16_t pport; /* Sender TCP plaintext port, if base port is TLS */
// ...
} clusterMsg;
```
This PR renames `port` and `pport` in `clusterNode` to `tcp_port` and `tls_port`, to record both ports regardless of whether "tls-cluster" is enabled.
This makes it possible to provide TLS service to clients when "tls-cluster" is disabled: when displaying cluster topology or returning a `MOVED` error, the server can provide the TLS or TCP port according to the client's connection type, no matter what type of connection the cluster bus is using.
For backwards compatibility, `port` and `pport` in `clusterMsg` are preserved. When "tls-cluster" is enabled, `port` is set to the TLS port and `pport` is set to the TCP port; when "tls-cluster" is disabled, `port` is set to the TCP port and `pport` is set to the TLS port (instead of 0).
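The port rules in the paragraphs above amount to a pair of mapping functions plus a client-facing choice. This is a minimal sketch with hypothetical, trimmed-down structs (`nodePorts`, `msgPorts`) standing in for `clusterNode` and `clusterMsg`; it assumes the receiver can rely on its own "tls-cluster" setting matching the sender's, since both ends of a bus link use the same connection type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the port fields of clusterNode and clusterMsg. */
typedef struct {
    uint16_t tcp_port;
    uint16_t tls_port;
} nodePorts;

typedef struct {
    uint16_t port;  /* TLS port if tls-cluster is on, else TCP port */
    uint16_t pport; /* the other port */
} msgPorts;

/* Sender side: fill the legacy fields for backwards compatibility. */
static msgPorts msgPortsFromNode(nodePorts n, int tls_cluster) {
    msgPorts m;
    m.port = tls_cluster ? n.tls_port : n.tcp_port;
    m.pport = tls_cluster ? n.tcp_port : n.tls_port;
    return m;
}

/* Receiver side. Assumption: receiver and sender share the tls-cluster setting. */
static nodePorts nodePortsFromMsg(msgPorts m, int tls_cluster) {
    nodePorts n;
    n.tls_port = tls_cluster ? m.port : m.pport;
    n.tcp_port = tls_cluster ? m.pport : m.port;
    return n;
}

/* Port advertised to a client (topology output, MOVED errors): chosen by the
 * client's connection type, independently of the cluster bus type. */
static uint16_t portForClient(nodePorts n, int client_uses_tls) {
    return client_uses_tls ? n.tls_port : n.tcp_port;
}
```

Keeping both ports in the node record is what lets `portForClient` answer a TLS client even when the bus itself runs over plain TCP.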
Also, in the nodes.conf file, a new aux field holding the extra port is added to complete the persisted info. We may have `tls-port=xxxxx` or `tcp-port=xxxxx` in the aux field, while the other port is stored in the normal `<ip>:<port>` field. The format is shown below.
```
<node-id> <ip>:<tcp_port>@<cport>,<hostname>,shard-id=...,tls-port=6379 myself,master - 0 0 0 connected 0-1000
```
The positions of the two ports can also be swapped; both forms are resolved correctly.
```
<node-id> <ip>:<tls_port>@<cport>,<hostname>,shard-id=...,tcp-port=6379 myself,master - 0 0 0 connected 0-1000
```
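Resolving both ports from a nodes.conf line can be sketched as follows: the `<ip>:<port>` field supplies one port and a `tls-port=` or `tcp-port=` aux field supplies the other, so swapping their positions yields the same result. The `resolvePorts` function and `resolvedPorts` struct are hypothetical, and the sketch handles a single aux field that the caller has already split out of the comma-separated list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int tcp_port;
    int tls_port;
} resolvedPorts;

/* Combine the main <port> with one aux field. Returns 0 on success,
 * -1 if aux is not a well-formed tls-port/tcp-port field. */
static int resolvePorts(int main_port, const char *aux, resolvedPorts *out) {
    const char *eq = strchr(aux, '=');
    if (eq == NULL) return -1;
    size_t keylen = (size_t)(eq - aux);

    char *end;
    long value = strtol(eq + 1, &end, 10);
    if (end == eq + 1 || *end != '\0' || value < 0 || value > 65535) return -1;

    if (keylen == strlen("tls-port") && strncmp(aux, "tls-port", keylen) == 0) {
        out->tcp_port = main_port; /* <ip>:<port> held the TCP port */
        out->tls_port = (int)value;
    } else if (keylen == strlen("tcp-port") && strncmp(aux, "tcp-port", keylen) == 0) {
        out->tls_port = main_port; /* <ip>:<port> held the TLS port */
        out->tcp_port = (int)value;
    } else {
        return -1; /* e.g. shard-id=..., not a port field */
    }
    return 0;
}
```

For the first line format above, `resolvePorts(<tcp_port>, "tls-port=6379", &p)` yields the TCP port from the address field and 6379 as the TLS port; the swapped form resolves to the same pair.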

sds clusterGenNodesDescription(client *c, int filter, int tls_primary);
sds genClusterInfoString(void);
/* handle implementation specific debug cluster commands. Return 1 if handled, 0 otherwise. */
int handleDebugClusterCommand(client *c);
const char **clusterDebugCommandExtendedHelp(void);
/* handle implementation specific cluster commands. Return 1 if handled, 0 otherwise. */
int clusterCommandSpecial(client *c);
const char **clusterCommandExtendedHelp(void);

int clusterAllowFailoverCmd(client *c);
void clusterPromoteSelfToPrimary(void);
int clusterManualFailoverTimeLimit(void);

void clusterCommandSlots(client *c);
void clusterCommandMyId(client *c);
void clusterCommandMyShardId(client *c);
void clusterCommandShards(client *c);
sds clusterGenNodeDescription(client *c, clusterNode *node, int tls_primary);

int clusterNodeCoversSlot(clusterNode *n, int slot);
int getNodeDefaultClientPort(clusterNode *n);
clusterNode *getMyClusterNode(void);
int getClusterSize(void);
int getMyShardSlotCount(void);
int handleDebugClusterCommand(client *c);
int clusterNodePending(clusterNode *node);
int clusterNodeIsPrimary(clusterNode *n);
char **getClusterNodesList(size_t *numnodes);
char *clusterNodeIp(clusterNode *node);
int clusterNodeIsReplica(clusterNode *node);
clusterNode *clusterNodeGetPrimary(clusterNode *node);
char *clusterNodeGetName(clusterNode *node);
int clusterNodeTimedOut(clusterNode *node);
int clusterNodeIsFailing(clusterNode *node);
int clusterNodeIsNoFailover(clusterNode *node);
char *clusterNodeGetShardId(clusterNode *node);
int clusterNodeNumReplicas(clusterNode *node);
clusterNode *clusterNodeGetReplica(clusterNode *node, int slave_idx);
clusterNode *getMigratingSlotDest(int slot);
clusterNode *getImportingSlotSource(int slot);
clusterNode *getNodeBySlot(int slot);
int clusterNodeClientPort(clusterNode *n, int use_tls);
char *clusterNodeHostname(clusterNode *node);
const char *clusterNodePreferredEndpoint(clusterNode *n);
long long clusterNodeReplOffset(clusterNode *node);
clusterNode *clusterLookupNode(const char *name, int length);
int detectAndUpdateCachedNodeHealth(void);
client *createCachedResponseClient(void);
void deleteCachedResponseClient(client *recording_client);
void clearCachedClusterSlotsResponse(void);

/* functions with shared implementations */
int clusterNodeIsMyself(clusterNode *n);
clusterNode *getNodeByQuery(client *c, struct serverCommand *cmd, robj **argv, int argc, int *hashslot, int *ask);
int clusterRedirectBlockedClientIfNeeded(client *c);
void clusterRedirectClient(client *c, clusterNode *n, int hashslot, int error_code);
void migrateCloseTimedoutSockets(void);
unsigned int keyHashSlot(char *key, int keylen);
int patternHashSlot(char *pattern, int length);
int isValidAuxString(char *s, unsigned int length);
void migrateCommand(client *c);
void clusterCommand(client *c);
ConnectionType *connTypeOfCluster(void);
int isNodeAvailable(clusterNode *node);
long long getNodeReplicationOffset(clusterNode *node);
sds aggregateClientOutputBuffer(client *c);

#endif /* __CLUSTER_H */