futriix

Author	SHA1	Message	Date
Pierre	5f7fe9ef21	Send MEET packet to node if there is no inbound link to fix inconsistency when handshake timedout (#1307 ) In some cases, when meeting a new node, if the handshake times out, we can end up with an inconsistent view of the cluster where the new node knows about all the nodes in the cluster, but the cluster does not know about this new node (or vice versa). To detect this inconsistency, we now check if a node has an outbound link but no inbound link, in this case it probably means this node does not know us. In this case we (re-)send a MEET packet to this node to do a new handshake with it. If we receive a MEET packet from a known node, we disconnect the outbound link to force a reconnect and sending of a PING packet so that the other node recognizes the link as belonging to us. This prevents cases where a node could send MEET packets in a loop because it thinks the other node does not have an inbound link. This fixes the bug described in #1251. --------- Signed-off-by: Pierre Turin <pieturin@amazon.com>	2024-12-11 17:26:06 -08:00
Binbin	ee386c92ff	Manual failover vote is not limited by two times the node timeout (#1305 ) This limit should not restrict manual failover, otherwise in some scenarios, manual failover will time out. For example, if some FAILOVER_AUTH_REQUESTs or some FAILOVER_AUTH_ACKs are lost during a manual failover, it cannot vote in the second manual failover. Or in a mixed scenario of plain failover and manual failover, it cannot vote for the subsequent manual failover. The problem with the manual failover retry is that the mf will pause the client 5s in the primary side. So every retry every manual failover timed out is a bad move. --------- Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-11-19 11:17:20 -05:00
Binbin	22bc49c4a6	Try to stabilize the failover call in the slot migration test (#1078 ) The CI report replica will return the error when performing CLUSTER FAILOVER: ``` -ERR Master is down or failed, please use CLUSTER FAILOVER FORCE ``` This may because the primary state is fail or the cluster connection is disconnected during the primary pause. In this PR, we added some waits in wait_for_role, if the role is replica, we will wait for the replication link and the cluster link to be ok. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-11-07 13:42:20 +08:00
Binbin	f7c5b40183	Avoid false positive in election tests (#984 ) The node may not be able to initiate an election in time due to problems with cluster communication. If an election is initiated, make sure its offset is 0. Closes #967. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-09-13 14:53:39 +08:00
Viktor Söderqvist	a323dce890	Dual stack and client-specific IPs in cluster (#736 ) New configs: * `cluster-announce-client-ipv4` * `cluster-announce-client-ipv6` New module API function: * `ValkeyModule_GetClusterNodeInfoForClient`, takes a client id and is otherwise just like its non-ForClient cousin. If configured, one of these IP addresses are reported to each client in CLUSTER SLOTS, CLUSTER SHARDS, CLUSTER NODES and redirects, replacing the IP (`custer-announce-ip` or the auto-detected IP) of each node. Which one is reported to the client depends on whether the client is connected over IPv4 or IPv6. Benefits: * This allows clients using IPv4 to get the IPv4 addresses of all cluster nodes and IPv6 clients to get the IPv6 clients. * This allows the IPs visible to clients to be different to the IPs used between the cluster nodes due to NAT'ing. The information is propagated in the cluster bus using new Ping extensions. (Old nodes without this feature ignore unknown Ping extensions.) This adds another dimension to CLUSTER SLOTS reply. It now depends on the client's use of TLS, the IP address family and RESP version. Refactoring: The cached connection type definition is moved from connection.h (it actually has nothing to do with the connection abstraction) to server.h and is changed to a bitmap, with one bit for each of TLS, IPv6 and RESP3. Fixes #337 --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-07-10 13:53:52 +02:00
Ping Xie	aad6769a80	Replicate slot migration states via RDB aux fields (#586 )	2024-06-07 20:32:27 -07:00
Binbin	fdd023ff82	Migrate cluster mode tests to normal framework (#442 ) We currently has two disjoint TCL frameworks: 1. Normal testing framework, which trigger by runtest, which individually launches nodes for testing. 2. Cluster framework, which trigger by runtest-cluster, which pre-allocates N nodes and uses them for testing large configurations. The normal TCL testing framework is much more readily tested and is also automatically run as part of the CI for new PRs. The runtest-cluster since it runs very slowly (cannot be parallelized), it currently only runs in daily CI, this results in some changes to the cluster not being exposed in PR CI in time. This PR migrate the Cluster mode tests to normal framework. Some cluster tests are kept in runtest-cluster because of timing issues or not yet supported, we can process them later. Signed-off-by: Binbin <binloveplay1314@qq.com>	2024-05-09 10:14:47 +08:00
VoletiRam	d89ef06ce5	Wait for cluster fully online in cluster_config_consistent (#272 ) Wait for cluster to be in a fully consistent and online state in `cluster_config_consistent`. We expect the `start_server` to create the desired primaries and replicas before the start of the tests. With the current setup, the replicas may not complete the sync with primaries and can be in loading state. In some cases, the role of replicas can still be master with the delay of propagation of replicate command. The tests can show flaky behavior in such cases. Add a check that verifies the nodes health status 'online' for the cluster consistency. Leverage the deterministic order of `CLUSTER SLOTS` to consider the cluster as consistent along with the nodes health status. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com> Co-authored-by: Harkrishn Patro <harkrisp@amazon.com> Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>	2024-04-08 20:03:56 -07:00
bentotten	b3aaa0a136	When one shard, sole primary node marks potentially failed replica as FAIL instead of PFAIL (#12824 ) Fixes issue where a single primary cannot mark a replica as failed in a single-shard cluster.	2024-01-11 15:48:19 -08:00
Chen Tianjie	22a29935ff	Support TLS service when "tls-cluster" is not enabled and persist both plain and TLS port in nodes.conf (#12233 ) Originally, when "tls-cluster" is enabled, `port` is set to TLS port. In order to support non-TLS clients, `pport` is used to propagate TCP port across cluster nodes. However when "tls-cluster" is disabled, `port` is set to TCP port, and `pport` is not used, which means the cluster cannot provide TLS service unless "tls-cluster" is on. ``` typedef struct { // ... uint16_t port; /* Latest known clients port (TLS or plain). / uint16_t pport; / Latest known clients plaintext port. Only used if the main clients port is for TLS. / // ... } clusterNode; ``` ``` typedef struct { // ... uint16_t port; / TCP base port number. / uint16_t pport; / Sender TCP plaintext port, if base port is TLS */ // ... } clusterMsg; ``` This PR renames `port` and `pport` in `clusterNode` to `tcp_port` and `tls_port`, to record both ports no matter "tls-cluster" is enabled or disabled. This allows to provide TLS service to clients when "tls-cluster" is disabled: when displaying cluster topology, or giving `MOVED` error, server can provide TLS or TCP port according to client's connection type, no matter what type of connection cluster bus is using. For backwards compatibility, `port` and `pport` in `clusterMsg` are preserved, when "tls-cluster" is enabled, `port` is set to TLS port and `pport` is set to TCP port, when "tls-cluster" is disabled, `port` is set to TCP port and `pport` is set to TLS port (instead of 0). Also, in the nodes.conf file, a new aux field displaying an extra port is added to complete the persisted info. We may have `tls_port=xxxxx` or `tcp_port=xxxxx` in the aux field, to complete the cluster topology, while the other port is stored in the normal `<ip>:<port>` field. The format is shown below. ``` <node-id> <ip>:<tcp_port>@<cport>,<hostname>,shard-id=...,tls-port=6379 myself,master - 0 0 0 connected 0-1000 ``` Or we can switch the position of two ports, both can be correctly resolved. ``` <node-id> <ip>:<tls_port>@<cport>,<hostname>,shard-id=...,tcp-port=6379 myself,master - 0 0 0 connected 0-1000 ```	2023-06-26 07:43:38 -07:00
Madelyn Olson	73cf0243df	Make nodename test more consistent (#12330 ) To determine when everything was stable, we couldn't just query the nodename since they aren't API visible by design. Instead, we were using a proxy piece of information which was bumping the epoch and waiting for everyone to observe that. This works for making source Node 0 and Node 1 had pinged, and Node 0 and Node 2 had pinged, but did not guarantee that Node 1 and Node 2 had pinged. Although unlikely, this can cause this failure message. To fix it I hijacked hostnames and used its validation that it has been propagated, since we know that it is stable. I also noticed while stress testing this sometimes the test took almost 4.5 seconds to finish, which is really close to the current 5 second limit of the log check, so I bumped that up as well just to make it a bit more consistent.	2023-06-20 18:00:55 -07:00
Wen Hui	070453eef3	Cluster human readable nodename feature (#9564 ) This PR adds a human readable name to a node in clusters that are visible as part of error logs. This is useful so that admins and operators of Redis cluster have better visibility into failures without having to cross-reference the generated ID with some logical identifier (such as pod-ID or EC2 instance ID). This is mentioned in #8948. Specific nodenames can be set by using the variable cluster-announce-human-nodename. The nodename is gossiped using the clusterbus extension in #9530. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>	2023-06-17 21:16:51 -07:00
Binbin	aa2403ca98	Fix redis-cli cluster test timing issue (#11887 ) This test fails sporadically: ``` *** [err]: Migrate the last slot away from a node using redis-cli in tests/unit/cluster/cli.tcl cluster size did not reach a consistent size 4 ``` I guess the time (5s) of wait_for_cluster_size is not enough, usually, the waiting time for our other tests for cluster consistency is 50s, so also changing it to 50s.	2023-03-26 08:46:58 +03:00
Brennan	47c493e070	Re-design cluster link send buffer to improve memory management (#11343 ) Re-design cluster link send queue to improve memory management	2022-11-01 19:26:44 -07:00

14 Commits