futriix

Author	SHA1	Message	Date
Madelyn Olson	b728e4170f	Disable empty shard slot migration test until test is de-flaked (#859 ) We have a number of test failures in the empty shard migration which seem to be related to race conditions in the failover, but could be more pervasive. For now disable the tests to prevent so many false negative test failures. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-07-31 16:52:20 -07:00
w. ian douglas	b59762f734	Very minor misspelling in some tests (#705 ) Fix misspelling "faiover" instead of "failover" in two test cases. Signed-off-by: w. ian douglas <ian.douglas@iandouglas.com>	2024-06-28 23:56:30 +02:00
Ping Xie	5d9d41868d	Replace `DEBUG RESTART` with `pause_server` and `resume_server` (#652 )	2024-06-13 17:52:50 -07:00
Ping Xie	aad6769a80	Replicate slot migration states via RDB aux fields (#586 )	2024-06-07 20:32:27 -07:00
Madelyn Olson	b95e7c384f	Skip tls for xgroup read regression since it doesn't matter (#595 ) "Client blocked on XREADGROUP while stream's slot is migrated" uses the migrate command, which requires special handling for TLS and non-tls. This was not being handled, so was throwing an error. Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>	2024-06-03 11:49:15 -07:00
nitaicaro	6fb90adf4b	Fix crash where command duration is not reset when client is blocked … (#526 ) In #11012, we changed the way command durations were computed to handle the same command being executed multiple times. In #11970, we added an assert if the duration is not properly reset, potentially indicating that a call to report statistics was missed. I found an edge case where this happens - easily reproduced by blocking a client on `XREADGROUP` and migrating the stream's slot. This causes the engine to process the `XREADGROUP` command twice: 1. First time, we are blocked on the stream, so we wait for unblock to come back to it a second time. In most cases, when we come back to process the command a second time after unblock, we process the command normally, which includes recording the duration and then resetting it. 2. After unblocking we come back to process the command, and this is where we hit the edge case - at this point, we had already migrated the slot to another node, so we return a `MOVED` response. But when we do that, we don’t reset the duration field. Fix: also reset the duration when returning a `MOVED` response. I think this is right, because the client should redirect the command to the right node, which in turn will calculate the execution duration. Also wrote a test which reproduces this; it fails without the fix and passes with it. --------- Signed-off-by: Nitai Caro <caronita@amazon.com> Co-authored-by: Nitai Caro <caronita@amazon.com>	2024-05-30 12:55:00 -07:00
Ping Xie	e4ead9442b	Make CLUSTER SETSLOT with TIMEOUT 0 block indefinitely (#556 ) This aligns the behaviour with established Valkey commands with a TIMEOUT argument, such as BLPOP. Fix #422 Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-27 07:11:24 -07:00
Viktor Söderqvist	d72ba06dd0	Make cluster replicas return ASK and TRYAGAIN (#495 ) After READONLY, make a cluster replica behave as its primary regarding returning ASK redirects and TRYAGAIN. Without this patch, a client reading from a replica cannot tell if a key doesn't exist or if it has already been migrated to another shard as part of an ongoing slot migration. Therefore, without an ASK redirect in this situation, offloading reads to cluster replicas wasn't reliable. Note: The target of a redirect is always a primary. If a client wants to continue reading from a replica after following a redirect, it needs to figure out the replicas of that new primary using CLUSTER SHARDS or similar. This is related to #21 and has been made possible by the introduction of Replication of Slot Migration States in #445. ---- Release notes: During cluster slot migration, replicas are able to return -ASK redirects and -TRYAGAIN. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>	2024-05-24 17:58:03 +02:00
Ping Xie	fd53f17a61	Use pause_process to stop a node to make Valgrind happy, hopefully (#508 ) Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-16 22:59:00 -07:00
Ping Xie	ac47ca2d47	Suppress ASAN errors on tests that intentionally crash the server via `crash-memcheck-enabled no` (#489 ) Fix daily CI run errors like https://github.com/valkey-io/valkey/actions/runs/9039450198/job/24842308071#step:6:4176 Signed-off-by: Ping Xie <pingxie@google.com>	2024-05-12 16:08:47 -07:00
Ping Xie	6e7af9471c	Slot migration improvement (#445 )	2024-05-06 21:40:28 -07:00

11 Commits