
Optimize failover time when the new primary node is down again (#782)

We never reset failover_auth_time after setting it. It is used to check
auth_timeout and auth_retry_time, but it should at least be reset after
a successful failover.

Consider the following scenario:
1. Two replicas initiate an election.
2. Replica 1 is elected as the new primary, and replica 2 does not get
   enough votes.
3. Replica 1 goes down, i.e. the new primary goes down again within a
   short time.
4. Replica 2 detects that the new primary is down and wants to initiate
   a failover. Because the failover_auth_time of the previous round has
   not been reset, it must wait for that round to time out and then for
   the next retry time. This takes cluster-node-timeout * 4 in total and
   adds a lot of delay.

There is another problem. We add a random delay to failover_auth_time,
for example a random 500ms plus 1s per replica rank. Suppose replica 2
receives a PONG from the new primary before sending the
FAILOVER_AUTH_REQUEST, that is, before failover_auth_time. It then
switches to replicating the new primary. If the new primary goes down
again at this point, replica 2 will use the previous failover_auth_time
to initiate an election instead of going through the random 500ms and
rank-based 1s delay again. This may lead to unexpected consequences, for
example a low-ranking replica initiating an election and becoming the
new primary.

So failover_auth_time needs to be reset at the appropriate time. When
the replica switches to a new primary, we reset it, because the existing
failover_auth_time is already out of date in that case.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>

2024-07-19 15:27:49 -04:00
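The timing problem in this commit message can be modelled in a few lines. The following is a hypothetical Python model, not Valkey's C implementation. The class and method names are invented for illustration. The constants are assumptions chosen to match the "random 500ms", "ranking 1s" and "cluster-node-timeout * 4" figures quoted above: a 500 ms base delay, random jitter below 500 ms, 1 s per rank, and a retry window of twice max(2 × cluster-node-timeout, 2000 ms).

```python
import random
from dataclasses import dataclass


@dataclass
class ReplicaElectionState:
    """Hypothetical model of one replica's election timer (all times in ms)."""

    node_timeout_ms: int  # stands in for cluster-node-timeout
    rank: int  # 0 = best-ranked replica
    failover_auth_time: int = 0  # 0 means "never scheduled" or "reset"

    def auth_retry_time(self) -> int:
        # Assumed: auth timeout is max(2 * node timeout, 2000 ms); retry is twice that.
        auth_timeout = max(self.node_timeout_ms * 2, 2000)
        return auth_timeout * 2

    def try_schedule(self, now_ms: int, rng: random.Random) -> bool:
        """Schedule a new election only if the previous schedule has expired."""
        if now_ms - self.failover_auth_time > self.auth_retry_time():
            # Assumed delay: 500 ms + random jitter below 500 ms + 1 s per rank.
            self.failover_auth_time = now_ms + 500 + rng.randrange(500) + self.rank * 1000
            return True
        return False  # the old schedule still blocks a new election

    def on_switch_primary(self) -> None:
        # The fix: a schedule computed for the previous primary is discarded.
        self.failover_auth_time = 0
```

With cluster-node-timeout at 15000 ms, a stale schedule keeps try_schedule() returning False for up to 60 s (four times the node timeout). Calling on_switch_primary() when the replica starts following a new primary removes that wait and forces a fresh rank-based delay.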

Name	Last commit	Date
assets	Introduce enable-debug-assert to enable/disable debug asserts at runtime (#584)	2024-05-31 22:50:08 -07:00
cluster	Cache CLUSTER SLOTS response for improving throughput and reduced latency. (#53)	2024-05-22 14:21:41 -07:00
helpers	Dual channel replication (#60)	2024-07-17 13:59:33 -07:00
integration	Dual channel replication (#60)	2024-07-17 13:59:33 -07:00
modules	Remove master and slave from source code (#591)	2024-06-07 14:21:33 -07:00
rdma	Introduce Valkey Over RDMA transport (experimental) (#477)	2024-07-15 14:04:22 +02:00
sentinel	Replace master-reboot-down-after-period with primary-reboot-down-after-period in sentinel.conf (#647)	2024-07-15 13:45:40 -04:00
support	Optimize failover time when the new primary node is down again (#782)	2024-07-19 15:27:49 -04:00
tmp	minor fixes to the new test suite, html doc updated	2010-05-14 18:48:33 +02:00
unit	Optimize failover time when the new primary node is down again (#782)	2024-07-19 15:27:49 -04:00
instances.tcl	Enable debug asserts for cluster and sentinel tests (#588)	2024-06-02 13:15:08 -07:00
README.md	Introduce a minimal debugger for .tcl integration test suite. (#683)	2024-06-25 10:24:53 -07:00
test_helper.tcl	Minor fix for --loops option in normal testing framework (#781)	2024-07-13 23:25:51 +08:00


Valkey Test Suite

Overview

Integration tests are written in Tcl, a high-level, general-purpose, interpreted, dynamic programming language [source]. runtest is the main entry point for running integration tests. For example, to run a single test:

./runtest --single unit/your_test_name
# For additional arguments, you may refer to the `runtest` script itself.

The normal execution mode of the test suite involves starting and manipulating local valkey-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the --host and --port parameters. When executing against an external server, tests tagged external:skip are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

Option	Impact
`--singledb`	Only use database 0, don't assume others are supported.
`--ignore-encoding`	Skip all checks for specific encodings.
`--ignore-digest`	Skip key value digest validations.
`--cluster-mode`	Run in strict Valkey Cluster compatibility mode.
`--large-memory`	Enable tests that consume more than 100 MB of memory.
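Because these options are independent switches, they can be appended after --host and --port in any combination. The helper below is a hypothetical Python sketch (runtest_argv is not part of the suite) that assembles such a command line using only the flags documented above.

```python
def runtest_argv(
    host: str,
    port: int,
    *,
    singledb: bool = False,
    ignore_encoding: bool = False,
    ignore_digest: bool = False,
    cluster_mode: bool = False,
    large_memory: bool = False,
) -> list[str]:
    """Build a ./runtest command line for an external server (hypothetical helper)."""
    argv = ["./runtest", "--host", host, "--port", str(port)]
    # Each documented option is a plain switch; append only the enabled ones,
    # in a fixed order (dicts preserve insertion order).
    switches = {
        "--singledb": singledb,
        "--ignore-encoding": ignore_encoding,
        "--ignore-digest": ignore_digest,
        "--cluster-mode": cluster_mode,
        "--large-memory": large_memory,
    }
    argv.extend(flag for flag, enabled in switches.items() if enabled)
    return argv
```

For example, subprocess.run(runtest_argv("127.0.0.1", 6379, singledb=True), check=True) would launch the suite from the directory containing runtest. The host and port here are placeholders.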

Debugging

You can set a breakpoint and invoke a minimal debugger using the bp function.

... your test code before break-point
bp 1
... your test code after break-point

Calling bp 1 hands the Tcl interpreter back to the developer and lets you interactively print local variables (through puts), run functions and so forth [source]. bp takes a single argument, 1 in the example above, which labels the breakpoint with a string. Labels are printed when breakpoints are hit, so you can identify which breakpoint was triggered. Breakpoints can be skipped by setting the global variable ::bp_skip to the labels you want to skip.

The minimal debugger comes with the following predefined functions.

Press c to continue past the breakpoint.
Press i to print local variables.

Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied at different context levels:

The start_server context.
A tags context that bundles several tests together.
A single test context.
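The context levels suggest that tags set at an outer level apply to every test nested inside it. Under that assumption, a test effectively carries the union of its own tags and those of all enclosing contexts. The Python sketch below models this nesting with a stack. TagStack is a hypothetical name, and this is a model of the behaviour, not the suite's Tcl code.

```python
from contextlib import contextmanager


class TagStack:
    """Hypothetical model: each context level pushes its tags; a test sees the union."""

    def __init__(self) -> None:
        self._levels: list[list[str]] = []

    @contextmanager
    def tags(self, *names: str):
        # Entering a context (start_server or a tags block) adds one level.
        self._levels.append(list(names))
        try:
            yield
        finally:
            # Leaving the context removes its tags again.
            self._levels.pop()

    def effective(self, *test_tags: str) -> set[str]:
        # A single test's own tags plus every enclosing level's tags.
        result = set(test_tags)
        for level in self._levels:
            result.update(level)
        return result
```

For example, a test tagged external:skip inside a tags block with needs:debug, itself inside a server context tagged repl, would be treated as carrying all three tags.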

The following compatibility and capability tags are currently used:

Tag	Indicates
`external:skip`	Not compatible with external servers.
`cluster:skip`	Not compatible with `--cluster-mode`.
`large-memory`	Requires more than 100 MB of memory.
`tls:skip`	Not compatible with `--tls`.
`needs:repl`	Uses replication and needs to be able to `SYNC` from server.
`needs:debug`	Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`).
`needs:pfdebug`	Uses the `PFDEBUG` command.
`needs:config-maxmemory`	Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc.
`needs:config-resetstat`	Uses `CONFIG RESETSTAT` to reset statistics.
`needs:reset`	Uses `RESET` to reset client connections.
`needs:save`	Uses `SAVE` or `BGSAVE` to create an RDB file.

When using an external server (--host and --port), tests tagged external:skip are filtered out automatically.

When using --cluster-mode, tests tagged cluster:skip are filtered out automatically.

When not using --large-memory, tests tagged large-memory are filtered out automatically.
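These automatic rules amount to a simple check per test. Below is a hypothetical Python model of that check. auto_skip_reason is an invented name, and the returned messages are illustrative. The model uses the large-memory tag name from the tag table.

```python
from typing import Optional


def auto_skip_reason(
    tags: set[str], *, external: bool, cluster_mode: bool, large_memory: bool
) -> Optional[str]:
    """Return why a test would be skipped automatically, or None if it may run."""
    if external and "external:skip" in tags:
        return "not compatible with an external server"
    if cluster_mode and "cluster:skip" in tags:
        return "not compatible with --cluster-mode"
    # Large-memory tests are opt-in: skipped unless --large-memory is given.
    if not large_memory and "large-memory" in tags:
        return "requires --large-memory"
    return None
```

Note the asymmetry: the first two tags exclude a test only when a mode is switched on, while large-memory excludes it unless the option is switched on.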

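In addition, you can filter tests explicitly with --tags: a tag prefixed with - excludes tests that carry it, while an unprefixed tag runs only tests that have it. For example, to run tests on a server that does not permit SYNC, use: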

./runtest --host <host> --port <port> --tags -needs:repl
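The sketch below is a hypothetical Python model of --tags filtering. parse_tags and is_selected are invented names. The model makes three assumptions:

* A tag prefixed with - denies tests that carry it.
* Unprefixed tags restrict the run to tests with at least one of them.
* Several tags arrive as one whitespace-separated argument, e.g. --tags "-needs:repl -needs:debug".

```python
def parse_tags(arg: str) -> tuple[list[str], list[str]]:
    """Split a --tags argument into (allow, deny) lists; '-' marks a denied tag."""
    allow: list[str] = []
    deny: list[str] = []
    for tag in arg.split():
        if tag.startswith("-"):
            deny.append(tag[1:])
        else:
            allow.append(tag)
    return allow, deny


def is_selected(test_tags: set[str], allow: list[str], deny: list[str]) -> bool:
    """Run a test if it has an allowed tag (when any are given) and no denied tag."""
    if allow and not any(tag in test_tags for tag in allow):
        return False
    return not any(tag in test_tags for tag in deny)
```

With --tags -needs:repl, parse_tags yields no allowed tags and one denied tag. Any test tagged needs:repl is dropped, and every other test still runs.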