Before this commit, when upgrading a replica, expired keys were not
loaded, so the replica ended up with fewer keys in its db. At this
point the master's and the replica's keys are logically consistent.
However, they are not yet physically consistent (that is, they do not
have the same dbsize). If, during that window, the master fails and the
replica is promoted to be the new master of that partition, the new
master may update a key that does not exist on it but still physically
exists on the old master (now a replica). The old master would then
refuse to update the key, leaving master and replica data inconsistent.
How could this happen?
It is all caused by determining the server's role incorrectly during
startup. We cannot use server.masterhost to decide whether the server
is a master or a replica, because that check fails in cluster mode,
where a replica's master comes from the cluster configuration rather
than from replicaof.
When the server starts up as a replica and loads the RDB, we do want it
to load expired keys, and we do not want it to actively expire keys.
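A minimal C sketch of the role check described above. The struct and the functions i_am_master(), skip_expired_key_on_load() and active_expire_enabled() are hypothetical simplifications, not Redis's actual server struct or RDB loader; the sketch only shows that the role should come from the cluster state in cluster mode and from masterhost otherwise, and that only a master drops already-expired keys while loading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified server state (not the real Redis struct). */
struct server_state {
    bool cluster_enabled;
    const char *masterhost;        /* set via replicaof in standalone mode */
    bool cluster_myself_is_master; /* role taken from the cluster config */
};

/* Role check: masterhost alone is not enough in cluster mode. */
bool i_am_master(const struct server_state *s) {
    assert(s != NULL);
    if (s->cluster_enabled)
        return s->cluster_myself_is_master;
    return s->masterhost == NULL;
}

/* While loading the RDB at startup, only a master skips keys that have
 * already expired. A replica loads them, so its keyspace physically
 * matches the master's, and waits for the master to delete them. */
bool skip_expired_key_on_load(const struct server_state *s,
                              long long expire_ms, long long now_ms) {
    /* In this sketch, expire_ms == -1 means "no TTL". */
    return i_am_master(s) && expire_ms != -1 && expire_ms < now_ms;
}

/* Active expiry likewise runs only on a master. */
bool active_expire_enabled(const struct server_state *s) {
    return i_am_master(s);
}
```

If masterhost is still NULL when a cluster replica loads its RDB, a masterhost-only check would classify it as a master and drop its expired keys, which is exactly the divergence described above.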
When TLS is configured through a redis.conf file, the line reading "tls-cert-file redis.crt tls-key-file redis.key" is interpreted as one single directive. I am splitting it onto two separate lines to improve usability, so that users do not get the error below.
ubuntu@ip-172-31-29-250:~/redis-6.0-rc1$ ./src/redis-server redis.conf
*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 145
>>> 'tls-cert-file redis.crt tls-key-file redis.key'
wrong number of arguments
ubuntu@ip-172-31-29-250:~/redis-6.0-rc1$ vi redis.conf
ubuntu@ip-172-31-29-250:~/redis-6.0-rc1$ ./src/redis-server redis.conf
23085:C 04 Mar 2020 01:58:12.631 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
23085:C 04 Mar 2020 01:58:12.631 # Redis version=5.9.101, bits=64, commit=00000000, modified=0, pid=23085, just started
23085:C 04 Mar 2020 01:58:12.631 # Configuration loaded
23085:M 04 Mar 2020 01:58:12.632 * Increased maximum number of open files to 10032 (it was originally set to 1024).
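To illustrate why the combined line fails, here is a simplified C sketch of the arity check: a one-value directive such as tls-cert-file must tokenize to exactly two words (name and value), while the combined line yields four. The helper names are hypothetical, and the tokenizer ignores quoting, which Redis's own splitter (sdssplitargs) does handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count whitespace-separated tokens in a config line (no quoting). */
int count_config_tokens(const char *line) {
    int argc = 0;
    int in_token = 0;
    assert(line != NULL);
    for (const char *p = line; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            argc++;
        }
    }
    return argc;
}

/* A directive that takes exactly one value must have exactly two tokens.
 * Returns NULL on success, or an error string like the one in the log. */
const char *check_single_value_directive(const char *line) {
    return count_config_tokens(line) == 2 ? NULL : "wrong number of arguments";
}
```

With the example split into "tls-cert-file redis.crt" and "tls-key-file redis.key" on separate lines, each line passes this check.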
"Partial Resynchronization" is a special variant of replication success
that we have to report to systemd when it manages redis-server via a
Type=notify service unit.
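A hedged sketch of the decision involved: under Type=notify, systemd considers the service started only once it receives READY=1 via sd_notify(3), so every successful sync outcome, partial resync included, should produce that message. The enum, the function name and the STATUS text are hypothetical; only the READY=1 and STATUS= fields come from the sd_notify protocol. The function only builds the message and does not call sd_notify, to stay self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of a replica's sync attempt with its master. */
enum sync_outcome {
    SYNC_FULL_DONE,        /* full resync: RDB transferred and loaded */
    SYNC_PARTIAL_ACCEPTED, /* PSYNC accepted: backlog replayed, no RDB */
    SYNC_FAILED
};

/* Message to pass to sd_notify(3), or NULL if systemd should not be told
 * the service is ready. The STATUS text is illustrative only. */
const char *systemd_ready_message(enum sync_outcome outcome) {
    switch (outcome) {
    case SYNC_FULL_DONE:
        return "STATUS=Full sync finished\nREADY=1\n";
    case SYNC_PARTIAL_ACCEPTED:
        /* The case this commit is about: partial resync is a success too. */
        return "STATUS=Partial resync accepted\nREADY=1\n";
    case SYNC_FAILED:
    default:
        return NULL;
    }
}
```

Forgetting the partial-resync branch would leave a Type=notify unit waiting for a READY=1 that never arrives.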
*** [err]: PSYNC2: total sum of full synchronizations is exactly 4 in tests/integration/psync2.tcl
Expected 5 == 4 (context: type eval line 6 cmd {assert {$sum == 4}} proc ::test)
The issue was that the test sometimes got an unexpected full sync because
it tried to switch to the replica before the replica was in sync with its
master.
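Why switching too early forces a full sync can be sketched in C: when a node sends PSYNC, the master can continue incrementally only if the requested offset lies inside its replication backlog. If the replica has not caught up before it is promoted, the old master asks for an offset beyond what the new master holds, and a full sync is the only option left. This is a simplified model with hypothetical names, loosely following Redis's repl_backlog_off and repl_backlog_histlen fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified PSYNC acceptance check: the requested offset must fall within
 * [backlog_off, backlog_off + histlen], i.e. inside the backlog the master
 * still holds. Otherwise the master must fall back to a full sync. */
bool can_partial_resync(long long psync_offset,
                        long long backlog_off, long long histlen) {
    assert(histlen >= 0);
    return psync_offset >= backlog_off &&
           psync_offset <= backlog_off + histlen;
}
```

Waiting until the replica's offset matches its master's before switching keeps the requested offset inside the new master's backlog, which is what the test needs to count full syncs reliably.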
The callback approach we took is very efficient: the module can filter
keys however it likes without building any list or cloning strings, and
it can also read data from the key's value. However, if the user tries
to re-open the key, or any other key, this can trigger dict rehashing
(dictFind does that), and rehashing from inside dictScan is very bad.
This commit protects the dict from any rehashing during the scan, and
also warns the user not to attempt any writes or command calls from
within the callback, for fear of unexpected side effects and crashes.
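One way to "protect the dict from rehashing during the scan" is a pause counter that the incremental rehash step checks, loosely modeled on how Redis's dict skips rehash steps while safe iterators are active. The toy dict, its fields and all function names below are hypothetical; only the state relevant to incremental rehashing is modeled, and the lookup does no real key search.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a dict: only incremental-rehash state is modeled. */
struct toy_dict {
    long rehash_idx;      /* -1 when not rehashing, else next bucket to move */
    long buckets_to_move; /* rehashing finishes once rehash_idx reaches this */
    int pause_rehash;     /* > 0 while a scan must not see buckets move */
};

/* One incremental rehash step, skipped while rehashing is paused. */
static void rehash_step(struct toy_dict *d) {
    if (d->pause_rehash > 0 || d->rehash_idx == -1) return;
    d->rehash_idx++;
    if (d->rehash_idx >= d->buckets_to_move) d->rehash_idx = -1; /* done */
}

/* Lookups perform a rehash step first, as dictFind does. */
void toy_find(struct toy_dict *d) {
    rehash_step(d);
    /* ... the actual key lookup would go here ... */
}

typedef void (*scan_cb)(struct toy_dict *d, void *privdata);

/* Run the scan callback with rehashing paused, so a callback that
 * re-opens keys (and thus calls toy_find) cannot move buckets mid-scan. */
void toy_scan(struct toy_dict *d, scan_cb cb, void *privdata) {
    d->pause_rehash++;
    cb(d, privdata);
    d->pause_rehash--;
    assert(d->pause_rehash >= 0);
}

/* Example callback that re-opens a key during the scan and records the
 * rehash position it observed. */
void lookup_during_scan(struct toy_dict *d, void *privdata) {
    long *seen_rehash_idx = privdata;
    toy_find(d);
    *seen_rehash_idx = d->rehash_idx;
}
```

Pausing, rather than forbidding lookups outright, keeps the cheap read-only filtering the callback approach is meant for, while writes and command calls remain something the user is warned against.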