apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.
this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop
see Probmem 2 in #7314
After adjustMeaningfulReplOffset(), all the other related variable
should be updated, including server.second_replid_offset.
Or the old version redis like 5.0 may receive wrong data from
replication stream, cause redis 5.0 can sync with redis 6.0,
but doesn't know meaningful offset.
Otherwise we run into that:
Backtrace:
src/redis-server 127.0.0.1:21322(logStackTrace+0x45)[0x479035]
src/redis-server 127.0.0.1:21322(sigsegvHandler+0xb9)[0x4797f9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd373c5e390]
src/redis-server 127.0.0.1:21322(_serverAssert+0x6a)[0x47660a]
src/redis-server 127.0.0.1:21322(freeReplicationBacklog+0x42)[0x451282]
src/redis-server 127.0.0.1:21322[0x4552d4]
src/redis-server 127.0.0.1:21322[0x4c5593]
src/redis-server 127.0.0.1:21322(aeProcessEvents+0x2e6)[0x42e786]
src/redis-server 127.0.0.1:21322(aeMain+0x1d)[0x42eb0d]
src/redis-server 127.0.0.1:21322(main+0x4c5)[0x42b145]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fd3738a3830]
src/redis-server 127.0.0.1:21322(_start+0x29)[0x42b409]
Since we disconnect all the replicas and free the replication backlog in
certain replication paths, and the code that will free the replication
backlog expects that no replica is connected.
However we still need to free the replicas asynchronously in certain
cases, as documented in the top comment of disconnectSlaves().
Citing from the issue:
btw I suggest we change this fix to something else:
* We revert the fix.
* We add a call that disconnects chained replicas in the place where we trim the replica (that is a master i this case) offset.
This way we can avoid disconnections when there is no trimming of the backlog.
Note that we now want to disconnect replicas asynchronously in
disconnectSlaves(), because it's in general safer now that we can call
it from freeClient(). Otherwise for instance the command:
CLIENT KILL TYPE master
May crash: clientCommand() starts running the linked of of clients,
looking for clients to kill. However it finds the master, kills it
calling freeClient(), but this in turn calls replicationCacheMaster()
that may also call disconnectSlaves() now. So the linked list iterator
of the clientCommand() will no longer be valid.
After adjustMeaningfulReplOffset(), all the other related variable
should be updated, including server.second_replid_offset.
Or the old version redis like 5.0 may receive wrong data from
replication stream, cause redis 5.0 can sync with redis 6.0,
but doesn't know meaningful offset.