with the original version of 6.0.0, this test detects an excessive full
sync.
with the fix in 146201c69, this test detects memory corruption,
especially when using libc allocator with or without valgrind.
This was broken in 146201c: we identified a crash in the CI, what
was happening before the fix should be like that:
1. The client gets in the async free list.
2. However freeClient() gets called again against the same client
which is a master.
3. The client arrived in freeClient() with the CLOSE_ASAP flag set.
4. The master gets cached, but NOT removed from the CLOSE_ASAP linked
list.
5. The master client that was cached was immediately removed since it
was still in the list.
6. Redis accessed a freed cached master.
This is how the crash looked like:
=== REDIS BUG REPORT START: Cut & paste starting from here ===
1092:S 16 May 2020 11:44:09.731 # Redis 999.999.999 crashed by signal: 11
1092:S 16 May 2020 11:44:09.731 # Crashed running the instruction at: 0x447e18
1092:S 16 May 2020 11:44:09.731 # Accessing address: 0xffffffffffffffff
1092:S 16 May 2020 11:44:09.731 # Failed assertion: (:0)
------ STACK TRACE ------
EIP:
src/redis-server 127.0.0.1:21300(readQueryFromClient+0x48)[0x447e18]
And the 0xffff address access likely comes from accessing an SDS that is
set to NULL (we go -1 offset to read the header).
Seems like on some systems choosing specific TLS v1/v1.1 versions no
longer works as expected. Test is reduced for v1.2 now which is still
good enough to test the mechansim, and matters most anyway.
This is really required only for older OpenSSL versions.
Also, at the moment Redis does not use OpenSSL from multiple threads so
this will only be useful if modules end up doing that.
The context is issue #7205: since the introduction of threaded I/O we close
clients asynchronously by default from readQueryFromClient(). So we
should no longer prevent the caching of the master client, to later
PSYNC incrementally, if such flags are set. However we also don't want
the master client to be cached with such flags (would be closed
immediately after being restored). And yet we want a way to understand
if a master was closed because of a protocol error, and in that case
prevent the caching.
This bug was introduced by a recent change in which readQueryFromClient
is using freeClientAsync, and despite the fact that now
freeClientsInAsyncFreeQueue is in beforeSleep, that's not enough since
it's not called during loading in processEventsWhileBlocked.
furthermore, afterSleep was called in that case but beforeSleep wasn't.
This bug also caused slowness sine the level-triggered mode of epoll
kept signaling these connections as readable causing us to keep doing
connRead again and again for ll of these, which keep accumulating.
now both before and after sleep are called, but not all of their actions
are performed during loading, some are only reserved for the main loop.
fixes issue #7215
this test which has coverage for varoius flows of diskless master was
failing randomly from time to time.
the failure was:
[err]: diskless all replicas drop during rdb pipe in tests/integration/replication.tcl
log message of '*Diskless rdb transfer, last replica dropped, killing fork child*' not found
what seemed to have happened is that the master didn't detect that all
replicas dropped by the time the replication ended, it thought that one
replica is still connected.
now the test takes a few seconds longer but it seems stable.
We want to send pings and pongs at specific intervals, since our packets
also contain information about the configuration of the cluster and are
used for gossip. However since our cluster bus is used in a mixed way
for data (such as Pub/Sub or modules cluster messages) and metadata,
sometimes a very busy channel may delay the reception of pong packets.
So after discussing it in #7216, this commit introduces a new field that
is not exposed in the cluster, is only an internal information about
the last time we received any data from a given node: we use this field
in order to avoid detecting failures, claiming data reception of new
data from the node is a proof of liveness.
This works because this struct is never referenced by its name,
but always by its type.
This prevents a conflict with struct user from <sys/user.h>
when compiling against uclibc.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Currently, there are several types of threads/child processes of a
redis server. Sometimes we need deeply optimise the performance of
redis, so we would like to isolate threads/processes.
There were some discussion about cpu affinity cases in the issue:
https://github.com/antirez/redis/issues/2863
So implement cpu affinity setting by redis.conf in this patch, then
we can config server_cpulist/bio_cpulist/aof_rewrite_cpulist/
bgsave_cpulist by cpu list.
Examples of cpulist in redis.conf:
server_cpulist 0-7:2 means cpu affinity 0,2,4,6
bio_cpulist 1,3 means cpu affinity 1,3
aof_rewrite_cpulist 8-11 means cpu affinity 8,9,10,11
bgsave_cpulist 1,10-11 means cpu affinity 1,10,11
Test on linux/freebsd, both work fine.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
When deffered reply is added the previous reply node cannot be used so
all the extra space we allocated in it is wasted. in case someone uses
deffered replies in a loop, each time adding a small reply, each of
these reply nodes (the small string reply) would have consumed a 16k
block.
now when we add anther diferred reply node, we trim the unused portion
of the previous reply block.
see #7123
cherry picked from commit 4ed5b7cb74caf5bef6606909603e371af0da4f9b
with fix to handle a crash with LIBC allocator, which apparently can
return the same pointer despite changing it's size.
i.e. shrinking an allocation of 16k into 56 bytes without changing the
pointer.
* fix memlry leaks with diskless replica short read.
* fix a few timing issues with valgrind runs
* fix issue with valgrind and watchdog schedule signal
about the valgrind WD issue:
the stack trace test in logging.tcl, has issues with valgrind:
==28808== Can't extend stack to 0x1ffeffdb38 during signal delivery for thread 1:
==28808== too small or bad protection modes
it seems to be some valgrind bug with SA_ONSTACK.
SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
also, not sure if it's even valid without a call to sigaltstack()
We could use uint64_t specific macros, but after all it's simpler to
just use an obvious equivalent type plus casting: this will be a no op
and is simpler than fixed size types printf macros.