9524 Commits

Author SHA1 Message Date
Wen Hui
30ead1edae fix memory leak in ACLLoadFromFile error handling (#7623) 2020-08-08 14:42:32 +03:00
Wang Yuan
921c633df9 Fix applying zero offset to null pointer when creating moduleFreeContextReusedClient (#7323)
Before this fix we were attempting to select a DB before creating the DB, see: #7323

This issue doesn't seem to have any implications, since the selected DB index is 0,
the db pointer remains NULL, and will later be correctly set before using this dummy
client for the first time.

As we know, we call 'moduleInitModulesSystem()' before 'initServer()'. We will allocate
memory for server.db in 'initServer', but we call 'createClient()' that will call 'selectDb()'
in 'moduleInitModulesSystem()', before the databases were created. Instead, we should call
'createClient()' for moduleFreeContextReusedClient after 'initServer()'.
2020-08-08 14:36:41 +03:00
xuannianz
d478dc1c0c remove superfluous else block (#7620)
The else block would be executed when newlen == 0, and in that case memmove won't be called, so there's no need to set start.
2020-08-08 00:19:18 +03:00
fayadexinqing
0054ba666f fix migration's broadcast PONG message, after the slot modification (#7590) 2020-08-07 13:01:14 -07:00
Oran Agra
cad93ed273 Accelerate diskless master connections, and general re-connections (#6271)
Diskless master has some inherent latencies.
1) fork starts with delay from cron rather than immediately
2) replica is put online only after an ACK. but the ACK
   was sent only once a second.
3) but even if it would arrive immediately, it will not
   register in case cron didn't yet detect that the fork is done.

Besides that, when a replica disconnects, it doesn't immediately
attempt to re-connect; it waits for the replication cron (once per second).
in case it was already online, it may be important to try to re-connect
as soon as possible, so that the backlog at the master doesn't vanish.

In case it disconnected during rdb transfer, one can argue that it's
not very important to re-connect immediately, but this is needed for the
"diskless loading short read" test to be able to run 100 iterations in 5
seconds, rather than 3 (waiting for replication cron re-connection)

changes in this commit:
1) sync command starts a fork immediately if no sync_delay is configured
2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron)
3) when a replica unexpectedly disconnects, it immediately tries to
   re-connect rather than waiting 1s
4) when a child exits, if there is another replica waiting, we spawn a new
   one right away, instead of waiting for 1s replicationCron.
5) added a call to connectWithMaster from replicationSetMaster. which is called
   from the REPLICAOF command but also in 3 places in cluster.c, in all of
   these the connection attempt will now be immediate instead of delayed by 1
   second.

side note:
we can add a call to rdbPipeReadHandler in replconfCommand when getting
a REPLCONF ACK from the replica to solve a race where the replica got
the entire rdb and EOF marker before we detected that the pipe was
closed.
in the test i did see this race happen in about one out of some 300 runs,
but i concluded that this race is unlikely in real life (where the
replica is on another host and we're more likely to first detect that the
pipe was closed).
the test runs 100 iterations in 3 seconds, so in some cases it'll take 4
seconds instead (waiting for another REPLCONF ACK).

Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave
Now that CheckChildrenDone is calling the new replicationStartPendingFork
(extracted from serverCron) there's actually no need to call
startBgsaveForReplication from updateSlavesWaitingForBgsave anymore,
since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is
calling replicationStartPendingFork that handles that anyway.
The code in updateSlavesWaitingForBgsave had a bug in which it ignored
repl-diskless-sync-delay, but removing that code shows that this bug was
hiding another bug: the max_idle check should have used >= and
not >. This one second delay has a big impact on my new test.
2020-08-06 16:53:06 +03:00
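
The `>=` versus `>` point in the commit above (cad93ed273) can be illustrated with a minimal sketch. It assumes the idle time and the configured delay are both tracked in whole seconds. `should_start_fork` and its parameter names are hypothetical and are not taken from the Redis source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not the Redis function: decide whether a pending
 * replication fork may start. Both arguments are whole seconds. */
static bool should_start_fork(long max_idle, long sync_delay) {
    assert(max_idle >= 0 && sync_delay >= 0);
    /* With '>' a delay of 0 would still wait until max_idle reached 1,
     * which costs one extra cron tick. '>=' starts as soon as the
     * configured delay has elapsed. */
    return max_idle >= sync_delay;
}
```

With a delay of 0 this returns true immediately, which is the behaviour the commit message describes for the case where no sync delay is configured.
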
Oran Agra
10bfd2fc0d Fix potential race in bugReportStart
this race would only happen when two threads panicked at the same time,
and even then the only consequence is some extra log lines.

race reported in #7391
2020-08-06 16:47:27 +03:00
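
One generic way to make a "print the header once" step safe when two threads reach it together is an atomic test-and-set. The sketch below uses C11 `atomic_flag` as an illustration only. The commit message does not say which mechanism the actual fix uses, and `claim_report_header` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard, not the Redis code. Returns true only for the first
 * caller; any later or concurrent caller sees the flag already set. */
static bool claim_report_header(atomic_flag *started) {
    assert(started != NULL);
    /* test_and_set returns the previous value, so only one caller gets
     * 'false' back and therefore 'true' from this function. */
    return !atomic_flag_test_and_set(started);
}
```

A caller would print the report header only when this returns true, so a second panicking thread skips the duplicate lines.
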
Oran Agra
58ca0b0be2 Assertion and panic, print crash log without generating SIGSEGV
This makes it possible to add tests that generate assertions, and run
them with valgrind, making sure that there are no memory violations
prior to the assertion.

New config options:
- crash-log-enabled - can be disabled for cleaner core dumps
- crash-memcheck-enabled - useful for faster termination after a crash
- use-exit-on-panic - to be used by the test suite so that valgrind can
  detect leaks and memory corruptions

Other changes:
- Crash log is printed even on systems that don't HAVE_BACKTRACE, i.e. in
  both SIGSEGV and assert / panic
- Assertion and panic won't print registers and code around EIP (which
  was useless), but will do fast memory test (which may still indicate
  that the assertion was due to memory corruption)

I had to reshuffle code in order to re-use it, so i extracted some code
into functions without actually doing any changes to the code:
- logServerInfo
- logModulesInfo
- doFastMemoryTest (with the exception of it being conditional)
- dumpCodeAroundEIP

changes to the crash report on segfault:
- logRegisters is called right after the stack trace (before info) done
  just in order to have more re-usable code
- stack trace skips the first two items on the stack (the crash log and
  signal handler functions)
2020-08-06 16:47:27 +03:00
Itamar Haber
04db7fd8e9 Merge pull request #7092 from itamarhaber/fix-5629
Prevents default save configuration being reset...
2020-08-05 21:16:38 +03:00
Oran Agra
f519dcb216 redis-cli --cluster-yes - negate force flag for clarity
this internal flag is there so that some commands do not comply with `--cluster-yes`
2020-08-05 18:30:43 +03:00
Frank Meier
c6ac2588db reintroduce REDISCLI_CLUSTER_YES env variable in redis-cli
the variable was introduced only in the 5.0 branch in #5879 bc6c1c40db
2020-08-05 18:30:43 +03:00
Frank Meier
c8f3182f37 add force option to 'create-cluster create' script call (#7612) 2020-08-05 12:06:33 +03:00
WuYunlong
2c68d2d5c7 Fix tests/cluster/cluster.tcl about wrong usage of lrange. (#6702) 2020-08-04 18:00:58 +03:00
Tyson Andre
486e39e86e Add a ZMSCORE command returning an array of scores. (#7593)
Syntax: `ZMSCORE KEY MEMBER [MEMBER ...]`

This is an extension of #2359
amended by Tyson Andre to work with the changed unstable API,
add more tests, and consistently return an array.

- It seemed as if it would be more likely to get reviewed
  after updating the implementation.

Currently, multi commands or lua scripting to call zscore multiple times
would almost definitely be less efficient than a native ZMSCORE
for the following reasons:

- Need to fetch the set from the string every time instead of reusing the C
  pointer.
- Using pipelining or multi-commands would result in more bytes sent by
  the client for the repeated `ZSCORE KEY` sections.
- Need to specially encode the data and decode it from the client
  for lua-based solutions.
- The fastest solution I've seen for large sets (thousands or millions)
  involves lua and a variadic ZADD, then a ZINTERSECT, then a ZRANGE 0 -1,
  then UNLINK of a temporary set (or lua). This is still inefficient.

Co-authored-by: Tyson Andre <tysonandre775@hotmail.com>
2020-08-04 17:49:33 +03:00
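
The shape of the ZMSCORE reply (commit 486e39e86e above) can be sketched in plain C: one lookup pass over the same set, one output slot per requested member. This is a rough illustration under stated assumptions, not the Redis implementation. `zentry`, `zmscore_sketch`, and the `found` flag array (standing in for a nil entry in the reply) are all hypothetical, and a linear scan replaces the real sorted-set lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a sorted set entry. */
typedef struct { const char *member; double score; } zentry;

/* Look up each requested member against the same set. found[i] is 1 and
 * scores[i] is filled when members[i] exists; otherwise found[i] is 0,
 * the slot a nil reply would occupy. Returns the number of hits. */
static size_t zmscore_sketch(const zentry *set, size_t setlen,
                             const char **members, size_t n,
                             double *scores, int *found) {
    assert(set != NULL || setlen == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        found[i] = 0;
        for (size_t j = 0; j < setlen; j++) {
            if (strcmp(set[j].member, members[i]) == 0) {
                scores[i] = set[j].score;
                found[i] = 1;
                hits++;
                break;
            }
        }
    }
    return hits;
}
```

The set is resolved once and reused for every member, which mirrors the first efficiency argument in the commit message.
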
Oran Agra
191b118102 fix new rdb test failing on timing issues (#7604)
apparently on github actions sometimes 500ms is not enough
2020-08-04 08:53:50 +03:00
hujie
46bff240bf remove duplicate semicolon (#7438) 2020-08-02 13:59:51 +03:00
Yossi Gottlieb
e97cec2f28 Fix test-centos7-tls daily job. (#7598) 2020-07-31 13:55:57 +03:00
Oran Agra
c5d85c69c7 module hook for master link up missing on successful psync (#7584)
besides, hooks test was time sensitive. when the replica managed to
reconnect quickly after the client kill, the test would fail
2020-07-31 13:14:29 +03:00
Oran Agra
27a8ded3db Remove dead code from update_zmalloc_stat_alloc (#7589)
this seems like leftover from before 9d8ceca
2020-07-31 13:01:39 +03:00
Yossi Gottlieb
92e089b1ab CI: Add daily CentOS 7.x jobs. (#7582) 2020-07-30 13:25:10 +03:00
WuYunlong
be11e1b5ea Fix running single test 14-consistency-check.tcl (#7587) 2020-07-30 08:56:21 +03:00
fayadexinqing
323d92b689 broadcast a PONG message when a slot's migration is over, which may reduce the MOVED redirections sent to clients (#7571) 2020-07-29 18:05:27 -07:00
Kevin McGehee
b6acefddb7 Call propagate instead of writing directly to AOF/replicas (#6658)
Use higher-level API to funnel all generic propagation through
single function call.
2020-07-29 17:54:37 -07:00
Yossi Gottlieb
342d9a642f Clarify RM_BlockClient() error condition. (#6093) 2020-07-29 17:03:38 +03:00
Arun Ranganathan
444b53e640 Show threading configuration in INFO output (#7446)
Co-authored-by: Oran Agra <oran@redislabs.com>
2020-07-29 08:46:44 +03:00
namtsui
8c03eb90da Avoid an out-of-bounds read in the redis-sentinel (#7443)
The Redis sentinel would crash with a segfault after a few minutes
because it tried to read from a page without read permissions. Check up
front whether the sds is long enough to contain role:slave or
role:master before memcmp() as is done everywhere else in
sentinelRefreshInstanceInfo().

Bug report and commit message from Theo Buehler. Fix from Nam Nguyen.

Co-authored-by: Nam Nguyen <namn@berkeley.edu>
2020-07-29 08:25:56 +03:00
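
The "check the length up front, then memcmp()" pattern from the commit above (8c03eb90da) looks roughly like this. `has_prefix` is a hypothetical helper written for illustration. It takes an explicit length the way an sds length would be passed, and it is not the code in sentinelRefreshInstanceInfo().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when the first strlen(prefix) bytes of buf
 * match prefix. The length test runs first, so memcmp never reads past
 * the buflen bytes that buf is known to hold. */
static int has_prefix(const char *buf, size_t buflen, const char *prefix) {
    assert(buf != NULL && prefix != NULL);
    size_t plen = strlen(prefix);
    return buflen >= plen && memcmp(buf, prefix, plen) == 0;
}
```

Because `&&` short-circuits, a line shorter than the prefix is rejected without touching memory beyond its end.
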
Wen Hui
0a2b019b79 Add SignalModifiedKey hook in XGROUP CREATE with MKSTREAM option (#7562) 2020-07-29 08:22:54 +03:00
Wen Hui
2afa308306 fix leak in error handling of debug populate command (#7062)
valsize is not modified inside the for loop; it is only read from c->argv[4], so there is no need to validate it inside the loop. Moreover, moving the check outside the loop also avoids a memory leak: with the check inside the loop, the original code would have had to call decrRefCount(key) before returning.
2020-07-28 22:05:48 +03:00
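
The idea of validating a loop-invariant argument before the loop, so that the error path has nothing to release, can be sketched as follows. This is an illustration only: `populate_sketch` is hypothetical, plain `malloc`/`free` stand in for the object reference counting, and `strtol` stands in for the real argument parsing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical loop in the spirit of the fix: the size argument is parsed
 * and validated once, before anything is allocated, so the error return
 * has nothing to release. Returns the number of items created, or -1. */
static long populate_sketch(long count, const char *valsize_arg) {
    assert(valsize_arg != NULL);
    char *end;
    errno = 0;
    long valsize = strtol(valsize_arg, &end, 10);
    if (errno != 0 || end == valsize_arg || *end != '\0' || valsize < 0)
        return -1;                        /* early exit, no allocation yet */

    long created = 0;
    for (long i = 0; i < count; i++) {
        char *key = malloc(16);           /* stands in for the per-item key */
        if (key == NULL) break;
        char *val = malloc((size_t)valsize + 1);
        if (val != NULL) { created++; free(val); }
        free(key);                        /* every loop path releases the key */
    }
    return created;
}
```

Had the validation lived inside the loop, its error branch would have needed its own release of `key`, which is the kind of omission the commit fixes.
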
Yossi Gottlieb
675b00c7e0 Fix TLS cluster tests. (#7578)
Fix consistency test added in 0c9916d00 without considering TLS
redis-cli configuration.
2020-07-28 14:04:06 +03:00
Yossi Gottlieb
bc450c5f63 TLS: Propagate and handle SSL_new() failures. (#7576)
The connection API may create an accepted connection object in an error
state, and callers are expected to check it before attempting to use it.

Co-authored-by: mrpre <mrpre@163.com>
2020-07-28 11:32:47 +03:00
Oran Agra
06aaeabaea Fix failing tests due to issues with wait_for_log_message (#7572)
- the test now waits for specific set of log messages rather than wait for
  timeout looking for just one message.
- we don't want to sample the current length of the log after an action, due
  to a race; we need to start the search from the line number of the last
  message we were waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
  where the replica manages to re-connect before we completed the set of
  actions that should force a full sync.
- fix verify_log_message which was broken and unused
2020-07-28 11:15:29 +03:00
Jiayuan Chen
198770751f Add optional tls verification (#7502)
Adds an `optional` value to the previously boolean `tls-auth-clients` configuration keyword.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2020-07-28 10:45:21 +03:00
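
Turning a boolean setting into a three-valued one, as the commit above (198770751f) does for `tls-auth-clients`, usually means replacing the yes/no flag with a small enum. The sketch below is an assumption-laden illustration: the enum and function names are hypothetical, and exact-match `strcmp` is a simplification that may differ from how the real config parser compares values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; only the three values come from the config keyword. */
typedef enum { TLS_AUTH_NO, TLS_AUTH_YES, TLS_AUTH_OPTIONAL } tls_auth_mode;

/* Parse the config value. Returns 0 and stores the mode, or -1 if the
 * value is not one of the three accepted strings. */
static int parse_tls_auth_clients(const char *value, tls_auth_mode *out) {
    assert(value != NULL && out != NULL);
    if (strcmp(value, "yes") == 0)           *out = TLS_AUTH_YES;
    else if (strcmp(value, "no") == 0)       *out = TLS_AUTH_NO;
    else if (strcmp(value, "optional") == 0) *out = TLS_AUTH_OPTIONAL;
    else return -1;
    return 0;
}
```

Existing `yes` and `no` settings keep parsing as before, so the new value is additive.
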
Yossi Gottlieb
b03473f0bc TLS: support cluster/replication without tls-port. (#7573)
Initialize and configure OpenSSL even when tls-port is not used, because
we may still have tls-cluster or tls-replication.

Also, make sure to reconfigure OpenSSL when these parameters are changed
as TLS could have been enabled for the first time.
2020-07-27 15:31:53 +03:00
Oran Agra
62e84b42d2 Daily github action: run cluster and sentinel tests with tls (#7575) 2020-07-27 15:30:36 +03:00
Yossi Gottlieb
b76a93c362 TLS: support cluster/replication without tls-port.
Initialize and configure OpenSSL even when tls-port is not used, because
we may still have tls-cluster or tls-replication.

Also, make sure to reconfigure OpenSSL when these parameters are changed
as TLS could have been enabled for the first time.
2020-07-27 13:26:02 +03:00
grishaf
f8751d03ba Fix prepareForShutdown function declaration (#7566) 2020-07-26 08:27:30 +03:00
zhaozhao.zz
f4e2525c86 more strict check in rioConnRead (#7564) 2020-07-24 14:40:19 +08:00
Oran Agra
49d4aebce0 Stabilize bgsave test that sometimes fails with valgrind (#7559)
on ci.redis.io the test fails a lot, reporting that bgsave didn't end.
increasing the timeout we wait for that bgsave to get aborted.
in addition to that, i also verify that it indeed got aborted by
checking that the save counter wasn't reset.

add another test to verify that a successful bgsave indeed resets the
change counter.
2020-07-23 13:06:24 +03:00
Meir Shpilraien (Spielrein)
73198c5019 This PR introduces a new loaded keyspace event (#7536)
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-07-23 12:38:51 +03:00
Oran Agra
4a118ab30b Fix harmless bug in rioConnRead (#7557)
this code is in use only if the master is disk-based, and the replica is
diskless. In this case we use a buffered reader, but we must avoid reading
past the rdb file, into the command stream, which luckily rdb.c doesn't
really attempt to do (it knows how much it should read).

When rioConnRead detects that the extra buffering attempt reaches beyond
the read limit it should read less, but if the caller actually requested
more, then it should return with an error rather than a short read. the
bug would have resulted in short read.

in order to fix it, the code must consider the real requested size, and
not the extra buffering size.
2020-07-23 12:37:43 +03:00
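
The rule described in the commit above (4a118ab30b) can be written as a small planning function: trim the opportunistic read-ahead when it would cross the limit, but fail when the caller's own request crosses it. This is a simplified sketch, not the rio code. `plan_read` and all of its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical planner, not the rio API. All sizes are byte counts:
 *   consumed   bytes already taken from the connection
 *   requested  bytes the caller still needs
 *   readahead  bytes we would like to read, readahead >= requested
 *   limit      total bytes that may ever be taken (0 means unlimited)
 * Stores the size to read in *toread and returns 0, or returns -1 when
 * the caller itself asks for more than the limit allows. */
static int plan_read(size_t consumed, size_t requested, size_t readahead,
                     size_t limit, size_t *toread) {
    assert(toread != NULL && readahead >= requested);
    if (limit == 0) { *toread = readahead; return 0; }
    assert(consumed <= limit);
    size_t remaining = limit - consumed;
    if (readahead <= remaining) { *toread = readahead; return 0; }
    /* The extra buffering would overshoot. Compare the REAL request with
     * what is left, not the read-ahead size. */
    if (requested <= remaining) { *toread = remaining; return 0; }
    return -1;   /* the request itself goes past the limit: an error */
}
```

Comparing `readahead` instead of `requested` in the last test is the kind of mix-up that would turn an error into a short read.
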
Brian P O'Rourke
7b4a39264e Add contribution guidelines for vulnerability reports 2020-07-22 16:24:39 +03:00
Brian P O'Rourke
a6de21b1a1 Outdent github issues guidelines 2020-07-22 16:24:39 +03:00
Sungho Hwang
63c5bfb6ef Fix typo in deps/README.md (#7553) 2020-07-22 11:33:57 +03:00
Madelyn Olson
9615c7480d Properly reset errno for rdbLoad (#7542) 2020-07-21 17:00:13 -07:00
Oran Agra
bb170fa06e testsuite may leave servers alive on error (#7549)
in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.

luckily, in most cases the pattern is:
start_server {
  test name {
  }
}
2020-07-21 16:56:19 +03:00
Yossi Gottlieb
dbc0a64843 Tests: drop TCL 8.6 dependency. (#7548)
This re-implements the redis-cli --pipe test so it no longer depends on a close feature available only in TCL 8.6.

Basically what this test does is run redis-cli --pipe, generate a bunch of commands and pipe them through redis-cli, and inspect the result in both Redis and the redis-cli output.

To do that, we need to close stdin for redis-cli to indicate we're done so it can flush its buffers and exit. TCL has bi-directional channels but only offers a way to "one-way close" a channel as of TCL 8.6. To work around that, we now generate the commands into a file and feed that file to redis-cli directly.

As we're writing to an actual file, the number of commands is now reduced.
2020-07-21 14:17:14 +03:00
Oran Agra
a472f35efd Fixes to release scripts (#7547) 2020-07-21 14:07:06 +03:00
WuYunlong
2054cf46c4 Clarification on the bug that was fixed in PR #7539. (#7541)
Before that PR, processCommand() did not notice that cmd could be a module
command in which case getkeys_proc member has a different meaning.

The outcome was that a module command which doesn't take any key names in its
arguments (similar to SLOWLOG) would be handled as if it might have key name arguments
(similar to MEMORY), would consider cluster redirect but will end up with 0 keys
after an excessive call to getKeysFromCommand, and eventually do the right thing.
2020-07-21 09:41:44 +03:00
Remi Collet
7853d8410b Fix deprecated tail syntax in tests (#7543) 2020-07-21 09:07:54 +03:00
Wen Hui
0b8d47a985 Add missing calls to raxStop (#7532)
Since the dynamic allocations in raxIterator are only used for deep walks, memory
leak due to missing call to raxStop can only happen for rax with key names longer
than 32 bytes.

Out of all the missing calls, the only ones that may lead to a leak are the rax
for consumer groups and consumers, and these were only in AOFRW and rdbSave, which
normally only happen in fork or at shutdown.
2020-07-21 08:13:05 +03:00
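
Why a missing stop call leaks only for long keys (commit 0b8d47a985 above) is easier to see with a toy iterator that keeps short keys in an inline buffer and moves longer ones to the heap. The sketch is hypothetical and is not the rax API. The 32-byte threshold is taken from the commit message, and `key_iter`, `iter_start`, `iter_set_key`, and `iter_stop` are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ITER_STATIC_LEN 32   /* threshold taken from the commit message */

/* Hypothetical iterator, not the rax API. */
typedef struct {
    char static_buf[ITER_STATIC_LEN];
    char *key;         /* points at static_buf or at a heap block */
    size_t key_len;
} key_iter;

static void iter_start(key_iter *it) {
    it->key = it->static_buf;
    it->key_len = 0;
}

/* Store a key; keys that do not fit the inline buffer move to the heap. */
static int iter_set_key(key_iter *it, const char *k, size_t len) {
    assert(it != NULL && k != NULL);
    if (len > ITER_STATIC_LEN) {
        char *p = (it->key == it->static_buf) ? malloc(len)
                                              : realloc(it->key, len);
        if (p == NULL) return -1;
        it->key = p;
    }
    memcpy(it->key, k, len);
    it->key_len = len;
    return 0;
}

/* Release heap memory, if any. Skipping this leaks only for long keys. */
static void iter_stop(key_iter *it) {
    if (it->key != it->static_buf) free(it->key);
    it->key = it->static_buf;
    it->key_len = 0;
}
```

An iterator that only ever sees short keys never allocates, so forgetting `iter_stop` is harmless there; once a long key has been stored, the stop call is the only thing that frees the heap block.
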
Wen Hui
e035e5218f add missing caching command in client help (#7399) 2020-07-20 18:53:03 -07:00