9781 Commits

Author SHA1 Message Date
Oran Agra
73e0cd5a7d Change THP warning to use madvise rather than never (#7771)
completes 60097d361d4096d3826c7580acffd4053f8a4835
2020-09-09 15:39:57 +03:00
天河
df70120c90 Fix comments of _quicklistSplitNode function. (#4341)
Comments about the behavior of the function where wrong (off by one)
Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-09 15:28:38 +03:00
Yossi Gottlieb
74bac9610e Fix default/explicit "save" parameter loading. (#7767)
Save parameters should either be default or whatever specified in the
config file. This fixes an issue introduced in #7092 which causes
configuration file settings to be applied on top of the defaults.
2020-09-09 15:12:57 +03:00
Itamar Haber
c13fa0aa36 Documents RM_Call's fmt (#5448)
Improve RM_Call inline documentation about the fmt argument
so that we don't completely depend on the web docs.

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-09 15:09:41 +03:00
Jan-Erik Rediger
60097d361d Check that THP is not set to always (madvise is ok) (#4001)
THP can also be set to madvise, in which case it shouldn't cause
problems for Redis since redis (or the allocator) doesn't use madvise
to activate it.
2020-09-09 15:06:04 +03:00
Yossi Gottlieb
e5b1ad413b Tests: clean up stale .cli files. (#7768) 2020-09-09 12:30:43 +03:00
Eran Liberty
7bee51bb5b Allow exec with read commands on readonly replica in cluster (#7766)
There was a bug. Although cluster replicas would allow read commands,
they would not allow a MULTI-EXEC that's composed solely of read commands.
Adds tests for coverage.

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
2020-09-09 09:35:42 +03:00
Oran Agra
f10ef2eb77 Tests: Some fixes for macOS (#7757)
* Tests: Some fixes for macOS

1) cur_test: when restart_server, "no such variable" error occurs
  ./runtest --single integration/rdb
  test {client freed during loading}
      SET ::cur_test
      restart_server
        kill_server
          test "Check for memory leaks (pid $pid)"
          SET ::cur_test
          UNSET ::cur_test
      UNSET ::cur_test // This global variable has been unset.

2) `ps --ppid` not available on macOS platform, can be replaced with
`pgrep -P pid`.

* handle cur_test for nested tests

if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exist.

example:
```
test{test 1} {
	start_server {
		test{test 1.1 - master only} {
		}
		start_server {
		    test{test 1.2 - with replication} {
            }
		}
	}
}
```
when `test 1.1 - master only exists`, we're still inside `test 1`

Co-authored-by: Oran Agra <oran@redislabs.com>
2020-09-08 16:28:58 +03:00
Yossi Gottlieb
b3782098ae Fix CONFIG REWRITE of oom-score-adj-values. (#7761) 2020-09-08 16:00:20 +03:00
Oran Agra
610b4ff16a handle cur_test for nested tests
if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exist.

example:
```
test{test 1} {
	start_server {
		test{test 1.1 - master only} {
		}
		start_server {
		    test{test 1.2 - with replication} {
            }
		}
	}
}
```
when `test 1.1 - master only exists`, we're still inside `test 1`
2020-09-08 14:12:03 +03:00
Oran Agra
1701f23b1f Add daily CI for MacOS (#7759) 2020-09-08 10:59:25 +03:00
bodong.ybd
e90385e223 Tests: Some fixes for macOS
1) cur_test: when restart_server, "no such variable" error occurs
  ./runtest --single integration/rdb
  test {client freed during loading}
      SET ::cur_test
      restart_server
        kill_server
          test "Check for memory leaks (pid $pid)"
          SET ::cur_test
          UNSET ::cur_test
      UNSET ::cur_test // This global variable has been unset.

2) `ps --ppid` not available on macOS platform, can be replaced with
`pgrep -P pid`.
2020-09-08 14:27:53 +08:00
Oran Agra
541d2709a0 Fix cluster consistency-check test (#7754)
This test was failing from time to time see discussion at the bottom of #7635
This was probably due to timing, the DEBUG SLEEP executed by redis-cli
didn't sleep for enough time.

This commit changes:
1) use SET-ACTIVE-EXPIRE instead of DEBUG SLEEP
2) reduce many `after` sleeps with retry loops to speed up the test.
3) add many comment explaining the different steps of the test and
   it's purpose.
4) config appendonly before populating the volatile keys, so that they'll
   be part of the AOF command stream rather than the preamble RDB portion.

other complications: recently kill_instance switched from SIGKILL to
SIGTERM, and this would sometimes fail since there was an AOFRW running
in the background. now we wait for it to end before attempting the kill.
2020-09-07 18:06:25 +03:00
Yossi Gottlieb
871e85b8a7 Tests: fix unmonitored servers. (#7756)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
2020-09-07 17:30:36 +03:00
Oran Agra
470de9a516 fix broken cluster/sentinel tests by recent commit (#7752)
da723a917 added a file for stderr to keep valgrind log but i forgot to
add a similar thing when valgrind isn't being used.
the result is that `glob */err.txt` fails.
2020-09-07 16:26:11 +03:00
Oran Agra
725616534e if diskless repl child is killed, make sure to reap the pid (#7742)
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.

I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).

It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.

The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.

Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (6fd5ff8).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.
2020-09-06 16:43:57 +03:00
Oran Agra
da723a917d Improve valgrind support for cluster tests (#7725)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
2020-09-06 11:11:49 +03:00
Oran Agra
cf22e8eb91 test infra - add durable mode to work around test suite crashing
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
2020-09-06 09:59:19 +03:00
Oran Agra
cc455a710c test infra - wait_done_loading
reduce code duplication in aof.tcl.
move creation of clients into the test so that it can be skipped
2020-09-06 09:59:19 +03:00
Oran Agra
2468c17a32 test infra - flushall between tests in external mode 2020-09-06 09:59:19 +03:00
Oran Agra
5c61f1a6ed test infra - improve test skipping ability
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
2020-09-06 09:59:19 +03:00
Oran Agra
fc18f16260 test infra - reduce disk space usage
this is important when running a test with --loop
2020-09-06 09:59:19 +03:00
Oran Agra
e783c03dd1 test infra - write test name to logfile 2020-09-06 09:59:19 +03:00
Yossi Gottlieb
94cd74e5de redis-cli: fix writeConn() buffer handling. (#7749)
Fix issues with writeConn() which resulted with corruption of the stream by leaving an extra byte in the buffer. The trigger for this is partial writes or write errors which were not experienced on Linux but reported on macOS.
2020-09-03 18:15:48 +03:00
WuYunlong
44cc2e282f fix wrong comments in redis.conf, change default always-show-logo (#5695)
1. default value of always-show-logo was not consistent with the default in the code
2. comment about cluster-replica-no-failover is wrong since we can only do manually failover upon replicas
3. improve description about always-show-logo
2020-09-03 10:31:18 +03:00
Oran Agra
eca1014fe3 Run active defrag while blocked / loading (#7726)
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.

Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
  of a microsecond sleep
2020-09-03 08:47:29 +03:00
Pierre Jambet
42dc0e98aa Fix error message for the DEBUG ZIPLIST command (#7745)
DEBUG ZIPLIST <key> currently returns the following error string if the
key is not a ziplist: "ERR Not an sds encoded string.". This looks like
an accidental copy/paste error from the error returned in the else if
branch above where this string is returned if the key is not an sds
string. The command was added in
f898429fe149f476d61270ed4299dd1f8f75ae50 and looking at the commit,
nothing indicates that it is not an accidental typo.

The error string now returns a correct error: "Not a ziplist encoded
object", which accurately describes the error.
2020-09-02 23:27:48 +03:00
Oran Agra
0db61f5649 Print server startup messages after daemonization (#7743)
When redis isn't configured to have a log file, having these prints
before damonization puts them in the calling process stdout rather than
/dev/null
2020-09-02 17:18:09 +03:00
Thandayuthapani
5352220639 Add masters/replicas options to redis-cli --cluster call command (#6491)
* Add master/slave option in --cluster call command

* Update src/redis-cli.c

* Update src/redis-cli.c

Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-09-02 16:23:49 +03:00
Oran Agra
9b61917d7f fix README about BUILD_WITH_SYSTEMD usage (#7739)
BUILD_WITH_SYSTEMD is an internal variable. Users should use USE_SYSTEMD=yes.
2020-09-01 21:31:37 +03:00
Yossi Gottlieb
d377b116ba Fix double-make issue with make && make install. (#7734)
All user-supplied variables that affect the build should be explicitly
persisted.

Fixes #7254
2020-09-01 10:02:14 +03:00
Yossi Gottlieb
374270d3a0 Backport Lua 5.2.2 stack overflow fix. (#7733)
This fixes the issue described in CVE-2014-5461. At this time we cannot
confirm that the original issue has a real impact on Redis, but it is
included as an extra safety measure.
2020-08-31 20:42:46 +03:00
Leoš Literák
635d6ca639 Update README.md with instructions how to build with systemd support (#7730)
#7728 - update instructions for systemd support
2020-08-31 12:44:09 +03:00
Yossi Gottlieb
ae8420298c Fix oom-score-adj on older distros. (#7724)
Don't assume `ps` handles `-h` to display output without headers and
manually trim headers line from output.
2020-08-30 12:23:47 +03:00
maohuazhu
3155c35180 Optimize __ziplistCascadeUpdate algorithm (#6886)
The previous algorithm is of O(n^2) time complexity.
It would have run through the ziplist entries one by one, each time doing a `realloc` and a
`memmove` (moving the entire tail of the ziplist).

The new algorithm is O(n), it runs over all the records once, computing the size of the `realloc`
needed, then does one `realloc`, and run thought the records again doing many smaller `memmove`s,
each time moving just one record.

So this change reduces many reallocs, and moves each record just once.

Co-authored-by: zhumaohua <zhumaohua@megvii.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2020-08-28 17:22:35 +03:00
Jim Brunner
2ce59d3a1b Use H/W Monotonic clock and updates to AE (#7644)
Update adds a general source for retrieving a monotonic time.
In addition, AE has been updated to utilize the new monotonic
clock for timer processing.

This performance improvement is **not** enabled in a default build due to various H/W compatibility
concerns, see README.md for details. It does however change the default use of gettimeofday with
clock_gettime and somewhat improves performance.

This update provides the following
1. An interface for retrieving a monotonic clock. getMonotonicUs returns a uint64_t (aka monotime)
   with the number of micro-seconds from an arbitrary point. No more messing with tv_sec/tv_usec.
   Simple routines are provided for measuring elapsed milli-seconds or elapsed micro-seconds (the
   most common use case for a monotonic timer). No worries about time moving backwards.
2. High-speed assembler implementation for x86 and ARM. The standard method for retrieving the
   monotonic clock is POSIX.1b (1993): clock_gettime(CLOCK_MONOTONIC, timespec*). However, most
   modern processors provide a constant speed instruction clock which can be retrieved in a fraction
   of the time that it takes to call clock_gettime. For x86, this is provided by the RDTSC
   instruction. For ARM, this is provided by the CNTVCT_EL0 instruction. As a compile-time option,
   these high-speed timers can be chosen. (Default is POSIX clock_gettime.)
3. Refactor of event loop timers. The timer processing in ae.c has been refactored to use the new
   monotonic clock interface. This results in simpler/cleaner logic and improved performance.
2020-08-28 11:54:10 +03:00
Oran Agra
2640897e3a Fix rejectCommand trims newline in shared error objects, hung clients (#7714)
fe8d6fe74 (released in 6.0.6) has a side effect, when processCommand
rejects a command with pre-made shared object error string, it trims the
newlines from the end of the string. if that string is later used with
addReply, the newline will be missing, breaking the protocol, and
leaving the client hung.

It seems that the only scenario which this happens is when replying with
-LOADING to some command, and later using that reply from the CONFIG
SET command (still during loading). this will result in hung client.

Refactoring the code in order to avoid trimming these newlines from
shared string objects, and do the newline trimming only in other cases
where it's needed.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
2020-08-27 12:54:01 +03:00
Oran Agra
8a4d10a3a5 Update memory metrics for INFO during loading (#7690)
During a long AOF or RDB loading, the memory stats were not updated, and
INFO would return stale data, specifically about fragmentation and RSS.
In the past some of these were sampled directly inside the INFO command,
but were moved to cron as an optimization.

This commit introduces a concept of loadingCron which should take
some of the responsibilities of serverCron.
It attempts to limit it's rate to approximately the server Hz, but may
not be very accurate.

In order to avoid too many system call, we use the cached ustime, and
also make sure to update it in both AOF loading and RDB loading inside
processEventsWhileBlocked (it seems AOF loading was missing it).
2020-08-27 11:09:32 +03:00
valentinogeron
0292720ccb EXEC with only read commands should not be rejected when OOM (#7696)
If the server gets MULTI command followed by only read
commands, and right before it gets the EXEC it reaches OOM,
the client will get OOM response.

So, from now on, it will get OOM response only if there was
at least one command that was tagged with `use-memory` flag
2020-08-27 09:19:24 +03:00
Oran Agra
b01816ca6e Add test coverage for CLIENT UNBLOCK (#7712)
plus minor other fixes to list.tcl
2020-08-27 08:09:39 +03:00
filipe oliveira
2693e8f245 Extended redis-benchmark instant metrics and overall latency report (#7600)
A first step to enable a consistent full percentile analysis on query latency so that we can fully understand the performance and stability characteristics of the redis-server system we are measuring. It also improves the instantaneous reported metrics, and the csv output format.
2020-08-25 21:21:29 +03:00
Itamar Haber
cb504d7fdd Expands lazyfree's effort estimate to include Streams (#5794)
Otherwise, it is treated as a single allocation and freed synchronously. The following logic is used for estimating the effort in constant-ish time complexity:

1. Check the number of nodes.
1. Add an allocation for each consumer group registered inside the stream.
1. Check the number of PELs in the first CG, and then add this count times the number of CGs.
1. Check the number of consumers in the first CG, and then add this count times the number of CGs.
2020-08-25 15:58:50 +03:00
Wang Yuan
48a00e6b99 Fix wrong format specifiers of 'sdscatfmt' for the INFO command (#7706)
unlike printf, sdscatfmt doesn't take %d
2020-08-24 22:59:56 +03:00
Wang Yuan
959099a969 Fix data race in bugReportStart (#7700)
The previous fix using _Atomic was insufficient, since we check and set it in
different places.

The implications of this bug are just that a portion of the bug report will be shown
twice, in the race case of two concurrent crashes.
2020-08-24 13:54:33 +03:00
Yossi Gottlieb
74d9d95449 Add language servers stuff, test/tls to gitignore. (#7698) 2020-08-24 12:54:56 +03:00
Valentino Geron
7e6c9ef881 Assert that setDeferredAggregateLen isn't called with negative value
In case the redis is about to return broken reply we want to crash
with assert so that we are notified about the bug. see #7687.
2020-08-23 16:03:30 +03:00
Valentino Geron
7a555da64f Fix LPOS command when RANK is greater than matches
When calling to LPOS command when RANK is higher than matches,
the return value is non valid response. For example:
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*-4
```
It may break client-side parser.

Now, we count how many replies were replied in the array.
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*0
```
2020-08-23 16:03:30 +03:00
Yossi Gottlieb
257f9f462f Tests: fix redis-cli with remote hosts. (#7693) 2020-08-23 10:17:43 +03:00
Wen Hui
7386b998e8 fix make warnings (#7692) 2020-08-21 23:37:49 +03:00
Wen Hui
86cd4629ae use dictSlots for getting total slots number in dict (#7691) 2020-08-21 00:14:09 +03:00