The key save delay is too short and on certain systems the child process
is gone before we have a chance to inspect it.
(cherry picked from commit 1abc94155a26356f7fcaf5d20b80f031a55a3e82)
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b1de173ec0f6a03d6083b87f1505fbf843708685)
This is a catch-all test to confirm that that rewrite produces a valid
output for all parameters and that this process does not introduce
undesired configuration changes.
(cherry picked from commit 995f1fc53f7daf3d289d5d70d7b45cdd486dc6cc)
There was a bug. Although cluster replicas would allow read commands,
they would not allow a MULTI-EXEC that's composed solely of read commands.
Adds tests for coverage.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
(cherry picked from commit 7bee51bb5b2cccbaae76f4721761880acf4d5a93)
if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exist.
example:
```
test{test 1} {
start_server {
test{test 1.1 - master only} {
}
start_server {
test{test 1.2 - with replication} {
}
}
}
}
```
when `test 1.1 - master only exists`, we're still inside `test 1`
(cherry picked from commit 610b4ff16a62062338588c4508a73784fb962c0b)
1) cur_test: when restart_server, "no such variable" error occurs
./runtest --single integration/rdb
test {client freed during loading}
SET ::cur_test
restart_server
kill_server
test "Check for memory leaks (pid $pid)"
SET ::cur_test
UNSET ::cur_test
UNSET ::cur_test // This global variable has been unset.
2) `ps --ppid` not available on macOS platform, can be replaced with
`pgrep -P pid`.
(cherry picked from commit e90385e2232d41fd7c40dc239279f9837e7bdf57)
This test was failing from time to time see discussion at the bottom of #7635
This was probably due to timing, the DEBUG SLEEP executed by redis-cli
didn't sleep for enough time.
This commit changes:
1) use SET-ACTIVE-EXPIRE instead of DEBUG SLEEP
2) reduce many `after` sleeps with retry loops to speed up the test.
3) add many comment explaining the different steps of the test and
it's purpose.
4) config appendonly before populating the volatile keys, so that they'll
be part of the AOF command stream rather than the preamble RDB portion.
other complications: recently kill_instance switched from SIGKILL to
SIGTERM, and this would sometimes fail since there was an AOFRW running
in the background. now we wait for it to end before attempting the kill.
(cherry picked from commit 541d2709a0bd1a7f88681afa001c714b19df5dc1)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
(cherry picked from commit 871e85b8a75a53f90044ac04b0f5a9ba415c3bfa)
da723a917 added a file for stderr to keep valgrind log but i forgot to
add a similar thing when valgrind isn't being used.
the result is that `glob */err.txt` fails.
(cherry picked from commit 470de9a516b0dcb92acb8cf2841ddac604bcbd3a)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)
changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
(cherry picked from commit da723a917dec7f2514d821a615668e158bb4f60c)
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
(cherry picked from commit cf22e8eb91c2c1a769fda4c4de9eba3163dd7f05)
reduce code duplication in aof.tcl.
move creation of clients into the test so that it can be skipped
(cherry picked from commit cc455a710cc68d0fd8243cd1f04c5ee7332e4fdb)
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
(cherry picked from commit 5c61f1a6ed876186b944e79f903354cd81077bb6)
This is a catch-all test to confirm that that rewrite produces a valid
output for all parameters and that this process does not introduce
undesired configuration changes.
Save parameters should either be default or whatever specified in the
config file. This fixes an issue introduced in #7092 which causes
configuration file settings to be applied on top of the defaults.
There was a bug. Although cluster replicas would allow read commands,
they would not allow a MULTI-EXEC that's composed solely of read commands.
Adds tests for coverage.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exist.
example:
```
test{test 1} {
start_server {
test{test 1.1 - master only} {
}
start_server {
test{test 1.2 - with replication} {
}
}
}
}
```
when `test 1.1 - master only exists`, we're still inside `test 1`
1) cur_test: when restart_server, "no such variable" error occurs
./runtest --single integration/rdb
test {client freed during loading}
SET ::cur_test
restart_server
kill_server
test "Check for memory leaks (pid $pid)"
SET ::cur_test
UNSET ::cur_test
UNSET ::cur_test // This global variable has been unset.
2) `ps --ppid` not available on macOS platform, can be replaced with
`pgrep -P pid`.
This test was failing from time to time see discussion at the bottom of #7635
This was probably due to timing, the DEBUG SLEEP executed by redis-cli
didn't sleep for enough time.
This commit changes:
1) use SET-ACTIVE-EXPIRE instead of DEBUG SLEEP
2) reduce many `after` sleeps with retry loops to speed up the test.
3) add many comment explaining the different steps of the test and
it's purpose.
4) config appendonly before populating the volatile keys, so that they'll
be part of the AOF command stream rather than the preamble RDB portion.
other complications: recently kill_instance switched from SIGKILL to
SIGTERM, and this would sometimes fail since there was an AOFRW running
in the background. now we wait for it to end before attempting the kill.
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
da723a917 added a file for stderr to keep valgrind log but i forgot to
add a similar thing when valgrind isn't being used.
the result is that `glob */err.txt` fails.
Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.
I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).
It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.
The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.
Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (6fd5ff8).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)
changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.
Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
of a microsecond sleep
When runtest-cluster, at first, we need to create a cluster use spawn_instance,
a port which is not used is choosen, however sometimes we can't run server on
the port. possibley due to a race with another process taking it first.
such as redis/redis/runs/896537490. It may be due to the machine problem or
In order to reduce the probability of failure when start redis in
runtest-cluster, we attemp to use another port when find server do not start up.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yanhui13 <yanhui13@meituan.com>
(cherry picked from commit 1deaad884c38e92e5b691f36b253ef4ee2201ca4)
Don't assume `ps` handles `-h` to display output without headers and
manually trim headers line from output.
(cherry picked from commit ae8420298cacc2737e8e3ffa3c5acc038cd27849)
Add Linux kernel OOM killer control option.
This adds the ability to control the Linux OOM killer oom_score_adj
parameter for all Redis processes, depending on the process role (i.e.
master, replica, background child).
A oom-score-adj global boolean flag control this feature. In addition,
specific values can be configured using oom-score-adj-values if
additional tuning is required.
(cherry picked from commit 70c823a64e800f22ac68f0172acdd1da82d7be32)
Added RedisModule_HoldString that either returns a
shallow copy of the given String (by increasing
the String ref count) or a new deep copy of String
in case its not possible to get a shallow copy.
Co-authored-by: Itamar Haber <itamar@redislabs.com>
(cherry picked from commit 4f99b22118ca91e3a7fe9c1c68c19dd717dfdbb5)
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
(cherry picked from commit 73198c50194cbf0254afd4cc5245f9274a538d13)
If the server gets MULTI command followed by only read
commands, and right before it gets the EXEC it reaches OOM,
the client will get OOM response.
So, from now on, it will get OOM response only if there was
at least one command that was tagged with `use-memory` flag
(cherry picked from commit 0292720ccb0a189d3ed49d7bf912602360a4ecdd)
When calling to LPOS command when RANK is higher than matches,
the return value is non valid response. For example:
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*-4
```
It may break client-side parser.
Now, we count how many replies were replied in the array.
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*0
```
(cherry picked from commit 7a555da64f56a4fb2f300d84a35778bee8f471ca)
After fork, the child process(redis-aof-rewrite) will get the fd opened
by the parent process(redis), when redis killed by kill -9, it will not
graceful exit(call prepareForShutdown()), so redis-aof-rewrite thread may still
alive, the fd(lock) will still be held by redis-aof-rewrite thread, and
redis restart will fail to get lock, means fail to start.
This issue was causing failures in the cluster tests in github actions.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 5e6212e087c4696abc682b64079202c9ade8666c)