We should sync temp DB file before renaming as rdb_fsync_range does not use
flag `SYNC_FILE_RANGE_WAIT_AFTER`.
Refer to `Linux Programmer's Manual`:
SYNC_FILE_RANGE_WAIT_AFTER
Wait upon write-out of all pages in the range after performing any write.
(cherry picked from commit d119448881655a1529eb6d7d7e78af5f15132536)
When fclose would fail, the previous implementation would have attempted to do fclose again
this can in theory lead to segfault.
other changes:
check for non-zero return value as failure rather than a specific error code.
this doesn't fix a real bug, just a minor cleanup.
(cherry picked from commit c67656fa3541376590fe9a9b146ad5641cb861aa)
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.
Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.
We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.
We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.
We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 3085577c095a0f3b1261f6dbf016d7701aadab46)
This happens only on diskless replicas when attempting to reconnect after
failing to load an RDB file. It is more likely to occur with larger datasets.
After reconnection is initiated, replicationEmptyDbCallback() may get called
and try to write to an unconnected socket. This triggered another issue where
the connection is put into an error state and the connect handler never gets
called. The problem is a regression introduced by commit cad93ed.
(cherry picked from commit ecd86283ec292c1062f377f5707be57a8a77adb4)
redis-check-rdb was unable to parse rdb files containing module aux data.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b914d4fc4825cc20cebca43431af5029ee077d09)
This commit adds streamIteratorStop call in rewriteStreamObject function in some of the return statement. Although currently this will not cause memory leak since stream id is only 16 bytes long.
(cherry picked from commit 7934f163b4b6c1c0c0fc55710d3c7e49f56281f1)
Refine comment of makeThreadKillable().
This commit can be backported to 5.0, only if we also backport cf8a6e3.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit d2291627305d606a5d3b1e3b3bfa17ab10a3ef32)
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.
However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.
Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
(cherry picked from commit 6638f6129553d0f19c60944e70fe619a4217658c)
If one thread got SIGSEGV, function sigsegvHandler() would be triggered,
it would call bioKillThreads(). But call pthread_cancel() to cancel itself
would make it block. Also note that if SIGSEGV is caught by bio thread, it
should kill the main thread in order to give a positive report.
(cherry picked from commit cf8a6e3c7a0448851f0c00ff1a726701a2be9f1a)
This commit makes stream object returning "stream" as encoding type in OBJECT ENCODING subcommand and DEBUG OBJECT command.
Till now, it would return "unknown"
(cherry picked from commit 2a8803f534728a6fd1b7c29a2d7e195f6a928f50)
Before this commit, following command did not show --tls option:
./runtest-cluster --help
./runtest-sentinel --help
(cherry picked from commit e4a1280a0e6c33d03ec6b622b8159b2b26f0f9c3)
These tests started failing every day on http 404 (not being able to
install valgrind)
(cherry picked from commit 9428c1a591472fc87775781e5955aa527c6f1ff0)
This test was nearly always failing on MacOS github actions.
This is because of bugs in the test that caused it to nearly always run
all 3 attempts and just look at the last one as the pass/fail creteria.
i.e. the test was nearly always running all 3 attempts and still sometimes
succeed. this is because the break condition was different than the test
completion condition.
The reason the test succeeded is because the break condition tested the
results of all 3 tests (PSETEX/PEXPIRE/PEXPIREAT), but the success check
at the end was only testing the result of PSETEX.
The reason the PEXPIREAT test nearly always failed is because it was
getting the current time wrong: getting the current second and loosing
the sub-section time, so the only chance for it to succeed is if it run
right when a certain second started.
Because i now get the time from redis, adding another round trip, i
added another 100ms to the PEXPIRE test to make it less fragile, and
also added many more attempts.
Adding many more attempts before failure to account for slow platforms,
github actions and valgrind
(cherry picked from commit 1fd56bb75a9afa5469b3ecb70d394b2adaf9baac)
The key save delay is too short and on certain systems the child process
is gone before we have a chance to inspect it.
(cherry picked from commit 1abc94155a26356f7fcaf5d20b80f031a55a3e82)
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b1de173ec0f6a03d6083b87f1505fbf843708685)
This is a catch-all test to confirm that that rewrite produces a valid
output for all parameters and that this process does not introduce
undesired configuration changes.
(cherry picked from commit 995f1fc53f7daf3d289d5d70d7b45cdd486dc6cc)
Improve RM_Call inline documentation about the fmt argument
so that we don't completely depend on the web docs.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit c13fa0aa3619c595f06e191a30710d85a109ad48)
THP can also be set to madvise, in which case it shouldn't cause
problems for Redis since redis (or the allocator) doesn't use madvise
to activate it.
(cherry picked from commit 60097d361d4096d3826c7580acffd4053f8a4835)
There was a bug. Although cluster replicas would allow read commands,
they would not allow a MULTI-EXEC that's composed solely of read commands.
Adds tests for coverage.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
(cherry picked from commit 7bee51bb5b2cccbaae76f4721761880acf4d5a93)
if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exist.
example:
```
test{test 1} {
start_server {
test{test 1.1 - master only} {
}
start_server {
test{test 1.2 - with replication} {
}
}
}
}
```
when `test 1.1 - master only exists`, we're still inside `test 1`
(cherry picked from commit 610b4ff16a62062338588c4508a73784fb962c0b)
1) cur_test: when restart_server, "no such variable" error occurs
./runtest --single integration/rdb
test {client freed during loading}
SET ::cur_test
restart_server
kill_server
test "Check for memory leaks (pid $pid)"
SET ::cur_test
UNSET ::cur_test
UNSET ::cur_test // This global variable has been unset.
2) `ps --ppid` not available on macOS platform, can be replaced with
`pgrep -P pid`.
(cherry picked from commit e90385e2232d41fd7c40dc239279f9837e7bdf57)
This test was failing from time to time see discussion at the bottom of #7635
This was probably due to timing, the DEBUG SLEEP executed by redis-cli
didn't sleep for enough time.
This commit changes:
1) use SET-ACTIVE-EXPIRE instead of DEBUG SLEEP
2) reduce many `after` sleeps with retry loops to speed up the test.
3) add many comment explaining the different steps of the test and
it's purpose.
4) config appendonly before populating the volatile keys, so that they'll
be part of the AOF command stream rather than the preamble RDB portion.
other complications: recently kill_instance switched from SIGKILL to
SIGTERM, and this would sometimes fail since there was an AOFRW running
in the background. now we wait for it to end before attempting the kill.
(cherry picked from commit 541d2709a0bd1a7f88681afa001c714b19df5dc1)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
(cherry picked from commit 871e85b8a75a53f90044ac04b0f5a9ba415c3bfa)
da723a917 added a file for stderr to keep valgrind log but i forgot to
add a similar thing when valgrind isn't being used.
the result is that `glob */err.txt` fails.
(cherry picked from commit 470de9a516b0dcb92acb8cf2841ddac604bcbd3a)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)
changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
(cherry picked from commit da723a917dec7f2514d821a615668e158bb4f60c)
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
(cherry picked from commit cf22e8eb91c2c1a769fda4c4de9eba3163dd7f05)