Starting redis 6.0 and the changes we made to the diskless master to be
suitable for TLS, I made the master avoid reaping (wait3) the pid of the
child until we know all replicas are done reading their rdb.
I did that in order to avoid a state where the rdb_child_pid is -1 but
we don't yet want to start another fork (still busy serving that data to
replicas).
It turns out that the solution used so far was problematic in case the
fork child was being killed (e.g. by the kernel OOM killer), in that
case there's a chance that we currently disabled the read event on the
rdb pipe, since we're waiting for a replica to become writable again.
and in that scenario the master would have never realized the child
exited, and the replica will remain hung too.
Note that there's no mechanism to detect a hung replica while it's in
rdb transfer state.
The solution here is to add another pipe which is used by the parent to
tell the child it is safe to exit. this mean that when the child exits,
for whatever reason, it is safe to reap it.
Besides that, i'm re-introducing an adjustment to REPLCONF ACK which was
part of #6271 (Accelerate diskless master connections) but was dropped
when that PR was rebased after the TLS fork/pipe changes (6fd5ff8).
Now that RdbPipeCleanup no longer calls checkChildrenDone, and the ACK
has chance to detect that the child exited, it should be the one to call
it so that we don't have to wait for cron (server.hz) to do that.
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)
changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
Fix issues with writeConn() which resulted with corruption of the stream by leaving an extra byte in the buffer. The trigger for this is partial writes or write errors which were not experienced on Linux but reported on macOS.
1. default value of always-show-logo was not consistent with the default in the code
2. comment about cluster-replica-no-failover is wrong since we can only do manually failover upon replicas
3. improve description about always-show-logo
During long running scripts or loading RDB/AOF, we may need to do some
defragging. Since processEventsWhileBlocked is called periodically at
unknown intervals, and many cron jobs either depend on run_with_period
(including active defrag), or rely on being called at server.hz rate
(i.e. active defrag knows ho much time to run by looking at server.hz),
the whileBlockedCron may have to run a loop triggering the cron jobs in it
(currently only active defrag) several times.
Other changes:
- Adding a test for defrag during aof loading.
- Changing key-load-delay config to take negative values for fractions
of a microsecond sleep
DEBUG ZIPLIST <key> currently returns the following error string if the
key is not a ziplist: "ERR Not an sds encoded string.". This looks like
an accidental copy/paste error from the error returned in the else if
branch above where this string is returned if the key is not an sds
string. The command was added in
f898429fe149f476d61270ed4299dd1f8f75ae50 and looking at the commit,
nothing indicates that it is not an accidental typo.
The error string now returns a correct error: "Not a ziplist encoded
object", which accurately describes the error.
When runtest-cluster, at first, we need to create a cluster use spawn_instance,
a port which is not used is choosen, however sometimes we can't run server on
the port. possibley due to a race with another process taking it first.
such as redis/redis/runs/896537490. It may be due to the machine problem or
In order to reduce the probability of failure when start redis in
runtest-cluster, we attemp to use another port when find server do not start up.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yanhui13 <yanhui13@meituan.com>
(cherry picked from commit 1deaad884c38e92e5b691f36b253ef4ee2201ca4)
This fixes the issue described in CVE-2014-5461. At this time we cannot
confirm that the original issue has a real impact on Redis, but it is
included as an extra safety measure.
(cherry picked from commit 374270d3a04e8b224a12655518c815497aeb497d)
Don't assume `ps` handles `-h` to display output without headers and
manually trim headers line from output.
(cherry picked from commit ae8420298cacc2737e8e3ffa3c5acc038cd27849)
In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the __common__ attribute. This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.
Further details at gcc.gnu.org/gcc-10/porting_to.html
Turn the existing __attribute__ ((unused)), ((__common__)) and ((print))
annotations into conditional macros for any compilers not accepting this
syntax. These macros only expand to API annotations under gcc.
Provide a pre- and post- macro for every API function, so that they can
be defined differently by the file that includes redismodule.h.
Removing REDISMODULE_API_FUNC in the interest of keeping the function
declarations readable.
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 9d4736b04441b609c17c414e0780882cf92c5e33)
Add Linux kernel OOM killer control option.
This adds the ability to control the Linux OOM killer oom_score_adj
parameter for all Redis processes, depending on the process role (i.e.
master, replica, background child).
A oom-score-adj global boolean flag control this feature. In addition,
specific values can be configured using oom-score-adj-values if
additional tuning is required.
(cherry picked from commit 70c823a64e800f22ac68f0172acdd1da82d7be32)
Added RedisModule_HoldString that either returns a
shallow copy of the given String (by increasing
the String ref count) or a new deep copy of String
in case its not possible to get a shallow copy.
Co-authored-by: Itamar Haber <itamar@redislabs.com>
(cherry picked from commit 4f99b22118ca91e3a7fe9c1c68c19dd717dfdbb5)
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
(cherry picked from commit 73198c50194cbf0254afd4cc5245f9274a538d13)
fe8d6fe74 (released in 6.0.6) has a side effect, when processCommand
rejects a command with pre-made shared object error string, it trims the
newlines from the end of the string. if that string is later used with
addReply, the newline will be missing, breaking the protocol, and
leaving the client hung.
It seems that the only scenario which this happens is when replying with
-LOADING to some command, and later using that reply from the CONFIG
SET command (still during loading). this will result in hung client.
Refactoring the code in order to avoid trimming these newlines from
shared string objects, and do the newline trimming only in other cases
where it's needed.
Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
(cherry picked from commit 2640897e3a01fbacb620c12e021c934e48eeccb9)
If the server gets MULTI command followed by only read
commands, and right before it gets the EXEC it reaches OOM,
the client will get OOM response.
So, from now on, it will get OOM response only if there was
at least one command that was tagged with `use-memory` flag
(cherry picked from commit 0292720ccb0a189d3ed49d7bf912602360a4ecdd)
Otherwise, it is treated as a single allocation and freed synchronously. The following logic is used for estimating the effort in constant-ish time complexity:
1. Check the number of nodes.
1. Add an allocation for each consumer group registered inside the stream.
1. Check the number of PELs in the first CG, and then add this count times the number of CGs.
1. Check the number of consumers in the first CG, and then add this count times the number of CGs.
(cherry picked from commit cb504d7fddd09149655e91496588c610e89ca131)
In case the redis is about to return broken reply we want to crash
with assert so that we are notified about the bug. see #7687.
(cherry picked from commit 7e6c9ef8819a071679f8dd18035dbbe2455c7b12)
When calling to LPOS command when RANK is higher than matches,
the return value is non valid response. For example:
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*-4
```
It may break client-side parser.
Now, we count how many replies were replied in the array.
```
LPUSH l a
:1
LPOS l b RANK 5 COUNT 10
*0
```
(cherry picked from commit 7a555da64f56a4fb2f300d84a35778bee8f471ca)
It was also using the wrong struct, but luckily RedisModuleFlushInfo and RedisModuleLoadingProgress
are identical.
(cherry picked from commit b980e999293e9214a844712f9c88ca69acd20b1b)
We wanna avoid a chance of someone using the pointer in it after it'll be freed / realloced.
(cherry picked from commit 4de17eb032160c7ba94c505eab4b776a456e5117)
After fork, the child process(redis-aof-rewrite) will get the fd opened
by the parent process(redis), when redis killed by kill -9, it will not
graceful exit(call prepareForShutdown()), so redis-aof-rewrite thread may still
alive, the fd(lock) will still be held by redis-aof-rewrite thread, and
redis restart will fail to get lock, means fail to start.
This issue was causing failures in the cluster tests in github actions.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 5e6212e087c4696abc682b64079202c9ade8666c)
Since users often post just the crash log in github issues, the log
print that's above it is missing.
No reason not to include the size in the panic message itself.
(cherry picked from commit 1b5cc94836d24b7b36cb6618644f9e2d60113c59)
It was already defined in the API header and the documentation, but not used by the implementation.
(cherry picked from commit b7236f0002bcaa15f3a487def9c5069b6c422e65)
fe8d6fe749 added rejectCommand which takes an robj reply and passes it
through addReplyErrorSafe to addReplyErrorLength.
The robj contains newline at it's end, but addReplyErrorSafe converts it
to spaces, and passes it to addReplyErrorLength which adds the protocol
newlines.
The result was that most error replies (like OOM) had extra two trailing
spaces in them.
(cherry picked from commit 05a4af3464b16e42b31dfb1ea62e2a66dc032fb2)
The `REDISMODULE_CLIENTINFO_FLAG_SSL` flag was already a part of the `RedisModuleClientInfo` structure but was not implemented.
(cherry picked from commit 2ec11f941ae41188e517670fc3224b12c7666541)
Avoid re-configuring (and validating) SSL/TLS configuration on `CONFIG
SET` when TLS is not actively enabled for incoming connections, cluster
bus or replication.
This fixes failures when tests run without `--tls` on binaries that were
built with TLS support.
An additional benefit is that it's now possible to perform a multi-step
configuration process while TLS is disabled. The new configuration will
be verified and applied only when TLS is effectively enabled.
(cherry picked from commit 24efd22e894c90f380aa05a5fa77134bb9423ad3)