track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.
in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.
This memory is now added to the total clients memory usage, as well as
the client list.
(cherry picked from commit 7481e513f0507d01381a87046d8d1366c718f94e)
Adding -D option for redis-cli to control newline between command
responses in raw mode.
Also removing cleanup code before calling exit, just in order
to avoid adding more adding more cleanup code (redis doesn't
bother to release allocations before exit anyway)
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit f0f8e9c824b819c8aec996ec8c8851773a6f9432)
The MEMORY command is used for debugging memory usage, so it should include internal
fragmentation, same as used_memory
(cherry picked from commit 86483e795262c6e2efdffe92c1642a72ef0dd6a0)
PROBLEM:
[$rd1 read] reads invalidation messages one by one, so it's never going to see the second invalidation message produced after INCR b, whether or not it exists. Adding another read will block incase no invalidation message is produced.
FIX:
We switch the order of "INCR a" and "INCR b" - now "INCR b" comes first. We still only read the first invalidation message produces. If an invalidation message is wrongly produces for b - then it will be produced before that of a, since "INCR b" comes before "INCR a".
Co-authored-by: Nitai Caro <caronita@amazon.com>
(cherry picked from commit 94e9b0124e8582912c3771f9828842348490bc38)
When REDISMODULE_EVENT_CLIENT_CHANGE events are delivered, modules may
want to mutate the client state (e.g. perform authentication).
This change links the module context with the real client rather than a
fake client for these events.
(cherry picked from commit 4aca4e5f392ad6030150a92a9ef82412072f9622)
The client pointed to by the module context may in some cases be a fake
client. RM_Authenticate*() calls in this case would be ineffective but
appear to succeed, and this change fails them to make it easier to catch
such cases.
(cherry picked from commit 82866776d0c26f17043f9c1b0f0f5f48660e6848)
The tls-ca-cert or tls-ca-cert-dir configuration parameters are only
used when Redis needs to authenticate peer certificates, in one of these
scenarios:
1. Incoming clients or replicas, with `tls-auth-clients` enabled.
2. A replica authenticating the master's peer certificate.
3. Cluster nodes authenticating other nodes when establishing the bus
protocol connection.
(cherry picked from commit 3bd9d0cc85d4ef8f4cc2789e9ab27e5557471409)
when slaveof config is "no one", reset any pre-existing config and resume.
also solve a memory leak if slaveof appears twice.
and fail loading if port number is out of range or not an integer.
Co-authored-by: caozhengbin <caozb@yidingyun.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 01694608cb4e39a6ec7970d24b21ab33b7347e31)
There's currently an issue with IO threads and gopher (issuing lookupKey from within the thread).
simply fix is to just not support it for now.
(cherry picked from commit 9bdef76f8e3bbfaacf0962ab1ceded1bafa80bda)
We may access and modify these two variables in signal handler function,
to guarantee them async-signal-safe, so we should set them to volatile
sig_atomic_t type.
It doesn't look like this could have caused any real issue, and it seems that
signals are handled in main thread on most platforms. But we want to follow C
and POSIX standard in signal handler function.
(cherry picked from commit 917043fa438d9bbe9a80fb838fcfd33a7e390952)
Make sure we handle short writes correctly, sync to disk after writing and use
rename to make sure the replacement is actually atomic.
In any case of failure old configuration will remain in place.
Also, add some additional logging to make it easier to diagnose rewrite problems.
(cherry picked from commit 8dbe91f0316f08d785bad1e8e28f1c13ddfbef2c)
We should sync temp DB file before renaming as rdb_fsync_range does not use
flag `SYNC_FILE_RANGE_WAIT_AFTER`.
Refer to `Linux Programmer's Manual`:
SYNC_FILE_RANGE_WAIT_AFTER
Wait upon write-out of all pages in the range after performing any write.
(cherry picked from commit d119448881655a1529eb6d7d7e78af5f15132536)
When fclose would fail, the previous implementation would have attempted to do fclose again
this can in theory lead to segfault.
other changes:
check for non-zero return value as failure rather than a specific error code.
this doesn't fix a real bug, just a minor cleanup.
(cherry picked from commit c67656fa3541376590fe9a9b146ad5641cb861aa)
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.
Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.
We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.
We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.
We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 3085577c095a0f3b1261f6dbf016d7701aadab46)
This happens only on diskless replicas when attempting to reconnect after
failing to load an RDB file. It is more likely to occur with larger datasets.
After reconnection is initiated, replicationEmptyDbCallback() may get called
and try to write to an unconnected socket. This triggered another issue where
the connection is put into an error state and the connect handler never gets
called. The problem is a regression introduced by commit cad93ed.
(cherry picked from commit ecd86283ec292c1062f377f5707be57a8a77adb4)
redis-check-rdb was unable to parse rdb files containing module aux data.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b914d4fc4825cc20cebca43431af5029ee077d09)
This commit adds streamIteratorStop call in rewriteStreamObject function in some of the return statement. Although currently this will not cause memory leak since stream id is only 16 bytes long.
(cherry picked from commit 7934f163b4b6c1c0c0fc55710d3c7e49f56281f1)
Refine comment of makeThreadKillable().
This commit can be backported to 5.0, only if we also backport cf8a6e3.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit d2291627305d606a5d3b1e3b3bfa17ab10a3ef32)
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.
However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.
Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
(cherry picked from commit 6638f6129553d0f19c60944e70fe619a4217658c)
If one thread got SIGSEGV, function sigsegvHandler() would be triggered,
it would call bioKillThreads(). But call pthread_cancel() to cancel itself
would make it block. Also note that if SIGSEGV is caught by bio thread, it
should kill the main thread in order to give a positive report.
(cherry picked from commit cf8a6e3c7a0448851f0c00ff1a726701a2be9f1a)
This commit makes stream object returning "stream" as encoding type in OBJECT ENCODING subcommand and DEBUG OBJECT command.
Till now, it would return "unknown"
(cherry picked from commit 2a8803f534728a6fd1b7c29a2d7e195f6a928f50)
Before this commit, following command did not show --tls option:
./runtest-cluster --help
./runtest-sentinel --help
(cherry picked from commit e4a1280a0e6c33d03ec6b622b8159b2b26f0f9c3)
These tests started failing every day on http 404 (not being able to
install valgrind)
(cherry picked from commit 9428c1a591472fc87775781e5955aa527c6f1ff0)
This test was nearly always failing on MacOS github actions.
This is because of bugs in the test that caused it to nearly always run
all 3 attempts and just look at the last one as the pass/fail creteria.
i.e. the test was nearly always running all 3 attempts and still sometimes
succeed. this is because the break condition was different than the test
completion condition.
The reason the test succeeded is because the break condition tested the
results of all 3 tests (PSETEX/PEXPIRE/PEXPIREAT), but the success check
at the end was only testing the result of PSETEX.
The reason the PEXPIREAT test nearly always failed is because it was
getting the current time wrong: getting the current second and loosing
the sub-section time, so the only chance for it to succeed is if it run
right when a certain second started.
Because i now get the time from redis, adding another round trip, i
added another 100ms to the PEXPIRE test to make it less fragile, and
also added many more attempts.
Adding many more attempts before failure to account for slow platforms,
github actions and valgrind
(cherry picked from commit 1fd56bb75a9afa5469b3ecb70d394b2adaf9baac)