* Audit Logging for KeyProxy and KeyDB (#144)
* Audit Log: log cert fingerprint (#151)
* Add more flash storage stats to info command.
* Remove unneeded libs when not building FLASH
* Fix mem leak
* Allow the reservation of localhost connections to ensure health checks always succeed even at maxclients (#181)
* Enable a force option for commands (#183)
* Fix missing newline and excessive logging in the CLI
* Support NO ONE for "CLUSTER REPLICATE" command.
Co-authored-by: Jacob Bohac <jbohac@snapchat.com>
Co-authored-by: Sergey Kolosov <skolosov@snapchat.com>
Co-authored-by: John Sully <jsully@snapchat.com>
Co-authored-by: John Sully <john@csquare.ca>
* need to include stdint for uintptr_t
* use atomic_load for g_pserver->mstime
* Integrate readwritelock with Pro Code
* Defensive asserts for RWLock
* Save and restore master info in rdb to allow active replica partial sync (#371)
* save replid for all masters in rdb
* expanded rdbSaveInfo to hold multiple master structs
* parse repl-masters from rdb
* recover replid info from rdb in active replica mode, attempt partial sync
* save offset from rdb into correct variable
* don't change replid based on master in active rep
* save and load psync info from correct fields
* placement new instead of memcpy
* Remove asserts, RW lock can go below zero in cases of aeAcquireLock
* Inclusive language
* update packaging for OS merge
* modify dockerfile to build within image
* Make active client balancing a configurable option
* With TLS throttle accepts if server is under heavy load - do not change non TLS behavior
* Only run the tls-name-validation test if --tls is passed into runtest
* Fix KeyDB not building with TLS < 1.1.1
* update changelog to use replica as terminology
* update copyright
* update deb copyright
* call aeThreadOnline() earlier
* Removed mergeReplicationId
* acceptTLS is threadsafe like the non TLS version
* setup Machamp ci
* make build_test.sh executable
* PSYNC production fixes
* fix the Machamp build
* break tests into steps
* Added multimaster test
* Update ci.yml
Change min tested version to 18.04
* fork lock for all threads, use fastlock for readwritelock
* hide forklock object in ae
* only need to include readwritelock in ae
* time thread lock uses fastlock instead of std::mutex
* set thread as offline when waiting for time thread lock
* update README resource links
* Fix MALLOC=memkind build issues
* Fix module test break
* Eliminate firewall dialogs on mac for regular and cluster tests. There are still issues with the sentinel tests but attempting to bind only to localhost causes failures
* remove unused var in networking.cpp
* check ziplist len to avoid crash on empty ziplist convert
* remove nullptr subtraction
* cannot mod a pointer
* Ensure we are responsive during storagecache clears
* Ensure recreated tables use the same settings as ones made at boot
* Converted some existing PSYNC tests for multimaster
* Inclusive language fix
* Cleanup test suite
* Updated test replica configs so tests make sense
* active-rep test reliability
* Quick fix to make psync tests work
* Fix PSYNC test crashes
* Ensure we force moves not copies when ingesting bulk insert files
* Disable async for hget commands as it is not ready
* Disable FLASH
* Fix crash in save of masterinfo
* Fix musl/Alpine build failures
* Remove unnecessary libs
* update readme
* remove Enterprise references
* Limit max overage to 20% during RDB save
* Delete COPYING to replace with BSD license
* update deb master changelog
* Update license
* Fix Readme typo from github org transition
Replace mention of scratch-file-path with db-s3-object
* Fix reference counting failure in the dict. This is caused by std::swap also swapping refcounts
* Fix assertion in async rehash
* Prevent crash on shutdown by avoiding dtors (they are unnecessary anyways)
* Initialize noshrink, it was dangling
* Prevent us from starting a rehash when one wasn't already in progress. This can cause severe issues for snapshots
* Avoid unnecessary rehashing when a rehash is abandoned
* Dictionary use correct acquire/release semantics
* Add fence barriers for the repl backlog (important for AARCH64 and other weak memory models)
* Silence TSAN errors on ustime and mstime. Every CPU we support is atomic on aligned ints, but correctness matters
* Disable async commands by default
* Fix TSAN warnings on the repl backlog
* Merge OSS back into pro
* Fix unmerged files
* Fix O(n^2) algorithm in the GC cleanup logic
* Fix crash in expire when a snapshot is in flight. Caused by a perf optimization getting the expire map out of sync with the val
* On Alpine we must have a reasonable stack size
* Revert ci.yml to unstable branch version
* Implements the soft shutdown feature to allow clients to cooperatively disconnect preventing disruption during shutdown
* Ensure clean shutdown with multiple threads
* update dockerfiles
* update deb pkg references and changelog
* update gem reference
* lpGetInteger returns int64_t, avoid overflow (#10068)
Fix #9410
Crucial for the ms and sequence deltas, but I changed all
calls, just in case (e.g. "flags")
Before this commit:
`ms_delta` and `seq_delta` could have overflowed, causing `currid` to be wrong,
which in turn would cause `streamTrim` to trim the entire rax node (see new test)
* Fix issue #454 (BSD build break)
* Do not allow commands to run in background when in eval, Issue #452
* Fix certificate leak during connection when tls-allowlists are used
* Fix issue #480
* Fix crash running INFO command while a disk based backlog is set
* check tracking per db
* fix warnings
* Fix a race when undoConnectWithMaster changes mi->repl_transfer_s but the connection is not yet closed and the event handler runs
* Fix a race in processChanges/trackChanges with rdbLoadRio by acquiring the lock when trackChanges is set
* Fix ASAN use after free
* Additional fixes
* Fix integer overflow of the track changes counter
* Fix P99 latency issue for TLS where we leave work for the next event loop
tlsProcessPendingData() needs to be called before we execute queued commands because it may enqueue more commands
* Fix race removing key cache
* Prevent crash on load in long running KeyDB instances
* Fixes a crash where the server assertion failed when the key exists in DB during RDB load
* Remove old assertion which is commented out.
* Avoid instantiating EpochHolder multiple times to improve performance and CPU utilization
* src\redis-cli.c: fix potential null pointer dereference found by cppcheck
src\redis-cli.c:5488:35: warning: Either the condition
'!table' is redundant or there is possible null pointer dereference:
table. [nullPointerRedundantCheck]
* Fix Issue #486
* Work around bug in snapshot sync: abort, don't crash
* Improve reliability of async parts of the soft shutdown tests
* Improve reliability of fragmentation tests
* Verify that partial syncs do indeed occur
* Fix O(n) algorithm in INFO command
* Remove incorrect assert that fires when the repl backlog is used fully
* Make building flash optional
* Remove unneeded gitlab CI file
* [BUG] When moving a key to another DB, the source key was removed even if the move failed because the key already exists in the destination DB #497 (#498)
Co-authored-by: Paul Chen <mingchen@Mings-MacBook-Pro.local>
* trigger repl_curr_off != master_repl_offset assert failure in the pending write case
* use debug for logging the message instead
* rocksdb log using up the diskspace on flash (#519)
* Fix OpenSSL 3.0.x related issues. (#10291)
* Drop obsolete initialization calls.
* Use decoder API for DH parameters.
* Enable auto DH parameters if not explicitly used, which should be the
preferred configuration going forward.
* remove unnecessary forward declaration
* remove internal ci stuff
* remove more internal ci/publishing
* submodule update step
* use with syntax instead
* bump ci ubuntu old ver as latest is now 22.04
* include submodules on all ci jobs
* install all deps for all ci jobs
Co-authored-by: Vivek Saini <vsaini@snapchat.com>
Co-authored-by: Christian Legge <christian@eqalpha.com>
Co-authored-by: benschermel <bschermel@snapchat.com>
Co-authored-by: John Sully <john@csquare.ca>
Co-authored-by: zliang <zliang@snapchat.com>
Co-authored-by: malavan <malavan@eqalpha.com>
Co-authored-by: John Sully <jsully@snapchat.com>
Co-authored-by: jfinity <38383673+jfinity@users.noreply.github.com>
Co-authored-by: benschermel <43507366+benschermel@users.noreply.github.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
Co-authored-by: Karthick Ariyaratnam (A) <k00809413@china.huawei.com>
Co-authored-by: root <paul.chen1@huawei.com>
Co-authored-by: Ilya Shipitsin <chipitsine@gmail.com>
Co-authored-by: Paul Chen <32553156+paulmchen@users.noreply.github.com>
Co-authored-by: Paul Chen <mingchen@Mings-MacBook-Pro.local>
Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
Add ability to modify port, tls-port and bind configurations by CONFIG SET command.
To simplify the code and make it cleaner, a new structure, socketFds,
was added. It contains the file descriptor array and its count,
and is used for the TCP, TLS and Cluster socket file descriptors.
some tests use attach_to_replication_stream to watch what's propagated
to replicas, but in some cases the periodic ping may slip in and fail
the test.
we disable that ping by setting the period to once an hour (tests should
not run for that long).
other change is so that the next time this oom-score-adj test fails,
we'll see the value (assert_equals prints it)
This commit improves the observability of overall error statistics and command statistics within redis-server:
It extends INFO COMMANDSTATS to have
- failed_calls - so we can keep track of errors that happen from the command itself, broken down by command.
- rejected_calls - so we can keep track of errors that were triggered outside the command processing per se
Adds a new section to INFO, named ERRORSTATS, that enables keeping track of the different errors that
occur within redis (within processCommand and call) based on the reply Error Prefix (the first word
after the "-", up to the first space).
This commit also fixes RM_ReplyWithError so that it can be correctly identified as an error reply.
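A minimal sketch of the prefix rule described above, written in Python for illustration only. The function names `error_prefix` and `tally_errors` and the sample replies are hypothetical and are not taken from the server code; the sketch only shows how "the first word after the "-", up to the first space" can serve as a counter key.

```python
from collections import Counter


def error_prefix(reply: str) -> str:
    """Return the first word after the leading '-', up to the first space."""
    if not reply.startswith("-"):
        raise ValueError("not an error reply")
    return reply[1:].split(" ", 1)[0]


def tally_errors(replies):
    """Count error replies by prefix; non-error replies are ignored."""
    return Counter(error_prefix(r) for r in replies if r.startswith("-"))


# Hypothetical replies, used only to exercise the sketch.
replies = [
    "-ERR unknown command 'foo'",
    "+OK",
    "-WRONGTYPE Operation against a key holding the wrong kind of value",
    "-ERR syntax error",
]
stats = tally_errors(replies)
for prefix, count in sorted(stats.items()):
    print(f"{prefix}: {count}")
```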
This adds a new `tls-client-cert-file` and `tls-client-key-file`
configuration directives which make it possible to use different
certificates for the TLS-server and TLS-client functions of Redis.
This is an optional directive. If it is not specified the `tls-cert-file`
and `tls-key-file` directives are used for TLS client functions as well.
Also, `utils/gen-test-certs.sh` now creates additional server-only and client-only certs and will skip intensive operations if target files already exist.
when using --baseport to run two test suites in parallel (different
folders), we need to also make sure the port used by the testsuite to
communicate with its workers is unique. otherwise the attempt to find a
free port connects to the other test suite and messes it up.
maybe one day we need to attempt to bind, instead of connect, when trying
to find a free port.
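As a rough illustration of the bind-based alternative mentioned above (not what the Tcl test suite does today), a free-port probe could look like the Python sketch below. `port_is_free` is a hypothetical helper name; the probe binds instead of connecting, so it never talks to whatever process already owns the port.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Probe a port by trying to bind to it rather than connecting to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Bind failed, most likely because another socket holds the port.
            return False
        return True


# Occupy an ephemeral port, then check that the probe reports it as busy.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
busy_port = holder.getsockname()[1]
busy_result = port_is_free(busy_port)
holder.close()
print(busy_port, busy_result)
```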
The test creates keys with various encodings, DUMPs them, corrupts the payload
and RESTOREs it.
It utilizes the recently added use-exit-on-panic config to distinguish between
asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.
It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.
Changes in the code (other than the test):
- Replace a few NPD (null pointer dereference) flows and division by zero with an
assertion, so that it doesn't fail the test. (since we set the server to use
`exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have led to memory leaks in
RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test doesn't need
to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress
test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This lets us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.
We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.
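The shallow check described above can be sketched roughly as follows. This is a simplified Python illustration, not the server's C code: it assumes a blob whose first four bytes hold its total size as a little-endian 32-bit integer, and the names `shallow_validate` and `make_blob` are hypothetical.

```python
import struct

# Assumption for this sketch: a 32-bit little-endian total-size field at offset 0.
HEADER_FMT = "<I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def shallow_validate(blob: bytes) -> bool:
    """Check that the size encoded in the header matches the actual length."""
    if len(blob) < HEADER_SIZE:
        return False
    (encoded_size,) = struct.unpack_from(HEADER_FMT, blob, 0)
    return encoded_size == len(blob)


def make_blob(body: bytes) -> bytes:
    """Build a blob whose header records the total size (header + body)."""
    return struct.pack(HEADER_FMT, HEADER_SIZE + len(body)) + body


good = make_blob(b"hello")
truncated = good[:-1]  # simulate a corrupted (shortened) payload
print(shallow_validate(good), shallow_validate(truncated))
```

Once the encoded size is known to match the allocation, later offset checks can be bounded by it; deep validation would additionally walk every record.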
configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]
For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by this sanitation, so I'm setting the default value to `no`,
but later on it should be set to `clients` by default.
changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
slowed down by sanitation.
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.
However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.
Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
(cherry picked from commit b002d2b4f1415f4db805081bc8f5b85d00f30e33)
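The "open the file, unlink it and exit with the fd still open" behaviour relies on POSIX semantics: the directory entry disappears immediately, while the data stays reachable through the open descriptor until it is closed. A small Python sketch of that behaviour, assuming a POSIX system (`unlink_while_open` is a hypothetical name, and this is not the bg_unlink implementation):

```python
import os
import tempfile


def unlink_while_open(data: bytes):
    """Write data, unlink the path, then read the data back through the fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.unlink(path)                      # name is removed right away
        still_listed = os.path.exists(path)  # path no longer resolves
        os.lseek(fd, 0, os.SEEK_SET)
        readback = os.read(fd, len(data))    # contents remain readable
    finally:
        os.close(fd)                         # storage is released on close
    return still_listed, readback


still_listed, readback = unlink_while_open(b"temp rdb contents")
print(still_listed, readback)
```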
- add test suite coverage for redis-benchmark
- add --version (similar to what redis-cli has)
- fix bug sending more requests than intended when pipeline > 1.
- when done sending requests, avoid freeing client in the write handler, in theory before
responses are received (probably dead code since the read handler will call clientDone first)
Co-authored-by: Oran Agra <oran@redislabs.com>
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
(cherry picked from commit fe5da2e60d8d6d907062f4789673fbe06fa8773e)
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
(cherry picked from commit 677d14c2137ab50fa25c8163d20b14bc563261c7)
Add Linux kernel OOM killer control option.
This adds the ability to control the Linux OOM killer oom_score_adj
parameter for all Redis processes, depending on the process role (i.e.
master, replica, background child).
An oom-score-adj global boolean flag controls this feature. In addition,
specific values can be configured using oom-score-adj-values if
additional tuning is required.
(cherry picked from commit 2530dc0ebd8be8d792f4673073401377cd5bdc42)
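To make the per-role mapping concrete, here is a hedged Python sketch of how an oom-score-adj-values line could be turned into one value per process role. The role order follows the description above (master, replica, background child). The example numbers, the helper names and the -1000..1000 range check (the Linux kernel's oom_score_adj range) are assumptions of this sketch, not the server's validation rules.

```python
ROLES = ("master", "replica", "bgchild")


def parse_oom_values(line: str) -> dict:
    """Parse 'oom-score-adj-values A B C' into a role -> value mapping."""
    parts = line.split()
    if len(parts) != len(ROLES) + 1 or parts[0] != "oom-score-adj-values":
        raise ValueError("expected 'oom-score-adj-values' plus three integers")
    values = [int(p) for p in parts[1:]]
    for value in values:
        # Kernel range for oom_score_adj; the server may enforce other limits.
        if not -1000 <= value <= 1000:
            raise ValueError(f"value out of range: {value}")
    return dict(zip(ROLES, values))


def apply_oom_score(value: int, path: str = "/proc/self/oom_score_adj") -> None:
    """Write the score for the current process (Linux only, not called here)."""
    with open(path, "w") as handle:
        handle.write(str(value))


# Illustrative values only.
scores = parse_oom_values("oom-score-adj-values 0 200 800")
print(scores)
```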
in the majority of the cases (on this rarely used feature) we want to
stop and be able to connect to the shard with redis-cli.
since these are two different processes interacting with the tty we
need to stop both, and we'll have to hit enter twice, but it's not that
bad considering it is rarely used.
(cherry picked from commit 02ef355f98691adba4126bbdab0d4d2bfe475701)