297 Commits

Author SHA1 Message Date
christianEQ
c502e6f2a1 Merge remote-tracking branch 'origin/unstable' into ci-flags-fix
Former-commit-id: da1f09e9b551cacdfd24dc839ee659a5e5e1e1de
2021-07-14 22:56:15 +00:00
malavan
4509c6e0a1 cleanup based on 6.2.2 merge review
Former-commit-id: 51277b17a7ab4bb5b3f06fd5af8f26257ac35a37
2021-07-14 15:22:27 -04:00
John Sully
da2aceabcf Merge tag '6.2.3' into unstable
Former-commit-id: 1895dbb7680fa9aadf6040912e89c733abc8c706
2021-07-09 04:40:31 +00:00
christianEQ
d10336b007 various branding and cleanup fixes
Former-commit-id: e3c619eca4755c96af83e1959a6ea5ba95734e93
2021-07-08 02:46:42 +00:00
John Sully
1554161bdc Prevent test code crash due to no log data
Former-commit-id: 0a56a73bd98d4e692ae77683fdb9dd644ecfc2eb
2021-06-14 22:06:36 +00:00
christianEQ
cb3f3d1b7e renamed redis test files
Former-commit-id: 1c77104b5efcdfd1fce6a4a946e8a1ead35dc7f0
2021-06-11 19:09:40 +00:00
christianEQ
f8289cebcc removed unreliable musl test and left only accurate new one
Former-commit-id: 386be8990a83fcc5d57aa20a268544a877c2cfd7
2021-06-11 18:19:59 +00:00
John Sully
6f899382ca Prevent partial sync in test that requires only full syncs
Former-commit-id: 1b9fea066914d7f23d6bec220f26b8c0112d7f8b
2021-05-29 02:22:20 +00:00
John Sully
97d6875862 Fix failover command test failures
Former-commit-id: d3c37c7159a92319759a33851669862a82cf1b28
2021-05-29 01:19:12 +00:00
John Sully
5267928381 Merge tag '6.2.2' into unstable
Former-commit-id: 93ebb31b17adec5d406d2e30a5b9ea71c07fce5c
2021-05-21 05:54:39 +00:00
John Sully
fe8efa916b Merge tag '6.2.1' into unstable
Former-commit-id: bfed57e3e0edaa724b9d060a6bb8edc5a6de65fa
2021-05-19 02:59:48 +00:00
bugwz
0851705304 Print the number of abnormal line in AOF (#8823)
When redis-check-aof finds an error, it prints the line number for faster troubleshooting. 

(cherry picked from commit 761d7d27711edfbf737def41ff28f5b325fb16c8)
2021-05-03 22:57:00 +03:00
Oran Agra
a9897b0084
Fix timing of new replication test (#8807)
In github actions CI with valgrind, i saw that even the fast replica
(one that wasn't paused), didn't get to complete the replication fast
enough, and ended up getting disconnected by timeout.

Additionally, due to a typo in uname, we didn't get to actually run the
CPU efficiency part of the test.
2021-04-18 15:12:34 +03:00
guybe7
d63d02601f
Add a timeout mechanism for replicas stuck in fullsync (#8762)
Starting redis 6.0 (part of the TLS feature), diskless master uses pipe from the fork
child so that the parent is the one sending data to the replicas.
This mechanism has an issue in which a hung replica will cause the master to wait
for it to read the data sent to it forever, thus preventing the fork child from terminating
and preventing the creations of any other forks.

This PR adds a timeout mechanism, much like the ACK-based timeout,
we disconnect replicas that aren't reading the RDB file fast enough.
2021-04-15 17:18:51 +03:00
christianEQ
689a5d2a00 check for musl in logging.tcl as backtrace() is not available with it
Former-commit-id: 2c8239f9cb30aa32de936be09522e6429aa40326
2021-04-05 11:09:49 -04:00
Oran Agra
cd81dcf18b
solve race conditions in psync2-pingoff test (#8720)
Another test race condition in the macos tests.
the test was waiting for PINGs to be generated and put on the replication stream,
but waiting for 1 or 2 seconds doesn't really guarantee that.
then the test that expected 6 full syncs, found only 4
2021-03-30 11:41:06 +03:00
Qu Chen
7de6451818
Properly initialize variable to make valgrind happy in checkChildrenDone(). Removed usage for the obsolete wait3() and wait4() in favor of waitpid(), and properly check for the exit status code. (#8666) 2021-03-24 08:41:05 -07:00
Oran Agra
f6e1a94e03
Corrupt stream key access to uninitialized memory (#8681)
the corrupt-dump-fuzzer test found a case where an access to a corrupt
stream would have caused accessing to uninitialized memory.
now it'll panic instead.

The issue was that there was a stream that says it has more than 0
records, but looking for the max ID came back empty handed.

p.s. when sanitize-dump-payload is used, this corruption is detected,
and the RESTORE command is gracefully rejected.
2021-03-24 11:33:49 +02:00
Oran Agra
a7c02b19bf
Fix race in replication test (#8679)
Since redis 6.2, redis immediately tries to connect to the master, not
waiting for replication cron.

in the slow freebsd CI, this test failed and master_link_status was
already "up" when INFO was called.
2021-03-22 10:50:39 +02:00
Yossi Gottlieb
3c7d6a1853
Improve redis-cli non-binary safe string handling. (#8566)
* The `redis-cli --scan` output should honor output mode (set explicitly or implicitly), and quote key names when not in raw mode.
  * Technically this is a breaking change, but it should be very minor since raw mode is by default on for non-tty output.
  * It should only affect  TTY output (human users) or non-tty output if `--no-raw` is specified.

* Added `--quoted-input` option to treat all arguments as potentially quoted strings.
* Added `--quoted-pattern` option to accept a potentially quoted pattern.

Unquoting is applied to potentially quoted input only if single or double quotes are used. 

Fixes #8561, #8563
2021-03-04 15:03:49 +02:00
Yossi Gottlieb
5d180d2834
Fix potential replication-4 test race condition. (#8583)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-03-02 18:12:11 +02:00
Oran Agra
349ef3f6a0
fix stream deep sanitization with deleted records (#8568)
When sanitizing the stream listpack, we need to count the deleted records too.
otherwise the last line that checks the next pointer fails.

Add test to cover that state in the stream tests.
2021-03-01 17:23:29 +02:00
Yossi Gottlieb
95ea74549c
Fix failed tests on Linux Alpine and add a CI job. (#8532)
* Remove linux/version.h dependency.

This introduces unnecessary dependencies, and generally not a good idea
as the platform we build on may be different than the platform we run
on.

To determine if sync_file_range exists we can simply rely on header file
hints.

* Fix setproctitle() on libmusl.

The previous ifdef checks were a bit too strict for no apparent
reason.

* Fix tests failure on Linux with no backtrace.

* Add alpine daily CI job.
2021-02-23 12:57:45 +02:00
uriyage
fd052d2a86
Adds INFO fields to track fork child progress (#8414)
* Adding current_save_keys_total and current_save_keys_processed info fields.
  Present in replication, BGSAVE and AOFRW.
* Changing RM_SendChildCOWInfo() to RM_SendChildHeartbeat(double progress)
* Adding new info field current_fork_perc. Present in Replication, BGSAVE, AOFRW,
  and module forks.
2021-02-16 16:06:51 +02:00
Yossi Gottlieb
141ac8df59
Escape unsafe field name characters in INFO. (#8492)
Fixes #8489
2021-02-15 17:08:53 +02:00
Oran Agra
30775bc3e3
solve race in replication-2 test - again (#8491)
this should make it timing independent and also faster in most cases
2021-02-15 12:50:23 +02:00
Oran Agra
02ab14cc2e
solve race in replication-2 test (#8461)
use SIGSTOP instead of DEBUG SLEEP, reduces the test
time by some 2 seconds and avoids failures on slow machines
2021-02-07 16:22:30 +02:00
Yossi Gottlieb
de6f3ad017
Fix FreeBSD tests and CI Daily issues. (#8438)
* Add bash temporarily to allow sentinel fd leaks test to run.
* Use vmactions-freebsd rdist sync to work around bind permission denied
  and slow execution issues.
* Upgrade to tcl8.6 to be aligned with latest Ubuntu envs.
* Concat all command executions to avoid ignoring failures.
* Skip intensive fuzzer on FreeBSD. For some yet unknown reason, generate_fuzzy_traffic_on_key causes TCL to significantly bloat on FreeBSD resulting with out of memory.
2021-02-03 17:35:28 +02:00
Oran Agra
5a7eb9c881
Fix test issues from introduction of HRANDFIELD (#8424)
* The corrupt dump fuzzer found a division by zero.
* in some cases the random fields from the HRANDFIELD tests produced
  fields with newlines and other special chars (due to \ char), this caused
  the TCL tests to see a bulk response that has a newline in it and add {}
  around it, later it can think this is a nested list. in fact the `alpha` random
  string generator isn't using spaces and newlines, so it should not use `\`
  either.
2021-01-31 12:13:45 +02:00
Allen Farris
0d18a1e85f
implement FAILOVER command (#8315)
Implement FAILOVER command, which coordinates failover
between the server and one of its replicas.
2021-01-28 13:18:05 -08:00
christianEQ
da274b9c14 fixed replicaof no one config, added test case
Former-commit-id: e2615ccf88ddb2a93536b62318983780890c4819
2021-01-28 19:49:22 +00:00
Raghav Muddur
0367a80819
GETEX, GETDEL and SET PXAT/EXAT (#8327)
This commit introduces two new command and two options for an existing command

GETEX <key> [PERSIST][EX seconds][PX milliseconds] [EXAT seconds-timestamp]
[PXAT milliseconds-timestamp]

The getexCommand() function implements extended options and variants of the GET
command. Unlike GET command this command is not read-only. Only one of the options
can be used at a given time.

1. PERSIST removes any TTL associated with the key.
2. EX Set expiry TTL in seconds.
3. PX Set expiry TTL in milliseconds.
4. EXAT Same like EX instead of specifying the number of seconds representing the
    TTL (time to live), it takes an absolute Unix timestamp
5. PXAT Same like PX instead of specifying the number of milliseconds representing the
    TTL (time to live), it takes an absolute Unix timestamp

Command would return either the bulk string, error or nil.

GETDEL <key>
Would delete the key after getting.

SET key value [NX] [XX] [KEEPTTL] [GET] [EX <seconds>] [PX <milliseconds>]
[EXAT <seconds-timestamp>][PXAT <milliseconds-timestamp>]

Two new options added here are EXAT and PXAT

Key implementation notes
- `SET` with `PX/EX/EXAT/PXAT` is always translated to `PXAT` in `AOF`. When relative time is
  specified (`PX/EX`), replication will always use `PX`.
- `setexCommand` and `psetexCommand` would no longer need translation in `feedAppendOnlyFile`
  as they are modified to invoke `setGenericCommand ` with appropriate flags which will take care of
  correct AOF translation.
- `GETEX` without any optional argument behaves like `GET`.
- `GETEX` command is never propagated, It is either propagated as `PEXPIRE[AT], or PERSIST`.
- `GETDEL` command is propagated as `DEL`
- Combined the validation for `SET` and `GETEX` arguments. 
- Test cases to validate AOF/Replication propagation
2021-01-27 19:47:26 +02:00
christianEQ
358debebfa Merge tag 'tags/6.0.10' into redismerge_2021-01-20
Former-commit-id: dadce055f897cee83946c2d3e5cbb76341b94230
2021-01-26 21:43:09 +00:00
Yossi Gottlieb
522d93607a
Add io-thread daily CI tests. (#8232)
This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled.

Also provides some necessary additional improvements to the test suite:

* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
2021-01-17 15:48:48 +02:00
Oran Agra
8dd16caec8
Fix last COW INFO report, Skip test on non-linux platforms (#8301)
- the last COW report wasn't always read from the pipe
  (receiveLastChildInfo wasn't used)
- but in fact, there's no reason we won't always try to drain that pipe
  so i'm unifying receiveLastChildInfo with receiveChildInfo
- adjust threshold of the COW test when run in accurate mode
- add some prints in case this test fails again
- fix indentation, page size, and PID! in MacOS proc info

p.s. it seems that pri_pages_dirtied is always 0
2021-01-08 23:35:30 +02:00
YaacovHazan
ea930a352c Report child copy-on-write info continuously
Add INFO field, rdb_active_cow_size, to report COW of a live fork child while
it's active.
- once in 1024 keys check the time, and if there's more than one second since
  the last report send a report to the parent via the pipe.
- refactor the child_info_data struct, it's an implementation detail that
  shouldn't be in the server struct, and not used to communicate data between
  caller and callee
- remove the magic value from that struct (not sure what it was good for), and
  instead add handling of short reads.
- add another value to the structure, cow_type, to indicate if the report is
  for the new rdb_active_cow_size field, or it's the last report of a
  successful operation
- add new Module API to report the active COW
- add more asserts variants to test.tcl
2021-01-07 16:14:29 +02:00
Oran Agra
cfb449cc80
Sanitize dump payload: excessive free on dup zset fields (#8189) 2020-12-14 17:10:31 +02:00
Oran Agra
7ca00d694d Sanitize dump payload: fail RESTORE if memory allocation fails
When RDB input attempts to make a huge memory allocation that fails,
RESTORE should fail gracefully rather than die with panic
2020-12-06 14:54:34 +02:00
Oran Agra
3716950cfc Sanitize dump payload: validate no duplicate records in hash/zset/intset
If RESTORE passes successfully with full sanitization, we can't affort
to crash later on assertion due to duplicate records in a hash when
converting it form ziplist to dict.
This means that when doing full sanitization, we must make sure there
are no duplicate records in any of the collections.
2020-12-06 14:54:34 +02:00
Oran Agra
c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Oran Agra
01c13bddea Sanitize dump payload: improve tests of ziplist and stream encodings
- improve stream rdb encoding test to include more types of stream metadata
- add test to cover various ziplist encoding entries (although it does
  look like the stress test above it is able to find some too
- add another test for ziplist encoding for hash with full sanitization
- add similar ziplist encoding tests for list
2020-12-06 14:54:34 +02:00
Oran Agra
ca1c182567 Sanitize dump payload: ziplist, listpack, zipmap, intset, stream
When loading an encoded payload we will at least do a shallow validation to
check that the size that's encoded in the payload matches the size of the
allocation.
This let's us later use this encoded size to make sure the various offsets
inside encoded payload don't reach outside the allocation, if they do, we'll
assert/panic, but at least we won't segfault or smear memory.

We can also do 'deep' validation which runs on all the records of the encoded
payload and validates that they don't contain invalid offsets. This lets us
detect corruptions early and reject a RESTORE command rather than accepting
it and asserting (crashing) later when accessing that payload via some command.

configuration:
- adding ACL flag skip-sanitize-payload
- adding config sanitize-dump-payload [yes/no/clients]

For now, we don't have a good way to ensure MIGRATE in cluster resharding isn't
being slowed down by these sanitation, so i'm setting the default value to `no`,
but later on it should be set to `clients` by default.

changes:
- changing rdbReportError not to `exit` in RESTORE command
- adding a new stat to be able to later check if cluster MIGRATE isn't being
  slowed down by sanitation.
2020-12-06 14:54:34 +02:00
luhuachao
7885faf18b
Modify help msg PING_BULK to PING_MBULK in benchmark (#8109)
As described in redis-benchamrk help message 'The test names are the same as the ones produced as output.', In redis-benchmark output, we can only see PING_BULK, but the cmd `redis-benchmark -t ping_bulk` is not supported. We  have to run it with ping_mbulk which is not user friendly.
2020-12-02 13:17:25 +02:00
John Sully
c179c98870 Fix issue where active replication doesn't replicate RDB data
Former-commit-id: 527b7eb0742567302e0343e3acbed9814c0cbb95
2020-11-23 02:01:40 +00:00
John Sully
e8753d1b4b Blocking clients should not crash if an active replica loads a remote RDB with a key in the blocklist
Former-commit-id: 1c525e20b10e0a47af687a0d46faf75229a1cbf5
2020-11-19 23:28:01 +00:00
filipe oliveira
10b5006934
Enable specifying TLS ciphers(suites) in redis-cli/redis-benchmark (#8005)
Enable specifying the preferred ciphers and/or ciphersuites for redis-cli/redis-benchmark.

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2020-11-04 14:49:15 +02:00
Meir Shpilraien (Spielrein)
762be79f0b
Disable new SIGABRT test on valgrind (#8013)
The crash reports cause false-positive warnings when run with valgrind.
2020-11-04 13:13:55 +02:00
Meir Shpilraien (Spielrein)
f210e197f3
Added crash report on SIGABRT (#8004)
The reason that we want to get a full crash report on SIGABRT
is that the jmalloc, when detecting a corruption, calls abort().
This will cause the Redis to exist silently without any report
and without any way to analyze what happened.
2020-11-03 14:59:21 +02:00
Oran Agra
9122379abc
Propagate GETSET and SET-GET as SET (#7957)
- Generates a more backwards compatible command stream
- Slightly more efficient execution in replica/AOF
- Add a test for coverage
2020-11-03 14:56:57 +02:00
Wang Yuan
dc899c4c88
Fix timing dependence in replication tcl tests (#7969)
Remove 'fork child $pid' log in replication.tcl
2020-10-27 09:36:42 +02:00