101 Commits

Author SHA1 Message Date
John Sully
5267928381 Merge tag '6.2.2' into unstable
Former-commit-id: 93ebb31b17adec5d406d2e30a5b9ea71c07fce5c
2021-05-21 05:54:39 +00:00
John Sully
fe8efa916b Merge tag '6.2.1' into unstable
Former-commit-id: bfed57e3e0edaa724b9d060a6bb8edc5a6de65fa
2021-05-19 02:59:48 +00:00
Oran Agra
f4b5a4d869
Improve testsuite print of log file (#8805)
1. the `dump_logs` option would have printed only logs of servers that were
   spawn before the test proc started, and not ones that the test proc
   started inside it.
2. when a server proc catches an exception it should normally forward the
   exception upwards, specifically when it's an assertion that should be
   caught by a test proc above. however, in `durable` mode, we caught all
   exceptions printed them to stdout and let the code continue,
   this was wrong to do for assertions, which should have still been
   propagated to the test function.
3. don't bother to search for crash log to print if we printed the the
   entire log anyway
4. if no crash log was found, no need to print anything (i.e. the fact it
   wasn't found)
5. rename warnings_from_file to crashlog_from_file
2021-04-18 11:55:54 +03:00
Huang Zhw
a19c4058be
When tests exit normally, some processes may still be alive (#8647)
In certain scenario start_server may think it failed to start a redis server
although it started successfully. in these cases, it'll not terminate it, and
it'll remain running when the test is over.

In start_server if config doesn't have bind (the minimal.conf in introspection.tcl),
it will try to bind ipv4 and ipv6. One may success while other fails. It will
output "Could not create server TCP listening socket".
wait_server_started uses this message to check whether instance started
successfully. So it will consider that it failed even though redis started successfully.

Additionally, in some cases it wasn't clear to users why the server exited,
since the warning message printed to the log, could in some cases be harmless,
and in some cases fatal.

This PR adds makes a clear distinction between a warning log message and
a fatal one, and changes the test suite to look for the fatal message.
2021-03-16 17:25:30 +02:00
Madelyn Olson
c6f0ea2c81
Allow stopped redis processes to be killed in tests (#8552) 2021-02-24 14:26:16 -08:00
Yossi Gottlieb
5b8350aaaa
Add --dump-logs tests option. (#8459)
Dump the entire server log if a test failed, to easy troubleshooting
with no access to log files.
2021-02-07 12:37:24 +02:00
christianEQ
358debebfa Merge tag 'tags/6.0.10' into redismerge_2021-01-20
Former-commit-id: dadce055f897cee83946c2d3e5cbb76341b94230
2021-01-26 21:43:09 +00:00
Yossi Gottlieb
522d93607a
Add io-thread daily CI tests. (#8232)
This adds basic coverage to IO threads by running the cluster and few selected Redis test suite tests with the IO threads enabled.

Also provides some necessary additional improvements to the test suite:

* Add --config to sentinel/cluster tests for arbitrary configuration.
* Fix --tags whitelisting which was broken.
* Add a `network` tag to some tests that are more network intensive. This is work in progress and more tests should be properly tagged in the future.
2021-01-17 15:48:48 +02:00
Yang Bodong
ee59dc1b5c
Tests: fix the problem that Darwin memory leak detection may fail (#8213)
Apparently the "leaks" took reports a different error string about process
that's not found in each version of MacOS.
This cause the test suite to fail on some OS versions, since some tests terminate
the process before looking for leaks.
Instead of looking at the error string, we now look at the (documented) exit code.
2020-12-23 16:28:17 +02:00
Yossi Gottlieb
8c291b97b9
TLS: Add different client cert support. (#8076)
This adds a new `tls-client-cert-file` and `tls-client-key-file`
configuration directives which make it possible to use different
certificates for the TLS-server and TLS-client functions of Redis.

This is an optional directive. If it is not specified the `tls-cert-file`
and `tls-key-file` directives are used for TLS client functions as well.

Also, `utils/gen-test-certs.sh` now creates additional server-only and client-only certs and will skip intensive operations if target files already exist.
2020-12-11 18:31:40 +02:00
Oran Agra
c31055db61 Sanitize dump payload: fuzz tester and fixes for segfaults and leaks it exposed
The test creates keys with various encodings, DUMP them, corrupt the payload
and RESTORES it.
It utilizes the recently added use-exit-on-panic config to distinguish between
 asserts and segfaults.
If the restore succeeds, it runs random commands on the key to attempt to
trigger a crash.

It runs in two modes, one with deep sanitation enabled and one without.
In the first one we don't expect any assertions or segfaults, in the second one
we expect assertions, but no segfaults.
We also check for leaks and invalid reads using valgrind, and if we find them
we print the commands that lead to that issue.

Changes in the code (other than the test):
- Replace a few NPD (null pointer deference) flows and division by zero with an
  assertion, so that it doesn't fail the test. (since we set the server to use
  `exit` rather than `abort` on assertion).
- Fix quite a lot of flows in rdb.c that could have lead to memory leaks in
  RESTORE command (since it now responds with an error rather than panic)
- Add a DEBUG flag for SET-SKIP-CHECKSUM-VALIDATION so that the test don't need
  to bother with faking a valid checksum
- Remove a pile of code in serverLogObjectDebugInfo which is actually unsafe to
  run in the crash report (see comments in the code)
- fix a missing boundary check in lzf_decompress

test suite infra improvements:
- be able to run valgrind checks before the process terminates
- rotate log files when restarting servers
2020-12-06 14:54:34 +02:00
Yossi Gottlieb
79e6ab31d8 Fix tests failure on busybox systems. (#7916)
(cherry picked from commit ef92f507dd0c402a916c16435e7f3f92598b7242)
2020-10-27 09:12:01 +02:00
Yossi Gottlieb
ef92f507dd
Fix tests failure on busybox systems. (#7916) 2020-10-18 14:50:29 +03:00
John Sully
4f18a247e3 Merge tag '6.0.8' into unstable
Former-commit-id: 4c7e4b91a6bb2034636856b608b8c386d07f5541
2020-09-30 19:47:55 +00:00
Yossi Gottlieb
e9fef49e12 Tests: fix unmonitored servers. (#7756)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.

(cherry picked from commit 2df4cb93acabf10bb0ff39c12030791b0947e719)
2020-09-10 14:09:00 +03:00
Oran Agra
d410dc3162 Improve valgrind support for cluster tests (#7725)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.

(cherry picked from commit 2b998de46078c172c6b19ac3b779318e7992c60a)
2020-09-10 14:09:00 +03:00
Oran Agra
41c7c7919c test infra - add durable mode to work around test suite crashing
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.

(cherry picked from commit fe5da2e60d8d6d907062f4789673fbe06fa8773e)
2020-09-10 14:09:00 +03:00
Oran Agra
72d6f966ac test infra - flushall between tests in external mode
(cherry picked from commit b65e5aca86b9c2d24b96abc8414a45f9907b6f7d)
2020-09-10 14:09:00 +03:00
Oran Agra
28e074608c test infra - improve test skipping ability
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)

(cherry picked from commit 677d14c2137ab50fa25c8163d20b14bc563261c7)
2020-09-10 14:09:00 +03:00
Oran Agra
5b8de5b7f2 test infra - reduce disk space usage
this is important when running a test with --loop

(cherry picked from commit e3e69c25fd05b608f5ea8d612bc0e377922a6115)
2020-09-10 14:09:00 +03:00
Oran Agra
bce350c666 test infra - write test name to logfile
(cherry picked from commit 9d527d076b17851b87bc95aa34cca8fa5a91d41b)
2020-09-10 14:09:00 +03:00
Yossi Gottlieb
2df4cb93ac
Tests: fix unmonitored servers. (#7756)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.
2020-09-07 17:30:36 +03:00
Oran Agra
2b998de460
Improve valgrind support for cluster tests (#7725)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings which seem non relevant.
2020-09-06 11:11:49 +03:00
Oran Agra
fe5da2e60d test infra - add durable mode to work around test suite crashing
in some cases a command that returns an error possibly due to a timing
issue causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.
2020-09-06 09:59:19 +03:00
Oran Agra
b65e5aca86 test infra - flushall between tests in external mode 2020-09-06 09:59:19 +03:00
Oran Agra
677d14c213 test infra - improve test skipping ability
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)
2020-09-06 09:59:19 +03:00
Oran Agra
e3e69c25fd test infra - reduce disk space usage
this is important when running a test with --loop
2020-09-06 09:59:19 +03:00
Oran Agra
9d527d076b test infra - write test name to logfile 2020-09-06 09:59:19 +03:00
Oran Agra
e8aa5583d0 testsuite may leave servers alive on error (#7549)
in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.

luckly, in most cases the pattern is:
start_server {
  test name {
  }
}

(cherry picked from commit 36b949438547eb5bf8555fcac2c5040528fd7854)
2020-09-01 09:27:58 +03:00
Oran Agra
36b9494385
testsuite may leave servers alive on error (#7549)
in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.

luckly, in most cases the pattern is:
start_server {
  test name {
  }
}
2020-07-21 16:56:19 +03:00
Oran Agra
1104113c07 tests/valgrind: don't use debug restart (#7404)
* tests/valgrind: don't use debug restart

DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
   have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.

So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.

since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes

(cherry picked from commit 69ade87325eedebdb44760af9a8c28e15381888e)
2020-07-20 21:08:26 +03:00
Oran Agra
69ade87325
tests/valgrind: don't use debug restart (#7404)
* tests/valgrind: don't use debug restart

DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
   have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.

So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.

since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes
2020-07-10 08:26:52 +03:00
John Sully
cfe9f8f3bc Merge tag '6.0.4' into unstable
Redis 6.0.4.


Former-commit-id: 9c31ac7925edba187e527f506e5e992946bd38a6
2020-05-29 00:57:07 -04:00
Oran Agra
a2ae463520 tests: each test client work on a distinct port range
apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.

this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop

see Probmem 2 in #7314
2020-05-28 10:09:51 +02:00
Oran Agra
e258a1c087 tests: each test client work on a distinct port range
apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.

this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop

see Probmem 2 in #7314
2020-05-26 11:17:08 +03:00
John Sully
b6500a08dc Merge commit '026cc11b056f063631d990f1a9db45b4e583974e' into unstable
Former-commit-id: 95cecb0229af0278cf614ffd746ba829ae7c897c
2020-05-21 17:45:15 -04:00
John Sully
3e2ba10d00 Run all KeyDB instances in testmode during tests
Former-commit-id: cd306f1d23f4fbb900433edbf55d89099bbf903c
2020-04-15 22:27:04 -04:00
John Sully
aa48225a76 Merge commit '76d57161d960f27ac92765118e3b80fef8847b8f' into redis_6_merge
Former-commit-id: 320bc3c0329ff9e5a980b79426b719addae381cf
2020-04-14 21:04:42 -04:00
John Sully
f27524674a Merge commit '973297336fc05a601e17be70aba88e5dca6480ae' into redis_6_merge
Former-commit-id: ef1236b6009ebd7b00f6dd2f43df57ad95e51253
2020-04-14 20:19:48 -04:00
Oran Agra
02b594f6ad diffrent fix for runtest --host --port 2020-04-07 16:52:28 +02:00
Oran Agra
cf3789f045 diffrent fix for runtest --host --port 2020-04-06 09:41:14 +03:00
bodong.ybd
76d57161d9 Fix bug of tcl test using external server 2020-03-25 15:54:34 +01:00
bodong.ybd
336458d4b5 Fix bug of tcl test using external server 2020-03-11 21:01:27 +08:00
antirez
ef3551d149 Test engine: experimental change to avoid busy port problems. 2020-02-27 18:02:30 +01:00
antirez
72c053519c Test engine: detect timeout when checking for Redis startup. 2020-02-27 18:02:30 +01:00
antirez
294c9af469 Test engine: better tracking of what workers are doing. 2020-02-27 18:00:47 +01:00
antirez
73305861f5 Test engine: experimental change to avoid busy port problems. 2020-02-24 10:46:23 +01:00
antirez
e78c4e813c Test engine: detect timeout when checking for Redis startup. 2020-02-21 18:55:56 +01:00
antirez
c6954de3ea Test engine: better tracking of what workers are doing. 2020-02-21 17:08:45 +01:00
John Sully
8e5fe97525 Merge remote-tracking branch 'redis/6.0' into redis_merge
Former-commit-id: ef9a3cadcf94326bf2f163db7698aad9a3c01690
2020-01-27 02:55:48 -05:00