153 Commits

Author SHA1 Message Date
Oran Agra
826b49bcb4 handle cur_test for nested tests
if there are nested tests and nested servers, we need to restore the
previous value of cur_test when a test exits.

example:
```
test {test 1} {
    start_server {
        test {test 1.1 - master only} {
        }
        start_server {
            test {test 1.2 - with replication} {
            }
        }
    }
}
```
when `test 1.1 - master only` exits, we're still inside `test 1`

(cherry picked from commit 610b4ff16a62062338588c4508a73784fb962c0b)
2020-09-10 14:09:00 +03:00
bodong.ybd
cb4f96657b Tests: Some fixes for macOS
1) cur_test: when calling restart_server, a "no such variable" error occurs
  ./runtest --single integration/rdb
  test {client freed during loading}
      SET ::cur_test
      restart_server
        kill_server
          test "Check for memory leaks (pid $pid)"
          SET ::cur_test
          UNSET ::cur_test
      UNSET ::cur_test // This global variable has been unset.

2) `ps --ppid` is not available on macOS; it can be replaced with
`pgrep -P pid`.

(cherry picked from commit e90385e2232d41fd7c40dc239279f9837e7bdf57)
2020-09-10 14:09:00 +03:00
Yossi Gottlieb
9275c8b990 Tests: fix unmonitored servers. (#7756)
There is an inherent race condition in port allocation for spawned
servers. If a server fails to start because a port is taken, a new port
is allocated. This fixes a problem where the logs are not truncated and
as a result a large number of unmonitored servers are started.

(cherry picked from commit 871e85b8a75a53f90044ac04b0f5a9ba415c3bfa)
2020-09-10 14:09:00 +03:00
Oran Agra
540841d6f7 Improve valgrind support for cluster tests (#7725)
- redirect valgrind reports to a dedicated file rather than console
- try to avoid killing instances with SIGKILL so that we get the memory
  leak report (killing with SIGTERM before resorting to SIGKILL)
- search for valgrind reports when done, print them and fail the tests
- add --dont-clean option to keep the logs on exit
- fix exit error code when crash is found (would have exited with 0)
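
The SIGTERM-before-SIGKILL escalation from the second bullet can be sketched in Python as follows. `stop_instance` and the 5-second grace period are assumptions for illustration, not the cluster test harness's actual names or timeouts; the sketch is POSIX-only.

```python
import subprocess


def stop_instance(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask the process to exit with SIGTERM; fall back to SIGKILL after `grace` seconds."""
    proc.terminate()  # SIGTERM: gives valgrind a chance to write its leak report
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: last resort, no leak report will be produced
        return proc.wait()


# Usage: a process that exits on SIGTERM reports returncode -15 on POSIX.
# rc = stop_instance(subprocess.Popen(["sleep", "30"]))
```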

changes that affect the normal redis test suite:
- refactor check_valgrind_errors into two functions, one to search and
  one to report
- move the search half into util.tcl to serve the cluster tests too
- ignore "address range perms" valgrind warnings, which seem irrelevant.

(cherry picked from commit da723a917dec7f2514d821a615668e158bb4f60c)
2020-09-10 14:09:00 +03:00
Oran Agra
81476c0cf7 test infra - add durable mode to work around test suite crashing
in some cases, a command that returns an error (possibly due to a timing
issue) causes the tcl code to crash and thus prevents the rest of the
tests from running. this adds an option to make the test proceed despite
the crash.
maybe it should be the default mode some day.

(cherry picked from commit cf22e8eb91c2c1a769fda4c4de9eba3163dd7f05)
2020-09-10 14:09:00 +03:00
Oran Agra
e001152825 test infra - wait_done_loading
reduce code duplication in aof.tcl.
move creation of clients into the test so that it can be skipped

(cherry picked from commit cc455a710cc68d0fd8243cd1f04c5ee7332e4fdb)
2020-09-10 14:09:00 +03:00
Oran Agra
f180326b65 test infra - flushall between tests in external mode
(cherry picked from commit 2468c17a3229ae37825466a18dce9a5272eeef30)
2020-09-10 14:09:00 +03:00
Oran Agra
575d07b7a8 test infra - improve test skipping ability
- skip full units
- skip a single test (not just a list of tests)
- when skipping tag, skip spinning up servers, not just the tests
- skip tags when running against an external server too
- allow using multiple tags (split them)

(cherry picked from commit 5c61f1a6ed876186b944e79f903354cd81077bb6)
2020-09-10 14:09:00 +03:00
Oran Agra
7d3cec9686 test infra - reduce disk space usage
this is important when running a test with --loop

(cherry picked from commit fc18f16260d15b3584d92f73cebafa3a552e2686)
2020-09-10 14:09:00 +03:00
Oran Agra
60bec0c20c test infra - write test name to logfile
(cherry picked from commit e783c03dd1828fbf67259ee037a4faf835c4700a)
2020-09-10 14:09:00 +03:00
Yossi Gottlieb
c5675c66bc Tests: fix redis-cli with remote hosts. (#7693)
(cherry picked from commit 257f9f462f7782dcaecf7bbf35f4701b20b88a45)
2020-09-01 09:27:58 +03:00
Oran Agra
10a8407a4f Fix failing tests due to issues with wait_for_log_message (#7572)
- the test now waits for a specific set of log messages rather than waiting
  for a timeout looking for just one message.
- we don't want to sample the current length of the log after an action,
  due to a race; we need to start the search from the line number of the
  last message we were waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
  where the replica manages to re-connect before we completed the set of
  actions that should force a full sync.
- fix verify_log_message which was broken and unused

(cherry picked from commit 06aaeabaea9d9b248e8a790dde352cd14d66628a)
2020-09-01 09:27:58 +03:00
Oran Agra
2b45c88a6a testsuite may leave servers alive on error (#7549)
in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and left one alive.

luckily, in most cases the pattern is:
start_server {
  test name {
  }
}

(cherry picked from commit bb170fa06e5909dd816b6530121952d57c8209a0)
2020-09-01 09:27:58 +03:00
Remi Collet
443e57b08e Fix deprecated tail syntax in tests (#7543)
(cherry picked from commit 7853d8410b12c3ffac699c8a2e06f2a8e6df26b0)
2020-09-01 09:27:58 +03:00
Oran Agra
905ffb72e9 runtest --stop pause stops before terminating the redis server (#7513)
in the majority of the cases (on this rarely used feature) we want to
stop and be able to connect to the shard with redis-cli.
since these are two different processes interacting with the tty we
need to stop both, and we'll have to hit enter twice, but it's not that
bad considering it is rarely used.

(cherry picked from commit 3351549c22434337dfa8a262dce678679a35d7da)
2020-07-20 21:08:26 +03:00
Oran Agra
c994e73c8e stabilize tests that look for log lines (#7367)
tests were sensitive to additional log lines appearing in the log,
causing the search to come up empty-handed.

instead of just looking for the n last log lines, capture the log lines
before performing the action, and then search from that offset.
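
A minimal Python sketch of "capture the offset, act, then search from that offset". `count_log_lines` and `search_log_from` are hypothetical helpers; a real helper would also poll until a timeout, which this sketch leaves out.

```python
def count_log_lines(path: str) -> int:
    """Return the number of lines currently in the log file."""
    with open(path) as f:
        return sum(1 for _ in f)


def search_log_from(path: str, pattern: str, from_line: int):
    """Return the first line at or after `from_line` (0-based) containing `pattern`, else None."""
    with open(path) as f:
        for lineno, line in enumerate(f):
            if lineno >= from_line and pattern in line:
                return line.rstrip("\n")
    return None


# Usage:
#   offset = count_log_lines(logfile)      # before the action
#   ... perform the action ...
#   search_log_from(logfile, "some message", offset)
```

Lines written before the action are never matched, so extra log lines emitted later cannot push the expected message out of a fixed-size tail window.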

(cherry picked from commit efc4189b6227a17f26ed9bd6bbac62bf4bf7ab66)
2020-07-20 21:08:26 +03:00
Oran Agra
298e93c360 tests/valgrind: don't use debug restart (#7404)
* tests/valgrind: don't use debug restart

DEBUG RESTART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
   have a chance to check for errors, so leaks go unreported.
2. valgrind reports invalid calls to close() which we're unable to resolve.

So now the tests use the restart_server mechanism, which terminates the
old server and starts a new one (new PID, but same stdout and stderr).

since the stderr can contain two or more valgrind reports, it is not enough
to just check for the absence of leaks; we also need to check for some known
errors. we do both, and fail if we either find an error or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes

(cherry picked from commit 8d4f055e43ab554adfce617c971f10c4b6423484)
2020-07-20 21:08:26 +03:00
Oran Agra
571b03021a tests: find_available_port start search from next port
i.e. don't start the search from scratch hitting the used ones again.
this will also reduce the likelihood of collisions (if there are any
left) by increasing the time until we re-use a port we did use in the
past.
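
A Python sketch of the "resume from the next port, wrap around inside the range" search. `PortAllocator` and the `is_free` predicate are hypothetical; in the real suite the free-port check probes actual sockets.

```python
from typing import Callable


class PortAllocator:
    """Hand out ports from [start, start + count), resuming after the last port used."""

    def __init__(self, start: int, count: int, is_free: Callable[[int], bool]):
        self.start = start
        self.count = count
        self.is_free = is_free       # caller-supplied check, e.g. a socket probe
        self.last_port = start - 1   # so the very first search begins at `start`

    def find_available_port(self) -> int:
        port = self.last_port + 1    # continue after the previous hit
        for _ in range(self.count):
            if not (self.start <= port < self.start + self.count):
                port = self.start    # wrap around to the beginning of the range
            if self.is_free(port):
                self.last_port = port
                return port
            port += 1
        raise RuntimeError(f"no free port in {self.start}-{self.start + self.count - 1}")
```

Each port is reused only after the rest of the range has been handed out, which is what lengthens the time before a recently used port comes back.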
2020-05-28 10:09:51 +02:00
Oran Agra
4653d796f0 tests: each test client work on a distinct port range
apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shut down a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.

this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop

see Problem 2 in #7314
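
One way to give each test client its own port range is to split a shared range into equal, non-overlapping slices by client index. The Python sketch below assumes that scheme; `client_port_range`, the base port 21111 and the 8000-port span are illustrative values, not taken from the commit.

```python
def client_port_range(base_port: int, total_ports: int, num_clients: int, client_id: int) -> range:
    """Split [base_port, base_port + total_ports) into equal, non-overlapping slices."""
    if not 0 <= client_id < num_clients:
        raise ValueError("client_id out of range")
    per_client = total_ports // num_clients   # leftover ports (if any) go unused
    start = base_port + client_id * per_client
    return range(start, start + per_client)


# Usage: with 16 clients, client 0 gets ports 21111..21610.
# client_port_range(21111, 8000, 16, 0)
```

Since the slices never overlap, a port freed by one client (such as a shut-down master's) can only be reused by the same client.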
2020-05-28 10:09:51 +02:00
WuYunlong
65de5a1a7d Handle keys with hash tag when computing hash slot using tcl cluster client. 2020-05-22 12:37:49 +02:00
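
The hash-tag rule comes from the Redis Cluster specification. If the key contains a `{`, and a `}` follows it with at least one character in between, only the substring between them is hashed. The slot is the CRC16 (XMODEM variant) of that string modulo 16384. Below is a Python sketch of the rule; the function names are illustrative and are not the Tcl cluster client's.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: poly 0x1021, init 0x0000, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, hashing only a non-empty {hash tag} if present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)   # first '}' after the first '{'
        if end != -1 and end != start + 1:
            key = key[start + 1:end]      # hash only the tag
    return crc16_xmodem(key) % 16384
```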
Oran Agra
dcd6726366 different fix for runtest --host --port 2020-04-07 16:52:28 +02:00
bodong.ybd
c609bf3f2c Fix bug of tcl test using external server 2020-03-25 15:54:34 +01:00
antirez
2542ff9c25 Test engine: experimental change to avoid busy port problems. 2020-02-27 18:02:30 +01:00
antirez
22ad06eafd Test engine: detect timeout when checking for Redis startup. 2020-02-27 18:02:30 +01:00
antirez
2d9a144515 Test engine: better tracking of what workers are doing. 2020-02-27 18:00:47 +01:00
Oran Agra
5752b1718b test infra: improve prints on failed assertions
sometimes we have several assertions with the same condition in the same test
at different stages, and when these fail (the ones that print the condition
text) you don't know which one it was. other assertions didn't print the
condition text (variable names), just the expected and unexpected values.

So now, all assertions print the context line and the condition text.

besides, one of the major differences between 'assert' and 'assert_equal'
is that the latter is able to print the value that doesn't match the expected one.
if there is a rare non-reproducible failure, it is helpful to know what
value the test encountered and how far it was from the threshold.

So now, adding assert_lessthan and assert_range, which can be used in some
places where we used just 'assert { a > b }' so far.
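
The point of these helpers is that a failure reports the offending value, not just the condition. A Python sketch of the idea: the message wording and the inclusive bounds of `assert_range` are assumptions, not the Tcl helpers' exact behavior.

```python
def assert_lessthan(value, expected, detail: str = "") -> None:
    """Fail with both values in the message, unlike a bare `assert value < expected`."""
    if not value < expected:
        raise AssertionError(f"Expected '{value}' to be less than '{expected}' {detail}".rstrip())


def assert_range(value, lo, hi, detail: str = "") -> None:
    """Fail unless lo <= value <= hi (inclusive bounds assumed), showing the actual value."""
    if not lo <= value <= hi:
        raise AssertionError(f"Expected '{value}' to be between '{lo}' and '{hi}' {detail}".rstrip())


# Usage: a rare failure now shows how far the value was from the threshold.
# assert_range(42, 1, 10, "(replication lag)")
# -> AssertionError: Expected '42' to be between '1' and '10' (replication lag)
```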
2019-10-29 17:38:12 +02:00
Yossi Gottlieb
85d7f38136 Merge remote-tracking branch 'upstream/unstable' into tls 2019-10-16 17:08:07 +03:00
Yossi Gottlieb
df08b624bd TLS: Configuration options.
Add configuration options for TLS protocol versions, ciphers/cipher
suites selection, etc.
2019-10-07 21:07:27 +03:00
Yossi Gottlieb
10ffeb03e4 TLS: Connections refactoring and TLS support.
* Introduce a connection abstraction layer for all socket operations and
integrate it across the code base.
* Provide an optional TLS connections implementation based on OpenSSL.
* Pull a newer version of hiredis with TLS support.
* Tests, redis-cli updates for TLS support.
2019-10-07 21:06:13 +03:00
Madelyn Olson
364c8601e3 Allowed passing in of password hash and fixed config rewrite 2019-09-30 17:57:49 +02:00
Oran Agra
73a945c73c prevent diskless replica from terminating on short read
now that replica can read rdb directly from the socket, it should avoid exiting
on short read and instead try to re-sync.

this commit tries to have minimal effects on non-diskless rdb reading.
and includes a test that tries to trigger this scenario on various read cases.
2019-07-17 16:46:22 +02:00
Oran Agra
29754ebe22 diskless replication on slave side (don't store rdb to file), plus some other related fixes
The diskless replication implementation was diskless only on the master side.
The slave side was still storing the received rdb file to the disk before loading it back in and parsing it.

This commit adds two modes to load rdb directly from socket:
1) when-empty
2) using "swapdb"
the third mode of using diskless slave by flushdb is risky and currently not included.

other changes:
--------------
distinguish between aof configuration and state so that we can re-enable aof only when sync eventually
succeeds (and not when exiting from readSyncBulkPayload after a failed attempt)
also a CONFIG GET and INFO during rdb loading would have lied

When loading rdb from the network, don't kill the server on short read (that can be a network error)

Fix rdb check when performed on preamble AOF

tests:
run replication tests for diskless slave too
make replication test a bit more aggressive
Add test for diskless load swapdb
2019-07-08 15:37:48 +03:00
Oran Agra
05d6b8b9ae fix small test suite race conditions 2018-11-12 10:26:10 +02:00
maya-rv
90a82823d6 Fix typo 2018-09-04 13:32:02 +03:00
Oran Agra
dfdeb6a036 test suite convenience improvements
* allowing --single to be repeated
* adding --only so that only a specific test inside a unit can be run
* adding --skiptill, useful to resume a test that crashed past the problematic unit.
  useful together with --clients 1
* adding --skipfile to use a file containing a list of test names to skip
* printing the names of the tests that are skipped by skipfile or denytags
* adding --config to add config file options from the command line
2018-07-30 19:13:15 +03:00
antirez
dc475bcedd Test: fix lshuffle by providing the "K" combinator. 2018-07-13 17:52:39 +02:00
antirez
51bb49d8f2 Test: add lshuffle in the Tcl utility functions set. 2018-07-13 17:51:03 +02:00
Oran Agra
4436bff8a2 test suite infra improvements and fix
* fail the test (exit code) in case of timeout.
* add --wait-server to allow attaching a debugger
* add --dont-clean to keep log files when tests are done
2018-06-26 20:23:55 +03:00
antirez
435160dcae Fix test "server is up" detection after logging changes. 2016-12-19 16:49:58 +01:00
Oran Agra
7496636280 various cleanups and minor fixes 2016-04-25 16:49:57 +03:00
antirez
e9cd997869 Fix to Cluster test to support @busport format. 2016-02-02 11:03:53 +01:00
antirez
cdfd6a2607 Test: support for stack logging for OSX malloc/leaks. 2015-10-01 13:02:25 +02:00
antirez
0aa35539c6 Test: csvdump now scans all DBs. 2015-08-05 12:27:15 +02:00
antirez
2684b76b66 Test: be more patient waiting for servers to exit.
This should likely fix a false positive when running with the --valgrind
option.
2015-03-31 23:43:38 +02:00
Matt Stancliff
e240d16cc9 Add --track-origins=yes to valgrind 2015-01-21 15:48:19 +01:00
antirez
72640fe5fc Cluster test: also write from Lua script in resharding test. 2015-01-09 11:23:22 +01:00
Matt Stancliff
e24ef16446 Add quicklist implementation
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.

Big thanks for all the reviews and feedback from everybody in
https://github.com/antirez/redis/pull/2143
2015-01-02 11:16:08 -05:00
antirez
e346d72200 Test: wait for actual startup in start_server.
start_server now uses return value from Tcl exec to get the server pid,
however this introduces errors that depend on timing: a lot of the
testing code base assumed the server to be actually up and running when
server_start returns.

So the old code that waits to see the pid in the log file was restored.
2014-11-28 11:49:26 +01:00
antirez
820d9dd5cb Test: try to cleanup still running Redis instances on exit.
It's hard to run the Redis test continuously if it leaks processes on
exceptions / errors.
2014-11-28 11:38:17 +01:00
Matt Stancliff
5fab7e5bf2 Remove trailing spaces from tests 2014-09-29 06:49:08 -04:00