History

fix crash in crash-report and other improvements (#12623 )

## Crash fix
### Current behavior
We might crash if we fail to collect some of the threads' output. If it exceeds timeout for example.

The threads mngr API guarantees that the output array length will be `tids_len`, however, some
indices can be NULL, in case it fails to collect some of the threads' outputs.

When we use the threads mngr to collect the threads' stacktraces, we rely on this and skip NULL
entries. Since the output array was allocated with malloc, instead of NULL, it contained garbage,
so we got a segmentation fault when trying to read this garbage. (in debug.c:writeStacktraces() )

### fix
Allocate the global output array with zcalloc.

### To reproduce the bug, you'll have to change the code:
**in threadsmngr:ThreadsManager_runOnThreads():**
make sure the g_output_array allocation is initialized with garbage and not 0s 
(add `memset(g_output_array, 2, sizeof(void*) * tids_len);` below the allocation).

Force one of the threads to write to the array:
add a global var: `static redisAtomic size_t return_now = 0;` 
add to `invoke_callback()` before writing to the output array:
```
    size_t i_return;
    atomicGetIncr(return_now, i_return, 1);
    if(i_return == 1) return;
```
compile, start the server with `--enable-debug-command local` and run `redis-cli debug assert`
The assertion triggers the the stacktrace collection. 
Expect to get 2 prints of the stack trace - since we get the segmentation fault after we return from
the threads mngr, it can be safely triggered again.

## Added global variables r/w lock in ThreadsManager
To avoid a situation where the main thread runs `ThreadsManager_cleanups` while threads are still
invoking the signal handler, we use a r/w lock.
For cleanups, we will acquire the write lock.
The threads will acquire the read lock to enable them to write simultaneously.
If we fail to acquire the read lock, it means cleanups are in progress and we return immediately.
After acquiring the lock we can safely check that the global output array wasn't nullified and proceed
to write to it.
This way we ensure the threads are not modifying the global variables/ trying to write to the output
array after they were zeroed/nullified/destroyed(the semaphore).

## other minor logging change
1. removed logging if the semaphore times out because the threads can still write to the output array
  after this check. Instead, we print the total number of printed stacktraces compared to the exacted
  number (len_tids).
2. use noinline attribute to make sure the uplevel number of ignored stack trace entries stays correct.
3. improve testing

Co-authored-by: Oran Agra <oran@redislabs.com>

2023-10-02 20:02:02 +03:00

assets

fix GEORADIUS[BYMEMBER] STORE & STOREDIST args spec (#12151 )

2023-05-09 14:24:37 +03:00

cluster

Process loss of slot ownership in cluster bus (#12344 )

2023-07-05 17:46:23 -07:00

helpers

When redis-cli received ASK, it didn't handle it (#8930 )

2021-08-02 14:59:08 +03:00

integration

fix crash in crash-report and other improvements (#12623 )

2023-10-02 20:02:02 +03:00

modules

Allows modules to declare new ACL categories. (#12486 )

2023-08-30 13:01:24 -07:00

sentinel

Fix flaky SENTINEL RESET test (#12437 )

2023-08-10 08:58:52 +03:00

support

Stabilization and improvements around aof tests (#12626 )

2023-10-02 08:20:53 +03:00

tmp

minor fixes to the new test suite, html doc updated

2010-05-14 18:48:33 +02:00

unit

WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync (#12622 )

2023-09-28 17:19:20 +03:00

instances.tcl

Add reply_schema to command json files (internal for now) (#10273 )

2023-03-11 10:14:16 +02:00

README.md

Make assert_refcount skip the OBJECT REFCOUNT check with needs:debug tag (#11487 )

2022-11-22 16:38:27 +02:00

test_helper.tcl

Dump server logs when corrupt fuzzer reports crash (#12612 )

2023-09-27 09:08:18 +03:00

README.md

Redis Test Suite

The normal execution mode of the test suite involves starting and manipulating local redis-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the --host and --port parameters. When executing against an external server, tests tagged external:skip are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

Option	Impact
`--singledb`	Only use database 0, don't assume others are supported.
`--ignore-encoding`	Skip all checks for specific encoding.
`--ignore-digest`	Skip key value digest validations.
`--cluster-mode`	Run in strict Redis Cluster compatibility mode.
`--large-memory`	Enables tests that consume more than 100mb

Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied in different context levels:

start_server context
tags context that bundles several tests together
A single test context.

The following compatibility and capability tags are currently used:

Tag	Indicates
`external:skip`	Not compatible with external servers.
`cluster:skip`	Not compatible with `--cluster-mode`.
`large-memory`	Test that requires more than 100mb
`tls:skip`	Not compatible with `--tls`.
`needs:repl`	Uses replication and needs to be able to `SYNC` from server.
`needs:debug`	Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`).
`needs:pfdebug`	Uses the `PFDEBUG` command.
`needs:config-maxmemory`	Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc.
`needs:config-resetstat`	Uses `CONFIG RESETSTAT` to reset statistics.
`needs:reset`	Uses `RESET` to reset client connections.
`needs:save`	Uses `SAVE` or `BGSAVE` to create an RDB file.

When using an external server (--host and --port), filtering using the external:skip tags is done automatically.

When using --cluster-mode, filtering using the cluster:skip tag is done automatically.

When not using --large-memory, filtering using the largemem:skip tag is done automatically.

In addition, it is possible to specify additional configuration. For example, to run tests on a server that does not permit SYNC use:

./runtest --host <host> --port <port> --tags -needs:repl