8 Commits

Author SHA1 Message Date
Ping Xie
84157890fd
Set up clang-format github action (#538)
Set up clang-format GitHub action to ensure coding style consistency
---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-28 09:27:51 -07:00
Samuel Adetunji
5d0f4bc9f0
Require C11 atomics (#490)
- Replaces custom atomics logic with C11 default atomics logic.
- Drops the "atomicvar_api" field from server info

Closes #485
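
For intuition, a minimal sketch of what the switch to C11 atomics looks like (illustrative counter name; not the actual server code), using `<stdatomic.h>` directly instead of the old macro layer:

```
#include <stdatomic.h>
#include <stdio.h>

/* Illustrative counter declared with C11 atomics instead of the old
 * redisAtomic/atomicIncr macro wrappers. */
static _Atomic long long ops_done = 0;

int main(void) {
    atomic_fetch_add_explicit(&ops_done, 1, memory_order_relaxed); /* was atomicIncr(...) */
    long long v = atomic_load_explicit(&ops_done, memory_order_relaxed); /* was atomicGet(...) */
    printf("ops_done = %lld\n", v);
    return 0;
}
```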

---------

Signed-off-by: adetunjii <adetunjithomas1@outlook.com>
Signed-off-by: Samuel Adetunji <adetunjithomas1@outlook.com>
Co-authored-by: teej4y <samuel.adetunji@prunny.com>
2024-05-26 18:41:11 +02:00
Ping Xie
c41dd77a3e
Add clang-format configs (#323)
I have validated that these settings closely match the existing coding
style with one major exception on `BreakBeforeBraces`, which will be
`Attach` going forward. The mixed `BreakBeforeBraces` styles in the
current codebase are hard to imitate and also very odd IMHO - see below

```
if (a == 1) { /* Attach */
}
```

```
if (a == 1 ||
    b == 2)
{ /* Why? */
}
```

Please do NOT merge just yet. Will add the GitHub action next once the
style is reviewed/approved.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-22 23:24:12 -07:00
0del
8b3ab8f74f
Rename redisAtomic to serverAtomic (#180)
https://github.com/valkey-io/valkey/issues/144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:29:33 +02:00
meiravgri
2e854bccc6
Fix async safety in signal handlers (#12658)
see discussion from after https://github.com/redis/redis/pull/12453 was
merged
----
This PR replaces calls that are not considered async-signal-safe
(AS-safe) with safe alternatives.

#### **1. serverLog() and serverLogFromHandler()**
`serverLog` uses unsafe calls. It was decided that we will **avoid**
`serverLog` calls from the signal handlers when:
* The signal is not fatal, such as SIGALRM. In these cases, we prefer
using `serverLogFromHandler`, which is the safe version of `serverLog`.
Note they have different prompts:
`serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>`
`serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>`
* The code was added recently. Calls to `serverLog` from the signal
handler have been there ever since Redis has existed and haven't caused
problems so far. To avoid regressions, from now on we should use
`serverLogFromHandler` (an illustration of the idea follows this list).
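
For intuition, a minimal sketch of why a handler-side logger can be AS-safe (illustrative helper, not the actual `serverLogFromHandler` implementation): no heap allocation, no stdio locking, just one `write(2)` at the end.

```
#include <string.h>
#include <unistd.h>

/* Illustrative AS-safe logging helper: fixed stack buffer + write(2). */
static void log_from_handler(int fd, const char *msg) {
    char buf[256];
    size_t len = strlen(msg);
    if (len > sizeof(buf) - 1) len = sizeof(buf) - 1;
    memcpy(buf, msg, len);
    buf[len] = '\n';
    ssize_t r = write(fd, buf, len + 1); /* write(2) is async-signal-safe */
    (void)r;
}
```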

#### **2. `snprintf` `fgets` and `strtoul`(base = 16) -------->
`_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex`**
The safe version of `snprintf` was taken from
[here](8cfc4ca5e7/src/mc_util.c (L754))

#### **3. fopen(), fgets(), fclose() --------> open(), read(), close()**
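
A minimal sketch of this substitution (illustrative helper name, not the PR's actual code): read a small file with the async-signal-safe `open(2)`/`read(2)`/`close(2)` calls instead of buffered stdio.

```
#include <fcntl.h>
#include <unistd.h>

/* Illustrative: read up to buflen-1 bytes of `path` without stdio. */
static ssize_t read_file_async_safe(const char *path, char *buf, size_t buflen) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    ssize_t n = read(fd, buf, buflen - 1);
    close(fd);
    if (n < 0) return -1;
    buf[n] = '\0';
    return n;
}
```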

#### **4. opendir(), readdir(), closedir() --------> open(),
syscall(SYS_getdents64), close()**
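
And the directory-walking counterpart, again only a sketch of the technique (Linux-specific; struct layout as documented in getdents64(2)): enumerate entries with `open(2)` plus `syscall(SYS_getdents64)` instead of `opendir()`/`readdir()`, which may allocate.

```
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct linux_dirent64 {          /* layout documented in getdents64(2) */
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Illustrative: visit every entry name in `path` without opendir()/readdir(). */
static void for_each_dir_entry(const char *path, void (*visit)(const char *name)) {
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1) return;
    char buf[4096];
    long nread;
    while ((nread = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
        for (long off = 0; off < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            visit(d->d_name);    /* e.g. a tid directory under /proc/<pid>/task */
            off += d->d_reclen;
        }
    }
    close(fd);
}
```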

#### **5. Threads_mngr sync mechanisms**
* waiting for the thread to generate a stack trace: semaphore -------->
busy-wait (a minimal sketch follows this list)
* `globals_rw_lock` was removed: since we are no longer using malloc and the
semaphore, we don't need to protect `ThreadsManager_cleanups`.
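
A minimal sketch of the busy-wait replacement mentioned in the first item (illustrative names, not the actual ThreadsManager code): spin on an atomic counter with a short sleep until every thread has reported, or a timeout expires.

```
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

static atomic_size_t threads_done; /* each thread increments this when finished */

/* Illustrative: returns 1 if all `expected` threads reported in time, 0 on timeout. */
static int wait_for_threads(size_t expected, long timeout_ms) {
    struct timespec ts = {0, 1000000L}; /* poll every 1 ms */
    for (long waited = 0; waited < timeout_ms; waited++) {
        if (atomic_load(&threads_done) == expected) return 1;
        nanosleep(&ts, NULL);
    }
    return 0;
}
```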

#### **6. Stacktraces buffer**
The initial problem was that we were not able to safely call malloc
within the signal handler.
To solve that we created a buffer on the stack of `writeStacktraces` and
saved it in a global pointer, assuming that under normal circumstances,
the function `writeStacktraces` would complete before any thread
attempted to write to it. However, **if threads lag behind, they might
access this global pointer after it no longer belongs to the
`writeStacktraces` stack, potentially corrupting memory.**
To address this, various solutions were discussed
[here](https://github.com/redis/redis/pull/12658#discussion_r1390442896)
Eventually, we decided to **create a pipe** at server startup that will
remain valid as long as the process is alive.
We chose this solution due to its minimal memory usage, and since
`write()` and `read()` are atomic operations. It ensures that stack
traces from different threads won't mix.

**The stacktrace collection process is now as follows** (a rough sketch follows the list):
* Cleaning the pipe to eliminate writes of late threads from previous runs.
* Each thread writes its stacktrace to the pipe.
* Waiting for all the threads to mark completion, or until a timeout (2 sec) is reached.
* Reading from the pipe to print the stacktraces.
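
A rough sketch of that flow (illustrative names, not the server code); the key point is that `write(2)` calls of at most PIPE_BUF bytes are atomic, so traces emitted by different threads don't interleave.

```
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int trace_pipe[2]; /* [0] = read end, [1] = write end, created at startup */

static void init_trace_pipe(void) {
    if (pipe(trace_pipe) == -1) trace_pipe[0] = trace_pipe[1] = -1;
}

/* Each thread calls this from the signal handler: one write per trace. */
static void emit_trace(const char *trace) {
    ssize_t r = write(trace_pipe[1], trace, strlen(trace));
    (void)r;
}

/* The collecting thread drains whatever was written; the read end is made
 * non-blocking so we stop once the pipe is empty. */
static void drain_traces(void) {
    char buf[4096];
    ssize_t n;
    fcntl(trace_pipe[0], F_SETFL, O_NONBLOCK);
    while ((n = read(trace_pipe[0], buf, sizeof(buf))) > 0) {
        ssize_t r = write(STDERR_FILENO, buf, (size_t)n);
        (void)r;
    }
}
```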

#### **7. Changes that were considered and eventually were dropped**
* replace the watchdog timer with a POSIX timer:
according to the [setitimer man page](https://linux.die.net/man/2/setitimer)

> POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending
the use of the POSIX timers API
([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2),
[timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.)
instead.

However, although it is supposed to conform to the POSIX standard, the
POSIX timers API is not supported on Mac.
You can take a look at the Linux implementation [here](c7562ee135).
To avoid messing up the code, and given the uncertainty regarding compatibility,
it was decided to drop it for now.

* avoid using sds (which uses malloc) in `logConfigDebugInfo`:
It was considered to print the config info directly instead of using sds;
however, `logConfigDebugInfo` apparently does more than just print the sds,
so it was decided that this fix is out of the scope of this issue.

#### **8. fix Signal mask check**
The check `signum & sig_mask`, intended to indicate whether the signal is
blocked by the thread, was incorrect. The bit position in the
signal mask corresponds to the signal number. We fixed this by changing
the condition to: `sig_mask & (1L << (sig_num - 1))`.
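
For reference, a tiny sketch of the corrected check (the mask value would typically come from parsing a SigBlk-style hex field; the PR itself uses `1L`, a 64-bit constant is shown here):

```
#include <stdint.h>

/* Illustrative: bit (sig_num - 1) of the mask corresponds to signal sig_num. */
static int signal_is_blocked(uint64_t sig_mask, int sig_num) {
    return (sig_mask & (1ULL << (sig_num - 1))) != 0;
}
```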

#### **9. Unrelated changes**
Both `fork.tcl` and `util.tcl` implemented a function called
`count_log_message`, expecting different parameters. This caused
confusion when trying to run daily tests with additional test parameters
to run a specific test.
The `count_log_message` in `fork.tcl` was removed and the calls were
replaced with calls to the `count_log_message` located in `util.tcl`.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-23 13:22:20 +02:00
meiravgri
d27c7413a9
remove heap allocations from signal handlers. (#12655)
Using heap allocation during signal handlers is unsafe.
This PR purpose is to replace all the heap allocations done within the signal
handlers raised upon server crash and assertions.
These were added in #12453.

writeStacktraces(): allocates the stacktraces output array on the calling thread's
stack and assigns its address to a global variable.
It calls `ThreadsManager_runOnThreads()`, which invokes `collect_stacktrace_data()`
on each thread; each thread writes to a different location in that array, so the
concurrent writes don't conflict.

get_ready_to_signal_threads_tids(): instead of allocating the `tids` array, it receives it
as a fixed-size array parameter, allocated on the stack of the calling function, and
returns the number of valid threads. The array size is hard-coded to 50 (see the sketch below).
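
A minimal sketch of that pattern (hypothetical names; the real function and constant names may differ): the caller owns the fixed-size array on its own stack, and the callee only fills it and reports how many entries are valid.

```
#include <stddef.h>
#include <sys/types.h>

#define TIDS_MAX_SIZE 50 /* assumed name for the hard-coded limit */

/* Hypothetical callee: fills `tids`, returns the number of valid entries. */
static size_t collect_tids(pid_t tids[TIDS_MAX_SIZE]) {
    size_t count = 0;
    (void)tids; /* ... append each discovered tid while count < TIDS_MAX_SIZE ... */
    return count;
}

/* Caller: the array lives on this function's stack, so no heap allocation. */
static void print_all_stacktraces(void) {
    pid_t tids[TIDS_MAX_SIZE];
    size_t ntids = collect_tids(tids);
    (void)ntids;
}
```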

`ThreadsManager_runOnThreads()`: to avoid allocating the outputs array, the
**callback signature** was changed; it now returns void. The function's own return type
has also changed to int: it returns 1 if successful, and 0 otherwise.

Other unsafe calls will be handled in following PRs.
2023-10-16 17:21:49 +03:00
meiravgri
4ba9e18ef0
fix crash in crash-report and other improvements (#12623)
## Crash fix
### Current behavior
We might crash if we fail to collect some of the threads' output, for example if it exceeds the timeout.

The threads mngr API guarantees that the output array length will be `tids_len`; however, some
indices can be NULL, in case it fails to collect some of the threads' outputs.

When we use the threads mngr to collect the threads' stacktraces, we rely on this and skip NULL
entries. Since the output array was allocated with malloc, the uncollected entries contained garbage
instead of NULL, so we got a segmentation fault when trying to read this garbage (in debug.c:writeStacktraces()).

### fix
Allocate the global output array with zcalloc.

### To reproduce the bug, you'll have to change the code:
**in threadsmngr:ThreadsManager_runOnThreads():**
make sure the `g_output_array` allocation is initialized with garbage and not 0s
(add `memset(g_output_array, 2, sizeof(void*) * tids_len);` below the allocation).

Force one of the threads to write to the array:
add a global var: `static redisAtomic size_t return_now = 0;` 
add to `invoke_callback()` before writing to the output array:
```
    size_t i_return;
    atomicGetIncr(return_now, i_return, 1);
    if (i_return == 1) return;
```
Compile, start the server with `--enable-debug-command local` and run `redis-cli debug assert`.
The assertion triggers the stacktrace collection.
Expect to get 2 prints of the stack trace - since we get the segmentation fault after we return from
the threads mngr, it can be safely triggered again.

## Added global variables r/w lock in ThreadsManager
To avoid a situation where the main thread runs `ThreadsManager_cleanups` while threads are still
invoking the signal handler, we use a r/w lock.
For cleanups, we will acquire the write lock.
The threads will acquire the read lock to enable them to write simultaneously.
If we fail to acquire the read lock, it means cleanups are in progress and we return immediately.
After acquiring the lock we can safely check that the global output array wasn't nullified and proceed
to write to it.
This way we ensure the threads are not modifying the global variables or trying to write to the output
array after they were zeroed/nullified/destroyed (the semaphore).
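
A condensed sketch of the locking pattern described above (illustrative, not the actual ThreadsManager code): cleanup takes the write lock, while signal-handling threads try the read lock and bail out if cleanup has already started.

```
#include <pthread.h>
#include <stddef.h>

static pthread_rwlock_t globals_rw_lock = PTHREAD_RWLOCK_INITIALIZER;
static void **g_output_array;   /* nullified by cleanup */

/* Thread side: write an entry only if cleanup has not started. */
static void write_output(size_t index, void *value) {
    if (pthread_rwlock_tryrdlock(&globals_rw_lock) != 0) return; /* cleanup in progress */
    if (g_output_array != NULL) g_output_array[index] = value;
    pthread_rwlock_unlock(&globals_rw_lock);
}

/* Main thread: block out readers, then tear the globals down safely. */
static void cleanups(void) {
    pthread_rwlock_wrlock(&globals_rw_lock);
    g_output_array = NULL;      /* plus freeing / destroying the semaphore, etc. */
    pthread_rwlock_unlock(&globals_rw_lock);
}
```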

## other minor logging change
1. removed logging if the semaphore times out, because the threads can still write to the output array
  after this check. Instead, we print the total number of printed stacktraces compared to the expected
  number (len_tids).
2. use noinline attribute to make sure the uplevel number of ignored stack trace entries stays correct.
3. improve testing

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-02 20:02:02 +03:00
meiravgri
cc2be63997
Print stack trace from all threads in crash report (#12453)
In this PR we are adding the functionality to collect all the process's threads' backtraces.

## Changes made in this PR

### **Introduce threads mngr API**
The **threads mngr API** has 2 abilities:
* `ThreadsManager_init()` - registers to SIGUSR2; called on server start-up.
* `ThreadsManager_runOnThreads()` - receives a list of pid_t and a callback, tells every
  thread in the list to invoke the callback, and returns the output collected by each invocation.
**Elaborating the atomicvar API** (a conceptual C11 sketch follows):
* `atomicIncrGet(var, newvalue_var, count)` -- increment the atomic counter and get its new value
* `atomicFlagGetSet` -- get and set the atomic counter value to 1
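
For intuition, the same two operations expressed with plain C11 atomics (a conceptual sketch of what the helpers provide, not the atomicvar macros themselves):

```
#include <stdatomic.h>
#include <stddef.h>

static atomic_size_t next_slot;       /* what atomicIncrGet() wraps, conceptually */
static atomic_int trace_in_progress;  /* what atomicFlagGetSet() wraps, conceptually */

/* Increment and return the new counter value. */
static size_t claim_slot(void) {
    return atomic_fetch_add(&next_slot, 1) + 1; /* fetch_add returns the old value */
}

/* Get the flag's previous value and set it to 1. */
static int flag_get_set(void) {
    return atomic_exchange(&trace_in_progress, 1);
}
```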

### **Always set SIGALRM handler**
The SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if
`server.watchdog_period` > 0. This can also be useful if debugging is needed. However, in situations
where the server can't get requests (a deadlock, for example), we weren't able to change the signal handler.
To make it available at run time, we set the SIGALRM handler on server startup. The signal handler name was
changed to a more general `sigalrmSignalHandler`.

### **Print all the process' threads' stacktraces**

`logStackTrace()` now calls `writeStacktraces()`, instead of logging the current thread stacktrace.
`writeStacktraces()`:
* On Linux systems we use the threads manager API to collect the backtraces of all the process' threads.
  To get the `tids` list (thread ids) we read the `/proc/<redis-server-pid>/tasks` file, which includes a list of directories.
  Each directory name corresponds to one tid (including the main thread). For each thread, we also need to check if it
  can get the signal from the threads manager (meaning it is not blocking/ignoring that signal). We send the threads
  manager this tids list and `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses,
  its name, and tid.
* On other systems, the behavior remained as it was (writing only the current thread stacktrace to the log file).

## compatibility notes
1. **The threads mngr API is only supported on Linux.**
2. glibc earlier than 2.30: we use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated
  alternatives (`gettid()` and `tgkill()`) were only added in glibc 2.30 (a minimal sketch follows).
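
A minimal sketch of the raw-syscall fallback described in note 2 (illustrative wrapper names):

```
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative: current thread id without the glibc gettid() wrapper. */
static pid_t my_tid(void) { return (pid_t)syscall(SYS_gettid); }

/* Illustrative: deliver SIGUSR2 to one specific thread of this process. */
static int signal_one_thread(pid_t pid, pid_t tid) {
    return (int)syscall(SYS_tgkill, pid, tid, SIGUSR2);
}
```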

## Output example

Each thread backtrace will have the following format:
`<tid> <thread_name> [additional_info]`
* **tid**: as read from the `/proc/<redis-server-pid>/tasks` file
* **thread_name**: the thread name as it is registered in the OS.
* **additional_info**: sometimes we want to add specific information about one of the threads. Currently,
  it is only used to mark the thread that handles the backtrace collection by adding "*".
  In case of a crash, this also indicates which thread caused the crash. The handling thread won't
  necessarily appear first.

```
------ STACK TRACE ------
EIP:
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]

67089 redis-server *
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790]
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]
redis-server *:6379(+0x75e0c)[0xaaaac2fe5e0c]
redis-server *:6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00]
redis-server *:6379(aeMain+0x24)[0xaaaac2fe7038]
redis-server *:6379(main+0xe0c)[0xaaaac3001afc]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc]
redis-server *:6379(_start+0x30)[0xaaaac2fe0370]

67093 bio_lazy_free
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67091 bio_close_file
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67092 bio_aof
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]
67089:signal-handler (1693824528) --------
```
2023-09-24 09:47:23 +03:00