2012-11-08 18:25:23 +01:00
|
|
|
/*
|
|
|
|
* Copyright (c) 2009-2012, Salvatore Sanfilippo <antirez at gmail dot com>
|
Fix async safety in signal handlers (#12658)
see discussion from after https://github.com/redis/redis/pull/12453 was
merged
----
This PR replaces calls that are not considered async-signal-safe
(AS-safe) with safe calls.
#### **1. serverLog() and serverLogFromHandler()**
`serverLog` uses unsafe calls. It was decided that we will **avoid**
`serverLog` calls by the signal handlers when:
* The signal is not fatal, such as SIGALRM. In these cases, we prefer
using `serverLogFromHandler` which is the safe version of `serverLog`.
Note they have different prompts:
`serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>`
`serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>`
* The code was added recently. Calls to `serverLog` by the signal
handler have been there ever since Redis exists and they haven't caused
problems so far. To avoid regressions, from now on we should use
`serverLogFromHandler`.
#### **2. `snprintf` `fgets` and `strtoul`(base = 16) -------->
`_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex`**
The safe version of `snprintf` was taken from
[here](https://github.com/twitter/twemcache/blob/8cfc4ca5e76ed936bd3786c8cc43ed47e7778c08/src/mc_util.c#L754)
#### **3. fopen(), fgets(), fclose() --------> open(), read(), close()**
#### **4. opendir(), readdir(), closedir() --------> open(),
syscall(SYS_getdents64), close()**
#### **5. Threads_mngr sync mechanisms**
* waiting for the thread to generate stack trace: semaphore -------->
busy-wait
* `globals_rw_lock` was removed: as we are not using malloc and the
semaphore anymore we don't need to protect `ThreadsManager_cleanups`.
#### **6. Stacktraces buffer**
The initial problem was that we were not able to safely call malloc
within the signal handler.
To solve that we created a buffer on the stack of `writeStacktraces` and
saved it in a global pointer, assuming that under normal circumstances,
the function `writeStacktraces` would complete before any thread
attempted to write to it. However, **if threads lag behind, they might
access this global pointer after it no longer belongs to the
`writeStacktraces` stack, potentially corrupting memory.**
To address this, various solutions were discussed
[here](https://github.com/redis/redis/pull/12658#discussion_r1390442896)
Eventually, we decided to **create a pipe** at server startup that will
remain valid as long as the process is alive.
We chose this solution due to its minimal memory usage, and since a
single `write()` of at most `PIPE_BUF` bytes to a pipe is atomic. As long
as each thread's stack trace fits in one such write, stack traces from
different threads won't mix.
**The stacktraces collection process is now as follows:**
* Cleaning the pipe to eliminate writes of late threads from previous
runs.
* Each thread writes to the pipe its stacktrace
* Waiting for all the threads to mark completion or until a timeout (2
sec) is reached
* Reading from the pipe to print the stacktraces.
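The "clean the pipe, then read" part of this flow can be sketched as follows. This is a minimal illustration, not the code from the PR: `trace_pipe_open` and `trace_pipe_drain` are hypothetical names, and the sketch assumes both pipe ends are put in non-blocking mode so that draining an empty pipe returns immediately. It only uses `pipe()`, `fcntl()`, `read()` and `close()`, which are on the POSIX list of async-signal-safe functions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe and make both ends non-blocking.
 * Returns 0 on success, -1 on failure. */
int trace_pipe_open(int fds[2]) {
    assert(fds != NULL);
    if (pipe(fds) == -1) return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL);
        if (flags == -1 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/* Hypothetical helper: discard whatever late writers left in the pipe.
 * The loop stops as soon as read() reports no data, EOF, or an error. */
void trace_pipe_drain(int rfd) {
    char scratch[256];
    while (read(rfd, scratch, sizeof(scratch)) > 0) {
        /* nothing to do, the data is intentionally thrown away */
    }
}
```

After `trace_pipe_drain()` returns, the next successful `read()` on the pipe only sees data written during the current collection run.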
#### **7. Changes that were considered and eventually were dropped**
* replace watchdog timer with a POSIX timer:
according to [setitimer man](https://linux.die.net/man/2/setitimer)
> POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending
the use of the POSIX timers API
([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2),
[timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.)
instead.
However, although it is supposed to conform to POSIX std, POSIX timers
API is not supported on Mac.
You can take a look here at the Linux implementation:
[here](https://github.com/redis/redis/commit/c7562ee13546e504977372fdf40d33c3f86775a5)
To avoid messing up the code, and uncertainty regarding compatibility,
it was decided to drop it for now.
* avoid using sds (uses malloc) in logConfigDebugInfo
It was considered to print config info instead of using sds, however
apparently, `logConfigDebugInfo` does more than just print the sds, so
it was decided this fix is out of this issue scope.
#### **8. fix Signal mask check**
The check `signum & sig_mask` intended to indicate whether the signal is
blocked by the thread was incorrect. Actually, the bit position in the
signal mask corresponds to the signal number. We fixed this by changing
the condition to: `sig_mask & (1L << (sig_num - 1))`
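The difference between the two checks can be illustrated with a small sketch. `sig_is_blocked` is a hypothetical helper name, not a function from this PR, and the mask is assumed to fit in an `unsigned long` where bit `n - 1` stands for signal `n`.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if signal 'sig_num' is set in 'sig_mask',
 * a bitmask in which bit (n - 1) represents signal number n. */
int sig_is_blocked(unsigned long sig_mask, int sig_num) {
    assert(sig_num >= 1 && sig_num <= (int)(sizeof(sig_mask) * CHAR_BIT));
    return (sig_mask & (1UL << (sig_num - 1))) != 0;
}
```

With only signal 3 blocked the mask is `0x4`. The old expression `3 & 0x4` evaluates to 0 (a false negative), while `5 & 0x4` is non-zero (a false positive for signal 5). The shifted check reports 1 for signal 3 and 0 for signal 5.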
#### **9. Unrelated changes**
Both `fork.tcl` and `util.tcl` implemented a function called
`count_log_message` expecting different parameters. This caused
confusion when trying to run daily tests with additional test parameters
to run a specific test.
The `count_log_message` in `fork.tcl` was removed and the calls were
replaced with calls to `count_log_message` located in `util.tcl`
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-23 13:22:20 +02:00
|
|
|
* Copyright (c) 2012, Twitter, Inc.
|
2012-11-08 18:25:23 +01:00
|
|
|
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *   * Redistributions of source code must retain the above copyright notice,
 *     this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of Redis nor the names of its contributors may be used
 *     to endorse or promote products derived from this software without
 *     specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */
|
|
|
|
|
2011-05-16 17:20:27 +02:00
|
|
|
#include "fmacros.h"
|
optimizing d2string() and addReplyDouble() with grisu2: double to string conversion based on Florian Loitsch's Grisu-algorithm (#10587)
All commands / use cases that rely heavily on converting a double to its string representation
(i.e. taking a double-precision floating-point number like 1.5 and returning a string like "1.5")
could benefit from a performance boost by replacing snprintf(buf,len,"%.17g",value) with the
equivalent [fpconv_dtoa](https://github.com/night-shift/fpconv) or any other algorithm that ensures
100% coverage of conversion.
This is a well-studied topic, and projects like MongoDB, RedPanda and PyTorch leverage libraries
(fmtlib) that use the optimized double to string conversion underneath.
The positive impact can be substantial. This PR uses the grisu2 approach ( grisu explained on
https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf section 5 ).
test suite changes:
Despite being compatible, in some cases it produces a different result from printf, and some tests
had to be adjusted.
One case is that `%.17g` (which means %e or %f, whichever is shorter) chose to use `5000000000`
instead of 5e+9, which sounds like a bug?
In other cases, we changed TCL to compare numbers instead of strings to ignore minor rounding
issues (`expr 0.8 == 0.79999999999999999`)
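Both test-suite observations can be reproduced with the standard library alone. The sketch below is only an illustration: `format_17g` and `same_double` are hypothetical helper names, and the code does not call grisu2 or `fpconv_dtoa`; it just shows what `%.17g` prints and why comparing parsed numbers tolerates the rounding difference.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format 'value' with the "%.17g" conversion that the
 * snprintf-based code path used. Returns what snprintf() returns. */
int format_17g(char *buf, size_t len, double value) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%.17g", value);
}

/* Hypothetical helper: compare two decimal strings as doubles instead of
 * as text, so different spellings of the same double compare equal. */
int same_double(const char *a, const char *b) {
    return strtod(a, NULL) == strtod(b, NULL);
}
```

Under IEEE 754 doubles, `format_17g(buf, sizeof(buf), 5e9)` writes `5000000000`, and `same_double("0.8", "0.79999999999999999")` returns 1 because both strings round to the same double.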
2022-10-15 10:17:41 +01:00
|
|
|
#include "fpconv_dtoa.h"
|
2011-04-27 13:24:52 +02:00
|
|
|
#include <stdlib.h>
|
|
|
|
#include <stdio.h>
|
|
|
|
#include <string.h>
|
2010-06-22 00:07:48 +02:00
|
|
|
#include <ctype.h>
|
|
|
|
#include <limits.h>
|
2011-03-08 12:30:01 +01:00
|
|
|
#include <math.h>
|
2012-03-08 10:08:44 +01:00
|
|
|
#include <unistd.h>
|
|
|
|
#include <sys/time.h>
|
2012-07-25 23:51:22 -07:00
|
|
|
#include <float.h>
|
2014-07-23 18:01:51 +02:00
|
|
|
#include <stdint.h>
|
2015-02-12 16:40:41 +01:00
|
|
|
#include <errno.h>
|
2018-10-26 14:12:47 +00:00
|
|
|
#include <time.h>
|
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, which are classified into two types. One is the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use an AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add an AOFRW limiting measure. When the number of AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
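The limiting measure in item 8 is an exponential backoff with a cap. The sketch below is one way to express it, under the assumption that the delay is derived purely from the number of consecutive failures; `RETRY_THRESHOLD`, `MAX_DELAY_MINUTES` and `aofrw_delay_minutes` are hypothetical names and not identifiers taken from the PR.

```c
#include <assert.h>

#define RETRY_THRESHOLD 3      /* failures before any delay is applied */
#define MAX_DELAY_MINUTES 60   /* upper bound on the delay (1 hour) */

/* Hypothetical helper: minutes to wait before the next automatic AOFRW,
 * given how many rewrites have failed in a row. */
int aofrw_delay_minutes(int consecutive_failures) {
    assert(consecutive_failures >= 0);
    if (consecutive_failures < RETRY_THRESHOLD) return 0;

    int delay = 1; /* first delayed retry waits 1 minute */
    for (int i = RETRY_THRESHOLD; i < consecutive_failures; i++) {
        delay *= 2; /* each further failure doubles the wait */
        if (delay >= MAX_DELAY_MINUTES) return MAX_DELAY_MINUTES;
    }
    return delay;
}
```

With these assumptions the delays for 3, 4, 5, 6 and 7 consecutive failures are 1, 2, 4, 8 and 16 minutes, and the value stays at 60 once doubling would exceed the cap.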
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
|
|
|
#include <sys/stat.h>
|
|
|
|
#include <dirent.h>
|
|
|
|
#include <fcntl.h>
|
2022-06-21 00:17:23 +08:00
|
|
|
#include <libgen.h>
|
2011-05-16 17:20:27 +02:00
|
|
|
|
2011-04-27 13:24:52 +02:00
|
|
|
#include "util.h"
|
2020-04-23 11:17:42 +02:00
|
|
|
#include "sha256.h"
|
2022-01-04 01:14:13 +08:00
|
|
|
#include "config.h"
|
2010-06-22 00:07:48 +02:00
|
|
|
|
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but its content stays in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim memory from the page cache or swap anonymous pages out, which may cause jitter in the Redis service.
Consider a concrete scenario: a high-capacity machine hosts many Redis instances, and we're upgrading them all together. The page cache on the host grows as RDBs are generated. Once free memory drops to the low watermark (which is more likely on older Linux kernels such as 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, where the `low watermark` is linear in the `min watermark` and there is not much headroom for `kswapd` to be woken up to reclaim memory), a `direct reclaim` happens, which means the process stalls waiting for memory allocation.
# What the PR does
The PR introduces the ability to reclaim the cache when an RDB file is read or written. For reads, incremental reclaim is a little messy to implement, so the reclaim is done in one go in the background after the load finishes, to avoid blocking the work thread. For writes, incremental reclaim amortizes the work, so there is no need to move it to the background, and the peak cache usage is reduced this way.
Two cases are handled specially, replication and restart: in both, the cache is leveraged to speed up processing, so the reclaim is postponed to a suitable time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with the default value false.
# Things worth noting
1. Although `posix_fadvise` is part of the POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0.
2. On Linux, `posix_fadvise` only takes effect on pages that have already been written back, so a `sync` (or `fsync`, `fdatasync`) is needed to flush dirty pages before calling `posix_fadvise` when reclaiming the write cache.
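The flush-then-advise order from note 2 can be sketched as below. `reclaim_file_cache` is a hypothetical helper name, not the function added by the PR, and the sketch reclaims the whole file at once rather than incrementally. On platforms that do not define `POSIX_FADV_DONTNEED` it degrades to a plain `fsync()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush dirty pages of 'fd' and then hint the kernel
 * that its cached pages are no longer needed.
 * Returns 0 on success, -1 if fsync() fails, or the error number returned
 * by posix_fadvise(). */
int reclaim_file_cache(int fd) {
    assert(fd >= 0);
    /* Dirty pages cannot be dropped, so write them back first. */
    if (fsync(fd) == -1) return -1;
#if defined(POSIX_FADV_DONTNEED)
    /* offset 0 with len 0 means "from the start to the end of the file". */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#else
    return 0; /* no advisory interface available on this platform */
#endif
}
```

The call is only a hint: the kernel is free to keep pages cached, so a caller that needs to verify the effect has to measure the cache separately.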
# About test
A unit test is added to verify the effect of `posix_fadvise`.
In the integration tests, the overall cache increase is checked, as well as the cache backed by the RDB file, with a specific TCL test executed in an isolated GitHub Actions job.
2023-02-12 15:23:29 +08:00
|
|
|
#define UNUSED(x) ((void)(x))
|
|
|
|
|
2010-06-22 00:07:48 +02:00
|
|
|
/* Glob-style pattern matching. */
|
2023-02-28 15:15:26 +02:00
|
|
|
static int stringmatchlen_impl(const char *pattern, int patternLen,
|
|
|
|
const char *string, int stringLen, int nocase, int *skipLongerMatches)
|
2010-06-22 00:07:48 +02:00
|
|
|
{
|
2018-12-11 13:18:52 +01:00
|
|
|
while(patternLen && stringLen) {
|
2010-06-22 00:07:48 +02:00
|
|
|
switch(pattern[0]) {
|
|
|
|
case '*':
|
2020-05-06 16:18:21 +02:00
|
|
|
while (patternLen > 1 && pattern[1] == '*') {
|
2010-06-22 00:07:48 +02:00
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
}
|
|
|
|
if (patternLen == 1)
|
|
|
|
return 1; /* match */
|
|
|
|
while(stringLen) {
|
2023-02-28 15:15:26 +02:00
|
|
|
if (stringmatchlen_impl(pattern+1, patternLen-1,
|
|
|
|
string, stringLen, nocase, skipLongerMatches))
|
2010-06-22 00:07:48 +02:00
|
|
|
return 1; /* match */
|
2023-02-28 15:15:26 +02:00
|
|
|
if (*skipLongerMatches)
|
|
|
|
return 0; /* no match */
|
2010-06-22 00:07:48 +02:00
|
|
|
string++;
|
|
|
|
stringLen--;
|
|
|
|
}
|
2023-02-28 15:15:26 +02:00
|
|
|
/* There was no match for the rest of the pattern starting
|
|
|
|
* from anywhere in the rest of the string. If there were
|
|
|
|
* any '*' earlier in the pattern, we can terminate the
|
|
|
|
* search early without trying to match them to longer
|
|
|
|
* substrings. This is because a longer match for the
|
|
|
|
* earlier part of the pattern would require the rest of the
|
|
|
|
* pattern to match starting later in the string, and we
|
|
|
|
* have just determined that there is no match for the rest
|
|
|
|
* of the pattern starting from anywhere in the current
|
|
|
|
* string. */
|
|
|
|
*skipLongerMatches = 1;
|
2010-06-22 00:07:48 +02:00
|
|
|
return 0; /* no match */
|
|
|
|
break;
|
|
|
|
case '?':
|
|
|
|
string++;
|
|
|
|
stringLen--;
|
|
|
|
break;
|
|
|
|
case '[':
|
|
|
|
{
|
|
|
|
int not, match;
|
|
|
|
|
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
not = pattern[0] == '^';
|
|
|
|
if (not) {
|
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
}
|
|
|
|
match = 0;
|
|
|
|
while(1) {
|
2017-12-12 01:25:03 +01:00
|
|
|
if (pattern[0] == '\\' && patternLen >= 2) {
|
2010-06-22 00:07:48 +02:00
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
if (pattern[0] == string[0])
|
|
|
|
match = 1;
|
|
|
|
} else if (pattern[0] == ']') {
|
|
|
|
break;
|
|
|
|
} else if (patternLen == 0) {
|
|
|
|
pattern--;
|
|
|
|
patternLen++;
|
|
|
|
break;
|
2020-05-06 16:18:21 +02:00
|
|
|
} else if (patternLen >= 3 && pattern[1] == '-') {
|
2010-06-22 00:07:48 +02:00
|
|
|
int start = pattern[0];
|
|
|
|
int end = pattern[2];
|
|
|
|
int c = string[0];
|
|
|
|
if (start > end) {
|
|
|
|
int t = start;
|
|
|
|
start = end;
|
|
|
|
end = t;
|
|
|
|
}
|
|
|
|
if (nocase) {
|
|
|
|
start = tolower(start);
|
|
|
|
end = tolower(end);
|
|
|
|
c = tolower(c);
|
|
|
|
}
|
|
|
|
pattern += 2;
|
|
|
|
patternLen -= 2;
|
|
|
|
if (c >= start && c <= end)
|
|
|
|
match = 1;
|
|
|
|
} else {
|
|
|
|
if (!nocase) {
|
|
|
|
if (pattern[0] == string[0])
|
|
|
|
match = 1;
|
|
|
|
} else {
|
|
|
|
if (tolower((int)pattern[0]) == tolower((int)string[0]))
|
|
|
|
match = 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
}
|
|
|
|
if (not)
|
|
|
|
match = !match;
|
|
|
|
if (!match)
|
|
|
|
return 0; /* no match */
|
|
|
|
string++;
|
|
|
|
stringLen--;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
case '\\':
|
|
|
|
if (patternLen >= 2) {
|
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
}
|
|
|
|
/* fall through */
|
|
|
|
default:
|
|
|
|
if (!nocase) {
|
|
|
|
if (pattern[0] != string[0])
|
|
|
|
return 0; /* no match */
|
|
|
|
} else {
|
|
|
|
if (tolower((int)pattern[0]) != tolower((int)string[0]))
|
|
|
|
return 0; /* no match */
|
|
|
|
}
|
|
|
|
string++;
|
|
|
|
stringLen--;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
if (stringLen == 0) {
|
|
|
|
while(patternLen && *pattern == '*') {
|
|
|
|
pattern++;
|
|
|
|
patternLen--;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (patternLen == 0 && stringLen == 0)
|
|
|
|
return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2023-02-28 15:15:26 +02:00
|
|
|
int stringmatchlen(const char *pattern, int patternLen,
        const char *string, int stringLen, int nocase) {
    int skipLongerMatches = 0;
    return stringmatchlen_impl(pattern,patternLen,string,stringLen,nocase,&skipLongerMatches);
}
|
|
|
|
|
2010-06-22 00:07:48 +02:00
|
|
|
int stringmatch(const char *pattern, const char *string, int nocase) {
|
|
|
|
return stringmatchlen(pattern,strlen(pattern),string,strlen(string),nocase);
|
|
|
|
}
|
|
|
|
|
2018-12-11 13:29:30 +01:00
|
|
|
/* Fuzz stringmatchlen() trying to crash it with bad input. */
int stringmatchlen_fuzz_test(void) {
    char str[32];
    char pat[32];
    int cycles = 10000000;
    int total_matches = 0;
    while(cycles--) {
        /* Named slen (not strlen) to avoid shadowing the libc function. */
        int slen = rand() % sizeof(str);
        int patlen = rand() % sizeof(pat);
        for (int j = 0; j < slen; j++) str[j] = rand() % 128;
        for (int j = 0; j < patlen; j++) pat[j] = rand() % 128;
        total_matches += stringmatchlen(pat, patlen, str, slen, 0);
    }
    return total_matches;
}
|
|
|
|
|
2021-08-23 14:00:40 -04:00
|
|
|
|
2010-06-22 00:07:48 +02:00
|
|
|
/* Convert a string representing an amount of memory into the number of
|
2021-08-23 14:00:40 -04:00
|
|
|
* bytes, so for instance memtoull("1Gb") will return 1073741824 that is
|
2010-06-22 00:07:48 +02:00
|
|
|
* (1024*1024*1024).
|
|
|
|
*
|
|
|
|
* On parsing error, if 'err' is not NULL, *err is set to 1, otherwise it's
|
2015-02-12 16:40:41 +01:00
|
|
|
* set to 0. On error the function return value is 0, regardless of the
|
|
|
|
* fact 'err' is NULL or not. */
|
2021-08-23 14:00:40 -04:00
|
|
|
unsigned long long memtoull(const char *p, int *err) {
|
2010-06-22 00:07:48 +02:00
|
|
|
const char *u;
|
|
|
|
char buf[128];
|
|
|
|
long mul; /* unit multiplier */
|
2021-08-23 14:00:40 -04:00
|
|
|
unsigned long long val;
|
2010-06-22 00:07:48 +02:00
|
|
|
unsigned int digits;
|
|
|
|
|
|
|
|
if (err) *err = 0;
|
2015-02-12 16:40:41 +01:00
|
|
|
|
2010-06-22 00:07:48 +02:00
|
|
|
/* Search the first non digit character. */
|
|
|
|
u = p;
|
Client eviction (#8687)
### Description
A mechanism for disconnecting clients when the total memory used by all connected clients is above a
configured limit. This prevents eviction or OOM caused by accumulated used memory
between all clients. It's a complementary mechanism to the `client-output-buffer-limit`
mechanism which takes into account not only a single client and not only output buffers
but rather all memory used by all clients.
#### Design
The general design is as follows:
* We track memory usage of each client, taking into account all memory used by the
client (query buffer, output buffer, parsed arguments, etc...). This is kept up to date
after reading from the socket, after processing commands and after writing to the socket.
* Based on the used memory we sort all clients into buckets. Each bucket contains all
clients using up to x2 the memory of the clients in the bucket below it. For example up
to 1m clients, up to 2m clients, up to 4m clients, ...
* Before processing a command and before sleep we check if we're over the configured
limit. If we are we start disconnecting clients from larger buckets downwards until we're
under the limit.
#### Config
`maxmemory-clients` max memory all clients are allowed to consume, above this threshold
we disconnect clients.
This config can either be set to 0 (meaning no limit), a size in bytes (possibly with MB/GB
suffix), or as a percentage of `maxmemory` by using the `%` suffix (e.g. setting it to `10%`
would mean 10% of `maxmemory`).
#### Important code changes
* During development I encountered yet more situations where our io-threads access
global vars, and needed to fix them. I also had to keep the clients sorted into the
memory buckets (which are global) while their memory usage changes in the io-thread.
To achieve this I decided to simplify how we check if we're in an io-thread and make it
much more explicit. I removed the `CLIENT_PENDING_READ` flag used for checking
if the client is in an io-thread (it wasn't used for anything else) and just used the global
`io_threads_op` variable the same way to check during writes.
* I optimized the cleanup of the client from the `clients_pending_read` list on client freeing.
We now store a pointer in the `client` struct to this list so we don't need to search in it
(`pending_read_list_node`).
* Added `evicted_clients` stat to `INFO` command.
* Added `CLIENT NO-EVICT ON|OFF` sub command to exclude a specific client from the
client eviction mechanism. Added corresponding 'e' flag in the client info string.
* Added `multi-mem` field in the client info string to show how much memory is used up
by buffered multi commands.
* Client `tot-mem` now accounts for buffered multi-commands, pubsub patterns and
channels (partially), tracking prefixes (partially).
* CLIENT_CLOSE_ASAP flag is now handled in a new `beforeNextClient()` function so
clients will be disconnected between processing different clients and not only before sleep.
This new function can be used in the future for work we want to do outside the command
processing loop but don't want to wait for all clients to be processed before we get to it.
Specifically I wanted to handle output-buffer-limit related closing before we process client
eviction in case the two race with each other.
* Added a `DEBUG CLIENT-EVICTION` command to print out info about the client eviction
buckets.
* Each client now holds a pointer to the client eviction memory usage bucket it belongs to
and listNode to itself in that bucket for quick removal.
* Global `io_threads_op` variable now can contain a `IO_THREADS_OP_IDLE` value
indicating no io-threading is currently being executed.
* In order to track memory used by each client in real-time we can't rely on updating
these stats in `clientsCron()` alone anymore. So now I call `updateClientMemUsage()`
(used to be `clientsCronTrackClientsMemUsage()`) after command processing, after
writing data to pubsub clients, after writing the output buffer and after reading from the
socket (and maybe other places too). The function is written to be fast.
* Clients are evicted if needed (with appropriate log line) in `beforeSleep()` and before
processing a command (before performing oom-checks and key-eviction).
* All clients memory usage buckets are grouped as follows:
* All clients using less than 64k.
* 64K..128K
* 128K..256K
* ...
* 2G..4G
* All clients using 4g and up.
* Added client-eviction.tcl with a bunch of tests for the new mechanism.
* Extended maxmemory.tcl to test the interaction between maxmemory and
maxmemory-clients settings.
* Added an option to flag a numeric configuration variable as a "percent". This means that
if we encounter a '%' after the number in the config file (or config set command) we
consider it valid. Such a number is stored internally as a negative value. This way an
integer value can be interpreted as either a percent (negative) or absolute value (positive).
This is useful for example if some numeric configuration can optionally be set to a percentage
of something else.
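The power-of-two bucket grouping listed above (less than 64K, 64K..128K, ..., 2G..4G, 4G and up) maps a memory size to an index via its base-2 logarithm. The sketch below follows the boundaries as written in this commit message; `BUCKET_MIN_LOG`, `BUCKET_MAX_LOG` and `mem_usage_bucket` are hypothetical names, and the constants used by the actual implementation may differ.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_MIN_LOG 16 /* first boundary: 64K = 2^16 */
#define BUCKET_MAX_LOG 32 /* last boundary:  4G  = 2^32 */
#define BUCKET_COUNT (BUCKET_MAX_LOG - BUCKET_MIN_LOG + 2) /* 18 buckets */

/* Hypothetical helper: index of the bucket that holds a client using
 * 'mem' bytes. Bucket 0 is "less than 64K", the last one is "4G and up". */
int mem_usage_bucket(uint64_t mem) {
    int lg = 0; /* floor(log2(mem)), 0 when mem is 0 or 1 */
    while (mem >>= 1) lg++;

    int bucket;
    if (lg < BUCKET_MIN_LOG) bucket = 0;
    else if (lg >= BUCKET_MAX_LOG) bucket = BUCKET_COUNT - 1;
    else bucket = lg - BUCKET_MIN_LOG + 1;

    assert(bucket >= 0 && bucket < BUCKET_COUNT);
    return bucket;
}
```

Because each bucket spans a factor of two, a client only changes bucket when its usage roughly doubles or halves, which keeps bucket moves rare compared with the number of memory updates.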
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-09-23 14:02:16 +03:00
|
|
|
if (*u == '-') {
|
|
|
|
if (err) *err = 1;
|
|
|
|
return 0;
|
|
|
|
}
|
2010-06-22 00:07:48 +02:00
|
|
|
while(*u && isdigit((unsigned char)*u)) u++;
|
|
|
|
if (*u == '\0' || !strcasecmp(u,"b")) {
|
|
|
|
mul = 1;
|
|
|
|
} else if (!strcasecmp(u,"k")) {
|
|
|
|
mul = 1000;
|
|
|
|
} else if (!strcasecmp(u,"kb")) {
|
|
|
|
mul = 1024;
|
|
|
|
} else if (!strcasecmp(u,"m")) {
|
|
|
|
mul = 1000*1000;
|
|
|
|
} else if (!strcasecmp(u,"mb")) {
|
|
|
|
mul = 1024*1024;
|
|
|
|
} else if (!strcasecmp(u,"g")) {
|
|
|
|
mul = 1000L*1000*1000;
|
|
|
|
} else if (!strcasecmp(u,"gb")) {
|
|
|
|
mul = 1024L*1024*1024;
|
|
|
|
} else {
|
|
|
|
if (err) *err = 1;
|
2015-02-12 16:40:41 +01:00
|
|
|
return 0;
|
2010-06-22 00:07:48 +02:00
|
|
|
}
|
2015-02-12 16:40:41 +01:00
|
|
|
|
|
|
|
/* Copy the digits into a buffer, we'll use strtoull() to convert
|
|
|
|
* the digit (without the unit) into a number. */
|
2010-06-22 00:07:48 +02:00
|
|
|
digits = u-p;
|
|
|
|
if (digits >= sizeof(buf)) {
|
|
|
|
if (err) *err = 1;
|
2015-02-12 16:40:41 +01:00
|
|
|
return 0;
|
2010-06-22 00:07:48 +02:00
|
|
|
}
|
|
|
|
memcpy(buf,p,digits);
|
|
|
|
buf[digits] = '\0';
|
2015-02-12 16:40:41 +01:00
|
|
|
|
|
|
|
char *endptr;
|
|
|
|
errno = 0;
|
2021-08-23 14:00:40 -04:00
|
|
|
val = strtoull(buf,&endptr,10);
|
2015-02-12 16:40:41 +01:00
|
|
|
if ((val == 0 && errno == EINVAL) || *endptr != '\0') {
|
|
|
|
if (err) *err = 1;
|
|
|
|
return 0;
|
|
|
|
}
|
2010-06-22 00:07:48 +02:00
|
|
|
return val*mul;
|
|
|
|
}
|
|
|
|
|
2021-02-15 17:08:53 +02:00
|
|
|
/* Search a memory buffer for any set of bytes, like strpbrk().
 * Returns pointer to first found char or NULL.
 */
const char *mempbrk(const char *s, size_t len, const char *chars, size_t charslen) {
    for (size_t j = 0; j < len; j++) {
        for (size_t n = 0; n < charslen; n++)
            if (s[j] == chars[n]) return &s[j];
    }

    return NULL;
}
|
|
|
|
|
|
|
|
/* Modify the buffer replacing all occurrences of chars from the 'from'
 * set with the corresponding char in the 'to' set. Always returns s.
 */
char *memmapchars(char *s, size_t len, const char *from, const char *to, size_t setlen) {
    for (size_t j = 0; j < len; j++) {
        for (size_t i = 0; i < setlen; i++) {
            if (s[j] == from[i]) {
                s[j] = to[i];
                break;
            }
        }
    }
    return s;
}
|
|
|
|
|
2014-06-24 15:45:03 +02:00
|
|
|
/* Return the number of digits of 'v' when converted to string in radix 10.
 * See ll2string() for more information. */
uint32_t digits10(uint64_t v) {
    if (v < 10) return 1;
    if (v < 100) return 2;
    if (v < 1000) return 3;
    if (v < 1000000000000UL) {
        if (v < 100000000UL) {
            if (v < 1000000) {
                if (v < 10000) return 4;
                return 5 + (v >= 100000);
            }
            return 7 + (v >= 10000000UL);
        }
        if (v < 10000000000UL) {
            return 9 + (v >= 1000000000UL);
        }
        return 11 + (v >= 100000000000UL);
    }
    return 12 + digits10(v / 1000000000000UL);
}
|
|
|
|
|
2015-02-27 15:20:58 +01:00
|
|
|
/* Like digits10() but for signed values. */
|
|
|
|
uint32_t sdigits10(int64_t v) {
|
|
|
|
if (v < 0) {
|
|
|
|
/* Abs value of LLONG_MIN requires special handling. */
|
|
|
|
uint64_t uv = (v != LLONG_MIN) ?
|
2015-02-27 16:01:45 +01:00
|
|
|
(uint64_t)-v : ((uint64_t) LLONG_MAX)+1;
|
2015-02-27 15:20:58 +01:00
|
|
|
return digits10(uv)+1; /* +1 for the minus. */
|
|
|
|
} else {
|
|
|
|
return digits10(v);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-06-22 00:07:48 +02:00
|
|
|
/* Convert a long long into a string. Returns the number of
|
2014-06-24 15:45:03 +02:00
|
|
|
* characters needed to represent the number.
|
2021-08-23 14:00:40 -04:00
|
|
|
* If the buffer is not big enough to store the string, 0 is returned. */
|
2016-03-06 13:44:24 +01:00
|
|
|
int ll2string(char *dst, size_t dstlen, long long svalue) {
|
2014-06-24 15:45:03 +02:00
|
|
|
unsigned long long value;
|
2021-08-23 14:00:40 -04:00
|
|
|
int negative = 0;
|
2014-06-24 15:45:03 +02:00
|
|
|
|
2021-08-23 14:00:40 -04:00
|
|
|
/* The ull2string function works with 64bit unsigned integers for simplicity, so
|
2014-06-24 15:45:03 +02:00
|
|
|
* we convert the number here and remember if it is negative. */
|
|
|
|
if (svalue < 0) {
|
2014-08-15 15:48:15 +02:00
|
|
|
if (svalue != LLONG_MIN) {
|
|
|
|
value = -svalue;
|
|
|
|
} else {
|
|
|
|
value = ((unsigned long long) LLONG_MAX)+1;
|
|
|
|
}
|
2021-08-23 14:00:40 -04:00
|
|
|
if (dstlen < 2)
|
2022-07-18 10:56:26 +03:00
|
|
|
goto err;
|
2014-06-24 15:45:03 +02:00
|
|
|
negative = 1;
|
2021-08-23 14:00:40 -04:00
|
|
|
dst[0] = '-';
|
|
|
|
dst++;
|
|
|
|
dstlen--;
|
2014-06-24 15:45:03 +02:00
|
|
|
} else {
|
|
|
|
value = svalue;
|
|
|
|
}
|
|
|
|
|
2021-08-23 14:00:40 -04:00
|
|
|
/* Convert the unsigned long long value to a string. */
|
|
|
|
int length = ull2string(dst, dstlen, value);
|
|
|
|
if (length == 0) return 0;
|
|
|
|
return length + negative;
|
2022-07-18 10:56:26 +03:00
|
|
|
|
|
|
|
err:
|
|
|
|
/* force add Null termination */
|
|
|
|
if (dstlen > 0)
|
|
|
|
dst[0] = '\0';
|
|
|
|
return 0;
|
2021-08-23 14:00:40 -04:00
|
|
|
}
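The LLONG_MIN special case in sdigits10() and ll2string() exists because negating LLONG_MIN overflows a signed long long, which is undefined behavior. A minimal sketch of the same idea as a standalone helper follows; the name ll_magnitude is an assumption made for illustration only.

```c
#include <limits.h>

/* Hypothetical helper (not part of util.c): returns the absolute value of
 * 'v' as an unsigned long long without ever negating LLONG_MIN. */
unsigned long long ll_magnitude(long long v) {
    if (v >= 0) return (unsigned long long)v;
    /* -LLONG_MIN does not fit in a long long, so build it from LLONG_MAX. */
    if (v == LLONG_MIN) return (unsigned long long)LLONG_MAX + 1;
    return (unsigned long long)(-v);
}
```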
|
|
|
|
|
|
|
|
/* Convert an unsigned long long into a string. Returns the number of
|
|
|
|
* characters needed to represent the number.
|
|
|
|
* If the buffer is not big enough to store the string, 0 is returned.
|
|
|
|
*
|
|
|
|
* Based on the following article (that apparently does not provide a
|
|
|
|
* novel approach but only publicizes an already used technique):
|
|
|
|
*
|
|
|
|
* https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920 */
|
|
|
|
int ull2string(char *dst, size_t dstlen, unsigned long long value) {
|
|
|
|
static const char digits[201] =
|
|
|
|
"0001020304050607080910111213141516171819"
|
|
|
|
"2021222324252627282930313233343536373839"
|
|
|
|
"4041424344454647484950515253545556575859"
|
|
|
|
"6061626364656667686970717273747576777879"
|
|
|
|
"8081828384858687888990919293949596979899";
|
|
|
|
|
2014-06-24 15:45:03 +02:00
|
|
|
/* Check length. */
|
2021-08-23 14:00:40 -04:00
|
|
|
uint32_t length = digits10(value);
|
2022-07-18 10:56:26 +03:00
|
|
|
if (length >= dstlen) goto err;
|
2014-06-24 15:45:03 +02:00
|
|
|
|
|
|
|
/* Null term. */
|
2021-08-23 14:00:40 -04:00
|
|
|
uint32_t next = length - 1;
|
|
|
|
dst[next + 1] = '\0';
|
2014-06-24 15:45:03 +02:00
|
|
|
while (value >= 100) {
|
|
|
|
int const i = (value % 100) * 2;
|
|
|
|
value /= 100;
|
|
|
|
dst[next] = digits[i + 1];
|
|
|
|
dst[next - 1] = digits[i];
|
|
|
|
next -= 2;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Handle last 1-2 digits. */
|
|
|
|
if (value < 10) {
|
|
|
|
dst[next] = '0' + (uint32_t) value;
|
|
|
|
} else {
|
|
|
|
int i = (uint32_t) value * 2;
|
|
|
|
dst[next] = digits[i + 1];
|
|
|
|
dst[next - 1] = digits[i];
|
|
|
|
}
|
|
|
|
return length;
|
2022-07-18 10:56:26 +03:00
|
|
|
err:
|
|
|
|
/* force add Null termination */
|
|
|
|
if (dstlen > 0)
|
|
|
|
dst[0] = '\0';
|
|
|
|
return 0;
|
2010-06-22 00:07:48 +02:00
|
|
|
}
|
|
|
|
|
2011-03-10 16:16:27 +01:00
|
|
|
/* Convert a string into a long long. Returns 1 if the string could be parsed
|
|
|
|
* into a (non-overflowing) long long, 0 otherwise. The value will be set to
|
2015-09-10 17:26:48 +02:00
|
|
|
* the parsed value when appropriate.
|
|
|
|
*
|
|
|
|
* Note that this function demands that the string strictly represents
|
|
|
|
* a long long: no spaces or other characters before or after the string
|
|
|
|
* representing the number are accepted, nor zeroes at the start if not
|
|
|
|
* for the string "0" representing the zero number.
|
|
|
|
*
|
|
|
|
* Because of its strictness, it is safe to use this function to check if
|
|
|
|
* you can convert a string into a long long, and obtain back the string
|
|
|
|
* from the number without any loss in the string representation. */
|
2012-01-02 15:24:32 -08:00
|
|
|
int string2ll(const char *s, size_t slen, long long *value) {
|
|
|
|
const char *p = s;
|
2011-03-10 16:16:27 +01:00
|
|
|
size_t plen = 0;
|
|
|
|
int negative = 0;
|
|
|
|
unsigned long long v;
|
|
|
|
|
2022-03-14 14:22:57 +08:00
|
|
|
/* A string of zero length or excessive length is not a valid number. */
|
|
|
|
if (plen == slen || slen >= LONG_STR_SIZE)
|
2011-03-10 16:16:27 +01:00
|
|
|
return 0;
|
|
|
|
|
2011-04-27 13:24:52 +02:00
|
|
|
/* Special case: first and only digit is 0. */
|
|
|
|
if (slen == 1 && p[0] == '0') {
|
|
|
|
if (value != NULL) *value = 0;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2018-07-24 10:27:20 +02:00
|
|
|
/* Handle negative numbers: just set a flag and continue like if it
|
|
|
|
* was a positive number. Later convert into negative. */
|
2011-03-10 16:16:27 +01:00
|
|
|
if (p[0] == '-') {
|
|
|
|
negative = 1;
|
|
|
|
p++; plen++;
|
|
|
|
|
|
|
|
/* Abort on only a negative sign. */
|
|
|
|
if (plen == slen)
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-05-01 15:36:47 +02:00
|
|
|
/* First digit should be 1-9, otherwise the string should just be 0. */
|
2011-03-10 16:16:27 +01:00
|
|
|
if (p[0] >= '1' && p[0] <= '9') {
|
|
|
|
v = p[0]-'0';
|
|
|
|
p++; plen++;
|
|
|
|
} else {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-24 10:27:20 +02:00
|
|
|
/* Parse all the other digits, checking for overflow at every step. */
|
2011-03-10 16:16:27 +01:00
|
|
|
while (plen < slen && p[0] >= '0' && p[0] <= '9') {
|
|
|
|
if (v > (ULLONG_MAX / 10)) /* Overflow. */
|
|
|
|
return 0;
|
|
|
|
v *= 10;
|
|
|
|
|
|
|
|
if (v > (ULLONG_MAX - (p[0]-'0'))) /* Overflow. */
|
|
|
|
return 0;
|
|
|
|
v += p[0]-'0';
|
|
|
|
|
|
|
|
p++; plen++;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Return if not all bytes were used. */
|
|
|
|
if (plen < slen)
|
|
|
|
return 0;
|
|
|
|
|
2018-07-24 10:27:20 +02:00
|
|
|
/* Convert to negative if needed, and do the final overflow check when
|
|
|
|
* converting from unsigned long long to long long. */
|
2011-03-10 16:16:27 +01:00
|
|
|
if (negative) {
|
2011-04-27 13:24:52 +02:00
|
|
|
if (v > ((unsigned long long)(-(LLONG_MIN+1))+1)) /* Overflow. */
|
2011-03-10 16:16:27 +01:00
|
|
|
return 0;
|
|
|
|
if (value != NULL) *value = -v;
|
|
|
|
} else {
|
|
|
|
if (v > LLONG_MAX) /* Overflow. */
|
|
|
|
return 0;
|
|
|
|
if (value != NULL) *value = v;
|
|
|
|
}
|
|
|
|
return 1;
|
|
|
|
}
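The parsing loop in string2ll() checks for overflow twice per digit: once before multiplying by 10 and once before adding the digit. The sketch below isolates that step; accumulate_decimal_digit is a hypothetical name used only for this illustration.

```c
#include <limits.h>

/* Hypothetical helper (not part of util.c): computes *acc * 10 + digit.
 * Returns 1 and updates *acc on success, or returns 0 and leaves *acc
 * untouched if the result would not fit in an unsigned long long.
 * 'digit' is assumed to be in the range 0..9. */
int accumulate_decimal_digit(unsigned long long *acc, unsigned digit) {
    if (*acc > ULLONG_MAX / 10) return 0;      /* The multiply would overflow. */
    unsigned long long scaled = *acc * 10;
    if (scaled > ULLONG_MAX - digit) return 0; /* The add would overflow. */
    *acc = scaled + digit;
    return 1;
}
```

Checking before each operation, rather than after, avoids relying on wrapped values to detect the overflow.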
|
|
|
|
|
2019-11-04 08:50:29 +02:00
|
|
|
/* Helper function to convert a string to an unsigned long long value.
|
|
|
|
* The function attempts to use the faster string2ll() function inside
|
2024-04-23 20:20:35 +08:00
|
|
|
* Valkey: if it fails, strtoull() is used instead. The function returns
|
2019-11-04 08:50:29 +02:00
|
|
|
* 1 if the conversion happened successfully or 0 if the number is
|
|
|
|
* invalid or out of range. */
|
|
|
|
int string2ull(const char *s, unsigned long long *value) {
|
|
|
|
long long ll;
|
|
|
|
if (string2ll(s,strlen(s),&ll)) {
|
|
|
|
if (ll < 0) return 0; /* Negative values are out of range. */
|
|
|
|
*value = ll;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
errno = 0;
|
|
|
|
char *endptr = NULL;
|
|
|
|
*value = strtoull(s,&endptr,10);
|
|
|
|
if (errno == EINVAL || errno == ERANGE || !(*s != '\0' && *endptr == '\0'))
|
|
|
|
return 0; /* strtoull() failed. */
|
|
|
|
return 1; /* Conversion done! */
|
|
|
|
}
|
|
|
|
|
2011-04-27 13:24:52 +02:00
|
|
|
/* Convert a string into a long. Returns 1 if the string could be parsed into a
|
|
|
|
* (non-overflowing) long, 0 otherwise. The value will be set to the parsed
|
|
|
|
* value when appropriate. */
|
2012-01-02 15:24:32 -08:00
|
|
|
int string2l(const char *s, size_t slen, long *lval) {
|
2011-04-27 13:24:52 +02:00
|
|
|
long long llval;
|
|
|
|
|
|
|
|
if (!string2ll(s,slen,&llval))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (llval < LONG_MIN || llval > LONG_MAX)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
*lval = (long)llval;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
Fix async safety in signal handlers (#12658)
see discussion from after https://github.com/redis/redis/pull/12453 was
merged
----
This PR replaces calls that are not considered async-signal-safe
(AS-safe) with safe calls.
#### **1. serverLog() and serverLogFromHandler()**
`serverLog` uses unsafe calls. It was decided that we will **avoid**
`serverLog` calls by the signal handlers when:
* The signal is not fatal, such as SIGALRM. In these cases, we prefer
using `serverLogFromHandler` which is the safe version of `serverLog`.
Note they have different prompts:
`serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>`
`serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>`
* The code was added recently. Calls to `serverLog` by the signal
handler have been there ever since Redis exists and it hasn't caused
problems so far. To avoid regression, from now we should use
`serverLogFromHandler`
#### **2. `snprintf` `fgets` and `strtoul`(base = 16) -------->
`_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex`**
The safe version of `snprintf` was taken from
[here](https://github.com/twitter/twemcache/blob/8cfc4ca5e76ed936bd3786c8cc43ed47e7778c08/src/mc_util.c#L754)
#### **3. fopen(), fgets(), fclose() --------> open(), read(), close()**
#### **4. opendir(), readdir(), closedir() --------> open(),
syscall(SYS_getdents64), close()**
#### **5. Threads_mngr sync mechanisms**
* waiting for the thread to generate stack trace: semaphore -------->
busy-wait
* `globals_rw_lock` was removed: as we are not using malloc and the
semaphore anymore we don't need to protect `ThreadsManager_cleanups`.
#### **6. Stacktraces buffer**
The initial problem was that we were not able to safely call malloc
within the signal handler.
To solve that we created a buffer on the stack of `writeStacktraces` and
saved it in a global pointer, assuming that under normal circumstances,
the function `writeStacktraces` would complete before any thread
attempted to write to it. However, **if threads lag behind, they might
access this global pointer after it no longer belongs to the
`writeStacktraces` stack, potentially corrupting memory.**
To address this, various solutions were discussed
[here](https://github.com/redis/redis/pull/12658#discussion_r1390442896)
Eventually, we decided to **create a pipe** at server startup that will
remain valid as long as the process is alive.
We chose this solution due to its minimal memory usage, and since
`write()` and `read()` are atomic operations. It ensures that stack
traces from different threads won't mix.
**The stacktraces collection process is now as follows:**
* Cleaning the pipe to eliminate writes of late threads from previous
runs.
* Each thread writes to the pipe its stacktrace
* Waiting for all the threads to mark completion or until a timeout (2
sec) is reached
* Reading from the pipe to print the stacktraces.
#### **7. Changes that were considered and eventually were dropped**
* replace watchdog timer with a POSIX timer:
according to [settimer man](https://linux.die.net/man/2/setitimer)
> POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending
the use of the POSIX timers API
([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2),
[timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.)
instead.
However, although it is supposed to conform to POSIX std, POSIX timers
API is not supported on Mac.
You can take a look here at the Linux implementation:
[here](https://github.com/redis/redis/commit/c7562ee13546e504977372fdf40d33c3f86775a5)
To avoid messing up the code, and uncertainty regarding compatibility,
it was decided to drop it for now.
* avoid using sds (uses malloc) in logConfigDebugInfo
It was considered to print config info instead of using sds, however
apparently, `logConfigDebugInfo` does more than just print the sds, so
it was decided this fix is out of this issue scope.
#### **8. fix Signal mask check**
The check `signum & sig_mask` intended to indicate whether the signal is
blocked by the thread was incorrect. Actually, the bit position in the
signal mask corresponds to the signal number. We fixed this by changing
the condition to: `sig_mask & (1L << (sig_num - 1))`
#### **9. Unrelated changes**
both `fork.tcl `and `util.tcl` implemented a function called
`count_log_message` expecting different parameters. This caused
confusion when trying to run daily tests with additional test parameters
to run a specific test.
The `count_log_message` in `fork.tcl` was removed and the calls were
replaced with calls to `count_log_message` located in `util.tcl`
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-23 13:22:20 +02:00
|
|
|
/* Return 1 if c >= start && c <= end, 0 otherwise. */
|
|
|
|
static int safe_is_c_in_range(char c, char start, char end) {
|
|
|
|
if (c >= start && c <= end) return 1;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int base_16_char_type(char c) {
|
|
|
|
if (safe_is_c_in_range(c, '0', '9')) return 0;
|
|
|
|
if (safe_is_c_in_range(c, 'a', 'f')) return 1;
|
|
|
|
if (safe_is_c_in_range(c, 'A', 'F')) return 2;
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/** This is an async-signal-safe function that converts a base-16 string to an unsigned long.
|
|
|
|
* The function translates @param src until it reaches a character that is not 0-9, a-f or A-F, or until slen characters have been read.
|
|
|
|
* On success, writes the result to @param result_output and returns 1.
|
|
|
|
* If the string represents a value that overflows an unsigned long, returns -1. */
|
|
|
|
int string2ul_base16_async_signal_safe(const char *src, size_t slen, unsigned long *result_output) {
|
|
|
|
static char ascii_to_dec[] = {'0', 'a' - 10, 'A' - 10};
|
|
|
|
|
|
|
|
int char_type = 0;
|
|
|
|
size_t curr_char_idx = 0;
|
|
|
|
unsigned long result = 0;
|
|
|
|
int base = 16;
|
|
|
|
while (curr_char_idx < slen &&
(-1 != (char_type = base_16_char_type(src[curr_char_idx])))) {
|
|
|
|
unsigned long curr_val = src[curr_char_idx] - ascii_to_dec[char_type];
|
|
|
|
if ((result > ULONG_MAX / base) || (result > (ULONG_MAX - curr_val)/base)) /* Overflow. */
|
|
|
|
return -1;
|
|
|
|
result = result * base + curr_val;
|
|
|
|
++curr_char_idx;
|
|
|
|
}
|
|
|
|
|
|
|
|
*result_output = result;
|
|
|
|
return 1;
|
|
|
|
}
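The hex parser above relies only on comparisons and integer arithmetic, which is what keeps it usable from a signal handler. A compact sketch of the same approach is shown below, with the length check performed before each character is read. The names hex_digit_value and parse_hex_prefix are hypothetical and exist only in this example.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise. */
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch: parses the leading hex digits of 'src', reading at
 * most 'slen' bytes. Returns 1 and stores the value in *out, or returns -1
 * if the value does not fit in an unsigned long. */
int parse_hex_prefix(const char *src, size_t slen, unsigned long *out) {
    unsigned long result = 0;
    for (size_t i = 0; i < slen; i++) {   /* Bounds check before the read. */
        int d = hex_digit_value(src[i]);
        if (d < 0) break;                 /* First non-hex character stops parsing. */
        /* result * 16 + d fits iff result <= (ULONG_MAX - d) / 16. */
        if (result > (ULONG_MAX - (unsigned long)d) / 16) return -1;
        result = result * 16 + (unsigned long)d;
    }
    *out = result;
    return 1;
}
```

As in the original, an input with no leading hex digits yields 0 with a return value of 1, so callers that need to reject empty input have to check for it separately.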
|
|
|
|
|
2015-09-10 17:26:48 +02:00
|
|
|
/* Convert a string into a long double. Returns 1 if the string could be parsed
|
|
|
|
* into a (non-overflowing) long double, 0 otherwise. The value will be set to
|
|
|
|
* the parsed value when appropriate.
|
|
|
|
*
|
|
|
|
* Note that this function demands that the string strictly represents
|
|
|
|
* a long double: no spaces or other characters before or after the string
|
|
|
|
* representing the number are accepted. */
|
2015-11-04 17:16:34 +01:00
|
|
|
int string2ld(const char *s, size_t slen, long double *dp) {
|
2019-01-28 17:58:11 +02:00
|
|
|
char buf[MAX_LONG_DOUBLE_CHARS];
|
2015-11-04 17:16:34 +01:00
|
|
|
long double value;
|
2015-09-23 09:33:23 +02:00
|
|
|
char *eptr;
|
|
|
|
|
2020-01-30 18:14:45 +05:30
|
|
|
if (slen == 0 || slen >= sizeof(buf)) return 0;
|
2015-09-23 09:33:23 +02:00
|
|
|
memcpy(buf,s,slen);
|
|
|
|
buf[slen] = '\0';
|
2015-09-10 17:26:48 +02:00
|
|
|
|
|
|
|
errno = 0;
|
2015-11-04 17:16:34 +01:00
|
|
|
value = strtold(buf, &eptr);
|
2015-09-23 09:33:23 +02:00
|
|
|
if (isspace(buf[0]) || eptr[0] != '\0' ||
|
2020-01-30 18:14:45 +05:30
|
|
|
(size_t)(eptr-buf) != slen ||
|
2015-09-10 17:26:48 +02:00
|
|
|
(errno == ERANGE &&
|
2022-05-10 13:55:09 +02:00
|
|
|
(value == HUGE_VAL || value == -HUGE_VAL || fpclassify(value) == FP_ZERO)) ||
|
2015-09-10 17:26:48 +02:00
|
|
|
errno == EINVAL ||
|
|
|
|
isnan(value))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (dp) *dp = value;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2019-11-03 15:02:25 +02:00
|
|
|
/* Convert a string into a double. Returns 1 if the string could be parsed
|
|
|
|
* into a (non-overflowing) double, 0 otherwise. The value will be set to
|
|
|
|
* the parsed value when appropriate.
|
|
|
|
*
|
|
|
|
* Note that this function demands that the string strictly represents
|
|
|
|
* a double: no spaces or other characters before or after the string
|
|
|
|
* representing the number are accepted. */
|
|
|
|
int string2d(const char *s, size_t slen, double *dp) {
|
|
|
|
errno = 0;
|
|
|
|
char *eptr;
|
|
|
|
*dp = strtod(s, &eptr);
|
|
|
|
if (slen == 0 ||
|
|
|
|
isspace(((const char*)s)[0]) ||
|
|
|
|
(size_t)(eptr-(char*)s) != slen ||
|
|
|
|
(errno == ERANGE &&
|
2022-05-10 13:55:09 +02:00
|
|
|
(*dp == HUGE_VAL || *dp == -HUGE_VAL || fpclassify(*dp) == FP_ZERO)) ||
|
2019-11-03 15:02:25 +02:00
|
|
|
isnan(*dp))
|
|
|
|
return 0;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
Optimize integer zset scores in listpack (converting to string and back) (#10486)
When the score doesn't have fractional part, and can be stored as an integer,
we use the integer capabilities of listpack to store it, rather than convert it to string.
This already existed before this PR (lpInsert does that conversion implicitly).
But to do that, we would have first converted the score from double to string (calling `d2string`),
then pass the string to `lpAppend` which identified it as being an integer and convert it back to an int.
Now, instead of converting it to a string, we store it using `lpAppendInteger`.
Unrelated:
---
* Fix the double2ll range check (negative and positive ranges, and also the comparison operands
were slightly off. but also, the range could be made much larger, see comment).
* Unify the double to string conversion code in rdb.c with the one in util.c
* Small optimization in lpStringToInt64, don't attempt to convert strings that are obviously too long.
Benchmark;
---
Up to 20% improvement in certain tight loops doing zzlInsert with large integers.
(if listpack is pre-allocated to avoid realloc, and insertion is sorted from largest to smaller)
2022-04-17 17:16:46 +03:00
|
|
|
/* Returns 1 if the double value can safely be represented in long long without
|
|
|
|
* precision loss, in which case the corresponding long long is stored in the out variable. */
|
|
|
|
int double2ll(double d, long long *out) {
|
|
|
|
#if (DBL_MANT_DIG >= 52) && (DBL_MANT_DIG <= 63) && (LLONG_MAX == 0x7fffffffffffffffLL)
|
|
|
|
/* Check if the double is in a safe range to be cast into a
|
|
|
|
* long long. We are assuming that long long is 64 bit here.
|
|
|
|
* Also we are assuming that there are no implementations around where
|
|
|
|
* double has precision < 52 bit.
|
|
|
|
*
|
|
|
|
* Under these assumptions we test if a double is inside a range
|
|
|
|
* where casting to long long is safe. Then using two castings we
|
|
|
|
* make sure the decimal part is zero. If all this is true we can use
|
|
|
|
* integer without precision loss.
|
|
|
|
*
|
|
|
|
* Note that numbers above 2^52 and below 2^63 use all the fraction bits as real part,
|
|
|
|
* and the exponent bits are positive, which means the "decimal" part must be 0.
|
|
|
|
* i.e. all double values in that range are representable as a long without precision loss,
|
|
|
|
* but not all long values in that range can be represented as a double.
|
|
|
|
* we only care about the first part here. */
|
2022-04-18 13:34:22 +08:00
|
|
|
if (d < (double)(-LLONG_MAX/2) || d > (double)(LLONG_MAX/2))
|
|
|
|
return 0;
|
|
|
|
long long ll = d;
|
|
|
|
if (ll == d) {
|
|
|
|
*out = ll;
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
return 0;
|
|
|
|
}
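The range check followed by a round-trip cast in double2ll() can be sketched in isolation as below. This is an illustrative variant and not the code used in this file: the name double_to_ll_exact is hypothetical, and the explicit NaN guard is an addition made for the example, since converting NaN to an integer is undefined behavior.

```c
#include <limits.h>

/* Hypothetical sketch: returns 1 and stores the integer in *out when 'd'
 * holds an integral value inside a conservative long long range, and
 * returns 0 otherwise. */
int double_to_ll_exact(double d, long long *out) {
    if (d != d) return 0;                        /* NaN never compares equal to itself. */
    if (d < (double)(-LLONG_MAX / 2) || d > (double)(LLONG_MAX / 2))
        return 0;                                /* Outside the safe cast range. */
    long long ll = (long long)d;                 /* Truncates any fractional part. */
    if ((double)ll != d) return 0;               /* A fractional part was lost. */
    *out = ll;
    return 1;
}
```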
|
|
|
|
|
2011-03-08 12:30:01 +01:00
|
|
|
/* Convert a double to a string representation. Returns the number of bytes
|
2015-09-15 14:43:14 +02:00
|
|
|
* required. The representation should always be parsable by strtod(3).
|
|
|
|
* This function does not support human-friendly formatting like ld2string
|
2018-07-01 13:24:50 +08:00
|
|
|
* does. It is intended mainly to be used inside t_zset.c when writing scores
|
2021-11-24 19:34:13 +08:00
|
|
|
* into a listpack representing a sorted set. */
|
2011-03-08 12:30:01 +01:00
|
|
|
int d2string(char *buf, size_t len, double value) {
|
|
|
|
if (isnan(value)) {
|
2023-02-06 16:26:40 +00:00
|
|
|
/* Libc in some systems will format nan in a different way,
|
|
|
|
* like nan, -nan, NAN, nan(char-sequence).
|
|
|
|
* So we normalize it and create a single nan form in an explicit way. */
|
2011-03-08 12:30:01 +01:00
|
|
|
len = snprintf(buf,len,"nan");
|
|
|
|
} else if (isinf(value)) {
|
2023-02-06 16:26:40 +00:00
|
|
|
/* Libc in odd systems (Hi Solaris!) will format infinite in a
|
|
|
|
* different way, so better to handle it in an explicit way. */
|
2011-03-08 12:30:01 +01:00
|
|
|
if (value < 0)
|
|
|
|
len = snprintf(buf,len,"-inf");
|
|
|
|
else
|
|
|
|
len = snprintf(buf,len,"inf");
|
|
|
|
} else if (value == 0) {
|
|
|
|
/* See: http://en.wikipedia.org/wiki/Signed_zero, "Comparisons". */
|
|
|
|
if (1.0/value < 0)
|
|
|
|
len = snprintf(buf,len,"-0");
|
|
|
|
else
|
|
|
|
len = snprintf(buf,len,"0");
|
|
|
|
} else {
|
|
|
|
long long lvalue;
|
|
|
|
/* Integer printing function is much faster, check if we can safely use it. */
|
|
|
|
if (double2ll(value, &lvalue))
|
|
|
|
len = ll2string(buf,len,lvalue);
|
optimizing d2string() and addReplyDouble() with grisu2: double to string conversion based on Florian Loitsch's Grisu-algorithm (#10587)
All commands / use cases that heavily rely on double to a string representation conversion,
(e.g. meaning take a double-precision floating-point number like 1.5 and return a string like "1.5" ),
could benefit from a performance boost by swapping snprintf(buf,len,"%.17g",value) by the
equivalent [fpconv_dtoa](https://github.com/night-shift/fpconv) or any other algorithm that ensures
100% coverage of conversion.
This is a well-studied topic and Projects like MongoDB. RedPanda, PyTorch leverage libraries
( fmtlib ) that use the optimized double to string conversion underneath.
The positive impact can be substantial. This PR uses the grisu2 approach ( grisu explained on
https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf section 5 ).
test suite changes:
Despite being compatible, in some cases it produces a different result from printf, and some tests
had to be adjusted.
one case is that `%.17g` (which means %e or %f which ever is shorter), chose to use `5000000000`
instead of 5e+9, which sounds like a bug?
In other cases, we changed TCL to compare numbers instead of strings to ignore minor rounding
issues (`expr 0.8 == 0.79999999999999999`)
2022-10-15 10:17:41 +01:00
|
|
|
else {
|
|
|
|
len = fpconv_dtoa(value, buf);
|
|
|
|
buf[len] = '\0';
|
|
|
|
}
|
2011-03-08 12:30:01 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
return len;
|
2022-01-09 17:04:18 -08:00
|
|
|
}
|
|
|
|
|
2022-12-04 08:11:38 +00:00
|
|
|
/* Convert a double into a string with 'fractional_digits' digits after the dot precision.
|
|
|
|
* This is an optimized version of snprintf "%.<fractional_digits>f".
|
|
|
|
* We multiply the double by 10 ^ <fractional_digits> to shift the decimal places
* and round the result to a long long.
|
|
|
|
* Note that multiplying the input value by 10 ^ <fractional_digits> can overflow, but in the
* scenarios where we currently use this function within the server that is not possible.
|
2022-12-04 08:11:38 +00:00
|
|
|
* After we get the long representation we use the logic from ull2string function on this file
|
|
|
|
* which is based on the following article:
|
|
|
|
* https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920
|
|
|
|
*
|
|
|
|
* Input values:
|
|
|
|
* dst: the buffer to store the string representation
|
|
|
|
* dstlen: the buffer length
|
|
|
|
* dvalue: the input double
|
|
|
|
* fractional_digits: the number of digits after the decimal point, between 1 and 17
|
|
|
|
*
|
|
|
|
* Return values:
|
|
|
|
* Returns the number of characters needed to represent the number.
|
|
|
|
* If the buffer is not big enough to store the string, 0 is returned.
|
|
|
|
*/
|
|
|
|
int fixedpoint_d2string(char *dst, size_t dstlen, double dvalue, int fractional_digits) {
|
|
|
|
if (fractional_digits < 1 || fractional_digits > 17)
|
|
|
|
goto err;
|
|
|
|
/* Minimum size: 2 (for the leading 0 and the dot) + fractional_digits + 1 (for the null terminator). */
|
|
|
|
if ((int)dstlen < (fractional_digits+3))
|
|
|
|
goto err;
|
|
|
|
if (dvalue == 0) {
|
|
|
|
dst[0] = '0';
|
|
|
|
dst[1] = '.';
|
2022-12-15 20:25:38 +00:00
|
|
|
memset(dst + 2, '0', fractional_digits);
|
2022-12-04 08:11:38 +00:00
|
|
|
dst[fractional_digits+2] = '\0';
|
|
|
|
return fractional_digits + 2;
|
|
|
|
}
|
|
|
|
/* scale and round */
|
|
|
|
static double powers_of_ten[] = {1.0, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0,
|
|
|
|
10000000.0, 100000000.0, 1000000000.0, 10000000000.0, 100000000000.0, 1000000000000.0,
|
|
|
|
10000000000000.0, 100000000000000.0, 1000000000000000.0, 10000000000000000.0,
|
|
|
|
100000000000000000.0 };
|
|
|
|
long long svalue = llrint(dvalue * powers_of_ten[fractional_digits]);
|
|
|
|
unsigned long long value;
|
|
|
|
/* write sign */
|
|
|
|
int negative = 0;
|
|
|
|
if (svalue < 0) {
|
|
|
|
if (svalue != LLONG_MIN) {
|
|
|
|
value = -svalue;
|
|
|
|
} else {
|
|
|
|
value = ((unsigned long long) LLONG_MAX)+1;
|
|
|
|
}
|
|
|
|
if (dstlen < 2)
|
|
|
|
goto err;
|
|
|
|
negative = 1;
|
|
|
|
dst[0] = '-';
|
|
|
|
dst++;
|
|
|
|
dstlen--;
|
|
|
|
} else {
|
|
|
|
value = svalue;
|
|
|
|
}
|
|
|
|
|
|
|
|
static const char digitsd[201] =
|
|
|
|
"0001020304050607080910111213141516171819"
|
|
|
|
"2021222324252627282930313233343536373839"
|
|
|
|
"4041424344454647484950515253545556575859"
|
|
|
|
"6061626364656667686970717273747576777879"
|
|
|
|
"8081828384858687888990919293949596979899";
|
|
|
|
|
|
|
|
/* Check length. */
|
|
|
|
uint32_t ndigits = digits10(value);
|
|
|
|
if (ndigits >= dstlen) goto err;
|
|
|
|
int integer_digits = ndigits - fractional_digits;
|
|
|
|
/* Fractional only check to avoid representing 0.7750 as .7750.
|
|
|
|
* This means we need to increment the length and store 0 as the first character.
|
|
|
|
*/
|
|
|
|
if (integer_digits < 1) {
|
|
|
|
dst[0] = '0';
|
|
|
|
integer_digits = 1;
|
|
|
|
}
|
|
|
|
dst[integer_digits] = '.';
|
|
|
|
int size = integer_digits + 1 + fractional_digits;
|
2022-12-15 20:25:38 +00:00
|
|
|
/* fill with 0 from fractional digits until size */
|
|
|
|
memset(dst + integer_digits + 1, '0', fractional_digits);
|
2022-12-04 08:11:38 +00:00
|
|
|
int next = size - 1;
|
|
|
|
while (value >= 100) {
|
|
|
|
int const i = (value % 100) * 2;
|
|
|
|
value /= 100;
|
|
|
|
dst[next] = digitsd[i + 1];
|
|
|
|
dst[next - 1] = digitsd[i];
|
|
|
|
next -= 2;
|
|
|
|
/* dot position */
|
|
|
|
if (next == integer_digits) {
|
|
|
|
next--;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Handle last 1-2 digits. */
|
|
|
|
if (value < 10) {
|
|
|
|
dst[next] = '0' + (uint32_t) value;
|
|
|
|
} else {
|
|
|
|
int i = (uint32_t) value * 2;
|
|
|
|
dst[next] = digitsd[i + 1];
|
|
|
|
dst[next - 1] = digitsd[i];
|
|
|
|
}
|
|
|
|
/* Null term. */
|
|
|
|
dst[size] = '\0';
|
|
|
|
return size + negative;
|
|
|
|
err:
|
|
|
|
/* force add Null termination */
|
|
|
|
if (dstlen > 0)
|
|
|
|
dst[0] = '\0';
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2022-01-09 17:04:18 -08:00
|
|
|
/* Trims off trailing zeros from a string representing a double. */
|
|
|
|
int trimDoubleString(char *buf, size_t len) {
|
|
|
|
if (strchr(buf,'.') != NULL) {
|
|
|
|
char *p = buf+len-1;
|
|
|
|
while(*p == '0') {
|
|
|
|
p--;
|
|
|
|
len--;
|
|
|
|
}
|
|
|
|
if (*p == '.') len--;
|
|
|
|
}
|
|
|
|
buf[len] = '\0';
|
|
|
|
return len;
|
2011-03-08 12:30:01 +01:00
|
|
|
}
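The trimming logic above can be tried on its own with a small self-contained variant. The sketch below follows the same steps (strip trailing zeros only when a dot is present, then drop a dangling dot) and adds explicit length guards. The name trim_trailing_zeros is hypothetical, and the buffer is assumed to have room for the terminator at buf[len].

```c
#include <string.h>

/* Hypothetical sketch (not part of util.c): trims trailing zeros after the
 * decimal point in the first 'len' bytes of 'buf', null-terminates the
 * result and returns the new length. */
size_t trim_trailing_zeros(char *buf, size_t len) {
    if (memchr(buf, '.', len) != NULL) {
        while (len > 0 && buf[len - 1] == '0') len--;
        if (len > 0 && buf[len - 1] == '.') len--;  /* Drop a dangling dot. */
    }
    buf[len] = '\0';
    return len;
}
```

With this variant, "1.500" becomes "1.5", "2.000" becomes "2", and "100" is left unchanged because it contains no dot.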
|
|
|
|
|
2019-11-03 16:42:31 +02:00
|
|
|
/* Create a string object from a long double.
|
|
|
|
* If mode is humanfriendly it does not use exponential format and trims trailing
|
|
|
|
* zeroes at the end (may result in loss of precision).
|
|
|
|
* If mode is default exp format is used and the output of snprintf()
|
|
|
|
* is not modified (may result in loss of precision).
|
|
|
|
* If mode is hex hexadecimal format is used (no loss of precision)
|
2015-09-15 14:43:14 +02:00
|
|
|
*
|
|
|
|
* The function returns the length of the string or zero if there was not
|
|
|
|
* enough buffer room to store it. */
|
2019-11-03 16:42:31 +02:00
|
|
|
int ld2string(char *buf, size_t len, long double value, ld2string_mode mode) {
|
|
|
|
size_t l = 0;
|
2015-09-15 14:43:14 +02:00
|
|
|
|
|
|
|
if (isinf(value)) {
|
|
|
|
/* Libc in odd systems (Hi Solaris!) will format infinite in a
|
|
|
|
* different way, so better to handle it in an explicit way. */
|
2022-07-18 10:56:26 +03:00
|
|
|
if (len < 5) goto err; /* No room. 5 is "-inf\0" */
|
2015-09-15 14:43:14 +02:00
|
|
|
if (value > 0) {
|
|
|
|
memcpy(buf,"inf",3);
|
|
|
|
l = 3;
|
|
|
|
} else {
|
|
|
|
memcpy(buf,"-inf",4);
|
|
|
|
l = 4;
|
|
|
|
}
|
2022-12-09 01:29:30 +08:00
|
|
|
} else if (isnan(value)) {
|
|
|
|
/* Libc in some systems will format nan in a different way,
|
|
|
|
* like nan, -nan, NAN, nan(char-sequence).
|
|
|
|
* So we normalize it and create a single nan form in an explicit way. */
|
|
|
|
if (len < 4) goto err; /* No room. 4 is "nan\0" */
|
|
|
|
memcpy(buf, "nan", 3);
|
|
|
|
l = 3;
|
2019-11-03 16:42:31 +02:00
|
|
|
} else {
|
|
|
|
switch (mode) {
|
|
|
|
case LD_STR_AUTO:
|
|
|
|
l = snprintf(buf,len,"%.17Lg",value);
|
2022-07-18 10:56:26 +03:00
|
|
|
if (l+1 > len) goto err; /* No room. */
|
2019-11-03 16:42:31 +02:00
|
|
|
break;
|
|
|
|
case LD_STR_HEX:
|
|
|
|
l = snprintf(buf,len,"%La",value);
|
2022-07-18 10:56:26 +03:00
|
|
|
if (l+1 > len) goto err; /* No room. */
|
2019-11-03 16:42:31 +02:00
|
|
|
break;
|
|
|
|
case LD_STR_HUMAN:
|
|
|
|
/* We use 17 digits precision since with 128 bit floats that precision
|
|
|
|
* after rounding is able to represent most small decimal numbers in a
|
|
|
|
* way that is "non surprising" for the user (that is, most small
|
|
|
|
* decimal numbers will be represented in a way that when converted
|
|
|
|
* back into a string are exactly the same as what the user typed.) */
|
|
|
|
l = snprintf(buf,len,"%.17Lf",value);
|
2022-07-18 10:56:26 +03:00
|
|
|
if (l+1 > len) goto err; /* No room. */
|
2019-11-03 16:42:31 +02:00
|
|
|
/* Now remove trailing zeroes after the '.' */
|
|
|
|
if (strchr(buf,'.') != NULL) {
|
|
|
|
char *p = buf+l-1;
|
|
|
|
while(*p == '0') {
|
|
|
|
p--;
|
|
|
|
l--;
|
|
|
|
}
|
|
|
|
if (*p == '.') l--;
|
2015-09-15 14:43:14 +02:00
|
|
|
}
|
2019-11-05 19:23:37 +05:30
|
|
|
if (l == 2 && buf[0] == '-' && buf[1] == '0') {
|
|
|
|
buf[0] = '0';
|
|
|
|
l = 1;
|
|
|
|
}
|
2019-11-03 16:42:31 +02:00
|
|
|
break;
|
2022-07-18 10:56:26 +03:00
|
|
|
default: goto err; /* Invalid mode. */
|
2015-09-15 14:43:14 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
buf[l] = '\0';
|
|
|
|
return l;
|
2022-07-18 10:56:26 +03:00
|
|
|
err:
|
|
|
|
/* Force null termination. */
|
|
|
|
if (len > 0)
|
|
|
|
buf[0] = '\0';
|
|
|
|
return 0;
|
2015-09-15 14:43:14 +02:00
|
|
|
}
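The `LD_STR_HUMAN` branch above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper name `format_human` that is not part of this file; it mirrors only the fixed-precision formatting, the trailing-zero trimming, and the `-0` normalization, and leaves out the inf/nan handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the original file): format `value` with 17
 * fractional digits, then strip trailing zeroes. Returns the string length,
 * or 0 if `buf` is too small. */
size_t format_human(char *buf, size_t len, long double value) {
    assert(buf != NULL);
    int n = snprintf(buf, len, "%.17Lf", value);
    if (n < 0 || (size_t)n + 1 > len) {
        if (len > 0) buf[0] = '\0'; /* No room: return an empty string. */
        return 0;
    }
    size_t l = (size_t)n;
    if (strchr(buf, '.') != NULL) {
        char *p = buf + l - 1;
        while (*p == '0') { /* Drop zeroes after the '.' */
            p--;
            l--;
        }
        if (*p == '.') l--; /* Nothing left after the dot: drop it too. */
    }
    if (l == 2 && buf[0] == '-' && buf[1] == '0') { /* "-0" becomes "0" */
        buf[0] = '0';
        l = 1;
    }
    buf[l] = '\0';
    return l;
}
```

With this sketch, `3.5L` formats as `3.5`, `10.0L` as `10`, and `-0.0L` as `0`. Values that are not exactly representable in binary may keep non-zero trailing digits, depending on the platform's `long double` width.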
|
|
|
|
|
2018-04-05 13:24:22 +02:00
|
|
|
/* Get random bytes, attempts to get an initial seed from /dev/urandom and
|
|
|
|
* then uses a one way hash function in counter mode to generate a random
|
|
|
|
* stream. However if /dev/urandom is not available, a weaker seed is used.
|
|
|
|
*
|
|
|
|
* This function is not thread safe, since the state is global. */
|
|
|
|
void getRandomBytes(unsigned char *p, size_t len) {
|
2015-01-22 11:00:26 +01:00
|
|
|
/* Global state. */
|
|
|
|
static int seed_initialized = 0;
|
2020-04-23 11:17:42 +02:00
|
|
|
static unsigned char seed[64]; /* 512 bit internal block size. */
|
2015-01-22 11:00:26 +01:00
|
|
|
static uint64_t counter = 0; /* The counter we hash with the seed. */
|
2015-01-21 23:19:37 +01:00
|
|
|
|
|
|
|
if (!seed_initialized) {
|
|
|
|
/* Initialize a seed and use SHA256-HMAC in counter mode, where we hash
|
|
|
|
* the same seed with a progressive counter. For the goals of this
|
|
|
|
* function we just need non-colliding strings, there are no
|
|
|
|
* cryptographic security needs. */
|
|
|
|
FILE *fp = fopen("/dev/urandom","r");
|
2018-04-05 13:24:22 +02:00
|
|
|
if (fp == NULL || fread(seed,sizeof(seed),1,fp) != 1) {
|
|
|
|
/* Revert to a weaker seed, and in this case reseed again
|
|
|
|
* at every call.*/
|
|
|
|
for (unsigned int j = 0; j < sizeof(seed); j++) {
|
|
|
|
struct timeval tv;
|
|
|
|
gettimeofday(&tv,NULL);
|
|
|
|
pid_t pid = getpid();
|
|
|
|
seed[j] = tv.tv_sec ^ tv.tv_usec ^ pid ^ (long)fp;
|
|
|
|
}
|
|
|
|
} else {
|
2015-01-21 23:19:37 +01:00
|
|
|
seed_initialized = 1;
|
2018-04-05 13:24:22 +02:00
|
|
|
}
|
2015-01-21 23:19:37 +01:00
|
|
|
if (fp) fclose(fp);
|
|
|
|
}
|
2012-03-08 10:08:44 +01:00
|
|
|
|
2018-04-05 13:24:22 +02:00
|
|
|
while(len) {
|
2020-04-23 11:17:42 +02:00
|
|
|
/* This implements SHA256-HMAC. */
|
|
|
|
unsigned char digest[SHA256_BLOCK_SIZE];
|
|
|
|
unsigned char kxor[64];
|
|
|
|
unsigned int copylen =
|
|
|
|
len > SHA256_BLOCK_SIZE ? SHA256_BLOCK_SIZE : len;
|
|
|
|
|
|
|
|
/* IKEY: key xored with 0x36. */
|
|
|
|
memcpy(kxor,seed,sizeof(kxor));
|
|
|
|
for (unsigned int i = 0; i < sizeof(kxor); i++) kxor[i] ^= 0x36;
|
|
|
|
|
|
|
|
/* Obtain HASH(IKEY||MESSAGE). */
|
|
|
|
SHA256_CTX ctx;
|
|
|
|
sha256_init(&ctx);
|
|
|
|
sha256_update(&ctx,kxor,sizeof(kxor));
|
|
|
|
sha256_update(&ctx,(unsigned char*)&counter,sizeof(counter));
|
|
|
|
sha256_final(&ctx,digest);
|
|
|
|
|
|
|
|
/* OKEY: key xored with 0x5c. */
|
|
|
|
memcpy(kxor,seed,sizeof(kxor));
|
|
|
|
for (unsigned int i = 0; i < sizeof(kxor); i++) kxor[i] ^= 0x5C;
|
|
|
|
|
|
|
|
/* Obtain HASH(OKEY || HASH(IKEY||MESSAGE)). */
|
|
|
|
sha256_init(&ctx);
|
|
|
|
sha256_update(&ctx,kxor,sizeof(kxor));
|
|
|
|
sha256_update(&ctx,digest,SHA256_BLOCK_SIZE);
|
|
|
|
sha256_final(&ctx,digest);
|
|
|
|
|
|
|
|
/* Increment the counter for the next iteration. */
|
2018-04-05 13:24:22 +02:00
|
|
|
counter++;
|
|
|
|
|
|
|
|
memcpy(p,digest,copylen);
|
|
|
|
len -= copylen;
|
|
|
|
p += copylen;
|
2012-03-08 10:08:44 +01:00
|
|
|
}
|
|
|
|
}
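The two key-derivation steps inside the loop above (IKEY and OKEY) can be sketched on their own. This is a minimal illustration, assuming a hypothetical helper `derive_pads` and constant `KEY_BLOCK`; it covers only the XOR padding, not the SHA256 calls.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_BLOCK 64 /* Matches the 64-byte seed/kxor buffers in the source. */

/* Hypothetical helper (not in the original file): derive the inner and outer
 * HMAC keys by XOR-ing every key byte with 0x36 and 0x5C respectively. */
void derive_pads(const unsigned char key[KEY_BLOCK],
                 unsigned char ikey[KEY_BLOCK],
                 unsigned char okey[KEY_BLOCK]) {
    assert(key != NULL && ikey != NULL && okey != NULL);
    for (size_t i = 0; i < KEY_BLOCK; i++) {
        ikey[i] = (unsigned char)(key[i] ^ 0x36); /* IKEY */
        okey[i] = (unsigned char)(key[i] ^ 0x5C); /* OKEY */
    }
}
```

The function above then computes HASH(OKEY || HASH(IKEY || counter)) and increments the counter for each output block, so every block comes from a distinct message under the same key.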
|
|
|
|
|
2024-04-09 01:24:03 -07:00
|
|
|
/* Generate the server "Run ID", a SHA1-sized random number that identifies a
|
|
|
|
* given execution of the server, so that if you are talking with an instance
|
2018-04-05 13:24:22 +02:00
|
|
|
* having run_id == A, and you reconnect and it has run_id == B, you can be
|
|
|
|
* sure that it is either a different instance or it was restarted. */
|
|
|
|
void getRandomHexChars(char *p, size_t len) {
|
|
|
|
char *charset = "0123456789abcdef";
|
|
|
|
size_t j;
|
|
|
|
|
|
|
|
getRandomBytes((unsigned char*)p,len);
|
|
|
|
for (j = 0; j < len; j++) p[j] = charset[p[j] & 0x0F];
|
|
|
|
}
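The byte-to-hex mapping used above keeps only the low 4 bits of each random byte. A minimal sketch with fixed input makes this visible; `bytes_to_hex_chars` is a hypothetical name, and the random source is replaced by caller-supplied bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original file): replace each byte in `p`
 * with the hex digit selected by its low nibble. The result is NOT
 * NUL-terminated, as in the function above. */
void bytes_to_hex_chars(char *p, size_t len) {
    static const char charset[] = "0123456789abcdef";
    assert(p != NULL || len == 0);
    for (size_t j = 0; j < len; j++) p[j] = charset[p[j] & 0x0F];
}
```

For example, the bytes `0x00 0x1f 0xa7 0xff` become the characters `0`, `f`, `7`, `f`. Callers that need a C string have to add the terminator themselves.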
|
|
|
|
|
2013-07-02 11:56:52 +02:00
|
|
|
/* Given the filename, return the absolute path as an SDS string, or NULL
|
|
|
|
* if it fails for some reason. Note that "filename" may be an absolute path
|
|
|
|
* already, this will be detected and handled correctly.
|
|
|
|
*
|
|
|
|
* The function does not try to normalize everything, but only the obvious
|
2018-11-15 16:55:40 +08:00
|
|
|
* case of one or more "../" appearing at the start of "filename"
|
2013-07-02 11:56:52 +02:00
|
|
|
* relative path. */
|
|
|
|
sds getAbsolutePath(char *filename) {
|
|
|
|
char cwd[1024];
|
|
|
|
sds abspath;
|
|
|
|
sds relpath = sdsnew(filename);
|
|
|
|
|
|
|
|
relpath = sdstrim(relpath," \r\n\t");
|
|
|
|
if (relpath[0] == '/') return relpath; /* Path is already absolute. */
|
|
|
|
|
|
|
|
/* If path is relative, join cwd and relative path. */
|
|
|
|
if (getcwd(cwd,sizeof(cwd)) == NULL) {
|
|
|
|
sdsfree(relpath);
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
abspath = sdsnew(cwd);
|
|
|
|
if (sdslen(abspath) && abspath[sdslen(abspath)-1] != '/')
|
|
|
|
abspath = sdscat(abspath,"/");
|
|
|
|
|
|
|
|
/* At this point we have the current path always ending with "/", and
|
|
|
|
* the trimmed relative path. Try to normalize the obvious case of
|
|
|
|
* leading ../ elements at the start of the path.
|
|
|
|
*
|
|
|
|
* For every "../" we find in the filename, we remove it and also remove
|
|
|
|
* the last element of the cwd, unless the current cwd is "/". */
|
|
|
|
while (sdslen(relpath) >= 3 &&
|
|
|
|
relpath[0] == '.' && relpath[1] == '.' && relpath[2] == '/')
|
|
|
|
{
|
2013-07-24 18:59:36 +02:00
|
|
|
sdsrange(relpath,3,-1);
|
2013-07-02 11:56:52 +02:00
|
|
|
if (sdslen(abspath) > 1) {
|
|
|
|
char *p = abspath + sdslen(abspath)-2;
|
|
|
|
int trimlen = 1;
|
|
|
|
|
|
|
|
while(*p != '/') {
|
|
|
|
p--;
|
|
|
|
trimlen++;
|
|
|
|
}
|
2013-07-24 18:59:36 +02:00
|
|
|
sdsrange(abspath,0,-(trimlen+1));
|
2013-07-02 11:56:52 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Finally glue the two parts together. */
|
|
|
|
abspath = sdscatsds(abspath,relpath);
|
|
|
|
sdsfree(relpath);
|
|
|
|
return abspath;
|
|
|
|
}
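The "../" handling above can be sketched without SDS strings. This is a minimal illustration using plain buffers, assuming a hypothetical function `join_path` and an absolute `cwd` that starts with '/'; it is not the implementation used by the server.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in the original file): join an absolute `cwd`
 * with a relative path, consuming leading "../" elements. Returns 0 on
 * success, -1 if `out` is too small or `cwd` is empty. */
int join_path(const char *cwd, const char *rel, char *out, size_t outlen) {
    assert(cwd != NULL && rel != NULL && out != NULL);
    size_t n = strlen(cwd);
    if (n == 0 || n + 2 > outlen) return -1;
    memcpy(out, cwd, n + 1);
    if (out[n - 1] != '/') { /* Make the directory end with "/". */
        out[n++] = '/';
        out[n] = '\0';
    }
    while (strncmp(rel, "../", 3) == 0) {
        rel += 3;
        if (n > 1) { /* Not at "/" yet: drop the last path element. */
            n--;     /* Skip the trailing slash. */
            while (n > 0 && out[n - 1] != '/') n--;
            out[n] = '\0';
        }
    }
    size_t rlen = strlen(rel);
    if (n + rlen + 1 > outlen) return -1;
    memcpy(out + n, rel, rlen + 1); /* Glue the two parts together. */
    return 0;
}
```

Under these assumptions, `join_path("/a/b", "../x", ...)` yields `/a/x`, while `join_path("/a/b", "c/../d", ...)` yields `/a/b/c/../d`, because only "../" elements at the very start of the relative path are consumed.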
|
|
|
|
|
2018-10-26 14:02:09 +00:00
|
|
|
/*
|
|
|
|
* Gets the proper timezone in a more portable fashion
|
|
|
|
* i.e. timezone variables are Linux specific.
|
|
|
|
*/
|
2021-01-18 01:37:05 -08:00
|
|
|
long getTimeZone(void) {
|
2020-12-13 17:09:54 +02:00
|
|
|
#if defined(__linux__) || defined(__sun)
|
2018-10-26 14:02:09 +00:00
|
|
|
return timezone;
|
|
|
|
#else
|
|
|
|
struct timezone tz;
|
|
|
|
|
2023-11-06 21:10:09 +08:00
|
|
|
gettimeofday(NULL, &tz);
|
2018-10-26 14:02:09 +00:00
|
|
|
|
2021-01-18 01:37:05 -08:00
|
|
|
return tz.tz_minuteswest * 60L;
|
2018-10-26 14:02:09 +00:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2013-07-02 12:08:07 +02:00
|
|
|
/* Return true if the specified path is just a file basename without any
|
|
|
|
* relative or absolute path. This function just checks that no / or \
|
|
|
|
* character exists inside the specified path, that's enough in the
|
2024-04-09 01:24:03 -07:00
|
|
|
* environments where the server runs. */
|
2013-07-02 12:08:07 +02:00
|
|
|
int pathIsBaseName(char *path) {
|
|
|
|
return strchr(path,'/') == NULL && strchr(path,'\\') == NULL;
|
|
|
|
}
|
|
|
|
|
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use an AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add an AOFRW limiting measure. When the number of AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
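Item 8 of the commit message above describes an exponential back-off for failed AOF rewrites. A minimal sketch of that schedule follows, assuming hypothetical names (`aofrw_delay_minutes`, `AOFRW_FAIL_THRESHOLD`, `AOFRW_MAX_DELAY_MIN`) and assuming the delay keeps doubling until it hits the 60 minute cap; the actual server code may be structured differently.

```c
#include <assert.h>

#define AOFRW_FAIL_THRESHOLD 3 /* "3 times now" in the commit message. */
#define AOFRW_MAX_DELAY_MIN 60 /* Maximum delay of 1 hour. */

/* Hypothetical helper: minutes to wait before the next automatic AOFRW,
 * given the number of consecutive failures so far. */
int aofrw_delay_minutes(int consecutive_failures) {
    assert(consecutive_failures >= 0);
    if (consecutive_failures < AOFRW_FAIL_THRESHOLD) return 0;
    int doublings = consecutive_failures - AOFRW_FAIL_THRESHOLD;
    if (doublings >= 6) return AOFRW_MAX_DELAY_MIN; /* 2^6 = 64 exceeds the cap. */
    return 1 << doublings; /* 1, 2, 4, 8, 16, 32 minutes. */
}
```

Under these assumptions the third failure gives a 1 minute delay, the fourth 2 minutes, and from the ninth failure onward the delay stays at 60 minutes. A manual `bgrewriteaof` is described as bypassing the limit, which this sketch does not model.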
2022-01-04 01:14:13 +08:00
|
|
|
int fileExist(char *filename) {
|
|
|
|
struct stat statbuf;
|
|
|
|
return stat(filename, &statbuf) == 0 && S_ISREG(statbuf.st_mode);
|
|
|
|
}
|
|
|
|
|
|
|
|
int dirExists(char *dname) {
|
|
|
|
struct stat statbuf;
|
|
|
|
return stat(dname, &statbuf) == 0 && S_ISDIR(statbuf.st_mode);
|
|
|
|
}
|
|
|
|
|
|
|
|
int dirCreateIfMissing(char *dname) {
|
|
|
|
if (mkdir(dname, 0755) != 0) {
|
|
|
|
if (errno != EEXIST) {
|
|
|
|
return -1;
|
|
|
|
} else if (!dirExists(dname)) {
|
|
|
|
errno = ENOTDIR;
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int dirRemove(char *dname) {
|
|
|
|
DIR *dir;
|
|
|
|
struct stat stat_entry;
|
|
|
|
struct dirent *entry;
|
|
|
|
char full_path[PATH_MAX + 1];
|
|
|
|
|
|
|
|
if ((dir = opendir(dname)) == NULL) {
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
while ((entry = readdir(dir)) != NULL) {
|
|
|
|
if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) continue;
|
|
|
|
|
|
|
|
snprintf(full_path, sizeof(full_path), "%s/%s", dname, entry->d_name);
|
|
|
|
|
|
|
|
int fd = open(full_path, O_RDONLY|O_NONBLOCK);
|
|
|
|
if (fd == -1) {
|
|
|
|
closedir(dir);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (fstat(fd, &stat_entry) == -1) {
|
|
|
|
close(fd);
|
|
|
|
closedir(dir);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
close(fd);
|
|
|
|
|
|
|
|
if (S_ISDIR(stat_entry.st_mode) != 0) {
|
|
|
|
if (dirRemove(full_path) == -1) {
|
2023-11-23 03:04:02 -05:00
|
|
|
closedir(dir);
|
2022-01-04 01:14:13 +08:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (unlink(full_path) != 0) {
|
|
|
|
closedir(dir);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (rmdir(dname) != 0) {
|
|
|
|
closedir(dir);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
closedir(dir);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
sds makePath(char *path, char *filename) {
|
|
|
|
return sdscatfmt(sdsempty(), "%s/%s", path, filename);
|
|
|
|
}
|
|
|
|
|
2022-06-21 00:17:23 +08:00
|
|
|
/* Given the filename, sync the corresponding directory.
|
|
|
|
*
|
|
|
|
* Usually a portable and safe pattern to overwrite existing files would be like:
|
|
|
|
* 1. create a new temp file (on the same file system!)
|
|
|
|
* 2. write data to the temp file
|
|
|
|
* 3. fsync() the temp file
|
|
|
|
* 4. rename the temp file to the appropriate name
|
|
|
|
* 5. fsync() the containing directory */
|
|
|
|
int fsyncFileDir(const char *filename) {
|
|
|
|
#ifdef _AIX
|
|
|
|
/* AIX is unable to fsync a directory */
|
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
char temp_filename[PATH_MAX + 1];
|
|
|
|
char *dname;
|
|
|
|
int dir_fd;
|
|
|
|
|
|
|
|
if (strlen(filename) > PATH_MAX) {
|
|
|
|
errno = ENAMETOOLONG;
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* In the glibc implementation dirname may modify its argument. */
|
|
|
|
memcpy(temp_filename, filename, strlen(filename) + 1);
|
|
|
|
dname = dirname(temp_filename);
|
|
|
|
|
|
|
|
dir_fd = open(dname, O_RDONLY);
|
|
|
|
if (dir_fd == -1) {
|
|
|
|
/* Some OSs don't allow us to open directories at all, just
|
|
|
|
* ignore the error in that case */
|
|
|
|
if (errno == EISDIR) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
/* Some OSs don't allow us to fsync directories at all, so we can ignore
|
|
|
|
* those errors. */
|
2024-04-23 20:20:35 +08:00
|
|
|
if (valkey_fsync(dir_fd) == -1 && !(errno == EBADF || errno == EINVAL)) {
|
2022-06-21 00:17:23 +08:00
|
|
|
int save_errno = errno;
|
|
|
|
close(dir_fd);
|
|
|
|
errno = save_errno;
|
|
|
|
return -1;
|
|
|
|
}
|
Fix async safety in signal handlers (#12658)
see discussion from after https://github.com/redis/redis/pull/12453 was
merged
----
This PR replaces calls that are not considered async-signal-safe
(AS-safe) with safe calls.
#### **1. serverLog() and serverLogFromHandler()**
`serverLog` uses unsafe calls. It was decided that we will **avoid**
`serverLog` calls by the signal handlers when:
* The signal is not fatal, such as SIGALRM. In these cases, we prefer
using `serverLogFromHandler` which is the safe version of `serverLog`.
Note they have different prompts:
`serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>`
`serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>`
* The code was added recently. Calls to `serverLog` by the signal
handler have been there ever since Redis exists and it hasn't caused
problems so far. To avoid regression, from now we should use
`serverLogFromHandler`
#### **2. `snprintf` `fgets` and `strtoul`(base = 16) -------->
`_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex`**
The safe version of `snprintf` was taken from
[here](https://github.com/twitter/twemcache/blob/8cfc4ca5e76ed936bd3786c8cc43ed47e7778c08/src/mc_util.c#L754)
#### **3. fopen(), fgets(), fclose() --------> open(), read(), close()**
#### **4. opendir(), readdir(), closedir() --------> open(),
syscall(SYS_getdents64), close()**
#### **5. Threads_mngr sync mechanisms**
* waiting for the thread to generate stack trace: semaphore -------->
busy-wait
* `globals_rw_lock` was removed: as we are not using malloc and the
semaphore anymore we don't need to protect `ThreadsManager_cleanups`.
#### **6. Stacktraces buffer**
The initial problem was that we were not able to safely call malloc
within the signal handler.
To solve that we created a buffer on the stack of `writeStacktraces` and
saved it in a global pointer, assuming that under normal circumstances,
the function `writeStacktraces` would complete before any thread
attempted to write to it. However, **if threads lag behind, they might
access this global pointer after it no longer belongs to the
`writeStacktraces` stack, potentially corrupting memory.**
To address this, various solutions were discussed
[here](https://github.com/redis/redis/pull/12658#discussion_r1390442896)
Eventually, we decided to **create a pipe** at server startup that will
remain valid as long as the process is alive.
We chose this solution due to its minimal memory usage, and since
`write()` and `read()` are atomic operations. It ensures that stack
traces from different threads won't mix.
**The stacktraces collection process is now as follows:**
* Cleaning the pipe to eliminate writes of late threads from previous
runs.
* Each thread writes to the pipe its stacktrace
* Waiting for all the threads to mark completion or until a timeout (2
sec) is reached
* Reading from the pipe to print the stacktraces.
#### **7. Changes that were considered and eventually were dropped**
* replace watchdog timer with a POSIX timer:
according to [setitimer man](https://linux.die.net/man/2/setitimer)
> POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending
the use of the POSIX timers API
([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2),
[timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.)
instead.
However, although it is supposed to conform to POSIX std, POSIX timers
API is not supported on Mac.
You can take a look here at the Linux implementation:
[here](https://github.com/redis/redis/commit/c7562ee13546e504977372fdf40d33c3f86775a5)
To avoid messing up the code, and uncertainty regarding compatibility,
it was decided to drop it for now.
* avoid using sds (uses malloc) in logConfigDebugInfo
It was considered to print config info instead of using sds, however
apparently, `logConfigDebugInfo` does more than just print the sds, so
it was decided this fix is out of this issue scope.
#### **8. fix Signal mask check**
The check `signum & sig_mask` intended to indicate whether the signal is
blocked by the thread was incorrect. Actually, the bit position in the
signal mask corresponds to the signal number. We fixed this by changing
the condition to: `sig_mask & (1L << (sig_num - 1))`
#### **9. Unrelated changes**
both `fork.tcl `and `util.tcl` implemented a function called
`count_log_message` expecting different parameters. This caused
confusion when trying to run daily tests with additional test parameters
to run a specific test.
The `count_log_message` in `fork.tcl` was removed and the calls were
replaced with calls to `count_log_message` located in `util.tcl`
---------
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-23 13:22:20 +02:00
|
|
|
|
2022-06-21 00:17:23 +08:00
|
|
|
close(dir_fd);
|
|
|
|
return 0;
|
|
|
|
}
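The five-step overwrite pattern listed in the comment above `fsyncFileDir` can be sketched end to end. This is a minimal illustration, assuming a hypothetical function `atomic_overwrite`, a `.tmp` suffix for the temporary file, and paths shorter than 4096 bytes; it uses plain `fsync()` rather than the `valkey_fsync` wrapper and does not retry interrupted writes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the whole buffer; a failed or zero-length write is an error here. */
static int write_all(int fd, const char *data, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0) return -1;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace dir/name with `data` following steps 1-5. */
int atomic_overwrite(const char *dir, const char *name, const char *data, size_t len) {
    char tmp[4096], dst[4096];
    assert(dir != NULL && name != NULL && data != NULL);
    int n = snprintf(tmp, sizeof(tmp), "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;
    snprintf(dst, sizeof(dst), "%s/%s", dir, name); /* Shorter than tmp, so it fits. */

    /* Steps 1-3: create the temp file in the same directory, write, fsync. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;
    if (write_all(fd, data, len) == -1 || fsync(fd) == -1) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) == -1) {
        unlink(tmp);
        return -1;
    }
    /* Step 4: rename over the destination. */
    if (rename(tmp, dst) == -1) {
        unlink(tmp);
        return -1;
    }
    /* Step 5: fsync the containing directory, tolerating EBADF/EINVAL. */
    int dir_fd = open(dir, O_RDONLY);
    if (dir_fd == -1) return -1;
    int rc = fsync(dir_fd);
    if (rc == -1 && (errno == EBADF || errno == EINVAL)) rc = 0;
    close(dir_fd);
    return rc;
}
```

Keeping the temp file in the same directory ensures the rename stays within one file system, which is what the "on the same file system!" note in step 1 is about.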
|
|
|
|
|
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but its content stays in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim memory from the page cache or swap anonymous pages out, which may cause jitter in the Redis service.
Consider a concrete scenario: a high-capacity machine hosts many Redis instances, and we are upgrading them together. The page cache on the host grows as RDBs are generated. Once free memory drops to the low watermark (which is more likely on older Linux kernels such as 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, where the `low watermark` is linear in the `min watermark` and there is little headroom for `kswapd` to wake up and reclaim memory), a `direct reclaim` happens, which means the process stalls waiting for memory allocation.
# What the PR does
The PR introduces the ability to reclaim the cache when the RDB is read or written. For reads, incremental reclaim is messy to implement, so the reclaim is done in one go in the background after the load finishes, to avoid blocking the worker thread. For writes, incremental reclaim amortizes the work, so there is no need to move it to the background, and the peak cache watermark is reduced this way.
Two cases are handled specially, replication and restart: both use the cache to speed up processing, so the reclaim is postponed to a suitable time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with a default value of false.
# Things worth noting
1. Although `posix_fadvise` is a POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0.
2. On Linux, `posix_fadvise` only takes effect on pages that have been written back, so a `sync` (or `fsync`, `fdatasync`) is needed to flush dirty pages before `posix_fadvise` when reclaiming the write cache.
# About test
A unit test is added to verify the effect of `posix_fadvise`.
In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
2023-02-12 15:23:29 +08:00
|
|
|
/* free OS pages backed by file */
|
|
|
|
int reclaimFilePageCache(int fd, size_t offset, size_t length) {
|
|
|
|
#ifdef HAVE_FADVISE
|
|
|
|
int ret = posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED);
|
2024-05-03 20:10:33 +02:00
|
|
|
if (ret) {
|
|
|
|
errno = ret;
|
|
|
|
return -1;
|
|
|
|
}
|
2023-02-12 15:23:29 +08:00
|
|
|
return 0;
|
|
|
|
#else
|
|
|
|
UNUSED(fd);
|
|
|
|
UNUSED(offset);
|
|
|
|
UNUSED(length);
|
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
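The commit message for #12658 quoted in this file describes a signal-mask fix: the bit position in the mask corresponds to the signal number, so the test is `sig_mask & (1L << (sig_num - 1))` rather than `signum & sig_mask`. A minimal sketch of that check follows, assuming a hypothetical function `signal_is_blocked` and a 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if signal `sig_num` (1-based) has its bit
 * set in `sig_mask`, 0 otherwise. */
int signal_is_blocked(uint64_t sig_mask, int sig_num) {
    assert(sig_num >= 1 && sig_num <= 64);
    return (sig_mask & (UINT64_C(1) << (sig_num - 1))) != 0;
}
```

For signal number 14 (SIGALRM on Linux) the corresponding mask bit is `1 << 13`. The older expression `14 & (1 << 13)` evaluates to 0 for that mask, which is why it reported a blocked signal as unblocked.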
2023-11-23 13:22:20 +02:00
|
|
|
/** An async-signal-safe line reader modelled on fgets().
 * Reads from fd into the dest buffer one byte at a time and stops once
 * buff_size bytes have been stored or the newline character has been read,
 * whichever comes first. The newline, if any, is kept in dest.
 *
 * Unlike standard fgets(), dest is NOT null-terminated, so callers must zero
 * the buffer beforehand or track the line length themselves.
 *
 * On success, the function returns the same dest parameter. If end-of-file or
 * a read error occurs before the line is complete (that is, before a newline
 * is read or the buffer fills up), a null pointer is returned, even if some
 * bytes were already stored in dest. */
char *fgets_async_signal_safe(char *dest, int buff_size, int fd) {
    for (int i = 0; i < buff_size; i++) {
        /* Read one byte */
        ssize_t bytes_read_count = read(fd, dest + i, 1);
        /* On EOF or error return NULL */
        if (bytes_read_count < 1) {
            return NULL;
        }
        /* we found the end of the line. */
        if (dest[i] == '\n') {
            break;
        }
    }
    return dest;
}

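The byte-at-a-time `read()` loop above can be tried in isolation. The following is a minimal sketch, not the Redis implementation: `read_line_fd` and `demo_read_line` are hypothetical names introduced here, and, as an assumption that differs from the function above, this variant null-terminates the buffer and returns a partial last line at end-of-file, the way standard `fgets()` does. It does not retry on `EINTR`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of the source): reads at most size-1 bytes
 * from fd using only read(2), stops after a newline, and null-terminates.
 * Returns dest, or NULL on a read error or when EOF arrives before any byte. */
static char *read_line_fd(char *dest, int size, int fd) {
    int i = 0;
    if (size < 1) return NULL;
    while (i < size - 1) {
        ssize_t n = read(fd, dest + i, 1);
        if (n < 0) return NULL;        /* read error (EINTR is not retried) */
        if (n == 0) break;             /* end of file */
        if (dest[i++] == '\n') break;  /* keep the newline, like fgets() */
    }
    if (i == 0) return NULL;           /* nothing was read */
    dest[i] = '\0';
    return dest;
}

/* Feeds two lines through a pipe and reads them back. Returns 0 on success. */
static int demo_read_line(void) {
    int fds[2];
    char line[16];
    if (pipe(fds) != 0) return -1;
    if (write(fds[1], "ab\ncd", 5) != 5) return -1;
    close(fds[1]);
    assert(read_line_fd(line, (int)sizeof(line), fds[0]) == line);
    assert(strcmp(line, "ab\n") == 0);
    assert(read_line_fd(line, (int)sizeof(line), fds[0]) == line);
    assert(strcmp(line, "cd") == 0);   /* last line has no newline */
    assert(read_line_fd(line, (int)sizeof(line), fds[0]) == NULL);
    close(fds[0]);
    return 0;
}
```

Calling `demo_read_line()` from a `main()` exercises both the newline and the end-of-file paths. Reading one byte per `read()` call is slow, but it needs no buffering state and no allocation, which is what keeps it usable from a signal handler.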
static const char HEX[] = "0123456789abcdef";

/* Converts val to text in the given base (at most 16), filling the buffer
 * backwards. buf must point at the LAST byte of a buffer large enough for the
 * digits plus the terminating null. Returns a pointer to the first digit. */
static char *u2string_async_signal_safe(int _base, uint64_t val, char *buf) {
    uint32_t base = (uint32_t) _base;
    *buf-- = 0;
    do {
        *buf-- = HEX[val % base];
    } while ((val /= base) != 0);
    return buf + 1;
}

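The backwards-filling conversion above is easy to get wrong at the call site, because the caller passes a pointer to the end of the buffer rather than its start. A minimal standalone sketch of the same idea follows; `u64_to_str` and `demo_u64_to_str` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the source): writes val in the given base
 * (2..16) backwards from `last`, which must point at the final byte of the
 * buffer. Returns a pointer to the first digit. */
static char *u64_to_str(unsigned base, uint64_t val, char *last) {
    static const char digits[] = "0123456789abcdef";
    char *p = last;
    *p = '\0';
    do {
        *--p = digits[val % base];
    } while ((val /= base) != 0);
    return p;
}

/* Returns 0 when every conversion matches the expected text. */
static int demo_u64_to_str(void) {
    char buf[21]; /* 20 decimal digits for UINT64_MAX plus the null */
    char *last = &buf[sizeof(buf) - 1];
    assert(strcmp(u64_to_str(16, 255, last), "ff") == 0);
    assert(strcmp(u64_to_str(10, 0, last), "0") == 0);
    assert(strcmp(u64_to_str(10, UINT64_MAX, last), "18446744073709551615") == 0);
    return 0;
}
```

Filling from the end means the digits come out in the right order without a reversal pass, and the `do`/`while` form guarantees that zero still produces one digit.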
/* Signed counterpart of u2string_async_signal_safe(); buf again points at the
 * last byte of the output buffer. A negative value in base 10 gets a leading
 * '-'. A negative value in base 16 is rendered as its 16-digit two's
 * complement form: the digits of (-val - 1) are written and then each digit
 * is inverted (0 <-> f, 1 <-> e, ...).
 * val must not be INT64_MIN, whose negation overflows. */
static char *i2string_async_signal_safe(int base, int64_t val, char *buf) {
    char *orig_buf = buf;
    const int32_t is_neg = (val < 0);
    *buf-- = 0;

    if (is_neg) {
        val = -val;
    }
    if (is_neg && base == 16) {
        int ix;
        val -= 1;
        for (ix = 0; ix < 16; ++ix)
            buf[-ix] = '0';
    }

    do {
        *buf-- = HEX[val % base];
    } while ((val /= base) != 0);

    if (is_neg && base == 10) {
        *buf-- = '-';
    }

    if (is_neg && base == 16) {
        int ix;
        buf = orig_buf - 1;
        for (ix = 0; ix < 16; ++ix, --buf) {
            /* clang-format off */
            switch (*buf) {
            case '0': *buf = 'f'; break;
            case '1': *buf = 'e'; break;
            case '2': *buf = 'd'; break;
            case '3': *buf = 'c'; break;
            case '4': *buf = 'b'; break;
            case '5': *buf = 'a'; break;
            case '6': *buf = '9'; break;
            case '7': *buf = '8'; break;
            case '8': *buf = '7'; break;
            case '9': *buf = '6'; break;
            case 'a': *buf = '5'; break;
            case 'b': *buf = '4'; break;
            case 'c': *buf = '3'; break;
            case 'd': *buf = '2'; break;
            case 'e': *buf = '1'; break;
            case 'f': *buf = '0'; break;
            }
            /* clang-format on */
        }
    }
    return buf + 1;
}

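The base-16 branch above relies on a two's complement identity: for a negative `v`, the bit pattern of `v` is the bitwise inverse of `-v - 1`, so each hex digit of `v` equals 15 minus the matching digit of `-v - 1`. The sketch below checks that identity against the standard `snprintf`; `neg_to_hex16` and `demo_neg_to_hex16` are hypothetical names, and computing `-(v + 1)` instead of `-v - 1` is an assumption made here so that `INT64_MIN` does not overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the source): renders a NEGATIVE v as 16 hex
 * digits by inverting the digits of -v - 1. out must hold 17 bytes. */
static void neg_to_hex16(int64_t v, char out[17]) {
    static const char digits[] = "0123456789abcdef";
    uint64_t m = (uint64_t)(-(v + 1)); /* equals -v - 1, safe for INT64_MIN */
    for (int i = 15; i >= 0; --i) {
        out[i] = digits[15 - (m & 0xf)]; /* invert one hex digit */
        m >>= 4;
    }
    out[16] = '\0';
}

/* Compares the digit-inversion result with snprintf("%016llx"). */
static int demo_neg_to_hex16(void) {
    char got[17], want[17];
    const int64_t samples[] = {-1, -255, -4096, INT64_MIN};
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i) {
        neg_to_hex16(samples[i], got);
        snprintf(want, sizeof(want), "%016llx", (unsigned long long)samples[i]);
        assert(strcmp(got, want) == 0);
    }
    return 0;
}
```

The standard `snprintf` appears only in the demo as a reference; the conversion itself uses nothing beyond integer arithmetic, which is the property the signal-safe code needs.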
/* Consumes an optional 'l' or 'll' length modifier at fmt and sets
 * *have_longlong to 1 when the argument is 64 bits wide ('ll', or 'l' on
 * platforms where long is 64 bits). Returns a pointer to the character that
 * follows the modifier. */
static const char *check_longlong_async_signal_safe(const char *fmt, int32_t *have_longlong) {
    *have_longlong = 0;
    if (*fmt == 'l') {
        fmt++;
        if (*fmt != 'l') {
            *have_longlong = (sizeof(long) == sizeof(int64_t));
        } else {
            fmt++;
            *have_longlong = 1;
        }
    }
    return fmt;
}

/* An async-signal-safe subset of vsnprintf(). Supports the %d, %i, %u, %x, %p
 * and %s conversions with an optional 'l' or 'll' length modifier; any other
 * conversion is skipped without producing output.
 * Writes at most size-1 bytes plus a terminating null, so size must be at
 * least 1. Unlike standard vsnprintf(), the return value is the number of
 * bytes actually written (excluding the null), not the length the full output
 * would have had. */
int vsnprintf_async_signal_safe(char *to, size_t size, const char *format, va_list ap) {
    char *start = to;
    char *end = start + size - 1;
    for (; *format; ++format) {
        int32_t have_longlong = 0;
        if (*format != '%') {
            if (to == end) { /* end of buffer */
                break;
            }
            *to++ = *format; /* copy ordinary char */
            continue;
        }
        ++format; /* skip '%' */

        format = check_longlong_async_signal_safe(format, &have_longlong);

        switch (*format) {
        case 'd':
        case 'i':
        case 'u':
        case 'x':
        case 'p':
            {
                int64_t ival = 0;
                uint64_t uval = 0;
                if (*format == 'p')
                    have_longlong = (sizeof(void *) == sizeof(uint64_t));
                if (have_longlong) {
                    if (*format == 'u') {
                        uval = va_arg(ap, uint64_t);
                    } else {
                        ival = va_arg(ap, int64_t);
                    }
                } else {
                    if (*format == 'u') {
                        uval = va_arg(ap, uint32_t);
                    } else {
                        ival = va_arg(ap, int32_t);
                    }
                }

                {
                    char buff[22];
                    const int base = (*format == 'x' || *format == 'p') ? 16 : 10;

                    /* *INDENT-OFF* */
                    char *val_as_str = (*format == 'u') ?
                        u2string_async_signal_safe(base, uval, &buff[sizeof(buff) - 1]) :
                        i2string_async_signal_safe(base, ival, &buff[sizeof(buff) - 1]);
                    /* *INDENT-ON* */

                    /* Strip off "ffffffff" if we have 'x' format without 'll' */
                    if (*format == 'x' && !have_longlong && ival < 0) {
                        val_as_str += 8;
                    }

                    while (*val_as_str && to < end) {
                        *to++ = *val_as_str++;
                    }
                    continue;
                }
            }
        case 's':
            {
                const char *val = va_arg(ap, char *);
                if (!val) {
                    val = "(null)";
                }
                while (*val && to < end) {
                    *to++ = *val++;
                }
                continue;
            }
        }
    }
    *to = 0;
    return (int)(to - start);
}

int snprintf_async_signal_safe(char *to, size_t n, const char *fmt, ...) {
    int result;
    va_list args;
    va_start(args, fmt);
    result = vsnprintf_async_signal_safe(to, n, fmt, args);
    va_end(args);
    return result;
}
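To show the setting these formatters are meant for, here is a minimal sketch of a signal handler that builds a message on its own stack and emits it with `write(2)`, without `malloc` or stdio. It does not call the functions above: `on_signal`, `handled_signum` and `demo_signal_safe_log` are hypothetical names, and the choice of `SIGUSR1` and the message text are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handled_signum = 0;

/* Hypothetical handler (not part of the source): formats the signal number
 * into a stack buffer and writes it with write(2) only. */
static void on_signal(int signum) {
    static const char prefix[] = "caught signal ";
    char digits[12];
    char *p = &digits[sizeof(digits) - 1];
    unsigned int v = (unsigned int)signum;
    ssize_t r;

    *p = '\n';                           /* message ends with a newline */
    do {
        *--p = (char)('0' + v % 10);     /* fill decimal digits backwards */
    } while ((v /= 10) != 0);

    r = write(STDERR_FILENO, prefix, sizeof(prefix) - 1);
    r = write(STDERR_FILENO, p, (size_t)(&digits[sizeof(digits)] - p));
    (void)r;                             /* nothing useful to do on failure */
    handled_signum = signum;
}

/* Installs the handler, raises SIGUSR1, and returns 0 if the handler ran. */
static int demo_signal_safe_log(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    if (raise(SIGUSR1) != 0) return -1;
    return handled_signum == SIGUSR1 ? 0 : -1;
}
```

Everything the handler touches is either on its stack or a `volatile sig_atomic_t`, and the only library calls are `write()` calls, which POSIX lists as async-signal-safe.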