futriix

Author	SHA1	Message	Date
yoav-steinberg	843a4cdc07	Add warning for suspected slow system clocksource setting (#10636) This PR does 2 main things: 1) Add warning for suspected slow system clocksource setting. This is Linux specific. 2) Add a `--check-system` argument to redis which runs all system checks and prints a report. ## System checks Add a command line option `--check-system` which runs all known system checks and provides a report to stdout of which system checks have failed with details on how to reconfigure the system for optimized redis performance. The `--check-system` mode exits with an appropriate error code after running all the checks. ## Slow clocksource details We check the system's clocksource performance by running `clock_gettime()` in a loop and then checking how much time was spent in a system call (via `getrusage()`). If we spend more than 10% of the time in the kernel then we print a warning. I verified that using the slow clock sources: `acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel on an ec2 `m4.large`) we get this warning. The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...). Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only executed in the `--check-system` mode. For a quick startup check, we specifically warn if we see the system is using the `xen` clocksource which we know has bad performance and isn't recommended (at least on ec2). In such a case the user should manually run redis with `--check-system` to force the slower clocksource test described above. ## Other changes in the PR * All the system checks are now implemented as functions in _syscheck.c_. They are implemented using a standard interface (see details in _syscheck.c_). 
To do this I moved the checking functions `linuxOvercommitMemoryValue()`, `THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_ and into the new _syscheck.c_. When moving these functions I made sure they don't depend on other functionality provided in _server.c_ and made them use a standard "check functions" interface. Specifically: * I removed all logging out of `linuxMadvFreeForkBugCheck()`. If there's an unexpected error during the check, it aborts as before, but without any logging. It returns 0, meaning the check did not complete. * All these functions now return 1 on success, -1 on failure, 0 in case the check itself cannot be completed. * The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not `exitFromChild()` because the latter is only available in _server.c_ and I wanted to remove that dependency. This isn't an issue because we don't need to worry about the child process created by the test doing anything related to the rdb/aof files, which is why `exitFromChild()` was created. * This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces in the process name and be more robust in general. Note that before this fix the rss info in `INFO memory` was corrupt in case of spaces in the process name. To recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.	2022-05-22 17:10:31 +03:00
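The /proc/\<pid\>/stat fix in the last bullet can be illustrated with a minimal sketch. It assumes the usual Linux layout, where field 2 is the process name in parentheses and field 24 is the RSS in pages. `proc_stat_field` is a hypothetical name, not the function used in Redis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the actual Redis function): read numeric field
 * number `field` (1-based, numbered as in proc(5)) from one line of
 * /proc/<pid>/stat. Returns 1 on success, 0 on failure. */
int proc_stat_field(const char *line, int field, long long *out) {
    assert(line != NULL && out != NULL);
    if (field < 3) return 0; /* fields 1-2 (pid, comm) are not handled here */

    /* comm (field 2) is wrapped in parentheses and may itself contain
     * spaces or ')' characters, so anchor on the LAST ')' in the line. */
    const char *p = strrchr(line, ')');
    if (p == NULL) return 0;
    p++;

    int cur = 2; /* number of the field we have just moved past */
    while (*p != '\0') {
        while (*p == ' ') p++; /* skip separators */
        if (*p == '\0') break;
        cur++;
        if (cur == field) {
            char *end;
            long long v = strtoll(p, &end, 10);
            if (end == p) return 0; /* not a number (e.g. the state field) */
            *out = v;
            return 1;
        }
        while (*p != '\0' && *p != ' ') p++; /* skip this field */
    }
    return 0; /* line has fewer fields than requested */
}
```

Splitting the line naively on spaces would shift every field by one for a process named `redis server`; searching for the last `)` avoids that.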
David CARLIER	bdcd4b3df8	zmalloc_get_rss implementation for haiku. (#10687 ) also fixing already defined constants build warning while at it. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-05-08 15:12:17 +03:00
David CARLIER	834fa5870c	zmalloc_get_rss openbsd implementation (#10149 ) Add support for getting the RSS in OpenBSD	2022-01-19 20:56:12 +02:00
David CARLIER	50fa627b90	zmalloc_get_rss netbsd impl fix proposal. (#10116 ) Seems like the previous implementation was broken (always returning 0) since kinfo_proc2 is used the KERN_PROC2 sysctl oid is more appropriate and also the query's length was not necessarily accurate (6 here).	2022-01-16 10:03:09 +02:00
filipe oliveira	5dd15443ac	Added INFO LATENCYSTATS section: latency by percentile distribution/latency by cumulative distribution of latencies (#9462) # Short description The Redis extended latency stats track per-command latencies and enable: - exporting the per-command percentile distribution via the `INFO LATENCYSTATS` command (the percentile distribution is not mergeable between cluster nodes). - exporting the per-command cumulative latency distributions via the `LATENCY HISTOGRAM` command. Using the cumulative distribution of latencies we can merge several stats from different cluster nodes to calculate aggregate metrics. By default, the extended latency monitoring is enabled since the overhead of keeping track of the command latency is very small. If you don't want to track extended latency metrics, you can easily disable it at runtime using the command: - `CONFIG SET latency-tracking no` By default, the exported latency percentiles are the p50, p99, and p999. You can alter them at runtime using the command: - `CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"` ## Some details: - The total size per histogram should sit around 40 KiB. We only allocate those 40 KiB when a command is called for the first time. - With regard to the WRITE overhead: as seen below, there is no measurable overhead on the achievable ops/sec or full latency spectrum on the client. Also including the measured redis-benchmark for unstable vs this branch. - We track from 1 nanosecond to 1 second (everything above 1 second is considered +Inf). ## `INFO LATENCYSTATS` exposition format - Format: `latency_percentiles_usec_<CMDNAME>:p0=XX,p50....` ## `LATENCY HISTOGRAM [command ...]` exposition format Return a cumulative distribution of latencies in the format of a histogram for the specified command names. The histogram is composed of a map of time buckets: - Each representing a latency range, between 1 nanosecond and roughly 1 second. 
- Each bucket covers twice the previous bucket's range. - Empty buckets are not printed. - Everything above 1 sec is considered +Inf. - At max there will be log2(1000000000)=30 buckets We reply a map for each command in the format: `<command name> : { `calls`: <total command calls> , `histogram` : { <bucket 1> : latency , < bucket 2> : latency, ... } }` Co-authored-by: Oran Agra <oran@redislabs.com>	2022-01-05 14:01:05 +02:00
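The bucket arithmetic above (each bucket twice as wide as the previous one, about log2(1000000000) = 30 buckets up to 1 second) can be checked with a small sketch. The function names are hypothetical, and plain power-of-two buckets are an assumption for illustration; the histogram library Redis uses may lay out buckets differently.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TRACKED_NS 1000000000ULL /* 1 second, per the commit message */

/* Hypothetical: index k of the power-of-two bucket [2^k, 2^(k+1)) ns that
 * holds `ns`, or -1 for values outside the tracked range (the +Inf case). */
int latency_bucket(uint64_t ns) {
    if (ns == 0 || ns >= MAX_TRACKED_NS) return -1;
    int k = 0;
    while (ns > 1) { ns >>= 1; k++; } /* k = floor(log2(ns)) */
    assert(k >= 0 && k < 30);         /* at most 30 buckets below 1 second */
    return k;
}

/* Count how many doubling buckets are needed to cover [1 ns, 1 s). */
int latency_bucket_count(void) {
    int n = 0;
    for (uint64_t lo = 1; lo < MAX_TRACKED_NS; lo *= 2) n++;
    return n;
}
```

Because 2^29 is below one billion and 2^30 is above it, the doubling loop stops after 30 buckets, which matches the figure quoted in the message.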
sundb	e725d737fb	Add --large-memory flag for REDIS_TEST to enable tests that consume more than 100mb (#9784 ) This is a preparation step in order to add a new test in quicklist.c see #9776	2021-11-16 08:55:10 +02:00
DarrenJiang13	8ab33c18e4	fix a compilation error around madvise when make with jemalloc on MacOS (#9350) We only use MADV_DONTNEED on Linux, that's where it was tested.	2021-08-10 11:32:27 +03:00
Wang Yuan	d4bca53cd9	Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974) ## Background As we know, after `fork`, a process that writes to a page gets its own copy of that page (CoW) while the other process keeps the old one, so together they use more memory. Redis could consume a lot of memory while the fork child was serializing keys and values, possibly even causing OOM. However, the fork child does not need to keep some of the memory that the parent may write to or update. For example, the child never accesses a key-value pair again after serializing it, but users may update it in the parent. So CoW may be reduced if the child releases memory it no longer needs. ## Implementation To release a key-value pair in the child, one might call `decrRefCount` to free the memory, but I found that the fork child still uses a lot of memory even when no data is written to redis, and that freeing takes much more time, which slows down bgsave. This is probably because the memory allocator doesn't really release memory to the OS and may modify some internal data for the free operation, especially when freeing small objects. Moreover, CoW works on pages, so the simple approach is to release only memory blocks that are at least one kernel page in size. madvise(MADV_DONTNEED) quickly releases the pages of the specified region to the OS, bypassing the memory allocator; the allocator still considers the memory in use and doesn't change its internal data. There are some buffers we can release in the fork child process: - Serialized key-values: the fork child never accesses serialized key-values again, so we try to free them. Because we can only release large blocks of memory, and iterating over all items/members/fields/entries of a complex data type is time-consuming, we iterate and try to release them only when the average size of an item/member/field/entry is larger than the OS page size. 
- Replication backlog: because the replication backlog is a circular buffer, it changes quickly when redis has heavy write traffic, but the fork child doesn't need to access it. - Client buffers: if clients send requests while the fork child exists, the clients' buffers also change frequently. This memory includes the client query buffer, output buffer, and the memory used by the client struct. To get the child process's peak private dirty memory, we need to track peak memory instead of the last used memory, because the child may keep releasing memory (until now COW only grew, so the last value was equivalent to the peak). We're also adding a new `current_cow_peak` info variable (to complement the existing `current_cow_size`). Co-authored-by: Oran Agra <oran@redislabs.com>	2021-08-04 23:01:46 +03:00
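Since `madvise` works on whole pages, a buffer first has to be narrowed to the pages it fully covers. The following is a rough sketch of that step, assuming Linux; `inner_page_range` and `release_pages` are hypothetical names, not the functions added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: shrink [addr, addr+len) to the whole pages it fully
 * contains. Returns the byte length of that inner range (0 if the buffer
 * does not cover a full page) and stores its start in *start. */
size_t inner_page_range(uintptr_t addr, size_t len, size_t page, uintptr_t *start) {
    assert(page != 0 && (page & (page - 1)) == 0); /* page size: power of two */
    uintptr_t first = (addr + page - 1) & ~(uintptr_t)(page - 1); /* round up */
    uintptr_t last = (addr + len) & ~(uintptr_t)(page - 1);       /* round down */
    *start = first;
    return last > first ? (size_t)(last - first) : 0;
}

/* Hypothetical: in a fork child, hand the fully covered pages of a buffer
 * back to the OS without going through the allocator. The buffer contents
 * must never be read again afterwards. Returns the number of bytes advised. */
size_t release_pages(void *ptr, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return 0;
    uintptr_t start;
    size_t n = inner_page_range((uintptr_t)ptr, len, (size_t)page, &start);
    if (n == 0) return 0; /* smaller than a page: nothing to release */
#ifdef MADV_DONTNEED
    if (madvise((void *)start, n, MADV_DONTNEED) != 0) return 0;
    return n;
#else
    return 0; /* platform without MADV_DONTNEED: do nothing */
#endif
}
```

Buffers that do not span at least one full page are left alone, which is in line with the message's point that only blocks of at least a page in size are worth releasing.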
Yossi Gottlieb	c3df27d1ea	Fix slowdown due to child reporting CoW. (#8645) Reading CoW from /proc/<pid>/smaps can be slow with large processes on some platforms. This measures the time it takes to read CoW info and limits the duty cycle of future updates to roughly 1/100. As current_cow_size no longer represents a current, fixed interval value there is also a new current_cow_size_age field that provides information about the age of the size value, in seconds.	2021-03-22 13:25:58 +02:00
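One way to hold an expensive read to a duty cycle of roughly 1/100 is to time each read and then wait about 100 times that long before the next one. This is a minimal sketch under that assumption; the struct, the function names and the exact factor are illustrative and not taken from the Redis source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for a rate-limited expensive sample (times in usec). */
typedef struct {
    uint64_t next_allowed; /* earliest time the next sample may start */
} sample_limiter;

#define DUTY_CYCLE_FACTOR 100 /* rest about 100x as long as the sample took */

/* Returns non-zero if a new sample may be taken at time `now`. */
int sample_allowed(const sample_limiter *l, uint64_t now) {
    return now >= l->next_allowed;
}

/* Record a finished sample that ran from `start` to `end` and schedule the
 * next one so that sampling stays at roughly 1/100 of wall-clock time. */
void sample_done(sample_limiter *l, uint64_t start, uint64_t end) {
    assert(end >= start);
    l->next_allowed = end + (end - start) * DUTY_CYCLE_FACTOR;
}
```

A read that took 5 ms would push the next one out by about 500 ms, which is why a separate age field is useful for telling how stale the last value is.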
sundb	95d6297db8	Add run all test support with define REDIS_TEST (#8570) 1. Add `redis-server test all` support to run all tests. 2. Add redis test to daily ci. 3. Add `--accurate` option to run slow tests for more iterations (so that by default we run fewer cycles: shorter time and fewer prints). 4. Move dict benchmark to REDIS_TEST. 5. fix some leaks in tests 6. make quicklist tests run on a specific fill set of options rather than huge ranges 7. move some prints in quicklist test outside their loops to reduce prints 8. removing sds.h from dict.c since it is now used in both redis-server and redis-cli (uses hiredis sds)	2021-03-10 09:13:11 +02:00
Yossi Gottlieb	af2175326c	Fix memory info on FreeBSD. (#8620 ) The obtained process_rss was incorrect (the OS reports pages, not bytes), resulting with many other fields getting corrupted. This has been tested on FreeBSD but not other platforms.	2021-03-09 11:33:32 +02:00
Yossi Gottlieb	3ea4c43add	Cleanup usage of malloc_usable_size. (#8554 ) * Add better control of malloc_usable_size() usage. * Use malloc_usable_size on alpine libc daily job. * Add no-malloc-usable-size daily jobs. * Fix zmalloc(0) when HAVE_MALLOC_SIZE is undefined. In order to align with the jemalloc behavior, this should never return NULL or OOM panic.	2021-02-25 09:24:41 +02:00
Yossi Gottlieb	dd885780d6	Fix compile errors with no HAVE_MALLOC_SIZE. (#8533 ) Also adds a new daily CI test, relying on the fact that we don't use malloc_size() on alpine libmusl. Fixes #8531	2021-02-23 17:08:49 +02:00
Yossi Gottlieb	d32f2e9999	Fix integer overflow (CVE-2021-21309). (#8522 ) On 32-bit systems, setting the proto-max-bulk-len config parameter to a high value may result with integer overflow and a subsequent heap overflow when parsing an input bulk (CVE-2021-21309). This fix has two parts: Set a reasonable limit to the config parameter. Add additional checks to prevent the problem in other potential but unknown code paths.	2021-02-22 15:41:32 +02:00
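The commit message does not show the arithmetic involved, so the following is only a generic illustration of the class of bug: on a 32-bit system, adding a few bytes to a length near the maximum wraps around to a tiny value. `checked_add_u32` is a hypothetical helper, with `uint32_t` standing in for a 32-bit `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two 32-bit sizes, refusing to wrap around.
 * Returns 1 and stores the sum in *out, or returns 0 on overflow. */
int checked_add_u32(uint32_t a, uint32_t b, uint32_t *out) {
    assert(out != NULL);
    if (a > UINT32_MAX - b) return 0; /* a + b would exceed UINT32_MAX */
    *out = a + b;
    return 1;
}
```

Without such a check, a wrapped size could lead to a buffer far smaller than the data later copied into it, which is the heap overflow pattern the message describes.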
Oran Agra	8dd16caec8	Fix last COW INFO report, Skip test on non-linux platforms (#8301 ) - the last COW report wasn't always read from the pipe (receiveLastChildInfo wasn't used) - but in fact, there's no reason we won't always try to drain that pipe so i'm unifying receiveLastChildInfo with receiveChildInfo - adjust threshold of the COW test when run in accurate mode - add some prints in case this test fails again - fix indentation, page size, and PID! in MacOS proc info p.s. it seems that pri_pages_dirtied is always 0	2021-01-08 23:35:30 +02:00
Yossi Gottlieb	86e3395c11	Several (mostly Solaris-related) cleanups (#8171 ) * Allow runtest-moduleapi use a different 'make', for systems where GNU Make is 'gmake'. * Fix issue with builds on Solaris re-building everything from scratch due to CFLAGS/LDFLAGS not stored. * Fix compile failure on Solaris due to atomicvar and a bunch of warnings. * Fix garbled log timestamps on Solaris.	2020-12-13 17:09:54 +02:00
David CARLIER	ec951cdc15	Solaris based system rss size report. (#8138 )	2020-12-06 15:30:29 +02:00
Oran Agra	7ca00d694d	Sanitize dump payload: fail RESTORE if memory allocation fails When RDB input attempts to make a huge memory allocation that fails, RESTORE should fail gracefully rather than die with panic	2020-12-06 14:54:34 +02:00
David CARLIER	d428de590f	DragonFlyBSD resident memory amount (almost) similar as FreeBSD. (#8023 )	2020-11-08 09:16:14 +02:00
Yossi Gottlieb	9824fe3e39	Fix wrong zmalloc_size() assumption. (#7963 ) When using a system with no malloc_usable_size(), zmalloc_size() assumed that the heap allocator always returns blocks that are long-padded. This may not always be the case, and will result with zmalloc_size() returning a size that is bigger than allocated. At least in one case this leads to out of bound write, process crash and a potential security vulnerability. Effectively this does not affect the vast majority of users, who use jemalloc or glibc. This problem along with a (different) fix was reported by Drew DeVault.	2020-10-26 14:49:08 +02:00
Oran Agra	3945a32177	performance and memory reporting improvement - sds take control of its internal frag (#7875) This commit has two aspects: 1) improve memory reporting for all the places that use sdsAllocSize to compute memory used by a string, in this case it'll include the internal fragmentation. 2) reduce the need for realloc calls by making the sds implicitly take over the internal fragmentation of the block it allocated.	2020-10-02 08:19:44 +03:00
David CARLIER	ce8bfc56ad	getting rss size implementation for netbsd (#7293 )	2020-09-29 08:49:35 +03:00
Wang Yuan	445a4b669a	Implement redisAtomic to replace _Atomic C11 builtin (#7707) Redis 6.0 introduced I/O threads, which are efficient, and uses C11 _Atomic to establish inter-thread synchronization without a mutex. But this means only compilers that support C11 _Atomic can compile redis, which is inconvenient because some common platforms, such as CentOS7, don't support it by default. So we implement a redis atomic type to make the code more portable. The existing redis atomic variable in src/atomicvar.h only had 'relaxed' operations, so we add some 'sequentially-consistent' operations, matching the default behavior of C11 _Atomic, which can establish inter-thread synchronization. We then replace all uses of C11 _Atomic with the redis atomic variable. Our implementation uses C11 _Atomic, __atomic, or __sync macros if available; it supports most common platforms, and we automatically detect which feature to use. In the Makefile we use a dummy file to detect whether the compiler supports C11 _Atomic. For gcc, redis should now theoretically compile with any version not older than 4.1.2 (which started to support __sync_xxx operations). In addition, we remove the mutex fallback implementation of the redis atomic variable, for performance and testing reasons. You will get compile errors if your compiler supports none of the features above. To cover the redis atomic variable in tests, we add CI jobs that build redis on CentOS6 and CentOS7 and daily workflow jobs that run the tests on them. On those we just install the default gcc in order to cover different compiler versions: 4.4.7 on CentOS6 and 4.8.5 on CentOS7. We restore the ability to test redis with Helgrind to find data race errors. 
But you need to install Valgrind in the default path first before running your tests, since we use macros in helgrind.h to explicitly tell Helgrind about inter-thread happens-before relationships and avoid false positives. Please open an issue on github if you find data race errors related to this commit. Unrelated: - Fix redefinition of typedef 'RedisModuleUserChangedFunc'. Some old compilers report errors or warnings if a function type is re-defined.	2020-09-17 16:01:45 +03:00
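The feature detection described in this commit (C11 `_Atomic` first, then `__atomic`, then `__sync`) can be sketched with the preprocessor as below. The `demo_` names are hypothetical and this is not the content of src/atomicvar.h; only relaxed operations are shown, to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
/* Preferred: C11 atomics. */
#include <stdatomic.h>
typedef _Atomic long demo_atomic;
#define demo_incr(var, n) \
    atomic_fetch_add_explicit(&(var), (n), memory_order_relaxed)
#define demo_get(var) atomic_load_explicit(&(var), memory_order_relaxed)
#elif defined(__ATOMIC_RELAXED)
/* Next: GCC/Clang __atomic builtins. */
typedef long demo_atomic;
#define demo_incr(var, n) __atomic_add_fetch(&(var), (n), __ATOMIC_RELAXED)
#define demo_get(var) __atomic_load_n(&(var), __ATOMIC_RELAXED)
#else
/* Last resort: legacy __sync builtins (full barriers). */
typedef long demo_atomic;
#define demo_incr(var, n) __sync_add_and_fetch(&(var), (n))
#define demo_get(var) __sync_add_and_fetch(&(var), 0)
#endif

/* Example use: bump a shared counter and return its current value. */
long demo_bump(demo_atomic *counter, long n) {
    assert(counter != NULL);
    demo_incr(*counter, n);
    return demo_get(*counter);
}
```

Callers only see `demo_atomic`, `demo_incr` and `demo_get`, so the choice of backend stays inside one header.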
Oran Agra	50f5181488	Remove dead code from update_zmalloc_stat_alloc (#7589 ) this seems like leftover from before 6eb51bf	2020-07-31 13:01:39 +03:00
antirez	4092a75d85	Avoid collision with MacOS LIST_HEAD macro after #6384 .	2019-12-02 09:13:29 +01:00
Salvatore Sanfilippo	e5b5f9a2f6	Merge pull request #6384 from devnexen/apple_smaps_impl Getting region date per process in Darwin	2019-12-02 09:02:08 +01:00
Oran Agra	bf759cc9c3	Merge remote-tracking branch 'antirez/unstable' into jemalloc_purge_bg	2019-10-04 13:53:40 +03:00
Oran Agra	2e19b94113	RED-31295 - redis: avoid race between dlopen and thread creation It seems that since I added the creation of the jemalloc thread redis sometimes fails to start with the following error: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed! This seems to be due to a race bug in ld.so, in which TLS creation on the thread collides with dlopen. Move the creation of BIO and jemalloc threads to after modules are loaded. plus small bugfix when trying to disable the jemalloc thread at runtime	2019-10-02 15:39:44 +03:00
David Carlier	5a8a005026	Adding AnonHugePages case + comments	2019-09-20 11:01:36 +01:00
David Carlier	819a661be5	Getting region date per process in Darwin	2019-09-15 14:05:00 +01:00
David Carlier	f1c6c658ac	Updating resident memory request impl on FreeBSD.	2019-07-28 14:33:57 +01:00
Oran Agra	09f99c2a92	make redis purge jemalloc after flush, and enable background purging thread jemalloc 5 doesn't immediately release memory back to the OS, instead there's a decaying mechanism, which doesn't work when there's no traffic (no allocations). this is most evident if there's no traffic after flushdb, the RSS will remain high. 1) enable jemalloc background purging 2) explicitly purge in flushdb	2019-06-02 15:33:14 +03:00
antirez	9dabbd1ab0	Alter coding style in #4696 to conform to Redis code base.	2019-03-21 12:18:55 +01:00
Salvatore Sanfilippo	5c47e2e964	Merge pull request #4696 from oranagra/zrealloc_fix Fix zrealloc to behave similarly to je_realloc when size is 0	2019-03-21 12:18:04 +01:00
Bruce Merry	8fd1031b10	Fix incorrect memory usage accounting in zrealloc When HAVE_MALLOC_SIZE is false, each call to zrealloc causes used_memory to increase by PREFIX_SIZE more than it should, due to mis-matched accounting between the original zmalloc (which includes PREFIX size in its increment) and zrealloc (which misses it from its decrement). I've also supplied a command-line test to easily demonstrate the problem. It's not wired into the test framework, because I don't know TCL so I'm not sure how to automate it.	2018-09-30 11:49:03 +02:00
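The mismatch described here can be shown with a toy model of the `used_memory` counter. This is not the code from zmalloc.c; the function names and the choice of `sizeof(size_t)` for `PREFIX_SIZE` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PREFIX_SIZE (sizeof(size_t)) /* header that stores the block size */

/* Counter value after allocating `size` bytes: the prefix is counted too. */
size_t after_zmalloc(size_t used, size_t size) {
    return used + size + PREFIX_SIZE;
}

/* Buggy accounting: the decrement forgets the prefix of the old block. */
size_t after_zrealloc_buggy(size_t used, size_t oldsize, size_t newsize) {
    assert(used >= oldsize);
    return used - oldsize + newsize + PREFIX_SIZE;
}

/* Fixed accounting: decrement and increment both include the prefix. */
size_t after_zrealloc_fixed(size_t used, size_t oldsize, size_t newsize) {
    assert(used >= oldsize + PREFIX_SIZE);
    return used - (oldsize + PREFIX_SIZE) + (newsize + PREFIX_SIZE);
}
```

In the buggy model, even a reallocation to the same size leaves the counter `PREFIX_SIZE` bytes higher, so the error grows with every call.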
Oran Agra	780815dd6e	fix recursion typo in zmalloc_usable	2018-07-22 10:17:35 +03:00
Oran Agra	bf680b6f8c	slave buffers were wasteful and incorrectly counted causing eviction A) slave buffers didn't count internal fragmentation and sds unused space, this caused them to induce eviction although we didn't mean for it. B) slave buffers were consuming about twice the memory of what they actually needed. - this was mainly due to sdsMakeRoomFor growing to twice as much as needed each time but networking.c not storing more than 16k (partially fixed recently in 237a38737). - besides it wasn't able to store half of the new string into one buffer and the other half into the next (so the above mentioned fix helped mainly for small items). - lastly, the sds buffers had up to 30% internal fragmentation that was wasted, consumed but not used. C) inefficient performance due to starting from a small string and reallocing many times. what i changed: - creating dedicated buffers for reply list, counting their size with zmalloc_size - when creating a new reply node, preallocate it to at least 16k. - when appending a new reply to the buffer, first fill all the unused space of the previous node before starting a new one. other changes: - expose mem_not_counted_for_evict info field for the benefit of the test suite - add a test to make sure slave buffers are counted correctly and that they don't cause eviction	2018-07-16 16:43:42 +03:00
Jack Drogon	93238575f7	Fix typo	2018-07-03 18:19:46 +02:00
Fuxin Hao	a4f658b2b5	Fix update_zmalloc_stat_alloc in zrealloc	2018-06-14 16:44:19 +08:00
Salvatore Sanfilippo	e2a9ea0405	Merge pull request #4901 from KFilipek/zmalloc_typo_fix HW_PHYSMEM typo in preprocessor condition	2018-06-11 16:32:40 +02:00
Remi Collet	9561fec496	include stdint.h for uint64_t definition	2018-05-30 15:33:06 +02:00
Oran Agra	ad133e1023	Active defrag fixes for 32bit builds problems fixed: * failing to read fragmentation information from jemalloc * overflow in jemalloc fragmentation hint to the defragger * test suite not triggering eviction after population	2018-05-17 09:52:00 +03:00
Krzysztof Filipek	fd9177dd33	Typo in preprocessor condition	2018-05-06 20:18:48 +02:00
Oran Agra	806736cdf9	Adding real allocator fragmentation to INFO and MEMORY command + active defrag test other fixes / improvements: - LUA script memory isn't taken from zmalloc (taken from libc malloc) so it can cause high fragmentation ratio to be displayed (which is false) - there was a problem with "fragmentation" info being calculated from RSS and used_memory sampled at different times (now sampling them together) other details: - adding a few more allocator info fields to INFO and MEMORY commands - improve defrag test to measure defrag latency of big keys - increasing the accuracy of the defrag test (by looking at real frag info) this way we can use an even lower threshold and still avoid false positives - keep the old (total) "fragmentation" field unchanged, but add new ones for specific things - add these to the MEMORY DOCTOR command - deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't have "allocator active/used") - reduce sampling rate of the rss and allocator info	2018-03-12 15:08:52 +02:00
Oran Agra	5def65008f	Fix zrealloc to behave similarly to je_realloc when size is 0 According to C11, the behavior of realloc with size 0 is now deprecated. it can either behave as free(ptr) and return NULL, or return a valid pointer. but in zmalloc it can lead to zmalloc_oom_handler and panic. and that can affect modules that use it. It looks like both glibc allocator and jemalloc behave like so: realloc(malloc(32),0) returns NULL realloc(NULL,0) returns a valid pointer This commit changes zmalloc to behave the same	2018-02-21 11:04:13 +02:00
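The behavior described above (size 0 with a live pointer frees it and returns NULL; a NULL pointer always yields a valid allocation) can be sketched as a thin wrapper. `sketch_realloc` is a hypothetical stand-in for `zrealloc`: it leaves out the memory accounting, and `abort()` stands in for the out-of-memory handler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper mirroring the semantics described in the commit. */
void *sketch_realloc(void *ptr, size_t size) {
    /* realloc(ptr, 0) on a live pointer: behave like free(ptr). */
    if (size == 0 && ptr != NULL) {
        free(ptr);
        return NULL;
    }
    /* realloc(NULL, size): behave like malloc and always return a valid
     * pointer, even for size 0 (assumption: allocate one byte then). */
    if (ptr == NULL) {
        void *fresh = malloc(size ? size : 1);
        if (fresh == NULL) abort(); /* stand-in for an OOM handler */
        return fresh;
    }
    assert(size > 0); /* both special cases were handled above */
    void *newptr = realloc(ptr, size);
    if (newptr == NULL) abort(); /* a real failure, not the size-0 case */
    return newptr;
}
```

Handling size 0 before calling `realloc` means a NULL result is never mistaken for an out-of-memory condition, which is the panic the commit set out to avoid.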
antirez	6eb51bf1ec	zmalloc.c: remove thread safe mode, it's the default way.	2017-05-09 16:59:51 +02:00
antirez	2a51bac44e	Simplify atomicvar.h usage by having the mutex name implicit.	2017-05-04 17:01:00 +02:00
antirez	f47607af02	Fix preprocessor if/else chain broken in order to fix #3927 .	2017-04-11 16:54:27 +02:00
antirez	aa5b4be02e	Fix zmalloc_get_memory_size() ifdefs to actually use the else branch. Close #3927.	2017-04-11 16:45:11 +02:00
antirez	173d692bc2	Defrag: activate it only if running modified version of Jemalloc. This commit also includes minor aesthetic changes like removal of trailing spaces.	2017-01-10 11:25:39 +01:00


85 Commits