16 Commits

Author SHA1 Message Date
Pieter Cailliau
4d284daefd
Copyright update to reflect IP transfer from salvatore to Redis (#740)
Update references of copyright being assigned to Salvatore when it was
transferred to Redis Ltd. as per
https://github.com/valkey-io/valkey/issues/544.

---------

Signed-off-by: Pieter Cailliau <pieter@redis.com>
2024-08-14 09:20:36 -07:00
Ping Xie
c41dd77a3e
Add clang-format configs (#323)
I have validated that these settings closely match the existing coding
style with one major exception on `BreakBeforeBraces`, which will be
`Attach` going forward. The mixed `BreakBeforeBraces` styles in the
current codebase are hard to imitate and also very odd IMHO - see below

```
if (a == 1) { /*Attach */
}
```

```
if (a == 1 ||
    b == 2)
{ /* Why? */
}
```

Please do NOT merge just yet. Will add the github action next once the
style is reviewed/approved.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-22 23:24:12 -07:00
Chen Tianjie
29ca87955e
Add basic eventloop latency measurement. (#11963)
The measured latency(duration) includes the list below, which can be shown by `INFO STATS`.
```
eventloop_cycles  // ever increasing counter
eventloop_duration_sum // cumulative duration of eventloop in microseconds
eventloop_duration_cmd_sum  // cumulative duration of executing commands in microseconds
instantaneous_eventloop_cycles_per_sec  // average eventloop count per second in recent 1.6s
instantaneous_eventloop_duration_usec  // average single eventloop duration in recent 1.6s
```

Also added some experimental metrics, which are shown only when `INFO DEBUG` is called.
This section isn't included in the default INFO, or even in `INFO ALL` and the fields in this section
can change in the future without considering backwards compatibility.
```
eventloop_duration_aof_sum  // cumulative duration of writing AOF
eventloop_duration_cron_sum  // cumulative duration cron jobs (serverCron, beforeSleep excluding IO and AOF)
eventloop_cmd_per_cycle_max  // max number of commands executed in one eventloop
eventloop_duration_max  // max duration of one eventloop
```

All of these are being reset by CONFIG RESETSTAT
2023-05-12 20:13:15 +03:00
yoav-steinberg
843a4cdc07
Add warning for suspected slow system clocksource setting (#10636)
This PR does 2 main things:
1) Add warning for suspected slow system clocksource setting. This is Linux specific.
2) Add a `--check-system` argument to redis which runs all system checks and prints a report.

## System checks
Add a command line option `--check-system` which runs all known system checks and provides
a report to stdout of which systems checks have failed with details on how to reconfigure the
system for optimized redis performance.
The `--system-check` mode exists with an appropriate error code after running all the checks.

## Slow clocksource details
We check the system's clocksource performance by running `clock_gettime()` in a loop and then
checking how much time was spent in a system call (via `getrusage()`). If we spend more than
10% of the time in the kernel then we print a warning. I verified that using the slow clock sources:
`acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel on an ec2 `m4.large`)
we get this warning.

The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...).
Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means
running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only
executed in the `--check-system` mode.

For a quick startup check, we specifically warn if the we see the system is using the `xen` clocksource
which we know has bad performance and isn't recommended (at least on ec2). In such a case the
user should manually rung redis with `--check-system` to force the slower clocksource test described
above.

## Other changes in the PR

* All the system checks are now implemented as functions in _syscheck.c_.
  They are implemented using a standard interface (see details in _syscheck.c_).
  To do this I moved the checking functions `linuxOvercommitMemoryValue()`,
  `THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_
  and into the new _syscheck.c_. When moving these functions I made sure they don't
  depend on other functionality provided in _server.c_ and made them use a standard
  "check functions" interface. Specifically:
  * I removed all logging out of `linuxMadvFreeForkBugCheck()`. In case there's some
    unexpected error during the check aborts as before, but without any logging.
    It returns an error code 0 meaning the check didn't not complete.
  * All these functions now return 1 on success, -1 on failure, 0 in case the check itself
    cannot be completed.
  * The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not
    `exitFromChild()` because the latter is only available in _server.c_ and I wanted to
    remove that dependency. This isn't an because we don't need to worry about the
    child process created by the test doing anything related to the rdb/aof files which
    is why `exitFromChild()` was created.

* This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces
  in the process name and be more robust in general. Not that before this fix the rss
  info in `INFO memory` was corrupt in case of spaces in the process name. To
  recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.
2022-05-22 17:10:31 +03:00
zhenwei pi
a9c0602149
Disable THP if enabled (#7381)
In case redis starts and find that THP is enabled ("always"), instead
of printing a log message, which might go unnoticed, redis will try to
disable it (just for the redis process).

Note: it looks like on self-bulit kernels THP is likely be set to "always" by default.

Some discuss about THP side effect on Linux:
according to http://www.antirez.com/news/84, we can see that
redis latency spikes are caused by linux kernel THP feature.
I have tested on E3-2650 v3, and found that 2M huge page costs
about 0.25ms to fix COW page fault.

Add a new config 'disable-thp', the recommended setting is 'yes',
(default) the redis tries to disable THP by prctl syscall. But
users who really want THP can set it to "no"

Thanks to Oran & Yossi for suggestions.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2020-10-27 15:04:18 +02:00
Oran Agra
e3b1d6d3ad Module API for LatencyAddSample 2019-10-24 14:24:55 +03:00
antirez
585d1a60bf Separate latency monitoring of eviction loop and eviction DELs. 2015-02-11 10:52:27 +01:00
antirez
110f0464e0 Check THP support at startup and warn about it. 2014-11-12 10:55:47 +01:00
antirez
2a232dfa9a LATENCY DOCTOR: initial draft and events summary output. 2014-07-08 11:31:46 +02:00
antirez
19853db892 Latency: low level time series analysis implemented. 2014-07-07 15:00:01 +02:00
antirez
a887af34e1 latencyStartMonitor() modified to avoid warnings. 2014-07-02 16:53:44 +02:00
antirez
6f20482a86 latencyTimeSeries structure max field type fixed. 2014-07-02 16:14:28 +02:00
antirez
ed4980243a License added to latency.h. 2014-07-02 10:06:58 +02:00
antirez
753b707d2a Latency monitor: command duration is in useconds. Convert. 2014-07-01 16:09:02 +02:00
antirez
8612e6de88 Latency monitor: collect slow commands.
We introduce the distinction between slow and fast commands since those
are two different sources of latency. An O(1) or O(log N) command without
side effects (can't trigger deletion of large objects as a side effect of
its execution) if delayed is a symptom of inherent latency of the system.

A non-fast command (commands that may run large O(N) computations) if
delayed may just mean that the user is executing slow operations.

The advices LATENCY should provide in this two different cases are
different, so we log the two classes of commands in a separated way.
2014-07-01 11:47:08 +02:00
antirez
d7a07a2012 Latency monitor: basic samples collection. 2014-07-01 11:30:15 +02:00