Compare commits

...

1136 Commits

Author SHA1 Message Date
5b4ca543a2 Update cluster-experimental
2025-04-04 19:40:14 +00:00
2eff3d9874 Update cluster
2025-04-04 19:39:42 +00:00
90ac6479c3 Update cluster-experimental
2025-04-04 19:31:29 +00:00
1dbf196e5a Upload files to "/"
2025-04-04 19:27:28 +00:00
65ab81a1ac Update README.md
2025-04-02 20:09:00 +00:00
cb7ebda652 Update README.md
2025-04-02 20:08:26 +00:00
3ec2a4bd68 Update README.md
2025-04-02 20:07:16 +00:00
a5c63f9904 Update README.md
2025-04-02 20:01:56 +00:00
c823b181ee Update README.md
2025-04-02 19:50:40 +00:00
0b5c1b4b69 Update README.md
2025-04-02 19:48:24 +00:00
28126899ff Update README.md
2025-04-02 19:47:44 +00:00
74494c71c5 Update README.md
2025-04-02 19:45:05 +00:00
2a8c840bd3 Update README.md
2025-04-02 19:37:25 +00:00
a2383d93fd Update README.md
2025-04-02 19:36:45 +00:00
020b96fe67 Update README.md
2025-04-02 17:23:57 +00:00
04386d7cf8 Update cluster
2025-03-23 19:41:33 +00:00
40a6e15c30 Update futriix.conf
2025-03-23 15:40:11 +00:00
8afff26a55 Delete sentinel.conf
2025-03-23 15:34:20 +00:00
aec56296e2 Upload files to "/"
2025-03-23 15:31:44 +00:00
c07840b626 Delete cluster
2025-03-23 15:31:19 +00:00
f4aeea00bf Update README.md
2025-03-23 15:30:54 +00:00
0b2a5e53f4 Update README.md
2025-03-23 15:29:42 +00:00
7c17219702 Update README.md
2025-03-23 15:29:15 +00:00
ee50166b0c Update README.md
2025-03-23 15:26:26 +00:00
4414431a99 Update README.md
2025-03-23 15:24:19 +00:00
41464c0cad Delete src/modules/hellotype.c
2025-02-15 15:56:13 +00:00
fe044a4d63 Delete src/modules/hellotimer.c
2025-02-15 15:56:04 +00:00
9bda1019e3 Delete src/modules/hellocluster.c
2025-02-15 15:55:55 +00:00
bd8d11b2d3 Delete src/modules/helloworld.c
2025-02-15 15:55:47 +00:00
8dc71285eb Delete src/modules/hellohook.c
2025-02-15 15:55:39 +00:00
9d6053c00f Delete src/modules/hellodict.c
2025-02-15 15:55:30 +00:00
9b9baca82c Delete src/modules/Makefile
2025-02-15 15:55:21 +00:00
315bc059be Delete src/modules/helloblock.c
2025-02-15 15:54:55 +00:00
4cf16b5249 Delete src/modules/helloacl.c
2025-02-15 15:54:40 +00:00
fe3c1c34b0 Delete valkey.conf
2025-02-13 19:39:37 +00:00
5979980595 Upload files to "/"
2025-02-13 19:38:56 +00:00
7317466c05 Update src/rdb.c
2025-02-10 20:30:08 +00:00
18e232f0e5 Upload files to "/"
2025-02-03 20:02:39 +00:00
a5601ba85a Delete cluster
2025-02-03 20:01:27 +00:00
7eaad5782b Delete utils/create-cluster/create-cluster
2025-02-03 20:01:10 +00:00
5cba04d02c Delete utils/create-cluster/.gitignore
2025-02-03 20:01:01 +00:00
e9a66d0ae0 Delete utils/create-cluster/README
2025-02-03 20:00:51 +00:00
e6ad354b20 Update README.md
2025-02-03 18:47:46 +00:00
be5f5e18e0 Update README.md
2025-02-03 18:42:56 +00:00
0865036a8d Update COPYING
2025-02-03 18:41:48 +00:00
e6f6239b05 Delete 00-RELEASENOTES
2025-02-03 18:40:23 +00:00
3667aefcd7 Upload files to "src"
2025-02-03 18:39:44 +00:00
0f15725225 Delete src/server.c
2025-02-03 18:38:56 +00:00
feb27073ca Upload files to "src"
2025-02-03 18:38:34 +00:00
3a40802dfe Delete src/valkey-cli.c
2025-02-03 18:38:16 +00:00
28a627c170 Upload files to "src"
2025-02-03 17:34:11 +00:00
f9d25adab9 Upload files to "src"
2025-02-03 17:32:27 +00:00
410738342c Upload files to "src"
2025-02-03 17:32:06 +00:00
da64d5a94f Delete src/config.c
2025-02-03 16:55:31 +00:00
f8730df47f Delete src/asciilogo.h
2025-02-03 16:55:10 +00:00
2bd092bcb8 Delete src/valkey-cli.c
2025-02-03 16:54:47 +00:00
2ccd70a257 Delete src/server.c
2025-02-03 16:53:07 +00:00
a4f2c53f46 Update README.md
2025-02-02 22:06:37 +00:00
916e917d50 Upload files to "/"
2025-02-02 21:52:21 +00:00
9982efe26d Upload files to "/"
2025-02-02 21:51:12 +00:00
2cae0b1910 Update README.md
2025-02-02 21:50:41 +00:00
5664f394c0 Upload files to "src"
2025-02-02 21:47:42 +00:00
2ba8847b6d Delete src/Makefile
2025-02-02 21:46:38 +00:00
烈香
26c6f1af9b
Loop optimization: move maxlen check outside to reduce unnecessary checks (#1557)
A trivial PR: move the maxlen check outside the loop to reduce unnecessary checks.

---------

Signed-off-by: hengyouhai <hengyouhai@tuhu.cn>
Signed-off-by: 烈香 <hengyoush1@163.com>
Co-authored-by: hengyouhai <hengyouhai@tuhu.cn>
2025-02-01 05:10:32 -08:00
Harkrishn Patro
78bcc0a2cf
Update daily failure notification job list (#1648)
Two jobs were missing from the job list for failure notification

* test-ubuntu-tls-io-threads
* test-sanitizer-force-defrag

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2025-01-30 15:21:31 -08:00
Viktor Söderqvist
12ec3d5932
Increase timeout for cross-version-replication test (#1644)
Fixes #1641

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-29 13:29:35 -08:00
Madelyn Olson
d3aabd7f13
Hex encode the data in dump test (#1637)
Addresses the failure here:
https://github.com/valkey-io/valkey/actions/runs/13000845302/job/36259016156#step:5:7272.

This change does three things:
1. For some reason TCL 8.5 (which is used on macOS) is handling `\x03ba`
as `0xba`, according to
https://www.tcl-lang.org/man/tcl8.5/TclCmd/Tcl.htm#M27, so we encode
"bar" using hex escapes too.
2. Fix a spacing issue. 
3. Make it so that if the restore fails, it immediately errors.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-29 16:20:38 -05:00
xingbowang
ff8a528fd6
Fix a heap-use-after-free bug in cluster bus (#1643)
https://github.com/valkey-io/valkey/issues/1642

Avoid heap-use-after-free in cluster bus around node cleanup code.

freeClusterNode frees the human_nodename:
https://github.com/valkey-io/valkey/blob/unstable/src/cluster_legacy.c#L1725
Then it calls freeClusterLink to free the links:
https://github.com/valkey-io/valkey/blob/unstable/src/cluster_legacy.c#L1730
freeClusterLink prints human_nodename here, which was just freed by the
caller freeClusterNode:
https://github.com/valkey-io/valkey/blob/unstable/src/cluster_legacy.c#L1383
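
In sketch form, the problematic ordering looks like this (illustrative, not
the exact code; see the links above):
```
/* Inside freeClusterNode (sketch): */
sdsfree(node->human_nodename);  /* 1. the name is freed here...            */
freeClusterLink(node->link);    /* 2. ...but the link teardown still logs  */
                                /*    node->human_nodename: use-after-free */
/* Fix direction: free the links (or stop logging the field) before
 * freeing the strings they reference. */
```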

Signed-off-by: xingbowang <shawn.xingbo.wang@gmail.com>
2025-01-29 13:13:40 -08:00
Binbin
4b8f3ed9ac
Do command existence and arity checks when loading AOF to avoid crash (#1614)

Currently, loading commands such as `cluster` or `cluster slots xxx`
from an AOF will cause the server to crash.
1. `cluster` is a container command; executing its proc causes a
    crash because we do not check the subcommand and arity.
2. `cluster slots xxx` fails the arity check, replies with an error from the
    AOF client, and triggers a panic.

Of course, there are many other ways for a problematic AOF to cause a
panic, but it is still necessary to do some basic checks before executing.
That way, in these basic cases, we can print useful error messages
instead of crashing directly.
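
A hedged sketch of such checks during AOF replay (helper names follow the
general server code style and are assumptions, not the exact patch):
```
/* Validate a command read from the AOF before executing it. */
struct serverCommand *cmd = lookupCommand(argv, argc);
if (!cmd) {
    serverLog(LL_WARNING, "Unknown command '%s' when loading the AOF",
              (char *)argv[0]->ptr);
    exit(1); /* fail with a useful message instead of crashing */
}
/* Arity convention: positive means exact argc, negative means a minimum. */
if ((cmd->arity > 0 && cmd->arity != argc) || argc < -cmd->arity) {
    serverLog(LL_WARNING, "Wrong number of arguments for '%s' when loading the AOF",
              (char *)argv[0]->ptr);
    exit(1);
}
```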

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-30 01:06:13 +08:00
zhenwei pi
d72a97edf6
RDMA: Protect RDMA memory regions (#1602)
Use the Linux mmap/munmap syscalls to manage an RDMA memory region, so that we
have a guard-page-protected VMA like this (cat /proc/PID/maps):
 785018afe000-785018aff000 ---p 00000000 00:00 0  -> top guard page
 785018aff000-785018bff000 rw-p 00000000 00:00 0  -> RDMA memory region
 785018bff000-785018c00000 ---p 00000000 00:00 0  -> bottom guard page

Once any code accesses this memory unexpectedly, a segmentation fault occurs.
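
A minimal sketch of the guard-page technique described above (illustrative,
not the exact patch; it assumes `len` is page-aligned):
```
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate `len` usable bytes surrounded by two inaccessible guard pages. */
static void *alloc_guarded(size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, len + 2 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    mprotect(base, page, PROT_NONE);              /* top guard page */
    mprotect(base + page + len, page, PROT_NONE); /* bottom guard page */
    return base + page; /* the usable (RDMA) memory region */
}
```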

Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2025-01-28 12:22:16 -05:00
Wen Hui
ad60d6b7b3
Initialize one variable in struct to avoid risk (#1606)
In C, it is good practice to initialize every variable in a struct; this PR
fixes one missed variable initialization.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2025-01-28 11:37:41 -05:00
Madelyn Olson
f695c52acb
Fix timing issue in pause test (#1631)
2025-01-28 06:35:24 -08:00
ranshid
230efa4fbf
deflake tracking-redir-broken test (#1628)
This addresses 2 issues:

1. It is possible (somehow) that the inner server client (r) was not
using RESP3 when entering this test;
this makes sure it does.

2. In case the test failed, it might leave the redirection client closed.
There is a cross-test assumption that it should be open, so most of the
assert checks were moved to the end of the test.

example fail:
https://github.com/valkey-io/valkey/actions/runs/12979601179/job/36195523412

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2025-01-28 12:35:32 +02:00
Viktor Söderqvist
e9b8970e72
Relaxed RDB version check (#1604)
New config `rdb-version-check` with values:

* `strict`: Reject future RDB versions.
* `relaxed`: Try parsing future RDB versions and fail only when an
unknown RDB opcode or type is encountered.
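
For illustration, the new config could be set like any other config, e.g. in
valkey.conf (the value shown is one of the two documented options):
```
rdb-version-check relaxed
```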

This can make it possible for Valkey 8.1 to try to read a dump from, for
example, Valkey 9.0 or later on a best-effort basis. The conditions for
when this is expected to work can be defined when the future Valkey
versions are released. Loading is expected to fail in the following
cases:

* If the data set contains any new key types or other data elements not
supported by the current version.
* If the RDB contains new representations or encodings of existing key
types or other data elements.

This change also prepares for the next RDB version bump. A range of RDB
versions (12-79) is reserved, since it's expected to be used by foreign
software RDB versions, so Valkey will not accept versions in this range
even with the `relaxed` version check. The DUMP/RESTORE format has no
magic string; only the RDB version number.

This change also prepares for the magic string to change from REDIS to
VALKEY next time we bump the RDB version.

Related to #1108.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-27 18:44:24 +01:00
Viktor Söderqvist
7699a3a94a
Fix use-after-free in hashtableTwoPhasePopDelete (#1626)
Use-after-free has been detected by the address sanitizer, such as in this
test run:

https://github.com/valkey-io/valkey/actions/runs/12981530413/job/36200075972?pr=1620#step:5:1339

`hashtableShrinkIfNeeded` may free one of the hash tables and invalidate
the variables used by the `fillBucketHole(ht, b, pos_in_bucket,
table_index)` call just after, causing a use-after-free. Filling the bucket
hole first and shrinking afterwards is assumed to solve the issue. (Not
reproduced locally.)
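
A hedged sketch of the reordering (the types and helpers stand in for the
internal ones in hashtable.c; this is illustrative, not the patch):
```
typedef struct hashtable hashtable;
typedef struct bucket bucket;
void fillBucketHole(hashtable *ht, bucket *b, int pos_in_bucket, int table_index);
void hashtableShrinkIfNeeded(hashtable *ht);

void twoPhasePopDelete(hashtable *ht, bucket *b, int pos, int table_index) {
    /* Order matters: filling the hole uses `b`, which points into the
     * current table; shrinking may free that table. So fill first... */
    fillBucketHole(ht, b, pos, table_index);
    /* ...and only then allow the tables to be reallocated or freed. */
    hashtableShrinkIfNeeded(ht);
}
```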

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-27 15:45:09 +01:00
Madelyn Olson
88a68303c0
Make sure to disable pause after fork for dual channel test (#1612)
Might close https://github.com/valkey-io/valkey/issues/1484.

I noticed that we don't disable pause after fork on the last test that
was executed, so it might get stuck in pause loops after the
test ends if it tries another psync for any reason.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-27 06:44:48 -08:00
Viktor Söderqvist
a18fcdb371
Deflake hashtable random fairness test (#1618)
Fixes the unit test for hashtable random fairness intermittent failures when
running with the `--accurate` flag.

https://github.com/valkey-io/valkey/actions/runs/12969591890/job/36173815884#step:10:105

The test case picks a random element out of 400, repeated 1M times, and
then checks that 60% of the elements are picked within 3 standard
deviations from the number of times they're expected to be picked. In
this test run (with `--accurate`), the expected number is 2500 and the
standard deviation is 50, which is only 2% of the expected value. This
makes the check too strict and makes the test flaky.
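
For reference, the quoted standard deviation follows from the binomial
distribution of picks ($n = 10^6$ repetitions, $p = 1/400$):

$$\sigma = \sqrt{np(1-p)} = \sqrt{10^6 \cdot \tfrac{1}{400} \cdot \tfrac{399}{400}} \approx 49.9, \qquad \frac{\sigma}{np} \approx \frac{50}{2500} = 2\%.$$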

As an alternative, we allow 80% of the elements to be picked within 10%
of the expected number. With this alternative condition, we can also
raise the check for the non-edge case from 60% to 80% of the elements to
be within 3 standard deviations. (With fewer repetitions, 3 standard
deviations is greater than 10% of the expected value, so this new
condition only affects the `--accurate` test run.)

Additional change: Set a random seed to the hash function in the test
suite. Until now, we only seeded the random number generator.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-27 10:13:46 +01:00
Viktor Söderqvist
66577573f2
Test coverage for COMMANDLOG HELP (#1617)
Fixes reply-schema-validator test job which needs coverage for all
commands.

Failing job:
https://github.com/valkey-io/valkey/actions/runs/12969591890/job/36173810824

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-27 04:38:54 +01:00
Harkrishn Patro
9071a5c8e6
Set GH actions job timeout to a day (#1540)
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2025-01-24 10:47:10 -08:00
zhaozhao.zz
3f21705a6c
Feature COMMANDLOG to record slow execution and large request/reply (#1294)
As discussed in PR #336.

We have different types of resources like CPU, memory, network, etc. The
`slowlog` can only record commands that eat lots of CPU during the processing
phase (it doesn't include read/write network time), but it cannot record
commands that consume lots of memory or network bandwidth. For example:

1. Running a "SET key value(10 megabytes)" command would not be recorded in
the slowlog, since when processing it the SET command only inserts the
value's pointer into the db dict. But that command eats huge memory in the
query buffer and bandwidth from the network. In this case, just 1000 TPS can
cause 10GB/s of network flow.
2. Running a "GET key" command where the key's value length is 10 megabytes.
The GET command can eat huge memory in the output buffer and bandwidth to the
network.

This PR introduces a new command `COMMANDLOG`, to log commands that
consume significant network bandwidth, including both input and output.
Users can retrieve the results using `COMMANDLOG get <count>
large-request` and `COMMANDLOG get <count> large-reply`, all subcommands
for `COMMANDLOG` are:

* `COMMANDLOG HELP`
* `COMMANDLOG GET <count> <slow|large-request|large-reply>`
* `COMMANDLOG LEN <slow|large-request|large-reply>`
* `COMMANDLOG RESET <slow|large-request|large-reply>`

And the slowlog is also incorporated into the commandlog.

For each of these three types, additional configs have been added for
control:

* `commandlog-request-larger-than` and
`commandlog-large-request-max-len` represent the threshold for large
requests (the unit is bytes) and the maximum number of commands that can
be recorded.
* `commandlog-reply-larger-than` and `commandlog-large-reply-max-len`
represent the threshold for large replies (the unit is bytes) and the
maximum number of commands that can be recorded.
* `commandlog-execution-slower-than` and
`commandlog-slow-execution-max-len` represent the threshold for slow
executions (the unit is microseconds) and the maximum number of commands
that can be recorded.
* Additionally, `slowlog-log-slower-than` and `slowlog-max-len` are now
aliases for the two new slow-execution configs.
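
For illustration, the new thresholds might be configured like this (the values
are arbitrary examples, not documented defaults):
```
commandlog-request-larger-than 1048576
commandlog-large-request-max-len 128
commandlog-reply-larger-than 1048576
commandlog-large-reply-max-len 128
commandlog-execution-slower-than 10000
commandlog-slow-execution-max-len 128
```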

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2025-01-24 11:41:40 +08:00
Nadav Gigi
f2510783f9
Accelerate hash table iterator with value prefetching (#1568)
This PR builds upon the [previous entry prefetching
optimization](https://github.com/valkey-io/valkey/pull/1501) to further
enhance performance by implementing value prefetching for hashtable
iterators.

## Implementation
Modified `hashtableInitIterator` to accept a new flags parameter,
allowing control over iterator behavior.
Implemented conditional value prefetching within `hashtableNext` based
on the new `HASHTABLE_ITER_PREFETCH_VALUES` flag.
When the flag is set, hashtableNext now calls `prefetchBucketValues` at
the start of each new bucket, preemptively loading the values of filled
entries into the CPU cache.
The actual prefetching of values is performed using type-specific
callback functions implemented in `server.c`:
- For `robj`, the `hashtableObjectPrefetchValue` callback is used to
prefetch the value if it is not embedded.
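
A hedged sketch of what such a callback can look like (illustrative; the real
callback is hashtableObjectPrefetchValue in server.c):
```
#include "server.h" /* robj, OBJ_ENCODING_RAW (assumed available) */

/* Prefetch the value an entry points to, but only when the value lives
 * outside the robj itself (embedded strings are already in cache along
 * with the object). */
static void objectPrefetchValue(const void *entry) {
    const robj *o = entry;
    if (o->encoding == OBJ_ENCODING_RAW)
        __builtin_prefetch(o->ptr, 0, 1); /* read access, low temporal locality */
}
```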

This implementation is specifically focused on main database iterations
at this stage. Applying it to hashtables that hold other object types
should not be problematic, but its performance benefits for those cases
will need to be proven through testing and benchmarking.

## Performance

### Setup:
- 64-core Graviton 3 Amazon EC2 instance.
- 50 million keys with different value sizes.
- Running the Valkey server over a RAM file system.
- CRC checksum and compression off.

### Action
- save command.

### Results
The results regarding the duration of the “save” command were taken from
the “info all” command.
```
+--------------------+------------------+------------------+ 
| Prefetching        | Value size (byte)| Time (seconds)   | 
+--------------------+------------------+------------------+ 
| No                 | 100              | 20.112279        | 
| Yes                | 100              | 12.758519        | 
| No                 | 40               | 16.945366        | 
| Yes                | 40               | 10.902022        |
| No                 | 20               | 9.817000         | 
| Yes                | 20               | 9.626821         |
| No                 | 10               | 9.71510          | 
| Yes                | 10               | 9.510565         |
+--------------------+------------------+------------------+
```
The results largely align with our expectations, showing significant
improvements for larger values (100 bytes and 40 bytes) that are stored
outside the robj. For smaller values (20 bytes and 10 bytes) that are
embedded within the robj, we see almost no improvement, which is as
expected.

However, the small improvement observed even for these embedded values
is somewhat surprising. Given that we are not actively prefetching these
embedded values, this minor performance gain was not anticipated.

perf record on save command **without** value prefetching:
```
                --99.98%--rdbSaveDb
                          |          
                          |--91.38%--rdbSaveKeyValuePair
                          |          |          
                          |          |--42.72%--rdbSaveRawString
                          |          |          |          
                          |          |          |--26.69%--rdbWriteRaw
                          |          |          |          |          
                          |          |          |           --25.75%--rioFileWrite.lto_priv.0
                          |          |          |          
                          |          |           --15.41%--rdbSaveLen
                          |          |                     |          
                          |          |                     |--7.58%--rdbWriteRaw
                          |          |                     |          |          
                          |          |                     |           --7.08%--rioFileWrite.lto_priv.0
                          |          |                     |                     |          
                          |          |                     |                      --6.54%--_IO_fwrite
                          |          |                     |                                         
                          |          |                     |          
                          |          |                      --7.42%--rdbWriteRaw.constprop.1
                          |          |                                |          
                          |          |                                 --7.18%--rioFileWrite.lto_priv.0
                          |          |                                           |          
                          |          |                                            --6.73%--_IO_fwrite
                          |          |                                                            
                          |          |          
                          |          |--40.44%--rdbSaveStringObject
                          |          |          
                          |           --7.62%--rdbSaveObjectType
                          |                     |          
                          |                      --7.39%--rdbWriteRaw.constprop.1
                          |                                |          
                          |                                 --7.04%--rioFileWrite.lto_priv.0
                          |                                           |          
                          |                                            --6.59%--_IO_fwrite
                          |                                                               
                          |          
                           --7.33%--hashtableNext.constprop.1
                                     |          
                                      --6.28%--prefetchNextBucketEntries.lto_priv.0
```
perf record on save command **with** value prefetching:
```
               rdbSaveRio
               |          
                --99.93%--rdbSaveDb
                          |          
                          |--79.81%--rdbSaveKeyValuePair
                          |          |          
                          |          |--66.79%--rdbSaveRawString
                          |          |          |          
                          |          |          |--42.31%--rdbWriteRaw
                          |          |          |          |          
                          |          |          |           --40.74%--rioFileWrite.lto_priv.0
                          |          |          |          
                          |          |           --23.37%--rdbSaveLen
                          |          |                     |          
                          |          |                     |--11.78%--rdbWriteRaw
                          |          |                     |          |          
                          |          |                     |           --11.03%--rioFileWrite.lto_priv.0
                          |          |                     |                     |          
                          |          |                     |                      --10.30%--_IO_fwrite
                          |          |                     |                                |          
                          |          |                     |          
                          |          |                      --10.98%--rdbWriteRaw.constprop.1
                          |          |                                |          
                          |          |                                 --10.44%--rioFileWrite.lto_priv.0
                          |          |                                           |          
                          |          |                                            --9.74%--_IO_fwrite
                          |          |                                                      |          
                          |          |          
                          |          |--11.33%--rdbSaveObjectType
                          |          |          |          
                          |          |           --10.96%--rdbWriteRaw.constprop.1
                          |          |                     |          
                          |          |                      --10.51%--rioFileWrite.lto_priv.0
                          |          |                                |          
                          |          |                                 --9.75%--_IO_fwrite
                          |          |                                           |          
                          |          |          
                          |           --0.77%--rdbSaveStringObject
                          |          
                           --18.39%--hashtableNext
                                     |          
                                     |--10.04%--hashtableObjectPrefetchValue
                                     |
                                      --6.06%--prefetchNextBucketEntries        

```
Conclusions:

The prefetching strategy appears to be working as intended, shifting the
performance bottleneck from data access to I/O operations.
The significant reduction in rdbSaveStringObject time suggests that
string objects (which are the values) are being accessed more
efficiently.

Signed-off-by: NadavGigi <nadavgigi102@gmail.com>
2025-01-23 12:17:20 +01:00
Viktor Söderqvist
99ed308817
Add cross-version test framework (and a simple test) (#1371)
This includes a way to run two versions of the server from the TCL test
framework. It's a preparation to add more cross-version tests. The
runtest script accepts a new parameter

    ./runtest --other-server-path path/to/valkey-server

and a new tag "needs:other-server" for test cases and start_server.
Tests with this tag are automatically skipped if `--other-server-path`
is not provided.

This PR adds it in a CI job with Valkey 7.2.7 by downloading a binary
release.

Fixes #76

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-23 11:26:54 +01:00
ranshid
7fc958da52
Fix Protocol desync regression test with TLS (#1593)
Remove socket non-blocking mode and simplify the validation.

fixes https://github.com/valkey-io/valkey/issues/1592

Signed-off-by: ranshid <ranshid@amazon.com>
2025-01-21 08:57:01 +02:00
ranshid
dd92d079dc
Fix Protocol desync regression test (#1590)
The desync regression test was created as a regression test for the
following bug:
in case we embed a NUL termination inside an inline/multi-bulk message, we
will not be able to perform strchr in order to
identify the newline (\n) / carriage return (\r) in the client query buffer.
This can influence (for example) a replica reading the primary stream, which
keeps filling its query buffer endlessly, consuming more and more memory.
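
A minimal standalone illustration of the strchr pitfall described above:
```
#include <stdio.h>
#include <string.h>

int main(void) {
    /* An inline command with an embedded NUL before the newline. */
    const char buf[] = "GET key\0AAAA\r\n";
    size_t len = sizeof(buf) - 1;

    /* strchr stops at the embedded '\0', so the newline is never found
     * and the query buffer would keep growing. */
    printf("strchr: %p\n", (void *)strchr(buf, '\n')); /* (nil) */
    /* memchr scans the full length and finds it. */
    printf("memchr: %p\n", memchr(buf, '\n', len));    /* non-NULL */
    return 0;
}
```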

In order to handle the above risk, a check was added to verify the
inline bulk and multi-bulk size are not exceeding the 64K bytes in the
query-buffer. A test was placed in order to verify this.

This PR introduces the following fixes to the desync regression test:
1. Fix the sent payload to flush 1024-byte blocks of 'A's instead of
'payload', which was sent by mistake.
2. Make sure that the connection is correctly terminated by the server on
protocol error right after exceeding the 64K limit, not well over 64K.
3. Add another test intrinsic which also verifies a nested bulk
with embedded NUL termination (was not verified before).

fixes https://github.com/valkey-io/valkey/issues/1583


NOTE: Although it is possible to change the use of strchr to a "safer"
utility (e.g. memchr) which will not pause the scan at the first occurrence
of '\0', we would still like to protect against excessive usage of the
query buffer and also preserve the current behavior. We will look
into improving this in a follow-up issue.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
2025-01-20 20:28:45 +02:00
ranshid
3032ccd48a
Change the shared format for dual channel replication logs (#1586)
Change the format of the dual channel replication logs so that it will not
conflict with existing log formats like modules.

Fixes: https://github.com/valkey-io/valkey/issues/1509

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2025-01-20 08:04:47 +02:00
Viktor Söderqvist
b2e4155f54
Lower latency-monitor-threshold in expire-cycle test case (#1584)
The test case checks for expire-cycle in LATENCY LATEST, but with the
new hash table, the expiry-cycle is too fast to be logged by latency
monitor. Lower the latency monitor threshold to make it more likely to
be logged.

Fixes #1580

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-19 19:23:00 +01:00
Pierre
2d0b8e3608
Update comments and log message in cluster_legacy.c (#1561)
Update comments and log message in `cluster_legacy.c`.

Follow-up from #1441.

Signed-off-by: Pierre Turin <pieturin@amazon.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-17 15:56:52 +08:00
Pierre
c9aea6d2d3
Fix memory leak in forgotten node ping ext code path (#1574)
When processing a cluster bus PING extension, there is a memory leak
when adding a new key to the `nodes_black_list` dict. We now make sure
to free the key `sds` if the dict did not take ownership of it.
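
A short sketch of the ownership rule at play (dictAdd fails when the key
already exists, in which case the dict never took ownership of the sds;
`node_name` here is a hypothetical input):
```
sds key = sdsnew(node_name);
if (dictAdd(server.cluster->nodes_black_list, key, NULL) != DICT_OK)
    sdsfree(key); /* dict did not take ownership: free our copy */
```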

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2025-01-16 15:38:15 -08:00
Harkrishn Patro
87cc3d7a71
Fix cluster info sent stats for message with light header (#1563)
This issue affected only two message types (CLUSTERMSG_TYPE_PUBLISH and CLUSTERMSG_TYPE_PUBLISHSHARD) because they used a light message header, which caused the CLUSTER INFO stats to miss sent/received message information for those types.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-16 11:25:37 -08:00
Ricardo Dias
af71619c45
Extract the scripting engine code from the functions unit (#1312)
This commit creates a new compilation unit for the scripting engine code
by extracting the existing code from the functions unit.
We're doing this refactor to prepare the code for running the `EVAL`
command using different scripting engines.

This PR has a module API change: we changed the type of error messages
returned by the callback
`ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc` to be a
`ValkeyModuleString` (aka `robj`);

This PR also fixes #1470.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-01-16 10:08:16 +01:00
Ray Cao
921ba19acb
Incr expired_keys if the unix-time is already expired for EXPIREAT and other commands (#1517)
Some commands that use unix-time, such as `EXPIREAT` and `SET EXAT`, should include the deleted keys in the `expired_keys` statistics if the specified time has already expired, and notifications should be sent in the manner of expired.

---------

Signed-off-by: Ray Cao <zisong.cw@alibaba-inc.com>
2025-01-16 16:40:34 +08:00
Binbin
cda9eee8c9
Allow clang-format to be triggered in push events (#1565)
Just like the spell-check workflow, we should allow triggering it
on push events, so that fork repos can notice formatting issues
well before submitting the PR.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-16 10:23:03 +08:00
Sarthak Aggarwal
6a8f068e36
Adding Missing filters to CLIENT LIST and Dedup Parsing (#1401)
Adds filter options to CLIENT LIST:

    * USER <username>
      Return clients authenticated by <username>.
    * ADDR <ip:port>
      Return clients connected from the specified address.
    * LADDR <ip:port>
      Return clients connected to the specified local address.
    * SKIPME (YES|NO)
      Exclude the current client from the list (default: no).
    * MAXAGE <maxage>
      Only list connections older than the specified age.

Modifies the ID filter to CLIENT KILL to allow multiple IDs

    * ID <client-id> [<client-id>...]
      Kill connections by client ids.


This makes CLIENT LIST and CLIENT KILL accept the same options.

For backward compatibility, the default value for SKIPME is NO for
CLIENT LIST and YES for CLIENT KILL.

The MAXAGE comes from CLIENT KILL, where it *keeps* clients with the
given max age and kills the older ones. This logic becomes weird for
CLIENT LIST, but is kept for similarity with CLIENT KILL, for the use case
of first testing manually using CLIENT LIST, and then running CLIENT
KILL with the same filters.

The `ID client-id [client-id ...]` no longer needs to be the last
filter. The parsing logic determines if an argument is an ID or not
based on whether it can be parsed as an integer or not.
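
A minimal sketch of that parsing rule (string2ll is an existing utility in
the server; treating "parses as an integer" as the full rule is an assumption
based on the description above):
```
#include <string.h>

int string2ll(const char *s, size_t slen, long long *value); /* util.c helper */

/* An argument is treated as a client ID iff it parses as an integer. */
static int isClientIdArg(const char *arg) {
    long long id;
    return string2ll(arg, strlen(arg), &id);
}
```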

Partly addresses: #668

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2025-01-15 20:44:13 +01:00
zhaozhao.zz
c5a1585547
add paused_actions for INFO Clients (#1519)
Add `paused_actions` and `paused_timeout_milliseconds` for INFO Clients
to inform users about if clients are paused.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2025-01-14 19:01:00 +08:00
Viktor Söderqvist
2a1a65b4c7
Introduce const_sds for const-content sds (#1553)
`sds` is a typedef of `char *`.

`const sds` means `char * const`, i.e. a const-pointer to non-const
content.

More often, you would want `const char *`, i.e. a pointer to
const-content. Until now, it's not possible to express that. This PR
adds `const_sds` which is a pointer to const-content sds.

To get a const-pointer to const-content sds, you can use `const
const_sds`.
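
In plain C terms, the equivalences described above look like this (a minimal
sketch):
```
typedef char *sds;             /* existing: sds is char * */
typedef const char *const_sds; /* new: pointer to const content */

/* const sds x;       == char *const x;        const pointer, mutable bytes
 * const_sds y;       == const char *y;        mutable pointer, const bytes
 * const const_sds z; == const char *const z;  const pointer, const bytes  */
```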

In this PR, some uses of `const sds` are replaced by `const_sds`. We can
use it more later.

Fixes #1542

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-14 10:38:12 +01:00
Amit Nagler
6be1c77b1e
Fix valgrind test (#1555)
Introduced at https://github.com/valkey-io/valkey/pull/1165/files

Signed-off-by: naglera <anagler123@gmail.com>
2025-01-14 10:49:46 +02:00
secwall
fdc89c56b7
Escape unix socket group in unit tests (#1554)
In some cases Unix groups can have whitespace and/or `\` in them.
One example is my workstation: a macOS machine in an Active Directory
domain, so my user has the group `LD\Domain Users`.
Running `make test` on `unstable` and `8.0` branches fails with:

I'm not sure if we need to fix this in 8.0. But it seems that it should
be fixed in unstable.

Signed-off-by: secwall <secwall@yandex-team.ru>
2025-01-13 20:05:04 -08:00
Rain Valentine
d13aad45f4
Replace dict with new hashtable: hash datatype (#1502)
This PR replaces dict with the new hashtable data structure in the HASH
datatype. There is a new struct for hashtable items which contains a
pointer to value sds string and the embedded key sds string. These
values were previously stored in dictEntry. This structure is kept
opaque so we can easily add small value embedding or other optimizations
in the future.

closes #1095

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
2025-01-13 11:17:16 +01:00
Viktor Söderqvist
dc9ca1b98d
Test coverage for ECHO for reply schema validation (#1549)
After #1545 disabled some tests for reply schema validation, we now have
another issue that ECHO is not covered.

```
WARNING! The following commands were not hit at all:
  echo
ERROR! at least one command was not hit by the tests
```

This patch adds a test case for ECHO in the unit/other test suite. I
haven't checked if there are more commands that aren't covered.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-13 10:14:09 +08:00
Viktor Söderqvist
ad592f73d7
Skip CLI tests with reply schema validation (#1545)
The commands used in valkey-cli tests are not important for the reply schema
validation. Skip them to avoid the problem of tests hanging. This has
failed lately in the daily job:

```
[TIMEOUT]: clients state report follows.
sock55fedcc19be0 => (IN PROGRESS) valkey-cli pubsub mode with single standard channel subscription
Killing still running Valkey server 33357
```

These test cases use a special `:get pubsub` command, which is an
internal valkey-cli command rather than a Valkey server command. This
command hangs when compiled with logreqres enabled. The easy solution is
to skip the tests in this setup.

The test cases were introduced in #1432.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-12 08:02:39 +08:00
Binbin
11cb8ee27c
Add latency stats around cluster config file operations (#1534)
When the cluster changes, we need to persist the cluster configuration,
and these file IO operations may cause latency.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 11:03:10 +08:00
Binbin
10357ceda5
Mark the node as FAIL when the node is marked as NOADDR and broadcast the FAIL (#1191)
Imagine we have a cluster, for example a three-shard cluster,
if shard 1 does a CLUSTER RESET HARD, it will change its node
name, and then other nodes will mark it as NOADDR since the node
name received by PONG has changed.

In the eyes of other nodes, there is one working primary node
left but with no address, and in this case, the address report
in MOVED will be invalid and will confuse the clients. And in
the same time, the replica will not failover since its primary
is not in the FAIL state. And the cluster looks OK to everyone.

This leaves a cluster that appears OK, but with no coverage for
shard 1. Obviously we should do something like CLUSTER FORGET
to remove the node and fix the cluster before using it.

But the point here is that we can mark the NOADDR node as FAIL to
advance the cluster state. If a node is NOADDR, it means it does
not have a valid address, so we won't reconnect to it, we won't
send PING, and we won't gossip it; it seems reasonable to mark it
as FAIL.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 11:02:05 +08:00
Binbin
211b250aad
Do election in order based on failed primary rank to avoid voting conflicts (#1018)
When multiple primary nodes fail simultaneously, the cluster cannot recover
within the default effective time (data_age limit). The main reason is that
the vote happens without ranking among the multiple replica nodes, which
causes too many epoch conflicts.

Therefore, we introduce ranking based on the failed primary shard-id.
A new failed_primary_rank variable is introduced; it means the rank of this
instance in the context of the list of all failed primaries. This variable is
used in failover, and we send the failover election packets in order based
on the rank, which can effectively avoid voting conflicts.

If a single primary is down, the behavior is the same as before. If multiple
primaries are down, their replica election initiation time will be delayed
by 500ms according to the ranking.
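
In sketch form, the delay described above boils down to something like this
(field names follow the message and are assumptions, not the exact patch):
```
/* Each replica of a failed primary postpones its election start by
 * 500 ms per position in the failed-primary ranking. */
server.cluster->failover_auth_time += server.cluster->failed_primary_rank * 500; /* ms */
```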

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 10:43:18 +08:00
Binbin
d6bdd9e7d7
Fix module LatencyAddSample still work when latency-monitor-threshold is 0 (#1541)
When latency-monitor-threshold is set to 0, it means the latency monitor
is disabled, and in VM_LatencyAddSample, we wrote the condition
incorrectly, causing us to record latency when latency was turned off.
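
A hedged sketch of the corrected guard (illustrative; the core uses a similar
check in its latencyAddSampleIfNeeded macro):
```
/* In VM_LatencyAddSample: a threshold of 0 means the monitor is disabled. */
if (server.latency_monitor_threshold == 0) return;
latencyAddSample(event, latency); /* record only when the monitor is on */
```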

This bug has existed since the very first day; see e3b1d6d, which was merged
in 2019.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-11 10:32:58 +08:00
Binbin
e60990e579
Fix crash when freeing newly created node when nodeIp2String fail (#1535)
In #1441, we found an assert and decided to remove it, instead
just freeing the newly created node and closing the link, since if we cannot
get the IP from the link it probably means the connection was closed.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
17847:M 19 Dec 2024 00:15:58.021 # === ASSERTION FAILED ===
17847:M 19 Dec 2024 00:15:58.021 # ==> cluster_legacy.c:3252 'nodeIp2String(node->ip, link, hdr->myip) == C_OK' is not true

------ STACK TRACE ------

17847 valkey-server *
src/valkey-server 127.0.0.1:27131 [cluster](clusterProcessPacket+0x1304) [0x4e5634]
src/valkey-server 127.0.0.1:27131 [cluster](clusterReadHandler+0x11e) [0x4e59de]
/__w/valkey/valkey/src/valkey-tls.so(+0x2f1e) [0x7f083983ff1e]
src/valkey-server 127.0.0.1:27131 [cluster](aeMain+0x8a) [0x41afea]
src/valkey-server 127.0.0.1:27131 [cluster](main+0x4d7) [0x40f547]
/lib64/libc.so.6(+0x40c8) [0x7f083985a0c8]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f083985a18b]
src/valkey-server 127.0.0.1:27131 [cluster](_start+0x25) [0x410ef5]
```

But it also triggers another assert. The reason is that this new node
was never added to the cluster nodes dict.
```
17128:M 08 Jan 2025 10:51:44.061 # === ASSERTION FAILED ===
17128:M 08 Jan 2025 10:51:44.061 # ==> cluster_legacy.c:1693 'dictDelete(server.cluster->nodes, nodename) == DICT_OK' is not true

------ STACK TRACE ------

17128 valkey-server *
src/valkey-server 127.0.0.1:28627 [cluster][0x4ebdc4]
src/valkey-server 127.0.0.1:28627 [cluster][0x4e81d2]
src/valkey-server 127.0.0.1:28627 [cluster](clusterReadHandler+0x268)[0x4e8618]
/__w/valkey/valkey/src/valkey-tls.so(+0xb278)[0x7f109480b278]
src/valkey-server 127.0.0.1:28627 [cluster](aeMain+0x89)[0x592b09]
src/valkey-server 127.0.0.1:28627 [cluster](main+0x4b3)[0x453e23]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f10958bf7e5]
src/valkey-server 127.0.0.1:28627 [cluster](_start+0x2e)[0x454a5e]
```

This closes #1527.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-10 10:19:04 +08:00
Harkrishn Patro
c338de3d46
Update upload artifacts to v4 (#1539)
Fixes #1538

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2025-01-09 17:19:36 -08:00
Madelyn Olson
d99457c09c
Free the passed in lua context instead of the global (#1536)
The fix that Redis gave us for CVE-2024-46981 was freeing lctx.lua,
and I didn't merge it correctly. We made some changes so that we are
able to async free the lua context, so we need to free the passed in
context. This was applied correctly on the two released versions (8.0
and 7.2) just not on unstable.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-09 14:35:48 +08:00
Binbin
b207b421bc
Fix new cli subscribed mode test in cluster mode (#1533)
We need to add a hash tag in cluster mode.
Fixes #1531.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-09 12:21:31 +08:00
Karthick Ariyaratnam
80c35402bc
Remove legacy SERVER_TEST compiler flag from cmake. (#1530)
This PR is to clean up the `SERVER_TEST` compiler flag from the cmake compile
definitions, as it is no longer required in the new unit test framework, see #428.

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2025-01-09 11:52:45 +08:00
Nadav Gigi
9f4815a224
Accelerate hash table iterator with prefetching (#1501)
This PR introduces improvements to the hashtable iterator, implementing
prefetching technique described in the blog post [Unlock One Million RPS
- Part 2](https://valkey.io/blog/unlock-one-million-rps-part2/) . The
changes lay the groundwork for further enhancements in use cases
involving iterators. Future PRs will build upon this foundation to
improve performance and functionality in various iterator-dependent
operations.

In the pursuit of maximizing iterator performance, I conducted a
comprehensive series of experiments. My tests encompassed a wide range
of approaches, including processing multiple bucket indices in parallel,
prefetching the next bucket upon completion of the current one, and
several other timing and quantity variations. Surprisingly, after
rigorous testing and performance analysis, the simplest implementation
presented in this PR consistently outperformed all other more complex
strategies.

## Implementation

Each time we start iterating over a bucket, we prefetch data for future
iterations:

- We prefetch the entries of the next bucket (if it exists).
- We prefetch the structure (but not the entries) of the bucket after
  the next.

This prefetching is done when we pick up a new bucket, increasing the
chance that the data will be in cache by the time we need it.
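
A hedged sketch of that strategy (the bucket layout here is illustrative
only, not the real hashtable internals):
```
#include <stddef.h>

typedef struct bucket {
    int num_entries;
    void *entries[7];
} bucket;

/* When starting bucket i, prefetch the entries of bucket i+1 and only
 * the struct of bucket i+2. */
static void prefetchUpcomingBuckets(bucket *table, size_t i, size_t n) {
    if (i + 1 < n) {
        bucket *next = &table[i + 1];
        for (int j = 0; j < next->num_entries; j++)
            __builtin_prefetch(next->entries[j], 0, 3);
    }
    if (i + 2 < n) __builtin_prefetch(&table[i + 2], 0, 3);
}
```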

## Performance

The data below was taken by running the KEYS command on a 64-core Graviton
3 Amazon EC2 instance with 50 million keys of 100 bytes each. The
results regarding the duration of the “keys *” command were taken from the
“info all” command.

```
+--------------------+------------------+-----------------------------+
| prefetching        | Time (seconds)   | Keys Processed per Second   |
+--------------------+------------------+-----------------------------+
| No                 | 11.112279        | 4,499,529                   |
| Yes                | 3.141916         | 15,913,862                  |
+--------------------+------------------+-----------------------------+
Improvement:
Comparing the iterator without prefetching to the one with prefetching, 
we can see a speed improvement of 11.112279 / 3.141916 ≈ 3.54 times faster.
```


### Save command improvement

#### Setup:
- 64-core Graviton 3 Amazon EC2 instance.
- 50 million keys of 100 bytes each.
- Running the Valkey server over a RAM file system.
- CRC checksum and compression off.

#### Results

```
+--------------------+------------------+-----------------------------+
| prefetching        | Time (seconds)   | Keys Processed per Second   |
+--------------------+------------------+-----------------------------+
| No                 | 28               | 1,785,700                   |
| Yes                | 19.6             | 2,550,000                   |
+--------------------+------------------+-----------------------------+
Improvement:
- Reduced SAVE time by 30% (8.4 seconds faster)
- Increased key processing rate by 42.8% (764,300 more keys/second)
```

Signed-off-by: NadavGigi <nadavgigi102@gmail.com>
2025-01-08 23:18:55 +01:00
Viktor Szépe
418f1d059f
Improve Typos configuration (#1456)
- remove old ignores
- fix a "new" typo 🎁

Signed-off-by: Viktor Szépe <viktor@szepe.net>
2025-01-08 22:39:45 +01:00
Nikhil Manglore
9e0204941d
valkey-cli auto-exit from subscribed mode (#1432)
Resolves an issue with valkey-cli not auto-exiting from subscribed mode on
reaching zero pub/sub subscriptions (previously filed on Redis):
https://github.com/redis/redis/issues/12592

---------

Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
2025-01-08 21:03:06 +01:00
Rueian
0a89571dcc
Skip logreqres on tests for the HELLO command (#1528)

Signed-off-by: Rueian <rueiancsie@gmail.com>
2025-01-08 10:05:20 -08:00
Rain Valentine
ab627d6721
Replace dict with new hashtable: sorted set datatype (#1427)
This PR replaces dict with hashtable in the ZSET datatype. Instead of
mapping key to score as dict did, the hashtable maps key to a node in
the skiplist, which contains the score. This takes advantage of
hashtable performance improvements and saves 15 bytes per set item - 24
bytes overhead before, 9 bytes after.

Closes #1096

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-08 18:34:02 +01:00
Lipeng Zhu
8af35a1712
Add build folder to gitignore. (#1488)
Default cmake build folder in vscode is `"cmake.buildDirectory": "${workspaceFolder}/build"`.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2025-01-08 19:33:02 +08:00
uriyage
6c09eea2bc
client struct: lazy init components and optimize struct layout (#1405)
# Refactor client structure to use modular data components

## Current State
The client structure allocates memory for replication / pubsub /
multi-keys / module / blocked data for every client, despite these
features being used by only a small subset of clients. In addition the
current field layout in the client struct is suboptimal, with poor
alignment and unnecessary padding between fields, leading to a larger
than necessary memory footprint of 896 bytes per client. Furthermore,
fields that are frequently accessed together during operations are
scattered throughout the struct, resulting in poor cache locality.

## This PR's Change

1.  Lazy Initialization 
- **Components are only allocated when first used** (see the sketch after this list):
  - PubSubData: Created on first SUBSCRIBE/PUBLISH operation
  - ReplicationData: Initialized only for replica connections
  - ModuleData: Allocated when module interaction begins
  - BlockingState: Created when first blocking command is issued
  - MultiState: Initialized on MULTI command

2. Memory Layout Optimization:
   - Grouped related fields for better locality
   - Moved rarely accessed fields (e.g., client->name) to struct end
   - Optimized field alignment to eliminate padding

3. Additional changes:
   - Moved watched_keys to be static allocated in the `mstate` struct
   - Relocated replication init logic to replication.c
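
A hedged sketch of the lazy initialization in item 1 (struct and helper names
are assumptions, not the exact patch):
```
#include "zmalloc.h" /* zcalloc */

typedef struct MultiState {
    int count; /* queued commands; watched_keys now lives here too */
} MultiState;

typedef struct client {
    /* hot, always-used fields grouped first for cache locality ... */
    MultiState *mstate; /* NULL until the first MULTI command */
    /* ... rarely accessed fields (e.g. name) at the end */
} client;

static MultiState *clientGetMultiState(client *c) {
    if (c->mstate == NULL) c->mstate = zcalloc(sizeof(MultiState));
    return c->mstate;
}
```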
  

### Key Benefits
- **Efficient Memory Usage:**
- 45% smaller base client structure - Basic clients now use 528 bytes
(down from 896).
- Better memory locality for related operations
- Performance improvement in high throughput scenarios. No performance
regressions in other cases.


### Performance Impact

Tested with 650 clients and 512-byte values.

#### Single Thread Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 261,799      | 258,261      | +1.37%    |
| SET        | 3M keys | 209,134      | ~209,000     | ~0%       |
| GET        | 1 key   | 281,564      | 277,965      | +1.29%    |
| GET        | 3M keys | 231,158      | 228,410      | +1.20%    |

#### 8 IO Threads Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 1,331,578    | 1,331,626    | -0.00%    |
| SET        | 3M keys | 1,254,441    | 1,152,645    | +8.83%    |
| GET        | 1 key   | 1,293,149    | 1,289,503    | +0.28%    |
| GET        | 3M keys | 1,152,898    | 1,101,791    | +4.64%    |

#### Pipeline Performance (3M keys)
| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|--------------|---------------|---------------|-----------|
| SET       | 10          | 548,964      | 538,498      | +1.94%    |
| SET       | 20          | 606,148      | 594,872      | +1.89%    |
| SET       | 30          | 631,122      | 616,606      | +2.35%    |
| GET       | 10          | 628,482      | 624,166      | +0.69%    |
| GET       | 20          | 687,371      | 681,659      | +0.84%    |
| GET       | 30          | 725,855      | 721,102      | +0.66%    |

### Observations:
1. Single-threaded operations show consistent improvements (1-1.4%)
2. Multi-threaded performance shows significant gains for large
datasets:
   - SET with 3M keys: +8.83% improvement
   - GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
   - SET operations: +1.89% to +2.35%
   - GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario


Related issue: https://github.com/valkey-io/valkey/issues/761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2025-01-08 10:28:54 +02:00
Rueian
dc4628d444
Add availability_zone to the HELLO command history (#1524)
This PR is a followup for #1487.

Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-08 10:04:58 +08:00
Madelyn Olson
d3acd90320
Actually run code coverage on ubuntu 22 (#1522)
This commit, https://github.com/valkey-io/valkey/pull/1504, moved the
wrong worker to ubuntu 22. We wanted to move codecov and not coverity.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-07 15:43:46 -08:00
Rueian
3b52186b6a
Add availability_zone to the HELLO response (#1487)
It's inconvenient for client implementations to extract the
`availability_zone` information from the `INFO` response. The `INFO`
response contains a lot of information that a client implementation
typically doesn't need.

This PR adds the availability zone to the `HELLO` response. Clients
usually already use the `HELLO` command for protocol negotiation and
also get the server `version` and `role` from its response. To keep the
`HELLO` response small, the field is only added if availability zone is
configured.

---------

Signed-off-by: Rueian <rueiancsie@gmail.com>
2025-01-07 22:54:55 +01:00
Madelyn Olson
e1db553834
Add tests for acl selectors with no permissions or patterns (#1515)
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-06 15:46:55 -08:00
Madelyn Olson
4ffd3ebdeb
Fix LUA garbage collector (CVE-2024-46981) (#1513)
Reset GC state before closing the lua VM to prevent user data to be
wrongly freed while still might be used on destructor callbacks.

Created and published by Redis in their OSS branch.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-06 14:02:22 -08:00
Madelyn Olson
7977c55ac9
Fix Read/Write key pattern selector (CVE-2024-51741) (#1514)
The explanation on the original commit was wrong. Key-based access must
have a `~` in order to correctly configure which key prefixes to apply
the selector to. If this is missing, a server assert will be triggered
later.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: YaacovHazan <yaacov.hazan@redis.com>
2025-01-06 14:02:16 -08:00
Binbin
c0014ef15e
Check whether to switch to fail when setting the node to pfail in cron (#1061)
This may speed up the transition to the fail state a bit.
Previously we would only check when we received a pfail/fail
report from others in gossip. If our own vote is the last one needed,
we can directly switch to fail here without waiting for
the next gossip packet.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-06 09:26:17 +08:00
Binbin
33b824137e
Explicitly check C_ERR condition to improve readability in clusterSaveConfig (#1505)
It's not obvious to see at first sight; modify it to use C_ERR explicitly.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2025-01-04 10:47:32 +08:00
eifrah-aws
b3b4bdcda4
CMake: fail on warnings (#1503)
When building with `CMake` (especially the targets `valkey-cli`,
`valkey-server` and `valkey-benchmark`) it is possible to have a
successful build while having warnings.

This PR fixes this - which is aligned with how the `Makefile` is working
today:
- Enable `-Wall` + `-Werror` for valkey targets
- Fixed warning in valkey-cli:jsonStringOutput method

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
2025-01-03 09:44:41 +08:00
Madelyn Olson
fe72c784b7
Move coverity back to ubuntu 22 until test failures are fixed (#1504)
The issues in #1453 seem to
have only shown up since we moved to ubuntu 24, as part of the rolling
`ubuntu-latest` migration from 22->24.

Closes #1453.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2025-01-03 09:43:16 +08:00
gmbnomis
26a72fa89c
Use the correct command proc for the LOOKUP_NOTOUCH exception in lookupKey (#1499)
When looking up a key in no-touch mode, `LOOKUP_NOTOUCH` is set to avoid
updating the last access time in `lookupKey`. An exception must be made
for the `TOUCH` command which must always update the key.

When called from a script, `server.executing_client` will point to the
`TOUCH` command, while `server.current_client` will point to e.g. an
`EVAL` command. So, we must use the former to find out the currently
executing command if defined.
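
In sketch form, the fix inside lookupKey looks like this (hedged; not the
exact patch):
```
/* Prefer the command actually running (e.g. TOUCH inside EVAL) over the
 * top-level client when deciding the LOOKUP_NOTOUCH exception. */
client *c = server.executing_client ? server.executing_client
                                    : server.current_client;
if (c && c->cmd && c->cmd->proc == touchCommand)
    flags &= ~LOOKUP_NOTOUCH; /* TOUCH must always update access time */
```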

This fix addresses the issue where TOUCH wasn't updating key access
times when called from scripts like EVAL.

Fixes #1498

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2025-01-03 09:41:15 +08:00
Wen Hui
93b701d8d4
Update Redis legacy keyword and link in utils/whatisdoing.sh (#1495)
Signed-off-by: hwware <wen.hui.ware@gmail.com>
2025-01-03 09:37:55 +08:00
Ricardo Dias
8d764f27b3
Refactor: move all valkey modules related declarations to module.h (#1489)
In this commit we move all structures and functions declarations related
to Valkey modules from `server.h` to the recently added `module.h` file.

This re-organization makes it easier for new contributors to find the
valkey modules related code, as well as reducing the compilation times
when changes are made to the modules code.

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2025-01-02 18:35:10 +01:00
Wen Hui
ede4adde7a
Remove releasetools folder (#1496)
The release tool in utils/releasetools/ does not work anymore in Valkey; in
this PR, we remove it.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2025-01-02 10:12:09 -05:00
uriyage
35abb68b79
Offload reading the replication stream to IO threads (#1449)
Support Primary client IO offload.

Related issue: https://github.com/valkey-io/valkey/issues/761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2025-01-02 10:42:39 +01:00
uriyage
ae70c5459b
replication: fix io-threads possible race by moving waitForClientIO (#1422)
### Fix race with pending writes in replica state transition

#### The Problem
In #60 (Dual channel replication) a new `connWrite` call was added
before the `waitForClientIO` check. This created a race condition where
the main thread may attempt to write to a client that could have pending
writes in IO threads.

#### The Fix
Moved the `waitForClientIO()` call earlier in `syncCommand`, before any
`connWrite` call. This ensures all pending IO operations are completed
before attempting to write to the client.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2025-01-02 10:01:55 +02:00
Amit Nagler
8aff235721
Fix unreliable dual channel Valgrind tests (#1500)
Used the same approach as PR #1165 to solve random failures.

Resolves #1491

Signed-off-by: naglera <anagler123@gmail.com>
2025-01-02 10:00:29 +08:00
ranshid
0f273bb648
Align rejected unblocked commands to update the correct error statistic (#577)
Currently, in case a blocked command is unblocked externally (e.g. due to
the relevant slot being migrated or the CLIENT UNBLOCK command being
issued), the command statistics will always update the failed_calls error
statistic. This leads to misalignment with
90b9f08e5d
as well as some inconsistencies. For example when a key is migrated
during cluster slot migration, clients blocked on XREADGROUP will be
unblocked and update the rejected_calls stat, while clients blocked on
BLPOP will get unblocked updating the failed_calls stat.

In this PR we add an explicit indication in updateStatsOnUnblock that
indicates whether the command was rejected or failed.

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2025-01-01 16:33:09 +02:00
zhenwei pi
a136ad9a50
Make global configs as static (#1159)
Don't expose the static configs symbol, and make configEnumGetValue a
static function.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-12-30 15:58:06 -05:00
Pierre
e4179f1f3b
Only (re-)send MEET packet once every handshake timeout period (#1441)
Add `meet_sent` field in `clusterNode` indicating the last time we sent
a MEET packet. Use this field to only (re-)send a MEET packet once every
handshake timeout period when detecting a node without an inbound link.

When receiving multiple MEET packets on the same link while the node is
in handshake state, instead of dropping the packet, we now simply
prevent the creation of a new node. This way we still process the MEET
packet's gossip and reply with a PONG as with any other packet.

Improve some logging messages to include `human_nodename`. Add
`nodeExceedsHandshakeTimeout()` function.

This is a follow-up to this previous PR:
https://github.com/valkey-io/valkey/pull/1307
And a partial fix to the crash described in:
https://github.com/valkey-io/valkey/pull/1436

---------

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2024-12-30 15:56:39 -05:00
Madelyn Olson
e470735d91
Immediately restart the defrag cycle if we still need to defrag (#1492)
2024-12-29 08:22:49 -08:00
gmbnomis
8b40341295
Fix JSON description of SET command (#1473)
In the `arguments` section, the `arguments` key is only used for
arguments of type `block` or `oneof`.

Consequently, the `arguments` given for `IFEQ` are ignored by the
server. However, they lead to strange results when rendering the
command's page for the web documentation.

Fix this by removing `arguments` for `IFEQ`.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
2024-12-27 00:55:20 +01:00
uriyage
bb325bde35
Fix restore replica output bytes stat update (#1486)
This PR fixes the missing stat update for `total_net_repl_output_bytes`
that was removed during the refactoring in PR #758. The metric was not
being updated when writing to replica connections.

Changes:
- Restored the stat update in postWriteToClient for replica connections
- Added integration test to verify the metric is properly updated

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-12-25 10:58:49 +08:00
Binbin
da92c1d6c8
Document all command flags near serverCommand (#1474)
These flags are not documented here.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-25 10:57:42 +08:00
Amit Nagler
9f4503ca50
Add scoped RDB loading context and immediate abort flag (#1173)
This PR introduces a new mechanism for temporarily changing the
server's loading_rio context during RDB loading operations. The new
`RDB_SCOPED_LOADING_RIO` macro allows for a scoped change of the
`server.loading_rio` value, ensuring that it's automatically restored
to its original value when the scope ends.
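
For illustration, one way such a scoped change can be expressed in C (a
sketch assuming `server.loading_rio` is a `rio *`; the macro in the PR
may be implemented differently):

```c
#define RDB_SCOPED_LOADING_RIO(new_rio)                              \
    for (rio *_prev = server.loading_rio, *_once = (rio *)1;         \
         _once && (server.loading_rio = (new_rio), 1);               \
         server.loading_rio = _prev, _once = NULL)

/* Usage: the override holds for the block and is restored afterwards
 * (note: an early return inside the block would skip the restore). */
RDB_SCOPED_LOADING_RIO(&rdb) {
    /* server.loading_rio == &rdb here */
}
```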

Introduces a dedicated flag to `rio` to signal immediate abort,
preventing potential use-after-free scenarios during replication
disconnection in dual-channel load. This ensures proper termination of
`rdbLoadRioWithLoadingCtx` when replication is cancelled due to
connection loss on the main connection.

Fixes https://github.com/valkey-io/valkey/issues/1152

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
2024-12-24 08:14:32 +02:00
Amit Nagler
f1b7f3072c
Reduce dual channel testing time (#1477)
- By not waiting `repl-diskless-sync-delay` when we don't have to, we
can reduce ~30% of dual channel tests execution time.
- This commit also drops one test which is not required for regular sync
(`Sync should continue if not all slaves dropped`).
- Skip dual channel test with master diskless disabled because it will
initiate the same synchronization process as the non-dual channel test,
making it redundant.


Before:
```
Execution time of different units:
  171 seconds - integration/dual-channel-replication
  305 seconds - integration/replication-psync

\o/ All tests passed without errors!
```
After:
```
Execution time of different units:
  120 seconds - integration/dual-channel-replication
  236 seconds - integration/replication-psync

\o/ All tests passed without errors!
```

Discussed on https://github.com/valkey-io/valkey/pull/1173

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-12-24 08:13:25 +02:00
Madelyn Olson
2ee06e7983
Remove readability refactor for failover auth to fix clang warning (#1481)
As part of #1463, I made a small refactor between the PR and the daily
test I submitted to try to improve readability by adding a function to
abstract the extraction of the message types. However, that change
apparently caused GCC to throw another warning, so reverting the
abstraction on just one line.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-24 13:07:15 +08:00
Binbin
d00c856448
Fix switch case compilation error in the new helloscripting (#1472)
It was missing the curly braces needed for a variable declaration after a case label.
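
A generic illustration of the rule (not the actual helloscripting code):
in C, a declaration cannot directly follow a case label, so braces are
needed to open a scope.

```c
switch (op) {
case OP_PUSH: {            /* braces make the declaration legal */
    long value = readLong();
    push(value);
    break;
}
default:
    break;
}
```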

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-22 22:57:56 +01:00
Ricardo Dias
6adef8e2f9
Adds support for scripting engines as Valkey modules (#1277)
This PR extends the module API to support the addition of different
scripting engines to execute user defined functions.

The scripting engine can be implemented as a Valkey module, and can be
dynamically loaded with the `loadmodule` config directive, or with the
`MODULE LOAD` command.

This PR also adds an example of a dummy scripting engine module, to show
how to use the new module API. The dummy module is implemented in
`tests/modules/helloscripting.c`.

The current module API support only allows loading scripting engines to
run functions using the `FCALL` command.

The additions to the module API are the following:

```c
/* This struct represents a scripting engine function that results from the
 * compilation of a script by the engine implementation. */
struct ValkeyModuleScriptingEngineCompiledFunction

typedef ValkeyModuleScriptingEngineCompiledFunction **(*ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    const char *code,
    size_t timeout,
    size_t *out_num_compiled_functions,
    char **err);

typedef void (*ValkeyModuleScriptingEngineCallFunctionFunc)(
    ValkeyModuleCtx *module_ctx,
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    ValkeyModuleScriptingEngineFunctionCtx *func_ctx,
    void *compiled_function,
    ValkeyModuleString **keys,
    size_t nkeys,
    ValkeyModuleString **args,
    size_t nargs);

typedef size_t (*ValkeyModuleScriptingEngineGetUsedMemoryFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx);

typedef size_t (*ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc)(
    void *compiled_function);

typedef size_t (*ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx);

typedef void (*ValkeyModuleScriptingEngineFreeFunctionFunc)(
    ValkeyModuleScriptingEngineCtx *engine_ctx,
    void *compiled_function);

/* This struct stores the callback functions implemented by the scripting
 * engine to provide the functionality for the `FUNCTION *` commands. */
typedef struct ValkeyModuleScriptingEngineMethodsV1 {
    uint64_t version; /* Version of this structure for ABI compat. */

    /* Library create function callback. When a new script is loaded, this
     * callback will be called with the script code, and returns a list of
     * ValkeyModuleScriptingEngineCompiledFunc objects. */
    ValkeyModuleScriptingEngineCreateFunctionsLibraryFunc create_functions_library;

    /* The callback function called when `FCALL` command is called on a function
     * registered in this engine. */
    ValkeyModuleScriptingEngineCallFunctionFunc call_function;

    /* Function callback to get current used memory by the engine. */
    ValkeyModuleScriptingEngineGetUsedMemoryFunc get_used_memory;

    /* Function callback to return memory overhead for a given function. */
    ValkeyModuleScriptingEngineGetFunctionMemoryOverheadFunc get_function_memory_overhead;

    /* Function callback to return memory overhead of the engine. */
    ValkeyModuleScriptingEngineGetEngineMemoryOverheadFunc get_engine_memory_overhead;

    /* Function callback to free the memory of a registered engine function. */
    ValkeyModuleScriptingEngineFreeFunctionFunc free_function;
} ValkeyModuleScriptingEngineMethodsV1;

/* Registers a new scripting engine in the server.
 *
 * - `engine_name`: the name of the scripting engine. This name will match
 *   against the engine name specified in the script header using a shebang.
 *
 * - `engine_ctx`: engine specific context pointer.
 *
 * - `engine_methods`: the struct with the scripting engine callback functions
 * pointers.
 */
int ValkeyModule_RegisterScriptingEngine(ValkeyModuleCtx *ctx,
                                         const char *engine_name,
                                         void *engine_ctx,
                                         ValkeyModuleScriptingEngineMethods engine_methods);

/* Removes the scripting engine from the server.
 *
 * `engine_name` is the name of the scripting engine.
 *
 */
int ValkeyModule_UnregisterScriptingEngine(ValkeyModuleCtx *ctx, const char *engine_name);
```
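
For orientation, a rough sketch of how a module's entry point might wire
this up (the stub callbacks `compileLib`, `callFn`, etc. are hypothetical
and omitted, and the exact struct type accepted by
`ValkeyModule_RegisterScriptingEngine` may differ from the versioned one
shown here):

```c
int ValkeyModule_OnLoad(ValkeyModuleCtx *ctx, ValkeyModuleString **argv, int argc) {
    VALKEYMODULE_NOT_USED(argv);
    VALKEYMODULE_NOT_USED(argc);
    if (ValkeyModule_Init(ctx, "sketchengine", 1, VALKEYMODULE_APIVER_1) == VALKEYMODULE_ERR)
        return VALKEYMODULE_ERR;

    ValkeyModuleScriptingEngineMethodsV1 methods = {
        .version = 1,
        .create_functions_library = compileLib, /* compile source into functions */
        .call_function = callFn,                /* invoked on FCALL */
        .get_used_memory = usedMemory,
        .get_function_memory_overhead = funcOverhead,
        .get_engine_memory_overhead = engineOverhead,
        .free_function = freeFn,
    };
    return ValkeyModule_RegisterScriptingEngine(ctx, "sketchengine", NULL, methods);
}
```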

---------

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2024-12-21 23:09:35 +01:00
Madelyn Olson
1c97317518
Resolve bounds checks on cluster_legacy.c (#1463)
We are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```

Which is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was to make the message type optional, so that
it wasn't included in the size. I think this also makes some sense, since
it's really just a helper for us to easily cast the object around.

I considered disabling this error, but it is generally pretty useful
since it can catch real issues. Another solution would be to
over-allocate to the largest possible object, which could hurt
performance as we initialize it to zero.
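
A simplified illustration of the pattern GCC flags (hypothetical types,
much smaller than the real cluster messages):

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint16_t type;
    union {
        char ping[2048]; /* longest message body */
        char fail[32];   /* short message body */
    } data;
} clusterMsgExample;

clusterMsgExample *makeShortMsg(void) {
    /* The allocation is sized for the short variant only... */
    void *block = malloc(sizeof(uint16_t) + 32);
    /* ...but the cast claims the largest size, which recent GCC reports
     * as "array subscript partly outside array bounds". */
    return (clusterMsgExample *)block;
}
```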

Results:
https://github.com/madolson/valkey/actions/runs/12423414811/job/34686899884

This is a slightly cleaned up version of
https://github.com/valkey-io/valkey/pull/1439. I thought I had another
strategy but alas, it didn't work out.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-20 12:10:48 -08:00
Madelyn Olson
b56f4f70d2
Update info.tcl test to revert client output limits sooner (#1462)
We set the client output buffer limits to 10 bytes, and then execute
`info stats` which produces more than 10 bytes of output, which can
cause that command to throw an error.

I'm not sure why it wasn't consistently erroring before, might have been
some change related to the ubuntu upgrade though.

Issues related to ubuntu-tls are hopefully resolved now.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-20 10:16:46 +08:00
Madelyn Olson
ffef236dbb
Fix storing the wrong PID in active servers (#1464)
In #1459, I missed that the data was also used to keep track of the PID
files so if the testing framework crashed it would no longer be able to
cleanup the extra servers. So now we properly extract the PID and store
it so we can clean up PIDs.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-20 10:14:56 +08:00
Binbin
ca0b0c662a
Clear outdated failure reports more accurately (#1184)
There are two changes here:

1. The one in clusterNodeCleanupFailureReports: only a primary with slots can
report a failure, so if the primary became a replica its failure report
should be cleared. Otherwise this may lead to inaccurate node fail judgment
in some network partition cases, I guess, and it will also affect the
CLUSTER COUNT-FAILURE-REPORTS command.

2. The one in clusterProcessGossipSection is not that important, but it can
print a "node is back online" log that helps us troubleshoot the problem,
although it may conflict with point 1 at some points.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-20 10:14:01 +08:00
Roshan Khatri
e48317eb34
Workflow changes to fix old release binaries (#1461)
- Moves `build-config.json` to workflow dir to build old versions with
new configs.
- Enables contributors to test the release workflow on a private repo by adding
`github.event_name == 'workflow_dispatch' ||`

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-12-19 21:32:40 +01:00
Jungwoo Song
e9a1fe0b32
Support for reading from replicas in valkey-benchmark (#1392)
**Background**
When conducting performance tests using `valkey-benchmark`, reading from
replicas was not supported. Consequently, even in cluster mode, all
reads were directed to the primary nodes. This limitation made it
challenging to obtain accurate metrics during workload stress testing
for performance measurement or before a version upgrade.

Related issue : https://github.com/valkey-io/valkey/issues/900

**Changes**
1. Replaced the use of `CLUSTER NODES` with `CLUSTER SLOTS` when
fetching cluster configuration. This allows for easier identification of
replica slots.
2. Support for reading from replicas by executing the client in
`READONLY` mode.
3. Support reading from replicas even during slot migrations.
4. Introduced the `--rfr` CLI options to enable reading from replicas
only or from all cluster nodes. A warning was added to indicate that write
requests might not be handled correctly when using this option.

---------

Signed-off-by: bluayer <ijacsong98@gmail.com>
Signed-off-by: bluayer <bluayer@gmail.com>
Signed-off-by: Jungwoo Song <37579681+bluayer@users.noreply.github.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
2024-12-19 18:32:31 +02:00
Binbin
97029953a0
Minor log fixes when failover auth denied due to slot epoch (#1341)
The old reqEpoch mainly refers to requestCurrentEpoch, see:
```
    if (requestCurrentEpoch < server.cluster->currentEpoch) {
        serverLog(LL_WARNING, "Failover auth denied to %.40s (%s): reqEpoch (%llu) < curEpoch(%llu)", node->name,
                  node->human_nodename, (unsigned long long)requestCurrentEpoch,
                  (unsigned long long)server.cluster->currentEpoch);
        return;
    }
```

And here we refer to requestConfigEpoch, so calling it reqEpoch is a bit
misleading; change it to reqConfigEpoch to make it clear.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-19 16:12:34 +08:00
Madelyn Olson
079f4edf2d
Add a hint about the current file for TCL debugging (#1459)
There are some tests that fail and give no useful information since they are
outside of a test context. Now we will at least get the file we are located in.

We can sort of reverse engineer where we are in the test by seeing which
tests have finished in a file.

```
[TIMEOUT]: clients state report follows.
sock6 => (SPAWNED SERVER) pid:30375 - tests/unit/info.tcl
Killing still running Valkey server 30375 - tests/unit/info.tcl
```

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-19 14:18:02 +08:00
Madelyn Olson
60197b30e2
Attempt to read secondary error from info test (#1452)
The test attempts to write 1MB of data in order to trigger a disconnect.
Normally, the data is fully flushed and we get the error on the read
(I/O error). However, it's possible we might fail the write, which
leaves the client in an inconsistent state. On the next command, we
finally process the I/O error on the FD. So, the simple fix is to
consume any secondary errors.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-18 09:17:11 -08:00
uriyage
8060c86d20
Offload TLS negotiation to I/O threads (#1338)
## TLS Negotiation Offloading to I/O Threads

### Overview
This PR introduces the ability to offload TLS handshake negotiations to
I/O threads, significantly improving performance under high TLS
connection loads.

### Key Changes
- Added infrastructure to offload TLS negotiations to I/O threads
- Refactored SSL event handling to allow I/O threads to modify conn flags.
- Introduced new connection flag to identify client connections

### Performance Impact
Testing with 650 clients with SET commands and 160 new TLS connections
per second in the background:

#### Throughput Impact of new TLS connections
- **With Offloading**: Minimal impact (1050K → 990K ops/sec)
- **Without Offloading**: Significant drop (1050K → 670K ops/sec)

#### New Connection Rate
- **With Offloading**: 
  - 1,757 conn/sec
- **Without Offloading**: 
  - 477 conn/sec

### Implementation Details
1. **Main Thread**:
   - Initiates negotiation-offload jobs to I/O threads
- Adds connections to pending-read clients list (using existing read
offload mechanism)
   - Post-negotiation handling:
     - Creates read/write events if needed for incomplete negotiations
     - Calls accept handler for completed negotiations

2. **I/O Thread**:
   - Performs TLS negotiation
   - Updates connection flags based on negotiation result

Related issue: https://github.com/valkey-io/valkey/issues/761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-18 09:03:30 +02:00
Madelyn Olson
e203ca35b7
Fix undefined behavior defined by ASAN (#1451)
ASan now supports making sure you are passing in the correct pointer
type, which seems useful, but we can't enable it since we pass in an
incorrect pointer type in several places. This is most commonly done with
generic free functions, where we simply cast to the expected type.

It's not a lot of code to clean up, so it seems appropriate to cleanup
instead of disabling the check.
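
An illustration of the cleanup pattern (hypothetical names): instead of
casting a mismatched function pointer into a generic destructor slot, add
a thin wrapper whose type matches exactly.

```c
struct listNode;
typedef void (*freeFn)(void *);

void listFreeNode(struct listNode *node); /* the real signature */

/* Flagged by the sanitizer: calling through a wrong-typed pointer.
 *     freeFn bad = (freeFn)listFreeNode;
 * Clean: a wrapper with the exact expected signature. */
static void listFreeNodeVoid(void *node) {
    listFreeNode(node);
}

freeFn good = listFreeNodeVoid;
```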

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-17 17:48:53 -08:00
Viktor Szépe
b66698b887
Discover and fix new typos (#1446)
Upgrade `typos` and fix corresponding typos

---------

Signed-off-by: Viktor Szépe <viktor@szepe.net>
2024-12-17 17:45:43 -08:00
ranshid
ba25b586d5
Introduce FORCE_DEFRAG compilation option to allow activedefrag to run when the allocator is not jemalloc (#1303)
Introduce compile time option to force activedefrag to run even when
jemalloc is not used as the allocator.
This is in order to be able to run tests with defrag enabled
while using memory instrumentation tools.

fixes: https://github.com/valkey-io/valkey/issues/1241

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-17 19:07:55 +02:00
xbasel
7892bf808b
Fix test_reclaimFilePageCache to avoid tmpfs (#1379)
Avoid tmpfs as fadvise(FADV_DONTNEED) has no effect on memory-backed
filesystems.

Fixes https://github.com/valkey-io/valkey/issues/897

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
2024-12-17 18:04:27 +02:00
Roshan Khatri
980a801159
Fix the secret for the test bucket. (#1447)
We have set the secret as `AWS_S3_TEST_BUCKET` for the test bucket and I
missed it in the initial review.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-12-16 13:01:34 -08:00
Binbin
e024b4bd27
Drop the MEET packet if the link node is in handshake state (#1436)
After #1307 got merged, we noticed an assertion failure in setClusterNodeToInboundClusterLink:
```
=== ASSERTION FAILED ===
==> '!link->node' is not true
```

In #778, we call setClusterNodeToInboundClusterLink to attach the node to
the link during MEET processing, so if we receive another MEET packet within
a short time while the node is still in handshake state, we will hit this
assert and crash the server.

If the link is bound to a node, the node is in the handshake state, and we
receive a MEET packet, it may be that the sender sent multiple MEET packets,
so here we drop the MEET to avoid the assert in
setClusterNodeToInboundClusterLink. The assert happens in this sequence: the
other node sends a MEET packet because it detects that there is no inbound
link, this node creates a new node in HANDSHAKE state (with a random node
name) and responds with a PONG. The other node receives the PONG and removes
the CLUSTER_NODE_MEET flag. This node is supposed to open an outbound
connection to the other node in the next cron cycle, but before this happens,
the other node re-sends a MEET on the same link because it still detects no
inbound connection.

Note that in getNodeFromLinkAndMsg, a node in the handshake state has a
random name and is not truly "known", so we don't know the sender. Dropping
the MEET packet prevents us from creating a random node, avoids incorrect
link binding, and avoids a duplicate MEET packet eliminating the handshake
state.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-16 13:43:48 +08:00
Binbin
ad24220681
Automatic failover vote is not limited by two times the node timeout (#1356)
This is a follow of #1305, we now decided to apply the same change
to automatic failover as well, that is, move forward with removing
it for both automatic and manual failovers.

Quote from Ping during the review:
Note that we already debounce transient primary failures with node
timeout, ensuring failover is only triggered after sustained outages.
Election timing is naturally staggered by replica spacing, making the
likelihood of simultaneous elections from replicas of the same shard
very low. The one-vote-per-epoch rule further throttles retries and
ensures orderly elections. On top of that, quorum-based primary failure
confirmation, cluster-state convergence, and slot ownership validation
are all built into the process.

Quote from Madelyn during the review:
It's against the specific primary. It's to prevent double failovers.
If a primary just took over, we don't want someone else to try to
take over; we should give the new primary some amount of time to take over.
I have not seen this issue though, it might have been over-optimizing?
The double failure mode, where a node fails and then another node fails
within the node timeout, also doesn't seem that common either though.

So the conclusion is that we all agreed to remove it completely;
it will make the code a lot simpler. And if there are other specific
edge cases we are missing, we will fix them in another way.

See discussion #1305 for more information.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-15 12:09:53 +08:00
Rain Valentine
88942c8e61
Replace dict with new hashtable for sets datatype (#1176)
The new `hashtable` provides faster lookups and uses less memory than
`dict`.

A TCL test case "SRANDMEMBER with a dict containing long chain" is
deleted because it's covered by a hashtable unit test
"test_random_entry_with_long_chain", which is already present.

This change also moves some logic from dismissMemory (object.c) to
zmadvise_dontneed (zmalloc.c), so the hashtable implementation which
needs the dismiss functionality doesn't need to depend on object.c and
server.h.

This PR follows #1186.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-14 20:53:48 +01:00
Madelyn Olson
0e96bb311e
Synchronously delete data during defrag tests (#1443)
The creation of fragmentation is delayed when we use lazy-free. You can
induce some of the active-defrag tests to fail by artificially adding a
delay in the lazyfree process, similar to the issues seen in #1433 and
issues like
https://github.com/valkey-io/valkey/actions/runs/12267010712/job/34226304803#step:7:6538.
The solution is to always do sync free during tests.

Might close https://github.com/valkey-io/valkey/issues/1433.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-14 19:14:01 +01:00
Madelyn Olson
3cd176dc39
Avoid importing memory aligned malloc (#1442)
We deprecate the usage of classic malloc and free, but under certain
circumstances they might get imported from intrinsics. The original
thought is we should just override malloc and free to use zmalloc and
zfree, but I think we should continue to deprecate it to avoid
accidental imports of allocations.

Closes https://github.com/valkey-io/valkey/issues/1434.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-14 19:13:04 +01:00
Binbin
7d72fada2c
Fix wrong file name in build-release-packages.yml (#1437)
Introduced in #1363, the file name does not match.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-13 14:26:20 -08:00
Binbin
d588bb4406
Skip build-release-packages CI job in forks (#1438)
The CI job was introduced in #1363, we should skip it in forks.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-13 16:32:54 -05:00
Thalia Archibald
b60097ba07
Check length before reading in stringmatchlen (#1431)
Fixes four cases where `stringmatchlen` could overrun the pattern if it
is not terminated with NUL.
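
An illustration of the class of fix (a literal-only matcher, not the
actual patch; the real function also handles `*`, `?`, and character
classes):

```c
#include <stddef.h>

int matchLiteral(const char *pattern, size_t patternLen,
                 const char *string, size_t stringLen) {
    while (patternLen && stringLen) {
        if (pattern[0] == '\\') {
            if (patternLen < 2) return 0; /* guard: no byte after the escape */
            pattern++;
            patternLen--; /* consume '\\', match the next byte literally */
        }
        if (pattern[0] != string[0]) return 0;
        pattern++; patternLen--;
        string++;  stringLen--;
    }
    return patternLen == 0 && stringLen == 0;
}
```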

These commits are cherry-picked from my
[fork](https://github.com/thaliaarchi/antirez-stringmatch) which
extracts `stringmatch` as a library and compares it to other projects by
antirez which use the same matcher.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
2024-12-13 11:05:19 +01:00
Jim Brunner
32f2c73cb5
defrag: eliminate persistent kvstore pointer and edge case fixes (#1430)
This update addresses several issues in defrag:
1. In the defrag redesign
(https://github.com/valkey-io/valkey/pull/1242), a bug was introduced
where `server.cronloops` was no longer being incremented in the
`whileBlockedCron()`. This resulted in some memory statistics not being
updated while blocked.
2. In the test case for AOF loading, we were seeing errors due to defrag
latencies. However, running the math, the latencies are justified given
the extremely high CPU target of the testcase. Adjusted the expected
latency check to allow longer latencies for this case where defrag is
undergoing starvation while AOF loading is in progress.
3. A "stage" is passed a "target". For the main dictionary and expires,
we were passing in a `kvstore*`. However, on flushall or swapdb, the
pointer may change. It's safer and more stable to use an index for the
DB (a DBID). Then if the pointer changes, we can detect the change, and
simply abort the stage. (If there's still fragmentation to deal with,
we'll pick it up again on the next cycle.)
4. We always start a new stage on a new defrag cycle. This gives the new
stage time to run, and prevents latency issues for certain stages which
don't operate incrementally. However, often several stages will require
almost no work, and this will leave a chunk of our CPU allotment unused.
This is mainly an issue in starvation situations (like AOF loading or
LUA script) - where defrag is running infrequently, with a large
duty-cycle. This change allows a new stage to be initiated if we still
have a standard duty-cycle remaining. (This can happen during starvation
situations where the planned duty cycle is larger than the standard
cycle. Most likely this isn't a concern for real scenarios, but it was
observed in testing.)
5. Minor comment correction in `server.h`
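
A sketch of the DBID-based target from point 3 (hypothetical names; the
re-resolution path via `server.db[dbid].keys` is an assumption):

```c
typedef struct {
    int dbid;      /* stable identifier for the target DB */
    kvstore *seen; /* pointer observed when the stage started */
} defragStageTarget;

static int defragStageStillValid(defragStageTarget *t) {
    kvstore *current = server.db[t->dbid].keys; /* re-resolve from the DBID */
    return current == t->seen;                  /* pointer changed: abort stage */
}
```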

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-12-12 14:55:57 -08:00
Roshan Khatri
3a1043a4f0
Fix Valkey binary build workflow, version support changes. (#1429)
This change makes the binary build on the target ubuntu version.

This PR also deprecated ubuntu18, and valkey will now support:

- X86:
  - Ubuntu 20
  - Ubuntu 22
  - Ubuntu 24
- ARM:
  - Ubuntu 20
  - Ubuntu 22

Removed ARM ubuntu 24 as the action we are using for ARM builds does not
support Ubuntu 24.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-12-12 14:46:35 -08:00
Vu Diep
ab69a8a55d
Use configure-aws-credentials workflow instead of passing secret_access_key (#1363)
## Summary
This PR fixes #1346 where we can get rid of the long term credentials by
using OpenID Connect. OpenID Connect (OIDC) allows your GitHub Actions
workflows to access resources in Amazon Web Services (AWS), without
needing to store the AWS credentials as long-lived GitHub secrets.

---------

Signed-off-by: vudiep411 <vdiep@amazon.com>
2024-12-12 14:42:52 -08:00
ranshid
2d92404522
Avoid defragging scripts during EVAL command execution (#1414)
This can happen when scripts run for a long period of time and the server attempts to defrag them in whileBlockedCron.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-12-12 13:52:58 -08:00
Pierre
5f7fe9ef21
Send MEET packet to node if there is no inbound link to fix inconsistency when handshake timed out (#1307)
In some cases, when meeting a new node, if the handshake times out, we
can end up with an inconsistent view of the cluster where the new node
knows about all the nodes in the cluster, but the cluster does not know
about this new node (or vice versa).
To detect this inconsistency, we now check if a node has an outbound
link but no inbound link, in this case it probably means this node does
not know us. In this case we (re-)send a MEET packet to this node to do
a new handshake with it.
If we receive a MEET packet from a known node, we disconnect the
outbound link to force a reconnect and sending of a PING packet so that
the other node recognizes the link as belonging to us. This prevents
cases where a node could send MEET packets in a loop because it thinks
the other node does not have an inbound link.

This fixes the bug described in #1251.

---------

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2024-12-11 17:26:06 -08:00
Jim Brunner
0c8ad5cd34
defrag: allow defrag to start during AOF loading (#1420)
Addresses https://github.com/valkey-io/valkey/issues/1393

Changes:
* During AOF loading or long running script, this allows defrag to be
initiated.
* The AOF defrag test was corrected to eliminate the wait period and
rely on non-timer invocations.
* Logic for "overage" time in defrag was changed. It previously
accumulated underage leading to large latencies in extreme tests having
very high CPU percentage. After several simple stages were completed
during infrequent blocked processing, a large cycle time would be
experienced.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-12-11 19:47:06 +02:00
Binbin
1acf7f71c0
Fix memory leak in the new hashtable unittest (#1421)
There is a leak here: hashtableTwoPhasePopDelete won't call the entry
destructor, so like hashtablePop we need to call it ourselves.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-11 06:40:18 +01:00
Viktor Söderqvist
3eb8314be6 Replace dict with hashtable for keys, expires and pubsub channels
Instead of a dictEntry with pointers to key and value, the hashtable
has a pointer directly to the value (robj) which can hold an embedded
key and acts as a key-value in the hashtable. This minimizes the number
of pointers to follow and thus the number of memory accesses to lookup
a key-value pair.

        Keys         robj
      hashtable
      +-------+   +-----------------------+
      | 0     |   | type, encoding, LRU   |
      | 1 ------->| refcount, expire      |
      | 2     |   | ptr                   |
      | ...   |   | optional embedded key |
      +-------+   | optional embedded val |
                  +-----------------------+

The expire timestamp (TTL) is also stored in the robj, if any. The expire
hash table points to the same robj.
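
A simplified sketch of the idea (not the actual struct layout): the
hashtable entry *is* the value object, and the key bytes live inside it.

```c
typedef struct valueObject {
    unsigned type : 4;
    unsigned encoding : 4;
    unsigned lru : 24;
    int refcount;
    void *ptr;        /* value payload */
    long long expire; /* optional; the expires table points at the same object */
    char embkey[];    /* optional embedded key (length-prefixed in practice) */
} valueObject;
```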

Overview of changes:

* Replace dict with hashtable in kvstore (kvstore.c)
* Add functions for embedding key and expire in robj (object.c)
  * When there's unused space, reserve an expire field to avoid reallocating
    it later if expire is added.
  * Always reserve space for expire for large key names to avoid realloc
    if it's set later.
* Update db functions (db.c)
  * dbAdd, setKey and setExpire reallocate the object when embedding a key
  * setKey does not increment the reference counter, since it would require
    duplicating the object. This responsibility is moved to the caller.
* Remove logic for shared integer objects as values in the database. The keys
  are now embedded in the objects, so all objects in the database need to be
  unique. Thus, we can't use shared objects as values. Also delete test cases
  for shared integers.
* Adjust various commands to the changes mentioned above.
* Adjust defrag code
  * Improvement: Don't access the expires table before defrag has actually
    reallocated the object.
* Adjust test cases that were using hard-coded sizes for dict when realloc
  would happen, and some other adjustments in test cases.
* Adjust memory prefetch for new hash table implementation in IO-threading,
  using new `hashtableIncrementalFind` API
* Adjust offloading of free() to IO threads: Object free to be done in main
  thread while keeping obj->ptr offloading in IO-thread since the DB object is
  now allocated by the main-thread and not by the IO-thread as it used to be.
* Let expireIfNeeded take an optional value, to avoid looking up the expires
  table when possible.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
2024-12-10 21:30:56 +01:00
Rain Valentine
4efff42f04 Replace dict with hashtable in command tables (#1065)
This changes the type of command tables from dict to hashtable. Command
table lookup takes ~3% of overall CPU time in benchmarks, so it is a
good candidate for optimization.

My initial SET benchmark comparison suggests that hashtable is about 4.5
times faster than dict and this replacement reduced overall CPU time by
2.79% 🥳

---------

Signed-off-by: Rain Valentine <rainval@amazon.com>
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Rain Valentine <rainval@amazon.com>
2024-12-10 21:30:56 +01:00
Viktor Söderqvist
c8ee5c2c46 Hashtable implementation including unit tests
A cache-line aware hash table with a user-defined key-value entry type,
supporting incremental rehashing, scan, iterator, random sampling,
incremental lookup and more...

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-10 21:30:56 +01:00
Viktor Söderqvist
b4c2a1804a
Fix flaky init_test proc in maxmemory test suite (#1419)
The following error has been seen, but not reliably reproduced:

```
*** [err]: eviction due to output buffers of pubsub, client eviction: true in tests/unit/maxmemory.tcl
Expected '42' to be equal to '50' (context: type proc line 17 cmd {assert_equal [r dbsize] 50} proc ::init_test level 2)
```

The reason is probably that FLUSHDB is asynchronous and when we start
populating new keys, they are evicted because the background flush is
too slow. Changing this to FLUSHDB SYNC prevents this.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-10 20:52:06 +02:00
Binbin
7e564887b9
Set HIDDEN_CONFIG flag on events-per-io-thread (#1408)
events-per-io-thread is for testing purposes; it allows us to force the
main thread to always offload work to the IO threads, see
adjustIOThreadsByEventLoad for more details.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-11 00:37:18 +08:00
Viktor Söderqvist
2dfe25b408
Fix race in test "CLUSTER SLOT-STATS cpu-usec for blocking commands, unblocked on timeout" (#1416)
This fix changes the timeout for BLPOP in this test case from 1 second
to 0.5 seconds.

In the test case quoted below, the procedure
`wait_for_blocked_clients_count` waits for one second by default. If
BLPOP has 1 second timeout and the first
`wait_for_blocked_clients_count` finishes very fast, then the second
`wait_for_blocked_clients_count` can time out before the BLPOP has been
unblocked.

```TCL
    test "CLUSTER SLOT-STATS cpu-usec for blocking commands, unblocked on timeout." {
        # Blocking command with 1 second timeout.
        set rd [valkey_deferring_client]
        $rd BLPOP $key 1

        # Confirm that the client is blocked, then unblocked after 1 second timeout.
        wait_for_blocked_clients_count 1
        wait_for_blocked_clients_count 0
```

As seen in the definition of `wait_for_blocked_clients_count`, the total
time to wait is 1 second by default.

```TCL
proc wait_for_blocked_clients_count {count {maxtries 100} {delay 10} {idx 0}} {
    wait_for_condition $maxtries $delay  {
        [s $idx blocked_clients] == $count
    } else {
        fail "Timeout waiting for blocked clients"
    }
}
```

Fixes #1121

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-10 14:46:21 +01:00
Yanqi Lv
f951a1ca73
Add new flag in CLIENT LIST for import-source client (#1398)
- Add new flag "I" in `CLIENT LIST` for import-source client
- Add `DEBUG_CONFIG` for import-mode
- Allow import-source status to be turned off when import-mode is off

Fixes #1350 and
https://github.com/valkey-io/valkey/pull/1185#discussion_r1851049362.

---------

Signed-off-by: lvyanqi.lyq <lvyanqi.lyq@alibaba-inc.com>
Signed-off-by: Yanqi Lv <lvyanqi.lyq@alibaba-inc.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-12-10 13:35:07 +01:00
Sarthak Aggarwal
9cfe1b3d81
Set Command with IFEQ Support (#1324)
This PR allows the Valkey users to perform conditional updates where the
SET command is completed if the given comparison-value matches the key’s
current value.

Syntax:

```
SET key value IFEQ comparison-value
```

Behavior:

If the values match, the SET completes as expected. If they do not
match, the command returns a (nil), except if the GET argument is also
given (see below).

Behavior with Additional Flags:

1. ```SET key value IFEQ comparison-value GET``` returns the existing
value, regardless of whether it matches comparison-value or not. The
conditional set operation is performed if the given comparison value
matches the existing value. To check if the SET succeeded, the caller
needs to check if the returned string matches the comparison-value.
2. ```SET key value IFEQ comparison-value XX``` is a syntax error.
3.  ```SET key value IFEQ comparison-value NX``` is a syntax error.
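
For illustration, a hypothetical session showing the behavior described
above:

```
127.0.0.1:6379> SET color blue
OK
127.0.0.1:6379> SET color green IFEQ blue     # values match: set succeeds
OK
127.0.0.1:6379> SET color red IFEQ blue       # no match: nothing is set
(nil)
127.0.0.1:6379> SET color red IFEQ green GET  # GET returns the existing value
"green"
127.0.0.1:6379> GET color                     # the comparison matched, so the set happened
"red"
```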

Closes: #1215

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
2024-12-10 12:54:49 +01:00
Madelyn Olson
4f61034934
Update governance and maintainers file for Valkey committers (#1390)
We added two more committers, but according to our governance document
that makes them TSC members. As we discussed, for now we want to keep
the balance of corporate interests, so we are updating the governance to
explicitly list TSC members as distinct from folks with just write
permissions.

Also adds the new folks with commit permissions.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-09 12:28:17 -08:00
Binbin
1ba85d002a
Use binary representation in assert crash log, cleanup in crash log (#1410)
Change assert crash log to also use binary representation like 5bdd72bea77d4bb237441c9a671e80edcdc998ad.
And do not print the password in assert crash log like 56eef6fb5ab7a755485c19f358761954ca459472.

In addition, for 5bdd72bea77d4bb237441c9a671e80edcdc998ad, we will print '"argv"',
because originally the code would print a '', and sdscatrepr will add an
extra "", so we now remove the extra '' here.

Extract the getArgvReprString method and clean up the code a bit.

Examples:
```
debug assert "\x00abc"

before:
client->argv[0] = "debug" (refcount: 1)
client->argv[1] = "assert" (refcount: 1)
client->argv[2] = "" (refcount: 1)

after:
client->argv[0] = "debug" (refcount: 1)
client->argv[1] = "assert" (refcount: 1)
client->argv[2] = "\x00abc" (refcount: 1)

debug panic "\x00abc"

before:
argc: '3'
argv[0]: '"debug"'
argv[1]: '"panic"'
argv[2]: '"\x00abc"'

after:
argc: 3
argv[0]: "debug"
argv[1]: "panic"
argv[2]: "\x00abc"
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-10 00:37:04 +08:00
ranshid
5be4ce6d27
Optimize ZRANK to avoid path comparisons (#1389)
ZRANK is a widely used command for workloads using sorted sets. For
example, in leaderboards it enables querying the specific rank of a player.
The way ZRANK is currently implemented is:

1. Locate the element in the SortedSet hashtable.
2. Take the score of the element and use it to locate the element in the
SkipList (when listpack encoding is not used).
3. During the SkipList scan for the element, we keep the path and use it to
sum the span in each path node in order to calculate the element's rank.

One problem with this approach is that it involves multiple compare
operations in order to locate the element. Specifically, string
comparison can be expensive since it requires accessing multiple memory
locations for the items the element string is compared against.
Perf analysis showed this can take up to 20% of the rank scan time. (TBD
- provide the perf results for example)

We can improve the rank search by taking advantage of the fact that the
element's node in the skiplist is pointed to by the hashtable value!
Our SkipList implementation is using FatKeys, where each added node is
assigned a randomly chosen height. Say we keep a height record for every
skiplist element. In order to get an element's rank we simply:

1. Locate the element in the SortedSet hashtable.
2. Go directly to the node in the skiplist.
3. Jump to the full height of the node and take the span value.
4. Continue going forward, always jumping to the highest point in each node
we get to, making sure to sum all the spans.
5. Take off the summed spans from the SkipList length and we now have the
specific node's rank. :)
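
As a rough sketch (hypothetical helper names, not the actual
implementation):

```c
static unsigned long rankFromNode(zskiplist *zsl, zskiplistNode *x) {
    unsigned long behind = 0; /* elements between x and the tail */
    while (x) {
        int top = nodeHeight(x) - 1; /* per-node recorded height (this PR) */
        behind += spanAt(x, top);    /* elements skipped by the top-level link */
        x = forwardAt(x, top);
    }
    return zsl->length - behind;     /* the element's rank */
}
```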

In order to test this method I created several benchmarks. All
benchmarks used the same seeds and the lists contained 1M elements.
Since a very important factor is the ratio of scores to the
number of elements (a small ratio means more string compares during
searches), each benchmark test used a different number of scores (1, 10K,
100K, 1M).
Some results:

**TPS**

Scores range | non-optimized | optimized | gain
-- | -- | -- | --
1 | 416042 | 605363 | 45.51%
10K | 359776 | 459200 | 27.63%
100K | 380387 | 459157 | 20.71%
1M | 416059 | 450853 | 8.36%

**Latency**

Scores range | non-optimized | optimized | gain
-- | -- | -- | --
1 | 1.191000 | 0.831000 | -30.23%
10K | 1.383000 | 1.095000 | -20.82%
100K | 1.311000 | 1.087000 | -17.09%
1M | 1.191000 | 1.119000 | -6.05%

### Memory efficiency

Adding another field to each skiplist node can cause degradation in
memory efficiency for large sorted sets. We use the fact that the recorded
level-0 span of ALL nodes is either 1, or zero (for the last node).
So we use wrappers to get a node's span and override the level-0 span
to hold the node height.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-12-09 15:48:46 +01:00
Binbin
924729eb16
Fix the election was reset wrongly before failover epoch was obtained (#1339)
After #1009, we reset the election when we receive a claim
with an equal or higher epoch, since a node can win
an election in the past.

But we need to consider the time before the node actually
obtains the failover_auth_epoch. The failover_auth_epoch
defaults to 0, so before the node actually gets the failover
epoch, we might wrongly reset the election.

This is probably harmless, but it produces misleading log
output and may delay the election by a cron cycle or beforesleep.
Now we only reset the election when a node has actually
obtained the failover epoch.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-09 16:19:02 +08:00
Roman Gershman
b09db3ef78
Fix typo in streams seen-time / active-time test (#1409)
This variable name is wrong, it causes the wrong variable to be asserted.

Signed-off-by: Roman Gershman <romange@gmail.com>
2024-12-09 16:01:43 +08:00
Guillaume Koenig
e8078b7315
Allow MEMORY MALLOC-STATS and MEMORY PURGE during loading phase (#1317)
- Enable investigation of memory issues during loading
- Previously, all memory commands were rejected with LOADING error
(except memory help)
- `MEMORY MALLOC-STATS` and `MEMORY PURGE` are now allowed
as they don't depend on the dataset
- `MEMORY STATS` and `MEMORY USAGE KEY` remain disallowed

Fixes #1299

Signed-off-by: Guillaume Koenig <knggk@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-12-08 20:30:07 +08:00
Binbin
176fafcaf7
Add a note to conf about the dangers of modifying dir at runtime (#887)
We've had security issues in the past with it, which is why
we marked it as PROTECTED. But, modifying during runtime
is also a dangerous action. For example, when child processes
are running, persistent temp files and log files may have
unexpected effects.

A scenario for modifying dir at runtime is recovering from a disk
failure, such as using disk-based replication to migrate a node,
writing nodes.conf to save the cluster configuration.

We decided to leave it as is and add a note in the conf
about the dangers of modifying dir at runtime.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-08 20:28:14 +08:00
Viktor Söderqvist
f20d629dbe
Fix sanitizer builds with clang (#1402)
By including <stdatomic.h> after the other includes in the unit test, we
can avoid redefining a macro which led to a build failure.

Fixes #1394

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-07 10:26:31 +01:00
Viktor Söderqvist
a2fe6af457
Fix Module Update Args test when other modules are loaded (#1403)
Fixes #1400

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-07 10:25:40 +01:00
Caiyi Wu
6df376d68a
Fix coredump when use hellodict example module (#1395)
In the ValkeyModule_OnLoad method of the file hellodict.c, the parameter
keystep of ValkeyModule_CreateCommand should be 1. Otherwise, executing
the command will coredump.

    MODULE LOAD /home/tiger/valkey/src/modules/hellodict.so
    COMMAND GETKEYS HELLODICT.SET key value

Signed-off-by: Codebells <1347103071@qq.com>
2024-12-05 20:01:38 +01:00
风去幽墨
6b3e1228cd
RDMA: Fix dead loop when transferring large data (20KB) (#1386)
Determine the status of the Client when attempting to read data. If
state=CLIENT_COMPLETED_IO, no read attempt is made and I/O operations on
the Client are rescheduled by the main thread.

> And 20474 Byte = PROTO_IOBUF_LEN(16KB) + SDS_HDR_VAR(16, s)(4090 Byte)

Fixes #1385

---------

Signed-off-by: fengquyoumo <1455117463@qq.com>
2024-12-05 18:26:56 +01:00
Wen Hui
71560a2a4a
Add API UpdateRuntimeArgs for updating the module arguments during runtime (#1041)
Before Redis OSS 7, if we load a module with some arguments during
runtime,
and run the command "config rewrite", the module information will not be
saved into the
config file.

Since Redis OSS 7 and Valkey 7.2, if we load a module with some
arguments during runtime,
the module information (path, arguments number, and arguments value) can
be saved into the config file after config rewrite command is called.
Thus, the module will be loaded automatically when the server startup
next time.

Following is one example:

```
bind 172.25.0.58
port 7000
protected-mode no
enable-module-command yes

# Generated by CONFIG REWRITE
latency-tracking-info-percentiles 50 99 99.9
dir "/home/ubuntu/valkey"
save 3600 1 300 100 60 10000
user default on nopass sanitize-payload ~* &* +@ALL
loadmodule tests/modules/datatype.so 10 20
```

However, there is one problem.
If developers write a module and update the running arguments in some
way, the updated arguments cannot be saved into the config file even when
"config rewrite" is called.
The reason comes from the following function,
rewriteConfigLoadmoduleOption (src/config.c):

```c
void rewriteConfigLoadmoduleOption(struct rewriteConfigState *state) {
    ..........
    struct ValkeyModule *module = dictGetVal(de);
    line = sdsnew("loadmodule ");
    line = sdscatsds(line, module->loadmod->path);
    for (int i = 0; i < module->loadmod->argc; i++) {
        line = sdscatlen(line, " ", 1);
        line = sdscatsds(line, module->loadmod->argv[i]->ptr);
    }
    rewriteConfigRewriteLine(state, "loadmodule", line, 1);
    .......
}
```

The function only saves the initial arguments information
(module->loadmod) into the config file.

After core members discussed it (ref
https://github.com/valkey-io/valkey/issues/1177),
we decided to add the following API to implement this feature:

Original proposal:

```c
int VM_UpdateRunTimeArgs(ValkeyModuleCtx *ctx, int index, char *value);
```

Updated proposal:

```c
ValkeyModuleString **VM_GetRuntimeArgs(ValkeyModuleCtx *ctx);
int VM_UpdateRuntimeArgs(ValkeyModuleCtx *ctx, int argc, ValkeyModuleString **values);
```

Why we do not recommend the following way:

1. MODULE UNLOAD
2. Update module args in the conf file
3. MODULE LOAD

I think there are the following disadvantages:

1. Some modules cannot be unloaded, such as the example module
datatype.so (tests/modules/datatype.so).
2. MODULE UNLOAD + MODULE LOAD is not an atomic operation.
3. Sometimes, if we just run the module unload, the client's business
could be interrupted.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-12-05 11:58:24 -05:00
Madelyn Olson
a401e3789d
Update code of conduct maintainers email address (#1391)
Updating code of conduct maintainer's email address

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-04 10:33:14 -08:00
zhenwei pi
105509cdad
Run RDMA builtin in CI workflow (#1380)
Since 4695d118dd (#1209), RDMA is supported as a builtin, and the module
connection type may be removed in the future. So run builtin RDMA support
in the CI workflow.

The RDMA module is compiled only in CI; keep it as a build check only
until the module connection type becomes obsolete.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-12-03 23:09:56 +01:00
Jim Brunner
349bc7547b
defrag: use monotime in module interface (#1388)
The recent PR (https://github.com/valkey-io/valkey/pull/1242) converted
Active Defrag to use `monotime`. In that change, a conversion was
performed to continue to use `ustime()` as part of the module interface.
Since this time is only used internally, and never actually exposed to
the module, we can convert this to use `monotime` directly.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-12-03 11:19:53 -08:00
uriyage
9f8b174c2e
Optimize IO thread offload for modified argv (#1360)
### Improve expired commands performance with IO threads

#### Background
In our IO threads architecture, IO threads allocate a client's argv, and
later, when we free it after processCommand, we offload its free to the
IO threads.
With jemalloc, it's crucial that the same thread that allocates memory
also frees it.

For some commands we modify the client's argv in the main thread during
command processing (for example in `SET EX` command we rewrite the
command to use absolute time for replication propagation).

#### Current issues
1. When commands are rewritten (e.g., expire commands), we store the
original argv
   in `c->original_argv`. However, we're currently:
   - Freeing new argv (allocated by main thread) in IO threads
   - Freeing original argv (allocated by IO threads) in main thread
2. Currently, `c->original_argv` points to a new array with old
objects, while `c->argv` has the old array with new objects, making memory
free management complicated.

#### Changes
1. Refactored argv modification handling code to ensure consistency -
both array and objects are now either all new or all old
2. Moved original_argv cleanup to happen in resetClient after argv
cleanup
3. Modified IO threads code to properly handle original argv cleanup
when argv are modified.

#### Performance Impact
Benchmark with `SET EX` commands (650 clients, 512 byte value, 8 IO
threads):
- New implementation: **729,548 ops/sec**
- Old implementation: **633,243 ops/sec**
Representing a **~15%** performance improvement due to more efficient
memory handling.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
2024-12-03 19:20:31 +02:00
Jim Brunner
397201c48f
Refactor of ActiveDefrag to reduce latencies (#1242)
Refer to:  https://github.com/valkey-io/valkey/issues/1141

This update refactors the defrag code to:
* Make the overall code more readable and maintainable
* Reduce latencies incurred during defrag processing

With this update, the defrag cycle time is reduced to 500us, with more
frequent cycles. This results in much more predictable latencies, with a
dramatic reduction in tail latencies.

(See https://github.com/valkey-io/valkey/issues/1141 for more complete
details.)

This update is focused mostly on the high-level processing, and does NOT
address lower level functions which aren't currently timebound (e.g.
`activeDefragSdsDict()`, and `moduleDefragGlobals()`). These are out of
scope for this update and left for a future update.

I fixed `kvstoreDictLUTDefrag` because it was using up to 7ms on a CME
single shard. See original github issue for performance details.

---------

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-12-03 08:42:29 -08:00
Nugine
3df609ef06
Optimize PFCOUNT, PFMERGE command by SIMD acceleration (#1293)
This PR optimizes the performance of HyperLogLog commands (PFCOUNT,
PFMERGE) by adding AVX2 fast paths.

Two AVX2 functions are added for conversion between raw representation
and dense representation. They are 15 ~ 30 times faster than the scalar
implementation. Note that sparse representation is not accelerated.

AVX2 fast paths are enabled when the CPU supports AVX2 (checked at
runtime) and the hyperloglog configuration is default (HLL_REGISTERS ==
16384 && HLL_BITS == 6).

`PFDEBUG SIMD (ON|OFF)` subcommand is added for unit tests. A new TCL
unit test checks that the results produced by non-AVX2 and AVX2
implementations are exactly equal.

When merging 3 dense hll structures, the benchmark shows a 12x speedup
compared to the scalar version.

```
pfcount key1 key2 key3
pfmerge keyall key1 key2 key3
```

```
======================================================================================================
Type             Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------
PFCOUNT-scalar    5665.56        35.29839        32.25500        63.99900        67.58300       608.60
PFCOUNT-avx2     72377.83         2.75834         2.67100         5.34300         6.81500      7774.96
------------------------------------------------------------------------------------------------------
PFMERGE-scalar    9851.29        20.28806        20.09500        36.86300        39.16700       615.71
PFMERGE-avx2    125621.89         1.59126         1.55100         3.11900         4.70300     15702.74
------------------------------------------------------------------------------------------------------

scalar: valkey:unstable  2df56d87c0ebe802f38e8922bb2ea1e4ca9cfa76
avx2:   Nugine:hll-simd  8f9adc34021080d96e60bd0abe06b043f3ed0275

CPU:    13th Gen Intel® Core™ i9-13900H × 20
Memory: 32.0 GiB
OS:     Ubuntu 22.04.5 LTS
```

Experiment repo: https://github.com/Nugine/redis-hyperloglog
Benchmark script:
https://github.com/Nugine/redis-hyperloglog/blob/main/scripts/memtier.sh
Algorithm:
https://github.com/Nugine/redis-hyperloglog/blob/main/cpp/bench.cpp

---------

Signed-off-by: Xuyang Wang <xuyangwang@link.cuhk.edu.cn>
2024-12-02 19:40:38 +01:00
Binbin
fbbfe5d3d3
Print logs when the cluster state changes to fail or the fail reason changes (#1188)
This log allows us to easily distinguish between full coverage and
minority partition when the cluster fails. Sometimes it is not easy
to see the minority partition in a healthy shards (both primary and
replicas).

And we decided not to add a cluster_fail_reason field to cluster info.
Given that there are only two reasons and both are well-known and if
we ended up adding more down the road we can add it in the furture.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-02 15:55:24 +08:00
Vadym Khoptynets
90475af594
Free strings during BGSAVE/BGAOFRW to reduce copy-on-write (#905)
**Motivation**

Copy-on-write (COW) amplification refers to the issue where writing to a
small object leads to the entire page being cloned, resulting in
inefficient memory usage. This issue arises during the BGSAVE process,
which can be particularly problematic on instances with limited memory.
If the BGSAVE process could release unneeded memory, it could reduce
memory consumption. To address this, the BGSAVE process calls the
`madvise` function to signal the operating system to reclaim the buffer.
However, this approach does not work for buffers smaller than a page
(usually 4KiB). Even after multiple such calls, where a full page may be
free, the operating system will not reclaim it.
To solve this issue, we can call `zfree` directly. This allows the
allocator (jemalloc) to handle the bookkeeping and release pages when
buffers are no longer needed. This approach reduces copy-on-write
events.
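
For illustration, a sketch of the distinction (a hypothetical wrapper;
assumes `buf` is page-aligned for the madvise case and that
`zfree_with_size` is declared by the project's zmalloc.h):

```c
#include <sys/mman.h>
#include <unistd.h>

void releaseInChild(void *buf, size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size >= page) {
        madvise(buf, size, MADV_DONTNEED); /* let the OS reclaim whole pages */
    } else {
        zfree_with_size(buf, size); /* sub-page: free via the allocator instead */
    }
}
```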

**Benchmarks**
To understand how usage of `zfree` affects BGSAVE and memory
consumption, I ran 45 benchmarks that compare my clone with the vanilla
version. The benchmark has the following steps:
1. Start a new Valkey process
2. Fill the DB with data sequentially
3. Run a warmup to randomize the memory layout
4. Introduce fragmentation by deleting part of the keys
5. In parallel:
    1. Trigger BGSAVE
    2. Start 80/20 get/set load

I varied the following parameters to understand their influence:

1. Number of keys: 3M, 6M, and 12M.
2. Data size. While key themselves are of fixed length ~30 bytes, the
value size is 120, 250, 500, 1000, and 2000 bytes.
3. Fragmentation. I delete 5%, 10%, and 15% of the original key range.

I'm attaching a graph of BGSAVE process memory consumption. Instead of
all benchmarks, I show the most representative runs IMO.

<img width="1570" alt="3m-fixed"
src="https://github.com/user-attachments/assets/3dbbc528-01c1-4821-a3c2-6be455e7f78a">


For 2000-byte values, peak memory usage is ~53% compared to vanilla. The
peak happens at 57% BGSAVE progress.
For 500-byte values, the peak is ~80% compared to vanilla, and happens
at ~80% progress.
For 120-byte values, the difference is under 5%, and the patched version
could even use more memory.



![500b-fixed](https://github.com/user-attachments/assets/b09451d3-4bce-4f33-b3db-2b5df2178ed2)


For 12M keys, the peak is ~85% of the vanilla’s. Happens at ~70% mark.
For 6M keys, the peak is ~87% of the vanilla’s. Happens at ~77% mark.
For 3M keys, the peak is ~87% of the vanilla’s. Happens at ~80% mark.

**Changes**

The PR contains 2 changes:
1. Static buffer for RDB compression.
RDB compression leads to COW events even without any write load if we
use `zfree`. It happens because the compression function allocates a
new buffer for each object. Together with freeing objects with `zfree`,
it leads to reuse of the memory shared with the main process.
To deal with this problem, we use a pre-allocated constant 8K buffer for
compression. If the object size is too big for this buffer, then we fall
back to the ad hoc allocation behavior (a sketch follows this list).

2. Freeing string objects instead of dismissing them.
A call to `zfree` is more expensive than a direct call to `madvise`. But
with #453, strings use the fast path – `zfree_with_size`. As a possible
next step, we can optimize `zfree` for other data types as well.
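
For illustration, a minimal sketch of the static-buffer idea from change
1; the 8K size matches the description above, but the function names and
structure are hypothetical, not the actual patch:

```c
/* Hypothetical sketch: a fixed scratch buffer for compression with an
 * ad hoc heap fallback for oversized objects. */
#include <stdlib.h>

#define COMP_BUF_SIZE (8 * 1024) /* pre-allocated constant 8K buffer */

static char comp_buf[COMP_BUF_SIZE];

/* Small objects reuse the static buffer, so the BGSAVE child performs
 * no allocation and touches no extra pages (fewer COW events). */
static char *getCompressionBuffer(size_t needed, int *heap_allocated) {
    if (needed <= COMP_BUF_SIZE) {
        *heap_allocated = 0;
        return comp_buf;
    }
    *heap_allocated = 1;
    return malloc(needed); /* fall back to ad hoc allocation */
}

static void releaseCompressionBuffer(char *buf, int heap_allocated) {
    if (heap_allocated) free(buf); /* only heap fallbacks are freed */
}
```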

---------

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-12-01 17:12:27 +02:00
Amit Nagler
7043ef0bbb
Split dual-channel COB overrun tests to separate servers (#1374)
1. The test wasn't waiting long enough for the output buffer to overrun.
This happened because an error from the previous test was bleeding into
the current test's logs. The simplest fix is to split these tests.
2. Increased the replication timeout to ensure the sync fails due to
output buffer overrun before a timeout occurs.

Fixes #1367

Signed-off-by: naglera <anagler123@gmail.com>
2024-12-01 21:33:43 +08:00
Binbin
9c48f56790
Reset repl_down_since to zero only on state change (#1149)
We should reset repl_down_since only on a state change. In the
current code, if the rdb channel in dual channel replication is
normal (that is, the rdb is loaded normally) but the psync channel
is abnormal, we would still set repl_down_since to 0 here. If the
primary goes down at this time, the replica may be abnormal when
calculating data_age in cluster failover: since repl_state !=
REPL_STATE_CONNECTED, the replica is unable to initiate an election
due to the old data_age.

In dualChannelSyncHandleRdbLoadCompletion, if the psync channel
is not established, the function will return. We will set repl_state
to REPL_STATE_CONNECTED and set repl_down_since to 0 in
dualChannelSyncSuccess, that is, in establishPrimaryConnection.

See also 677d10b2a8ff7f13033ccfe56ffcd246dbe70fb6 for more details.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-12-01 21:33:21 +08:00
Stav Ben-Tov
c8ceb2ee25
Use zfree_with_size for client buffer (#1376)
Replace occurrences of 'zfree' with 'zfree_with_size' to improve
performance. The 'zfree_with_size' function avoids calling
'zmalloc_size' to retrieve the buffer size and instead uses the
previously calculated size. This results in faster memory deallocation
and reduces overhead.
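
A minimal sketch of the difference, assuming glibc's
malloc_usable_size() as the size lookup; the real definitions live in
zmalloc.c and this is only an illustration:

```c
#include <stdlib.h>
#include <malloc.h> /* malloc_usable_size (glibc); an assumption here */

static size_t used_memory = 0; /* simplified accounting */

/* The caller already knows the size: no allocator lookup is needed. */
void zfree_with_size(void *ptr, size_t size) {
    used_memory -= size;
    free(ptr);
}

/* Plain zfree must query the allocator for the size first; this is
 * the extra work that zfree_with_size avoids. */
void zfree(void *ptr) {
    used_memory -= malloc_usable_size(ptr);
    free(ptr);
}
```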

Signed-off-by: stav bentov <stavbt@amazon.com>
Co-authored-by: stav bentov <stavbt@amazon.com>
2024-12-01 12:24:18 +01:00
zhenwei pi
4695d118dd
RDMA builtin support (#1209)
There are several patches in this PR:

* Abstract set/rewrite of the config bind option: `bind` is a special
config; `socket` and `tls` share the same one, while RDMA uses a
similar style but a separate option. Add a small abstraction to make it
flexible for both `socket` and `RDMA` (and even QUIC in the future).
* Introduce closeListener for connection types: closing a socket with a
simple syscall is fine, but RDMA has more complex logic. Introduce a
connection-type-specific close listener method.
* RDMA: Use valkey.conf style instead of module parameters: use the
`--rdma-bind` and `--rdma-port` style instead of module parameters. The
module-style configs `rdma.bind` and `rdma.port` are removed.
* RDMA: Support builtin: support `make BUILD_RDMA=yes`. The module
style is still kept for now.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-11-29 11:13:34 +01:00
zvi-code
fd58f8d058
Disable lazy free in defrag test to fix 32bit daily failure (#1370)
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
2024-11-28 16:27:00 +01:00
Binbin
a939cb88ee
Handle keyIsExpiredWithDictIndex to make it check for import mode (#1368)
In #1326 we made KEYS able to visit expired keys in import-source
state by updating keyIsExpired to check for import mode. But after
#1205, we now use keyIsExpiredWithDictIndex to optimize away the
redundant dict_index calculation, and keyIsExpiredWithDictIndex did
not handle this logic.

In this commit, we update keyIsExpiredWithDictIndex to check for
import mode as well, so that KEYS can visit expired keys.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-28 14:10:48 +08:00
Binbin
db7b7396ff
Make KEYS can visit expired key in import-source state (#1326)
After #1185, a client in import-source state can visit expired keys
in both read and write commands. This commit updates the keyIsExpired
function to handle the import-source state as well, so KEYS can visit
expired keys.

This is not particularly important, but it keeps the definition
consistent. It also does some cleanup around the test and verifies
that the client can indeed visit expired keys.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-28 00:16:55 +08:00
Binbin
5d08149e72
Use fake client flag to replace not conn check (#1198)
The fake client flag was introduced in #1063;
we want this to replace all !conn fake client checks.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-27 18:02:07 +08:00
ranshid
66ae8b7135
change the container image to ubuntu:plucky (#1359)
Our fortify workflow runs on an ubuntu lunar container that has been EOL
since [January 25,
2024](https://lists.ubuntu.com/archives/ubuntu-announce/2024-January/000298.html).
This causes the workflow to fail during update actions like:
```
apt-get update && apt-get install -y make gcc-13
  update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 100
  make all-with-unit-tests CC=gcc OPT=-O3 SERVER_CFLAGS='-Werror -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3'
  shell: sh -e {0}
Ign:1 http://security.ubuntu.com/ubuntu lunar-security InRelease
Err:2 http://security.ubuntu.com/ubuntu lunar-security Release
  404  Not Found [IP: 91.189.91.82 80]
Ign:3 http://archive.ubuntu.com/ubuntu lunar InRelease
Ign:4 http://archive.ubuntu.com/ubuntu lunar-updates InRelease
Ign:5 http://archive.ubuntu.com/ubuntu lunar-backports InRelease
Err:6 http://archive.ubuntu.com/ubuntu lunar Release
  404  Not Found [IP: 185.125.190.81 80]
Err:7 http://archive.ubuntu.com/ubuntu lunar-updates Release
  404  Not Found [IP: 185.125.190.81 80]
Err:8 http://archive.ubuntu.com/ubuntu lunar-backports Release
  404  Not Found [IP: 185.125.190.81 80]
Reading package lists...
E: The repository 'http://security.ubuntu.com/ubuntu lunar-security Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu lunar-backports Release' does not have a Release file.
update-alternatives: error: alternative path /usr/bin/gcc-13 doesn't exist
Error: Process completed with exit code 2.
```

example:
https://github.com/valkey-io/valkey/actions/runs/12021130026/job/33547460209

This PR uses the latest stable ubuntu image release
[plucky](https://hub.docker.com/layers/library/ubuntu/plucky/images/sha256-dc4565c7636f006c26d54c988faae576465e825ea349fef6fd3af6bf5100e8b6?context=explore)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-11-27 07:34:02 +02:00
Amit Nagler
9305b49145
Add tag for dual-channel logs (#999)
This PR introduces a consistent tagging system for dual-channel logs.
The goal is to improve log readability and filterability, making it
easier for operators to manage and analyze log entries.

Resolves https://github.com/valkey-io/valkey/issues/986

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-11-26 16:51:52 +02:00
Binbin
469d41fb37
Avoid double close on repl_transfer_fd (#1349)
The code was OK before 2de544cfcc6d1aa7cf6d0c75a6116f7fc27b6fd6,
but now we set server.repl_transfer_fd right after dfd is initiated,
so here we had a double-close error, since dfd and
server.repl_transfer_fd are the same fd.

Also move the declaration of dfd/maxtries into a smaller scope to
avoid confusion, since they are only used in this code path.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-26 00:00:47 +08:00
Binbin
2d48a39c27
Save open's errno when opening temp rdb fails to prevent it from being modified (#1347)
Apparently on Mac, sleep will modify errno to ETIMEDOUT, and then it
prints the misleading message: Operation timed out.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-25 23:56:51 +08:00
Ray Cao
cf1a1e0931
Optimize sdscatrepr by batch processing printable characters (#1342)
Optimize sdscatrepr by reducing realloc calls; furthermore, we reduce
memcpy calls by batch processing consecutive printable characters.
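
To illustrate the batching idea (a sketch under simplified escaping
rules, using the real sds primitives `sdscatlen` and `sdscatprintf`;
not the actual patch):

```c
#include <ctype.h>
#include "sds.h" /* sds string library from the source tree */

sds catreprBatched(sds s, const char *p, size_t len) {
    s = sdscatlen(s, "\"", 1);
    size_t i = 0;
    while (i < len) {
        unsigned char c = p[i];
        if (isprint(c) && c != '"' && c != '\\') {
            /* Batch: find the end of the printable run, then append the
             * whole run at once instead of one byte at a time. */
            size_t start = i;
            while (i < len && isprint((unsigned char)p[i]) &&
                   p[i] != '"' && p[i] != '\\') i++;
            s = sdscatlen(s, p + start, i - start);
        } else {
            s = sdscatprintf(s, "\\x%02x", c); /* escape the odd byte */
            i++;
        }
    }
    return sdscatlen(s, "\"", 1);
}
```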

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>
Co-authored-by: Ray Cao <zisong.cw@alibaba.com>
2024-11-25 07:16:46 -08:00
Parth
c4920bca4a
Integrating fast_float to optionally replace strtod (#1260)
Fast_float is a C++ header-only library to parse doubles using SIMD
instructions. The purpose is to speed up sorted sets and other commands
that use doubles. A single-file copy of fast_float is included in this
repo. This introduces an optional dependency on a C++ compiler.

The use of fast_float is enabled at compile time using the make variable
`USE_FAST_FLOAT=yes`. It is disabled by default.

Fixes #1069.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Parth <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Roshan Swain <swainroshan001@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-11-25 10:01:43 +01:00
Binbin
653d5f7fe3
Support empty callback on function and free temp function in async way (#1334)
We have a replicationEmptyDbCallback; it is a callback used by emptyData
while flushing away old data. Previously, we did not add this callback
logic for functions. Since, in case of abuse, there may be a lot of
functions, and also to make the code consistent, we add the same
callback logic for functions.

Changes around this commit:
1. Extend emptyData / functionsLibCtxClear to support passing a callback
   when flushing functions.
2. Added disklessLoad function create and discard helper functions, just
   like disklessLoadInitTempDb and disklessLoadDiscardTempDb; we will
   always flush the temp functions in an async way to avoid any blocking.
3. Cleanup around discardTempDb: remove the callback pointer, since in
   the async way we don't need the callback.
4. Remove the functionsLibCtxClear call in readSyncBulkPayload, because
   we called emptyData in the previous lines, which also empties functions.

We do this callback in replication because, during the flush, the
replica may block for a while if the flush is done in the sync way; to
avoid the primary detecting the replica as timing out, the replica uses
this callback to notify the primary (we also do this callback when
loading an RDB). In the async way, we empty the data in the bio thread
and there is no slow operation, so the callback is ignored.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-25 09:59:37 +08:00
eifrah-aws
33f42d7fb5
CMake fixes + README update (#1276) 2024-11-22 12:17:53 -08:00
Binbin
9851006d6d
Add short client info log to CLUSTER MEET / FORGET / RESET commands (#1249)
These commands are all administrator commands. If they are operated
incorrectly, serious consequences may occur. Print the full client
info by using catClientInfoString, the info is useful when we want
to identify the source of request.

Since the origin client info is very large and might complicate the
output, we added a catClientInfoShortString function, it will only
print some basic fields, we want these fields that are useful to
identify the client. These fields are:
- id
- addr
- laddr
- connection info
- name
- user
- lib-name
- lib-ver

We also used it to replace the original client info where it serves
the same purpose. Some logging is changed from full client info to
short client info:
- CLUSTER FAILOVER
- FAILOVER / PSYNC
- REPLICAOF NO ONE
- SHUTDOWN

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-23 00:23:38 +08:00
Binbin
b9d224097a
Brocast a PONG to all node in cluster when role changed (#1295)
When a node's role changes, we should broadcast the change to notify other
nodes. For example, with one primary and one replica, after a failover the
replica becomes the new primary and the primary becomes the new replica.

If we then trigger a second cluster failover for the new replica, the
new replica will send an MFSTART to its primary, i.e. the new primary.

But the new primary may reject the MFSTART due to this logic:
```
    } else if (type == CLUSTERMSG_TYPE_MFSTART) {
        if (!sender || sender->replicaof != myself) return 1;
```

In the new primary's view, the sender is still a primary and
sender->replicaof is NULL, so we return. Then the manual failover times out.

Another possibility is that other primaries refuse to vote after receiving
the FAILOVER_AUTH_REQUEST, since in their view the sender is still a
primary, so they refuse to vote, and the manual failover times out.
```
void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
    ...
        if (clusterNodeIsPrimary(node)) {
            serverLog(LL_WARNING, "Failover auth denied to...
```

The reason is that, currently, we only update the node->replicaof information
when we receive a PING/PONG from the sender. For details, see
clusterProcessPacket. Therefore, in some scenarios, such as clusters with
many nodes and a large cluster-ping-interval (that is, cluster-node-timeout),
the role change of the node propagates very late.

Also added a DEBUG DISABLE-CLUSTER-RANDOM-PING command to disable the
cluster ping that is sent to a random node every second (see clusterCron).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-23 00:22:04 +08:00
Binbin
979f4c1ceb
Add cmake-build-debug and cmake-build-release to gitignore (#1340)
Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-22 16:49:16 +08:00
Alan Scherger
377ed22c97
[feat] add Ubuntu 24.04 Noble package support (#971)
add Ubuntu 24.04 Noble package support

Signed-off-by: Alan Scherger <alan.scherger@gmail.com>
2024-11-21 19:26:30 -08:00
Yury-Fridlyand
109d2dadc0
Add slack link for users (#1273)
Add slack link for users

---------

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-21 19:19:10 -08:00
Nadav Levanoni
18d1eb5a85
Remove redundant dict_index calculations (#1205)
We need to start making use of the new `WithDictIndex` APIs which allow
us to reuse the dict_index calculation (avoid over-calling `getKeySlot`
for no good reason).

In this PR I optimized `lookupKey` so it now reuses the dict_index
from `getKeySlot` in two additional places. It also optimizes the KEYS
command to avoid unnecessary computation of the slot id.

---------

Signed-off-by: Nadav Levanoni <nadavl@amazon.com>
Co-authored-by: Nadav Levanoni <nadavl@amazon.com>
2024-11-21 19:14:28 -08:00
Sinkevich Artem
43b5026162
Fix argument types of formatting functions (#1253)
`cluster_legacy.c`: `slot_info_pairs` has `uint16_t` values, but they
were cast to `unsigned long` and `%i` was used.

`valkey-cli.c`: `node->replicas_count` is `int`, not `unsigned long`.
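
For illustration, a standalone example of specifiers matching the types
mentioned above (not the patched code itself):

```c
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint16_t slot_start = 0, slot_end = 16383; /* like slot_info_pairs */
    int replicas_count = 2;                    /* like node->replicas_count */

    /* PRIu16 expands to the correct specifier for uint16_t */
    printf("slots %" PRIu16 "-%" PRIu16 "\n", slot_start, slot_end);
    /* plain %d for int; %lu would be wrong here */
    printf("replicas: %d\n", replicas_count);
    return 0;
}
```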

Signed-off-by: ArtSin <artsin666@gmail.com>
2024-11-21 18:58:15 -08:00
Binbin
50aae13b0a
Skip reclaim file page cache test in valgrind (#1327)
The test is incompatible with valgrind. Added a new `--valgrind`
argument to the test suite, which will cause that test to be skipped.

We skipped it in the past, see 5b61b0dc6d2579ee484fa6cf29bfac59513f84ab

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-22 10:29:24 +08:00
Binbin
c4be326c32
Make manual failover reset the on-going election to promote failover (#1274)
If a manual failover times out, for example because the election did
not get enough votes, then since we have an auth_timeout and an
auth_retry_time, a new manual failover will not be able to proceed on
the replica side.

For example, if we initiate a new manual failover after an election
timed out, we pause the primary, but on the replica side, due to
retry_time, the replica does not trigger a new election and the manual
failover eventually times out.

In this case, if we initiate a manual failover again and there is an
ongoing election, we reset it so that the replica can initiate a new
election at the manual failover's request.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-22 10:28:59 +08:00
zvi-code
b56eed2479
Remove valkey specific changes in jemalloc source code (#1266)
### Summary of the change

This is a base PR for refactoring defrag. It moves the defrag logic to
rely on the jemalloc [native
api](https://github.com/jemalloc/jemalloc/pull/1463#issuecomment-479706489)
instead of relying on custom code changes made by valkey in the jemalloc
([je_defrag_hint](9f8185f5c8/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h (L382)))
library. This enables valkey to use the latest vanilla jemalloc without
the need to maintain code changes across jemalloc versions.

This change requires some modifications because the new api provides
only the information, not a yes/no defrag decision. The logic needs to
be implemented in valkey code. Additionally, the api does not provide,
within a single call, all the information needed to make a decision;
this information is available through an additional api call. To reduce
the calls to jemalloc, in this PR the required information is collected
during `computeDefragCycles` and not for every single ptr; this way we
avoid the additional api call.
Follow-up work will utilize the newly opened options and will further
improve the defrag decision and process.

### Added files: 

`allocator_defrag.c` / `allocator_defrag.h` - These files implement the
allocator-specific knowledge for making the defrag decision. The
knowledge about slabs, allocation logic and so on all goes into this
file. This improves the separation between jemalloc-specific code and
other possible implementations.


### Moved functions: 

[`zmalloc_no_tcache`, `zfree_no_tcache`](4593dc2f05/src/zmalloc.c (L215))
- these embody very jemalloc-specific logic assumptions, and are very
specific to how we defrag with jemalloc. This is also with the vision
that, from a performance perspective, we should consider using tcache;
we only need to make sure we don't recycle entries without going through
the arena (for example, we can use a private tcache, one for free and
one for alloc).
`frag_smallbins_bytes` - the logic and implementation moved to the new
file.

### Existing API:

* [once a second + when completed full cycle]
[`computeDefragCycles`](4593dc2f05/src/defrag.c (L916))
* `zmalloc_get_allocator_info` : gets from jemalloc _allocated, active,
resident, retained, muzzy_, `frag_smallbins_bytes`
*
[`frag_smallbins_bytes`](4593dc2f05/src/zmalloc.c (L690))
: for each bin; gets from jemalloc bin_info, `curr_regs`, `cur_slabs`
* [during defrag, for each pointer]
* `je_defrag_hint` takes a memory pointer and returns {0,1}.
[Internally it
uses](4593dc2f05/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h (L368))
these information points:
        * #`nonfull_slabs`
        * #`total_slabs`
        * #free regs in the ptr slab

## Jemalloc API (via ctl interface)


[BATCH][`experimental_utilization_batch_query_ctl`](4593dc2f05/deps/jemalloc/src/ctl.c (L4114))
: gets an array of pointers, returns for each pointer 3 values,

* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes


[EXTENDED][`experimental_utilization_query_ctl`](4593dc2f05/deps/jemalloc/src/ctl.c (L3989))
:

* memory address of the extent a potential reallocation would go into
* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes
* [stats-enabled]total number of free regions in the bin the extent
belongs to
* [stats-enabled]total number of regions in the bin the extent belongs
to

### `experimental_utilization_batch_query_ctl` vs valkey's
`je_defrag_hint`?
[good]
- We can query pointers in a batch, reducing the overall overhead
- The per-ptr decision algorithm is not within the jemalloc api; jemalloc
only provides information, so valkey can tune/configure/optimize easily

[bad]
- In the batch API we only know the utilization of the slab (of that
memory ptr); we don’t get the data about #`nonfull_slabs` and total
allocated regs.


## New functions:
1. `defrag_jemalloc_init`: Reduces the cost of calls to je_ctl by using
the [MIB interface](https://jemalloc.net/jemalloc.3.html) to get faster
calls (a sketch follows this list). See this quote from the jemalloc
documentation:

    The mallctlnametomib() function provides a way to avoid repeated
    name lookups for applications that repeatedly query the same portion
    of the namespace, by translating a name to a “Management Information
    Base” (MIB) that can be passed repeatedly to mallctlbymib().

2. `jemalloc_sz2binind_lgq*`: this api supports a reverse map between a
bin size and its info without a lookup. This mapping depends on the
number of size classes we have, which are derived from
[`lg_quantum`](4593dc2f05/deps/Makefile (L115))
3. `defrag_jemalloc_get_frag_smallbins`: this function replaces
`frag_smallbins_bytes`; the logic moved to the new file allocator_defrag.
`defrag_jemalloc_should_defrag_multi` → `handle_results` - unpacks the
results.
4. `should_defrag`: implements the same logic as the existing
implementation
[inside](9f8185f5c8/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h (L382))
je_defrag_hint.
5. `defrag_jemalloc_should_defrag_multi`: implements the hint for an
array of pointers, utilizing the new batch api. Currently only 1 pointer
is passed.
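
As a minimal sketch of the MIB pattern from item 1, using jemalloc's
public mallctl API (valkey's vendored jemalloc prefixes these functions
with `je_`; error handling omitted):

```c
#include <jemalloc/jemalloc.h>
#include <stdio.h>

int main(void) {
    size_t mib[8];
    size_t miblen = sizeof(mib) / sizeof(mib[0]);

    /* Translate the name to a MIB once... */
    je_mallctlnametomib("stats.allocated", mib, &miblen);

    /* ...then query it repeatedly without repeated name lookups. */
    for (int i = 0; i < 3; i++) {
        size_t allocated, sz = sizeof(allocated);
        je_mallctlbymib(mib, miblen, &allocated, &sz, NULL, 0);
        printf("allocated: %zu bytes\n", allocated);
    }
    return 0;
}
```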


### Logical differences:

In order to get the information about #`nonfull_slabs` and #`regs`, we
use the query cycle to collect the information per size class. In order
to find the index of the bin information given a bin size, in O(1), we
use `jemalloc_sz2binind_lgq*`.


## Testing
This is the first draft. I did some initial testing that basically
creates fragmentation by reducing max memory and then waits for defrag
to reach the desired level. The test only serves as a sanity check that
defrag eventually succeeds; no data is provided here regarding
efficiency and performance.

### Test: 
1. disable `activedefrag`
2. run valkey benchmark on overlapping address ranges with different
block sizes
3. wait until `used_memory` reaches 10GB
4. set `maxmemory` to 5GB and `maxmemory-policy` to `allkeys-lru`
5. stop load
6. wait for `mem_fragmentation_ratio` to reach 2
7. enable `activedefrag` - start test timer
8. wait until reach `mem_fragmentation_ratio` = 1.1

#### Results*:
(With this PR) Test result: `56 sec`
(Without this PR) Test result: `67 sec`

*both runs perform the same "work": number of buffers moved to reach the
fragmentation target

Next benchmarking is to compare to:
- DONE // existing `je_get_defrag_hint` 
- compare with naive defrag all: `int defrag_hint() {return 1;}`

---------

Signed-off-by: Zvi Schneider <ezvisch@amazon.com>
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Signed-off-by: zvi-code <54795925+zvi-code@users.noreply.github.com>
Co-authored-by: Zvi Schneider <ezvisch@amazon.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-21 16:29:21 -08:00
xbasel
b486a41500
Preserve original fd blocking state in TLS I/O operations (#1298)
This change prevents unintended side effects on connection state and
improves consistency with non-TLS sync operations.

For example, when invoking `connTLSSyncRead` with a blocking file
descriptor, the mode is switched to non-blocking upon `connTLSSyncRead`
exit. If the code assumes the file descriptor remains blocking and calls
the normal `read` expecting it to block, it may result in a short read.
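
A minimal sketch of the preserve-and-restore idea using fcntl; the
helper name here is hypothetical, and the actual fix lives in the TLS
sync I/O code:

```c
#include <fcntl.h>
#include <unistd.h>

/* Temporarily switch the fd to non-blocking for the sync operation,
 * then restore whatever mode the caller had set. */
ssize_t syncReadPreservingMode(int fd, void *buf, size_t len) {
    int flags = fcntl(fd, F_GETFL);         /* remember original mode */
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    ssize_t nread = read(fd, buf, len);
    fcntl(fd, F_SETFL, flags);              /* restore original mode */
    return nread;
}
```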

This caused a crash in dual-channel, which was fixed in this PR by
relocating `connBlock()`:
https://github.com/valkey-io/valkey/pull/837

Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
2024-11-21 18:22:16 +02:00
Binbin
6038eda010
Make FUNCTION RESTORE FLUSH flush async based on lazyfree-lazy-user-flush (#1254)
FUNCTION RESTORE has a FLUSH option that deletes all the existing
libraries before restoring the payload. If for some reason there are
a lot of libraries, we would block for a while here.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-21 21:02:05 +08:00
Binbin
f553ccbda6
Use goto to cleanup error handling in readSyncBulkPayload (#1332)
The goto error label does the same thing as the error return; use goto
to reduce the duplicated references.
```
error:
    cancelReplicationHandshake(1);
    return;
```

This also makes the log printing more continuous in the error case:
we print the error log first, and then print the reconnecting log
last (in cancelReplicationHandshake).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-21 20:01:30 +08:00
Yanqi Lv
4986310945
Import-mode: Avoid expiration and eviction during data syncing (#1185)
New config: `import-mode (yes|no)`

New command: `CLIENT IMPORT-SOURCE (ON|OFF)`

The config, when set to `yes`, disables eviction and deletion of expired
keys, except for commands coming from a client which has marked itself
as an import-source, the data source when importing data from another
node, using the CLIENT IMPORT-SOURCE command.

When we sync data from the source Valkey to the destination Valkey using
some sync tools like
[redis-shake](https://github.com/tair-opensource/RedisShake), the
destination Valkey can perform expiration and eviction, which may cause
data corruption. This problem has been discussed in
https://github.com/redis/redis/discussions/9760#discussioncomment-1681041
and Redis already have a solution. But in Valkey we haven't fixed it by
now.

E.g. we call `set key 1 ex 1` on the source server and transfer this
command to the destination server. Then we call `incr key` on the source
server before the key expires, so we have a key on the source server
with a value of 2. But by the time the command arrives at the destination
server, the key may have expired and been deleted. So we end up with a
key on the destination server with a value of 1, which is inconsistent
with the source server.

In standalone mode, we can use a writable replica to simplify the sync
process. However, in cluster mode, we still need a sync tool to help us
transfer the source data to the destination. The sync tool usually works
as a normal client and the destination works as a primary which keeps
expiration and eviction.

In this PR, we add a new mode named 'import-mode'. In this mode, the
server stops expiration and eviction, just like a replica. Notice that
this mode should exist only during the sync, to avoid data inconsistency
caused by expiration and eviction. Import mode only takes effect on the
primary. Sync tools can mark their clients as an import source with
`CLIENT IMPORT-SOURCE`, which works like a client from the primary and
can visit expired keys in `lookupKey`.

**Notice: during the migration, other clients, apart from the import
source, should not access the data imported by import source.**

---------

Signed-off-by: lvyanqi.lyq <lvyanqi.lyq@alibaba-inc.com>
Signed-off-by: Yanqi Lv <lvyanqi.lyq@alibaba-inc.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-19 21:53:19 +01:00
Binbin
ee386c92ff
Manual failover vote is not limited by two times the node timeout (#1305)
This limit should not restrict manual failover; otherwise, in some
scenarios, a manual failover will time out.

For example, if some FAILOVER_AUTH_REQUESTs or some FAILOVER_AUTH_ACKs
are lost during a manual failover, the node cannot vote in the second
manual failover. Or, in a mixed scenario of plain failover and manual
failover, it cannot vote for the subsequent manual failover.

The problem with retrying a manual failover is that the mf pauses
clients for 5s on the primary side. So every retry of a timed-out
manual failover is a costly move.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-11-19 11:17:20 -05:00
Binbin
132798b57d
Receipt of REPLCONF VERSION reply should be triggered by event (#1320)
This adds the missing return when repl_state changes to
RECEIVE_VERSION_REPLY; this way we won't be blocked if the primary
doesn't reply with REPLCONF VERSION.

In practice, I guess this is not likely to block in this context, since
small responses are likely to be received in one packet, so this is
just a cleanup (consistent with the previous state machine processing).

Also update the state machine diagram to mention the VERSION reply.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-19 23:42:50 +08:00
Seungmin Lee
3d0c834203
Fix LRU crash when getting too many random lua scripts (#1310)
### Problem
Valkey stores scripts in a dictionary (lua_scripts) keyed by their SHA1
hashes, but it needs a way to know which scripts are least recently
used. It uses an LRU list (lua_scripts_lru_list) to keep track of
scripts in usage order. When the list reaches a maximum length, Valkey
evicts the oldest scripts to free memory in both the list and
dictionary. The problem here is that the sds in the LRU list can be
left pointing to memory already freed/moved by active defrag, memory
that the sds in the dictionary used to point to. It results in an
assertion error at [this
line](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L519)

### Solution
If we duplicate the sds when adding it to the LRU list, we can create an
independent copy of the script identifier (sha). This duplication
ensures that the sha string in the LRU list remains stable and
unaffected by any defragmentation that could alter or free the original
sds. In addition, dictUnlink doesn't require an exact pointer
match ([ref](https://github.com/valkey-io/valkey/blob/unstable/src/eval.c#L71-L78)),
so this change can still unlink the right dictEntry using the copy
of the sds.

### Reproduce
To reproduce it with tcl test:
1. Disable je_get_defrag_hint in defrag.c to trigger defrag often
2. Execute test script
```
start_server {tags {"auth external:skip"}} {

    test {Regression for script LRU crash} {
        r config set activedefrag yes
        r config set active-defrag-ignore-bytes 1
        r config set active-defrag-threshold-lower 0
        r config set active-defrag-threshold-upper 1
        r config set active-defrag-cycle-min 99
        r config set active-defrag-cycle-max 99

        for {set i 0} {$i < 100000} {incr i} {
            r eval "return $i" 0
        }
        after 5000;
    }
}
```


### Crash info
Crash report:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
14044:M 12 Nov 2024 14:51:27.054 # === ASSERTION FAILED ===
14044:M 12 Nov 2024 14:51:27.054 # ==> eval.c:556 'de' is not true

------ STACK TRACE ------

Backtrace:
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaDeleteFunction+0x148)[0x723708]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](luaCreateFunction+0x26c)[0x724450]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](evalCommand+0x2bc)[0x7254dc]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](call+0x574)[0x5b8d14]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommand+0xc84)[0x5b9b10]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processCommandAndResetClient+0x11c)[0x6db63c]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](processInputBuffer+0x1b0)[0x6dffd4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x6bd968]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x659634]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](amzTLSEventHandler+0x194)[0x6588d8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x750c88]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](aeProcessEvents+0x228)[0x757fa8]
/usr/bin/redis-server 127.0.0.1:6379 [cluster](redisMain+0x478)[0x7786b8]
/lib64/libc.so.6(__libc_start_main+0xe4)[0xffffa7763da4]
/usr/bin/redis-server 127.0.0.1:6379 [cluster][0x5ad3b0]
```
Defrag info:
```
mem_fragmentation_ratio:1.18
mem_fragmentation_bytes:47229992
active_defrag_hits:20561
active_defrag_misses:5878518
active_defrag_key_hits:77
active_defrag_key_misses:212
total_active_defrag_time:29009
```

### Test:
Run the test script to push 100,000 scripts to ensure the LRU list keeps
500 maximum length without any crash.
```
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.583 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
27489:M 14 Nov 2024 20:56:41.584 * LRU List length: 500
[ok]: Regression for script LRU crash (6811 ms)
[1/1 done]: unit/test (7 seconds)
```

---------

Signed-off-by: Seungmin Lee <sungming@amazon.com>
Signed-off-by: Seungmin Lee <155032684+sungming2@users.noreply.github.com>
Co-authored-by: Seungmin Lee <sungming@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-18 18:06:35 -08:00
Seungmin Lee
f9d0b87622
Upgrade macos-12 to macos-13 in workflows (#1318)
### Problem
GitHub Actions is starting the deprecation process for macOS 12.
Deprecation will begin on 10/7/24 and the image will be fully
unsupported by 12/3/24.
For more details, see
https://github.com/actions/runner-images/issues/10721

Signed-off-by: Seungmin Lee <sungming@amazon.com>
Co-authored-by: Seungmin Lee <sungming@amazon.com>
2024-11-18 18:00:30 -08:00
Amit Nagler
c5012cc630
Optimize RDB load performance and fix cluster mode resizing on replica side (#1199)
This PR addresses two issues:

1. Performance Degradation Fix - Resolves a significant performance
issue during RDB load on replica nodes.
- The problem was causing replicas to rehash multiple times during the
load process. Local testing demonstrated up to 50% degradation in BGSAVE
time.
- The problem occurs when the replica tries to expand pre-created slot
dictionaries. This operation fails quietly, resulting in undetected
performance issues.
- This fix aims to optimize the RDB load process and restore expected
performance levels.

2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster
mode-
- Use the shard's master slots count when processing this opcode, as
`clusterNodeCoversSlot` is not initialized for the currently syncing
replica.
- Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB`
had no practical impact (due to 1).

These improvements will enhance overall system performance and ensure
smoother upgrades to Valkey 8.0 in the future.

Testing:
- Conducted local tests to verify the performance improvement during RDB
load.
- Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively
impact functionality in the current version.

Signed-off-by: naglera <anagler123@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-18 19:09:35 +08:00
Binbin
d07674fc01
Fix sds unittest tests to check for zmalloc_usable_size (#1314)
s_malloc_size == zmalloc_size; currently sdsAllocSize does not
account for PREFIX_SIZE when no malloc_size is available, which causes
test_typesAndAllocSize to fail in the new unittest. What we want to
check is actually zmalloc_usable_size.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-18 14:55:26 +08:00
uriyage
94113fde7f
Improvements for TLS with I/O threads (#1271)
Main thread profiling revealed significant overhead in TLS operations,
even with read/write offloaded to I/O threads:

Perf results:

**10.82%** 8.82% `valkey-server libssl.so.3 [.] SSL_pending` # Called by
main thread after I/O completion

**10.16%** 5.06% `valkey-server libcrypto.so.3 [.] ERR_clear_error` #
Called for every event regardless of thread handling

This commit further optimizes TLS operations by moving more work from
the main thread to I/O threads:

Improve TLS offloading to I/O threads with two main changes:

1. Move `ERR_clear_error()` calls closer to SSL operations
   - Currently, error queue is cleared for every TLS event
   - Now only clear before actual SSL function calls
   - This prevents unnecessary clearing in main thread when operations
     are handled by I/O threads

2. Optimize `SSL_pending()` checks
   - Add `TLS_CONN_FLAG_HAS_PENDING` flag to track pending data
   - Move pending check to follow read operations immediately
   - I/O thread sets flag when pending data exists
   - Main thread uses flag to update pending list
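
A sketch of both changes (illustrative: `TLS_CONN_FLAG_HAS_PENDING` is
the flag name from this commit, while the struct and function are
simplified stand-ins for the real connection code):

```c
#include <openssl/ssl.h>
#include <openssl/err.h>

#define TLS_CONN_FLAG_HAS_PENDING (1 << 0)

typedef struct tls_connection {
    SSL *ssl;
    int flags;
} tls_connection;

/* Runs on an I/O thread: clear the error queue only right before the
 * SSL call, and record pending data right after the read, so the main
 * thread can consult the flag instead of calling SSL_pending itself. */
int tlsReadOffloaded(tls_connection *conn, void *buf, int len) {
    ERR_clear_error();
    int nread = SSL_read(conn->ssl, buf, len);
    if (SSL_pending(conn->ssl) > 0)
        conn->flags |= TLS_CONN_FLAG_HAS_PENDING;
    else
        conn->flags &= ~TLS_CONN_FLAG_HAS_PENDING;
    return nread;
}
```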

Performance improvements:
Testing setup based on
https://valkey.io/blog/unlock-one-million-rps-part2/

Before:
- SET: 896,047 ops/sec
- GET: 875,794 ops/sec

After:
- SET: 985,784 ops/sec (+10% improvement)
- GET: 1,066,171 ops/sec (+22% improvement)

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-11-17 21:52:35 -08:00
Binbin
aa2dd3ecb8
Stabilize replica migration test to make sure cluster config is consistent (#1311)
CI report this failure:
```
[exception]: Executing test client: MOVED 1 127.0.0.1:22128.
MOVED 1 127.0.0.1:22128
    while executing
"wait_for_condition 1000 50 {
            [R 3 get key_991803] == 1024 && [R 3 get key_977613] == 10240 &&
            [R 4 get key_991803] == 1024 && ..."
```

This may be because, even though the cluster state becomes OK,
the cluster still has an inconsistent configuration for a short period
of time. We make sure to wait for the config to be consistent.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-16 18:58:25 +08:00
Binbin
86f33ea2b0
Unprotect rdb channel when bgsave child fails in dual channel replication (#1297)
If bgsaveerr indicates an error, there is no need to protect the rdb
channel. The impact is that when bgsave fails, we would protect the
rdb channel for 60s. It may hold a reference to the repl buf block,
making it impossible to recycle until we free the client due to COB
or free the client after 60s.

We kept the RDB channel open as long as the replica hadn't established
a main connection, even if the snapshot process failed. There is no
value in keeping the RDB client in this case.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 16:48:13 +08:00
Binbin
92181b6797
Fix primary crash when processing dirty slots during shutdown wait / failover wait / client pause (#1131)
We have an assert in propagateNow. If the primary node receives a
CLUSTER UPDATE with dirty slots during the SIGTERM wait, during a
manual failover pause, or during a client pause, the delKeysInSlot
call will trigger this assert and crash the primary.

For this case, we added a new server_del_keys_in_slot state, just like
client_pause_in_transaction, to track the state and avoid the assert
in propagateNow; the dirty slots will be deleted in the end without
affecting data consistency.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-11-15 16:47:15 +08:00
Binbin
4e2493e5c9
Kill diskless fork child asap when the last replica drop (#1227)
We originally checked the replica connections to decide whether to
kill the diskless child only when rdbPipeReadHandler was triggered.
Actually, we can check when a replica is disconnected, so that we
don't have to wait for rdbPipeReadHandler to be triggered and can kill
the diskless child as soon as possible.

In this way, when the child or rdbPipeReadHandler is stuck for some
reason, we can kill the child faster and release the fork resources.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 16:34:32 +08:00
Binbin
d3f3b9cc3a
Fix daily valgrind build with unit tests (#1309)
This was introduced in #515.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-15 14:27:28 +08:00
bentotten
b9994030e9
Log clusterbus handshake timeout failures (#1247)
This adds a log when a handshake fails due to a timeout. This can help
troubleshoot cluster asymmetry issues caused by failed MEETs.

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Signed-off-by: bentotten <59932872+bentotten@users.noreply.github.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-14 20:48:48 -08:00
Qu Chen
32f7541fe3
Simplify dictType callbacks and move some macros from dict.h to dict.c (#1281)
Remove the dict pointer argument to the `dictType` callbacks `keyDup`,
`keyCompare`, `keyDestructor` and `valDestructor`. This argument was
unused in all of the callback implementations.

The macros `dictFreeKey()` and `dictFreeVal()` are made internal to dict
and moved from dict.h to dict.c. They're also changed from macros to
static inline functions.
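
For illustration, a simplified, self-contained sketch of the shape of
this change; the real structs in dict.h have many more fields:

```c
/* Before, callbacks received a dict pointer they never used:
 *     void (*keyDestructor)(dict *d, void *key);
 * After, the unused argument is dropped: */
typedef struct dictEntry { void *key; } dictEntry;
typedef struct dictType {
    void (*keyDestructor)(void *key);
} dictType;
typedef struct dict { dictType *type; } dict;

/* Formerly a macro in dict.h, now an internal static inline in dict.c. */
static inline void dictFreeKey(dict *d, dictEntry *entry) {
    if (d->type->keyDestructor) d->type->keyDestructor(entry->key);
}
```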

Signed-off-by: Qu Chen <quchen@amazon.com>
2024-11-14 09:45:47 +01:00
Parth
863d312803
Fix link-time optimization to work correctly for unit tests (i.e. -flto flag) (#1290) (#1296)
* We compile various C files into objects and package them into a library
(.a file) using ar to feed to unit tests. With new GCC versions, the
objects inside such a library don't participate in the LTO process
without additional flags.
* Here is a direct quote from gcc documentation explaining this issue:
"If you are not using a linker with plugin support and/or do not enable
the linker plugin, then the objects inside libfoo.a are extracted and
linked as usual, but they do not participate in the LTO optimization
process. In order to make a static library suitable for both LTO
optimization and usual linkage, compile its object files with
-flto -ffat-lto-objects."
* Read full documentation about -flto at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
* Without this additional flag, I get following errors while executing
"make test-unit". With this change, those errors go away.

```
ARCHIVE libvalkey.a
ar: threads_mngr.o: plugin needed to handle lto object
...
..
.
/tmp/ccDYbMXL.ltrans0.ltrans.o: In function `dictClear':
/local/workplace/elasticache/valkey/src/unit/../dict.c:776: undefined
reference to `valkey_free'
/local/workplace/elasticache/valkey/src/unit/../dict.c:770: undefined
reference to `valkey_free'
/tmp/ccDYbMXL.ltrans0.ltrans.o: In function `dictGetVal':
```

Fixes #1290

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-11-13 21:50:55 -08:00
skyfirelee
4a9864206f
Migrate quicklist unit test to new framework (#515)
Migrate quicklist unit test to new unit test framework, and cleanup
remaining references of SERVER_TEST, parent ticket #428.

Closes #428.

Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-11-14 10:37:44 +08:00
Binbin
6fba747c39
Fix log printing always shows the role as child under daemonize (#1301)
In #1282, we init server.pid earlier to keep log message role
consistent, but we forgot to consider daemonize. In daemonize
mode, we will always print the child role.

We need to reset server.pid after daemonize(), otherwise the
log printing role will always be the child. It also causes a
incorrect server.pid value, affecting the concatenation of
some pid names.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-14 10:26:23 +08:00
Binbin
2df56d87c0
Fix empty primary may have dirty slots data due to bad migration (#1285)
If we become an empty primary for some reason, we still need to
check if we need to delete dirty slots, because we may have dirty
slot data left over from a bad migration, e.g. the target node forcibly
executed CLUSTER SETSLOT NODE to take over the slot without
performing key migration.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 22:13:47 +08:00
Binbin
a2d22c63c0
Fix replica not able to initate election in time when epoch fails (#1009)
If multiple primary nodes go down at the same time, their replica nodes
will initiate elections at the same time. There is a certain probability
that the replicas will initiate the elections in the same epoch.

Obviously, in our current election mechanism, only one replica node can
eventually get enough votes, and the other replica nodes will fail to win
due to the insufficient majority; their elections will then time out and
we wait for the retry, which results in a long failure time.

If another node has already won the election in the failover epoch, we can
assume that our election has failed and we can retry as soon as possible.
2024-11-11 22:12:49 +08:00
Binbin
167e8ab8de
Trigger the election immediately when doing a manual failover (#1081)
Currently, when a manual failover is triggered, we set
CLUSTER_TODO_HANDLE_FAILOVER to start the election as soon as
possible in the next beforeSleep. But in fact, we don't want to delay
the election in a manual failover; waiting for the next beforeSleep
to kick in delays the election by some milliseconds.

We can trigger the election immediately in this case, in the same
function call, without waiting for beforeSleep, which saves us some
milliseconds.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 21:43:46 +08:00
Binbin
4aacffa32d
Stabilize dual replication test to avoid getting LOADING error (#1288)
When doing `$replica replicaof no one`, we may get a LOADING
error. This is because, during the test execution, the replica
may reconnect very quickly, the full sync is initiated, and the
replica enters the LOADING state.

In this commit, we make sure the primary is paused after the
fork, so the replica won't enter the LOADING state. With this
fix, the test behaves more naturally and predictably.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-11 21:42:34 +08:00
Qu Chen
9300a7ebc8
Set fields to NULL after free in freeClient() (#1279)
Null out several references after freeing the object in `freeClient()`.

This is just to make the code more safe, to protect against
use-after-free for future changes.

Signed-off-by: Qu Chen <quchen@amazon.com>
2024-11-11 10:39:48 +01:00
zixuan zhao
0b5b2c7484
Log as primary role (M) instead of child process (C) during startup (#1282)
Init server.pid earlier to keep log message role consistent.

Closes #1206.

Before:
```text
24881:C 21 Oct 2024 21:10:57.165 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
24881:C 21 Oct 2024 21:10:57.165 * Valkey version=255.255.255, bits=64, commit=814e0f55, modified=1, pid=24881, just started
24881:C 21 Oct 2024 21:10:57.165 * Configuration loaded
24881:M 21 Oct 2024 21:10:57.167 * Increased maximum number of open files to 10032 (it was originally set to 1024).
```
After:
```text
68560:M 08 Nov 2024 16:10:12.257 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
68560:M 08 Nov 2024 16:10:12.257 * Valkey version=255.255.255, bits=64, commit=45d596e1, modified=1, pid=68560, just started
68560:M 08 Nov 2024 16:10:12.257 * Configuration loaded
68560:M 08 Nov 2024 16:10:12.258 * monotonic clock: POSIX clock_gettime
```

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2024-11-11 10:33:26 +01:00
zhenwei pi
45d596e121
RDMA: Use conn ref counter to prevent double close (#1250)
RDMA: Use connection reference counter style

The reference counter of the connection is used to protect against
re-entry of the close method. Use this style instead of the unsafe one.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-11-08 09:33:01 +01:00
Jacob Murphy
e972d56460
Make sure to copy null terminator byte in dual channel code (#1272)
As @madolson pointed out, these do have proper null terminators. This
cleans them up to follow the rest of the code which copies the last byte
explicitly, which should help reduce cognitive load and make it more
resilient should code refactors occur (e.g. non-static allocation of
memory, changes to other functions).

---------

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
2024-11-07 18:25:43 -08:00
eifrah-aws
07b3e7ae7a
Add CMake build system for valkey (#1196)
With this commit, users are able to build valkey using `CMake`.

## Example usage:

Build `valkey-server` in Release mode with TLS enabled and using
`jemalloc` as the allocator:

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_INSTALL_PREFIX=/tmp/valkey-install \
         -DBUILD_MALLOC=jemalloc -DBUILD_TLS=1
make -j$(nproc) install

# start valkey
/tmp/valkey-install/bin/valkey-server
```

Build `valkey-unit-tests`:

```bash
mkdir build-release-ut
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DBUILD_MALLOC=jemalloc -DBUILD_UNIT_TESTS=1
make -j$(nproc)

# Run the tests
./bin/valkey-unit-tests 
```

Current features supported by this PR:

- Building against different allocators: (`jemalloc`, `tcmalloc`,
`tcmalloc_minimal` and `libc`), e.g. to enable `jemalloc` pass
`-DBUILD_MALLOC=jemalloc` to `cmake`
- OpenSSL builds (to enable TLS, pass `-DBUILD_TLS=1` to `cmake`)
- Sanitizer: pass `-DBUILD_SANITIZER=<address|thread|undefined>` to
`cmake`
- Install target + redis symbolic links
- Build `valkey-unit-tests` executable
- Standard CMake variables are supported. e.g. to install `valkey` under
`/home/you/root` pass `-DCMAKE_INSTALL_PREFIX=/home/you/root`

Why using `CMake`? To list *some* of the advantages of using `CMake`:

- Superior IDE integrations: cmake generates the file
`compile_commands.json` which is required by `clangd` to get a compiler
accuracy code completion (in other words: your VScode will thank you)
- Out of the source build tree: with the current build system, object
files are created all over the place polluting the build source tree,
the best practice is to build the project on a separate folder
- Multiple build types co-existing: with the current build system, it is
often hard to have multiple build configurations. With cmake you can do
it easily.
- It is the de-facto standard for C/C++ projects these days

More build examples: 

ASAN build:

```bash
mkdir build-asan
cd $_
cmake .. -DBUILD_SANITIZER=address -DBUILD_MALLOC=libc
make -j$(nproc)
```

ASAN with jemalloc:

```bash
mkdir build-asan-jemalloc
cd $_
cmake .. -DBUILD_SANITIZER=address -DBUILD_MALLOC=jemalloc 
make -j$(nproc)
```

As seen by the previous examples, any combination is allowed and
co-exist on the same source tree.

## Valkey installation

With this new `CMake`, it is possible to install the binary by running
`make install` or creating a package `make package` (currently supported
on Debian like distros)

### Example 1: build & install using `make install`:

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/valkey-install -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) install
# valkey is now installed under $HOME/valkey-install
```

### Example 2: create a `.deb` installer:

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) package
# ... CPack deb generation output
sudo gdebi -n ./valkey_8.1.0_amd64.deb
# valkey is now installed under /opt/valkey
```

### Example 3: create installer for non Debian systems (e.g. FreeBSD or
macOS):

```bash
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc) package
mkdir -p /opt/valkey && ./valkey-8.1.0-Darwin.sh --prefix=/opt/valkey  --exclude-subdir
# valkey-server is now installed under /opt/valkey

```

Signed-off-by: Eran Ifrah <eifrah@amazon.com>
2024-11-07 18:01:37 -08:00
Wen Hui
3672f9b2c3
Revert "Decline unsubscribe related command in non-subscribed mode" (#1265)
The goal of this PR is to revert the changes of PR
https://github.com/valkey-io/valkey/pull/759

Recently, we got some reports that in Valkey 8.0 the PR
https://github.com/valkey-io/valkey/pull/759 (Decline unsubscribe
related command in non-subscribed mode) causes a breaking change.
(https://github.com/valkey-io/valkey/issues/1228)

Although, in my opinion, calling the commands "unsubscribeCommand",
"sunsubscribeCommand", "punsubscribeCommand" in request-response mode
makes no sense. This is why I created PR
https://github.com/valkey-io/valkey/pull/759

But a breaking change is never good. @valkey-io/core-team what do you
think about reverting this PR's code changes?

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-11-07 20:05:16 -05:00
Binbin
1c18c80844
Fix incorrect cache_memory reset in functionsLibCtxClear (#1255)
functionsLibCtxClear should clear the provided lib_ctx parameter,
not the static variable curr_functions_lib_ctx, as the latter
contradicts the function's intended purpose.

The impact, I guess, is minor: in some unhappy paths (diskless load
fails, function restore fails?), we would mess up the functions_caches
field, which feeds the used_memory_functions / used_memory_scripts
fields in INFO.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-07 13:44:21 +08:00
Binbin
22bc49c4a6
Try to stabilize the failover call in the slot migration test (#1078)
The CI reports that the replica will return this error when performing
CLUSTER FAILOVER:
```
-ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
```

This may be because the primary state is fail or the cluster connection
is disconnected during the primary pause. In this PR, we added some
waits in wait_for_role: if the role is replica, we wait for the
replication link and the cluster link to be ok.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-07 13:42:20 +08:00
Binbin
a0b1cbad83
Change errno from EEXIST to EALREADY in serverFork if child process exists (#1258)
We set this to EEXIST in 568c2e039bac388003068cd8debb2f93619dd462;
it prints "File exists", which is not quite accurate. Change it to
EALREADY, which prints "Operation already in progress".

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-07 12:13:00 +08:00
Binbin
12c5af03b8
Remove empty DB check branch in KEYS command (#1259)
We don't think we really care about optimizing for the empty DB case,
which should be uncommon. Adding branches hurts branch prediction.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-06 10:32:00 +08:00
Amit Nagler
48ebe21ad1
fix: clean up refactoring leftovers (#1264)
This commit addresses issues that were likely introduced during a rebase
related to:
b0f23df165

Change dual channel replication state in main handler only

Signed-off-by: naglera <anagler123@gmail.com>
2024-11-05 04:57:34 -08:00
Madelyn Olson
3c32ee1bda
Add a filter option to drop all cluster packets (#1252)
A minor debugging change that helped in the investigation of
https://github.com/valkey-io/valkey/issues/1251. Basically there are
some edge cases where we want to fully isolate a node from receiving
packets, but can't suspend the process because we need it to continue
sending outbound traffic. So, added a filter for that.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-11-04 12:36:20 -08:00
Binbin
a102852d5e
Fix timing issue in cluster-shards tests (#1243)
The cluster-node-timeout is 3000 in our tests; the timing tests weren't
succeeding reliably, so extending the wait_for made them much more reliable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-11-02 19:51:14 +08:00
Jim Brunner
0d7b2344b2
correct type internal to kvstore (minor) (#1246)
All of the internal variables related to number of dicts in the kvstore
are type `int`. Not sure why these 2 items were declared as `long long`.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
2024-11-01 15:16:18 -07:00
zhenwei pi
e985ead7f9
RDMA: Prevent IO for child process (#1244)
RDMA MR (memory region) is not forkable; the VMA (virtual memory area)
of an MR becomes empty in a child process. Prevent IO in the child
process to avoid a server crash.

In the check for whether read and write are allowed on an RDMA
connection, a check for whether we're in a child process is added. If we
are, the function returns an error, which will cause the RDMA client to
be disconnected.
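
A minimal sketch of that check, with illustrative names (server_pid is
assumed to be recorded at startup; the real connection-type plumbing is
not reproduced here):

```
#include <sys/types.h>
#include <unistd.h>

static pid_t server_pid; /* assumed to be set to getpid() at startup */

/* RDMA MRs are not forkable, so refuse reads/writes in a forked child;
 * the caller then disconnects the client instead of crashing on an
 * empty VMA. */
static int rdma_io_allowed(void) {
    return getpid() == server_pid; /* false in a forked child process */
}
```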

Suggested-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-11-01 13:28:09 +01:00
Madelyn Olson
1c222f77ce
Improve performance of sdssplitargs (#1230)
The current implementation of `sdssplitargs` does repeated `sdscatlen`
calls to build the parsed arguments, which isn't very efficient because
it does a lot of extra reallocations and moves through the sds code a
lot. It also typically results in memory overhead, because `sdscatlen`
over-allocates, which is usually not needed since args are usually not
modified after being created.

The new implementation of sdssplitargs does two passes: the first to
parse the argument and figure out the final length, and the second to
actually copy the string. It's generally about 2x faster for larger
strings (~100 bytes), and about 20% faster for small strings (~10
bytes). Both passes are cheap: as long as everything is in the CPU
cache, they are going to be fast.
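
A minimal sketch of the two-pass idea (not the actual sdssplitargs code,
which also handles quoting and hex escapes): pass 1 computes the decoded
length of a token with backslash escapes, pass 2 copies into a buffer
allocated exactly once, avoiding repeated reallocation:

```
#include <stdlib.h>

static char *decode_token(const char *s, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) { /* pass 1: measure only */
        if (s[i] == '\\' && i + 1 < n) i++;
        out++;
    }
    char *tok = malloc(out + 1); /* single, exactly-sized allocation */
    if (!tok) return NULL;
    size_t j = 0;
    for (size_t i = 0; i < n; i++) { /* pass 2: copy the decoded bytes */
        if (s[i] == '\\' && i + 1 < n) i++;
        tok[j++] = s[i];
    }
    tok[j] = '\0';
    return tok;
}
```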

There are a couple of sanity tests (none existed before), as well as
some fuzzing which was used to find some bugs and also to do the
benchmarking. The original benchmarking code can be seen at
6576aeb86a.

```
test_sdssplitargs_benchmark - unit/test_sds.c:530] Using random seed: 1729883235
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.44%, new:13039us, old:29930us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.58%, new:12057us, old:27771us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 59.18%, new:9048us, old:22165us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 54.61%, new:12381us, old:27278us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 51.17%, new:16012us, old:32793us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 49.18%, new:16041us, old:31563us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.40%, new:12450us, old:29930us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.49%, new:13066us, old:30031us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.75%, new:12744us, old:30894us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 52.44%, new:16885us, old:35504us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.57%, new:8107us, old:21659us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.12%, new:8320us, old:21966us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 45.23%, new:13960us, old:25487us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 57.95%, new:9188us, old:21849us
```

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-31 11:37:53 -07:00
Masahiro Ide
91cbf77442
Eliminate snprintf usage at setDeferredAggregateLen (#1234)
to align with how we encode the length at `_addReplyLongLongWithPrefix`
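
A hedged sketch of the idea (the real change aligns the encoding with
`_addReplyLongLongWithPrefix`; this standalone helper is illustrative):
build "<prefix><len>\r\n" with a plain integer-to-ASCII loop instead of
snprintf, which is enough for the non-negative deferred lengths:

```
#include <stddef.h>

static size_t encode_agg_len(char *buf, char prefix, long long len) {
    size_t pos = 0, n = 0;
    char tmp[20];
    buf[pos++] = prefix; /* e.g. '*' for arrays, '%' for maps */
    do {
        tmp[n++] = (char)('0' + len % 10);
        len /= 10;
    } while (len > 0);
    while (n > 0) buf[pos++] = tmp[--n]; /* digits come out reversed */
    buf[pos++] = '\r';
    buf[pos++] = '\n';
    return pos; /* bytes written */
}
```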

Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
Co-authored-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
2024-10-31 11:30:05 -07:00
zhenwei pi
ab98f375db
RDMA: Delete keepalive timer on closing (#1237)
Typically, an RDMA connection gets closed by the client side; the server
side handles the disconnected CM event and deletes the keepalive timer
correctly. However, the server side may close a connection voluntarily,
for example when the maximum number of connections is exceeded. Handle
this case to avoid invalid memory access.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-30 11:12:42 +01:00
Binbin
789a73b0d0
Minor fix to debug logging in replicationFeedStreamFromPrimaryStream (#1235)
We should only print logs when hide-user-data-from-log is off.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-30 10:25:50 +08:00
Shivshankar
13f5f665f2
Update the argument of clusterNodeGetReplica declaration (#1239)
The clusterNodeGetReplica arguments were missed during the slave-to-replica
naming migration, so the argument is updated from slave to replica.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-30 00:19:56 +01:00
Madelyn Olson
5a4c0640ce
Mark main and serverAssert as weak symbols to be overridden (#1232)
At some point unit tests stopped building on MacOS because of duplicate
symbols. I had originally solved this problem by using a flag that
overrides symbols, but the much better solution is to mark the duplicate
symbols as weak so they can be overridden during linking. (Symbols are
strong by default; strong symbols override weak symbols.)
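
A hedged illustration of the technique (GCC/Clang attribute syntax): the
default definition is marked weak, so a unit-test binary can link its own
strong definition without duplicate-symbol errors:

```
/* Default entry point; a test build provides its own strong main that
 * overrides this weak one at link time. */
__attribute__((weak)) int main(int argc, char **argv) {
    (void)argc;
    (void)argv;
    return 0;
}
```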

I also added macos unit build to the CI, so that this doesn't silently
break in the future again.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-29 14:26:17 -07:00
zixuan zhao
8ee7a58025
Document log format configs in valkey.conf (#1233)
Add config options for log format and timestamp format introduced by
#1022
Related to #1225

This change adds two new configs into valkey.conf:
log-format
log-timestamp-format
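
For example, a hypothetical snippet showing the new options (values as
introduced by #1022):

```
log-format logfmt
log-timestamp-format iso8601
```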

---------

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2024-10-29 11:13:30 +01:00
Lipeng Zhu
c21f1dc084
Increase the IO_THREADS_MAX_NUM. (#1220)
### Description

This patch tries to increase the max number of io-threads from 16 (128)
to 256, for the reasons below:

1. The core count has increased a lot in modern server processors; for
example, the [Sierra
Forest](https://en.wikipedia.org/wiki/Sierra_Forest) processors are
targeted at up to **288** cores.
Due to the limitation on the **_io-threads_** number (16 and 128), a
benchmark like https://openbenchmarking.org/test/pts/valkey cannot even
run on a high-core-count server.

2. For some workloads the bottleneck could be the main thread, but for
other workloads, with big keys/values that cause heavy io, the bottleneck
could be the io-threads, for example the benchmark `memtier_benchmark -s
127.0.0.1 -p 9001 "--data-size" "20000" --ratio 1:0 --key-pattern P:P
--key-minimum=1 --key-maximum 1000000 --test-time 180 -c 50 -t 16
--hide-histogram`. The QPS is still scalable beyond 16 io-threads.

![image](https://github.com/user-attachments/assets/e980f805-a162-44be-b03e-ab37a9c489cf)
**Fig 1. QPS scale factor as the number of io-threads grows.**

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-10-27 22:43:23 -07:00
Binbin
5d2ff853a3
Fix minor repldbfd leak in updateReplicasWaitingBgsave if fstat fails (#1226)
In the old code, if fstat fails, replica->repldbfd will hold the
fd while we free the client. And in freeClient, we check and
close the fd only if repl_state == REPLICA_STATE_SEND_BULK. So if fstat
fails, we will leak the fd.

We could also extend freeClient to handle REPLICA_STATE_WAIT_BGSAVE_END
as well, but this seems to be a friendlier (and safer) way.
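
A minimal illustration of the pattern (not the actual replication code):
the error path owns the descriptor, so it must close it before bailing
out, otherwise the fd leaks exactly as described above:

```
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

static int open_and_stat(const char *path, struct stat *st) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    if (fstat(fd, st) == -1) {
        close(fd); /* close on the error path, or the fd leaks */
        return -1;
    }
    return fd; /* on success the caller owns the fd */
}
```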

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-27 15:23:00 +08:00
Shivshankar
4be09e434a
Fix typo in valkey.conf file's shutdown section (#1224)
Found typo "exists" ==> "exits" in valkey.conf in shutdown section.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-25 14:03:59 +02:00
Lipeng Zhu
9c60fcdae2
Do security attack check only when command not found to reduce the critical path (#1212)
When we explored the cycle distribution for the main thread with
io-threads enabled, we found this security attack check takes significant
time in the main thread: **~3%** of cycles were used to do the command
security check in the main thread.

This patch tries to completely avoid doing it in the hot path. We can do
it only after we looked up the command and it wasn't found, just before
we call commandCheckExistence.
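
An illustrative sketch with hypothetical names (the stubs stand in for
the real command table and attack detection): the costly check runs only
on the lookup-miss path, keeping it off the hot path entirely:

```
#include <stdbool.h>
#include <string.h>

static bool lookup_command(const char *name) {
    return strcmp(name, "GET") == 0; /* stand-in for the command table */
}

static void security_check(const char *name) {
    (void)name; /* stand-in for the expensive attack detection */
}

static bool dispatch(const char *name) {
    if (lookup_command(name)) return true; /* hot path: no check */
    security_check(name); /* miss path only, before the error reply */
    return false;         /* report unknown command */
}
```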

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-10-25 11:13:28 +02:00
zixuan zhao
55bbbe09a3
Configurable log and timestamp formats (logfmt, ISO8601) (#1022)
Add ability to configure log output format and timestamp format in the
logs.

This change adds two new configs:

* `log-format`: Either legacy or logfmt (See https://brandur.org/logfmt)
* `log-timestamp-format`: legacy, iso8601 or milliseconds (since the
epoch).

Related to #1006.

Example:

```
$ ./valkey-server  /home/zhaoz12/git/valkey/valkey/valkey.conf
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=warning message="WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect."
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo"
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Valkey version=255.255.255, bits=64, commit=affbea5d, modified=1, pid=109463, just started"
pid=109463 role=RDB/AOF timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="Configuration loaded"
pid=109463 role=master timestamp="2024-09-10T20:37:25.738-04:00" level=notice message="monotonic clock: POSIX clock_gettime"
pid=109463 role=master timestamp="2024-09-10T20:37:25.739-04:00" level=warning message="Failed to write PID file: Permission denied"
```

---------

Signed-off-by: azuredream <zhaozixuan67@gmail.com>
2024-10-25 00:36:32 +02:00
Binbin
2956367731
Maintain return value of rdbSaveDb after writing slot-info aux (#1222)
All the other write sites in this function maintain it. Although the
caller of rdbSaveDb does not rely on it, it is maintained here to be
consistent with the other places; that is its duty.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-24 09:53:05 -04:00
Binbin
a21fe718f4
Limit CLUSTER_CANT_FAILOVER_DATA_AGE log to 10 times period (#1189)
If a replica steps into the data_age-too-old stage, it cannot
trigger the failover, and currently it cannot be automatically
recovered, so we will print a log every
CLUSTER_CANT_FAILOVER_RELOG_PERIOD,
which is every second. If the primary has not recovered or there is
no manual failover, this log will flood the log file.

In this case, limit its frequency to 10 times the period, which is
10 seconds in our code. Also, in this data_age-too-old stage,
the repeated logs can still indicate the progress of the failover.

See also #780 for more details about it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-10-24 16:38:47 +08:00
muelstefamzn
c419524c05
Trim free space from inline command argument strings to avoid excess memory usage (#1213)
The command argument strings created while parsing inline commands (see
`processInlineBuffer()`) can contain free capacity. Since some commands,
such as `SET`, store these strings in the database, that free capacity
increases the memory usage. In the worst case, it could double the
memory usage.

This only occurs if the inline command format is used. The argument
strings are built by appending character by character in
`sdssplitargs()`. Regular RESP commands are not affected.

This change trims the strings within `processInlineBuffer()`.

### Why `trimStringObjectIfNeeded()` within `object.c` is not solving
this?

When the command argument string is packed into an object,
`trimStringObjectIfNeeded()` is called.

This only trims the string if it is larger than
`PROTO_MBULK_BIG_ARG` (32kB), as only strings larger than this would
ever need trimming if the command is sent using the bulk string format.

We could modify this condition, but that would potentially have a
performance impact on commands using the bulk format. Since those make
up for the vast majority of executed commands, limiting this change to
inline commands seems prudent.

### Experiment Results

* 1 million `SET [key] [value]` commands
* Random keys (16 bytes)
* 600 bytes values

Memory usage without this change:

```
used_memory:1089327888
used_memory_human:1.01G
used_memory_rss:1131696128
used_memory_rss_human:1.05G
used_memory_peak:1089348264
used_memory_peak_human:1.01G
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:1040025088
used_memory_dataset_perc:95.55%
```

Memory usage with this change:
```
used_memory:705327888
used_memory_human:672.65M
used_memory_rss:718802944
used_memory_rss_human:685.50M
used_memory_peak:705348256
used_memory_peak_human:672.67M
used_memory_peak_perc:100.00%
used_memory_overhead:49302800
used_memory_startup:911808
used_memory_dataset:656025088
used_memory_dataset_perc:93.13%
```

If the same experiment is repeated using the normal RESP array of bulk
string format (`*3\r\n$3\r\nSET\r\n...`) then the memory usage is 672MB
with and without of this change.

If a replica is attached, its memory usage is 672MB with and without
this change, since the replication link never uses inline commands.

Signed-off-by: Stefan Mueller <muelstef@amazon.com>
2024-10-23 16:56:32 -07:00
danish-mehmood
c176de4251
Clarify the wording from dually to the more common doubly (#1214)
Clarify documentation in ziplist.c.

Signed-off-by: danish-mehmood <rdm355190@gmail.com>
2024-10-23 14:30:42 -07:00
Binbin
b803f7aeff
Cleaned up getSlotOrReply to return -1 instead of C_ERR (#1211)
Minor cleanup, since getSlotOrReply returns -1 on error, not C_ERR.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-23 17:11:42 +08:00
Binbin
5d70ccd70e
Make replica CLUSTER RESET flush async based on lazyfree-lazy-user-flush (#1190)
Currently, if the replica has a lot of data, CLUSTER RESET
will block for a while and show up in the slowlog. It seems
there is no harm in making it async, so it is easier for external
components to monitor it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-10-23 10:22:25 +08:00
Shivshankar
285064b114
fix typo (#1202)
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-21 22:54:40 -04:00
Shivshankar
771918e4bf
Updating command.def by running the generate-command-code.py (#1203)
Part of the https://github.com/valkey-io/valkey/pull/1200 PR, since a
field was changed. It looks like commands.def was not regenerated based
on the changes, which is causing the CI failure on unstable.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-21 13:48:29 -07:00
Viktor Söderqvist
5885dc56bd
Fix BGSAVE CANCEL since and history fields (#1200)
Fixes wrong "since" and "history" introduced in #757.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-21 16:04:47 +02:00
ranshid
29b83f1ac8
Introduce bgsave cancel (#757)
In some cases the bgsave child process can run for a long time,
exhausting system resources. Although it is possible to kill the bgsave
child process from the system shell, sometimes OS-level access is not
possible.

This PR adds a new subcommand to the BGSAVE command.
When a user issues `BGSAVE CANCEL`, it will do one of two things:

1. In case a bgsave child process is currently running, the child
   process would be immediately killed thus terminating any
   save/replication full sync process.
2. In case a bgsave child process is SCHEDULED to run, the scheduled
   execution will be cancelled.

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-21 11:56:44 +02:00
zhenwei pi
71f8c34eed
RDMA: Fix listener priv opaque pointer (#1194)
struct connListener.priv should be used for connection-type specific
data; static local listener data should not use it.

An RDMA config structure is going to be introduced in the next step:

```
typedef struct serverRdmaContextConfig {
    char *bindaddr;
    int bindaddr_count;
    int port;
    int rx_size;
    int comp_vector;
    ...
} serverRdmaContextConfig;
```

Then a built-in RDMA will be supported.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-21 10:11:27 +02:00
Binbin
2743b7e04b
Fix SORT GET to ignore special pattern # in cluster slot check (#1182)
This special pattern '#' is used to get the element itself,
it does not actually participate in the slot check.

In this case, passing `GET #` will cause '#' to participate
in the slot check, causing the command to get a
`pattern may be in different slots` error.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-19 14:56:10 +08:00
zhenwei pi
64cfdf61eb
Introduce connection context for Unix socket (#1160)
Hide 'unixsocketgroup' and 'unixsocketperm' in a Unix-socket-specific
data structure. A single opaque pointer 'void *priv' is enough for a
listener; once any new config is added, we don't need 'void *priv2',
'void *priv3' and so on.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-18 17:48:18 -07:00
Lipeng Zhu
a62d1f177b
Fix false sharing issue between main thread and io-threads when access used_memory_thread. (#1179)
When profiling some workloads with `io-threads` enabled, we found the
false sharing issue is heavy.

This patch tries to split the elements accessed by the main thread and
the io-threads into different cache lines by padding the elements at the
head of the `used_memory_thread_padded` array.

This design helps mitigate the false sharing between the main thread
and the io-threads, because the main thread has been the bottleneck
with io-threads enabled. The reason we didn't put each element in an
individual cache line is that we don't want to bring additional
cache-line fetch operations (3 vs 16 cache lines) when calling functions
like `zmalloc_used_memory()`.
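
A hedged sketch of the layout (names and sizes are illustrative): only
the slot at the head of the array is padded onto its own cache line,
while the io-thread counters stay packed so a sum over them still
touches only a few cache lines:

```
#include <stddef.h>

#define CACHE_LINE_SIZE 64
#define IO_THREADS_MAX_NUM 16

typedef struct {
    size_t main_thread_used;                    /* written by main thread */
    char pad[CACHE_LINE_SIZE - sizeof(size_t)]; /* split the cache line */
    size_t io_thread_used[IO_THREADS_MAX_NUM];  /* written by io-threads */
} used_memory_thread_t;
```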

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Signed-off-by: Lipeng Zhu <zhu.lipeng@outlook.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-17 12:37:10 +02:00
Binbin
701ab72429
Remove the restriction that cli --cluster create requires at least 3 primary nodes (#1075)
There is no limitation in Valkey to create a cluster with 1 or 2
primaries, only that it cannot do automatic failover. Remove this
restriction and add an `are you sure` prompt for the user.

This allows us to use it to create a test cluster with the cli or with
create-cluster.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-17 13:33:44 +08:00
Nadav Levanoni
136d0fd212
Add 'WithDictIndex' expiry API and update RANDOMKEY command (#1155)
https://github.com/valkey-io/valkey/issues/1145

First part of a two-step effort to add `WithSlot` API for expiry. This
PR is to fix a crash that occurs when a RANDOMKEY uses a different slot
than the cached slot of a client during a multi-exec.

The next part will be to utilize the new API as an optimization to
prevent duplicate work when calculating the slot for a key.

---------

Signed-off-by: Nadav Levanoni <nadavl@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Nadav Levanoni <nadavl@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-16 17:40:11 -07:00
zarkash-aws
06cfe2c254
Improved hashing algorithm in luaS_newlstr (#1168)
**Overview**

This PR introduces the use of
[MurmurHash3](https://en.wikipedia.org/wiki/MurmurHash) as the hashing
function for Lua's luaS_newlstr function, replacing the previous simple
hash function. The change aims to improve performance, particularly for
large strings.

**Changes**

Implemented MurmurHash3 algorithm in lstring.c
Updated luaS_newlstr to use MurmurHash3 for string hashing
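
For reference, a sketch of MurmurHash3 x86_32 (the public-domain
algorithm by Austin Appleby); the exact integration into lstring.c is
not reproduced here:

```
#include <stddef.h>
#include <stdint.h>

static uint32_t murmur3_32(const uint8_t *key, size_t len, uint32_t seed) {
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    uint32_t h = seed, k = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4) { /* body: mix 4 bytes at a time */
        k = (uint32_t)key[i] | ((uint32_t)key[i + 1] << 8) |
            ((uint32_t)key[i + 2] << 16) | ((uint32_t)key[i + 3] << 24);
        k *= c1; k = (k << 15) | (k >> 17); k *= c2;
        h ^= k; h = (h << 13) | (h >> 19); h = h * 5 + 0xe6546b64;
    }
    k = 0; /* tail: the remaining 0-3 bytes */
    switch (len & 3) {
    case 3: k ^= (uint32_t)key[i + 2] << 16; /* fall through */
    case 2: k ^= (uint32_t)key[i + 1] << 8; /* fall through */
    case 1:
        k ^= (uint32_t)key[i];
        k *= c1; k = (k << 15) | (k >> 17); k *= c2; h ^= k;
    }
    h ^= (uint32_t)len; /* finalization: avalanche the bits */
    h ^= h >> 16; h *= 0x85ebca6b; h ^= h >> 13;
    h *= 0xc2b2ae35; h ^= h >> 16;
    return h;
}
```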

**Performance Testing:**
Test Setup:

1. Ran a valkey server
2. Loaded 1000 keys with large values (100KB each) to the server using a
Lua script
```
local numKeys = 1000

for i = 1, numKeys do
    local key = "large_key_" .. i
    local largeValue = string.rep("x", 1024*100)
    redis.call("SET", key, largeValue)
end
```
3. Used a Lua script to randomly select and retrieve keys
```
local randomKey = redis.call("RANDOMKEY")
local result = redis.call("GET", randomKey)
```
4. Benchmarked using valkey-benchmark:
`./valkey-benchmark -n 100000 evalsha
c157a37967e69569339a39a953c046fc2ecb4258 0`

Results:

| Metric | Unstable | This PR | Change |
| -- | -- | -- | -- |
| Throughput | 6,835.74 requests per second | 17,061.94 requests per second | **+150% increase** |
| Avg Latency | 7.218 ms | 2.838 ms | **-61% decrease** |
| Min Latency | 3.144 ms | 1.320 ms | **-58% decrease** |
| P50 Latency | 8.463 ms | 3.167 ms | **-63% decrease** |
| P95 Latency | 8.863 ms | 3.527 ms | **-60% decrease** |
| P99 Latency | 9.063 ms | 3.663 ms | **-60% decrease** |
| Max Latency | 63.871 ms | 55.327 ms | **-13% decrease** |

Summary:
* Throughput: Improved by 150%.
* Latency: Significant reductions in average, minimum, and percentile
latencies (P50, P95, P99), leading to much faster response times.
* Max Latency: Slightly decreased by 13%, indicating fewer outlier
delays after the fix.

---------

Signed-off-by: Shai Zarka <zarkash@amazon.com>
Signed-off-by: zarkash-aws <zarkash@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-15 15:18:58 -07:00
Shivshankar
b927fb09d4
Remove 'posting in the mailing list' in CONTRIBUTING.md (#1174)
Remove reference to "the mailing list". We don't have a mailing list.
2024-10-15 23:03:27 +02:00
Amit Nagler
b0f23df165
Refactor return and goto statements (#945)
Consolidate the cleanup of local variables to a single point within the
method, ensuring proper resource management and preventing memory leaks
or double-free issues.

Previously discussed here:
- https://github.com/valkey-io/valkey/pull/60#discussion_r1667872633
- https://github.com/valkey-io/valkey/pull/60#discussion_r1668045666

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Amit Nagler <58042354+naglera@users.noreply.github.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-10-15 09:26:42 -07:00
Binbin
247a8f23c5
Fix FUNCTION KILL error message being displayed as SCRIPT KILL (#1171)
The client that was killed by FUNCTION KILL received a reply of
SCRIPT KILL and the server log also showed SCRIPT KILL.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 23:32:42 +08:00
Binbin
dc05a327f9
Take hz into account in activerehashing to avoid CPU spikes (#977)
Currently in the conf we describe activerehashing as: Active rehashing
uses 1 millisecond every 100 milliseconds of CPU time. This is the
case for hz = 10.

If we change hz, the description in the conf becomes inaccurate. Users
may notice that the server spends some CPU (used in activerehashing)
at high hz but don't know why, since our cron calls are fixed at 1ms.

This PR takes hz into account and fixes the CPU usage at 1% (this may
not be accurate in some cases because we do 100-step rehashing in
dictRehashMicroseconds, but it can avoid CPU spikes in this case).

This PR also improves the description of the activerehashing
configuration item to explain this change.
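
A hedged sketch of the arithmetic: with a fixed 1ms budget per cron run,
hz = 10 means 1ms per 100ms (1% CPU), but hz = 100 would mean 1ms per
10ms (10% CPU); scaling the budget by hz pins rehashing at ~1%:

```
static int rehash_budget_us(int hz) {
    int period_ms = 1000 / hz;     /* cron period: hz=10 -> 100ms */
    return period_ms * 1000 / 100; /* 1% of the period, in microseconds */
}
```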

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-15 23:32:22 +08:00
Romain Geissler @ Amadeus
e30ae762a8
Rename z{malloc,calloc,realloc,free} into valkey_{malloc,calloc,realloc,free} (#1169)
The zcalloc symbol is a symbol name already used by zlib, which is
defining other names using the "z" prefix specific to zlib. In practice,
linking valkey with a static openssl, which itself might depend on a
static libz, will result in a link-time error rejecting multiple symbol
definitions.

Fixes: #1157

Signed-off-by: Romain Geissler <romain.geissler@amadeus.com>
2024-10-15 13:05:22 +02:00
Binbin
416defdc0e
Minor cleanups in acl-v2 tests (#1166)
1. Make sure to assert the ERR prefix.
2. Match "Syntax error*" in case of the message change.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 10:30:03 +08:00
Binbin
87b5e13465
Use listLast to replace listIndex -1 (#1163)
Minor cleanup: listLast does the same thing, is widely used,
and is easier to understand (less code).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 10:29:52 +08:00
Binbin
9c20c84251
Set fail-fast to false in daily CI (#1162)
Currently in our daily, if a job fails, it will cancel the other jobs
in the same matrix, we want to avoid this so that all jobs in a matrix
can eventually run to completion.

Docs: jobs.<job_id>.strategy.fail-fast applies to the entire matrix.
If jobs.<job_id>.strategy.fail-fast is set to true or its expression
evaluates to true, GitHub will cancel all in-progress and queued jobs
in the matrix if any job in the matrix fails. This property defaults
to true.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-15 10:29:34 +08:00
ranshid
36d438ba27
Deflake test sync should continue if not all slaves dropped dual-channel-replication (#1164)
Sometimes when dual-channel is turned off, the tested replica might
disconnect on a COB overrun. Disable the replica COB limit in order to
prevent such cases.

Fixes: #1153

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-10-14 15:31:59 +08:00
ranshid
597aa037cc
Deflake test Primary COB growth with inactive replica (#1165)
In case of a valgrind run, the replica might get disconnected from the
primary due to repl-timeout being reached. The fix is to configure a
larger timeout for valgrind test runs.

**Partially** fixes: #1152

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-10-14 15:30:29 +08:00
Binbin
1a5c80fe90
Minor comments cleanup around replication.c (#1154)
Typo, comment cleanups.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-14 12:37:19 +08:00
Binbin
e50f31ef3a
Fix aof race in shutdown nosave timedout script test (#1156)
CI reported this failure:
```
*** [err]: SHUTDOWN NOSAVE can kill a timedout script anyway in tests/unit/scripting.tcl
Expected 'BUSY Valkey is busy running a script. *' to match '*connection refused*' (context: type eval line 8 cmd {assert_match {*connection refused*} $e} proc ::test)
```

We can see in the logs that the shutdown got rejected because there is
an AOFRW pending:
```
Writing initial AOF, can't exit.
Errors trying to shut down the server. Check the logs for more information.
```

The reason is that the previous test enabled the aof.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-13 22:06:28 +08:00
Masahiro Ide
262d970a50
Move prepareClientToWrite out of loop for HGETALL command (#1119)
Similar to #860 but this is for HGETALL families (HGETALL/HKEYS/HVALS).
This patch moves `prepareClientToWrite` out of the loop to reduce the
function overhead.

Signed-off-by: Masahiro Ide <imasahiro9@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-11 21:28:42 -07:00
Shivshankar
ef971a34eb
Correct the note details for deprecated config 'io-threads-do-reads' (#1150)
Remove explicit reference to removal and just indicate to avoid using it.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-11 21:21:09 -07:00
Binbin
014219879d
Fix typo last_procssed -> last_processed (#1142)
Minor typo.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-11 00:09:22 +08:00
Shivshankar
079f18ad97
Add io-threads-do-reads config to deprecated config table to have no effect. (#1138)
this fixes: https://github.com/valkey-io/valkey/issues/1116

_Issue details from #1116 by @zuiderkwast_ 

> This config is undocumented since #758. The default was changed to
"yes" and it is quite useless to set it to "no". Yet, it can happen that
some user has an old config file where it is explicitly set to "no". The
result will be bad performace, since I/O threads will not do all the
I/O.
> 
> It's indeed confusing.
> 
> 1. Either remove the whole option from the code. And thus no need for
documentation. _OR:_
> 2. Introduce the option back in the configuration, just as a comment
is fine. And showing the default value "yes": `# io-threads-do-reads
yes` with additional text.
> 
> _Originally posted by @melroy89 in [#1019 (reply in
thread)](https://github.com/orgs/valkey-io/discussions/1019#discussioncomment-10824778)_

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-10 17:46:09 +02:00
Roshan Khatri
9b8a06137c
Fix empty response for ACL CAT category subcommand for module defined categories (#1140)
The module commands which were added to acl categories were getting
skipped when `ACL CAT category` command was executed.

This PR fixes the bug.
Before:
```
127.0.0.1:6379> ACL CAT foocategory
(empty array)
```
After:
```
127.0.0.1:6379> ACL CAT foocategory
aclcheck.module.command.test.add.new.aclcategories
```

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
2024-10-09 21:20:47 -07:00
kronwerk
cd8de095c4
Add flush-before-load option for repl-diskless-load (#909)
A new option for diskless replication on the replica side.

After a network failure, the replica may need to perform a full sync.
The other option for diskless full sync is `swapdb`, but it uses twice
as much memory, temporarily. In situations where this is not acceptable,
and where losing data is acceptable, the `flush-before-load` can be
useful. If the full sync fails, the old data is lost though. Therefore,
the new option is marked as "dangerous".

---------

Signed-off-by: kronwerk <ca11e5e22g@gmail.com>
Signed-off-by: kronwerk <kronwerk@users.noreply.github.com>
Co-authored-by: kronwerk <ca11e5e22g@gmail.com>
2024-10-09 13:11:53 +02:00
Binbin
1892f8a731
Add server log when module load fails with busy name (#1084)
Currently when module loading fails due to a busy name, we
don't have a clean way to assist troubleshooting.

Case 1: when loading the same module multiple times, we
cannot determine the cause of the failure without referring to
the module list or the earliest module load log. The log
may not exist, and sometimes it is difficult for people
to associate it with the module list.

Case 2: when multiple modules use the same module name,
we cannot quickly associate the busy name without referring
to the module list and the earliest module load log.
Different people write modules with the same module name,
and it is not easy to associate them by module name alone.

So in this PR, when doing module onload, we try to
print a busy-name log if this happens. Currently we check
ctx.module, since if it is NULL it means the Init call
failed, and Init currently only fails on a busy name.

It's kind of ugly. It would have been nice if we could have had a
better way for onload to signal why the load failed.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-09 16:10:29 +08:00
chx9
cba8eaf4c9
fix typo (#1136)
Signed-off-by: chx9 <cheng.huan@icloud.com>
2024-10-08 08:07:51 -07:00
Madelyn Olson
e617bf2ddc
Removing incorrect comment about a warning (#1132)
There is a lot of bad legacy usage of `default:` with enums, which is an
anti-pattern. If you omit the default, the compiler will tell you if a
new enum value was added and that it is missing from a switch statement.

Someone mentioned on another PR they used `default:` because of this
warning, so just removing it, but might create an issue to do a wider
cleanup.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-07 11:56:15 -07:00
Masahiro Ide
b5eb793079
Eliminate hashTypeIterator memory allocation by assigning it on stack (#1105)
Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
Signed-off-by: Masahiro Ide <imasahiro9@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
2024-10-06 21:34:45 +02:00
otheng
a1cc7c263a
Reuse obey_client variable in processCommand() function (#1101)
I’ve prepared a minor fix for the `processCommand()` function.

In `processCommand()`, the `obey_client` variable is created, but some
conditional statements call the `mustObeyClient()` function instead of
reusing `obey_client`.

I’ve modified these statements to reuse `obey_client`.

Since I’m relatively new to Redis, please let me know if there are any
reasons why the conditional statements need to call `mustObeyClient()`
again.

Thank you for taking the time to review my PR.

Signed-off-by: otheng03 <07c00h@gmail.com>
2024-10-06 10:40:58 -07:00
Viktor Söderqvist
00c97979d9
Make ./runtest --dump-logs dump logs on crash (#1117)
Until now, this flag only dumped logs on a failed assert in a test case.
It is useful for this flag to dump logs on a crash as well.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-06 10:40:36 -07:00
Shivshankar
0c49053214
Adding the "-j" option in ci make commands to parallelize CI builds (#1128)
fixes: https://github.com/valkey-io/valkey/issues/1123 

As per the GitHub documentation, below is the core information on runners.

**Linux:**
public repositories: 4 cores
private repositories: 2 cores

**Macos:**
it's 3 or 4 cores, depending on the processor.

**Reference details for more information:** Discussion in
https://github.com/valkey-io/valkey/issues/1123

- Public repo:
https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

- Private repo:
https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for--private-repositories

Suggested-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-05 21:12:07 -07:00
zhenwei pi
b96f8813b7
Add tags into .gitignore (#1125)
ctags is widely used on Linux platforms, so add tags to .gitignore.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-05 10:03:57 +02:00
zhenwei pi
23ae21244e
RDMA: use protected mode for test (#1124)
Since a7cbca40661 ("RDMA: Support .is_local method (#1089)"),
valkey-server supports auto-detecting local connections, so we can
use protected mode with a local RDMA device for tests.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-10-04 23:22:48 +02:00
Shivshankar
c8aaceed46
Correct the typo in valkey.conf file (#1118)
Correct the typo in valkey.conf file

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-04 13:30:59 -07:00
Parth
d8cd3527bf
Removing Redis from internal lua function names and comments (#1102)
Improved documentation and readability of lua code as well as removed references to Redis.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-10-04 12:58:42 -07:00
Shivshankar
1c22680fa7
Include second solo test execution in total test count (#1071)
This change counts both solo test executions to give an accurate total number of tests being run.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-10-04 10:19:44 -07:00
Ricardo Dias
6a8540cefe
Fix some uninitialized fields in client struct (#1126)
This commit adds initialization code for the fields
`io_last_reply_block` and `io_last_bufpos` of the `client` struct.

While in the current code flow, these fields are only accessed after
being written in the `trySendWriteToIOThreads`, I discovered that they
were not being initialized while doing some changes to the code flow of
IO threads.

I believe it's good practice to initialize all fields of a struct upon
creation; it will avoid future bugs which are usually hard to debug.

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2024-10-04 09:17:49 -07:00
Viktor Söderqvist
dcac3e1499
Fix undefined-sanitizer warning in rax test (#1122)
Fix the warning introduced in #688:

```
unit/test_rax.c:168:15: runtime error: left shift of 36625 by 16 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior unit/test_rax.c:168:15 in 
Fuzz test in mode 1 [7504]: 
```
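
A hedged sketch of the class of fix: left-shifting a value like 36625 by
16 in a signed int can hit the sign bit, which is undefined behavior;
doing the shift in an unsigned type is well defined for all inputs:

```
#include <stdint.h>

static uint32_t pack16(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo; /* unsigned shift: no UB */
}
```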

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-03 17:34:03 +02:00
Madelyn Olson
150c197bdd
Apply CVE patches for CVE-2024-31449, CVE-2024-31227, CVE-2024-31228 (#1115)
Applying the CVEs against mainline.

(CVE-2024-31449) Lua library commands may lead to stack overflow and
potential RCE.
(CVE-2024-31227) Potential Denial-of-service due to malformed ACL
selectors.
(CVE-2024-31228) Potential Denial-of-service due to unbounded pattern
matching.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-10-02 19:22:09 -04:00
Melroy van den Berg
b77440a9b9
Build binary releases with systemd support (#1107)
- Add systemd support to the build artifact tarballs, so people can use
it under systemd compatible distros. As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10815549.
Adding `libsystemd-dev` to install and add `USE_SYSTEMD=yes` to the
build.
- Cleanup & bring the arm & x86 workflow files in-sync. It was a bit of
a mess ;) (removing `jq wget awscli` from the 'Tarball' step)

Signed-off-by: Melroy van den Berg <melroy@melroy.org>
2024-10-02 19:48:54 +02:00
Melroy van den Berg
43c80a2860
Avoid .c, .d and .o files from being copied to the binary tar.gz releases (#1106)
As discussed here:
https://github.com/orgs/valkey-io/discussions/1103#discussioncomment-10814006

`cp` can't be used anymore; `rsync` is more powerful and allows
excluding files.

Alternatively:

1. Remove the c, d and o files. Which isn't ideal either.
2. Improve the build. Eg. by building inside a `build` directory instead
of in the src folder.

Ps. I know these workflows aren't triggered in this PR; they only run
via the "Build Release Packages" workflow action:
https://github.com/valkey-io/valkey/actions/workflows/build-release-packages.yml..
So I can't fully test in this PR. But it should work ^^

Ps. ps. I did test `rsync -av --exclude='*.c' --exclude='*.d'
--exclude='*.o' src/valkey-*` command in isolation and that works as
expected!

---------

Signed-off-by: Melroy van den Berg <melroy@melroy.org>
2024-10-02 19:43:34 +02:00
Guillaume Koenig
f85d8bfde9
Rax size tracking (#688)
Introduce a `size_t` field into the rax struct to track allocation size.
Update the allocation size on rax inserts and deletes.
Return the allocation size when `raxAllocSize` is called.

This size tracking is now used in MEMORY USAGE and MEMORY STATS in place
of the previous method based on sampling.

The module API allows to create sorted dictionaries, which are backed by
rax. Users now also get precise memory allocation for them (through
`ValkeyModule_MallocSizeDict`).

Fixes #677.

For the release notes:

* MEMORY USAGE and MEMORY STATS are now exact for streams, rather than
based on sampling.

---------

Signed-off-by: Guillaume Koenig <knggk@amazon.com>
Signed-off-by: Guillaume Koenig <106696198+knggk@users.noreply.github.com>
Co-authored-by: Joey <yzhaon@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-10-02 19:28:55 +02:00
Binbin
9827eef4d0
Avoid timing issue in diskless-load-swapdb test (#1077)
Since we paused the primary node earlier, the replica may enter the
cluster-down state due to a primary node pfail. Here we set allow read
to prevent subsequent read errors.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-10-01 13:14:30 -07:00
Wen Hui
613e4e028f
Update keyspace notifications link to valkey.io in code comment (#1100)
As the title describes.


![image](https://github.com/user-attachments/assets/655324e6-b042-4c2f-b558-b912a7d2c10c)

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-10-01 04:30:35 -04:00
Masahiro Ide
ac569c09f8
Create empty lua tables with specified initial capacity as much as possible (#1092)
Currently, we create a Lua table without an initial capacity even when
the capacity is known. As a result, we need to resize the Lua tables
repeatedly when converting a RESP-serialized object to a Lua object,
which consumes a bit of extra cpu resources when we need to transfer
RESP-serialized data to the Lua world.

This patch tries to remove this extra resizing to reduce (re-)allocation
overhead.
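
A hedged sketch of the idea (the helper and its integration are
illustrative): when the element count is known, lua_createtable()
preallocates the array part so the fill loop never triggers a resize:

```
#include <lua.h>

static void push_string_array(lua_State *L, const char **v, int n) {
    lua_createtable(L, n, 0); /* array part sized up front */
    for (int i = 0; i < n; i++) {
        lua_pushstring(L, v[i]);
        lua_rawseti(L, -2, i + 1); /* t[i+1] = v[i], no rehash */
    }
}
```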

| name | unstable bb57dfe6303 (rps) | this patch(rps) | improvements |
| --------------- | -------- | --------- | -------------- |
| evalsha - hgetall h1 | 60565.68 | 64487.01 |  6.47% |
| evalsha - hgetall h10 | 47023.41 | 50602.17 | 7.61% |
| evalsha - hgetall h25 | 33572.82 | 37345.48 | 11.23% |
| evalsha - hgetall h50 | 24206.63 | 25276.14 | 4.42% |
| evalsha - hgetall h100 | 15068.87 | 15656.8 | 3.90% |
| evalsha - hgetall h300 | 5948.56 | 6094.74 | 2.46% |

Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
Co-authored-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
2024-09-30 20:59:22 -07:00
Viktor Söderqvist
69eddb4874
Speed up AOF rewrite test case (#1093)
These two test cases run in a loop:

* AOF rewrite during write load: RDB preamble=yes
* AOF rewrite during write load: RDB preamble=no

Both of the test cases build up a lot of data (3-4 million keys when I
run locally), so we should empty the data before the second test case.
Otherwise, the second test case adds keys on top of the keys added in
the first test case, resulting in double the number of keys and taking
more time.

Before this commit:

    [ok]: AOF rewrite during write load: RDB preamble=yes (18225 ms)
    [ok]: AOF rewrite during write load: RDB preamble=no (37249 ms)

After:

    [ok]: AOF rewrite during write load: RDB preamble=yes (18777 ms)
    [ok]: AOF rewrite during write load: RDB preamble=no (19940 ms)

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-30 19:55:23 +02:00
ranshid
c873287d16
avoid double close on replica main channel (#1097)
fixes #1088

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-09-30 07:41:05 -07:00
zhenwei pi
a7cbca4066
RDMA: Support .is_local method (#1089)
There is no ethernet-style virtual device (like lo 127.0.0.1) for RDMA;
however, a connection with the same local address and peer address is
considered local.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-09-30 11:54:05 +02:00
chx9
bb57dfe630
Fix typo in test_helper.tcl (#1080)
Fix typo in test_helper.tcl: even driven => event driven

Signed-off-by: chx9 <cheng.huan@icloud.com>
2024-09-28 11:48:35 +08:00
Shivshankar
a37dee4b3a
Change return value of aeTimeProc callback function to long long. (#1057)
moduleTimerHandler is an aeTimeProc handler, and the event loop gets
created with it. However, I found that the function's return type is int
while it actually returns a "long long" value (i.e., next_period). The
return value is assigned to an int variable in processTimeEvents (where
time events are processed), which might cause an overflow of the timer
values. So I changed the return type of the function to long long, and
also updated the other callback function return types to be consistent.

I found this when I was checking the functions reported in the
https://github.com/valkey-io/valkey/issues/1054 issue stacktrace. (FYI,
this just updates the return types to be consistent; it is not the fix
for the issue reported.)

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-27 12:20:47 -07:00
Binbin
bf8183d065
Add --cluster option to runtest to run only cluster tests (#1052)
Currently cluster tests in unit/cluster are run as part of
./runtest. Sometimes we change the cluster code and only
want to run cluster tests. This PR adds a --cluster option
to runtest so that we can run only the cluster tests.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-26 10:31:57 +08:00
zhenwei pi
983bb5110d
Fix RDMA build dependence (#1074)
RDMA module has dependence on '$(SERVER_NAME)' rather than the old style
'$(REDIS_SERVER_NAME)'.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-09-25 11:30:45 +02:00
Viktor Söderqvist
99865b197c
Fix bug for CLUSTER SLOTS from EVAL over TLS (#1072)
For fake clients like the ones used for Lua and modules, we don't
determine TLS in the right way, causing CLUSTER SLOTS from EVAL over TLS
to fail a debug-assert.

This error was introduced when the caching of CLUSTER SLOTS was
introduced, i.e. in 8.0.0.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-25 03:55:53 -04:00
Mikhail Koviazin
6b3a90e40e
Added new reformat commit to .git-blame-ignore-revs (#1073)
Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-09-25 15:34:36 +08:00
Binbin
80fcbd3fec
Fix module / script call CLUSTER SLOTS / SHARDS fake client check crash (#1063)
The reason is that VM_Call will use a fake client without a connection,
so we also need to check if c->conn is NULL.

This also affects scripts: if these commands are called in a script, the
server will crash. Injecting the commands into the AOF will also cause
a startup failure.

Fixes #1054.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-25 14:50:48 +08:00
Binbin
6e0216471d
Trigger the election as soon as possible when doing a forced manual failover (#1067)
In the CLUSTER FAILOVER FORCE case, we set mf_can_start to
1 and wait for a cron to trigger the election. We can also set the
CLUSTER_TODO_HANDLE_MANUALFAILOVER flag so that we
can start the election as soon as possible instead of waiting for
the cron, avoiding a delay of up to 100ms (clusterCron).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-25 12:08:48 +08:00
Mikhail Koviazin
af811748e7
clang-format: set ColumnLimit to 0 and reformat (#1045)
This commit hopefully improves the formatting of the codebase by setting
ColumnLimit to 0 and hence stopping clang-format from trying to put as
much stuff in one line as possible.

This change enabled us to remove most of `clang-format off` directives
and fixed a bunch of lines that looked like this:

```c
#define KEY \
    VALUE /* comment */
```

Additionally, one pair of `clang-format off` / `clang-format on` had
`clang-format off` as the second comment and hence didn't enable the
formatting for the rest of the file. This commit addresses this issue as
well.

Please tell me if anything in the changes seem off. If everything is
fine, I will add this commit to `.git-blame-ignore-revs` later.

---------

Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
2024-09-25 01:22:54 +02:00
Binbin
6ce75cdea8
Fix replica online timing issue in failover test (#1044)
Ci reported this failure:
```
[exception]: Executing test client: ERR FAILOVER target replica is not online..
ERR FAILOVER target replica is not online.
    while executing
"$node_0 failover to $node_1_host $node_1_port"
```

We can see that somehow the replica is not online in time, causing
this failure, so a verify_replica_online was added to make
sure the replica is online for the test.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-23 17:35:02 +08:00
Ricardo Dias
c15eee3407
Changes tcmalloc.h header location (#1039)
This commit changes the `tcmalloc.h` header location from the deprecated
location `google/` to `gperftools/`.

**Why we're doing this now?**

The location `google/tcmalloc.h` has been deprecated for more than 10
years in favor of `gperftools/tcmalloc.h`, and the deprecated location
will be removed in the next release of gperftools.

Fixes #1033

Signed-off-by: Ricardo Dias <ricardo.dias@percona.com>
2024-09-23 10:23:48 +02:00
Binbin
56fba564b6
Print an empty primary log when primary lost its last slot (#1064)
The one in CLUSTER SETSLOT helps us keep track of the state better;
of course, it also makes the test case happy.

The one in the gossip process fixes a problem where a replica could
print a log saying it is an empty primary.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-09-23 13:14:09 +08:00
Binbin
d07c29791a
Use _Thread_local to solve threads.h build issue (#1053)
Apparently this will fail to compile on some macOS versions,
and the internet claims _Thread_local is portable.

Fixes #1051.
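
A small illustration: C11 _Thread_local needs no <threads.h> header,
which is why it builds on SDKs that lack threads.h:

```
#include <stdio.h>

static _Thread_local int per_thread_counter;

int main(void) {
    per_thread_counter++; /* each thread sees its own copy */
    printf("%d\n", per_thread_counter);
    return 0;
}
```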

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-22 20:20:55 +08:00
Shivshankar
56c90b78e3
Fix a typo in the valkey.conf (#1048)
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-21 21:22:39 +08:00
Binbin
ea7a7995ed
Fix default value of primary-reboot-down-after-period in sentinel.conf (#1040)
Since the monitor value here is mymaster, we need to make sure the
primary name is the same; otherwise the default configuration cannot
start sentinel.
```
sentinel monitor mymaster 127.0.0.1 6379 2
```

The following error occurs when the default configuration is started:
```
*** FATAL CONFIG FILE ERROR (Version 255.255.255) ***
Reading the configuration file, at line 358
>>> 'SENTINEL primary-reboot-down-after-period myprimary 0'
No such master with specified name.
```

Introduced in #647.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-21 21:09:13 +08:00
Binbin
d9c41e9ef9
Fix timing issue in the new tot-net-out replica test (#1060)
Apparently there is a timing issue when using wait_for_ofs_sync:
```
[exception]: Executing test client: can't read "out_before": no such variable.
can't read "out_before": no such variable
```

The reason is that if the connection between the primary
and the replica is not established yet, the master_repl_offset
of the primary and replica in wait_for_ofs_sync is 0, and
the check fails, resulting in no replica client in the
client list below.

In this case, we need to make sure the replica is online
before proceeding.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-20 14:25:05 +08:00
Binbin
7fab15795f
Add log about old primary after myself failover (#1058)
Sometimes it is hard to see the old primary during a
multi-primary failover; adding this log can help
us find the old primary node.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-09-20 14:15:19 +08:00
Shivshankar
56fd97733b
Move printver test to info-command file (#1056)
This fixes: #219

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-20 10:18:19 +08:00
ranshid
4593dc2f05
Fix memory allocation for server databases (#1046)
Fix a bug in the way we allocate memory for the server databases
Introduced in #156.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
2024-09-18 19:35:35 +08:00
Shivshankar
ba71c7e56e
Copy 'errno' and use copied value in the if check of retry in cluster migrate commands socket_err block. (#1042)
errno is a global variable shared with system calls, so there is a
chance it may be overwritten during the io free or socket close in the
migrate command code. It is better to copy it before the free or
closesocket and to use the copied value to check for a retry in the
socket_err block. So a new variable was added to take the copy, and the
copy is used for the check.
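
A minimal sketch of the pattern (not the actual migrate code): copy
errno before cleanup calls that can clobber it, and test the copy:

```
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

static bool should_retry_after_error(int fd) {
    int saved_errno = errno; /* capture before close() may overwrite it */
    close(fd);               /* cleanup that can change errno */
    return saved_errno == EAGAIN || saved_errno == ETIMEDOUT;
}
```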

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-18 10:34:11 +08:00
Josef Šimánek
ff69b4be1d
Fix casing in README.md (#1043)
TO -> To

Signed-off-by: Josef Šimánek <josef.simanek@gmail.com>
2024-09-18 10:32:40 +08:00
Binbin
f89ff3137d
Add --moduleapi option to better use runtest-moduleapi (#1007)
This allows us to avoid error #1002 and enables us to actually
use `./runtest-moduleapi --single xxx`.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-17 19:50:38 +08:00
Shivshankar
9f8185f5c8
Update valkey-benchmark log output to reference 'server' instead of 'Redis' (#1029)
Replaced "Could not connect to Redis" with "Could not connect to server" in the log
output for connection errors in `getRedisContext` and `createClient`.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-09-13 21:43:20 -07:00
Binbin
17390383b5
Replica flush the old data after RDB file is ok in disk-based replication (#926)
Call emptyData right before rdbLoad, to prevent an error in the middle
from dropping the replication stream and leaving us with an empty
database. The real change is in the disk-based part; the rest is just
code movement.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-14 11:49:49 +08:00
Ping Xie
09def3cf03
Improve code readability in dict.c (#943)
This pull request improves code readability, as a follow up of #749.

- Internal Naming Conventions: Removed the use of underscores (_) for
internal static structures/functions.

- Descriptive Function Names: Updated function names to be more
descriptive, making their purpose clearer. For instance, `_dictExpand`
is renamed to `dictExpandIfAutoResizeAllowed`.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-09-13 17:21:20 -07:00
Binbin
dcc7678fc4
Fix replica unable trigger migration when it received CLUSTER SETSLOT in advance (#981)
Fix timing issue in evaluating `cluster-allow-replica-migration` for replicas

There is a timing bug where the primary and replica have different 
`cluster-allow-replica-migration` settings. In issue #970, we found that if 
the replica receives `CLUSTER SETSLOT` before the gossip update, it remains 
in the original shard. This happens because we only process the 
`cluster-allow-replica-migration` flag for primaries during `CLUSTER SETSLOT`.

This commit fixes the issue by also evaluating this flag for replicas in the 
`CLUSTER SETSLOT` path, ensuring correct replica migration behavior.

Closes #970
---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-09-13 15:32:20 -07:00
Wen Hui
d090fbefde
Add the missing help output for new command: client capa redirect (#1025)
Update client help output message for new command: client capa redirect

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-13 09:22:21 -07:00
Ping Xie
3cc619f637
Disable flaky empty shard slot migration tests (#1027)
Will continue my investigation offline

Signed-off-by: Ping Xie <pingxie@google.com>
2024-09-13 00:02:39 -07:00
Binbin
f7c5b40183
Avoid false positive in election tests (#984)
The node may not be able to initiate an election in time due to
problems with cluster communication. If an election is initiated,
make sure its offset is 0.

Closes #967.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-13 14:53:39 +08:00
Binbin
38457b7320
Trigger a save of the cluster configuration file before shutting down (#822)
The cluster configuration file is the metadata "database" for the
cluster. It is best to trigger a save when shutdown the server, to
avoid inconsistent content that is not refreshed.

We save the nodes.conf whenever something that affects the nodes.conf
has changed. But we are saving nodes.conf in clusterBeforeSleep, and
some events may save it without a fsync, there is a time gap.

And shutdown has its own save seems good to me, it doesn't need to
care about the others.

At the same time, a comment is added in unlock nodes.conf to explain
why we actively unlock when shutdown.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-12 15:43:12 +08:00
Ping Xie
76a59788e6
Re-enable empty-shard slot migration tests (#1024)
Related to #734 and #858

Signed-off-by: Ping Xie <pingxie@google.com>
2024-09-11 23:19:32 -07:00
xu0o0
3513f22027
Make clang-format insert a newline at end of file if missing (#1023)
clang generates a warning if there is no newline at the end of the
source file.

Update .clang-format to handle the missing newline at eof.

Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
2024-09-11 22:33:07 -07:00
uriyage
8cca11ac54
Fix wrong count for replica's tot-net-out (#1013)
Fix duplicate calculation of replica's `net_output_bytes`

- Remove redundant calculation leftover from previous refactor
- Add test to prevent regression

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-09-12 10:36:40 +08:00
Madelyn Olson
fa348e2e59
Optimize the per slot dictionary by checking for cluster mode earlier (#995)
While doing some profiling, I noticed that getKeySlot() was a fairly
large part (~0.7%) of samples doing perf with high pipeline during
standalone. I think this is because we do a very late check for
server.cluster_mode, we first call getKeySlot() and then call
calculateKeySlot(). (calculateKeySlot was surprisingly not automatically
inlined, we were doing a jump into it and then immediately returning
zero). We then also do useless work in the form of caching zero in
client->slot, which will further mess with cache lines.

So, this PR tries to accomplish a few things.
1) The usage of the `slot` name made a lot more sense before the
introduction of the kvstore. Now with kvstore, we call this the database
index, so all the references to slot in standalone are no longer really
accurate.
2) Pull the cluster mode check all the way out of getKeySlot(), so
hopefully a bit more performant.
3) Remove calculateKeySlot() as independent from getKeySlot().
calculateKeySlot used to have 3 call sites outside of db.c, which
warranted its own function. It's now only called in two places,
pubsub.c and networking.c.

I ran some profiling, and saw about ~0.3% improvement, but don't really
trust it because you'll see a much higher (~2%) variance in test runs
just by how the branch predictions will get changed with a new memory
layout. Running perf again showed no samples in getKeySlot() and a
reduction in samples in lookupKey(), so maybe this will help a little
bit.
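
For reference, a minimal sketch of the early-check idea (illustrative,
not the literal diff; it reuses real helpers like keyHashSlot):

```
static inline int getKeySlot(sds key) {
    /* Standalone mode: there is only one database index, no slot math. */
    if (!server.cluster_enabled) return 0;
    /* Reuse the slot cached by command dispatch when available. */
    if (server.current_client && server.current_client->slot >= 0)
        return server.current_client->slot;
    return (int)keyHashSlot(key, (int)sdslen(key));
}
```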

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-11 09:53:42 -07:00
Madelyn Olson
2b207ee1b3
Improve stability of hostnames test (#1016)
Maybe partially resolves https://github.com/valkey-io/valkey/issues/952.

The hostnames test relies on an assumption that node zero and node six
don't communicate with each other to test a bunch of behavior in the
handshake state. This was done by previously dropping all meet packets,
however it seems like there was some case where node zero was sending a
single pong message to node 6, which was partially initializing the
state.

I couldn't track down why this happened, but I adjusted the test to
simply pause node zero which also correctly emulates the state we want
to be in since we're just testing state on node 6, and removes the
chance of errant messages. The test was failing about 5% of the time
locally, and I wasn't able to reproduce a failure with this new
configuration.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-11 09:52:34 -07:00
Mikhail Koviazin
c77e8f223c
Added .git-blame-ignore-revs (#1010)
This file enables developers to ignore certain revisions in
git-blame. This is quite handy considering there was a commit that
reformatted a large amount of code in valkey.

As a downside, one has to do a manual step for each clone of valkey to
enable this feature. The instructions are available in the file itself.

---------

Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
2024-09-10 22:50:35 -07:00
Binbin
4033c99ef5
Fix module RdbLoad wrongly disable the AOF (#1001)
In RdbLoad, we disable AOF before emptyData and rdbLoad to prevent copy-on-write issues. After rdbLoad completes, AOF should be re-enabled, but the code incorrectly checks server.aof_state, which has been reset to AOF_OFF in stopAppendOnly. This leads to AOF not being re-enabled after being disabled.
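
A minimal sketch of the fix pattern (approximate names, not the actual
diff): capture the AOF state before stopAppendOnly() resets it, instead
of re-checking server.aof_state afterwards.

```
int aof_was_enabled = (server.aof_state != AOF_OFF);
stopAppendOnly(); /* resets server.aof_state to AOF_OFF */
/* ... emptyData() and rdbLoad() run here without copy-on-write risk ... */
if (aof_was_enabled) startAppendOnly(); /* re-enable from the saved state */
```
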
---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-10 21:00:08 -07:00
Amit Nagler
1b24168450
Dual Channel Replication - Verify Replica Local Buffer Limit Configuration (#989)
Prior to comparing the replica buffer against the configured limit, we
need to ensure that the limit configuration is enabled. If the limit is
set to zero, it indicates that there is no limit, and we should skip the
buffer limit check.
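
A minimal sketch of the guard (variable names are illustrative): a limit
of zero means "no limit", so the overrun check must be skipped.

```
if (limit_bytes != 0 && replica_local_buffer_bytes > limit_bytes) {
    /* abort the sync: the local buffer overran the configured limit */
}
```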

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-09-10 17:26:28 -07:00
Lipeng Zhu
58fe9c0138
Use hashtable as the default type of temp set object during sunion/sdiff (#996)
This patch tries to set the temp set object to the hash table type by
default, and does a simple prediction of the temp set object encoding
when initializing `dstset`, to reduce unnecessary conversions.

## Issue Description

According to the existing code logic, when doing operations like
`sunion` and `sdiff`, the temp set object could be `intset`, `listpack`
or `hashtable`. For `listpack`, the efficiency is low for operations
like `find` and `compare`, which need to traverse all elements. When
exploring the hotspots, we found that `lpFind` and `memcmp` had become
the bottleneck when running workloads like the ones below:

-
[memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml)
-
[memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml)


![image](https://github.com/user-attachments/assets/71dfc70b-2ad5-4832-a338-712deefca20e)

## Optimization 

As noted above, this patch sets the temp set object to the hash table
type by default, and predicts the temp set object encoding when
initializing `dstset`, to reduce unnecessary conversions.
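
A hedged sketch of the prediction (helper names follow Valkey's
object.c, but this is illustrative, not the actual diff):

```
/* If any source set is already a hashtable, create the temporary
 * destination set as a hashtable up front, so lookups during
 * SUNION/SDIFF avoid linear listpack scans. */
int need_ht = 0;
for (int i = 0; i < setnum; i++)
    if (sets[i] && sets[i]->encoding == OBJ_ENCODING_HT) need_ht = 1;
robj *dstset = need_ht ? createSetObject()     /* hashtable-encoded */
                       : createIntsetObject(); /* may convert later */
```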

### Test Environment

- OPERATING SYSTEM: Ubuntu 22.04.4 LTS
- Kernel: 5.15.0-116-generic
- PROCESSOR: Intel Xeon Platinum 8380
- Server and Client in same socket.

#### Server Configuration
```
taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey.conf

port 9001
bind * -::*
daemonize no
protected-mode no
save ""
```

#### Performance Boost 

| Test Name | Perf Boost |
|-|-|
| [memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml) | 41% |
| [memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml) | 27% |


### More Tests
With the above test set, which has a total of 110 elements in the two
given sets, we also did some benchmarking by adjusting the total number
of elements in all given sets. We can still observe the performance boost.


![image](https://github.com/user-attachments/assets/b2ab420c-43e5-45de-9715-7d943df229cb)

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-09-10 22:09:18 +02:00
uriyage
9f0c80187e
Fix crash in async IO threads with TLS (#1011)
Fix for https://github.com/valkey-io/valkey/issues/997

Root Cause Analysis:
1. Two different jobs (READ and WRITE) may be sent to the same IO
thread.
2. When processing the read job in `processIOThreadsReadDone`, the IO
thread may find that the write job has also been completed.
3. In this case, the IO thread calls `processClientIOWriteDone` to first
process the completed write job and free the COBs
affbea5dc1/src/networking.c (L4666)
4. If there are pending writes (resulting from pipeline commands), a new
async IO write job is sent before processing the completed read job
affbea5dc1/src/networking.c (L2417)
When sending the write job, the `TLS_CONN_FLAG_POSTPONE_UPDATE_STATE`
flag is set to prevent the IO thread from updating the event loop, which
is not thread-safe.
5. Upon resuming the read job processing, the flag is cleared,
affbea5dc1/src/networking.c (L4685)
causing the IO thread to update the event loop.

Fix:
Prevent sending async write job for pending writes when a read job is
about to be processed.

Testing:
The issue could not be reproduced due to its rare occurrence, which
requires multiple specific conditions to align simultaneously.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-09-10 11:20:10 -07:00
bentotten
affbea5dc1
For MEETs, save the extensions support flag immediately during MEET processing (#778)
For backwards compatibility reasons, a node will wait until it receives
a cluster message with the extensions flag before sending its own
extensions. This leads to a delay in shard ID propagation that can
corrupt nodes.conf with inaccurate shard IDs if a node is restarted
before this can stabilize.

This fixes much of that delay by immediately triggering the
extensions-supported flag during the MEET processing and attaching the
node to the link, allowing the PONG reply to contain OSS extensions.

Partially fixes #774

---------

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
2024-09-09 20:46:02 -07:00
Binbin
50c1fe59f7
Add missing moduleapi getchannels test and fix tests (#1002)
Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-10 10:13:54 +08:00
zhaozhao.zz
f504cf233b
add assertion for kvstore's dictType (#1004)
Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-09-09 12:13:18 -07:00
xu0o0
20d583f774
Migrate dict.c unit tests to new framework (#946)
This PR migrates the tests related to dict into new test framework as
part of #428.

Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-09-09 13:03:15 +08:00
xu0o0
14016d2df7
Migrate listpack.c unit tests to new framework (#949)
This PR migrates the tests related to listpack into new test framework
as part of #428.

Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-09-09 13:01:25 +08:00
Binbin
c642cf0134
Add client info to SHUTDOWN / CLUSTER FAILOVER logs (#875)
Print the full client info by using catClientInfoString; the
info is useful when we want to identify the source of a request.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-08 16:26:56 +08:00
Binbin
6478526597
Fix aof base suffix when modifying aof-use-rdb-preamble during rewrite (#886)
If we modify aof-use-rdb-preamble in the middle of a rewrite,
we may get a wrong AOF base suffix. This is because the suffix
is concatenated by the main process afterwards, and it may be
different from what it was at the beginning.

We now cache this value when we start the rewrite.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-07 23:27:59 +08:00
Binbin
9b51949abe
Fix missing replication link re-connection when primary's IP/port is updated in clusterProcessGossipSection (#965)
`clusterProcessGossipSection` currently doesn't trigger a check and call `replicationSetPrimary` when `myself`'s primary node’s IP/port is updated. This fix ensures that after every node address update, `replicationSetPrimary` is called if the updated node is `myself`'s primary. This prevents missed updates and ensures that replicas reconnect properly to maintain their replication link with the primary.
2024-09-05 22:19:50 -07:00
Binbin
9033734b6b
Add newline to argv in crash report when doing redact (#993)
Minor cleanup, introduced in #877.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-05 11:13:29 +08:00
Kyle Kim (kimkyle@)
2d1eca577e
Add SLOT-STATS under CLUSTER HELP string. (#988)
Add help wording for cluster SLOT-STATS.

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
2024-09-03 12:59:06 -07:00
Viktor Söderqvist
ea58fbf40d
Rewrite lazyfree docs in valkey.conf to reflect that lazy is now default (#983)
Before this doc update, the comments in valkey.conf said that DEL is a
blocking command, and even referred to other synchronous freeing as "in a
blocking way, like if DEL was called". This has now become confusing and
incorrect, since DEL is now non-blocking by default.

The comments also mentioned too much about the "old default" and only
later explain that the "new default" is non-blocking.

This doc update focuses on the current default and expresses it like
"Starting from Valkey 8.0, lazy freeing is enabled by default", rather
than using words like old and new.

This is a follow-up to #913.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-09-03 10:47:23 +02:00
NAM UK KIM
f143ffd2a5
Fix typo in valkey-cli.c (#979)
Change from replicsa to replicas in valkey-cli.c

Signed-off-by: NAM UK KIM <namuk2004@naver.com>
2024-09-03 14:58:09 +08:00
Ping Xie
981f977abf
Improve type safety and refactor dict entry handling (#749)
This pull request introduces several changes to improve the type safety
of Valkey's dictionary implementation:

- Getter/Setter Macros: Implemented macros `DICT_SET_VALUE` and
`DICT_GET_VALUE` to centralize type casting within these macros. This
change emulates the behavior of C++ templates in C, limiting type
casting to specific low-level operations and preventing it from being
spread across the codebase (see the sketch after this list).

- Reduced Assert Overhead: Removed unnecessary asserts from critical hot
paths in the dictionary implementation.

- Consistent Naming: Standardized the naming of dictionary entry types.
For example, all dictionary entry types start their names with
`dictEntry`.
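
A hedged illustration of the getter/setter macro idea (the actual Valkey
macros may differ in shape): the void* casts live in exactly one place
instead of being scattered across call sites.

```
#define DICT_SET_VALUE(entry, field, val) ((entry)->field = (void *)(val))
#define DICT_GET_VALUE(entry, field, type) ((type)((entry)->field))
```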


Fix #737

---------

Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-02 18:28:15 -07:00
Madelyn Olson
3e14516d86
Initialize all the fields for the test kvstore (#982)
Follow up to https://github.com/valkey-io/valkey/pull/966, which didn't
update the kvstore tests. I'm not actually entirely clear why it fixes
it, but the consistency prevents the crash very reliably so will merge
it now and maybe see if Zhao has a better explanation.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-02 11:01:59 -07:00
Amit Nagler
5fdb47c2e2
Add configuration hide-user-data-from-log to hide user data from server logs (#877)
Implement data masking for user data in server logs and diagnostic output. This change prevents potential exposure of confidential information, such as PII, and enhances privacy protection. It masks all command arguments, client names, and client usernames.

Added a new hide-user-data-from-log configuration item, default yes.

---------

Signed-off-by: Amit Nagler <anagler123@gmail.com>
2024-09-02 09:50:36 -07:00
Binbin
5693fe4664
Fix set expire test due to the new lazyfree configs changes (#980)
Test failed because these two PRs #865 and #913.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-02 22:43:09 +08:00
zhaozhao.zz
32116d09bb
Use metadata to handle the reference relationship between kvstore and dict (#966)
Feature `one-dict-per-slot` refactors the database, and part of it
involved splitting the rehashing list from the global level back to the
database level, or more specifically, the kvstore level. This change is
fine, and it also simplifies the process of swapping databases, which is
good. And it should not have a major impact on the efficiency of
incremental rehashing.

To implement the kvstore-level rehashing list, each `dict` under the
`kvstore` needs to know which `kvstore` it belongs to. However, kvstore did
not insert the reference relationship into the `dict` itself; instead,
it placed it in the `dictType`. In my view, this is a somewhat odd approach.
Theoretically, `dictType` is just a collection of function handles, a
kind of virtual type that can be referenced globally, not an entity. But
now the `dictType` is instantiated, with each `kvstore` owning an actual
`dictType`, which in turn holds a reverse reference to the `kvstore`'s
resource pointer. This design is somewhat uncomfortable for me.

I think the `dictType` should not be instantiated. The references
between actual resources (`kvstore` and `dict`) should occur between
specific objects, rather than force materializing the `dictType`, which
is supposed to be virtual.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-09-02 22:35:24 +08:00
Binbin
70624ea63d
Change all the lazyfree configurations to yes by default (#913)
## Set replica-lazy-flush and lazyfree-lazy-user-flush to yes by
default.
There are many problems with running flush synchronously. Even in
single-CPU environments, the thread scheduler should balance between
freeing memory and serving incoming requests.

## Set lazy eviction, expire, server-del, user-del to yes by default
We now have a del and a lazyfree del, along with these configuration
items to control them: lazyfree-lazy-eviction, lazyfree-lazy-expire,
lazyfree-lazy-server-del, lazyfree-lazy-user-del. In most cases lazyfree
is better since it reduces the risk of blocking the main thread, and
because we have lazyfreeGetFreeEffort, objects with a high effort
(currently 64) will use lazyfree.

Part of #653.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-09-02 07:07:17 -07:00
Madelyn Olson
089048d364
Fix zipmap test null pointer (#975)
The previous test does a strncmp on a NULL, which is not valid. It
should be using an empty length string instead. Addresses
https://github.com/valkey-io/valkey/actions/runs/10649272046/job/29519233939.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-09-01 12:05:37 +02:00
Binbin
e3af1a30e4
Fast path in SET if the expiration time is expired (#865)
If the expiration time passed to SET is already expired, for example
because of the machine time (DST) or because a wrong expiration
argument was passed in, we don't need to set the key and wait for the
active expire scan before deleting the key.

Compared with the previous behavior:
1. If the key does not exist, previously we would set the key and wait
for the active expire to delete it, so it was a set + del from the
perspective of propagation. Now we do not set the key and just return,
so it is a NOP.

2. If the key exists, previously we would set the key and wait for the
active expire to delete it, so it was a set + del from the perspective
of propagation. Now we delete it and return, so it is a del.

Adding a new deleteExpiredKeyFromOverwriteAndPropagate function
to reduce the duplicate code.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-31 22:39:07 +08:00
Viktor Söderqvist
5d458c6292
Delete unused parts of zipmap (#973)
Deletes zipmapSet, zipmapGet, etc. Only keep iterator and validate
integrity, what we use when loading an old RDB file.

Adjust unit tests to not use zipmapSet, etc.

Solves a build failure when compiling with fortify source.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-31 15:42:44 +02:00
Binbin
fea49bce2c
Fix timing issue in replica migration test (#968)
The reason is that server 3 still has server 7 as its replica
due to a short wait; the wait is not long enough. We should wait
for the server to lose its replica.
```
*** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT
Expected '{127.0.0.1 21497 267}' to be equal to '' (context: type eval line 34 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test)
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-30 19:58:46 +08:00
zhaozhao.zz
743f5ac2ae
standalone -REDIRECT handles special case of MULTI context (#895)
In standalone mode, when a `-REDIRECT` error occurs, special handling is
required if the client is in the `MULTI` context.

We have adopted the same handling method as the cluster mode:

1. If a command in the transaction encounters a `REDIRECT` at the time
of queuing, the execution of `EXEC` will return an `EXECABORT` error (we
expect the client to redirect and discard the transaction upon receiving
a `REDIRECT`). That is:

    ```
    MULTI    ==>  +OK
    SET x y  ==>  -REDIRECT
    EXEC     ==>  -EXECABORT
    ```
2. If all commands are successfully queued (i.e., `QUEUED` results are
received) but a redirect is detected during `EXEC` execution (such as a
primary-replica switch), a `REDIRECT` is returned to instruct the client
to perform a redirect. That is:

    ```
    MULTI    ==>  +OK
    SET x y  ==>  +QUEUED
    failover
    EXEC     ==>  -REDIRECT
    ```

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-08-30 10:17:53 +08:00
Shivshankar
2b76c8fbe2
Migrate zipmap unit test to new framework (#474)
Migrate zipmap unit test to new unit test framework, parent ticket #428
.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
2024-08-29 11:17:53 -04:00
Binbin
ecbfb6a7ec
Fix reconfiguring sub-replica causing data loss when myself change shard_id (#944)
When reconfiguring a sub-replica, there may be a case where the
sub-replica will use the old offset, win the election, and cause data
loss if the old primary went down.

In this case, the sender is myself's primary. When executing
updateShardId, not only is the sender's shard_id updated, but also the
shard_id of myself, causing the subsequent areInSameShard check, that
is, the full_sync_required check, to fail.

As part of the recent fix of #885, the sub-replica needs to decide whether
a full sync is required or not when switching shards. This shard membership
check is supposed to be done against sub-replica's current shard_id, which
however was lost in this code path. This then leads to sub-replica joining
the other shard with a completely different and incorrect replication history.

This is the only place where replicaof state can be updated on this path
so the most natural fix would be to pull the chain replication reduction
logic into this code block and before the updateShardId call.

This one follows #885 and closes #942.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-08-29 22:39:53 +08:00
zhaozhao.zz
4a9b4f667c
free client's multi state when it becomes dirty (#961)
Release the client's MULTI state when the transaction becomes dirty to
save memory.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-08-29 19:20:53 +08:00
Ping Xie
ad0ede302c
Exclude '.' and ':' from isValidAuxChar's banned charset (#963)
Fix a bug in isValidAuxChar where valid characters '.' and ':' were
incorrectly included in the banned charset. This issue affected the
validation of auxiliary fields in the nodes.conf file used by Valkey in
cluster mode, particularly when handling IPv4 and IPv6 addresses. The
code now correctly allows '.' and ':' as valid characters, ensuring
proper handling of these fields. Comments were added to clarify the use
of the banned charset.
 
Related to #736

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-08-28 23:35:31 -07:00
Binbin
75b824052d
Revert make KEYS to be an exact match if there is no pattern (#964)
In #792, the time complexity became ambiguous, fluctuating between
O(1) and O(n), which is a significant difference. We agree that this
uncertainty can potentially bring disaster to a business; the right
thing to do is to persuade users to use EXISTS instead of KEYS in this
case, doing the right thing the right way rather than accommodating
this incorrect usage.

This reverts commit d66a06e8183818c035bb78706f46fd62645db07e.
This reverts #792.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-29 10:58:19 +08:00
Viktor Söderqvist
25dd943087
Delete TLS.md and update README.md about tests (#960)
Most of the content of TLS.md has already been copied to README.md in
#927.

The description of how to run tests with TLS is moved to
tests/README.md.

Descriptions of the additional scripts runtest-cluster, runtest-sentinel
and runtest-module are added in tests/README.md.

Links to tests/README.md and src/unit/README.md are added in the
top-level README.md along with a brief overview of the `make test-*`
commands.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-28 21:17:04 +02:00
Viktor Söderqvist
927c2a8cd1
Delete files MANIFESTO, BUGS and INSTALL (#958)
The MANIFESTO is not Valkey's manifesto and it doesn't even mention open
source. Let's write another one later, or some other document about our
project principles.

The other two files are one-line files with no relevant info. They're
polluting the file listing at root level. It's the first thing you see
when you start exploring the repo for the first time.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-28 20:04:23 +02:00
I-Hsin Cheng
6172907094
Migrate the contents of TLS.md into README.md (#927)
Migrate the contents of TLS.md into TLS sections covering building,
running and detailed support. The TODO list in TLS.md is almost done,
except that the benchmark support implementation is still not the best
approach; it should migrate to hiredis async mode.

Closes #888

---------

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-28 12:43:29 +02:00
Ping Xie
2b71a78241
Add comment explaining log file reopening for rotation support (#956) 2024-08-27 21:00:17 -07:00
mwish
744b13e302
Using intrinsics to optimize counting HyperLogLog trailing bits (#846)
Godbolt link: https://godbolt.org/z/3YPvxsr5s

__builtin_ctz generates shorter code than a hand-written loop.
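
For reference, a minimal example of the intrinsic (not the HyperLogLog
code itself); the compiler lowers it to a single instruction such as
TZCNT/BSF on x86:

```
#include <stdint.h>

static inline int trailing_zeros(uint64_t x) {
    return x ? __builtin_ctzll(x) : 64; /* ctz is undefined for 0 */
}
```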

---------

Signed-off-by: mwish <maplewish117@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-27 20:44:32 -07:00
Binbin
4fe8320711
Add pause path coverage to replica migration tests (#937)
In #885, we only added a shutdown path; there is another path where
the server might get hung (observable via slowlog). This PR adds
the pause path coverage to cover it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-28 11:08:27 +08:00
Lipeng Zhu
076bf6605f
Move prepareClientToWrite out of loop for lrange command to reduce the redundant call. (#860)
## Description
When I explored the cycle distribution for the `lrange` test
(`valkey-benchmark -p 9001 -t lrange -d 100 -r 1000000 -n 1000000 -c 50
--threads 4`), I found that `prepareClientToWrite` and
`clientHasPendingReplies` could be reduced to a single call outside the
loop instead of being called in a loop; ideally we can gain 3%
performance. The corresponding `LRANGE_100`, `LRANGE_300`, `LRANGE_500`
and `LRANGE_600` tests get a ~2% - 3% performance boost; the benchmark
results prove it helps.

This patch moves `prepareClientToWrite` and its child
`clientHasPendingReplies` out of the loop to reduce the function call
overhead.
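
A sketch of the hoisting pattern (illustrative; addReplyBulkUnchecked is
a hypothetical helper, not the real reply API):

```
/* Before: every element re-ran the writability check inside addReply*.
 * After: check once, then emit replies in a tight loop. */
if (prepareClientToWrite(c) != C_OK) return;
for (long i = 0; i < rangelen; i++) {
    addReplyBulkUnchecked(c, elements[i]); /* no per-iteration check */
}
```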

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-08-27 19:11:09 -07:00
Binbin
6a84e06b05
Wait for the role change and fix the timing issue in the new test (#947)
The test might run fast enough that the role has not yet changed,
causing the test to fail. Add a wait to avoid the timing issue:
```
*** [err]: valkey-cli make source node ignores NOREPLICAS error when doing the last CLUSTER SETSLOT
Expected '{127.0.0.1 23154 267}' to be equal to '' (context: type eval line 24 cmd {assert_equal [lindex [R 3 role] 2] {}} proc ::test)
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-28 09:51:10 +08:00
Vadym Khoptynets
4f29ad4583
Use sdsAllocSize instead of sdsZmallocSize (#923)
sdsAllocSize returns the correct size without consulting the
allocator, which is much faster. The only exception is SDS_TYPE_5,
for which it has to consult the allocator.

This PR also sets the alloc field correctly for embedded string objects.
It assumes that no allocator would allocate a buffer larger
than `259 + sizeof(robj)` for an embedded string. We use embedded strings
for strings up to 44 bytes. If this assumption is wrong, the whole
function would require a rewrite; in the general case an sds type
adjustment might be needed. Such logic should go into sds.c.

---------

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
2024-08-27 14:43:01 -07:00
Amit Nagler
1ff2a3b6ae
Remove dual-channel-replication Feature Flag's Protection (#908)
Currently, the `dual-channel-replication` feature flag is immutable if
`enable-protected-configs` is enabled, which is the default behavior.
This PR proposes to make the `dual-channel-replication` flag mutable,
allowing it to be changed dynamically without restarting the cluster.

**Motivation:**
The ability to change the `dual-channel-replication` flag dynamically is
essential for testing and validating the feature on real clusters
running in production environments. By making the flag mutable, we can
enable or disable the feature without disrupting the cluster's
operations, facilitating easier testing and experimentation.
Additionally, this change would provide more flexibility for users to
enable or disable the feature based on their specific requirements or
operational needs without requiring a cluster restart.

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-08-27 10:18:48 -07:00
Viktor Söderqvist
54c0f743dd
Connection minor fixes (#953)
1. Remove redundant connIncrRefs/connDecrRefs

    In socket.c, the reference counter is incremented before calling
callHandler, but the same reference counter is also incremented inside
callHandler before calling the actual callback.

        static inline int callHandler(connection *conn, ConnectionCallbackFunc handler) {
            connIncrRefs(conn);
            if (handler) handler(conn);
            connDecrRefs(conn);
            ...
        }

    This commit removes the redundant incr/decr calls in socket.c

2. Correct return value of connRead for TLS when peer closed

    According to comments in connection.h, connRead returns 0 when the peer
has closed the connection. This patch corrects the return value for TLS
connections. (Without this patch, it returns -1 which means error.)

    There is an observable difference in what is logged in the verbose
level: "Client closed connection" vs "Reading from client: (null)".

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-08-27 16:11:33 +02:00
uriyage
04d76d8b02
Improve multithreaded performance with memory prefetching (#861)
This PR utilizes the IO threads to execute commands in batches, allowing
us to prefetch the dictionary data in advance.

After making the IO threads asynchronous and offloading more work to
them in the first 2 PRs, the `lookupKey` function becomes the main
bottleneck, taking about 50% of the main-thread time (tested with the
SET command). This is because the Valkey dictionary is a straightforward
but inefficient chained hash implementation. While traversing the hash
linked lists, every access to either a dictEntry structure, pointer to
key, or a value object requires, with high probability, an expensive
external memory access.

### Memory Access Amortization

Memory Access Amortization (MAA) is a technique designed to optimize the
performance of dynamic data structures by reducing the impact of memory
access latency. It is applicable when multiple operations need to be
executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.

Rather than executing operations sequentially, this approach interleaves
the execution of all operations. This is done in such a way that
whenever a memory access is required during an operation, the program
prefetches the necessary memory and transitions to another operation.
This ensures that when one operation is blocked awaiting memory access,
other memory accesses are executed in parallel, thereby reducing the
average access latency.

We applied this method in the development of `dictPrefetch`, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches when executing commands.
Essentially, `dictPrefetch` is an interleaved execution of dictFind for
all the keys.
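
A toy illustration of the amortization idea (not the actual
dictPrefetch): issue independent prefetches for all lookups first, so
the cache misses overlap instead of serializing.

```
#include <stddef.h>

static void prefetch_buckets(void **bucket_ptrs, size_t n) {
    for (size_t i = 0; i < n; i++)
        __builtin_prefetch(bucket_ptrs[i], 0, 3); /* read, high locality */
}
```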


**Implementation details**

When the main thread iterates over the `clients-pending-io-read`, for
clients with ready-to-execute commands (i.e., clients for which the IO
thread has parsed the commands), a batch of up to 16 commands is
created. Initially, the command's argv, which were allocated by the IO
thread, is prefetched to the main thread's L1 cache. Subsequently, all
the dict entries and values required for the commands are prefetched
from the dictionary before the command execution. Only then will the
commands be executed.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-08-26 21:10:44 -07:00
Binbin
694246cfab
Drop the outdated script replication example comments (#951)
This example was for script replication which we have
completely removed in 7.0, so this example is outdated now.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-27 12:04:47 +08:00
Binbin
d66a06e818
Make KEYS to be an exact match if there is no pattern (#792)
Although KEYS is a dangerous command and we recommend that people
avoid using it, some people who are not familiar with it still use it,
even with no pattern at all.

Once KEYS is used with no pattern, we can convert it to an
exact match to avoid iterating over all data.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-27 12:04:27 +08:00
xu0o0
73698fa028
Fix invalid escape sequence in utils, minor cleanup in python script (#948)
According to the Python document[1], any invalid escape sequences in
string literals now generate a DeprecationWarning (SyntaxWarning as of
3.12) and in the future this will become a SyntaxError.

This change uses Python's raw string notation for regular expression
patterns to avoid this.

[1]: https://docs.python.org/3.10/library/re.html

Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
2024-08-26 22:53:35 +08:00
Binbin
9f4b1adbea
Add explicit assert to ensure thread_shared_qb won't expand (#938)
Although this won't happen now, adding this statement explicitly.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-25 12:03:34 +08:00
Binbin
c7d1daea05
Add epoch information to failover auth denied logs (#816)
When a failover vote is denied, the arrival time of the
FAILOVER_AUTH_REQUEST packet is very uncertain, sometimes due to the
network or some blocking operations. Since there was no epoch
information in these logs, it was hard to associate a log entry
with other logs.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-24 18:03:24 +08:00
NAM UK KIM
0053429a02
Update "Total" message and used_memory_human log information in serverCron() function (#594)
At the VERBOSE/DEBUG log level, which is output once every 5 seconds,
show the "Total" message for all clients and show memory usage
(used_memory) together with used_memory_human.
Also, it seems clearer to show the "total" number of keys and the
number of volatile keys among all keys.

---------

Signed-off-by: NAM UK KIM <namuk2004@naver.com>
2024-08-23 18:02:18 -07:00
Ayush Sharma
b48596a914
Add support for setting the group on a unix domain socket (#901)
Add new optional, immutable string config called `unixsocketgroup`. 
Change the group of the unix socket to `unixsocketgroup` after `bind()`
if specified.
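
An illustrative valkey.conf snippet (the path and group name are
examples, not defaults):

```
unixsocket /var/run/valkey.sock
unixsocketgroup valkey
```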

Adds tests to validate the behavior.

Fixes #873.

Signed-off-by: Ayush Sharma <mrayushs933@gmail.com>
2024-08-23 11:52:08 -07:00
Madelyn Olson
829aa7fe3c
Remove accurate from extra test tag (#935)
Today if we attached the "run-extra-tests" tag it adds at least 20
minutes because the dump-fuzzer test runs with full accuracy. This
fuzzer is useful, but probably only really needed for the daily, so
removing it from the PRs. We still run the fuzzers, just not for as
long.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-23 11:05:41 -07:00
Binbin
8045994972
valkey-cli make source node ignores NOREPLICAS when doing the last CLUSTER SETSLOT (#928)
This fixes #899. In that issue, the primary has cluster-allow-replica-migration no
while its replica has cluster-allow-replica-migration yes.

And during the slot migration:
1. The primary calls blockClientForReplicaAck, waiting for its replica.
2. Its replica reconfigures itself as a replica of another shard due to
replica migration and disconnects from the old primary.
3. The old primary never gets the chance to receive the ack, so it times
out with a NOREPLICAS error.

In this case, the replicas might automatically migrate to another primary,
resulting in the client being unblocked with the NOREPLICAS error. In this
case, since the configuration will eventually propagate itself, we can safely
ignore this error on the source node.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-23 16:22:30 +08:00
Binbin
5d97f5133c
Fix CLUSTER SETSLOT block and unblock error when all replicas are down (#879)
In CLUSTER SETSLOT propagation logic, if the replicas are down, the
client will get blocked during command processing and then unblocked
with `NOREPLICAS Not enough good replicas to write`.

The reason is that all replicas are down (or some are down), but
myself->num_replicas includes all replicas, so the client will
get blocked and always time out.

We should only wait for the online replicas, otherwise the waiting
propagation will always time out since there are not enough replicas.
The admin can easily check if there are replicas that have been down
for an extended period of time; if they decide to move forward anyway,
we should not block it. If a replica failed right before the replication
and was not included in it, it would also be unlikely to win the election.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@google.com>
2024-08-23 16:21:53 +08:00
Yunxiao Du
0a11c4a140
Delete redundant declaration clusterNodeCoversSlot and countKeysInSlot (#930)
Delete redundant declarations; clusterNodeCoversSlot and countKeysInSlot
have already been declared in cluster.h.

Signed-off-by: Yunxiao Du <me@jackdu.cn>
2024-08-23 12:17:27 +08:00
Madelyn Olson
b12668af7a
Revert repl backlog size back to 1mb for dual channel tests (#934)
There is a test that assumes that the backlog will get overrun, but
because of the recent changes to the default it no longer fails. It
seems like it is a bit flaky now though, so resetting the value in the
test back to 1mb. (This relates to the CoB of 1100k. So it should
consistently work with a 1mb limit).

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-22 15:35:28 -07:00
Wen Hui
959dd3485b
Decline unsubscribe related command in non-subscribed mode (#759)
Previously, when clients ran the unsubscribe, sunsubscribe and
punsubscribe commands in non-subscribed mode, the commands returned 0.
This is a bug; we should not allow clients to run these kinds of
commands here.

Thus, this PR fixes this bug, but it is a breaking change for existing
clients.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-22 11:21:33 -04:00
Binbin
8d9b8c9d3d
Make runtest-cluster support --io-threads option (#933)
In #764, we added a --io-threads mode to the tests but forgot
to handle runtest-cluster; they are different frameworks.

Currently runtest-cluster does not support tags, and we
don't plan to support them. Since the cluster tests
do not have any io-threads tests yet, this PR just aligns the
--io-threads option with #764.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-22 11:21:06 -04:00
Binbin
08aaeea4b7
Avoid to re-establish replication if node is already myself primary in CLUSTER REPLICATE (#884)
If n is already myself's primary, there is no need to re-establish the
replication connection.

In the past we allowed a replica node to reconnect with its primary via
this CLUSTER REPLICATE command, and it would use psync. But since #885,
we assume that a full sync is needed in this case, so without this
change the replica would always do a full sync.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@google.com>
2024-08-22 11:00:18 +08:00
uriyage
39f8bcb91b
Skip tracking clients OOM test when I/O threads are enabled (#764)
Fix feedback loop in key eviction with tracking clients when using I/O
threads.

Current issue:
Evicting keys while tracking clients or keyspace notifications exist
creates a feedback loop when using I/O threads:

While evicting keys we send tracking async writes to I/O threads,
preventing immediate release of tracking clients' COB memory
consumption.

Before the I/O thread finishes its write, we recheck used_memory, which
now includes the tracking clients' COB and thus continue to evict more
keys.

**Fix:**
We will skip the test for now while IO threads are active. We may
consider avoiding sending writes in `processPendingWrites` to I/O
threads for tracking clients when we are out of memory.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-21 17:02:57 -07:00
Harkrishn Patro
002d052eef
Update README.md file to reference valkey.io (#931)
Update README.md since the project is no longer under construction, and
can reference the main website.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-08-21 14:19:50 -07:00
Binbin
a1ac459ef1
Set repl-backlog-size from 1mb to 10mb by default (#911)
The repl-backlog-size of 1mb is too small in most cases; network
transmission and bandwidth performance have improved rapidly over the
past ten-plus years.

The bigger the replication backlog, the longer the replica can endure
the disconnect and later be able to perform a partial resynchronization.

Part of #653.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 11:59:02 -04:00
Wen Hui
b8dd4fbbf7
Fix Error in Daily CI -- reply-schemas-validator (#922)
Just add one more test for command "sentinel IS-PRIMARY-DOWN-BY-ADDR" to
make the reply-schemas-validator
run successfully.

Note: test result here
https://github.com/hwware/valkey/actions/runs/10457516111

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-21 09:36:02 -04:00
zhenwei pi
2673320b66
RDMA: Support user keepalive command (#916)
If the client side crashes due to any issue or exits normally, the
kernel will try to disconnect the RDMA QPs. The server-side kernel then
receives CM packets, and valkey-server handles the CM disconnected
event and closes the connection.

However, the RDMA transport layer lacks a keepalive mechanism. Once the
client-side kernel crashes, the server side will not be notified. To
avoid this issue, the valkey server sends a keepalive command
periodically to detect any dead QPs.

An example of mlx-cx5:

```
 # RDMA: CQ handle error status: transport retry counter exceeded[0xc], opcode : 0x0
 # RDMA: CQ handle error status: transport retry counter exceeded[0xc], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
 # RDMA: CQ handle error status: Work Request Flushed Error[0x5], opcode : 0x0
```

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-08-21 10:38:34 +02:00
Binbin
e1b3629186 Fix data loss when replica do a failover with a old history repl offset (#885)
Our current replica can initiate a failover without restriction when
it detects that the primary node is offline. This is generally not a
problem. However, consider the following scenarios:

1. In slot migration, a primary loses its last slot and then becomes
a replica. When it is fully synchronized with the new primary, the new
primary goes down.

2. In CLUSTER REPLICATE command, a replica becomes a replica of another
primary. When it is fully synchronized with the new primary, the new
primary goes down.

In the above scenario, case 1 may cause the empty primary to be elected
as the new primary, resulting in primary data loss. Case 2 may cause the
non-empty replica to be elected as the new primary, resulting in data
loss and confusion.

The reason is that we have cached primary logic, which is used for psync.
In the above scenario, when clusterSetPrimary is called, myself will cache
server.primary in server.cached_primary for psync. In replicationGetReplicaOffset,
we get server.cached_primary->reploff for offset, gossip it and rank it,
which causes the replica to use the old historical offset to initiate
failover, and it gets a good rank, initiates the election first, and then is
elected as the new primary.

The main problem here is that when the replica has not completed full
sync, it may get the historical offset in replicationGetReplicaOffset.

The fix is to clear cached_primary in these places where full sync is
obviously needed, and let the replica use offset == 0 to participate
in the election. In this way, this unhealthy replica has a worse rank
and is not easy to be elected.

Of course, it is possible that it will be elected with offset == 0.
In the future, we may need to prohibit the replica with offset == 0
from having the right to initiate elections.

Another point worth mentioning, in above cases:
1. In the ROLE command, the replica status will be handshake, and the
offset will be -1.
2. Before this PR, in the CLUSTER SHARD command, the replica status will
be online, and the offset will be the old cached value (which is wrong).
3. After this PR, in the CLUSTER SHARD, the replica status will be loading,
and the offset will be 0.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 13:11:21 +08:00
Binbin
829243e76b
Correct RDB_EOF_MARK_SIZE usage where EOF mark is relevant (#925)
In these places we should use RDB_EOF_MARK_SIZE, but we mixed
it up with CONFIG_RUN_ID_SIZE. This is not an issue since both are
40; this is just a cleanup.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-21 00:00:29 +08:00
ranshid
e2ab7ffd89
Make use of a single listNode pointer for blocking utility lists (#919)
Saves some memory (one pointer) in the client struct.

Since a client cannot be blocked multiple times, we can assume
it will be held in only one extra utility list, so it is ok to maintain
a union of these listNode references. 
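
A hedged sketch of the union idea (field names are hypothetical): since
a blocked client sits in at most one of these utility lists at a time,
one pointer slot can serve all of them.

```
union {
    listNode *pending_read_node;         /* only one of these       */
    listNode *clients_waiting_acks_node; /* memberships is active   */
} list_node;                             /* at any given time       */
```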

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-08-20 18:54:53 +08:00
gmbnomis
7795152fff
Fix valgrind timing issue failure in replica-redirect test (#917)
Wait for the replica to become online before starting the actual test.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
2024-08-18 21:20:53 +08:00
Binbin
70b9285802
Optimize linear search of WAIT and WAITAOF when unblocking the client (#787)
Currently, if the client enters a blocked state, it will be
added to the server.clients_waiting_acks list. When the client
is unblocked, that is, when unblockClient is called, we will
need to linearly traverse server.clients_waiting_acks to delete
the client, and this search is O(N).

When WAIT (or WAITAOF) is used extensively in some cases, this
O(N) search may be time-consuming. We can remember the list node,
store it in the blockingState struct, and thereby avoid the
linear search in unblockClientWaitingReplicas.
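
A sketch of the O(1) unblock (the adlist calls are the real API; the
blockingState field name is approximate):

```
/* At block time: remember the node we just inserted. */
listAddNodeHead(server.clients_waiting_acks, c);
c->bstate.client_waiting_acks_list_node = listFirst(server.clients_waiting_acks);

/* At unblock time: delete that node directly, no linear scan. */
listDelNode(server.clients_waiting_acks, c->bstate.client_waiting_acks_list_node);
```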

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-18 21:20:35 +08:00
Wen Hui
33c7ca41be
Add 4 commands for sentinel and update most test cases and json files (#789)
Add 4 new commands for Sentinel (reference
https://github.com/valkey-io/valkey/issues/36)

Sentinel GET-PRIMARY-ADDR-BY-NAME
Sentinel PRIMARY
Sentinel PRIMARIES
Sentinel IS-PRIMARY-DOWN-BY-ADDR

and deprecate 4 old commands:

Sentinel GET-MASTER-ADDR-BY-NAME
Sentinel MASTER
Sentinel MASTERS
Sentinel IS-MASTER-DOWN-BY-ADDR

and all sentinel tests pass here
https://github.com/hwware/valkey/actions/runs/9962102363/job/27525124583

Note: 

1. runtest-sentinel pass all test cases
2. I finished a sentinel rolling upgrade test: 1 primary 2 replicas 3
sentinel
   there are 4 steps in this test scenario: 
step 1: all 3 sentinel nodes run old sentinel, shutdown primary, and
then new primary can be voted successfully.
step 2: replace sentinel 1 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 3: replace sentinel 2 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
step 4: replace sentinel 3 with new sentinel bin file, and then shutdown
primary, and then another new primary can be voted successfully
   
We can see that even with mixed-version sentinels running, the whole
system still works.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-16 09:46:36 -04:00
DarrenJiang13
adf53c212b
Add lfu support for DEBUG OBJECT command, added lfu_freq and lfu_access_time_minutes fields (#479)
For the `debug object` command, we used `val->lru` but ignored `lfu` mode.
So in `lfu` mode, `debug object` would return meaningless `lru` descriptions.

Added two new fields lfu_freq and lfu_access_time_minutes.

Signed-off-by: jiangyujie.jyj <yjjiang1996@163.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-08-16 17:49:46 +08:00
Binbin
fc9f291033
Make a light weight version of DEBUG OBJECT, add FAST option (#881)
Adding FAST option to DEBUG OBJECT command.

The light version only shows lightweight information,
which is mostly O(1). The pre-existing version shows more
stats, such as serializedlength, and is sometimes time-consuming.

This should allow looking into debug stats (e.g. a key expired
but not deleted) even on huge objects, on which we're afraid
to run the command for fear of causing a server freeze.

Somehow like 3ca451c46fed894bf49e7561fa0282d2583f1c06.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-16 10:18:36 +08:00
Binbin
76ad8f7a76
Skip IPv6 tests when TCLSH version is < 8.6 (#910)
In #786, we skipped it in the daily run, but not for the others.
When running ./runtest on macOS, we get this failure:
```
couldn't open socket: host is unreachable (nodename nor servname provided, or not known)
```

The reason is that TCL 8.5 doesn't support ipv6, so we skip tests
tagged with ipv6. This also revert #786.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-15 15:11:38 +08:00
secwall
103365fe0e
Log unexpected $ENDOFF responses in dual channel replication (#839)
I tried to test dual channel replication but forgot to add +sync
for my replication user. As a result, the replica entered a silent
cycle like this:
```
* Connecting to PRIMARY 127.0.0.1:6379
* PRIMARY <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Primary replied to PING, replication can continue...
* Trying a partial resynchronization (request ...)
* PSYNC is not possible, initialize RDB channel.
* Aborting dual channel sync
```

And the primary got an endless cycle like this:
```
* Replica 127.0.0.1:6380 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '...', my replication IDs are '...' and '...')
* Replica 127.0.0.1:6380 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
```

There was no way to tell at the notice log level that the replication
user was missing the +sync ACL. With this one-line change we get a
warning message in the replica log.

---------

Signed-off-by: secwall <secwall@yandex-team.ru>
2024-08-14 22:00:57 -07:00
Pieter Cailliau
4d284daefd
Copyright update to reflect IP transfer from salvatore to Redis (#740)
Update references of copyright being assigned to Salvatore when it was
transferred to Redis Ltd. as per
https://github.com/valkey-io/valkey/issues/544.

---------

Signed-off-by: Pieter Cailliau <pieter@redis.com>
2024-08-14 09:20:36 -07:00
Salvatore Mesoraca
68b2270947
Prevent later accesses to unallocated memory (#907)
A pointer to dtype is stored in the dict forever.
dtype is stack-allocated while the dict created is global.
The dict (and the pointer to dtype in it) will live past the lifetime of
dtype.
clusterManagerLinkDictType is a global object that has the same values
as dtype.
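
The bug pattern in miniature (a sketch, not the actual valkey-cli code):
a long-lived structure keeps a pointer to a stack-allocated object past
the end of its frame.

```
static dict *global_d;

void setup(void) {
    dictType dtype = { /* ...function pointers... */ };
    global_d = dictCreate(&dtype); /* the dict stores &dtype forever */
} /* dtype's lifetime ends here; global_d->type now dangles */
```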

Signed-off-by: Salvatore Mesoraca <salvatore.mesoraca@aiven.io>
2024-08-14 09:03:27 -07:00
zhaozhao.zz
131857e80a
To avoid bouncing -REDIRECT during FAILOVER (#871)
Fix #821

During the `FAILOVER` process, when conditions are met (such as when the
force time is reached or the primary and replica offsets are
consistent), the primary actively becomes the replica and transitions to
the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica,
and after handshaking and other operations, it will eventually send the
`PSYNC FAILOVER` command to the replica, after which the replica will
become the primary. This means that the upgrade of the replica to the
primary is an asynchronous operation, which implies that during the
`FAILOVER_IN_PROGRESS` state, there may be a period of time where both
nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the
request will be redirected to the replica and then redirected back,
causing back and forth redirection. To avoid this situation, during the
`FAILOVER_IN_PROGRESS` state, we temporarily suspend the clients that
need to be redirected until the replica truly becomes the primary, and
then resume the execution.

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-08-14 14:04:29 +08:00
Binbin
370bdb3e46
Change server.daylight_active to an atomic variable (#876)
We update this variable in the main thread, while the
child threads can print logs at the same time. This
generates a warning under SANITIZER=thread:
```
WARNING: ThreadSanitizer: data race (pid=74208)
  Read of size 4 at 0x000102875c10 by thread T3:
    #0 serverLogRaw <null>:52173615 (valkey-server:x86_64+0x10003c556)
    #1 _serverLog <null>:52173615 (valkey-server:x86_64+0x10003ca89)
    #2 bioProcessBackgroundJobs <null>:52173615 (valkey-server:x86_64+0x1001402c9)

  Previous write of size 4 at 0x000102875c10 by main thread (mutexes: write M0):
    #0 afterSleep <null>:52173615 (valkey-server:x86_64+0x10004989b)
    #1 aeProcessEvents <null>:52173615 (valkey-server:x86_64+0x100031e52)
    #2 main <null>:52173615 (valkey-server:x86_64+0x100064a3c)
    #3 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365)
    #4 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365)
```

The refresh of daylight_active is not real-time; we update
it in afterSleep, so we don't need strong synchronization and
can use memory_order_relaxed. Note also that we only do
load/store operations on daylight_active, which is an
aligned 32-bit integer, so using memory_order_relaxed will
not provide more consistency than what we have today.

So this is just a cleanup to clear the warning.
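
For reference, a minimal C11 sketch of the pattern (illustrative, not
the actual diff):

```
#include <stdatomic.h>

static _Atomic int daylight_active;

/* Relaxed atomics make the race defined behavior without ordering
 * costs; fine for a flag that tolerates slightly stale reads. */
void set_daylight(int v) {
    atomic_store_explicit(&daylight_active, v, memory_order_relaxed);
}
int get_daylight(void) {
    return atomic_load_explicit(&daylight_active, memory_order_relaxed);
}
```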

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-14 13:08:20 +08:00
Amit Nagler
6cb86fff51
Fix dual-channel replication test under valgrind (#904)
The test `dual-channel-replication primary gets cob overrun during
replica rdb load` fails during the Valgrind run. This is due to the load
handlers disconnecting before the tests complete, resulting in a low
primary COB. Increasing the handlers' timeout should resolve this issue.

Failure:
https://github.com/valkey-io/valkey/actions/runs/10361286333/job/28681321393

Server logs reveal that the load handler clients were disconnected
before the test started.

Also, the two previous tests took about 20 seconds, which is the
handler timeout.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-08-13 10:40:19 -07:00
Binbin
f622e375a0
Better messages when valkey-cli cluster --fix meet value check failed (#867)
The clusterManagerCompareKeysValues is introduced in
143bfa1e6e65cf8be1eaad0b8169e2d95ca62f9a, which calls
DEBUG DIGEST-VALUE to check whether the value of the
source node and the target node are consistent.

However, the DEBUG DIGEST-VALUE command is not supported
in older versions, such as version 4, where the node will return
an unknown subcommand error. Or the DEBUG command may be disabled
in version 7, and the node will return DEBUG not allowed.

In these cases, we need to output a friendly message to
allow users to proceed to the next step, instead of just
outputting `Value check failed!`.

Unknown subcommand example:
```
*** Target key exists
*** Checking key values on both nodes...
Node 127.0.0.1:30001 replied with error:
ERR unknown subcommand or wrong number of arguments for 'DIGEST-VALUE'. Try DEBUG HELP.
Node 127.0.0.1:30003 replied with error:
ERR unknown subcommand or wrong number of arguments for 'DIGEST-VALUE'. Try DEBUG HELP.
*** Value check failed!
DEBUG DIGEST-VALUE command is not supported.
You can relaunch the command with --cluster-replace option to force key overriding.
```

DEBUG not allowed example:
```
*** Target key exists
*** Checking key values on both nodes...
Node 127.0.0.1:30001 replied with error:
ERR DEBUG command not allowed. If the enable-debug-command option is ...
Node 127.0.0.1:30003 replied with error:
ERR DEBUG command not allowed. If the enable-debug-command option is ...
*** Value check failed!
DEBUG command is not allowed.
You can turn on the enable-debug-command option.
Or you can relaunch the command with --cluster-replace option to force key overriding.
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-13 19:25:14 +08:00
Rayacoo
76f809bc19
Optimize ZUNION[STORE] command by removing unnecessary accumulator dict (#829)
In the past implementation of `ZUNION` and `ZUNIONSTORE` commands, we
first create a temporary dict called `accumulator`. After adding all
member-score mappings to `accumulator`, we still need to convert
`accumulator` back to the final dict `dstzset->dict`. However, we can
directly use `dstzset->dict` to avoid the additional copy operation.

This PR removes the `accumulator` dict and directly uses `dstzset->dict`
to store the member-score mappings.
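As a hedged sketch of the pattern using the generic dict API (not the actual t_zset.c code, which stores scores differently):

```c
/* Accumulate a member's score directly in the destination dict instead
 * of staging it in a temporary `accumulator` dict that would later be
 * copied into dstzset->dict. */
void addOrAccumulate(dict *dst, sds member, double score) {
    dictEntry *de = dictFind(dst, member);
    if (de == NULL) {
        de = dictAddRaw(dst, member, NULL); /* new member */
        dictSetDoubleVal(de, score);
    } else {
        dictSetDoubleVal(de, dictGetDoubleVal(de) + score); /* existing */
    }
}
```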

- **Test**
First, I added 300 unique elements to two sorted sets called
'zunion_test1' and 'zunion_test2'. Then, I tested `zunion` and
`zunionstore` on these two sorted sets. The test results shown below
indicate that the performance of both zunion and zunionstore improved by
about 31%.

### ZUNION
#### unstable
```
./valkey-benchmark -P 10 -n 100000 zunion 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 2713.41 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      146.252     3.464   153.343   182.015   184.959   192.895
```
#### This PR
```
./valkey-benchmark -P 10 -n 100000 zunion 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 3543.84 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      108.259     2.984   114.239   141.695   145.151   160.255
```
### ZUNIONSTORE
#### unstable
```
./valkey-benchmark -P 10 -n 100000 zunionstore out 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 3168.07 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      157.511     3.368   183.167   189.311   193.535   231.679
```
#### This PR
```
./valkey-benchmark -P 10 -n 100000 zunionstore out 2 zunion_test1 zunion_test2

Summary:
  throughput summary: 4144.73 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      120.374     2.648   141.823   149.119   153.855   183.167
```

---------

Signed-off-by: RayCao <zisong.cw@alibaba-inc.com>
Signed-off-by: zisong.cw <zisong.cw@alibaba-inc.com>
2024-08-13 16:50:57 +08:00
Eran Liberty
6dfb8203cc
Add debug-context config (#874)
A configuration option that has zero impact on server operation but is
printed out on server crash and can be accessed by gdb for debugging. It
can be used by the user/operator to store any free-form string. This
string persists as long as the server is running and is accessible in
the following ways:

Printed in crash reports:
```
------ CONFIG DEBUG OUTPUT ------
lazyfree-lazy-eviction no
...
io-threads-do-reads yes
debug-context "test2"
proto-max-bulk-len 512mb
```
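Since it behaves like any other string config, it should also be settable and readable at runtime; for illustration (the value matches the crash output above):

```
127.0.0.1:6379> config set debug-context "test2"
OK
127.0.0.1:6379> config get debug-context
1) "debug-context"
2) "test2"
```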

---------

Signed-off-by: Eran Liberty <eranl@amazon.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
2024-08-12 16:33:23 -07:00
naglera
27fce29500
Fix dual-channel-replication related issues (#837)
- Fix a TLS bug where the connection was shut down by the primary's main
process while the child process was still writing, causing the main
process to be blocked.
- TLS connection fix: file descriptors are set to blocking mode in the
main thread, followed by a blocking write. This change sets the file
descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`)
(@xbasel).
- Improve the reliability of dual-channel tests. Modify the pause
mechanism to verify process status directly, rather than relying on logs.
- Ensure that `server.repl_offset` and `server.replid` are updated
correctly when dual channel synchronization completes successfully.
Previously this led to failures in replication tests that validate
replication IDs or compare replication offsets.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Signed-off-by: xbasel <103044017+xbasel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-08-12 13:03:12 -07:00
naglera
1c198a95ac
Add debug assert on duplicate freeClientAsync (#896)
When debug assert mode is enabled, verify that we don't insert the same
client twice into server.clients_to_close.
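A sketch of where such an assertion could sit (assumed shape, not the exact patch):

```c
/* Queue a client for asynchronous close. In debug-assert builds, verify
 * the client is not already present on server.clients_to_close before
 * appending it a second time. */
void freeClientAsync(client *c) {
    if (c->flags & (CLIENT_CLOSE_ASAP | CLIENT_SCRIPT)) return;
    debugServerAssertWithInfo(c, NULL, listSearchKey(server.clients_to_close, c) == NULL);
    c->flags |= CLIENT_CLOSE_ASAP;
    listAddNodeTail(server.clients_to_close, c);
}
```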

Signed-off-by: naglera <anagler123@gmail.com>
2024-08-12 12:44:53 -07:00
Binbin
929283fc6f
Dual channel replication should not update lastbgsave_status when transfer error (#811)
Currently lastbgsave_status is used in bgsave or disk-replication,
and the target is the disk. In #60, we updated it on transfer errors;
I think that was mainly used in tests, so we can use a log instead.

That change set lastbgsave_status to err in this case, but it is strange
that it does not set ok or err in the preceding if and the following else.
Also note this affects stop-writes-on-bgsave-error.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-12 11:25:55 -07:00
Binbin
5166d489da
Correctly record client information to the slowlog when running script (#805)
Currently, when we run a script, we pass in a fake client.
So if a command executed in the script is recorded in the slowlog,
the client's ip:port and name information will be empty.

before:
```
127.0.0.1:6379> client setname myclient
OK
127.0.0.1:6379> config set slowlog-log-slower-than 0
OK
127.0.0.1:6379> eval "redis.call('ping')" 0
(nil)
127.0.0.1:6379> slowlog get 2
1) 1) (integer) 2
   2) (integer) 1721314289
   3) (integer) 96
   4) 1) "eval"
      2) "redis.call('ping')"
      3) "0"
   5) "127.0.0.1:61106"
   6) "myclient"
2) 1) (integer) 1
   2) (integer) 1721314289
   3) (integer) 4
   4) 1) "ping"
   5) ""
   6) ""
```

after:
```
127.0.0.1:6379> client setname myclient
OK
127.0.0.1:6379> config set slowlog-log-slower-than 0
OK
127.0.0.1:6379> eval "redis.call('ping')" 0
(nil)
127.0.0.1:6379> slowlog get 2
1) 1) (integer) 2
   2) (integer) 1721314371
   3) (integer) 53
   4) 1) "eval"
      2) "redis.call('ping')"
      3) "0"
   5) "127.0.0.1:51983"
   6) "myclient"
2) 1) (integer) 1
   2) (integer) 1721314371
   3) (integer) 1
   4) 1) "ping"
   5) "127.0.0.1:51983"
   6) "myclient"
```
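A hedged sketch of the idea (helper names taken from script.c; not necessarily the exact patch):

```c
/* Commands run from a script execute on a fake client that has no peer
 * address or name, so attribute the slowlog entry to the real caller. */
client *logclient = scriptIsRunning() ? scriptGetCaller() : c;
slowlogPushEntryIfNeeded(logclient, c->argv, c->argc, duration);
```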

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-10 23:46:56 +08:00
Harkrishn Patro
7424620ca0
Check if the server is currently running the feature before cron run (#838)
I think we should first check whether the server is currently running in
cluster mode, or has modules loaded, prior to the throttled cron run
(`run_with_period`) condition.
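As a sketch, the reordering looks roughly like this (illustrative only; the actual serverCron code differs):

```c
/* Gate on the cheap feature checks first, instead of entering the
 * throttled block and then discovering there is nothing to do. */
if (server.cluster_enabled || moduleCount()) {
    run_with_period(100) {
        /* ... per-feature cron work ... */
    }
}
```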

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-08-08 13:28:45 -07:00
Harkrishn Patro
109cc21267
Assert network bytes out for replication slot stat computation is only allowed on primary (#847)
Added an assertion to avoid incorrect usage of the network bytes out for
replication code flow in slot stats computation.

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-08-07 16:14:16 -07:00
Binbin
380f700816
Improve cluster cant failover log conditions (#780)
This PR adjusts the logging conditions of clusterLogCantFailover
in these two ways.

1. For the same cant_failover_reason, we print the log once
per CLUSTER_CANT_FAILOVER_RELOG_PERIOD, but its value of 10s is
a bit long; shorten it to 1s so we can better track its state.
We get to see the system making progress by watching the message.
Using 1s also covers pretty much all cases, as I don't see a reason
for using a <1s node timeout, in test or prod.

2. We do not print logs before nolog_fail_time, whose value
is cluster-node-timeout+5000. This may cause us to lose some logs;
for example, if cluster-node-timeout is small, auth_timeout will
be 2000 and auth_retry_time will be 4000. In this case, we will
lose all the reasons during the election if the failover times out.
So remove the nolog_fail_time logic; since we still have the
CLUSTER_CANT_FAILOVER_RELOG_PERIOD logic, we won't print too many
logs.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-06 21:14:18 +08:00
Yury-Fridlyand
bfdab65791
Fix CI concurrency (#849)
A few CI improvements which will reduce CI queue occupation and
eliminate stale runs.

1. Kill CI jobs on PRs once the PR branch gets a new push. This prevents
the situation that happened today: a huge job triggered twice in less
than an hour and occupied the entire **org** runners queue (for all
repositories) for the rest of the day (see pic). This completely blocked
the valkey-glide team.
2. Distribute the nightly cron jobs over time to prevent them from
running together. Keep in mind that cron's TZ is UTC, so midnight tasks
land on developers located in other timezones.

This must be backported to all release branches (`valkey-x.y` and `x.y`)

![image](https://github.com/user-attachments/assets/923d8237-3cb7-42f5-80c8-5322b3f5187d)

---------

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
2024-08-05 22:05:29 -07:00
Harkrishn Patro
0fc43edc6c
Update sentinel conf access string to allow hello channel access (#854)
The example of a minimal user account for Sentinel in your Valkey
server is incorrect. If you add this ACL as-is to your
valkey users.acl, valkey adds resetchannels -@all before
the +client, which prevents sentinel from publishing messages
to the __sentinel__:hello pubsub channel used for sentinel discovery.
Fix #744.
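For illustration, the key point is granting the hello channel explicitly; a hypothetical minimal sketch (user name, password and command list are placeholders, not the exact sentinel.conf text):

```
user sentinel-user on >somepassword &__sentinel__:hello +client +subscribe +publish +ping +info
```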

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-08-03 23:32:53 +08:00
Wen Hui
facd123ce6
Update redis.conf to valkey.conf in log message (#855)
Update redis.conf to valkey.conf

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-08-03 23:30:55 +08:00
Binbin
054ffd140f
Fix outdated comment of migrate in valkey-cli --cluster (#864)
After 503fd229e4181e932ba74b3ca8a222712d80ebca the comment is outdated.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-08-03 14:13:37 +08:00
Madelyn Olson
b728e4170f
Disable empty shard slot migration test until test is de-flaked (#859)
We have a number of test failures in the empty shard migration which
seem to be related to race conditions in the failover, but could be more
pervasive. For now disable the tests to prevent so many false negative
test failures.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-31 16:52:20 -07:00
Madelyn Olson
4b8de6b1be
Update replica version comparison to handle version 8 RC candidates (#851)
Release candidates have a version that is lower than 8.0.0, to allow
8.0.0 to have 0x080000 as a release number. However, we did an explicit
check that a version was 8.0 or greater to validate that a replica
supports a feature. Now we use the highest patch version of the latest
minor to do the comparison, to accommodate future versions.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-31 10:01:48 -07:00
uriyage
1d18842074
Fix bug in writeToClient (#834)
Fix bug in writeToClient
In https://github.com/valkey-io/valkey/pull/758, a major refactor was
done to `networking.c`.

As part of this refactor, a new bug was introduced: we don't advance the
`c->buf` pointer in repeated writes.

This bug should be very unlikely to manifest, as it requires the
client's TCP buffer to be filled in the first try and then released
immediately after in the second try.

Despite all my efforts to reproduce this scenario, I was unable to do
so.
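The invariant at stake, as a sketch (simplified from the actual write path; field names as in the client struct):

```c
/* On a short write the next attempt must resume past the bytes already
 * sent; if c->buf is not advanced by sentlen, a repeated write resends
 * the same prefix of the buffer. */
ssize_t nwritten = connWrite(c->conn, c->buf + c->sentlen, c->bufpos - c->sentlen);
if (nwritten > 0) c->sentlen += nwritten;
```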

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-07-30 21:54:18 -07:00
Binbin
fa238dc049
Update dir in valkey.conf to mention cluster-config-file (#635)
I think it is a good idea to mention this.

The cluster config file is written relative to this directory, if the
'cluster-config-file' configuration directive is a relative path.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-30 21:13:54 +08:00
Harkrishn Patro
aaa7834362
Handle underflow condition of network out slot stats metric (#840)
Fixes test failure
(https://github.com/valkey-io/valkey/actions/runs/10146979329/job/28056316421?pr=837)
on 32 bit system for slot stats metric underflow on the following
condition:

```
server.cluster->slot_stats[c->slot].network_bytes_out += (len * listLength(server.replicas));
```
* Here `listLength` returns an `unsigned long`, which is multiplied with
`len` (which could be negative). This is a risky operation and behaves
differently based on the architecture.

```
clusterSlotStatsAddNetworkBytesOutForReplication(-sdslen(selectcmd->ptr));
```
* The `sdslen` method returns `size_t`; applying the `-` operation to
decrement network bytes out is also incorrect.

This change adds an assertion on `len` being negative and handles
wrapping of the overall value.
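A standalone illustration of the hazard and the guard (plain C, not the valkey code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t network_bytes_out = 100; /* per-slot stat, unsigned */
    long long len = -64;              /* e.g. -(long long)sdslen(...) */

    /* Guard modeled on the fix: never let the unsigned stat wrap. */
    assert(len >= 0 || (uint64_t)(-len) <= network_bytes_out);

    /* Without the guard this wraps to a huge value when it would go
     * below zero; with it, the decrement stays well defined. */
    network_bytes_out += (uint64_t)len;
    printf("%llu\n", (unsigned long long)network_bytes_out); /* 36 */
    return 0;
}
```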

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-07-29 21:50:46 -07:00
Madelyn Olson
b4d96caa78
Remove static to avoid compiler warning (#836) 2024-07-28 13:08:09 -07:00
naglera
9211aed72e
Improve reliability of dual-channel-replication pause resume tests (#835)
Update the dual-channel-replication tests to wait for the pause to begin
before attempting to unpause.

---------

Signed-off-by: naglera <anagler123@gmail.com>
2024-07-28 11:14:56 -07:00
Binbin
e32518d655
Fix unexpected behavior when turning appendonly on and off within a transaction (#826)
If we do `config set appendonly yes` and `config set appendonly no`
in a multi, there is some unexpected behavior.

When doing appendonly yes, we schedule an AOFRW, and when we do
appendonly no, we call stopAppendOnly to stop it.
In stopAppendOnly, the aof_fd is -1 since the AOF has not started yet,
and fsync and close are called with -1, so they
all fail with EBADF. stopAppendOnly will also emit a server log; the
close(-1) should be no problem, but it is still undefined behavior.

This PR also adds a log `Background append only file rewriting
scheduled.` to bgrewriteaofCommand when the rewrite is scheduled,
and adds a log in stopAppendOnly when a scheduled AOF is canceled;
it prints `AOF was disabled but there is a scheduled AOF background, cancel it.`

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-27 18:38:23 +08:00
Kyle Kim (kimkyle@)
e1d936b339
Add network-bytes-in and network-bytes-out metric support under CLUSTER SLOT-STATS command (#20) (#720)
Adds two new metrics for per-slot statistics, network-bytes-in and
network-bytes-out. The network bytes are inclusive of replication bytes
but exclude other types of network traffic such as clusterbus traffic.

#### network-bytes-in
The metric tracks network ingress bytes under per-slot context, by
reverse calculation of `c->argv_len_sum` and `c->argc`, stored under a
newly introduced field `c->net_input_bytes_curr_cmd`.

#### network-bytes-out
The metric tracks network egress bytes under per-slot context, by
hooking onto COB buffer mutations.

#### sample response
Both metrics are reported under the `CLUSTER SLOT-STATS` command.
```
127.0.0.1:6379> cluster slot-stats slotsrange 0 0
1) 1) (integer) 0
    2) 1) "key-count"
       2) (integer) 0
       3) "cpu-usec"
       4) (integer) 0
       5) "network-bytes-in"
       6) (integer) 0
       7) "network-bytes-out"
       8) (integer) 0
```

---------

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-26 16:06:16 -07:00
Roshan Khatri
e745e9c240
Adds Light-weight cluster bus header for pubsub message. (#654)
Adds light-weight cluster bus header for pubsub message. Closes #557.

This also supports sending non-light messages to, and receiving them
from, older versions of the engine.

The light-weight cluster bus message supports multiple pubsub messages
(payloads) for one pubsub channel. Receiving messages with multiple
payloads is supported but we're not yet sending such multi-payload
messages to other nodes.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-07-26 10:49:18 -07:00
naglera
48ca2c9176
Improve dual channel replication stability and fix compatibility issues (#804)
Introduce several changes to improve the stability of dual-channel
replication and fix compatibility issues.

1. Make dual-channel-replication tests more reliable: use pause instead
of forced sleep.
2. Fix race conditions when freeing RDB client.
3. Check if sync was stopped during local buffer streaming.
4. Fix $ENDOFFSET reply format to work on 32-bit machines too.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-25 09:34:39 -07:00
zisong.cw
da286a599d
Optimize the logic for checking the conversion of zset from listpack to skiplist during the ZADD operation. (#806)
During the ZADD operation, a conversion from listpack to skiplist might
be necessary for the sorted set. Currently, the function
zsetTypeMaybeConvert only examines the number of elements but does not
check the max size of the elements. It is advisable to include a check
on value_len_hint for a more robust conversion check mechanism.
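A hedged sketch of the extra check (names follow t_zset.c conventions but are assumptions here):

```c
/* Convert a listpack-encoded zset to a skiplist when either the incoming
 * element count or the largest incoming value would exceed the listpack
 * limits, instead of looking at the count alone. */
void zsetTypeMaybeConvert(robj *zobj, size_t cap, size_t value_len_hint) {
    if (zobj->encoding == OBJ_ENCODING_LISTPACK &&
        (cap > server.zset_max_listpack_entries ||
         value_len_hint > server.zset_max_listpack_value))
        zsetConvertAndExpand(zobj, OBJ_ENCODING_SKIPLIST, cap);
}
```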

---------

Signed-off-by: RayCao <zisong.cw@alibaba-inc.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-07-25 17:46:00 +08:00
ranshid
5c073a58e4
Increase rioConnsetWrite max chunk size to 16K (#817)
Fixes #796

Currently rioConnsetWrite uses a 1024-byte chunk when feeding the
replication sockets on RDB write. This value seems too small and we end
up with high syscall overhead.
**This PR sets the max chunk size to 16K.**

Using a simple test program, we did not observe any significant
improvement in read/write times with chunks bigger than 4K, but
that might be a bottleneck on network throughput. We did observe a sweet
spot in CPU utilization at 16K when using TLS.

```
lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04 LTS
Release:	24.04
Codename:	noble
```

```
uname -a
Linux ip-172-31-22-140 6.8.0-1009-aws #9-Ubuntu SMP Fri May 17 14:39:23 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```
All files were compiled with O3 optimization level.
```
gcc --version
gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0
```

**Results:**

Chunk Size | write-time sec | writes total | write cpu-time (usr+sys) | read-time sec | read total syscalls | read cpu-time (usr+sys)
-- | -- | -- | -- | -- | -- | --
1K | 0.162946 | 102400 | 0.185916 | 0.168479 | 2447 | 0.026945
4K | 0.163036 | 25600 | 0.122629 | 0.168627 | 715 | 0.023382
8K | 0.163942 | 12800 | 0.121131 | 0.168887 | 704 | 0.039388
16K | 0.163614 | 6400 | 0.104742 | 0.168202 | 2483 | 0.025574
64K | 0.16279 | 1600 | 0.098792 | 0.168854 | 1068 | 0.046929
1K - TLS | 0.32648 | 102400 | 0.366961 | 0.330785 | 102400 | 0.337377
4K - TLS | 0.164296 | 25600 | 0.183326 | 0.169032 | 25600 | 0.129952
8K - TLS | 0.163977 | 12800 | 0.163118 | 0.169484 | 12800 | 0.098432
16K - TLS | 0.164861 | 6400 | 0.150666 | 0.169878 | 6383 | 0.094794
64K - TLS | 0.163704 | 6400 | 0.156125 | 0.169323 | 6388 | 0.089971
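For illustration, the chunking pattern this implies, sketched over the rio API (`RIOCONN_MAX_CHUNK` is an assumed name for the new constant):

```c
#define RIOCONN_MAX_CHUNK (16 * 1024) /* sweet spot per the table above */

/* Feed buf to the replication sockets in chunks of at most 16K, so a
 * large RDB payload is neither one giant write nor thousands of 1K
 * syscalls. Returns 0 on error, 1 on success, like rioWrite. */
static int writeChunked(rio *r, const char *buf, size_t len) {
    while (len > 0) {
        size_t chunk = len < RIOCONN_MAX_CHUNK ? len : RIOCONN_MAX_CHUNK;
        if (rioWrite(r, buf, chunk) == 0) return 0;
        buf += chunk;
        len -= chunk;
    }
    return 1;
}
```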

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-07-24 08:23:06 -07:00
uriyage
3cca26881a
Improve CPU usage in tests with IO threads (#823)
Currently, when running tests with IO threads, we set the
`events-per-io-thread` config to 0. This activated IO threads 100% of
the time, regardless of the number of IO events.

This caused issues with tests running multiple server instances, as
it drained machine CPU resources.
long runtimes, especially on limited instances.
For example, in
https://github.com/valkey-io/valkey/actions/runs/10066315827/job/27827426986?pr=804,
the `Cluster consistency during live resharding` test ran for 1 hour and
41 minutes.

This PR addresses the issue by:
1. Deactivating IO threads when there are no IO events
2. Continuing to offload all IO events to IO threads

Tested on a 16-core instance: after implementing these changes, the
runtime for the `Cluster consistency during live resharding` test
dropped from 7 minutes and 14 seconds to 3 minutes and 28 seconds.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-07-23 22:24:41 -07:00
Binbin
f00c8f6214
Modify clusterSaveConfig function call to use C_OK / C_ERR return value (#818)
Minor cleanups.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-24 09:58:44 +08:00
Binbin
59aa00823c
Replicas with the same offset queue up for election (#762)
In some cases, like a read-more-than-write scenario, the replication
offsets of the replicas are the same. When the primary fails, the
replicas have the same ranking (rank == 0). They issue the election
at the same time (although we add a random delay of up to 500ms), and
the simultaneous elections may lead to the failure of the election due
to quorum.

In clusterGetReplicaRank, when we calculate the rank, if the offsets
are the same, the one with the smaller node name gets a better
rank to avoid this situation.
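A sketch of the tie-break, close to the shape of clusterGetReplicaRank (field names are assumptions):

```c
/* Rank replicas by replication offset; on equal offsets, the replica
 * with the lexicographically smaller node name gets the better (lower)
 * rank, so equal replicas no longer start elections simultaneously. */
int clusterGetReplicaRank(void) {
    int rank = 0;
    clusterNode *primary = myself->replicaof;
    long long myoffset = replicationGetReplicaOffset();
    for (int j = 0; j < primary->num_replicas; j++) {
        clusterNode *r = primary->replicas[j];
        if (r == myself || nodeFailed(r)) continue;
        if (r->repl_offset > myoffset) rank++;
        else if (r->repl_offset == myoffset &&
                 memcmp(r->name, myself->name, CLUSTER_NAMELEN) < 0)
            rank++;
    }
    return rank;
}
```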

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-22 23:43:16 -07:00
Binbin
9a7bf910cb
Fix extra reply in debug sleep-after-fork-seconds error path (#810)
The getDoubleFromObjectOrReply already add the error reply when
C_ERR, remove this extra error reply.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-23 12:48:23 +08:00
Kyle Kim (kimkyle@)
5000c050b5
Add cpu-usec metric support under CLUSTER SLOT-STATS command (#20). (#712)
The metric tracks cpu time in micro-seconds, sharing the same value as
`INFO COMMANDSTATS`, aggregated under per-slot context.

---------

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-22 18:03:28 -07:00
Madelyn Olson
ccafbb750b
Added a flag to run additional tests (#815)
This PR allows running a subset of the daily tests with a PR by
attaching the `run-extra-tests` flag. This is done by conditionally
running the daily tests when the label is attached. (I will do that for
this PR to demonstrate).

One downside of this PR is that a lot of tests will forever show up as
"skipped" for most PRs; as long as that doesn't bother us, it should be
OK. Skipped tests don't take up any of our runner compute.

Another note: if the label isn't attached on the first commit, the
submitter will need to push something to get the tests to run again.
There is a way to kick off the tests when a label is added, but that
added a bunch more complexity, so I just wanted to start with this.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-22 17:44:18 -07:00
Binbin
14e09e981e
Fix the wrong woff when executing WAIT / WAITAOF in script (#776)
When executing the script, the client passed in is a fake
client, and its woff is always 0.

This results in woff always being 0 when executing wait/waitaof
in the script, and the command returns a wrong number.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-22 10:33:10 +02:00
mwish
6eb19cf38e
Typo fix in hyperloglog.c (#807)
Change from hypreloglog to hyperloglog

Signed-off-by: mwish <maplewish117@gmail.com>
2024-07-20 23:48:00 +02:00
Binbin
15a8290231
Optimize failover time when the new primary node is down again (#782)
We do not reset failover_auth_time after setting it; it is used
to check auth_timeout and auth_retry_time, but we should at least
reset it after a successful failover.

Let's assume the following scenario:
1. Two replicas initiate an election.
2. Replica 1 is elected as the primary node, and replica 2 does not have
   enough votes.
3. Replica 1 goes down, i.e. the new primary node goes down again in a
   short time.
4. Replica 2 knows that the new primary node is down and wants to
   initiate a failover, but because the failover_auth_time of the
   previous round has not been reset, it needs to wait for it to time
   out and then wait for the next retry time, which takes
   cluster-node-timeout * 4; this adds a lot of delay.

There is another problem. We add extra random time to
failover_auth_time, such as a random 500ms plus 1s per replica rank. If
replica 2 receives a PONG from the new primary node before sending the
FAILOVER_AUTH_REQUEST, that is, before failover_auth_time, it will turn
itself back into a replica. If the new primary node goes down again at
this point, replica 2 will use the previous failover_auth_time to
initiate an election, instead of going through the random 500ms and
replica-ranking logic again, which may lead to unexpected consequences
(for example, a low-ranking replica initiates an election and becomes
the new primary node).

That is, we need to reset failover_auth_time at the appropriate time.
When the replica switches to a new primary, we reset it, because the
existing failover_auth_time is already out of date in this case.
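A minimal sketch of where the reset lands (assumed function and field names):

```c
/* When this node starts replicating the newly promoted primary, the
 * election deadline from the previous round is stale; clear it so a
 * future failover recomputes the rank and random delay from scratch. */
void clusterSetPrimary(clusterNode *n) {
    /* ... existing replication setup ... */
    server.cluster->failover_auth_time = 0;
}
```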

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-19 15:27:49 -04:00
Harkrishn Patro
816accea76
Generate correct slot information in cluster shards command on primary failure (#790)
Fix #784 

Prior to the change, `CLUSTER SHARDS` command processing might pick a
failed primary node which won't have the slot coverage information and
the slots `output` in turn would be empty. This change finds an
appropriate node which has the slot coverage information served by a
given shard and correctly displays it as part of `CLUSTER SHARDS`
output.

 Before:
 
 ```
 1) 1) "slots"
   2)  (empty array)
   3) "nodes"
   4) 1)  1) "id"
          2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
          ...
          9) "role"
         10) "master"
         ...
         13) "health"
         14) "fail"
 ```
 
 After:
 

 ```
 1) 1) "slots"
   2) 1) 0
       2) 5461
   3) "nodes"
   4) 1)  1) "id"
          2) "2936f22a490095a0a851b7956b0a88f2b67a5d44"
          ...
          9) "role"
         10) "master"
         ...
         13) "health"
         14) "fail"
 ```

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-07-19 09:32:39 -07:00
Binbin
35a1888333
Fix incorrect usage of process_is_paused in tests (#783)
It was introduced incorrectly in #442.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-19 11:25:58 +08:00
uriyage
94bc15cb71
Io thread work offload (#763)
### IO-Threads Work Offloading 

This PR is the 2nd of 3 PRs intended to achieve the goal of 1M requests
per second.
(1st PR: https://github.com/valkey-io/valkey/pull/758)

This PR offloads additional work to the I/O threads, beyond the current
read-parse/write operations, to better utilize the I/O threads and
reduce the load on the main thread.

It contains the following 3 commits:

### Poll Offload

Currently, the main thread is responsible for executing the poll-wait
system call, while the IO threads wait for tasks from the main thread.
The poll-wait operation is expensive and can consume up to 30% of the
main thread's time. We could have let the IO threads do the poll-wait by
themselves, with each thread listening to some of the clients and
notifying the main thread when a client's command is ready to execute.

However, the current approach, where the main thread listens for events
from the network, has several benefits. The main thread remains in
charge, allowing it to know the state of each client
(idle/read/write/close) at any given time. Additionally, it makes the
threads flexible, enabling us to drain an IO thread's job queue and stop
a thread when the load is light without modifying the event loop and
moving its clients to a different IO thread. Furthermore, with this
approach, the IO threads don't need to wait for both messages from the
network and from the main thread; instead, the threads wait only for
tasks from the main thread.

To enjoy the benefits of both the main thread remaining in charge and
the poll being offloaded, we propose offloading the poll-wait as a
single-time, non-blocking job to one of the IO threads. The IO thread
will perform a poll-wait non-blocking call while the main thread
processes the client commands. Later, in `aeProcessEvents`, instead of
sleeping on the poll, we check for the IO thread's poll-wait results.

The poll-wait will be offloaded in `beforeSleep` only when there are
ready events for the main thread to process. If no events are pending,
the main thread will revert to the current behavior and sleep on the
poll by itself.

**Implementation Details**

A new callback `custompoll` was added to the `aeEventLoop`; when it is
not `NULL`, ae calls the `custompoll` callback instead of
`aeApiPoll`.

When the poll is offloaded, we set `custompoll` to
`getIOThreadPollResults` and send a poll job to the thread. The thread
takes a mutex and makes a non-blocking call (with timeout 0) to `aePoll`,
which populates the fired-events array. The IO thread sets
`server.io_fired_events` to the returned `numevents`;
later, the main thread in `custompoll` returns
`server.io_fired_events` and sets `custompoll` back to `NULL`.

To ensure thread safety when accessing server.el, all functions that
modify the eventloop events were wrapped with a mutex to ensure mutual
exclusion when modifying the events.

### Command Lookup Offload

As the IO thread parses the command from the client's Querybuf, it can
perform a command lookup in the commands dictionary, which can consume
up to ~5% of the main-thread runtime.

**Implementation details**
The IO thread stores the looked-up command in the client's new
`io_parsed_cmd` field. We can't use `c->cmd` for that, since we use
`c->cmd` to check whether a command was reprocessed or not.

To ensure thread safety when accessing the command dictionary, we make
sure the main thread isn't changing the dictionary while IO threads are
accessing it. This is accomplished by introducing a new flag called
`no_incremental_rehash` for the `dictType` commands. When performing
`dictResize`, we will rehash the entire dictionary in place rather than
deferring the process.

### Free Offload

Since the command arguments are allocated by the I/O thread, it would be
beneficial if they were also freed by the same thread. If the main
thread frees objects allocated by the I/O thread, two issues arise:

1. During the freeing process, the main thread needs to access the SDS
pointed to by the object to get its length.
2. With jemalloc, each thread manages a thread-local pool (`tcache`) of
buffers for quick reallocation without accessing the arena. If the main
thread constantly frees objects allocated by other threads, those
threads will have to frequently access the shared arena to obtain new
memory allocations.

**Implementation Details**
When freeing the client's argv, we will send the argv array to the
thread that allocated it. The thread will be identified by the client
ID. When freeing an object during `dbOverwrite`, we will offload the
object free as well. We will extend this to offload the free during
`dbDelete` in a future PR, as its effects on defrag/memory evictions
need to be studied.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-07-18 19:21:45 -07:00
naglera
8b480310a6
Remove read handler upon RDB connection close (#803)
Primary side: Remove read handler upon RDB connection close.

At this stage we do not expect any writes from that connection,
so it should be safe to remove the read handler. Otherwise, the
read handler will keep printing the `Client closed connection`
logs; see handleReadResult.

Signed-off-by: naglera <anagler123@gmail.com>
2024-07-18 16:14:02 +08:00
Binbin
36e81d9e79
Fix rdb_child_exit_pipe incorrect close call (#801)
server.rdb_child_exit_pipe is initialized only in the dual_channel
block, so the call here would be close(-1) in the !dual_channel case.

It will also generate a warning in valgrind:
Warning: invalid file descriptor -1 in syscall close()

Introduced in #60.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-17 22:47:08 -07:00
Binbin
c584371506
Fix rdb pipe uninitialized false positive warning (#800)
After #60, the CI report this warning:
```
rdb.c: In function 'rdbSaveToReplicasSockets':
rdb.c:3661:28: error: 'safe_to_exit_pipe' may be used uninitialized [-Werror=maybe-uninitialized]
 3661 |         if (!dual_channel) close(safe_to_exit_pipe);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~
rdb.c:3512:37: note: 'safe_to_exit_pipe' was declared here
 3512 |     int pipefds[2], rdb_pipe_write, safe_to_exit_pipe;
      |                                     ^~~~~~~~~~~~~~~~~
rdb.c:3654:17: error: 'rdb_pipe_write' may be used uninitialized [-Werror=maybe-uninitialized]
 3654 |                 close(rdb_pipe_write); /* close write in parent so that it can detect the close on the child. */
      |                 ^~~~~~~~~~~~~~~~~~~~~
rdb.c:3512:21: note: 'rdb_pipe_write' was declared here
 3512 |     int pipefds[2], rdb_pipe_write, safe_to_exit_pipe;
      |                     ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-18 11:49:37 +08:00
naglera
ff6b780fe6
Dual channel replication (#60)
In this PR we introduce the main benefit of dual channel replication:
continuously streaming the COB (client output buffers) in parallel to
the RDB, thus keeping the primary-side COB small AND accelerating the
overall sync process. By streaming the replication data to the replica
during the full sync, we reduce
1. Memory load on the primary node.
2. CPU load on the primary's main process. [Latest performance
tests](#data)

## Motivation
* Reduce primary memory load. We do that by moving the COB tracking to
the replica side. This also decreases the chance of COB overruns. Note
that the primary's input buffer limits at the replica side are less
restrictive than the primary's COB, as the replica plays a less critical
part in the replication group. While increasing the primary's COB may
end up with the primary reaching swap and clients suffering, at the
replica side we're more at ease with it. A larger COB means a better
chance to sync successfully.
* Reduce primary main process CPU load. By opening a new, dedicated
connection for the RDB transfer, child processes can have direct access
to the new connection. Due to TLS connection restrictions, this was not
possible using one main connection. We eliminate the need for the child
process to use the primary's child-proc -> main-proc pipeline, thus
freeing up the main process to process clients' queries.


 ## Dual Channel Replication high level interface design
- Dual channel replication begins when the replica sends a `REPLCONF
CAPA DUALCHANNEL` to the primary during initial
handshake. This is used to state that the replica is capable of dual
channel sync and that this is the replica's main channel, which is not
used for snapshot transfer.
- When the replica lacks sufficient data for a PSYNC, the primary sends
a `-FULLSYNCNEEDED` response instead of RDB data. As a next step, the
replica creates a new connection (rdb-channel) and configures it against
the primary with the appropriate capabilities and requirements. The
replica then requests a sync using the RDB channel.
- Prior to forking, the primary sends the replica the snapshot's end
repl-offset, and attaches the replica
to the replication backlog to keep repl data until the replica requests
psync. The replica uses the main
     channel to request a PSYNC starting at the snapshot end offset. 
- The primary's main thread sends incremental changes via the main
channel, while the bgsave process sends the RDB directly to the replica
via the rdb-channel. On the replica side, the incremental changes are
stored in a local buffer, while the RDB is loaded into memory.
- Once the replica completes loading the RDB, it drops the
rdb-connection and streams the accumulated incremental changes into
memory. Replication steady state continues normally.

## New replica state machine


![image](https://github.com/user-attachments/assets/38fbfff0-60b9-4066-8b13-becdb87babc3)





## Data <a name="data"></a>

![image](https://github.com/user-attachments/assets/d73631a7-0a58-4958-a494-a7f4add9108f)


![image](https://github.com/user-attachments/assets/f44936ed-c59a-4223-905d-0fe48a6d31a6)


![image](https://github.com/user-attachments/assets/bd333ee2-3c47-47e5-b244-4ea75f77c836)

## Explanation 
These graphs demonstrate performance improvements during full sync
sessions using rdb-channel + streaming rdb directly from the background
process to the replica.

First graph: with at most 50 clients and lightweight commands, we saw a
5%-7.5% improvement in write latency during the sync session.
In the two graphs below, full sync was tested during heavy read commands
from the primary (such as sdiff, sunion on large sets). In that case,
the child process writes to the replica without sharing CPU with the
loaded main process. As a result, this not only improves client response
time, but may also shorten sync time by about 50%. The shorter sync time
results in less memory being used to store replication diffs (>60% in
some of the tested cases).

## Test setup 
Both primary and replica in the performance tests ran on the same
machine. RDB size in all tests is 3.7gb. I generated write load using
valkey-benchmark ` ./valkey-benchmark -r 100000 -n 6000000 lpush my_list
__rand_int__`.

---------

Signed-off-by: naglera <anagler123@gmail.com>
Signed-off-by: naglera <58042354+naglera@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-17 13:59:33 -07:00
Ping Xie
66d0f7d9a1
Ensure only primary sender drives slot ownership updates (#754)
Fixes a regression introduced in PR #445, which allowed a message from a
replica to update the slot ownership of its primary. The regression
results in a `replicaof` cycle, causing server crashes due to the
cycle-detection assert. The fix restores the previous behavior where
only primary senders can trigger `clusterUpdateSlotsConfigWith`.

Additional changes:

* Handling of primaries without slots is obsoleted by new handling of
when a sender that was a replica announces that it is now a primary.
* Replication loop detection code is unchanged but shifted downwards.
* Some variables are renamed for better readability and some are
introduced to avoid repeated memcmp() calls.

Fixes #753.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-07-16 13:05:49 -07:00
Wen Hui
1a8bd045f3
Replace master-reboot-down-after-period with primary-reboot-down-after-period in sentinel.conf (#647)
Update sentinel.conf config parameter,

From:

SENTINEL master-reboot-down-after-period mymaster 0

To:

SENTINEL primary-reboot-down-after-period myprimary 0

But we still keep the backward compatibility, clients could use SENTINEL
master-reboot-down-after-period mymaster 0 OR
SENTINEL primary-reboot-down-after-period myprimary 0

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-07-15 13:45:40 -04:00
zhenwei pi
dd4bd5065b
Introduce Valkey Over RDMA transport (experimental) (#477)
Adds an option to build RDMA support as a module:

    make BUILD_RDMA=module

To start valkey-server with RDMA, use a command line like the following:

    ./src/valkey-server --loadmodule src/valkey-rdma.so \
        port=6379 bind=xx.xx.xx.xx

* Implement the server side of the connection module only; this means
  we can *NOT* compile RDMA support as built-in.

* Add necessary information in README.md

* Support 'CONFIG SET/GET', for example, 'CONFIG SET rdma.port 6380',
  then check this with 'rdma res show cm_id' and valkey-cli (with RDMA
  support, but not implemented in this patch).

* The full listeners show like:

      listener0:name=tcp,bind=*,bind=-::*,port=6379
      listener1:name=unix,bind=/var/run/valkey.sock
      listener2:name=rdma,bind=xx.xx.xx.xx,bind=yy.yy.yy.yy,port=6379
      listener3:name=tls,bind=*,bind=-::*,port=16379

Because of the lack of RDMA support in TCL, a simple C program is used
to test Valkey Over RDMA (under tests/rdma/). This is a quite raw
version with basic library dependencies: libpthread, libibverbs,
librdmacm. Run using the script:

    ./runtest-rdma [ OPTIONS ]

To run RDMA in GitHub Actions, a kernel module, RXE, for emulated soft
RDMA needs to be installed. The kernel module source code is fetched
from a repo containing only the RXE kernel driver from the Linux kernel,
stored in a separate repo to avoid cloning the whole Linux kernel repo.

----

In June 2021, I created a
[PR](https://github.com/redis/redis/pull/9161) for the *Redis Over RDMA*
proposal. Then I did some work to [fully abstract connections and make
TLS dynamically loadable](https://github.com/redis/redis/pull/9320);
since Redis 7.2.0, a new connection type can be built into Redis
statically, or as a separate shared library (loaded by Redis on
startup).

Based on the new connection framework, I created a new
[PR](https://github.com/redis/redis/pull/11182); some
folks (@xiezhq-hermann @zhangyiming1201 @JSpewock @uvletter @FujiZ)
noticed, played with and tested this PR. However, because of the lack of
time and knowledge from the maintainers, this PR has been pending for
about 2 years.

Related doc: [Introduce *Valkey Over RDMA*
specification](https://github.com/valkey-io/valkey-doc/pull/123). (Same
as for Redis; this should stay the same.)

Changes in this PR:
- implement *Valkey Over RDMA* (adapted to the Valkey style).

Finally, if this feature is considered for merging, I volunteer to
maintain it.

---------

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2024-07-15 14:04:22 +02:00
Viktor Söderqvist
c1bbdc796d
Skip IPv6 tests on MacOS (daily) (#786)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-15 13:38:15 +02:00
KarthikSubbarao
418901dec4
Limit tracking custom errors (e.g. from LUA) while allowing non custom errors to be tracked normally (#500)
Implementing the change proposed here:
https://github.com/valkey-io/valkey/issues/487

In this PR, we prevent tracking new custom error messages (e.g. LUA) if
the number of error messages (in the errors RAX) is greater than 128.
Instead, we will track any additional custom error prefix in a new
counter: `errorstat_ERRORSTATS_OVERFLOW ` and if any non-custom flagged
errors (e.g. MOVED / CLUSTERDOWN) occur, they will continue to be
tracked as usual.

This will address the issue of spammed error messages / memory usage of
the errors RAX. Additionally, we will not have to execute `CONFIG
RESETSTAT` to restore error stats functionality because normal error
messages continue to be tracked.
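A hedged sketch of the capping flow (the errorTable* helpers are hypothetical stand-ins for the rax bookkeeping):

```c
#define ERROR_STATS_LIMIT 128

/* Fold brand-new custom error prefixes into one overflow bucket once the
 * table is full; flagged errors (MOVED, CLUSTERDOWN, ...) and prefixes
 * that are already tracked keep their own counters. */
void trackErrorPrefix(const char *prefix, int is_custom_error) {
    if (is_custom_error && errorTableSize() >= ERROR_STATS_LIMIT &&
        !errorTableContains(prefix))
        prefix = "ERRORSTATS_OVERFLOW";
    errorTableIncrement(prefix); /* tracked as usual */
}
```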

Example:
```
# Errorstats
.
.
.
errorstat_127:count=2
errorstat_128:count=2
errorstat_ERR:count=1
errorstat_ERRORSTATS_OVERFLOW:count=2
```

---------

Signed-off-by: Karthik Subbarao <karthikrs2021@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-14 20:04:47 -07:00
Brennan
34649bd034
Configurable cluster blacklist TTL (#738)
Allows cluster admins to configure the blacklist TTL as needed to allow
sufficient time for `CLUSTER FORGET` to be executed on every node in the
cluster.

Config name `cluster-blacklist-ttl`; unit seconds; default 60.

---------

Signed-off-by: Brennan Cathcart <brennancathcart@gmail.com>
2024-07-13 20:38:25 +02:00
Binbin
b4ac2c406c
Update gitignore to ignore the cluster-cluster test files (#756)
Normally we can create a test cluster directly in the working directory
using `./utils/create-cluster/create-cluster`, which keeps the test
files under `./` and messes up the git tree.
Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-13 23:27:17 +08:00
Binbin
8c92b2f747
Minor fix for --loops option in normal testing framework (#781)
Inputting a negative number is equivalent to --loop, and inputting a
number greater than or equal to 0 causes the tests to be run
one more time.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-13 23:25:51 +08:00
Binbin
a4ee8dada4
Fix WAITAOF test in external test due to appendonly is enabled (#775)
The test fails because, in external mode, another test may have
enabled appendonly, causing acklocal to return 1.

We could add a CONFIG SET to disable appendonly, but that is not
safe either unless we use multi. The test does not actually
rely on appendonly, so we can just * it.

Fixes #770.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-12 23:32:39 +08:00
Madelyn Olson
9948f07a01
Temporary skip blockwait aof test until it's fixed (#773)
See https://github.com/valkey-io/valkey/issues/770 for details about
failure. Want to prevent the test failures.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-11 13:10:13 -04:00
Viktor Söderqvist
a323dce890
Dual stack and client-specific IPs in cluster (#736)
New configs:

* `cluster-announce-client-ipv4`
* `cluster-announce-client-ipv6`
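For illustration, a valkey.conf snippet using them (addresses are placeholders):

```
cluster-announce-client-ipv4 203.0.113.5
cluster-announce-client-ipv6 2001:db8::5
```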

New module API function:

* `ValkeyModule_GetClusterNodeInfoForClient`, takes a client id and is
otherwise just like its non-ForClient cousin.

If configured, one of these IP addresses is reported to each client in
CLUSTER SLOTS, CLUSTER SHARDS, CLUSTER NODES and redirects, replacing
the IP (`cluster-announce-ip` or the auto-detected IP) of each node.
Which one is reported to the client depends on whether the client is
connected over IPv4 or IPv6.

Benefits:

* This allows clients using IPv4 to get the IPv4 addresses of all
cluster nodes, and IPv6 clients to get the IPv6 addresses.
* This allows the IPs visible to clients to be different to the IPs used
between the cluster nodes due to NAT'ing.

The information is propagated in the cluster bus using new Ping
extensions. (Old nodes without this feature ignore unknown Ping
extensions.)

This adds another dimension to CLUSTER SLOTS reply. It now depends on
the client's use of TLS, the IP address family and RESP version.
Refactoring: The cached connection type definition is moved from
connection.h (it actually has nothing to do with the connection
abstraction) to server.h and is changed to a bitmap, with one bit for
each of TLS, IPv6 and RESP3.

Fixes #337

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-10 13:53:52 +02:00
Brennan
6a5a11f21c
Fix ULong config boundary checking (#752)
I noticed in #738 that we don't properly check ULong config boundaries
and made the change there. I'm pulling out that particular commit into
this PR since we don't know if we want to merge the configurable cluster
blacklist TTL yet.

---------

Signed-off-by: Brennan Cathcart <brennancathcart@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-09 13:25:42 -07:00
Viktor Söderqvist
b99c7237f4
Fix unstable test case EVAL+WAITAOF (#766)
Test case "EVAL - Scripts do not block on waitaof" observed to fail in
e.g.
https://github.com/valkey-io/valkey/actions/runs/9860131487/job/27233756421?pr=688

It can happen that the local AOF has been written and 1 is returned here
where 0 is expected. Writing a key inside the EVAL script makes sure
there's no time to write the AOF.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-07-09 21:40:49 +02:00
K.G. Wang
548b4e0ea9
Calculate the actual mask to be removed in the eventloop before aeApiDelEvent (#725)
for kqueue:  
EV_DELETE fails if the specified fd is not associated with the kqfd. If
EVFILT_WRITE is associated but EVFILT_READ is not, then calling
aeApiDelEvent with mask = -1 or `(AE_READABLE|AE_WRITABLE)` will
cause the kevent() call to fail with errno = 2 (No such file or
directory), leaving EVFILT_WRITE associated. So we need to calculate the
actual mask to be removed, instead of passing in whatever the user
provides.

for evport:  
The comment clearly states that aeApiDelEvent "rely on the fact that our
caller has already updated the mask in the eventLoop".

for epoll:  
There's no need to calculate the "actual mask" twice, once in
`aeDeleteFileEvent` and another in `aeApiDelEvent`, let's just use the
mask recorded in the eventLoop.

Fixes #715 
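A standalone illustration of the mask computation (plain C; the real fix lives in ae.c and the ae_kqueue.c backend):

```c
#include <stdio.h>

#define AE_READABLE 1
#define AE_WRITABLE 2

int main(void) {
    int registered = AE_WRITABLE;              /* only EVFILT_WRITE associated */
    int requested = AE_READABLE | AE_WRITABLE; /* caller asks to drop both */
    /* Hand only the intersection to kevent(): asking it to delete an
     * unassociated filter fails and leaves EVFILT_WRITE in place. */
    int actual = registered & requested;
    printf("mask to delete: %d\n", actual); /* 2, not 3 */
    return 0;
}
```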

Signed-off-by: wkgcass <wkgcass@hotmail.com>
Co-authored-by: Andy Pan <i@andypan.me>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-07-09 11:29:44 +08:00
uriyage
bbfd041895
Async IO threads (#758)
This PR is 1 of 3 PRs intended to achieve the goal of 1 million requests
per second, as detailed by [dan touitou](https://github.com/touitou-dan)
in https://github.com/valkey-io/valkey/issues/22. This PR modifies the
IO threads to be fully asynchronous, which is a first and necessary step
to allow more work offloading and better utilization of the IO threads.

### Current IO threads state:

Valkey IO threads were introduced in Redis 6.0 to allow better
utilization of multi-core machines. Before this, Redis was
single-threaded and could only use one CPU core for network and command
processing. The introduction of IO threads helps in offloading the IO
operations to multiple threads.

**Current IO Threads flow:**

1. Initialization: When Redis starts, it initializes a specified number
of IO threads. These threads are in addition to the main thread. Each
thread starts with an empty list; the main thread populates that
list in each event loop with pending-read clients or
pending-write clients.
2. Read Phase: The main thread accepts incoming connections and reads
requests from clients. The reading of requests is offloaded to IO
threads. The main thread puts the ready-to-read clients in a list and
sets the global io_threads_op to IO_THREADS_OP_READ; the IO threads pick
the clients up, perform the read operation and parse the first incoming
command.
3. Command Processing: After reading the requests, command processing is
still single-threaded and handled by the main thread.
4. Write Phase: Similar to the read phase, the write phase is also
offloaded to IO threads. The main thread prepares the response in the
clients' output buffers, puts the clients in the list,
and sets the global io_threads_op to IO_THREADS_OP_WRITE. The IO
threads then pick the clients up and perform the write operation to send
the responses back to clients.
5. Synchronization: The main thread communicates with the threads about
how many jobs are left per thread via an atomic counter. The main thread
doesn't access a client while it is being handled by the IO threads.

**Issues with current implementation:**

* Underutilized Cores: The current implementation of IO-threads leads to
the underutilization of CPU cores.
* The main thread remains responsible for a significant portion of
IO-related tasks that could be offloaded to IO-threads.
* When the main-thread is processing client’s commands, the IO threads
are idle for a considerable amount of time.
* Notably, the main thread's performance during the IO-related tasks is
constrained by the speed of the slowest IO-thread.
* Limited Offloading: Currently, since the main thread waits
synchronously for the IO threads, the threads perform only read-parse
and write operations, with parsing done only for the first command. If
the threads could work asynchronously, we could offload more work to the
threads, reducing the load on the main thread.
* TLS: Currently, we don't support IO threads with TLS (where offloading
IO would be more beneficial) since TLS read/write operations are not
thread-safe with the current implementation.

### Suggested change

Non-blocking main thread - The main thread and IO threads will operate
in parallel to maximize efficiency. The main thread will not be blocked
by IO operations. It will continue to process commands independently of
the IO thread's activities.

**Implementation details**

**Inter-thread communication.**

* We use a static, lock-free ring buffer of fixed size (2048 jobs) for
the main thread to send jobs and for the IO to receive them. If the ring
buffer fills up, the main thread will handle the task itself, acting as
back pressure (in case IO operations are more expensive than command
processing). A static ring buffer is a better candidate than a dynamic
job queue as it eliminates the need for allocation/freeing per job.
* An IO job has the format `[void *function-callback | void *data]`,
where data is typically a client to read from or write to, and the
function pointer is the function to be called with that data, for
example readQueryFromClient. Using this format, we can later offload
other types of work to the IO threads.
* The ring buffer is one-way, from the main thread to the IO thread.
Upon a read/write event, the main thread sends a read/write job; then,
in beforeSleep, it iterates over the pending read/write clients,
checking for each client whether the IO thread has already finished
handling it. The IO thread signals that it has finished handling a
client read/write by toggling an atomic flag read_state / write_state on
the client struct. (See the sketch after this list.)
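A minimal single-producer/single-consumer ring in that spirit (a sketch, not the actual iothread code):

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 2048 /* power of two, so we can mask instead of mod */

typedef struct {
    void (*fn)(void *data); /* e.g. readQueryFromClient */
    void *data;             /* e.g. the client */
} IOJob;

typedef struct {
    IOJob jobs[RING_SIZE];
    _Atomic size_t head; /* next slot the IO thread consumes */
    _Atomic size_t tail; /* next slot the main thread fills */
} IORing;

/* Main thread. Returns 0 when full, in which case the caller runs the
 * job itself: the back-pressure behavior described above. */
int ringPush(IORing *r, IOJob job) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE) return 0; /* full */
    r->jobs[tail & (RING_SIZE - 1)] = job;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* IO thread. Returns 0 when empty. */
int ringPop(IORing *r, IOJob *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail) return 0; /* empty */
    *out = r->jobs[head & (RING_SIZE - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}
```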

**Thread Safety**

As suggested in this solution, the IO threads are reading from and
writing to the clients' buffers while the main thread may access those
clients.
We must ensure no race conditions or unsafe access occurs while keeping
the Valkey code simple and lock free.

Minimal Action in the IO Threads
The main change is to limit the IO thread operations to the bare
minimum. The IO thread will access only the client's struct and only the
necessary fields in this struct.
The IO threads will be responsible for the following:

* Read Operation: The IO thread will only read and parse a single
command. It will not update the server stats, handle read errors, or
parsing errors. These tasks will be taken care of by the main thread.
* Write Operation: The IO thread will only write the available data. It
will not free the client's replies, handle write errors, or update the
server statistics.


To achieve this without code duplication, the read/write code has been
refactored into smaller, independent components:

* Functions that perform only the read/parse/write calls.
* Functions that handle the read/parse/write results.

This refactor accounts for the majority of the modifications in this PR.

**Client Struct Safe Access**

As we ensure that the IO threads access memory only within the client
struct, we need to ensure thread safety only for the client's struct's
shared fields.

* Query Buffer
    * Command parsing - The main thread will not try to parse a command from
the query buffer when a client is offloaded to the IO thread.
    * Client's memory checks in client-cron - The main thread will not
access the client query buffer if it is offloaded and will handle the
querybuf grow/shrink when the client is back.
    * CLIENT LIST command - The main thread will busy-wait for the IO thread
to finish handling the client, falling back to the current behavior
where the main thread waits for the IO thread to finish its
processing.
* Output Buffer
    * The IO thread will not change the client's bufpos and won't free the
client's reply lists. These actions will be done by the main thread on
the client's return from the IO thread.
    * bufpos / block->used: As the main thread may change the bufpos, the
reply-block->used, or add/delete blocks to the reply list while the IO
thread writes, we add two fields to the client struct: io_last_bufpos
and io_last_reply_block (see the sketch after this list). The IO thread
will write until the io_last_bufpos, which was set by the main thread
before sending the client to the IO thread. If more data has been added
to the cob in between, it will be written in the next write-job. In
addition, the main thread will not trim or merge reply blocks while the
client is offloaded.
* Parsing Fields
    * Client's cmd, argc, argv, reqtype, etc., are set during parsing.
    * The main thread will indicate to the IO thread not to parse a cmd if
the client is not reset. In this case, the IO thread will only read from
the network and won't attempt to parse a new command.
    * The main thread won't access c->cmd/c->argv in the CLIENT LIST
command; as stated before, it will busy-wait for the IO threads.
* Client Flags
    * c->flags, which may be changed by the main thread in multiple places,
won't be accessed by the IO thread. Instead, the main thread will set
c->io_flags with the information necessary for the IO thread to know
the client's state.
* Client Close
    * On freeClient, the main thread will busy-wait for the IO thread to
finish processing the client's read/write before proceeding to free the
client.
* Client's Memory Limits
    * The IO thread won't handle the qb/cob limits. In case a client crosses
the qb limit, the IO thread will stop reading for it, letting the main
thread know that the client crossed the limit.
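
A sketch of the client-struct additions mentioned above (field types are
assumptions; the actual definitions in the patch may differ):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct client {
    /* ... existing fields ... */
    size_t io_last_bufpos;      /* IO thread writes up to this offset; set
                                   by the main thread before offloading */
    void *io_last_reply_block;  /* last reply-list block the IO thread may
                                   touch; newer blocks are main-thread-only */
    _Atomic uint64_t io_flags;  /* main-thread hints for the IO thread,
                                   e.g. "don't parse, client not reset" */
} client;
```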

**TLS**

TLS is currently not supported with IO threads for the following
reasons:

1. Pending reads - If SSL has pending data that has already been read
from the socket, there is a risk of not calling the read handler again.
To handle this, a list is used to hold the pending clients. With IO
threads, multiple threads can access the list concurrently.
2. Event loop modification - Currently, the TLS code
registers/unregisters the file descriptor from the event loop depending
on the read/write results. With IO threads, multiple threads can modify
the event loop struct simultaneously.
3. The same client can be sent to 2 different threads concurrently
(https://github.com/redis/redis/issues/12540).

Those issues were handled in the current PR:

1. The IO thread only performs the read operation. The main thread will
check for pending reads after the client returns from the IO thread and
will be the only one to access the pending list.
2. The registering/unregistering of events will be similarly postponed
and handled by the main thread only.
3. Each client is always sent to the same dedicated thread (c->id %
num_of_threads).


**Sending Replies Immediately with IO Threads**

Currently, after processing a command, we add the client to the
pending_writes_list. Only after processing all the clients do we send
all the replies. Since the IO threads are now working asynchronously, we
can send the reply immediately after processing the client’s requests,
reducing the command latency. However, if we are using AOF=always, we
must wait for the AOF buffer to be written, in which case we revert to
the current behavior.

**IO threads dynamic adjustment**

Currently, we use an all-or-nothing approach when activating the IO
threads. The current logic is as follows: if the number of pending write
clients is greater than twice the number of threads (including the main
thread), we enable all threads; otherwise, we enable none. For example,
if 8 IO threads are defined, we enable all 8 threads if there are 16
pending clients; otherwise, we enable none.
It makes more sense to enable partial activation of the IO threads: if
we have 10 pending clients, we will enable 5 threads, and so on. This
approach allows for a more granular and efficient allocation of
resources based on the current workload.
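
A minimal sketch of this proportional rule (assuming the one-thread-per-
two-pending-clients ratio described above; names are illustrative):

```c
/* With 10 pending write clients and 8 configured IO threads this
 * returns 5; with 16 or more pending clients it returns all 8. */
static int ioThreadsToActivate(int pending_write_clients, int io_threads_num) {
    int n = pending_write_clients / 2;
    if (n > io_threads_num) n = io_threads_num;
    if (n < 1) n = 1; /* the main thread always does some IO work */
    return n;
}
```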

In addition, the user will now be able to change the number of I/O
threads at runtime. For example, when decreasing the number of threads
from 4 to 2, threads 3 and 4 will be closed after flushing their job
queues.

**Tests**

Currently, we run the io-threads tests with 4 IO threads
(443d80f168/.github/workflows/daily.yml (L353)).
This means that we will not activate the IO threads unless there are 8
(threads * 2) pending write clients in a single loop, which is unlikely
to happen in most tests, meaning the IO threads are not currently
being tested.

To force the main thread to always offload work to the IO threads,
regardless of the number of pending events, we add an
events-per-io-thread configuration with a default value of 2. When set
to 0, this configuration forces the main thread to always offload
work to the IO threads.

When we offload every single read/write operation to the IO threads,
they run at 100% CPU, and when running multiple tests concurrently, some
tests fail as a result of larger-than-expected command latencies. To
address this issue, we had to add some after or wait_for calls to some
of the tests to ensure they pass with IO threads as well.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-07-08 20:01:39 -07:00
nitaicaro
5f0ccf1478
Remove duplicate definition of UNUSED(V) (#755)
Signed-off-by: Nitai Caro <caronita@amazon.com>
Co-authored-by: Nitai Caro <caronita@amazon.com>
2024-07-07 23:44:48 +08:00
bentotten
f2bbd1ff0f
Fix minor memory leak in clusterLoadConfig (#741)
We forgot to call sdsfreesplitres in the error path during a
nodes.conf corruption check. This function exits on the error
paths, so this is just a cleanup.

Signed-off-by: bentotten <59932872+bentotten@users.noreply.github.com>
2024-07-04 16:55:55 -07:00
Wen Hui
1680378845
Update redis keyword to valkey in some sentinel functions (Redis Legacy) (#706)
This PR updates all Redis/redis keywords to Valkey/valkey, including
variable names, comments, function names.

All sentinel test cases passed.

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-07-04 11:54:58 -04:00
Binbin
6bf1d02edf
Nested MULTI or WATCH in MULTI now will abort the transaction (#723)
Currently, for nested MULTI or executing WATCH in MULTI, we will return
an error but we will not abort the transaction.

```
127.0.0.1:6379> multi
OK
127.0.0.1:6379(TX)> multi
(error) ERR MULTI calls can not be nested
127.0.0.1:6379(TX)> set key value
QUEUED
127.0.0.1:6379(TX)> exec
1) OK

127.0.0.1:6379> multi
OK
127.0.0.1:6379(TX)> watch key
(error) ERR WATCH inside MULTI is not allowed
127.0.0.1:6379(TX)> set key value
QUEUED
127.0.0.1:6379(TX)> exec
1) OK
```

This is unexpected behavior; these cases should abort the transaction.
The number of elements returned by EXEC also doesn't match the number
of commands in MULTI.
We add the NO_MULTI flag to these commands so that they will
be rejected in processCommand, and rejectCommand will abort the
transaction.

So there are two visible changes:

- Different wording in the error messages ("Command not allowed inside a
transaction").
- EXEC returns an error.
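
For illustration, the new behavior should look roughly like this (the
exact error wording follows from the new NO_MULTI handling):

```
127.0.0.1:6379> multi
OK
127.0.0.1:6379(TX)> multi
(error) ERR MULTI is not allowed in transactions
127.0.0.1:6379(TX)> exec
(error) EXECABORT Transaction discarded because of previous errors.
```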

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-03 21:27:45 +02:00
Binbin
2d6791bb11
Use clusterNodeIsVotingPrimary function to check the right (#735)
Minor cleanups.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-03 20:42:25 +02:00
AlanZhang1204
b298dfd6ef
Use 'primary' instead of 'master' in Sentinel tcl testing. (#724)
Use 'primary' instead of 'master' in Sentinel tcl testing.

---------

Signed-off-by: z00808363 <zhangtianlun2@huawei.com>
Co-authored-by: z00808363 <zhangtianlun2@huawei.com>
2024-07-03 10:41:27 -04:00
Harkrishn Patro
8faf2788a2
Embed key into dict entry (#541)
This PR incorporates changes related to key embedding described in the
https://github.com/redis/redis/issues/12216
With this change there is no separate `key` pointer; the key is embedded
within the `dictEntry`. 1 byte is used for additional bookkeeping.
Overall the saving is 7 bytes on average.

Key changes:

New dict entry type introduced, which is now used as an entry for the
main dictionary:

```c
typedef struct {
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;  /* Next entry in the same hash bucket. */
    uint8_t key_header_size; /* offset into key_buf where the key is located at. */
    unsigned char key_buf[]; /* buffer with embedded key. */
} embeddedDictEntry;
```

One new function has been added to the dictType:

```c
size_t (*embedKey)(unsigned char *buf, size_t buf_len, const void *key, unsigned char *header_size);
```
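
For illustration, reading the embedded key back would presumably look
along these lines (hypothetical helper, not from the patch):

```c
/* The key bytes start key_header_size bytes into key_buf. */
static inline void *embeddedDictEntryGetKey(const embeddedDictEntry *de) {
    return (void *)(de->key_buf + de->key_header_size);
}
```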


The change is opt-in per dict type, hence sets, hashes, and other types
that use a dictionary are not impacted.
With this change the main dictionary now owns the data, so the copy on
insert in dbAdd is no longer needed.

### Benchmarking results

TL;DR: around 9-10% reduction in overall memory usage for the
scenario with a key of 16 bytes and values of 8 bytes and 16 bytes. The
throughput per second varies but is similar or greater in most of the
runs with the changes against unstable (ae2d421).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-07-02 15:45:37 -07:00
Binbin
1ea49e5845
Make valkey compatible with redis-sentinel to start sentinel (#731)
We already have similar changes to check-rdb / check-aof, apply
this change to sentinel.

Fixes #719.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-02 11:32:34 -04:00
Sankar
eff45f5467
Fix flakiness of cluster-multiple-meets and cluster-reliable-meet (#728)
Tests in cluster-multiple-meets were flaky as reported by @madolson 

*
https://github.com/valkey-io/valkey/actions/runs/9688455588/job/26776953320
*
https://github.com/valkey-io/valkey/actions/runs/9688455588/job/26776953585

I wasn't able to reproduce this locally, but I suspect that the
flakiness is coming from the fact that nodes are reported as "connected"
as long as there is an outgoing link. An outgoing link is created before
MEET is sent out.

Signed-off-by: Sankar <1890648+srgsanky@users.noreply.github.com>
2024-07-01 22:27:38 -07:00
Lipeng Zhu
3323e422ad
Introduce thread-local storage variable to update thread's own used_memory and sum when reading to reduce atomic contention. (#674)
#### Description
This patch introduces a thread-local storage variable for each
thread to update its own `used_memory`, and then sums them together when
reading in `zmalloc_used_memory`. This reduces unnecessary `lock
add` contention on the atomic variable. We also add a protection: if too
many threads are created and the total thread count exceeds 132, we
fall back to the atomic operation for threads with index >= 132.
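
A simplified sketch of the idea (not the exact patch: slot assignment,
cache-line padding, and the real names are omitted or assumed):

```c
#include <stdatomic.h>

#define MAX_TRACKED_THREADS 132

/* Assumed: each thread gets a unique index at creation; -1 means untracked. */
static __thread int used_memory_slot = -1;
static _Atomic long long used_memory_per_thread[MAX_TRACKED_THREADS];
static _Atomic long long used_memory_shared; /* fallback for extra threads */

static void update_used_memory(long long delta) {
    if (used_memory_slot >= 0 && used_memory_slot < MAX_TRACKED_THREADS) {
        /* Single writer per slot: relaxed add, no cross-core contention. */
        atomic_fetch_add_explicit(&used_memory_per_thread[used_memory_slot],
                                  delta, memory_order_relaxed);
    } else {
        atomic_fetch_add(&used_memory_shared, delta); /* contended `lock add` */
    }
}

static long long used_memory_sum(void) {
    long long sum = atomic_load(&used_memory_shared);
    for (int i = 0; i < MAX_TRACKED_THREADS; i++)
        sum += atomic_load_explicit(&used_memory_per_thread[i],
                                    memory_order_relaxed);
    return sum;
}
```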

#### Problem Statement
`zmalloc` and `zfree` related functions update `used_memory`
atomically for each operation, and they are called very frequently. From
the benchmark of
[memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml)
, the cycles ratio of `zmalloc` and `zfree` is high; they are wrappers
for the lower allocator library and should not take so many cycles. Most
of the cycles are contributed by `lock add` and `lock sub`, which
are expensive instructions. From the profiling, the metric updates
mainly come from the main thread, so using TLS removes a lot of the
contention.

#### Performance Boost

**Note:** This optimization should benefit common benchmarks widely. I
chose the 2 scenarios below to validate the performance boost in my local
environment.

| Test Suites | Performance Boost |
|-|-|
|[memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-stream-5-fields-with-100B-values-pipeline-10.yml)|8%|
|[memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1Mkeys-load-string-with-100B-values-pipeline-10.yml)|4%|

##### Test Env
- OS: Ubuntu 22.04.4 LTS
- Platform: Intel Xeon Platinum 8380
- Server and Client in same socket

##### Start Server 
```sh
taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey_1.conf

# The lines below are presumably the contents of /tmp/valkey_1.conf:
port 9001
bind * -::*
daemonize yes
protected-mode no
save ""
```

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-07-01 21:52:43 -07:00
Binbin
0cc16d0298
Fix wrong reserved bits in ClientFlags (#729)
The reserved bits should be 10; the wrong value causes ClientFlags to
consume 8 more bytes now.
Introduced in #614.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-07-02 12:05:21 +08:00
KarthikSubbarao
fa01a29365
Allow Module authentication to succeed when cluster is down (#693)
Module authentication using a blocking implementation currently gets
rejected with "cluster is down" by the client timeout cron job
(`clientsCronHandleTimeout`).

This PR exempts clients blocked on module authentication from being
rejected there.

---------

Signed-off-by: KarthikSubbarao <karthikrs2021@gmail.com>
2024-07-01 13:59:06 -07:00
Ping Xie
9f4f6036b8
Restore comments for client flags (#718) 2024-07-01 13:45:14 -07:00
ranshid
752b6ee8ff
Avoid compilation error in valkey-cli (#721)
Signed-off-by: ranshid <ranshid@amazon.com>
2024-07-01 19:47:45 +08:00
ranshid
24208812a6
Increase ping and cluster timeout for cluster-slots test (#717)
The cluster-slots test is testing a very fragmented slot range of a
relatively large cluster. For this reason, when run under valgrind, some
of the nodes time out while the cluster is attempting to converge and
propagate.
This PR sets the test's cluster-node-timeout to 90000 and
cluster-ping-interval to 1000.

Signed-off-by: ranshid <ranshid@amazon.com>
2024-06-30 16:30:46 -07:00
skyfirelee
e4c1f6d45a
Replace client flags to bitfield (#614) 2024-06-30 11:33:10 -07:00
Wen Hui
7415a576a8
Add prompt when Ctrl-C pressed (#702)
When I play with the pubsub commands in the console, I am confused by the
following scenario:


![image](https://github.com/valkey-io/valkey/assets/51993843/c56e3976-1e8f-4053-9abb-16fa05ef6ec4)

The reason is that when I press Ctrl-C, the client exits the current
connection and reconnects to the server.
Thus I add a prompt message to make it clear to everyone what happens.


![image](https://github.com/valkey-io/valkey/assets/51993843/cc620f27-4522-4f34-a7b3-86bcdeedfaba)

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-06-28 22:05:40 -04:00
w. ian douglas
b59762f734
Very minor misspelling in some tests (#705)
Fix misspelling "faiover" instead of "failover" in two test cases.

Signed-off-by: w. ian douglas <ian.douglas@iandouglas.com>
2024-06-28 23:56:30 +02:00
Binbin
2979fe6060
CLUSTER SLOT-STATS ORDERBY when stats are the same, compare by slot in ascending order (#710)
The test failed in my local environment:
```
*** [err]: CLUSTER SLOT-STATS ORDERBY LIMIT correct response pagination, where limit is less than number of assigned slots in tests/unit/cluster/slot-stats.tcl
Expected [dict exists 0 0 1 0 2 0 3 0 4 0 16383] (context: type source line 64 file /xxx/tests/unit/cluster/slot-stats.tcl cmd {assert {[dict exists $expected_slots $slot]}} proc ::assert_slot_visibility level 1)
```

It seems that when the stat is equal, that is, when the key-count is
equal, the qsort result can differ. When the stat is equal, we now
compare by slot (in ascending order).

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-28 08:03:03 -07:00
Binbin
518f0bf79b
Fix limit undefined behavior crash in CLUSTER SLOT-STATS (#709)
We did not set a default value for limit, but it will be used
in addReplyOrderBy later; the undefined behavior may crash the
server, since the value could be negative and the crash will happen
in addReplyArrayLen.

An interesting reproducible example (limit reuses the value of -1):
```
> cluster slot-stats orderby key-count desc limit -1
(error) ERR Limit has to lie in between 1 and 16384 (maximum number of slots).
> cluster slot-stats orderby key-count desc
Error: Server closed the connection
```

Set the default value of limit to 16384.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-28 08:02:52 -07:00
Binbin
7f7ef9a3fa
Update availability-zone to use the flag instead of the number 0 (#711)
Minor cleanup.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-28 08:00:07 -07:00
zhaozhao.zz
4fbe31ab87
Fix the TLS and REPS issues about CLUSTER SLOTS cache (#581)
PR #53 introduced the cache of CLUSTER SLOTS response, but the cache has
some problems for different types of clients:

1. the RESP version is wrongly ignored:

    ```
    $./valkey-cli
    127.0.0.1:6379> cluster slots
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6379
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    127.0.0.1:6379> hello 3
    1# "server" => "valkey"
    2# "version" => "255.255.255"
    3# "proto" => (integer) 3
    4# "id" => (integer) 3
    5# "mode" => "cluster"
    6# "role" => "master"
    7# "modules" => (empty array)
    127.0.0.1:6379> cluster slots
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6379
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    ```

    RESP3 should get an "empty hash" but gets RESP2's "empty array"

2. we should use the original client's connection type, or lua/function and
modules would get the wrong port:

    ```
    $./valkey-cli --tls --insecure -p 6789
    127.0.0.1:6789> config get port tls-port
    1) "tls-port"
    2) "6789"
    3) "port"
    4) "6379"
    127.0.0.1:6789> cluster slots
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6789
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    127.0.0.1:6789> eval "return redis.call('cluster','slots')" 0
    1) 1) (integer) 0
       2) (integer) 16383
       3) 1) ""
          2) (integer) 6379
          3) "f1aeceb352401ce57acd432c68c60b359c00ef85"
          4) (empty array)
    ```

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-06-28 14:56:13 +08:00
Kyle Kim (kimkyle@)
1269532fbd
Introduce CLUSTER SLOT-STATS command (#20). (#351)
The command provides detailed slot usage statistics upon invocation,
with initial support for the key-count metric. cpu-usec (approved) and
memory-bytes (pending approval) metrics will follow soon after the
merge of this PR.

---------

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-27 16:58:27 -07:00
Wen Hui
7719dbb84b
Update readonly and readwrite json (#704)
Update and align with the latest readonly.md and readwrite.md doc under
https://github.com/valkey-io/valkey-doc/tree/main/commands

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-06-27 13:23:53 -07:00
John Sully
ad5704f803
Upstream the availability zone info string from KeyDB (#700)
When Redis/Valkey/KeyDB is run in a cloud environment across multiple
AZs, it is preferable to keep traffic local to an AZ, both for cost
reasons and for latency. This is typically done when you are enabling
reads on replicas with the READONLY command.

For this change we are creating a setting that is echoed back in the
info command. We do not want to add the cloud SDKs as dependencies, and
this is the easiest way around that. It is fairly trivial to grab the AZ
from the cloud and push that into your settings file.

Currently at Snapchat we have a custom client that after connecting
reads this from the server and will preferentially use that server if
the AZ string matches its internally configured AZ.

In the future it would be ideal if we used this information when
performing failover or even exposed it in cluster nodes.

Signed-off-by: John Sully <john@csquare.ca>
2024-06-27 12:30:26 -07:00
Binbin
2b0723957e
Enable protected-configs, debug and module commands in create-cluster script (#701)
The create-cluster in utils mainly used to create a test cluster, 
turning on these options is useful for testing purposes.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-27 12:27:09 -07:00
zhaozhao.zz
28c5a17edf
replica redirect read&write to primary in standalone mode (#325)
To implement #319 

1. a replica is able to redirect read and write commands to its primary
in standalone mode
    * reply with "-REDIRECT primary-ip:port"
2. add a subcommand `CLIENT CAPA redirect`; a client can announce the
capability to handle redirection
    * if a client can handle redirection, the data access commands (read and
write) will be redirected
3. allow the `readonly` and `readwrite` commands in standalone mode, which may be a
breaking change
    * a client with redirect capability cannot process read commands on a
replica by default
    * the READONLY command can allow read commands on a replica
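
For illustration, a session against a replica might look roughly like
this (addresses are made up; the reply follows the "-REDIRECT
primary-ip:port" format described above):

```
127.0.0.1:6380> client capa redirect
OK
127.0.0.1:6380> set foo bar
(error) REDIRECT 127.0.0.1:6379
```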

---------

Signed-off-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-06-27 19:00:45 +08:00
Ouri Half
ab3873011a
Replacing REDIS_STATIC with static (#691)
As discussed, we want to remove the old `REDIS_STATIC` flag, which is no
longer relevant.

When moving from Redis to Valkey we renamed all REDIS flags in the Makefile.
The REDIS_STATIC flag was renamed to SERVER_STATIC, but this change was
not applied in some of the files.

After discussing it with @madolson and @ranshid, we decided that since
this was introduced 10 years ago, and in many places in the code base we
simply use `static`, we should simplify and remove the flag entirely.

---------

Signed-off-by: Ouri Half <ourih@amazon.com>
2024-06-26 09:47:59 -07:00
Pierre
495c35d918
Add check in CLUSTERLINK KILL cmd to avoid freeing links to myself (#689)
Add check in CLUSTERLINK KILL cmd to avoid freeing cluster bus links to
myself. Also add an assert in `freeClusterLink()`.

Testing:
```
127.0.0.1:6379> debug clusterlink kill all c0404ee68574c6aa1048aaebfe90283afe51d2fc
(error) ERR Cannot free cluster link(s) to myself
```

Signed-off-by: Pierre Turin <pieturin@amazon.com>
2024-06-25 15:18:30 -07:00
Kyle Kim (kimkyle@)
b49eaad367
Introduce a minimal debugger for .tcl integration test suite. (#683)
Introduce a break-point function called `bp`, based on the tcl wiki's
minimal debugger.

```tcl
 proc bp {{s {}}} {
    if ![info exists ::bp_skip] {
        set ::bp_skip [list]
    } elseif {[lsearch -exact $::bp_skip $s]>=0} return
    if [catch {info level -1} who] {set who ::}
    while 1 {
        puts -nonewline "$who/$s> "; flush stdout
        gets stdin line
        if {$line=="c"} {puts "continuing.."; break}
        if {$line=="i"} {set line "info locals"}
        catch {uplevel 1 $line} res
        puts $res
    }
 }
```

```
... your test code before break-point
bp 1
... your test code after break-point
```

The `bp 1` will give back the tcl interpreter to the developer, and
allow you to interactively print local variables (through `puts`), run
functions and so forth.

Source: https://wiki.tcl-lang.org/page/A+minimal+debugger

---------

Signed-off-by: Kyle Kim <kimkyle@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-25 10:24:53 -07:00
ranshid
3df9d42794
Fix bad memory accounting for sds when no malloc_size available (#694)
Issue introduced by #453.
When we check the SDS_TYPE_5 allocation size, we mistakenly used
zmalloc_size, which DOES take the PREFIX size into account when there is no
malloc_size support.
Later, when we free, we add PREFIX_SIZE again, which leads to negative
memory accounting in some tests.
Example test failure:
https://github.com/valkey-io/valkey/actions/runs/9654170962/job/26627901497

Signed-off-by: ranshid <ranshid@amazon.com>
2024-06-25 08:18:07 -07:00
Lipeng Zhu
4d3d6c06a1
Reduce redundant call of prepareClientToWrite when call addReply* continuously. (#670)
## Description

While exploring hotspots by profiling some benchmark workloads, we
noticed the high cycles ratio of `prepareClientToWrite`, taking about 9%
of the CPU of the `smembers` and `lrange` commands. After a deep dive into
the code logic, we found we could gain performance by reducing the
redundant calls of `prepareClientToWrite` when addReply* is called
continuously.

For example: In
https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082,
`prepareClientToWrite` is called three times in a row.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
2024-06-24 18:33:30 -07:00
Ping Xie
32ca6e5b38
Improve CLUSTER SETSLOT replication handling to support older replica versions. (#686) 2024-06-23 22:08:52 -07:00
Madelyn Olson
ce79539047
Fail tests immediately if the server is no longer running (#678)
Fix a minor inconvenience I have when writing tests. If I have a typo or
forget to generate the tls certificates, the start_server handle will
just loop for 2 minutes before printing the error. With this change it
fails and prints as soon as it sees the error.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-21 15:29:05 +08:00
Binbin
bf1fb1fd36
Fix copy-paste error in scripts eviction test (#671)
The test needs to test "return 2" but mistakenly uses "return 1".
Also remove an extra debug print.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-20 10:28:47 +08:00
poiuj
0143b7c9dd
Add zfree_with_size to optimize sdsfree since we can get zmalloc_size from the header (#453)
### Description ###
zfree updates memory statistics. It gets the size of the buffer from
jemalloc by calling zmalloc_size. This operation is costly. We can avoid
it if we know the buffer size. For example, we can calculate size of sds
from the data we have in its header.

This commit introduces zfree_with_size function that accepts both
pointer to a buffer, and its size. zfree is refactored to call
zfree_with_size.

sdsfree uses the new interface for all but SDS_TYPE_5.
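
A rough sketch of the shape of this API (simplified; the statistics hook
below is a hypothetical stand-in for the real bookkeeping):

```c
#include <stdlib.h>

size_t zmalloc_size(void *ptr);            /* existing helper in zmalloc.c */
void update_used_memory_stat(long long d); /* hypothetical stats hook */

/* Free a buffer whose size the caller already knows, skipping the
 * costly zmalloc_size() lookup. */
void zfree_with_size(void *ptr, size_t size) {
    if (ptr == NULL) return;
    update_used_memory_stat(-(long long)size);
    free(ptr);
}

/* The generic zfree queries the size and delegates. */
void zfree(void *ptr) {
    if (ptr == NULL) return;
    zfree_with_size(ptr, zmalloc_size(ptr));
}
```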

### Benchmark ###

Dataset is 3 million strings. Each benchmark run uses its own value size
(8192, 512, and 120). The benchmark is 100% write load for 5 minutes.

```
value size       new tps      old tps      %       new us/call    old us/call    %
8k               272088.53    269971.75    0.78    1.83           1.92           -4.69
512              356881.91    352856.72    1.14    1.27           1.35           -5.93
120              377523.81    368774.78    2.37    1.14           1.19           -4.20
```

---------

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-19 16:13:55 -07:00
Wen Hui
5d2348cee2
Update json file and sentinelCommand function for Valkey Sentinel (#675)
In this PR, we update the master keyword to the primary keyword in several
places in the sentinel command JSON file and the sentinelCommand function.
There is no update to configurable parameters in the sentinel.conf file.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-06-19 17:16:49 -04:00
Wen Hui
e84eda9092
Remove useless code in sentinel source code (#676)
Just remove them.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-06-19 17:16:35 -04:00
kukey
ae2d4217e1
Add new SCRIPT SHOW subcommand to dump script via sha1 (#617)
In some scenarios, a user may not be able to find the
previously used Lua script and only has a SHA signature.
Or there are multiple identical EVALSHA args in monitor/slowlog,
and the admin is not able to distinguish the script bodies.

Add a new script subcommand to show the contents of a script
given the script's sha1. It returns a NOSCRIPT error if the script
is not present in the cache.
Usage: `SCRIPT SHOW sha1`
Complexity: `O(1)`
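
Illustrative usage (the sha1 is whatever SCRIPT LOAD returned for the
script body):

```
127.0.0.1:6379> SCRIPT LOAD "return 1"
"e0e1f9fabfc9d4800c877a703b823ac0578ff831"
127.0.0.1:6379> SCRIPT SHOW e0e1f9fabfc9d4800c877a703b823ac0578ff831
"return 1"
```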

Closes #604.
Doc PR: https://github.com/valkey-io/valkey-doc/pull/143

---------

Signed-off-by: wei.kukey <wei.kukey@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-06-18 17:48:58 -07:00
ranshid
be2c321682
Support RDB compatability with Redis 7.2.4 RDB format (#665)
This PR makes our current RDB format compatible with the Redis 7.2.4 RDB
format. There are 2 changes introduced in this PR:
1. Move the RDB version back to 11
2. Make the slot info section persist as AUX data instead of a dedicated
section.

We introduced slot-info as part of the work to replace cluster
metadata with slot-specific dictionaries. This caused us to bump the RDB
version and thus prevent downgrade (which is conceptually OK but
better avoided). We do not require the slot-info section to exist,
so making it an AUX section will help support version downgrade from
Valkey 8.

fixes: [#645](https://github.com/valkey-io/valkey/issues/645)

NOTE: tested manually by:
1. connecting Redis 7.2.4 replica to a Valkey 8(RC) 
2. upgrade/downgrade Redis 7.2.4 cluster and Valkey 8(RC) cluster

---------

Signed-off-by: ranshid <ranshid@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-06-18 12:04:06 -07:00
Binbin
a2cc2fe26d
Fix memory leak when loading slot migrations states fails (#658)
When we goto eoferr, we need to release the auxkey and auxval.
This is a cleanup; also explicitly check that the decoder return
value is C_ERR.

Introduced in #586.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-17 21:18:57 -07:00
Madelyn Olson
b33f932c56
Add missing commas from debug command (#662)
The missing commas caused the `DEBUG HELP` output to be compressed onto a
single line.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-18 12:08:08 +08:00
Ping Xie
4135894a5d
Update remaining master references to primary (#660)
Signed-off-by: Ping Xie <pingxie@google.com>
2024-06-17 20:31:15 -07:00
Binbin
495a121f19
Adjust the log level of some logs in the cluster (#633)
I think the logs of pfail status changes will be very useful.
The other parts were scanned and adjusted where they could be improved.

Changes:
1. Changing pfail status related logs from VERBOSE to NOTICE.
2. Changing the configEpoch collision log from VERBOSE(warning) to NOTICE.
3. Changing some logs from DEBUG to NOTICE.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-18 10:46:56 +08:00
Andy Pan
5a51bf5045
Combine events to eliminate redundant kevent(2) calls (#638)
Combine events to eliminate redundant kevent(2) calls 
to improve performance.

---------

Signed-off-by: Andy Pan <i@andypan.me>
2024-06-16 21:18:20 -07:00
Binbin
db6d3c1138
Only primary with slots has the right to mark a node as failed (#634)
In markNodeAsFailingIfNeeded we count needed_quorum and failures;
needed_quorum is half of cluster->size plus one, and cluster->size
is the number of primary nodes that contain slots. But when counting
failures, we did not check if the primary has slots.

Only primaries with slots have the right to vote; we add a new
clusterNodeIsVotingPrimary to formalize this concept.

Release notes:

bugfix where nodes not in the quorum group might spuriously mark nodes
as failed

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
2024-06-16 20:46:08 -07:00
Sankar
a81c32079c
Make cluster meet reliable under link failures (#461)
When there is a link failure while an ongoing MEET request is sent, the
sending node stops sending any more MEETs and starts sending PINGs. Since
every node responds to PINGs from unknown nodes with a PONG, the
receiving node never adds the sending node. But the sending node adds
the receiving node when it sees a PONG. This can lead to asymmetry in
cluster membership. This change makes the sender keep sending MEET
until it sees a PONG, avoiding the asymmetry.

---------

Signed-off-by: Sankar <1890648+srgsanky@users.noreply.github.com>
2024-06-16 20:37:09 -07:00
Samuel Adetunji
93123f97a0
Format yaml files (#615)
Closes #533

---------

Signed-off-by: adetunjii <adetunjithomas1@outlook.com>
2024-06-14 13:40:06 -07:00
Madelyn Olson
6faa48a358
Don't initialize the key buffer in getKeysResult (#631)
getKeysResult is typically initialized with 2kb of zeros (16 * 256),
which isn't strictly necessary, since the only thing we have to
initialize is some of the metadata fields. The rest of the data can
remain junk as long as we don't access it. This was a bit of a
regression in 7.0 with the keyspecs, since we doubled the size of the
zeros, but hopefully this recovers a lot of the performance drop.

I saw a modest performance bump for deep pipeline of cluster mode (~8%).

I think we would see some comparable improvements in the other places
where we are using it such as tracking and ACLs.

---------

Signed-off-by: Madelyn Olson <matolson@amazon.com>
2024-06-14 08:42:00 -07:00
Binbin
d5496e42bc
Lua scripts promoted from eval to script load to avoid evict (#637)
In ad28d222edcef9d4496fd7a94656013f07dd08e5, we added Lua eval
script eviction. If the script was previously added via EVAL, we
promote it to SCRIPT LOAD, preventing it from being evicted later.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-14 08:32:19 -07:00
Wenwen-Chen
4c6bf30f58
Latency report: Rebranding and refine Dave dialog (#644)
This patch tries to correct the latency report.

1. Rename Redis to Valkey.
2. Remove the redundant Dave dialog, and refine the output message.

---------

Signed-off-by: Wenwen Chen <wenwen.chen@samsung.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-06-14 12:56:59 +02:00
Ping Xie
8a776c3509
Fix potential infinite loop in clusterNodeGetPrimary (#651) 2024-06-13 23:43:36 -07:00
Ping Xie
5d9d41868d
Replace DEBUG RESTART with pause_server and resume_server (#652) 2024-06-13 17:52:50 -07:00
Binbin
d309b9b235
Make configs dir/dbfilename/cluster-config-file reject empty string (#636)
Until now, these configuration items allowed typing empty strings,
but empty strings behave strangely.

Empty dir will fail in chdir with No such file or directory:
```
./src/valkey-server --dir ""

*** FATAL CONFIG FILE ERROR (Version 255.255.255) ***
Reading the configuration file, at line 2
>>> 'dir ""'
No such file or directory
```

Empty dbfilename will cause shutdown to fail since it will
always fail in rdb save:
```
./src/valkey-server --dbfilename ""

 * User requested shutdown...
 * Saving the final RDB snapshot before exiting.
 # Error moving temp DB file temp-19530.rdb on the final destination  (in server root dir /xxx/xxx/valkey): No such file or directory
 # Error trying to save the DB, can't exit.
 # Errors trying to shut down the server. Check the logs for more information.
```

Empty cluster-config-file will fail in clusterLockConfig:
```
./src/valkey-server --cluster-enabled yes --cluster-config-file ""

 Can't open  in order to acquire a lock: No such file or directory
```

With this patch, now we will just reject it in config set like:
```
*** FATAL CONFIG FILE ERROR (Version 255.255.255) ***
Reading the configuration file, at line 2
>>> 'xxx ""'
xxx can't be empty
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-06-14 01:47:20 +02:00
Harkrishn Patro
76fc041685
represent cluster node flags with bitwise shift value (#642)
While debugging a cluster bus issue, I found the cluster node flags were
represented as plain numbers. I generally find it easier when these are
represented as bitwise shift operations. It improves readability a bit.
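
For illustration, the change is in the spirit of (names abbreviated, not
the exact list from the patch):

```c
/* Before: the bit position is hidden in a magic number. */
#define CLUSTER_NODE_PFAIL 4
/* After: the shift makes the bit position obvious. */
#define CLUSTER_NODE_PFAIL (1 << 2)
```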

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-06-14 00:58:03 +02:00
uriyage
d211078a27
Fix query buffer resized test flakiness (#646)
Added a wait_for_condition to avoid the timing issue.
```
*** [err]: query buffer resized correctly in tests/unit/querybuf.tcl
Expected 11 >= 16384 && 11 <= 32770 (context: type eval line 24 cmd {assert {$orig_test_client_qbuf >= 16384 && $orig_test_client_qbuf <= $MAX_QUERY_BUFFER_SIZE}} proc ::test)
*** [err]: query buffer resized correctly when not idle in tests/unit/querybuf.tcl
Expected 11 > 32768 (context: type eval line 14 cmd {assert {$orig_test_client_qbuf > 32768}} proc ::test)
*** [err]: query buffer resized correctly with fat argv in tests/unit/querybuf.tcl
query buffer should not be resized when client idle time smaller than 2s
```

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-06-13 18:07:07 +08:00
Harkrishn Patro
b546dd26e5
Allow CLUSTER NODES/INFO/MYID/MYSHARDID during loading state (#596)
Allow CLUSTER subcommands NODES, INFO, MYID, MYSHARDID while loading
data.

It's safe to allow them and it's helpful for clients to get cluster
nodes/info information
during a node failover and while loading data to monitor the
state of the cluster.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-06-13 06:09:01 +02:00
Madelyn Olson
627d387ad8
Improve reliability of querybuf test (#639)
We've been seeing some pretty consistent failures from
`test-valgrind-test` and `test-sanitizer-address` because of the
querybuf test periodically failing. I tracked it down to the test
periodically taking too long and the client cron getting triggered. A
simple solution is to just disable the cron during the key race
condition. I was able to run this locally for 100 iterations without
seeing a failure.

Example:
https://github.com/valkey-io/valkey/actions/runs/9474458354/job/26104103514
and
https://github.com/valkey-io/valkey/actions/runs/9474458354/job/26104106830.

Signed-off-by: Madelyn Olson <matolson@amazon.com>
2024-06-12 14:27:42 -07:00
Viktor Söderqvist
4bb7cc471a
Remove unnecessary clang-format off annotations (#628)
We added some clang-format off comments before we had decided on the
format configuration. Now, it turns out that turning formatting off is
often not necessary.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-06-12 12:52:18 +02:00
Shivshankar
e65b2d235c
Update rewriteConfigSaveOption function code to rewrite multiple save in one line. (#583)
Currently, "config rewrite" writes some default value in the config file
incase of empty config file specified.

But it adds multiple "save" config entries as follows:
```
save 3600 1
save 300 100
save 60 10000
```

After the fix the save will look like:
```
save 3600 1 300 100 60 10000
```

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-06-10 16:24:04 -04:00
Madelyn Olson
a3f1535b57
Fix misuse of safe iterators (#612)
Safe iterators must call resetIterators when they are done being used.
Fix one issue where a safe iterator was not correctly calling reset
during cluster slot caching and fixed a second issue where reset
iterator was being called twice. For the double release case,
kvstoreIteratorNextDict is responsible for patching up the iterator, but
we were calling it a second time in kvstoreIteratorNext.

In addition, I added some documentation around initializing iterators,
added an assert to prevent double initialization, and remove a function
from the public interface which isn't needed and might lead to incorrect
usage of the safe iterators.

Bumping srgsanky for finding it here:
c4782066e7 (r142867004).

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-10 12:30:57 -07:00
Wen Hui
95a753b18d
Add BSD license explicitly (#620)
Add "BSD 3-Clause License" in License 1 and License 2 part

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-10 13:14:37 -04:00
Neal Gompa (ニール・ゴンパ)
71dd85dc5a
src/Makefile: Link libatomic on POWER systems (#607)
This ensures that fallbacks for unsupported atomic operations are
available for POWER systems.

Signed-off-by: Neal Gompa <neal@gompa.dev>
2024-06-09 15:09:08 -07:00
skyfirelee
09b5825b26
Moving client->authenticated to a flag instead of an int (#592)
Moving client->authenticated to a flag

Fix #589 

Signed-off-by: artikell <739609084@qq.com>
2024-06-09 11:49:05 -07:00
flowerysong
d28ae52004
Remove redundant function nextPingExt() (#613)
Functionally identical to the older, documented `getNextPingExt()`.

Fixes #610.

Signed-off-by: Paul Arthur <paul.arthur@flowerysong.com>
2024-06-08 21:55:58 -07:00
Ping Xie
aad6769a80
Replicate slot migration states via RDB aux fields (#586) 2024-06-07 20:32:27 -07:00
Ping Xie
54c9747935
Remove master and slave from source code (#591)
External facing interfaces are not affected.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-06-07 14:21:33 -07:00
Madelyn Olson
bce240eab7
Replace masteruser and masterauth with primaryuser and primaryauth (#598)
Make the one backwards-compatible config change we are allowed to make
for removing master from our API.

`masterauth` and `masteruser` are still accepted as aliases, but aren't
explicitly referenced. As an addendum to
https://github.com/valkey-io/valkey/pull/591, it would be good to have
this in 8. Given the related PR for updated other references for master,
I just updated the ones around this specific change.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-07 00:46:52 -07:00
Viktor Söderqvist
ad5fd5b95c
More rebranding (#606)
More rebranding of

* Log messages (#252)
* The DENIED error reply
* Internal function names and comments, mainly Lua API

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-06-07 01:40:55 +02:00
Viktor Söderqvist
278ce0cae0
Rebrand the Lua debugger (#603)
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-06-06 19:53:17 +02:00
Eran Liberty
60c10a5a4d
Remove valdup from BenchmarkDictType (#600)
makes SERVER_CFLAGS='-DSERVER_TEST' compile as well

Introduced in #443.

Signed-off-by: Eran Liberty <eranl@amazon.com>
Co-authored-by: Eran Liberty <eranl@amazon.com>
2024-06-05 11:00:53 +08:00
Shivshankar
9319f7aeca
Replace valkey in log and panic messages (#550)
Part of #207

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-06-04 20:46:59 +02:00
Eran Liberty
0700c441c6
Remove unused valDup (#443)
Remove the unused value duplicate API from dict. It's unused in the codebase and introduces unnecessary overhead. 

---------

Signed-off-by: Eran Liberty <eran.liberty@gmail.com>
2024-06-03 12:22:06 -07:00
Madelyn Olson
b95e7c384f
Skip tls for xgroup read regression since it doesn't matter (#595)
"Client blocked on XREADGROUP while stream's slot is migrated" uses the
migrate command, which requires special handling for TLS and non-tls.
This was not being handled, so was throwing an error.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-06-03 11:49:15 -07:00
uriyage
b72e43ed16
Adjust query buffer resized correctly test to non-jemalloc allocators. (#593)
The `query buffer resized correctly` test started
[failing](https://github.com/valkey-io/valkey/actions/runs/9278013807)
with non-jemalloc allocators after PR #258.

With jemalloc, we allocate ~20KB for the query buffer. In the test, we
read 1 byte initially and then ensure there is at least 16KB of free
space in the buffer for the second read, which is satisfied by
jemalloc's 20KB allocation. However, with non-jemalloc allocators, the
first read allocates exactly 16KB. When we check again, we don't have
16KB free due to the 1 byte already read. This triggers a greedy
reallocation (doubling the requested size of 16KB+1), causing the query
buffer size to exceed the 32KB limit, thus failing the test condition.

This PR adjusts the test's query buffer upper limit to 32KB + 2.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
2024-06-03 11:15:28 -07:00
poiuj
417660449f
Adjust sds types (#502)
The sds type should be determined based on the size of the underlying
buffer, not the logical length of the sds. Currently we truncate the
alloc field in case the buffer is larger than we can handle. This leads
to a mismatch between the alloc field and the actual size of the buffer,
even considering that alloc doesn't include the header size and the null
terminator.

It also leads to a waste of memory with jemalloc. For example, let's
consider creation of sds of length 253. According to the length, the
appropriate type is SDS_TYPE_8. But we allocate `253 + sizeof(struct
sdshdr8) + 1` bytes, which sums to 257 bytes. In this case jemalloc
allocates buffer from the next size bucket. With current configuration
on Linux it's 320 bytes. So we end up with 320 bytes buffer, while we
can't address more than 255.

The same happens with other types and length close enough to the
appropriate powers of 2.

The downside of the adjustment is that with allocators that do not
allocate larger than requested chunks (like the GNU allocator), we switch to
a larger type "too early". It leads to a small waste of memory.
Specifically:

* sds of length 31 takes 35 bytes instead of 33 (2 bytes wasted)
* sds of length 255 takes 261 bytes instead of 259 (2 bytes wasted)
* sds of length 65,535 takes 65,545 bytes instead of 65,541 (4 bytes wasted)
* sds of length 4,294,967,295 takes 4,294,967,313 bytes instead of 4,294,967,305 (8 bytes wasted)

---------

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
2024-06-02 20:55:54 -07:00
naglera
28e055af0b
Deflake the chained replicas disconnect test, where a replica reconnects
with the same master. The sync_partial_ok counter might get incremented
if the replica timed out during the test.
timed out during test.

Signed-off-by: naglera <anagler123@gmail.com>
2024-06-02 20:53:39 -07:00
Ping Xie
30f277a86d
Enable debug asserts for cluster and sentinel tests (#588)
Also make `enable-debug-assert` an immutable config

Address review comments in #584

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-06-02 13:15:08 -07:00
Chen Tianjie
d16b4ec1b9
Unshare object to avoid LRU/LFU being messed up (#250)
When LRU/LFU is enabled, Valkey does not allow using shared objects, as
value objects may be shared among many different keys and they can't
share LRU/LFU information.

However `maxmemory-policy` is modifiable at runtime. If LRU/LFU is not
enabled at start, but then enabled when some shared objects are already
in use, there could be some confusion in the LRU/LFU information.

For the `set` command it is OK, since it is going to create a new object
when LRU/LFU is enabled, but the `get` command will not unshare the object
and just updates the LRU/LFU information.

So we may duplicate the object in this case. It is a one-time task for
each key using shared objects; unless this is the case for very many keys,
there should be no serious performance degradation.

Still, LRU will be updated anyway, whether LRU/LFU is enabled or not,
because `OBJECT IDLETIME` needs it, unless `maxmemory-policy` is set to
LFU. So the idle time of a key may still be messed up.

---------

Signed-off-by: chentianjie.ctj <chentianjie.ctj@alibaba-inc.com>
Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-06-01 10:09:20 +02:00
Ping Xie
2b97aa6171
Introduce enable-debug-assert to enable/disable debug asserts at runtime (#584)
Introduce a new hidden server configuration, `enable-debug-assert`, which
allows selectively enabling or disabling, at runtime, expensive or risky
assertions used primarily for debugging and testing.

Fix #569

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-31 22:50:08 -07:00
Ping Xie
f927565d28
Consolidate various BLOCKED_WAIT* states (#562)
There are currently three block types: BLOCKED_WAIT, BLOCKED_WAITAOF,
and BLOCKED_WAIT_PREREPL, used to block clients executing `WAIT`,
`WAITAOF`, and `CLUSTER SETSLOT`, respectively. They share the same
workflow: the client is blocked until replication to the expected number
of replicas completes. However, they provide different responses
depending on the commands involved. Using distinct block types leads to
code duplication and reduced readability. This PR consolidates the three
types into a single WAIT type, differentiating them using the pending
command to ensure the appropriate response is returned.


Fix #427

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-30 23:45:47 -07:00
nitaicaro
6fb90adf4b
Fix crash where command duration is not reset when client is blocked … (#526)
In #11012, we changed the way command durations were computed to handle
the same command being executed multiple times. In #11970, we added an
assert if the duration is not properly reset, potentially indicating
that a call to report statistics was missed.

I found an edge case where this happens - easily reproduced by blocking
a client on `XREADGROUP` and migrating the stream's slot. This causes
the engine to process the `XREADGROUP` command twice:

1. The first time, we are blocked on the stream, so we wait for unblock to
come back to it a second time. In most cases, when we come back to
process the command a second time after unblock, we process the command
normally, which includes recording the duration and then resetting it.
2. After unblocking we come back to process the command, and this is
where we hit the edge case - at this point, we had already migrated the
slot to another node, so we return a `MOVED` response. But when we do
that, we don’t reset the duration field.

Fix: also reset the duration when returning a `MOVED` response. I think
this is right, because the client should redirect the command to the
right node, which in turn will calculate the execution duration.

Also wrote a test which reproduces this, it fails without the fix and
passes with it.

---------

Signed-off-by: Nitai Caro <caronita@amazon.com>
Co-authored-by: Nitai Caro <caronita@amazon.com>
2024-05-30 12:55:00 -07:00
Wen Hui
0d2ba9b94d
Update redis legacy word when run TLS cert file (#572)
Reference:
https://github.com/valkey-io/valkey-doc/blob/main/topics/encryption.md

Before we runtest --tls, we need first run utils/gen-test-certs.sh

I found there are some redis legacy word there, update them.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-05-30 13:09:29 -04:00
Binbin
6bab2d7968
Make sure clear the CLUSTER SLOTS cache on time when updating hostname (#564)
In #53, we cached the CLUSTER SLOTS response to improve the
throughput and reduce the latency.

In the code snippet below, the second cluster slots will use the
old hostname:
```
config set cluster-preferred-endpoint-type hostname
config set cluster-announce-hostname old-hostname.com
multi
cluster slots
config set cluster-announce-hostname new-hostname.com
cluster slots
exec
```

When updating the hostname, in updateAnnouncedHostname, we set
CLUSTER_TODO_SAVE_CONFIG and do a clearCachedClusterSlotsResponse
in clusterSaveConfigOrDie, so it is harmless in most cases.

Move the clearCachedClusterSlotsResponse call to clusterDoBeforeSleep
instead of scheduling it to be called in clusterSaveConfigOrDie.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-05-30 10:44:12 +08:00
LiiNen
168da8b52e
Fix bitops.c clang-format properly (#570)
ref:
- https://github.com/valkey-io/valkey/pull/118 (my previous change)
- https://github.com/valkey-io/valkey/pull/461 (noting that the clang
format checker fails due to my change)

There was an issue where the clang-format checker failed.
I don't know why I missed it and why it wasn't caught.

Just running `clang-format -i bitops.c` was all it took.

Signed-off-by: LiiNen <kjeonghoon065@gmail.com>
2024-05-28 21:49:50 -07:00
LiiNen
96dcd1183a
Change BITCOUNT 'end' as optional like BITPOS (#118)
_This change is the thing I suggested to redis when it was BSD, and is
not just migration - this is of course more advanced_

### Issue
There is weird difference in syntax between BITPOS and BITCOUNT:
```
BITPOS key bit [start [end [BYTE | BIT]]]
BITCOUNT key [start end [BYTE | BIT]]
```

I think this might cause confusion in terms of usability.
It is not just a syntax typo; the two really work differently.
The results below are with an unstable build:
```
> get TEST:ABCD
"ABCD"
> BITPOS TEST:ABCD 1 0 -1
(integer) 1
> BITCOUNT TEST:ABCD 0 -1
(integer) 9
> BITPOS TEST:ABCD 1 0
(integer) 1
> BITCOUNT TEST:ABCD 0
(error) ERR syntax error
```

### What did I fix

Simply change the logic to accept BITCOUNT also without 'end' - 'end'
becomes optional, like in BITPOS:
```
> GET TEST:ABCD
"ABCD"
> BITPOS TEST:ABCD 1 0 -1
(integer) 1
> BITCOUNT TEST:ABCD 0 -1
(integer) 9
> BITPOS TEST:ABCD 1 0
(integer) 1
> BITCOUNT TEST:ABCD 0
(integer) 9
```

Of course, I also fixed syntax hint:
```
# ASIS 
> BITCOUNT key [start end [BYTE|BIT]]
# TOBE
> BITCOUNT key [start [end [BYTE|BIT]]]
```

![image](https://github.com/valkey-io/valkey/assets/38001238/8485f58e-6785-4106-9f3f-45e62f90d24b)


### Moreover ...
I hadn't noticed that there was some small dead code in these commands'
logic when I wrote the PR for redis.
I found it now, when writing the code again, so I fixed it in valkey.
``` c
/* asis unstable */

/* bitcountCommand() */
if (!strcasecmp(c->argv[4]->ptr,"bit")) isbit = 1;
// ...
if (c->argc < 4) {
    if (isbit) end = (totlen<<3) + 7;
    else end = totlen-1;
}

/* bitposCommand() */
if (!strcasecmp(c->argv[5]->ptr,"bit")) isbit = 1;
// ...
if (c->argc < 5) {
    if (isbit) end = (totlen<<3) + 7;
    else end = totlen-1;
}
```
The bit variable (actually an int) "isbit" is only set to 1 when 'BIT'
is specified.
But we were checking whether 'isbit' is true or false in this 'if'
branch, even though isbit can never be 1 there, because argc is always
less than 4 (or 5 in bitpos).



I think these minor fixes will make valkey command operation more
consistent.
Of course, this PR just changes the args from "required" to
"optional", so it will never hurt previous users.

Thanks,

---------

Signed-off-by: LiiNen <kjeonghoon065@gmail.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2024-05-28 15:01:28 -04:00
uriyage
fd58b73f0a
Introduce shared query buffer for client reads (#258)
This PR optimizes client query buffer handling in Valkey by introducing
a shared query buffer that is used by default for client reads. This
reduces memory usage by ~20KB per client by avoiding allocations for
most clients using short (<16KB) complete commands. For larger or
partial commands, the client still gets its own private buffer.

The primary changes are:

* Adding a shared query buffer `shared_qb` that clients use by default
* Modifying client querybuf initialization and reset logic
* Copying any partial query from shared to private buffer before command
execution
* Freeing idle client query buffers when empty to allow reuse of shared
buffer
* Master client query buffers are kept private, as their contents need to
be preserved for the replication stream

In addition to the memory savings, this change shows a 3% improvement in
latency and throughput when running with 1000 active clients.
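
A rough sketch of the take-private step described above (the helper name
and the exact condition are assumptions; sdslen, sdsdup, and sdsclear
are the existing sds calls):

```c
/* Before executing a command: if the client's partial command still
 * sits in the shared buffer, copy it out so the shared buffer can
 * serve other clients. */
void takePrivateQuerybufIfNeeded(client *c) {
    if (c->querybuf == shared_qb && sdslen(c->querybuf) > 0) {
        c->querybuf = sdsdup(shared_qb); /* private copy of partial data */
        sdsclear(shared_qb);             /* shared buffer ready for reuse */
    }
}
```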

The memory reduction may also help reduce the need to evict clients when
reaching max memory limit, as the query buffer is the main memory
consumer per client.

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-28 11:09:37 -07:00
Shivshankar
7ba7e4d053
Update zfree on data in test_crc64combine before return. (#548)
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-28 10:36:54 -07:00
Ping Xie
84157890fd
Set up clang-format github action (#538)
Setup clang-format GitHub action to ensure coding style consistency
---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-28 09:27:51 -07:00
Viktor Söderqvist
4e44f5aae9
Fix races in test for tot-net-in, tot-net-out, tot-cmds (#559)
The races are between the '$rd' client and the 'r' client in the test
case.

Test case "client input output and command process statistics" in
unit/introspection.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-28 17:13:16 +02:00
Viktor Söderqvist
045d475a94
Implement REPLCONF VERSION (#554)
The replica sends its version when initiating replication, in
pipeline with other REPLCONF commands.

The primary stores it in the client struct. Other fields are made
smaller to avoid making the client struct consume more memory.

Fixes #414.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-27 23:03:34 +02:00
Ping Xie
e4ead9442b
Make CLUSTER SETSLOT with TIMEOUT 0 block indefinitely (#556)
This aligns the behaviour with established Valkey commands with a
TIMEOUT argument, such as BLPOP.

Fix #422

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-27 07:11:24 -07:00
Samuel Adetunji
5d0f4bc9f0
Require C11 atomics (#490)
- Replaces custom atomics logic with C11 default atomics logic.
- Drops  "atomicvar_api" field from server info

Closes #485

---------

Signed-off-by: adetunjii <adetunjithomas1@outlook.com>
Signed-off-by: Samuel Adetunji <adetunjithomas1@outlook.com>
Co-authored-by: teej4y <samuel.adetunji@prunny.com>
2024-05-26 18:41:11 +02:00
Jonathan Wright
1c55f3ca5a
Replace centos 7 with alternative versions (#543)
replace centos 7 with almalinux 8, add almalinux 9, centos stream 9, fedora stable, rawhide

Fixes #527

---------

Signed-off-by: Jonathan Wright <jonathan@almalinux.org>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-24 16:08:51 -07:00
Madelyn Olson
fbbabe3543
Revert format updates on config.c file for config block (#552)
Although I think this improves the readability of individual configs,
the fact there are now 1k more lines of configs makes this overall much
harder to parse. So reverting it back to the way it was before.

Replaced `,\n               [ ]+` with `, `.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-24 15:53:44 -07:00
Viktor Söderqvist
d72ba06dd0
Make cluster replicas return ASK and TRYAGAIN (#495)
After READONLY, make a cluster replica behave as its primary regarding
returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key
doesn't exist or if it has already been migrated to another shard as
part of an ongoing slot migration. Therefore, without an ASK redirect in
this situation, offloading reads to cluster replicas wasn't reliable.

Note: The target of a redirect is always a primary. If a client wants to
continue reading from a replica after following a redirect, it needs to
figure out the replicas of that new primary using CLUSTER SHARDS or
similar.

This is related to #21 and has been made possible by the introduction of
Replication of Slot Migration States in #445.
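For example, reading from a replica during a slot migration might now look like this (ports are made up; "foo" hashes to slot 12182):

```
127.0.0.1:6380> READONLY
OK
127.0.0.1:6380> GET foo
(error) ASK 12182 127.0.0.1:6381
```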

----

Release notes:

During cluster slot migration, replicas are able to return -ASK
redirects and -TRYAGAIN.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-24 17:58:03 +02:00
Harkrishn Patro
3dd2f5a586
Undeprecate cluster slots command (#536)
Undeprecate the cluster slots command. This command is widely used by
clients to form the cluster topology, and with the recent change to
improve the performance of `CLUSTER SLOTS` via #53, as well as our plans
to further improve its usability via #517, it makes sense to
undeprecate this command.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-05-24 07:14:23 +02:00
Ping Xie
a0aebb6b67
Reinitialize pointer 'p' after ziplistDeleteRange to fix head deletion bug (#537)
Fix
https://github.com/valkey-io/valkey/actions/runs/9200055659/job/25305949916

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-22 23:37:18 -07:00
Ping Xie
c41dd77a3e
Add clang-format configs (#323)
I have validated that these settings closely match the existing coding
style with one major exception on `BreakBeforeBraces`, which will be
`Attach` going forward. The mixed `BreakBeforeBraces` styles in the
current codebase are hard to imitate and also very odd IMHO - see below

```
if (a == 1) { /*Attach */
}
```

```
if (a == 1 ||
    b == 2)
{ /* Why? */
}
```

Please do NOT merge just yet. Will add the github action next once the
style is reviewed/approved.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-22 23:24:12 -07:00
Roshan Khatri
c4782066e7
Cache CLUSTER SLOTS response for improving throughput and reduced latency. (#53)
This commit adds a logic to cache `CLUSTER SLOTS` response for reduced
latency and also updates the cache when a change in the cluster is
detected.

Historically, the `CLUSTER SLOTS` command was deprecated; however, all the
server clients have been using `CLUSTER SLOTS` and have not migrated to
`CLUSTER SHARDS`. In the future this logic can be added to other
commands to improve the performance of the engine.
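The caching pattern is roughly the following (a sketch; function names are hypothetical, not the actual Valkey symbols):

```
#include <stdlib.h>

static char *cached_slots_reply; /* serialized CLUSTER SLOTS response */

extern char *build_cluster_slots_reply(void); /* expensive to compute */

const char *get_cluster_slots_reply(void) {
    if (cached_slots_reply == NULL)
        cached_slots_reply = build_cluster_slots_reply();
    return cached_slots_reply;
}

/* Called whenever a change in the cluster is detected. */
void invalidate_slots_cache(void) {
    free(cached_slots_reply);
    cached_slots_reply = NULL; /* rebuilt lazily on the next request */
}
```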

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-05-22 14:21:41 -07:00
Mason Hall
72538622ff
Migrate ziplist.c unit tests to new framework (#486)
Issue #428.

Moved the SERVER_TEST block from ziplist.c into unit tests in
test_ziplist.c. I left the benchmark related tasks alone in their own
test, as I am not sure what to do with them.

Some of the assertions are a little vague/useless, but I will try to
refine them.

---------

Signed-off-by: Mason Hall <hallmason17@gmail.com>
2024-05-21 17:54:09 -07:00
Siddhartha Sankar Mondal
005a018db6
Deprecate MacOS 11 build target (#524)
Deprecate the macOS 11 build target. End of life: June 2024. Fixes #523

---------

Signed-off-by: Siddhartha Mondal <siddharthmondal@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Roshan Khatri <117414976+roshkhatri@users.noreply.github.com>
2024-05-21 12:21:28 -07:00
Madelyn Olson
acb74f8da1
Delete dead code "zfree_usable" (#518)
Implemented in
3945a32177 (diff-a154d1fa454a9868e2c455acdae971e3605151516f9a8efac7f2c9b2845d914d),
this function is never called and never used. I was trying to understand
whether we could use this for another PR, but couldn't really find a
point for it because it didn't do exactly what I expected.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-20 14:52:23 -07:00
Lipeng Zhu
122cba5103
Quick fix of failure when use libc allocator in daily CI. (#521)
Make the `sdsResize` logic align with `_sdsnewlen`; a fix for
https://github.com/valkey-io/valkey/pull/476.
Fix: 

https://github.com/valkey-io/valkey/actions/runs/9143545732/job/25140329542#step:6:1505

https://github.com/valkey-io/valkey/actions/runs/9135654696/job/25123330670#step:6:1501

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-05-20 14:52:12 -07:00
Binbin
e7e5a104ec
Revert mmap_rnd bits back to default value in CI (#520)
In 3f725b8, we introduced a change back in March to reduce the
entropy of ASLR, because ASAN didn't support it. Now that
vm.mmap_rnd_bits was reverted in actions/runner-images#9491, we can
remove this change.

Closes #519.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-05-20 12:23:25 -07:00
Madelyn Olson
d52c8f30e0
Include stddef in zmalloc.h (#516)
zmalloc.h uses size_t, which requires stddef. It seems like all the
previous code paths were properly including it except for ASAN, which
uses libc malloc and skips the special Mac-only malloc inclusions.

Example failure:
https://github.com/valkey-io/valkey/actions/runs/9143545732/job/25140329029

See
https://github.com/valkey-io/valkey/actions/runs/9149533754/job/25153263875.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-19 11:33:33 -07:00
artikell
dcc9fd4fe8
Resolve numtests counter error (#514)
While working on the tests, a type error was found in the numtests counter.

Signed-off-by: artikell <739609084@qq.com>
2024-05-19 10:48:28 -07:00
Viktor Söderqvist
efa8ba519b
Finish postponed SCAN changes (#501)
Commit 07ed0eafa98a66 introduced some SCAN improvements, but some
changes were postponed to a later version (8.0), which this PR finishes:

1. Prepare to move the TYPE filtering to the scan callback as well. This
was put on hold since it has a side effect that can be considered a
breaking change: we will no longer attempt to lazily expire (delete) a
key that was filtered out by not matching the TYPE (changing it means
the TYPE filter starts behaving the same way the MATCH filter already
does in that respect).
2. When the specified key TYPE filter is an unknown type, the server
will reply with an error immediately instead of doing a full scan that
comes back empty-handed.
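A hypothetical session illustrating point 2 (the exact error text is illustrative):

```
127.0.0.1:6379> SCAN 0 TYPE nosuchtype
(error) ERR unknown type name
```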

Fixes #235

Release notes:

> SCAN: Expired keys that don't match the TYPE argument for the SCAN are
no longer deleted by SCAN

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-17 13:35:31 +02:00
Madelyn Olson
9b6232b501
Automatically notify the slack channel when tests fail (#509)
Adds a job that will automatically run at the end of the daily, which
will collect all the failed tests and send them to the developer slack.
It will include a link to the job as well.

Example job that ran on my private repo:
https://github.com/madolson/valkey/actions/runs/9123245899/job/25085418567

Example notification: 
<img width="662" alt="image"
src="https://github.com/valkey-io/valkey/assets/34459052/69127db4-e416-4321-bc06-eefcecab1130">
(Note: I removed the sassy text at the bottom from the PR)

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-16 23:51:33 -07:00
Ping Xie
fd53f17a61
Use pause_process to stop a node to make Valgrind happy, hopefully (#508)
Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-16 22:59:00 -07:00
Lipeng Zhu
7a9951fb80
Correct the actual allocated size from the allocator when calling sdsResize, to align the logic with the sdsnewlen function. (#476)
This patch corrects the actual allocated size from the allocator when
calling sdsResize, to align the logic with the sdsnewlen function.

Maybe the https://github.com/valkey-io/valkey/pull/453 optimization
should depend on this.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-05-15 18:22:50 -07:00
Arthur Lee
3de5c71f48
[Feat] Support fast fail option for tcl test cases (#482)
This PR adds a new option to the TCL test runner which makes it fail fast once any test case fails.
This can be useful when running the CI pipeline and you want to accelerate it.

usage for example

> ./runtest --single unit/type/hash --fast-fail

---------

Signed-off-by: arthur.lee <arthur-lee@qq.com>
2024-05-15 06:55:24 -07:00
Madelyn Olson
6e4a61093e
Make it so that unit tests build on Mac (#499)
The test logic is not smart enough to realize that a test is fully
#ifdef'd out, so it will try to attach it to the test suite anyway.
This is a minor workaround for the reclaim file page test so that it
will still attach the test; it will just always succeed. Also remove an
unnecessary print statement that was missed in the same test.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-15 14:48:30 +08:00
Madelyn Olson
546cef6684
Initial cleanup for cluster refactoring (#460)
Cleaned up the minor cluster refactoring notes that were intended to be
follow ups that never happened. Basically:
1. Minor style nitpicks
2. Generalized clusterNodeIsMyself so that it wasn't implementation
dependent.
3. Removed getMyClusterId, and just make it an explicit call to myself's
name, which seems more straightforward and removes unnecessary
abstraction.
4. Removed clusterNodeGetSlaveof in favor of clusterNodeGetMaster. We
already check whether it's a replica, and if it wasn't working it would
have been crashing.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-14 17:09:49 -07:00
Karthick Ariyaratnam
741ee702ca
[New] Migrate zmalloc.c unit tests to new test framework. (#493)
This is the actual PR which is created to migrate all tests related to
zmalloc into new test framework as part of the parent issue
https://github.com/valkey-io/valkey/issues/428.

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2024-05-14 15:54:33 -07:00
Viktor Söderqvist
72f2a8743c
Minor fix in module API doc script (#494)
The script extracts the comments and prototypes from module.c and does
some pre-processing, e.g. converts URLs to markdown links. The URL
regexp didn't account for '#', '?' (and a few more chars) so an URL like
`https://example.com/#section` was converted to markdown as

    [https://example.com/](https://example.com/)#section

With this change, it's instead correctly converted to

    [https://example.com/#section](https://example.com/#section)

Additional change: Removes an unused metadata field.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-14 01:12:30 +02:00
Madelyn Olson
4e18e326a1
Remove endian coverage from server.c (#492)
In c7ad9feb52, we missed removing the endian coverage from the legacy
unit tests, so the build failed to find it.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-13 11:02:41 +08:00
Karthick Ariyaratnam
c7ad9feb52
Migrate endianconv.c unit tests to new test framework (#458)
This PR migrates all tests related to endianconv into new test framework
as part of the parent issue https://github.com/valkey-io/valkey/issues/428.

---------

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-12 16:58:50 -07:00
Andy Pan
dca1722340
Use kqueue as the backend of AE on DragonFlyBSD (#450)
Currently, we use select(2) on DragonFlyBSD, even though `kqueue` has
been available since FreeBSD 4.1 and DragonFlyBSD was originally forked
from FreeBSD 4.8.

`select(2)` is a pretty old mechanism with many shortcomings compared
to `kqueue`, so we should switch to `kqueue` on DragonFlyBSD.
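For reference, registering interest in readability with `kqueue` looks roughly like this (a minimal sketch, not the actual ae.c code):

```
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/* Add a read-readiness filter for fd to an existing kqueue instance. */
int watch_readable(int kq, int fd) {
    struct kevent ev;
    EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
    return kevent(kq, &ev, 1, NULL, 0, NULL);
}
```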

Signed-off-by: Andy Pan <i@andypan.me>
2024-05-12 16:29:00 -07:00
Ping Xie
ac47ca2d47
Suppress ASAN errors on tests that intentionally crash the server via crash-memcheck-enabled no (#489)
Fix daily CI run errors like
https://github.com/valkey-io/valkey/actions/runs/9039450198/job/24842308071#step:6:4176

Signed-off-by: Ping Xie <pingxie@google.com>
2024-05-12 16:08:47 -07:00
Shivshankar
07367df981
Update rdb and module's child proc name to valkey accordingly (compatible with redis symlink) (#454)
If `valkey-server` was started with the `redis-server` symlink, the old
proc names are used, for backwards compatibility.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-10 21:51:01 +02:00
Shivshankar
e242799867
Migrate sha1 unit test to new framework (#470)
This migrates unit tests related to sha1 to new framework, ref: #428.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-09 19:44:40 -07:00
Ping Xie
138a7d9846
Handle role change error in cluster setslot when migrating the last slot away with allow-replica-migration enabled (#466) 2024-05-09 18:12:55 -07:00
Andy Pan
8048abb2fd
Support pipe2() on *BSD (#462)
Before this PR, `pipe2()` was only enabled on Linux and FreeBSD, even
though `pipe2()` is available across the *BSDs.

This PR enables `pipe2()` for the rest of the *BSDs: DragonFlyBSD, NetBSD and
OpenBSD.
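The saving per pipe, sketched (assuming a HAVE_PIPE2-style feature macro):

```
#include <fcntl.h>
#include <unistd.h>

/* With pipe2(), the descriptors are created non-blocking in one call;
 * without it, two extra fcntl() calls per descriptor are needed. */
int make_nonblocking_pipe(int fds[2]) {
#ifdef HAVE_PIPE2
    return pipe2(fds, O_NONBLOCK);
#else
    if (pipe(fds) == -1) return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL);
        if (flags == -1 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == -1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
#endif
}
```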

## References

- [pipe2 on
DragonFlyBSD](https://man.dragonflybsd.org/?command=pipe&section=2)
- [__DragonFly_version for
pipe2](7485684fa5/sys/sys/param.h (L121))
- [pipe2 on NetBSD](https://man.netbsd.org/pipe.2)
- [pipe2 on OpenBSD](https://man.openbsd.org/pipe.2)

Signed-off-by: Andy Pan <i@andypan.me>
2024-05-10 02:30:39 +02:00
Lipeng Zhu
0342a81b7c
Migrate sds.c unit tests to new test framework. (#478)
This patch migrates all tests in sds.c into new test framework as part
of the parent issue https://github.com/valkey-io/valkey/issues/428.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-09 14:54:39 -07:00
Arthur Lee
4e1d8e1721
[Fix] move deps from slowlog.c into slowlog.h (#465)
Move dependencies from `slowlog.c` into `slowlog.h` to make sure the
language server can work properly with `slowlog.h`.

Signed-off-by: arthur.lee <arthur-lee@qq.com>
2024-05-09 14:29:18 -07:00
Arthur Lee
2559f64f5a
[FIX] Remove redundant statement and return (#481)
* `freeClientArgv` was previously defined in `server.h`
* remove the redundant return

Signed-off-by: arthur.lee <arthur-lee@qq.com>
2024-05-09 14:26:45 -07:00
Karthick Ariyaratnam
b166980c8e
Fix UNUSED repetition issue in test sources (#475)
This is a follow-up PR to address the UNUSED repetition issue (see
https://github.com/valkey-io/valkey/pull/446#discussion_r1593204956) in
different test source files.

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2024-05-09 14:26:15 -07:00
Binbin
fdd023ff82
Migrate cluster mode tests to normal framework (#442)
We currently have two disjoint TCL frameworks:
1. The normal testing framework, triggered by runtest, which individually
launches nodes for testing.
2. The cluster framework, triggered by runtest-cluster, which pre-allocates
N nodes and uses them for testing large configurations.

The normal TCL testing framework is much more readily tested and is also
automatically run as part of the CI for new PRs. Since runtest-cluster
runs very slowly (it cannot be parallelized), it currently only runs in
the daily CI; this results in some cluster changes not being exposed in
the PR CI in time.

This PR migrates the cluster mode tests to the normal framework. Some
cluster tests are kept in runtest-cluster because of timing issues or
missing support; we can process them later.

Signed-off-by: Binbin <binloveplay1314@qq.com>
2024-05-09 10:14:47 +08:00
Shivshankar
6cff0d6a7b
Remove intsettest declaration from intset.h (#471)
All the intset unit tests were migrated to the new test framework as part
of https://github.com/valkey-io/valkey/pull/344, but the old framework
declaration in intset.h was missed. This removes that code.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-08 12:38:50 -07:00
Shivshankar
1125bdbb80
Update serverpanic output based on 'extended-redis-compatibility' config. (#415)
Updated the serverPanic output in db.c based on the
extended-redis-compatibility config, and also updated comments in other
files.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-08 12:17:32 -07:00
Viktor Söderqvist
6af51f5092
Prevent clang-format in certain places (#468)
This is a preparation for adding clang-format.

These comments prevent automatic formatting in some places. With these
exceptions, we will be able to run clang-format on the rest of the code.

This is a preparation for #323.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-08 20:58:53 +02:00
Shivshankar
2278dfd253
Fix build error for unit test (#473)
Fix the compile error with the following command:
`make all-with-unit-tests SERVER_CFLAGS='-Werror -DSERVER_TEST'`

```
/usr/bin/ld: /home/ubuntu/valkey-shiv-repo/valkey/src/eval.c:1172: undefined reference to `lua_next'
/usr/bin/ld: /home/ubuntu/valkey-shiv-repo/valkey/src/eval.c:1154: undefined reference to `lua_toboolean'
/usr/bin/ld: /home/ubuntu/valkey-shiv-repo/valkey/src/eval.c:1175: undefined reference to `lua_type'
/usr/bin/ld: /home/ubuntu/valkey-shiv-repo/valkey/src/eval.c:1176: undefined reference to `lua_tonumber'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:469: valkey-unit-tests] Error 1
make[1]: Leaving directory '/home/ubuntu/valkey-shiv-repo/valkey/src'
make: *** [Makefile:6: all-with-unit-tests] Error 2
```

The issue happened because not all dependency libraries were linked for
valkey-unit-tests, so this change links all the libraries into the binary.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-08 14:52:21 -04:00
Shivshankar
315b7573c4
Update server function's name to valkey (#456)
Renamed the following functions to valkey:

genRedisInfoString -> genValkeyInfoString
genRedisInfoStringCommandStats -> genValkeyInfoStringCommandStats
genRedisInfoStringACLStats -> genValkeyInfoStringACLStats
genRedisInfoStringLatencyStats -> genValkeyInfoStringLatencyStats

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-08 09:44:05 -04:00
Karthick Ariyaratnam
4e944cedee
Migrate kvstore.c unit tests to new test framework. (#446)
This PR migrates all tests related to kvstore into new test framework as
part of the parent issue https://github.com/valkey-io/valkey/issues/428.

---------

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-07 16:49:24 -07:00
Shivshankar
1aca85e3de
Update module api and variable to valkey accordingly. (#455)
Updated the following redis instances accordingly:
rediscmd -> serverCmd
freeRedisModuleAsyncRMCallPromise -> freeValkeyModuleAsyncRMCallPromise
MyCommand_RedisCommand -> MyCommand_ValkeyCommand
RedisModuleString -> ValkeyModuleString
flushRedisModuleIOBuffer -> flushValkeyModuleIOBuffer

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-07 16:29:46 -07:00
Karthick Ariyaratnam
2ed71de8e1
Migrate util.c unit tests to new test framework (#448)
This PR migrates all tests related to util into new test framework as
part of the parent issue https://github.com/valkey-io/valkey/issues/428.

---------

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2024-05-07 16:21:23 -07:00
Andy Pan
cde8ec1b41
Don't try to set SO_REUSEADDR on sockets of AF_UNIX (#451)
Despite the fact that SO_REUSEADDR can be set on a Unix domain socket
via setsockopt() without reporting an error, SO_REUSEADDR was actually
created for IPv4/IPv6 and is not supported for sockets of AF_UNIX.

Therefore, setting this option on a Unix domain socket does nothing but
costs one extra system call.
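A sketch of the resulting guard (illustrative, not the exact anet.c code):

```
#include <sys/socket.h>

/* Only IP sockets benefit from SO_REUSEADDR; on AF_UNIX it is a
 * wasted system call, so skip it. */
static int set_reuseaddr(int fd, int family) {
    int yes = 1;
    if (family == AF_UNIX) return 0;
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
}
```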

Signed-off-by: Andy Pan <i@andypan.me>
2024-05-07 19:25:26 +02:00
Chen Tianjie
ba9dd7b23a
Add noscores option to command ZSCAN. (#324)
Command syntax is now:
```
ZSCAN key cursor [MATCH pattern] [COUNT count] [NOSCORES]
```
Return format:
```
127.0.0.1:6379> zadd z 1 a 2 b 3 c
(integer) 3
127.0.0.1:6379> zscan z 0
1) "0"
2) 1) "a"
   2) "1"
   3) "b"
   4) "2"
   5) "c"
   6) "3"
127.0.0.1:6379> zscan z 0 noscores
1) "0"
2) 1) "a"
   2) "b"
   3) "c"
```
when NOSCORES is on, the command will only return members in the zset,
without scores.

As for client-side parsing of the command's return value, I believe it is
fine as long as the command is backwards compatible. The return
structures are still lists; what has changed is the content, and clients
can tell the difference by the subcommand they use.

Since `novalues` option of `HSCAN` is already accepted
(redis/redis#12765), I think similar thing can be done to `ZSCAN`.

---------

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-05-07 14:39:28 +08:00
Ping Xie
6e7af9471c
Slot migration improvement (#445) 2024-05-06 21:40:28 -07:00
Karthick Ariyaratnam
e2aec3b1a2
Fix an error in unit/README (#447)
This PR fixes an error in unit/README.md (see
https://github.com/valkey-io/valkey/issues/428), correcting the
steps for running a single unit test file.

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2024-05-06 18:34:09 -07:00
NAM UK KIM
93f8a19b6f
Change strlcat function name from redis to valkey (#440)
Updated strlcat function and macros name (redis_strlcat ->
valkey_strlcat).

I think the standard strcat function is not safe.
(https://codeql.github.com/codeql-query-help/cpp/cpp-unsafe-strcat/)
So, it would be better to keep it as a safe function.

Signed-off-by: NAM UK KIM <namuk2004@naver.com>
2024-05-06 00:09:01 -07:00
Madelyn Olson
1b3199e070
Fix unit test issues on address sanitizer and fortify (#437)
This commit does four things:

1. On various images, the linker was not able to correctly load the flto
optimizations from the archive generated for unit tests, and was
throwing errors. I was able to solve this by updating the plugin for the
fortify test, but was unable to reproduce it on the ASAN tests or find a
solution. So I decided to go with a single solution for now, which was
to just disable the linker optimizations for those tests. This shouldn't
weaken the protections provided by ASAN.
2. The change to remove flto for some reason caused some odd inlining
behavior in the intset test that I wasn't really able to understand.
The error was basically that we were doing a 4-byte write, starting at
byte offset 8, for the first addition to a listpack that was of size 10.
Practically this has no effect, since I'm not aware of any allocator
that would give us a 10-byte block as opposed to 12 (or more likely 16)
bytes. This isn't the correct behavior, since an uninitialized listpack
defaults to 16-bit encoding, which should only be writing 2 bytes. I
rabbit-holed about 2 hours into this, and gave up and just ignored the
warning on the file.
3. Now that address sanitizer was correctly running, it picked up two
issues. A memory leak and uninitialized value, so those were easy to
fix.
4. There is also a small change to the fortify to build the test up
front instead of later, this is just to be consistent with other tests
and has no functional change.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-05 22:00:08 -07:00
Chen Tianjie
cc703aa3bc
Input output traffic stats and command process count for each client. (#327)
We already have global stats for input traffic, output traffic and how
many commands have been executed.

However, some users have the difficulty of locating the IP(s) which have
heavy network traffic. So here some stats for single client are
introduced.
```              
tot-net-in   // Total network input bytes read from the client
tot-net-out  // Total network output bytes sent to the client
tot-cmds     // Total commands the client has executed     
```                             
These three stats are shown in `CLIENT LIST` and `CLIENT INFO`.

Though the metrics are handled in hot paths of the code, personally I
don't think it will slow down the server. Considering all other complex
operations handled nearby, this is only a small and simple operation.

However we do need to be cautious when adding more and more metrics, as
discussed in redis/redis#12640, we may need to find a way to tell
whether this has obvious performance degradation.

---------

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-05-05 21:52:59 -07:00
Sergey Fedorov
9ebbd5f038
Fix issues for older versions of Darwin and improve PowerPC support (#436)
Existing code does not build on macOS < 10.7. There are two issues:
1. The check for `MAC_OS_10_6_DETECTED` does the opposite of what it
should, since `AvailabilityMacros.h` does not define underscore-prefixed
versions of macros. `__MAC_OS_X_VERSION_MAX_ALLOWED` evaluates to 0, and
on 10.6 everything breaks down.

Credits to @mohd-akram, who pointed out a possible origin of this
problem.

2. Once that is fixed, on 10.6 when building for `ppc` there are new
errors, because the code uses inaccurate assumptions for archs. Fix that
too.

---------

Signed-off-by: Sergey Fedorov <vital.had@gmail.com>
2024-05-04 12:38:06 -07:00
Björn Svensson
39d4b43d4b
Pin versions of Github Actions in CI (#221)
Pin the Github Action dependencies to the hash according to secure
software development best practices
recommended by the Open Source Security Foundation (OpenSSF).

When developing a CI workflow, it's common to version-pin dependencies
(e.g. actions/checkout@v4). However, version tags are mutable, so a
malicious attacker could overwrite a version tag to point to a malicious
or vulnerable commit instead.
Pinning workflow dependencies by hash ensures the dependency is
immutable and its behavior is guaranteed.
See
https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies
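For illustration, the difference in a workflow file (the hash shown is an example pin for checkout v4.1.1):

```
steps:
  # Mutable version tag -- the tag can be moved to another commit:
  - uses: actions/checkout@v4
  # Immutable hash pin, with the version kept as a comment:
  - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
```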

The `dependabot` supports updating a hash and the version comment so its
update will continue to work as before.

Links to the used actions and their tag/hash for review/validation:
https://github.com/actions/checkout/tags    (v4.1.2 was rolled back)
https://github.com/github/codeql-action/tags
https://github.com/maxim-lobanov/setup-xcode/tags
https://github.com/cross-platform-actions/action/releases/tag/v0.22.0
https://github.com/py-actions/py-dependency-install/tags
https://github.com/actions/upload-artifact/tags
https://github.com/actions/setup-node/tags
https://github.com/taiki-e/install-action/releases/tag/v2.32.2

This PR is part of #211.

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2024-05-04 01:54:14 +02:00
Viktor Söderqvist
472c1ca26b
Update links in module API docs (generated from module.c) (#433)
These are used in the docs and on the website.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-04 00:14:56 +02:00
Mike Dolan
43692aca7a
Update README.md to add Valkey entity information (#432)
I added details at the end with the Valkey entity information under an
"About Valkey" heading. Please feel free to adjust the heading if that's
not an appropriate one to use.

---------

Signed-off-by: Mike Dolan <mikedolan@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-03 12:12:13 -07:00
pshankinclarke
3b256a5af0
Improve TLS.md configuration instructions (#385)
Signed-off-by: Parker Shankin-Clarke <parkerwsc1@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-03 20:13:10 +02:00
Ted Lyngmo
8f974484ed
Log the real reason for why posix_fadvise failed (#430)
`reclaimFilePageCache` did not set `errno`, but `rdbSaveInternal`, which
logs the error, assumed it did. This makes sure `errno` is set.
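The gist of the fix, sketched (not the literal patch): `posix_fadvise()` reports failure through its return value rather than by setting `errno`.

```
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* posix_fadvise() returns the error number on failure without touching
 * errno, so set errno explicitly for callers that log strerror(errno). */
static int reclaim_file_page_cache(int fd, off_t size) {
    int ret = posix_fadvise(fd, 0, size, POSIX_FADV_DONTNEED);
    if (ret != 0) {
        errno = ret;
        return -1;
    }
    return 0;
}
```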

Signed-off-by: Ted Lyngmo <ted@lyncon.se>
2024-05-03 11:10:33 -07:00
Madelyn Olson
5b1fd222ed
An initial simple unit test framework (#344)
The core idea was to take a lot of the stuff from the C unity framework
and adapt it a bit here. Each file in the `unit` directory that starts
with `test_` is automatically assumed to be a test suite. Within each
file, all functions that start with `test_` are assumed to be a test.

See unit/README.md for details about the implementation.

Instead of compiling basically a net new binary, the way the tests are
compiled is that the main valkey server is compiled as a static archive,
which we then compile the individual test files against to create a new
test executable. This is not all that important now, other than it makes
the compilation simpler, but what it will allow us to do is overwrite
functions in the archive to enable mocking for cross compilation unit
functions. There are also ways to enable mocking from within the same
compilation unit, but I don't know how important this is.

Tests are also written in one of two styles:
1. Including the header file and directly calling functions from the
archive.
2. Importing the original file, and then calling the functions. This
second approach is cool because we can call static functions. It won't
mess up the archive either.
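A hypothetical test file following these conventions (the file path, helper header, and macro names are assumptions based on the description above):

```
/* unit/test_example.c -- picked up automatically because the file name
 * starts with "test_". */
#include "../intset.h" /* style 1: call functions from the archive */
#include "test_help.h" /* assumed helper header providing TEST_ASSERT */

/* Discovered as a test because the function name starts with "test_". */
int test_intsetBasicAdd(int argc, char **argv, int flags) {
    UNUSED(argc);
    UNUSED(argv);
    UNUSED(flags);
    uint8_t success;
    intset *is = intsetNew();
    is = intsetAdd(is, 5, &success);
    TEST_ASSERT(success == 1);
    return 0;
}
```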

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-05-02 20:00:04 -07:00
Ikko Eltociear Ashimine
443d80f168
Fix typo in comment in quicklist.h (#416)
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
2024-05-02 17:36:07 +02:00
Viktor Söderqvist
d1de34930a
Document the commands JSON files (#403)
These JSON files were originally not intended to be used directly, since
they contain internals and some fields like "acl_categories" that are not
the final ACL categories. (Valkey will apply some implicit rules to
compute the final ACL categories.) However, people see JSON files
and use them directly anyway.

So it's better to document them.

In a later PR, we can get rid of all implicit ACL categories and instead
populate them explicitly in the JSON files. Then, we'll add a validation
(e.g. in generate-command-code.py) that the implied categories are set.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-05-02 17:02:38 +02:00
Rolandas Šimkus
68ca258b0f
Changed links and naming to valkey instead of redis (#389)
This is a minor change where naming and links now point properly
to valkey.

Fixes #388

---------

Signed-off-by: Rolandas Šimkus <rolandas@simkus.io>
Signed-off-by: simkusr <rolandas.s@wilibox.com>
Signed-off-by: simkusr <rolandas@simkus.io>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: simkusr <rolandas.s@wilibox.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-02 14:53:37 +02:00
Shivshankar
8abeb79f52
Rename redis in aof logs and proc title redis-aof-rewrite to valkey-aof-rewrite (#393)
Renamed redis to valkey/server in aof.c server logs.

The AOF rewrite child process title is set to "redis-aof-rewrite" if
Valkey was started from a redis-server symlink, otherwise to
"valkey-aof-rewrite".

This is a breaking change since the logs are changed.

Part of #207.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-05-01 18:15:19 +02:00
Josiah Carlson
f4e10eee06
CRC64 perf improvements from Redis patches (#350)
Improve the performance of crc64 for large batches by processing large
number of bytes in parallel and combining the results.

## Performance 
* 53-73% faster on Xeon 2670 v0 @ 2.6ghz
* 2-2.5x faster on Core i3 8130U @ 2.2 ghz
* 1.6-2.46 bytes/cycle on i3 8130U
* likely >2x faster than crcspeed on newer CPUs with more resources than
a 2012-era Xeon 2670
* crc64 combine function runs in <50 nanoseconds typical with vector +
cache optimizations (~8 *microseconds* without vector optimizations, ~80
*microseconds* without cache; the combination is extra effective)
* still single-threaded
* valkey-server test crc64 --help (requires `make distclean && make
SERVER_TEST=yes`)
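Conceptually, the parallel-with-combine approach looks like this (a sketch; the `crc64_combine()` signature here is an assumption):

```
#include <stddef.h>
#include <stdint.h>

uint64_t crc64(uint64_t crc, const unsigned char *s, uint64_t l);
uint64_t crc64_combine(uint64_t crc1, uint64_t crc2, uint64_t len2);

/* CRC the two halves independently (the two dependency chains can run
 * in parallel on a superscalar CPU), then merge the results. */
uint64_t crc64_two_lanes(const unsigned char *buf, size_t len) {
    size_t half = len / 2;
    uint64_t a = crc64(0, buf, half);
    uint64_t b = crc64(0, buf + half, len - half);
    return crc64_combine(a, b, len - half);
}
```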

---------

Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-30 19:32:01 -07:00
Viktor Söderqvist
89f72bc3ae
Don't include config.h from serverassert.h (#404)
Serverassert is a drop-in replacement of assert. We use it even in code
copied from other sources. To make these files usable outside of Valkey,
it should be enough to replace the `serverassert.h` include with
`<assert.h>`. Therefore, this file shouldn't have any dependencies on
the rest of the valkey code.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-05-01 03:26:59 +02:00
Lipeng Zhu
44f273d13b
Delete unused declaration (#400)
Delete unused declaration `void *dictEntryMetadata(dictEntry *de);` in
dict.h.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-05-01 03:02:22 +02:00
NAM UK KIM
e43fda5685
Modify mem_freed variable in evict.c and Update debug.c (#376)
Fix the mem_freed variable so that it is initialized at declaration;
this PR prevents the variable from being used uninitialized.

Signed-off-by: NAM UK KIM <namuk2004@naver.com>
2024-04-30 16:41:37 -07:00
Madelyn Olson
b283c6b508
Initial PR outlining the governance for the project (#345)
Initial PR to add a governance doc outlining permissions for the main
Valkey project as well as define responsibilities for sub-projects.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: binyan <binbin.yan@nokia.com>
2024-04-30 14:57:21 -07:00
Shivshankar
0cb97653b7
Change default pidfile from redis.pid to valkey.pid (#378)
Changes the default value for the `pidfile` config.

The template config file `valkey.conf` already contains `pidfile
/var/run/valkey_6379.pid`. This is not a default. The default is what
you get when you start valkey without config.

Test suites' pidfile config changed to valkey accordingly.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-30 18:45:08 +02:00
Viktor Söderqvist
6e05d0fcb1
Update script to generate Valkey Module API docs (#406)
The output of this script becomes the contents of
`topics/module-api-ref.md` in the `valkey-doc` repo. (Updating it is a
manual process.)

The script uses git tags to find the version that first added an API
function. To preserve the history from old Redis OSS versions, for which
we don't keep git tags, a mapping is stored in a file.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-30 18:24:18 +02:00
Karthick Ariyaratnam
05251c55d7
Change default syslog-ident from redis to valkey (#390)
Default value for the "syslog-ident" config changed from "redis" to
"valkey".

Fixes #301.

---------

Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
2024-04-30 14:34:19 +02:00
Andy Pan
948cd8f2c2
Use SOCK_NONBLOCK to reduce system calls for outgoing connections (#293)
What this PR mainly does:

1. Refactor the `anetCreateSocket()` to make it more generic for more
   socket arguments, and use `SOCK_NONBLOCK` if available, which will
   reduce two system calls (`F_GETFL` and `F_SETFL`) of enabling the
   non-blocking mode on each newly created socket.
2. Clean up the unused `anetUnixGenericConnect()` that calls
   `anetCreateSocket()`.

`SOCK_NONBLOCK` for system call `socket()` is supported on most
UNIX-like platforms (`linux`, `dragonfly`, `freebsd`, `netbsd`,
`openbsd`, `solaris`, etc.). This improvement will significantly reduce
the system calls considering how massively `anetTcpGenericConnect()`
will be called when needed.
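The per-connection saving, sketched (illustrative, not the exact anet.c refactor):

```
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

int create_nonblocking_tcp_socket(void) {
#ifdef SOCK_NONBLOCK
    /* One call: the socket is created already non-blocking. */
    return socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
#else
    /* Fallback: two extra system calls (F_GETFL + F_SETFL). */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
#endif
}
```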

As for the cleanup, `anetUnixGenericConnect` was introduced in c61e692
and the only reference back then was from the `createClient()` in
`redis-benchmark.c` which had been removed in ec8f066 and made it the
dead code. Most of that dead code was also cleaned up in f657315e, and
it seems that the `anetUnixGenericConnect` got left out. Therefore, I
also cleaned it up, ***but I'm not so certain about doing this cleanup
in this PR. Maybe you would prefer to do it in a separate PR?***

References:

- [socket(2) on Linux](https://man7.org/linux/man-pages/man2/socket.2.html)
- [socket(2) on FreeBSD](https://man.freebsd.org/cgi/man.cgi?socket(2))
- [socket(2) on DragonFly](https://man.dragonflybsd.org/?command=socket&section=2)
- [socket(2) on NetBSD](https://man.netbsd.org/socket.2)
- [socket(2) on OpenBSD](https://man.openbsd.org/socket.2)
- [socket(3c) on Solaris](https://docs.oracle.com/cd/E88353_01/html/E37843/socket-3c.html)
- [socket(3socket) on illumos](https://illumos.org/man/3SOCKET/socket)

---------

Signed-off-by: Andy Pan <i@andypan.me>
2024-04-30 11:49:22 +02:00
Madelyn Olson
b0d5a0f58d
Make sure standard library is imported when using abort (#395)
I found a few cases where serverAssert() resulted in abort being added, but abort requires
stdlib. So, when serverAssert() uses abort, also automatically include stdlib.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-26 17:32:17 -07:00
Sher_Sun
a5a1377dfc
Update REDIS* to VALKEY* in object.c and utils/create-cluster/README (#380)
1. Rename `REDIS_*` macros defined in object.c to `VALKEY_*`.
2. Rename `Redis` to `Valkey`, `redis-cli` to `valkey-cli` in logs
(i.e. puts statements) and descriptions in object.c and
utils/create-cluster/README.

---------

Signed-off-by: Sher Sun <sher.sun@huawei.com>
Co-authored-by: Sher Sun <sher.sun@huawei.com>
2024-04-26 10:26:19 -07:00
bentotten
19c4c647e0
Fix incorrect comment for count in clusterMsg (#381)
The "count" field of clusterMsg is only used for gossip.

Signed-off-by: Ben Totten <btotten@amazon.com>
Co-authored-by: Ben Totten <btotten@amazon.com>
2024-04-25 16:33:44 -07:00
pshankinclarke
2e926b2de1
Fix command line formatting in TLS.md (#372)
Fix command line formatting in TLS.md 

Signed-off-by: Parker Shankin-Clarke <parkerwsc1@gmail.com>
2024-04-25 14:20:52 -07:00
Viktor Söderqvist
d0ee4188c5
Don't let install flags affect build (#382)
Don't let the Make variable `USE_REDIS_SYMLINKS` affect the build.
If it does, it causes the second line in the example below (`make
install`) to recompile what was already compiled on the line above, and
this time it's built without BUILD_TLS=yes USE_SYSTEMD=yes.

    make BUILD_TLS=yes USE_SYSTEMD=yes 
    make PREFIX=custom/usr USE_REDIS_SYMLINKS=no install

Fixes #377

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-25 23:19:15 +02:00
Karthick Ariyaratnam
9eaefc77b0
Mention one of rediss:// and valkeys:// in error message, not both (#373)
When using a TLS scheme for valkey-cli and benchmark-cli compiled
without TLS, make the error message only mention the scheme used.

Before:

"valkeys:// and rediss:// are only supported when valkey-cli is compiled
with OpenSSL"

After, depending on which scheme the user tried to use:

"valkeys:// is only supported when valkey-cli is compiled with OpenSSL"
"rediss:// is only supported when valkey-cli is compiled with OpenSSL"

Follow up of #199.

---------

Signed-off-by: karthyuom <karthyuom@gmail.com>
Co-authored-by: k00809413 <karthick.ariyaratnam1@huawei.com>
2024-04-25 22:02:22 +02:00
Andy Pan
4948d536b9
Detect accept4() on specific versions of various platforms (#294)
This PR has mainly done three things:

1. Enable `accept4()` on DragonFlyBSD
2. Fix the failures of determining the presence of `accept4()` due to
   the missing <sys/param.h> on two OSs: NetBSD, OpenBSD
3. Drop the support of FreeBSD < 10.0 for `valkey`

### References
- [param.h in
DragonFlyBSD](7485684fa5/sys/sys/param.h (L129-L257))
- [param.h in
FreeBSD](https://github.com/freebsd/freebsd-src/blob/main/sys/sys/param.h#L46-L76)
- [param.h in
NetBSD](b5f8d2f930/sys/sys/param.h (L53-L70))
- [param.h in
OpenBSD](d9c286e032/sys/sys/param.h (L40-L45))

---------

Signed-off-by: Andy Pan <i@andypan.me>
2024-04-25 16:20:40 +02:00
Shivshankar
52f9291f79
Rename redis to valkey in test suite logs and test names. (#366)
This PR covers the cases below:
1. Test suite prints (i.e., puts statement logs).
2. Links referring to redis issues.
3. Test names containing redis.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-25 15:13:21 +08:00
artikell
74a0486e3d
Update redis* to valkey* in server.c and module.c (#367)
It's a supplementary modification from #352.

Signed-off-by: artikell <739609084@qq.com>
2024-04-25 10:35:12 +08:00
Wen Hui
2864fffe73
Update redis* to valkey* in syscheck.c (#365)
Fixes #352

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-25 10:34:37 +08:00
PatrickJS
be81469cde
Update COPYING to Valkey (#32)
Add License Copyright description for Valkey contributors in COPYING

---------

Signed-off-by: PatrickJS <github@patrickjs.com>
Signed-off-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-24 16:40:30 -04:00
Wen Hui
191be272b4
Rename redis.tcl to valkey.tcl (#283)
Includes some more changes, e.g. the README under tests and some scripts under utils.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-24 20:54:52 +02:00
Shivshankar
8baf322742
Rename remaining test procedures (#355)
Renamed the following procedures and variables (missed in #287):

redis_cluster             ->     valkey_cluster
redis1                    ->     valkey1
redis2                    ->     valkey2
get_redis_dir             ->     get_valkey_dir
test_redis_cli_rdb_dump   ->     test_valkey_cli_rdb_dump
test_redis_cli_repl       ->     test_valkey_cli_repl
redis-cli                 ->     valkey-cli
redis_reset_state         ->     valkey_reset_state
redisHandle               ->     valkeyHandle
redis_safe_read           ->     valkey_safe_read
redis_safe_gets           ->     valkey_safe_gets
redis_write               ->     valkey_write
redis_read_reply          ->     valkey_read_reply
redis_readable            ->     valkey_readable
redis_readnl              ->     valkey_readnl
redis_writenl             ->     valkey_writenl
redis_read_map            ->     valkey_read_map
redis_read_line           ->     valkey_read_line
redis_read_null           ->     valkey_read_null
redis_read_bool           ->     valkey_read_bool
redis_read_double         ->     valkey_read_double
redis_read_verbatim_str   ->     valkey_read_verbatim_str
redis_call_callback       ->     valkey_call_callback

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-24 18:01:33 +02:00
Wen Hui
d09a59c3b1
Rename redis_init_script file and its content (#357)
Rename the init script and a related `.tpl` file and rename variable
names inside (redis to valkey). Update variables in
`utils/install_server.sh`.

Fixes #354

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-24 10:05:11 +02:00
Shivshankar
669f1d3014
redisbenchmark to valkeybenchmark in test directory and some test name rename. (#347)
This PR covers the following changes:
1. redisbenchmark to valkeybenchmark in the test directory.
2. Removed redis from some tests' titles and changed the names
accordingly.
3. Updated the test suite name, and redis-server to valkey in the README,
in the test directory.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-23 10:51:53 -07:00
Dmitry Polyakovsky
f0e1edc273
Updated modules examples and tests to use valkey vs redis (#349)
Scope of the changes:
- updated example modules to reference Valkey vs Redis in variable names
- updated tests to use valkeymodule.h
- updated vars in tests/modules to use valkey vs redis in variable names

Summary of the testing:
- ran make for all modules, loaded them into valkey-server and tested
commands
- ran make for test/modules
- ran make test for the entire codebase

---------

Signed-off-by: Dmitry Polyakovsky <dmitry.polyakovky@oracle.com>
Co-authored-by: Dmitry Polyakovsky <dmitry.polyakovky@oracle.com>
2024-04-23 17:55:44 +02:00
Lipeng Zhu
393c8fde29
Rename macros in config.h (#257)
This patch tries to do the following things:

1. Rename `redis_*` and `REDIS_*` macros defined in config.h to
`valkey_*`, `VALKEY_*` and update associated used files. (`redis_fstat`,
`redis_fsync`, `REDIS_THREAD_STACK_SIZE`, etc.)
2. Remove the leading double underscore for guard macro in config.h.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-04-23 14:20:35 +02:00
Lipeng Zhu
87a5bfc002
Support connection schemes valkey:// and valkeys:// (#199)
1. Support valkey:// and valkeys:// scheme in valkey-cli and
valkey-benchmark. Retain the original Redis schemes for compatibility.
2. Add unit tests for valid URI, all schemes.

Fixes: https://github.com/valkey-io/valkey/issues/198
Fixes: https://github.com/valkey-io/valkey/issues/200

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-04-23 03:02:41 +02:00
Dmitry Polyakovsky
a989ee5c05
Updated modules examples to reference Valkey* (#342)
We already have valkeymodule.h with the new naming convention, so
reference it from the module examples.

Signed-off-by: Dmitry Polyakovsky <dmitry.polyakovky@oracle.com>
Co-authored-by: Dmitry Polyakovsky <dmitry.polyakovky@oracle.com>
2024-04-22 16:01:04 +02:00
Shivshankar
4693aa258e
Rename redis in install_server.sh file (#341)
The README on GitHub shows that the install script will help to install
the valkey server. However, the logs of the script and variables in the
script still pointed to redis, so renamed redis to valkey/server accordingly.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-19 16:38:18 -07:00
Shivshankar
7809df0c93
Remove REDIS tag from test macros. (#333)
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-19 17:34:02 +08:00
Shivshankar
cc94c98a9d
Update redis to valkey in generate-commands-json.py (#238)
These scripts were previously updated, but some places were still left,
so they are updated to valkey here.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-18 21:31:49 +02:00
Shivshankar
34413e0862
Replace "redis" with "valkey" test code (#287)
Occurrences of "redis" in TCL test suites and helpers, such as TCL
client used in tests, are replaced with "valkey".

Occurrences of uppercase "Redis" are not changed in this PR.

No files are renamed in this PR.

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-18 15:57:17 +02:00
Viktor Söderqvist
9e2b7838ea
Add 'extended-redis-compatibility' config (#306)
New config 'extended-redis-compatibility' (yes/no) default no

* When yes:
  * Use "Redis" in the following error replies:
    - `-LOADING Redis is loading the dataset in memory`
    - `-BUSY Redis is busy`...
    - `-MISCONF Redis is configured to`...
* Use `=== REDIS BUG REPORT` in the crash log delimiters (START and
END).
* The HELLO command returns `"server" => "redis"` and `"version" =>
"7.2.4"` (our Redis OSS compatibility version).
  * The INFO field for mode is called `"redis_mode"`.
* When no:
* Use "Valkey" instead of "Redis" in the mentioned errors and crash log
delimiters.
* The HELLO command returns `"server" => "valkey"` and the Valkey
version for `"version"`.
  * The INFO field for mode is called `"server_mode"`.

* Documentation added in valkey.conf:

> Valkey is largely compatible with Redis OSS, apart from a few cases where
> Redis OSS compatibility mode makes Valkey pretend to be Redis. Enable this
> only if you have problems with tools or clients. This is a temporary
> configuration added in Valkey 8.0 and is scheduled to have no effect in
> Valkey 9.0 and be completely removed in Valkey 10.0.

* A test case for the config is added. It is designed to fail if the
config is not deprecated (has no effect) in Valkey 9 and deleted in
Valkey 10.

* Other test cases are adjusted to work regardless of this config.
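An illustrative session (assuming the config is changed at runtime; output abbreviated):

```
127.0.0.1:6379> CONFIG SET extended-redis-compatibility yes
OK
127.0.0.1:6379> INFO server
...
redis_mode:standalone
...
127.0.0.1:6379> CONFIG SET extended-redis-compatibility no
OK
127.0.0.1:6379> INFO server
...
server_mode:standalone
...
```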

Fixes #274
Fixes #61

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-18 14:10:24 +02:00
Vitah Lin
b6dbc8109b
Add Codecov for Automated Code Coverage (#316)
This PR introduces Codecov to automate code coverage tracking for our
project's tests.

For more information about the Codecov platform, please refer to
https://docs.codecov.com/docs/quick-start

---------

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-17 22:39:02 -07:00
Shivshankar
af47cffc83
Update oom_score_adjusted_by_redis to oom_score_adjusted_by_valkey in server.c (#229)
Update oom_score_adjusted_by_redis to oom_score_adjusted_by_valkey in
server.c

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-18 11:53:22 +08:00
Shivshankar
3040c439b8
Remove REDIS tag from REDIS_CONFIG_REWRITE_SIGNATURE. (#331)
This macro is used to add the rewrite string in src/config.c; removing
the REDIS tag will not affect logs or output.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-18 11:38:39 +08:00
Bany
96d14fe263
Change Redis to Valkey in log messages (#226)
Log messages containing "Redis" in some files are changed.

A macro SERVER_TITLE, defined as "Valkey" (uppercase V), is introduced
and used in log messages, so at least it will be easy to patch this
definition to get Redis or any other name in the logs instead of Valkey.

Change "Redis" in some log messages to use %s and SERVER_TITLE.

This is a partial implementation of
https://github.com/valkey-io/valkey/issues/207

---------

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-17 14:38:21 +02:00
Chen Tianjie
5d23f8f58a
Complete fields in client list and client info test. (#326)
Add `lib-name` and `lib-ver` check.

Signed-off-by: Chen Tianjie <TJ_Chen@outlook.com>
2024-04-17 11:33:25 +08:00
Viktor Söderqvist
8dcc8ebba4
Remove 'Redis' in error replies (#206)
Low-risk error replies containing "Redis" are changed.

In most cases, the word "Redis" is simply removed from the error message,
such as in "This Redis instance is not configured to use an ACL file. (...)",
the message is changed to "This instance is not configured to use an ACL
file. (...)".

Additionally, error replies from `redis.call` in a Lua script are
affected, such as

* "Please specify at least one argument for this redis lib call"
* "Wrong number of args calling Redis command from script"
* "Unknown Redis command called from script"
* "Invalid command passed to redis.acl_check_cmd()"

The name Redis is simply removed from these error message. In the last
one above, "redis.acl_check_cmd()" is replaced by
"server.acl_check_cmd()" in the error message.

The following error replies are considered high-risk for causing problems
for clients, so they are not changed in this commit:

* (not in scope) "-MISCONF Redis is configured to save RDB snapshots
(...)"
* (not in scope) "-LOADING Redis is loading the dataset in memory"
* (not in scope) "-BUSY Redis is busy running a script (...)"

Fixes #204

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-16 21:17:38 +02:00
Roshan Khatri
b16e647679
Adds workflows to build release binaries and push to S3 (#315)
[related to](https://github.com/valkey-io/valkey/issues/230)

Adds workflows to build Valkey binaries and push to S3 to make it
available to download from the website

The Workflows can be triggered by pushing a release to the repo and the
other option is manually by one of the Maintainers.

Once the workflow triggers, it will generate a matrix of Jobs for the
platforms we need to build from `utils/releasetools/build-config.json`
and then the respective jobs are triggered. These jobs build Valkey for
the platform binaries we want to release and push them to a
private S3 bucket.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-04-16 07:01:36 -07:00
Ping Xie
2ec8f638b5
Fixed url links in valkey.conf (#320)
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-15 10:49:57 -07:00
Viktor Söderqvist
10d980890c
Update CONTRIBUTING.md and issue templates (#311)
Update CONTRIBUTING.md:
    
* A more friendly approach, hopefully.
* The note about receiving patches is moved to the DCO section.
* Some "Get started" links in a bullet list (inspired by OpenTofu's
contributing file).
* For questions, refer to GitHub Discussions and Discord instead of only
Discord.
* Minor edits and formatting.

Update issue templates:
    
* The issue template for questions is replaced by a link to Discussions
and to Matrix and Discord chats.
* Add a link to the valkey-doc repo.
* The crash report template is extended into a form, with separate
fields for the crash report and the additional info.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-15 16:00:59 +02:00
jonghoonpark
c090874ed4
List test files dynamically (#313)
Motivation: Currently we have to manually update the all_tests variable
when introducing new test files.

Fix: I've modified it to list test files dynamically, but rather than
modify it to add all test files, I've first modified it to only add test
files from the following 4 paths, so that it doesn't deviate too much
from what we already do:

- unit
- unit/type
- unit/cluster
- integration

Related issue: https://github.com/valkey-io/valkey/issues/302

---------

Signed-off-by: jonghoonpark <dev@jonghoonpark.com>
2024-04-15 14:25:33 +02:00
Vitah Lin
1221b3951a
Fix typo in comment (#318)
Signed-off-by: Vitah Lin <vitahlin@gmail.com>
2024-04-15 13:45:00 +02:00
Andy Pan
fc5bf6a0ef
Clamp TCP_KEEPINTVL and simplify TCP_KEEPALIVE_ABORT_THRESHOLD on Solaris (#292)
The time between consecutive probes is set by TCP_KEEPINTVL, in
seconds. The minimum value is ten seconds. The maximum is ten days,
while the default is two hours. The TCP connection will be aborted after
a certain number of unanswered probes, which is set by TCP_KEEPCNT.
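A sketch of the clamping (bounds per the Solaris 11.4 documentation cited below):

```
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Solaris accepts TCP_KEEPINTVL values between 10 seconds and 10 days;
 * clamp the configured interval into that range before applying it. */
static int set_keepalive_interval(int fd, int interval) {
    const int min_s = 10, max_s = 10 * 24 * 60 * 60;
    if (interval < min_s) interval = min_s;
    if (interval > max_s) interval = max_s;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
                      sizeof(interval));
}
```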

## References
[Solaris
11.4](https://docs.oracle.com/cd/E88353_01/html/E37851/tcp-4p.html)

Signed-off-by: Andy Pan <i@andypan.me>
2024-04-14 21:53:07 -07:00
Shivshankar
a4da212f11
Update release tool script to valkey (#239)
Updated the release-tool scripts to valkey. This PR covers only the file
names for the tarball; the location still needs to be updated
accordingly.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-14 13:22:48 -07:00
Sher_Sun
e71be72745
Fix typo and rename Redis to Valkey in the utils/lru/README (#314)
This utils/lru/README incorrectly refers to REDIS_LRU_CLOCK_RESOLUTION
in server.h to modify the LRU clock resolution. However, the actual
constant in server.h has been updated to LRU_CLOCK_RESOLUTION, but the
README was not updated to reflect this change.

1. Replaced REDIS_LRU_CLOCK_RESOLUTION with LRU_CLOCK_RESOLUTION in the
text of utils/lru/README.
2. Updated references from "Redis" to "Valkey" within the same README
file as part of the ongoing rebranding efforts:)

---------

Signed-off-by: Sher Sun <sher.sun@huawei.com>
Co-authored-by: Sher Sun <sher.sun@huawei.com>
2024-04-14 11:40:01 -07:00
Kyle J. Davis
eb7f5c4e0a
Removes empty history arrays in json (#317)
Fixes #241 

Removes the empty `history` arrays in 6 command JSON files. This
normalizes these 6 with the rest of the command JSON which omit the
`history` array entirely when there is no history.

This makes parsing these files slightly less annoying in languages where
empty arrays are falsey.

Signed-off-by: Kyle J. Davis <kyledvs@amazon.com>
2024-04-13 21:38:05 +02:00
Björn Svensson
1c282a9306
Set permissions for Github Actions in CI (#312)
This sets the default permission for current CI workflows to only be
able to read from the repository (scope: "contents").
When a used Github Action require additional permissions (like CodeQL)
we grant that permission on job-level instead.

This means that a compromised action will not be able to modify the repo
or even steal secrets since all other permission-scopes are implicit set
to "none", i.e. not permitted. This is recommended by
[OpenSSF](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions).

This PR includes a small fix for the possibility of missing server logs
artifacts, found while verifying the permission.
The `upload-artifact@v3` action will replace artifacts which already
exist. Since both CI jobs `test-external-standalone` and
`test-external-nodebug` use the same artifact name, when both jobs
fail, we only get logs from the last finished job. This can be avoided
by using unique artifact names.

This PR is part of #211

More about permissions and scope can be found here:

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

---------

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
2024-04-12 17:24:22 +02:00
Tom Morris
7b58f080a8
Fix typo in SECURITY.md (#309)
Fix typo: "disclore" to "disclosure" in SECURITY.md

Signed-off-by: Tom Morris <tom@tommorris.org>
2024-04-12 11:05:39 -04:00
Parth
644692db79
Fixing a lua debugger bug that prevented use of 'server' for server.call invocations. (#303)
* Tested it on local instance. This was originally part of
https://github.com/valkey-io/valkey/pull/288 but I am pushing this
separately, so that we can easily merge it into the upcoming release.

```
lua debugger> server ping
<redis> ping
<reply> "+PONG"
lua debugger> redis ping
<redis> ping
<reply> "+PONG"
```

* I also searched for lua debugger related unit tests to add coverage
for this but did not find any relevant test to modify. Leaving it at
that for now.

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-04-11 15:54:39 -07:00
Madelyn Olson
3d887df265
Add links for security issues (#299)
Add an initial security release page. In the fullness of time I would
like to also include our version support here, but until that has been
decided I would like to keep this simple and just include links.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-11 15:12:28 -07:00
Wen Hui
c36f67a3ec
Fix a minor issue for Redis brand name in Sentinel.conf (#300)
Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-11 13:59:52 -07:00
Shivshankar
4be97ebcbe
Update valkey in serverLog messages in server.c file (#231)
Updated the keyword "Redis" to "Valkey" in log messages in the server.c file

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-11 16:34:32 -04:00
bentotten
6975242529
Update comment in cluster_legacy.h (#277)
Update a comment that suggested clusterMsgPingExtTypes to refer to
clusterMsgPingtypes instead, as clusterMsgPingExtTypes does not exist

Signed-off-by: Ben Totten <btotten@amazon.com>
2024-04-11 13:18:20 -07:00
Daniel House
f0113a4105
Clarify the usage of Valkey in a comment (#233)
Signed-off-by: Daniel House <daniel.house@huawei.com>
2024-04-11 13:06:04 -07:00
Roshan Khatri
f4f1bd6fde
Revert update of RedisModuleEvent_MasterLinkChange (#289)
ValkeyModuleEvent_MasterLinkChange was updated to use more inclusive
language, but it was done in the compatibility layer as well
(RedisModuleEvent_).

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-04-10 17:35:08 -07:00
Daniel House
b669af0eab
Rename 'redis_tls_ctx' and 'redis_tls_client_ctx' global variables (#268)
Signed-off-by: Daniel House <daniel.house@huawei.com>
Signed-off-by: daniel-house <danny@cs.toronto.edu>
Co-authored-by: Daniel House <daniel.house@huawei.com>
2024-04-10 23:00:27 +02:00
Shivshankar
2e46046625
Rename macros in valkey-cli.c and redis_strlcpy to valkey (#284)
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-10 22:50:52 +02:00
Viktor Söderqvist
f8cec23a9b
Delete old deprecated cli program redis-trib (#281)
Actually, this script doesn't do anything except print that it has been
replaced by redis-cli.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-10 18:14:58 +02:00
Shivshankar
a054862b72
Rename redis_client* procedure to valkey_client* in test environment (#276)
Renamed the redis_client* procedures to valkey_client*

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-10 10:18:47 -04:00
Shivshankar
05d16579e6
Rename redis in valkey-cli file comments and prints (#222)
Updated to Valkey in valkey-cli.c file's comments and prints.

* The output of valkey-cli --help
* The output of the cli built-in HELP command
* The prompt in interactive valkey-cli -s unixsocket
* The history file and the default rc file (changed filename)

---------

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-10 11:03:08 +02:00
Vitah Lin
4a8b4f4229
Rename redis.info to valkey.info in LCOV (#259)
Signed-off-by: Vitah Lin <vitahlin@gmail.com>
2024-04-10 08:59:45 +02:00
Madelyn Olson
03650e91b7
Revert the default PID file back to the real default (#275)
The default pid file is created at /var/run/redis.pid based on the code
at
da831c0d22/src/server.h (L132).
Until we update it, we should reflect that in the conf file.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-09 08:21:06 -07:00
Shivshankar
da831c0d22
rename procedure redis_deferring_client to valkey_deferring_client (#270)
Updated the procedure redis_deferring_client in the test environment to
valkey_deferring_client.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-09 10:38:09 -04:00
Madelyn Olson
c0cef48e98
Fix reference to redis-tls module (#273)
Update test usage of the redis-tls.so module to use valkey-tls.so instead.

Fixes tests failures like
https://github.com/valkey-io/valkey/actions/runs/8592855995/job/23543475478.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-09 07:15:59 -07:00
Jacob Murphy
df5db0627f
Remove trademarked language in code comments (#223)
This includes comments used for module API documentation.

* Strategy for replacement: Regex search: `(//|/\*| \*|#).* ("|\()?(r|R)edis( |\.
  |'|\n|,|-|\)|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)`
* Don't edit copyright comments
* Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish
from newly licensed repository
* Replace "Redis Object" -> "Object"
* Exclude markdown for now
* Don't edit Lua scripting comments referring to redis.X API
* Replace "Redis Protocol" -> "RESP"
* Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-"
prefix
* Most other places, I use best judgement to either remove "Redis", or
replace with "the server" or "server"

Fixes #148

---------

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-09 10:24:03 +02:00
Harkrishn Patro
1aa13decf6
Remove single node cluster validation check from benchmark (#266)
Allow redis-benchmark to run against single shard clusters. 

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-08 21:20:10 -07:00
VoletiRam
d89ef06ce5
Wait for cluster fully online in cluster_config_consistent (#272)
Wait for cluster to be in a fully consistent and online state in
`cluster_config_consistent`. We expect the `start_server` to create the
desired primaries and replicas before the start of the tests. With the
current setup, the replicas may not have completed the sync with the
primaries and can still be in the loading state. In some cases, a
replica's role can still be reported as master because of delayed
propagation of the replicate command. The tests can show flaky behavior
in such cases. Add a check that verifies each node's health status is
'online' for cluster consistency. Leverage the deterministic order of
`CLUSTER SLOTS`, along with the nodes' health status, to consider the
cluster consistent.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Ram Prasad Voleti <ramvolet@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Ram Prasad Voleti <ramvolet@amazon.com>
2024-04-08 20:03:56 -07:00
Harkrishn Patro
e59dd41e42
Maintain deterministic ordering of replica(s) in CLUSTER SLOTS response (#265)
Sort `clusterNode.slaves` while adding a new replica to the cluster on the
basis of `name`. This will enable deterministic ordering of replica(s)
information in `CLUSTER SLOTS` response.

Before this change:

```
127.0.0.1:6380> CLUSTER SLOTS
1) 1) (integer) 0
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6379
      3) "fc72609a620c62d073a31eed9ddde5104c1fa302"
      4) (empty array)
   4) 1) "127.0.0.1"
      2) (integer) 6381
      3) "fac84bbf2ffc5cfcdebc92c06b8ead9c3cba4051"
      4) (empty array)
   5) 1) "127.0.0.1"
      2) (integer) 6380
      3) "b1249d394326f1485df0b895f2fd38e141aa5056"
      4) (empty array)
```

After this change:

```
127.0.0.1:6380> CLUSTER SLOTS
1) 1) (integer) 0
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6379
      3) "fc72609a620c62d073a31eed9ddde5104c1fa302"
      4) (empty array)
   4) 1) "127.0.0.1"
      2) (integer) 6380
      3) "b1249d394326f1485df0b895f2fd38e141aa5056"
      4) (empty array)
   5) 1) "127.0.0.1"
      2) (integer) 6381
      3) "fac84bbf2ffc5cfcdebc92c06b8ead9c3cba4051"
      4) (empty array)
```
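
A self-contained sketch of the sorted insertion described above, with a
simplified node struct (the real clusterNode layout and helpers differ):

```c
#include <string.h>

#define NODE_NAMELEN 40
#define MAX_REPLICAS 16

typedef struct nodeSketch {
    char name[NODE_NAMELEN];
    struct nodeSketch *slaves[MAX_REPLICAS];
    int numslaves;
} nodeSketch;

/* Keep primary->slaves sorted by node name so that CLUSTER SLOTS
 * reports replicas in a deterministic order. */
static void addReplicaSorted(nodeSketch *primary, nodeSketch *replica) {
    int i = primary->numslaves++;
    while (i > 0 && memcmp(primary->slaves[i - 1]->name,
                           replica->name, NODE_NAMELEN) > 0) {
        primary->slaves[i] = primary->slaves[i - 1]; /* shift larger names up */
        i--;
    }
    primary->slaves[i] = replica;
}
```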

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-08 19:58:23 -07:00
Harkrishn Patro
c2ebe9ebbd
Add sharded-pubsub tcl test to test_helper all test set (#267)
Add sharded-pubsub to the supported TCL tests.

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-08 17:18:45 -07:00
Shivshankar
5bccd7b800
Rename systemd files and content to valkey from redis (#234)
Changed systemd file names and content of them to valkey.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-08 14:37:56 -04:00
Madelyn Olson
750e94cad3
Update crash wording to include our repo (#263)
Update the wording in the crash log to point to Valkey repo instead of Redis repo.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-08 10:09:27 -07:00
Harkrishn Patro
ebfb440629
Pass extensions to node if extension processing is handled by it (#52)
Ref: https://github.com/redis/redis/pull/12760

### Description

#### Fixes compatibility of PlaceholderKV cluster (7.2 - extensions
enabled by default) with older Redis cluster (< 7.0 - extensions not
handled).

With some of the extensions enabled by default in the 7.2 version, new
nodes running 7.2 and above start sending out a larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with nodes running engine versions < 7.0. Old nodes (< 7.0) receiving
the payload from new nodes (>= 7.2) would observe a payload length
(totlen) > (estlen), perform an early exit, and not process the message.

This fix introduces a flag `extensions_supported` on the clusterMsg
indicating that the sender node can handle extension parsing. Once a
receiver node receives a message with this flag set to 1, it updates the
new clusterNode field extensions_supported and starts sending out
extensions if it has any.
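
A rough sketch of the gating this describes; the field and function
names are approximations, not the exact code:

```c
#include <stddef.h>

/* Approximation of the relevant clusterNode state. */
typedef struct nodeSketch {
    int extensions_supported; /* set once a PING/PONG arrives with the flag */
} nodeSketch;

/* Only account for ping extensions for peers that advertised support,
 * so nodes older than 7.0 never receive the larger payload that would
 * make totlen exceed their estimated length. */
static size_t finalizePingLength(nodeSketch *receiver, size_t base_len,
                                 size_t ext_len) {
    size_t totlen = base_len;
    if (receiver->extensions_supported) totlen += ext_len;
    return totlen;
}
```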


This PR also introduces a DEBUG subcommand,
`process-clustermsg-extensions`, to enable/disable the cluster message
extensions feature.

Note: A successful `PING`/`PONG` exchange is required before a given
node is marked as `extensions_supported` and extension messages are sent
to it. This could cause a slight delay in receiving the extension
message(s).

### Testing

TCL test verifying the cluster state is healthy irrespective of
enabling/disabling cluster message extensions feature.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-08 09:01:30 -07:00
Bany
d92dc78cb8
Update ValkeyModuleEvent_MasterLinkChange to ValkeyModuleEvent_PrimaryLinkChange (#262)
Update ValkeyModuleEvent_MasterLinkChange to ValkeyModuleEvent_PrimaryLinkChange

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-08 08:56:39 -07:00
Madelyn Olson
6411629c61
Madelyn's attempt as a logo (#251)
Apply new logo at startup.

It is one character wider and 2 characters taller than the original
Redis logo.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-07 17:30:38 -07:00
Viktor Söderqvist
d26d596b3e
Log branding (#252)
Small changes to the log messages printed during startup and shutdown,
for Valkey branding.

SERVER_NAME is replaced by verbatim "Valkey" in one place, because
SERVER_NAME expands to "valkey" in lowercase. (Should we introduce
another macro that expands to "Valkey"?)

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-07 17:28:02 -07:00
Arpit Pandey
1b47daff09
Fix: Typo clock_getting -> clock_gettime (#244)
Typo in a comment in the monotonic.c file:

Changed: faster than calling 'clock_getting' (POSIX)
To: faster than calling 'clock_gettime' (POSIX)

Signed-off-by: Arpit Pandey <arpit.pandey05@gmail.com>
2024-04-07 14:49:14 -07:00
Madelyn Olson
9f03dfc1f6
Fix two typos that were flagged in the 7.2 build (#248)
These were flagged on the 7.2 build system, which is using the old spell
check. I think we should consider re-adding it as it missed some typos.

Relevant: https://github.com/valkey-io/valkey/pull/72

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-07 00:07:51 -07:00
0del
717ec1e144
Rename ValkeyModule_DefragModuleString to ValkeyModule_DefragValkeyModuleString (#243)
fixes: #242

---------

Signed-off-by: 0del <bany.y0599@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-06 22:50:56 +02:00
Vitah Lin
ba0c93cbdf
Add redis symlinks at the same place as the installed binaries (#193)
Adds a new make variable called `USE_REDIS_SYMLINKS`, with default value
`yes`. If yes, then `make install` creates additional symlinks to the
installed binaries:

* `valkey-server`
* `valkey-cli`
* `valkey-benchmark`
* `valkey-check-rdb`
* `valkey-check-aof`
* `valkey-sentinel`

The names of the symlinks are the legacy redis binary names
(`redis-server`, etc.). The purpose is to provide backward compatibility
for scripts expecting these filenames. The symlinks are installed in
the same directory as the binaries (typically `/usr/local/bin/` or
similar).

Similarly, `make uninstall` removes these symlinks if
`USE_REDIS_SYMLINKS` is `yes`.

This is described in a note in README.md.

Fixes #147

---------

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2024-04-06 18:41:53 +02:00
Parth
620d325fdc
Adding server.call/pcall option to LUA scripting. (#136) (#213)
This commit does not remove redis.call/pcall just yet. It also does not
rename Redis in error messages such as "Please specify at least one
argument for this redis lib call". This allows users to maintain full
backwards compatibility while introducing an option to use server.call
for new code.

I verified that the unit tests pass. Also manually verified that the
redis-server responds to server.call invocations within lua scripting.
Also verified that function registration works as expected.

```
[ok]: EVAL - is Lua able to call Redis API? (0 ms)
[ok]: EVAL - is Lua able to call Server API? (1 ms)
[ok]: EVAL - No arguments to redis.call/pcall is considered an error (0 ms)
[ok]: EVAL - No arguments to server.call/pcall is considered an error (1 ms)
```

---------

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-05 21:17:11 -07:00
Madelyn Olson
bc28fb4ac0
Update Server version to valkey version (#232)
This commit updates the following fields:
1. server_version -> valkey_version in server info. Since we would like
to advertise specific compatibility, we are making the version specific
to valkey. servername will remain as an optional indicator, and other
valkey compatible stores might choose to advertise something else.
2. We dropped redis-ver from the API. This isn't related to API
compatibility, but we didn't want to "fake" that valkey was creating an
rdb from a Redis version.
3. Renamed server-ver -> valkey_version in rdb info. Same as point one,
we want to explicitly indicate this was created by a valkey server.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-05 21:15:57 -07:00
Ping Xie
aaec321213
Remove REDISMODULE_ prefixes and introduce compatibility header (#194)
Fix #146 

Removed REDISMODULE_ prefixes from the core source code to align with
the new SERVERMODULE_ naming convention. Added a new 'redismodule.h'
header file to ensure full backward compatibility with existing modules.
This compatibility layer maps all legacy REDISMODULE_ prefixed
identifiers to their new SERVERMODULE_ equivalents, allowing existing
Redis modules to function without modification.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
2024-04-05 16:59:55 -07:00
Shivshankar
906c8e8f90
Delete cluster fail time script (#237) 2024-04-05 14:50:42 -07:00
Wen Hui
7f5bcc96f0
Update some valkey-cli related in tcl (#236)
Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-05 16:46:33 -04:00
Siddhartha Sankar Mondal
47444c67de
Update Makefile comments to reflect name change (#106)
Update the comments in the Makefile to reflect the new names.

Signed-off-by: Siddhartha Mondal <siddharthmondal@gmail.com>
2024-04-05 13:54:28 -04:00
Wen Hui
490f4ebb65
Update runtest test name and test filename (#214)
Update runtest test name and test filename

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-04 18:41:45 -07:00
Wen Hui
bb1a3fffe7
Fix CI break issue due to serverTests merged issue (#218)
Here is the latest CI run result for this PR

https://github.com/hwware/placeholderkv/actions/runs/8561152261

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-04 17:37:15 -04:00
Viktor Söderqvist
4646d0825e
Redis in HELP commands (#216)
Removes the word Redis in the output of COMMAND HELP and DEBUG HELP.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-04 23:18:37 +02:00
Wen Hui
29621bc356
Update Valkey keyword in sentinel.conf (#171)
Mostly comments, but one pre-filled config in this template config file
is changed:

    pidfile /var/run/valkey-sentinel.pid

---------

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-04 20:54:49 +02:00
Wen Hui
a0b09763d0
Update remaining redis to valkey in TLS.md (#201)
Updated TLS.md to remove references to Redis and replace them with Valkey.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-04 09:40:04 -07:00
0del
6ea6a693e2
Rename remaining 'redis' functions to 'server' (#203)
related: https://github.com/valkey-io/valkey/issues/144

---------

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-04 18:21:11 +02:00
Viktor Söderqvist
48184ae2db
In CONTRIBUTING.md, mention how to link PR to issue (#197)
This little suggestion can help contributors to link their PRs to
issues. This, in turn, helps the maintainers.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-04 10:42:14 -04:00
Lipeng Zhu
9c5e2bb226
Changes references to redis binaries in output of "--help", "--version" (#113)
Rename output from redis-* to valkey-* for binaries:

1. `valkey-benchmark`
2. `valkey-cli`
3. `valkey-server`
4. `valkey-sentinel`
5. `valkey-check-rdb`
6. `valkey-check-aof`

"--help" "--version" option.

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
2024-04-04 10:46:17 +02:00
0del
e3e1f9a372
Rename 'redis' to 'server' and redisNodeFlags to clusterNodeFlags (#191)
Rename additional instances of redis to server, as well as redisNodeFlags to clusterNodeFlags.

---------

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 18:45:23 -07:00
Shivshankar
9a02b775c1
Replace Redis with Valkey in runtest scripts' error prints (#190)
Replaced Redis with Valkey in runtest script's error prints.

Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
2024-04-03 16:17:38 -07:00
Madelyn Olson
39d0f457a2
Update versioning fields for compatibility (#47)
New info fields to be used to determine the valkey versioning info.

Internally, introduce new define values for "SERVER_VERSION" which is
different from the Redis compatibility version, "REDIS_VERSION".

Add two new info fields:
`server_version`: The Valkey server version
`server_name`: Indicates that the server is valkey.

Add one new RDB field: `server_ver`, which indicates the valkey version
that produced the RDB.

Add 3 new LUA globals: `SERVER_VERSION_NUM`, `SERVER_VERSION`, and
`SERVER_NAME`, which reflect the valkey version instead of the Redis
compatibility version.

Also clean up various places where Redis naming and configuration were
being used unnecessarily.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-03 14:52:36 -07:00
Daniel House
55de74e0dc
The usage (--help) message now refers to valkey (#189)
Fixing redis -> valkey in the output of valkey-server --help.

Signed-off-by: Daniel House <daniel.house@huawei.com>
Co-authored-by: Daniel House <daniel.house@huawei.com>
2024-04-03 23:23:34 +02:00
Lipeng Zhu
e1cb4c8a8b
Rename #include guards (#167)
Rename include guard macros (redis -> valkey) and remove the leading double underscore.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-04-03 23:20:06 +02:00
Shivshankar
e4d61c4825
Rename cli benchmark check-aof and check-rdb src and object files to valkey respectively (#188)
As part of earlier PRs, binary names and some file names were renamed to
valkey, but the cli, benchmark, and other source files still carried the
redis name. So the file names and the Makefile were changed accordingly.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
2024-04-03 23:06:45 +02:00
Shivshankar
f3ccfbb01f
Rename TLS test cert files to valkey (#186)
This PR covers changing the redis.crt and redis.key to valkey certs for
TLS testing.

The files are generated by the gen-test-certs.sh script under tests/tls/.

Also addresses the comments provided.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
2024-04-03 23:04:52 +02:00
0del
125a2987af
rename git sha related (#184)
redisGitSHA1 -> serverGitSHA1  
redisGitDirty -> serverGitDirty 
redisBuildId -> serverBuildId 
redisBuildIdRaw -> serverBuildIdRaw 
redisBuildIdString -> serverBuildIdString

#144
#170

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:46:23 +02:00
0del
598b951fb5
rename redisServer to valkeyServer (#183)
https://github.com/valkey-io/valkey/issues/144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:34:18 +02:00
0del
3a0ba0ad93
rename redisCommandArgType serverCommandArgType (#182)
redisCommandArgType -> serverCommandArgType
redisCommandArg -> serverCommandArg

https://github.com/valkey-io/valkey/issues/144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:33:38 +02:00
0del
edee864b34
rename redisOp to serverOp (#181)
https://github.com/valkey-io/valkey/issues/144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:30:30 +02:00
0del
8b3ab8f74f
Rename redisAtomic to serverAtomic (#180)
https://github.com/valkey-io/valkey/issues/144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:29:33 +02:00
0del
f753db5141
rename redis functions in server.h (#179)
redisPopcount -> serverPopcount
redisSetProcTitle -> serverSetProcTitle
redisCommunicateSystemd -> serverCommunicateSystemd
redisSetCpuAffinity -> serverSetCpuAffinity
redisFork -> serverFork

#144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:26:33 +02:00
0del
add5f5615c
Rename some redis structs to server (#178)
- redisFunctionSym -> serverFunctionSym
- redisSortObject -> serverSortObject
- redisSortOperation -> serverSortOperation

#144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 20:24:01 +02:00
Harkrishn Patro
1736018aa9
Remove trademarked wording on configuration file and individual configs (#29)
Remove trademarked wording on configuration layer.

Following changes for release notes:

1. Rename redis.conf to valkey.conf
2. Pre-filled config in the template config file: Changing pidfile to `/var/run/valkey_6379.pid`

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
2024-04-03 19:47:26 +02:00
0del
1629e28f86
Rename redisError to serverError (#177)
Part of #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 19:12:34 +02:00
0del
c413834da1
Rename redisTLSContextConfig to serverTLSContextConfig (#176)
Part of #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 19:06:01 +02:00
0del
25122b140e
Rename redisObject to serverObject (#175)
Part of #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 19:04:51 +02:00
0del
b19ebaf551
Rename redisCommand to serverCommand (#174)
Part of #144

---------

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 18:54:33 +02:00
0del
a236fc8ef0
Rename redisCommandProc, redisGetKeysProc to server prefix (#173)
Part of #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 18:33:33 +02:00
0del
99bdcc0ed0
Rename redisCommandGroup to serverCommandGroup (#172)
Part of issue #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 18:29:59 +02:00
0del
def09488aa
Rename redis_member2struct to server_member2struct (#166)
part of #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 17:07:28 +08:00
Jun Luo
69d28be0f1
Rename redis to valkey in create-cluster script (#165)
Otherwise create-cluster will currently not work (because there are no redis-* binaries).

Signed-off-by: Jun Luo <luojunmoo@gmail.com>
2024-04-03 16:39:16 +08:00
0del
730174445b
Rename redisOpArray to serverOpArray (#157)
A task of #144

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 12:26:20 +08:00
Vitah Lin
cbbaf69d1d
Remove unused REDIS_TEST_MAIN dead code in crc64.c (#160)
We use `#ifdef SERVER_TEST` to run the relevant tests, so we can now
remove the dead code `#ifdef REDIS_TEST_MAIN`.

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
2024-04-03 12:24:55 +08:00
0del
717dfe8022
Rename redisDb to serverDb (#156)
A task of #144.

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 11:02:43 +08:00
0del
98892bb5c3
Rename redisMemOverhead to serverMemOverhead (#159)
Part of #144.

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-03 10:29:42 +08:00
Gabi Ganam
7a7288b292
Log unused module configuration entries that cause redis to abort (#132)
Log unused module configuration entries that cause redis to abort
on startup.

Example:
```
17797:M 31 Mar 2024 12:26:12.146 # Unused Module Configuration: module1.whatever
17797:M 31 Mar 2024 12:26:12.146 # Unused Module Configuration: module2.test
17797:M 31 Mar 2024 12:26:12.146 # Module Configuration detected without loadmodule directive or no ApplyConfig call: aborting
```

Signed-off-by: Gabi Ganam <ggabi@amazon.com>
2024-04-02 16:03:31 -07:00
Wen Hui
c0a83c0058
Fix CI centos issue (#150)
Because CentOS does not support actions/checkout@v4, we need to roll
back to actions/checkout@v3.

Please check the run result:
https://github.com/hwware/placeholderkv/actions/runs/8526052560/job/23354458574

It looks like our CI is happy now.

Signed-off-by: hwware <wen.hui.ware@gmail.com>
2024-04-02 14:48:12 -04:00
0del
c9fff60178
Pin 'typos' spellcheck to fixed version in CI (#151)
Pin taiki-e/install-action to v2.32.2 (currently the latest version).

Fixes #140.

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-02 19:48:26 +02:00
pragnesh
42e00635af
Fix inconsistencies in the json command file descriptions (#63)
Grammar. Use full sentences.

Signed-off-by: Pragnesh <pg.radadia@gmail.com>
2024-04-02 19:15:23 +02:00
0del
621edbafba
Rename redisassert to serverassert in comment (#142)
Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-02 23:44:00 +08:00
0del
a1516d53de
Rename files redisassert to serverassert (#138)
Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-02 15:56:17 +02:00
Vitah Lin
98e7b41b85
Fix rename redis to valkey to pass reply-schemas-validator job (#133)
Signed-off-by: Vitah Lin <vitahlin@gmail.com>
2024-04-02 09:42:50 -04:00
LiiNen
ba8bda9cff
Fix shutdown syntax hint, following intention (#116)
Change syntax hint from

    SHUTDOWN [NOSAVE | SAVE] [NOW] [FORCE] [ABORT]

to

    SHUTDOWN [[NOSAVE | SAVE] [NOW] [FORCE] | ABORT]

It's not that important for docs, but the latter is preferred for valkey-cli's automatic syntax hints.

Signed-off-by: LiiNen <kjeonghoon065@gmail.com>
2024-04-02 15:36:53 +02:00
John Vandenberg
83507d74b8
Update for latest 'typos' (#139)
Signed-off-by: john vandenberg <jayvdb@gmail.com>
Co-authored-by: john vandenberg <jayvdb@192-168-1-101.tpgi.com.au>
2024-04-02 14:51:14 +02:00
Vitah Lin
e35d86f2a2
Fix rename REDIS_TEST to SERVER_TEST to pass the Daily workflow (#131)
The test flag `REDIS_TEST` has already been changed to `SERVER_TEST` in
`.github/workflows/daily.yml`; the name in the src directory needs to be
changed as well.

```shell
 run: |
        sudo apt-get update && sudo apt-get install libc6-dev-i386
        make 32bit SERVER_CFLAGS='-Werror -DSERVER_TEST'
```

Signed-off-by: Vitah Lin <vitahlin@gmail.com>
2024-04-02 15:43:37 +08:00
0del
4d7fff9aba
Remove unused var desc in luaRegisterFunctionReadPositionalArgs (#130)
desc is set to NULL, never assigned anything, and then checked on the
error path to decide whether it should be freed. This can be cleaned up,
since it's really unused.

Fixes #129

Signed-off-by: 0del <bany.y0599@gmail.com>
2024-04-02 15:37:19 +08:00
Madelyn Olson
0ba2f1b14e
Update coverity to reflect project name (#127)
Fix the coverity name to reflect the actual name in coverity. See
successful build here:
https://github.com/valkey-io/valkey/actions/runs/8516329554. Also
removed unnecessary TCL dependency from the install.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-01 21:31:17 -07:00
ICHINOSE Shogo
0a51ceca88
bump actions/checkout v4 (#87)
Node.js 16 actions are deprecated. To resolve this, we are updating to actions/checkout@v4.
For more information see:
https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

e.g. failure https://github.com/valkey-io/valkey/actions/runs/8482578610

---------

Signed-off-by: ICHINOSE Shogo <shogo82148@gmail.com>
2024-04-01 18:44:21 -07:00
Roshan Khatri
3630dd08a6
Restore all tests state prior to fork (#117)
Related to
https://github.com/valkey-io/valkey/pull/11#issuecomment-2028930612
Restores all tests to their state prior to the fork and re-enables Daily
tests on PRs on release branches.
Reverts
2aa820f945

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-04-01 12:55:01 -04:00
Ping Xie
bdd7314f41
Fine-tune ASCII art (#103) 2024-03-31 19:56:24 -07:00
John Vandenberg
253fe9dced
Fix typos and replace 'codespell' with 'typos' (#72)
Uses https://github.com/taiki-e/install-action to install
https://github.com/crate-ci/typos in CI

This finds many more/different typos than
https://github.com/codespell-project/codespell , while having very few
false positives.

Signed-off-by: John Vandenberg <jayvdb@gmail.com>
2024-03-31 12:38:22 -07:00
Cong Chen
af1b0de92d
Fix typo (#84)
A simple PR to fix a typo.

Signed-off-by: Cong Chen <iamchencong@gmail.com>
2024-03-30 22:11:13 -07:00
ranshid
b2a397366b
Change ascii logo with temporal valkey logo (#77) 2024-03-30 11:50:16 -07:00
Vitah Lin
de311aea53
Doc add SECURITY.md link inside CONTRIBUTING.md (#96)
Signed-off-by: Vitah Lin <linw1225@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-03-30 16:24:21 +01:00
Roshan Khatri
3ab066f710
For additional compatibility this adds REDIS_CFLAGS and REDIS_LDFLAGS support to MAKEFILE (#66)
This resolves (1.viii) from
https://github.com/valkey-io/valkey/issues/43
> REDIS_FLAGS will be updated to SERVER_FLAGS. Maybe we should also
allow REDIS_FLAGS -> SERVER_FLAGS as well, for an extra layer of
compatibility.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-03-30 15:42:00 +01:00
Mikel Olasagasti Uranga
289eb47eb9
Replace offensive term (#86)
Signed-off-by: Mikel Olasagasti Uranga <mikel@olasagasti.info>
2024-03-30 14:23:50 +01:00
Ping Xie
1950acd1e2
Fix remaining names in CONTRIBUTING.md (#70)
Signed-off-by: Ping Xie <pingxie@google.com>
2024-03-29 18:38:13 +01:00
Vitah Lin
9be3a1d5ab
Fix remaining names in SECURITY.md (#74)
Signed-off-by: Vitah Lin <linw1225@gmail.com>
2024-03-29 08:25:51 -07:00
John Vandenberg
1fa8da3359
Rename functionsLibMataData to functionsLibMetaData (#71)
Signed-off-by: John Vandenberg <jayvdb@gmail.com>
2024-03-28 19:34:07 -07:00
Madelyn Olson
57789d4d08
Update naming to Valkey (#62)
Documentation references should use `Valkey` while server and cli
references are all under `valkey`.

---------

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-03-28 09:58:28 -07:00
Harkrishn Patro
c120a45874
Sharded pubsub command execution within multi/exec (#13)
Allow SPUBLISH command within multi/exec on replica.
2024-03-27 21:29:44 -07:00
Madelyn Olson
699f62e8e3
Updated contributing file with DCO (#48)
Update the contributor file to include a DCO, which we are going to
require for all incoming contributions as an indication that the
contributor has the rights to contribute the code.

I also fixed the formatting of contribution HOWTO, since it was getting
mangled.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
2024-03-27 15:24:42 -07:00
Roshan Khatri
c7df0f06e7
Makefile respect user's SERVER_CFLAGS and OPT (#55)
Makefile respect user's SERVER_CFLAGS and OPT.

This was unintentionally modified by
38632278fd (diff-3e2513390df543315686d7c85bd901ca9256268970032298815d2f893a9f0685).

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
2024-03-27 23:19:04 +01:00
Parth
11ddd8b289
Changing Redis references in files documenting development guidelines (#26) (#45)
Changing some references from
Redis to placeholderkv.

Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
2024-03-26 19:55:37 -07:00
Viktor Söderqvist
3782446a40
Untrademark json files (#35)
Replaces #26

Name agnostic. Now without spelling errors, ready to merge if you ask me.
2024-03-26 11:53:34 -07:00
Viktor Söderqvist
975d3b6947
Update issue templates (#37)
Update issue template to remove reference to Redis

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-03-26 09:58:44 -07:00
Harkrishn Patro
8153b3c8fd
Cleanup tcl tmp directory leaking resources (#34)
A few of the server logs are stored as `server1/2/3.log`. Various types
of acl files are created and weren't getting cleaned up prior to this
change.
2024-03-26 16:26:22 +01:00
Wen Hui
1c9710fca1
Fix the CI sentinel break issue (#30)
I ran the CI on my branch; the sentinel test cases pass:


https://github.com/hwware/placeholderkv/actions/runs/8429003027/job/23082549380
2024-03-26 10:20:08 -04:00
Roshan Khatri
340ab6d62d
Fixes external server tests and change other references (#14) 2024-03-25 18:49:52 +01:00
PlavorSeol
601177b416
Update Discord server invite (#24)
Update Discord server invite link to a new, working link
2024-03-25 10:34:51 +01:00
Roshan Khatri
2aa820f945
Avoid daily test to run on PRs (#11) 2024-03-22 12:36:08 -07:00
Madelyn Olson
3a5e2b604e
Merge pull request #10 from roshkhatri/unstable
Adds discord link for discussions.
2024-03-22 11:04:18 -07:00
Madelyn Olson
1ea46d3ddb
Merge pull request #7 from hpatro/patch-1
Update SECURITY.md
2024-03-22 10:48:37 -07:00
Roshan Khatri
e0edcbaf7c
Adds discord link for discussions. 2024-03-22 10:39:59 -07:00
Harkrishn Patro
9f0b3bbe78
Update SECURITY.md 2024-03-22 08:46:10 -07:00
Madelyn Olson
9c6c6e52d7
Update CODE_OF_CONDUCT.md (#5) 2024-03-22 08:12:45 +01:00
Harkrishn Patro
8a976ab609
Update SECURITY.md 2024-03-21 23:44:08 -07:00
Madelyn Olson
dc3e76b3fb
Merge pull request #6 from madolson/fix_daily_yml 2024-03-21 23:19:57 -07:00
Madelyn Olson
9da3166e5c Fix daily.yml change 2024-03-21 22:41:23 -07:00
Madelyn Olson
af205e7d86
Update README.md 2024-03-21 21:41:01 -07:00
Madelyn Olson
75929f46c5
Update README.md 2024-03-21 21:40:43 -07:00
Ping Xie
0e9a25fb59
Merge pull request #2 from madolson/test-pr 2024-03-21 20:05:42 -07:00
Madelyn Olson
1dcd7a32fa Updated security.md 2024-03-21 20:03:59 -07:00
Madelyn Olson
084ab10e17 Disabled some workflows for now 2024-03-21 19:50:02 -07:00
Madelyn Olson
2e9855fbd0 Disable some workflows since I don't want to burn through the free tier 2024-03-21 19:39:21 -07:00
Madelyn Olson
bd7387354a Fixed fork/exec problem 2024-03-21 19:36:11 -07:00
Madelyn Olson
9a6721bd11 Fix check aof 2024-03-21 19:19:40 -07:00
Madelyn Olson
3586485355 Moved to correct cli and benchmark 2024-03-21 19:16:03 -07:00
Madelyn Olson
ee107481e5 Refactor some tests to reference new executable 2024-03-21 19:11:08 -07:00
Madelyn Olson
0aaae734f4 Fix incorrect find/replace 2024-03-21 19:04:09 -07:00
Madelyn Olson
38632278fd A single commit to get stuff building 2024-03-21 19:00:46 -07:00
Yanqi Lv
e64d91c371
Fix dict use-after-free problem in kvs->rehashing (#13154)
In ASAN CI, we found the server may crash because of a NULL ptr in `kvstoreIncrementallyRehash`.
The reason is that we use two-phase unlink in `dbGenericDelete`. After `kvstoreDictTwoPhaseUnlinkFind`,
the dict may be in rehashing and only have one element in ht[0] of `db->keys`.

When we delete the last element in `db->keys` while `db->keys` is in rehashing, we may free the
dict in `kvstoreDictTwoPhaseUnlinkFree` without deleting the node in `kvs->rehashing`. Then we may
use this freed ptr in `kvstoreIncrementallyRehash` in `serverCron` and cause the crash.
This is indeed a use-after-free problem.

The fix is to call rehashingCompleted in dictRelease and dictEmpty, so that every call for
rehashingStarted is always matched with a rehashingCompleted.

Added a test in the unit tests to catch it consistently.
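
A sketch of the shape of the fix with stubbed types; the real
dict/kvstore callback plumbing differs:

```c
typedef struct dictSketch {
    long rehashidx; /* -1 when not rehashing */
} dictSketch;

static void rehashingCompleted(dictSketch *d) {
    /* In the real code, a callback unlinks d from kvs->rehashing. */
    (void)d;
}

/* The fix: every rehashingStarted must be paired with a
 * rehashingCompleted, so kvs->rehashing never keeps a pointer to a
 * dict that dictRelease()/dictEmpty() has already freed. */
static void dictReleaseSketch(dictSketch *d) {
    if (d->rehashidx != -1) rehashingCompleted(d);
    /* ... free the hash tables and the dict struct itself ... */
}
```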

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
2024-03-20 22:44:28 +02:00
Yanqi Lv
bad33f8738
fix wrong data type conversion in zrangeResultBeginStore (#13148)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
2024-03-19 08:52:55 +02:00
Binbin
e04d41d78d
Prevent lua error_reply abuse from causing errorstats to become larger (#13141)
Users who abuse lua error_reply will generate a new error object on each
error call, which can make server.errors get bigger and bigger. This
will cause the server to block when calling INFO (we also return
errorstats by default).

To prevent the damage it can cause, when a misuse is detected, we will
print a warning log and disable the errorstats to avoid adding more new
errors. It can be re-enabled via CONFIG RESETSTAT.

Because server.errors may be very large (it may be better now since we
have the limit), config resetstat may block for a while. So in
resetErrorTableStats, we will try to lazyfree server.errors.

See the related discussion at the end of #8217.
2024-03-19 08:18:22 +02:00
Chen Tianjie
aeada20140
Avoid unnecessary dict shrink in zremrangeGenericCommand (#13143)
If the skiplist is emptied, there is no need to shrink the dict in
skiplist, it can be deleted directly.
2024-03-19 10:14:19 +08:00
Binbin
7b070423b8
Fix dictionary use-after-free in active expire and make kvstore iter to respect EMPTY flag (#13135)
After #13072, there is a use-after-free error. In expireScanCallback, we
will delete the dict, and then in dictScan we will continue to use the
dict, e.g. doing `dictResumeRehashing(d)` at the end; this caused an error.

In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, don't
delete the dict yet, and then when scan returns try to delete it again.

At the same time, we noticed that there will be similar problems in the
iterator. We may also delete elements during the iteration process,
causing the dict to be deleted, so the iterator-related part of the PR
has also been modified. dictResetIterator was also missing from the
previous kvstoreIteratorNextDict; we currently have no scenario where
elements are deleted during the kvstoreIterator process, but we deal
with it together to avoid future problems. Added some simple tests to
verify the changes.

In addition, the modification in #13072 omitted initTempDb and
emptyDbAsync, and they were also added. This PR also removes the slow
flag from the expire test (consumes 1.3s) so that problems can be found
in CI in the future.
2024-03-18 17:41:54 +02:00
Alexander Mahone
98a6e55d4e
Add missing REDIS_STATIC in quicklist (#13147)
The compiler complained when I tried to compile only quicklist.c, since
the static keyword is needed when a static function declaration is
placed before its implementation.

```
#ifndef REDIS_STATIC
#define REDIS_STATIC static
#endif
```

[How to solve static declaration follows non-static declaration in GCC C
code?](https://stackoverflow.com/questions/3148244/how-to-solve-static-declaration-follows-non-static-declaration-in-gcc-c-code)
2024-03-18 08:22:19 +02:00
Madelyn Olson
3f725b8619
Change mmap rand bits as a temporary mitigation to resolve asan bug (#13150)
There is a new change in linux kernel 6.6.6 that uses randomization of
address space to harden security, but it interferes with the way ASAN
works. Folks are working on a fix, but this is a temporary mitigation
for us to get our CI to be green again.

See https://github.com/google/sanitizers/issues/1716 for more
information

See
https://github.com/redis/redis/actions/runs/8305126288/job/22731614306?pr=13149
for a recent failure
2024-03-17 09:06:51 +02:00
Viktor Söderqvist
1d77a8e2c5
Makefile respect user's REDIS_CFLAGS and OPT (#13073)
This change to the Makefile makes it possible to opt out of
`-fno-omit-frame-pointer` added in #12973 and `-flto` (#11350). Those
features were implemented by conditionally modifying the `REDIS_CFLAGS`
and `REDIS_LDFLAGS` variables. Historically, those variables provided a
way for users to pass options to the compiler and linker unchanged.

Instead of conditionally appending optimization flags to REDIS_CFLAGS
and REDIS_LDFLAGS, I want to append them to the OPTIMIZATION variable.

Later in the Makefile, we have `OPT=$(OPTIMIZATION)` (meaning
OPTIMIZATION is only a default for OPT, but OPT can be overridden by the
user), and later the flags are combined like this:

    FINAL_CFLAGS=$(STD) $(WARN) $(OPT) $(DEBUG) $(CFLAGS) $(REDIS_CFLAGS)
    FINAL_LDFLAGS=$(LDFLAGS) $(OPT) $(REDIS_LDFLAGS) $(DEBUG)

This makes it possible for the user to override all optimization
flags with e.g. `make OPT=-O1` or just `make OPT=`.

For some reason `-O3` was also already added to REDIS_LDFLAGS by default
in #12339, so I added OPT to FINAL_LDFLAGS to avoid more complex logic
(such as introducing a separate LD_OPT variable).
2024-03-13 17:02:00 +02:00
Binbin
3b3d16f748
Add KVSTORE_FREE_EMPTY_DICTS to cluster mode keys / expires kvstore (#13072)
Currently (following #11695 and #12822), the keys kvstore and expires
kvstore are both flagged with ON_DEMAND. This means that a cluster node
will only allocate a dict when the slot is assigned to it and populated;
on the other hand, when the slot is unassigned, the dict will remain
allocated.

We considered releasing the dict when the slot is unassigned, but it
causes complications on replicas. On the other hand, from benchmarks
we conducted, it looks like the performance impact of releasing the
dict when it becomes empty and re-allocating it when a key is added
again isn't huge.

This PR adds KVSTORE_FREE_EMPTY_DICTS to cluster mode keys / expires
kvstore.

The impact is about a 2% performance drop, for this hopefully
uncommon scenario.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-03-13 08:30:20 +02:00
Binbin
ad28d222ed
Lua eval scripts first in first out LRU eviction (#13108)
In some cases, users will abuse lua eval. Each EVAL call generates
a new lua script, which is added to the lua interpreter and cached
by redis-server, consuming a large amount of memory over time.

Since EVAL is mostly the one that abuses the lua cache, and these
won't have pipeline issues (i.e. the script won't disappear
unexpectedly and cause errors like it would with SCRIPT LOAD and
EVALSHA), we implement a plain FIFO LRU eviction only for these
(not for scripts loaded with SCRIPT LOAD).

### Implementation notes:
When not abused we'll probably have less than 100 scripts, and when
abused we'll have many thousands. So we use a hard coded value of 500
scripts. And considering that we don't have many scripts, then unlike
keys, we don't need to worry about the memory usage of keeping a true
sorted LRU linked list. We compute the SHA of each script anyway,
and put the script in a dict, we can store a listNode there, and use
it for quick removal and re-insertion into an LRU list each time the
script is used.
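
A self-contained sketch of the intrusive LRU list described above (the
real code reuses listNode entries stored in the scripts dict):

```c
#include <stddef.h>

#define MAX_EVAL_SCRIPTS 500 /* hard-coded cap from the description */

typedef struct scriptSketch {
    char sha[41];                     /* SHA1 hex digest + NUL */
    struct scriptSketch *prev, *next; /* intrusive LRU links */
} scriptSketch;

static scriptSketch *lru_head, *lru_tail;

/* On every use, move the script to the head; the tail is then always
 * the least recently used EVAL script, and it is evicted first when
 * the MAX_EVAL_SCRIPTS cap would be exceeded. */
static void touchScript(scriptSketch *s) {
    if (lru_head == s) return;
    /* unlink */
    if (s->prev) s->prev->next = s->next;
    if (s->next) s->next->prev = s->prev;
    if (lru_tail == s) lru_tail = s->prev;
    /* reinsert at head */
    s->prev = NULL;
    s->next = lru_head;
    if (lru_head) lru_head->prev = s;
    lru_head = s;
    if (!lru_tail) lru_tail = s;
}
```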

### New interfaces:
At the same time, a new `evicted_scripts` field is added to
INFO, which represents the number of evicted eval scripts. Users
can check it to see if they are abusing EVAL.

### benchmark:
`./src/redis-benchmark -P 10 -n 1000000 -r 10000000000 eval "return
__rand_int__" 0`

This simple EVAL-abuse benchmark test creates 1 million EVAL
scripts. The performance has been improved by 50%, and the max latency
has dropped from 500ms to 13ms (this may be caused by table expansion
inside Lua when the number of scripts is large). And in the INFO memory,
it used to consume 120MB (server cache) + 310MB (lua engine), but now
it only consumes 70KB (server cache) + 210KB (lua_engine) because of
the scripts eviction.

For non-abusive case of about 100 EVAL scripts, there's no noticeable
change in performance or memory usage.

### unlikely potentially breaking change:
In theory, a user can load a script with EVAL and then use EVALSHA to
call it (by calculating the SHA1 value on the client side). If we read
the docs carefully, we might realize it's a valid scenario, but we
suppose it's extremely rare. So it may happen that EVALSHA acts on a
script created by EVAL, and the script is evicted and EVALSHA returns a
NOSCRIPT error; that is, if you have more than 500 scripts being used in
the same transaction / pipeline.

This solves the second point in #13102.
2024-03-13 08:27:41 +02:00
Ronen Kalish
a8e745117f
Xread last entry in stream (#7388) (#13117)
Allow using `+` as a special ID for last item in stream on XREAD
command.

This allows iterating on a stream with XREAD, starting with the last
available message instead of the next one, which `$` is used for.
I.e. the caller can use `BLOCK` and `+` on the first call, and change to
`$` on the next call.

Closes #7388

---------

Co-authored-by: Felipe Machado <462154+felipou@users.noreply.github.com>
2024-03-13 08:23:32 +02:00
Viktor Söderqvist
9efc6ad6a6
Add API RedisModule_ClusterKeySlot and RedisModule_ClusterCanonicalKeyNameInSlot (#13069)
Sometimes it's useful to compute a key's cluster slot in a module.

This API function is just like the command CLUSTER KEYSLOT (but faster).

A "reverse" API is also added:
`RedisModule_ClusterCanonicalKeyNameInSlot`. Given a slot, it returns a
short string that we can call a canonical key for the slot.
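
A hedged usage sketch, assuming signatures along the lines described
(an unsigned slot in, a C string out); check the module header for the
exact prototypes:

```c
#include "redismodule.h"

/* Reply with the slot of a key and the canonical key name for it. */
int KeySlotInfo_Command(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    unsigned int slot = RedisModule_ClusterKeySlot(argv[1]);
    const char *canonical = RedisModule_ClusterCanonicalKeyNameInSlot(slot);
    RedisModule_ReplyWithArray(ctx, 2);
    RedisModule_ReplyWithLongLong(ctx, slot);
    RedisModule_ReplyWithCString(ctx, canonical);
    return REDISMODULE_OK;
}
```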
2024-03-12 09:26:12 -07:00
Andy Pan
9c065c417d
Enable accept4() on *BSD (#13104)
Redis enabled `accept4` on Linux after #9177, reducing extra system
calls for sockets.

`accept4` system call is also widely supported on *BSD and Solaris in
addition to Linux. This PR enables `accept4` on all platforms that
support it.
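
A sketch of the idea; HAVE_ACCEPT4 is a hypothetical feature macro used
here only for illustration:

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Set the non-blocking and close-on-exec flags atomically where
 * accept4() exists; fall back to accept() plus fcntl() elsewhere. */
static int acceptNonblockCloexec(int listen_fd) {
#ifdef HAVE_ACCEPT4
    return accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
#else
    int fd = accept(listen_fd, NULL, NULL);
    if (fd == -1) return -1;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
    fcntl(fd, F_SETFD, FD_CLOEXEC);
    return fd;
#endif
}
```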

### References
- [accept4 on
FreeBSD](https://man.freebsd.org/cgi/man.cgi?query=accept4&sektion=2&n=1)
- [accept4 on
DragonFly](https://man.dragonflybsd.org/?command=accept&section=2)
- [accept4 on NetBSD](https://man.netbsd.org/accept.2)
- [accept4 on OpenBSD](https://man.openbsd.org/accept4.2)
- [accept4 on
Solaris](https://docs.oracle.com/cd/E88353_01/html/E37843/accept4-3c.html)
2024-03-12 16:35:52 +02:00
Binbin
da727ad445
Fix redis-check-aof incorrectly considering data in manifest format as MP-AOF (#12958)
The check in fileIsManifest misjudged the manifest file. For example,
if a RESP AOF contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```

In #12951, if the preamble aof also contains it, it will also fail.
Fixes #12951.

The bug was happening if the word "file" was mentioned in the first
1024 lines of the AOF. Now, as soon as it finds a non-comment line, it
breaks (whether it contains "file" or not).
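
A sketch of the corrected check, under the assumption that manifest
lines start with "file " and comments with "#" (simplified names, not
the actual code):

```c
#include <stdio.h>
#include <string.h>

/* Only the first non-comment line decides whether the file is a
 * manifest, so RESP payload containing the word "file" later in the
 * AOF no longer causes a misjudgement. */
static int fileIsManifestSketch(FILE *fp) {
    char line[1024];
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (line[0] == '#') continue;          /* skip comment lines */
        return strncmp(line, "file ", 5) == 0; /* decide and break */
    }
    return 0;
}
```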
2024-03-12 08:47:43 +02:00
Harkrishn Patro
3c8d15f8c3
Pick random slot for a node to distribute operation across slots in redis-benchmark (#12986)
Distribute operations via `redis-benchmark` across the different slots
owned by a node.

`current_slot_index` is never updated, so the value is always `0` and
the tag used is always the first slot owned by the node. Hence any
read/write operation via `redis-benchmark` in cluster mode always
happens on a particular slot.

This is inconvenient to load data uniformly via `redis-benchmark`.
2024-03-11 11:19:30 -07:00
Matthew Douglass
5fdaa53d20
Fix conversion of numbers in lua args to redis args (#13115)
Since lua_Number is not explicitly an integer or a double, we need to
make an effort to convert it as an integer when that's possible, since
the string could later be used in a context that doesn't support
scientific notation (e.g. 1e9 instead of 1000000000).

Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
whichever is shorter, this would break if we try to pass a long integer
number to a command that takes an integer: we'll get an implicit
conversion to string in Lua, and then the parsing in
getLongLongFromObjectOrReply will fail.

```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```

Switch to using ll2string if the number can be safely represented as a
long long.

The problem was introduced in #10587 (Redis 7.2).
closes #13113.
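
An illustration of the conversion rule (the real fix uses the server's
ll2string helper; this standalone version only shows the idea):

```c
#include <math.h>
#include <stdio.h>

/* Render a lua_Number as a plain integer string when it is integral
 * and fits a long long; otherwise fall back to %.17g. */
static void luaNumberToString(double num, char *buf, size_t len) {
    if (num == floor(num) && fabs(num) < 9223372036854775808.0 /* 2^63 */) {
        snprintf(buf, len, "%lld", (long long)num);
    } else {
        snprintf(buf, len, "%.17g", num);
    }
}
```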

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2024-03-10 08:46:49 +02:00
Madelyn Olson
4979cf02ff
Change crc16 slot table to be fixed size character array instead of pointer to strings (#13112)
Update the crc16 hash lookup table to use fixed size character arrays instead of pointers 
to static string addresses. Since the actual values are so short, we can just store them
in a uniform array instead. This saves about 128kb of memory and should improve the 
performance as well since we should have much better memory locality.
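
An illustration of the data-layout change with hypothetical names; the
longest slot string, "16383", needs 6 bytes including the terminator:

```c
/* Before: 16384 pointers into scattered string literals; the pointers
 * alone cost 16384 * 8 = 128kb on 64-bit, with poor locality. */
static const char *slot_table_ptrs[16384] = {"0", "1", "2" /* ... */};

/* After: one contiguous fixed-width array; each entry is addressed by
 * simple arithmetic and neighbouring slots share cache lines. */
static const char slot_table_fixed[16384][6] = {"0", "1", "2" /* ... */};
```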
2024-03-08 15:50:36 -08:00
debing.sun
9738ba9841
Check user's oom_score_adj write permission for oom-score-adj test (#13111)
`CONFIG SET oom-score-adj handles configuration failures` test failed in
some CI jobs today.
Failed CI: https://github.com/redis/redis/actions/runs/8152519326

Not sure why the github action's docker image permissions have changed,
but the issue is similar to #12887,
where we can't assume the range of oom_score_adj that a user can change.

## Solution:
Modify the way of determining whether the current user has no privileges
or not,
instead of relying on whether the user id is 0 or not.
2024-03-05 14:42:28 +02:00
Ping Xie
28976a9003
Fix PONG message processing for primary-ship tracking during failovers (#13055)
This commit updates the processing of PONG gossip messages in the
cluster. When a node (B) becomes a replica due to a failover, its PONG
messages include its new primary node's (A) information and B's
configuration epoch is aligned with A's. This allows observer nodes to
identify changes in primary-ship, addressing issues of intermediate
states and enhancing cluster state consistency during topology changes.

Fix #13018
2024-03-04 17:32:25 -08:00
debing.sun
ad12730333
Implement defragmentation for pubsub kvstore (#13058)
After #13013

### This PR make effort to defrag the pubsub kvstore in the following
ways:

1. Till now, server.pubsub(shard)_channels only shared the channel name obj
with the first subscribed client; now change it so that the clients and
the pubsub kvstore share the channel name robj.
This would save a lot of memory when there are many subscribers to the
same channel.
It also means that we only need to defrag the channel name robj in the
pubsub kvstore, and then update
all client references for the current channel, avoiding the need to
iterate through all the clients to do the same things.
    
2. Refactor the code to defragment pubsub(shard) in the same way as
defragment of keys and EXPIRES, with the exception that we only
defragment pubsub(without shard) when slot is zero.


### Other
Fix an oversight in #11695: if defragmentation doesn't reach the end
time, we should wait for the current db's keys and expires, pubsub and
pubsubshard to finish before leaving; previously it was possible to exit
early once the keys were defragmented.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2024-03-04 16:56:50 +02:00
Binbin
33ea432585
Call finalizerProc when free the aeTimeEvent in ae (#13101)
Supplement to #6189: we also need to call finalizerProc.
This is a minor cleanup; no one currently uses the finalizerProc
feature.
2024-03-03 09:20:18 +02:00
Binbin
df75153d79
Fix reply schemas validator build issue due to new regular expression (#13103)
The new regular expression break the validator:
```
In file included from commands.c:10:
commands_with_reply_schema.def:14528:72: error: stray ‘\’ in program
14528 | struct jsonObjectElement MEMORY_STATS_ReplySchema_patternProperties__db\_\d+__properties_overhead_hashtable_main_elements[] = {
```

The reason is that special characters are not handled by to_c_name,
which causes special characters to appear in the structure name, making
the generated C file fail to compile.

Broken by #12913
2024-03-02 21:26:05 +02:00
YaacovHazan
a50bbcb656
redis-cli fixes around help hints version filtering (#13097)
- In removeUnsupportedArgs, we were trying to access the next item after
the last one, causing an out-of-bounds read.
- In versionIsSupported, when the 'version' is equal to 'since', the
return value was 0 (not supported).
Also, change the function to return `not supported` in case they have
different numbers of digits.

Both issues were found by `Non-interactive non-TTY CLI: Test
command-line hinting - old server` under `test-sanitizer-address` (When
changing the `src/version.h` locally to `8.0.0`)

The new `MAXAGE` argument inside `client-kill` triggered the issue (new
argument at the end of the list)

---------

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-03-02 11:48:36 +02:00
Chen Tianjie
4cae99e785
Add overhead of all DBs and rehashing dict count to info. (#12913)
Sometimes we need to make a fast judgment about why Redis is suddenly
taking more memory. One of the reasons is the main DB's dicts doing
rehashing.

We may use `MEMORY STATS` to monitor the overhead memory of each DB, but
there still lacks a total sum to show an overall trend. So this PR adds
the total overhead of all DBs to `INFO MEMORY` section, together with
the total count of rehashing DB dicts, providing some intuitive metrics
about main dicts rehashing.

This PR adds the following metrics to INFO MEMORY
* `mem_overhead_db_hashtable_rehashing` - only size of ht[0] in
dictionaries we're rehashing (i.e. the memory that's gonna get released
soon)

and similar ones to MEMORY STATS:
* `overhead.db.hashtable.lut` (complements the existing
`overhead.hashtable.main` and `overhead.hashtable.expires` which also
counts the `dictEntry` structs too)
* `overhead.db.hashtable.rehashing` - temporary rehashing overhead.
* `db.dict.rehashing.count` - number of top level dictionaries being
rehashed.

---------

Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2024-03-01 13:41:24 +08:00
Binbin
f17381a38d
Fix propagation of entries_read by calling streamPropagateGroupID unconditionally (#12898)
In XREADGROUP ACK, because streamPropagateXCLAIM does not propagate
entries-read, entries-read will be inconsistent between master and
replicas. I.e. if no entries were claimed, it would have propagated
correctly, but if some were claimed, then the entries-read field would
be inconsistent on the replica.

The fix, suggested by guybe7, is to call streamPropagateGroupID
unconditionally, so that we will normalize entries_read on the replicas.
In the past, we would only set propagate_last_id when NOACK was
specified. And in #9127, XCLAIM did not propagate entries_read in ACK,
which would cause entries_read to be inconsistent between master and
replicas.

Another approach is to add another arg to XCLAIM and let it propagate
entries_read, but we decided not to use it, because we want minimal
damage in case there's an old target and a new source (in the worst-case
scenario, the new source doesn't recognize XGROUP SETID ... ENTRIES READ
and the lag is lost; if we change XCLAIM, the damage is much more
severe).

In this patch, now if the user uses XREADGROUP .. COUNT 1 there will be
an additional
overhead of MULTI, EXEC and XGROUPSETID. We assume the extra command in
case of
COUNT 1 (4x factor, changing from one XCLAIM to
MULTI+XCLAIM+XSETID+EXEC), is probably
ok since reading just one entry is in any case very inefficient (a
client round trip
per record), so we're hoping it's not a common case.

Issue was introduced in #9127.
2024-02-29 09:48:20 +02:00
zhaozhao.zz
cc9fbd270e
freeDictIfNeeded when kvstoreEmpty (#13098)
Just like in `kvstoreDictDelete`, we need to call `freeDictIfNeeded`
in `kvstoreEmpty`.
2024-02-29 08:16:41 +02:00
Binbin
a7abc2f067
SCRIPT FLUSH run truly async, close lua interpreter in bio (#13087)
Even if we have SCRIPT FLUSH ASYNC now, when there are a lot of
lua scripts, SCRIPT FLUSH ASYNC will still block the main thread.
This is because lua_close is executed in the main thread, and lua
heap needs to release a lot of memory.

In this PR, we take the current lua instance from lctx.lua and call
lua_close on it in a background thread, closing it in an async way.
This is MeirShpilraien's idea; a minimal sketch of the pattern follows.
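
A self-contained sketch of the pattern using plain pthreads and the Lua C API (the function names here are illustrative, not Redis's actual bio API):
```c
#include <pthread.h>
#include <lua.h>
#include <lauxlib.h>

/* lua_close() may release a huge heap, so run it off the main thread. */
static void *closeLuaInBackground(void *arg) {
    lua_close((lua_State *)arg);
    return NULL;
}

/* Swap in a fresh interpreter, then close the old one asynchronously. */
static void scriptFlushAsyncSketch(lua_State **lua) {
    lua_State *old = *lua;
    *lua = luaL_newstate();
    pthread_t tid;
    pthread_create(&tid, NULL, closeLuaInBackground, old);
    pthread_detach(tid);
}
```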
2024-02-28 17:57:29 +02:00
LiiNen
763827c981
Fix redis-cli --count (for --scan, --bigkeys, etc) was ignored unless --pattern was also used (#13092)
The --count option for redis-cli was released in redis 7.2.
https://github.com/redis/redis/pull/12042
But I found in the code that some logic was missing for this
'count' option.

```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        reply = redisCommand(context,"SCAN %llu",*it);
```

The intention was to be able to use the scan count.
But as written, --count is only applied when 'pattern' is declared.
So I fixed it simply, to work properly even if the --pattern option is
not used (see the sketch below).
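
A hedged sketch of the fixed branching, reusing the `config`/`context` fields and hiredis's `redisCommand` from the snippet above (not necessarily the exact committed code):
```c
if (config.pattern && config.count)
    reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
        *it, config.pattern, sdslen(config.pattern), config.count);
else if (config.pattern)
    reply = redisCommand(context, "SCAN %llu MATCH %b",
        *it, config.pattern, sdslen(config.pattern));
else if (config.count)
    reply = redisCommand(context, "SCAN %llu COUNT %d", *it, config.count);
else
    reply = redisCommand(context, "SCAN %llu", *it);
```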

I tested it simply with the time command several times, and I could see
it works as intended with this commit.
The examples of test results are below:
```
# unstable build

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.287s
user    0m0.011s
sys     0m0.022s

# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m1.117s
user    0m0.011s
sys     0m0.020s

# count is applied with --pattern

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.045s
user    0m0.002s
sys     0m0.002s
```

```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.084s
user    0m0.008s
sys     0m0.024s

# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m0.043s
user    0m0.000s
sys     0m0.004s

# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.031s
user    0m0.002s
sys     0m0.002s
```



Thanks a lot.
2024-02-28 09:44:30 +02:00
Yanqi Lv
0a12f380e8
Optimize DEL on expired keys (#13080)
If we call `DEL` on expired keys, keys may be deleted in
`expireIfNeeded`, and we don't need to call `dbSyncDelete` or
`dbAsyncDelete` afterwards, which would repeat the deletion process
(i.e. find the keys in the main db).

In this PR, I refine the return values of `expireIfNeeded` to indicate
whether we have deleted the expired key to avoid the potential redundant
deletion logic in `delGenericCommand`. Besides, because both KEY_EXPIRED
and KEY_DELETED are non-zero, this PR won't affect other functions
calling `expireIfNeeded`.
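
A self-contained sketch of the refined contract, with stub types and illustrative names (the real definitions live in db.c):
```c
typedef struct redisDbStub redisDbStub; /* stands in for redisDb */
typedef struct robjStub robjStub;       /* stands in for robj */

/* KEY_EXPIRED and KEY_DELETED are both non-zero, so callers that only
 * check "expired or not" keep working unchanged. */
typedef enum { KEY_VALID = 0, KEY_EXPIRED, KEY_DELETED } keyStatus;

extern keyStatus expireIfNeeded(redisDbStub *db, robjStub *key);
extern int dbSyncDelete(redisDbStub *db, robjStub *key);
extern int dbAsyncDelete(redisDbStub *db, robjStub *key);

/* Per key in delGenericCommand: skip the second lookup when
 * expireIfNeeded already removed the key. */
static int delKeySketch(redisDbStub *db, robjStub *key, int lazy) {
    if (expireIfNeeded(db, key) == KEY_DELETED) return 1;
    return lazy ? dbAsyncDelete(db, key) : dbSyncDelete(db, key);
}
```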

I also ran a performance test. I first disabled active expiration via
`debug set-active-expire 0` and wrote 1 million keys with a 1ms TTL.
Then I repeatedly deleted 100 expired keys in one `DEL`. The results
are as follows, showing that this PR can improve performance by about
10% in this situation.
**unstable**
```
Summary:
  throughput summary: 10080.65 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.953     0.136     0.959     1.215     1.335     2.247
```

**This PR**
```
Summary:
  throughput summary: 11074.20 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.865     0.128     0.879     1.055     1.175     2.159
```

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-26 12:50:04 +02:00
Binbin
104b207602
Fix size stat in malloc(0) and cleanups around zmalloc file (#13068)
In #8554, we added a MALLOC_MIN_SIZE to use a minimum allocation
size when using malloc(0). However, we did not update the size
when malloc_size is missing.

When malloc_size exists, we record the size that was allocated
instead of the size that was requested. This would work with both
jemalloc, and libc malloc (the change in #8554, doesn't break this).

When malloc_size is missing, we allocate extra size_t bytes and
store the requested size in it. In that case, the requested size
is probably different than the allocated size anyway (the change
in #8554 doesn't conceptually change that).

So we have room for improvement since in this case we are aware
of the extra bytes we asked for. Same as we're also aware of the
extra size_t bytes we asked for.

In addition, some cleanup was done:
1. fixed some outdated comments.
2. test cleanups
2024-02-26 12:07:06 +02:00
Binbin
bfcaa7db0a
Fix minor memory leak in rewriteSetObject (#13086)
It seems to be a leak caused by code refactoring in #11290.
It's a small leak that only happens if there's an IO error.
2024-02-22 14:46:56 +02:00
debing.sun
4a265554ae
Expose lua os.clock() api (#12971)
Implement #12699

This PR exposes the Lua os.clock() API for getting the elapsed time of
Lua code execution.

Usage:
```lua
local start = os.clock()
-- ... do something ...
local elapsed = os.clock() - start
```

---------

Co-authored-by: Meir Shpilraien (Spielrein) <meir@redis.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2024-02-22 11:29:52 +02:00
debing.sun
165afc5f2a
Determine the large limit of the quicklist node based on fill (#12659)
Following #12568

In issue #9357, when inserting an element larger than 1GB, we currently
store it in a plain node instead of a listpack.
Presently, when we insert an element that exceeds the maximum size of a
packed node, it cannot be accommodated in any other nodes, thus ending
up isolated like a large element.
I.e. it's a node with only one element, but it's listpack encoded rather
than a plain buffer.

This PR lowers the threshold for considering an element as 'large' from
1GB to the maximum size of a node.
While this change doesn't completely resolve the bug mentioned in the
previous PR, it does mitigate its potential impact.

As a result of this change, we can now only use LSET to replace an
element with another element that falls below the maximum size
threshold.
In the worst-case scenario, with a fill of -5, the largest packed node
we can create is 2GB (32k * 64k):
* 32k: The smallest element in a listpack is 2 bytes, which allows us to
store up to 32k elements.
* 64k: This is the maximum size for a single quicklist node.

## Others
To fully fix #9357 we need more work: as discussed in #12568, when we
insert an element into a quicklistNode, it may be created in a new
node, put into another node, or merged, and we can't correctly delete
the node that was supposed to be deleted.
I'm not sure it's worth it, since it involves a lot of modifications.
2024-02-22 10:02:38 +02:00
guybe7
820a4e45f1
Edit the history field of xinfo-consumers (#13078)
Now it matches the information in xinfo-stream.json
2024-02-22 09:44:29 +02:00
Binbin
5b9fc46523
Add new allocator.muzzy field to memory-stats reply schema (#13076)
This field was added in #12996, but we forgot to add it to the json
file. This also causes reply-schemas-validator to fail.
2024-02-21 08:35:10 +02:00
debing.sun
f6785df663
Defragger improvements around large bins (#12996)
Implement #12963

## Changes
1. Large bins don't have external fragmentation, or are at least
non-defraggable, so we should ignore the effect of large bins when
measuring fragmentation, and only measure the fragmentation of small
bins. This affects both the allocator_frag* metrics and also the
active-defrag trigger.
2. Adding INFO metrics for `muzzy` memory, which is memory returned to
the OS but still shows as RSS until the OS reclaims it.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-20 18:11:09 +02:00
Binbin
ca5cac998e
xinfo-stream add minimum to seen-time, skip logreqres in fuzzer (#13056)
Recently I saw in CI that reply-schemas-validator fails here:
```
Failed validating 'minimum' in schema[1]['properties']['groups']['items']['properties']['consumers']['items']['properties']['active-time']:
    {'description': 'Last time this consumer was active (successful '
                    'reading/claiming).',
     'minimum': 0,
     'type': 'integer'}

On instance['groups'][0]['consumers'][0]['active-time']:
    -1729380548878722639
```

The reason is that in the fuzzer, we may restore a corrupted
active-time, which will cause the reply schema CI to fail.

The fuzzer can corrupt the state in many places, which will cause
bugs that mess up the reply, so we decided to skip logreqres.

Also, seen-time is the same type as active-time, so we add the minimum
to it as well.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-20 12:21:10 +02:00
Binbin
3c2ea1ea95
Fix watched client test timing issue caused by late close (#13062)
There is a timing issue in the test: close may arrive late, or in
freeClientAsync we will free the client in an async way, which will
lead to errors in the watching_clients statistics, since we only
unwatch all keys when we truly freeClient.

Add a wait here to avoid this problem. Also fixed some outdated
comments I saw. The test was introduced in #12966.
2024-02-20 11:12:19 +02:00
Binbin
4e3be944fc
Fix timing issue in blockedclient test (#13071)
We can see that the elapsed time here happens to equal
busy_time_limit, causing the test to fail:
```
[err]: RM_Call from blocked client in tests/unit/moduleapi/blockedclient.tcl
Expected '50' to be more than '50' (context: type eval line 26 cmd {assert_morethan [expr [clock clicks -milliseconds]-$start] $busy_time_limit} proc ::test)
```

It is reasonable for them to be equal, so the equal case is now
accepted here. It should be noted that in the previous `Busy module
command` test, we already used assert_morethan_equal, so this one was
probably just missed at the time.
2024-02-20 08:43:13 +02:00
judeng
fc3a68d8fb
add -fno-omit-frame-pointer to default compilation flags (#12973)
Currently redis uses O3 level optimization, which removes the frame
pointer in the target binary.

In the very old past, when gcc optimized at O1 and above levels, the
frame pointer was omitted by default to improve performance. This frees
the RBP register and reduces push/pop instructions. But it makes it
difficult for us to observe the running status of the program. For
example, the perf tool cannot be used effectively, and especially not
the modern eBPF tools such as bcc/memleak.
2024-02-19 11:47:02 -08:00
guybe7
6df42df291
Adds a README to the command JSON files (#13066)
Add readme about the command json folder, what it does, and who should
(not) use it.
see discussion
https://github.com/redis/redis/issues/9359#issuecomment-1936420698

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
2024-02-19 18:49:31 +02:00
zhaozhao.zz
8876d264ac
Calculate the incremental rehash time more precisely (#13063)
In the `databasesCron()`, the time consumed by
`kvstoreIncrementallyRehash()` is used to calculate the exit condition.
However, within `kvstoreIncrementallyRehash()`, the loop first checks
for timeout before performing rehashing. Therefore, the time for the
last rehash isn't accounted for, making the consumed time inaccurate. We
need to precisely calculate all the time spent on rehashing.
Additionally, the time allocated to `kvstoreIncrementallyRehash()`
should be the remaining time, which is
`INCREMENTAL_REHASHING_THRESHOLD_US` minus the already consumed
`elapsed_us`.
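
A self-contained sketch of the corrected budgeting (the declarations are illustrative stand-ins; the real logic lives in databasesCron):
```c
#include <stdint.h>

#define INCREMENTAL_REHASHING_THRESHOLD_US 1000

/* Assumed to return the time it actually spent, including the last
 * rehash step. */
extern uint64_t kvstoreIncrementallyRehash(void *kvs, uint64_t budget_us);

/* Give the keys kvstore the full budget, and the expires kvstore only
 * whatever remains of it. */
static void rehashDatabasesSketch(void *keys, void *expires) {
    uint64_t elapsed_us =
        kvstoreIncrementallyRehash(keys, INCREMENTAL_REHASHING_THRESHOLD_US);
    if (elapsed_us < INCREMENTAL_REHASHING_THRESHOLD_US)
        kvstoreIncrementallyRehash(expires,
            INCREMENTAL_REHASHING_THRESHOLD_US - elapsed_us);
}
```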
2024-02-19 14:29:54 +02:00
Binbin
9103ccc398
AOF_FSYNC_EVERYSEC higher resolution, change aof_last_fsync and aof_flush_postponed_start to use mstime (#13041)
Currently aof_last_fsync uses a low resolution unixtime, which is
really bad: it checks whether the absolute number of (full) seconds
changed by one, and depending on which side of the second barrier it
falls, we can get very different results.

This PR changes the resolution to use milliseconds instead of complete
seconds.

In cases where the event loop cycles are short and rapid (e.g. running
many fast commands with a short pipeline, or a high `hz` config), this
change will not make much difference, since either way we'll be quick
to detect that we're on a "new second", and it's likely that these
fsyncs will always be executed close to the second switch barrier.

But in cases of rare or slow event loop cycles (e.g. either slow
commands, or a very low rate of traffic to redis, and a low `hz`), it
could easily be that with the old code, in some cases we'll have over
1.5 seconds between fsyncs, and in others less than 0.5.

See discussion in #8612.

This PR also handles aof_flush_postponed_start; the damage there is
smaller since the threshold is 2 seconds, and not 1.
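
A minimal sketch of the everysec check at millisecond resolution (assumed field and helper names, not the actual AOF code):
```c
#include <stdint.h>

extern int64_t mstime(void);      /* millisecond clock, as in server.c */
static int64_t aof_last_fsync_ms; /* stand-in for the real AOF field */

/* fsync once at least a full second has elapsed since the last fsync,
 * instead of whenever the wall-clock second number changed. */
static int shouldFsyncEverysecSketch(void) {
    return mstime() - aof_last_fsync_ms >= 1000;
}
```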

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-18 12:08:29 +02:00
Binbin
dd92dd8fb5
redis-cli - fix sscanf incorrect return-value check warnings (#13059)
From CodeQL: The result of scanf is only checked against 0, but
it can also return EOF.

Reported in https://github.com/redis/redis/security/code-scanning/38.
Reported in https://github.com/redis/redis/security/code-scanning/39.
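
A short illustration of the robust check (generic C, not the exact redis-cli call site):
```c
#include <stdio.h>

/* sscanf returns the number of items converted, or EOF on input
 * failure; comparing against the expected count catches both cases. */
static int parseIntField(const char *buf, int *out) {
    return sscanf(buf, "%d", out) == 1;
}
```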
2024-02-18 10:55:11 +02:00
zhaozhao.zz
50d6fe8c4b
Add metrics for WATCH (#12966)
Redis has some special commands that mark the client's state, such as
`subscribe` and `blpop`, which mark the client as `CLIENT_PUBSUB` or
`CLIENT_BLOCKED`, and we have metrics for the special use cases.

However, there are also other special commands, like `WATCH`, which,
although they do not have a specific flag, should also be considered
stateful client types. For stateful clients, in many scenarios, the
connections cannot be shared in a "connection pool", meaning a
connection pool cannot be used. For example, whenever the `WATCH`
command is
executed, a new connection is required to put the client into the "watch
state" because the watched keys are stored in the client.

If different business logic requires watching different keys, separate
connections must be used; otherwise, there will be contamination. This
also means that if a user's business heavily relies on the `WATCH`
command, a large number of connections will be required.

Recently we have encountered this situation in our platform, where some
users consume a significant number of connections when using Redis
because of `WATCH`.

I hope we can have a way to observe these special use cases and special
client connections. Here I add a few monitoring metrics:

1. `watching_clients` in `INFO` reply: The number of clients currently
in the "watching" state.
2. `total_watched_keys` in `INFO` reply: The total number of keys being
watched.
3. `watch` in `CLIENT LIST` reply: The number of keys each client is
currently watching.
2024-02-18 10:36:41 +02:00
Binbin
c854873746
Minor optimization in kvstoreDictAddRaw when dict exists (#13054)
Usually, the probability that a dict exists is much greater than the
probability that it does not exist. In kvstoreDictAddRaw, we will call
kvstoreGetDict multiple times. Based on this assumption, we change
createDictIfNeeded into something like a get-or-create function:
```
before:
dict exist: 2 kvstoreGetDict
dict non-exist: 2 kvstoreGetDict

after:
dict exist: 1 kvstoreGetDict
dict non-exist: 3 kvstoreGetDict
```

A possible 3% performance improvement was observed.

In addition, some typos/comments I saw have been cleaned up.
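
A self-contained sketch of the get-or-create shape (stub declarations; the real function is createDictIfNeeded in kvstore.c, and the alloc helper's name here is illustrative):
```c
typedef struct dict dict;
typedef struct kvstore kvstore;

extern dict *kvstoreGetDict(kvstore *kvs, int didx);
extern dict *kvstoreAllocDict(kvstore *kvs, int didx); /* illustrative */

/* Common case (dict exists): one kvstoreGetDict call instead of two. */
static dict *createDictIfNeededSketch(kvstore *kvs, int didx) {
    dict *d = kvstoreGetDict(kvs, didx);
    if (d != NULL) return d;
    return kvstoreAllocDict(kvs, didx); /* rare: create on miss */
}
```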
2024-02-15 18:07:24 +02:00
Binbin
063de675e0
zunionInterDiffGenericCommand use ztrycalloc to avoid OOM panic (#13052)
In low memory situations, sending a big number of arguments (sets)
may cause OOM panic. Use ztrycalloc, like we do on LCS and XAUTOCLAIM,
and fail gracefully.

This change affects the following commands: ZUNION, ZINTER, ZDIFF,
ZUNIONSTORE, ZINTERSTORE, ZDIFFSTORE, ZINTERCARD.
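
The pattern, sketched with libc calloc standing in for ztrycalloc (Redis's failable allocator, which returns NULL instead of panicking when memory can't be obtained):
```c
#include <stdlib.h>

extern void addReplyError(void *client, const char *msg); /* stand-in */

static int zsetopSketch(void *client, size_t setnum) {
    /* Failable allocation: a huge setnum no longer aborts the server. */
    void **src = calloc(setnum, sizeof(void *));
    if (src == NULL) {
        addReplyError(client, "Insufficient memory");
        return -1;
    }
    /* ... compute the union / intersection / diff ... */
    free(src);
    return 0;
}
```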
2024-02-15 10:49:10 +02:00
Binbin
32f44da510
Increase tolerance range to block reprocess tests to avoid timing issues (#13053)
These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer. The tests were introduced in #13004.
2024-02-15 10:44:49 +02:00
Sankar
c1d2ac2a73
Do not include gossip about receiver in cluster messages (#13046)
The receiver does not update any of its cluster state based on gossip
about itself. This commit explicitly avoids sending or processing gossip
about the receiver.

Currently cluster bus gossips include 10% of nodes in the cluster with a
minimum of 3 nodes. For up to 30 node clusters, this commit makes sure
that 1/3 of the gossip (1 out of 3 gossips) is never discarded. This
should help with relatively faster convergence of cluster state in
general.
2024-02-13 16:38:37 -08:00
YaacovHazan
e9c795e777
Fix loading rdb opcode RDB_OPCODE_RESIZEDB (#13050)
Following the changes introduced by 8cd62f82c, the dbExpandExpires used
the db_size instead of expires_size.

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-02-12 21:55:37 +02:00
YaacovHazan
7ca0b84af6
Fix loading rdb opcode RDB_OPCODE_SLOT_INFO (#13049)
Following the changes introduced by 8cd62f82c, the kvstoreDictExpand for
the expires kvstore used the slot_size instead of expires_slot_size.

Co-authored-by: YaacovHazan <yaacov.hazan@redislabs.com>
2024-02-12 21:46:06 +02:00
Binbin
8eeece4ab3
Fix CLIENT KILL MAXAGE test timing issue (#13047)
This test fails occasionally:
```
*** [err]: CLIENT KILL maxAGE will kill old clients in tests/unit/introspection.tcl
Expected 2 == 1 (context: type eval line 14 cmd {assert {$res == 1}} proc ::test)
```

This test is very likely to produce a false positive if the execution
takes longer than the max age; for example, if the execution time
between sleep and kill exceeds 1s, rd2 will also be killed due to
the max age.

The test can adjust the order of execution statements to increase
the probability of passing, but this will still be a timing issue
on some slow machines, so we decided to give it a few more chances.

The test was introduced in #12299.
2024-02-12 08:11:33 +02:00
debing.sun
676f27acb0
Fix the failure of defrag test under 32-bit (#13013)
Fail CI:
https://github.com/redis/redis/actions/runs/7837608438/job/21387609715

## Why defragment tests only failed under 32-bit

First of all, under 32-bit jemalloc will allocate more small bins and
less large bins, which will also lead to more external fragmentation,
therefore, the fragmentation ratio is higher in 32-bit than in 64-bit,
so the defragment tests(`Active defrag eval scripts: cluster` and
`Active defrag big keys: cluster`) always fails in 32-bit.

## Why defragment tests only failed with cluster
The following is the result of the `Active defrag eval scripts:
cluster` test.

1) Before #11695, the fragmentation ratio was 3.11%.

2) After #11695, the fragmentation ratio grew to 4.58%.
Since we are using per-slot dictionaries to manage slots, we will only
defragment the contents of these dictionaries (keys, values), but not
the dictionaries' struct and ht_table, which means that frequent
shrinking and expanding of the dictionaries will create more fragments.

3) After #12850 and #12948, in cluster mode, a large number of cluster
slot dicts will be shrunk, creating additional fragmentation, and the
dictionaries will not be defragged.

## Solution
* Add defragmentation of the per-slot dictionary's own structures, dict
struct and ht_table.

## Other change
* Increase floating point print precision of `frags` and `rss` in debug
logs for defrag

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-11 15:12:42 +02:00
Binbin
493e31e3ad
Add new DEBUG dict-resizing command to disable the dict resize (#13043)
The test fails here and there:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```

There are two cases:
1. In the case of passing the test, we use a child process to avoid the
dict resize, but it cannot completely prevent it, since in dictDelete
we still have a chance to trigger the resize (hitting the force ratio).
The reason our test passed before is that the expire dict is still in
the rehashing process, so in dictDelete, dictShrinkIfNeeded cannot
trigger the resize.

2. In the case of failing the test, the expire dict finished the
rehashing, so in the last dictDelete, dictShrinkIfNeeded triggers the
dict resize since it hits the force ratio, so the skipping logic fails.

This PR adds a new DEBUG command to disable the dict resize.
2024-02-08 16:39:58 +02:00
Binbin
813327b231
Fix SORT STORE quicklist with the right options (#13042)
We forgot to call quicklistSetOptions after createQuicklistObject,
in the sort store scenario, we will create a quicklist with default
fill or compress options.

This PR adds fill and depth parameters to createQuicklistObject to
specify that options need to be set after creating a quicklist.

This closes #12871.

release notes:
> Fix lists created by SORT STORE to respect list compression and
packing configs.
2024-02-08 14:36:11 +02:00
debing.sun
1e8dc1da0d
Fix crash due to merge of quicklist node introduced by #12955 (#13040)
Fix two crashes introduced by #12955.

When a quicklist node can't be inserted and split, we eventually merge
the current node with its neighboring nodes after inserting, and
compress the current node and its siblings.

1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.

   Solution: let `_quicklistMergeNodes()` return the merged nodes.

2. If the current node is an LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.

   Solution: always recompress to 0 after merging.
2024-02-08 14:29:16 +02:00
Binbin
81666a6510
Fix heap-use-after-free when pubsubshard_channels became NULL (#13038)
After the fix for #13033, address sanitizer reports this
heap-use-after-free error. When the pubsubshard_channels dict becomes
empty, we will delete the dict, and dictReleaseIterator will call
dictResetIterator, which uses the dict, so we trigger the error.

This PR introduced a new struct kvstoreDictIterator to wrap
dictIterator.
Replace the original dict iterator with the new kvstore dict iterator.
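
A hedged sketch of the wrapper's shape (field layout is illustrative; the actual struct is defined in kvstore.c):
```c
typedef struct dict dict;
typedef struct dictIterator dictIterator;
typedef struct kvstore kvstore;

/* Carrying the kvstore and the dict index along with the iterator lets
 * the release path check whether the underlying dict still exists
 * before touching it, avoiding the use-after-free when an empty dict
 * is deleted mid-iteration. */
typedef struct kvstoreDictIterator {
    kvstore *kvs;
    long long didx;   /* index of the dict this iterator belongs to */
    dictIterator *di; /* underlying iterator */
} kvstoreDictIterator;
```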

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2024-02-07 14:53:50 +02:00
Binbin
886b117031
Fix dict don't rehash when there is child test (#13035)
The reason is the same as #13016: in #12819, in cron, in addition to
trying to shrink, we will also try to expand. The dict was expanded by
cron before we triggered the bgsave, since we do have enough keys
(4096) to hit the ratio.

Before the bgsave, we now only add 4095 keys to avoid this issue.
2024-02-07 09:19:18 +02:00
debing.sun
1f00c951c2
Prevent LSET command from causing quicklist plain node size to exceed 4GB (#12955)
Fix #12864

The main reason for this crash is that when replacing an element of a
quicklist packed node with the lpReplace() method, if the final size is
larger than 4GB, lpReplace() will fail and return NULL, causing
`node->entry` to be incorrectly set to NULL.

Since the inserted data is not a large element, we can't just replace
it like a large element (first quicklistInsertAfter() and then
quicklistDelIndex()), because the current node may be merged and
invalidated in quicklistInsertAfter().

The solution of this PR:
When replacing a node fails (listpack exceeds 4GB), split the current
node, create a new node to put in the middle, and try to merge them.
This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.
2024-02-06 18:21:28 +02:00
Gann
0777dc7896
Improve error handling in connSocketBlockingConnect for various connection failures (#13008)
This commit addresses a problem in connSocketBlockingConnect where
different types of connection failures, including timeouts and other
errors, were not consistently handled. Previously, the function did not
return C_ERR immediately after detecting a connection failure, which
could lead to inconsistent states and misinterpretation of the
connection status.

With this update, connSocketBlockingConnect now correctly returns C_ERR
upon encountering any connection error, ensuring that all types of
connection failures are handled consistently and the behavior of the
function aligns with expected outcomes in case of connection issues.

Closes #12900
2024-02-06 14:31:08 +02:00
Binbin
8096515432
Fix invalid dictNext usage when pubsubshard_channels became empty (#13033)
After #12822, when pubsubshard_channels becomes empty, kvstoreDictDelete
will delete the dict (it is currently the only caller that deletes
dicts that become empty), and in the next loop we will make an invalid
call to dictNext.

Now, after the dict becomes empty, we break out of the loop without
calling dictNext.
2024-02-06 13:41:02 +02:00
Binbin
13bd3643c2
Re-compute active_defrag_running after adjusting defrag configurations (#13020)
Currently, once active defrag starts, we cannot adjust
active_defrag_running downwards. This is because active_defrag_running
is dynamically computed based on the fragmentation, and we think we
should not lower the effort when the fragmentation drops.

However, we need to note that active_defrag_running will also be
dynamically computed based on configurations. In this case, we are not
respecting cycle-min or cycle-max. Some people may realize halfway
through that defrag consumes a lot and want to adjust it.

Previously we could only turn off activedefrag and then turn it on
again to adjust active_defrag_running downwards. So in this PR, when an
active defrag configuration change is made, we re-compute it.

These configuration items are:
- active-defrag-cycle-min
- active-defrag-cycle-max
- active-defrag-threshold-upper
2024-02-06 13:39:07 +02:00
Binbin
87eaf119cd
Minor optimization for expire dict in defragKey (#13027)
Since a DB in cluster mode is now divided into 16384 dicts, here we
directly check kvstoreDictSize instead of kvstoreSize, which gives a
higher probability that we can save the lookup.

The other change is a cleanup, obviously kvstoreGetHash should be
applied to the db->expires dicts.
2024-02-06 12:19:44 +02:00
Binbin
84fd745d65
Fix kvstore unable to push resize_cursor for resize when dict is NULL (#13031)
When the dict is NULL, we also need to push resize_cursor, otherwise we
will keep doing a useless continue here, and there is no way to resize
the other dicts behind it.

Introduced in #12822.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-06 09:41:14 +02:00
guybe7
8cd62f82ca
Refactor the per-slot dict-array db.c into a new kvstore data structure (#12822)
# Description
Gather most of the scattered `redisDb`-related code from the per-slot
dict PR (#11695) and turn it to a new data structure, `kvstore`. i.e.
it's a class that represents an array of dictionaries.

# Motivation
The main motivation is code cleanliness, the idea of using an array of
dictionaries is very well-suited to becoming a self-contained data
structure.
This allowed cleaning some ugly code, among others: loops that run twice
on the main dict and expires dict, and duplicate code for allocating and
releasing this data structure.

# Notes
1. This PR reverts the part of https://github.com/redis/redis/pull/12848
where the `rehashing` list is global (handling rehashing `dict`s is
under the responsibility of `kvstore`, and should not be managed by the
server)
2. This PR also replaces the type of `server.pubsubshard_channels` from
`dict**` to `kvstore` (original PR:
https://github.com/redis/redis/pull/12804). After that was done,
server.pubsub_channels was also chosen to be a `kvstore` (with only one
`dict`, which seems odd) just to make the code cleaner by making it the
same type as `server.pubsubshard_channels`, see
`pubsubtype.serverPubSubChannels`
3. the keys and expires kvstores are currently configured to allocate
the individual dicts only when the first key is added (unlike before, in
which they allocated them in advance), but they won't release them when
the last key is deleted.

Worth mentioning that due to the recent change the reply of DEBUG
HTSTATS changed, in case no keys were ever added to the db.

before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```

after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
2024-02-05 17:21:35 +02:00
Binbin
f20774eced
Fix active expire timeout when db done the scanning (#13030)
When db->expires_cursor==0, it means the DB has finished scanning,
and we should exit the loop to avoid useless scanning.

It is easy to see the active expire timeout in the modified test,
for example, let's assume that there is only 1 expired key in the
DB, and the size / buckets ratio is less than 1%, which means that
we will skip it in isExpiryDictValidForSamplingCb, and the return
value of expires_cursor is 0.

Because `data.sampled == 0` is always true, so `repeat` is also
always true, we will keep scanning the DB, but every time it is
skipped by the previous judgment (expires_cursor = 0), until the
timelimit is finally exhausted.
2024-02-05 16:56:46 +02:00
Daz
02a87885e6
Add missing structural API changes to JSON file (#12434)
The JSON file lacks the following structural API changes:

- GEORADIUSBYMEMBER: add the ANY option for COUNT since 6.2.0.
- GEORADIUSBYMEMBER_RO: add the ANY option for COUNT since 6.2.0.
- GEORADIUS_RO: Added support for uppercase unit names since 7.0.0.
- GEORADIUSBYMEMBER_RO: Added support for uppercase unit names since
7.0.0.

---------

Signed-off-by: daz-3ux <daz-3ux@proton.me>
Co-authored-by: bodong.ybd <bodong.ybd@alibaba-inc.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: yangpengda.333 <yangpengda.333@bytedance.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-04 08:42:15 +02:00
Yanqi Lv
c1041c2c0d
Make db->avg_ttl more precise (#12949)
Currently, We compute `db->avg_ttl` after each short `dbScan` sweep (a
few buckets without checking the time limit). But after each `dbScan`
sweep, we don't have much data and this makes the db->avg_ttl less
precise. For example, even if we scan the whole db, we can't get the
exact avg_ttl because we separate the data.
i.e. because of the running average, if we issue 16 calls to scan, we'll
give lower weight to the first one, and higher weight to the last one.
I think we should calculate `db->avg_ttl` until completing more of the
db iteration (judgement of time limit or the beginning of iterating next
db) because we have more sample data in this db and can get more
accurate result. In the best case, if we scan the whole db, we can get
the exact avg_ttl.

In this PR, we postpone the avg_ttl calculation until the judgement of
time limit or iteration of next db, so we can accumulate more data to
get more precise avg_ttl.
Note that we still need to make sure to decay the old TTLs at the same
speed as before, which is why we want to run the decay mechanism several
times, or use the Pow formula, see the comment in the code.

In my experiment, this PR can improve 89% or 52% accuracy in different
workload.

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-02-04 08:34:26 +02:00
Yanqi Lv
62153b3b2f
Refine the purpose of rdb saving with accurate flags (#12925)
In Redis, rdb is produced in three scenarios mainly.

- backup, such as `bgsave` and `save` command
- full sync in replication
- aof rewrite if `aof-use-rdb-preamble` is yes

We also have some RDB flags to identify the purpose of rdb saving.
```C
/* flags on the purpose of rdb save or load */
#define RDBFLAGS_NONE 0                 /* No special RDB loading. */
#define RDBFLAGS_AOF_PREAMBLE (1<<0)    /* Load/save the RDB as AOF preamble. */
#define RDBFLAGS_REPLICATION (1<<1)     /* Load/save for SYNC. */
```

But currently, it seems that these flags and the purposes of rdb
saving don't exactly match. I found this in `rdbSaveRioWithEOFMark`,
which calls `startSaving` with `RDBFLAGS_REPLICATION` but `rdbSaveRio`
with `RDBFLAGS_NONE`.
```C
int rdbSaveRioWithEOFMark(int req, rio *rdb, int *error, rdbSaveInfo *rsi) {
    char eofmark[RDB_EOF_MARK_SIZE];

    startSaving(RDBFLAGS_REPLICATION);
    getRandomHexChars(eofmark,RDB_EOF_MARK_SIZE);
    if (error) *error = 0;
    if (rioWrite(rdb,"$EOF:",5) == 0) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    if (rioWrite(rdb,"\r\n",2) == 0) goto werr;
    if (rdbSaveRio(req,rdb,error,RDBFLAGS_NONE,rsi) == C_ERR) goto werr;
    if (rioWrite(rdb,eofmark,RDB_EOF_MARK_SIZE) == 0) goto werr;
    stopSaving(1);
    return C_OK;

werr: /* Write error. */
    /* Set 'error' only if not already set by rdbSaveRio() call. */
    if (error && *error == 0) *error = errno;
    stopSaving(0);
    return C_ERR;
}
```

In this PR, I refine the purpose of rdb saving with accurate flags.
2024-02-01 13:41:02 +02:00
Binbin
9a7d311855
Fix dict resize allow test (#13016)
CI reported this failure:
```
*** [err]: Don't rehash if used memory exceeds maxmemory after rehash in tests/unit/maxmemory.tcl
Expected '4098' to equal or match '4002'

WARNING: the new maxmemory value set via CONFIG SET (1176088) is smaller than the current memory usage (1231083)
```

It can be seen from the log that used_memory changed before we set
maxmemory. The reason is that in #12819, in cron, in addition to trying
to shrink, we will also try to expand. The dict was expanded by cron
before we set maxmemory, causing the test to fail.

Before setting maxmemory, we now only add 4095 keys to avoid triggering
the resize.
2024-01-31 13:11:52 +02:00
Binbin
6016973ac0
Fix module assertion crash when timer and timeout are unlocked in the same event loop (#13015)
When we use a timer to unblock a client in a module, if the timer
period and the block timeout are very close, they will unblock the
client in the same event loop, and it will trigger the assertion.
The reason is that in moduleBlockedClientTimedOut we protect against
re-processing, so we don't actually call updateStatsOnUnblock
(see #12817), and thus we are not able to reset c->duration.

The root cause is that unblockClientOnTimeout() didn't realize that bc
had already been unblocked. We add a function to the module to
determine if bc is blocked, and then use it in unblockClientOnTimeout()
to exit early.

There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
unblockClient
resetClient
-- assertion, crash the server
'c->duration == 0' is not true
```
2024-01-31 13:10:19 +02:00
Binbin
74a6e48a3d
Fix module unblock crash due to no timeout_callback (#13017)
In the test case the block timeout is reached, but we did not pass in
a timeout_callback, so it crashes when unblocking. In this case,
moduleBlockedClientTimedOut now checks timeout_callback.
There is the stack:
```
beforeSleep
blockedBeforeSleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
2024-01-31 09:28:50 +02:00
Chen Tianjie
f469dd8ca6
Add novalues option to command HSCAN. (#12765)
Add a way to HSCAN a hash key and get only the field names.
Command syntax is now:
```
HSCAN key cursor [MATCH pattern] [COUNT count] [NOVALUES]
```
when `NOVALUES` is on, the command will only return the field names in
the hash.
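
A hypothetical session illustrating the new option (key and fields are made up):
```
127.0.0.1:6379> HSET user:1 name alice age 30
(integer) 2
127.0.0.1:6379> HSCAN user:1 0 NOVALUES
1) "0"
2) 1) "name"
   2) "age"
```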

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2024-01-30 20:32:58 +02:00
Slava Koyfman
24f6d08b3f
Implement CLIENT KILL MAXAGE <maxage> (#12299)
Adds the ability to kill clients older than a specified age.

Also, fixed the age calculation in `catClientInfoString` to use
`commandTimeSnapshot` instead of the old `server.unixtime`, and added
missing documentation for `CLIENT KILL ID` to the output of
`CLIENT help`.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-01-30 20:24:36 +02:00
Oran Agra
7c9f41b52b
fix dict rehash tests introduced by #12802 broken by #12819 (#13009)
The tests consistently fail on timeout (a sleep that's too short).
They now take more time because in #12819 we iterate over all dicts,
not just non-empty ones.
They passed the PR's CI because it skips the `slow` tag, which might
have been misplaced, but now it is probably required.
With the fix, the tests take quite a lot of time:
```
[ok]: Redis can trigger resizing (1860 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (744 ms)
```
before #12819:
```
[ok]: Redis can trigger resizing (309 ms)
[ok]: Redis can rewind and trigger smaller slot resizing (295 ms)
```

failure:
https://github.com/redis/redis/actions/runs/7704158180/job/20995931735
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
*** [err]: Redis can trigger resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
 table size: 128
 number of elements: 5
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 29 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test) 
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
 table size: 256
 number of elements: 10
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 16*' (context: type eval line 27 cmd {assert_match "*table size: 16*" [r debug HTSTATS 0]} proc ::test) 
```
2024-01-30 14:32:38 +02:00
Binbin
45a35a79c7
Fix timeout not being set in module blockClient case (#13011)
This was introduced in #13004, which missed this assignment.
It causes timeout to be a random value (which may be less than now),
and then in the `Unblock by timer` test, the client is unblocked and
calls timeout_callback; since the callback is NULL, the server will
crash.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```
2024-01-30 14:32:17 +02:00
Binbin
76adbf6ff0
Adds connection timeout option to redis-cli (#10609)
This allows specifying the timeout value for opening the TCP
connection to a server. The timeout defaults to 0, meaning no limit,
depending on the OS. It can be specified using the new `-t` switch.

revive #3764, fixes #3763

---------

Co-authored-by: Itamar Haber <itamar@redislabs.com>
Co-authored-by: yoav-steinberg <yoav@redislabs.com>
2024-01-30 13:43:39 +02:00
Binbin
492021db95
Fix blocking commands timeout is reset due to re-processing command (#13004)
Since #11012, we reprocess the command when a client is unblocked on
keys. In some blocking commands, for example in the XREADGROUP BLOCK
scenario, because of the command re-processing we recalculate the block
timeout, causing the blocking time to be reset.

This commit adds a new CLIENT_REPROCESSING_COMMAND client flag to
explicitly let the command know that it is being re-processed; later,
in blockForKeys, we will not reset the timeout.

Affected BLOCK cases: 
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes #12998.
2024-01-30 11:32:59 +02:00
Chen Tianjie
af7ceeb765
Optimize resizing hash table to resize not only non-empty dicts. (#12819)
The function `tryResizeHashTables` only attempts to shrink the dicts
that have keys (a change from #11695); this was a serious problem until
the change in #12850, since it meant that if all keys are deleted, we
won't shrink the dict.
But still, both dictShrink and dictExpand may be blocked by a forked
child process; therefore, the cron job needs to perform both dictShrink
and dictExpand, not just for non-empty dicts, but for all dicts in DBs.

What this PR does:

1. Try to resize all dicts in DBs (not just non-empty ones, as it was
since #12850)
2. handle both shrink and expand (not just shrink, as it was since
forever)
3. Refactor some APIs about dict resizing (get rid of `htNeedsShrink`
and `dictShrinkToFit`, and expose `dictShrinkIfNeeded` and
`dictExpandIfNeeded`, which already contain all the code of those
functions we got rid of, to make the APIs neater)
4. In the `Don't rehash if redis has child process` test, now that cron
would do resizing, we no longer need to write to DB after the child
process got killed, and can wait for the cron to expand the hash table.
2024-01-29 21:02:07 +02:00
Ozan Tezcan
c5273cae18
Add RM_TryCalloc() and RM_TryRealloc() (#12985)
Modules may want to handle allocation failures gracefully. Adding
RM_TryCalloc() and RM_TryRealloc() for it.
RM_TryAlloc() was added before:
https://github.com/redis/redis/pull/10541
2024-01-29 20:56:03 +02:00
Binbin
acd9605223
Fix maxmemory-samples stack overflow crash in evictionPoolPopulate, limit its value to [1,64] (#13000)
We have not limited the value of maxmemory-samples in the past, so it
can be set very large. If it is, we will have a stack overflow in
evictionPoolPopulate when we trigger key eviction.

There is no reason for this config to be set too high, so just limit its
range to [1,64].
2024-01-29 10:38:52 +02:00
Roshan Khatri
5358bd7cdd
Reduce performance impact of dict rehashing and make it shorter. (#12899)
#### Problem Statement:
For any read/update operation during rehashing, we're doing ~10+ random
DRAM lookups to do the rehashing, as we are using the `rehashidx` to
rehash 10 buckets, whose dict entries most likely aren't cached in the
CPU or near the bucket we are operating on. If these random buckets
are empty, the rehashing process during that command execution is
skipped.

#### Implementation:
To reduce the performance regression while the dict is rehashing, we
determine the index at which the key would be stored in the 0th HT and
check whether that index has already been rehashed; if not, we rehash
the bucket containing the key, and the bucket is moved from the 0th HT
to the 1st HT.

If the key has already been rehashed, we perform the random access
bucket rehash (using `rehashidx`) and we again verify if rehashing is
still ongoing and look up the key in the respective HT.

This ensures rehashing is not skipped in any command call and that we
rehash a particular bucket or random bucket in each call.
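
A self-contained sketch of the lookup-time bucket migration idea (stub declarations with illustrative names; the real code is `dictBucketRehash` in dict.c):
```c
typedef struct dict dict;

extern unsigned long dictHashKey(dict *d, const void *key);
extern unsigned long dictMask(dict *d, int htidx);  /* illustrative */
extern long dictRehashIndex(dict *d);               /* illustrative */
extern void dictBucketRehash(dict *d, unsigned long bucket);

/* Before a lookup during rehashing: if the bucket that would hold the
 * key in the old table hasn't been migrated yet, migrate exactly that
 * bucket, so the following lookup touches cache lines we just warmed. */
static void rehashBucketForKey(dict *d, const void *key) {
    unsigned long idx = dictHashKey(d, key) & dictMask(d, 0);
    if ((long)idx >= dictRehashIndex(d)) /* not yet rehashed */
        dictBucketRehash(d, idx);
}
```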

#### Changes in this PR:
- Added a new method `dictBucketRehash` to perform rehash on a single
bucket.
- Helper function `moveKeysInBucketOldtoNew` for `dictRehash` and
`dictBucketRehash` to move all the keys in a bucket from the old to the
new hash HT.
- Helper function `verifyMoreRehashRequired` for `dictRehash` and
`dictBucketRehash` to check if we have already rehashed the whole table
and if more rehashing is required.

### Benchmark:
- This PR still shows **~13%** improvement in the latency during
rehashing.

- Rehashing is now **~2%** faster for this PR when compared to unstable.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2024-01-27 11:11:53 +02:00
judeng
98881f7558
fix the wrong path in mkreleasehdr.sh (#12993)
The issue was introduced in #12799: the script cannot find the
correct src and deps directories, so it always reports dirty as 0.
2024-01-26 15:01:54 -08:00
Binbin
4cb5ad85a5
Fix unauthenticated client query buffer 1MB limit (#12989)
Code incorrectly set the limit value to 1024MB.
Introduced in #12961.
2024-01-25 14:56:21 +02:00
zhaozhao.zz
85a834bfa2
Revert multi OOM limit and add multi buffer limit (#12961)
Fix #9926 , and introduce an alternative method to prevent abuse of
transactions:

1. revert #5454 (which was blocking read-only transactions in OOM
state), and break the tie of MULTI state memory usage and the server OOM
state. Meaning that we'll limit the total memory a single client can
queue, and do that unconditionally regardless of the server being OOM or
not.
2. to prevent abuse of transactions, we use the
`client-query-buffer-limit` to restrict the size of the transaction.
Because the commands cached in the MULTI/EXEC queue have not been
executed yet, they are also considered a part of the "query buffer"
in a broader sense.
the `querybuf` of the client together constitute the "query buffer".
When they exceed the limit, the connection will be disconnected.

The reasoning is that it's sensible to send a single command with a
huge (1GB) argument, and it's sensible to send a transaction with many
small commands, but it's probably not common to send a long transaction
with many huge arguments (which will consume a lot of memory before
even being executed).

If anyone runs into that, they can simply increase the
`client-query-buffer-limit` config.

P.S. To prevent DDoS attacks, unauthenticated clients have a separate
hard limit. Their query buffer should not exceed a maximum of 1MB. In
other words, if the query buffer of an unauthenticated client exceeds
1MB or the `client-query-buffer-limit` (if it is set to a value smaller
than 1MB), the connection will be disconnected.
2024-01-25 11:17:39 +02:00
Binbin
07b292af5e
Add sender NULL check in clusterProcessGossipSection invalid_ids case (#12980)
In the following case sender may be unknown, so we need to set up a
NULL check for sender:
```
/* If this is a MEET packet from an unknown node, we still process
 * the gossip section here since we have to trust the sender because
 * of the message type. */
if (!sender && type == CLUSTERMSG_TYPE_MEET)
    clusterProcessGossipSection(hdr,link);
```
2024-01-23 09:45:02 -08:00
Wen Hui
685409139b
Add INCR type command against wrong argument test cases. (#12836)
We have test cases for incr-related commands with a nonexistent key,
spaces in the key, and a wrong type of key. However, we don't have test
cases covering INCRBY, INCRBYFLOAT, DECRBY, INCR, DECR, HINCRBY,
HINCRBYFLOAT and ZINCRBY with a valid key and an invalid value as
argument, nor a float value passed to incrby and decrby. So test cases
for these scenarios were added in incr.tcl.

Thank you!
2024-01-23 15:39:38 +02:00
Binbin
85c31e0cff
Allow running WAITAOF in scripts, remove NOSCRIPT flag (#12977)
In #11568 we removed the NOSCRIPT flag from commands, e.g. removing
NOSCRIPT flag from WAIT. Aiming to allow them in scripts and let them
implicitly behave in the non-blocking way.

This PR removes the NOSCRIPT flag from WAITAOF, just like WAIT (to be
symmetrical).
And this PR also adds the BLOCKING flag for WAIT and WAITAOF.
2024-01-23 15:19:41 +02:00
Binbin
628c0dea1b
Some cleanups around function (#12940)
This PR did some cleanups around function:
- drop the comment about Libraries Ctx, since we do have comment
  in functionsLibCtx, no need to maintain multiple copies.
- remove outdated comment about the dropped Library description.
- remove unused desc and code vars in functionExtractLibMetaData.
- fix engines_nemory typo, changed it to engines_memory.
- remove outdated comment about FUNCTION CREATE and FUNCTION INFO,
  FUNCTION CREATE was renamed to FUNCTION LOAD.
- Check in initServer whether the return of functionsInit is OK.
2024-01-23 14:26:33 +02:00
Oran Agra
f9a0eb60f7
update redis-check-rdb types (#12969)
It seems that we forgot to update the array in redis-check-rdb.
2024-01-23 11:48:02 +02:00
dependabot[bot]
12fd752443
Bump actions/cache from 3 to 4 (#12978)
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/releases">actions/cache's
releases</a>.</em></p>
<blockquote>
<h2>v4.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update action to node20 by <a
href="https://github.com/takost"><code>@​takost</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1284">actions/cache#1284</a></li>
<li>feat: save-always flag by <a
href="https://github.com/to-s"><code>@​to-s</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1242">actions/cache#1242</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/takost"><code>@​takost</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1284">actions/cache#1284</a></li>
<li><a href="https://github.com/to-s"><code>@​to-s</code></a> made their
first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1242">actions/cache#1242</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v3...v4.0.0">https://github.com/actions/cache/compare/v3...v4.0.0</a></p>
<h2>v3.3.3</h2>
<h2>What's Changed</h2>
<ul>
<li>Cache v3.3.3 by <a
href="https://github.com/robherley"><code>@​robherley</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1302">actions/cache#1302</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/robherley"><code>@​robherley</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1302">actions/cache#1302</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v3...v3.3.3">https://github.com/actions/cache/compare/v3...v3.3.3</a></p>
<h2>v3.3.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Fixed readme with new segment timeout values by <a
href="https://github.com/kotewar"><code>@​kotewar</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1133">actions/cache#1133</a></li>
<li>Readme fixes by <a
href="https://github.com/kotewar"><code>@​kotewar</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1134">actions/cache#1134</a></li>
<li>Updated description of the lookup-only input for main action by <a
href="https://github.com/kotewar"><code>@​kotewar</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1130">actions/cache#1130</a></li>
<li>Change two new actions mention as quoted text by <a
href="https://github.com/bishal-pdMSFT"><code>@​bishal-pdMSFT</code></a>
in <a
href="https://redirect.github.com/actions/cache/pull/1131">actions/cache#1131</a></li>
<li>Update Cross-OS Caching tips by <a
href="https://github.com/pdotl"><code>@​pdotl</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1122">actions/cache#1122</a></li>
<li>Bazel example (Take <a
href="https://redirect.github.com/actions/cache/issues/2">#2</a>️⃣) by
<a href="https://github.com/vorburger"><code>@​vorburger</code></a> in
<a
href="https://redirect.github.com/actions/cache/pull/1132">actions/cache#1132</a></li>
<li>Remove actions to add new PRs and issues to a project board by <a
href="https://github.com/jorendorff"><code>@​jorendorff</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1187">actions/cache#1187</a></li>
<li>Consume latest toolkit and fix dangling promise bug by <a
href="https://github.com/chkimes"><code>@​chkimes</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1217">actions/cache#1217</a></li>
<li>Bump action version to 3.3.2 by <a
href="https://github.com/bethanyj28"><code>@​bethanyj28</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1236">actions/cache#1236</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/vorburger"><code>@​vorburger</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1132">actions/cache#1132</a></li>
<li><a
href="https://github.com/jorendorff"><code>@​jorendorff</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1187">actions/cache#1187</a></li>
<li><a href="https://github.com/chkimes"><code>@​chkimes</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1217">actions/cache#1217</a></li>
<li><a
href="https://github.com/bethanyj28"><code>@​bethanyj28</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1236">actions/cache#1236</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v3...v3.3.2">https://github.com/actions/cache/compare/v3...v3.3.2</a></p>
<h2>v3.3.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Reduced download segment size to 128 MB and timeout to 10 minutes by
<a href="https://github.com/kotewar"><code>@​kotewar</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1129">actions/cache#1129</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v3...v3.3.1">https://github.com/actions/cache/compare/v3...v3.3.1</a></p>
<h2>v3.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Bug: Permission is missing in cache delete example by <a
href="https://github.com/kotokaze"><code>@​kotokaze</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1123">actions/cache#1123</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/blob/main/RELEASES.md">actions/cache's
changelog</a>.</em></p>
<blockquote>
<h1>Releases</h1>
<h3>3.0.0</h3>
<ul>
<li>Updated minimum runner version support from node 12 -&gt; node
16</li>
</ul>
<h3>3.0.1</h3>
<ul>
<li>Added support for caching from GHES 3.5.</li>
<li>Fixed download issue for files &gt; 2GB during restore.</li>
</ul>
<h3>3.0.2</h3>
<ul>
<li>Added support for dynamic cache size cap on GHES.</li>
</ul>
<h3>3.0.3</h3>
<ul>
<li>Fixed avoiding empty cache save when no files are available for
caching. (<a
href="https://redirect.github.com/actions/cache/issues/624">issue</a>)</li>
</ul>
<h3>3.0.4</h3>
<ul>
<li>Fixed tar creation error while trying to create tar with path as
<code>~/</code> home folder on <code>ubuntu-latest</code>. (<a
href="https://redirect.github.com/actions/cache/issues/689">issue</a>)</li>
</ul>
<h3>3.0.5</h3>
<ul>
<li>Removed error handling by consuming actions/cache 3.0 toolkit, Now
cache server error handling will be done by toolkit. (<a
href="https://redirect.github.com/actions/cache/pull/834">PR</a>)</li>
</ul>
<h3>3.0.6</h3>
<ul>
<li>Fixed <a
href="https://redirect.github.com/actions/cache/issues/809">#809</a> -
zstd -d: no such file or directory error</li>
<li>Fixed <a
href="https://redirect.github.com/actions/cache/issues/833">#833</a> -
cache doesn't work with github workspace directory</li>
</ul>
<h3>3.0.7</h3>
<ul>
<li>Fixed <a
href="https://redirect.github.com/actions/cache/issues/810">#810</a> -
download stuck issue. A new timeout is introduced in the download
process to abort the download if it gets stuck and doesn't finish within
an hour.</li>
</ul>
<h3>3.0.8</h3>
<ul>
<li>Fix zstd not working for windows on gnu tar in issues <a
href="https://redirect.github.com/actions/cache/issues/888">#888</a> and
<a
href="https://redirect.github.com/actions/cache/issues/891">#891</a>.</li>
<li>Allowing users to provide a custom timeout as input for aborting
download of a cache segment using an environment variable
<code>SEGMENT_DOWNLOAD_TIMEOUT_MINS</code>. Default is 60 minutes.</li>
</ul>
<h3>3.0.9</h3>
<ul>
<li>Enhanced the warning message for cache unavailability in case of
GHES.</li>
</ul>
<h3>3.0.10</h3>
<ul>
<li>Fix a bug with sorting inputs.</li>
<li>Update definition for restore-keys in README.md</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="13aacd865c"><code>13aacd8</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1242">#1242</a>
from to-s/main</li>
<li><a
href="53b35c5439"><code>53b35c5</code></a>
Merge branch 'main' into main</li>
<li><a
href="65b8989fab"><code>65b8989</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1284">#1284</a>
from takost/update-to-node-20</li>
<li><a
href="d0be34d544"><code>d0be34d</code></a>
Fix dist</li>
<li><a
href="66cf064d47"><code>66cf064</code></a>
Merge branch 'main' into update-to-node-20</li>
<li><a
href="1326563738"><code>1326563</code></a>
Merge branch 'main' into main</li>
<li><a
href="e71876755e"><code>e718767</code></a>
Fix format</li>
<li><a
href="01229828ff"><code>0122982</code></a>
Apply workaround for earlyExit</li>
<li><a
href="3185ecfd61"><code>3185ecf</code></a>
Update &quot;only-&quot; actions to node20</li>
<li><a
href="25618a0a67"><code>25618a0</code></a>
Bump version</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/cache/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/cache&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-23 11:09:49 +02:00
Harkrishn Patro
2bce71b5ff
Exit early if slowlog/acllog max len set to zero (#12965)
Currently slowlog gets disabled if slowlog-log-slower-than is set to less than zero. I think we should also disable it if slowlog-max-len is set to zero. We apply the same logic to acllog-max-len.
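A minimal sketch of the early-exit idea (the signature and names are simplified for illustration, not the actual source):
```c
/* Sketch: treat both "slower-than < 0" and "max-len == 0" as disabled,
 * and return before doing any slowlog bookkeeping. */
void slowlogPushEntryIfNeeded(long long duration_us,
                              long long log_slower_than_us,
                              unsigned long max_len) {
    if (log_slower_than_us < 0 || max_len == 0) return; /* disabled */
    if (duration_us >= log_slower_than_us) {
        /* ... create the entry and trim the log down to max_len ... */
    }
}
```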
2024-01-22 16:01:04 -08:00
Brennan
e12f2decc1
Prevent nodes with invalid IDs from being propagated through gossip (#12921)
There have been occasional instances of memory corruption (though code bugs or bit flips) leading to invalid node information being gossiped around. To prevent this invalid information spreading, we verify the node IDs in received gossip are in an acceptable format, and disregard any gossiped nodes with invalid IDs. This PR uses the existing verifyClusterNodeId function to check the validity of the gossiped node IDs and if an invalid one is encountered, logs raw byte information to help debug the corruption.
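For illustration, a self-contained check in the spirit of verifyClusterNodeId (the real function lives in the cluster code; the helper name below is hypothetical):
```c
#include <stddef.h>

#define CLUSTER_NAMELEN 40 /* node IDs are 40 hex characters */

/* Accept only IDs that are exactly 40 lowercase hex characters; anything
 * else is treated as corrupted and the gossiped node is disregarded. */
static int nodeIdLooksValid(const char *id, size_t len) {
    if (len != CLUSTER_NAMELEN) return 0;
    for (size_t i = 0; i < len; i++) {
        char c = id[i];
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) return 0;
    }
    return 1;
}
```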

---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-01-22 11:25:43 -08:00
zhaozhao.zz
8d0156eb18
Set the correct id for tempDb (#12947)
background: some modules need to know the `dbid` information, such as
the function used during RDB loading:

```
robj *rdbLoadObject(int rdbtype, rio *rdb, sds key, int dbid, int *error) {
....
        moduleInitIOContext(io,mt,rdb,&keyobj,dbid);
```

However, during replication, the "tempDb" created for diskless RDB
loading is not correctly set with the dbid. This leads to passing the
wrong dbid to the `rdbLoadObject` function (as tempDb uses zcalloc, all
ids are 0).

```
disklessLoadInitTempDb()->rdbLoadRioWithLoadingCtx()->
        /* Read value */
        val = rdbLoadObject(type,rdb,key,db->id,&error);
```

To fix it, set the correct ID (relative index) for the tempdb.
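A hedged sketch of the fix (the struct is reduced to the relevant field; initTempDbIds is a hypothetical helper name):
```c
typedef struct redisDb {
    int id; /* database index; zcalloc() leaves this at 0 for every db */
    /* ... other fields ... */
} redisDb;

/* After allocating the temp databases, assign each its relative index so
 * rdbLoadObject() receives the real dbid instead of 0. */
void initTempDbIds(redisDb *tempDb, int dbnum) {
    for (int i = 0; i < dbnum; i++) tempDb[i].id = i;
}
```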
2024-01-22 11:47:51 +08:00
Yanqi Lv
85a239b363
Change dictGetSafeIterator to dictGetIterator in pubsub (#12931)
In #12838, we misused the safe iterator of the client dict, so we couldn't
catch a synchronous release of the client if there were a bug.

Since we realize that clients (even subscribers) are released with async
free, we change the safe iterators of the client dict into unsafe
iterators in `pubsub.c`. And I also remove redundant code.
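A fragment sketch of the change, using the standard Redis dict iterator API; the loop body is illustrative:
```c
dictIterator *di = dictGetIterator(d); /* was: dictGetSafeIterator(d) */
dictEntry *de;
while ((de = dictNext(di)) != NULL) {
    client *c = dictGetKey(de);
    /* ... notify c; nothing here may add or delete dict entries, which
     * is exactly what the plain (unsafe) iterator lets dict.c verify ... */
}
dictReleaseIterator(di);
```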
2024-01-19 17:03:20 +02:00
Yanqi Lv
b07174afc2
Change the threshold of dict expand, shrink and rehash (#12948)
Before this change (most recently modified in
https://github.com/redis/redis/pull/12850#discussion_r1421406393), The
trigger for normal expand threshold was 100% utilization and the trigger
for normal shrink threshold was 10% (HASHTABLE_MIN_FILL).
While during fork (DICT_RESIZE_AVOID), when we want to avoid rehash, the
trigger thresholds were multiplied by 5 (`dict_force_resize_ratio`),
meaning 500% for expand and 2% (100/10/5) for shrink.

However, in `dictRehash` (the incremental rehashing), the rehashing
threshold for shrinking during fork (DICT_RESIZE_AVOID) was 20% by
mistake.
This meant that if a shrinking is triggered when `dict_can_resize` is
`DICT_RESIZE_ENABLE` which the threshold is 10%, the rehashing can
continue when `dict_can_resize` is `DICT_RESIZE_AVOID`.
This would cause unwanted CopyOnWrite damage.

It'll make sense to change the thresholds of the rehash trigger and the
thresholds of the incremental rehashing the same, however, in one we
compare the size of the hash table to the number of records, and in the
other we compare the size of ht[0] to the size of ht[1], so the formula
is not exactly the same.

to make things easier we change all the thresholds to powers of 2, so
the normal shrinking threshold is changed from 100/10 (i.e. 10%) to
100/8 (i.e. 12.5%), and we change the threshold during forks from 5 to
4, i.e. from 500% to 400% for expand, and from 2% (100/10/5) to 3.125%
(100/8/4)
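The arithmetic above, restated as a self-contained check (a sketch for illustration only):
```c
#include <stdio.h>

int main(void) {
    /* Normal mode: expand at 100% utilization, shrink below 1/8.      */
    /* During fork (resize avoided): both ratios are scaled by 4.      */
    printf("shrink threshold, normal: %.3f%%\n", 100.0 / 8);       /* 12.500% */
    printf("expand threshold, fork  : %.1f%%\n", 100.0 * 4);       /* 400.0%  */
    printf("shrink threshold, fork  : %.4f%%\n", 100.0 / (8 * 4)); /* 3.1250% */
    return 0;
}
```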
2024-01-19 17:00:43 +02:00
debing.sun
d0640029dc
Fix race condition issues between the main thread and module threads (#12817)
Fix #12785 and other race condition issues.
See the following isolated comments.

The following report was obtained using SANITIZER thread.
```sh
make SANITIZER=thread
./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate
```

1. Fixed thread-safe issue in RM_UnblockClient()
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1831181220
* When blocking a client in a module using `RM_BlockClientOnKeys()` or
`RM_BlockClientOnKeysWithFlags()`
with a timeout_callback, calling RM_UnblockClient() in module threads
can lead to race conditions
     in `updateStatsOnUnblock()`.

     - Introduced: 
        Version: 6.2
        PR: #7491

     - Touch:
`server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`,
and `server.latency_events`
     
     - Harm Level: High
Potentially corrupts the memory data of `cmd->latency_histogram`,
`server.slowlog`, and `server.latency_events`

     - Solution:
Differentiate whether the call to moduleBlockedClientTimedOut() comes
from the module or the main thread.
Since we can't know if RM_UnblockClient() comes from module threads, we
always assume it does and
let `updateStatsOnUnblock()` asynchronously update the unblock status.
     
* When an error reply is issued in timeout_callback(), ctx is not
thread-safe, eventually leading to race conditions in `afterErrorReply`.

     - Introduced: 
        Version: 6.2
        PR: #8217

     - Touch
       `server.stat_total_error_replies`, `server.errors`, 

     - Harm Level: High
       Potentially corrupts the memory data of `server.errors`
   
      - Solution: 
Create the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`,
and reply errors to the client asynchronously.

2. Made RM_Reply*() family API thread-safe
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408707239
Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch
server.current_client

    - Introduced: 
       Version: 7.2.0
       PR: #12326

   - Harm Level: None
Since the module fake client won't have the `CLIENT_PUSHING` flag, even
if we touch server.current_client,
     we can still exit after `c->flags & CLIENT_PUSHING`.

   - Solution:
      Check `c->flags & CLIENT_PUSHING` earlier.

3. Made freeClient() thread-safe
    Fix #12785

    - Introduced: 
       Version: 4.0
Commit:
3fcf959e60

    - Harm Level: Moderate
       * Trigger assertion
It happens when the module thread calls freeClient while the io-thread
is in progress,
which just triggers an assertion and doesn't cause any race conditions.

* Touch `server.current_client`, `server.stat_clients_type_memory`, and
`clientMemUsageBucket->clients`.
It happens between the main thread and the module threads, may cause
data corruption.
1. Erroneously resetting `server.current_client` to NULL; theoretically
this won't happen,
because the module has already reset `server.current_client` to the old
value before entering freeClient.
2. corrupts `clientMemUsageBucket->clients` in
updateClientMemUsageAndBucket().
3. Causes server.stat_clients_type_memory memory statistics to be
inaccurate.
    
    - Solution:
* No longer counts memory usage on fake clients, to avoid updating
`server.stat_clients_type_memory` in freeClient.
* No longer resetting `server.current_client` in unlinkClient, because
the fake client won't be evicted or disconnected in the mid of the
process.
* Only assert `io_threads_op == IO_THREADS_OP_IDLE` if c is
not a fake client.

4. Fixed free client args without GIL
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408706695
When freeing retained strings in the module thread (refcount decr), or
using them in some way (refcount incr), we should do so while holding
the GIL,
otherwise, they might be simultaneously freed while the main thread is
processing the unblock client state.

    - Introduced: 
       Version: 6.2.0
       PR: #8141

   - Harm Level: Low
     Trigger assertion or double free or memory leak. 

   - Solution:
Document that module API users need to ensure any access to these
retained strings is done with the GIL locked (a sketch follows at the
end of this message).

5. Fix adding fake client to server.clients_pending_write
    It will incorrectly log the memory usage for the fake client.
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1851899163

    - Introduced: 
       Version: 4.0
Commit:
9b01b64430

    - Harm Level: None
      Only result in NOP

    - Solution:
       * Don't add fake client into server.clients_pending_write
* Add c->conn assertion for updateClientMemUsageAndBucket() and
updateClientMemoryUsage() to avoid same
         issue in the future.
So now it will be the responsibility of the caller of both of them to
avoid passing in a fake client.

6. Fix calling RM_BlockedClientMeasureTimeStart() and
RM_BlockedClientMeasureTimeEnd() without GIL
    - Introduced: 
       Version: 6.2
       PR: #7491

   - Harm Level: Low
Causes inaccuracies in command latency histogram and slow logs, but does
not corrupt memory.

   - Solution:
Module API users who know that non-thread-safe APIs will be used in
multi-threading need to take responsibility for protecting them with
their own locks instead of the GIL, as using the GIL is too expensive.

### Other issue
1. RM_Yield is not thread-safe, fixed via #12905.

### Summary
1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and
`RM_Yield`, potentially preventing memory corruption, data disorder, or
assertion.
2. Updated docs and module test to clarify module API users'
responsibility for locking non-thread-safe APIs in multi-threading, such
as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(),
RM_RetainString(), and RM_HoldString().

### About backporting to 7.2
1. The implementation of (1) is not entirely satisfying; more eyes are
welcome.
2. (2) and (3) can safely be backported.
3. (4) and (6) just modify the module tests and update the
documentation, so no backport is needed.
4. (5) is harmless, so no backport is needed.
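
As a companion to point (4), a hedged sketch of the documented rule using the public module API; the worker function and the retained-string plumbing are illustrative:
```c
#include "redismodule.h"
#include <stddef.h>

/* A background thread must hold the GIL while dropping its reference to a
 * string retained earlier with RedisModule_RetainString(). */
void *bg_worker(void *arg) {
    RedisModuleBlockedClient *bc = arg;
    RedisModuleCtx *ctx = RedisModule_GetThreadSafeContext(bc);
    RedisModuleString *retained = NULL; /* ... the string retained earlier ... */

    RedisModule_ThreadSafeContextLock(ctx);   /* take the GIL */
    if (retained) RedisModule_FreeString(ctx, retained); /* refcount decr under GIL */
    RedisModule_ThreadSafeContextUnlock(ctx); /* release the GIL */

    RedisModule_UnblockClient(bc, NULL);
    RedisModule_FreeThreadSafeContext(ctx);
    return NULL;
}
```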

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-01-19 15:12:49 +02:00
Chen Tianjie
f81c3fd89e
Optimize dictTypeResizeAllowed to avoid mistaken OOM judgement. (#12950)
When doing dict resizing, dictTypeResizeAllowed is used to judge whether
the new allocated memory for rehashing would cause OOM.

However, when shrinking, we allocate `_dictNextExp(d->ht_used[0])` bytes of
memory, while in `dictTypeResizeAllowed` we still use
`_dictNextExp(d->ht_used[0]+1)` as the newly allocated memory size. This
overestimates the memory used by shrinking under certain conditions,
causing a false OOM judgement.
2024-01-18 16:35:12 +02:00
Binbin
1c7eb0ad37
Fix minor memory leaks in dictTest (#12962)
Introduced in #12952, reported by valgrind.
2024-01-18 16:32:04 +02:00
Binbin
0e5a4a27ea
Call emptyData when disk-based sync rdbLoad fails (#12510)
We already do this in diskless on-empty-db mode: when diskless
loading fails, we call emptyData to remove the half-loaded
data in case we started with an empty replica.

Now, when a disk-based sync rdbLoad fails, we call emptyData
too, in case it loaded partially incomplete data.

When the replica attempts another re-sync, it'll empty the dataset
again anyway, so this affects two things:
1. memory consumption in the time gap until the next rdb loading begins
2. if the unsynced replica is for some reason promoted, it would have kept
  the partial dataset instead of being empty.
2024-01-18 16:28:52 +02:00
Binbin
29e6245a05
Fix unexpected resize causing test failure (#12960)
Before #12850, we will only try to shrink the dict in serverCron,
which we can control by using a child process, but now every time
we delete a key, the shrink check will be called.

In these tests (added in #12802), we meant to disable resizing,
but during the delete the dict meets the force-shrink condition,
e.g. 2 / 128 = 0.015 < 0.02, so the delete triggers a force resize
and causes the test to fail.

In this commit, we keep the load factor at 3 / 128 = 0.023,
that is, it does not meet the force shrink.
2024-01-18 11:19:29 +02:00
Binbin
14b1edfd99
Fix dict resize ratio checks, avoid precision loss from integer division (#12952)
In the past we used integer division to compare ratios. Assume
we have the following data when expanding:
```
used / size > 5
`80 / 16 > 5` is false
`81 / 16 > 5` is false
`95 / 16 > 5` is false
`96 / 16 > 5` is true
```

Because the integer result is truncated, our resize breaks the ratio
constraint. This has existed since the beginning and resulted in
us not strictly following the ratio (shrink has the same issue).

This PR changes the check to multiplication, avoiding floating-point
calculations.
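
A self-contained demonstration of the truncation problem and the multiplication fix, using the ratio and sizes from the example above:
```c
#include <stdio.h>

int main(void) {
    unsigned long size = 16, ratio = 5;
    for (unsigned long used = 80; used <= 96; used++) {
        int by_division = (used / size) > ratio;  /* old check: truncates */
        int by_multiply = used > (size * ratio);  /* new check: exact */
        if (by_division != by_multiply)
            printf("used=%lu: division=%d, multiplication=%d\n",
                   used, by_division, by_multiply);
    }
    return 0; /* prints used=81..95: the cases the old check missed */
}
```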
2024-01-18 11:16:50 +02:00
Binbin
131d95f203
Fix race in slot dict resize test (#12942)
The test has a race:
```
*** [err]: Redis can rewind and trigger smaller slot resizing in tests/unit/other.tcl
Expected '[Dictionary HT]
Hash table 0 stats (main hash table):
 table size: 12
 number of elements: 2
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
' to match '*table size: 8*' (context: type eval line 12 cmd {assert_match "*table size: 8*" [r debug HTSTATS 0]} proc ::test)
```

When `r del "{alice}$j"` is executed in the loop and the keys are
deleted down to [9, 12], the load factor meets HASHTABLE_MIN_FILL;
if serverCron happens to trigger a slot dict resize, the test
will fail, because there is no way to meet HASHTABLE_MIN_FILL in
the subsequent dels.

The solution is to avoid triggering the resize in advance. We can
use multi to delete them at once, or we can disable the resize.
Since we disabled resize in the previous test, the fix also uses
the method of disabling resize.

The test is introduced in #12802.
2024-01-17 08:46:09 +02:00
Binbin
ecc31bc697
Updated comments on dictResizeEnable for new dict shrink (#12946)
The new shrink was added in #12850.
Also updated outdated comments, see #11692.
2024-01-15 10:28:24 +02:00
Yanqi Lv
e2b7932b34
Shrink dict when deleting dictEntry (#12850)
When we insert entries into dict, it may autonomously expand if needed.
However, when we delete entries from dict, it doesn't shrink to the
proper size. If there are few entries in a very large dict, it may cause
huge waste of memory and inefficiency when iterating.

The main keyspace dicts (keys and expires) are shrunk by cron
(`tryResizeHashTables` calls `htNeedsResize` and `dictResize`),
and some data structures such as zset and hash also do that (call
`htNeedsResize`) right after a loop of calls to `dictDelete`,
but many other dicts are completely missing that call (they can only
expand).

In this PR, we provide the ability to automatically shrink the dict when
deleting. The condition triggering the shrinking is the same as the one
`htNeedsResize` used to have, i.e. we expand when we're over 100%
utilization, and shrink when we're below 10% utilization.

Additionally:
* Add `dictPauseAutoResize` so that flows that do mass deletions, will
only trigger shrinkage at the end.
* Rename `dictResize` to `dictShrinkToFit` (same logic as it used to
have, but better name describing it)
* Rename `_dictExpand` to `_dictResize` (same logic as it used to have,
but better name describing it)
 
related to discussion
https://github.com/redis/redis/pull/12819#discussion_r1409293878
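
A fragment sketch of a mass-deletion flow under the new behavior (dictPauseAutoResize comes from this PR; the resume/shrink helper names below are illustrative):
```c
dictPauseAutoResize(d);        /* don't check shrink on every single delete */
for (int j = 0; j < n; j++)
    dictDelete(d, keys[j]);
dictResumeAutoResize(d);       /* illustrative counterpart name */
dictShrinkIfNeeded(d);         /* one shrink check at the end: below 1/10
                                * utilization -> dictShrinkToFit */
```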

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2024-01-15 08:20:53 +02:00
zhaozhao.zz
bb2b6e2927
fix scripts access wrong slot if they disagree with pre-declared keys (#12906)
Regarding how to obtain the hash slot of a key, there is an optimization
in `getKeySlot()`, it is used to avoid redundant hash calculations for
keys: when the current client is in the process of executing a command,
it can directly use the slot of the current client because the slot to
access has already been calculated in advance in `processCommand()`.

However, scripts are a special case where, in default mode or with
`allow-cross-slot-keys` enabled, they are allowed to access keys beyond
the pre-declared range. This means that the keys they operate on may not
belong to the slot of the pre-declared keys. Currently, when the
commands in a script are executed, the slot of the original client
(i.e., the current client) is not correctly updated, leading to
subsequent access to the wrong slot.

This PR fixes the above issue. When checking the cluster constraints in
a script, the slot to be accessed by the current command is set for the
original client (i.e., the current client). This ensures that
`getKeySlot()` gets the correct slot cache.

Additionally, the following modifications are made:

1. The 'sort' and 'sort_ro' commands use `getKeySlot()` instead of
`c->slot`, because the client could be an engine client in a script,
which could lead to potential bugs.
2. `getKeySlot()` is also used in pubsub to obtain the slot for the
channel, standardizing the way slots are retrieved.
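
A self-contained model of the slot cache being fixed here (types are reduced to the relevant field, and keyHashSlot below is a stand-in for the real CRC16-with-hashtag helper):
```c
typedef struct client { int slot; /* -1 when not yet computed */ } client;

/* Stand-in hash; the real helper computes CRC16(key) % 16384 with
 * {hashtag} handling. */
static unsigned int keyHashSlot(const char *key, int keylen) {
    unsigned int h = 0;
    for (int i = 0; i < keylen; i++) h = h * 31 + (unsigned char)key[i];
    return h % 16384;
}

/* Fast path: the slot was pre-computed for this command (and, after this
 * fix, re-set by the script's cluster-constraint check). */
unsigned int getKeySlotModel(client *cur, const char *key, int keylen) {
    if (cur && cur->slot != -1) return (unsigned int)cur->slot;
    return keyHashSlot(key, keylen); /* slow path: hash the key */
}
```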
2024-01-15 09:57:12 +08:00
Binbin
284ef21ea0
Fix fd check in memtest_test_linux_anonymous_maps (#12943)
The open function returns a fd on success or -1 on failure,
here we should check fd != -1, otherwise -1 will be judged
as success.

This closes #12938.
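
The corrected pattern, as a small self-contained example (path and function name illustrative):
```c
#include <fcntl.h>
#include <unistd.h>

int readAnonymousMaps(void) {
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd == -1) return -1; /* 0 is a valid descriptor; only -1 is an error */
    /* ... read and parse the mappings ... */
    close(fd);
    return 0;
}
```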
2024-01-14 11:18:17 +02:00
Chen Tianjie
87786342a5
Correct bytes_per_key computing. (#12897)
Change the calculation method of bytes_per_key to make it closer to
the true average key size. The calculation method is as follows:

mh->bytes_per_key = mh->total_keys ? (mh->dataset / mh->total_keys) : 0;
2024-01-12 11:58:53 +08:00
Harkrishn Patro
964f4a4576
Avoid double free of cluster link (#12930)
Avoid a crash while performing `DEBUG CLUSTERLINK KILL` multiple times
(cluster link might not be created/valid).
2024-01-11 15:59:22 -08:00
bentotten
b3aaa0a136
When one shard, sole primary node marks potentially failed replica as FAIL instead of PFAIL (#12824)
Fixes issue where a single primary cannot mark a replica as failed in a
single-shard cluster.
2024-01-11 15:48:19 -08:00
Binbin
b351a04b1e
Add announced-endpoints test to all_tests and fix tls related tests (#12927)
The test was introduced in #10745, but we forgot to add it to the
test_helper.tcl, so our CI did not actually run it. This PR adds it
and ensures it passes CI tests.
2024-01-09 18:18:59 -08:00
Oran Agra
f7b1d0287d
Fix possible corruption in sdsResize (CVE-2023-41056) (#12924)
#11766 introduced a bug in sdsResize where it could forget to update the
sds type in the sds header and then cause an overflow in sdsalloc. It
looks like the only implication of that is a possible assertion in HLL,
but it's hard to rule out possible heap corruption issues with
clientsCronResizeQueryBuffer.
2024-01-09 13:51:56 +02:00
Madelyn Olson
8bb9a2895e
Address some failures with new tests for improving debug report (#12915)
Fix a daily test failure because alpine doesn't support stack traces and
add in an extra assertion related to making sure the stack trace was
printed twice.
2024-01-08 17:56:06 -08:00
Binbin
14e4a9835a
Fix minor fd leak in rdbSaveToSlavesSockets (#12919)
We should close server.rdb_child_exit_pipe when redisFork fails,
otherwise the pipe fd will be leaked.

Just a cleanup.
2024-01-08 17:36:34 +02:00
Andy Pan
50b8b99763
Re-indent code and reduce code being complied on Solaris for anetKeepAlive (#12914)
This is a follow-up PR for #12782, in which we introduced nested
preprocessor directives for TCP keep-alive on Solaris and added
redundant indentation for code. Besides, it could result in unreachable
code due to the lack of `#else` on the latest Solaris 11.4 where
`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are available. As a
result, this PR does three main things:

- Eliminate the redundant indentation for C code in nested preprocessor
directives
- Add `#else` directives and move the `TCP_KEEPALIVE_THRESHOLD` +
`TCP_KEEPALIVE_ABORT_THRESHOLD` settings under them, avoiding unreachable
code and compiler warnings when `#if defined(TCP_KEEPIDLE) &&
defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)` is met on Solaris 11.4
- Remove some trailing whitespace in comments
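
A hedged sketch of the resulting directive layout (the setsockopt calls are left as comments, since the option values are system-specific):
```c
static void setTcpKeepaliveOptions(int fd) {
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Most Unix-like systems, including Solaris 11.4+: fine-grained knobs. */
    /* setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));  */
    /* setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)); */
    /* setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));   */
#else
    /* Older Solaris: only the coarse threshold options are available. */
    /* setsockopt(fd, IPPROTO_TCP, TCP_KEEPALIVE_THRESHOLD, ...);       */
    /* setsockopt(fd, IPPROTO_TCP, TCP_KEEPALIVE_ABORT_THRESHOLD, ...); */
#endif
    (void)fd;
}
```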
2024-01-08 11:12:24 +02:00
Yanqi Lv
c452e414a8
Optimize performance when many clients [p|s]unsubscribe simultaneously (#12838)
I've been testing the performance of the Pub/Sub commands recently. I found
that if many clients unsubscribe or are killed simultaneously, Redis needs
a long time to deal with it.

In my experiment, I set up 5000 clients, each subscribing to 100
channels. Then I called `client kill type pubsub` to simulate the
situation where clients unsubscribe from all channels at the same time, and
measured the execution time. The result shows that it takes about 23s.
Using _perf_, I found that `listSearchKey` in
`pubsubUnsubscribeChannel` costs more than 90% of the CPU time. I think we
can optimize this situation.

In this PR, I replace the list with a dict to track the clients subscribed
to the channel more efficiently. It changes O(N) to O(1) in the search
phase. Then I repeated the experiment as above. The results are as
follows.

|                    | Execution Time (s) | used_memory (MB) |
| :----------------- | :----------------: | :--------------: |
| unstable (1bd0b54) | 23.734             | 65.41            |
| optimize-pubsub    | 0.288              | 67.66            |

Thanks to #11595, I use a no-value dict, and the results show that the
performance improves significantly while the memory usage only increases
slightly.

Notice:

- This PR causes a performance degradation of about 20% in the
`[p|s]subscribe` commands, but won't freeze Redis.
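
A fragment sketch of the data-structure swap (Redis dict/list API; variable names illustrative):
```c
/* Before: clients subscribed to a channel lived in a list, so removing
 * one client cost O(N):
 *     listNode *ln = listSearchKey(clients_list, c);
 *     if (ln) listDelNode(clients_list, ln);
 * After: they live in a no-value dict keyed by the client pointer: */
dictDelete(clients_dict, c); /* O(1) unsubscribe */
```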
2024-01-08 10:32:31 +02:00
debing.sun
4730563e93
Change destination key's key-spec flag from RW to OW for SINTERSTORE command (#12917)
In #10122, we set the destination key's flag of SINTERSTORE to `RW`, 
however, this command doesn't actually read or modify the destination
key, just overwrites it.
Therefore, we change it to `OW` similarly to all other *STORE commands.
2024-01-08 10:17:13 +02:00
Binbin
5b0c6a8255
Fix CLUSTER SHARDS crash in 7.0/7.2 mixed clusters where shard ids are not sync (#12832)
Crash reported in #12695. In the process of upgrading the cluster from
7.0 to 7.2, because the 7.0 nodes will not gossip shard id, in 7.2 we
will rely on shard id to build the server.cluster->shards dict.

In some cases, for example, the 7.0 master node and the 7.2 replica node.
From the view of 7.2 replica node, the cluster->shards dictionary does not
have its master node. In this case calling CLUSTER SHARDS on the 7.2 replica
node may crash.

We should fix the underlying assumption of updateShardId, which is that the
shard dict should be always in sync with the node's shard_id. The fix was
suggested by PingXie, see more details in #12695.
2024-01-07 20:54:41 -08:00
debing.sun
ca1f67af80
Make RM_Yield thread-safe (#12905)
## Issues and solutions from #12817
1. Touching ProcessingEventsWhileBlocked and calling moduleCount() without
the GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    #1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    #1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    #3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    #3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    #3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix as the comments:
https://github.com/redis/redis/pull/12817#discussion_r1436427232
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities (we
assume the caller keeps the GIL):
1. The main thread is in the middle of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield, when the main thread may already be in the
acquiring-GIL state.
    
2. The main thread is in the acquiring-GIL state.
The module thread releases the GIL, yielding the CPU to give the main thread
an opportunity to start
event processing, and then acquires the GIL again once the main thread
releases it.
This is what
https://github.com/redis/redis/pull/12817#discussion_r1436427232
mentioned direction.
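
For context, a hedged usage sketch of the API being made thread-safe (the loop and chunk size are illustrative; RedisModule_Yield is the public name of RM_Yield):
```c
#include "redismodule.h"

/* A long-running background job that holds the GIL can periodically yield
 * so the main thread gets a chance to process events. */
void heavy_background_step(RedisModuleCtx *ctx) {
    for (long i = 0; i < 10000000; i++) {
        /* ... one chunk of work under the GIL ... */
        if ((i & 0xFFFF) == 0)
            RedisModule_Yield(ctx, REDISMODULE_YIELD_FLAG_CLIENTS,
                              "busy running background step");
    }
}
```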

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2024-01-07 12:10:29 +02:00
Binbin
4cae66f5e8
Use shard-id of the master if the replica does not support shard-id (#12805)
If there are nodes in the cluster that do not support shard-id, they
will gossip shard-id. From the perspective of nodes that support shard-id,
their shard-id is meaningless (since shard-id is randomly generated when
we create a node.)

Nodes that support shard-id will save the shard-id information in nodes.conf.
If the node is restarted according to nodes.conf, the server will report a
corrupted cluster config file error. Because auxShardIdSetter will reject
configurations with inconsistent master-replica shard-ids.

A cluster-wide consensus for the node's shard_id is not necessary. The key
is maintaining consistency of the shard_id on each individual 7.2 node.
As the cluster progressively upgrades to version 7.2, we can expect the
shard_ids across all nodes to naturally converge and align.

In this PR, when processing the gossip, if sender is a replica and does not
support shard-id, set the shard_id to the shard_id of its master.
2024-01-06 20:24:41 -08:00
dependabot[bot]
38f0234946
Bump cross-platform-actions/action from 0.21.1 to 0.22.0 (#12904)
Bumps
[cross-platform-actions/action](https://github.com/cross-platform-actions/action)
from 0.21.1 to 0.22.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/cross-platform-actions/action/releases">cross-platform-actions/action's
releases</a>.</em></p>
<blockquote>
<h2>Cross Platform Action 0.22.0</h2>
<h3>Added</h3>
<ul>
<li>
<p>Added support for using the action in multiple steps in the same job
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/26">#26</a>).
All the inputs need to be the same for all steps, except for the
following
inputs: <code>sync_files</code>, <code>shutdown_vm</code> and
<code>run</code>.</p>
</li>
<li>
<p>Added support for specifying that the VM should not shutdown after
the action
has run. This adds a new input parameter: <code>shutdown_vm</code>. When
set to <code>false</code>,
this will hopefully mitigate very frequent freezing of VM during
teardown (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Always terminate VM instead of shutting down. This is more efficient
and this
will hopefully mitigate very frequent freezing of VM during teardown
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
<li>
<p>Use <code>unsafe</code> as the cache mode for QEMU disks. This should
improve performance (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/67">#67</a>).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/cross-platform-actions/action/blob/master/changelog.md">cross-platform-actions/action's
changelog</a>.</em></p>
<blockquote>
<h2>[0.22.0] - 2023-12-27</h2>
<h3>Added</h3>
<ul>
<li>
<p>Added support for using the action in multiple steps in the same job
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/26">#26</a>).
All the inputs need to be the same for all steps, except for the
following
inputs: <code>sync_files</code>, <code>shutdown_vm</code> and
<code>run</code>.</p>
</li>
<li>
<p>Added support for specifying that the VM should not shutdown after
the action
has run. This adds a new input parameter: <code>shutdown_vm</code>. When
set to <code>false</code>,
this will hopefully mitigate very frequent freezing of VM during
teardown (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Always terminate VM instead of shutting down. This is more efficient
and this
will hopefully mitigate very frequent freezing of VM during teardown
(<a
href="https://redirect.github.com/cross-platform-actions/action/issues/61">#61</a>,
<a
href="https://redirect.github.com/cross-platform-actions/action/issues/72">#72</a>).</p>
</li>
<li>
<p>Use <code>unsafe</code> as the cache mode for QEMU disks. This should
improve performance (<a
href="https://redirect.github.com/cross-platform-actions/action/issues/67">#67</a>).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5800fa0060"><code>5800fa0</code></a>
Release 0.22.0</li>
<li><a
href="20ad4b2ceb"><code>20ad4b2</code></a>
Fix <a
href="https://redirect.github.com/cross-platform-actions/action/issues/67">#67</a>:
Use <code>unsafe</code> as the cache mode disks</li>
<li><a
href="d9184930c3"><code>d918493</code></a>
Always terminate VM instead of shutting down.</li>
<li><a
href="626f1d6c95"><code>626f1d6</code></a>
Fix error when terminating the VM</li>
<li><a
href="d59f08dc5c"><code>d59f08d</code></a>
Print stack trace for uncaught exceptions</li>
<li><a
href="7f2fab9c56"><code>7f2fab9</code></a>
Revert &quot;Run SSH in verbose mode when debug mode is
enabled&quot;</li>
<li><a
href="0f566c356e"><code>0f566c3</code></a>
[no ci] Update the changelog</li>
<li><a
href="b7f77446bb"><code>b7f7744</code></a>
[no ci] Fix spelling</li>
<li><a
href="9894a9b118"><code>9894a9b</code></a>
Wrap <code>host</code> module in namespace</li>
<li><a
href="87fdd346a2"><code>87fdd34</code></a>
Fix broken test-vm-shutdown tests</li>
<li>Additional commits viewable in <a
href="https://github.com/cross-platform-actions/action/compare/v0.21.1...v0.22.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=cross-platform-actions/action&package-manager=github_actions&previous-version=0.21.1&new-version=0.22.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-04 22:38:33 +02:00
Lior Kogan
5189838350
Update CONTRIBUTING.md (#12907)
- Referring to Redis Discord channel instead of the mailing list.
- Referring to the licensing instead of repeating it.
2024-01-03 17:21:19 +02:00
Madelyn Olson
068051e378
Handle recursive serverAsserts and provide more information for recursive segfaults (#12857)
This change tries to make two failure modes a bit easier to investigate:
1. If a serverPanic or serverAssert occurs during the info (or module)
printing, it will recursively panic, which is a lot of fun as it will
just keep recursively printing. It will eventually stack overflow, but
will generate a lot of text in the process.
2. When a segfault happens during the segfault handler, no information
is communicated other than it happened. This can be problematic because
`info` may help diagnose the real issue, but without fixing the
recursive crash it might be hard to get at that info.
2024-01-02 18:20:22 -08:00
AshMosh
c3f8b542ee
Manage number of new connections per cycle (#12178)
There are situations (especially in TLS) in which the engine gets too occupied managing a large number of new connections. Existing connections may time out while the server is processing the new connections' initial TLS handshakes, which may cause new connections to be established, perpetuating the problem. To better manage the tradeoff between new-connection rate and other workloads, this change adds a new config to manage the maximum number of new connections per event loop cycle, instead of using a predetermined number (currently 1000).

This change introduces two new configurations, max-new-connections-per-cycle and max-new-tls-connections-per-cycle. The default value for TCP connections is 10 per cycle and the default value for TLS connections is 1 per cycle.
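
An illustrative redis.conf excerpt with the two new directives at their stated defaults:
```
max-new-connections-per-cycle 10
max-new-tls-connections-per-cycle 1
```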
---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-01-02 15:15:03 -08:00
Chen Tianjie
9d0158bf89
Reorder signalModifiedKey in xaddCommand. (#12895)
This PR is a supplement to #11144, moving `signalModifiedKey` in
`xaddCommand` after the trimming, to ensure the eventual consistency of
the key's state. Currently there is no problem with Redis, but it is better
to avoid issues in future development on Redis.
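
A fragment sketch of the reorder (the trim call name is illustrative; signalModifiedKey is the real keyspace-notification hook):
```c
/* Before: signalModifiedKey(c, c->db, key); ... trimmed afterwards. */
streamTrimToMaxlen(s, maxlen);      /* illustrative trim step */
signalModifiedKey(c, c->db, key);   /* after: watchers see the final,
                                     * trimmed state of the stream */
```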
2023-12-28 13:29:27 +02:00
dependabot[bot]
2c5b51ad26
Bump github/codeql-action from 2 to 3 (#12869)
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 2 to 3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>CodeQL Bundle v2.15.4</h2>
<p>Bundles CodeQL CLI v2.15.4</p>
<ul>
<li>(<a
href="https://github.com/github/codeql-cli-binaries/blob/HEAD/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql-cli-binaries/releases/tag/v2.15.4">release</a>)</li>
</ul>
<p>Includes the following CodeQL language packs from <a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4"><code>github/codeql@codeql-cli/v2.15.4</code></a>:</p>
<ul>
<li><code>codeql/cpp-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/cpp/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/cpp/ql/src">source</a>)</li>
<li><code>codeql/cpp-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/cpp/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/cpp/ql/lib">source</a>)</li>
<li><code>codeql/csharp-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/csharp/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/csharp/ql/src">source</a>)</li>
<li><code>codeql/csharp-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/csharp/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/csharp/ql/lib">source</a>)</li>
<li><code>codeql/go-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/go/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/go/ql/src">source</a>)</li>
<li><code>codeql/go-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/go/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/go/ql/lib">source</a>)</li>
<li><code>codeql/java-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/java/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/java/ql/src">source</a>)</li>
<li><code>codeql/java-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/java/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/java/ql/lib">source</a>)</li>
<li><code>codeql/javascript-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/javascript/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/javascript/ql/src">source</a>)</li>
<li><code>codeql/javascript-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/javascript/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/javascript/ql/lib">source</a>)</li>
<li><code>codeql/python-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/python/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/python/ql/src">source</a>)</li>
<li><code>codeql/python-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/python/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/python/ql/lib">source</a>)</li>
<li><code>codeql/ruby-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/ruby/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/ruby/ql/src">source</a>)</li>
<li><code>codeql/ruby-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/ruby/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/ruby/ql/lib">source</a>)</li>
<li><code>codeql/swift-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/swift/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/swift/ql/src">source</a>)</li>
<li><code>codeql/swift-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/swift/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.4/swift/ql/lib">source</a>)</li>
</ul>
<h2>CodeQL Bundle</h2>
<p>Bundles CodeQL CLI v2.15.3</p>
<ul>
<li>(<a
href="https://github.com/github/codeql-cli-binaries/blob/HEAD/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql-cli-binaries/releases/tag/v2.15.3">release</a>)</li>
</ul>
<p>Includes the following CodeQL language packs from <a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3"><code>github/codeql@codeql-cli/v2.15.3</code></a>:</p>
<ul>
<li><code>codeql/cpp-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/cpp/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/cpp/ql/src">source</a>)</li>
<li><code>codeql/cpp-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/cpp/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/cpp/ql/lib">source</a>)</li>
<li><code>codeql/csharp-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/csharp/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/csharp/ql/src">source</a>)</li>
<li><code>codeql/csharp-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/csharp/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/csharp/ql/lib">source</a>)</li>
<li><code>codeql/go-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/go/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/go/ql/src">source</a>)</li>
<li><code>codeql/go-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/go/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/go/ql/lib">source</a>)</li>
<li><code>codeql/java-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/java/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/java/ql/src">source</a>)</li>
<li><code>codeql/java-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/java/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/java/ql/lib">source</a>)</li>
<li><code>codeql/javascript-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/javascript/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/javascript/ql/src">source</a>)</li>
<li><code>codeql/javascript-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/javascript/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/javascript/ql/lib">source</a>)</li>
<li><code>codeql/python-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/python/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/python/ql/src">source</a>)</li>
<li><code>codeql/python-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/python/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/python/ql/lib">source</a>)</li>
<li><code>codeql/ruby-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/ruby/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/ruby/ql/src">source</a>)</li>
<li><code>codeql/ruby-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/ruby/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/ruby/ql/lib">source</a>)</li>
<li><code>codeql/swift-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/swift/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/swift/ql/src">source</a>)</li>
<li><code>codeql/swift-all</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/swift/ql/lib/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.3/swift/ql/lib">source</a>)</li>
</ul>
<h2>CodeQL Bundle</h2>
<p>Bundles CodeQL CLI v2.15.2</p>
<ul>
<li>(<a
href="https://github.com/github/codeql-cli-binaries/blob/HEAD/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql-cli-binaries/releases/tag/v2.15.2">release</a>)</li>
</ul>
<p>Includes the following CodeQL language packs from <a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.2"><code>github/codeql@codeql-cli/v2.15.2</code></a>:</p>
<ul>
<li><code>codeql/cpp-queries</code> (<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.2/cpp/ql/src/CHANGELOG.md">changelog</a>,
<a
href="https://github.com/github/codeql/tree/codeql-cli/v2.15.2/cpp/ql/src">source</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3a9f6a89e0"><code>3a9f6a8</code></a>
update javascript files</li>
<li><a
href="cc4fead714"><code>cc4fead</code></a>
update version in various hardcoded locations</li>
<li><a
href="183559cea8"><code>183559c</code></a>
Merge branch 'main' into update-bundle/codeql-bundle-v2.15.4</li>
<li><a
href="5b52b36d41"><code>5b52b36</code></a>
reintroduce PR check that confirm action can be still be compiled on
node16</li>
<li><a
href="5b19bef41e"><code>5b19bef</code></a>
change to node20 for all actions</li>
<li><a
href="f2d0c2e7ae"><code>f2d0c2e</code></a>
upgrade node type definitions</li>
<li><a
href="d651fbc494"><code>d651fbc</code></a>
change to node20 for all actions</li>
<li><a
href="382a50a028"><code>382a50a</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/2021">#2021</a>
from github/mergeback/v2.22.9-to-main-c0d1daa7</li>
<li><a
href="458b4226ad"><code>458b422</code></a>
Update checked-in dependencies</li>
<li><a
href="5e0f9dbc48"><code>5e0f9db</code></a>
Update changelog and version after v2.22.9</li>
<li>See full diff in <a
href="https://github.com/github/codeql-action/compare/v2...v3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github/codeql-action&package-manager=github_actions&previous-version=2&new-version=3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-28 11:32:23 +02:00
guybe7
12b611b374
WAITAOF: Try to wake blocked clients ASAP in the next beforeSleep (#12627)
In case server.fsynced_reploff changed (e.g. flushAppendOnly set it to
server.master_repl_offset in case there was nothing to fsync) we want to
avoid sleeping before the next beforeSleep so we can call
blockedBeforeSleep ASAP.
without that, in case there's no incoming traffic, we could be waiting
for the next cron timer event to wake us up.
2023-12-28 11:27:58 +02:00
Binbin
99c468c38c
Fix crash caused by pubsubShardUnsubscribeAllChannelsInSlot not deleting the client (#12896)
The code did not delete the corresponding node when traversing clients,
resulting in a loop and causing the dictDelete() == DICT_OK assertion to
fail.

In addition, did a cleanup, in the dictCreate scenario, we can avoid a
dictFind call since the dict is empty.

Issue was introduced in #12804.
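
A fragment sketch of the corrected traversal (Redis dict API; the bookkeeping comment stands in for the real unsubscribe logic):
```c
dictIterator *di = dictGetSafeIterator(clients);
dictEntry *de;
while ((de = dictNext(di)) != NULL) {
    client *c = dictGetKey(de);
    /* ... per-client unsubscribe bookkeeping ... */
    serverAssert(dictDelete(clients, c) == DICT_OK); /* the missing delete */
}
dictReleaseIterator(di);
```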
2023-12-28 08:32:51 +02:00
Binbin
5b1fe925f2
Adjust redis-cli --cluster create arity from -2 to -1 (#12892)
When arity is -2, it allows us to input two nodes, but returns:
```
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 2 nodes and 0 replicas per node.
*** At least 3 nodes are required.
```

When we input one node, it returns:
```
[ERR] Wrong number of arguments for specified --cluster sub command
```

Strictly speaking, -2 should also be rejected, because redis-cli
requires at least three nodes. However, the error message was not
very friendly, so we decided to change its arity to -1.

This closes #12891.
2023-12-28 08:26:23 +02:00
Chen Tianjie
8527959598
Replace slots_to_channels radix tree with slot specific dictionaries for shard channels. (#12804)
We have achieved replacing `slots_to_keys` radix tree with key->slot
linked list (#9356), and then replacing the list with slot specific
dictionaries for keys (#11695).

Shard channels behave just like keys in many ways, and we also need a
slots->channels mapping. Currently this is still done by using a radix
tree. So we should split `server.pubsubshard_channels` into 16384 dicts
and drop the radix tree, just like what we did to DBs.

Some benefits (basically the benefits of what we've done to DBs):
1. Optimize counting channels in a slot. This is currently used only in
removing channels in a slot. But this is potentially more useful:
sometimes we need to know how many channels there are in a specific slot
when doing slot migration. Counting is now implemented by traversing the
radix tree, and with this PR it will be as simple as calling `dictSize`,
from O(n) to O(1).
2. The radix tree in the cluster has been removed. The shard channel
names no longer require additional storage, which can save memory.
3. Potentially useful in slot migration, as shard channels are logically
split by slots, thus making it easier to migrate, remove or add as a
whole.
4. Avoid rehashing a big dict when there is a large number of channels.

Drawbacks:
1. Takes more memory than using radix tree when there are relatively few
shard channels.

What this PR does:
1. in cluster mode, split `server.pubsubshard_channels` into 16384
dicts, in standalone mode, still use only one dict.
2. drop the `slots_to_channels` radix tree.
3. to save memory (to solve the drawback above), all 16384 dicts are
created lazily, which means only when a channel is about to be inserted
to the dict will the dict be initialized, and when all channels are
deleted, the dict would delete itself.
4. use `server.shard_channel_count` to keep track of the number of all
shard channels.
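
A hedged sketch of the lazy creation in point 3, with illustrative names (`getShardChannelDict`, `shardChannelDictMaybeFree`, and `channelDictType` are assumptions, not the PR's exact identifiers):

```c
#include "dict.h" /* Redis dict API */

#define CLUSTER_SLOTS 16384

extern dictType channelDictType;                  /* assumed dict type for channels */
static dict *pubsubshard_channels[CLUSTER_SLOTS]; /* all NULL until first use */

/* Return the dict for a slot, creating it lazily on first insert. */
dict *getShardChannelDict(int slot, int create) {
    if (pubsubshard_channels[slot] == NULL && create)
        pubsubshard_channels[slot] = dictCreate(&channelDictType);
    return pubsubshard_channels[slot];
}

/* After a removal: if the slot's dict became empty, delete it entirely. */
void shardChannelDictMaybeFree(int slot) {
    dict *d = pubsubshard_channels[slot];
    if (d != NULL && dictSize(d) == 0) {
        dictRelease(d);
        pubsubshard_channels[slot] = NULL;
    }
}
```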

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-12-27 17:40:45 +08:00
Moshe Kaplan
fa751f9bef
config.c: Avoid leaking file handle if file is 0 bytes (#12828)
If fopen() is successful and redis_fstat determines that the file is 0
bytes, the file handle stored in fp will leak. This change closes the
filehandle stored in fp if the file is 0 bytes.

Second attempt at fixing Coverity 390029

This is a follow-up to #12796
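
The shape of the fix, as a minimal standalone sketch (an illustrative helper, not the actual config.c code; Redis's `redis_fstat` is essentially a wrapper around fstat):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Read a whole file, closing fp on every early-return path. */
char *readWholeFile(const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return NULL;

    struct stat sb;
    if (fstat(fileno(fp), &sb) == -1 || sb.st_size == 0) {
        fclose(fp); /* previously the handle leaked when the file was 0 bytes */
        return NULL;
    }
    char *buf = malloc(sb.st_size + 1);
    if (buf == NULL || fread(buf, 1, sb.st_size, fp) != (size_t)sb.st_size) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    buf[sb.st_size] = '\0';
    fclose(fp);
    return buf;
}
```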
2023-12-27 08:53:56 +02:00
sundb
bef5715374
Fix oom-score-adj test due to no permission (#12887)
Fix #12792

On Ubuntu 23 (Lunar), non-root users are not allowed to change the
oom_score_adj of a process to a value that is too low.
Since the terminal's default oom_score_adj is 200, if we run the test in a
terminal, we won't be able to set the oom_score_adj of the redis process
to 9 or 22, which is too low.

Reproduction on an Ubuntu 23 (Lunar) terminal:
```sh
$ cat /proc/`pgrep redis-server`/oom_score_adj
200
$ echo 100 > /proc/`pgrep redis-server`/oom_score_adj
# success without error
$ echo 99 > /proc/`pgrep redis-server`/oom_score_adj
echo: write error: Permission denied
```

As shown in the output above, we can only set the oom score of redis
processes as low as 100.
The test is modified so that oom_score_adj only increases
and never decreases.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2023-12-27 08:42:46 +02:00
Andy Pan
1aa633d61b
Implement TCP Keep-Alives across most Unix-like systems (#12782)
## TCP Keep-Alives

[TCP
Keep-Alives](https://datatracker.ietf.org/doc/html/rfc9293#name-tcp-keep-alives)
provides a way to detect whether a TCP connection is alive or dead,
which can be useful for reducing system resources by cleaning up dead
connections.

At present, `redis` has full support for TCP Keep-Alives on Linux and
partial support on macOS.

This PR intends to complete the rest.

## Unix-like OS's support

`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are not included in
the POSIX standard for `setsockopts`, while these three socket options
are widely available on most Unix-like systems and Windows.

### References

- [AIX](https://www.ibm.com/support/pages/ibm-aix-tcp-keepalive-probes)
- [DragonflyBSD](https://man.dragonflybsd.org/?command=tcp&section=4)
- [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=tcp)
-
[HP-UX](https://docstore.mik.ua/manuals/hp-ux/en/B2355-60130/TCP.7P.html)
- [illumos](https://illumos.org/man/4P/tcp)
- [Linux](https://man7.org/linux/man-pages/man7/tcp.7.html)
- [NetBSD](https://man.netbsd.org/NetBSD-8.0/tcp.4)
-
[Windows](https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-tcp-socket-options)

### Mac OS

In earlier versions, macOS only supported setting `TCP_KEEPALIVE` (the
equivalent of `TCP_KEEPIDLE` on other platforms), but since macOS 10.8
it has supported `TCP_KEEPINTVL` and `TCP_KEEPCNT`.

Check out [this mailing
list](https://lists.apple.com/archives/macnetworkprog/2012/Jul/msg00005.html)
and [the source
code](https://github.com/apple/darwin-xnu/blob/main/bsd/netinet/tcp.h#L215-L230)
for more details.

### Solaris

Solaris claims to support TCP Keep-Alives, but `TCP_KEEPIDLE`,
`TCP_KEEPINTVL`, and `TCP_KEEPCNT` were not available on Solaris until
the latest version, 11.4. Therefore, on earlier Solaris versions we need
to simulate the mechanism via
`TCP_KEEPALIVE_THRESHOLD` + `TCP_KEEPALIVE_ABORT_THRESHOLD`.

- [Solaris
11.3](https://docs.oracle.com/cd/E86824_01/html/E54777/tcp-7p.html)
- [Solaris
11.4](https://docs.oracle.com/cd/E88353_01/html/E37851/tcp-4p.html)
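
A portable sketch of the idea (an illustrative helper under the assumptions above, not the PR's actual anet.c changes; the Solaris `TCP_KEEPALIVE_THRESHOLD` fallback is omitted):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive and, where the platform exposes the options,
 * tune idle time, probe interval and probe count (all in seconds). */
int setTcpKeepAlive(int fd, int idle, int intvl, int cnt) {
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof(yes)) == -1)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
        return -1;
#elif defined(TCP_KEEPALIVE) /* macOS spelling of the idle-time option */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPALIVE, &idle, sizeof(idle)) == -1)
        return -1;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
        return -1;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
        return -1;
#endif
    return 0;
}
```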

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-12-26 18:44:18 +02:00
Jeff Liu
27a8e3b04e
fix missing comments (#12878)
add a missing comment for `dont_compress` and fix the bits calculation
2023-12-25 19:30:05 -08:00
zalj
baf5699d77
fix comment of aeProcessEvents (#12884)
The implementation of aeProcessEvents seems to have different behavior
from the top comment: it processes file events first, then time events.
2023-12-25 18:58:14 -08:00
Binbin
71f31da66f
Add restart option to create-cluster script (#12885)
Previously, when testing and debugging the cluster code, you needed
to stop the cluster after making changes and then start the
cluster again. Add a restart option for ease of use.
2023-12-25 18:36:44 -08:00
Slava Koyfman
20214b26a4
Don't disconnect all clients in ACL LOAD (#12171)
Previous implementation would disconnect _all_ clients when running `ACL
LOAD`, which wasn't very useful.

This change brings the behavior in line with that of `ACL SETUSER`, `ACL
DELUSER`, in that only clients whose user is deleted or clients
subscribed to channels which they no longer have access to will be
disconnected.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2023-12-24 11:56:44 +02:00
Binbin
09e0d338f5
redis-cli adds -4 / -6 options to determine IPV4 / IPV6 priority in DNS lookup (#11315)
In this PR, we added -4 and -6 options to redis-cli to determine
IPv4 / IPv6 priority in DNS lookup.
This was mentioned in
https://github.com/redis/redis/pull/11151#issuecomment-1231570651

For now it's only used in CLUSTER MEET.

The options also make it possible to reliably test DNS lookup in CI;
using them, we can add some localhost tests for #11151.

The commit was cherry-picked from #11151, back then we decided to split
the PR.

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-12-24 10:40:34 +02:00
Binbin
23e980e77a
Move cliVersion to cli_common and add --version support for redis-check-aof (#10856)
This lets us see which version of redis this tool is part of,
similarly to redis-cli, redis-benchmark and redis-check-rdb.

redis-rdb-check and redis-aof-check are actually symlinks to redis,
so they will directly use getVersion in server, the format became:
```
{title} v={redis_version} sha={sha}:{dirty} malloc={malloc} bits={bits} build={build}
```

Move cliVersion into cli_common; redis-cli and redis-benchmark will
use it, and the format is unchanged:
```
{title} {redis_version} (git:{sha})
```
2023-12-21 13:51:46 +02:00
Wen Hui
5dc631d880
Add missing test cases for hash commands (#12851)
We don't have a test for HGETALL against a non-existent key, so we
added it to the test suite, along with wrong-type cases for other
missing commands.
2023-12-17 14:02:53 +02:00
Binbin
adbb534f03
Always keep an in-memory history of all commands in redis-cli (#12862)
redis-cli avoids saving sensitive commands in its history (it doesn't
persist them to the history file).
This means that if you had a typo and want to re-run the command, you
can't easily do that.
This PR changes that: it keeps an in-memory history of all the redacted
commands and just doesn't persist them to disk. This way we can press
the up arrow and re-try the command freely, and it just won't survive a
redis-cli restart.
2023-12-15 17:22:02 +02:00
zhaozhao.zz
d8a21c5767
Unified db rehash method for both standalone and cluster (#12848)
After #11695, we added two functions `rehashingStarted` and
`rehashingCompleted` to the dict structure. We also registered two
handlers for the main database's dict and expire structures. This allows
the main database to record the dict in `rehashing` list when rehashing
starts. Later, in `serverCron`, the `incrementallyRehash` function is
continuously called to perform the rehashing operation. However,
currently, when rehashing is completed, `rehashingCompleted` does not
remove the dict from the `rehashing` list. This results in the
`rehashing` list containing many invalid dicts. Although subsequent cron
checks and removes dicts that don't require rehashing, it is still
inefficient.

This PR implements the functionality to remove the dict from the
`rehashing` list in `rehashingCompleted`. This is achieved by adding
`metadata` to the dict structure, which keeps track of its position in
the `rehashing` list, allowing for quick removal. This approach avoids
storing duplicate dicts in the `rehashing` list.

Additionally, there are other modifications:

1. Whether in standalone or cluster mode, the dict in database is
inserted into the rehashing linked list when rehashing starts. This
eliminates the need to distinguish between standalone and cluster mode
in `incrementallyRehash`. The function only needs to focus on the dicts
in the `rehashing` list that require rehashing.
2. `rehashing` list is moved from per-database to Redis server level.
This decouples `incrementallyRehash` from the database ID, and in
standalone mode, there is no need to iterate over all databases,
avoiding unnecessary access to databases that do not require rehashing.
In the future, even if unsharded-cluster mode supports multiple
databases, there will be no risk involved.
3. The insertion and removal operations of dict structures in the
`rehashing` list are decoupled from `activerehashing` config.
`activerehashing` only controls whether `incrementallyRehash` is
executed in serverCron. There is no need for additional steps when
modifying the `activerehashing` switch, as in #12705.
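
A hedged sketch of the O(1) removal (names such as `dbDictMetadata` and `server.rehashing` approximate the design described above but are not verified identifiers):

```c
#include "server.h" /* dict, adlist, server globals */

/* The dict's metadata remembers its node in the global rehashing list,
 * so completion can unlink it without scanning the whole list. */
typedef struct dbDictMetadata {
    listNode *rehashing_node; /* NULL when not on the rehashing list */
} dbDictMetadata;

void rehashingStarted(dict *d) {
    dbDictMetadata *m = (dbDictMetadata *)dictMetadata(d);
    listAddNodeTail(server.rehashing, d);
    m->rehashing_node = listLast(server.rehashing);
}

void rehashingCompleted(dict *d) {
    dbDictMetadata *m = (dbDictMetadata *)dictMetadata(d);
    if (m->rehashing_node) {
        listDelNode(server.rehashing, m->rehashing_node);
        m->rehashing_node = NULL;
    }
}
```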
2023-12-15 10:42:53 +08:00
Guillaume Koenig
967fb3c6e8
Extend rax usage by allowing any long long value (#12837)
The raxFind implementation uses a special pointer value (the address of
a static string) as the "not found" value. This works as long as actual
pointers are used. However, we've seen usages where long long,
non-pointer values are stored, creating a risk that one of those values
is precisely the address of the special "not found" value. This commit
changes raxFind to return 1 or 0 to indicate elementhood, and to take a
new void **value out-parameter to optionally return the associated
value.

By extension, this also allows the RedisModule_DictSet/Replace
operations to safely insert integers instead of just pointers.
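
Under the new contract a caller looks roughly like this (a sketch based on the description above; `rt` is an assumed rax handle):

```c
#include "rax.h"

void lookupExample(rax *rt) {
    void *value;
    /* raxFind now returns 1/0 for membership and reports the associated
     * value (any bit pattern, including a long long) via *value. */
    if (raxFind(rt, (unsigned char *)"mykey", 5, &value)) {
        /* element exists; use value */
    } else {
        /* element missing; value was not set */
    }
}
```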
2023-12-14 14:50:18 -08:00
Chen Tianjie
e95a5d4831
Support by/get options for sort(_ro) in cluster mode when pattern implies slot. (#12728)
The by/get options of sort/sort_ro command used to be forbidden in
cluster mode, since we are not sure which slot the pattern may be in.

With the optimization done in #12536, patterns can now be mapped to
slots, so we should allow by/get options in cluster mode when the
pattern maps to the same slot as the key.
2023-12-13 21:16:36 +02:00
Binbin
3c0fd25201
Redact ACL username information and mark *-key-file-pass configs as sensitive (#12860)
In #11489, we considered the ACL username to be sensitive information,
treated ACL GETUSER as a sensitive command, and removed it
from the redis-cli history file.

This PR redacts username information in ACL GETUSER and ACL DELUSER
in SLOWLOG, and also removes ACL DELUSER from the redis-cli history file.

This PR also marks tls-key-file-pass and tls-client-key-file-pass
as sensitive configs; they will be redacted in SLOWLOG and
removed from the redis-cli history file.
2023-12-13 15:28:13 +02:00
Chen Tianjie
f9cc25c1dd
Add metric to INFO CLIENTS: pubsub_clients. (#12849)
In the INFO CLIENTS section, we already have blocked_clients and
tracking_clients. We add a new metric showing the number of
pubsub connections, which helps with performance monitoring and
troubleshooting.
2023-12-13 13:44:13 +08:00
Binbin
c85a9b7896
Fix delKeysInSlot server events are not executed inside an execution unit (#12745)
This is a follow-up fix to #12733. We need to apply the same changes to
delKeysInSlot. Refer to #12733 for more details.

This PR contains some other minor cleanups / improvements to the test
suite and docs.
It uses the postnotifications test module in a cluster mode test which
revealed a leak in the test module (fixed).
2023-12-11 20:15:19 +02:00
Binbin
62419c01db
Handle missing fields in dbSwapDatabases and swapMainDbWithTempDb (#12763)
The change in dbSwapDatabases seems harmless, because in non-clustered
mode dbBuckets calculations are strictly accurate and in cluster mode
we only have one DB. We modify it for uniformity (just like resize_cursor).

The change in swapMainDbWithTempDb is needed in case we swap with the
temp db, otherwise the overhead memory usage of db can be miscalculated.

In addition we will swap all fields (including rehashing list), just for
completeness (and reduce the chance of surprises in the future).

Introduced in #12697.
2023-12-10 17:30:20 +02:00
Binbin
e6423b7a7e
Fix rehashingStarted miscalculating bucket_count in dict initialization (#12846)
In the old dictRehashingInfo implementation, for the initialization
scenario, it mistakenly set to_size directly to
DICTHT_SIZE(DICT_HT_INITIAL_EXP), which is 4 in our code by default.

In scenarios where dictExpand directly passes the target size as
initialization, the code will calculate bucket_count incorrectly. For
example, in DEBUG POPULATE or RDB load scenarios, it will cause the
final bucket_count to be initialized to 65536 (16384 * 4), see:
```
before:
DB 0: 10000000 keys (0 volatile) in 65536 slots HT.

it should be:
DB 0: 10000000 keys (0 volatile) in 16777216 slots HT.
```

In this PR, the new ht is also initialized before calling
rehashingStarted in _dictExpand, so that the calls in dictRehashingInfo
can be unified.

Bug was introduced in #12697.
2023-12-10 10:55:30 +02:00
Binbin
a3ae2ed37b
Remove dead code around should_expand_db (#12767)
When dbExpand is called from rdb.c with try_expand set to 0, it will
either panic on OOM, or be non-fatal (it should not fail RDB loading).

At the same time, the log text has been slightly adjusted to make it
more unified.
2023-12-10 10:40:15 +02:00
Binbin
7410d985bc
Remove overhead.hashtable.slot-to-keys from memory-stats reply_schema (#12784)
overhead.hashtable.slot-to-keys was added in 7.0 in #10017, then removed
in #11695. Now remove it from reply_schema.
2023-12-10 09:46:21 +02:00
Yanqi Lv
15b993f1ef
Optimize dictExpand of empty dict (#12847)
If a dict is empty before `dictExpand`, we don't need to do rehashing
actually. We can just create a new dict and set it as ht_table[0] to
skip incremental rehashing and free the memory quickly.
2023-12-08 17:48:52 +08:00
bentotten
826b39e016
Align server.lastsave and server.rdb_save_time_last by removing multiple calls to time(NULL) (#12823)
This makes sure the various times (server.lastsave and server.rdb_save_time_last) are aligned by using the result of the same time call.
2023-12-07 17:03:51 -08:00
Chen Tianjie
f2d59c4f91
Avoid unnecessary slot computing in KEYS command. (#12843)
If not in cluster mode, there is no need to compute the slot.

A bit of optimization for #12754
2023-12-07 19:48:15 +08:00
zhaozhao.zz
8e11f84ded
Fix replica node cannot expand dicts when loading legacy RDB (#12839)
When loading RDB on cluster nodes, it is necessary to consider the
scenario where a node is a replica.

For example, during a rolling upgrade, new version instances are often
mounted as replicas on old version instances. In this case, the full
synchronization legacy RDB does not contain slot information, and the
new version instance, acting as a replica, should be able to handle the
legacy RDB correctly for `dbExpand`.

Additionally, renaming `getMyClusterSlotCount` to `getMyShardSlotCount`
would be appropriate.

Introduced in #11695
2023-12-07 14:30:48 +08:00
Moshe Kaplan
e2a3f3091f
coverity.yml: Upload should go to project redis-unstable (#12841)
Coverity project name was changed from redis to redis-unstable. Fix the
upload destination to also go to redis-unstable.

Continuation of #12807
2023-12-06 20:51:58 +02:00
Binbin
2f6d4dabaa
Fix outdated LFU comments to eliminate confusion (#12244)
The decrement time was replaced by access time in
583c31472577fb8175e17ee0ce243972f4dd8425.

The halved and doubled LFU_INIT_VAL logic has been changed in
06ca9d683920da19ad53532f8cd55b54584027bc.
Now we just decrement the counter by num_periods. This has been
previously fixed in redis.conf, #11108.
2023-12-06 20:46:57 +02:00
Moshe Kaplan
77e69d8884
GH Workflows: Create CI job for Coverity scan (#12807)
I've noticed that https://scan.coverity.com/projects/redis already
exists, but appears to be updated only on an ad-hoc basis. This creates a
[redis-unstable](https://scan.coverity.com/projects/redis-unstable?tab=project_settings)
project in Coverity for this CI.

This PR adds a GitHub Action-based CI job to create a new Coverity build
once daily, so that there is always a recent scan available.

This is within the limit, as Redis is ~150K LOC and per
https://scan.coverity.com/faq#frequency :

> Up to 21 builds per week, with a maximum of 3 builds per day, for
projects with 100K to 500K lines of code

Before this is merged in, two new secrets will need to be created:

COVERITY_SCAN_EMAIL with the email address used for accessing Coverity
COVERITY_SCAN_TOKEN with the Project token from
https://scan.coverity.com/projects/redis-unstable?tab=project_settings

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-12-06 14:50:00 +02:00
zhaozhao.zz
b730404c2f
Fix multi dbs donot dbExpand when loading RDB (#12840)
Currently, during RDB loading, once a `dbExpand` is performed, the
`should_expand_db` flag is set to 0. This leaves the remaining DBs
unable to do `dbExpand` when there are multiple DBs.

To fix this issue, we need to set `should_expand_db` back to 1 whenever
we encounter `RDB_OPCODE_RESIZEDB`. This ensures that each DB can
perform `dbExpand` correctly.

Additionally, the initial value of `should_expand_db` should also be set
to 0 to prevent invalid `dbExpand` in older versions of RDB where
`RDB_OPCODE_RESIZEDB` is not present.

problem introduced in #11695
2023-12-06 10:59:56 +02:00
zhaozhao.zz
9ee1cc33a3
Make the sampling logic in eviction clearer (#12781)
Additional optimizations for the eviction logic in #11695:

To make the eviction logic clearer and decouple the number of sampled
keys from the running mode (cluster or standalone).
* When sampling in each database, we only care about the number of keys
in the current database (not the dicts we sampled from).
* If there is an insufficient number of keys in the current database
(e.g. fewer than 10 times the value of `maxmemory_samples`), we can break out
sooner (to avoid looping on a sparse database).
* We'll never try to sample the db dicts more times than the number of
non-empty dicts in the db (max 1 in non-cluster mode).

And it also ensures that each database has a sufficient amount of
sampled keys, so even if unsharded-cluster supports multiple databases,
there won't be any issues.

other changes:
1. keep track of the number of non-empty dicts in each database.
2. move key_count tracking into cumulativeKeyCountAdd rather than all
its callers

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-12-06 09:37:24 +08:00
Chen Tianjie
991aff1c0f
Optimize KEYS when pattern includes hashtag and implies a single slot. (#12754)
In #12536 we made a similar optimization for SCAN based on hashtags in
patterns. When we can make sure all keys matching the pattern will be in
the same slot, we can limit the iteration to run on only that one slot.
2023-12-05 16:21:50 +02:00
Madelyn Olson
35c8d616cf
Only rebuild server when src or deps are modified (#12799)
`make test` will unnecessarily rebuild `redis-server` if tests are modified, because mkheader will touch release.c.
This change scopes down the regeneration of release.c to only when actual source files are modified.
2023-12-04 19:24:24 -08:00
Binbin
764838d66f
Check whether the client is NULL in luaCreateFunction (#12829)
It was first added to load lua from RDB, see 28dfdca7. After #9812,
we no longer save lua in RDB. luaCreateFunction will only be called
in script load and eval*, both of which have a client available.

It could be that some day we'll still want to load scripts from
somewhere that's not a client. This fix is in dead code.
2023-12-04 20:12:48 +02:00
Binbin
8a4ccb01b3
Fix clusterLoadConfig aux_argv minor memory leak (#12726)
We forgot to call sdsfreesplitres. This is just a cleanup since it will
only be leaked in the error paths, and we will exit on the error paths.
2023-12-03 11:00:53 +02:00
sundb
91309f7981
Fix compilation warning in KeySpace_ServerEventCallback and add CFLAGS=-Werror flag for module CI (#12786)
Warning:
```
postnotifications.c:216:77: warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
        RedisModule_Log(ctx, "warning", "Got an unexpected subevent '%ld'", subevent);
                                                                     ~~~    ^~~~~~~~
                                                                     %llu
```

CI:
https://github.com/redis/redis/actions/runs/6937308713/job/18871124342#step:6:115

## Other
Add `CFLAGS=-Werror` flag for module CI.

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
2023-11-30 17:41:00 +02:00
Binbin
e216c83909
Change addReplyErrorFormat to addReplyError when there is no format (#12641)
This is just a cleanup: although both are correct, the change is
normatively better, and addReplyError is also much faster, though speed
is not important for these error cases.
2023-11-30 12:36:17 +02:00
lyq2333
423565f784
Optimize CPU cache efficiency on dict while rehashing in dictTwoPhaseUnlinkFind (#12818)
In #5692, we optimize CPU cache efficiency on dict while rehashing but
missed the modification in dictTwoPhaseUnlinkFind.

This PR supplements it.
2023-11-30 13:50:09 +08:00
Binbin
22cc9b5122
Use CLZ in _dictNextExp to get the next power of two (#12815)
In the past, we did not call _dictNextExp frequently. It was only
called when the dictionary was expanded.

Later, dictTypeExpandAllowed was introduced in #7954, which is 6.2.
For the data dict and the expire dict, we can check maxmemory before
actually expanding the dict. This is a good optimization to avoid
maxmemory being exceeded due to the dict expansion.

And in #11692, we moved the dictTypeExpandAllowed check before the
threshold check. This caused a bit of performance degradation: every
time a key is added to the dict, dictTypeExpandAllowed is called to
check.

The main reason for degradation is that in a large dict, we need to
call _dictNextExp frequently, that is, every time we add a key, we
need to call _dictNextExp once. Then the threshold is checked to see
if the dict needs to be expanded. We can see that the order of checks
here can be optimized.

So we moved the dictTypeExpandAllowed check back to after the threshold
check in #12789. In this way, before the dict is actually expanded (that
is, before the threshold is reached), we will not do anything extra
compared to before, that is, we will not call _dictNextExp frequently.

But note we'll still hit the degradation when we go over the thresholds.
When the threshold is reached, because of #7954, we may delay the dict
expansion due to maxmemory limitations. In this case, we will call
_dictNextExp every time we add a key during this period.

This PR uses CLZ in _dictNextExp to get the next power of two. CLZ
(count leading zeros) can easily give you the next power of two. It
should be noted that we actually introduced the use of __builtin_clzl in
#8687, which is 7.0, so I suppose all the platforms we use have it (even
if the CPU doesn't have such an instruction).

We build 67108864 (2**26) keys through DEBUG POPULATE, which will use
approximately 5.49G of memory (used_memory:5898522936). If expansion is
triggered, the additional hash table will consume approximately 1G of
memory (2 ** 27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G),
which is less than 5.49G + 1G, so we will delay the dict rehash
while adding the keys.

After that, each time an element is added to the dict, an allow check
will be performed, so we can frequently call _dictNextExp to compare
performance before and after the optimization. Using DEBUG HTSTATS 0, we
check and make sure that our dict expansion is delayed.

Using `./src/redis-server redis.conf --save "" --maxmemory 6871947673`.
Using `./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000`.
After ten rounds of testing:
```
unstable:           this PR:
769585.94           816860.00
771724.00           818196.69
775674.81           822368.44
781983.12           822503.69
783576.25           828088.75
784190.75           828637.75
791389.69           829875.50
794659.94           835660.69
798212.00           830013.25
801153.62           833934.56
```

We can see there is about 4-5% performance improvement in this case.
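
The CLZ trick itself, as a hedged sketch (close to, but not guaranteed identical to, the `_dictNextExp` this PR lands on):

```c
#include <limits.h>

/* Next power-of-two exponent for `size`: the bit width of long minus
 * the leading zeros of (size - 1), clamped to the dict's minimum. */
static signed char nextExp(unsigned long size) {
    if (size <= 4) return 2; /* 4 == assumed DICT_HT_INITIAL_SIZE */
    if (size >= LONG_MAX) return 8 * sizeof(long) - 1;
    return 8 * sizeof(long) - __builtin_clzl(size - 1);
}
```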
2023-11-29 14:42:22 +02:00
Binbin
bdceaf50e4
Fix the check for new RDB_OPCODE_SLOT_INFO in redis-check-rdb (#12768)
We did not read expires_slot_size, causing its check to fail.
An oversight in #11695.
2023-11-29 14:25:17 +02:00
zhaozhao.zz
3431b1f156
format cpu config as redis style (#7351)
The following four configurations are renamed to align with Redis style:

1. server_cpulist renamed to server-cpulist
2. bio_cpulist renamed to bio-cpulist
3. aof_rewrite_cpulist renamed to aof-rewrite-cpulist
4. bgsave_cpulist renamed to bgsave-cpulist

The original names are retained as aliases to ensure compatibility with
old configuration files. We recommend users to gradually transition to
using the new configuration names to maintain consistency in style.
2023-11-29 13:40:06 +08:00
zhaozhao.zz
a1c5171c1d
Fix resize hash tables stuck on the last non-empty slot (#12802)
Introduced in #11695 .

The tryResizeHashTables function gets stuck on the last non-empty slot
while iterating through dictionaries. It does not restart from the
beginning. The reason for this issue is a problem with the usage of
dbIteratorNextDict:

```c
/* Returns next dictionary from the iterator, or NULL if iteration is complete. */
dict *dbIteratorNextDict(dbIterator *dbit) {
    if (dbit->next_slot == -1) return NULL;
    dbit->slot = dbit->next_slot;
    dbit->next_slot = dbGetNextNonEmptySlot(dbit->db, dbit->slot, dbit->keyType);
    return dbGetDictFromIterator(dbit);
}
```

When iterating to the last non-empty slot, next_slot is set to -1,
causing it to loop indefinitely on that slot. We need to modify the code
to ensure that after iterating to the last non-empty slot, it returns to
the first non-empty slot.

BTW, the function tryResizeHashTables is actually iterating over slots
that have keys. However, in its implementation, it leverages the
dbIterator (which is a key iterator) to obtain slot and dictionary
information. While this approach works fine, it is not very
intuitive. This PR also improves readability by changing the iteration
to directly iterate over slots, thereby enhancing clarity.
2023-11-28 18:50:16 +08:00
zhaozhao.zz
095d05786f
clarify the comment of findSlotByKeyIndex function (#12811)
The current comment for `findSlotByKeyIndex` is a bit ambiguous and can
be misleading, as it may be misunderstood as getting the next slot
corresponding to target.
2023-11-27 18:45:40 -08:00
Binbin
d6f19539d2
Un-register notification and server event when RedisModule_OnLoad fails (#12809)
When we register a notification or server event in RedisModule_OnLoad,
but RedisModule_OnLoad eventually fails, triggering the notification or
server event will cause the server to crash.

If the loading fails at a later stage of moduleLoad, we do call
moduleUnload, which handles all un-registration, but when it fails on
the RedisModule_OnLoad call, we only un-register several specific
things, and these were missing:

- moduleUnsubscribeNotifications
- moduleUnregisterFilters
- moduleUnsubscribeAllServerEvents

Refactored the code to reuse the code from moduleUnload.

Fixes #12808.
2023-11-27 17:26:33 +02:00
zhaozhao.zz
1bd0b54957
Optimize the efficiency of active expiration when databases exceeds 16. (#12738)
Currently, when the number of databases exceeds 16,
the efficiency of cleaning expired keys is relatively low.

The reason is that by default only 16 databases are scanned when
attempting to clean expired keys (CRON_DBS_PER_CALL is 16). But users
may set databases higher than 16, such as 256, but it does not
necessarily mean that all 256 databases have expiration time set. If
only one database has expiration time set, this database needs 16
activeExpireCycle rounds in order to be scanned once, and 15 of those
rounds are meaningless.

To optimize the efficiency of expiration in such scenarios, we use dbs_per_call
to control the number of databases with expired keys being scanned.

Additionally, add a condition to limit the maximum number of rounds
to server.dbnum to prevent excessive spinning. This ensures that even if
only one database has expired keys, it can be triggered within a single cron job.
2023-11-27 12:12:12 +08:00
binfeng-xin
56ec1ff1ce
Call signalModifiedKey after the key modification is completed (#11144)
Fix the `signalModifiedKey()` order by calling it after the key
modification is completed, to ensure observers see the key's final,
consistent state.

When a key is modified, Redis calls `signalModifiedKey` to notify other
systems, such as the watch system of transactions and the tracking
system of client side caching. However, in some commands, the
`signalModifiedKey` call happens during the key modification process
instead of after the key modification is completed. This can potentially
cause issues, as systems relying on `signalModifiedKey` may receive the
"write in flight" status of the key rather than its final state.

These commands include:
1. PFADD
2. LSET, LMOVE, LREM
3. ZPOPMIN, ZPOPMAX, BZPOPMIN, BZPOPMAX, ZMPOP, BZMPOP

Currently there is no problem with Redis, but it is better to adjust the
order of `signalModifiedKey()`, to avoid issues in future development on
Redis.

---------

Co-authored-by: zhaozhao.zz <zhaozhao.zz@alibaba-inc.com>
2023-11-27 11:16:41 +08:00
meiravgri
2e854bccc6
Fix async safety in signal handlers (#12658)
see discussion from after https://github.com/redis/redis/pull/12453 was
merged
----
This PR replaces signals that are not considered async-signal-safe
(AS-safe) with safe calls.

#### **1. serverLog() and serverLogFromHandler()**
`serverLog` uses unsafe calls. It was decided that we will **avoid**
`serverLog` calls by the signal handlers when:
* The signal is not fatal, such as SIGALRM. In these cases, we prefer
using `serverLogFromHandler` which is the safe version of `serverLog`.
Note they have different prompts:
`serverLog`: `62220:M 26 Oct 2023 14:39:04.526 # <msg>`
`serverLogFromHandler`: `62220:signal-handler (1698331136) <msg>`
* The code was added recently. Calls to `serverLog` by the signal
handler have been there ever since Redis has existed and they haven't
caused problems so far. To avoid regressions, from now on we should use
`serverLogFromHandler`.

#### **2. `snprintf` `fgets` and `strtoul`(base = 16) -------->
`_safe_snprintf`, `fgets_async_signal_safe`, `string_to_hex`**
The safe version of `snprintf` was taken from
[here](8cfc4ca5e7/src/mc_util.c (L754))

#### **3. fopen(), fgets(), fclose() --------> open(), read(), close()**

#### **4. opendir(), readdir(), closedir() --------> open(),
syscall(SYS_getdents64), close()**

#### **5. Threads_mngr sync mechanisms**
* waiting for the thread to generate stack trace: semaphore -------->
busy-wait
* `globals_rw_lock` was removed: as we are not using malloc and the
semaphore anymore we don't need to protect `ThreadsManager_cleanups`.

#### **6. Stacktraces buffer**
The initial problem was that we were not able to safely call malloc
within the signal handler.
To solve that we created a buffer on the stack of `writeStacktraces` and
saved it in a global pointer, assuming that under normal circumstances,
the function `writeStacktraces` would complete before any thread
attempted to write to it. However, **if threads lag behind, they might
access this global pointer after it no longer belongs to the
`writeStacktraces` stack, potentially corrupting memory.**
To address this, various solutions were discussed
[here](https://github.com/redis/redis/pull/12658#discussion_r1390442896)
Eventually, we decided to **create a pipe** at server startup that will
remain valid as long as the process is alive.
We chose this solution due to its minimal memory usage, and since
`write()` and `read()` are atomic operations. It ensures that stack
traces from different threads won't mix.

**The stacktraces collection process is now as follows:**
* Cleaning the pipe to eliminate writes of late threads from previous
runs.
* Each thread writes to the pipe its stacktrace
* Waiting for all the threads to mark completion or until a timeout (2
sec) is reached
* Reading from the pipe to print the stacktraces.
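
A hedged sketch of that hand-off (illustrative helpers, not the PR's exact code):

```c
#include <fcntl.h>
#include <unistd.h>

static int st_pipe[2]; /* created once at server startup */

int stacktracePipeInit(void) {
    if (pipe(st_pipe) == -1) return -1;
    /* Non-blocking read end so draining stops once the pipe is empty. */
    return fcntl(st_pipe[0], F_SETFL, O_NONBLOCK);
}

/* Called from signal context: write() is async-signal-safe. */
void stacktraceWrite(const char *buf, size_t len) {
    ssize_t n = write(st_pipe[1], buf, len);
    (void)n; /* nothing safe to do about errors inside a handler */
}

/* Called after all threads reported, or after the 2s timeout. */
void stacktraceDrain(int out_fd) {
    char chunk[4096];
    ssize_t n;
    while ((n = read(st_pipe[0], chunk, sizeof(chunk))) > 0)
        (void)!write(out_fd, chunk, n);
}
```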

#### **7. Changes that were considered and eventually were dropped**
* replace the watchdog timer with a POSIX timer:
according to the [setitimer man page](https://linux.die.net/man/2/setitimer)

> POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending
the use of the POSIX timers API
([timer_gettime](https://linux.die.net/man/2/timer_gettime)(2),
[timer_settime](https://linux.die.net/man/2/timer_settime)(2), etc.)
instead.

However, although it is supposed to conform to the POSIX standard, the
POSIX timers API is not supported on Mac.
You can take a look at the Linux implementation
[here](c7562ee135).
To avoid messing up the code, and given the uncertainty regarding
compatibility, it was decided to drop it for now.

* avoid using sds (uses malloc) in logConfigDebugInfo
It was considered to print config info instead of using sds, however
apparently, `logConfigDebugInfo` does more than just print the sds, so
it was decided this fix is out of this issue scope.

#### **8. fix Signal mask check**
The check `signum & sig_mask`, intended to indicate whether the signal is
blocked by the thread, was incorrect. Actually, the bit position in the
signal mask corresponds to the signal number. We fixed this by changing
the condition to: `sig_mask & (1L << (sig_num - 1))`

#### **9. Unrelated changes**
Both `fork.tcl` and `util.tcl` implemented a function called
`count_log_message`, expecting different parameters. This caused
confusion when trying to run daily tests with additional test parameters
to run a specific test.
The `count_log_message` in `fork.tcl` was removed and the calls were
replaced with calls to the `count_log_message` located in `util.tcl`.

---------

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-23 13:22:20 +02:00
Binbin
5e403099bd
Fix misleading error message in redis.log when loglevel is invalid (#12636)
We don't have any debug level; change the message to say log level.
2023-11-23 10:23:30 +02:00
Moshe Kaplan
c9aa586b6b
rdb.c: Avoid potential file handle leak (Coverity 404720) (#12795)
`open()` can return any non-negative integer on success, including zero.
This change modifies the check from open()'s return value to also
include checking for a return value of zero (e.g., if stdin were closed
and then `open()` was called).

Fixes Coverity 404720

Can't happen in Redis. just a cleanup.
2023-11-23 10:14:17 +02:00
Moshe Kaplan
ae09d4d3ef
redis-check-aof.c: Avoid leaking file handle if file is zero bytes (#12797)
If fopen() is successful, but redis_fstat() determines the file is zero
bytes, the file handle stored in fp will leak. This change closes the
filehandle stored in fp if the file is zero bytes.

An FD leak on a tool like redis-check-aof isn't an issue (it'll exit soon anyway).
This is just a cleanup.
2023-11-23 10:08:49 +02:00
Moshe Kaplan
1c48d3dab2
config.c: Avoid leaking file handle if redis_fstat() fails (#12796)
If fopen() is successful, but redis_fstat() fails, the file handle
stored in fp will leak. This change closes the filehandle stored in fp
if redis_fstat() fails.

Fixes Coverity 390029
2023-11-23 10:06:13 +02:00
Moshe Kaplan
157e5d47b5
util.c: Don't leak directory handle if recursive call to dirRemove fails (#12800)
If a recursive call to dirRemove() returns -1, the directory
handle stored in dir will leak. This change closes the directory handle
stored in dir even if the recursive call to dirRemove() returns -1.

Fixed Coverity 371073
2023-11-23 10:04:02 +02:00
Binbin
463476933c
Optimize dict expand check, move allow check after the thresholds check (#12789)
dictExpandAllowed (for the main db dict and the expire dict) seems to
involve a few function calls and memory accesses; we can do it only
after the threshold checks and get some performance improvements.

A simple benchmark test: there are 11032768 fixed keys in the database,
start a redis-server with `--maxmemory big_number --save ""`,
start a redis-benchmark with `-P 100 -r 1000000000 -t set -n 5000000`,
collect `throughput summary: n requests per second` result.

After five rounds of testing:
```
unstable     this PR
848032.56    897988.56
854408.69    913408.88
858663.94    914076.81
871839.56    916758.31
882612.56    920640.75
```

We can see a 5% performance improvement in general condition.
But note we'll still hit the degradation when we go over the thresholds.
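
A simplified sketch of the reordering (`dictTypeExpandAllowed` stands in for the maxmemory-aware callback; the real dict.c condition has more cases):

```c
#include "dict.h"

int dictTypeExpandAllowed(dict *d); /* assumed expensive callback */

static int expandIfNeeded(dict *d) {
    if (dictIsRehashing(d)) return DICT_OK;
    /* Cheap load-factor check first... */
    if (d->ht_used[0] < DICTHT_SIZE(d->ht_size_exp[0])) return DICT_OK;
    /* ...the comparatively expensive allowed-to-expand check only once
     * the threshold has actually been crossed. */
    if (!dictTypeExpandAllowed(d)) return DICT_ERR;
    return dictExpand(d, d->ht_used[0] + 1);
}
```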
2023-11-22 11:33:26 +02:00
Yehoshua (Josh) Hershberg
ef1ca6c882
add some file level comments and copyright (#12793)
A followup PR for #12742
Add some brief comments explaining the purpose of the file to the head
of cluster_legacy.c and cluster.c.
Add copyright notice to cluster.c

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
Co-authored-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 11:32:23 +02:00
Binbin
dedbf99a80
Fix dbExpand not dividing by slots, resulting in consuming slots times the dictExpand (#12773)
We meant to divide it by the number of slots; otherwise it will do the
dictExpand slots times. The bug was introduced in #11695.
2023-11-22 11:16:06 +02:00
Oran Agra
58cb302526
Cluster refactor, common API for different implementations (#12742)
This PR reworks the clustering code to support multiple clustering
implementations, specifically, the current "legacy" clustering
implementation or, although not part of this PR, flotilla (see #10875).
Which implementation is used could be a compile-time flag (will be added
later). Legacy clustering functionality remains unchanged.

The basic idea is as follows. The header cluster.h now contains function
declarations that define the "Cluster API." These are the contract and
interface between any clustering implementation and the rest of the
Redis source code. Some of the function definitions are shared between
all clustering implementations. These functions are in cluster.c. The
functions and data structures specific to legacy clustering are in
cluster-legacy.c/h. One consequence of this is that the structs
clusterNode and clusterState which were previously "public" to the rest
of Redis are now hidden behind the Cluster API.

The PR is divided up into commits, each with a commit message explaining
the changes. some are just mass rename or moving code between files (may
not require close inspection / review), others are manual changes.

One other, related change is:
- The "failover" command is now plugged into the Cluster API so that the
clustering implementation can (a) enable/disable the command to begin
with and if enabled (b) perform the actual failover. The "failover"
command remains disabled for legacy clustering.
2023-11-22 09:31:19 +02:00
Josh Hershberg
eebb025826 Cluster refactor: Some code convention fixes
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:48 +02:00
Josh Hershberg
290f376429 Cluster refactor: fn renames + small compilation issue on ubuntu
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
13b754853c Cluster refactor: cluster.h - reorder functions into logical groups
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
2e5181ef28 Cluster refactor: Add failover cmd support to cluster api
The failover command is up until now not supported
in cluster mode. This commit allows a cluster
implementation to support the command. The legacy
clustering implementation still does not support
this command.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
c6157b3510 Cluster refactor: Make clustering functions common
Move primary functions used to implement datapath
clustering into cluster.c, making them shared. This
required adding "accessor" and other functions to
abstract access to node details and cluster state.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
4afc54ad9b Cluster refactor: break up clusterCommand
Divide up clusterCommand into clusterCommand for shared
sub-commands and clusterCommandSpecial for implementation
specific sub-commands. So to, the cluster command help
sub-command has been divided into two implementations,
clusterCommandHelp and clusterCommandHelpSpecial. Some
common sub-subcommand implementations have been extracted
and their implementations either made shared or else
implementation specific.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
33ef6a3003 Cluster refactor: s/clusterNodeGetSlotBit/clusterNodeCoversSlot/
Simple rename, "GetSlotBit" is implementation specific

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
ac1513221b Cluster refactor: Move items from cluster_legacy.c to cluster.c
Move (but do not change) some items from cluster_legacy.c
back into cluster.c. These items are shared code that all
clustering implementations will use.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
040cb6a4aa Cluster refactor: verifyClusterNodeId need not be 'public'
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:06 +02:00
Josh Hershberg
4944eda696 Cluster refactor: Move more stuff from cluster.h to cluster_legacy.h
More declarations can be moved into cluster_legacy.h
as they are not required for the cluster api. The code
was simply moved, not changed in any way.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:54:03 +02:00
Josh Hershberg
d9a0478599 Cluster refactor: Make clusterNode private
Move clusterNode into cluster_legacy.h.
In order to achieve this some accessor methods
were added and also a refactor of how debugCommand
handles cluster related subcommands.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:50:46 +02:00
Josh Hershberg
98a6c44b75 Cluster refactor: Make clusterState private
Move clusterState into cluster_legacy.h. In order to achieve
this some "accessor" methods needed to be added to the
cluster API and some other minor refactors.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-22 05:44:10 +02:00
Binbin
2c41b13505
Block DEBUG POPULATE in loading and async-loading (#12790)
When we are loading data, it is not safe to generate data
through DEBUG POPULATE. POPULATE may generate duplicate keys, causing
a panic while data is loading.
2023-11-21 17:00:27 +02:00
Josh Hershberg
5292adb985 Cluster refactor: Move trivial stuff into cluster_legacy.h
Move some declarations from cluster.h to cluster_legacy.h.
The items moved are specific to the legacy clustering
implementation and DO NOT require any other refactoring
other than moving them from one file to another.

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-21 12:49:14 +02:00
Josh Hershberg
6a6ae6ffe8 Cluster refactor: Create new cluster.c and include of cluster_legacy.h
create new cluster.c

Signed-off-by: Josh Hershberg <yehoshua@redis.com>

forgot to #include cluster_legacy.h

Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-21 12:49:14 +02:00
Josh Hershberg
86915775f1 Cluster refactor: rename cluster.c -> cluster_legacy.c
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
2023-11-21 12:49:14 +02:00
iKun
4278ed8de5
redis-cli --bigkeys ,--hotkeys and --memkeys to replica in cluster mode (#12735)
Make redis-cli --bigkeys and --memkeys usable on replicas in cluster
mode, by sending the READONLY command. This is done only if -c is also given.

We don't detect if a node is a master or a replica so we send READONLY
in both cases. The READONLY has no effect on masters.

    Release notes:
    Make redis-cli --bigkeys and --memkeys usable on cluster replicas

---------

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-20 09:33:28 +02:00
Hwang Si Yeon
a1f91ffa18
Add an explanation for URI with -u in redis-cli --help (#12751)
Add documentation of the URI format in the `--help` output of
`redis-cli` and `redis-benchmark`.

In particular, it's good for users to know that they need to specify
"default" as the username when authenticating without a username. Other
details of the URI format are described too, like scheme and dbnum.

It used to be possible to connect to Redis using a URL with an empty
username, like `redis-cli -u redis://:PASSWORD@localhost:6379/0`. This
was broken in 6.2 (#8048), and there was a discussion about it #9186.
Now, users need to specify "default" as the username and it's better to
document it.

Refer to #12746 for more details.
2023-11-19 15:09:14 +02:00
Wen Hui
5a1f4b9aec
Adding missing SWAPDB related test cases. (#12769)
We have some test cases of swapdb with watchkey, but were missing separate
basic swapdb test cases, the unhappy path, and flushdb after swapdb. So we
added the test cases in keyspace.tcl.
2023-11-19 12:44:48 +02:00
Binbin
3d9c427f8c
Fix timing issue in CLUSTER SLAVE / REPLICAS consistent test (#12774)
CI reports that this test failed; the reason is that during
command processing, the node processed PING/PONG, resulting
in a ping_sent or pong_received mismatch.

Change to use MULTI to avoid timing issue. The test was introduced
in #12224.
2023-11-19 11:09:33 +02:00
zhaozhao.zz
eb392c0a6f
replace calculateKeySlot with known slot in evictionPoolPopulate (#12777) 2023-11-17 19:01:06 +08:00
zhaozhao.zz
6013122c8e
check dbSize when rewrite AOF, in case unnecessary SELECT (#12775)
Introduced by #11695; we should skip empty DBs to avoid unnecessary SELECTs.
2023-11-17 19:00:26 +08:00
Binbin
9b6dded421
Fix empty rehashing list in swapdb mode (#12770)
In swapdb mode, the temp db does not init the rehashing list;
the change added in #12764 caused cluster CI to fail.
2023-11-16 11:18:25 +02:00
Binbin
4366bbaa61
Empty rehashing list in emptyDbStructure (#12764)
This is currently harmless: since we have already cleared the dict
before, it will reset the rehashidx to -1, and in incrementallyRehash
we will call dictIsRehashing to check.

It would be nice to empty the list to avoid meaningless attempts, and
the code is also unified to reduce misunderstandings.
2023-11-15 07:55:34 +02:00
Binbin
fe36306340
Fix DB iterator not resetting pauserehash causing dict being unable to rehash (#12757)
When using the DB iterator, it will use dictInitSafeIterator to init an old safe
dict iterator. When dbIteratorNext is used, it will jump to the next slot db
dict when we are done with a dict. During this process, we do not have any calls to
dictResumeRehashing, which causes the dict's pauserehash to always be > 0.

And at last, it will be returned directly in dictRehashMilliseconds, which causes
us to have slot dicts in a state where rehash cannot be completed.

In the "expire scan should skip dictionaries with lot's of empty buckets" test,
adding a `keys *` can reproduce the problem stably. `keys *` will call dbIteratorNext
to trigger a traversal of all slot dicts.

Added dbReleaseIterator and dbIteratorInitNextSafeIterator methods to call dictResetIterator.
Issue was introduced in #11695.
2023-11-14 14:28:46 +02:00
Yossi Gottlieb
a9e73c00bc
Reduce FreeBSD daily scope. (#12758)
The full test is very flaky running on a VM inside GitHub worker, so we
have to settle for only building and running a small smoke test.
2023-11-13 17:22:09 +02:00
Roshan Khatri
88e83e517b
Add DEBUG_ASSERTIONS option to custom assert (#12667)
This PR introduces a new macro, serverAssertWithInfoDebug, to do complex assertions only for debugging. The main intention is to allow running complex operations during tests without impacting runtime performance. This assertion is enabled when setting DEBUG_ASSERTIONS.

The DEBUG_ASSERTIONS flag is set for the daily and CI variants of `test-sanitizer-address`.
2023-11-11 20:31:34 -08:00
Harkrishn Patro
9ca8490315
Increase timeout for expiry cluster tests (#12752)
A recently added test fails on timeout under valgrind in GH actions.

Locally with valgrind the test finishes within 1.5 sec(s). Couldn't find
any issue due to lack of reproducibility. Increasing the timeout and
adding an additional log to the test to understand how many keys
were left at the end.
2023-11-11 12:01:04 +02:00
zhaozhao.zz
6258edebf0
reset bucket_count when empty db (#12750)
Introduced in #12697; we should reset bucket_count when emptying a db, or the overhead memory usage of the db can be miscalculated.
2023-11-10 15:52:57 +02:00
zhaozhao.zz
cf6ed3feeb
fix the wrong judgement for activerehashing in standalone mode (#12741)
Introduced by #11695, the judgement should be dictIsRehashing.
2023-11-09 11:30:50 +02:00
Binbin
53294e537c
Fix genClusterDebugString minor sds leaks (#12739)
This function will now only be called in printCrashReport,
so this is just a cleanup.
2023-11-08 19:14:36 +02:00
Meir Shpilraien (Spielrein)
0ffb9d2ea9
Before evicted and before expired server events are not executed inside an execution unit. (#12733)
Redis 7.2 (#9406) introduced a new modules event, `RedisModuleEvent_Key`.
This new event allows the module to read the key data just before it is removed
from the database (either deleted, expired, evicted, or overwritten).

When the key is removed from the database, either by active expire or eviction.
The new event was not called as part of an execution unit. This can cause an
issue if the module registers a post notification job inside the event. This job will
not be executed atomically with the expiration/eviction operation and will not
replicated inside a Multi/Exec. Moreover, the post notification job will be executed
right after the event where it is still not safe to perform any write operation, this will
violate the promise that post notification job will be called atomically with the
operation that triggered it and **only when it is safe to write**.

This PR fixes the issue by wrapping each expiration/eviction of a key with an execution
unit. This makes sure the entire operation will run atomically and all the post notification
jobs will be executed at the end where it is safe to write.

Tests were modified to verify the fix.
2023-11-08 09:28:22 +02:00
Yossi Gottlieb
6223355cf3
Use cross-platform-actions for FreeBSD support. (#12732)
This change overcomes many stability issues experienced with the
vmactions action.

We need to limit VMs to 8GB for better stability, as the 13GB default
seems to hang them occasionally.

Shell code has been simplified since this action seem to use `bash -e`
which will abort on non-zero exit codes anyway.
2023-11-06 18:07:14 +02:00
dingrui
a888503b4f
Remove unnecessary argument(tp) in gettimeofday() call for retrieving timezone (#12722)
Changes the `gettimeofday` caller by removing an unused optional output argument.

It has two benefits:

- simpler code, discarding the unnecessary arg.
- possibly faster, due to the implementation in the kernel.
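
For illustration, a minimal sketch of the call shape:

```c
#include <stdio.h>
#include <sys/time.h>

int main(void) {
    struct timeval tv;
    /* Before: a struct timezone was passed and never read.
     * After: pass NULL for the obsolete second argument. */
    gettimeofday(&tv, NULL);
    printf("%ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}
```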
2023-11-06 15:10:09 +02:00
Chen Tianjie
282b82e9d2
Handle all CLUSTER_REDIR_ error code when verifying script. (#12707)
Clarify the errors related to cluster mode in scripts: return the command that encountered an execution error along with the specific error message.

---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2023-11-06 17:48:58 +08:00
Wen Hui
28b6155ba5
Fix the bug that write redis sensitive command information to redis_cli historyfile (#11489)
Currently, we do not write the following sensitive commands into the ~/.rediscli_history file:

ACL SETUSER username [rule [rule ...]]
AUTH [username] password
HELLO [AUTH username password] 
MIGRATE host port <key | ""> destination-db timeout [[AUTH password | AUTH2 username password]]
CONFIG SET masterauth master-password
CONFIG SET masteruser username
CONFIG SET requirepass foobared

However, we still write the following sensitive commands into the ~/.rediscli_history file:
ACL GETUSER username
Sentinel CONFIG set sentinel-pass password
Sentinel CONFIG set sentinel-user username
Sentinel set mastername auth-pass password
Sentinel set mastername auth-user username

This change adds the commands of the second list to be skipped from being written to the history file.
2023-11-05 14:20:15 +02:00
Roshan Khatri
15a048d4f0
re-enable defrag tests in cluster mode. (#12710)
Reverts the skipping of defrag tests in cluster mode (done in #12672);
instead it skips only some defrag tests that are relevant for cluster mode.
The tests now run well after investigating and making the changes in #12674 and #12694.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-11-02 13:55:48 +02:00
dependabot[bot]
0ce74872c4
Bump actions/setup-node from 3 to 4 (#12708)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 3 to 4.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-02 12:22:50 +02:00
Viktor Söderqvist
8878817d89
Optimize SCAN with MATCH when pattern implies cluster slot (#12536)
Optimize the performance of SCAN commands when a match pattern can only contain keys from a 
single slot in cluster mode. This can happen when the pattern contains a hash tag before any 
wildcard matchers or when the key contains no matchers.
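
A hedged sketch of the slot decision (illustrative and conservative; the real helper in the PR handles more edge cases): if a `{tag}` closes before any wildcard, hashing the tag pins every possible match to one slot.

```c
#include <stdint.h>
#include <string.h>

uint16_t crc16(const char *buf, int len); /* Redis's CRC16, as used by keyHashSlot() */

/* Return 1 and set *slot if the pattern can only match keys in one
 * slot; return 0 otherwise. Any wildcard before the closing brace
 * disqualifies the pattern. */
int patternImpliesSlot(const char *pat, int *slot) {
    const char *open = NULL;
    for (const char *p = pat; *p; p++) {
        if (*p == '*' || *p == '?' || *p == '[') return 0;
        if (*p == '{' && open == NULL) {
            open = p + 1;
        } else if (*p == '}' && open != NULL && p > open) {
            *slot = crc16(open, (int)(p - open)) & 16383;
            return 1;
        }
    }
    /* No wildcards at all: the pattern is a literal key. */
    *slot = crc16(pat, (int)strlen(pat)) & 16383;
    return 1;
}
```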
2023-11-01 00:06:49 -07:00
Chen Tianjie
e9f312e087
Change stat_client_qbuf_limit_disconnections to atomic. (#12711)
In #12476 server.stat_client_qbuf_limit_disconnections was added. It is written in readQueryFromClient, which may be called by multiple threads when io-threads and io-threads-do-reads are turned on. Somehow we missed making it an atomic variable.
2023-11-01 10:57:24 +08:00
Viktor Söderqvist
8d675950e6
Don't crash when adding a forgotten node to blacklist twice (#12702)
Add defensive checks to prevent double-freeing a node from the cluster blacklist.
2023-10-31 07:20:06 -07:00
erpeng
4bbb2b0152
Optimize CPU cache efficiency on dict while it's being rehashed (#5692)
When finding a key, if redis is rehashing, we currently look up both tables (ht[0] and ht[1]).
If we compare the key's index to the rehashidx, and index < rehashidx, then we can conclude:
1. it is rehashing (rehashidx is -1 if it is not rehashing)
2. we can't find the key in ht[0], so just continue to find the key in ht[1]

The possible performance gain here, is not the looping over the linked list (which is empty),
but rather the lookup in the table (which could be a cache miss).
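
A hedged sketch of the shortcut (assuming the classic chained dict layout with visible dictEntry fields; `dictCompareKeys` is dict.c's internal key-compare macro):

```c
#include "dict.h"

/* While rehashing, buckets of ht[0] below rehashidx have already been
 * migrated to ht[1], so probing them is a pointless (and likely
 * cache-missing) load of an empty bucket. */
dictEntry *dictFindSketch(dict *d, const void *key, uint64_t hash) {
    for (int table = 0; table <= 1; table++) {
        unsigned long idx = hash & DICTHT_SIZE_MASK(d->ht_size_exp[table]);
        if (table == 0 && dictIsRehashing(d) &&
            idx < (unsigned long)d->rehashidx)
            continue; /* bucket already moved: look only in ht[1] */
        for (dictEntry *he = d->ht_table[table][idx]; he; he = he->next)
            if (dictCompareKeys(d, key, he->key)) return he;
        if (!dictIsRehashing(d)) break;
    }
    return NULL;
}
```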
---------

Co-authored-by: zhangshihua003 <zhangshihua003@ke.com>
Co-authored-by: sundb <sundbcn@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: judeng <abc3844@126.com>
2023-10-31 09:57:26 +02:00
Roshan Khatri
f7fa481156
Optimize finding the slot for a given key count in a fenwick tree (#12704)
This PR optimizes the time complexity of findSlotByKeyIndex from O(log^2(N)) to O(log(N)) by using the 
tree structure of binary index tree to find a slot in one search of the index.
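
The single-descent search is a standard Fenwick-tree technique; a hedged sketch (not necessarily the PR's exact code; `tree` is 1-based and `n` a power of two):

```c
/* Find the smallest 1-based index whose prefix sum reaches `target`,
 * in one O(log N) walk down the implicit tree rather than an
 * O(log^2 N) binary search over prefix-sum queries. */
int findByKeyCount(const long long *tree, int n, long long target) {
    int pos = 0;
    for (int bit = n; bit > 0; bit >>= 1) {
        int next = pos + bit;
        if (next <= n && tree[next] < target) {
            target -= tree[next]; /* skip this whole subtree */
            pos = next;
        }
    }
    return pos + 1;
}
```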
2023-10-27 17:15:19 -07:00
Harkrishn Patro
4145d628b4
Reduce dbBuckets operation time complexity from O(N) to O(1) (#12697)
As part of #11695 independent dictionaries were introduced per slot.
Time complexity to discover the total number of buckets across all dictionaries
increased to O(N) with the straightforward implementation of iterating over
all dictionaries and adding the dictBuckets of each.

To optimize the time complexity, we could maintain a global counter at
db level to keep track of the count of buckets and update it on the start
and end of rehashing.
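
A minimal sketch of that idea, with hypothetical names (not necessarily the fields this PR adds): keep a running per-db counter, adjusted when a rehash allocates the new table and when it releases the old one.
```c
typedef struct dbStatsSketch { unsigned long long bucket_count; } dbStatsSketch;

/* Called when rehashing starts: ht[1] has just been allocated. */
static void onRehashStart(dbStatsSketch *s, unsigned long new_table_buckets) {
    s->bucket_count += new_table_buckets;
}

/* Called when rehashing finishes: ht[0] has just been released. */
static void onRehashDone(dbStatsSketch *s, unsigned long old_table_buckets) {
    s->bucket_count -= old_table_buckets;
}

/* O(1) instead of iterating all 16k per-slot dictionaries. */
static unsigned long long dbBucketsSketch(const dbStatsSketch *s) {
    return s->bucket_count;
}
```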

---------

Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
2023-10-27 22:05:40 +03:00
Roshan Khatri
7d68208a6e
Reset later item flag after defrag later is done (#12694)
Fixing issues described in #12672, started after #11695
Related to #12674

Fixes the `defrag didn't stop` issue.

In some cases, depending on how the keys were stored in memory,
defrag_later_item_in_progress was not getting reset once we finished
defragging the later items and moved to the next slot. This stopped
the scan from happening in the later slots, so they did not get defragged.
2023-10-27 13:56:15 +03:00
Oran Agra
ba900f6cb8
Fix fd leak causing deleted files to remain open and eat disk space (#12693)
This was introduced in v7.2 by #11248
2023-10-25 20:54:02 +03:00
Binbin
372ea21875
Update comment around propagateDeletion (#12687)
Fix some outdated comments and add comment for moduleNotifyKeyspaceEvent
we added in #11084 since it seems a bit implicit.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-24 13:10:03 +03:00
Harkrishn Patro
3fac869f02
Fix test, disable expiration until empty buckets are formed (#12689)
Test failure on freebsd CI:
```
*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
  scan didn't handle slot skipping logic.
```

Observation:

expiry of keys might happen before the empty buckets are formed and won't help with the expiry skip logic validation.

Solution:

Disable expiration until the empty buckets are formed.
2023-10-24 11:29:40 +03:00
Harkrishn Patro
26eb4ce397
Fix defrag test (#12674)
Fixes issues that started after #11695, when the defrag tests began being executed in cluster mode too.
For some reason, it looks like the defragmentation is over too quickly, before the test is able to
detect that it's running.
So now, instead of waiting to see that it's active, we wait to see that it did some work:
```
[err]: Active defrag big list: cluster in tests/unit/memefficiency.tcl
defrag not started.
[err]: Active defrag big keys: cluster in tests/unit/memefficiency.tcl
defrag didn't stop.
```
2023-10-22 11:56:45 +03:00
Harkrishn Patro
becd50d0da
Disable flaky defrag tests affecting daily run (#12672)
Temporarily disabling a few of the defrag tests in cluster mode to make the daily run stable:

Active defrag eval scripts
Active defrag big keys
Active defrag big list
Active defrag edge case
2023-10-19 21:12:58 +03:00
Harkrishn Patro
f3bf8485d8
Fix resize hash table dictionary iterator (#12660)
Dictionary iterator logic in the `tryResizeHashTables` method is picking the next
(incorrect) dictionary while the cursor is at a given slot. This could lead to some
dictionary/slot getting skipped from resizing.

Also stabilize the test.

Problem introduced recently in #11695.
2023-10-19 13:58:32 +03:00
Oran Agra
03345ddc7f
Fix issue of listen before chmod on Unix sockets (CVE-2023-45145) (#12671)
Before this commit, Unix socket setup performed chmod(2) on the socket
file after calling listen(2). Depending on what umask is used, this
could leave the file with the wrong permissions for a short period of
time. As a result, another process could exploit this race condition and
establish a connection that would otherwise not be possible.

We now make sure the socket permissions are set up prior to calling
listen(2).
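
A self-contained sketch of the safe ordering (error handling trimmed; names are illustrative): permissions go on the socket file before listen(2), so no client can connect while the file still carries umask-derived permissions.
```c
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>

int unixListenSketch(const char *path, mode_t perm, int backlog) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un sa;
    if (fd == -1) return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) return -1;
    if (perm && chmod(sa.sun_path, perm) == -1) return -1; /* before listen! */
    if (listen(fd, backlog) == -1) return -1; /* only now can anyone connect */
    return fd;
}
```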

(cherry picked from commit 1119ecae6fd8796fa337df2212f09173ab6c7b0a)

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
2023-10-18 14:00:00 +03:00
sundb
3c734b8e9d
Add new compilation CI for macos-11 and macos-13 (#12666)
As discussed in #12611
Add a build CI for macOS 11 and 13 to avoid compatibility breakage introduced by future macOS SDK versions.
2023-10-18 13:25:52 +03:00
meiravgri
d27c7413a9
remove heap allocations from signal handlers. (#12655)
Using heap allocation during signal handlers is unsafe.
This PR's purpose is to replace all the heap allocations done within the signal
handlers raised upon server crash and assertions.
These were added in #12453.

writeStacktraces(): allocates the stacktraces output array on the calling thread's
stack and assigns the address to a global variable.
It calls `ThreadsManager_runOnThreads()` that invokes `collect_stacktrace_data()`
by each thread: each thread writes to a different location in the above array to allow
sync writes.

get_ready_to_signal_threads_tids(): instead of allocating the `tids` array, it receives it
as a fixed size array parameter, allocated on the stack of the calling function, and
returns the number of valid threads. The array size is hard-coded to 50.

`ThreadsManager_runOnThreads()`: To avoid the outputs array allocation, the
**callback signature** was changed. Now it should return void. This function's return type
has also changed to int - it returns 1 if successful, and 0 otherwise.

Other unsafe calls will be handled in following PRs
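
A hedged sketch of the pattern (names are illustrative, simplified from the description above): the output array lives on the coordinating thread's stack instead of the heap, and is published to the worker threads via a global pointer.
```c
#define MAX_THREADS 50

static void **g_output_array; /* read by the threads inside the handler */

void writeStacktracesSketch(void) {
    void *outputs[MAX_THREADS] = {0}; /* stack allocation: no malloc in the crash path */
    g_output_array = outputs;
    /* ... run the callback on every thread; thread i writes only
     *     g_output_array[i], so the writes don't need a lock; the real
     *     code synchronizes with the threads before returning ... */
    g_output_array = NULL; /* unpublish before the stack frame goes away */
}
```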
2023-10-16 17:21:49 +03:00
Vitaly
0270abda82
Replace cluster metadata with slot specific dictionaries (#11695)
This is an implementation of https://github.com/redis/redis/issues/10589 that eliminates 16 bytes per entry in cluster mode, which are currently used to create a linked list between entries in the same slot. The main idea is splitting the main dictionary into 16k smaller dictionaries (one per slot), so we can perform all slot specific operations, such as iteration, without any additional info in the `dictEntry`. For Redis cluster, the expectation is that there will be a larger number of keys, so the fixed overhead of 16k dictionaries will be negligible. The expire dictionary is also split up so that each slot is logically decoupled, so that in subsequent revisions we will be able to atomically flush a slot of data.

## Important changes
* Incremental rehashing - one big change here is that it's not one, but rather up to 16k dictionaries that can be rehashing at the same time. In order to keep track of them, we introduce a separate queue for dictionaries that are rehashing. Also, instead of rehashing a single dictionary, the cron job will now try to rehash as many as it can in 1ms.
* getRandomKey - now needs to not only select a random key from the random bucket, but also needs to select a random dictionary. Fairness is a major concern here, as it's possible that keys can be unevenly distributed across the slots. In order to address this, we introduced a binary index tree. With that data structure we are able to efficiently find a random slot using binary search in O(log^2(slot count)) time.
* Iteration efficiency - when iterating a dictionary with a lot of empty slots, we want to skip them efficiently. We can do this using the same binary index that is used for random key selection; this index allows us to find a slot for a specific key index. For example if there are 10 keys in slot 0, then we can quickly find the slot that contains the 11th key using binary search on top of the binary index tree.
* scan API - in order to perform a scan across the entire DB, the cursor now needs to save not only the position within the dictionary but also the slot id. In this change we append the slot id into the LSB of the cursor so it can be passed around between client and server (see the sketch after this list). This has an interesting side effect: now you'll be able to start scanning a specific slot by simply providing the slot id as a cursor value. The plan is to not document this as defined behavior, however. It's also worth noting the SCAN API is now technically incompatible with previous versions, although practically we don't believe it's an issue.
* Checksum calculation optimizations - During command execution, we know that all of the keys are from the same slot (outside of a few notable exceptions such as cross slot scripts and modules). We don't want to compute the checksum multiple times, hence we are relying on the cached slot id in the client during command execution. All operations that access random keys should either pass in the known slot or recompute the slot. 
* Slot info in RDB - in order to resize individual dictionaries correctly while loading an RDB, it's not enough to know the total number of keys (of course we could approximate the number of keys per slot, but it won't be precise). To address this issue, we've added additional metadata into the RDB that contains the number of keys in each slot, which can be used as a hint during loading.
* DB size - besides the `DBSIZE` API, we need to know the size of the DB in many places; in order to avoid scanning all dictionaries and summing up their sizes in a loop, we've introduced a new field into `redisDb` that keeps track of `key_count`. This way we can keep the DBSIZE operation O(1). The same is done for O(1) expires computation as well.
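
A minimal sketch of the cursor packing described in the scan API bullet, assuming 16384 slots (14 bits) and illustrative helper names:
```c
#include <stdint.h>

#define SLOT_BITS 14
#define SLOT_MASK ((1ULL << SLOT_BITS) - 1)

/* Pack the slot id into the low bits of the SCAN cursor so it survives
 * the round trip through the client. */
static inline uint64_t buildCursor(uint64_t dictCursor, uint64_t slot) {
    return (dictCursor << SLOT_BITS) | (slot & SLOT_MASK);
}
static inline uint64_t cursorSlot(uint64_t cursor) { return cursor & SLOT_MASK; }
static inline uint64_t cursorPos(uint64_t cursor)  { return cursor >> SLOT_BITS; }
```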

## Performance
This change improves SET performance in cluster mode by ~5%; most of the gains come from not having to maintain linked lists for keys in a slot. Non-cluster mode has the same performance. For workloads that rely on evictions, the performance is similar because of the extra overhead for finding keys to evict. 

RDB loading performance is slightly reduced, as the slot of each key needs to be computed during the load.

## Interface changes
* Removed `overhead.hashtable.slot-to-keys` from `MEMORY STATS`
* Scan API will now require 64 bits to store the cursor, even on 32 bit systems, as the slot information will be stored.
* New RDB version to support the new op code for SLOT information. 

---------

Co-authored-by: Vitaly Arbuzov <arvit@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Roshan Khatri <rvkhatri@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-14 23:58:26 -07:00
Oran Agra
f0c1c730d4
test suite: clean server pids after server crashed (#12639)
when a server in the test suite crashes and is restarted by restart_server, we didn't clean its pid from the list.
we can see that when the corrupt-dump-fuzzer hangs, it has a long list of servers to clean, but in fact they're all already dead.
2023-10-13 16:28:52 +03:00
Harkrishn Patro
b784c5375e
Unsubscribe all clients from replica for shard channel if the master ownership changes (#12577)
Unsubscribe all clients from replica for shard channel if the master ownership changes
2023-10-12 20:48:27 -07:00
Ye Lin Aung
b705049a7a
Replace emptyDb() with new emptyData() (#12646)
The function was renamed, but the comments were outdated.
2023-10-12 15:34:08 +03:00
zhaozhao.zz
77a65e82b2
support XREAD[GROUP] with BLOCK option in scripts (#12596)
In #11568 we removed the NOSCRIPT flag from commands and keep the BLOCKING flag.
Aiming to allow them in scripts and let them implicitly behave in the non-blocking way.

In that sense, the old behavior was to allow LPOP and reject BLPOP, and the new behavior
is to allow BLPOP too, and fail it only in case it ends up blocking.
So likewise, so far we allowed XREAD and rejected XREAD BLOCK, and we will now allow
that too, and only reject it if it ends up blocking.
2023-10-12 10:54:50 +03:00
Binbin
e5ef161374
Fix crash when running rebalance command in a mixed cluster of 7.0 and 7.2 (#12604)
In #10536 we introduced the assert; some older versions of servers
(like 7.0) don't gossip shard_id, so we will not add the node to
cluster->shards, and node->shard_id is filled in randomly and may not
be found here.

It causes that if we add a 7.2 node to a 7.0 cluster and allocate slots
to the 7.2 node, the 7.2 node will crash when it hits this assert. Somehow
like #12538.

In this PR, we remove the assert and replace it with an unconditional removal.
2023-10-11 22:15:25 -07:00
Binbin
4de4fcf280
Fix redis-cli pubsub_mode and connect minor prompt / crash issue (#12571)
When entering pubsub mode and using the redis-cli-only
connect command, we need to reset pubsub_mode because
we switch to a different connection.

This will affect the prompt when the connection is successful,
and redis-cli will crash when the connect fails:
```
127.0.0.1:6379> subscribe ch
1) "subscribe"
2) "ch"
3) (integer) 1
127.0.0.1:6379(subscribed mode)> connect 127.0.0.1 6380
127.0.0.1:6380(subscribed mode)> ping
PONG
127.0.0.1:6380(subscribed mode)> connect a b
Could not connect to Redis at a:0: Name or service not known
Segmentation fault
```
2023-10-11 10:45:38 +03:00
Binbin
8d92f7f2b7
Support NO ONE block in REPLICAOF command json (#12633)
The current commands.json doesn't mention the special NO ONE arguments.
This change is also applied to SLAVEOF.
2023-10-10 11:10:40 +03:00
Oran Agra
b810384c62
dump server logs on hung corrupt dump fuzzer test
recently there have been some incidents of hung tests in the CI.
when we try to reproduce them, we get an assertion, not a hang.
maybe the server logs will reveal some info.
2023-10-08 16:19:31 +03:00
Jachin
a2b0701d2c
Fix compile on macOS 13 (#12611)
Use the __MAC_OS_X_VERSION_MIN_REQUIRED macro to detect the
macOS system version instead of using MAC_OS_X_VERSION_10_6.

From MacOSX14.0.sdk, the default definitions of MAC_OS_X_VERSION_xxx have
been removed in usr/include/AvailabilityMacros.h. It includes AvailabilityVersions.h,
where the following condition must be met:
`#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE)`
Only then will MAC_OS_X_VERSION_xxx be defined.
However, in the project, _DARWIN_C_SOURCE is not defined, which leads to the
loss of the definition for MAC_OS_X_VERSION_10_6.
2023-10-08 11:12:50 +03:00
Oran Agra
fe37e4fc87
Cleanup nested module keyspace notifications (#12630)
Recently we added a way for the module to declare that it wishes to
receive nested KSN, by setting ALLOW_NESTED_KEYSPACE_NOTIFICATIONS.
but it looks like this flow has a bug, clearing the `active` member
when it was previously set. However, since nesting is permitted,
this bug has no implications, since regardless of the active member,
the notification is permitted.
2023-10-05 13:50:17 +03:00
YaacovHazan
2cf50ddbad
Fix 'load corrupted rdb with no CRC' test (#12629)
After the change in #12626 (2e0f6724e), the is_alive proc gets a pid and not
a server config.

This PR aligns it in 'load corrupted rdb with no CRC' test.
2023-10-03 11:09:25 +03:00
Madelyn Olson
31c3172d9b
Better standardize around assertions (#12539)
We use the C standard assert() in various places in the codebase, which requires NDEBUG to be undefined. We introduced the redisassert.h file in order to allow low level files to access the assert that maps to serverPanic, but this was only applied tactically and is not available broadly.

This PR removes all usage of the standard library asserts and replaces them with an assert that maps to serverPanic. It makes us immune to accidentally setting the NDEBUG flag preventing assertions. I also marked the server asserts as "likely" to not execute. I spot checked various points in the code, and it didn't change the code layout on my x86 mac, but it is more consistent with redisassert.h and seems more correct overall.
2023-10-02 18:58:44 -07:00
Madelyn Olson
9d31768cbb
Fix a couple of tabs that caused misindentation (#12541)
Fixed some usages of tabs which caused weird indentation in the code. Tried to find all of the places so there was one PR. I ignored all of the usages of tabs which don't really affect readability.
2023-10-02 16:44:09 -07:00
meiravgri
4ba9e18ef0
fix crash in crash-report and other improvements (#12623)
## Crash fix
### Current behavior
We might crash if we fail to collect some of the threads' output, for example if it exceeds the timeout.

The threads mngr API guarantees that the output array length will be `tids_len`, however, some
indices can be NULL, in case it fails to collect some of the threads' outputs.

When we use the threads mngr to collect the threads' stacktraces, we rely on this and skip NULL
entries. Since the output array was allocated with malloc, instead of NULL, it contained garbage,
so we got a segmentation fault when trying to read this garbage. (in debug.c:writeStacktraces() )

### fix
Allocate the global output array with zcalloc.

### To reproduce the bug, you'll have to change the code:
**in threadsmngr:ThreadsManager_runOnThreads():**
make sure the g_output_array allocation is initialized with garbage and not 0s 
(add `memset(g_output_array, 2, sizeof(void*) * tids_len);` below the allocation).

Force one of the threads to write to the array:
add a global var: `static redisAtomic size_t return_now = 0;` 
add to `invoke_callback()` before writing to the output array:
```
    size_t i_return;
    atomicGetIncr(return_now, i_return, 1);
    if(i_return == 1) return;
```
compile, start the server with `--enable-debug-command local` and run `redis-cli debug assert`
The assertion triggers the stacktrace collection. 
Expect to get 2 prints of the stack trace - since we get the segmentation fault after we return from
the threads mngr, it can be safely triggered again.

## Added global variables r/w lock in ThreadsManager
To avoid a situation where the main thread runs `ThreadsManager_cleanups` while threads are still
invoking the signal handler, we use a r/w lock.
For cleanups, we will acquire the write lock.
The threads will acquire the read lock to enable them to write simultaneously.
If we fail to acquire the read lock, it means cleanups are in progress and we return immediately.
After acquiring the lock we can safely check that the global output array wasn't nullified and proceed
to write to it.
This way we ensure the threads are not modifying the global variables / trying to write to the output
array after they were zeroed/nullified/destroyed (the semaphore).

## other minor logging change
1. removed logging if the semaphore times out, because the threads can still write to the output array
  after this check. Instead, we print the total number of printed stacktraces compared to the expected
  number (len_tids).
2. use noinline attribute to make sure the uplevel number of ignored stack trace entries stays correct.
3. improve testing

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-10-02 20:02:02 +03:00
YaacovHazan
2e0f6724e0
Stabilization and improvements around aof tests (#12626)
In some tests, the code manually searches for a log message, and it
uses tail -1 with a delay of 1 second, which can miss the expected line.

Also, because the aof tests use start_server_aof and not start_server,
the test name doesn't log into the server log.

To fix the above, I made the following changes:
- Change the start_server_aof to wrap the start_server.
  This will add the created aof server to the servers list, and make
  srv() and wait_for_log_messages() available for the tests.

- Introduce a new option for start_server.
  'wait_ready' - an option to let the caller start the test code without
  waiting for the server to be ready. Useful for tests on a server that
  is expected to exit on startup.

- Create a new start_server_aof_ex.
  The new proc also accept options as argument and make use of the
  new 'short_life' option for tests that are expected to exit on startup
  because of some error in the aof file(s).

Because of the above, I had to change many lines and replace every
local srv variable (a server config) usage with the srv().
2023-10-02 08:20:53 +03:00
guybe7
c2a4b78491
WAITAOF: Update fsynced_reploff_pending even if there's nothing to fsync (#12622)
The problem is that WAITAOF could hang in case commands were
propagated only to replicas.
This can happen if a module uses RM_Call with the REDISMODULE_ARGV_NO_AOF flag.
In that case, master_repl_offset would increase, but there would be nothing to fsync, so
in the absence of other traffic, fsynced_reploff_pending would stay static, and WAITAOF could hang.

This commit updates fsynced_reploff_pending to the latest offset in flushAppendOnlyFile in case
there's nothing to fsync. I.e. in case it's behind because of the above-mentioned case, it'll be refreshed
and release the WAITAOF.

Other changes:
Fix a race in wait.tcl (client getting blocked vs. the fsync thread)
2023-09-28 17:19:20 +03:00
guybe7
bfa3931a04
WAITAOF: Update fsynced_reploff_pending just before starting the initial AOFRW fork (#12620)
If we set `fsynced_reploff_pending` in `startAppendOnly`, and the fork doesn't start
immediately (e.g. there's another fork active at the time), any subsequent commands
will increment `server.master_repl_offset`, but will not cause a fsync (given they were
executed before the fork started, they just ended up in the RDB part of it)
Therefore, any WAITAOF will wait on the new master_repl_offset, but it will time out
because no fsync will be executed.

Release notes:
```
WAITAOF could timeout in the absence of write traffic in case a new AOF is created and
an AOFRW can't immediately start.
This can happen when the appendonly config is changed at runtime, but also after FLUSHALL,
and replica full sync.
```
2023-09-28 17:05:53 +03:00
Viktor Söderqvist
f924bebd83
Rewrite huge printf calls to smaller ones for readability (#12257)
In a long printf call with many placeholders, it's hard to see which argument
belongs to which placeholder.

The long printf-like calls in the INFO and CLIENT commands are rewritten into
pairs of (format, argument). These pairs are then rewritten to a single call with
a long format string and a long list of arguments, using a macro called FMTARGS.

The file `fmtargs.h` is added to the repo.
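
A toy illustration of the call shape (hedged: the real FMTARGS in `fmtargs.h` is variadic and handles many pairs; the fixed-arity macro below only demonstrates the idea of interleaving format fragments with their arguments):
```c
#include <stdio.h>

/* Toy 2-pair version: string-literal concatenation rebuilds the single
 * format string, and the arguments follow in the right order. */
#define FMTARGS2(f1, a1, f2, a2) f1 f2, a1, a2

int main(void) {
    long clients = 42;
    int blocked = 7;
    /* Before: placeholders far from their arguments */
    printf("connected_clients:%ld\r\nblocked_clients:%d\r\n", clients, blocked);
    /* After: each placeholder sits next to its argument */
    printf(FMTARGS2("connected_clients:%ld\r\n", clients,
                    "blocked_clients:%d\r\n", blocked));
    return 0;
}
```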

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2023-09-28 09:21:23 +03:00
Binbin
9fe63bdc80
Dump server logs when corrupt fuzzer reports crash (#12612)
Recently we found some signal crashes, but unable to reproduce them.
It is a good idea to dump the server logs when a failure happens.
2023-09-27 09:08:18 +03:00
Sankar
8cdeddc81c
Clear owner_not_claiming_slot bit for the slot in clusterDelSlot (#12564)
Clear owner_not_claiming_slot bit for the slot in clusterDelSlot to keep it
consistent with slot ownership information.
2023-09-26 14:03:27 -07:00
Nir Rattner
24187ed8e3
Fix overflow calculation for next timer event (#12474)
The `retval` variable is defined as an `int`, so with 4 bytes, it cannot properly represent
microsecond values greater than the equivalent of about 35 minutes. 

This bug shouldn't impact standard Redis behavior because Redis doesn't have timer
events that are scheduled as far as 35 minutes out, but it may affect custom Redis modules
which interact with the event timers via the RM_CreateTimer API.

The impact is that `usUntilEarliestTimer` may return 0 for as long as `retval` is scaled to
an overflowing value. While `usUntilEarliestTimer` continues to return `0`, `aeApiPoll`
will have a zero timeout, and so Redis will use significantly more CPU iterating through
its event loop without pause. For timers scheduled far enough into the future, Redis will
cycle between ~35 minute periods of high CPU usage and ~35 minute periods of standard
CPU usage.
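
A small self-contained demonstration of the overflow (variable names are illustrative): the microsecond count for a timer a little over 35 minutes away no longer fits in a 32-bit int.
```c
#include <limits.h>
#include <stdio.h>

int main(void) {
    long long us = 36LL * 60 * 1000 * 1000; /* 36 minutes in microseconds */
    int retval = (int)us;                   /* 2,160,000,000 > INT_MAX: overflows */
    printf("%lld usec -> int %d (INT_MAX = %d)\n", us, retval, INT_MAX);
    return 0;
}
```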
2023-09-24 13:31:12 +03:00
meiravgri
cc2be63997
Print stack trace from all threads in crash report (#12453)
In this PR we are adding the functionality to collect all the process's threads' backtraces.

## Changes made in this PR

### **introduce threads mngr API**
The **threads mngr API** has 2 abilities:
* `ThreadsManager_init()` - registers to SIGUSR2. Called on the server start-up.
* `ThreadsManager_runOnThreads()` - receives a list of pid_t and a callback, tells every
  thread in the list to invoke the callback, and returns the output collected by each invocation.
**Elaborating the atomicvar API**
* `atomicIncrGet(var,newvalue_var,count)` -- increment and get the atomic counter's new value
* `atomicFlagGetSet` -- get and set the atomic counter value to 1

### **Always set SIGALRM handler**
SIGALRM handler prints the process's stacktrace to the log file. Up until now, it was set only if the
`server.watchdog_period` > 0. This can also be useful if debugging is needed. However, in situations
where the server can't get requests, (a deadlock, for example) we weren't able to change the signal handler.
To make it available at run time we set SIGALRM handler on server startup. The signal handler name was
changed to a more general `sigalrmSignalHandler`.

### **Print all the process' threads' stacktraces**

`logStackTrace()` now calls `writeStacktraces()`, instead of logging the current thread stacktrace.
`writeStacktraces()`:
* On Linux systems we use the threads manager API to collect the backtraces of all the process' threads.
  To get the `tids` list (threads ids) we read the `/proc/<redis-server-pid>/tasks` file which includes a list of directories.
  Each directory name corresponds to one tid (including the main thread). For each thread, we also need to check if it
  can get the signal from the threads manager (meaning it is not blocking/ignoring that signal). We send the threads
  manager this tids list and `collect_stacktrace_data()` callback, which collects the thread's backtrace addresses,
  its name, and tid.
* On other systems, the behavior remained as it was (writing only the current thread stacktrace to the log file).

## compatibility notes
1. **The threads mngr API is only supported in linux.** 
2. glibc earlier than 2.3: We use `syscall(SYS_gettid)` and `syscall(SYS_tgkill...)` because their dedicated
  alternatives (`gettid()` and `tgkill`) were added in glibc 2.3.

## Output example

Each thread backtrace will have the following format:
`<tid> <thread_name> [additional_info]`
* **tid**: as read from the `/proc/<redis-server-pid>/tasks` file
* **thread_name**: the thread name as it is registered in the OS.
* **additional_info**: Sometimes we want to add specific information about one of the threads. Currently
  it is only used to mark the thread that handles the backtrace collection by adding "*".
  In case of a crash, this also indicates which thread caused it. The handling thread won't
  necessarily appear first.

```
------ STACK TRACE ------
EIP:
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]

67089 redis-server *
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb9437790]
/lib/aarch64-linux-gnu/libc.so.6(epoll_pwait+0x9c)[0xffffb9295ebc]
redis-server *:6379(+0x75e0c)[0xaaaac2fe5e0c]
redis-server *:6379(aeProcessEvents+0x18c)[0xaaaac2fe6c00]
redis-server *:6379(aeMain+0x24)[0xaaaac2fe7038]
redis-server *:6379(main+0xe0c)[0xaaaac3001afc]
/lib/aarch64-linux-gnu/libc.so.6(+0x273fc)[0xffffb91d73fc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffffb91d74cc]
redis-server *:6379(_start+0x30)[0xaaaac2fe0370]

67093 bio_lazy_free
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67091 bio_close_file
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]

67092 bio_aof
/lib/aarch64-linux-gnu/libc.so.6(+0x79dfc)[0xffffb9229dfc]
/lib/aarch64-linux-gnu/libc.so.6(pthread_cond_wait+0x208)[0xffffb922c8fc]
redis-server *:6379(bioProcessBackgroundJobs+0x174)[0xaaaac30976e8]
/lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8)[0xffffb922d5c8]
/lib/aarch64-linux-gnu/libc.so.6(+0xe5d1c)[0xffffb9295d1c]
67089:signal-handler (1693824528) --------
```
2023-09-24 09:47:23 +03:00
Chen Tianjie
2aad03fa39
Use server.current_client to decide whether cluster commands should return TLS info. (#12569)
Starting with a change in #12233 (released in 7.2), CLUSTER commands use the client's
connection to decide whether to return TLS port or non-TLS port, but commands
called by Lua script and module's RM_Call don't have a real client with connection,
and would currently be regarded as non-TLS connections.

We can use server.current_client instead when it is available. When it is not (module calls
commands without a real client), we may see this as an undefined behavior, and return null
or default port (currently in this PR it returns default port, judged by server.tls_cluster).
2023-09-21 18:41:32 +03:00
Binbin
4031a18732
Fix that slot return in CLUSTER SHARDS should be integer (#12561)
An unintentional change was introduced in #10536: we used
to use addReplyLongLong and now it is addReplyBulkLonglong;
revert it back to the previous behavior.
2023-09-09 23:33:00 -07:00
Binbin
96e9dec419
Bump codespell from 2.2.4 to 2.2.5 (#12557)
and adjustments.
2023-09-08 16:10:17 +03:00
nihohit
90e9fc387c
Update command tips on more admin / configuration commands (#12545)
Updated the command tips for ACL SAVE / SETUSER / DELUSER, CLIENT SETNAME / SETINFO, and LATENCY RESET.
The tips now match CONFIG SET, since there's a similar behavior for all of these commands - the
user expects to update the various configurations & states on all nodes, not only on a single, random node.
For LATENCY RESET the response tip is now agg_sum.

Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>
2023-09-04 21:30:42 +03:00
secwall
a2046c1eb1
Check shard_id pointer validity in updateShardId (#12538)
When connecting between a 7.0 and 7.2 cluster, the 7.0 cluster will not populate the shard_id field, which is expected on the 7.2 cluster. This is not intended behavior, as the 7.2 cluster is supposed to use a temporary shard_id while the node is in the upgrading state, but it wasn't being correctly set in this case.
2023-09-02 20:14:48 -07:00
alonre24
044e29dd34
redis-benchmark - add the support for binary strings (#9414)
Recently, the option of sending an argument from stdin using `-x` flag
was added to redis-benchmark (this option is available in redis-cli as well).
However, using the `-x` option for sending a blob that contains null characters
doesn't work as expected - the argument is trimmed at the first occurrence of
`\x00` (unlike in redis-cli).  
This PR aims to fix this issue and add the support for every binary string input,
by sending arguments length to `redisFormatCommandArgv` when processing
redis-benchmark command, so we won't treat the arguments as C-strings.

Additionally, we add simple test coverage for `-x` (without binary strings),
remove an excessive server started in tests, and make sure to select db 0
so that `r` and the benchmark work on the same db.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-09-02 15:37:04 +03:00
Binbin
4ba144a4eb
Add logreqres:skip flag to new INFO obuf limit test (#12537)
The new test added in #12476 causes reply-schemas-validator to fail.
When doing `catch {r get key}`, the req-res output is:
```
3
get
3
key
12
__argv_end__
$100000
aaaaaaaaaaaaaaaaaaaa...4
info
5
stats
12
__argv_end__
=1670
txt:# Stats
...
```

And as we can see in the line after `$100000`, there is a 4 at the end,
which breaks the req-res-log-validator script since the format is wrong.

The reason, I guess, is that after the client reconnection (after the output
buf limit), we will not add newlines, but append args directly.
Since obuf-limits.tcl is doing the same thing and it has the logreqres:skip
flag, this PR follows it.
2023-09-01 14:15:11 +03:00
Roshan Khatri
49f7d173b4
Remove unnecessary use of sds and mem copy in module.c (#12533)
Found that in moduleConfigValidityCheck and isModuleConfigNameRegistered, sds is not required. This also allowed us to remove an unnecessary memcpy from some of the config registering APIs.
2023-08-31 14:08:05 -07:00
icy17
370d38016f
Fix potential crash on failed OpenSSL init (#12447) 2023-08-31 22:45:36 +03:00
Chen Tianjie
b26e8e3213
Optimize ZRANGE offset location from linear search to skiplist jump. (#12450)
ZRANGE BYSCORE/BYLEX with the [LIMIT offset count] option was
using every level in the skiplist to jump to the first/last node in range,
but only level[0] in the skiplist to locate the node at offset, resulting
in sub-optimal performance when using LIMIT:
```
while (ln && offset--) {
    if (reverse) {
        ln = ln->backward;
    } else {
        ln = ln->level[0].forward;
    }
}
```
It could be slow when the offset is very big. We can get the total rank of
the offset location and use the skiplist to jump to it. It is an improvement
from O(offset) to O(log rank).

Below shows how this is implemented (if the offset is positive):

Use the skiplist to search for the first element in the range, record its
rank `rank_0`, so we can have the rank of the target node `rank_t`.
Meanwhile we record the last node we visited which has zsl->level-1
levels and its rank `rank_1`. Then we start from the zsl->level-1 node and
use the skiplist to go forward `rank_t-rank_1` nodes to reach the target node.

It is very similar when the offset is reversed.

Note that if `rank_t` is very close to `rank_0`, we just start from the first
element in range and go node by node; this is for the case when the zsl->level-1
node is too far away and it is quicker to reach the target node by node.
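
An illustrative fragment of the jump (assuming the existing skiplist helpers `zslGetRank()` and `zslGetElementByRank()` from t_zset.c; `first` stands for the first in-range node found above):
```c
/* Rank of the first in-range element (ranks are 1-based). */
unsigned long rank_first = zslGetRank(zsl, first->score, first->ele);
/* Jump straight to the OFFSET-th element by rank, O(log N),
 * instead of walking level[0] node by node, O(offset). */
zskiplistNode *ln = zslGetElementByRank(zsl, rank_first + offset);
```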

Here is a test using a random generated zset including 10000 elements
(with different positive scores), doing a benchmark which compares how
fast the `ZRANGE` command is executed before and after the optimization. 

The start score is set to 0 and the count is set to 1 to make sure that
most of the time is spent on locating the offset.
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test 0 +inf byscore limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 73386.02 | 74819.82 |
| 1000 | 48084.96 | 73177.73 |
| 2000 | 31156.79 | 72805.83 |
| 5000 | 10954.83 | 71218.21 |

With the result above, we can see that the original code is greatly
slowed down when offset gets bigger, and with the optimization the
speed is almost not affected.

Similar results are generated when testing reversed offsets:
```
memtier_benchmark -h 127.0.0.1 -p 6379 --command="zrange test +inf 0 byscore rev limit <offset> 1"
```
| offset | QPS(unstable) | QPS(optimized) |
|--------|--------|--------|
| 10 | 74505.14 | 71653.67 |
| 1000 | 46829.25 | 72842.75 |
| 2000 | 28985.48 | 73669.01 |
| 5000 | 11066.22 | 73963.45 | 

And the same conclusion is drawn from the tests of ZRANGE BYLEX.
2023-08-31 14:42:08 +03:00
Binbin
9ce8c54d74
Update sort_ro reply_schema to mention the null reply (#12534)
Also added a test to cover this case, so this can
cover the reply schemas check.
2023-08-31 06:36:35 +03:00
Roshan Khatri
7519960527
Allows modules to declare new ACL categories. (#12486)
This PR adds a new Module API int RM_AddACLCategory(RedisModuleCtx *ctx, const char *category_name) to add a new ACL command category.

Here, we initialize the ACLCommandCategories array by allocating space for 64 categories and duplicate the 21 default categories from the predefined array 'ACLDefaultCommandCategories' into the ACLCommandCategories array during ACL initialization. Valid ACL category names can only contain alphanumeric characters, underscores, and dashes.

The API when called, checks for the onload flag, category name validity, and for duplicate category name if present. If the conditions are satisfied, the API adds the new category to the trailing end of the ACLCommandCategories array and assigns the acl_categories flag bit according to the index at which the category is added.

If any error is encountered the errno is set accordingly by the API.
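
A hedged sketch of how a module might call this from its OnLoad, the only place the onload flag permits it (assuming the conventional module-facing name RedisModule_AddACLCategory for RM_AddACLCategory):
```c
#include "redismodule.h"

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    if (RedisModule_Init(ctx, "mymod", 1, REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    /* Valid names: alphanumerics, underscores, and dashes. */
    if (RedisModule_AddACLCategory(ctx, "my-category") == REDISMODULE_ERR)
        return REDISMODULE_ERR; /* errno describes the failure */
    return REDISMODULE_OK;
}
```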

---------

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2023-08-30 13:01:24 -07:00
bodong.ybd
b59f53efb3
Fix sort_ro get-keys function return wrong key number (#12522)
Before:
```
127.0.0.1:6379> command getkeys sort_ro key
(empty array)
127.0.0.1:6379>
```
After:
```
127.0.0.1:6379> command getkeys sort_ro key
1) "key"
127.0.0.1:6379>
```
2023-08-30 22:00:02 +03:00
Chen Tianjie
e3d4b30d09
Add two stats to count client input and output buffer oom. (#12476)
Add these INFO metrics:
* client_query_buffer_limit_disconnections
* client_output_buffer_limit_disconnections

Sometimes it is useful to monitor whether clients reach the size limits of the
query buffer and output buffer, to decide whether we need to adjust the
buffer size limits or reduce the client query payload.
2023-08-30 21:51:14 +03:00
nihohit
4b281ce519
Align CONFIG RESETSTAT/REWRITE tips with SET. (#12530)
Since the three commands have similar behavior (change config, return
OK), the tips that govern how they should behave should be similar.

Co-authored-by: Shachar Langbeheim <shachlan@amazon.com>
2023-08-30 21:49:02 +03:00
Binbin
e792653753
Add printing for LATENCY related tests (#12514)
This test failed several times:
```
*** [err]: LATENCY GRAPH can output the event graph in tests/unit/latency-monitor.tcl
Expected '478' to be more than or equal to '500' (context: type eval
line 8 cmd {assert_morethan_equal $high 500} proc ::test)
```

Not sure why; adding some verbose printing that'll print the command
result the next time it fails.
2023-08-27 11:42:55 +03:00
Danilo Bargen
a6eff389b5
redis.conf: Add data loss warning to "appendonly" (#12506)
A warning against editing the config file and restarting the server,
which will attempt to load an AOF file and disregard the RDB.

Co-authored-by: Oran Agra <oran@redislabs.com>
2023-08-22 18:15:47 +03:00
Binbin
1407ac1f3e
BITCOUNT and BITPOS with non-existing key and illegal arguments should return error, not 0 (#11734)
BITCOUNT and BITPOS with a non-existing key will return 0 even if the
arguments are invalid; before this commit:
```
> flushall
OK
> bitcount s 0
(integer) 0
> bitpos s 0 0 1 hello
(integer) 0

> set s 1
OK
> bitcount s 0
(error) ERR syntax error
> bitpos s 0 0 1 hello
(error) ERR syntax error
```

The reason is that we checked for the non-existing key before parameter checking and
returned early. This PR fixes it, and after this commit:
```
> flushall
OK
> bitcount s 0
(error) ERR syntax error
> bitpos s 0 0 1 hello
(error) ERR syntax error
```

Also BITPOS got the same fix as #12394: check for wrong arguments before
checking for the key.
```
> lpush mylist a b c
(integer) 3                                                                                    
> bitpos mylist 1 a b
(error) WRONGTYPE Operation against a key holding the wrong kind of value
```
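
A self-contained toy sketch of the fixed ordering (names and replies are illustrative, not the real bitcountCommand): arguments are validated before the key lookup, so a missing key only yields 0 when the arguments were valid.
```c
#include <stdio.h>
#include <stdlib.h>

/* Toy parser: returns 0 on success. */
static int parseLong(const char *s, long *out) {
    char *end;
    *out = strtol(s, &end, 10);
    return (*end == '\0' && end != s) ? 0 : -1;
}

static const char *bitcountSketch(int keyExists, const char *startArg,
                                  const char *endArg) {
    long start, end;
    /* 1. Validate arguments first: syntax errors win over missing keys. */
    if (parseLong(startArg, &start) || parseLong(endArg, &end))
        return "(error) ERR value is not an integer or out of range";
    /* 2. Only now is 0 the correct answer for a non-existing key. */
    if (!keyExists) return "(integer) 0";
    return "(integer) <popcount of the requested range>";
}

int main(void) {
    puts(bitcountSketch(0, "invalid1", "invalid2")); /* error, not 0 */
    puts(bitcountSketch(0, "0", "-1"));              /* (integer) 0 */
    return 0;
}
```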
2023-08-21 19:48:30 +03:00
Wen Hui
45d3310694
BITCOUNT: check for argument, before checking for key (#12394)
Generally, in any command we first check the arguments and then check if the key exists.

Some of the examples are

```
127.0.0.1:6379> getrange no-key invalid1 invalid2
(error) ERR value is not an integer or out of range
127.0.0.1:6379> setbit no-key 1 invalid
(error) ERR bit is not an integer or out of range
127.0.0.1:6379> xrange no-key invalid1 invalid2
(error) ERR Invalid stream ID specified as stream command argument
```

**Before change** 
```
bitcount no-key invalid1 invalid2
0
```

**After change**
```
bitcount no-key invalid1 invalid2
(error) ERR value is not an integer or out of range
```
2023-08-21 12:53:46 +03:00
Binbin
c98a28a848
Fix LREM count LONG_MIN overflow minor issue (#12465)
Limit the range of LREM count to -LONG_MAX ~ LONG_MAX.
Before the fix, passing -LONG_MAX would cause an overflow
and would effectively be the same as passing 0 (because
the condition `toremove && removed == toremove` can never
be satisfied).

This is a minor fix as it shouldn't really affect users,
more like a cleanup.
2023-08-21 12:50:41 +03:00
Yves LeBras
16988208bd
config.memkeys init for consistency (#12505)
Initializing `memkeys` to 0 for consistency and clarity.
The config struct is zeroed anyway, but other fields are explicitly initialized.
2023-08-21 08:17:07 +03:00
Wen Hui
e532c95dfc
Added tests for Client commands (#10276)
Our test suite currently misses some test coverage for client sub-commands.
This PR's goal is to add some test coverage for the following commands:

Client caching
Client kill
Client no-evict
Client pause
Client reply
Client tracking
Client setname

At the very least, this is useful to make sure there are no leaks and crashes in these code paths.
2023-08-20 19:17:51 +03:00
meiravgri
fe47c2027b
Signal handler attributes (#12426)
This PR's purpose is to make the crash report process thread safe.
Main changes include:

1. `setupSigSegvHandler()` is introduced to initialize the signal handler.
This function first initializes the signal handler mutex (if not initialized yet)
and then registers the process to the signal handler. 

2. **sigsegvHandler** flags :
SA_NODEFER - don't add the signal to the process signal mask. We use this
flag because we want to be able to handle a second call to the signal manually.
removed SA_RESETHAND: this flag resets the signal handler function upon the first
entrance to the registered function. The reason to use this flag is to protect from
recursively entering the signal handler by the same thread. But, it also means
that if a second thread crashes while handling a signal, the process will be
terminated immediately and we won't get the crash report.
In this PR we discard this flag. The signal handler guard described below purpose
is to solve the above issues.

3. Add a **signal handler lock** with ERRORCHECK attributes. 
The lock's purpose is to ensure that only one thread generates a crash report.
Once a second thread enters the signal handler it will be blocked.
We use the ERRORCHECK lock in order to protect from possible deadlock in
case the thread handling the crash gets a signal. In the latter scenario, we log
what we have collected until the handler crashed.

At the end of the crash report we reset the signal handler SIG_DFL, with no flags, and
rethrow the signal to generate a core dump (if enabled) and exit the process.

During the work on this PR we wanted to understand the historical reasons for
how crashes are handled.
With respect to the choice of the flag, we believe the **SA_RESETHAND** was not
added for any specific purpose.
**SA_ONSTACK** which is removed here from bugReportEnd(), was originally also
set in the initial registration to signal handler, but removed in 3ada43e73. In addition,
it was removed from another location in deee2c1ef with the following description,
which is also relevant to why it should be removed from bugReportEnd:

> it seems to be some valgrind bug with SA_ONSTACK.
> SA_ONSTACK seems unneeded since WD is not recursive (SA_NODEFER was removed),
> also, not sure if it's even valid without a call to sigaltstack()
2023-08-20 19:16:45 +03:00
Binbin
44cc0fcb9d
redis-cli --stat take dbnum value from CONFIG GET to output total keys (#12279)
In the past we hardcoded it to 20, causing keys in databases
beyond that to not be counted.
2023-08-16 10:54:37 +03:00
Tyler Bream (Event pipeline)
ac6bc5d1a8
redis-cli: Fix print of keys per cluster host when over int max (#11698)
When running cluster info, and the number of keys overflows the integer
value, the summary no longer makes sense. This fixes it by using an appropriate
type to handle values over the max int value.
2023-08-16 10:48:49 +03:00
WangYu
17904780ae
skip the rehashed entries in dictNext (#12386)
If the dict is rehashing, the entries in the head of table[0] are moved to table[1]
and all entries in `table[0][0:rehashidx]` are NULL.

`dictNext` starts looking for a non-NULL entry from table 0 index 0, and the first call
of `dictNext` on a rehashing dict will iterate many times to skip those NULL entries.
We can easily skip those entries by setting `iter->index` to `iter->d->rehashidx` when the
dict is rehashing and it's the first call of dictNext (`iter->index == -1 && iter->table == 0`).
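
A hedged sketch of the first-call shortcut (simplified iterator, not the real dict.c types): start scanning table 0 at rehashidx, since buckets [0, rehashidx) are guaranteed NULL during rehashing.
```c
typedef struct iterSketch {
    long index;     /* -1 before the first dictNext call */
    int table;      /* 0 or 1 */
    long rehashidx; /* copy of d->rehashidx; -1 if not rehashing */
} iterSketch;

static void advanceIndex(iterSketch *it) {
    if (it->index == -1 && it->table == 0 && it->rehashidx != -1) {
        it->index = it->rehashidx; /* skip the migrated NULL prefix in one step */
    } else {
        it->index++;
    }
}
```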

Co-authored-by: sundb <sundbcn@gmail.com>
2023-08-16 10:45:26 +03:00
Wen Hui
965dc90b72
change return type to be consistent (#12479)
Currently rdbSaveMillisecondTime and rdbSaveDoubleValue's return type is
int, but they return the value directly from the rdbWriteRaw function, which has a
return type of ssize_t. As this may overflow an int, the type was changed to ssize_t.
2023-08-16 10:38:59 +03:00
Oran Agra
2b8cde71bb
Update supported version list. (#12488)
Add 7.2, drop 6.0 as per https://redis.io/docs/about/releases/
Also replace a few occurrences of the `’` char with the standard `'`
2023-08-16 08:36:40 +03:00
780 changed files with 101583 additions and 64922 deletions

76
.cmake-format.yaml Normal file

@ -0,0 +1,76 @@
format:
_help_line_width:
- How wide to allow formatted cmake files
line_width: 120
_help_tab_size:
- How many spaces to tab for indent
tab_size: 4
_help_use_tabchars:
- If true, lines are indented using tab characters (utf-8
- 0x09) instead of <tab_size> space characters (utf-8 0x20).
- In cases where the layout would require a fractional tab
- character, the behavior of the fractional indentation is
- governed by <fractional_tab_policy>
use_tabchars: false
_help_separate_ctrl_name_with_space:
- If true, separate flow control names from their parentheses
- with a space
separate_ctrl_name_with_space: true
_help_min_prefix_chars:
- If the statement spelling length (including space and
- parenthesis) is smaller than this amount, then force reject
- nested layouts.
min_prefix_chars: 4
_help_max_prefix_chars:
- If the statement spelling length (including space and
- parenthesis) is larger than the tab width by more than this
- amount, then force reject un-nested layouts.
max_prefix_chars: 10
_help_max_lines_hwrap:
- If a candidate layout is wrapped horizontally but it exceeds
- this many lines, then reject the layout.
max_lines_hwrap: 2
_help_line_ending:
- What style line endings to use in the output.
line_ending: unix
_help_command_case:
- Format command names consistently as 'lower' or 'upper' case
command_case: lower
_help_keyword_case:
- Format keywords consistently as 'lower' or 'upper' case
keyword_case: unchanged
_help_always_wrap:
- A list of command names which should always be wrapped
always_wrap: []
_help_enable_sort:
- If true, the argument lists which are known to be sortable
- will be sorted lexicographicall
enable_sort: true
_help_autosort:
- If true, the parsers may infer whether or not an argument
- list is sortable (without annotation).
autosort: false
_help_require_valid_layout:
- By default, if cmake-format cannot successfully fit
- everything into the desired linewidth it will apply the
- last, most aggressive attempt that it made. If this flag is
- True, however, cmake-format will print error, exit with non-
- zero status code, and write-out nothing
require_valid_layout: false
_help_layout_passes:
- A dictionary mapping layout nodes to a list of wrap
- decisions. See the documentation for more information.
layout_passes: {}
encode:
_help_emit_byteorder_mark:
- If true, emit the unicode byte-order mark (BOM) at the start
- of the file
emit_byteorder_mark: false
_help_input_encoding:
- Specify the encoding of the input file. Defaults to utf-8
input_encoding: utf-8
_help_output_encoding:
- Specify the encoding of the output file. Defaults to utf-8.
- Note that cmake only claims to support utf-8 so be careful
- when using anything else
output_encoding: utf-8


@ -1,5 +0,0 @@
[codespell]
quiet-level = 2
count =
skip = ./deps,./src/crc16_slottable.h,tmp*,./.git,./lcov-html
ignore-words = ./.codespell/wordlist.txt


@ -1 +0,0 @@
codespell==2.2.4


@ -1,21 +0,0 @@
ake
bale
fle
fo
gameboy
mutli
nd
nees
oll
optin
ot
smove
te
tre
cancelability
ist
statics
filetest
ro
exat
clen

7
.config/format.yml Normal file

@ -0,0 +1,7 @@
formatter:
type: basic
indent: 2
retain_line_breaks_single: true
exclude:
- "deps/"

65
.config/typos.toml Normal file

@ -0,0 +1,65 @@
# See https://github.com/crate-ci/typos/blob/master/docs/reference.md to configure typos
[files]
extend-exclude = [
".git/",
"deps/",
# crc16_slottable is primarily pre-generated random strings.
"src/crc16_slottable.h",
]
ignore-hidden = false
[default.extend-words]
exat = "exat"
optin = "optin"
smove = "smove"
[type.c]
extend-ignore-re = [
"BA3E2571", # sha1.c
"D4C4DAA4", # sha1.c
"Georg Nees",
"\\[l\\]ist", # eval.c
'"LKE"', # test_rax.c
]
[type.tcl]
extend-ignore-re = [
"DUMPed",
]
[type.c.extend-identifiers]
advices = "advices"
clen = "clen"
fle = "fle"
nd = "nd"
ot = "ot"
[type.tcl.extend-identifiers]
oll = "oll"
stressers = "stressers"
[type.sv.extend-identifiers]
# sv = .h
fo = "fo"
[type.sv.extend-words]
# sv = .h
seeked = "seeked"
[type.c.extend-words]
arange = "arange"
fo = "fo"
frst = "frst"
limite = "limite"
pn = "pn"
seeked = "seeked"
tre = "tre"
[type.systemd.extend-words]
# systemd = .conf
ake = "ake"
[type.tcl.extend-words]
fo = "fo"
tre = "tre"

17
.git-blame-ignore-revs Normal file

@ -0,0 +1,17 @@
# This is a file that can be used by git-blame to ignore some revisions.
# (git 2.23+, released in August 2019)
#
# Can be configured as follow:
#
# $ git config blame.ignoreRevsFile .git-blame-ignore-revs
#
# For more information you can look at git-blame(1) man page.
# Applied clang-format (#323)
c41dd77a3e93e02be3c4bc75d8c76b7b4169a4ce
# Removed terms `master` and `slave` from the source code (#591)
54c97479356ecf41b4b63733494a1be2ab919e17
# Set ColumnLimit to 0 and reformat (#1045)
af811748e7819a5ac31a6df4b21622aa58c64ae4


@ -1,6 +1,6 @@
---
name: Bug report
about: Help us improve Redis by reporting a bug
about: Help us improve by reporting a bug
title: '[BUG]'
labels: ''
assignees: ''

17
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@ -0,0 +1,17 @@
blank_issues_enabled: true
contact_links:
- name: Questions?
url: https://github.com/valkey-io/valkey/discussions
about: Ask and answer questions on GitHub Discussions.
- name: Chat with us on Discord?
url: https://discord.gg/zbcPa5umUB
about: We are on Discord!
- name: Chat with us on Matrix?
url: https://matrix.to/#/#valkey:matrix.org
about: We are on Matrix too!
- name: Chat with us on Slack?
url: https://join.slack.com/t/valkey-oss-developer/shared_invite/zt-2nxs51chx-EB9hu9Qdch3GMfRcztTSkQ
about: We are on Slack too!
- name: Documentation issue?
url: https://github.com/valkey-io/valkey-doc/issues
about: Report it on the valkey-doc repo.


@ -1,25 +0,0 @@
---
name: Crash report
about: Submit a crash report
title: '[CRASH] <short description>'
labels: ''
assignees: ''
---
Notice!
- If a Redis module was involved, please open an issue in the module's repo instead!
- If you're using docker on Apple M1, please make sure the image you're using was compiled for ARM!
**Crash report**
Paste the complete crash log between the quotes below. Please include a few lines from the log preceding the crash report to provide some context.
```
```
**Additional information**
1. OS distribution and version
2. Steps to reproduce (if any)

34
.github/ISSUE_TEMPLATE/crash_report.yml vendored Normal file

@ -0,0 +1,34 @@
name: Crash report
description: Submit a crash report
title: '[CRASH] <short description>'
body:
- type: markdown
attributes:
value: |
Thanks for taking the time to post your crash report!
Please notice:
- If a module was involved, please open an issue in the module's repo instead!
- If you're using docker on Apple M1, please make sure the image you're using was compiled for ARM!
- type: textarea
id: crash-report
attributes:
label: Crash report
description: Paste the complete crash log. Please include a few lines from the log preceding the crash report to provide some context.
render: shell
validations:
required: true
- type: markdown
attributes:
value: |
Please provide the following additional information below:
- OS distribution and version
- Steps to reproduce (if any)
- type: textarea
id: additional-information
attributes:
label: Additional information
description: OS version, steps to reproduce and other useful info.
render: shell
validations:
required: false


@ -1,6 +1,6 @@
---
name: Feature request
about: Suggest a feature for Redis
about: Suggest a feature
title: '[NEW]'
labels: ''
assignees: ''


@ -1,8 +0,0 @@
---
name: Other
about: Can't find the right issue type? Use this one!
title: ''
labels: ''
assignees: ''
---


@ -1,21 +0,0 @@
---
name: Question
about: Ask the Redis developers
title: '[QUESTION]'
labels: ''
assignees: ''
---
Please keep in mind that this issue tracker should be used for reporting bugs or proposing improvements to the Redis server.
Generally, questions about using Redis should be directed to the [community](https://redis.io/community):
* [the mailing list](https://groups.google.com/forum/#!forum/redis-db)
* [the `redis` tag at StackOverflow](http://stackoverflow.com/questions/tagged/redis)
* [/r/redis subreddit](http://www.reddit.com/r/redis)
* [github discussions](https://github.com/redis/redis/discussions)
It is also possible that your question was already asked here, so please do a quick issues search before submitting. Lastly, if your question is about one of Redis' [clients](https://redis.io/clients), you may to contact your client's developers for help.
That said, please feel free to replace all this with your question :)


@ -0,0 +1,44 @@
name: Generate target matrix.
description: Matrix creation for building Valkey for different architectures and platforms.
inputs:
ref:
description: The commit, tag or branch of Valkey to checkout to determine what version to use.
required: true
outputs:
x86_64-build-matrix:
description: The x86_64 build matrix.
value: ${{ steps.set-matrix.outputs.x86matrix }}
arm64-build-matrix:
description: The arm64 build matrix.
value: ${{ steps.set-matrix.outputs.armmatrix }}
runs:
using: "composite"
steps:
- name: Checkout code for version check
uses: actions/checkout@v4
with:
ref: ${{ inputs.ref }}
path: version-check
- name: Get targets
run: |
x86_arch=$(jq -c '[.linux_targets[] | select(.arch=="x86_64")]' .github/actions/generate-package-build-matrix/build-config.json)
x86_matrix=$(echo "{ \"distro\" : $x86_arch }" | jq -c .)
echo "X86_MATRIX=$x86_matrix" >> $GITHUB_ENV
arm_arch=$(jq -c '[.linux_targets[] | select(.arch=="arm64")]' .github/actions/generate-package-build-matrix/build-config.json)
arm_matrix=$(echo "{ \"distro\" : $arm_arch }" | jq -c .)
echo "ARM_MATRIX=$arm_matrix" >> $GITHUB_ENV
shell: bash
- id: set-matrix
run: |
echo $X86_MATRIX
echo $X86_MATRIX| jq .
echo "x86matrix=$X86_MATRIX" >> $GITHUB_OUTPUT
echo $ARM_MATRIX
echo $ARM_MATRIX| jq .
echo "armmatrix=$ARM_MATRIX" >> $GITHUB_OUTPUT
shell: bash


@ -0,0 +1,35 @@
{
"linux_targets": [
{
"arch": "x86_64",
"target": "ubuntu-20.04",
"type": "deb",
"platform": "focal"
},
{
"arch": "x86_64",
"target": "ubuntu-22.04",
"type": "deb",
"platform": "jammy"
},
{
"arch": "x86_64",
"target": "ubuntu-24.04",
"type": "deb",
"platform": "noble"
},
{
"arch": "arm64",
"target": "ubuntu20.04",
"type": "deb",
"platform": "focal"
},
{
"arch": "arm64",
"target": "ubuntu22.04",
"type": "deb",
"platform": "jammy"
}
]
}


@ -2,14 +2,9 @@
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
- package-ecosystem: github-actions
directory: /
schedule:
interval: weekly
- package-ecosystem: pip
directory: /.codespell
schedule:
interval: weekly


@ -0,0 +1,112 @@
name: Build Release Packages
on:
release:
types: [published]
push:
paths:
- '.github/workflows/build-release-packages.yml'
- '.github/workflows/call-build-linux-arm-packages.yml'
- '.github/workflows/call-build-linux-x86-packages.yml'
- '.github/actions/generate-package-build-matrix/build-config.json'
workflow_dispatch:
inputs:
version:
description: Version of Valkey to build
required: true
permissions:
id-token: write
contents: read
jobs:
# This job provides the version metadata from the tag for the other jobs to use.
release-build-get-meta:
name: Get metadata to build
if: github.event_name == 'workflow_dispatch' || github.repository == 'valkey-io/valkey'
runs-on: ubuntu-latest
outputs:
version: ${{ steps.get_version.outputs.VERSION }}
is_test: ${{ steps.check-if-testing.outputs.IS_TEST }}
steps:
- run: |
echo "Version: ${{ inputs.version || github.ref_name }}"
shell: bash
# This step is to consolidate the three different triggers into a single "version"
# 1. If manual dispatch - use the version provided.
# 3. If tag trigger, use that tag.
- name: Get the version
id: get_version
run: |
if [[ "${{ github.event_name }}" == "push" ]]; then
VERSION=${{ github.ref_name }}
else
VERSION="${INPUT_VERSION}"
fi
if [ -z "${VERSION}" ]; then
echo "Error: No version specified"
exit 1
fi
echo "VERSION=$VERSION" >> $GITHUB_OUTPUT
shell: bash
env:
# Use the dispatch variable in preference, if empty use the context ref_name which should
# only ever be a tag
INPUT_VERSION: ${{ inputs.version || github.ref_name }}
- name: Check if we are testing
id: check-if-testing
run: |
if [[ "${{ github.event_name }}" == "push" ]]; then
echo "This is a test workflow -> We will upload to the Test S3 Bucket"
echo "IS_TEST=true" >> $GITHUB_OUTPUT
else
echo "This is a Release workflow -> We will upload to the Release S3 Bucket"
echo "IS_TEST=false" >> $GITHUB_OUTPUT
fi
shell: bash
generate-build-matrix:
name: Generating build matrix
if: github.event_name == 'workflow_dispatch' || github.repository == 'valkey-io/valkey'
runs-on: ubuntu-latest
outputs:
x86_64-build-matrix: ${{ steps.set-matrix.outputs.x86_64-build-matrix }}
arm64-build-matrix: ${{ steps.set-matrix.outputs.arm64-build-matrix }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
# Set up the list of targets to build so we can pass the JSON to the reusable job
- uses: ./.github/actions/generate-package-build-matrix
id: set-matrix
with:
ref: ${{ inputs.version || github.ref_name }}
release-build-linux-x86-packages:
needs:
- release-build-get-meta
- generate-build-matrix
uses: ./.github/workflows/call-build-linux-x86-packages.yml
with:
version: ${{ needs.release-build-get-meta.outputs.version }}
ref: ${{ inputs.version || github.ref_name }}
build_matrix: ${{ needs.generate-build-matrix.outputs.x86_64-build-matrix }}
region: us-west-2
secrets:
bucket_name: ${{ needs.release-build-get-meta.outputs.is_test == 'true' && secrets.AWS_S3_TEST_BUCKET || secrets.AWS_S3_BUCKET }}
role_to_assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
release-build-linux-arm-packages:
needs:
- release-build-get-meta
- generate-build-matrix
uses: ./.github/workflows/call-build-linux-arm-packages.yml
with:
version: ${{ needs.release-build-get-meta.outputs.version }}
ref: ${{ inputs.version || github.ref_name }}
build_matrix: ${{ needs.generate-build-matrix.outputs.arm64-build-matrix }}
region: us-west-2
secrets:
bucket_name: ${{ needs.release-build-get-meta.outputs.is_test == 'true' && secrets.AWS_S3_TEST_BUCKET || secrets.AWS_S3_BUCKET }}
role_to_assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
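The version-selection logic above is plain bash and can be reasoned about in isolation; a minimal sketch, where GITHUB_EVENT_NAME, GITHUB_REF_NAME and INPUT_VERSION stand in for the workflow context (values illustrative):
```sh
GITHUB_EVENT_NAME="workflow_dispatch"
GITHUB_REF_NAME=""
INPUT_VERSION="8.0.0"   # would be inputs.version || github.ref_name in the workflow
if [[ "$GITHUB_EVENT_NAME" == "push" ]]; then
    VERSION="$GITHUB_REF_NAME"
else
    VERSION="$INPUT_VERSION"
fi
if [ -z "$VERSION" ]; then
    echo "Error: No version specified"
    exit 1
fi
echo "VERSION=$VERSION"   # -> VERSION=8.0.0
```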

View File

@ -0,0 +1,74 @@
name: Builds Linux arm binary packages into S3 bucket.
on:
workflow_call:
inputs:
version:
description: The version of Valkey to create.
type: string
required: true
ref:
description: The commit, tag or branch of Valkey to checkout for building that creates the version above.
type: string
required: true
build_matrix:
description: The build targets to produce as a JSON matrix.
type: string
required: true
region:
description: The AWS region to push packages into.
type: string
required: true
secrets:
bucket_name:
description: The S3 bucket to push packages into.
required: true
role_to_assume:
description: The role to assume for the S3 bucket.
required: true
permissions:
id-token: write
contents: read
jobs:
build-valkey:
# Capture source tarball and generate checksum for it
name: Build package ${{ matrix.distro.target }} ${{ matrix.distro.arch }}
runs-on: "ubuntu-latest"
strategy:
fail-fast: false
matrix: ${{ fromJSON(inputs.build_matrix) }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: ${{ inputs.version }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: ${{ inputs.region }}
role-to-assume: ${{ secrets.role_to_assume }}
- name: Make Valkey
uses: uraimo/run-on-arch-action@v2
with:
arch: aarch64
distro: ${{matrix.distro.target}}
install: apt-get update && apt-get install -y build-essential libssl-dev libsystemd-dev
run: make -C src all BUILD_TLS=yes USE_SYSTEMD=yes
- name: Create Tarball and SHA256sums
run: |
TAR_FILE_NAME=valkey-${{inputs.version}}-${{matrix.distro.platform}}-${{ matrix.distro.arch}}
mkdir -p "$TAR_FILE_NAME/bin" "$TAR_FILE_NAME/share"
rsync -av --exclude='*.c' --exclude='*.d' --exclude='*.o' src/valkey-* "$TAR_FILE_NAME/bin/"
cp -v /home/runner/work/valkey/valkey/COPYING "$TAR_FILE_NAME/share/LICENSE"
tar -czvf $TAR_FILE_NAME.tar.gz $TAR_FILE_NAME
sha256sum $TAR_FILE_NAME.tar.gz > $TAR_FILE_NAME.tar.gz.sha256
mkdir -p packages-files
cp -rfv $TAR_FILE_NAME.tar* packages-files/
- name: Sync to S3
run: aws s3 sync packages-files s3://${{ secrets.bucket_name }}/releases/
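Consumers can verify a downloaded artifact against the checksum that is uploaded next to it; a short sketch with hypothetical file names:
```sh
# File names are illustrative; each tarball ships with a matching .sha256 file.
sha256sum -c valkey-8.0.0-jammy-arm64.tar.gz.sha256
tar -tzf valkey-8.0.0-jammy-arm64.tar.gz | head   # expect bin/ binaries and share/LICENSE
```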

View File

@ -0,0 +1,72 @@
name: Builds Linux X86 binary packages into S3 bucket.
on:
workflow_call:
inputs:
version:
description: The version of Valkey to create.
type: string
required: true
ref:
description: The commit, tag or branch of Valkey to checkout for building that creates the version above.
type: string
required: true
build_matrix:
description: The build targets to produce as a JSON matrix.
type: string
required: true
region:
description: The AWS region to upload the packages to.
type: string
required: true
secrets:
bucket_name:
description: The name of the S3 bucket to upload the packages to.
required: true
role_to_assume:
description: The role to assume for the S3 bucket.
required: true
permissions:
id-token: write
contents: read
jobs:
build-valkey:
# Capture source tarball and generate checksum for it
name: Build package ${{ matrix.distro.target }} ${{ matrix.distro.arch }}
runs-on: ${{matrix.distro.target}}
strategy:
fail-fast: false
matrix: ${{ fromJSON(inputs.build_matrix) }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
ref: ${{ inputs.version }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: ${{ inputs.region }}
role-to-assume: ${{ secrets.role_to_assume }}
- name: Install dependencies
run: sudo apt-get update && sudo apt-get install -y build-essential libssl-dev libsystemd-dev
- name: Make Valkey
run: make -C src all BUILD_TLS=yes USE_SYSTEMD=yes
- name: Create Tarball and SHA256sums
run: |
TAR_FILE_NAME=valkey-${{inputs.version}}-${{matrix.distro.platform}}-${{ matrix.distro.arch}}
mkdir -p "$TAR_FILE_NAME/bin" "$TAR_FILE_NAME/share"
rsync -av --exclude='*.c' --exclude='*.d' --exclude='*.o' src/valkey-* "$TAR_FILE_NAME/bin/"
cp -v /home/runner/work/valkey/valkey/COPYING "$TAR_FILE_NAME/share/LICENSE"
tar -czvf $TAR_FILE_NAME.tar.gz $TAR_FILE_NAME
sha256sum $TAR_FILE_NAME.tar.gz > $TAR_FILE_NAME.tar.gz.sha256
mkdir -p packages-files
cp -rfv $TAR_FILE_NAME.tar* packages-files/
- name: Sync to S3
run: aws s3 sync packages-files s3://${{ secrets.bucket_name }}/releases/
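Nothing in the packaging steps is Actions-specific, so the same tarball can be reproduced locally; a sketch for Debian/Ubuntu (version and platform names are illustrative):
```sh
sudo apt-get update && sudo apt-get install -y build-essential libssl-dev libsystemd-dev rsync
make -C src all BUILD_TLS=yes USE_SYSTEMD=yes
TAR_FILE_NAME=valkey-8.0.0-noble-x86_64
mkdir -p "$TAR_FILE_NAME/bin"
rsync -av --exclude='*.c' --exclude='*.d' --exclude='*.o' src/valkey-* "$TAR_FILE_NAME/bin/"
tar -czvf "$TAR_FILE_NAME.tar.gz" "$TAR_FILE_NAME"
sha256sum "$TAR_FILE_NAME.tar.gz" > "$TAR_FILE_NAME.tar.gz.sha256"
```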

View File

@ -2,83 +2,190 @@ name: CI
on: [push, pull_request]
jobs:
concurrency:
group: ci-${{ github.head_ref || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
test-ubuntu-latest:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: make
# Fail build if there are warnings
# build with TLS just for compilation coverage
run: make REDIS_CFLAGS='-Werror' BUILD_TLS=yes
- name: test
run: |
sudo apt-get install tcl8.6 tclx
./runtest --verbose --tags -slow --dump-logs
- name: module api test
run: ./runtest-moduleapi --verbose --dump-logs
- name: validate commands.def up to date
run: |
touch src/commands/ping.json
make commands.def
dirty=$(git diff)
if [[ ! -z $dirty ]]; then echo $dirty; exit 1; fi
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
# Fail build if there are warnings
# build with TLS just for compilation coverage
run: make -j4 all-with-unit-tests SERVER_CFLAGS='-Werror' BUILD_TLS=yes USE_FAST_FLOAT=yes
- name: install old server for compatibility testing
run: |
cd tests/tmp
wget https://download.valkey.io/releases/valkey-7.2.7-noble-x86_64.tar.gz
tar -xvf valkey-7.2.7-noble-x86_64.tar.gz
- name: test
run: |
sudo apt-get install tcl8.6 tclx
./runtest --verbose --tags -slow --dump-logs --other-server-path tests/tmp/valkey-7.2.7-noble-x86_64/bin/valkey-server
- name: module api test
run: CFLAGS='-Werror' ./runtest-moduleapi --verbose --dump-logs
- name: validate commands.def up to date
run: |
touch src/commands/ping.json
make commands.def
dirty=$(git diff)
if [[ ! -z $dirty ]]; then echo $dirty; exit 1; fi
- name: unit tests
run: |
./src/valkey-unit-tests
test-ubuntu-latest-cmake:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: cmake and make
run: |
sudo apt-get install -y cmake libssl-dev
mkdir -p build-release
cd build-release
cmake -DCMAKE_BUILD_TYPE=Release .. -DBUILD_TLS=yes -DBUILD_UNIT_TESTS=yes
make -j$(nproc)
- name: test
run: |
sudo apt-get install -y tcl8.6 tclx
ln -sf $(pwd)/build-release/bin/valkey-server $(pwd)/src/valkey-server
ln -sf $(pwd)/build-release/bin/valkey-cli $(pwd)/src/valkey-cli
ln -sf $(pwd)/build-release/bin/valkey-benchmark $(pwd)/src/valkey-benchmark
ln -sf $(pwd)/build-release/bin/valkey-server $(pwd)/src/valkey-check-aof
ln -sf $(pwd)/build-release/bin/valkey-server $(pwd)/src/valkey-check-rdb
ln -sf $(pwd)/build-release/bin/valkey-server $(pwd)/src/valkey-sentinel
./runtest --verbose --tags -slow --dump-logs
- name: unit tests
run: |
./build-release/bin/valkey-unit-tests
test-sanitizer-address:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
# build with TLS module just for compilation coverage
run: make SANITIZER=address REDIS_CFLAGS='-Werror' BUILD_TLS=module
run: make -j4 SANITIZER=address SERVER_CFLAGS='-Werror' BUILD_TLS=module
- name: testprep
run: sudo apt-get install tcl8.6 tclx -y
- name: test
run: ./runtest --verbose --tags -slow --dump-logs
- name: module api test
run: ./runtest-moduleapi --verbose --dump-logs
run: CFLAGS='-Werror' ./runtest-moduleapi --verbose --dump-logs
test-rdma:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: prepare-development-libraries
run: sudo apt-get install librdmacm-dev libibverbs-dev
- name: make-rdma-module
run: make -j4 BUILD_RDMA=module
- name: make-rdma-builtin
run: |
make distclean
make -j4 BUILD_RDMA=yes
- name: clone-rxe-kmod
run: |
mkdir -p tests/rdma/rxe
git clone https://github.com/pizhenwei/rxe.git tests/rdma/rxe
make -C tests/rdma/rxe
- name: clear-kernel-log
run: sudo dmesg -c > /dev/null
- name: test
run: sudo ./runtest-rdma --install-rxe
- name: show-kernel-log
run: sudo dmesg -c
build-debian-old:
runs-on: ubuntu-latest
container: debian:buster
steps:
- uses: actions/checkout@v3
- name: make
run: |
apt-get update && apt-get install -y build-essential
make REDIS_CFLAGS='-Werror'
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
run: |
apt-get update && apt-get install -y build-essential
make -j4 SERVER_CFLAGS='-Werror'
build-macos-latest:
runs-on: macos-latest
steps:
- uses: actions/checkout@v3
- name: make
run: make REDIS_CFLAGS='-Werror'
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
# Build with additional upcoming features
run: make -j3 all-with-unit-tests SERVER_CFLAGS='-Werror' USE_FAST_FLOAT=yes
build-32bit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: make
run: |
sudo apt-get update && sudo apt-get install libc6-dev-i386
make REDIS_CFLAGS='-Werror' 32bit
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
# Fast float requires C++ 32-bit libraries to compile on a 64-bit ubuntu
# machine, i.e. the "-cross" suffixed version. Cross-compiling C++ to 32-bit
# also requires multilib support for the g++ compiler, i.e. the "-multilib"
# suffixed version of g++. g++-multilib generally includes the libstdc++
# *-cross version as well, but it is also added explicitly just in case.
run: |
sudo apt-get update
sudo apt-get install libc6-dev-i386 libstdc++-11-dev-i386-cross gcc-multilib g++-multilib
make -j4 SERVER_CFLAGS='-Werror' 32bit USE_FAST_FLOAT=yes
build-libc-malloc:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: make
run: make REDIS_CFLAGS='-Werror' MALLOC=libc
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
run: make -j4 SERVER_CFLAGS='-Werror' MALLOC=libc USE_FAST_FLOAT=yes
build-centos7-jemalloc:
build-almalinux8-jemalloc:
runs-on: ubuntu-latest
container: centos:7
container: almalinux:8
steps:
- uses: actions/checkout@v3
- name: make
run: |
yum -y install gcc make
make REDIS_CFLAGS='-Werror'
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: make
run: |
dnf -y install epel-release gcc gcc-c++ make procps-ng which
make -j4 SERVER_CFLAGS='-Werror' USE_FAST_FLOAT=yes
format-yaml:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Set up Go
uses: actions/setup-go@cdcb36043654635271a94b9a6d1392de5bb323a7 # v5.0.1
with:
go-version: "1.22.4"
- name: Setup YAML formatter
run: |
go install github.com/google/yamlfmt/cmd/yamlfmt@latest
- name: Run yamlfmt
id: yamlfmt
run: |
yamlfmt -lint -conf .config/format.yml .
# Capture the diff output
DIFF=$(git diff)
if [ ! -z "$DIFF" ]; then
# Encode the diff in Base64 to ensure it's handled as a single line
ENCODED_DIFF=$(echo "$DIFF" | base64 -w 0)
echo "diff=$ENCODED_DIFF" >> $GITHUB_OUTPUT
fi
shell: bash
- name: Check for formatting changes
if: ${{ steps.yamlfmt.outputs.diff }}
run: |
echo "ERROR: YAML file is not formatted properly. Here is the diff: "
# Decode the Base64 diff to display it
echo "${{ steps.clang-format.outputs.diff }}" | base64 --decode
exit 1
shell: bash
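Both this job and the clang-format job below rely on Base64-encoding a multi-line diff so it survives as a single-line step output; the round trip looks like this in isolation:
```sh
# Encode a possibly multi-line diff into one line, then recover it verbatim.
DIFF=$(git diff)
ENCODED_DIFF=$(printf '%s' "$DIFF" | base64 -w 0)
printf '%s\n' "$ENCODED_DIFF" | base64 --decode
```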

53
.github/workflows/clang-format.yml vendored Normal file
View File

@ -0,0 +1,53 @@
name: Clang Format Check
on:
push:
pull_request:
paths:
- 'src/**'
concurrency:
group: clang-${{ github.head_ref || github.ref }}
cancel-in-progress: true
jobs:
clang-format-check:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Set up Clang
run: |
sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install software-properties-common -y
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/llvm-toolchain.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/llvm-toolchain.gpg] http://apt.llvm.org/$(lsb_release -cs)/ llvm-toolchain-$(lsb_release -cs)-18 main" | sudo tee /etc/apt/sources.list.d/llvm.list
sudo apt-get update -y
sudo apt-get install clang-format-18 -y
- name: Run clang-format
id: clang-format
run: |
# Run clang-format and capture the diff
cd src
shopt -s globstar
clang-format-18 -i **/*.c **/*.h
# Capture the diff output
DIFF=$(git diff)
if [ ! -z "$DIFF" ]; then
# Encode the diff in Base64 to ensure it's handled as a single line
ENCODED_DIFF=$(echo "$DIFF" | base64 -w 0)
echo "diff=$ENCODED_DIFF" >> $GITHUB_OUTPUT
fi
shell: bash
- name: Check for formatting changes
if: ${{ steps.clang-format.outputs.diff }}
run: |
echo "ERROR: Code is not formatted correctly. Here is the diff:"
# Decode the Base64 diff to display it
echo "${{ steps.clang-format.outputs.diff }}" | base64 --decode
exit 1
shell: bash
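The same check can be run locally before pushing; a sketch assuming clang-format 18 is installed (`--dry-run -Werror` reports violations without rewriting files):
```sh
cd src
shopt -s globstar   # bash: enable ** recursive globbing
clang-format-18 --dry-run -Werror **/*.c **/*.h
```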

28
.github/workflows/codecov.yml vendored Normal file
View File

@ -0,0 +1,28 @@
name: "Codecov"
# Enabling on each push is to display the coverage changes in every PR,
# where each PR needs to be compared against the coverage of the head commit
on: [push, pull_request]
concurrency:
group: codecov-${{ github.head_ref || github.ref }}
cancel-in-progress: true
jobs:
code-coverage:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install lcov and run test
run: |
sudo apt-get install lcov
make lcov
- name: Upload code coverage
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
file: ./src/valkey.info
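For reference, the coverage report consumed above can be produced locally with the same make target; a sketch for Debian/Ubuntu:
```sh
sudo apt-get install -y lcov
make lcov   # builds with coverage, runs the tests, and writes ./src/valkey.info
```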

View File

@ -4,30 +4,39 @@ on:
pull_request:
schedule:
# run weekly in case a new vulnerability was added to the database
- cron: '0 0 * * 0'
- cron: '0 3 * * 0'
concurrency:
group: codeql-${{ github.head_ref || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
if: github.event_name != 'schedule' || github.repository == 'redis/redis'
if: github.event_name != 'schedule' || github.repository == 'valkey-io/valkey'
permissions:
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'cpp' ]
language: ['cpp']
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Checkout repository
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
- name: Initialize CodeQL
uses: github/codeql-action/init@1b1aada464948af03b950897e5eb522f92603cc2 # v3.24.9
with:
languages: ${{ matrix.language }}
- name: Autobuild
uses: github/codeql-action/autobuild@v2
- name: Autobuild
uses: github/codeql-action/autobuild@1b1aada464948af03b950897e5eb522f92603cc2 # v3.24.9
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@1b1aada464948af03b950897e5eb522f92603cc2 # v3.24.9

39
.github/workflows/coverity.yml vendored Normal file
View File

@ -0,0 +1,39 @@
# Creates and uploads a Coverity build on a schedule
name: Coverity Scan
on:
schedule:
# Run once daily, since below 500k LOC can have 21 builds per week, per https://scan.coverity.com/faq#frequency
- cron: '0 1 * * *'
# Support manual execution
workflow_dispatch:
concurrency:
group: coverity-${{ github.head_ref || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
coverity:
if: github.repository == 'valkey-io/valkey'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Download and extract the Coverity Build Tool
run: |
wget -q https://scan.coverity.com/download/cxx/linux64 --post-data "token=${{ secrets.COVERITY_SCAN_TOKEN }}&project=valkey-io%2Fvalkey" -O cov-analysis-linux64.tar.gz
mkdir cov-analysis-linux64
tar xzf cov-analysis-linux64.tar.gz --strip 1 -C cov-analysis-linux64
- name: Install Valkey dependencies
run: sudo apt install -y gcc procps libssl-dev
- name: Build with cov-build
run: cov-analysis-linux64/bin/cov-build --dir cov-int make
- name: Upload the result
run: |
tar czvf cov-int.tgz cov-int
curl \
--form email=${{ secrets.COVERITY_SCAN_EMAIL }} \
--form token=${{ secrets.COVERITY_SCAN_TOKEN }} \
--form file=@cov-int.tgz \
https://scan.coverity.com/builds?project=valkey-io%2Fvalkey

File diff suppressed because it is too large

View File

@ -1,82 +1,92 @@
name: External Server Tests
on:
pull_request:
push:
schedule:
- cron: '0 0 * * *'
pull_request:
push:
schedule:
- cron: '0 2 * * *'
concurrency:
group: external-${{ github.head_ref || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
test-external-standalone:
runs-on: ubuntu-latest
if: github.event_name != 'schedule' || github.repository == 'redis/redis'
timeout-minutes: 14400
if: github.event_name != 'schedule' || github.repository == 'valkey-io/valkey'
timeout-minutes: 1440
steps:
- uses: actions/checkout@v3
- name: Build
run: make REDIS_CFLAGS=-Werror
- name: Start redis-server
run: |
./src/redis-server --daemonize yes --save "" --logfile external-redis.log \
--enable-protected-configs yes --enable-debug-command yes --enable-module-command yes
- name: Run external test
run: |
./runtest \
--host 127.0.0.1 --port 6379 \
--tags -slow
- name: Archive redis log
if: ${{ failure() }}
uses: actions/upload-artifact@v3
with:
name: test-external-redis-log
path: external-redis.log
test-external-cluster:
runs-on: ubuntu-latest
if: github.event_name != 'schedule' || github.repository == 'redis/redis'
timeout-minutes: 14400
steps:
- uses: actions/checkout@v3
- name: Build
run: make REDIS_CFLAGS=-Werror
- name: Start redis-server
run: |
./src/redis-server --cluster-enabled yes --daemonize yes --save "" --logfile external-redis.log \
--enable-protected-configs yes --enable-debug-command yes --enable-module-command yes
- name: Create a single node cluster
run: ./src/redis-cli cluster addslots $(for slot in {0..16383}; do echo $slot; done); sleep 5
- name: Run external test
run: |
./runtest \
--host 127.0.0.1 --port 6379 \
--cluster-mode \
--tags -slow
- name: Archive redis log
if: ${{ failure() }}
uses: actions/upload-artifact@v3
with:
name: test-external-cluster-log
path: external-redis.log
test-external-nodebug:
runs-on: ubuntu-latest
if: github.event_name != 'schedule' || github.repository == 'redis/redis'
timeout-minutes: 14400
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Build
run: make REDIS_CFLAGS=-Werror
- name: Start redis-server
run: make SERVER_CFLAGS=-Werror
- name: Start valkey-server
run: |
./src/redis-server --daemonize yes --save "" --logfile external-redis.log
./src/valkey-server --daemonize yes --save "" --logfile external-server.log \
--enable-protected-configs yes --enable-debug-command yes --enable-module-command yes
- name: Run external test
run: |
./runtest \
--host 127.0.0.1 --port 6379 \
--tags "-slow -needs:debug"
- name: Archive redis log
--verbose \
--tags -slow
- name: Archive server log
if: ${{ failure() }}
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@6f51ac03b9356f520e9adb1b1b7802705f340c2b # v4.5.0
with:
name: test-external-redis-log
path: external-redis.log
name: test-external-standalone-log
path: external-server.log
test-external-cluster:
runs-on: ubuntu-latest
if: github.event_name != 'schedule' || github.repository == 'valkey-io/valkey'
timeout-minutes: 1440
steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Build
run: make SERVER_CFLAGS=-Werror
- name: Start valkey-server
run: |
./src/valkey-server --cluster-enabled yes --daemonize yes --save "" --logfile external-server.log \
--enable-protected-configs yes --enable-debug-command yes --enable-module-command yes
- name: Create a single node cluster
run: ./src/valkey-cli cluster addslots $(for slot in {0..16383}; do echo $slot; done); sleep 5
- name: Run external test
run: |
./runtest \
--host 127.0.0.1 --port 6379 \
--verbose \
--cluster-mode \
--tags -slow
- name: Archive server log
if: ${{ failure() }}
uses: actions/upload-artifact@6f51ac03b9356f520e9adb1b1b7802705f340c2b # v4.5.0
with:
name: test-external-cluster-log
path: external-server.log
test-external-nodebug:
runs-on: ubuntu-latest
if: github.event_name != 'schedule' || github.repository == 'valkey-io/valkey'
timeout-minutes: 1440
steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Build
run: make SERVER_CFLAGS=-Werror
- name: Start valkey-server
run: |
./src/valkey-server --daemonize yes --save "" --logfile external-server.log
- name: Run external test
run: |
./runtest \
--host 127.0.0.1 --port 6379 \
--verbose \
--tags "-slow -needs:debug"
- name: Archive server log
if: ${{ failure() }}
uses: actions/upload-artifact@6f51ac03b9356f520e9adb1b1b7802705f340c2b # v4.5.0
with:
name: test-external-nodebug-log
path: external-server.log
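The single-node cluster setup used above is also handy for local experiments; a minimal sketch (`seq` replaces the brace-expansion loop, and the sleep gives the node time to converge):
```sh
./src/valkey-server --cluster-enabled yes --daemonize yes --save "" --logfile external-server.log \
    --enable-protected-configs yes --enable-debug-command yes --enable-module-command yes
./src/valkey-cli cluster addslots $(seq 0 16383)
sleep 5
./src/valkey-cli cluster info | grep cluster_state   # expect cluster_state:ok
```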

View File

@ -8,13 +8,20 @@ on:
paths:
- 'src/commands/*.json'
concurrency:
group: reply-schemas-linter-${{ github.head_ref || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
reply-schemas-linter:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: Setup nodejs
uses: actions/setup-node@v3
uses: actions/setup-node@60edb5dd545a775178f52524783378180af0d1f8 # v4.0.2
- name: Install packages
run: npm install ajv
- name: linter

View File

@ -9,6 +9,13 @@ on:
push:
pull_request:
concurrency:
group: spellcheck-${{ github.head_ref || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
build:
name: Spellcheck
@ -16,17 +23,12 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: pip cache
uses: actions/cache@v3
- name: Install typos
uses: taiki-e/install-action@fe9759bf4432218c779595708e80a1aadc85cedc # v2.46.10
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
restore-keys: ${{ runner.os }}-pip-
- name: Install prerequisites
run: sudo pip install -r ./.codespell/requirements.txt
tool: typos
- name: Spell check
run: codespell --config=./.codespell/.codespellrc
run: typos --config=./.config/typos.toml
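Since the check is now a single binary, it is easy to reproduce locally; a sketch assuming the typos CLI is installed (e.g. via `cargo install typos-cli`):
```sh
typos --config=./.config/typos.toml
```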

35
.gitignore vendored
View File

@ -1,30 +1,31 @@
.*.swp
*.o
*.a
*.xo
*.so
*.d
*.log
dump.rdb
redis-benchmark
redis-check-aof
redis-check-rdb
redis-check-dump
redis-cli
redis-sentinel
redis-server
dump*.rdb
*-benchmark
*-check-aof
*-check-rdb
*-check-dump
*-cli
*-sentinel
*-server
*-unit-tests
doc-tools
release
misc/*
src/release.h
appendonly.aof*
appendonlydir
appendonlydir*
SHORT_TERM_TODO
release.h
src/transfer.sh
src/configs
redis.ds
src/redis.conf
src/nodes.conf
src/*.conf
deps/lua/src/lua
deps/lua/src/luac
deps/lua/src/liblua.a
@ -41,3 +42,15 @@ Makefile.dep
.ccls-cache/*
compile_commands.json
redis.code-workspace
.cache
.cscope*
.swp
nodes*.conf
tests/cluster/tmp/*
tests/rdma/rdma-test
tags
build/
build-debug/
build-release/
cmake-build-debug/
cmake-build-release/

View File

@ -1,16 +0,0 @@
Hello! This file is just a placeholder, since this is the "unstable" branch
of Redis, the place where all the development happens.
There is no release notes for this branch, it gets forked into another branch
every time there is a partial feature freeze in order to eventually create
a new stable release.
Usually "unstable" is stable enough for you to use it in development environments
however you should never use it in production environments. It is possible
to download the latest stable release here:
https://download.redis.io/redis-stable.tar.gz
More information is available at https://redis.io
Happy hacking!

1
BUGS
View File

@ -1 +0,0 @@
Please check https://github.com/redis/redis/issues

44
CMakeLists.txt Normal file
View File

@ -0,0 +1,44 @@
cmake_minimum_required(VERSION 3.10)
# Must be done first
if (APPLE)
# Force clang compiler on macOS
find_program(CLANGPP "clang++")
find_program(CLANG "clang")
if (CLANG AND CLANGPP)
message(STATUS "Found ${CLANGPP}, ${CLANG}")
set(CMAKE_CXX_COMPILER ${CLANGPP})
set(CMAKE_C_COMPILER ${CLANG})
endif ()
endif ()
# Options
option(BUILD_UNIT_TESTS "Build valkey-unit-tests" OFF)
option(BUILD_TEST_MODULES "Build all test modules" OFF)
option(BUILD_EXAMPLE_MODULES "Build example modules" OFF)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/Modules/")
project("valkey")
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_STANDARD_REQUIRED ON)
set(CMAKE_C_EXTENSIONS ON)
include(ValkeySetup)
add_subdirectory(src)
add_subdirectory(tests)
# Include the packaging module
include(Packaging)
# Clear cached variables from the cache
unset(BUILD_TESTS CACHE)
unset(CLANGPP CACHE)
unset(CLANG CACHE)
unset(BUILD_RDMA_MODULE CACHE)
unset(BUILD_TLS_MODULE CACHE)
unset(BUILD_UNIT_TESTS CACHE)
unset(BUILD_TEST_MODULES CACHE)
unset(BUILD_EXAMPLE_MODULES CACHE)
unset(USE_TLS CACHE)
unset(DEBUG_FORCE_DEFRAG CACHE)
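A typical out-of-source configure-and-build against this CMakeLists, mirroring the options it declares (the flag values here are illustrative, not defaults):
```sh
mkdir -p build-release && cd build-release
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_UNIT_TESTS=ON
make -j"$(nproc)"
```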

View File

@ -26,7 +26,7 @@ Examples of unacceptable behavior include:
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others private information, such as a physical or email
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
@ -49,7 +49,7 @@ representative at an online or offline event.
Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
this email address: redis@redis.io.
this email address: maintainers@lists.valkey.io.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
@ -89,7 +89,7 @@ Attribution
This Code of Conduct is adapted from the Contributor Covenant,
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by Mozillas code of conduct
Community Impact Guidelines were inspired by Mozilla's code of conduct
enforcement ladder.
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at

View File

@ -1,56 +1,107 @@
Note: by contributing code to the Redis project in any form, including sending
a pull request via Github, a code fragment or patch via private email or
public discussion groups, you agree to release your code under the terms
of the BSD license that you can find in the COPYING file included in the Redis
source distribution. You will include BSD license in the COPYING file within
each source file that you contribute.
Contributing to Valkey
======================
# IMPORTANT: HOW TO USE REDIS GITHUB ISSUES
Welcome and thank you for wanting to contribute!
Github issues SHOULD ONLY BE USED to report bugs, and for DETAILED feature
requests. Everything else belongs to the Redis Google Group:
https://groups.google.com/forum/m/#!forum/Redis-db
# Project governance
PLEASE DO NOT POST GENERAL QUESTIONS that are not about bugs or suspected
bugs in the Github issues system. We'll be very happy to help you and provide
all the support in the mailing list.
The Valkey project is led by a Technical Steering Committee, whose responsibilities are laid out in [GOVERNANCE.md](GOVERNANCE.md).
There is also an active community of Redis users at Stack Overflow:
## Get started
https://stackoverflow.com/questions/tagged/redis
* Have a question? Ask it on
[GitHub Discussions](https://github.com/valkey-io/valkey/discussions)
or [Valkey's Discord](https://discord.gg/zbcPa5umUB)
or [Valkey's Matrix](https://matrix.to/#/#valkey:matrix.org)
* Found a bug? [Report it here](https://github.com/valkey-io/valkey/issues/new?template=bug_report.md&title=%5BBUG%5D)
* Valkey crashed? [Submit a crash report here](https://github.com/valkey-io/valkey/issues/new?template=crash_report.md&title=%5BCRASH%5D+%3Cshort+description%3E)
* Suggest a new feature? [Post your detailed feature request here](https://github.com/valkey-io/valkey/issues/new?template=feature_request.md&title=%5BNEW%5D)
* Want to help with documentation? [Move on to valkey-doc](https://github.com/valkey-io/valkey-doc)
* Report a vulnerability? See [SECURITY.md](SECURITY.md)
Issues and pull requests for documentation belong on the redis-doc repo:
## Developer Certificate of Origin
https://github.com/redis/redis-doc
We respect the intellectual property rights of others and we want to make sure
all incoming contributions are correctly attributed and licensed. A Developer
Certificate of Origin (DCO) is a lightweight mechanism to do that. The DCO is
a declaration attached to every commit. In the commit message of the contribution,
the developer simply adds a `Signed-off-by` statement and thereby agrees to the DCO,
which you can find below or at [DeveloperCertificate.org](http://developercertificate.org/).
If you are reporting a security bug or vulnerability, see SECURITY.md.
```text
Developer's Certificate of Origin 1.1
# How to provide a patch for a new feature
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the
best of my knowledge, is covered under an appropriate open
source license and I have the right under that license to
submit that work with modifications, whether created in whole
or in part by me, under the same open source license (unless
I am permitted to submit under a different license), as
indicated in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including
all personal information I submit with it, including my
sign-off) is maintained indefinitely and may be redistributed
consistent with this project or the open source license(s)
involved.
```
We require that every contribution to Valkey be signed with a DCO. We require the
use of a known identity (such as a real or preferred name). We do not accept anonymous
contributors nor those utilizing pseudonyms. A DCO-signed commit will contain a line like:
```text
Signed-off-by: Jane Smith <jane.smith@email.com>
```
You may type this line on your own when writing your commit messages. However, if your
user.name and user.email are set in your git configs, you can use `git commit` with `-s`
or `--signoff` to add the `Signed-off-by` line to the end of the commit message. We also
require revert commits to include a DCO.
If you're contributing code to the Valkey project in any other form, including
sending a code fragment or patch via private email or public discussion groups,
you need to ensure that the contribution is in accordance with the DCO.
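In practice the sign-off is a single flag once git knows your identity; a short sketch (name and email are placeholders):
```sh
git config user.name "Jane Smith"
git config user.email "jane.smith@email.com"
git commit -s -m "Fix typo in README"
git log -1 --format=%B   # last line: Signed-off-by: Jane Smith <jane.smith@email.com>
```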
# How to provide a patch or a new feature
1. If it is a major feature or a semantic change, please don't start coding
straight away: if your feature is not a conceptual fit you'll lose a lot of
time writing the code without any reason. Start by posting in the mailing list
and creating an issue at Github with the description of, exactly, what you want
to accomplish and why. Use cases are important for features to be accepted.
Here you'll see if there is consensus about your idea.
time writing the code without any reason. Start by creating an issue at Github with the
description of, exactly, what you want to accomplish and why. Use cases are important for
features to be accepted. Here you can see if there is consensus about your idea.
2. If in step 1 you get an acknowledgment from the project leaders, use the
following procedure to submit a patch:
a. Fork Redis on github ( https://docs.github.com/en/github/getting-started-with-github/fork-a-repo )
b. Create a topic branch (git checkout -b my_branch)
c. Push to your branch (git push origin my_branch)
d. Initiate a pull request on github ( https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request )
e. Done :)
2. If in step 1 you get an acknowledgment from the project leaders, use the following
procedure to submit a patch:
1. Fork Valkey on GitHub ([HOWTO](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo))
1. Create a topic branch (`git checkout -b my_branch`)
1. Make the needed changes and commit with a DCO. (`git commit -s`)
1. Push to your branch (`git push origin my_branch`)
1. Initiate a pull request on GitHub ([HOWTO](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request))
1. Done :)
3. Keep in mind that we are very overloaded, so issues and PRs sometimes wait
for a *very* long time. However this is not lack of interest, as the project
for a *very* long time. However this is not a lack of interest, as the project
gets more and more users, we find ourselves in a constant need to prioritize
certain issues/PRs over others. If you think your issue/PR is very important
try to popularize it, have other users commenting and sharing their point of
view and so forth. This helps.
view, and so forth. This helps.
4. For minor fixes just open a pull request on Github.
4. For minor fixes, open a pull request on GitHub.
To link a pull request to an existing issue, please write "Fixes #xyz" somewhere
in the pull request description, where xyz is the issue number.
Thanks!

37
COPYING
View File

@ -1,4 +1,39 @@
Copyright (c) 2006-2020, Salvatore Sanfilippo
# License 1
BSD 3-Clause License
Copyright (c) 2024-present, Futriix contributors
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# License 2
BSD 3-Clause License
Copyright (c) 2024-present, Valkey contributors
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# License 3
BSD 3-Clause License
Copyright (c) 2006-2020, Redis Ltd.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

70
GOVERNANCE.md Normal file
View File

@ -0,0 +1,70 @@
# Project Governance
The Valkey project is managed by a Technical Steering Committee (TSC) composed of the maintainers of the Valkey repository.
The Valkey project includes all of the current and future repositories under the Valkey-io organization.
Committers are defined as individuals with write access to the code within a repository.
Maintainers are defined as individuals with full access to a repository and own its governance.
Both maintainers and committers should be clearly listed in the MAINTAINERS.md file in a given project's repository.
Maintainers of other repositories within the Valkey project are not members of the TSC unless explicitly added.
## Technical Steering Committee
The TSC is responsible for oversight of all technical, project, approval, and policy matters for Valkey.
The TSC members are listed in the [MAINTAINERS.md](MAINTAINERS.md) file in the Valkey repository.
Maintainers (and accordingly, TSC members) may be added or removed by no less than 2/3 affirmative vote of the current TSC.
The TSC shall appoint a Chair responsible for organizing TSC meetings.
If the TSC Chair is removed from the TSC (or the Chair steps down from that role), it is the responsibility of the TSC to appoint a new Chair.
The TSC can amend this governance document by no less than a 2/3 affirmative vote.
The TSC may, at its discretion, add or remove members who are not maintainers of the main Valkey repository.
The TSC may, at its discretion, add or remove maintainers from other repositories within the Valkey project.
## Voting
The TSC shall strive for all decisions to be made by consensus.
While explicit agreement of the entire TSC is preferred, it is not required for consensus.
Rather, the TSC shall determine consensus based on their good faith consideration of a number of factors, including the dominant view of the TSC and nature of support and objections.
The TSC shall document evidence of consensus in accordance with these requirements.
If consensus cannot be reached, the TSC shall make the decision by a vote.
A vote shall also be called when an issue or pull request is marked as a major decision, which are decisions that have a significant impact on the Valkey architecture or design.
Examples of major decisions:
* Fundamental changes to the Valkey core datastructures
* Adding a new data structure or API
* Changes that affect backward compatibility
* New user visible fields that need to be maintained
* Modifications to the TSC or other governance documents
* Adding members to other roles within the Valkey project
* Delegation of maintainership for projects to other groups or individuals
* Adding or removing a new external library such as a client
or module to the project.
Any member of the TSC can call a vote with reasonable notice to the TSC, setting out a discussion period and a separate voting period.
Any discussion may be conducted in person or electronically by text, voice, or video.
The discussion shall be open to the public, with the notable exception of discussions involving embargoed security issues or the addition or removal of maintainers, which will be private.
In any vote, each voting TSC member will have one vote.
The TSC shall give at least two weeks for all members to submit their vote.
Except as specifically noted elsewhere in this document, decisions by vote require a simple majority vote of all voting members.
It is the responsibility of the TSC chair to help facilitate the voting process as needed to make sure it completes within the voting period.
## Termination of Membership
A maintainer's access (and accordingly, their position on the TSC) will be removed if any of the following occur:
* Resignation: Written notice of resignation to the TSC.
* TSC Vote: 2/3 affirmative vote of the TSC to remove a member
* Unreachable Member: If a member is unresponsive for more than six months, the remaining active members of the TSC may vote to remove the unreachable member by a simple majority.
## Technical direction for other Valkey projects
The TSC may delegate decision making for other projects within the Valkey organization to the maintainers responsible for those projects.
Delegation of decision making for a project is considered a major decision, and shall happen with an explicit vote.
Projects within the Valkey organization must indicate the individuals with commit permissions by updating the MAINTAINERS.md within their repositories.
The TSC may, at its discretion, overrule the decisions made by other projects within the Valkey organization, although they should show restraint in doing so.
## License of this document
This document may be used, modified, and/or distributed under the terms of the
[Creative Commons Attribution 4.0 International (CC-BY) license](https://creativecommons.org/licenses/by/4.0/legalcode).

View File

@ -1 +0,0 @@
See README

BIN
Logo-Futriix.png Normal file

Binary file not shown.


31
MAINTAINERS.md Normal file
View File

@ -0,0 +1,31 @@
## Overview
This document contains a list of maintainers in this repo.
See [GOVERNANCE.md](GOVERNANCE.md) that explains the function of this file.
## Current Maintainers
Maintainers listed in alphabetical order by their GitHub ID.
| Maintainer | GitHub ID | Affiliation |
| ------------------- | ----------------------------------------------- | ----------- |
| Zhu Binbin | [enjoy-binbin](https://github.com/enjoy-binbin) | Tencent |
| Wen Hui | [hwware](https://github.com/hwware) | Huawei |
| Madelyn Olson | [madolson](https://github.com/madolson) | Amazon |
| Ping Xie | [pingxie](https://github.com/pingxie) | Google |
| Zhao Zhao | [soloestoy](https://github.com/soloestoy) | Alibaba |
| Viktor Söderqvist | [zuiderkwast](https://github.com/zuiderkwast) | Ericsson |
## Current Committers
Committers listed in alphabetical order by their GitHub ID.
| Committer | GitHub ID | Affiliation |
| ------------------- | ----------------------------------------------- | ----------- |
| Harkrishn Patro | [hpatro](https://github.com/hpatro) | Amazon |
| Ran Shidlansik | [ranshid](https://github.com/ranshid) | Amazon |
### Former Maintainers and Committers
| Maintainer | GitHub ID | Affiliation |
| ------------------- | ----------------------------------------------- | ----------- |

106
MANIFESTO
View File

@ -1,106 +0,0 @@
[Note: this is the Redis manifesto, for general information about
installing and running Redis read the README file instead.]
Redis Manifesto
===============
1 - A DSL for Abstract Data Types. Redis is a DSL (Domain Specific Language)
that manipulates abstract data types and implemented as a TCP daemon.
Commands manipulate a key space where keys are binary-safe strings and
values are different kinds of abstract data types. Every data type
represents an abstract version of a fundamental data structure. For instance
Redis Lists are an abstract representation of linked lists. In Redis, the
essence of a data type isn't just the kind of operations that the data types
support, but also the space and time complexity of the data type and the
operations performed upon it.
2 - Memory storage is #1. The Redis data set, composed of defined key-value
pairs, is primarily stored in the computer's memory. The amount of memory in
all kinds of computers, including entry-level servers, is increasing
significantly each year. Memory is fast, and allows Redis to have very
predictable performance. Datasets composed of 10k or 40 millions keys will
perform similarly. Complex data types like Redis Sorted Sets are easy to
implement and manipulate in memory with good performance, making Redis very
simple. Redis will continue to explore alternative options (where data can
be optionally stored on disk, say) but the main goal of the project remains
the development of an in-memory database.
3 - Fundamental data structures for a fundamental API. The Redis API is a direct
consequence of fundamental data structures. APIs can often be arbitrary but
not an API that resembles the nature of fundamental data structures. If we
ever meet intelligent life forms from another part of the universe, they'll
likely know, understand and recognize the same basic data structures we have
in our computer science books. Redis will avoid intermediate layers in API,
so that the complexity is obvious and more complex operations can be
performed as the sum of the basic operations.
4 - We believe in code efficiency. Computers get faster and faster, yet we
believe that abusing computing capabilities is not wise: the amount of
operations you can do for a given amount of energy remains anyway a
significant parameter: it allows to do more with less computers and, at
the same time, having a smaller environmental impact. Similarly Redis is
able to "scale down" to smaller devices. It is perfectly usable in a
Raspberry Pi and other small ARM based computers. Faster code having
just the layers of abstractions that are really needed will also result,
often, in more predictable performances. We think likewise about memory
usage, one of the fundamental goals of the Redis project is to
incrementally build more and more memory efficient data structures, so that
problems that were not approachable in RAM in the past will be perfectly
fine to handle in the future.
5 - Code is like a poem; it's not just something we write to reach some
practical result. Sometimes people that are far from the Redis philosophy
suggest using other code written by other authors (frequently in other
languages) in order to implement something Redis currently lacks. But to us
this is like if Shakespeare decided to end Enrico IV using the Paradiso from
the Divina Commedia. Is using any external code a bad idea? Not at all. Like
in "One Thousand and One Nights" smaller self contained stories are embedded
in a bigger story, we'll be happy to use beautiful self contained libraries
when needed. At the same time, when writing the Redis story we're trying to
write smaller stories that will fit in to other code.
6 - We're against complexity. We believe designing systems is a fight against
complexity. We'll accept to fight the complexity when it's worthwhile but
we'll try hard to recognize when a small feature is not worth 1000s of lines
of code. Most of the time the best way to fight complexity is by not
creating it at all. Complexity is also a form of lock-in: code that is
very hard to understand cannot be modified by users in an independent way
regardless of the license. One of the main Redis goals is to remain
understandable, enough for a single programmer to have a clear idea of how
it works in detail just reading the source code for a couple of weeks.
7 - Threading is not a silver bullet. Instead of making Redis threaded we
believe on the idea of an efficient (mostly) single threaded Redis core.
Multiple of such cores, that may run in the same computer or may run
in multiple computers, are abstracted away as a single big system by
higher order protocols and features: Redis Cluster and the upcoming
Redis Proxy are our main goals. A shared nothing approach is not just
much simpler (see the previous point in this document), is also optimal
in NUMA systems. In the specific case of Redis it allows for each instance
to have a more limited amount of data, making the Redis persist-by-fork
approach more sounding. In the future we may explore parallelism only for
I/O, which is the low hanging fruit: minimal complexity could provide an
improved single process experience.
8 - Two levels of API. The Redis API has two levels: 1) a subset of the API fits
naturally into a distributed version of Redis and 2) a more complex API that
supports multi-key operations. Both are useful if used judiciously but
there's no way to make the more complex multi-keys API distributed in an
opaque way without violating our other principles. We don't want to provide
the illusion of something that will work magically when actually it can't in
all cases. Instead we'll provide commands to quickly migrate keys from one
instance to another to perform multi-key operations and expose the
trade-offs to the user.
9 - We optimize for joy. We believe writing code is a lot of hard work, and the
only way it can be worth is by enjoying it. When there is no longer joy in
writing code, the best thing to do is stop. To prevent this, we'll avoid
taking paths that will make Redis less of a joy to develop.
10 - All the above points are put together in what we call opportunistic
programming: trying to get the most for the user with minimal increases
in complexity (hanging fruits). Solve 95% of the problem with 5% of the
code when it is acceptable. Avoid a fixed schedule but follow the flow of
user requests, inspiration, Redis internal readiness for certain features
(sometimes many past changes reach a critical point making a previously
complex feature very easy to obtain).

View File

@ -1,4 +1,4 @@
# Top level makefile, the real shit is at src/Makefile
# Top level makefile, the real magic is at src/Makefile
default: all

782
README.md
View File

@ -1,506 +1,408 @@
This README is just a fast *quick start* document. You can find more detailed documentation at [redis.io](https://redis.io).
<!-- Improved compatibility of К началу link: See: https://github.com/othneildrew/Best-README-Template/pull/73 -->
<a id="readme-top"></a>
What is Redis?
--------------
<!-- PROJECT LOGO -->
<br />
<div align="center">
<!-- <a href="https://github.com/othneildrew/Best-README-Template"> -->
<img src="Logo-Futriix.png" height=100></img>
</a>
Redis is often referred to as a *data structures* server. What this means is that Redis provides access to mutable data structures via a set of commands, which are sent using a *server-client* model with TCP sockets and a simple protocol. So different processes can query and modify the same data structures in a shared way.
<h3 align="center">Futriix</h3>
Data structures implemented into Redis have a few special properties:
<p align="center">
Futriix's full documentation (commands are identical)
<br />
<a href="https://valkey.io/"><strong>Explore the full documentation</strong></a>
<br />
<a href="">Report a bug</a>
&middot;
<a href="">Request a new feature</a>
</p>
</div>
* Redis cares to store them on disk, even if they are always served and modified into the server memory. This means that Redis is fast, but that it is also non-volatile.
* The implementation of data structures emphasizes memory efficiency, so data structures inside Redis will likely use less memory compared to the same data structure modelled using a high-level programming language.
* Redis offers a number of features that are natural to find in a database, like replication, tunable levels of durability, clustering, and high availability.
## Brief documentation for the Futriix project
Another good example is to think of Redis as a more complex version of memcached, where the operations are not just SETs and GETs, but operations that work with complex data types like Lists, Sets, ordered data structures, and so forth.
If you want to know more, this is a list of selected starting points:
* Introduction to Redis data types. https://redis.io/topics/data-types-intro
* Try Redis directly inside your browser. https://try.redis.io
* The full list of Redis commands. https://redis.io/commands
* There is much more inside the official Redis documentation. https://redis.io/documentation
Building Redis
--------------
Redis can be compiled and used on Linux, OSX, OpenBSD, NetBSD, FreeBSD.
We support big endian and little endian architectures, and both 32 bit
and 64 bit systems.
It may compile on Solaris derived systems (for instance SmartOS) but our
support for this platform is *best effort* and Redis is not guaranteed to
work as well as in Linux, OSX, and \*BSD.
It is as simple as:
% make
To build with TLS support, you'll need OpenSSL development libraries (e.g.
libssl-dev on Debian/Ubuntu) and run:
% make BUILD_TLS=yes
To build with systemd support, you'll need systemd development libraries (such
as libsystemd-dev on Debian/Ubuntu or systemd-devel on CentOS) and run:
% make USE_SYSTEMD=yes
To append a suffix to Redis program names, use:
% make PROG_SUFFIX="-alt"
You can build a 32 bit Redis binary using:
% make 32bit
After building Redis, it is a good idea to test it using:
% make test
If TLS is built, running the tests with TLS enabled (you will need `tcl-tls`
installed):
% ./utils/gen-test-certs.sh
% ./runtest --tls
<!-- TABLE OF CONTENTS -->
<br>
<details>
<summary><b>Содержание</b></summary>
<ol>
<li>
<a href="#о-проекте">О проекте</a>
</li>
<li><a href="#подготовка">Подготовка</a></li>
<li><a href="#компиляция">Компиляция</a></li>
<li><a href="#использование">Использование</a></li>
<li><a href="#кластер">Кластер</a></li>
<li><a href="#дорожная-карта">Дорожная карта</a></li>
<li><a href="#вклад">Вклад</a></li>
<li><a href="#лицензия">Лицензия</a></li>
<li><a href="#контакты">Контакты</a></li>
</ol>
</details>
Fixing build problems with dependencies or cached build options
---------
<!-- ABOUT THE PROJECT -->
## About the project
Redis has some dependencies which are included in the `deps` directory.
`make` does not automatically rebuild dependencies even if something in
the source code of dependencies changes.
The Futriix project is a fork of the Valkey project.
Futriix is a distributed DBMS written in C, built on top of [Valkey](https://valkey.io/), with support for AI-based modules and modules written in Golang.
When you update the source code with `git pull` or when code inside the
dependencies tree is modified in any other way, make sure to use the following
command in order to really clean everything and rebuild from scratch:
The DBMS supports a distributed [JSON](https://source.futriix.ru/gvsafronov/futriix-json) module, an [AI module "Virtual Assistant"]() and an [SQL module](https://source.futriix.ru/gvsafronov/fdx).
% make distclean
Below are instructions on how to set up the project locally.
To get a local copy up and running, follow these simple steps.
This will clean: jemalloc, lua, hiredis, linenoise and other dependencies.
Also if you force certain build options like 32bit target, no C compiler
optimizations (for debugging purposes), and other similar build time options,
those options are cached indefinitely until you issue a `make distclean`
command.
Fixing problems building 32 bit binaries
---------
### Prerequisites
If after building Redis with a 32 bit target you need to rebuild it
with a 64 bit target, or the other way around, you need to perform a
`make distclean` in the root directory of the Redis distribution.
The steps below will help you compile and install Futriix.
* Install the C toolchain and the accompanying utilities (autoconf and others):
In case of build errors when trying to build a 32 bit binary of Redis, try
the following steps:
```sh
unix:$ sudo apt update && sudo apt upgrade
unix:$ sudo apt install build-essential nasm autotools-dev autoconf libjemalloc-dev tcl tcl-dev uuid-dev libcurl4-openssl-dev git
```
* Install the package libc6-dev-i386 (also try g++-multilib).
* Try using the following command line instead of `make 32bit`:
`make CFLAGS="-m32 -march=native" LDFLAGS="-m32"`
* Install the Go programming language following the instructions on the [official website](https://go.dev/doc/install)
Allocator
---------
### Compilation
Selecting a non-default memory allocator when building Redis is done by setting
the `MALLOC` environment variable. Redis is compiled and linked against libc
malloc by default, with the exception of jemalloc being the default on Linux
systems. This default was picked because jemalloc has proven to have fewer
fragmentation problems than libc malloc.
To compile the project successfully, follow the steps below:
To force compiling against libc malloc, use:
% make MALLOC=libc
1. Clone the repository
```sh
git clone https://source.futriix.ru/gvsafronov/Futriix
```
2. Change into the `src` source code directory
```sh
cd src/
```
<p align="right">(<a href="#readme-top">К началу</a>)</p>
3. Compile Futriix using the Make utility
Futriix can be compiled for Linux, OSX, OpenBSD, NetBSD and FreeBSD.
We support big endian and little endian architectures, and both 32-bit and 64-bit systems.
```sh
unix:$ make
```
To build the project with TLS support, you need the OpenSSL development libraries (e.g.
libssl-dev on Debian/Ubuntu).
To build the project with TLS support, run the commands below:
```sh
unix:$ make BUILD_TLS=yes
```
To build TLS as a Futriix module:
```sh
unix:$ make BUILD_TLS=module
```
To build the project with experimental RDMA support, you need to install the RDMA development libraries
(e.g. librdmacm-dev and libibverbs-dev on Debian/Ubuntu).
To build Futriix with RDMA support, simply run the following commands:
```sh
unix:$ make BUILD_RDMA=yes
```
To build RDMA as a Futriix module:
```sh
unix:$ make BUILD_RDMA=module
```
To build the project with systemd support, you need to install the appropriate development libraries (such as
libsystemd-dev on Debian/Ubuntu or systemd-devel on CentOS) and run the following commands:
```sh
unix:$ make USE_SYSTEMD=yes
```
To append a suffix to the Futriix program names, run the following commands:
```sh
unix:$ make PROG_SUFFIX="-alt"
```
After building Futriix, we recommend running the test suite to verify that the build is correct:
```sh
unix:$ make test
```
The command above runs the project's integrated tests. Additional tests can be started using:
```sh
unix:$ make test-unit      # Unit tests
unix:$ make test-modules   # Module API tests
unix:$ make test-cluster   # Futriix cluster tests
```
You can find more information in the following sources:
[tests/README.md](tests/README.md) and [src/unit/README.md](src/unit/README.md).
<p align="right">(<a href="#readme-top">К началу</a>)</p>
## Fixing build problems with dependencies or cached build options
Futriix has some dependencies, which are stored in the `deps` directory.
`make` does not automatically rebuild dependencies, even if changes are made to the dependency source code.
When you update the project code with `git pull`, or when code inside the
dependency tree is modified in any other way, make sure to use the following
command in order to really clean everything and rebuild from scratch:
```sh
unix:$ make distclean
```
The command above will clean: the jemalloc memory allocator, the lua language, the hiredis library, the linenoise library and other dependencies.
Also, if you force certain build options, such as a 32-bit target or disabled C compiler
optimizations (for debugging purposes), or other similar build-time options,
those options are cached indefinitely until you issue a `make distclean` command.
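For example (a hypothetical session built only from the targets documented here), switching from a 32-bit build back to the default one requires a full clean:
```sh
unix:$ make 32bit        # 32-bit flags are now cached by the build system
unix:$ make distclean    # drop cached build options and dependency artifacts
unix:$ make              # clean default 64-bit build
```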
<p align="right">(<a href="#readme-top">К началу</a>)</p>
## Allocator
Selecting a non-default memory allocator when building Futriix is done by setting
the `MALLOC` environment variable. Futriix is compiled and linked against libc
malloc by default, with the exception of jemalloc, which is the default on Linux
systems. This default was picked because jemalloc has proven to have fewer
fragmentation problems than libc malloc.
To force compiling against libc malloc, run the following command:
```sh
unix:$ make MALLOC=libc
```
To compile against jemalloc on Mac OS X systems, use:
```sh
unix:$ make MALLOC=jemalloc
```
## Monotonic clock
By default, Futriix will build using the POSIX clock_gettime function as the
monotonic clock source. On most modern systems, the internal processor clock
can be used to improve performance. Cautions can be found here:
http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/
To build with support for the processor's internal instruction clock, use:
```sh
unix:$ make CFLAGS="-DUSE_PROCESSOR_CLOCK"
```
## Verbose build
Futriix will build with a user-friendly colorized output by default.
If you want to see a more verbose output, use the following:
```sh
unix:$ make V=1
```
4. If you want to start the Futriix server with the default parameters (without specifying a configuration file), run the following command:
```sh
./futriix-server
```
5. You can also use the `futriix.conf` configuration file, located in the "Futriix" directory, to configure your server.
To start Futriix with a configuration file, use the command below:
```sh
./futriix-server /path/to/futriix.conf
```
6. Start the futriix-cli utility (the Futriix client) to connect to a **local** Futriix server and begin working with the instance:
```sh
./futriix-cli
```
7. To connect to a specific node on the network with futriix-cli, add the `-h` option to specify the remote host by its IP address and the `-p` option to specify the port number:
```sh
./futriix-cli -h 11.164.22.7 -p 50000
```
<p align="right">(<a href="#readme-top">К началу</a>)</p>
## Running Futriix with RDMA
It is possible to alter the Futriix configuration by passing parameters directly
as options on the command line, as the examples below show.
Note that RDMA support in Futriix is an experimental feature.
It may be changed or removed in any minor or major release.
Currently it is supported only on Linux.
* To enable RDMA, run:
```sh
./src/futriix-server --protected-mode no \
--rdma-bind 192.168.122.100 --rdma-port 9880
```
All the options in `futriix.conf` are also supported as command-line options, with exactly the same name.
* To run RDMA as a module:
```sh
./src/futriix-server --protected-mode no \
--loadmodule src/Futriix-rdma.so --rdma-bind 192.168.122.100 --rdma-port 9880
```
The RDMA bind address/port can also be changed at runtime with a configuration command:
```sh
192.168.122.100:9880> CONFIG SET rdma-port 9380
```
For more information on how to use Futriix with TLS, please consult the [TLS.md](TLS.md) file.
It is also possible to have RDMA and TCP available at the same time; there is no
conflict between TCP(9880) and RDMA(9880), for example:
```sh
unix:$ ./src/futriix-server --protected-mode no \
--loadmodule src/Futriix-rdma.so --rdma-bind 192.168.122.100 --rdma-port 9880 \
--port 9880
```
Note: your network card (with IP address 192.168.122.100 in this example) must support RDMA.
To check whether the server supports RDMA, run the command below:
```sh
unix:$ rdma res show    # requires a recent iproute2 package
```
Or the command below:
```sh
unix:$ ibv_devices
```
<!-- USAGE EXAMPLES -->
## Usage
You can use futriix-cli to play with Futriix. Start a futriix-server instance,
then in another terminal try the following:

```sh
unix:$ cd src
unix:$ ./futriix-cli
127.0.0.1:futriix:~> ping
PONG
127.0.0.1:futriix:~> set foo bar
OK
127.0.0.1:futriix:~> get foo
"bar"
127.0.0.1:futriix:~> incr mycounter
(integer) 1
127.0.0.1:futriix:~> incr mycounter
(integer) 2
127.0.0.1:futriix:~>
```
You can find the list of all the available commands at https://redis.io/commands.
Installing Redis
-----------------
<p align="right">(<a href="#readme-top">К началу</a>)</p>
In order to install Redis binaries into /usr/local/bin, just use:
% make install

You can use `make PREFIX=/some/other/directory install` if you wish to use a
different destination.
`make install` will just install binaries in your system, but will not configure
init scripts and configuration files in the appropriate place. This is not
needed if you just want to play a bit with Redis, but if you are installing
it the proper way for a production system, we have a script that does this
for Ubuntu and Debian systems:

% cd utils
% ./install_server.sh

_Note_: `install_server.sh` will not work on Mac OSX; it is built for Linux only.
The script will ask you a few questions and will setup everything you need
to run Redis properly as a background daemon that will start again on
system reboots.
You'll be able to stop and start Redis using the script named
`/etc/init.d/redis_<portnumber>`, for instance `/etc/init.d/redis_6379`.

## Cluster

1. Change into the Futriix directory

```sh
unix:$ cd futriix
```
Code contributions
-----------------
Note: By contributing code to the Redis project in any form, including sending
a pull request via Github, a code fragment or patch via private email or
public discussion groups, you agree to release your code under the terms
of the BSD license that you can find in the [COPYING][1] file included in the Redis
source distribution.
Please see the [CONTRIBUTING.md][2] file in this source distribution for more
information. For security bugs and vulnerabilities, please see [SECURITY.md][3].
[1]: https://github.com/redis/redis/blob/unstable/COPYING
[2]: https://github.com/redis/redis/blob/unstable/CONTRIBUTING.md
[3]: https://github.com/redis/redis/blob/unstable/SECURITY.md
Redis internals
===
If you are reading this README you are likely in front of a Github page
or you just untarred the Redis distribution tar ball. In both cases
you are basically one step away from the source code, so here we explain
the Redis source code layout, what is in each file as a general idea, the
most important functions and structures inside the Redis server and so forth.
We keep all the discussion at a high level without digging into the details
since this document would be huge otherwise and our code base changes
continuously, but a general idea should be a good starting point to
understand more. Moreover most of the code is heavily commented and easy
to follow.
Source code layout
---
The Redis root directory just contains this README, the Makefile which
calls the real Makefile inside the `src` directory and an example
configuration for Redis and Sentinel. You can find a few shell
scripts that are used in order to execute the Redis, Redis Cluster and
Redis Sentinel unit tests, which are implemented inside the `tests`
directory.
Inside the root are the following important directories:
* `src`: contains the Redis implementation, written in C.
* `tests`: contains the unit tests, implemented in Tcl.
* `deps`: contains libraries Redis uses. Everything needed to compile Redis is inside this directory; your system just needs to provide `libc`, a POSIX compatible interface and a C compiler. Notably `deps` contains a copy of `jemalloc`, which is the default allocator of Redis under Linux. Note that under `deps` there are also things which started with the Redis project, but for which the main repository is not `redis/redis`.
There are a few more directories but they are not very important for our goals
here. We'll focus mostly on `src`, where the Redis implementation is contained,
exploring what there is inside each file. The order in which files are
exposed is the logical one to follow in order to disclose different layers
of complexity incrementally.
Note: lately Redis was refactored quite a bit. Function names and file
names have been changed, so you may find that this documentation reflects the
`unstable` branch more closely. For instance, in Redis 3.0 the `server.c`
and `server.h` files were named `redis.c` and `redis.h`. However the overall
structure is the same. Keep in mind that all the new developments and pull
requests should be performed against the `unstable` branch.
server.h
---
The simplest way to understand how a program works is to understand the
data structures it uses. So we'll start from the main header file of
Redis, which is `server.h`.
All the server configuration and in general all the shared state is
defined in a global structure called `server`, of type `struct redisServer`.
A few important fields in this structure are:
* `server.db` is an array of Redis databases, where data is stored.
* `server.commands` is the command table.
* `server.clients` is a linked list of clients connected to the server.
* `server.master` is a special client, the master, if the instance is a replica.
There are tons of other fields. Most fields are commented directly inside
the structure definition.
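As a quick illustration of how this shared state is consumed, here is a small sketch (not taken verbatim from the source tree) that walks `server.clients` using the `adlist` iteration helpers used throughout the code base:
```c
/* Sketch: iterating the global list of connected clients.
 * listRewind()/listNext()/listNodeValue() come from adlist.c. */
listIter li;
listNode *ln;
listRewind(server.clients, &li);
while ((ln = listNext(&li)) != NULL) {
    client *c = listNodeValue(ln);
    /* ... inspect or act on each connected client ... */
    (void)c;
}
```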
Another important Redis data structure is the one defining a client.
In the past it was called `redisClient`, now just `client`. The structure
has many fields, here we'll just show the main ones:
```c
struct client {
    int fd;
    sds querybuf;
    int argc;
    robj **argv;
    redisDb *db;
    int flags;
    list *reply;
    // ... many other fields ...
    char buf[PROTO_REPLY_CHUNK_BYTES];
};
```
The client structure defines a *connected client*:
* The `fd` field is the client socket file descriptor.
* `argc` and `argv` are populated with the command the client is executing, so that functions implementing a given Redis command can read the arguments.
* `querybuf` accumulates the requests from the client, which are parsed by the Redis server according to the Redis protocol and executed by calling the implementations of the commands the client is executing.
* `reply` and `buf` are dynamic and static buffers that accumulate the replies the server sends to the client. These buffers are incrementally written to the socket as soon as the file descriptor is writable.
As you can see in the client structure above, arguments in a command
are described as `robj` structures. The following is the full `robj`
structure, which defines a *Redis object*:
```c
struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
};
```
Basically this structure can represent all the basic Redis data types like
strings, lists, sets, sorted sets and so forth. The interesting thing is that
it has a `type` field, so that it is possible to know what type a given
object has, and a `refcount`, so that the same object can be referenced
in multiple places without allocating it multiple times. Finally the `ptr`
field points to the actual representation of the object, which might vary
even for the same type, depending on the `encoding` used.
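To make the role of these fields concrete, here is a hypothetical command fragment in the style of the code base (the command name is invented; the helpers used, such as `lookupKeyRead()` and the `addReply*()` family, are described in the sections below, and `stringObjectLen()` is one of the string-object helpers):
```c
/* Hypothetical command showing how the 'type' field guards an
 * implementation that only makes sense for string objects. */
void mystrlenCommand(client *c) {
    robj *o = lookupKeyRead(c->db, c->argv[1]);
    if (o == NULL) {
        addReply(c, shared.czero);           /* Missing key: reply 0.            */
    } else if (o->type != OBJ_STRING) {      /* 'type' identifies the data type. */
        addReply(c, shared.wrongtypeerr);
    } else {
        addReplyLongLong(c, stringObjectLen(o)); /* 'ptr' is decoded according
                                                  * to 'encoding' internally.    */
    }
}
```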
2. Open the `futriix.conf` configuration file in any text editor, for example nano, as shown below:
Redis objects are used extensively in the Redis internals, however in order
to avoid the overhead of indirect accesses, recently in many places
we just use plain dynamic strings not wrapped inside a Redis object.
```sh
unix:$ nano futriix/futriix.conf
```
server.c
---
This is the entry point of the Redis server, where the `main()` function
is defined. The following are the most important steps in order to start up
the Redis server (sketched in pseudo-code a little further below).
* `initServerConfig()` sets up the default values of the `server` structure.
* `initServer()` allocates the data structures needed to operate, setup the listening socket, and so forth.
* `aeMain()` starts the event loop which listens for new connections.
There are two special functions called periodically by the event loop:
1. `serverCron()` is called periodically (according to `server.hz` frequency), and performs tasks that must be performed from time to time, like checking for timed out clients.
2. `beforeSleep()` is called every time the event loop fired, Redis served a few requests, and is returning back into the event loop.
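Condensed into pseudo-C, that startup sequence looks roughly like this (a sketch, not the literal `main()` from the source):
```c
int main(int argc, char **argv) {
    initServerConfig();   /* Defaults for the global 'server' structure.        */
    /* ... configuration file and command line options are parsed here ...      */
    initServer();         /* Allocate state, create the listening sockets.      */
    aeSetBeforeSleepProc(server.el, beforeSleep); /* Hook run on each loop turn. */
    aeMain(server.el);    /* Event loop: serverCron() fires via a time event.   */
    return 0;
}
```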
Inside server.c you can find code that handles other vital things of the Redis server:
* `call()` is used in order to call a given command in the context of a given client.
* `activeExpireCycle()` handles eviction of keys with a time to live set via the `EXPIRE` command.
* `performEvictions()` is called when a new write command should be performed but Redis is out of memory according to the `maxmemory` directive.
* The global variable `redisCommandTable` defines all the Redis commands, specifying the name of the command, the function implementing the command, the number of arguments required, and other properties of each command.
commands.c
---
This file is auto-generated by utils/generate-command-code.py; its content is based on the JSON files in the src/commands folder.
These are meant to be the single source of truth about the Redis commands, and all the metadata about them.
These JSON files are not meant to be used by anyone directly, instead that metadata can be obtained via the `COMMAND` command.
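For example, the metadata of a single command can be inspected from the client; the reply lists the command name, its arity and its flags, among other fields (output abridged):
```sh
127.0.0.1:futriix:~> COMMAND INFO get
1) 1) "get"
   2) (integer) 2
   3) 1) readonly
      2) fast
   ...
```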
networking.c
---
This file defines all the I/O functions with clients, masters and replicas
(which in Redis are just special clients):
* `createClient()` allocates and initializes a new client.
* The `addReply*()` family of functions are used by command implementations in order to append data to the client structure, that will be transmitted to the client as a reply for a given command executed.
* `writeToClient()` transmits the data pending in the output buffers to the client and is called by the *writable event handler* `sendReplyToClient()`.
* `readQueryFromClient()` is the *readable event handler* and accumulates data read from the client into the query buffer.
* `processInputBuffer()` is the entry point in order to parse the client query buffer according to the Redis protocol. Once commands are ready to be processed, it calls `processCommand()` which is defined inside `server.c` in order to actually execute the command.
* `freeClient()` deallocates, disconnects and removes a client.
aof.c and rdb.c
---
As you can guess from the names, these files implement the RDB and AOF
persistence for Redis. Redis uses a persistence model based on the `fork()`
system call in order to create a process with the same (shared) memory
content of the main Redis process. This secondary process dumps the content
of the memory on disk. This is used by `rdb.c` to create the snapshots
on disk and by `aof.c` in order to perform the AOF rewrite when the
append only file gets too big.
The implementation inside `aof.c` has additional functions in order to
implement an API that allows commands to append new commands into the AOF
file as clients execute them.
The `call()` function defined inside `server.c` is responsible for calling
the functions that in turn will write the commands into the AOF.
db.c
---
Certain Redis commands operate on specific data types; others are general.
Examples of generic commands are `DEL` and `EXPIRE`. They operate on keys
and not on their values specifically. All those generic commands are
defined inside `db.c`.
Moreover `db.c` implements an API in order to perform certain operations
on the Redis dataset without directly accessing the internal data structures.
The most important functions inside `db.c` which are used in many command
implementations are the following (a small usage sketch follows the list):
* `lookupKeyRead()` and `lookupKeyWrite()` are used in order to get a pointer to the value associated to a given key, or `NULL` if the key does not exist.
* `dbAdd()` and its higher level counterpart `setKey()` create a new key in a Redis database.
* `dbDelete()` removes a key and its associated value.
* `emptyDb()` removes an entire single database or all the databases defined.
The rest of the file implements the generic commands exposed to the client.
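As a sketch of how a generic command can be built from these helpers (the command name is hypothetical; the helper functions are the ones listed above):
```c
/* Hypothetical "MYDEL key" command implemented on top of the db.c API. */
void mydelCommand(client *c) {
    robj *o = lookupKeyWrite(c->db, c->argv[1]); /* NULL if the key is absent. */
    if (o == NULL) {
        addReply(c, shared.czero);               /* Nothing deleted: reply 0.  */
        return;
    }
    dbDelete(c->db, c->argv[1]);                 /* Remove the key and value.  */
    addReply(c, shared.cone);                    /* One key deleted: reply 1.  */
}
```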
object.c
---
The `robj` structure defining Redis objects was already described. Inside
`object.c` there are all the functions that operate with Redis objects at
a basic level, like functions to allocate new objects, handle the reference
counting and so forth. Notable functions inside this file:
* `incrRefCount()` and `decrRefCount()` are used in order to increment or decrement an object reference count. When it drops to 0 the object is finally freed.
* `createObject()` allocates a new object. There are also specialized functions to allocate string objects having a specific content, like `createStringObjectFromLongLong()` and similar functions.
This file also implements the `OBJECT` command.
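A minimal sketch of the lifecycle these functions manage (the flow is illustrative only):
```c
/* A freshly created object starts with refcount == 1. */
robj *val = createStringObjectFromLongLong(42);
incrRefCount(val);   /* A second owner now references the same object.  */
decrRefCount(val);   /* First owner releases it: refcount back to 1.    */
decrRefCount(val);   /* Last reference dropped: the object is freed.    */
```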
replication.c
---
This is one of the most complex files inside Redis; it is recommended to
approach it only after getting a bit familiar with the rest of the code base.
In this file there is the implementation of both the master and replica role
of Redis.
One of the most important functions inside this file is `replicationFeedSlaves()`, which writes commands to the clients representing replica instances connected
to our master, so that the replicas receive the writes performed by the clients:
this way their data set remains synchronized with the one in the master.
This file also implements both the `SYNC` and `PSYNC` commands that are
used in order to perform the first synchronization between masters and
replicas, or to continue the replication after a disconnection.
Script
---
The script unit is composed of the following units:
* `script.c` - integration of scripts with Redis (command execution, set replication/resp, ...)
* `script_lua.c` - responsible for executing Lua code, uses script.c to interact with Redis from within the Lua code.
* `function_lua.c` - contains the Lua engine implementation, uses script_lua.c to execute the Lua code.
* `functions.c` - contains the Redis Functions implementation (`FUNCTION` command), uses function_lua.c if the function it wants to invoke needs the Lua engine.
* `eval.c` - contains the `eval` implementation using `script_lua.c` to invoke the Lua code.
Other C files
---
* `t_hash.c`, `t_list.c`, `t_set.c`, `t_string.c`, `t_zset.c` and `t_stream.c` contain the implementations of the Redis data types. They implement both an API to access a given data type, and the client command implementations for these data types.
* `ae.c` implements the Redis event loop. It's a self-contained library which is simple to read and understand.
* `sds.c` is the Redis string library, check https://github.com/antirez/sds for more information.
* `anet.c` is a library to use POSIX networking in a simpler way compared to the raw interface exposed by the kernel.
* `dict.c` is an implementation of a non-blocking hash table which rehashes incrementally.
* `cluster.c` implements the Redis Cluster. Probably a good read only after being very familiar with the rest of the Redis code base. If you want to read `cluster.c` make sure to read the [Redis Cluster specification][4].
[4]: https://redis.io/topics/cluster-spec
Anatomy of a Redis command
---
All the Redis commands are defined in the following way:
```c
void foobarCommand(client *c) {
printf("%s",c->argv[1]->ptr); /* Do something with the argument. */
addReply(c,shared.ok); /* Reply something to the client. */
}
```
The command function is referenced by a JSON file, together with its metadata, see `commands.c` described above for details.
The command flags are documented in the comment above the `struct redisCommand` in `server.h`.
For other details, please refer to the `COMMAND` command. https://redis.io/commands/command/
After the command operates in some way, it returns a reply to the client,
usually using `addReply()` or a similar function defined inside `networking.c`.
There are tons of command implementations inside the Redis source code
that can serve as examples of actual command implementations (e.g. pingCommand). Writing
a few toy commands can be a good exercise to get familiar with the code base.
There are also many other files not described here, but it is useless to
cover everything. We just want to help you with the first steps.
Eventually you'll find your way inside the Redis code base :-)
Enjoy!

3. Find the `active-replica` and `multi-master` parameters and set both to "yes". Then add the IP addresses of your cluster nodes to the configuration file. If you have done everything correctly, the `futriix.conf` configuration file should contain lines like those shown below:

```sh
active-replica yes
multi-master yes
replicaof 192.168.11.5 9880
replicaof 192.168.11.6 9880
replicaof 192.168.11.7 9880
```
4. Save your changes and exit the editor using the key combinations below:
```sh
ctrl+O
ctrl+X
```
5. Go to the Futriix directory and run the `cluster.sh` script, first with the `pick` parameter (which assembles the cluster) and then with the `run` parameter (which starts the cluster), as shown below:
```sh
unix:$ ./cluster pick
unix:$ ./cluster run
```
6. Make the `cluster.sh` script executable with the command below:
```sh
unix:$ chmod +x cluster.sh
```
7. To stop the cluster, run the `cluster.sh` script with the `stop` parameter:
```sh
unix:$ ./cluster stop
```
<p align="right">(<a href="#readme-top">К началу</a>)</p>
<!-- ROADMAP -->
## Roadmap
- [x] Add support for stored procedures
- [x] Change the futriix-cli client command-line prompt
- [x] Rewrite the cluster.sh script that assembles the Futriix cluster
- [x] Add support for a JSON module
- [ ] Add support for a module that allows running operating-system terminal commands
- [ ] Implement support for the Raft algorithm
- [ ] Add support for the SQL query language
See the [open issues](https://source.futriix.ru/gvsafronov/Futriix/issues) for a full list of proposed features (and known issues).
<p align="right">(<a href="#readme-top">К началу</a>)</p>
<!-- CONTRIBUTING -->
## Contributing
Contributions are what make the open source community such an amazing place to learn, be inspired, and create. Any contribution you make is **greatly appreciated**.
If you have a suggestion that would make the project better, fork the repository and create a pull request. You can also simply open an issue with the "enhancement" tag.
Don't forget to give the project a star! Thanks again!
1. Fork the project
2. Create your feature branch (`git checkout -b Feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin Feature/AmazingFeature`)
5. Open a pull request
<!-- LICENSE -->
## License
The project is distributed under the BSD 3-Clause License. See the `COPYING.txt` file for details.
<p align="right">(<a href="#readme-top">К началу</a>)</p>
<!-- CONTACT -->
## Contacts
Grigory Safronov - [E-mail](mailto:gvsafronov@yandex.ru)
Project link: https://source.futriix.ru/gvsafronov/Futriix
<p align="right">(<a href="#readme-top">К началу</a>)</p>

SECURITY.md

@ -1,43 +1,6 @@
# Security Policy
## Supported Versions
Redis is generally backwards compatible with very few exceptions, so we
recommend users to always use the latest version to experience stability,
performance and security.
We generally backport security issues to a single previous major version,
unless this is not possible or feasible with a reasonable effort.
| Version | Supported |
| ------- | ------------------ |
| 7.0.x | :white_check_mark: |
| 6.2.x | :white_check_mark: |
| 6.0.x | :white_check_mark: |
| < 6.0 | :x: |
## Reporting a Vulnerability
If you believe you've discovered a serious vulnerability, please contact the
Redis core team at redis@redis.io. We will evaluate your report and if
necessary issue a fix and an advisory. If the issue was previously undisclosed,
we'll also mention your name in the credits.
## Responsible Disclosure
In some cases, we may apply a responsible disclosure process to reported or
otherwise discovered vulnerabilities. We will usually do that for a critical
vulnerability, and only if we have a good reason to believe information about
it is not yet public.
This process involves providing an early notification about the vulnerability,
its impact and mitigations to a short list of vendors under a time-limited
embargo on public disclosure.
Vendors on the list are individuals or organizations that maintain Redis
distributions or provide Redis as a service, who have third-party users who
will benefit from the vendor's ability to prepare for a new version or deploy a
fix early.
fix early.
If you believe you should be on the list, please contact us and we will
consider your request based on the above criteria.
If you believe you've discovered a security vulnerability, please contact the Valkey team at security@lists.valkey.io.
Please *DO NOT* create an issue.
We follow a responsible disclosure procedure, so depending on the severity of the issue we may notify Valkey vendors about the issue before releasing it publicly.
If you would like to be added to our list of vendors, please reach out to the Valkey team at maintainers@lists.valkey.io.

TLS.md

@ -1,104 +0,0 @@
TLS Support
===========
Getting Started
---------------
### Building
To build with TLS support you'll need OpenSSL development libraries (e.g.
libssl-dev on Debian/Ubuntu).
To build TLS support as Redis built-in:
Run `make BUILD_TLS=yes`.
Or to build TLS as Redis module:
Run `make BUILD_TLS=module`.
Note that sentinel mode does not support TLS module.
### Tests
To run Redis test suite with TLS, you'll need TLS support for TCL (i.e.
`tcl-tls` package on Debian/Ubuntu).
1. Run `./utils/gen-test-certs.sh` to generate a root CA and a server
certificate.
2. Run `./runtest --tls` or `./runtest-cluster --tls` to run Redis and Redis
Cluster tests in TLS mode.
3. Run `./runtest --tls-module` or `./runtest-cluster --tls-module` to
run Redis and Redis cluster tests in TLS mode with Redis module.
### Running manually
To manually run a Redis server with TLS mode (assuming `gen-test-certs.sh` was
invoked so sample certificates/keys are available):
For TLS built-in mode:
./src/redis-server --tls-port 6379 --port 0 \
--tls-cert-file ./tests/tls/redis.crt \
--tls-key-file ./tests/tls/redis.key \
--tls-ca-cert-file ./tests/tls/ca.crt
For TLS module mode:
./src/redis-server --tls-port 6379 --port 0 \
--tls-cert-file ./tests/tls/redis.crt \
--tls-key-file ./tests/tls/redis.key \
--tls-ca-cert-file ./tests/tls/ca.crt \
--loadmodule src/redis-tls.so
To connect to this Redis server with `redis-cli`:
./src/redis-cli --tls \
--cert ./tests/tls/redis.crt \
--key ./tests/tls/redis.key \
--cacert ./tests/tls/ca.crt
This will disable TCP and enable TLS on port 6379. It's also possible to have
both TCP and TLS available, but you'll need to assign different ports.
To make a Replica connect to the master using TLS, use `--tls-replication yes`,
and to make Redis Cluster use TLS across nodes use `--tls-cluster yes`.
Connections
-----------
All socket operations now go through a connection abstraction layer that hides
I/O and read/write event handling from the caller.
**Multi-threading I/O is not currently supported for TLS**, as a TLS connection
needs to do its own manipulation of AE events which is not thread safe. The
solution is probably to manage independent AE loops for I/O threads and longer
term association of connections with threads. This may potentially improve
overall performance as well.
Sync IO for TLS is currently implemented in a hackish way, i.e. making the
socket blocking and configuring socket-level timeout. This means the timeout
value may not be so accurate, and there would be a lot of syscall overhead.
However I believe that getting rid of syncio completely in favor of pure async
work is probably a better move than trying to fix that. For replication it would
probably not be so hard. For cluster keys migration it might be more difficult,
but there are probably other good reasons to improve that part anyway.
To-Do List
----------
- [ ] redis-benchmark support. The current implementation is a mix of using
hiredis for parsing and basic networking (establishing connections), but
directly manipulating sockets for most actions. This will need to be cleaned
up for proper TLS support. The best approach is probably to migrate to hiredis
async mode.
- [ ] redis-cli `--slave` and `--rdb` support.
Multi-port
----------
Consider the implications of allowing TLS to be configured on a separate port,
making Redis listening on multiple ports:
1. Startup banner port notification
2. Proctitle
3. How slaves announce themselves
4. Cluster bus port calculation

utils/create-cluster/create-cluster → cluster (Executable file → Normal file)

@ -1,16 +1,17 @@
#!/bin/bash
#!/usr/bin/env sh
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
# Settings
BIN_PATH="$SCRIPT_DIR/../../src/"
BIN_PATH="$SCRIPT_DIR/src"
CLUSTER_HOST=127.0.0.1
PORT=30000
PORT=7000
TIMEOUT=2000
NODES=6
REPLICAS=1
PROTECTED_MODE=yes
ADDITIONAL_OPTIONS=""
CONFIG_PATH="./futriix.conf"
# You may want to put the above config parameters into config.sh in order to
# override the defaults without modifying this script.
@ -23,17 +24,17 @@ fi
# Computed vars
ENDPORT=$((PORT+NODES))
if [ "$1" == "start" ]
if [ "$1" == "pick" ]
then
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "Starting $PORT"
$BIN_PATH/redis-server --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes ${ADDITIONAL_OPTIONS}
$BIN_PATH/futriix-server ${CONFIG_PATH} --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes --enable-protected-configs yes --enable-debug-command yes --enable-module-command yes ${ADDITIONAL_OPTIONS}
done
exit 0
fi
if [ "$1" == "create" ]
if [ "$1" == "run" ]
then
HOSTS=""
while [ $((PORT < ENDPORT)) != "0" ]; do
@ -44,7 +45,7 @@ then
if [ "$2" == "-f" ]; then
OPT_ARG="--cluster-yes"
fi
$BIN_PATH/redis-cli --cluster create $HOSTS --cluster-replicas $REPLICAS $OPT_ARG
$BIN_PATH/futriix-cli --cluster create $HOSTS --cluster-replicas $REPLICAS $OPT_ARG
exit 0
fi
@ -53,7 +54,24 @@ then
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "Stopping $PORT"
$BIN_PATH/redis-cli -p $PORT shutdown nosave
$BIN_PATH/futriix-cli -p $PORT shutdown nosave
done
exit 0
fi
if [ "$1" == "repick" ]
then
OLD_PORT=$PORT
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "Stopping $PORT"
$BIN_PATH/futriix-cli -p $PORT shutdown nosave
done
PORT=$OLD_PORT
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "picking $PORT"
$BIN_PATH/futriix-server ${CONFIG_PATH} --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes --enable-protected-configs yes --enable-debug-command yes --enable-module-command yes ${ADDITIONAL_OPTIONS}
done
exit 0
fi
@ -64,7 +82,7 @@ then
while [ 1 ]; do
clear
date
$BIN_PATH/redis-cli -p $PORT cluster nodes | head -30
$BIN_PATH/futriix-cli -p $PORT cluster nodes | head -30
sleep 1
done
exit 0
@ -88,7 +106,7 @@ if [ "$1" == "call" ]
then
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
$BIN_PATH/redis-cli -p $PORT $2 $3 $4 $5 $6 $7 $8 $9
$BIN_PATH/futriix-cli -p $PORT $2 $3 $4 $5 $6 $7 $8 $9
done
exit 0
fi
@ -113,13 +131,16 @@ then
exit 0
fi
echo "Usage: $0 [start|create|stop|watch|tail|tailall|clean|clean-logs|call]"
echo "start -- Launch Redis Cluster instances."
echo "create [-f] -- Create a cluster using redis-cli --cluster create."
echo "stop -- Stop Redis Cluster instances."
echo ""
echo "Usage: $0 [pick|run|stop|restart|watch|tail|tailall|clean|clean-logs|call]"
echo "pick -- Launch Futriix Cluster instances."
echo "run [-f] -- Create a cluster using futriix-cli --cluster create."
echo "stop -- Stop Futriix Cluster instances."
echo "restart -- Restart Futriix Cluster instances."
echo "watch -- Show CLUSTER NODES output (first 30 lines) of first node."
echo "tail <id> -- Run tail -f of instance at base port + ID."
echo "tailall -- Run tail -f for all the log files at once."
echo "clean -- Remove all instances data, logs, configs."
echo "clean-logs -- Remove just instances logs."
echo "call <cmd> -- Call a command (up to 7 arguments) on all nodes."
echo ""

cluster-experimental (new file)

@ -0,0 +1,192 @@
#!/usr/bin/env sh
# This script automatically picks and starts the Futriix cluster.
# It also prints a colored "OK"/"Fail" status indicator for each step.
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
# Settings of color
SETCOLOR_SUCCESS="echo -en \\033[1;32m"
SETCOLOR_FAILURE="echo -en \\033[1;31m"
SETCOLOR_NORMAL="echo -en \\033[0;39m"
# Settings
BIN_PATH="$SCRIPT_DIR/src"
CLUSTER_HOST=127.0.0.1
PORT=7000
TIMEOUT=2000
NODES=6
REPLICAS=1
PROTECTED_MODE=yes
ADDITIONAL_OPTIONS=""
CONFIG_PATH="./futriix.conf"
# You may want to put the above config parameters into config.sh in order to
# override the defaults without modifying this script.
if [ -a config.sh ]
then
source "config.sh"
fi
# Computed vars
ENDPORT=$((PORT+NODES))
#if [ $? -eq 0 ]; then
# $SETCOLOR_SUCCESS
# echo -n "$(tput hpa $(tput cols))$(tput cub 6)[OK]"
# $SETCOLOR_NORMAL
# echo
# else
# $SETCOLOR_FAILURE
# echo -n "$(tput hpa $(tput cols))$(tput cub 6)[fail]"
# $SETCOLOR_NORMAL
# echo
#fi
if [ "$1" == "pick" ]
then
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "Starting $PORT"
yes | $BIN_PATH/futriix-server ${CONFIG_PATH} --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes --enable-protected-configs yes --enable-debug-command yes --enable-module-command yes ${ADDITIONAL_OPTIONS} >/dev/null 2>&1
if [ $? -eq 0 ]; then
$SETCOLOR_SUCCESS
echo -n "$(tput hpa $(tput cols))$(tput cub 6)[OK]"
$SETCOLOR_NORMAL
echo
else
$SETCOLOR_FAILURE
echo -n "$(tput hpa $(tput cols))$(tput cub 6)[fail]"
$SETCOLOR_NORMAL
echo
fi
done
exit 0
fi
if [ "$1" == "run" ]
then
HOSTS=""
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
HOSTS="$HOSTS $CLUSTER_HOST:$PORT"
done
OPT_ARG=""
if [ "$2" == "-f" ]; then
OPT_ARG="--cluster-yes"
fi
yes | $BIN_PATH/futriix-cli --cluster create $HOSTS --cluster-replicas $REPLICAS $OPT_ARG >/dev/null 2>&1
exit 0
fi
if [ $? -eq 0 ]; then
$SETCOLOR_SUCCESS
echo -n "$(tput hpa $(tput cols))$(tput cub 6)[OK]"
$SETCOLOR_NORMAL
echo
else
$SETCOLOR_FAILURE
echo -n "$(tput hpa $(tput cols))$(tput cub 6)[fail]"
$SETCOLOR_NORMAL
echo
fi
if [ "$1" == "stop" ]
then
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "Stopping $PORT"
$BIN_PATH/futriix-cli -p $PORT shutdown nosave
done
exit 0
fi
if [ "$1" == "repick" ]
then
OLD_PORT=$PORT
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "Stopping $PORT"
$BIN_PATH/futriix-cli -p $PORT shutdown nosave
done
PORT=$OLD_PORT
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
echo "picking $PORT"
$BIN_PATH/futriix-server ${CONFIG_PATH} --port $PORT --protected-mode $PROTECTED_MODE --cluster-enabled yes --cluster-config-file nodes-${PORT}.conf --cluster-node-timeout $TIMEOUT --appendonly yes --appendfilename appendonly-${PORT}.aof --appenddirname appendonlydir-${PORT} --dbfilename dump-${PORT}.rdb --logfile ${PORT}.log --daemonize yes --enable-protected-configs yes --enable-debug-command yes --enable-module-command yes ${ADDITIONAL_OPTIONS}
done
exit 0
fi
if [ "$1" == "watch" ]
then
PORT=$((PORT+1))
while [ 1 ]; do
clear
date
$BIN_PATH/futriix-cli -p $PORT cluster nodes | head -30
sleep 1
done
exit 0
fi
if [ "$1" == "tail" ]
then
INSTANCE=$2
PORT=$((PORT+INSTANCE))
tail -f ${PORT}.log
exit 0
fi
if [ "$1" == "tailall" ]
then
tail -f *.log
exit 0
fi
if [ "$1" == "call" ]
then
while [ $((PORT < ENDPORT)) != "0" ]; do
PORT=$((PORT+1))
$BIN_PATH/futriix-cli -p $PORT $2 $3 $4 $5 $6 $7 $8 $9
done
exit 0
fi
if [ "$1" == "clean" ]
then
echo "Cleaning *.log"
rm -rf *.log
echo "Cleaning appendonlydir-*"
rm -rf appendonlydir-*
echo "Cleaning dump-*.rdb"
rm -rf dump-*.rdb
echo "Cleaning nodes-*.conf"
rm -rf nodes-*.conf
exit 0
fi
if [ "$1" == "clean-logs" ]
then
echo "Cleaning *.log"
rm -rf *.log
exit 0
fi
echo ""
echo "Usage: $0 [pick|run|stop|restart|watch|tail|tailall|clean|clean-logs|call]"
echo "pick -- Launch Futriix Cluster instances."
echo "run [-f] -- Create a cluster using futriix-cli --cluster create."
echo "stop -- Stop Futriix Cluster instances."
echo "restart -- Restart Futriix Cluster instances."
echo "watch -- Show CLUSTER NODES output (first 30 lines) of first node."
echo "tail <id> -- Run tail -f of instance at base port + ID."
echo "tailall -- Run tail -f for all the log files at once."
echo "clean -- Remove all instances data, logs, configs."
echo "clean-logs -- Remove just instances logs."
echo "call <cmd> -- Call a command (up to 7 arguments) on all nodes."
echo ""


@ -0,0 +1,44 @@
set(CPACK_PACKAGE_NAME "valkey")
valkey_parse_version(CPACK_PACKAGE_VERSION_MAJOR CPACK_PACKAGE_VERSION_MINOR CPACK_PACKAGE_VERSION_PATCH)
set(CPACK_PACKAGE_CONTACT "maintainers@lists.valkey.io")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "Valkey is an open source (BSD) high-performance key/value datastore")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_SOURCE_DIR}/COPYING")
set(CPACK_RESOURCE_FILE_README "${CMAKE_SOURCE_DIR}/README.md")
set(CPACK_STRIP_FILES TRUE)
valkey_get_distro_name(DISTRO_NAME)
message(STATUS "Current host distro: ${DISTRO_NAME}")
if (DISTRO_NAME MATCHES ubuntu
OR DISTRO_NAME MATCHES debian
OR DISTRO_NAME MATCHES mint)
message(STATUS "Adding target package for ${DISTRO_NAME}")
set(CPACK_PACKAGING_INSTALL_PREFIX "/opt/valkey")
# Debian related parameters
set(CPACK_DEBIAN_PACKAGE_MAINTAINER "Valkey contributors")
set(CPACK_DEBIAN_PACKAGE_SHLIBDEPS ON)
set(CPACK_DEBIAN_FILE_NAME DEB-DEFAULT)
set(CPACK_GENERATOR "DEB")
endif ()
include(CPack)
unset(DISTRO_NAME CACHE)
# ---------------------------------------------------
# Create a helper script for creating symbolic links
# ---------------------------------------------------
write_file(
${CMAKE_BINARY_DIR}/CreateSymlink.sh
"\
#!/bin/bash \n\
if [ -z \${DESTDIR} ]; then \n\
# Script is called during 'make install' \n\
PREFIX=${CMAKE_INSTALL_PREFIX}/bin \n\
else \n\
# Script is called during 'make package' \n\
PREFIX=\${DESTDIR}${CPACK_PACKAGING_INSTALL_PREFIX}/bin \n\
fi \n\
cd \$PREFIX \n\
ln -sf \$1 \$2")


@ -0,0 +1,157 @@
# -------------------------------------------------
# Define the sources to be built
# -------------------------------------------------
# valkey-server source files
set(VALKEY_SERVER_SRCS
${CMAKE_SOURCE_DIR}/src/threads_mngr.c
${CMAKE_SOURCE_DIR}/src/adlist.c
${CMAKE_SOURCE_DIR}/src/quicklist.c
${CMAKE_SOURCE_DIR}/src/ae.c
${CMAKE_SOURCE_DIR}/src/anet.c
${CMAKE_SOURCE_DIR}/src/dict.c
${CMAKE_SOURCE_DIR}/src/hashtable.c
${CMAKE_SOURCE_DIR}/src/kvstore.c
${CMAKE_SOURCE_DIR}/src/sds.c
${CMAKE_SOURCE_DIR}/src/zmalloc.c
${CMAKE_SOURCE_DIR}/src/lzf_c.c
${CMAKE_SOURCE_DIR}/src/lzf_d.c
${CMAKE_SOURCE_DIR}/src/pqsort.c
${CMAKE_SOURCE_DIR}/src/zipmap.c
${CMAKE_SOURCE_DIR}/src/sha1.c
${CMAKE_SOURCE_DIR}/src/ziplist.c
${CMAKE_SOURCE_DIR}/src/release.c
${CMAKE_SOURCE_DIR}/src/memory_prefetch.c
${CMAKE_SOURCE_DIR}/src/io_threads.c
${CMAKE_SOURCE_DIR}/src/networking.c
${CMAKE_SOURCE_DIR}/src/util.c
${CMAKE_SOURCE_DIR}/src/object.c
${CMAKE_SOURCE_DIR}/src/db.c
${CMAKE_SOURCE_DIR}/src/replication.c
${CMAKE_SOURCE_DIR}/src/rdb.c
${CMAKE_SOURCE_DIR}/src/t_string.c
${CMAKE_SOURCE_DIR}/src/t_list.c
${CMAKE_SOURCE_DIR}/src/t_set.c
${CMAKE_SOURCE_DIR}/src/t_zset.c
${CMAKE_SOURCE_DIR}/src/t_hash.c
${CMAKE_SOURCE_DIR}/src/config.c
${CMAKE_SOURCE_DIR}/src/aof.c
${CMAKE_SOURCE_DIR}/src/pubsub.c
${CMAKE_SOURCE_DIR}/src/multi.c
${CMAKE_SOURCE_DIR}/src/debug.c
${CMAKE_SOURCE_DIR}/src/sort.c
${CMAKE_SOURCE_DIR}/src/intset.c
${CMAKE_SOURCE_DIR}/src/syncio.c
${CMAKE_SOURCE_DIR}/src/cluster.c
${CMAKE_SOURCE_DIR}/src/cluster_legacy.c
${CMAKE_SOURCE_DIR}/src/cluster_slot_stats.c
${CMAKE_SOURCE_DIR}/src/crc16.c
${CMAKE_SOURCE_DIR}/src/endianconv.c
${CMAKE_SOURCE_DIR}/src/commandlog.c
${CMAKE_SOURCE_DIR}/src/eval.c
${CMAKE_SOURCE_DIR}/src/bio.c
${CMAKE_SOURCE_DIR}/src/rio.c
${CMAKE_SOURCE_DIR}/src/rand.c
${CMAKE_SOURCE_DIR}/src/memtest.c
${CMAKE_SOURCE_DIR}/src/syscheck.c
${CMAKE_SOURCE_DIR}/src/crcspeed.c
${CMAKE_SOURCE_DIR}/src/crccombine.c
${CMAKE_SOURCE_DIR}/src/crc64.c
${CMAKE_SOURCE_DIR}/src/bitops.c
${CMAKE_SOURCE_DIR}/src/sentinel.c
${CMAKE_SOURCE_DIR}/src/notify.c
${CMAKE_SOURCE_DIR}/src/setproctitle.c
${CMAKE_SOURCE_DIR}/src/blocked.c
${CMAKE_SOURCE_DIR}/src/hyperloglog.c
${CMAKE_SOURCE_DIR}/src/latency.c
${CMAKE_SOURCE_DIR}/src/sparkline.c
${CMAKE_SOURCE_DIR}/src/valkey-check-rdb.c
${CMAKE_SOURCE_DIR}/src/valkey-check-aof.c
${CMAKE_SOURCE_DIR}/src/geo.c
${CMAKE_SOURCE_DIR}/src/lazyfree.c
${CMAKE_SOURCE_DIR}/src/module.c
${CMAKE_SOURCE_DIR}/src/evict.c
${CMAKE_SOURCE_DIR}/src/expire.c
${CMAKE_SOURCE_DIR}/src/geohash.c
${CMAKE_SOURCE_DIR}/src/geohash_helper.c
${CMAKE_SOURCE_DIR}/src/childinfo.c
${CMAKE_SOURCE_DIR}/src/allocator_defrag.c
${CMAKE_SOURCE_DIR}/src/defrag.c
${CMAKE_SOURCE_DIR}/src/siphash.c
${CMAKE_SOURCE_DIR}/src/rax.c
${CMAKE_SOURCE_DIR}/src/t_stream.c
${CMAKE_SOURCE_DIR}/src/listpack.c
${CMAKE_SOURCE_DIR}/src/localtime.c
${CMAKE_SOURCE_DIR}/src/lolwut.c
${CMAKE_SOURCE_DIR}/src/lolwut5.c
${CMAKE_SOURCE_DIR}/src/lolwut6.c
${CMAKE_SOURCE_DIR}/src/acl.c
${CMAKE_SOURCE_DIR}/src/tracking.c
${CMAKE_SOURCE_DIR}/src/socket.c
${CMAKE_SOURCE_DIR}/src/tls.c
${CMAKE_SOURCE_DIR}/src/rdma.c
${CMAKE_SOURCE_DIR}/src/sha256.c
${CMAKE_SOURCE_DIR}/src/timeout.c
${CMAKE_SOURCE_DIR}/src/setcpuaffinity.c
${CMAKE_SOURCE_DIR}/src/monotonic.c
${CMAKE_SOURCE_DIR}/src/mt19937-64.c
${CMAKE_SOURCE_DIR}/src/resp_parser.c
${CMAKE_SOURCE_DIR}/src/call_reply.c
${CMAKE_SOURCE_DIR}/src/script_lua.c
${CMAKE_SOURCE_DIR}/src/script.c
${CMAKE_SOURCE_DIR}/src/functions.c
${CMAKE_SOURCE_DIR}/src/scripting_engine.c
${CMAKE_SOURCE_DIR}/src/function_lua.c
${CMAKE_SOURCE_DIR}/src/commands.c
${CMAKE_SOURCE_DIR}/src/strl.c
${CMAKE_SOURCE_DIR}/src/connection.c
${CMAKE_SOURCE_DIR}/src/unix.c
${CMAKE_SOURCE_DIR}/src/server.c
${CMAKE_SOURCE_DIR}/src/logreqres.c)
# valkey-cli
set(VALKEY_CLI_SRCS
${CMAKE_SOURCE_DIR}/src/anet.c
${CMAKE_SOURCE_DIR}/src/adlist.c
${CMAKE_SOURCE_DIR}/src/dict.c
${CMAKE_SOURCE_DIR}/src/valkey-cli.c
${CMAKE_SOURCE_DIR}/src/zmalloc.c
${CMAKE_SOURCE_DIR}/src/release.c
${CMAKE_SOURCE_DIR}/src/ae.c
${CMAKE_SOURCE_DIR}/src/serverassert.c
${CMAKE_SOURCE_DIR}/src/crcspeed.c
${CMAKE_SOURCE_DIR}/src/crccombine.c
${CMAKE_SOURCE_DIR}/src/crc64.c
${CMAKE_SOURCE_DIR}/src/siphash.c
${CMAKE_SOURCE_DIR}/src/crc16.c
${CMAKE_SOURCE_DIR}/src/monotonic.c
${CMAKE_SOURCE_DIR}/src/cli_common.c
${CMAKE_SOURCE_DIR}/src/mt19937-64.c
${CMAKE_SOURCE_DIR}/src/strl.c
${CMAKE_SOURCE_DIR}/src/cli_commands.c)
# valkey-benchmark
set(VALKEY_BENCHMARK_SRCS
${CMAKE_SOURCE_DIR}/src/ae.c
${CMAKE_SOURCE_DIR}/src/anet.c
${CMAKE_SOURCE_DIR}/src/valkey-benchmark.c
${CMAKE_SOURCE_DIR}/src/adlist.c
${CMAKE_SOURCE_DIR}/src/dict.c
${CMAKE_SOURCE_DIR}/src/zmalloc.c
${CMAKE_SOURCE_DIR}/src/serverassert.c
${CMAKE_SOURCE_DIR}/src/release.c
${CMAKE_SOURCE_DIR}/src/crcspeed.c
${CMAKE_SOURCE_DIR}/src/crccombine.c
${CMAKE_SOURCE_DIR}/src/crc64.c
${CMAKE_SOURCE_DIR}/src/siphash.c
${CMAKE_SOURCE_DIR}/src/crc16.c
${CMAKE_SOURCE_DIR}/src/monotonic.c
${CMAKE_SOURCE_DIR}/src/cli_common.c
${CMAKE_SOURCE_DIR}/src/mt19937-64.c
${CMAKE_SOURCE_DIR}/src/strl.c)
# valkey-rdma module
set(VALKEY_RDMA_MODULE_SRCS ${CMAKE_SOURCE_DIR}/src/rdma.c)
# valkey-tls module
set(VALKEY_TLS_MODULE_SRCS ${CMAKE_SOURCE_DIR}/src/tls.c)

cmake/Modules/Utils.cmake (new file)

@ -0,0 +1,115 @@
# Return the current host distro name. For example: ubuntu, debian, amzn etc
function (valkey_get_distro_name DISTRO_NAME)
if (LINUX AND NOT APPLE)
execute_process(
COMMAND /bin/bash "-c" "cat /etc/os-release |grep ^ID=|cut -d = -f 2"
OUTPUT_VARIABLE _OUT_VAR
OUTPUT_STRIP_TRAILING_WHITESPACE)
# clean the output
string(REPLACE "\"" "" _OUT_VAR "${_OUT_VAR}")
string(REPLACE "." "" _OUT_VAR "${_OUT_VAR}")
set(${DISTRO_NAME}
"${_OUT_VAR}"
PARENT_SCOPE)
elseif (APPLE)
set(${DISTRO_NAME}
"darwin"
PARENT_SCOPE)
elseif (IS_FREEBSD)
set(${DISTRO_NAME}
"freebsd"
PARENT_SCOPE)
else ()
set(${DISTRO_NAME}
"unknown"
PARENT_SCOPE)
endif ()
endfunction ()
function (valkey_parse_version OUT_MAJOR OUT_MINOR OUT_PATCH)
# Read and parse package version from version.h file
file(STRINGS ${CMAKE_SOURCE_DIR}/src/version.h VERSION_LINES)
foreach (LINE ${VERSION_LINES})
string(FIND "${LINE}" "#define VALKEY_VERSION " VERSION_STR_POS)
if (VERSION_STR_POS GREATER -1)
string(REPLACE "#define VALKEY_VERSION " "" LINE "${LINE}")
string(REPLACE "\"" "" LINE "${LINE}")
# Change "." to ";" to make it a list
string(REPLACE "." ";" LINE "${LINE}")
list(GET LINE 0 _MAJOR)
list(GET LINE 1 _MINOR)
list(GET LINE 2 _PATCH)
message(STATUS "Valkey version: ${_MAJOR}.${_MINOR}.${_PATCH}")
# Set the output variables
set(${OUT_MAJOR}
${_MAJOR}
PARENT_SCOPE)
set(${OUT_MINOR}
${_MINOR}
PARENT_SCOPE)
set(${OUT_PATCH}
${_PATCH}
PARENT_SCOPE)
endif ()
endforeach ()
endfunction ()
# Given input argument `OPTION_VALUE`, check that the `OPTION_VALUE` is from the allowed values (one of:
# module/yes/no/1/0/true/false)
#
# Return value:
#
# If OPTION_VALUE is valid, return its number where:
#
# ~~~
# - `no` | `0` | `off` => return `0`
# - `yes` | `1` | `on` => return `1`
# - `module` => return `2`
# ~~~
function (valkey_parse_build_option OPTION_VALUE OUT_ARG_ENUM)
list(APPEND VALID_OPTIONS "yes")
list(APPEND VALID_OPTIONS "1")
list(APPEND VALID_OPTIONS "on")
list(APPEND VALID_OPTIONS "no")
list(APPEND VALID_OPTIONS "0")
list(APPEND VALID_OPTIONS "off")
list(APPEND VALID_OPTIONS "module")
string(TOLOWER "${OPTION_VALUE}" OPTION_VALUE)
list(FIND VALID_OPTIONS "${OPTION_VALUE}" OPT_INDEX)
if (OPT_INDEX EQUAL -1)
message(FATAL_ERROR "Invalid value passed '${OPTION_VALUE}'")
endif ()
if ("${OPTION_VALUE}" STREQUAL "yes"
OR "${OPTION_VALUE}" STREQUAL "1"
OR "${OPTION_VALUE}" STREQUAL "on")
set(${OUT_ARG_ENUM}
1
PARENT_SCOPE)
elseif (
"${OPTION_VALUE}" STREQUAL "no"
OR "${OPTION_VALUE}" STREQUAL "0"
OR "${OPTION_VALUE}" STREQUAL "off")
set(${OUT_ARG_ENUM}
0
PARENT_SCOPE)
else ()
set(${OUT_ARG_ENUM}
2
PARENT_SCOPE)
endif ()
endfunction ()
function (valkey_pkg_config PKGNAME OUT_VARIABLE)
if (NOT FOUND_PKGCONFIG)
# Locate pkg-config once
find_package(PkgConfig REQUIRED)
set(FOUND_PKGCONFIG 1)
endif ()
pkg_check_modules(__PREFIX REQUIRED ${PKGNAME})
message(STATUS "Found library for '${PKGNAME}': ${__PREFIX_LIBRARIES}")
set(${OUT_VARIABLE}
"${__PREFIX_LIBRARIES}"
PARENT_SCOPE)
endfunction ()


@ -0,0 +1,394 @@
include(CheckIncludeFiles)
include(ProcessorCount)
include(Utils)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin")
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
# Generate compile_commands.json file for IDEs code completion support
set(CMAKE_EXPORT_COMPILE_COMMANDS 1)
processorcount(VALKEY_PROCESSOR_COUNT)
message(STATUS "Processor count: ${VALKEY_PROCESSOR_COUNT}")
# Installed executables will have these permissions
set(VALKEY_EXE_PERMISSIONS
OWNER_EXECUTE
OWNER_WRITE
OWNER_READ
GROUP_EXECUTE
GROUP_READ
WORLD_EXECUTE
WORLD_READ)
set(VALKEY_SERVER_CFLAGS "")
set(VALKEY_SERVER_LDFLAGS "")
# ----------------------------------------------------
# Helper functions & macros
# ----------------------------------------------------
macro (add_valkey_server_compiler_options value)
set(VALKEY_SERVER_CFLAGS "${VALKEY_SERVER_CFLAGS} ${value}")
endmacro ()
macro (add_valkey_server_linker_option value)
list(APPEND VALKEY_SERVER_LDFLAGS ${value})
endmacro ()
macro (get_valkey_server_linker_option return_value)
list(JOIN VALKEY_SERVER_LDFLAGS " " ${return_value})
endmacro ()
set(IS_FREEBSD 0)
if (CMAKE_SYSTEM_NAME MATCHES "^.*BSD$|DragonFly")
message(STATUS "Building for FreeBSD compatible system")
set(IS_FREEBSD 1)
include_directories("/usr/local/include")
add_valkey_server_compiler_options("-DUSE_BACKTRACE")
endif ()
# Helper function for creating symbolic link so that: link -> source
macro (valkey_create_symlink source link)
install(
CODE "execute_process( \
COMMAND /bin/bash ${CMAKE_BINARY_DIR}/CreateSymlink.sh \
${source} \
${link} \
)"
COMPONENT "valkey")
endmacro ()
# Install a binary
macro (valkey_install_bin target)
# Install cli tool and create a redis symbolic link
install(
TARGETS ${target}
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS ${VALKEY_EXE_PERMISSIONS}
COMPONENT "valkey")
endmacro ()
# Helper function that defines, builds and installs `target`. In addition, it creates a symbolic link between the target
# and `link_name`
macro (valkey_build_and_install_bin target sources ld_flags libs link_name)
add_executable(${target} ${sources})
if (USE_JEMALLOC
OR USE_TCMALLOC
OR USE_TCMALLOC_MINIMAL)
# Using custom allocator
target_link_libraries(${target} ${ALLOCATOR_LIB})
endif ()
# Place this line last to ensure that ${ld_flags} is placed last on the linker line
target_link_libraries(${target} ${libs} ${ld_flags})
target_link_libraries(${target} hiredis)
if (USE_TLS)
# Add required libraries needed for TLS
target_link_libraries(${target} OpenSSL::SSL hiredis_ssl)
endif ()
if (IS_FREEBSD)
target_link_libraries(${target} execinfo)
endif ()
# Enable all warnings + fail on warning
target_compile_options(${target} PRIVATE -Werror -Wall)
# Install cli tool and create a redis symbolic link
valkey_install_bin(${target})
valkey_create_symlink(${target} ${link_name})
endmacro ()
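A sketch of a typical invocation, assuming a CLI-style target. CLI_SRCS and CLI_LIBS are placeholders, not names from this diff, while VALKEY_SERVER_LDFLAGS is the list built by the helpers above:
# Hypothetical target and placeholder variables.
valkey_build_and_install_bin(valkey-cli "${CLI_SRCS}" "${VALKEY_SERVER_LDFLAGS}" "${CLI_LIBS}" "redis-cli")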
# Helper function that defines, builds and installs `target` module.
macro (valkey_build_and_install_module target sources ld_flags libs)
add_library(${target} SHARED ${sources})
if (USE_JEMALLOC)
# Using jemalloc
target_link_libraries(${target} jemalloc)
endif ()
# Place this line last to ensure that ${ld_flags} is placed last on the linker line
target_link_libraries(${target} ${libs} ${ld_flags})
if (USE_TLS)
# Add required libraries needed for TLS
target_link_libraries(${target} OpenSSL::SSL hiredis_ssl)
endif ()
if (IS_FREEBSD)
target_link_libraries(${target} execinfo)
endif ()
# Install the module (no symbolic link is created for modules)
valkey_install_bin(${target})
endmacro ()
# Determine if we are building in Release or Debug mode
if (CMAKE_BUILD_TYPE MATCHES Debug OR CMAKE_BUILD_TYPE MATCHES DebugFull)
set(VALKEY_DEBUG_BUILD 1)
set(VALKEY_RELEASE_BUILD 0)
message(STATUS "Building in debug mode")
else ()
set(VALKEY_DEBUG_BUILD 0)
set(VALKEY_RELEASE_BUILD 1)
message(STATUS "Building in release mode")
endif ()
# ----------------------------------------------------
# Helper functions - end
# ----------------------------------------------------
# ----------------------------------------------------
# Build options (allocator, tls, rdma et al)
# ----------------------------------------------------
if (NOT BUILD_MALLOC)
if (APPLE)
set(BUILD_MALLOC "libc")
elseif (UNIX)
set(BUILD_MALLOC "jemalloc")
endif ()
endif ()
# The user may pass a different allocator library via -DBUILD_MALLOC=<libname>; make sure it is a valid value
if (BUILD_MALLOC)
if ("${BUILD_MALLOC}" STREQUAL "jemalloc")
set(MALLOC_LIB "jemalloc")
set(ALLOCATOR_LIB "jemalloc")
add_valkey_server_compiler_options("-DUSE_JEMALLOC")
set(USE_JEMALLOC 1)
elseif ("${BUILD_MALLOC}" STREQUAL "libc")
set(MALLOC_LIB "libc")
elseif ("${BUILD_MALLOC}" STREQUAL "tcmalloc")
set(MALLOC_LIB "tcmalloc")
valkey_pkg_config(libtcmalloc ALLOCATOR_LIB)
add_valkey_server_compiler_options("-DUSE_TCMALLOC")
set(USE_TCMALLOC 1)
elseif ("${BUILD_MALLOC}" STREQUAL "tcmalloc_minimal")
set(MALLOC_LIB "tcmalloc_minimal")
valkey_pkg_config(libtcmalloc_minimal ALLOCATOR_LIB)
add_valkey_server_compiler_options("-DUSE_TCMALLOC")
set(USE_TCMALLOC_MINIMAL 1)
else ()
message(FATAL_ERROR "BUILD_MALLOC can be one of: jemalloc, libc, tcmalloc or tcmalloc_minimal")
endif ()
endif ()
message(STATUS "Using ${MALLOC_LIB}")
# TLS support
if (BUILD_TLS)
valkey_parse_build_option(${BUILD_TLS} USE_TLS)
if (USE_TLS EQUAL 1)
# Only search for OpenSSL if needed
find_package(OpenSSL REQUIRED)
message(STATUS "OpenSSL include dir: ${OPENSSL_INCLUDE_DIR}")
message(STATUS "OpenSSL libraries: ${OPENSSL_LIBRARIES}")
include_directories(${OPENSSL_INCLUDE_DIR})
endif ()
if (USE_TLS EQUAL 1)
add_valkey_server_compiler_options("-DUSE_OPENSSL=1")
add_valkey_server_compiler_options("-DBUILD_TLS_MODULE=0")
else ()
# BUILD_TLS was not a plain boolean (e.g. "module"); unlike RDMA, TLS can not be built as a module here, so disable it
message(WARNING "BUILD_TLS can be one of: [ON | OFF | 1 | 0], but '${BUILD_TLS}' was provided")
message(STATUS "TLS support is disabled")
set(USE_TLS 0)
endif ()
else ()
# By default, TLS is disabled
message(STATUS "TLS is disabled")
set(USE_TLS 0)
endif ()
if (BUILD_RDMA)
set(BUILD_RDMA_MODULE 0)
# RDMA support (Linux only)
if (LINUX AND NOT APPLE)
valkey_parse_build_option(${BUILD_RDMA} USE_RDMA)
find_package(PkgConfig REQUIRED)
# Locate librdmacm & libibverbs, fail if we can't find them
valkey_pkg_config(librdmacm RDMACM_LIBS)
valkey_pkg_config(libibverbs IBVERBS_LIBS)
message(STATUS "${RDMACM_LIBS};${IBVERBS_LIBS}")
list(APPEND RDMA_LIBS "${RDMACM_LIBS};${IBVERBS_LIBS}")
if (USE_RDMA EQUAL 2) # Module
message(STATUS "Building RDMA as module")
add_valkey_server_compiler_options("-DUSE_RDMA=2")
set(BUILD_RDMA_MODULE 2)
elseif (USE_RDMA EQUAL 1) # Builtin
message(STATUS "Building RDMA as builtin")
add_valkey_server_compiler_options("-DUSE_RDMA=1")
add_valkey_server_compiler_options("-DBUILD_RDMA_MODULE=0")
list(APPEND SERVER_LIBS "${RDMA_LIBS}")
endif ()
else ()
message(WARNING "RDMA is only supported on Linux platforms")
endif ()
else ()
# By default, RDMA is disabled
message(STATUS "RDMA is disabled")
set(USE_RDMA 0)
endif ()
set(BUILDING_ARM64 0)
set(BUILDING_ARM32 0)
if ("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "arm64")
set(BUILDING_ARM64 1)
endif ()
if ("${CMAKE_SYSTEM_PROCESSOR}" STREQUAL "arm")
set(BUILDING_ARM32 1)
endif ()
message(STATUS "Building on ${CMAKE_HOST_SYSTEM_NAME}")
if (BUILDING_ARM64)
message(STATUS "Compiling valkey for ARM64")
add_valkey_server_linker_option("-funwind-tables")
endif ()
if (APPLE)
add_valkey_server_linker_option("-rdynamic")
add_valkey_server_linker_option("-ldl")
elseif (UNIX)
add_valkey_server_linker_option("-rdynamic")
add_valkey_server_linker_option("-pthread")
add_valkey_server_linker_option("-ldl")
add_valkey_server_linker_option("-lm")
endif ()
if (VALKEY_DEBUG_BUILD)
# Debug build, enable "-fno-omit-frame-pointer"
add_valkey_server_compiler_options("-fno-omit-frame-pointer")
endif ()
# Check for Atomic
check_include_files(stdatomic.h HAVE_C11_ATOMIC)
if (HAVE_C11_ATOMIC)
add_valkey_server_compiler_options("-std=gnu11")
else ()
add_valkey_server_compiler_options("-std=c99")
endif ()
# Sanitizer
if (BUILD_SANITIZER)
# Common CFLAGS
list(APPEND VALKEY_SANITIZER_CFLAGS "-fno-sanitize-recover=all")
list(APPEND VALKEY_SANITIZER_CFLAGS "-fno-omit-frame-pointer")
if ("${BUILD_SANITIZER}" STREQUAL "address")
list(APPEND VALKEY_SANITIZER_CFLAGS "-fsanitize=address")
list(APPEND VALKEY_SANITIZER_LDFLAGS "-fsanitize=address")
elseif ("${BUILD_SANITIZER}" STREQUAL "thread")
list(APPEND VALKEY_SANITIZER_CFLAGS "-fsanitize=thread")
list(APPEND VALKEY_SANITIZER_LDFLAGS "-fsanitize=thread")
elseif ("${BUILD_SANITIZER}" STREQUAL "undefined")
list(APPEND VALKEY_SANITIZER_CFLAGS "-fsanitize=undefined")
list(APPEND VALKEY_SANITIZER_LDFLAGS "-fsanitize=undefined")
else ()
message(FATAL_ERROR "Unknown sanitizer: ${BUILD_SANITIZER}")
endif ()
endif ()
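The block above only populates the two lists; the point where they are folded into the final flags is not shown in this diff. One plausible wiring, using the helper macros defined earlier (an assumption, not the file's confirmed mechanism):
list(JOIN VALKEY_SANITIZER_CFLAGS " " SANITIZER_CFLAGS_STR)
add_valkey_server_compiler_options("${SANITIZER_CFLAGS_STR}")
foreach (SAN_FLAG IN LISTS VALKEY_SANITIZER_LDFLAGS)
    add_valkey_server_linker_option("${SAN_FLAG}")
endforeach ()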
include_directories("${CMAKE_SOURCE_DIR}/deps/hiredis")
include_directories("${CMAKE_SOURCE_DIR}/deps/linenoise")
include_directories("${CMAKE_SOURCE_DIR}/deps/lua/src")
include_directories("${CMAKE_SOURCE_DIR}/deps/hdr_histogram")
include_directories("${CMAKE_SOURCE_DIR}/deps/fpconv")
add_subdirectory("${CMAKE_SOURCE_DIR}/deps")
# Update linker flags for the allocator
if (USE_JEMALLOC)
include_directories("${CMAKE_SOURCE_DIR}/deps/jemalloc/include")
endif ()
# Common compiler flags
add_valkey_server_compiler_options("-pedantic")
# ----------------------------------------------------
# Build options (allocator, tls, rdma et al) - end
# ----------------------------------------------------
# -------------------------------------------------
# Code Generation section
# -------------------------------------------------
find_program(PYTHON_EXE python3)
if (PYTHON_EXE)
# Python based code generation
message(STATUS "Found python3: ${PYTHON_EXE}")
# Rule for generating commands.def file from json files
message(STATUS "Adding target generate_commands_def")
file(GLOB COMMAND_FILES_JSON "${CMAKE_SOURCE_DIR}/src/commands/*.json")
add_custom_command(
OUTPUT ${CMAKE_BINARY_DIR}/commands_def_generated
DEPENDS ${COMMAND_FILES_JSON}
COMMAND ${PYTHON_EXE} ${CMAKE_SOURCE_DIR}/utils/generate-command-code.py
COMMAND touch ${CMAKE_BINARY_DIR}/commands_def_generated
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/src")
add_custom_target(generate_commands_def DEPENDS ${CMAKE_BINARY_DIR}/commands_def_generated)
# Rule for generating fmtargs.h
message(STATUS "Adding target generate_fmtargs_h")
add_custom_command(
OUTPUT ${CMAKE_BINARY_DIR}/fmtargs_generated
DEPENDS ${CMAKE_SOURCE_DIR}/utils/generate-fmtargs.py
COMMAND sed '/Everything/,$$d' fmtargs.h > fmtargs.h.tmp
COMMAND ${PYTHON_EXE} ${CMAKE_SOURCE_DIR}/utils/generate-fmtargs.py >> fmtargs.h.tmp
COMMAND mv fmtargs.h.tmp fmtargs.h
COMMAND touch ${CMAKE_BINARY_DIR}/fmtargs_generated
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/src")
add_custom_target(generate_fmtargs_h DEPENDS ${CMAKE_BINARY_DIR}/fmtargs_generated)
# Rule for generating test_files.h
message(STATUS "Adding target generate_test_files_h")
file(GLOB UNIT_TEST_SRCS "${CMAKE_SOURCE_DIR}/src/unit/*.c")
add_custom_command(
OUTPUT ${CMAKE_BINARY_DIR}/test_files_generated
DEPENDS "${UNIT_TEST_SRCS};${CMAKE_SOURCE_DIR}/utils/generate-unit-test-header.py"
COMMAND ${PYTHON_EXE} ${CMAKE_SOURCE_DIR}/utils/generate-unit-test-header.py
COMMAND touch ${CMAKE_BINARY_DIR}/test_files_generated
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/src")
add_custom_target(generate_test_files_h DEPENDS ${CMAKE_BINARY_DIR}/test_files_generated)
else ()
# Fake targets
add_custom_target(generate_commands_def)
add_custom_target(generate_fmtargs_h)
add_custom_target(generate_test_files_h)
endif ()
# Generate release.h file (always)
add_custom_target(
release_header
COMMAND sh -c '${CMAKE_SOURCE_DIR}/src/mkreleasehdr.sh'
WORKING_DIRECTORY "${CMAKE_SOURCE_DIR}/src")
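Downstream targets would then declare ordinary dependencies on these names; because the no-python branch defines them as empty targets, the same dependency edge is valid either way. A sketch, where valkey-server is an assumed consumer target, not one defined in this file:
# Hypothetical consumer target.
add_dependencies(valkey-server generate_commands_def generate_fmtargs_h generate_test_files_h release_header)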
# -------------------------------------------------
# Code Generation section - end
# -------------------------------------------------
# ----------------------------------------------------------
# All our source files are defined in SourceFiles.cmake file
# ----------------------------------------------------------
include(SourceFiles)
# Clear the below variables from the cache
unset(CMAKE_C_FLAGS CACHE)
unset(VALKEY_SERVER_LDFLAGS CACHE)
unset(VALKEY_SERVER_CFLAGS CACHE)
unset(PYTHON_EXE CACHE)
unset(HAVE_C11_ATOMIC CACHE)
unset(USE_TLS CACHE)
unset(USE_RDMA CACHE)
unset(BUILD_TLS CACHE)
unset(BUILD_RDMA CACHE)
unset(BUILD_MALLOC CACHE)
unset(USE_JEMALLOC CACHE)
unset(BUILD_TLS_MODULE CACHE)
unset(BUILD_TLS_BUILTIN CACHE)

19
codecov.yml Normal file
View File

@ -0,0 +1,19 @@
coverage:
  status:
    patch:
      default:
        informational: true
    project:
      default:
        informational: true
comment:
  require_changes: false
  require_head: false
  require_base: false
  layout: "condensed_header, diff, files"
  hide_project_coverage: false
  behavior: default
github_checks:
  annotations: false

28
deps/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,28 @@
if (USE_JEMALLOC)
add_subdirectory(jemalloc)
endif ()
add_subdirectory(lua)
# Set hiredis options. We need to override the defaults set via OPTION(..); we do this by setting them in the CACHE
set(BUILD_SHARED_LIBS
OFF
CACHE BOOL "Build shared libraries")
set(DISABLE_TESTS
ON
CACHE BOOL "If tests should be compiled or not")
if (USE_TLS) # Module or no module
message(STATUS "Building hiredis_ssl")
set(ENABLE_SSL
ON
CACHE BOOL "Build hiredis_ssl for TLS support")
endif ()
add_subdirectory(hiredis)
add_subdirectory(linenoise)
add_subdirectory(fpconv)
add_subdirectory(hdr_histogram)
# Clear the cached variables we passed to hiredis
unset(BUILD_SHARED_LIBS CACHE)
unset(DISABLE_TESTS CACHE)
unset(ENABLE_SSL CACHE)
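The set-in-CACHE then unset dance works because option() keeps an existing cache entry instead of overwriting it, so the subproject sees the pinned value. A minimal generic sketch of the same pattern, with illustrative names:
set(SUBPROJECT_FEATURE OFF CACHE BOOL "" FORCE) # pin before the subproject's option() runs
add_subdirectory(some_subproject) # the subproject sees the pinned value
unset(SUBPROJECT_FEATURE CACHE) # remove the entry so later configure runs start clean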

11
deps/Makefile vendored
View File

@ -1,4 +1,4 @@
# Redis dependency Makefile
# Dependency Makefile
uname_S:= $(shell sh -c 'uname -s 2>/dev/null || echo not')
@ -42,6 +42,7 @@ distclean:
-(cd jemalloc && [ -f Makefile ] && $(MAKE) distclean) > /dev/null || true
-(cd hdr_histogram && $(MAKE) clean) > /dev/null || true
-(cd fpconv && $(MAKE) clean) > /dev/null || true
-(cd fast_float_c_interface && $(MAKE) clean) > /dev/null || true
-(rm -f .make-*)
.PHONY: distclean
@ -79,7 +80,7 @@ ifeq ($(uname_S),SunOS)
LUA_CFLAGS= -D__C99FEATURES__=1
endif
LUA_CFLAGS+= -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DREDIS_STATIC='' -DLUA_USE_MKSTEMP $(CFLAGS)
LUA_CFLAGS+= -Wall -DLUA_ANSI -DENABLE_CJSON_GLOBAL -DLUA_USE_MKSTEMP $(CFLAGS)
LUA_LDFLAGS+= $(LDFLAGS)
ifeq ($(LUA_DEBUG),yes)
LUA_CFLAGS+= -O0 -g -DLUA_USE_APICHECK
@ -116,3 +117,9 @@ jemalloc: .make-prerequisites
cd jemalloc && $(MAKE) lib/libjemalloc.a
.PHONY: jemalloc
fast_float_c_interface: .make-prerequisites
@printf '%b %b\n' $(MAKECOLOR)MAKE$(ENDCOLOR) $(BINCOLOR)$@$(ENDCOLOR)
cd fast_float_c_interface && $(MAKE)
.PHONY: fast_float_c_interface

40
deps/README.md vendored
View File

@ -1,11 +1,12 @@
This directory contains all Redis dependencies, except for the libc that
This directory contains all Valkey dependencies, except for the libc that
should be provided by the operating system.
* **Jemalloc** is our memory allocator, used as a replacement for libc malloc on Linux by default. It has good performance and excellent fragmentation behavior. This component is upgraded from time to time.
* **hiredis** is the official C client library for Redis. It is used by redis-cli, redis-benchmark and Redis Sentinel. It is part of the Redis official ecosystem but is developed externally from the Redis repository, so we just upgrade it as needed.
* **linenoise** is a readline replacement. It is developed by the same authors of Redis but is managed as a separated project and updated as needed.
* **linenoise** is a readline replacement. It is developed by the same authors of Valkey but is managed as a separated project and updated as needed.
* **lua** is Lua 5.1 with minor changes for security and additional libraries.
* **hdr_histogram** Used for per-command latency tracking histograms.
* **fast_float** is a replacement for strtod to convert strings to floats efficiently.
How to upgrade the above dependencies
===
@ -13,10 +14,10 @@ How to upgrade the above dependencies
Jemalloc
---
Jemalloc is modified with changes that allow us to implement the Redis
active defragmentation logic. However this feature of Redis is not mandatory
and Redis is able to understand if the Jemalloc version it is compiled
against supports such Redis-specific modifications. So in theory, if you
Jemalloc is modified with changes that allow us to implement the Valkey
active defragmentation logic. However this feature of Valkey is not mandatory
and Valkey is able to understand if the Jemalloc version it is compiled
against supports such Valkey-specific modifications. So in theory, if you
are not interested in the active defragmentation, you can replace Jemalloc
just following these steps:
@ -28,7 +29,7 @@ just following these steps:
Jemalloc configuration script is broken and will not work nested in another
git repository.
However note that we change Jemalloc settings via the `configure` script of Jemalloc using the `--with-lg-quantum` option, setting it to the value of 3 instead of 4. This provides us with more size classes that better suit the Redis data structures, in order to gain memory efficiency.
However note that we change Jemalloc settings via the `configure` script of Jemalloc using the `--with-lg-quantum` option, setting it to the value of 3 instead of 4. This provides us with more size classes that better suit the Valkey data structures, in order to gain memory efficiency.
If you want to upgrade Jemalloc while also providing support for
active defragmentation, in addition to the above steps you need to perform
@ -38,7 +39,7 @@ the following additional steps:
to add `#define JEMALLOC_FRAG_HINT`.
6. Implement the function `je_get_defrag_hint()` inside `src/jemalloc.c`. You
can see how it is implemented in the current Jemalloc source tree shipped
with Redis, and rewrite it according to the new Jemalloc internals, if they
with Valkey, and rewrite it according to the new Jemalloc internals, if they
changed, otherwise you could just copy the old implementation if you are
upgrading just to a similar version of Jemalloc.
@ -61,7 +62,7 @@ cd deps/jemalloc
Hiredis
---
Hiredis is used by Sentinel, `redis-cli` and `redis-benchmark`. Like Redis, uses the SDS string library, but not necessarily the same version. In order to avoid conflicts, this version has all SDS identifiers prefixed by `hi`.
Hiredis is used by Sentinel, `valkey-cli` and `valkey-benchmark`. Like Valkey, it uses the SDS string library, but not necessarily the same version. In order to avoid conflicts, this version has all SDS identifiers prefixed by `hi`.
1. `git subtree pull --prefix deps/hiredis https://github.com/redis/hiredis.git <version-tag> --squash`<br>
This should hopefully merge the local changes into the new version.
@ -71,7 +72,7 @@ Linenoise
---
Linenoise is rarely upgraded as needed. The upgrade process is trivial since
Redis uses a non modified version of linenoise, so to upgrade just do the
Valkey uses an unmodified version of linenoise, so to upgrade just do the
following:
1. Remove the linenoise directory.
@ -81,11 +82,11 @@ Lua
---
We use Lua 5.1 and no upgrade is planned currently, since we don't want to break
Lua scripts for new Lua features: in the context of Redis Lua scripts the
Lua scripts for new Lua features: in the context of Valkey Lua scripts the
capabilities of 5.1 are usually more than enough, the release is rock solid,
and we definitely don't want to break old scripts.
So upgrading of Lua is up to the Redis project maintainers and should be a
So upgrading of Lua is up to the Valkey project maintainers and should be a
manual procedure performed by taking a diff between the different versions.
Currently we have at least the following differences between official Lua 5.1
@ -94,6 +95,7 @@ and our version:
1. Makefile is modified to allow a different compiler than GCC.
2. We have the implementation source code, and directly link to the following external libraries: `lua_cjson.o`, `lua_struct.o`, `lua_cmsgpack.o` and `lua_bit.o`.
3. There is a security fix in `ldo.c`, line 498: The check for `LUA_SIGNATURE[0]` is removed in order to avoid direct bytecode execution.
4. In `lstring.c`, the luaS_newlstr function's hash calculation has been upgraded from a simple hash function to MurmurHash3, implemented within the same file, to enhance performance, particularly for operations involving large strings.
Hdr_Histogram
---
@ -104,3 +106,17 @@ We use a customized version based on master branch commit e4448cf6d1cd08fff51981
2. Copy updated files from newer version onto files in /hdr_histogram.
3. Apply the changes from 1 above to the updated files.
fast_float
---
The fast_float library provides fast header-only implementations for the C++ from_chars functions for `float` and `double` types as well as integer types. These functions convert ASCII strings representing decimal values (e.g., `1.3e10`) into binary types. The functions are much faster than comparable number-parsing functions from existing C++ standard libraries.
Specifically, `fast_float` provides the following function to parse floating-point numbers with a C++17-like syntax (the library itself only requires C++11):
template <typename T, typename UC = char, typename = FASTFLOAT_ENABLE_IF(is_supported_float_type<T>())>
from_chars_result_t<UC> from_chars(UC const *first, UC const *last, T &value, chars_format fmt = chars_format::general);
To upgrade the library,
1. Check out https://github.com/fastfloat/fast_float/tree/main
2. cd fast_float
3. Invoke "python3 ./script/amalgamate.py --output fast_float.h"
4. Copy fast_float.h file to "deps/fast_float/".

3912
deps/fast_float/fast_float.h vendored Normal file

File diff suppressed because it is too large

37
deps/fast_float_c_interface/Makefile vendored Normal file
View File

@ -0,0 +1,37 @@
CCCOLOR:="\033[34m"
SRCCOLOR:="\033[33m"
ENDCOLOR:="\033[0m"
CXX?=c++
# we need = instead of := so that $@ in QUIET_CXX gets evaluated in the rule and is assigned the appropriate value.
TEMP:=$(CXX)
QUIET_CXX=@printf ' %b %b\n' $(CCCOLOR)C++$(ENDCOLOR) $(SRCCOLOR)$@$(ENDCOLOR) 1>&2;
CXX=$(QUIET_CXX)$(TEMP)
WARN=-Wall -W -Wno-missing-field-initializers
STD=-pedantic -std=c++11
OPT?=-O3
CLANG := $(findstring clang,$(shell sh -c '$(CC) --version | head -1'))
ifeq ($(OPT),-O3)
ifeq (clang,$(CLANG))
OPT+=-flto
else
OPT+=-flto=auto -ffat-lto-objects
endif
endif
# 1) Today src/Makefile passes the -m32 flag, via CFLAGS, for an explicit 32-bit build on a 64-bit machine. For a 32-bit
# build on a 32-bit machine and a 64-bit build on a 64-bit machine, CFLAGS are empty. No other flags are set that can
# conflict with C++, therefore let's use CFLAGS without changes for now.
# 2) FASTFLOAT_ALLOWS_LEADING_PLUS allows +inf to be parsed as inf, instead of error.
CXXFLAGS=$(STD) $(OPT) $(WARN) -static -fPIC -fno-exceptions $(CFLAGS) -D FASTFLOAT_ALLOWS_LEADING_PLUS
.PHONY: all clean
all: fast_float_strtod.o
clean:
rm -f *.o || true;

View File

@ -0,0 +1,24 @@
/*
* Copyright Valkey Contributors.
* All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*/
#include "../fast_float/fast_float.h"
#include <cerrno>
#include <cstring> /* for strlen() below */
extern "C"
{
double fast_float_strtod(const char *str, const char** endptr)
{
double temp = 0;
auto answer = fast_float::from_chars(str, str + strlen(str), temp);
if (answer.ec != std::errc()) {
errno = (answer.ec == std::errc::result_out_of_range) ? ERANGE : EINVAL;
}
if (endptr) {
*endptr = answer.ptr;
}
return temp;
}
}

4
deps/fpconv/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,4 @@
project(fpconv)
set(SRCS "${CMAKE_CURRENT_LIST_DIR}/fpconv_dtoa.c" "${CMAKE_CURRENT_LIST_DIR}/fpconv_dtoa.h")
add_library(fpconv STATIC ${SRCS})

View File

@ -6,7 +6,7 @@
* [1] https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
* ----------------------------------------------------------------------------
*
* Copyright (c) 2021, Redis Labs
* Copyright (c) 2021, Redis Ltd.
* Copyright (c) 2013-2019, night-shift <as.smljk at gmail dot com>
* Copyright (c) 2009, Florian Loitsch < florian.loitsch at inria dot fr >
* All rights reserved.

7
deps/hdr_histogram/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,7 @@
project(hdr_histogram)
set(SRCS "${CMAKE_CURRENT_LIST_DIR}/hdr_histogram.c" "${CMAKE_CURRENT_LIST_DIR}/hdr_histogram.h"
"${CMAKE_CURRENT_LIST_DIR}/hdr_atomic.h" "${CMAKE_CURRENT_LIST_DIR}/hdr_redis_malloc.h")
add_library(hdr_histogram STATIC ${SRCS})
target_compile_definitions(hdr_histogram PRIVATE HDR_MALLOC_INCLUDE=\"hdr_redis_malloc.h\")

View File

@ -1,13 +1,13 @@
#ifndef HDR_MALLOC_H__
#define HDR_MALLOC_H__
void *zmalloc(size_t size);
void *valkey_malloc(size_t size);
void *zcalloc_num(size_t num, size_t size);
void *zrealloc(void *ptr, size_t size);
void zfree(void *ptr);
void *valkey_realloc(void *ptr, size_t size);
void valkey_free(void *ptr);
#define hdr_malloc zmalloc
#define hdr_malloc valkey_malloc
#define hdr_calloc zcalloc_num
#define hdr_realloc zrealloc
#define hdr_free zfree
#define hdr_realloc valkey_realloc
#define hdr_free valkey_free
#endif

View File

@ -112,7 +112,7 @@ jobs:
run: $GITHUB_WORKSPACE/test.sh
freebsd:
runs-on: macos-12
runs-on: macos-13
name: FreeBSD
steps:
- uses: actions/checkout@v3

View File

@ -1,4 +1,4 @@
Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
Copyright (c) 2009-2011, Redis Ltd.
Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
All rights reserved.

View File

@ -1,5 +1,5 @@
# Hiredis Makefile
# Copyright (C) 2010-2011 Salvatore Sanfilippo <antirez at gmail dot com>
# Copyright (C) 2010-2011 Redis Ltd.
# Copyright (C) 2010-2011 Pieter Noordhuis <pcnoordhuis at gmail dot com>
# This file is released under the BSD license, see the COPYING file

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.

2
deps/hiredis/dict.c vendored
View File

@ -5,7 +5,7 @@
* tables of power of two in size are used, collisions are handled by
* chaining. See the source code for more information... :)
*
* Copyright (c) 2006-2010, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2006-2010, Redis Ltd.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

2
deps/hiredis/dict.h vendored
View File

@ -5,7 +5,7 @@
* tables of power of two in size are used, collisions are handled by
* chaining. See the source code for more information... :)
*
* Copyright (c) 2006-2010, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2006-2010, Redis Ltd.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2020, Redis Ltd.
* Copyright (c) 2020, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2020, Matt Stancliff <matt at genges dot com>,
* Jan-Erik Rediger <janerik at fnordig dot com>

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2014, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,
* Jan-Erik Rediger <janerik at fnordig dot com>

View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2014, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,
* Jan-Erik Rediger <janerik at fnordig dot com>

View File

@ -1,6 +1,6 @@
/*
* Copyright (c) 2019, Redis Labs
* Copyright (c) 2019, Redis Ltd.
*
* All rights reserved.
*

2
deps/hiredis/net.c vendored
View File

@ -1,6 +1,6 @@
/* Extracted from anet.c to work properly with Hiredis error reporting.
*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2014, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,
* Jan-Erik Rediger <janerik at fnordig dot com>

2
deps/hiredis/net.h vendored
View File

@ -1,6 +1,6 @@
/* Extracted from anet.c to work properly with Hiredis error reporting.
*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2014, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2015, Matt Stancliff <matt at genges dot com>,
* Jan-Erik Rediger <janerik at fnordig dot com>

2
deps/hiredis/read.c vendored
View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.

2
deps/hiredis/read.h vendored
View File

@ -1,5 +1,5 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.

3
deps/hiredis/sds.c vendored
View File

@ -1,8 +1,7 @@
/* SDSLib 2.0 -- A C dynamic strings library
*
* Copyright (c) 2006-2015, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2006-2015, Redis Ltd.
* Copyright (c) 2015, Oran Agra
* Copyright (c) 2015, Redis Labs, Inc
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

5
deps/hiredis/sds.h vendored
View File

@ -1,8 +1,7 @@
/* SDSLib 2.0 -- A C dynamic strings library
*
* Copyright (c) 2006-2015, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2006-2015, Redis Ltd.
* Copyright (c) 2015, Oran Agra
* Copyright (c) 2015, Redis Labs, Inc
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@ -273,7 +272,7 @@ void *hi_sds_malloc(size_t size);
void *hi_sds_realloc(void *ptr, size_t size);
void hi_sds_free(void *ptr);
#ifdef REDIS_TEST
#ifdef SERVER_TEST
int hi_sdsTest(int argc, char *argv[]);
#endif

View File

@ -1,8 +1,7 @@
/* SDSLib 2.0 -- A C dynamic strings library
*
* Copyright (c) 2006-2015, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2006-2015, Redis Ltd.
* Copyright (c) 2015, Oran Agra
* Copyright (c) 2015, Redis Labs, Inc
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without

4
deps/hiredis/ssl.c vendored
View File

@ -1,7 +1,7 @@
/*
* Copyright (c) 2009-2011, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2009-2011, Redis Ltd.
* Copyright (c) 2010-2011, Pieter Noordhuis <pcnoordhuis at gmail dot com>
* Copyright (c) 2019, Redis Labs
* Copyright (c) 2019, Redis Ltd.
*
* All rights reserved.
*

View File

@ -23,8 +23,8 @@ SOCK_FILE=${tmpdir}/hiredis-test-redis.sock
if [ "$TEST_SSL" = "1" ]; then
SSL_CA_CERT=${tmpdir}/ca.crt
SSL_CA_KEY=${tmpdir}/ca.key
SSL_CERT=${tmpdir}/redis.crt
SSL_KEY=${tmpdir}/redis.key
SSL_CERT=${tmpdir}/valkey.crt
SSL_KEY=${tmpdir}/valkey.key
openssl genrsa -out ${tmpdir}/ca.key 4096
openssl req \

32
deps/jemalloc/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,32 @@
project(jemalloc)
# Build jemalloc using configure && make install
set(JEMALLOC_INSTALL_DIR ${CMAKE_BINARY_DIR}/jemalloc-build)
set(JEMALLOC_SRC_DIR ${CMAKE_CURRENT_LIST_DIR})
if (NOT EXISTS ${JEMALLOC_INSTALL_DIR}/lib/libjemalloc.a)
message(STATUS "Building jemalloc (custom build)")
message(STATUS "JEMALLOC_SRC_DIR = ${JEMALLOC_SRC_DIR}")
message(STATUS "JEMALLOC_INSTALL_DIR = ${JEMALLOC_INSTALL_DIR}")
execute_process(
COMMAND sh -c "${JEMALLOC_SRC_DIR}/configure --disable-cxx \
--with-version=5.3.0-0-g0 --with-lg-quantum=3 --disable-cache-oblivious --with-jemalloc-prefix=je_ \
--enable-static --disable-shared --prefix=${JEMALLOC_INSTALL_DIR}"
WORKING_DIRECTORY ${JEMALLOC_SRC_DIR} RESULTS_VARIABLE CONFIGURE_RESULT)
if (NOT ${CONFIGURE_RESULT} EQUAL 0)
message(FATAL_ERROR "Jemalloc configure failed")
endif ()
execute_process(COMMAND make -j${VALKEY_PROCESSOR_COUNT} lib/libjemalloc.a install
WORKING_DIRECTORY "${JEMALLOC_SRC_DIR}" RESULTS_VARIABLE MAKE_RESULT)
if (NOT ${MAKE_RESULT} EQUAL 0)
message(FATAL_ERROR "Jemalloc build failed")
endif ()
endif ()
# Import the compiled library as a CMake target
add_library(jemalloc STATIC IMPORTED GLOBAL)
set_target_properties(jemalloc PROPERTIES IMPORTED_LOCATION "${JEMALLOC_INSTALL_DIR}/lib/libjemalloc.a"
INTERFACE_INCLUDE_DIRECTORIES "${JEMALLOC_INSTALL_DIR}/include")
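Once imported, the prebuilt archive behaves like any other CMake target, which is consistent with the target_link_libraries(${target} jemalloc) calls earlier in this diff. For example, with a hypothetical consumer:
# my_tool is an illustrative target name.
target_link_libraries(my_tool jemalloc)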

View File

@ -337,55 +337,4 @@ imalloc_fastpath(size_t size, void *(fallback_alloc)(size_t)) {
return fallback_alloc(size);
}
JEMALLOC_ALWAYS_INLINE int
iget_defrag_hint(tsdn_t *tsdn, void* ptr) {
int defrag = 0;
emap_alloc_ctx_t alloc_ctx;
emap_alloc_ctx_lookup(tsdn, &arena_emap_global, ptr, &alloc_ctx);
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
edata_t *slab = emap_edata_lookup(tsdn, &arena_emap_global, ptr);
arena_t *arena = arena_get_from_edata(slab);
szind_t binind = edata_szind_get(slab);
unsigned binshard = edata_binshard_get(slab);
bin_t *bin = arena_get_bin(arena, binind, binshard);
malloc_mutex_lock(tsdn, &bin->lock);
arena_dalloc_bin_locked_info_t info;
arena_dalloc_bin_locked_begin(&info, binind);
/* Don't bother moving allocations from the slab currently used for new allocations */
if (slab != bin->slabcur) {
int free_in_slab = edata_nfree_get(slab);
if (free_in_slab) {
const bin_info_t *bin_info = &bin_infos[binind];
/* Find number of non-full slabs and the number of regs in them */
unsigned long curslabs = 0;
size_t curregs = 0;
/* Run on all bin shards (usually just one) */
for (uint32_t i=0; i< bin_info->n_shards; i++) {
bin_t *bb = arena_get_bin(arena, binind, i);
curslabs += bb->stats.nonfull_slabs;
/* Deduct the regs in full slabs (they're not part of the game) */
unsigned long full_slabs = bb->stats.curslabs - bb->stats.nonfull_slabs;
curregs += bb->stats.curregs - full_slabs * bin_info->nregs;
if (bb->slabcur) {
/* Remove slabcur from the overall utilization (not a candidate to nove from) */
curregs -= bin_info->nregs - edata_nfree_get(bb->slabcur);
curslabs -= 1;
}
}
/* Compare the utilization ratio of the slab in question to the total average
* among non-full slabs. To avoid precision loss in division, we do that by
* extrapolating the usage of the slab as if all slabs have the same usage.
* If this slab is less used than the average, we'll prefer to move the data
* to hopefully more used ones. To avoid stagnation when all slabs have the same
* utilization, we give additional 12.5% weight to the decision to defrag. */
defrag = (bin_info->nregs - free_in_slab) * curslabs <= curregs + curregs / 8;
}
}
arena_dalloc_bin_locked_finish(tsdn, arena, bin, &info);
malloc_mutex_unlock(tsdn, &bin->lock);
}
return defrag;
}
#endif /* JEMALLOC_INTERNAL_INLINES_C_H */

View File

@ -147,7 +147,3 @@
#else
# define JEMALLOC_SYS_NOTHROW JEMALLOC_NOTHROW
#endif
/* This version of Jemalloc, modified for Redis, has the je_get_defrag_hint()
* function. */
#define JEMALLOC_FRAG_HINT

View File

@ -4474,12 +4474,3 @@ jemalloc_postfork_child(void) {
}
/******************************************************************************/
/* Helps the application decide if a pointer is worth re-allocating in order to reduce fragmentation.
* returns 1 if the allocation should be moved, and 0 if the allocation be kept.
* If the application decides to re-allocate it should use MALLOCX_TCACHE_NONE when doing so. */
JEMALLOC_EXPORT int JEMALLOC_NOTHROW
get_defrag_hint(void* ptr) {
assert(ptr != NULL);
return iget_defrag_hint(TSDN_NULL, ptr);
}

4
deps/linenoise/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,4 @@
project(linenoise)
set(SRCS "${CMAKE_CURRENT_LIST_DIR}/linenoise.c" "${CMAKE_CURRENT_LIST_DIR}/linenoise.h")
add_library(linenoise STATIC ${SRCS})

View File

@ -17,7 +17,7 @@ Line editing with some support for history is a really important feature for com
So what usually happens is either:
* Large programs with configure scripts disabling line editing if readline is not present in the system, or not supporting it at all since readline is GPL licensed and libedit (the BSD clone) is not as known and available as readline is (Real world example of this problem: Tclsh).
* Smaller programs not using a configure script not supporting line editing at all (A problem we had with Redis-cli for instance).
* Smaller programs not using a configure script not supporting line editing at all (A problem we had with Valkey-cli for instance).
The result is a pollution of binaries without line editing support.
@ -108,7 +108,7 @@ to search and re-edit already inserted lines of text.
The followings are the history API calls:
int linenoiseHistoryAdd(const char *line);
int linenoiseHistoryAdd(const char *line, int is_sensitive);
int linenoiseHistorySetMaxLen(int len);
int linenoiseHistorySave(const char *filename);
int linenoiseHistoryLoad(const char *filename);

View File

@ -10,7 +10,7 @@
*
* ------------------------------------------------------------------------
*
* Copyright (c) 2010-2016, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2010-2016, Redis Ltd.
* Copyright (c) 2010-2013, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.
@ -134,6 +134,8 @@ static int atexit_registered = 0; /* Register atexit just 1 time. */
static int history_max_len = LINENOISE_DEFAULT_HISTORY_MAX_LEN;
static int history_len = 0;
static char **history = NULL;
static int *history_sensitive = NULL; /* An array records whether each line in
* history is sensitive. */
/* The linenoiseState structure represents the state during line editing.
* We pass this state to functions implementing specific editing
@ -177,7 +179,7 @@ enum KEY_ACTION{
};
static void linenoiseAtExit(void);
int linenoiseHistoryAdd(const char *line);
int linenoiseHistoryAdd(const char *line, int is_sensitive);
static void refreshLine(struct linenoiseState *l);
/* Debugging macro. */
@ -231,7 +233,7 @@ static int isUnsupportedTerm(void) {
return 0;
}
/* Raw mode: 1960 magic shit. */
/* Raw mode: 1960's magic. */
static int enableRawMode(int fd) {
struct termios raw;
@ -818,7 +820,7 @@ static int linenoiseEdit(int stdin_fd, int stdout_fd, char *buf, size_t buflen,
/* The latest history entry is always our current buffer, that
* initially is just an empty string. */
linenoiseHistoryAdd("");
linenoiseHistoryAdd("", 0);
if (write(l.ofd,prompt,l.plen) == -1) return -1;
while(1) {
@ -1112,6 +1114,7 @@ static void freeHistory(void) {
for (j = 0; j < history_len; j++)
free(history[j]);
free(history);
free(history_sensitive);
}
}
@ -1128,7 +1131,7 @@ static void linenoiseAtExit(void) {
* histories, but will work well for a few hundred of entries.
*
* Using a circular buffer is smarter, but a bit more complex to handle. */
int linenoiseHistoryAdd(const char *line) {
int linenoiseHistoryAdd(const char *line, int is_sensitive) {
char *linecopy;
if (history_max_len == 0) return 0;
@ -1137,7 +1140,14 @@ int linenoiseHistoryAdd(const char *line) {
if (history == NULL) {
history = malloc(sizeof(char*)*history_max_len);
if (history == NULL) return 0;
history_sensitive = malloc(sizeof(int)*history_max_len);
if (history_sensitive == NULL) {
free(history);
history = NULL;
return 0;
}
memset(history,0,(sizeof(char*)*history_max_len));
memset(history_sensitive,0,(sizeof(int)*history_max_len));
}
/* Don't add duplicated lines. */
@ -1150,9 +1160,11 @@ int linenoiseHistoryAdd(const char *line) {
if (history_len == history_max_len) {
free(history[0]);
memmove(history,history+1,sizeof(char*)*(history_max_len-1));
memmove(history_sensitive,history_sensitive+1,sizeof(int)*(history_max_len-1));
history_len--;
}
history[history_len] = linecopy;
history_sensitive[history_len] = is_sensitive;
history_len++;
return 1;
}
@ -1163,6 +1175,7 @@ int linenoiseHistoryAdd(const char *line) {
* than the amount of items already inside the history. */
int linenoiseHistorySetMaxLen(int len) {
char **new;
int *new_sensitive;
if (len < 1) return 0;
if (history) {
@ -1170,6 +1183,11 @@ int linenoiseHistorySetMaxLen(int len) {
new = malloc(sizeof(char*)*len);
if (new == NULL) return 0;
new_sensitive = malloc(sizeof(int)*len);
if (new_sensitive == NULL) {
free(new);
return 0;
}
/* If we can't copy everything, free the elements we'll not use. */
if (len < tocopy) {
@ -1179,9 +1197,13 @@ int linenoiseHistorySetMaxLen(int len) {
tocopy = len;
}
memset(new,0,sizeof(char*)*len);
memset(new_sensitive,0,sizeof(int)*len);
memcpy(new,history+(history_len-tocopy), sizeof(char*)*tocopy);
memcpy(new_sensitive,history_sensitive+(history_len-tocopy), sizeof(int)*tocopy);
free(history);
free(history_sensitive);
history = new;
history_sensitive = new_sensitive;
}
history_max_len = len;
if (history_len > history_max_len)
@ -1201,7 +1223,7 @@ int linenoiseHistorySave(const char *filename) {
if (fp == NULL) return -1;
fchmod(fileno(fp),S_IRUSR|S_IWUSR);
for (j = 0; j < history_len; j++)
fprintf(fp,"%s\n",history[j]);
if (!history_sensitive[j]) fprintf(fp,"%s\n",history[j]);
fclose(fp);
return 0;
}
@ -1223,7 +1245,7 @@ int linenoiseHistoryLoad(const char *filename) {
p = strchr(buf,'\r');
if (!p) p = strchr(buf,'\n');
if (p) *p = '\0';
linenoiseHistoryAdd(buf);
linenoiseHistoryAdd(buf, 0);
}
fclose(fp);
return 0;

View File

@ -7,7 +7,7 @@
*
* ------------------------------------------------------------------------
*
* Copyright (c) 2010-2014, Salvatore Sanfilippo <antirez at gmail dot com>
* Copyright (c) 2010-2014, Redis Ltd.
* Copyright (c) 2010-2013, Pieter Noordhuis <pcnoordhuis at gmail dot com>
*
* All rights reserved.
@ -58,7 +58,7 @@ void linenoiseAddCompletion(linenoiseCompletions *, const char *);
char *linenoise(const char *prompt);
void linenoiseFree(void *ptr);
int linenoiseHistoryAdd(const char *line);
int linenoiseHistoryAdd(const char *line, int is_sensitive);
int linenoiseHistorySetMaxLen(int len);
int linenoiseHistorySave(const char *filename);
int linenoiseHistoryLoad(const char *filename);

53
deps/lua/CMakeLists.txt vendored Normal file
View File

@ -0,0 +1,53 @@
project(lualib)
include(CheckFunctionExists)
set(LUA_SRC_DIR "${CMAKE_CURRENT_LIST_DIR}/src")
set(LUA_SRCS
${LUA_SRC_DIR}/fpconv.c
${LUA_SRC_DIR}/lbaselib.c
${LUA_SRC_DIR}/lmathlib.c
${LUA_SRC_DIR}/lstring.c
${LUA_SRC_DIR}/lparser.c
${LUA_SRC_DIR}/ldo.c
${LUA_SRC_DIR}/lzio.c
${LUA_SRC_DIR}/lmem.c
${LUA_SRC_DIR}/strbuf.c
${LUA_SRC_DIR}/lstrlib.c
${LUA_SRC_DIR}/lundump.c
${LUA_SRC_DIR}/lua_cmsgpack.c
${LUA_SRC_DIR}/loslib.c
${LUA_SRC_DIR}/lua_struct.c
${LUA_SRC_DIR}/ldebug.c
${LUA_SRC_DIR}/lobject.c
${LUA_SRC_DIR}/ldump.c
${LUA_SRC_DIR}/lua_cjson.c
${LUA_SRC_DIR}/ldblib.c
${LUA_SRC_DIR}/ltm.c
${LUA_SRC_DIR}/ltable.c
${LUA_SRC_DIR}/lstate.c
${LUA_SRC_DIR}/lua_bit.c
${LUA_SRC_DIR}/lua.c
${LUA_SRC_DIR}/loadlib.c
${LUA_SRC_DIR}/lcode.c
${LUA_SRC_DIR}/lapi.c
${LUA_SRC_DIR}/lgc.c
${LUA_SRC_DIR}/lvm.c
${LUA_SRC_DIR}/lfunc.c
${LUA_SRC_DIR}/lauxlib.c
${LUA_SRC_DIR}/ltablib.c
${LUA_SRC_DIR}/linit.c
${LUA_SRC_DIR}/lopcodes.c
${LUA_SRC_DIR}/llex.c
${LUA_SRC_DIR}/liolib.c)
add_library(lualib STATIC "${LUA_SRCS}")
target_include_directories(lualib PUBLIC "${LUA_SRC_DIR}")
target_compile_definitions(lualib PRIVATE ENABLE_CJSON_GLOBAL)
# Use mkstemp if available
check_function_exists(mkstemp HAVE_MKSTEMP)
if (HAVE_MKSTEMP)
target_compile_definitions(lualib PRIVATE LUA_USE_MKSTEMP)
endif ()
unset(HAVE_MKSTEMP CACHE)

View File

@ -234,10 +234,17 @@ static const luaL_Reg syslib[] = {
/* }====================================================== */
#define UNUSED(V) ((void) V)
/* Only a subset is loaded currently, for sandboxing concerns. */
static const luaL_Reg sandbox_syslib[] = {
{"clock", os_clock},
{NULL, NULL}
};
LUALIB_API int luaopen_os (lua_State *L) {
luaL_register(L, LUA_OSLIBNAME, syslib);
UNUSED(syslib);
luaL_register(L, LUA_OSLIBNAME, sandbox_syslib);
return 1;
}

View File

@ -6,6 +6,7 @@
#include <string.h>
#include <stdint.h>
#define lstring_c
#define LUA_CORE
@ -71,14 +72,55 @@ static TString *newlstr (lua_State *L, const char *str, size_t l,
return ts;
}
uint32_t murmur32(const uint8_t* key, size_t len, uint32_t seed) {
static const uint32_t c1 = 0xcc9e2d51;
static const uint32_t c2 = 0x1b873593;
static const uint32_t r1 = 15;
static const uint32_t r2 = 13;
static const uint32_t m = 5;
static const uint32_t n = 0xe6546b64;
uint32_t hash = seed;
const int nblocks = len / 4;
const uint32_t* blocks = (const uint32_t*) key;
for (int i = 0; i < nblocks; i++) {
uint32_t k = blocks[i];
k *= c1;
k = (k << r1) | (k >> (32 - r1));
k *= c2;
hash ^= k;
hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
}
const uint8_t* tail = (const uint8_t*) (key + nblocks * 4);
uint32_t k1 = 0;
switch (len & 3) {
case 3:
k1 ^= tail[2] << 16;
/* fall through */
case 2:
k1 ^= tail[1] << 8;
/* fall through */
case 1:
k1 ^= tail[0];
k1 *= c1;
k1 = (k1 << r1) | (k1 >> (32 - r1));
k1 *= c2;
hash ^= k1;
}
hash ^= len;
hash ^= (hash >> 16);
hash *= 0x85ebca6b;
hash ^= (hash >> 13);
hash *= 0xc2b2ae35;
hash ^= (hash >> 16);
return hash;
}
TString *luaS_newlstr (lua_State *L, const char *str, size_t l) {
GCObject *o;
unsigned int h = cast(unsigned int, l); /* seed */
size_t step = 1;
size_t l1;
for (l1=l; l1>=step; l1-=step) /* compute hash */
h = h ^ ((h<<5)+(h>>2)+cast(unsigned char, str[l1-1]));
unsigned int h = murmur32((uint8_t *)str, l, (uint32_t)l);
for (o = G(L)->strt.hash[lmod(h, G(L)->strt.size)];
o != NULL;
o = o->gch.next) {

View File

@ -132,6 +132,7 @@ static int bit_tohex(lua_State *L)
const char *hexdigits = "0123456789abcdef";
char buf[8];
int i;
if (n == INT32_MIN) n = INT32_MIN+1;
if (n < 0) { n = -n; hexdigits = "0123456789ABCDEF"; }
if (n > 8) n = 8;
for (i = (int)n; --i >= 0; ) { buf[i] = hexdigits[b & 15]; b >>= 4; }

View File

@ -10,7 +10,7 @@
#define LUACMSGPACK_NAME "cmsgpack"
#define LUACMSGPACK_SAFE_NAME "cmsgpack_safe"
#define LUACMSGPACK_VERSION "lua-cmsgpack 0.4.0"
#define LUACMSGPACK_COPYRIGHT "Copyright (C) 2012, Salvatore Sanfilippo"
#define LUACMSGPACK_COPYRIGHT "Copyright (C) 2012, Redis Ltd."
#define LUACMSGPACK_DESCRIPTION "MessagePack C implementation for Lua"
/* Allows a preprocessor directive to override MAX_NESTING */
@ -39,7 +39,7 @@
/* =============================================================================
* MessagePack implementation and bindings for Lua 5.1/5.2.
* Copyright(C) 2012 Salvatore Sanfilippo <antirez@gmail.com>
* Copyright(C) 2012 Redis Ltd.
*
* http://github.com/antirez/lua-cmsgpack
*
@ -958,7 +958,7 @@ LUALIB_API int luaopen_cmsgpack_safe(lua_State *L) {
}
/******************************************************************************
* Copyright (C) 2012 Salvatore Sanfilippo. All rights reserved.
* Copyright (C) 2012 Redis Ltd. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the

File diff suppressed because it is too large

View File

@ -8,7 +8,7 @@ done
if [ -z $TCLSH ]
then
echo "You need tcl 8.5 or newer in order to run the Redis test"
echo "You need tcl 8.5 or newer in order to run the Valkey test"
exit 1
fi
$TCLSH tests/test_helper.tcl "${@}"

View File

@ -8,7 +8,7 @@ done
if [ -z $TCLSH ]
then
echo "You need tcl 8.5 or newer in order to run the Redis Cluster test"
echo "You need tcl 8.5 or newer in order to run the Valkey Cluster test"
exit 1
fi
$TCLSH tests/cluster/run.tcl $*

View File

@ -9,50 +9,9 @@ done
if [ -z $TCLSH ]
then
echo "You need tcl 8.5 or newer in order to run the Redis ModuleApi test"
echo "You need tcl 8.5 or newer in order to run the Valkey ModuleApi test"
exit 1
fi
$MAKE -C tests/modules && \
$TCLSH tests/test_helper.tcl \
--single unit/moduleapi/commandfilter \
--single unit/moduleapi/basics \
--single unit/moduleapi/fork \
--single unit/moduleapi/testrdb \
--single unit/moduleapi/infotest \
--single unit/moduleapi/moduleconfigs \
--single unit/moduleapi/infra \
--single unit/moduleapi/propagate \
--single unit/moduleapi/hooks \
--single unit/moduleapi/misc \
--single unit/moduleapi/blockonkeys \
--single unit/moduleapi/blockonbackground \
--single unit/moduleapi/scan \
--single unit/moduleapi/datatype \
--single unit/moduleapi/auth \
--single unit/moduleapi/keyspace_events \
--single unit/moduleapi/blockedclient \
--single unit/moduleapi/getkeys \
--single unit/moduleapi/test_lazyfree \
--single unit/moduleapi/defrag \
--single unit/moduleapi/keyspecs \
--single unit/moduleapi/hash \
--single unit/moduleapi/zset \
--single unit/moduleapi/list \
--single unit/moduleapi/stream \
--single unit/moduleapi/mallocsize \
--single unit/moduleapi/datatype2 \
--single unit/moduleapi/cluster \
--single unit/moduleapi/aclcheck \
--single unit/moduleapi/subcommands \
--single unit/moduleapi/reply \
--single unit/moduleapi/cmdintrospection \
--single unit/moduleapi/eventloop \
--single unit/moduleapi/timer \
--single unit/moduleapi/publish \
--single unit/moduleapi/usercall \
--single unit/moduleapi/postnotifications \
--single unit/moduleapi/async_rm_call \
--single unit/moduleapi/moduleauth \
--single unit/moduleapi/rdbloadsave \
"${@}"
$TCLSH tests/test_helper.tcl --moduleapi "${@}"

Some files were not shown because too many files have changed in this diff