
### Summary of the change

This is a base PR for refactoring defrag. It moves the defrag logic to rely on the jemalloc [native api](https://github.com/jemalloc/jemalloc/pull/1463#issuecomment-479706489) instead of relying on custom code changes made by valkey in the jemalloc library (`je_defrag_hint`, see `9f8185f5c8/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h`, L382). This enables valkey to use the latest vanilla jemalloc without the need to maintain code changes across jemalloc versions.

This change requires some modifications because the new api provides only information, not a yes/no defrag decision. The decision logic needs to be implemented in valkey code. Additionally, the api does not provide, within a single call, all the information needed to make a decision; the rest is available through an additional api call. To reduce the calls to jemalloc, in this PR the required information is collected during `computeDefragCycles` and not for every single ptr, which avoids the additional api call.

Followup work will utilize the new options that are now open and will further improve the defrag decision and process.

### Added files:

`allocator_defrag.c` / `allocator_defrag.h` - These files implement the allocator specific knowledge for making the defrag decision. The knowledge about slabs, allocation logic and so on all goes into these files. This improves the separation between jemalloc specific code and other possible implementations.

### Moved functions:

`zmalloc_no_tcache`, `zfree_no_tcache` (`4593dc2f05/src/zmalloc.c`, L215) - these carry very jemalloc specific logic assumptions, and are very specific to how we defrag with jemalloc. This is also with the vision that, from a performance perspective, we should consider using tcache; we only need to make sure we don't recycle entries without going through the arena [for example: we can use private tcache, one for free and one for alloc].

`frag_smallbins_bytes` - the logic and implementation moved to the new file.

### Existing API:

* [once a second + when completed full cycle] `computeDefragCycles` (`4593dc2f05/src/defrag.c`, L916)
  * `zmalloc_get_allocator_info`: gets from jemalloc _allocated, active, resident, retained, muzzy_, `frag_smallbins_bytes`
  * `frag_smallbins_bytes` (`4593dc2f05/src/zmalloc.c`, L690): for each bin; gets from jemalloc bin_info, `curr_regs`, `cur_slabs`
* [during defrag, for each pointer]
  * `je_defrag_hint` gets a memory pointer and returns {0,1}. Internally (`4593dc2f05/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h`, L368) it uses these information points:
    * #`nonfull_slabs`
    * #`total_slabs`
    * #free regs in the ptr slab

## Jemalloc API (via ctl interface)

[BATCH] `experimental_utilization_batch_query_ctl` (`4593dc2f05/deps/jemalloc/src/ctl.c`, L4114): gets an array of pointers, returns for each pointer 3 values:

* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes

[EXTENDED] `experimental_utilization_query_ctl` (`4593dc2f05/deps/jemalloc/src/ctl.c`, L3989):

* memory address of the extent a potential reallocation would go into
* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes
* [stats-enabled] total number of free regions in the bin the extent belongs to
* [stats-enabled] total number of regions in the bin the extent belongs to

### `experimental_utilization_batch_query_ctl` vs valkey `je_defrag_hint`?

[good]

- We can query pointers in a batch, which reduces the overall overhead
- The per ptr decision algorithm is not within the jemalloc api; jemalloc only provides information, so valkey can tune/configure/optimize easily

[bad]

- In the batch API we only know the utilization of the slab (of that memory ptr); we don’t get the data about #`nonfull_slabs` and total allocated regs.

## New functions:

1. `defrag_jemalloc_init`: Reduces the cost of calls to je_ctl by using the [MIB interface](https://jemalloc.net/jemalloc.3.html) to get faster calls. See this quote from the jemalloc documentation:

   > The mallctlnametomib() function provides a way to avoid repeated name lookups for applications that repeatedly query the same portion of the namespace, by translating a name to a “Management Information Base” (MIB) that can be passed repeatedly to mallctlbymib().
2. `jemalloc_sz2binind_lgq*`: this api supports a reverse map between bin size and its info without lookup. This mapping depends on the number of size classes we have, which is derived from `lg_quantum` (`4593dc2f05/deps/Makefile`, L115)
3. `defrag_jemalloc_get_frag_smallbins`: this function replaces `frag_smallbins_bytes`; the logic moved to the new file allocator_defrag
4. `defrag_jemalloc_should_defrag_multi` → `handle_results`: unpacks the results
5. `should_defrag`: implements the same logic as the existing implementation inside `je_defrag_hint` (`9f8185f5c8/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h`, L382)
6. `defrag_jemalloc_should_defrag_multi`: implements the hint for an array of pointers, utilizing the new batch api. Currently only 1 pointer is passed.

### Logical differences:

In order to get the information about #`nonfull_slabs` and #`regs`, we use the query cycle to collect the information per size class. In order to find the index of the bin information given a bin size, in O(1), we use `jemalloc_sz2binind_lgq*`.

## Testing

This is the first draft. I did some initial testing that basically creates fragmentation by reducing max memory and then waits for defrag to reach the desired level. The test only serves as a sanity check that defrag eventually succeeds; no data is provided here regarding efficiency and performance.

### Test:

1. disable `activedefrag`
2. run valkey benchmark on overlapping address ranges with different block sizes
3. wait until `used_memory` reaches 10GB
4. set `maxmemory` to 5GB and `maxmemory-policy` to `allkeys-lru`
5. stop load
6. wait for `mem_fragmentation_ratio` to reach 2
7. enable `activedefrag` - start test timer
8. wait until `mem_fragmentation_ratio` reaches 1.1

#### Results*:

(With this PR) Test results: `56 sec`

(Without this PR) Test results: `67 sec`

*both runs perform the same "work": number of buffers moved to reach the fragmentation target

Next benchmarking is to compare to:

- DONE // existing `je_get_defrag_hint`
- compare with naive defrag all: `int defrag_hint() {return 1;}`

---------

Signed-off-by: Zvi Schneider <ezvisch@amazon.com>
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Signed-off-by: zvi-code <54795925+zvi-code@users.noreply.github.com>
Co-authored-by: Zvi Schneider <ezvisch@amazon.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
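The per-pointer decision that the commit message above attributes to `should_defrag` can be illustrated with a small C sketch. It is not the code from the PR: the `bin_snapshot` struct, its field names, and `should_defrag_sketch` are hypothetical, and the exact comparison (slab usage against the average of the non-full slabs, with a 12.5% margin) is an assumption modeled on the inputs the message lists (#`nonfull_slabs`, regions in use, free regs in the ptr slab).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-size-class snapshot, refreshed once per defrag cycle
 * rather than per pointer. Field names are illustrative only. */
typedef struct {
    size_t nregs;         /* regions per slab for this size class */
    size_t curr_regs;     /* regions in use across the non-full slabs */
    size_t nonfull_slabs; /* slabs that still have free regions */
} bin_snapshot;

/* Returns 1 if the allocation should be moved, 0 otherwise.
 * free_in_slab is the per-pointer value a batch query would report. */
static int should_defrag_sketch(const bin_snapshot *bin, size_t free_in_slab) {
    assert(free_in_slab <= bin->nregs);

    /* A full slab is already as dense as it can be. */
    if (free_in_slab == 0) return 0;
    /* Nothing to compare against. */
    if (bin->nonfull_slabs == 0) return 0;

    size_t used_in_slab = bin->nregs - free_in_slab;

    /* Assumed rule: move the allocation when its slab is used no more than
     * the average non-full slab. Multiplying instead of dividing avoids
     * precision loss; the extra curr_regs / 8 (12.5%) is an assumed margin
     * so that equally utilized slabs do not stall the process. */
    size_t threshold = bin->curr_regs + bin->curr_regs / 8;
    return used_in_slab * bin->nonfull_slabs <= threshold;
}
```

With 10 regions per slab and 50 regions in use across 10 non-full slabs, a pointer in a slab with 8 free regions would be moved, while one in a slab with only 2 free regions would stay in place.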
This project was forked from the open source Redis project right before the transition to their new source available licenses.
This README is just a quick start document. More details can be found at valkey.io
What is Valkey?
Valkey is a high-performance data structure server that primarily serves key/value workloads. It supports a wide range of native structures and an extensible plugin system for adding new data structures and access patterns.
Building Valkey using Makefile
Valkey can be compiled and used on Linux, OSX, OpenBSD, NetBSD, FreeBSD. We support big endian and little endian architectures, and both 32 bit and 64 bit systems.
It may compile on Solaris derived systems (for instance SmartOS) but our support for this platform is best effort and Valkey is not guaranteed to work as well as in Linux, OSX, and *BSD.
It is as simple as:
% make
To build with TLS support, you'll need OpenSSL development libraries (e.g. libssl-dev on Debian/Ubuntu).
To build TLS support as Valkey built-in:
% make BUILD_TLS=yes
To build TLS as Valkey module:
% make BUILD_TLS=module
Note that sentinel mode does not support TLS module.
To build with experimental RDMA support you'll need RDMA development libraries (e.g. librdmacm-dev and libibverbs-dev on Debian/Ubuntu). For now, Valkey supports RDMA only in connection module mode. Run:
% make BUILD_RDMA=module
To build with systemd support, you'll need systemd development libraries (such as libsystemd-dev on Debian/Ubuntu or systemd-devel on CentOS) and run:
% make USE_SYSTEMD=yes
To append a suffix to Valkey program names, use:
% make PROG_SUFFIX="-alt"
You can build a 32 bit Valkey binary using:
% make 32bit
After building Valkey, it is a good idea to test it using:
% make test
The above runs the main integration tests. Additional tests are started using:
% make test-unit # Unit tests
% make test-modules # Tests of the module API
% make test-sentinel # Valkey Sentinel integration tests
% make test-cluster # Valkey Cluster integration tests
More about running the integration tests can be found in tests/README.md and for unit tests, see src/unit/README.md.
Fixing build problems with dependencies or cached build options
Valkey has some dependencies which are included in the `deps` directory. `make` does not automatically rebuild dependencies even if something in the source code of dependencies changes.
When you update the source code with `git pull` or when code inside the dependencies tree is modified in any other way, make sure to use the following command in order to really clean everything and rebuild from scratch:
% make distclean
This will clean: jemalloc, lua, hiredis, linenoise and other dependencies.
Also if you force certain build options like 32bit target, no C compiler optimizations (for debugging purposes), and other similar build time options, those options are cached indefinitely until you issue a `make distclean` command.
Fixing problems building 32 bit binaries
If after building Valkey with a 32 bit target you need to rebuild it with a 64 bit target, or the other way around, you need to perform a `make distclean` in the root directory of the Valkey distribution.
In case of build errors when trying to build a 32 bit binary of Valkey, try the following steps:
- Install the package libc6-dev-i386 (also try g++-multilib).
- Try using the following command line instead of `make 32bit`: `make CFLAGS="-m32 -march=native" LDFLAGS="-m32"`
Allocator
Selecting a non-default memory allocator when building Valkey is done by setting the `MALLOC` environment variable. Valkey is compiled and linked against libc malloc by default, with the exception of jemalloc being the default on Linux systems. This default was picked because jemalloc has proven to have fewer fragmentation problems than libc malloc.
To force compiling against libc malloc, use:
% make MALLOC=libc
To compile against jemalloc on Mac OS X systems, use:
% make MALLOC=jemalloc
Monotonic clock
By default, Valkey will build using the POSIX clock_gettime function as the monotonic clock source. On most modern systems, the internal processor clock can be used to improve performance. Cautions can be found here: http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/
To build with support for the processor's internal instruction clock, use:
% make CFLAGS="-DUSE_PROCESSOR_CLOCK"
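For reference, the default clock source mentioned above is the POSIX `clock_gettime` call. The following is a minimal C sketch of reading it, not Valkey's own code; the helper name `monotonic_us` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC reading in microseconds.
 * The value is only meaningful as a difference between two readings. */
static uint64_t monotonic_us(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0); /* CLOCK_MONOTONIC is expected to be available */
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}
```

Taking two readings and subtracting them gives an elapsed time that is not affected by wall-clock adjustments, which is the property a monotonic clock source is chosen for.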
Verbose build
Valkey will build with a user-friendly colorized output by default. If you want to see a more verbose output, use the following:
% make V=1
Running Valkey
To run Valkey with the default configuration, just type:
% cd src
% ./valkey-server
If you want to provide your valkey.conf, you have to run it using an additional parameter (the path of the configuration file):
% cd src
% ./valkey-server /path/to/valkey.conf
It is possible to alter the Valkey configuration by passing parameters directly as options using the command line. Examples:
% ./valkey-server --port 9999 --replicaof 127.0.0.1 6379
% ./valkey-server /etc/valkey/6379.conf --loglevel debug
All the options in valkey.conf are also supported as options using the command line, with exactly the same name.
Running Valkey with TLS:
Running manually
To manually run a Valkey server with TLS mode (assuming `./gen-test-certs.sh` was invoked so sample certificates/keys are available):
- TLS built-in mode:
./src/valkey-server --tls-port 6379 --port 0 \
    --tls-cert-file ./tests/tls/valkey.crt \
    --tls-key-file ./tests/tls/valkey.key \
    --tls-ca-cert-file ./tests/tls/ca.crt
- TLS module mode:
./src/valkey-server --tls-port 6379 --port 0 \
    --tls-cert-file ./tests/tls/valkey.crt \
    --tls-key-file ./tests/tls/valkey.key \
    --tls-ca-cert-file ./tests/tls/ca.crt \
    --loadmodule src/valkey-tls.so
Note that you can disable TCP by specifying `--port 0` explicitly.
It's also possible to have both TCP and TLS available at the same time,
but you'll have to assign different ports.
Use `valkey-cli` to connect to the Valkey server:
./src/valkey-cli --tls \
--cert ./tests/tls/valkey.crt \
--key ./tests/tls/valkey.key \
--cacert ./tests/tls/ca.crt
Specifying `--tls-replication yes` makes a replica connect to the primary using TLS.
Using `--tls-cluster yes` makes Valkey Cluster use TLS across nodes.
Running Valkey with RDMA:
Note that Valkey Over RDMA is an experimental feature. It may be changed or removed in any minor or major version. Currently, it is only supported on Linux.
To manually run a Valkey server with RDMA mode:
% ./src/valkey-server --protected-mode no \
--loadmodule src/valkey-rdma.so bind=192.168.122.100 port=6379
It's possible to change the bind address/port of RDMA with a runtime command:
192.168.122.100:6379> CONFIG SET rdma.port 6380
It's also possible to have both RDMA and TCP available at the same time; TCP (6379) and RDMA (6379) do not conflict even when they use the same port number. For example:
% ./src/valkey-server --protected-mode no \
--loadmodule src/valkey-rdma.so bind=192.168.122.100 port=6379 \
--port 6379
Note that the network card (192.168.122.100 in this example) should support RDMA. To test whether a server supports RDMA:
% rdma res show    # requires a recent version of the iproute2 package
Or:
% ibv_devices
Playing with Valkey
You can use valkey-cli to play with Valkey. Start a valkey-server instance, then in another terminal try the following:
% cd src
% ./valkey-cli
valkey> ping
PONG
valkey> set foo bar
OK
valkey> get foo
"bar"
valkey> incr mycounter
(integer) 1
valkey> incr mycounter
(integer) 2
valkey>
Installing Valkey
In order to install Valkey binaries into /usr/local/bin, just use:
% make install
You can use `make PREFIX=/some/other/directory install` if you wish to use a different destination.
Note: For compatibility with Redis, we create symlinks from the Redis names (`redis-server`, `redis-cli`, etc.) to the Valkey binaries installed by `make install`.
The symlinks are created in the same directory as the Valkey binaries.
The symlinks are removed when using `make uninstall`.
The creation of the symlinks can be skipped by setting the makefile variable `USE_REDIS_SYMLINKS=no`.
`make install` will just install binaries in your system, but will not configure init scripts and configuration files in the appropriate place. This is not needed if you just want to play a bit with Valkey, but if you are installing it the proper way for a production system, we have a script that does this for Ubuntu and Debian systems:
% cd utils
% ./install_server.sh
Note: `install_server.sh` will not work on Mac OSX; it is built for Linux only.
The script will ask you a few questions and will set up everything you need to run Valkey properly as a background daemon that will start again on system reboots.
You'll be able to stop and start Valkey using the script named `/etc/init.d/valkey_<portnumber>`, for instance `/etc/init.d/valkey_6379`.
Building using CMake
In addition to the traditional `Makefile` build, Valkey supports an alternative, experimental, build system using `CMake`.
To build and install `Valkey` in `Release` mode (an optimized build), type this into your terminal:
mkdir build-release
cd $_
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/valkey
sudo make install
# Valkey is now installed under /opt/valkey
Other options supported by Valkey's `CMake` build system:
Special build flags
- `-DBUILD_TLS=<on|off|module>`: enable TLS build for Valkey
- `-DBUILD_RDMA=<off|module>`: enable RDMA module build (only module mode supported)
- `-DBUILD_MALLOC=<libc|jemalloc|tcmalloc|tcmalloc_minimal>`: choose the allocator to use. Default on Linux: `jemalloc`, for other OS: `libc`
- `-DBUILD_SANITIZER=<address|thread|undefined>`: build with the selected sanitizer enabled
- `-DBUILD_UNIT_TESTS=[1|0]`: when set, the build will produce the executable `valkey-unit-tests`
- `-DBUILD_TEST_MODULES=[1|0]`: when set, the build will include the modules located under the `tests/modules` folder
- `-DBUILD_EXAMPLE_MODULES=[1|0]`: when set, the build will include the example modules located under the `src/modules` folder
Common flags
- `-DCMAKE_BUILD_TYPE=<Debug|Release...>`: define the build type, see the CMake manual for more details
- `-DCMAKE_INSTALL_PREFIX=/installation/path`: override this value to define a custom install prefix. Default: `/usr/local`
- `-G"<Generator Name>"`: generate build files for "Generator Name". By default, CMake will generate `Makefile`s.
Verbose build
`CMake` generates a user-friendly colorized output by default.
If you want to see a more verbose output, use the following:
make VERBOSE=1
Troubleshooting
During the `CMake` stage, `CMake` caches variables in a local file named `CMakeCache.txt`. All variables generated by `Valkey` are removed from the cache once consumed (this is done by calling `unset(VAR-NAME CACHE)`). However, some variables, like the compiler path, are kept in cache. To start a fresh build either remove the cache file `CMakeCache.txt` from the build folder, or delete the build folder completely.
It is important to re-run `CMake` when adding new source files.
Integration with IDE
During the `CMake` stage of the build, `CMake` generates a JSON file named `compile_commands.json` and places it under the build folder. This file is used by many IDEs and text editors for providing code completion (via `clangd`).
A small caveat is that these tools will look for `compile_commands.json` under Valkey's top folder.
A common workaround is to create a symbolic link to it:
cd /path/to/valkey/
# We assume here that your build folder is `build-release`
ln -sf $(pwd)/build-release/compile_commands.json $(pwd)/compile_commands.json
Restart your IDE and voila
Code contributions
Please see the CONTRIBUTING.md. For security bugs and vulnerabilities, please see SECURITY.md.
Valkey is an open community project under LF Projects
Valkey a Series of LF Projects, LLC 2810 N Church St, PMB 57274 Wilmington, Delaware 19802-4447