
### Summary of the change

This is a base PR for refactoring defrag. It moves the defrag logic to rely on the jemalloc [native api](https://github.com/jemalloc/jemalloc/pull/1463#issuecomment-479706489) instead of relying on custom code changes made by valkey in the jemalloc library ([je_defrag_hint](9f8185f5c8/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h (L382))). This enables valkey to use the latest vanilla jemalloc without the need to maintain code changes across jemalloc versions.

This change requires some modifications because the new api provides only the information, not a yes/no defrag decision. The decision logic needs to be implemented in valkey code. Additionally, the api does not provide, within a single call, all the information needed to make a decision; the rest is available through an additional api call. To reduce the calls to jemalloc, in this PR the required information is collected during `computeDefragCycles` and not for every single ptr, which avoids the additional api call.

Followup work will utilize the new options that are now open and will further improve the defrag decision and process.

### Added files:

`allocator_defrag.c` / `allocator_defrag.h` - These files implement the allocator-specific knowledge for making the defrag decision. The knowledge about slabs, allocation logic and so on all goes into these files. This improves the separation between jemalloc-specific code and other possible implementations.

### Moved functions:

[`zmalloc_no_tcache` , `zfree_no_tcache`](4593dc2f05/src/zmalloc.c (L215)) - these embed very jemalloc-specific assumptions and are very specific to how we defrag with jemalloc. This also fits the vision that, from a performance perspective, we should consider using tcache; we only need to make sure we don't recycle entries without going through the arena [for example: we can use private tcache, one for free and one for alloc].

`frag_smallbins_bytes` - the logic and implementation moved to the new file

### Existing API:

* [once a second + when completed full cycle] [`computeDefragCycles`](4593dc2f05/src/defrag.c (L916))
  * `zmalloc_get_allocator_info` : gets from jemalloc _allocated, active, resident, retained, muzzy_, `frag_smallbins_bytes`
  * [`frag_smallbins_bytes`](4593dc2f05/src/zmalloc.c (L690)) : for each bin; gets from jemalloc bin_info, `curr_regs`, `cur_slabs`
* [during defrag, for each pointer]
  * `je_defrag_hint` takes a memory pointer and returns {0,1}. [Internally it uses](4593dc2f05/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h (L368)) these information points:
    * #`nonfull_slabs`
    * #`total_slabs`
    * #free regs in the ptr slab

## Jemalloc API (via ctl interface)

[BATCH][`experimental_utilization_batch_query_ctl`](4593dc2f05/deps/jemalloc/src/ctl.c (L4114)) : gets an array of pointers, returns for each pointer 3 values:

* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes

[EXTENDED][`experimental_utilization_query_ctl`](4593dc2f05/deps/jemalloc/src/ctl.c (L3989)) :

* memory address of the extent a potential reallocation would go into
* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes
* [stats-enabled] total number of free regions in the bin the extent belongs to
* [stats-enabled] total number of regions in the bin the extent belongs to

### `experimental_utilization_batch_query_ctl` vs valkey `je_defrag_hint`?

[good]

- We can query pointers in a batch, which reduces the overall overhead
- The per-ptr decision algorithm is not within the jemalloc api; jemalloc only provides information, so valkey can tune/configure/optimize it easily

[bad]

- In the batch API we only know the utilization of the slab (of that memory ptr); we don't get the data about #`nonfull_slabs` and total allocated regs.

## New functions:

1. `defrag_jemalloc_init`: Reduces the cost of calls to je_ctl by using the [MIB interface](https://jemalloc.net/jemalloc.3.html) to get faster calls. See this quote from the jemalloc documentation:

   > The mallctlnametomib() function provides a way to avoid repeated name lookups for applications that repeatedly query the same portion of the namespace, by translating a name to a “Management Information Base” (MIB) that can be passed repeatedly to mallctlbymib().
2. `jemalloc_sz2binind_lgq*` : this api supports a reverse map from bin size to its info without a lookup. The mapping depends on the number of size classes we have, which are derived from [`lg_quantum`](4593dc2f05/deps/Makefile (L115))
3. `defrag_jemalloc_get_frag_smallbins` : this function replaces `frag_smallbins_bytes`; the logic moved to the new file allocator_defrag
4. `defrag_jemalloc_should_defrag_multi` → `handle_results` : unpacks the results
5. `should_defrag` : implements the same logic as the existing implementation [inside](9f8185f5c8/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h (L382)) je_defrag_hint
6. `defrag_jemalloc_should_defrag_multi` : implements the hint for an array of pointers, utilizing the new batch api. Currently only 1 pointer is passed.

### Logical differences:

In order to get the information about #`nonfull_slabs` and #`regs`, we use the query cycle to collect the information per size class. In order to find the index of the bin information given the bin size in O(1), we use `jemalloc_sz2binind_lgq*`.

## Testing

This is the first draft. I did some initial testing that basically creates fragmentation by reducing max memory and then waits for defrag to reach the desired level. The test only serves as a sanity check that defrag eventually succeeds; no data is provided here regarding efficiency and performance.

### Test:

1. disable `activedefrag`
2. run valkey benchmark on overlapping address ranges with different block sizes
3. wait until `used_memory` reaches 10GB
4. set `maxmemory` to 5GB and `maxmemory-policy` to `allkeys-lru`
5. stop load
6. wait for `mem_fragmentation_ratio` to reach 2
7. enable `activedefrag` - start test timer
8. wait until `mem_fragmentation_ratio` reaches 1.1

#### Results*:

(With this PR) Test results: `56 sec`
(Without this PR) Test results: `67 sec`

*both runs perform the same "work": the number of buffers moved to reach the fragmentation target

Next benchmarking is to compare to:

- DONE // existing `je_get_defrag_hint`
- compare with naive defrag all: `int defrag_hint() {return 1;}`

---------

Signed-off-by: Zvi Schneider <ezvisch@amazon.com>
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Signed-off-by: zvi-code <54795925+zvi-code@users.noreply.github.com>
Co-authored-by: Zvi Schneider <ezvisch@amazon.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
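For readers who want a concrete picture of the per-pointer decision that the commit message moves from jemalloc into valkey, here is a minimal sketch in C. It is not the code in `allocator_defrag.c`. The struct `bin_usage_sketch`, the function `should_defrag_sketch`, and all field names are hypothetical. The comparison itself (move the allocation when its slab is less utilized than the average non-full slab, with a 12.5% margin) is an assumption modeled on the older in-allocator hint, not something this commit message states. The sketch only uses the inputs the message lists: the per-size-class counters collected once per cycle, and the free-region count of the pointer's slab as returned by the batch query.

```c
#include <stddef.h>

/* Hypothetical per-size-class snapshot, assumed to be refreshed once per
 * defrag cycle (the commit message collects this in computeDefragCycles). */
typedef struct {
    size_t nregs;         /* regions per slab for this size class */
    size_t curr_slabs;    /* slabs currently in use (#total_slabs) */
    size_t nonfull_slabs; /* slabs with at least one free region */
    size_t curr_regs;     /* allocated regions across all slabs */
} bin_usage_sketch;

/* Returns 1 if the allocation looks worth moving, 0 otherwise.
 * nfree is the number of free regions in the slab that holds the pointer
 * (precondition: nfree <= bin->nregs). */
static int should_defrag_sketch(const bin_usage_sketch *bin, size_t nfree) {
    /* A full slab is already densely packed. */
    if (nfree == 0) return 0;
    /* With fewer than two non-full slabs there is no better destination. */
    if (bin->nonfull_slabs < 2) return 0;

    /* Regions that live in non-full slabs only. */
    size_t full_slabs = bin->curr_slabs - bin->nonfull_slabs;
    size_t regs_in_nonfull = bin->curr_regs - full_slabs * bin->nregs;
    size_t used_in_slab = bin->nregs - nfree;

    /* Compare this slab's usage with the average usage of non-full slabs.
     * Cross-multiplying avoids integer division; the extra 1/8 (12.5%)
     * margin is an assumed tie-breaker, not taken from the commit message. */
    return used_in_slab * bin->nonfull_slabs <=
           regs_in_nonfull + regs_in_nonfull / 8;
}
```

With 10 regions per slab, 5 slabs of which 4 are non-full, and 30 allocated regions in total, the non-full slabs hold 20 regions (5 on average). Under these assumptions a slab with 2 used regions is selected for defrag, while a slab with 8 used regions is left alone.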
155 lines
5.5 KiB
CMake
# -------------------------------------------------
# Define the sources to be built
# -------------------------------------------------

# valkey-server source files
set(VALKEY_SERVER_SRCS
    ${CMAKE_SOURCE_DIR}/src/threads_mngr.c
    ${CMAKE_SOURCE_DIR}/src/adlist.c
    ${CMAKE_SOURCE_DIR}/src/quicklist.c
    ${CMAKE_SOURCE_DIR}/src/ae.c
    ${CMAKE_SOURCE_DIR}/src/anet.c
    ${CMAKE_SOURCE_DIR}/src/dict.c
    ${CMAKE_SOURCE_DIR}/src/kvstore.c
    ${CMAKE_SOURCE_DIR}/src/sds.c
    ${CMAKE_SOURCE_DIR}/src/zmalloc.c
    ${CMAKE_SOURCE_DIR}/src/lzf_c.c
    ${CMAKE_SOURCE_DIR}/src/lzf_d.c
    ${CMAKE_SOURCE_DIR}/src/pqsort.c
    ${CMAKE_SOURCE_DIR}/src/zipmap.c
    ${CMAKE_SOURCE_DIR}/src/sha1.c
    ${CMAKE_SOURCE_DIR}/src/ziplist.c
    ${CMAKE_SOURCE_DIR}/src/release.c
    ${CMAKE_SOURCE_DIR}/src/memory_prefetch.c
    ${CMAKE_SOURCE_DIR}/src/io_threads.c
    ${CMAKE_SOURCE_DIR}/src/networking.c
    ${CMAKE_SOURCE_DIR}/src/util.c
    ${CMAKE_SOURCE_DIR}/src/object.c
    ${CMAKE_SOURCE_DIR}/src/db.c
    ${CMAKE_SOURCE_DIR}/src/replication.c
    ${CMAKE_SOURCE_DIR}/src/rdb.c
    ${CMAKE_SOURCE_DIR}/src/t_string.c
    ${CMAKE_SOURCE_DIR}/src/t_list.c
    ${CMAKE_SOURCE_DIR}/src/t_set.c
    ${CMAKE_SOURCE_DIR}/src/t_zset.c
    ${CMAKE_SOURCE_DIR}/src/t_hash.c
    ${CMAKE_SOURCE_DIR}/src/config.c
    ${CMAKE_SOURCE_DIR}/src/aof.c
    ${CMAKE_SOURCE_DIR}/src/pubsub.c
    ${CMAKE_SOURCE_DIR}/src/multi.c
    ${CMAKE_SOURCE_DIR}/src/debug.c
    ${CMAKE_SOURCE_DIR}/src/sort.c
    ${CMAKE_SOURCE_DIR}/src/intset.c
    ${CMAKE_SOURCE_DIR}/src/syncio.c
    ${CMAKE_SOURCE_DIR}/src/cluster.c
    ${CMAKE_SOURCE_DIR}/src/cluster_legacy.c
    ${CMAKE_SOURCE_DIR}/src/cluster_slot_stats.c
    ${CMAKE_SOURCE_DIR}/src/crc16.c
    ${CMAKE_SOURCE_DIR}/src/endianconv.c
    ${CMAKE_SOURCE_DIR}/src/slowlog.c
    ${CMAKE_SOURCE_DIR}/src/eval.c
    ${CMAKE_SOURCE_DIR}/src/bio.c
    ${CMAKE_SOURCE_DIR}/src/rio.c
    ${CMAKE_SOURCE_DIR}/src/rand.c
    ${CMAKE_SOURCE_DIR}/src/memtest.c
    ${CMAKE_SOURCE_DIR}/src/syscheck.c
    ${CMAKE_SOURCE_DIR}/src/crcspeed.c
    ${CMAKE_SOURCE_DIR}/src/crccombine.c
    ${CMAKE_SOURCE_DIR}/src/crc64.c
    ${CMAKE_SOURCE_DIR}/src/bitops.c
    ${CMAKE_SOURCE_DIR}/src/sentinel.c
    ${CMAKE_SOURCE_DIR}/src/notify.c
    ${CMAKE_SOURCE_DIR}/src/setproctitle.c
    ${CMAKE_SOURCE_DIR}/src/blocked.c
    ${CMAKE_SOURCE_DIR}/src/hyperloglog.c
    ${CMAKE_SOURCE_DIR}/src/latency.c
    ${CMAKE_SOURCE_DIR}/src/sparkline.c
    ${CMAKE_SOURCE_DIR}/src/valkey-check-rdb.c
    ${CMAKE_SOURCE_DIR}/src/valkey-check-aof.c
    ${CMAKE_SOURCE_DIR}/src/geo.c
    ${CMAKE_SOURCE_DIR}/src/lazyfree.c
    ${CMAKE_SOURCE_DIR}/src/module.c
    ${CMAKE_SOURCE_DIR}/src/evict.c
    ${CMAKE_SOURCE_DIR}/src/expire.c
    ${CMAKE_SOURCE_DIR}/src/geohash.c
    ${CMAKE_SOURCE_DIR}/src/geohash_helper.c
    ${CMAKE_SOURCE_DIR}/src/childinfo.c
    ${CMAKE_SOURCE_DIR}/src/allocator_defrag.c
    ${CMAKE_SOURCE_DIR}/src/defrag.c
    ${CMAKE_SOURCE_DIR}/src/siphash.c
    ${CMAKE_SOURCE_DIR}/src/rax.c
    ${CMAKE_SOURCE_DIR}/src/t_stream.c
    ${CMAKE_SOURCE_DIR}/src/listpack.c
    ${CMAKE_SOURCE_DIR}/src/localtime.c
    ${CMAKE_SOURCE_DIR}/src/lolwut.c
    ${CMAKE_SOURCE_DIR}/src/lolwut5.c
    ${CMAKE_SOURCE_DIR}/src/lolwut6.c
    ${CMAKE_SOURCE_DIR}/src/acl.c
    ${CMAKE_SOURCE_DIR}/src/tracking.c
    ${CMAKE_SOURCE_DIR}/src/socket.c
    ${CMAKE_SOURCE_DIR}/src/tls.c
    ${CMAKE_SOURCE_DIR}/src/sha256.c
    ${CMAKE_SOURCE_DIR}/src/timeout.c
    ${CMAKE_SOURCE_DIR}/src/setcpuaffinity.c
    ${CMAKE_SOURCE_DIR}/src/monotonic.c
    ${CMAKE_SOURCE_DIR}/src/mt19937-64.c
    ${CMAKE_SOURCE_DIR}/src/resp_parser.c
    ${CMAKE_SOURCE_DIR}/src/call_reply.c
    ${CMAKE_SOURCE_DIR}/src/script_lua.c
    ${CMAKE_SOURCE_DIR}/src/script.c
    ${CMAKE_SOURCE_DIR}/src/functions.c
    ${CMAKE_SOURCE_DIR}/src/function_lua.c
    ${CMAKE_SOURCE_DIR}/src/commands.c
    ${CMAKE_SOURCE_DIR}/src/strl.c
    ${CMAKE_SOURCE_DIR}/src/connection.c
    ${CMAKE_SOURCE_DIR}/src/unix.c
    ${CMAKE_SOURCE_DIR}/src/server.c
    ${CMAKE_SOURCE_DIR}/src/logreqres.c)

# valkey-cli
set(VALKEY_CLI_SRCS
    ${CMAKE_SOURCE_DIR}/src/anet.c
    ${CMAKE_SOURCE_DIR}/src/adlist.c
    ${CMAKE_SOURCE_DIR}/src/dict.c
    ${CMAKE_SOURCE_DIR}/src/valkey-cli.c
    ${CMAKE_SOURCE_DIR}/src/zmalloc.c
    ${CMAKE_SOURCE_DIR}/src/release.c
    ${CMAKE_SOURCE_DIR}/src/ae.c
    ${CMAKE_SOURCE_DIR}/src/serverassert.c
    ${CMAKE_SOURCE_DIR}/src/crcspeed.c
    ${CMAKE_SOURCE_DIR}/src/crccombine.c
    ${CMAKE_SOURCE_DIR}/src/crc64.c
    ${CMAKE_SOURCE_DIR}/src/siphash.c
    ${CMAKE_SOURCE_DIR}/src/crc16.c
    ${CMAKE_SOURCE_DIR}/src/monotonic.c
    ${CMAKE_SOURCE_DIR}/src/cli_common.c
    ${CMAKE_SOURCE_DIR}/src/mt19937-64.c
    ${CMAKE_SOURCE_DIR}/src/strl.c
    ${CMAKE_SOURCE_DIR}/src/cli_commands.c)

# valkey-benchmark
set(VALKEY_BENCHMARK_SRCS
    ${CMAKE_SOURCE_DIR}/src/ae.c
    ${CMAKE_SOURCE_DIR}/src/anet.c
    ${CMAKE_SOURCE_DIR}/src/valkey-benchmark.c
    ${CMAKE_SOURCE_DIR}/src/adlist.c
    ${CMAKE_SOURCE_DIR}/src/dict.c
    ${CMAKE_SOURCE_DIR}/src/zmalloc.c
    ${CMAKE_SOURCE_DIR}/src/serverassert.c
    ${CMAKE_SOURCE_DIR}/src/release.c
    ${CMAKE_SOURCE_DIR}/src/crcspeed.c
    ${CMAKE_SOURCE_DIR}/src/crccombine.c
    ${CMAKE_SOURCE_DIR}/src/crc64.c
    ${CMAKE_SOURCE_DIR}/src/siphash.c
    ${CMAKE_SOURCE_DIR}/src/crc16.c
    ${CMAKE_SOURCE_DIR}/src/monotonic.c
    ${CMAKE_SOURCE_DIR}/src/cli_common.c
    ${CMAKE_SOURCE_DIR}/src/mt19937-64.c
    ${CMAKE_SOURCE_DIR}/src/strl.c)

# valkey-rdma module
set(VALKEY_RDMA_MODULE_SRCS ${CMAKE_SOURCE_DIR}/src/rdma.c)

# valkey-tls module
set(VALKEY_TLS_MODULE_SRCS ${CMAKE_SOURCE_DIR}/src/tls.c)