From 58fe9c0138af8a45dfcb906a3d5c631a6d6e9a30 Mon Sep 17 00:00:00 2001 From: Lipeng Zhu Date: Wed, 11 Sep 2024 04:09:18 +0800 Subject: [PATCH] Use hashtable as the default type of temp set object during sunion/sdiff (#996) This patch try to set the temp set object as default hash table type. And did a simple predication of the temp set object encoding when initialize `dstset` to reduce the unnecessary conversation. ## Issue Description According to existing code logic, when did operation like `sunion` and `sdiff` , the temp set object could be `intset`, `listpack` and `hashtable`, for the `listpack`, the efficiency is low when did operation like `find` and `compare` , need to traverse all elements. When we exploring the hotspots, found the `lpFind` and `memcmp` has been the bottleneck when running workloads like below: - [memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml) - [memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml) ![image](https://github.com/user-attachments/assets/71dfc70b-2ad5-4832-a338-712deefca20e) ## Optimization This patch try to set the temp set object as default hash table type. And did a simple predication of the temp set object encoding when initialize `dstset` to reduce the unnecessary conversation. ### Test Environment - OPERATING SYSTEM: Ubuntu 22.04.4 LTS - Kernel: 5.15.0-116-generic - PROCESSOR: Intel Xeon Platinum 8380 - Server and Client in same socket. #### Server Configuration ``` taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey.conf port 9001 bind * -::* daemonize no protected-mode no save "" ``` #### Performance Boost | Test Name| Perf Boost| |-|-| |[memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml) |41%| |[memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml) |27%| ### More Tests With above test set which have total 110 elements in the 2 given sets. We also did some benchmarking by adjusting the total number of elements in all given sets. We can still observe the performance boost. ![image](https://github.com/user-attachments/assets/b2ab420c-43e5-45de-9715-7d943df229cb) --------- Signed-off-by: Lipeng Zhu Co-authored-by: Wangyang Guo --- src/t_set.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/src/t_set.c b/src/t_set.c index b2aeec52e..a540c3c49 100644 --- a/src/t_set.c +++ b/src/t_set.c @@ -1445,6 +1445,7 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke robj **sets = zmalloc(sizeof(robj *) * setnum); setTypeIterator *si; robj *dstset = NULL; + int dstset_encoding = OBJ_ENCODING_INTSET; char *str; size_t len; int64_t llval; @@ -1463,6 +1464,23 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke zfree(sets); return; } + /* For a SET's encoding, according to the factory method setTypeCreate(), currently have 3 types: + * 1. OBJ_ENCODING_INTSET + * 2. OBJ_ENCODING_LISTPACK + * 3. OBJ_ENCODING_HT + * 'dstset_encoding' is used to determine which kind of encoding to use when initialize 'dstset'. + * + * If all sets are all OBJ_ENCODING_INTSET encoding or 'dstkey' is not null, keep 'dstset' + * OBJ_ENCODING_INTSET encoding when initialize. Otherwise it is not efficient to create the 'dstset' + * from intset and then convert to listpack or hashtable. + * + * If one of the set is OBJ_ENCODING_LISTPACK, let's set 'dstset' to hashtable default encoding, + * the hashtable is more efficient when find and compare than the listpack. The corresponding + * time complexity are O(1) vs O(n). */ + if (!dstkey && dstset_encoding == OBJ_ENCODING_INTSET && + (setobj->encoding == OBJ_ENCODING_LISTPACK || setobj->encoding == OBJ_ENCODING_HT)) { + dstset_encoding = OBJ_ENCODING_HT; + } sets[j] = setobj; if (j > 0 && sets[0] == sets[j]) { sameset = 1; @@ -1504,7 +1522,11 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke /* We need a temp set object to store our union/diff. If the dstkey * is not NULL (that is, we are inside an SUNIONSTORE/SDIFFSTORE operation) then * this set object will be the resulting object to set into the target key*/ - dstset = createIntsetObject(); + if (dstset_encoding == OBJ_ENCODING_INTSET) { + dstset = createIntsetObject(); + } else { + dstset = createSetObject(); + } if (op == SET_OP_UNION) { /* Union is trivial, just add every element of every set to the