Use hashtable as the default type of temp set object during sunion/sdiff (#996)
This patch try to set the temp set object as default hash table type. And did a simple predication of the temp set object encoding when initialize `dstset` to reduce the unnecessary conversation. ## Issue Description According to existing code logic, when did operation like `sunion` and `sdiff` , the temp set object could be `intset`, `listpack` and `hashtable`, for the `listpack`, the efficiency is low when did operation like `find` and `compare` , need to traverse all elements. When we exploring the hotspots, found the `lpFind` and `memcmp` has been the bottleneck when running workloads like below: - [memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml) - [memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml)  ## Optimization This patch try to set the temp set object as default hash table type. And did a simple predication of the temp set object encoding when initialize `dstset` to reduce the unnecessary conversation. ### Test Environment - OPERATING SYSTEM: Ubuntu 22.04.4 LTS - Kernel: 5.15.0-116-generic - PROCESSOR: Intel Xeon Platinum 8380 - Server and Client in same socket. #### Server Configuration ``` taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey.conf port 9001 bind * -::* daemonize no protected-mode no save "" ``` #### Performance Boost | Test Name| Perf Boost| |-|-| |[memtier_benchmark-2keys-set-10-100-elements-sunion.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sunion.yml) |41%| |[memtier_benchmark-2keys-set-10-100-elements-sdiff.yml](https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-2keys-set-10-100-elements-sdiff.yml) |27%| ### More Tests With above test set which have total 110 elements in the 2 given sets. We also did some benchmarking by adjusting the total number of elements in all given sets. We can still observe the performance boost.  --------- Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com> Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
This commit is contained in:
parent
9f0c80187e
commit
58fe9c0138
24
src/t_set.c
24
src/t_set.c
@ -1445,6 +1445,7 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke
|
||||
robj **sets = zmalloc(sizeof(robj *) * setnum);
|
||||
setTypeIterator *si;
|
||||
robj *dstset = NULL;
|
||||
int dstset_encoding = OBJ_ENCODING_INTSET;
|
||||
char *str;
|
||||
size_t len;
|
||||
int64_t llval;
|
||||
@ -1463,6 +1464,23 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke
|
||||
zfree(sets);
|
||||
return;
|
||||
}
|
||||
/* For a SET's encoding, according to the factory method setTypeCreate(), currently have 3 types:
|
||||
* 1. OBJ_ENCODING_INTSET
|
||||
* 2. OBJ_ENCODING_LISTPACK
|
||||
* 3. OBJ_ENCODING_HT
|
||||
* 'dstset_encoding' is used to determine which kind of encoding to use when initialize 'dstset'.
|
||||
*
|
||||
* If all sets are all OBJ_ENCODING_INTSET encoding or 'dstkey' is not null, keep 'dstset'
|
||||
* OBJ_ENCODING_INTSET encoding when initialize. Otherwise it is not efficient to create the 'dstset'
|
||||
* from intset and then convert to listpack or hashtable.
|
||||
*
|
||||
* If one of the set is OBJ_ENCODING_LISTPACK, let's set 'dstset' to hashtable default encoding,
|
||||
* the hashtable is more efficient when find and compare than the listpack. The corresponding
|
||||
* time complexity are O(1) vs O(n). */
|
||||
if (!dstkey && dstset_encoding == OBJ_ENCODING_INTSET &&
|
||||
(setobj->encoding == OBJ_ENCODING_LISTPACK || setobj->encoding == OBJ_ENCODING_HT)) {
|
||||
dstset_encoding = OBJ_ENCODING_HT;
|
||||
}
|
||||
sets[j] = setobj;
|
||||
if (j > 0 && sets[0] == sets[j]) {
|
||||
sameset = 1;
|
||||
@ -1504,7 +1522,11 @@ void sunionDiffGenericCommand(client *c, robj **setkeys, int setnum, robj *dstke
|
||||
/* We need a temp set object to store our union/diff. If the dstkey
|
||||
* is not NULL (that is, we are inside an SUNIONSTORE/SDIFFSTORE operation) then
|
||||
* this set object will be the resulting object to set into the target key*/
|
||||
dstset = createIntsetObject();
|
||||
if (dstset_encoding == OBJ_ENCODING_INTSET) {
|
||||
dstset = createIntsetObject();
|
||||
} else {
|
||||
dstset = createSetObject();
|
||||
}
|
||||
|
||||
if (op == SET_OP_UNION) {
|
||||
/* Union is trivial, just add every element of every set to the
|
||||
|
Loading…
x
Reference in New Issue
Block a user