This PR builds upon the [previous entry prefetching
optimization](https://github.com/valkey-io/valkey/pull/1501) to further
enhance performance by implementing value prefetching for hashtable
iterators.
## Implementation
Modified `hashtableInitIterator` to accept a new flags parameter,
allowing control over iterator behavior.
Implemented conditional value prefetching within `hashtableNext` based
on the new `HASHTABLE_ITER_PREFETCH_VALUES` flag.
When the flag is set, hashtableNext now calls `prefetchBucketValues` at
the start of each new bucket, preemptively loading the values of filled
entries into the CPU cache.
The actual prefetching of values is performed using type-specific
callback functions implemented in `server.c`:
- For `robj` the `hashtableObjectPrefetchValue` callback is used to
prefetch the value if not embeded.
This implementation is specifically focused on main database iterations
at this stage. Applying it to hashtables that hold other object types
should not be problematic, but its performance benefits for those cases
will need to be proven through testing and benchmarking.
## Performance
### Setup:
- 64cores Graviton 3 Amazon EC2 instance.
- 50 mil keys with different value sizes.
- Running valkey server over RAM file system.
- crc checksum and comperssion off.
### Action
- save command.
### Results
The results regarding the duration of “save” command was taken from
“info all” command.
```
+--------------------+------------------+------------------+
| Prefetching | Value size (byte)| Time (seconds) |
+--------------------+------------------+------------------+
| No | 100 | 20.112279 |
| Yes | 100 | 12.758519 |
| No | 40 | 16.945366 |
| Yes | 40 | 10.902022 |
| No | 20 | 9.817000 |
| Yes | 20 | 9.626821 |
| No | 10 | 9.71510 |
| Yes | 10 | 9.510565 |
+--------------------+------------------+------------------+
```
The results largely align with our expectations, showing significant
improvements for larger values (100 bytes and 40 bytes) that are stored
outside the robj. For smaller values (20 bytes and 10 bytes) that are
embedded within the robj, we see almost no improvement, which is as
expected.
However, the small improvement observed even for these embedded values
is somewhat surprising. Given that we are not actively prefetching these
embedded values, this minor performance gain was not anticipated.
perf record on save command **without** value prefetching:
```
--99.98%--rdbSaveDb
|
|--91.38%--rdbSaveKeyValuePair
| |
| |--42.72%--rdbSaveRawString
| | |
| | |--26.69%--rdbWriteRaw
| | | |
| | | --25.75%--rioFileWrite.lto_priv.0
| | |
| | --15.41%--rdbSaveLen
| | |
| | |--7.58%--rdbWriteRaw
| | | |
| | | --7.08%--rioFileWrite.lto_priv.0
| | | |
| | | --6.54%--_IO_fwrite
| | |
| | |
| | --7.42%--rdbWriteRaw.constprop.1
| | |
| | --7.18%--rioFileWrite.lto_priv.0
| | |
| | --6.73%--_IO_fwrite
| |
| |
| |--40.44%--rdbSaveStringObject
| |
| --7.62%--rdbSaveObjectType
| |
| --7.39%--rdbWriteRaw.constprop.1
| |
| --7.04%--rioFileWrite.lto_priv.0
| |
| --6.59%--_IO_fwrite
|
|
--7.33%--hashtableNext.constprop.1
|
--6.28%--prefetchNextBucketEntries.lto_priv.0
```
perf record on save command **with** value prefetching:
```
rdbSaveRio
|
--99.93%--rdbSaveDb
|
|--79.81%--rdbSaveKeyValuePair
| |
| |--66.79%--rdbSaveRawString
| | |
| | |--42.31%--rdbWriteRaw
| | | |
| | | --40.74%--rioFileWrite.lto_priv.0
| | |
| | --23.37%--rdbSaveLen
| | |
| | |--11.78%--rdbWriteRaw
| | | |
| | | --11.03%--rioFileWrite.lto_priv.0
| | | |
| | | --10.30%--_IO_fwrite
| | | |
| | |
| | --10.98%--rdbWriteRaw.constprop.1
| | |
| | --10.44%--rioFileWrite.lto_priv.0
| | |
| | --9.74%--_IO_fwrite
| | |
| |
| |--11.33%--rdbSaveObjectType
| | |
| | --10.96%--rdbWriteRaw.constprop.1
| | |
| | --10.51%--rioFileWrite.lto_priv.0
| | |
| | --9.75%--_IO_fwrite
| | |
| |
| --0.77%--rdbSaveStringObject
|
--18.39%--hashtableNext
|
|--10.04%--hashtableObjectPrefetchValue
|
--6.06%--prefetchNextBucketEntries
```
Conclusions:
The prefetching strategy appears to be working as intended, shifting the
performance bottleneck from data access to I/O operations.
The significant reduction in rdbSaveStringObject time suggests that
string objects(which are the values) are being accessed more
efficiently.
Signed-off-by: NadavGigi <nadavgigi102@gmail.com>
Instead of a dictEntry with pointers to key and value, the hashtable
has a pointer directly to the value (robj) which can hold an embedded
key and acts as a key-value in the hashtable. This minimizes the number
of pointers to follow and thus the number of memory accesses to lookup
a key-value pair.
Keys robj
hashtable
+-------+ +-----------------------+
| 0 | | type, encoding, LRU |
| 1 ------->| refcount, expire |
| 2 | | ptr |
| ... | | optional embedded key |
+-------+ | optional embedded val |
+-----------------------+
The expire timestamp (TTL) is also stored in the robj, if any. The expire
hash table points to the same robj.
Overview of changes:
* Replace dict with hashtable in kvstore (kvstore.c)
* Add functions for embedding key and expire in robj (object.c)
* When there's unused space, reserve an expire field to avoid realloting
it later if expire is added.
* Always reserve space for expire for large key names to avoid realloc
if it's set later.
* Update db functions (db.c)
* dbAdd, setKey and setExpire reallocate the object when embedding a key
* setKey does not increment the reference counter, since it would require
duplicating the object. This responsibility is moved to the caller.
* Remove logic for shared integer objects as values in the database. The keys
are now embedded in the objects, so all objects in the database need to be
unique. Thus, we can't use shared objects as values. Also delete test cases
for shared integers.
* Adjust various commands to the changes mentioned above.
* Adjust defrag code
* Improvement: Don't access the expires table before defrag has actually
reallocated the object.
* Adjust test cases that were using hard-coded sizes for dict when realloc
would happen, and some other adjustments in test cases.
* Adjust memory prefetch for new hash table implementation in IO-threading,
using new `hashtableIncrementalFind` API
* Adjust offloading of free() to IO threads: Object free to be done in main
thread while keeping obj->ptr offloading in IO-thread since the DB object is
now allocated by the main-thread and not by the IO-thread as it used to be.
* Let expireIfNeeded take an optional value, to avoid looking up the expires
table when possible.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
Remove the dict pointer argument to the `dictType` callbacks `keyDup`,
`keyCompare`, `keyDestructor` and `valDestructor`. This argument was
unused in all of the callback implementations.
The macros `dictFreeKey()` and `dictFreeVal()` are made internal to dict
and moved from dict.h to dict.c. They're also changed from macros to
static inline functions.
Signed-off-by: Qu Chen <quchen@amazon.com>
Follow up to https://github.com/valkey-io/valkey/pull/966, which didn't
update the kvstore tests. I'm not actually entirely clear why it fixes
it, but the consistency prevents the crash very reliably so will merge
it now and maybe see if Zhao has a better explanation.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Remove the unused value duplicate API from dict. It's unused in the codebase and introduces unnecessary overhead.
---------
Signed-off-by: Eran Liberty <eran.liberty@gmail.com>
This PR migrates all tests related to kvstore into new test framework as
part of the parent issue https://github.com/valkey-io/valkey/issues/428.
---------
Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>