Instead of a dictEntry with pointers to key and value, the hashtable
has a pointer directly to the value (robj) which can hold an embedded
key and acts as a key-value in the hashtable. This minimizes the number
of pointers to follow and thus the number of memory accesses to lookup
a key-value pair.
Keys robj
hashtable
+-------+ +-----------------------+
| 0 | | type, encoding, LRU |
| 1 ------->| refcount, expire |
| 2 | | ptr |
| ... | | optional embedded key |
+-------+ | optional embedded val |
+-----------------------+
The expire timestamp (TTL) is also stored in the robj, if any. The expire
hash table points to the same robj.
Overview of changes:
* Replace dict with hashtable in kvstore (kvstore.c)
* Add functions for embedding key and expire in robj (object.c)
* When there's unused space, reserve an expire field to avoid realloting
it later if expire is added.
* Always reserve space for expire for large key names to avoid realloc
if it's set later.
* Update db functions (db.c)
* dbAdd, setKey and setExpire reallocate the object when embedding a key
* setKey does not increment the reference counter, since it would require
duplicating the object. This responsibility is moved to the caller.
* Remove logic for shared integer objects as values in the database. The keys
are now embedded in the objects, so all objects in the database need to be
unique. Thus, we can't use shared objects as values. Also delete test cases
for shared integers.
* Adjust various commands to the changes mentioned above.
* Adjust defrag code
* Improvement: Don't access the expires table before defrag has actually
reallocated the object.
* Adjust test cases that were using hard-coded sizes for dict when realloc
would happen, and some other adjustments in test cases.
* Adjust memory prefetch for new hash table implementation in IO-threading,
using new `hashtableIncrementalFind` API
* Adjust offloading of free() to IO threads: Object free to be done in main
thread while keeping obj->ptr offloading in IO-thread since the DB object is
now allocated by the main-thread and not by the IO-thread as it used to be.
* Let expireIfNeeded take an optional value, to avoid looking up the expires
table when possible.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Uri Yagelnik <uriy@amazon.com>
A cache-line aware hash table with a user-defined key-value entry type,
supporting incremental rehashing, scan, iterator, random sampling,
incremental lookup and more...
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
### Improve expired commands performance with IO threads
#### Background
In our IO threads architecture, IO threads allocate client argv's and
later when we free it after processCommand we offload its free to the IO
threads.
With jemalloc, it's crucial that the same thread that allocates memory
also frees it.
For some commands we modify the client's argv in the main thread during
command processing (for example in `SET EX` command we rewrite the
command to use absolute time for replication propagation).
#### Current issues
1. When commands are rewritten (e.g., expire commands), we store the
original argv
in `c->original_argv`. However, we're currently:
- Freeing new argv (allocated by main thread) in IO threads
- Freeing original argv (allocated by IO threads) in main thread
2. Currently, `c->original_argv` points to new array with old
objects, while `c->argv` has old array with new objects, making memory
free management complicated.
#### Changes
1. Refactored argv modification handling code to ensure consistency -
both array and objects are now either all new or all old
2. Moved original_argv cleanup to happen in resetClient after argv
cleanup
3. Modified IO threads code to properly handle original argv cleanup
when argv are modified.
#### Performance Impact
Benchmark with `SET EX` commands (650 clients, 512 byte value, 8 IO
threads):
- New implementation: **729,548 ops/sec**
- Old implementation: **633,243 ops/sec**
Representing a **~15%** performance improvement due to more efficient
memory handling.
---------
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: ranshid <88133677+ranshid@users.noreply.github.com>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
Fast_float is a C++ header-only library to parse doubles using SIMD
instructions. The purpose is to speed up sorted sets and other commands
that use doubles. A single-file copy of fast_float is included in this
repo. This introduces an optional dependency on a C++ compiler.
The use of fast_float is enabled at compile time using the make variable
`USE_FAST_FLOAT=yes`. It is disabled by default.
Fixes#1069.
---------
Signed-off-by: Parth Patel <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Parth <661497+parthpatel@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Roshan Swain <swainroshan001@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Migrate quicklist unit test to new unit test framework, and cleanup
remaining references of SERVER_TEST, parent ticket #428.
Closes#428.
Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
The current implementation of `sdssplitargs` does repeated `sdscatlen`
to build the parsed arguments, which isn't very efficient because it
does a lot of extra reallocations and moves through the sds code a lot.
It also typically results in memory overhead, because `sdscatlen`
over-allocates, which is usually not needed since args are usually not
modified after being created.
The new implementation of sdssplitargs does two passes, the first to
parse the argument to figure out the final length and the second to
actually copy the string. It's generally about 2x faster for larger
strings (~100 bytes), and about 20% faster for small strings (~10
bytes). This is generally faster since as long as everything is in the
CPU cache, it's going to be fast.
There are a couple of sanity tests, none existed before, as well as some
fuzzying which was used to find some bugs and also to do the
benchmarking. The original benchmarking code can be seen
6576aeb86a.
```
test_sdssplitargs_benchmark - unit/test_sds.c:530] Using random seed: 1729883235
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.44%, new:13039us, old:29930us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.58%, new:12057us, old:27771us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 59.18%, new:9048us, old:22165us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 54.61%, new:12381us, old:27278us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 51.17%, new:16012us, old:32793us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 49.18%, new:16041us, old:31563us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.40%, new:12450us, old:29930us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 56.49%, new:13066us, old:30031us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 58.75%, new:12744us, old:30894us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 52.44%, new:16885us, old:35504us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.57%, new:8107us, old:21659us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 62.12%, new:8320us, old:21966us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 45.23%, new:13960us, old:25487us
[test_sdssplitargs_benchmark - unit/test_sds.c:577] Improvement: 57.95%, new:9188us, old:21849us
```
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Introduce a `size_t` field into the rax struct to track allocation size.
Update the allocation size on rax insert and deletes.
Return the allocation size when `raxAllocSize` is called.
This size tracking is now used in MEMORY USAGE and MEMORY STATS in place
of the previous method based on sampling.
The module API allows to create sorted dictionaries, which are backed by
rax. Users now also get precise memory allocation for them (through
`ValkeyModule_MallocSizeDict`).
Fixes#677.
For the release notes:
* MEMORY USAGE and MEMORY STATS are now exact for streams, rather than
based on sampling.
---------
Signed-off-by: Guillaume Koenig <knggk@amazon.com>
Signed-off-by: Guillaume Koenig <106696198+knggk@users.noreply.github.com>
Co-authored-by: Joey <yzhaon@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This commit hopefully improves the formatting of the codebase by setting
ColumnLimit to 0 and hence stopping clang-format from trying to put as
much stuff in one line as possible.
This change enabled us to remove most of `clang-format off` directives
and fixed a bunch of lines that looked like this:
```c
#define KEY \
VALUE /* comment */
```
Additionally, one pair of `clang-format off` / `clang-format on` had
`clang-format off` as the second comment and hence didn't enable the
formatting for the rest of the file. This commit addresses this issue as
well.
Please tell me if anything in the changes seem off. If everything is
fine, I will add this commit to `.git-blame-ignore-revs` later.
---------
Signed-off-by: Mikhail Koviazin <mikhail.koviazin@aiven.io>
This PR migrates the tests related to dict into new test framework as
part of #428.
Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
This PR migrates the tests related to listpack into new test framework
as part of #428.
Signed-off-by: haoqixu <hq.xu0o0@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Deletes zipmapSet, zipmapGet, etc. Only keep iterator and validate
integrity, what we use when loading an old RDB file.
Adjust unit tests to not use zipmapSet, etc.
Solves a build failure where when compiling with fortify source.
---------
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Migrate zipmap unit test to new unit test framework, parent ticket #428
.
---------
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Co-authored-by: hwware <wen.hui.ware@gmail.com>
sds type should be determined based on the size of the underlying
buffer, not the logical length of the sds. Currently we truncate the
alloc field in case buffer is larger than we can handle. It leads to a
mismatch between alloc field and the actual size of the buffer. Even
considering that alloc doesn't include header size and the null
terminator.
It also leads to a waste of memory with jemalloc. For example, let's
consider creation of sds of length 253. According to the length, the
appropriate type is SDS_TYPE_8. But we allocate `253 + sizeof(struct
sdshdr8) + 1` bytes, which sums to 257 bytes. In this case jemalloc
allocates buffer from the next size bucket. With current configuration
on Linux it's 320 bytes. So we end up with 320 bytes buffer, while we
can't address more than 255.
The same happens with other types and length close enough to the
appropriate powers of 2.
The downside of the adjustment is that with allocators that do not
allocate larger than requested chunks (like GNU allocator), we switch to
a larger type "too early". It leads to small waste of memory.
Specifically: sds of length 31 takes 35 bytes instead of 33 (2 bytes
wasted) sds of length 255 takes 261 bytes instead of 259 (2 bytes
wasted) sds of length 65,535 takes 65,545 bytes instead of 65,541 (4
bytes wasted) sds of length 4,294,967,295 takes 4,294,967,313 bytes
instead of 4,294,967,305 (8 bytes wasted)
---------
Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
The replica sends its version when initiating replication, in
pipeline with other REPLCONF commands.
The primary stores it in the client struct. Other fields are made
smaller to avoid making the client struct consume more memory.
Fixes#414.
---------
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Issue #428.
Moved the SERVER_TEST block from ziplist.c into unit tests in
test_ziplist.c. I left the benchmark related tasks alone in their own
test, as I am not sure what to do with them.
Some of the assertions are a little vague/useless, but I will try to
refine them.
---------
Signed-off-by: Mason Hall <hallmason17@gmail.com>
This is the actual PR which is created to migrate all tests related to
zmalloc into new test framework as part of the parent issue
https://github.com/valkey-io/valkey/issues/428.
Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
This PR migrates all tests related to endianconv into new test framework
as part of the parent issue https://github.com/valkey-io/valkey/issues/428.
---------
Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
This migrates unit tests related to sha1 to new framework, ref: #428.
---------
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
This patch migrates all tests in sds.c into new test framework as part
of the parent issue https://github.com/valkey-io/valkey/issues/428.
---------
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
This PR migrates all tests related to kvstore into new test framework as
part of the parent issue https://github.com/valkey-io/valkey/issues/428.
---------
Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
This PR migrates all tests related to util into new test framework as
part of the parent issue https://github.com/valkey-io/valkey/issues/428.
---------
Signed-off-by: Karthick Ariyaratnam <karthyuom@gmail.com>
The core idea was to take a lot of the stuff from the C unity framework
and adapt it a bit here. Each file in the `unit` directory that starts
with `test_` is automatically assumed to be a test suite. Within each
file, all functions that start with `test_` are assumed to be a test.
See unit/README.md for details about the implementation.
Instead of compiling basically a net new binary, the way the tests are
compiled is that the main valkey server is compiled as a static archive,
which we then compile the individual test files against to create a new
test executable. This is not all that important now, other than it makes
the compilation simpler, but what it will allow us to do is overwrite
functions in the archive to enable mocking for cross compilation unit
functions. There are also ways to enable mocking from within the same
compilation unit, but I don't know how important this is.
Tests are also written in one of two styles:
1. Including the header file and directly calling functions from the
archive.
2. Importing the original file, and then calling the functions. This
second approach is cool because we can call static functions. It won't
mess up the archive either.
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>