Improve multithreaded performance with memory prefetching (#861)

This PR utilizes the IO threads to execute commands in batches, allowing
us to prefetch the dictionary data in advance.

After making the IO threads asynchronous and offloading more work to
them in the first 2 PRs, the `lookupKey` function becomes a main
bottleneck, taking about 50% of the main-thread time (tested with the
SET command). This is because the Valkey dictionary is a straightforward
but inefficient chained hash implementation. While traversing the hash
linked lists, every access to a `dictEntry` structure, a pointer to a
key, or a value object requires, with high probability, an expensive
external memory access.

### Memory Access Amortization

Memory Access Amortization (MAA) is a technique designed to optimize the
performance of dynamic data structures by reducing the impact of memory
access latency. It is applicable when multiple operations need to be
executed concurrently. The principle behind it is that for certain
dynamic data structures, executing operations in a batch is more
efficient than executing each one separately.

Rather than executing operations sequentially, this approach interleaves
the execution of all operations. This is done in such a way that
whenever a memory access is required during an operation, the program
prefetches the necessary memory and transitions to another operation.
This ensures that when one operation is blocked awaiting memory access,
other memory accesses are executed in parallel, thereby reducing the
average access latency.

We applied this method in the development of `dictPrefetch`, which takes
as parameters a vector of keys and dictionaries. It ensures that all
memory addresses required to execute dictionary operations for these
keys are loaded into the L1-L3 caches before the commands are executed.
Essentially, `dictPrefetch` is an interleaved execution of `dictFind`
for all the keys.
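
The interleaving described above can be sketched on a toy chained hash
table. This is only an illustration of the technique, not the code from
this PR: the `table`, `entry`, `lookup` and `batch_find` names are
hypothetical, and `__builtin_prefetch` is assumed to be available (GCC
and Clang provide it). Each lookup advances one step at a time, issues a
prefetch for the memory it will need next, and then yields to the next
lookup in the batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8   /* toy table size, must be a power of two */
#define MAX_BATCH 16 /* illustrative upper bound on a batch */

/* Hypothetical toy types; these are not Valkey's dict structures. */
typedef struct entry {
    const char *key;
    int value;
    struct entry *next;
} entry;

typedef struct table {
    entry *buckets[NBUCKETS];
} table;

static size_t hash_str(const char *s) {
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h & (NBUCKETS - 1);
}

/* Insert at the head of the bucket chain (entries are never freed here). */
static void table_add(table *t, const char *key, int value) {
    entry *e = malloc(sizeof(*e));
    if (!e) abort();
    size_t i = hash_str(key);
    e->key = key;
    e->value = value;
    e->next = t->buckets[i];
    t->buckets[i] = e;
}

/* Per-key state of one in-flight lookup. */
typedef enum { ST_BUCKET, ST_ENTRY, ST_KEY, ST_DONE } lookup_state;

typedef struct lookup {
    const char *key; /* input: key to search for */
    entry **bucket;  /* bucket slot for this key */
    entry *cur;      /* entry currently being examined */
    entry *found;    /* output: matching entry, or NULL */
    lookup_state state;
} lookup;

/* Advance all lookups round-robin; every step that is about to touch new
 * memory prefetches it first and moves on to the next lookup. */
static void batch_find(table *t, lookup *ops, size_t n) {
    assert(n <= MAX_BATCH);
    size_t remaining = n;
    for (size_t i = 0; i < n; i++) {
        ops[i].bucket = &t->buckets[hash_str(ops[i].key)];
        ops[i].cur = NULL;
        ops[i].found = NULL;
        ops[i].state = ST_BUCKET;
        __builtin_prefetch(ops[i].bucket);
    }
    while (remaining > 0) {
        for (size_t i = 0; i < n; i++) {
            lookup *op = &ops[i];
            switch (op->state) {
            case ST_BUCKET: /* read the bucket slot, prefetch first entry */
                op->cur = *op->bucket;
                if (op->cur) {
                    __builtin_prefetch(op->cur);
                    op->state = ST_ENTRY;
                } else {
                    op->state = ST_DONE;
                    remaining--;
                }
                break;
            case ST_ENTRY: /* read the entry, prefetch its key bytes */
                __builtin_prefetch(op->cur->key);
                op->state = ST_KEY;
                break;
            case ST_KEY: /* compare keys, then stop or follow the chain */
                if (strcmp(op->cur->key, op->key) == 0) {
                    op->found = op->cur;
                    op->state = ST_DONE;
                    remaining--;
                } else if (op->cur->next) {
                    op->cur = op->cur->next;
                    __builtin_prefetch(op->cur);
                    op->state = ST_ENTRY;
                } else {
                    op->state = ST_DONE;
                    remaining--;
                }
                break;
            case ST_DONE:
                break;
            }
        }
    }
}
```

After `batch_find` returns, each `ops[i].found` points at the matching
entry or is NULL. In this sketch the walk also produces the result; a
pure prefetcher would discard it and let the later, ordinary lookup hit
warm caches.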


**Implementation details**

When the main thread iterates over the `clients-pending-io-read` list,
it creates a batch of up to 16 commands from clients with
ready-to-execute commands (i.e., clients for which the IO thread has
parsed the commands). First, the commands' argv, which were allocated by
the IO thread, are prefetched into the main thread's L1 cache. Next, all
the dict entries and values required for the commands are prefetched
from the dictionary. Only then are the commands executed.
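
The batching flow above might look roughly like the following. This is a
hedged sketch, not the PR's code: `command`, `cmd_batch`,
`cmd_batch_add` and `cmd_batch_flush` are hypothetical names, flushing
as soon as the batch is full is an assumption, and execution is reduced
to setting a flag. `__builtin_prefetch` is again assumed to be available
(GCC and Clang).

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 16 /* batch limit mentioned in the text */

/* Hypothetical stand-in for a parsed client command. */
typedef struct command {
    const char **argv; /* argument vector, allocated by another thread */
    int argc;
    int executed;      /* set once the command has "run" */
} command;

typedef struct cmd_batch {
    command *cmds[BATCH_SIZE];
    size_t count;   /* commands currently queued */
    size_t flushes; /* number of batches processed so far */
} cmd_batch;

/* Process the queued commands in three phases, then reset the batch. */
static void cmd_batch_flush(cmd_batch *b) {
    if (b->count == 0) return;
    /* Phase 1: pull each command's argv toward this thread's cache. */
    for (size_t i = 0; i < b->count; i++) {
        __builtin_prefetch(b->cmds[i]->argv);
    }
    /* Phase 2: the dictionary prefetch for all keys would run here. */
    /* Phase 3: only now execute the commands. */
    for (size_t i = 0; i < b->count; i++) {
        b->cmds[i]->executed = 1;
    }
    b->count = 0;
    b->flushes++;
}

/* Queue one ready command; flush when the batch is full (an assumption). */
static void cmd_batch_add(cmd_batch *b, command *c) {
    assert(b->count < BATCH_SIZE);
    b->cmds[b->count++] = c;
    if (b->count == BATCH_SIZE) cmd_batch_flush(b);
}
```

A caller would add every ready command with `cmd_batch_add` and call
`cmd_batch_flush` once more at the end of the iteration to process a
partially filled batch.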

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

2024-08-26 21:10:44 -07:00

| Name | Last commit | Date |
|---|---|---|
| assets | Introduce enable-debug-assert to enable/disable debug asserts at runtime (#584) | 2024-05-31 22:50:08 -07:00 |
| cluster | | 2024-08-14 09:20:36 -07:00 |
| helpers | Improve dual channel replication stability and fix compatibility issues (#804) | 2024-07-25 09:34:39 -07:00 |
| integration | Revert repl backlog size back to 1mb for dual channel tests (#934) | 2024-08-22 15:35:28 -07:00 |
| modules | | 2024-08-14 09:20:36 -07:00 |
| rdma | Introduce Valkey Over RDMA transport (experimental) (#477) | 2024-07-15 14:04:22 +02:00 |
| sentinel | Fix Error in Daily CI -- reply-schemas-validator (#922) | 2024-08-21 09:36:02 -04:00 |
| support | Skip tracking clients OOM test when I/O threads are enabled (#764) | 2024-08-21 17:02:57 -07:00 |
| tmp | minor fixes to the new test suite, html doc updated | 2010-05-14 18:48:33 +02:00 |
| unit | Improve multithreaded performance with memory prefetching (#861) | 2024-08-26 21:10:44 -07:00 |
| instances.tcl | Make runtest-cluster support --io-threads option (#933) | 2024-08-22 11:21:06 -04:00 |
| README.md | Introduce a minimal debugger for .tcl integration test suite. (#683) | 2024-06-25 10:24:53 -07:00 |
| test_helper.tcl | Skip tracking clients OOM test when I/O threads are enabled (#764) | 2024-08-21 17:02:57 -07:00 |

README.md

Valkey Test Suite

Overview

Integration tests are written in Tcl, a high-level, general-purpose, interpreted, dynamic programming language. `runtest` is the main entry point for running integration tests. For example, to run a single test:

./runtest --single unit/your_test_name
# For additional arguments, you may refer to the `runtest` script itself.

The normal execution mode of the test suite involves starting and manipulating local valkey-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the `--host` and `--port` parameters. When executing against an external server, tests tagged `external:skip` are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations:

| Option | Impact |
|---|---|
| `--singledb` | Only use database 0, don't assume others are supported. |
| `--ignore-encoding` | Skip all checks for specific encoding. |
| `--ignore-digest` | Skip key value digest validations. |
| `--cluster-mode` | Run in strict Valkey Cluster compatibility mode. |
| `--large-memory` | Enables tests that consume more than 100 MB. |

Debugging

You can set a breakpoint and invoke a minimal debugger using the `bp` function.

... your test code before break-point
bp 1
... your test code after break-point

`bp 1` hands the Tcl interpreter back to the developer, letting you interactively print local variables (through `puts`), run functions, and so forth. `bp` takes a single argument (`1` in the example above) that labels the breakpoint with a string. Labels are printed when breakpoints are hit, so you can identify which breakpoint was triggered. Breakpoints can be skipped by setting the global variable `::bp_skip` and providing the labels you want to skip.

The minimal debugger comes with the following predefined functions:

- Press `c` to continue past the breakpoint.
- Press `i` to print local variables.

Tags

Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.

Tags can be applied in different context levels:

- `start_server` context
- `tags` context that bundles several tests together
- A single test context

The following compatibility and capability tags are currently used:

| Tag | Indicates |
|---|---|
| `external:skip` | Not compatible with external servers. |
| `cluster:skip` | Not compatible with `--cluster-mode`. |
| `large-memory` | Test that requires more than 100 MB. |
| `tls:skip` | Not compatible with `--tls`. |
| `needs:repl` | Uses replication and needs to be able to `SYNC` from server. |
| `needs:debug` | Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`). |
| `needs:pfdebug` | Uses the `PFDEBUG` command. |
| `needs:config-maxmemory` | Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc. |
| `needs:config-resetstat` | Uses `CONFIG RESETSTAT` to reset statistics. |
| `needs:reset` | Uses `RESET` to reset client connections. |
| `needs:save` | Uses `SAVE` or `BGSAVE` to create an RDB file. |

When using an external server (`--host` and `--port`), filtering using the `external:skip` tag is done automatically.

When using `--cluster-mode`, filtering using the `cluster:skip` tag is done automatically.

When not using `--large-memory`, filtering using the `largemem:skip` tag is done automatically.

In addition, it is possible to specify additional configuration. For example, to run tests on a server that does not permit `SYNC`, use:

./runtest --host <host> --port <port> --tags -needs:repl