Optimize PFCOUNT, PFMERGE command by SIMD acceleration (#1293)

This PR optimizes the performance of HyperLogLog commands (PFCOUNT,
PFMERGE) by adding AVX2 fast paths.

Two AVX2 functions are added for conversion between the raw representation
and the dense representation. They are 15 to 30 times faster than the scalar
implementation. Note that the sparse representation is not accelerated.
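
For reference, the scalar conversion that such AVX2 functions replace can be sketched as follows. This is a minimal illustration, not the code from the PR. It assumes the default configuration (16384 registers of 6 bits each), a dense layout in which register i starts at bit 6*i counted from the least significant bit of the first byte, and a raw layout of one byte per register. The function names `dense_to_raw_scalar` and `raw_to_dense_scalar` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HLL_BITS 6
#define HLL_REGISTERS 16384
#define HLL_DENSE_SIZE (HLL_REGISTERS * HLL_BITS / 8)
#define HLL_REGISTER_MAX ((1 << HLL_BITS) - 1)

/* With the default configuration the dense form is exactly 12288 bytes. */
static_assert(HLL_DENSE_SIZE == 12288, "unexpected dense size");

/* Unpack: dense (6 bits per register) -> raw (one byte per register).
 * Assumed layout: register i occupies bits [6*i, 6*i + 5]. */
void dense_to_raw_scalar(const uint8_t *dense, uint8_t *raw) {
    for (size_t i = 0; i < HLL_REGISTERS; i++) {
        size_t bit = i * HLL_BITS;
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit & 7);
        unsigned b0 = dense[byte];
        /* Guard the read of the following byte at the end of the buffer. */
        unsigned b1 = (byte + 1 < HLL_DENSE_SIZE) ? dense[byte + 1] : 0;
        raw[i] = (uint8_t)(((b0 >> shift) | (b1 << (8 - shift))) & HLL_REGISTER_MAX);
    }
}

/* Pack: raw (one byte per register) -> dense (6 bits per register). */
void raw_to_dense_scalar(const uint8_t *raw, uint8_t *dense) {
    memset(dense, 0, HLL_DENSE_SIZE);
    for (size_t i = 0; i < HLL_REGISTERS; i++) {
        size_t bit = i * HLL_BITS;
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit & 7);
        unsigned v = raw[i] & HLL_REGISTER_MAX;
        dense[byte] |= (uint8_t)(v << shift);
        /* A 6-bit value spills into the next byte only when shift is 4 or 6. */
        if (shift > 2) dense[byte + 1] |= (uint8_t)(v >> (8 - shift));
    }
}
```

Each register is handled one at a time with shifts and masks, which is the kind of loop that a vectorized version can process many registers per instruction.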

AVX2 fast paths are enabled when the CPU supports AVX2 (checked at
runtime) and the hyperloglog configuration is default (HLL_REGISTERS ==
16384 && HLL_BITS == 6).
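
The enabling condition above could be expressed roughly as in the sketch below. It is an illustration under stated assumptions, not the PR's code: `cpu_has_avx2`, `use_avx2_path`, and the `simd_enabled` flag (a stand-in for a test toggle) are hypothetical names, and the CPU check relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is only available on x86 targets.

```c
#include <assert.h>
#include <stdbool.h>

#define HLL_BITS 6
#define HLL_REGISTERS 16384

/* Sanity check for this sketch: a register must fit in one raw byte. */
static_assert(HLL_BITS <= 8, "register does not fit in a byte");

/* Hypothetical toggle that lets tests force the scalar path. */
bool simd_enabled = true;

/* Runtime CPU feature check; returns false on non-x86 or unknown compilers. */
bool cpu_has_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

/* The fast path is taken only with the default HyperLogLog configuration,
 * when the toggle is on, and when the CPU reports AVX2 support. */
bool use_avx2_path(void) {
    return simd_enabled && HLL_REGISTERS == 16384 && HLL_BITS == 6 && cpu_has_avx2();
}
```

Checking the CPU at runtime rather than at compile time lets a single binary run on machines with and without AVX2, falling back to the scalar code where needed.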

`PFDEBUG SIMD (ON|OFF)` subcommand is added for unit tests. A new TCL
unit test checks that the results produced by non-AVX2 and AVX2
implementations are exactly equal.

When merging 3 dense hll structures, the benchmark shows a 12x speedup
compared to the scalar version.

```
pfcount key1 key2 key3
pfmerge keyall key1 key2 key3
```

```
======================================================================================================
Type             Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------
PFCOUNT-scalar    5665.56        35.29839        32.25500        63.99900        67.58300       608.60
PFCOUNT-avx2     72377.83         2.75834         2.67100         5.34300         6.81500      7774.96
------------------------------------------------------------------------------------------------------
PFMERGE-scalar    9851.29        20.28806        20.09500        36.86300        39.16700       615.71
PFMERGE-avx2    125621.89         1.59126         1.55100         3.11900         4.70300     15702.74
------------------------------------------------------------------------------------------------------

scalar: valkey:unstable  2df56d87c0ebe802f38e8922bb2ea1e4ca9cfa76
avx2:   Nugine:hll-simd  8f9adc34021080d96e60bd0abe06b043f3ed0275

CPU:    13th Gen Intel® Core™ i9-13900H × 20
Memory: 32.0 GiB
OS:     Ubuntu 22.04.5 LTS
```

Experiment repo: https://github.com/Nugine/redis-hyperloglog
Benchmark script:
https://github.com/Nugine/redis-hyperloglog/blob/main/scripts/memtier.sh
Algorithm:
https://github.com/Nugine/redis-hyperloglog/blob/main/cpp/bench.cpp

---------

Signed-off-by: Xuyang Wang <xuyangwang@link.cuhk.edu.cn>

2024-12-02 19:40:38 +01:00

Name	Last commit	Date
assets	Add configuration hide-user-data-from-log to hide user data from server logs (#877)	2024-09-02 09:50:36 -07:00
cluster	Fix timing issue in cluster-shards tests (#1243)	2024-11-02 19:51:14 +08:00
helpers	Improve dual channel replication stability and fix compatibility issues (#804)	2024-07-25 09:34:39 -07:00
integration	Split dual-channel COB overrun tests to separate servers (#1374)	2024-12-01 21:33:43 +08:00
modules	Add CMake build system for valkey (#1196)	2024-11-07 18:01:37 -08:00
rdma	RDMA builtin support (#1209)	2024-11-29 11:13:34 +01:00
sentinel	Fix Error in Daily CI -- reply-schemas-validator (#922)	2024-08-21 09:36:02 -04:00
support	Manual failover vote is not limited by two times the node timeout (#1305)	2024-11-19 11:17:20 -05:00
tmp	minor fixes to the new test suite, html doc updated	2010-05-14 18:48:33 +02:00
unit	Optimize PFCOUNT, PFMERGE command by SIMD acceleration (#1293)	2024-12-02 19:40:38 +01:00
CMakeLists.txt	Add CMake build system for valkey (#1196)	2024-11-07 18:01:37 -08:00
instances.tcl	Make runtest-cluster support --io-threads option (#933)	2024-08-22 11:21:06 -04:00
README.md	Delete TLS.md and update README.md about tests (#960)	2024-08-28 21:17:04 +02:00
test_helper.tcl	Integrating fast_float to optionally replace strtod (#1260)	2024-11-25 10:01:43 +01:00

README.md

Valkey Test Suite

Overview

Integration tests are written in Tcl, a high-level, general-purpose, interpreted, dynamic programming language. runtest is the main entry point for running integration tests. For example, to run a single test:

./runtest --single unit/your_test_name
# For additional arguments, you may refer to the `runtest` script itself.

The normal execution mode of the test suite involves starting and manipulating local valkey-server instances, inspecting process state, log files, etc.

The test suite also supports execution against an external server, which is enabled using the --host and --port parameters. When executing against an external server, tests tagged external:skip are skipped.

There are additional runtime options that can further adjust the test suite to match different external server configurations. All options are listed by ./runtest --help. The following table is just a subset of the options:

Option	Impact
`--singledb`	Only use database 0, don't assume others are supported.
`--ignore-encoding`	Skip all checks for specific encoding.
`--ignore-digest`	Skip key value digest validations.
`--cluster-mode`	Run in strict Valkey Cluster compatibility mode.
`--large-memory`	Enable tests that consume more than 100MB.
`--tls`	Run tests with TLS. See below.
`--tls-module`	Run tests with TLS, when TLS support is built as a module.
`--help`	Displays the full set of options.

Running with TLS requires the following preparations:

Build Valkey with TLS support, e.g. using make BUILD_TLS=yes, or make BUILD_TLS=module.
Run ./utils/gen-test-certs.sh to generate a root CA and a server certificate.
Install TLS support for TCL, e.g. the tcl-tls package on Debian/Ubuntu.

Additional tests

Not all tests are included in the ./runtest scripts. Some additional entry points are provided by the following scripts, which support a subset of the options listed above:

./runtest-cluster runs more extensive tests for Valkey Cluster. Some cluster tests are included in ./runtest, but not all.
./runtest-sentinel runs tests of Valkey Sentinel.
./runtest-moduleapi runs tests of the module API.

Debugging

You can set a breakpoint and invoke a minimal debugger using the bp function.

... your test code before break-point
bp 1
... your test code after break-point

Calling bp 1 hands the Tcl interpreter back to the developer, so you can interactively print local variables (through puts), run functions, and so forth. bp takes a single argument, 1 in the case above, which labels the breakpoint with a string. Labels are printed when breakpoints are hit, so you can identify which breakpoint was triggered. Breakpoints can be skipped by setting the global variable ::bp_skip and providing the labels you want to skip.

The minimal debugger comes with the following predefined functions.

Press c to continue past the breakpoint.
Press i to print local variables.

Tags

Tag	Indicates
`external:skip`	Not compatible with external servers.
`cluster:skip`	Not compatible with `--cluster-mode`.
`large-memory`	Requires more than 100MB of memory.
`tls:skip`	Not compatible with `--tls`.
`needs:repl`	Uses replication and needs to be able to `SYNC` from server.
`needs:debug`	Uses the `DEBUG` command or other debugging focused commands (like `OBJECT REFCOUNT`).
`needs:pfdebug`	Uses the `PFDEBUG` command.
`needs:config-maxmemory`	Uses `CONFIG SET` to manipulate memory limit, eviction policies, etc.
`needs:config-resetstat`	Uses `CONFIG RESETSTAT` to reset statistics.
`needs:reset`	Uses `RESET` to reset client connections.
`needs:save`	Uses `SAVE` or `BGSAVE` to create an RDB file.
