futriix/src/crccombine.h
Josiah Carlson f4e10eee06
CRC64 perf improvements from Redis patches (#350)
Improve the performance of crc64 for large batches by processing large
number of bytes in parallel and combining the results.

## Performance 
* 53-73% faster on Xeon 2670 v0 @ 2.6ghz
* 2-2.5x faster on Core i3 8130U @ 2.2 ghz
* 1.6-2.46 bytes/cycle on i3 8130U
* likely >2x faster than crcspeed on newer CPUs with more resources than
a 2012-era Xeon 2670
* crc64 combine function runs in <50 nanoseconds typical with vector +
cache optimizations (~8 *microseconds* without vector optimizations, ~80
*microseconds without cache, the combination is extra effective)
* still single-threaded
* valkey-server test crc64 --help (requires `make distclean && make
SERVER_TEST=yes`)

---------

Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2024-04-30 19:32:01 -07:00

11 lines
321 B
C

#include <stdint.h>
/* mask types */
typedef unsigned long long v2uq __attribute__ ((vector_size (16)));
uint64_t gf2_matrix_times_vec2(uint64_t *mat, uint64_t vec);
void init_combine_cache(uint64_t poly, uint8_t dim);
uint64_t crc64_combine(uint64_t crc1, uint64_t crc2, uintmax_t len2, uint64_t poly, uint8_t dim);