futriix/crccombine.h at e9b8970e72754461d92b8bf22055a58540878d59

gvsafronov/futriix

Josiah Carlson f4e10eee06

CRC64 perf improvements from Redis patches (#350)

Improve the performance of crc64 for large batches by processing a large
number of bytes in parallel and combining the results.

## Performance 
* 53-73% faster on Xeon 2670 v0 @ 2.6 GHz
* 2-2.5x faster on Core i3 8130U @ 2.2 GHz
* 1.6-2.46 bytes/cycle on i3 8130U
* likely >2x faster than crcspeed on newer CPUs with more resources than
a 2012-era Xeon 2670
* crc64 combine function runs in <50 nanoseconds typical with vector +
cache optimizations (~8 *microseconds* without vector optimizations, ~80
*microseconds* without cache; the combination is extra effective)
* still single-threaded
* `valkey-server test crc64 --help` (requires `make distclean && make
SERVER_TEST=yes`)

---------

Signed-off-by: Josiah Carlson <josiah.carlson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

2024-04-30 19:32:01 -07:00

11 lines

321 B

C


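#include <stdint.h>

/* mask types */
typedef unsigned long long v2uq __attribute__ ((vector_size (16)));

/* Multiply the GF(2) matrix `mat` by the bit vector `vec`. */
uint64_t gf2_matrix_times_vec2(uint64_t *mat, uint64_t vec);
/* Precompute the cache used by crc64_combine() for `poly` with `dim` bits. */
void init_combine_cache(uint64_t poly, uint8_t dim);
/* Combine crc1 with crc2, the CRC of the following block of len2 bytes. */
uint64_t crc64_combine(uint64_t crc1, uint64_t crc2, uintmax_t len2, uint64_t poly, uint8_t dim);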
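
The prototypes above only declare the combine step, so the following is a minimal, unoptimized sketch of the idea behind it, not the code from this commit. It follows the well-known zlib-style approach: a GF(2) matrix that advances a CRC over zero bits is squared repeatedly, so the CRC of a first block can be advanced past `len2` bytes in O(log len2) steps and then XORed with the second block's CRC. All `sketch_` names are hypothetical, and the polynomial constant (the reflected Jones polynomial), the zero initial value, and the absence of a final XOR are assumptions made for illustration. There is no vectorization and no cache here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DIM 64
/* Assumption: reflected form of the Jones polynomial 0xad93d23594c935a9. */
#define SKETCH_POLY UINT64_C(0x95ac9329ac4bc9b5)

/* Bit-at-a-time reflected CRC-64, initial value 0, no final XOR. */
uint64_t sketch_crc64(uint64_t crc, const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? SKETCH_POLY : 0);
    }
    return crc;
}

/* Multiply a GF(2) matrix (one entry per input bit) by a bit vector. */
uint64_t sketch_gf2_times(const uint64_t *mat, uint64_t vec) {
    uint64_t sum = 0;
    for (; vec; vec >>= 1, mat++)
        if (vec & 1) sum ^= *mat;
    return sum;
}

/* square = mat * mat over GF(2). */
void sketch_gf2_square(uint64_t *square, const uint64_t *mat) {
    for (int n = 0; n < SKETCH_DIM; n++)
        square[n] = sketch_gf2_times(mat, mat[n]);
}

/* CRC of block1 followed by block2, given both CRCs and block2's length. */
uint64_t sketch_crc64_combine(uint64_t crc1, uint64_t crc2, uintmax_t len2) {
    uint64_t even[SKETCH_DIM], odd[SKETCH_DIM];
    if (len2 == 0) return crc1;

    /* Operator that advances the CRC register by one zero bit. */
    odd[0] = SKETCH_POLY;
    for (int n = 1; n < SKETCH_DIM; n++) odd[n] = UINT64_C(1) << (n - 1);
    sketch_gf2_square(even, odd); /* two zero bits */
    sketch_gf2_square(odd, even); /* four zero bits */

    /* Advance crc1 by len2 zero bytes, one bit of len2 per squaring. */
    do {
        sketch_gf2_square(even, odd);
        if (len2 & 1) crc1 = sketch_gf2_times(even, crc1);
        len2 >>= 1;
        if (len2 == 0) break;
        sketch_gf2_square(odd, even);
        if (len2 & 1) crc1 = sketch_gf2_times(odd, crc1);
        len2 >>= 1;
    } while (len2 != 0);
    return crc1 ^ crc2;
}

/* Every split of a message must combine to the CRC of the whole message. */
void sketch_self_check(void) {
    const unsigned char msg[] = "123456789";
    uint64_t whole = sketch_crc64(0, msg, 9);
    for (size_t split = 0; split <= 9; split++) {
        uint64_t a = sketch_crc64(0, msg, split);
        uint64_t b = sketch_crc64(0, msg + split, 9 - split);
        assert(sketch_crc64_combine(a, b, 9 - split) == whole);
    }
}
```

Because the two block CRCs are computed independently, they can be produced in parallel and merged afterwards, which is the pattern the commit message describes. The sketch rebuilds its matrices on every call; a precomputed cache in the spirit of `init_combine_cache` would presumably avoid that repeated work.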