110 Commits

Author SHA1 Message Date
Otmar Ertl
2e1e5b912a fixed compilation error when using clang as reported by michael-grunder 2018-03-14 21:00:06 +01:00
Otmar Ertl
10cefc8c95 use all 64 bits of the hash value instead of 63 2018-03-11 09:18:00 +01:00
Otmar Ertl
bef2df58fe made constant static 2018-03-10 20:44:20 +01:00
Otmar Ertl
ef4106f0b3 improved definition of HLL_Q 2018-03-10 20:22:42 +01:00
Otmar Ertl
4e2090e78b improved HyperLogLog cardinality estimation
based on method described in https://arxiv.org/abs/1702.01284
that does not rely on any magic constants
2018-03-10 20:13:21 +01:00
Otmar Ertl
3ec0a4480d replaced tab by spaces 2018-03-10 20:09:41 +01:00
antirez
3612de4906 Hyperloglog: refresh hdr variable correctly.
This is a fix for the #3819 improvements. The o->ptr may change because
of hllSparseSet() calls, so 'hdr' must be correctly re-fetched.
2017-12-22 11:26:31 +01:00
antirez
bb3ba76417 Hyperloglog: Support for PFMERGE sparse encoding as target.
This is a fix for #3819.
2017-12-22 11:01:27 +01:00
antirez
355302b300 Hyperloglog: refactoring of sparse/dense add function.
The commit splits the add functions into a set() and add() set of
functions, so that it's possible to set registers in an independent way
just having the index and count.

Related to #3819, otherwise a fix is not possible.
2017-12-22 11:00:22 +01:00
antirez
585cdf3a8f Fix isHLLObjectOrReply() to handle integer encoded strings.
Close #3766.
2017-07-11 12:44:59 +02:00
Salvatore Sanfilippo
8291a07501 Use ARM unaligned accesses ifdefs for SPARC as well. 2017-02-23 22:39:44 +08:00
Salvatore Sanfilippo
fcf931ac38 ARM: Avoid memcpy() in MurmurHash64A() if we are using 64 bit ARM.
However note that in architectures supporting 64 bit unaligned
accesses memcpy(...,...,8) is likely translated to a simple
word memory movement anyway.
2017-02-19 15:00:46 +00:00
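The memcpy() idiom mentioned in the commit above can be sketched as follows. This is a minimal illustration, not the code from MurmurHash64A(); the helper name `load_u64` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes from a possibly unaligned address.
 * Copying into a local uint64_t avoids dereferencing a misaligned
 * uint64_t pointer, which can fault on architectures without support
 * for unaligned accesses. */
uint64_t load_u64(const unsigned char *data) {
    uint64_t k;
    memcpy(&k, data, sizeof(k));
    return k;
}
```

On targets that do support unaligned 64 bit loads, compilers typically turn the fixed-size memcpy() into a single load, which is the point made in the commit message.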
Salvatore Sanfilippo
24dcc458f7 ARM: Fix 64 bit unaligned access in MurmurHash64A(). 2017-02-19 14:01:58 +00:00
antirez
ecd18441f5 Switch PFCOUNT to LogLog-Beta algorithm.
The new algorithm provides the same speed with a smaller error for
cardinalities in the range 0-100k. Before switching, the new and old
algorithm behavior was studied in detail in the context of
issue #3677. You can find a few graphs and motivations there.
2016-12-16 11:07:30 +01:00
antirez
325b3ad3a1 Use llroundl() before converting loglog-beta output to integer.
Otherwise for small cardinalities the algorithm will output something
like, for example, 4.99 for a cardinality of 5, which will be converted
to 4, producing a huge error.
2016-12-16 11:07:30 +01:00
Harish Murthy
37b97975c6 LogLog-Beta Algorithm support within HLL
Config option to use LogLog-Beta Algorithm for Cardinality
2016-12-16 11:07:30 +01:00
antirez
c15cac0d77 RDMF: More consistent define names. 2015-07-27 14:37:58 +02:00
antirez
8a893fa4cf RDMF: REDIS_OK REDIS_ERR -> C_OK C_ERR. 2015-07-26 23:17:55 +02:00
antirez
58844a7bfe RDMF: redisAssert -> serverAssert. 2015-07-26 15:29:53 +02:00
antirez
62b27ebc2a RDMF: OBJ_ macros for object related stuff. 2015-07-26 15:28:00 +02:00
antirez
fa26d3dd63 RDMF: use client instead of redisClient, like Disque. 2015-07-26 15:20:52 +02:00
antirez
6a424b5e36 RDMF (Redis/Disque merge friendliness) refactoring WIP 1. 2015-07-26 15:17:18 +02:00
antirez
5b6e4e866d Better read-only behavior for expired keys in slaves.
Key expiry on slaves is orchestrated by the master. Sometimes the master
will send the synthesized DEL to expire keys on the slave with a
non-trivial delay (when the key is not accessed, only the incremental
expiry algorithm will expire it in the background).

During that time, a key is logically expired, but slaves still return
the key if you GET (or whatever) it. This is bad behavior.

However we can't simply trust the slave view of the key, since we need
the master to be able to send write commands to update the slave data
set, and DELs should only happen when the key is expired in the master
in order to ensure consistency.

However 99.99% of the issues with this behavior arise when a client which
is not a master sends a read-only command. In this case we are safe and
can consider the key as non-existing.

This commit makes a few changes in order to make this sane:

1. lookupKeyRead() is modified in order to return NULL if the above
conditions are met.
2. Calls to lookupKeyRead() in commands actually writing to the data set
are replaced with calls to lookupKeyWrite().

There are redundant checks, so for example, if in "2" something was
overlooked, we should still be safe, since anyway, when the master
writes, the behavior is to ignore what expireIfNeeded()
returns.

This commit is related to #1768, #1770, #2131.
2014-12-10 16:10:21 +01:00
antirez
12a9ec9a84 Over 80 chars comment trimmed in pfcountCommand(). 2014-12-02 17:03:22 +01:00
antirez
2e94ffb1d1 Remove warnings and improve integer sign correctness. 2014-08-13 11:44:38 +02:00
antirez
0f5a822b92 PFSELFTEST: less false positives.
This is just a quick fix. Given the nature of the test, the right way to
fix it is to average the error of N runs, since otherwise it is always
possible to get a false positive with a bad run, and minimizing this
possibility too much means testing with error ranges that are too large.
2014-07-23 11:43:57 +02:00
Mike Trinkala
ac1a3c7340 Correct the HyperLogLog stale cache flag to prevent unnecessary computations.
Set the MSB as documented.
2014-05-18 07:26:26 -07:00
antirez
ca8f491e9c Speedup hllRawSum() processing 8 bytes per iteration.
The internal HLL raw encoding used by PFCOUNT when merging multiple keys
is aligned to 8 bits (1 byte per register) so we can exploit this to
improve performance by processing multiple bytes per iteration.

In benchmarks the new code was several times faster with HLLs with many
registers set to zero, while no slowdown was observed with populated
HLLs.
2014-04-17 18:05:27 +02:00
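One way to exploit the one-byte-per-register layout is sketched below for the zero-register count only. This is an assumption-laden illustration, not the actual hllRawSum() code: the function name is hypothetical and the register count `m` is assumed to be a multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: count zero-valued registers in a raw array
 * that stores one byte per register. Assumes m is a multiple of 8. */
size_t count_zero_registers(const uint8_t *regs, size_t m) {
    size_t ez = 0;
    for (size_t j = 0; j < m; j += 8) {
        uint64_t word;
        memcpy(&word, regs + j, 8);   /* load 8 registers at once */
        if (word == 0) {
            ez += 8;                  /* fast path: all 8 are zero */
        } else {
            /* slow path: inspect each register of the group */
            for (size_t k = 0; k < 8; k++)
                if (regs[j + k] == 0) ez++;
        }
    }
    return ez;
}
```

The fast path is taken often when most registers are still zero, which matches the commit's observation that the gain shows up on sparsely populated HLLs while populated ones see no slowdown.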
antirez
bb3241f788 Speedup SUM(2^-reg[m]) in HyperLogLog computation.
When the register is set to zero, we need to add 2^-0 to E, which is 1,
but it is faster to just add 'ez' at the end, which is the number of
registers set to zero, a value we need to compute anyway.
2014-04-17 17:53:20 +02:00
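The idea in the commit above can be sketched as follows. Both function names are hypothetical, registers are stored one per byte for simplicity, and register values are assumed to be below 64 so the shift is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive form: add 2^-reg for every register, zero ones included. */
double sum_naive(const uint8_t *regs, size_t m) {
    double E = 0;
    for (size_t j = 0; j < m; j++)
        E += 1.0 / (double)(1ULL << regs[j]);
    return E;
}

/* Optimized form: only count the zero registers in the loop and add
 * the count once at the end, since 2^-0 == 1. The count 'ez' is also
 * returned through ezp because the estimator needs it anyway. */
double sum_with_ez(const uint8_t *regs, size_t m, int *ezp) {
    double E = 0;
    int ez = 0;
    for (size_t j = 0; j < m; j++) {
        if (regs[j] == 0)
            ez++;
        else
            E += 1.0 / (double)(1ULL << regs[j]);
    }
    E += ez;
    *ezp = ez;
    return E;
}
```

Both functions return the same sum; the second skips the division for every zero register.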
antirez
e841ecc3df PFCOUNT support for multi-key union. 2014-04-17 17:32:59 +02:00
antirez
e70c3b6c9b HyperLogLog low level merge extracted from PFMERGE. 2014-04-17 17:08:43 +02:00
antirez
fc3426c687 HyperLogLog invalid representation error code set to INVALIDOBJ. 2014-04-16 09:10:30 +02:00
antirez
e105f2a645 PFDEBUG TODENSE added.
Converts HyperLogLogs from sparse to dense. Used for testing.
2014-04-16 09:05:42 +02:00
antirez
1dca87d69c User-defined switch point between sparse-dense HLL encodings. 2014-04-15 17:46:51 +02:00
antirez
2736ec0d0f PFSELFTEST improved with sparse encoding checks. 2014-04-15 10:10:38 +02:00
antirez
c5d86e9db9 PFDEBUG ENCODING added. 2014-04-14 19:35:00 +02:00
antirez
ddc1186dbf Set HLL_SPARSE_MAX to 3000.
After running a few benchmarks, 3000 looks like a reasonable value to
keep HLLs with a few thousand elements small while the CPU cost is
still not huge.

This covers all the cases where the dense representation would use N
orders of magnitude more space, like in the case of many HLLs with
cardinality of a few tens or hundreds.

It is not impossible that in the future this becomes user configurable,
however it is easy to pick an unreasonable value just looking at savings
in the space dimension without checking what happens in the time
dimension.
2014-04-14 16:15:55 +02:00
antirez
bff89bd0a3 Error message for invalid HLL objects unified. 2014-04-14 16:11:54 +02:00
antirez
cd73060972 PFMERGE fixed to work with sparse encoding. 2014-04-14 16:09:32 +02:00
antirez
4a43c113c5 Correctly replicate PFDEBUG GETREG.
Even if it is a debugging command, make sure that when it forces a
change in encoding, the command is propagated.
2014-04-14 15:57:19 +02:00
antirez
1e7f95441f Added assertion in hllSparseAdd() when promotion to dense occurs.
If we converted to dense, a register must be updated in the dense
representation.
2014-04-14 15:55:21 +02:00
antirez
f009069d7c hllSparseAdd(): speed optimization.
Mostly by reordering the opcode check conditionals by the frequency of
opcodes in larger sparse-encoded HLLs.
2014-04-14 15:42:05 +02:00
antirez
d2174e6c9b Detect corrupted sparse HLLs in hllSparseSum(). 2014-04-14 15:20:26 +02:00
antirez
a84b91d052 hllSparseAdd(): faster code removing conditional.
Bottleneck found while profiling. Big run time improvement observed when
testing after the change.
2014-04-14 12:58:46 +02:00
antirez
6c0f0eb21f Comment typo in hllSparseAdd(). first -> fits. 2014-04-14 12:12:53 +02:00
antirez
8670ab5e11 Merge adjacent VAL opcodes in hllSparseAdd().
As more values are added splitting ZERO or XZERO opcodes, try to merge
adjacent VAL opcodes if they have the same value.
2014-04-14 12:11:39 +02:00
antirez
cba3a04160 More robust HLL_SPARSE macros protecting 'p' with parens.
Now the macros will work with arguments such as "ptr+1".
2014-04-14 11:49:53 +02:00
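The precedence problem that parenthesizing a macro argument avoids can be sketched with two hypothetical macros. These are illustrations only, not the actual HLL_SPARSE macros.

```c
#include <stdint.h>

/* Hypothetical macros. Without parentheses around 'p', an argument
 * such as ptr+1 expands to (*ptr+1), i.e. (*ptr) + 1. */
#define FIRST_BYTE_UNSAFE(p) (*p)
/* With parentheses it expands to (*(ptr+1)), as intended. */
#define FIRST_BYTE_SAFE(p) (*(p))

/* Returns ptr[0] + 1 because of operator precedence. */
int second_byte_unsafe(const uint8_t *ptr) {
    return FIRST_BYTE_UNSAFE(ptr+1);
}

/* Returns ptr[1]. */
int second_byte_safe(const uint8_t *ptr) {
    return FIRST_BYTE_SAFE(ptr+1);
}
```

With a buffer starting with the bytes 10 and 20, the unsafe variant yields 11 while the safe one yields 20.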
antirez
d15cd39717 hllSparseAdd() opcode seek stop condition fixed. 2014-04-14 11:04:11 +02:00
antirez
c9ee98b388 Fixed error message generation in PFDEBUG GETREG.
Bulk length for registers was emitted too early, so if there was a bug
the reply looked like a long array with just one element, blocking the
client as a result.
2014-04-14 10:25:19 +02:00
antirez
70a3bcf3a3 Fixed memmove() count in hllSparseAdd(). 2014-04-14 09:40:07 +02:00