27398 Commits

Author SHA1 Message Date
antirez
433ce7f85c Changed HyperLogLog hash seed to a non-zero value.
Using a seed of zero has the side effect of having the empty string
hashing to what is a very special case in the context of HyperLogLog: a
very long run of zeroes.

This did not influenced the correctness of the result with 16k registers
because of the harmonic mean, but still it is inconvenient that a so
obvious value maps to a so special hash.

The seed 0xadc83b19 is used instead, which is the first 64 bits of the
SHA1 of the empty string.

Reference: issue #1657.
2014-04-04 09:36:32 +02:00
antirez
038f6c8221 Initial HyperLogLog tests. 2014-04-03 22:16:05 +02:00
antirez
352208ff68 Initial HyperLogLog tests. 2014-04-03 22:16:05 +02:00
antirez
f1fb45446e Return "WRONGTYPE" error on PF* type mismatch. 2014-04-03 22:10:20 +02:00
antirez
d2ca4bb62d Return "WRONGTYPE" error on PF* type mismatch. 2014-04-03 22:10:20 +02:00
antirez
84f8741390 Fix PFADD infinite loop.
We need to guarantee that the last bit is 1, otherwise an element may
hash to just zeroes with probability 1/(2^64) and trigger an infinite
loop.

See issue #1657.
2014-04-03 19:31:26 +02:00
antirez
349c978189 Fix PFADD infinite loop.
We need to guarantee that the last bit is 1, otherwise an element may
hash to just zeroes with probability 1/(2^64) and trigger an infinite
loop.

See issue #1657.
2014-04-03 19:31:26 +02:00
antirez
2fbf9b25f9 Make hll-gnuplot-graph.rb callable from cli. 2014-04-03 16:38:11 +02:00
antirez
b612affba3 Make hll-gnuplot-graph.rb callable from cli. 2014-04-03 16:38:11 +02:00
antirez
498b56470c Remove HyperLogLog type checking duplicated code. 2014-04-03 13:20:34 +02:00
antirez
ce637b2fef Remove HyperLogLog type checking duplicated code. 2014-04-03 13:20:34 +02:00
antirez
f13e271c83 PFGETREG added for testing purposes.
The new command allows to get a dump of the registers stored
into an HyperLogLog data structure for testing / debugging purposes.
2014-04-03 10:45:30 +02:00
antirez
aaaed66c56 PFGETREG added for testing purposes.
The new command allows to get a dump of the registers stored
into an HyperLogLog data structure for testing / debugging purposes.
2014-04-03 10:45:30 +02:00
antirez
9c81d09f9a PFCOUNT: unshare the object when cached cardinality is modified. 2014-04-03 10:37:32 +02:00
antirez
9682295f68 PFCOUNT: unshare the object when cached cardinality is modified. 2014-04-03 10:37:32 +02:00
antirez
a5ff183cd0 PFSELFTEST improved to test the approximation error. 2014-04-03 10:18:31 +02:00
antirez
be9860d0e9 PFSELFTEST improved to test the approximation error. 2014-04-03 10:18:31 +02:00
antirez
478cb4eebc hll-gnuplot-graph.rb improved with new filter.
The function to generate graphs is also more flexible as now includes
step and max value. The step of the samples generation function is no
longer limited to min step of 1000.
2014-04-02 12:30:11 +02:00
antirez
cf34507b87 hll-gnuplot-graph.rb improved with new filter.
The function to generate graphs is also more flexible as now includes
step and max value. The step of the samples generation function is no
longer limited to min step of 1000.
2014-04-02 12:30:11 +02:00
antirez
896ca14c29 HyperLogLog: added magic / version.
This will allow future changes like compressed representations.
Currently the magic is not checked for performance reasons but this may
change in the future, for example if we add new types encoded in strings
that may have the same size of HyperLogLogs.
2014-04-02 09:58:47 +02:00
antirez
096b5e921e HyperLogLog: added magic / version.
This will allow future changes like compressed representations.
Currently the magic is not checked for performance reasons but this may
change in the future, for example if we add new types encoded in strings
that may have the same size of HyperLogLogs.
2014-04-02 09:58:47 +02:00
Salvatore Sanfilippo
ffab888083 Merge pull request #1646 from raydog/unstable
Change HLL* to PF* in a few spots
2014-04-02 08:12:27 +02:00
Salvatore Sanfilippo
cef505d739 Merge pull request #1646 from raydog/unstable
Change HLL* to PF* in a few spots
2014-04-02 08:12:27 +02:00
Raymond Myers
8c4ca89605 Fixed pfadd/pfcount commands emitting hll* events instead of pf* events 2014-04-01 14:59:13 -07:00
Raymond Myers
bf066c875f Fixed pfadd/pfcount commands emitting hll* events instead of pf* events 2014-04-01 14:59:13 -07:00
Raymond Myers
5f8b9118aa Change HLL* to PF* in error messages 2014-04-01 14:54:31 -07:00
Raymond Myers
f0868e080d Change HLL* to PF* in error messages 2014-04-01 14:54:31 -07:00
antirez
8e203cede9 Include redis.h before other stuff in hyperloglog.c.
Otherwise fmacros.h is included later and this may break compilation on
different systems.
2014-04-01 15:52:15 +02:00
antirez
4ab162a559 Include redis.h before other stuff in hyperloglog.c.
Otherwise fmacros.h is included later and this may break compilation on
different systems.
2014-04-01 15:52:15 +02:00
antirez
4e25a641e1 HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-03-31 22:48:01 +02:00
antirez
5afcca34ce HyperLogLog API prefix modified from "P" to "PF".
Using both the initials of Philippe Flajolet instead of just "P".
2014-03-31 22:48:01 +02:00
antirez
254b945388 Makefile.dep updated with hyperloglog.o deps. 2014-03-31 19:51:34 +02:00
antirez
ba4e20835a Makefile.dep updated with hyperloglog.o deps. 2014-03-31 19:51:34 +02:00
antirez
97a6ccdb7c HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-03-31 19:29:40 +02:00
antirez
e887c62e45 HyperLogLog: make API use the P prefix in honor of Philippe Flajolet. 2014-03-31 19:29:40 +02:00
antirez
c98815940a HLLMERGE fixed by adding a... missing loop! 2014-03-31 16:03:05 +02:00
antirez
f1b7608128 HLLMERGE fixed by adding a... missing loop! 2014-03-31 16:03:05 +02:00
antirez
f34b176b95 hll-gnuplot-graph.rb: added new filter "all". 2014-03-31 15:45:06 +02:00
antirez
0c9f06a237 hll-gnuplot-graph.rb: added new filter "all". 2014-03-31 15:45:06 +02:00
antirez
ef631343ae HyperLogLog apply bias correction using a polynomial.
Better results can be achieved by compensating for the bias of the raw
approximation just after 2.5m (when LINEARCOUNTING is no longer used) by
using a polynomial that approximates the bias at a given cardinality.

The curve used was found using this web page:

    http://www.xuru.org/rt/PR.asp

That performs polynomial regression given a set of values.
2014-03-31 15:41:38 +02:00
antirez
ec1ee66256 HyperLogLog apply bias correction using a polynomial.
Better results can be achieved by compensating for the bias of the raw
approximation just after 2.5m (when LINEARCOUNTING is no longer used) by
using a polynomial that approximates the bias at a given cardinality.

The curve used was found using this web page:

    http://www.xuru.org/rt/PR.asp

That performs polynomial regression given a set of values.
2014-03-31 15:41:38 +02:00
antirez
a8836a1e83 HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
2014-03-31 14:39:44 +02:00
antirez
f2277475b2 HLLMERGE implemented.
Merge N HLL data structures by selecting the max value for every
M[i] register among the set of HLLs.
2014-03-31 14:39:44 +02:00
antirez
e0d40af33b HLLCOUNT is technically a write command
When we update the cached value, we need to propagate the command and
signal the key as modified for WATCH.
2014-03-31 12:29:24 +02:00
antirez
4ab45183fc HLLCOUNT is technically a write command
When we update the cached value, we need to propagate the command and
signal the key as modified for WATCH.
2014-03-31 12:29:24 +02:00
antirez
1c8a0043dd HLLADD: propagate write when only variable name is given.
The following form is given:

    HLLADD myhll

No element is provided in the above case so if 'myhll' var does not
exist the result is to just create an empty HLL structure, and no update
will be performed on the registers.

In this case, the DB should still be set dirty and the command
propagated.
2014-03-31 12:21:08 +02:00
antirez
8aeb0c196a HLLADD: propagate write when only variable name is given.
The following form is given:

    HLLADD myhll

No element is provided in the above case so if 'myhll' var does not
exist the result is to just create an empty HLL structure, and no update
will be performed on the registers.

In this case, the DB should still be set dirty and the command
propagated.
2014-03-31 12:21:08 +02:00
antirez
2c5201f83d hll-gnuplot-graph.rb: Use |error| when filter is :max 2014-03-31 11:58:13 +02:00
antirez
69a93194fd hll-gnuplot-graph.rb: Use |error| when filter is :max 2014-03-31 11:58:13 +02:00
antirez
e0c8ac97a3 Ignore txt files inside utils/hyperloglog.
Those are generated to trace graphs using gnuplot.
2014-03-31 10:21:14 +02:00