624 Commits

Author SHA1 Message Date
antirez
ea5ea8da3d Streams: XREAD related code to serve blocked clients. 2017-12-01 10:24:24 +01:00
antirez
2cacdcd6f8 Streams: XREAD related code to serve blocked clients. 2017-12-01 10:24:24 +01:00
antirez
084a871019 Streams: XREAD get-keys method. 2017-12-01 10:24:24 +01:00
antirez
110041825c Streams: XREAD get-keys method. 2017-12-01 10:24:24 +01:00
antirez
428bb7cca0 Streams: augment client.bpop with XREAD specific fields. 2017-12-01 10:24:24 +01:00
antirez
4086dff477 Streams: augment client.bpop with XREAD specific fields. 2017-12-01 10:24:24 +01:00
antirez
6195349b50 Streams: more internal preparation for blocking XREAD. 2017-12-01 10:24:24 +01:00
antirez
f80dfbf464 Streams: more internal preparation for blocking XREAD. 2017-12-01 10:24:24 +01:00
antirez
896300be30 Streams: initial work to use blocking lists logic for streams XREAD. 2017-12-01 10:24:24 +01:00
antirez
4a377cecd8 Streams: initial work to use blocking lists logic for streams XREAD. 2017-12-01 10:24:24 +01:00
antirez
a168cbef13 Streams: implement stream object release. 2017-12-01 10:24:24 +01:00
antirez
439120c620 Streams: implement stream object release. 2017-12-01 10:24:24 +01:00
antirez
760ad8f65c Streams: XLEN command. 2017-12-01 10:24:24 +01:00
antirez
ec9bbe96bf Streams: XLEN command. 2017-12-01 10:24:24 +01:00
antirez
e4e216f8cc Streams: assign value of 6 to OBJ_STREAM + some refactoring. 2017-12-01 10:24:24 +01:00
antirez
100d43c1ac Streams: assign value of 6 to OBJ_STREAM + some refactoring. 2017-12-01 10:24:24 +01:00
antirez
71ad8ff26e Streams: 12 commits squashed into the initial Streams implementation. 2017-12-01 10:24:24 +01:00
antirez
79866a6361 Streams: 12 commits squashed into the initial Streams implementation. 2017-12-01 10:24:24 +01:00
antirez
445d37f24d PSYNC2: Save Lua scripts state into RDB file.
This is currently needed in order to fix #4483, but this can be
useful in other contexts, so maybe later we may want to remove the
conditionals and always save/load scripts.

Note that we are using the "lua" AUX field here, in order to guarantee
backward compatibility of the RDB file. The unknown AUX fields must be
discarded by past versions of Redis.
2017-11-30 18:37:52 +01:00
antirez
f11a7585a8 PSYNC2: Save Lua scripts state into RDB file.
This is currently needed in order to fix #4483, but this can be
useful in other contexts, so maybe later we may want to remove the
conditionals and always save/load scripts.

Note that we are using the "lua" AUX field here, in order to guarantee
backward compatibility of the RDB file. The unknown AUX fields must be
discarded by past versions of Redis.
2017-11-30 18:37:52 +01:00
zhaozhao.zz
44e3dc083a networking: optimize unlinkClient() in freeClient() 2017-11-30 18:11:05 +08:00
zhaozhao.zz
43be967690 networking: optimize unlinkClient() in freeClient() 2017-11-30 18:11:05 +08:00
Itamar Haber
78aabf66ff Standardizes the 'help' subcommand
This adds a new `addReplyHelp` helper that's used by commands
when returning a help text. The following commands have been
touched: DEBUG, OBJECT, COMMAND, PUBSUB, SCRIPT and SLOWLOG.

WIP

Fix entry command table entry for OBJECT for HELP option.

After #4472 the command may have just 2 arguments.

Improve OBJECT HELP descriptions.

See #4472.

WIP 2

WIP 3
2017-11-28 21:15:45 +02:00
Itamar Haber
59d52f7fab Standardizes the 'help' subcommand
This adds a new `addReplyHelp` helper that's used by commands
when returning a help text. The following commands have been
touched: DEBUG, OBJECT, COMMAND, PUBSUB, SCRIPT and SLOWLOG.

WIP

Fix entry command table entry for OBJECT for HELP option.

After #4472 the command may have just 2 arguments.

Improve OBJECT HELP descriptions.

See #4472.

WIP 2

WIP 3
2017-11-28 21:15:45 +02:00
zhaozhao.zz
660f01011c LFU: do some changes about LFU to find hotkeys
Firstly, use access time to replace the decreas time of LFU.
For function LFUDecrAndReturn,
it should only try to get decremented counter,
not update LFU fields, we will update it in an explicit way.
And we will times halve the counter according to the times of
elapsed time than server.lfu_decay_time.
Everytime a key is accessed, we should update the LFU
including update access time, and increment the counter after
call function LFUDecrAndReturn.
If a key is overwritten, the LFU should be also updated.
Then we can use `OBJECT freq` command to get a key's frequence,
and LFUDecrAndReturn should be called in `OBJECT freq` command
in case of the key has not been accessed for a long time,
because we update the access time only when the key is read or
overwritten.
2017-11-27 18:39:22 +01:00
zhaozhao.zz
583c314725 LFU: do some changes about LFU to find hotkeys
Firstly, use access time to replace the decreas time of LFU.
For function LFUDecrAndReturn,
it should only try to get decremented counter,
not update LFU fields, we will update it in an explicit way.
And we will times halve the counter according to the times of
elapsed time than server.lfu_decay_time.
Everytime a key is accessed, we should update the LFU
including update access time, and increment the counter after
call function LFUDecrAndReturn.
If a key is overwritten, the LFU should be also updated.
Then we can use `OBJECT freq` command to get a key's frequence,
and LFUDecrAndReturn should be called in `OBJECT freq` command
in case of the key has not been accessed for a long time,
because we update the access time only when the key is read or
overwritten.
2017-11-27 18:39:22 +01:00
zhaozhao.zz
aef50770ba LFU: change lfu* parameters to int 2017-11-27 18:38:55 +01:00
zhaozhao.zz
53cea97204 LFU: change lfu* parameters to int 2017-11-27 18:38:55 +01:00
antirez
66c47a4d06 Fix replication of SLAVEOF inside transaction.
In Redis 4.0 replication, with the introduction of PSYNC2, masters and
slaves replicate commands to cascading slaves and to the replication
backlog itself in a different way compared to the past.

Masters actually replicate the effects of client commands.
Slaves just propagate what they receive from masters.

This mechanism can cause problems when the configuration of an instance
is changed from master to slave inside a transaction. For instance
we could send to a master instance the following sequence:

    MULTI
    SLAVEOF 127.0.0.1 0
    EXEC
    SLAVEOF NO ONE

Before the fixes in this commit, the MULTI command used to be propagated
into the replication backlog, however after the SLAVEOF command the
instance is a slave, so the EXEC implementation failed to also propagate
the EXEC command. When the slaves of the above instance reconnected,
they were incrementally synchronized just sending a "MULTI". This put
the master client (in the slaves) into MULTI state, breaking the
replication.

Notably even Redis Sentinel uses the above approach in order to guarantee
that configuration changes are always performed together with rewrites
of the configuration and with clients disconnection. Sentiel does:

    MULTI
    SLAVEOF ...
    CONFIG REWRITE
    CLIENT KILL TYPE normal
    EXEC

So this was a really problematic issue. However even with the fix in
this commit, that will add the final EXEC to the replication stream in
case the instance was switched from master to slave during the
transaction, the result would be to increment the slave replication
offset, so a successive reconnection with the new master, will not
permit a successful partial resynchronization: no way the new master can
provide us with the backlog needed, we incremented our offset to a value
that the new master cannot have.

However the EXEC implementation waits to emit the MULTI, so that if the
commands inside the transaction actually do not need to be replicated,
no commands propagation happens at all. From multi.c:

    if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
	execCommandPropagateMulti(c);
	must_propagate = 1;
    }

The above code is already modified by this commit you are reading.
Now also ADMIN commands do not trigger the emission of MULTI. It is actually
not clear why we do not just check for CMD_WRITE... Probably I wrote it this
way in order to make the code more reliable: better to over-emit MULTI
than not emitting it in time.

So this commit should indeed fix issue #3836 (verified), however it looks
like some reconsideration of this code path is needed in the long term.

BONUS POINT: The reverse bug.

Even in a read only slave "B", in a replication setup like:

	A -> B -> C

There are commands without the READONLY nor the ADMIN flag, that are also
not flagged as WRITE commands. An example is just the PING command.

So if we send B the following sequence:

    MULTI
    PING
    SLAVEOF NO ONE
    EXEC

The result will be the reverse bug, where only EXEC is emitted, but not the
previous MULTI. However this apparently does not create problems in practice
but it is yet another acknowledge of the fact some work is needed here
in order to make this code path less surprising.

Note that there are many different approaches we could follow. For instance
MULTI/EXEC blocks containing administrative commands may be allowed ONLY
if all the commands are administrative ones, otherwise they could be
denined. When allowed, the commands could simply never be replicated at all.
2017-07-12 11:07:28 +02:00
antirez
e74f0aa6d1 Fix replication of SLAVEOF inside transaction.
In Redis 4.0 replication, with the introduction of PSYNC2, masters and
slaves replicate commands to cascading slaves and to the replication
backlog itself in a different way compared to the past.

Masters actually replicate the effects of client commands.
Slaves just propagate what they receive from masters.

This mechanism can cause problems when the configuration of an instance
is changed from master to slave inside a transaction. For instance
we could send to a master instance the following sequence:

    MULTI
    SLAVEOF 127.0.0.1 0
    EXEC
    SLAVEOF NO ONE

Before the fixes in this commit, the MULTI command used to be propagated
into the replication backlog, however after the SLAVEOF command the
instance is a slave, so the EXEC implementation failed to also propagate
the EXEC command. When the slaves of the above instance reconnected,
they were incrementally synchronized just sending a "MULTI". This put
the master client (in the slaves) into MULTI state, breaking the
replication.

Notably even Redis Sentinel uses the above approach in order to guarantee
that configuration changes are always performed together with rewrites
of the configuration and with clients disconnection. Sentiel does:

    MULTI
    SLAVEOF ...
    CONFIG REWRITE
    CLIENT KILL TYPE normal
    EXEC

So this was a really problematic issue. However even with the fix in
this commit, that will add the final EXEC to the replication stream in
case the instance was switched from master to slave during the
transaction, the result would be to increment the slave replication
offset, so a successive reconnection with the new master, will not
permit a successful partial resynchronization: no way the new master can
provide us with the backlog needed, we incremented our offset to a value
that the new master cannot have.

However the EXEC implementation waits to emit the MULTI, so that if the
commands inside the transaction actually do not need to be replicated,
no commands propagation happens at all. From multi.c:

    if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
	execCommandPropagateMulti(c);
	must_propagate = 1;
    }

The above code is already modified by this commit you are reading.
Now also ADMIN commands do not trigger the emission of MULTI. It is actually
not clear why we do not just check for CMD_WRITE... Probably I wrote it this
way in order to make the code more reliable: better to over-emit MULTI
than not emitting it in time.

So this commit should indeed fix issue #3836 (verified), however it looks
like some reconsideration of this code path is needed in the long term.

BONUS POINT: The reverse bug.

Even in a read only slave "B", in a replication setup like:

	A -> B -> C

There are commands without the READONLY nor the ADMIN flag, that are also
not flagged as WRITE commands. An example is just the PING command.

So if we send B the following sequence:

    MULTI
    PING
    SLAVEOF NO ONE
    EXEC

The result will be the reverse bug, where only EXEC is emitted, but not the
previous MULTI. However this apparently does not create problems in practice
but it is yet another acknowledge of the fact some work is needed here
in order to make this code path less surprising.

Note that there are many different approaches we could follow. For instance
MULTI/EXEC blocks containing administrative commands may be allowed ONLY
if all the commands are administrative ones, otherwise they could be
denined. When allowed, the commands could simply never be replicated at all.
2017-07-12 11:07:28 +02:00
antirez
63ec3e0170 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
antirez
fc7ecd8d35 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
antirez
ed93fb8a29 Modules: DEBUG DIGEST interface. 2017-07-06 11:04:46 +02:00
antirez
51ffd062d3 Modules: DEBUG DIGEST interface. 2017-07-06 11:04:46 +02:00
antirez
83bf475551 Added GEORADIUS(BYMEMBER)_RO variants for read-only operations.
Issue #4084 shows how for a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.

To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and reease Redis 5.0 with this incompatibility in
the future. This is still unclear.

However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly as the original commands, but they do not
accept the STORE and STOREDIST options.
2017-06-30 10:03:37 +02:00
antirez
f8547e53f0 Added GEORADIUS(BYMEMBER)_RO variants for read-only operations.
Issue #4084 shows how for a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.

To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and reease Redis 5.0 with this incompatibility in
the future. This is still unclear.

However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly as the original commands, but they do not
accept the STORE and STOREDIST options.
2017-06-30 10:03:37 +02:00
antirez
627713f627 RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, becuase the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produce di an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
stremaing.
2. To bufferize values before outputting them. This breaks performances.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution, still makes module values a totally opaque
matter, with the fowllowing problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any checko on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed if well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at an
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
2017-06-27 13:19:16 +02:00
antirez
365dd037dc RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, becuase the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produce di an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
stremaing.
2. To bufferize values before outputting them. This breaks performances.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution, still makes module values a totally opaque
matter, with the fowllowing problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any checko on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed if well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at an
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
2017-06-27 13:19:16 +02:00
xuzhou
809a73be97 Fix set with ex/px option when propagated to aof 2017-06-16 17:51:38 +08:00
xuzhou
530fcf8687 Fix set with ex/px option when propagated to aof 2017-06-16 17:51:38 +08:00
Qu Chen
29122cfa05 Implement getKeys procedure for georadius and georadiusbymember
commands.
2017-06-14 18:15:48 +02:00
Qu Chen
4740424049 Implement getKeys procedure for georadius and georadiusbymember
commands.
2017-06-14 18:15:48 +02:00
antirez
e6ae9c9bab Modules TSC: use atomic var for server.unixtime.
This avoids Helgrind complaining, but we are actually not using
atomicGet() to get the unixtime value for now: too many places where it
is used and given tha time_t is word-sized it should be safe in all the
archs we support as it is.

On the other hand, Helgrind, when Redis is compiled with "make helgrind"
in order to force the __sync macros, will detect the write in
updateCachedTime() as a read (because atomic functions are used) and
will not complain about races.

This commit also includes minor refactoring of mutex initializations and
a "helgrind" target in the Makefile.
2017-05-10 10:04:16 +02:00
antirez
1f598fc2bb Modules TSC: use atomic var for server.unixtime.
This avoids Helgrind complaining, but we are actually not using
atomicGet() to get the unixtime value for now: too many places where it
is used and given tha time_t is word-sized it should be safe in all the
archs we support as it is.

On the other hand, Helgrind, when Redis is compiled with "make helgrind"
in order to force the __sync macros, will detect the write in
updateCachedTime() as a read (because atomic functions are used) and
will not complain about races.

This commit also includes minor refactoring of mutex initializations and
a "helgrind" target in the Makefile.
2017-05-10 10:04:16 +02:00
antirez
61eb08813b Modules TSC: Add mutex for server.lruclock.
Only useful for when no atomic builtins are available.
2017-05-09 16:32:49 +02:00
antirez
9390c384b8 Modules TSC: Add mutex for server.lruclock.
Only useful for when no atomic builtins are available.
2017-05-09 16:32:49 +02:00
antirez
42948bc052 Modules TSC: Improve inter-thread synchronization.
More work to do with server.unixtime and similar. Need to write Helgrind
suppression file in order to suppress the valse positives.
2017-05-09 11:57:09 +02:00
antirez
ece658713b Modules TSC: Improve inter-thread synchronization.
More work to do with server.unixtime and similar. Need to write Helgrind
suppression file in order to suppress the valse positives.
2017-05-09 11:57:09 +02:00
antirez
441c323498 Modules TSC: Release the GIL for all the time we are blocked.
Instead of giving the module background operations just a small time to
run in the beforeSleep() function, we can have the lock released for all
the time we are blocked in the multiplexing syscall.
2017-05-03 11:26:21 +02:00
antirez
3fcf959e60 Modules TSC: Release the GIL for all the time we are blocked.
Instead of giving the module background operations just a small time to
run in the beforeSleep() function, we can have the lock released for all
the time we are blocked in the multiplexing syscall.
2017-05-03 11:26:21 +02:00