6264 Commits

Author SHA1 Message Date
antirez
b30e60d06d Fix replication of SLAVEOF inside transaction.
In Redis 4.0 replication, with the introduction of PSYNC2, masters and
slaves replicate commands to cascading slaves and to the replication
backlog itself in a different way compared to the past.

Masters actually replicate the effects of client commands.
Slaves just propagate what they receive from masters.

This mechanism can cause problems when the configuration of an instance
is changed from master to slave inside a transaction. For instance
we could send to a master instance the following sequence:

    MULTI
    SLAVEOF 127.0.0.1 0
    EXEC
    SLAVEOF NO ONE

Before the fixes in this commit, the MULTI command used to be propagated
into the replication backlog, however after the SLAVEOF command the
instance is a slave, so the EXEC implementation failed to also propagate
the EXEC command. When the slaves of the above instance reconnected,
they were incrementally synchronized by just sending a "MULTI". This put
the master client (in the slaves) into MULTI state, breaking the
replication.

Notably even Redis Sentinel uses the above approach in order to guarantee
that configuration changes are always performed together with rewrites
of the configuration and with client disconnection. Sentinel does:

    MULTI
    SLAVEOF ...
    CONFIG REWRITE
    CLIENT KILL TYPE normal
    EXEC

So this was a really problematic issue. However even with the fix in
this commit, that will add the final EXEC to the replication stream in
case the instance was switched from master to slave during the
transaction, the result would be to increment the slave replication
offset, so a successive reconnection with the new master, will not
permit a successful partial resynchronization: no way the new master can
provide us with the backlog needed, we incremented our offset to a value
that the new master cannot have.

However the EXEC implementation waits to emit the MULTI, so that if the
commands inside the transaction actually do not need to be replicated,
no commands propagation happens at all. From multi.c:

    if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
        execCommandPropagateMulti(c);
        must_propagate = 1;
    }

The above code is already modified by this commit you are reading.
Now also ADMIN commands do not trigger the emission of MULTI. It is actually
not clear why we do not just check for CMD_WRITE... Probably I wrote it this
way in order to make the code more reliable: better to over-emit MULTI
than not emitting it in time.
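
A minimal sketch of the lazy MULTI emission described above. This is a toy
model, not the code in multi.c: the `toy_` names and the flag values are
assumptions made only for illustration. It shows that a transaction made
only of ADMIN (or READONLY) commands never triggers the emission of MULTI,
while any command lacking both flags does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define TOY_CMD_WRITE    (1<<0)
#define TOY_CMD_READONLY (1<<1)
#define TOY_CMD_ADMIN    (1<<2)

/* Walk the flags of the queued commands and return 1 if MULTI would be
 * emitted to the replication stream, 0 if nothing would be propagated. */
int toy_exec_emits_multi(const int *flags, size_t count) {
    int must_propagate = 0;
    for (size_t j = 0; j < count; j++) {
        if (!must_propagate &&
            !(flags[j] & (TOY_CMD_READONLY|TOY_CMD_ADMIN)))
        {
            /* This is the point where the real code emits MULTI. */
            must_propagate = 1;
        }
    }
    return must_propagate;
}
```

Note that a command with no flags at all (in this model, a flags value of 0)
also triggers the emission, since the check tests for the absence of
READONLY and ADMIN rather than for the presence of WRITE.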

So this commit should indeed fix issue #3836 (verified), however it looks
like some reconsideration of this code path is needed in the long term.

BONUS POINT: The reverse bug.

Even in a read only slave "B", in a replication setup like:

	A -> B -> C

There are commands with neither the READONLY nor the ADMIN flag that are
also not flagged as WRITE commands. An example is the PING command.

So if we send B the following sequence:

    MULTI
    PING
    SLAVEOF NO ONE
    EXEC

The result will be the reverse bug, where only EXEC is emitted, but not the
previous MULTI. This apparently does not create problems in practice,
but it is yet another acknowledgment of the fact that some work is needed
here in order to make this code path less surprising.

Note that there are many different approaches we could follow. For instance
MULTI/EXEC blocks containing administrative commands may be allowed ONLY
if all the commands are administrative ones, otherwise they could be
denied. When allowed, the commands could simply never be replicated at all.
2017-07-12 11:07:28 +02:00
antirez
e1a94c0dde CLUSTER GETKEYSINSLOT: avoid overallocating.
Close #3911.
2017-07-11 15:49:09 +02:00
antirez
585cdf3a8f Fix isHLLObjectOrReply() to handle integer encoded strings.
Close #3766.
2017-07-11 12:44:59 +02:00
antirez
0a2b3c6dac Clients blocked in modules: free argv/argc later.
See issue #3844 for more information.
2017-07-11 12:33:01 +02:00
antirez
07fe4125fc Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-07-11 09:46:58 +02:00
antirez
f2d0e938c9 Event loop: call after sleep() only from top level.
In general we do not want before/after sleep() callbacks to be called
when we re-enter the event loop, since those calls are only designed in
order to perform operations every main iteration of the event loop, and
re-entering is often just a way to incrementally serve clients with
error messages or other auxiliary operations. However, if we call the
callbacks, we are then forced to think of before/after sleep callbacks
as re-entrant, which is much harder, for no good reason.

However here there was also a clear bug: beforeSleep() was actually
never called when re-entering the event loop. But the new afterSleep()
callback was. This is broken and in this instance re-entering
afterSleep() caused a modules GIL deadlock.
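
A minimal sketch of the idea, assuming the after-sleep callback is gated by
a flag that only the top-level loop passes. The `toy_` names and the flag
are hypothetical and are not the actual event loop API; the callback is
modeled as a simple counter.

```c
#include <assert.h>

/* Hypothetical flag: only the top-level loop sets it. */
#define TOY_CALL_AFTER_SLEEP (1<<0)

typedef struct toy_loop {
    int after_sleep_calls; /* how many times the callback was invoked */
} toy_loop;

/* One iteration of the toy event loop. If `reenter` is non-zero, a
 * handler re-enters the loop once, as happens when auxiliary work is
 * served from inside an event handler. */
void toy_process_events(toy_loop *el, int flags, int reenter) {
    /* ... the blocking poll would happen here ... */
    if (flags & TOY_CALL_AFTER_SLEEP) el->after_sleep_calls++;
    /* The nested call deliberately omits the flag, so the callback
     * never has to be re-entrant. */
    if (reenter) toy_process_events(el, 0, 0);
}
```

With this shape, a top-level iteration that re-enters the loop still runs
the after-sleep callback exactly once.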
2017-07-11 00:13:52 +02:00
Salvatore Sanfilippo
9357229775 Merge pull request #4113 from guybe7/module_io_bytes
Modules: Fix io->bytes calculation in RDB save
2017-07-10 19:14:34 +02:00
antirez
3c3db5589a redis-check-aof: tell users there is a --fix option. 2017-07-10 16:41:25 +02:00
Guy Benoish
eda1c9e6f6 Modules: Fix io->bytes calculation in RDB save 2017-07-10 14:41:57 +03:00
antirez
1aab881fe1 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
Salvatore Sanfilippo
a929e560d0 Merge pull request #3853 from itamarhaber/issue-3851
Sets up fake client to select current db in RM_Call()
2017-07-06 15:02:11 +02:00
Salvatore Sanfilippo
d8a92f438f Merge pull request #4105 from spinlock/unstable-networking
Optimize addReplyBulkSds for better performance
2017-07-06 14:31:08 +02:00
Salvatore Sanfilippo
9021d1fca1 Merge pull request #4106 from petersunbag/unstable
minor fix in listJoin().
2017-07-06 14:29:37 +02:00
sunweinan
2bf10ee594 minor fix in listJoin(). 2017-07-06 19:47:21 +08:00
antirez
4e0dab2c26 Free IO context if any in RDB loading code.
Thanks to @oranagra for spotting this bug.
2017-07-06 11:20:49 +02:00
antirez
45c2679529 Modules: DEBUG DIGEST interface. 2017-07-06 11:04:46 +02:00
spinlock
1fd461b187 update Makefile for test-sds 2017-07-05 14:32:09 +00:00
spinlock
5fe9e034ff Optimize addReplyBulkSds for better performance 2017-07-05 14:25:05 +00:00
antirez
b1375dd083 Avoid closing invalid FDs to make Valgrind happier. 2017-07-05 15:40:25 +02:00
antirez
a2a406bfef Modules: no MULTI/EXEC for commands replicated from async contexts.
They are technically like commands executed from external clients one
after the other, and do not constitute a single atomic entity.
2017-07-05 10:10:20 +02:00
Salvatore Sanfilippo
abb59a91b8 Merge pull request #4101 from dvirsky/fix_modules_reply_len
Proposed fix to #4100
2017-07-04 12:01:51 +02:00
antirez
c4e9d437b8 Add symmetrical assertion to track c->reply_buffer infinite growth.
Redis clients need to have an instantaneous idea of the amount of memory
they are consuming (if the number is not exact, it should at least be
proportional to the actual memory usage). We do that adding and
subtracting the SDS length when pushing / popping from the client->reply
list. However it is quite simple to add bugs in such a setup, by not
keeping the objects in the list and the count in sync. For this reason,
Redis has an assertion to track counts near 2^64: those are always the
result of the counter wrapping around because we subtract more than we
add. This commit adds the symmetrical assertion: when the list is empty
since we sent everything, the reply_bytes count should be zero. Thanks
to the new assertion it should be simple to also detect the other
problem, where the count slowly increases because of over-counting.
The assertion adds a conditional in the code that sends the buffer to
the socket but should not create any measurable performance slowdown,
listLength() just accesses a structure field, and this code path is
totally dominated by write(2).
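
A minimal sketch of the two assertions, using a toy client that only tracks
the number of queued reply objects and the byte counter. The `toy_` names
and the wrap-around threshold are assumptions made for this sketch, not the
values used by Redis.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_client {
    size_t reply_items;             /* objects currently in the list */
    unsigned long long reply_bytes; /* running total of their lengths */
} toy_client;

/* Account for a reply object of `len` bytes pushed to the list. */
void toy_push_reply(toy_client *c, size_t len) {
    c->reply_items++;
    c->reply_bytes += len;
}

/* Account for a reply object of `len` bytes popped after being sent. */
void toy_pop_reply(toy_client *c, size_t len) {
    assert(c->reply_items > 0);
    c->reply_items--;
    c->reply_bytes -= len;
    /* Existing style of check: a huge value means the counter wrapped
     * around because more was subtracted than added. The threshold
     * here is hypothetical. */
    assert(c->reply_bytes < (1ULL << 63));
    /* Symmetrical check: an empty list must mean a zero byte count,
     * otherwise something was over-counted. */
    if (c->reply_items == 0) assert(c->reply_bytes == 0);
}
```

If a push recorded a larger length than the matching pop, the second
assertion would fire as soon as the list became empty.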

Related to #4100.
2017-07-04 11:55:05 +02:00
Dvir Volk
be7ce4a1e5 fixed #4100 2017-07-04 00:02:19 +03:00
antirez
041d783b19 Fix GEORADIUS edge case with huge radius.
This commit closes issue #3698, at least for now, since the root cause
was not fixed: the bounding box function, for huge radiuses, does not
return a correct bounding box, there are points still within the radius
that are left outside.

So when using GEORADIUS queries with radiuses in the order of 5000 km or
more, it was possible to see, at the edge of the area, certain points
not correctly reported.

Because the bounding box for now was used just as an optimization, and
such huge radiuses are not common, for now the optimization is just
switched off when the radius is near such magnitude.

Three test cases found by the Continuous Integration test were added, so
that we can easily trigger the bug again, both for regression testing
and in order to properly fix it at some point in the future.
2017-07-03 19:38:31 +02:00
antirez
bbce753f0a redis-cli --latency: ability to run non interactively.
This feature was proposed by @rosmo in PR #2643 and later redesigned
in order to fit better with the other options for non-interactive modes
of redis-cli. The idea is basically to allow collecting latency
information in scripts, cron jobs or whatever, just running for a
limited time and then producing a single output.
2017-06-30 15:41:58 +02:00
antirez
4d519b35f3 Fix abort typo in Lua debugger help screen. 2017-06-30 12:12:00 +02:00
antirez
a9fce9e530 Added GEORADIUS(BYMEMBER)_RO variants for read-only operations.
Issue #4084 shows how, because of a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.

To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and release Redis 5.0 with this incompatibility in
the future. This is still unclear.

However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly like the original commands, but they do not
accept the STORE and STOREDIST options.
2017-06-30 10:03:37 +02:00
antirez
d685c6d066 HMSET and MSET implementations unified. HSET now variadic.
This is the first step towards getting rid of HMSET which is a command
that does not make much sense once HSET is variadic, and has a saner
return value.
2017-06-29 17:38:46 +02:00
Salvatore Sanfilippo
bf02457f23 Merge pull request #4075 from sgn1/brpop_keys
Fix Issues in blocking commands in cluster mode.
2017-06-27 17:51:19 +02:00
antirez
2a26f2a9f6 RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, because the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produced in an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
streaming.
2. To buffer values before outputting them. This breaks performance.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution still makes module values a totally opaque
matter, with the following problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any check on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed of well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at a
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
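
A minimal sketch of how a tool could skip a module value thanks to the
one-byte prefixes and the final EOF byte, without the module loaded. The
opcode names and values and the payload sizes used here (8-byte integers
and doubles, 4-byte floats, strings with a one-byte length) are
simplifications assumed for this sketch; the actual RDB format uses its own
length and number encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-byte prefixes, one per kind of emitted part. */
enum {
    TOY_OP_EOF = 0,
    TOY_OP_SINT,
    TOY_OP_UINT,
    TOY_OP_FLOAT,
    TOY_OP_DOUBLE,
    TOY_OP_STRING
};

/* Scan a serialized module value without interpreting it. Returns the
 * number of bytes consumed up to and including the EOF byte, or -1 if
 * the value is truncated or contains an unknown prefix. */
long toy_skip_module_value(const unsigned char *buf, size_t len) {
    size_t pos = 0;
    while (pos < len) {
        unsigned char op = buf[pos++];
        size_t need;
        switch (op) {
        case TOY_OP_EOF:
            return (long)pos;
        case TOY_OP_SINT:
        case TOY_OP_UINT:
        case TOY_OP_DOUBLE:
            need = 8;
            break;
        case TOY_OP_FLOAT:
            need = 4;
            break;
        case TOY_OP_STRING:
            if (pos >= len) return -1; /* missing length byte */
            need = buf[pos++];
            break;
        default:
            return -1; /* unknown prefix: structure check fails */
        }
        if (len - pos < need) return -1; /* truncated payload */
        pos += need;
    }
    return -1; /* input ended before the EOF byte */
}
```

The same loop doubles as a basic sanity check: a value is structurally
valid only if every prefix is known, every payload is complete, and the
EOF byte is reached.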
2017-06-27 13:19:16 +02:00
antirez
0dae486eaa ARM: Fix stack trace generation on crash. 2017-06-26 10:36:16 +02:00
antirez
44a9c2335a Issue #4027: unify comment and modify return value in freeMemoryIfNeeded().
It looks safer to return C_OK from freeMemoryIfNeeded() when clients are
paused because returning C_ERR may prevent success of writes. It is
possible that there is no difference in practice since clients cannot
execute writes while clients are paused, but it looks more correct this
way, at least conceptually.

Related to PR #4028.
2017-06-23 11:42:25 +02:00
Salvatore Sanfilippo
451e0f2874 Merge pull request #4028 from zintrepid/prevent_expirations_while_paused
Prevent expirations and evictions while paused
2017-06-23 11:39:02 +02:00
Suraj Narkhede
f00b0e89ae Fix following issues in blocking commands:
1. brpop last key index, thus checking all keys for slots.
2. Memory leak in clusterRedirectBlockedClientIfNeeded.
3. Remove while loop in clusterRedirectBlockedClientIfNeeded.
2017-06-23 00:30:21 -07:00
Suraj Narkhede
3e8b148cad Fix brpop command table entry and redirect blocked clients. 2017-06-22 23:52:00 -07:00
antirez
5eefa529e1 Aesthetic changes to #4068 PR to conform to Redis coding standard.
1. Inline if ... statement if short.
2. No lines over 80 columns.
2017-06-22 11:00:34 +02:00
Salvatore Sanfilippo
caff201dfa Merge pull request #4068 from FreedomU007/unstable
Fix set with ex/px option when propagated to aof
2017-06-22 10:46:58 +02:00
xuzhou
1ec6363536 Optimize set command with ex/px when updating aof. 2017-06-22 11:06:40 +08:00
Salvatore Sanfilippo
63ac7012dd Merge pull request #3802 from flowly/bugfix-calc-stat-net-output-bytes
Bugfix calc stat net output bytes
2017-06-20 17:01:16 +02:00
Salvatore Sanfilippo
570f489683 Merge pull request #4056 from season89/unstable
Fixed comments of slowlog duration
2017-06-20 16:55:29 +02:00
Salvatore Sanfilippo
c1aec3efcd Merge pull request #3659 from cbgbt/cli-elapsed
cli: Only print elapsed time on OUTPUT_STANDARD.
2017-06-20 16:53:56 +02:00
Salvatore Sanfilippo
7901200e6a Merge pull request #4062 from concreted/patch-1
(fix) Update create-cluster README
2017-06-20 16:41:10 +02:00
antirez
078cb38a81 redis-benchmark: add -t hset target. 2017-06-19 09:41:11 +02:00
Aric Huang
143c09f422 (fix) Update create-cluster README
Fix a few typos/adjust wording in `create-cluster` README
2017-06-16 16:10:00 -07:00
xuzhou
3436bcf681 Fix set with ex/px option when propagated to aof 2017-06-16 17:51:38 +08:00
antirez
08536da477 SLOWLOG: log offending client address and name. 2017-06-15 12:57:54 +02:00
antirez
efff456faa Merge branch 'unstable' of github.com:/antirez/redis into unstable 2017-06-14 18:29:53 +02:00
Qu Chen
3abc3b687a Implement getKeys procedure for georadius and georadiusbymember
commands.
2017-06-14 18:15:48 +02:00
xuchengxuan
86fcfe9dc7 Fixed comments of slowlog duration 2017-06-14 16:42:21 +08:00
Salvatore Sanfilippo
7c64bd50a7 Merge pull request #4034 from amallia/patch-1
Fixed comment in clusterMsg version field
2017-06-13 06:28:23 -07:00