583 Commits

Author SHA1 Message Date
Matt Stancliff
8958c39e71 Improve networking type correctness
read() and write() return ssize_t (signed long), not int.

For other offsets, we can use the unsigned size_t type instead
of a signed offset (since our replication offsets and buffer
positions are never negative).
2015-01-19 14:10:12 -05:00
antirez
1b004d62a0 Cluster: fetch my IP even if msg is not MEET for the first time.
In order to avoid that misconfigured cluster nodes at some time may
force an IP update on other nodes, it is required that nodes update
their own address only on MEET messages. However it does not make sense
to do this the first time a node is contacted and yet does not have an
IP, we just risk that myself->ip remains not assigned if there are
messages lost or cluster creation procedures that don't make sure
everybody is targeted by at least one incoming MEET message.

Also fix the logging of the IP switch avoiding the :-1 tail.
2015-01-13 10:50:34 +01:00
antirez
7e4233e3f7 Cluster: clusterMsgDataGossip structure, explict padding + minor stuff.
Also explicitly set version to 0, add a protocol version define, improve
comments in the gossip structure.

Note that the structure layout is the same after the change, we are just
making the padding explicit with an additional not used 16 bits field.
So this commit is still able to talk with the previous versions of
cluster nodes.
2015-01-13 10:40:09 +01:00
antirez
a35c89f3d3 Suppress valgrind error about write sending uninitialized data.
Valgrind checks that the buffers we transfer via syscalls are all
composed of bytes actually initialized. This is useful, it makes we able
to avoid leaking informations in non initialized parts fo messages
transferred to other hosts. This commit fixes one of such issues.
2015-01-13 09:31:37 +01:00
antirez
a4e2c9fe3d Cluster: initialize mf_end.
Can't be initialized by resetManualFailover() since it's actual state
the function uses, so we need to initialize it at startup time. Not
really a bug in practical terms, but showed up into valgrind and is not
technically correct anyway.
2015-01-12 15:55:00 +01:00
Matt Stancliff
87d6324607 Add addReplyBulkSds() function
Refactor a common pattern into one function so we don't
end up with copy/paste programming.
2014-12-23 09:31:02 -05:00
Matt Stancliff
7da577fd8b Cluster: Notify user on accept error
If we woke up to accept a connection, but we can't
accept it, inform the user of the error going on
with their networking.

(The previous message was the same for success or error!)
2014-12-17 10:49:32 -05:00
antirez
355ab9ccc0 Fix comment in clusterHandleSlaveFailover(). 2014-12-16 15:03:12 +01:00
antirez
655c0b46ff Make sure buffer is enough in clusterSendPing(). 2014-12-15 10:18:22 +01:00
antirez
3d476bf2b6 AnetFormatIP(): renamed, commented, now sticks to IP:port format.
A few code style changes + consistent format: not nice for humans but
better for parsers.
2014-12-11 18:20:30 +01:00
Matt Stancliff
f7a98bdf4d Cleanup all IP formatting code
Instead of manually checking for strchr(n,':') everywhere,
we can use our new centralized IP formatting functions.
2014-12-11 10:12:18 -05:00
antirez
5b6e4e866d Better read-only behavior for expired keys in slaves.
Slaves key expire is orchestrated by the master. Sometimes the master
will send the synthesized DEL to expire keys on the slave with a non
trivial delay (when the key is not accessed, only the incremental expiry
algorithm will expire it in background).

During that time, a key is logically expired, but slaves still return
the key if you GET (or whatever) it. This is a bad behavior.

However we can't simply trust the slave view of the key, since we need
the master to be able to send write commands to update the slave data
set, and DELs should only happen when the key is expired in the master
in order to ensure consistency.

However 99.99% of the issues with this behavior is when a client which
is not a master sends a read only command. In this case we are safe and
can consider the key as non existing.

This commit does a few changes in order to make this sane:

1. lookupKeyRead() is modified in order to return NULL if the above
conditions are met.
2. Calls to lookupKeyRead() in commands actually writing to the data set
are repliaced with calls to lookupKeyWrite().

There are redundand checks, so for example, if in "2" something was
overlooked, we should be still safe, since anyway, when the master
writes the behavior is to don't care about what expireIfneeded()
returns.

This commit is related to  #1768, #1770, #2131.
2014-12-10 16:10:21 +01:00
antirez
0186fa3aa5 Cluster PUBLISH message: fix totlen count.
bulk_data field size was not removed from the count. It is not possible
to declare it simply as 'char bulk_data[]' since the structure is nested
into another structure.
2014-11-28 10:21:47 +01:00
Salvatore Sanfilippo
3761343e84 Merge pull request #2096 from mattsta/cluster-ipv6
Enable Cluster IPv6 Support
2014-10-31 10:38:22 +01:00
Matt Stancliff
2dbdfa708d Networking: add more outbound IP binding fixes
Same as the original bind fixes (we just missed these the
first time around).

This helps Redis not automatically send
connections from the first IP on an interface if we are bound
to a specific IP address (e.g. with multiple IP aliases on one
interface, you want to send from _your_ IP, not from the first IP
on the interface).
2014-10-29 15:09:09 -04:00
Matt Stancliff
996e8dade4 Parse cluster state file in IPv6 compatible way
We need to pick the port based on the _last_ colon, not the first one.
2014-10-29 15:08:35 -04:00
antirez
d91fe789ef Cluster: process gossip section only for known nodes.
With the exception of nodes sending MEET packets: we have to trust them
since they can send us MEET packets only when the cluster is initially
created or because sysadmin manual action.
2014-10-08 16:58:12 +02:00
antirez
724d69e7cf Cluster: fix logic to detect we are among a minority.
In the cluster evaluation function we are supposed to set the cluster
state as "fail" if we are among a minority, however the code was not
detecting to be into a minority partition if exactly half the masters
were reachable, which is a minority.
2014-10-08 16:27:07 +02:00
antirez
1730b49ad7 Cluster: more chatty slaves when failover is stalled. 2014-10-07 09:51:55 +02:00
Matt Stancliff
299c667adb Clean up text throughout project
- Remove trailing newlines from redis.conf
  - Fix comment misspelling
  - Clarifies zipEncodeLength usage and a C API mention (#1243, #1242)
  - Fix cluster typos (inspired by @papanikge #1507)
  - Fix rewite -> rewrite in a few places (inspired by #682)

Closes #1243, #1242, #1507
2014-09-29 06:49:07 -04:00
antirez
dd5d974e57 Cluster: claim ping_sent time even if we can't connect.
This fixes a potential bug that was never observed in practice since
what happens is that the asynchronous connect returns ok (to fail later,
calling the handler) every time, so a ping is queued, and sent_ping
happens to always be populated.

Howver technically connect(2) with a non blocking socket may return an
error synchronously, so before this fix the code was not correct.
2014-09-17 16:39:41 +02:00
antirez
aea347d60c Cluster: new option to work with partial slots coverage. 2014-09-17 11:10:09 +02:00
Matt Stancliff
1832617d46 Cluster: Fix segfault if cluster config corrupt
This commit adds a size check after initial config
line parsing to make sure we have *at least* 8 arguments
per line.

Also, instead of asserting for cluster->myself, we just test
and error out normally (since the error does a hard exit anyway).

Closes #1597
2014-08-25 10:11:38 +02:00
Matt Stancliff
2d5db1bbd1 Fix memory leak in cluster config parsing
The continue stop us from triggering the
free after the long line for loop, so add it
earlier.
2014-08-18 11:27:19 +02:00
Matt Stancliff
ac427859ff Clarify existing slot wording on cluster start 2014-08-18 10:58:00 +02:00
antirez
2e94ffb1d1 Remove warnings and improve integer sign correctness. 2014-08-13 11:44:38 +02:00
antirez
29b38e9901 representRedisNodeFlags() moved into right code section.
The funciton was also modified in order to be more standalone and
produce an output without trailing spaces, making the reuse simpler.
The global variable was renamed in cammel case as most other Redis
globals, except the main ones we refer too many times, like 'server'.
2014-08-08 15:53:42 +02:00
charsyam
40a90bfd7b Refactor cluster flag printing
Less copy/paste code duplication.

Closes #952
2014-08-08 15:39:44 +02:00
SungBin_Hong
9a8653e1e1 Free memory in clusterLoadConfig error handler
Closes #1327
2014-08-08 14:40:32 +02:00
antirez
8abda6a835 Cluster: don't migrate to a master that never had slaves.
Replica migration algorithm modified so that slaves never try to migrate
to masters that were never configured to have slaves in the past.
We want the algorithm to take care of masters that remained without
*working* slaves, but that used to have slaves according to the cluster
configuration.
2014-07-25 11:02:09 +02:00
antirez
f5e4a3583d CLUSTER RESET: Flush dataset if node is a slave.
For non-empty masters, CLUSTER RESET is denied, and the user requires to
start to reset a node by explicitly clearing it with FLUSHALL.
However CLUSTER RESET when executed with slaves don't have this
restrictions since data is just a replica of the master, and with
read-only slaves it is also not possible to remove the data set. However
the node was turned from slave to master after a reset, without touching
the old slave data. This is 99.99% of times not appropriate and forces
full resets to follow this path to work with both slave and master
nodes:

    FLUSHALL
    CLUSTER RESET HARD
    FLUSHALL

Since we need the first flushall for masters, and the second for slaves.

This commit changes the behavior so that CLUSTER RESET removes the data set
of a slave node during a reset, in the moment it gets turned into a master,
so the new pattern is simply:

    FLUSHALL (that may fail for slaves)
    CLUSTER RESET
2014-07-22 15:29:57 +02:00
antirez
1c94889182 No more trailing spaces in Redis source code. 2014-06-26 18:48:40 +02:00
antirez
c532d3a603 CLUSTER SLOTS: don't output failing slaves.
While we have to output failing masters in order to provide an accurate
map (that may be the one of a Redis Cluster in down state because not
all slots are served by a working master), to provide slaves in FAIL
state is not a good idea since those are not necesarely needed, and the
client will likely incur into a latency penalty trying to connect with a
slave which is down.

Note that this means that CLUSTER SLOTS does not provide a *complete*
map of slaves, however this would not be of any help since slaves may be
added later, and a client that needs to scale reads and requires to
stay updated with the list of slaves, need to do a refresh of the map
from time to time, anyway.
2014-06-25 15:19:35 +02:00
antirez
7a35c94ba4 CLUSTER SLOTS: main loop should skip only slaves and zero slot masters. 2014-06-25 15:08:33 +02:00
Matt Stancliff
25539d9a7c Cluster: Add CLUSTER SLOTS command
CLUSTER SLOTS returns a Redis-formatted mapping from
slot ranges to IP/Port pairs serving that slot range.

The outer return elements group return values by slot ranges.

The first two entires in each result are the min and max slots for the range.

The third entry in each result is guaranteed to be either
an IP/Port of the master for that slot range - OR - null
if that slot range, for some reason, has no master

The 4th and higher entries in each result are replica instances
for the slot range.

Output comparison:
127.0.0.1:7001> cluster nodes
f853501ec8ae1618df0e0f0e86fd7abcfca36207 127.0.0.1:7001 myself,master - 0 0 2 connected 4096-8191
5a2caa782042187277647661ffc5da739b3e0805 127.0.0.1:7005 slave f853501ec8ae1618df0e0f0e86fd7abcfca36207 0 1402622415859 6 connected
6c70b49813e2ffc9dd4b8ec1e108276566fcf59f 127.0.0.1:7007 slave 26f4729ca0a5a992822667fc16b5220b13368f32 0 1402622415357 8 connected
2bd5a0e3bb7afb2b56a2120d3fef2f2e4333de1d 127.0.0.1:7006 slave 32adf4b8474fdc938189dba00dc8ed60ce635b0f 0 1402622419373 7 connected
5a9450e8279df36ff8e6bb1c139ce4d5268d1390 127.0.0.1:7000 master - 0 1402622418872 1 connected 0-4095
32adf4b8474fdc938189dba00dc8ed60ce635b0f 127.0.0.1:7002 master - 0 1402622419874 3 connected 8192-12287
5db7d05c245267afdfe48c83e7de899348d2bdb6 127.0.0.1:7004 slave 5a9450e8279df36ff8e6bb1c139ce4d5268d1390 0 1402622417867 5 connected
26f4729ca0a5a992822667fc16b5220b13368f32 127.0.0.1:7003 master - 0 1402622420877 4 connected 12288-16383

127.0.0.1:7001> cluster slots
1) 1) (integer) 0
   2) (integer) 4095
   3) 1) "127.0.0.1"
      2) (integer) 7000
   4) 1) "127.0.0.1"
      2) (integer) 7004
2) 1) (integer) 12288
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7003
   4) 1) "127.0.0.1"
      2) (integer) 7007
3) 1) (integer) 4096
   2) (integer) 8191
   3) 1) "127.0.0.1"
      2) (integer) 7001
   4) 1) "127.0.0.1"
      2) (integer) 7005
4) 1) (integer) 8192
   2) (integer) 12287
   3) 1) "127.0.0.1"
      2) (integer) 7002
   4) 1) "127.0.0.1"
      2) (integer) 7006
2014-06-25 15:03:41 +02:00
antirez
8f153d6e28 Cluster: myself->ip autodiscovery.
Instead of having an hardcoded IP address in the node configuration, we
autodiscover it via MEET messages for automatic update when the node is
restarted with a different IP address.

This mechanism was discussed in the context of PR #1782.
2014-06-25 11:28:57 +02:00
Matt Stancliff
102d813f73 Add REDIS_BIND_ADDR access macro
We need to access (bindaddr[0] || NULL) in a few places, so centralize
access with a nice macro.
2014-06-23 11:44:34 +02:00
antirez
cf8a6efdd4 Cluster: clear NOADDR flag when updating node address. 2014-06-20 09:32:47 +02:00
antirez
613be787b6 Cluster: fix an error message when logging failover auth denied. 2014-06-10 17:39:42 +02:00
antirez
9701c7dd75 Cluster: better comment for clusterSendFailoverAuthIfNeeded() epoch test. 2014-06-10 17:20:21 +02:00
antirez
b10f3f08a6 Cluster: log granted failover authorizations. 2014-06-10 16:56:08 +02:00
antirez
1074266da1 Cluster: log configEpoch updates to myself. 2014-06-10 16:38:36 +02:00
antirez
d71f39115f Cluster: log when a master denies a failover auth. 2014-06-10 16:07:26 +02:00
antirez
bc552a87e7 Cluster: cluster_my_epoch added to CLUSTER INFO output. 2014-06-10 11:35:40 +02:00
antirez
cff2c3661a Cluster: check that configEpoch never goes back.
Since there are ways to alter the configEpoch outside of the failover
procedure (for exampel CLUSTER SET-CONFIG-EPOCH and via the configEpoch
collision resolution algorithm), make always sure, before replacing our
configEpoch with a new one, that it is greater than the current one.
2014-06-07 14:37:09 +02:00
antirez
b732d42a80 Cluster: SET-CONFIG-EPOCH should update currentEpoch.
SET-CONFIG-EPOCH, used by redis-trib at cluster creation time, failed to
update the currentEpoch, making it possible after a failover for a
server to set its configEpoch to a value smaller than the current one
(since configEpochs are obtained using currentEpoch).

The bug totally break the Redis Cluster algorithms and protocols
allowing for permanent split brain conditions about the slots
configuration as shown in issue #1799.
2014-06-07 14:25:47 +02:00
antirez
2f6c99fc22 Cluster: always allow ok -> fail switch in clusterUpdateState().
There is a time defined by REDIS_CLUSTER_WRITABLE_DELAY where fail -> ok
switch is not possible after startup as a master for some time, however
the contrary (ok -> fail) should always be possible.
2014-05-26 16:24:12 +02:00
antirez
aec8e92316 Cluster: slave validity factor is now user configurable.
Check the commit changes in the example redis.conf for more information.
2014-05-22 16:57:54 +02:00
antirez
352f4fbbd5 Cluster: use clusterSetNodeAsMaster() during slave failover.
clusterHandleSlaveFailover() was reimplementing what
clusterSetNodeAsMaster() without any good reason.
2014-05-15 17:03:28 +02:00
antirez
b1d19fd6e6 Cluster: clear todo_before_sleep flags when executing actions.
Thanks to this change, when there is some code like:

    clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|...);
    ... and later before returning to the event loop ...
    clusterUpdateState();

The clusterUpdateState() function will clar the flag and will not be
repeated in the clusterBeforeSleep() function. This especially important
for config save/fsync flags which are slow to execute and not a good
idea to repeat without a good reason.

This is implemented for all the CLUSTER_TODO flags.
2014-05-15 16:33:13 +02:00