541 Commits

Author SHA1 Message Date
antirez
a0f2b39791 Cluster: connect to our master ASAP after startup if we are a slave node. 2013-03-05 16:12:08 +01:00
antirez
856a3160a9 Cluster: more robust FAIL flag cleaup.
If we have a master in FAIL state that's reachable again, and apparently
no one is going to serve its slots, clear the FAIL flag and let the
cluster continue with its operations again.
2013-03-05 15:05:32 +01:00
antirez
24aa7a566e Cluster: new node field fail_time.
This is the unix time at which we set the FAIL flag for the node.
It is only valid if FAIL is set.

The idea is to use it in order to make the cluster more robust, for
instance in order to revert a FAIL state if it is long-standing but
still slots are assigned to this node, that is, no one is going to fix
these slots apparently.
2013-03-05 13:15:05 +01:00
antirez
a06092c576 Cluster: A comment updated in clusterCron(). 2013-03-05 12:17:30 +01:00
antirez
de720e4f0a Cluster: send a ping to every node we never contacted in timeout/2 seconds.
Usually we try to send just 1 ping every second, however when we detect
we are going to have unreliable failure detection because we can't ping
some node in time, send an additional ping.

This should only happen with very large clusters or when the the node
timeout is set to a very low value.
2013-03-05 12:16:02 +01:00
antirez
401893bc38 Cluster: set node->slaveof correctly when a node state is updated. 2013-03-05 11:50:11 +01:00
antirez
8eb9c13dcf Cluster: don't perform startup slots sanity check for slaves.
If we are a cluster node the DB content will not match our configured
slots. Don't do the check at all.
2013-03-04 19:47:00 +01:00
antirez
63cac938b3 Cluster: fix maximum line length when loading config.
There are pathological cases where the line can be even longer a single
node may contain all the slots in importing/migrating state.
2013-03-04 19:45:36 +01:00
antirez
9e34d02450 Cluster: actually setup replication in CLUSTER REPLICATE. 2013-03-04 15:27:58 +01:00
antirez
e6da336d19 Cluster: REPLICATE subcommand and stub for clusterSetMaster(). 2013-03-04 13:15:09 +01:00
charsyam
466e2de67d adding check error code
adding check error code
2013-03-04 11:20:11 +01:00
antirez
19833a01f2 Cluster: don't set the slot as unassigned because of PONG info.
As stated in the comment this is usually due to a resharding in progress
so the client should be still redirected to the old node that will
handle the redirection elsewhere.
2013-02-28 15:54:29 +01:00
antirez
de254ffdb4 Cluster: better handling of slots changes in PONG packets.
The new code makes sure that the node slots bitmap is always consistent
with the cluster->slots array.
2013-02-28 15:41:54 +01:00
antirez
4d353074ba Cluster: refactoring of clusterNode*Bit to use helper bitmap functions. 2013-02-28 15:23:09 +01:00
antirez
a535d4b477 Cluster: use node->numslots instead of popcount() where possible. 2013-02-28 15:13:32 +01:00
antirez
7348dbc52c Cluster: new field in cluster node structure, "numslots".
Before a relatively slow popcount() operation was needed every time we
needed to get the number of slots served by a given cluster node.
Now we just need to check an integer that is taken in sync with the
bitmap.
2013-02-28 15:11:05 +01:00
antirez
bee0907fa4 Cluster: don't gossip about nodes that are not useful to the cluster. 2013-02-28 15:00:09 +01:00
antirez
1faf992b16 Cluster: CLUSTER FORGET implemented. 2013-02-27 17:55:59 +01:00
antirez
fae00c9998 Cluster: added a missing return on CLUSTER SETSLOT. 2013-02-27 17:53:48 +01:00
antirez
1841332778 Cluster: blank node address when flagging it as NOADDR. 2013-02-27 17:09:33 +01:00
antirez
48718d5dad Cluster: add comments in sub-sections of CLUSTER command. 2013-02-27 16:12:59 +01:00
antirez
646785ae48 Use GCC printf format attribute for redisLog().
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
antirez
a63b1196e0 Cluster: a few random fixes to the new failure detection. 2013-02-26 15:15:44 +01:00
antirez
2dec530e15 Cluster: log the event when we clear the FAIL flag. 2013-02-26 15:03:38 +01:00
antirez
79212927fb Cluster: use the failure report API to reimplement failure detection.
The new system detects a failure only when there is quorum from masters.
2013-02-26 14:58:39 +01:00
antirez
7ae78905f9 Cluster: invert two functions declarations in more natural order. 2013-02-26 11:19:48 +01:00
antirez
3f8fa86902 Cluster: cleanup idle failure reports every time we remove one.
This is not very important as anyway when the function counting the
number of reports is called the cleanup is performed. However with this
change if only part of the nodes that reported the failure will report
the node is back ok, we'll cleanup the older entries ASAP. In complex
split net split scenarios, and when we are dealing with clusters having
nodes in the order of ~ 1000, this can save some CPU.
2013-02-26 11:15:18 +01:00
antirez
d4b7d73115 Cluster: new function clusterNodeDelFailureReport() for failure reports.
This is the missing part of the API that will be used to reimplement
failure detection of Cluster nodes.
2013-02-25 19:13:22 +01:00
antirez
0b25267f52 Cluster: no limits for the count parameter of CLUSTER GETKEYSINSLOT.
Not sure why I set a limit to 1 million keys, there is no reason for
this artificial limit, and anyway this is s a stupid limit because it is
already high enough to create latency issues. So let's the users shoot
on their feet because maybe they just actually know what they are doing.
2013-02-25 12:41:13 +01:00
antirez
4b019d22cd Cluster: validate slot number in CLUSTER COUNTKEYSINSLOT. 2013-02-25 12:40:32 +01:00
antirez
5b1e7d3849 Cluster: new sub-command CLUSTER COUNTKEYSINSLOT.
The new sub-command uses the new countKeysInSlot() API and allows a
cluster client to get the number of keys for a given hashslot.
2013-02-25 12:04:31 +01:00
antirez
1f51d687dd Cluster: verifyClusterConfigWithData() implemented. 2013-02-25 11:43:49 +01:00
antirez
301f162a2d Cluster: fix case for getKeysInSlot() and countKeysInSlot().
Redis functions start in low case. A few functions about cluster were
capitalized the wrong way.
2013-02-25 11:25:40 +01:00
antirez
a4b9ffec27 Cluster: use CountKeysInSlot() when we just need the count. 2013-02-25 11:23:04 +01:00
antirez
0b86a5e2d1 Cluster: added stub for verifyClusterConfigWithData().
See the top-comment for the function in this commit for details about
what the function is supposed to do.
2013-02-25 11:20:17 +01:00
antirez
023a57a9d5 Cluster: if no previous config exists, create the myself node as master. 2013-02-22 19:24:01 +01:00
antirez
0c179977d8 Cluster: add cluster_size field in CLUSTER INFO output. 2013-02-22 19:20:38 +01:00
antirez
66cadd3ec9 Cluster: new state information, cluster size.
The definition of cluster size is: the number of known nodes in the
cluster that are masters and serving at least an hash slot.
2013-02-22 19:18:30 +01:00
antirez
6877d08470 Cluster: remove warning adding clusterNodeSetSlotBit() prototype. 2013-02-22 17:45:49 +01:00
antirez
f66a189d4d Cluster: introduced a failure reports system.
A §Redis Cluster node used to mark a node as failing when itself
detected a failure for that node, and a single acknowledge was received
about the possible failure state.

The new API will be used in order to possible to require that N other
nodes have a PFAIL or FAIL state for a given node for a node to set it
as failing.
2013-02-22 17:43:35 +01:00
antirez
491bdb8d61 Cluster: more correct update of slots state when PONG is received. 2013-02-21 16:52:06 +01:00
antirez
e9c34fedfd Call clusterUpdateState() after CLUSTER SETSLOT too. 2013-02-21 16:31:22 +01:00
antirez
4edf3759db Aesthetic change to make a line more 80-cols friendly. 2013-02-21 16:24:48 +01:00
antirez
1455ba6e36 Cluster: clusterAddSlot() was not doing what stated in the comment. 2013-02-21 11:51:17 +01:00
antirez
0b0a8884fc Cluster: always use cluster(Add|Del)Slot to modify the cluster slots table. 2013-02-21 11:44:58 +01:00
antirez
1e2a5d119c Use RESTORE-ASKING for MIGRATE when in cluster mode. 2013-02-20 17:36:54 +01:00
antirez
a9624604ed Cluster: new command flag forcing implicit ASKING.
Also using this new flag the RESTORE-ASKING command was implemented that
will be used by MIGRATE.
2013-02-20 17:28:35 +01:00
antirez
f6a8ba8de5 Cluster: I/O errors are now logged at DEBUG level. 2013-02-20 13:18:51 +01:00
antirez
79f3570266 Cluster: sanity checks on the cluster bus message length. 2013-02-15 16:44:39 +01:00
antirez
47838ff689 Cluster: make valgrind happy initializing all the bytes of the node IP. 2013-02-15 12:58:35 +01:00