2882 Commits

Author SHA1 Message Date
antirez
16bc1ae5f4 Sentinel: failover restart time is now multiple of failover timeout.
Also defaulf failover timeout changed to 3 minutes as the failover is a
fairly fast procedure most of the times, unless there are a very big
number of slaves and the user picked to configure them sequentially (in
that case the user should change the failover timeout accordingly).
2013-11-18 11:30:08 +01:00
antirez
7b7763ff3e Sentinel: state machine and timeouts simplified. 2013-11-18 11:12:58 +01:00
antirez
00cad98228 Sentinel: election timeout define. 2013-11-18 10:08:06 +01:00
antirez
297df0e910 Sentinel: fix address of master in Hello messages.
Once we switched configuration during a failover, we should advertise
the new address.

This was a serious race condition as the Sentinel performing the
failover for a moment advertised the old address with the new
configuration epoch: once trasmitted to the other Sentinels the broken
configuration would remain there forever, until the next failover
(because a greater configuration epoch is required to overwrite an older
one).
2013-11-14 10:25:55 +01:00
antirez
a031dc6efe Sentinel: master address selection in get-master-address refactored. 2013-11-14 10:23:54 +01:00
antirez
d1864486fc Sentinel: fix conditional to only affect slaves with wrong master. 2013-11-14 10:23:05 +01:00
antirez
f4216da4f1 Sentinel: simplify and refactor slave reconfig code. 2013-11-14 00:36:43 +01:00
antirez
5ceb9dc93e Sentinel: reconfigure slaves to right master. 2013-11-14 00:29:38 +01:00
antirez
e7fb6c5697 Sentinel: remember last time slave changed master. 2013-11-14 00:20:15 +01:00
antirez
4608f5da94 Sentinel: redirect-to-master is not ok with new algorithm.
Now Sentinel believe the current configuration is always the winner and
should be applied by Sentinels instead of trying to adapt our view of
the cluster based on what we observe.

So the only way to modify what a Sentinel believe to be the truth is to
win an election and advertise the new configuration via Pub / Sub with a
greater configuration epoch.
2013-11-13 17:03:48 +01:00
antirez
f81306af33 Sentinel: safer slave reconfig, master reported role should match. 2013-11-13 17:02:09 +01:00
antirez
406fee011d Sentinel: role reporting fixed and added in SENTINEL output. 2013-11-13 16:39:57 +01:00
antirez
cc721419ac Sentinel: being a master and reporting as slave is considered SDOWN. 2013-11-13 16:28:56 +01:00
antirez
b27dd8b9d3 Sentinel: make sure role_reported is always updated. 2013-11-13 16:21:58 +01:00
antirez
72629ad358 Sentinel: track role change time. Wait before reconfigurations. 2013-11-13 16:18:23 +01:00
antirez
e1d06fdeb9 Sentinel: fix no-down check in master->slave conversion code. 2013-11-13 13:43:59 +01:00
antirez
47598f4a88 Sentinel: readd slaves back after a master reset. 2013-11-13 13:01:11 +01:00
antirez
f5b5a8eb9f Sentinel: sentinelResetMaster() new flag to avoid removing set of sentinels.
This commit also removes some dead code and cleanup generic flags.
2013-11-13 10:30:45 +01:00
antirez
9896fd51e0 Sentinel: receive Pub/Sub messages from slaves. 2013-11-12 23:07:33 +01:00
antirez
37203602f9 Sentinel: change event name when converting master to slave. 2013-11-12 23:00:17 +01:00
antirez
4d365131b0 Sentinel: added config-epoch to SENTINEL masters output. 2013-11-12 17:22:04 +01:00
antirez
1e74623446 Sentinel: new failover algo, desync slaves and update config epoch. 2013-11-12 17:07:31 +01:00
antirez
61794132b2 Sentinel: when starting failover seek for votes ASAP. 2013-11-12 16:38:02 +01:00
antirez
d63c15923f Sentinel: +new-epoch events. 2013-11-12 13:35:25 +01:00
antirez
ed01b141e8 Sentinel: wait some time between failover attempts. 2013-11-12 13:30:31 +01:00
antirez
8246e06ee9 Sentinel: allow to vote for myself. 2013-11-12 11:32:40 +01:00
antirez
c53ced91ed Sentinel: fix PUBLISH to masters and slaves. 2013-11-12 11:12:48 +01:00
antirez
0e359cc68e Sentinel: epoch introduced in leader vote. 2013-11-12 11:09:35 +01:00
antirez
6273e87b4a Sentinel: leadership handling changes WIP.
Changes to leadership handling.

Now the leader gets selected by every Sentinel, for a specified epoch,
when the SENTINEL is-master-down-by-addr is sent.

This command now includes the runid and the currentEpoch of the instance
seeking for a vote. The Sentinel only votes a single time in a given
epoch.

Still a work in progress, does not even compile at this stage.
2013-11-11 18:30:14 +01:00
antirez
e46942e4b0 Sentinel: handle Hello messages received via slaves correctly.
Even when messages are received via the slave, we should perform
operations (like adding a new Sentinel) in the context of the master.
2013-11-11 17:12:27 +01:00
antirez
3d7000f281 Sentinel: remove code not useful in the new design. 2013-11-11 12:06:11 +01:00
antirez
49170f6c12 Sentinel: epoch introduced.
Sentinel state now includes the idea of current epoch and config epoch.
In the Hello message, that is now published both on masters and slaves,
a Sentinel no longer just advertises itself but also broadcasts its
current view of the configuration: the master name / ip / port and its
current epoch.

Sentinels receiving such information switch to the new master if the
configuration epoch received is newer and the ip / port of the master
are indeed different compared to the previos ones.
2013-11-11 11:05:58 +01:00
antirez
c46f655c90 Log to what master a slave is going to connect to. 2013-11-11 09:25:36 +01:00
antirez
e159239f9c Cluster: removed not needed newline at end of redisLog() msg. 2013-11-08 17:28:02 +01:00
antirez
a67935e5e3 Cluster: send a single UPDATE packet for now. 2013-11-08 17:25:49 +01:00
antirez
a146482c83 Cluster: replace hardcoded 4096 for bus msg len with sizeof(). 2013-11-08 17:19:19 +01:00
antirez
36db83ac50 Cluster: slots update refactored + UPDATE msg processing.
Now there is a function that handles the update of the local slot
configuration every time we have some new info about a node and its set
of served slots and configEpoch.

Moreoever the UPDATE packets are now processed when received (it was a
work in progress in the previous commit).
2013-11-08 17:02:10 +01:00
antirez
a19147c2fa Cluster: UPDATE msg data structure and sending function. 2013-11-08 16:26:50 +01:00
antirez
4666966c9f Cluster: refactoring of slots update code and more.
The commit also introduces detection of nodes publishing not updated
configuration. More work in progress to send an UPDATE packet to inform
of the config change.
2013-11-08 10:32:16 +01:00
antirez
8c2127f9c9 Fix broken rdbWriteRaw() return value check in rdb.c.
Thanks to @PhoneLi for reporting.
2013-11-07 23:53:18 +01:00
antirez
605ba6ad38 redis-trib: fixed slot allocation when --replicas is used. 2013-11-07 16:12:06 +01:00
antirez
b64dbf7387 Sentinel: sentinelSendSlaveOf() was missing a var and the prototype. 2013-11-06 11:23:53 +01:00
antirez
3be2b18d9f Sentinel: increment pending_commands counter in two more places.
AUTH and SCRIPT KILL were sent without incrementing the pending commands
counter. Clearly this needs some kind of wrapper doing it for the caller
in order to be less bug prone.
2013-11-06 11:21:44 +01:00
antirez
1071abc4a4 Sentinel: always send CONFIG REWRITE when changing instance role.
This change makes Sentinel less fragile about a number of failure modes.

This commit also fixes a different bug as a side effect, SLAVEOF command
was sent multiple times without incrementing the pending commands count.
2013-11-06 11:13:27 +01:00
antirez
1a1eb8bc8d SCAN code refactored to parse cursor first.
The previous implementation of SCAN parsed the cursor in the generic
function implementing SCAN, SSCAN, HSCAN and ZSCAN.

The actual higher-level command implementation only checked for empty
keys and return ASAP in that case. The result was that inverting the
arguments of, for instance, SSCAN for example and write:

    SSCAN 0 key

Instead of

    SSCAN key 0

Resulted into no error, since 0 is a non-existing key name very likely.
Just the iterator returned no elements at all.

In order to fix this issue the code was refactored to extract the
function to parse the cursor and return the error. Every higher level
command implementation now parses the cursor and later checks if the key
exist or not.
2013-11-05 15:47:50 +01:00
antirez
f31cf249b9 SCAN: when iterating ziplists or intsets always return cursor of 0.
The previous implementation assumed that the first call always happens
with cursor set to 0, this may not be the case, and we want to return 0
anyway otherwise the (broken) client code will loop forever.
2013-11-05 15:32:25 +01:00
antirez
b97cff8b63 Use strtoul() instead of sscanf() in SCAN implementation. 2013-11-05 15:30:21 +01:00
antirez
56f53f7b3c HSCAN/ZSCAN: skip value when matching.
This fixes issue #1360 and #1362.
2013-11-05 12:16:29 +01:00
antirez
f6738923a6 Cluster: initialize senderConfigEpoch and senderCurrentEpoch for warnings suppression. 2013-11-05 12:01:07 +01:00
antirez
522549729f Pass int64_t to intsetGet() instead of long long. 2013-11-05 11:57:30 +01:00