futriix

Author	SHA1	Message	Date
antirez	b10f3f08a6	Cluster: log granted failover authorizations.	2014-06-10 16:56:08 +02:00
antirez	1074266da1	Cluster: log configEpoch updates to myself.	2014-06-10 16:38:36 +02:00
antirez	d71f39115f	Cluster: log when a master denies a failover auth.	2014-06-10 16:07:26 +02:00
antirez	bc552a87e7	Cluster: cluster_my_epoch added to CLUSTER INFO output.	2014-06-10 11:35:40 +02:00
antirez	cff2c3661a	Cluster: check that configEpoch never goes back. Since there are ways to alter the configEpoch outside of the failover procedure (for example CLUSTER SET-CONFIG-EPOCH and via the configEpoch collision resolution algorithm), always make sure, before replacing our configEpoch with a new one, that it is greater than the current one.	2014-06-07 14:37:09 +02:00
antirez	b732d42a80	Cluster: SET-CONFIG-EPOCH should update currentEpoch. SET-CONFIG-EPOCH, used by redis-trib at cluster creation time, failed to update the currentEpoch, making it possible after a failover for a server to set its configEpoch to a value smaller than the current one (since configEpochs are obtained using currentEpoch). The bug totally breaks the Redis Cluster algorithms and protocols, allowing for permanent split brain conditions about the slots configuration as shown in issue #1799.	2014-06-07 14:25:47 +02:00
antirez	2f6c99fc22	Cluster: always allow ok -> fail switch in clusterUpdateState(). There is a time defined by REDIS_CLUSTER_WRITABLE_DELAY where fail -> ok switch is not possible after startup as a master for some time, however the contrary (ok -> fail) should always be possible.	2014-05-26 16:24:12 +02:00
antirez	aec8e92316	Cluster: slave validity factor is now user configurable. Check the commit changes in the example redis.conf for more information.	2014-05-22 16:57:54 +02:00
antirez	352f4fbbd5	Cluster: use clusterSetNodeAsMaster() during slave failover. clusterHandleSlaveFailover() was reimplementing what clusterSetNodeAsMaster() does without any good reason.	2014-05-15 17:03:28 +02:00
antirez	b1d19fd6e6	Cluster: clear todo_before_sleep flags when executing actions. Thanks to this change, when there is some code like: clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|...); ... and later before returning to the event loop ... clusterUpdateState(); The clusterUpdateState() function will clear the flag and will not be repeated in the clusterBeforeSleep() function. This is especially important for config save/fsync flags which are slow to execute and not a good idea to repeat without a good reason. This is implemented for all the CLUSTER_TODO flags.	2014-05-15 16:33:13 +02:00
antirez	73b3fcde40	Fixed typo in CLUSTER RESET implementation.	2014-05-15 12:33:57 +02:00
antirez	b5765f3a73	CLUSTER RESET implemented. The new command is able to reset a cluster node so that it starts again as a fresh node. By default the command performs a soft reset (the same as calling it as CLUSTER RESET SOFT), and the following steps are performed: 1) All slots are set as unassigned. 2) The list of known nodes is flushed. 3) Node is set as master if it is a slave. When a hard reset is performed with CLUSTER RESET HARD the following additional operations are performed: 4) A new Node ID is created at random. 5) Epochs are set to 0. CLUSTER RESET is useful both when the sysadmin wants to reconfigure a node with a different role (for example turning a slave into a master) and for testing purposes. It also may play a role in automatically provisioned Redis Clusters, since it allows resetting a node back to the initial state in order to be reconfigured.	2014-05-15 11:43:06 +02:00
antirez	f48f8fda62	Remove trailing spaces from cluster.c file.	2014-05-15 10:18:36 +02:00
antirez	0ca22608a8	Cluster: don't accept cluster bus connections during startup.	2014-05-14 12:05:00 +02:00
antirez	716b729ab9	Cluster: better handling of stolen slots. The previous code handling a lost slot (by another master with a higher configuration for the slot) was defensive, considering it an error and putting the cluster in an odd state requiring redis-cli fix. This was changed, because actually this only happens either in a legitimate way, with failovers, or when the admin messed with the config in order to reconfigure the cluster. So the new code instead will try to make sure that the keys stored match the new slots map, by removing all the keys in the slots we lost ownership of. The function that deletes the keys from the lost slots is called only if the node does not lose all its slots (resulting in a reconfiguration as a slave of the node that got ownership). This is an optimization since the replication code will anyway flush all the instance data in a faster way.	2014-05-14 10:46:37 +02:00
antirez	e7b8d75ba3	Cluster: fixed data_age computation / check integer overflow.	2014-05-12 17:46:15 +02:00
antirez	28eb9c3f39	Cluster: forced failover implemented. Using CLUSTER FAILOVER FORCE it is now possible to failover a master in a forced way, which means: 1) No check to understand if the master is up is performed. 2) No data age of the slave is checked. Even a slave with very old data can manually failover a master in this way. 3) No chat with the master is attempted to reach its replication offset: the master can just be down.	2014-05-12 16:34:20 +02:00
antirez	b70dc36c8a	Cluster: bypass data_age check for manual failovers. Automatic failovers only happen in Redis Cluster if the slave trying to be elected was disconnected from its master for no more than 10 times the node-timeout value. However there should be no such check for manual failovers, since these are initiated by the sysadmin that, in theory, knows what she is doing when a slave is selected to be promoted.	2014-05-12 16:12:12 +02:00
antirez	9cbccf9a6b	RESTORE: reply with -BUSYKEY special error code. The error when the target key is busy was a generic one, while it makes sense to be able to distinguish between the target key busy error and the others easily.	2014-05-12 10:01:59 +02:00
antirez	89afa27bb7	CLUSTER MEET: better error messages when address is invalid. Fixes issue #1734.	2014-05-09 16:36:59 +02:00
antirez	ab05314244	Cluster: bulk-accept new nodes connections. The same change was operated for normal client connections. This is important for Cluster as well, since when a node rejoins the cluster, when a partition heals or after a restart, it gets flooded with new connection attempts by all the other nodes trying to form a full mesh again.	2014-05-09 11:52:59 +02:00
antirez	0478d73098	Cluster: clusterAcceptHandler() comments updated to match the code.	2014-05-09 11:44:46 +02:00
antirez	835af28f4e	CLUSTER SET-CONFIG-EPOCH implemented. Initially Redis Cluster accepted that after cluster creation all the nodes were at configEpoch 0, evolving from zero as failovers happen. However later the semantic was made more strict in order to make sure a cluster has always all the master nodes with a different configEpoch, which is more robust in some corner cases (especially resulting from errors by the system administrator). To assign different configEpochs to different nodes at startup was a task performed naturally by the config conflicts resolution algorithm (see the Cluster specification). However this works well only for small clusters or when there are actually just a few collisions, since it is designed for exceptional cases. When a large cluster is created hundreds of nodes can be at epoch 0, so the conflict resolution code is slow to provide a unique config to each node. For this reason this new command was introduced. It can be called only when a node is totally fresh: no other nodes known, and configEpoch set to zero, so it is safe even against misuses. redis-trib will use the new command in order to start the cluster already setting an incremental unique config to every node.	2014-04-29 19:15:16 +02:00
antirez	fe8ce2b064	clusterLoadConfig() REDIS_ERR retval semantics refined. We should return REDIS_ERR to signal we can't read the configuration because there is no config file only after checking errno, otherwise we risk to rewrite an existing file that was not accessible for some other reason.	2014-04-24 16:23:03 +02:00
antirez	52668c900f	Lock nodes.conf to avoid multiple processes using the same file. This was a common source of problems among users. The solution adopted is not bullet-proof as if the user deletes the nodes.conf file manually, and starts a new instance with the same nodes.conf file path, two instances will use the same file. However following this reasoning the user may drop a nuclear bomb into the datacenter as well.	2014-04-24 16:04:10 +02:00
kingsumos	5456d30213	fix cluster node description showing wrong slot allocation	2014-04-22 11:44:53 -04:00
antirez	76f102e460	Add casting to match printf format. adjustOpenFilesLimit() and clusterUpdateSlotsWithConfig() were assuming uint64_t is the same as unsigned long long, which is true probably for all the systems out there that we target, but still GCC emitted a warning since technically they are two different types.	2014-04-07 08:58:06 +02:00
antirez	0f816d9867	Cluster: last_vote_epoch -> lastVoteEpoch. Use camel case for epochs that are persisted on disk.	2014-03-27 15:01:24 +01:00
antirez	befba3795d	Cluster: save/restore vars that must persist after recovery. This fixes issue #1479.	2014-03-27 14:56:29 +01:00
antirez	d28faade17	Cluster: handshake "already known" error logged to VERBOSE. This is not really an error but something that always happens for example when creating a new cluster, or if the sysadmin rejoins manually a node that is already known. Since useless logs don't help, moved to VERBOSE level.	2014-03-26 16:35:38 +01:00
antirez	52dddf2503	Cluster: clusterHandleConfigEpochCollision() fixed. New config epochs must always be obtained incrementing the currentEpoch, that is itself guaranteed to be >= the max configEpoch currently known to the node.	2014-03-26 12:31:28 +01:00
antirez	5c2bc5207a	Cluster: better logging for clusterUpdateSlotsConfigWith().	2014-03-26 12:09:38 +01:00
antirez	242646544a	Cluster: CLUSTER SETSLOT implementation comment updated. Update the comment since the implementation details changed.	2014-03-25 17:50:46 +01:00
antirez	45ddae6110	Cluster: configEpoch collisions resolution. The slave election in Redis Cluster guarantees that slaves promoted to masters always end with unique config epochs, however failures during manual reshardings, software bugs and operational errors may in theory cause two nodes to have the same configEpoch. This commit introduces a mechanism to eventually always end with different configEpochs if a collision ever happens. As a (wanted) side effect, this also ensures that after a new cluster is created, all nodes will end with a different configEpoch automatically.	2014-03-25 17:19:58 +01:00
antirez	4bc0c232e0	Cluster: stay within 80 cols.	2014-03-25 16:07:14 +01:00
antirez	74f0696064	struct dictEntry -> dictEntry.	2014-03-20 16:20:37 +01:00
antirez	274573d489	Cluster: update node configEpoch on UPDATE messages. The UPDATE message contains the configEpoch of the node configuration advertised in the packet. Update it if needed.	2014-03-11 11:53:09 +01:00
antirez	7c07fa63c0	Cluster: set slot error if we receive an update for a busy slot. By manually modifying nodes configurations in random ways, it is possible to create the following scenario: A is serving keys for slot 10. B is manually configured to serve keys for slot 10. A receives an update from B (or another node) where it is informed that the slot 10 is now claimed by B with a greater configuration epoch, however A still has keys from slot 10. With this commit A will put the slot in error setting it in IMPORTING state, so that redis-trib can detect the issue.	2014-03-11 11:49:47 +01:00
antirez	614800dc0d	Cluster: clarified a comment in clusterUpdateSlotsConfigWith().	2014-03-11 11:32:40 +01:00
antirez	062d865c54	Cluster: flush importing/migrating state when master is turned into slave.	2014-03-11 11:22:06 +01:00
antirez	7f8e78732a	Cluster: clusterCloseAllSlots() added.	2014-03-11 11:16:18 +01:00
antirez	24f7ef6e3b	Cluster: getKeysFromCommand() API cleaned up. This API originated from the "diskstore" experiment, not for Redis Cluster itself, so there were legacy/useless things trying to differentiate between keys that are going to be overwritten and keys that need to be fetched from disk (preloaded). All useless with Cluster, so removed with the result of code simplification.	2014-03-10 13:18:41 +01:00
antirez	c23983f658	Cluster: abort on port too high error. It also fixes multi-line comment style to be consistent with the rest of the code base. Related to #1555.	2014-03-10 10:41:27 +01:00
Salvatore Sanfilippo	0df110002e	Merge pull request #1555 from mattsta/cluster-port-error-out Cluster port error out	2014-03-10 10:37:50 +01:00
antirez	128c4f600e	Cluster: be explicit about passing NULL as bind addr for connect. The code was already correct but it was relying on bindaddr[0] being set to NULL as a side effect of current implementation if no bind address is configured. This is not guaranteed to hold true in the future.	2014-03-10 10:33:53 +01:00
antirez	20c637ba53	Cluster: log error when anetTcpNonBlockBindConnect() fails.	2014-03-10 10:32:28 +01:00
Salvatore Sanfilippo	7c15239ce5	Merge pull request #1567 from mattsta/fix-cluster-join Bind source address for cluster communication	2014-03-10 10:28:32 +01:00
antirez	964ee1343f	Cluster: better timeout and retry time for failover. When node-timeout is too small, in the order of a few milliseconds, there is no way the voting process can terminate during that time, so we set a lower limit for the failover timeout of two seconds. The retry time is set to two times the failover timeout time, so it is at least 4 seconds.	2014-03-10 09:57:52 +01:00
antirez	a7bbbab9dc	Cluster: fix conditional generating TRYAGAIN error.	2014-03-07 16:18:00 +01:00
antirez	e7022731e0	Redis Cluster: support for multi-key operations.	2014-03-07 13:19:09 +01:00
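
As a rough illustration of the timing rule described in commit 964ee1343f (a two-second floor for the failover timeout, with the retry time set to twice the timeout), here is a minimal C sketch. The type and function names are hypothetical, and deriving the base timeout as twice the node timeout is an assumption that the commit message does not state.

```c
#include <assert.h>

/* Hypothetical container for the two values discussed in the commit. */
typedef struct {
    long long auth_timeout_ms; /* how long a failover attempt may last */
    long long auth_retry_ms;   /* minimum delay before a new attempt */
} failover_timing;

/* Compute failover timing from the configured node timeout (milliseconds).
 * ASSUMPTION: the base timeout is node_timeout * 2. The commit message only
 * guarantees the 2000 ms floor and that retry = 2 * timeout. */
failover_timing compute_failover_timing(long long node_timeout_ms) {
    failover_timing t;

    assert(node_timeout_ms >= 0);
    t.auth_timeout_ms = node_timeout_ms * 2;
    if (t.auth_timeout_ms < 2000) t.auth_timeout_ms = 2000; /* lower limit */
    t.auth_retry_ms = t.auth_timeout_ms * 2; /* therefore at least 4000 ms */
    return t;
}
```

Under these assumptions, a node timeout of a few milliseconds still yields a 2000 ms timeout and a 4000 ms retry time, which matches the lower bounds the commit message describes.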

393 Commits
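
Several of the commits listed above (cff2c3661a, b732d42a80, 52dddf2503) revolve around one invariant: a node's configEpoch must never move backwards, and currentEpoch must stay greater than or equal to every configEpoch the node knows. The following C sketch shows one way such a guard could look. The struct and function names are hypothetical and are not taken from cluster.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a node's epoch state. */
typedef struct {
    uint64_t current_epoch; /* highest epoch this node has seen */
    uint64_t config_epoch;  /* epoch of this node's own slot configuration */
} epoch_state;

/* Adopt an externally supplied configEpoch (for example from a
 * SET-CONFIG-EPOCH style command) only if it is strictly greater than the
 * current one. currentEpoch is raised too, so it never lags behind.
 * Returns 1 if the epoch was adopted, 0 if it was rejected. */
int adopt_config_epoch(epoch_state *s, uint64_t candidate) {
    assert(s != NULL);
    if (candidate <= s->config_epoch) return 0; /* never go back */
    s->config_epoch = candidate;
    if (s->current_epoch < candidate) s->current_epoch = candidate;
    return 1;
}

/* Obtain a fresh configEpoch by incrementing currentEpoch, so the new value
 * is greater than any configEpoch tracked in this state. */
uint64_t next_config_epoch(epoch_state *s) {
    assert(s != NULL);
    s->current_epoch++;
    s->config_epoch = s->current_epoch;
    return s->config_epoch;
}
```

In this sketch, skipping the currentEpoch update in adopt_config_epoch() would let a later next_config_epoch() call return a value smaller than the adopted one, which is the kind of regression commit b732d42a80 describes.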