3610 Commits

Author SHA1 Message Date
antirez
24aa7a566e Cluster: new node field fail_time.
This is the unix time at which we set the FAIL flag for the node.
It is only valid if FAIL is set.

The idea is to use it in order to make the cluster more robust, for
instance in order to revert a FAIL state if it is long-standing but
still slots are assigned to this node, that is, no one is going to fix
these slots apparently.
2013-03-05 13:15:05 +01:00
antirez
61885aa646 Cluster: don't check keys hash slots when the source is our master.
Usually we redirect clients to the right hash slot, however we don't
want to do that with our master, we want just to mirror it.
2013-03-05 13:02:44 +01:00
antirez
2c63db3b93 Cluster: slaveof not allowed in redis.conf as well. 2013-03-05 12:58:22 +01:00
antirez
39c5f8a615 Cluster: SLAVEOF command not allowed in cluster mode. 2013-03-05 12:39:41 +01:00
antirez
a06092c576 Cluster: A comment updated in clusterCron(). 2013-03-05 12:17:30 +01:00
antirez
de720e4f0a Cluster: send a ping to every node we never contacted in timeout/2 seconds.
Usually we try to send just 1 ping every second, however when we detect
we are going to have unreliable failure detection because we can't ping
some node in time, send an additional ping.

This should only happen with very large clusters or when the the node
timeout is set to a very low value.
2013-03-05 12:16:02 +01:00
antirez
401893bc38 Cluster: set node->slaveof correctly when a node state is updated. 2013-03-05 11:50:11 +01:00
antirez
8eb9c13dcf Cluster: don't perform startup slots sanity check for slaves.
If we are a cluster node the DB content will not match our configured
slots. Don't do the check at all.
2013-03-04 19:47:00 +01:00
antirez
63cac938b3 Cluster: fix maximum line length when loading config.
There are pathological cases where the line can be even longer a single
node may contain all the slots in importing/migrating state.
2013-03-04 19:45:36 +01:00
antirez
7cf96d66ef Make sure replicationSetMaster() works when ip argument is not an sds. 2013-03-04 15:39:55 +01:00
antirez
9e34d02450 Cluster: actually setup replication in CLUSTER REPLICATE. 2013-03-04 15:27:58 +01:00
antirez
06fa5f82d7 SLAVEOF command refactored into a proper API.
We now have replicationSetMaster() and replicationUnsetMaster() that can
be called in other contexts (for instance Redis Cluster).
2013-03-04 13:22:21 +01:00
antirez
e6da336d19 Cluster: REPLICATE subcommand and stub for clusterSetMaster(). 2013-03-04 13:15:09 +01:00
charsyam
466e2de67d adding check error code
adding check error code
2013-03-04 11:20:11 +01:00
antirez
824879435c redis-cli: use keepalive socket option.
This should improve things in two ways:

1) Prevent timeouts caused by the execution of long commands.
2) Improve detection of real connection errors.

This is mostly effective only on Linux because of the bogus default
keepalive settings. In Linux we have OS-specific calls to set the
keepalive interval to reasonable values.
2013-03-04 11:14:32 +01:00
Salvatore Sanfilippo
22911ac922 Merge pull request #967 from 0x20h/fix-git-diff
suppress external diff program when using git diff.
2013-03-04 01:57:01 -08:00
Stam He
e795f82438 point 2 of slave-serve-stale-data miss '-' between 'stale' and 'data' 2013-03-04 10:49:07 +01:00
antirez
19833a01f2 Cluster: don't set the slot as unassigned because of PONG info.
As stated in the comment this is usually due to a resharding in progress
so the client should be still redirected to the old node that will
handle the redirection elsewhere.
2013-02-28 15:54:29 +01:00
antirez
de254ffdb4 Cluster: better handling of slots changes in PONG packets.
The new code makes sure that the node slots bitmap is always consistent
with the cluster->slots array.
2013-02-28 15:41:54 +01:00
antirez
4d353074ba Cluster: refactoring of clusterNode*Bit to use helper bitmap functions. 2013-02-28 15:23:09 +01:00
antirez
a535d4b477 Cluster: use node->numslots instead of popcount() where possible. 2013-02-28 15:13:32 +01:00
antirez
7348dbc52c Cluster: new field in cluster node structure, "numslots".
Before a relatively slow popcount() operation was needed every time we
needed to get the number of slots served by a given cluster node.
Now we just need to check an integer that is taken in sync with the
bitmap.
2013-02-28 15:11:05 +01:00
antirez
bee0907fa4 Cluster: don't gossip about nodes that are not useful to the cluster. 2013-02-28 15:00:09 +01:00
antirez
5e5b822277 redis-trib: skip nodes without slots when creating the config signature. 2013-02-28 13:12:56 +01:00
antirez
f7e497ce38 redis-trib help. 2013-02-27 18:02:22 +01:00
antirez
1faf992b16 Cluster: CLUSTER FORGET implemented. 2013-02-27 17:55:59 +01:00
antirez
fae00c9998 Cluster: added a missing return on CLUSTER SETSLOT. 2013-02-27 17:53:48 +01:00
antirez
8d4ae49ece redis-trib: skip noaddr and disconnected nodes while loading cluster info. 2013-02-27 17:23:11 +01:00
antirez
1841332778 Cluster: blank node address when flagging it as NOADDR. 2013-02-27 17:09:33 +01:00
antirez
48718d5dad Cluster: add comments in sub-sections of CLUSTER command. 2013-02-27 16:12:59 +01:00
antirez
5b28421929 redis-trib: initial implementation of addnode command. 2013-02-27 15:58:41 +01:00
antirez
cabbb64d80 Remove warning when printing redisBuildId(). 2013-02-27 12:33:27 +01:00
antirez
c3a453268b Remove too agressive/spamming log in rdb.c. 2013-02-27 12:29:20 +01:00
antirez
646785ae48 Use GCC printf format attribute for redisLog().
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
antirez
3fe686b697 Better panic message for failed time event creation. 2013-02-27 12:00:11 +01:00
Stam He
df3489aeb7 add a check for aeCreateTimeEvent
1) Add a check for aeCreateTimeEvent in function initServer.
2013-02-27 11:57:35 +01:00
Stam He
2fabb412ca Set proctitle: avoid the use of __attribute__((constructor)).
This cased a segfault in some Linux system and was GCC-specific.

Commit modified by @antirez:

1) Stripped away the part to set the proc title via config for now.
2) Handle initialization of setproctitle only when the replacement
   is used.
3) Don't require GCC now that the attribute constructor is no
   longer used.
2013-02-27 11:50:35 +01:00
antirez
ac7f6bac6f setproctitle.c: declar tmp as static so valgrind will not detect a leak. 2013-02-26 15:51:33 +01:00
antirez
a63b1196e0 Cluster: a few random fixes to the new failure detection. 2013-02-26 15:15:44 +01:00
antirez
2dec530e15 Cluster: log the event when we clear the FAIL flag. 2013-02-26 15:03:38 +01:00
antirez
79212927fb Cluster: use the failure report API to reimplement failure detection.
The new system detects a failure only when there is quorum from masters.
2013-02-26 14:58:39 +01:00
antirez
6386d8ffc3 Set process name in ps output to make operations safer.
This commit allows Redis to set a process name that includes the binding
address and the port number in order to make operations simpler.

Redis children processes doing AOF rewrites or RDB saving change the
name into redis-aof-rewrite and redis-rdb-bgsave respectively.

This in general makes harder to kill the wrong process because of an
error and makes simpler to identify saving children.

This feature was suggested by Arnaud GRANAL in the Redis Google Group,
Arnaud also pointed me to the setproctitle.c implementation includeed in
this commit.

This feature should work on all the Linux, OSX, and all the three major
BSD systems.
2013-02-26 11:52:12 +01:00
antirez
7ae78905f9 Cluster: invert two functions declarations in more natural order. 2013-02-26 11:19:48 +01:00
antirez
3f8fa86902 Cluster: cleanup idle failure reports every time we remove one.
This is not very important as anyway when the function counting the
number of reports is called the cleanup is performed. However with this
change if only part of the nodes that reported the failure will report
the node is back ok, we'll cleanup the older entries ASAP. In complex
split net split scenarios, and when we are dealing with clusters having
nodes in the order of ~ 1000, this can save some CPU.
2013-02-26 11:15:18 +01:00
antirez
d4b7d73115 Cluster: new function clusterNodeDelFailureReport() for failure reports.
This is the missing part of the API that will be used to reimplement
failure detection of Cluster nodes.
2013-02-25 19:13:22 +01:00
Salvatore Sanfilippo
b00beac84e Merge pull request #969 from serphen/unstable
Fix error "repl-backlog-size must be 1 or greater"
2013-02-25 09:38:15 -08:00
Arnaud Granal
4780c5313c Fix error "repl-backlog-size must be 1 or greater"
The parameter repl-backlog-size is not parsed correctly in the configuration file. argv[0] is parsed instead of argv[1].
2013-02-25 19:25:48 +02:00
antirez
0b25267f52 Cluster: no limits for the count parameter of CLUSTER GETKEYSINSLOT.
Not sure why I set a limit to 1 million keys, there is no reason for
this artificial limit, and anyway this is s a stupid limit because it is
already high enough to create latency issues. So let's the users shoot
on their feet because maybe they just actually know what they are doing.
2013-02-25 12:41:13 +01:00
antirez
4b019d22cd Cluster: validate slot number in CLUSTER COUNTKEYSINSLOT. 2013-02-25 12:40:32 +01:00
antirez
e1b0d246f9 Cluster: use O(log(N)) algo for countKeysInSlot(). 2013-02-25 12:37:50 +01:00