448 Commits

Author SHA1 Message Date
Jiahao Huang
69227d8b76 in 32bit machine, popcount don't work with a input string length up to 512 MB,
bitcount commant may return negtive integer with string length more than 256 MB
2013-05-07 19:55:57 +08:00
antirez
4f96bde1e2 Cluster: link reconnection on delayed PONG reply.
When the PONG delay is half the cluster node timeout, the link gets
disconnected (and later automatically reconnected) in order to ensure
that it's not just a dead connection issue.

However this operation is only performed if the link is old enough, in
order to avoid to disconnect the same link again and again (and among
the other problems, never receive the PONG because of that).

Note: when the link is reconnected, the 'ping_sent' field is not updated
even if a new ping is sent using the new connection, so we can still
reliably detect a node ping timeout.
2013-05-03 15:43:03 +02:00
antirez
bd638d72ee Config option to turn AOF rewrite incremental fsync on/off. 2013-04-24 10:57:07 +02:00
antirez
ef70f8f36e AOF: sync data on disk every 32MB when rewriting.
This prevents the kernel from putting too much stuff in the output
buffers, doing too heavy I/O all at once. So the goal of this commit is
to split the disk pressure due to the AOF rewrite process into smaller
spikes.

Please see issue #1019 for more information.
2013-04-24 10:26:31 +02:00
antirez
442af928f9 Cluster: use server.cluster_node_timeout directly.
We used to copy this value into the server.cluster structure, however this
was not necessary.

The reason why we don't directly use server.cluster->node_timeout is
that things that can be configured via redis.conf need to be directly
available in the server structure as server.cluster is allocated later
only if needed in order to reduce the memory footprint of non-cluster
instances.
2013-04-09 11:24:18 +02:00
antirez
a8c26a0397 Cluster: configdigest field no longer used. Removed. 2013-04-09 11:07:25 +02:00
antirez
90293ad01b Cluster: move REDIS_CLUSTER_FAILOVER_DELAY near other timing defines. 2013-04-04 14:23:34 +02:00
antirez
0c0db1bc3d Cluster: node timeout is now configurable. 2013-04-04 12:29:10 +02:00
antirez
2e9c57f2aa Cluster: turn hardcoded node timeout multiplicators into defines.
Most Redis Cluster time limits are expressed in terms of the configured
node timeout. Turn them into defines.
2013-04-04 12:04:11 +02:00
antirez
a9d031c771 Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day that is reasonable, the sysadmin
should notice before of catastrofic events especially since by default
Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
2013-04-02 14:05:50 +02:00
antirez
b9e31495c4 DEBUG set-active-expire added.
We need the ability to disable the activeExpireCycle() (active
expired key collection) call for testing purposes.
2013-03-27 17:55:02 +01:00
antirez
20998c9f35 Cluster: new flag PROMOTED introduced.
A slave node set this flag for itself when, after receiving authorization
from the majority of nodes, it turns itself into a master.

At the same time now this flag is tested by nodes receiving a PING
message before reconfiguring after a failover event. This makes the
system more robust: even if currently there is no way to manually turn
a slave into a master it is possible that we'll have such a feature in
the future, or that simply because of misconfiguration a node joins the
cluster as master while others believe it's a slave. This alone is now
no longer enough to trigger reconfiguration as other nodes will check
for the PROMOTED flag.

The PROMOTED flag is cleared every time the node is turned back into a
replica of some other node.
2013-03-20 10:48:42 +01:00
antirez
27c1fe7c94 Cluster: add sender flags in cluster bus messages header.
Sender flags were not propagated for the sender, but only for nodes in
the gossip section. This is odd and in the next commits we'll need to
get updated flags for the sender node, so this commit adds a new field
in the cluster messages header.

The message header is the same size as we reused some free space that
was marked as 'unused' because of alignment concerns.
2013-03-20 10:32:00 +01:00
antirez
300c6c17aa Cluster: slaves start failover with a small delay.
Redis Cluster can cope with a minority of nodes not informed about the
failure of a master in time for some reason (netsplit or node not
functioning properly, blocked, ...) however to wait a few seconds before
to start the failover will make most "normal" failovers simpler as the
FAIL message will propagate before the slave election happens.
2013-03-15 16:39:49 +01:00
antirez
63e3bc7cb3 Cluster: handle FAILOVER_AUTH_ACK messages.
That's trivial as we just need to increment the count of masters that
received with an ACK.
2013-03-14 16:43:13 +01:00
antirez
444fd457d2 Cluster: FAILOVER_AUTH_REQUEST message type introduced.
This message is sent by a slave that is ready to failover its master to
other nodes to get the authorization from the majority of masters.
2013-03-13 17:21:20 +01:00
antirez
80158107bd Cluster: clusterHandleSlaveFailover() stub. 2013-03-13 13:10:49 +01:00
antirez
dc1e158648 REDIS_DBCRON_DBS_PER_SEC -> REDIS_DBCRON_DBS_PER_CALL 2013-03-09 11:44:20 +01:00
antirez
51c65e01ea Only resize/rehash a few databases per cron iteration.
This is the first step to lower the CPU usage when many databases are
configured. The other is to also process a limited number of DBs per
call in the active expire cycle.
2013-03-08 14:01:12 +01:00
antirez
311f9d5164 Cluster: clusterUpdateState() function simplified.
Also the NEEDHELP Cluster state was removed as it will no longer be
used by Redis Cluster.
2013-03-06 18:25:40 +01:00
antirez
9076bf9d4b API to lookup commands with their original name.
A new server.orig_commands table was added to the server structure, this
contains a copy of the commant table unaffected by rename-command
statements in redis.conf.

A new API lookupCommandOrOriginal() was added that checks both tables,
new first, old later, so that rewriteClientCommandVector() and friends
can lookup commands with their new or original name in order to fix the
client->cmd pointer when the argument vector is renamed.

This fixes the segfault of issue #986, but does not fix a wider range of
problems resulting from renaming commands that actually operate on data
and are registered into the AOF file or propagated to slaves... That is
command renaming should be handled with care.
2013-03-06 16:28:26 +01:00
antirez
24aa7a566e Cluster: new node field fail_time.
This is the unix time at which we set the FAIL flag for the node.
It is only valid if FAIL is set.

The idea is to use it in order to make the cluster more robust, for
instance in order to revert a FAIL state if it is long-standing but
still slots are assigned to this node, that is, no one is going to fix
these slots apparently.
2013-03-05 13:15:05 +01:00
antirez
06fa5f82d7 SLAVEOF command refactored into a proper API.
We now have replicationSetMaster() and replicationUnsetMaster() that can
be called in other contexts (for instance Redis Cluster).
2013-03-04 13:22:21 +01:00
antirez
7348dbc52c Cluster: new field in cluster node structure, "numslots".
Before a relatively slow popcount() operation was needed every time we
needed to get the number of slots served by a given cluster node.
Now we just need to check an integer that is taken in sync with the
bitmap.
2013-02-28 15:11:05 +01:00
antirez
646785ae48 Use GCC printf format attribute for redisLog().
This commit also fixes redisLog() statements producing warnings.
2013-02-27 12:27:15 +01:00
antirez
6386d8ffc3 Set process name in ps output to make operations safer.
This commit allows Redis to set a process name that includes the binding
address and the port number in order to make operations simpler.

Redis children processes doing AOF rewrites or RDB saving change the
name into redis-aof-rewrite and redis-rdb-bgsave respectively.

This in general makes harder to kill the wrong process because of an
error and makes simpler to identify saving children.

This feature was suggested by Arnaud GRANAL in the Redis Google Group,
Arnaud also pointed me to the setproctitle.c implementation includeed in
this commit.

This feature should work on all the Linux, OSX, and all the three major
BSD systems.
2013-02-26 11:52:12 +01:00
antirez
e1b0d246f9 Cluster: use O(log(N)) algo for countKeysInSlot(). 2013-02-25 12:37:50 +01:00
antirez
301f162a2d Cluster: fix case for getKeysInSlot() and countKeysInSlot().
Redis functions start in low case. A few functions about cluster were
capitalized the wrong way.
2013-02-25 11:25:40 +01:00
antirez
a4b9ffec27 Cluster: use CountKeysInSlot() when we just need the count. 2013-02-25 11:23:04 +01:00
antirez
0b86a5e2d1 Cluster: added stub for verifyClusterConfigWithData().
See the top-comment for the function in this commit for details about
what the function is supposed to do.
2013-02-25 11:20:17 +01:00
antirez
66cadd3ec9 Cluster: new state information, cluster size.
The definition of cluster size is: the number of known nodes in the
cluster that are masters and serving at least an hash slot.
2013-02-22 19:18:30 +01:00
antirez
f66a189d4d Cluster: introduced a failure reports system.
A §Redis Cluster node used to mark a node as failing when itself
detected a failure for that node, and a single acknowledge was received
about the possible failure state.

The new API will be used in order to possible to require that N other
nodes have a PFAIL or FAIL state for a given node for a node to set it
as failing.
2013-02-22 17:43:35 +01:00
antirez
a9624604ed Cluster: new command flag forcing implicit ASKING.
Also using this new flag the RESTORE-ASKING command was implemented that
will be used by MIGRATE.
2013-02-20 17:28:35 +01:00
antirez
79f3570266 Cluster: sanity checks on the cluster bus message length. 2013-02-15 16:44:39 +01:00
antirez
6c809ab43a Cluster: move cluster config file out of config state.
This makes us able to avoid allocating the cluster state structure if
cluster is not enabled, but still we can handle the configuration
directive that sets the cluster config filename.
2013-02-14 15:20:02 +01:00
antirez
2531fea2a5 Cluster: the cluster state structure is now heap allocated. 2013-02-14 13:20:56 +01:00
antirez
e60743e0fc Cluster: from 4096 to 16384 hash slots. 2013-02-14 12:49:16 +01:00
antirez
cb6ff7f5d1 Return a specific NOAUTH error if authentication is required. 2013-02-12 16:25:41 +01:00
antirez
12a3bf6245 Replication: added new stats counting full and partial resynchronizations. 2013-02-12 15:33:54 +01:00
antirez
75512d94d9 PSYNC: work in progress, preview #2, rebased to unstable. 2013-02-12 12:52:21 +01:00
antirez
96c5190ed9 Use replicationFeedSlaves() to send PING to slaves.
A Redis master sends PING commands to slaves from time to time: doing
this ensures that even if absence of writes, the master->slave channel
remains active and the slave can feel the master presence, instead of
closing the connection for timeout.

This commit changes the way PINGs are sent to slaves in order to use the
standard interface used to replicate all the other commands, that is,
the function replicationFeedSlaves().

With this change the stream of commands sent to every slave is exactly
the same regardless of their exact state (Transferring RDB for first
synchronization or slave already online). With the previous
implementation the PING was only sent to online slaves, with the result
that the output stream from master to slaves was not identical for all
the slaves: this is a problem if we want to implement partial resyncs in
the future using a global replication stream offset.

TL;DR: this commit should not change the behaviour in practical terms,
but is just something in preparation for partial resynchronization
support.
2013-02-12 12:50:28 +01:00
antirez
08d5aa6a20 Emit SELECT to slaves in a centralized way.
Before this commit every Redis slave had its own selected database ID
state. This was not actually useful as the emitted stream of commands
is identical for all the slaves.

Now the the currently selected database is a global state that is set to
-1 when a new slave is attached, in order to force the SELECT command to
be re-emitted for all the slaves.

This change is useful in order to implement replication partial
resynchronization in the future, as makes sure that the stream of
commands received by slaves, including SELECT commands, are exactly the
same for every slave connected, at any time.

In this way we could have a global offset that can identify a specific
piece of the master -> slaves stream of commands.
2013-02-12 12:50:28 +01:00
antirez
96ef42d768 Set SO_KEEPALIVE on client sockets if configured to do so. 2013-02-08 16:40:59 +01:00
antirez
d698a264d2 TCP_NODELAY after SYNC: changes to the implementation. 2013-02-05 12:04:30 +01:00
charsyam
6c7473623e Turn off TCP_NODELAY on the slave socket after SYNC.
Further details from @antirez:

It was reported by @StopForumSpam on Twitter that the Redis replication
link was strangely using multiple TCP packets for multiple commands.
This wastes a lot of bandwidth and is due to the TCP_NODELAY option we
enable on the socket after accepting a new connection.

However the master -> slave channel is a one-way channel since Redis
replication is asynchronous, so there is no point in trying to reduce
the latency, we should aim to reduce the bandwidth. For this reason this
commit introduces the ability to disable the nagle algorithm on the
socket after a successful SYNC.

This feature is off by default because the delay can be up to 40
milliseconds with normally configured Linux kernels.
2013-02-05 12:04:25 +01:00
antirez
f28d386cc5 Keyspace events: it is now possible to select subclasses of events.
When keyspace events are enabled, the overhead is not sever but
noticeable, so this commit introduces the ability to select subclasses
of events in order to avoid to generate events the user is not
interested in.

The events can be selected using redis.conf or CONFIG SET / GET.
2013-01-28 13:15:12 +01:00
antirez
3394db6361 Fix decrRefCount() prototype from void to robj pointer.
decrRefCount used to get its argument as a void* pointer in order to be
used as destructor where a 'void free_object(void*)' prototype is
expected. However this made simpler to introduce bugs by freeing the
wrong pointer. This commit fixes the argument type and introduces a new
wrapper called decrRefCountVoid() that can be used when the void*
argument is needed.
2013-01-28 13:14:53 +01:00
antirez
766a541a0a Keyspace events notification API. 2013-01-28 13:14:36 +01:00
guiquanz
df7a5b7157 Fixed many typos. 2013-01-19 10:59:44 +01:00
antirez
1472cae960 CLIENT GETNAME and CLIENT SETNAME introduced.
Sometimes it is much simpler to debug complex Redis installations if it
is possible to assign clients a name that is displayed in the CLIENT
LIST output.

This is the case, for example, for "leaked" connections. The ability to
provide a name to the client makes it quite trivial to understand what
is the part of the code implementing the client not releasing the
resources appropriately.

Behavior:

    CLIENT SETNAME: set a name for the client, or remove the current
                    name if an empty name is set.
    CLIENT GETNAME: get the current name, or a nil.
    CLIENT LIST: now displays the client name if any.

Thanks to Mark Gravell for pushing this idea forward.
2013-01-15 13:34:10 +01:00