100 Commits

Author SHA1 Message Date
antirez
1d2eaf4cb4 serverCron() frequency is now a runtime parameter (was REDIS_HZ).
REDIS_HZ is the frequency our serverCron() function is called with.
A more frequent call to this function results in lower latency when the
server is trying to handle very expensive background operations like
mass expires of a lot of keys at the same time.

Redis 2.4 used to have an HZ of 10. This was good enough with almost
every setup, but the incremental key expiration algorithm was working a
bit better under *extreme* pressure when HZ was set to 100 for Redis
2.6.

However, for most users a latency spike of 30 milliseconds when millions
of keys are expiring at the same time is acceptable. On the other hand, a
default HZ of 100 in Redis 2.6 was causing idle instances to use some
CPU time compared to Redis 2.4. The CPU usage was on the order of 0.3%
for an idle instance, which is a shame because the server consumes more
energy, if not important resources.

This commit introduces HZ as a runtime parameter, that can be queried by
INFO or CONFIG GET, and can be modified with CONFIG SET. At the same
time the default frequency is set back to 10.

In this way we default to a sane value of 10, but allow users to
easily switch to values up to 500 for near real-time applications if
needed and if they are willing to pay this small CPU usage penalty.
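
The relation between HZ and the time between two serverCron() calls can
be sketched as follows. This is only an illustration, not the Redis
source: the names clamp_hz and cron_period_ms are hypothetical, and the
lower bound of 1 is an assumption (the message only states the default
of 10 and the upper bound of 500).

```c
#define HZ_DEFAULT 10   /* default stated in the commit message */
#define HZ_MAX     500  /* upper bound stated in the commit message */
#define HZ_MIN     1    /* assumption: the message gives no lower bound */

/* Clamp a requested frequency into [HZ_MIN, HZ_MAX]. */
int clamp_hz(int hz) {
    if (hz < HZ_MIN) return HZ_MIN;
    if (hz > HZ_MAX) return HZ_MAX;
    return hz;
}

/* Milliseconds between two timer calls at the given frequency
 * (integer division, so the result is rounded down). */
int cron_period_ms(int hz) {
    return 1000 / clamp_hz(hz);
}
```

With these definitions an HZ of 10 gives a 100 ms period, while an HZ of
100 gives a 10 ms period, which is why the higher setting wakes up an
idle instance ten times more often.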
2012-12-14 17:10:40 +01:00
antirez
a32d1ddff6 BSD license added to every C source and header file. 2012-11-08 18:31:32 +01:00
antirez
b754312cfe Unix socket clients properly displayed in MONITOR and CLIENT LIST.
This also fixes issue #745.
2012-11-01 22:10:45 +01:00
antirez
7784b364aa "Timeout receiving bulk data" error message modified.
The new message now contains a hint about modifying the repl-timeout
configuration directive if the problem persists.

This should normally not be needed, because while the master generates
the RDB file it makes sure to send newlines to the replication channel
to prevent timeouts. However there are times when masters running on
very slow systems can completely stop for seconds during the RDB saving
process. In such a case enlarging the timeout value can fix the problem.

See issue #695 for an example of this problem in an EC2 deployment.
2012-10-04 11:52:16 +02:00
antirez
9b3dea9357 Fix compilation on FreeBSD. Thanks to @koobs on twitter. 2012-09-17 12:46:06 +02:00
Salvatore Sanfilippo
7c903f144c Merge pull request #576 from saj/fix-slave-ping-period
Bug fix: slaves being pinged every second
2012-09-05 06:59:37 -07:00
antirez
217c473cf9 Send an async PING before starting replication with master.
During the first synchronization step of the replication process, a Redis
slave connects with the master in a non blocking way. However once the
connection is established the replication continues sending the REPLCONF
command, and sometimes the AUTH command if needed. Those commands are
sent in a partially blocking way (blocking with a timeout on the order of
seconds).

Because it is common for a blocked master to accept connections even if
it is actually not able to reply to the slave requests, it was easy for
a slave to block if the master had serious issues, but was still able to
accept connections in the listening socket.

For this reason we now send an asynchronous PING request just after the
non blocking connection has completed successfully, and wait for the
reply before continuing with the replication process. It is very
unlikely that a master replying to PING can't reply to the other
commands.

This solution was proposed by Didier Spezia (Thanks!) so that we don't
need to turn all the replication process into a non blocking affair, but
still the probability of a slave blocked is minimal even in the event of
a failing master.

Also we now use getsockopt(SO_ERROR) in order to check errors ASAP
in the event handler, instead of waiting for actual I/O to return an
error.
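
A minimal sketch of the SO_ERROR check, assuming a POSIX system. The
function name pending_socket_error is hypothetical and this is not the
code from the commit; only getsockopt(2) with SOL_SOCKET and SO_ERROR is
the real API.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Return 0 if no error is pending on the socket, otherwise an errno
 * value. Meant to be called from the event handler as soon as the
 * non blocking connect() makes the socket readable or writable. */
int pending_socket_error(int fd) {
    int sockerr = 0;
    socklen_t len = sizeof(sockerr);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &sockerr, &len) == -1)
        return errno;   /* getsockopt() itself failed, e.g. EBADF */
    return sockerr;     /* 0 means the connection has no pending error */
}
```

A non zero return value (for example ECONNREFUSED) lets the handler
abort the attempt immediately instead of discovering the failure on the
first read or write.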

This commit fixes issue #632.
2012-09-02 12:24:38 +02:00
antirez
8df82f58a6 Incrementally flush RDB on disk while loading it from a master.
This fixes issue #539.

Basically if there is enough free memory the OS may buffer the RDB file
that the slave transfers on disk from the master. The file may
actually be flushed on disk at once by the operating system when it gets
closed by Redis, causing the close system call to block for a long time.

This patch is a modified version of one provided by yoav-steinberg of
@garantiadata (the original version was posted in the issue #539
comments), and tries to flush the OS buffers incrementally (every 8 MB
of loaded data).
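
The idea of flushing incrementally can be sketched as below. This is an
illustration under assumptions, not the patch itself: the incr_writer
type and incr_write function are hypothetical, and plain fsync(2) is
used as the flush primitive, which may differ from what the actual
patch calls. The threshold is a field so that it can be set to 8 MB or
to something smaller for experiments.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    int fd;               /* destination file */
    size_t sync_every;    /* flush after this many bytes, e.g. 8*1024*1024 */
    size_t unsynced;      /* bytes written since the last flush */
    unsigned long syncs;  /* number of flushes done so far */
} incr_writer;

/* Write len bytes and flush whenever sync_every bytes have accumulated.
 * Returns 0 on success, -1 on error. */
int incr_write(incr_writer *w, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(w->fd, buf, len);
        if (n == -1 && errno == EINTR) continue;  /* interrupted: retry */
        if (n <= 0) return -1;
        buf += n;
        len -= (size_t)n;
        w->unsynced += (size_t)n;
        if (w->unsynced >= w->sync_every) {
            if (fsync(w->fd) == -1) return -1;
            w->unsynced = 0;
            w->syncs++;
        }
    }
    return 0;
}
```

Because the dirty data never grows much past the threshold, the final
close has little left to write out.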
2012-08-28 12:47:33 +02:00
Saj Goonatilleke
bc235fe40f Bug fix: slaves being pinged every second
REDIS_REPL_PING_SLAVE_PERIOD controls how often the master should
transmit a heartbeat (PING) to its slaves.  This period, which defaults
to 10, is measured in seconds.

Redis 2.4 masters used to ping their slaves every ten seconds, just like
it says on the tin.

The Redis 2.6 masters I have been experimenting with, on the other hand,
ping their slaves *every second*.  (master_last_io_seconds_ago never
approaches 10.)  I think the ping period was inadvertently slashed to
one-tenth of its nominal value around the time REDIS_HZ was introduced.
This commit reintroduces correct ping schedule behaviour.
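
The arithmetic behind a ping period shrinking to one-tenth can be
illustrated as follows. This is a hypothetical sketch, not the code
touched by this commit: it assumes a loop counter that is incremented
once per cron call, and all function names are invented for the example.

```c
/* Cron iterations that span `seconds` when the cron runs `hz` times
 * per second. */
long loops_for_seconds(long seconds, long hz) {
    return seconds * hz;
}

/* 1 if a task with the given period is due on this iteration. */
int task_is_due(long cronloops, long period_seconds, long hz) {
    return cronloops % loops_for_seconds(period_seconds, hz) == 0;
}

/* Real period, in seconds, when the multiplier was chosen for
 * assumed_hz but the counter actually advances actual_hz times per
 * second. */
double effective_period_seconds(long period_seconds, long assumed_hz,
                                long actual_hz) {
    return (double)(period_seconds * assumed_hz) / (double)actual_hz;
}
```

Under these assumptions, a 10 second period scaled for 10 calls per
second but driven by a counter that advances 100 times per second
fires every second, matching the behaviour described above.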
2012-07-05 14:29:27 +10:00
antirez
3a7ddfde77 Typo in comment. 2012-06-27 11:26:44 +02:00
antirez
cee3bfd049 REPLCONF internal command introduced.
The REPLCONF command is an internal command (not designed to be directly
used by normal clients) that allows a slave to set some replication
related state in the master before issuing SYNC to start the
replication.

The initial motivation for this command, and the only reason it is
currently used by the implementation, is to let the slave instance
communicate its listening port to the master, so that the master can
show all the slaves with their listening ports in the "replication"
section of the INFO output.

This allows clients to auto discover and query all the slaves attached
to a master.

Currently only a single option of the REPLCONF command is supported, and
it is called "listening-port", so the slave now starts the replication
process with something like the following chat:

    REPLCONF listening-port 6380
    SYNC

Note that this works even if the master is an older version of Redis and
does not understand REPLCONF, because the slave ignores the REPLCONF
error.

In the future REPLCONF can be used for partial replication and other
replication related features where there is the need to exchange
information between master and slave.

NOTE: This commit also fixes a bug: the INFO output already carried
information about slaves, but the port was broken, and was obtained
with getpeername(2), so it was actually just the ephemeral port used
by the slave to connect to the master as a client.
2012-06-27 09:43:57 +02:00
antirez
b5a89c926d Dead code removed from replication.c.
The user @jokea noticed that the following line of code in
replication.c made little sense:

    addReplySds(slave,sdsempty());

Investigating a bit I found that this was introduced by commit 6208b3a7
three years ago in the early stages of Redis. The code apparently is not
useful at all, so I'm removing it.

This change will not be backported into 2.4 so that in the rare case
this should introduce a bug, we'll have a chance to detect it in the
development branch. However following the code path it seems like the
code is not useful at all, so the risk is truly small.
2012-05-24 11:35:21 +02:00
antirez
62f9e1e630 Remove useless trailing space in SYNC command sent to master. 2012-05-02 21:47:53 +02:00
David Tran
7434b94a10 Spelling: s/synchrnonization/synchronization 2012-04-25 12:21:56 -07:00
antirez
a17aa84adf syncio.c calls in replication.c fixed for the new millisecond timeout API. 2012-03-31 11:23:30 +02:00
antirez
18c912fc45 Purely aesthetic code change. 2012-03-30 10:39:34 +02:00
Joseph Jang
2e7ff37280 Fixed a memory leak with replication
occurs when two or more dbs are replicated and at least one of them is >db10
2012-03-30 10:34:29 +02:00
antirez
b72df2df2c Fix for slaves chains. Force resync of slaves (simply disconnecting them) when SLAVEOF turns a master into a slave. 2012-03-29 09:24:02 +02:00
Premysl Hruby
f3fa6655c6 use server.unixtime instead of time(NULL) where possible (cluster.c not checked though) 2012-03-27 17:39:58 +02:00
antirez
8c0915285d Better MONITOR output, now includes client ip:port or the lua string if the command was executed by the scripting engine. 2012-03-07 12:12:15 +01:00
antirez
f77aff1f05 Ping the slave using the standard protocol instead of the inline one. 2012-02-29 16:33:54 +01:00
antirez
379502c014 Don't change the replication state if SLAVEOF is called with arguments specifying the same master we are already connected with. This fixes issue #290. 2012-01-16 11:29:47 +01:00
antirez
ac6de3d151 Fixed replication when multiple slaves are attaching at the same time. The output buffer was not copied correctly between slaves. This fixes issue #141. 2011-12-30 19:40:43 +01:00
antirez
64afc922e6 server.replstate -> server.repl_state 2011-12-21 12:23:18 +01:00
antirez
4aa527ba09 some RDB server struct fields renamed. 2011-12-21 12:22:13 +01:00
antirez
6bb4b565ff AOF refactoring, now with three states: ON, OFF, WAIT_REWRITE. 2011-12-21 10:31:34 +01:00
antirez
55092fe167 AOF fixes in the context of replication (when AOF is used by slave) and CONFIG SET appendonly yes/no. 2011-12-15 16:07:49 +01:00
antirez
9913ca1c7b Replication bug fixed: now non blocking connect is also forced to follow the configured replication timeout. 2011-11-30 15:35:16 +01:00
antirez
b64f417d3c 7c6da73 2011-10-31 11:13:28 +01:00
antirez
4fd387090e Return from syncWithMaster() ASAP if the event fired but the instance is no longer a slave. This should fix Issue #145. 2011-10-18 11:15:11 +02:00
antirez
fbd6d884f7 Two fixes for replication: Slave performs the AOF rewrite at the right point. Non blocking connect also uses readable handler as with old Linux kernels like 2.6.18 on connection refused the writable event is not fired (kernel bug). 2011-06-09 15:39:12 +02:00
Pieter Noordhuis
567194bf5d Make replication faster (biggest gain for small number of slaves) 2011-05-30 12:45:07 +02:00
Pieter Noordhuis
5bd18cc33e Configurable synchronous I/O timeout 2011-05-22 12:58:18 +02:00
Pieter Noordhuis
fedabe3991 Minor changes in non-blocking repl. connect 2011-05-22 12:51:09 +02:00
Pieter Noordhuis
6ecfff06a8 Non-blocking connect with master 2011-05-19 18:54:57 +02:00
antirez
226157c42c suppress a Linux warning, for 2.2 sake 2011-02-21 17:51:52 +01:00
antirez
3f3b6a3c6f Fixed issue #435 and at the same time introduced explicit ping in the master-slave channel that will detect a blocked master or a TCP link that is broken even if apparently connected. 2011-01-20 13:18:23 +01:00
Pieter Noordhuis
6e4ac1ba44 Zero-pad timestamps in MONITOR output
Original report and fix:
http://code.google.com/p/redis/issues/detail?id=404
2010-12-14 17:39:34 +01:00
antirez
dea1672cf3 Fix for bug 374, thanks to Jeremy Zawodny for reporting and tracing why it was crashing. 2010-11-12 20:02:20 +01:00
antirez
e36081f49f more replication info in logs 2010-11-04 18:14:20 +01:00
antirez
f64a6a60f0 non blocking slave replication is now more non blocking than the first implementation... 2010-11-04 18:09:35 +01:00
antirez
86f411a86b typos and minor stuff fixed in the new non blocking replication code 2010-11-04 17:35:03 +01:00
antirez
87bd3ca8fd first attempt to non blocking implementation of slave replication and SYNC bulk data download. Never compiled so far... 2010-11-04 17:29:53 +01:00
antirez
db874d3c9a synchronous I/O networking functions originally used just for replication refactored in a file as generally useful, they are used in the cluster branch for MIGRATE. 2010-10-24 16:22:52 +02:00
Pieter Noordhuis
3ab203762f Use specialized function to add status and error replies 2010-09-02 23:33:06 +02:00
antirez
09252fc4f3 Fixed another instance of the Issue 173 2010-08-27 12:46:10 +02:00
antirez
b91d605a35 slave now detects lost connection during SYNC, fixing Issue 173 2010-08-24 16:25:00 +02:00
antirez
778b2210a9 slave with attached slaves now closes the connection to all the slaves when the connection to the master is lost. Now a slave without a connected link to the master will refuse SYNC from other slaves. Enhanced the replication error reporting. All this will fix Issue 156 2010-08-24 16:04:13 +02:00
antirez
d3b958c3fc Fixed MONITOR output for consistency: now integer encoded values are also formatted like this: "3932" 2010-07-01 20:22:46 +02:00
antirez
e2641e09cc redis.c split into many different C files.
networking related stuff moved into networking.c

moved more code

more work on layout of source code

SDS instantaneous memory saving. By Pieter and Salvatore at VMware ;)

cleanly compiling again after the first split, now splitting it in more C files

moving more things around... work in progress

split replication code

splitting more

Sets split

Hash split

replication split

even more splitting

more splitting

minor change
2010-07-01 14:38:51 +02:00