492 Commits

Author SHA1 Message Date
antirez
cde7093b10 HLLCOUNT implemented. 2014-03-28 17:37:18 +01:00
antirez
abb0b21919 HLLADD implemented. 2014-03-28 16:24:35 +01:00
antirez
61f3239044 HLLSELFTEST command implemented.
To test the bitfield array of counters set/get macros from the Redis Tcl
suite is hard, so a specialized command that is able to test the
internals was developed.
2014-03-28 12:11:55 +01:00
Matt Stancliff
246a1771db Add REDIS_MIN_RESERVED_FDS define for open fds
Also update the original REDIS_EVENTLOOP_FDSET_INCR to
include REDIS_MIN_RESERVED_FDS. REDIS_EVENTLOOP_FDSET_INCR
exists to make sure more than (maxclients+RESERVED) entries
are allocated, but we can only guarantee that if we include
the current value of REDIS_MIN_RESERVED_FDS as a minimum
for the INCR size.
2014-03-24 13:15:35 -04:00
Matt Stancliff
b8247b4686 Fix maxclients error handling
Everywhere in the Redis code base, maxclients is treated
as an int with (int)maxclients or `maxclients = atoi(source)`,
so let's make maxclients an int.

This fixes a bug where someone could specify a negative maxclients
on startup and it would work (as well as set maxclients very high)
because:

    unsigned int maxclients;
    char *update = "-300";
    maxclients = atoi(update);
    if (maxclients < 1) goto fail;

But, (maxclients < 1) can only catch the case when maxclients
is exactly 0.  maxclients happily sets itself to -300, which isn't
-300, but rather 4294966996, which isn't < 1, so... everything
"worked."

maxclients config parsing checks for the case of < 1, but maxclients
CONFIG SET parsing was checking for case of < 0 (allowing
maxclients to be set to 0).  CONFIG SET parsing is now updated to
match config parsing of < 1.

It's tempting to add a MINIMUM_CLIENTS define, but... I didn't.

These changes were inspired by antirez#356, but this doesn't
fix that issue.
2014-03-24 10:17:33 -04:00
antirez
d84c84f268 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:00:20 +01:00
antirez
385bc5807e The default maxmemory policy is now noeviction.
This is safer as by default maxmemory should just set a memory limit
without any key to be deleted, unless the policy is set to something
more relaxed.
2014-03-21 08:03:34 +01:00
antirez
40dc99ffd2 Use 24 bits for the lru object field and improve resolution.
There were 2 spare bits inside the Redis object structure that are now
used in order to enlarge 4x the range of the LRU field.

At the same time the resolution was improved from 10 to 1 second: this
still provides 194 days before the LRU counter overflows (restarting from
zero).

This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
2014-03-20 17:56:27 +01:00
antirez
c1c4fc292b Default LRU samples is now 5. 2014-03-20 17:05:42 +01:00
antirez
1d1db42f20 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
2014-03-20 11:57:29 +01:00
antirez
d7a2dd9f2e Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
2014-03-20 11:47:12 +01:00
antirez
c847202e8b Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-20 11:37:27 +01:00
antirez
2f5a67dbca Specify LRU resolution in milliseconds. 2014-03-20 11:33:25 +01:00
antirez
a8127d9fcb Set LRU parameters via REDIS_LRU_BITS define. 2014-03-20 11:22:47 +01:00
antirez
624be01ff0 Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-19 12:55:49 +01:00
antirez
e695beb0f7 Cluster: SORT get keys helper implemented. 2014-03-10 16:26:08 +01:00
antirez
fa29266526 Cluster: evalGetKey() added for EVAL/EVALSHA.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two set of commands can no longer be served
by an unique keys-extraction function.
2014-03-10 15:26:13 +01:00
antirez
24f7ef6e3b Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-10 13:18:41 +01:00
zhanghailei
45c373db00 refer to updateLRUClock's comment REDIS_LRU_CLOCK_MAX is 22 bits,but #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) only 21 bits 2014-03-04 12:20:31 +08:00
antirez
eeb949a94f Initial implementation of BITPOS.
It appears to work but more stress testing, and both unit tests and
fuzzy testing, is needed in order to ensure the implementation is sane.
2014-02-27 12:44:27 +01:00
antirez
bea68cc21f Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not updated the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
antirez
84152ddd22 AOF: don't abort on write errors unless fsync is 'always'.
A system similar to the RDB write error handling is used, in which when
we can't write to the AOF file, writes are no longer accepted until we
are able to write again.

For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
2014-02-12 16:11:36 +01:00
antirez
c94826dfdf CLIENT PAUSE and related API implemented.
The API is one of the bulding blocks of CLUSTER FAILOVER command that
executes a manual failover in Redis Cluster. However exposed as a
command that the user can call directly, it makes much simpler to
upgrade a standalone Redis instance using a slave in a safer way.

The commands works like that:

    CLIENT PAUSE <milliesconds>

All the clients that are not slaves and not in MONITOR state are paused
for the specified number of milliesconds. This means that slaves are
normally served in the meantime.

At the end of the specified amount of time all the clients are unblocked
and will continue operations normally. This command has no effects on
the population of the slow log, since clients are not blocked in the
middle of operations but only when there is to process new data.

Note that while the clients are unblocked, still new commands are
accepted and queued in the client buffer, so clients will likely not
block while writing to the server while the pause is active.
2014-02-04 16:16:09 +01:00
antirez
d0db6b9fcf Scripting: use mstime() and mstime_t for lua_time_start.
server.lua_time_start is expressed in milliseconds. Use mstime_t instead
of long long, and populate it with mstime() instead of ustime()/1000.

Functionally identical but more natural.
2014-02-03 15:45:40 +01:00
antirez
2d253d1543 Option "backlog" renamed "tcp-backlog".
This is especially important since we already have a concept of backlog
(the replication backlog).
2014-01-31 14:56:10 +01:00
Nenad Merdanovic
ca81272ea4 Add support for listen(2) backlog definition
In high RPS environments, the default listen backlog is not sufficient, so
giving users the power to configure it is the right approach, especially
since it requires only minor modifications to the code.
2014-01-31 14:52:10 +01:00
antirez
bc300b22af Cluster: configurable replicas migration barrier.
It is possible to configure the min number of additional working slaves
a master should be left with, for a slave to migrate to an orphaned
master.
2014-01-31 11:26:36 +01:00
antirez
6ece04b1fc Cluster: function clusterGetSlaveRank() added.
Return the number of slaves for the same master having a better
replication offset of the current slave, that is, the slave "rank" used
to pick a delay before the request for election.
2014-01-29 16:39:04 +01:00
antirez
fdab41fe65 Cluster: support to read from slave nodes.
A client can enter a special cluster read-only mode using the READONLY
command: if the client read from a slave instance after this command,
for slots that are actually served by the instance's master, the queries
will be processed without redirection, allowing clients to read from
slaves (but without any kind fo read-after-write guarantee).

The READWRITE command can be used in order to exit the readonly state.
2014-01-14 16:33:16 +01:00
antirez
639613b3f0 Set REDIS_AOF_REWRITE_MIN_SIZE to 64mb.
64mb is the default value in redis.conf. For some reason instead the
hard-coded default was 1mb that is too small.
2014-01-14 11:27:28 +01:00
antirez
c0cdcaf373 Don't send REPLCONF ACK to old masters.
Masters not understanding REPLCONF ACK will reply with errors to our
requests causing a number of possible issues.

This commit detects a global replication offest set to -1 at the end of
the replication, and marks the client representing the master with the
REDIS_PRE_PSYNC flag.

Note that this flag was called REDIS_PRE_PSYNC_SLAVE but now it is just
REDIS_PRE_PSYNC as it is used for both slaves and masters starting with
this commit.

This commit fixes issue #1488.
2014-01-08 14:28:16 +01:00
Yubao Liu
9846af124d CONFIG REWRITE: don't throw some options on config rewrite
Those options will be thrown without this patch:
  include, rename-command, min-slaves-to-write, min-slaves-max-lag,
appendfilename.
2013-12-19 15:56:48 +01:00
Yossi Gottlieb
74d9f048fa Fix wrong repldboff type which causes dropped replication in rare cases. 2013-12-11 11:38:02 +01:00
antirez
ccd6ccc7dd Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to make
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:47:31 +01:00
antirez
247a311317 dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optiona callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:46:24 +01:00
antirez
a7ebb0c7bf WAIT command: synchronous replication for Redis. 2013-12-04 16:20:03 +01:00
antirez
83e363d3e6 BLPOP blocking code refactored to be generic & reusable. 2013-12-03 17:43:53 +01:00
antirez
a6ed453b33 Removed old comments and dead code from freeClient(). 2013-12-03 13:54:06 +01:00
antirez
5502face59 Cluster: basic data structures for nodes black list. 2013-11-29 17:37:06 +01:00
antirez
adbba45d5d Sentinel: test for writable config file.
This commit introduces a funciton called when Sentinel is ready for
normal operations to avoid putting Sentinel specific stuff in redis.c.
2013-11-21 12:28:15 +01:00
antirez
d345a59943 Sentinel: sentinelFlushConfig() to CONFIG REWRITE + fsync. 2013-11-19 10:13:04 +01:00
antirez
45666c4c22 Sentinel: CONFIG REWRITE support for Sentinel config. 2013-11-19 09:48:12 +01:00
antirez
1a1eb8bc8d SCAN code refactored to parse cursor first.
The previous implementation of SCAN parsed the cursor in the generic
function implementing SCAN, SSCAN, HSCAN and ZSCAN.

The actual higher-level command implementation only checked for empty
keys and return ASAP in that case. The result was that inverting the
arguments of, for instance, SSCAN for example and write:

    SSCAN 0 key

Instead of

    SSCAN key 0

Resulted into no error, since 0 is a non-existing key name very likely.
Just the iterator returned no elements at all.

In order to fix this issue the code was refactored to extract the
function to parse the cursor and return the error. Every higher level
command implementation now parses the cursor and later checks if the key
exist or not.
2013-11-05 15:47:50 +01:00
antirez
b2618c6cdb ZSCAN implemented. 2013-10-28 11:36:42 +01:00
antirez
6618167a9f HSCAN implemented. 2013-10-28 11:35:26 +01:00
antirez
e96ffac563 SSCAN implemented. 2013-10-28 11:17:32 +01:00
Pieter Noordhuis
956c0ed927 Add SCAN command 2013-10-25 10:49:48 +02:00
antirez
e4b341a335 Cluster: time switched from seconds to milliseconds.
All the internal state of cluster involving time is now using mstime_t
and mstime() in order to use milliseconds resolution.

Also the clusterCron() function is called with a 10 hz frequency instead
of 1 hz.

The cluster node_timeout must be also configured in milliseconds by the
user in redis.conf.
2013-10-09 16:19:26 +02:00
antirez
1560b70889 Cluster: cluster stuff moved from redis.h to cluster.h. 2013-10-09 15:38:05 +02:00
antirez
dbf6c85d5e Cluster: new clusterDoBeforeSleep() API.
The new API is able to remember operations to perform before returning
to the event loop, such as checking if there is the failover quorum for
a slave, save and fsync the configuraiton file, and so forth.

Because this operations are performed before returning on the event
loop we are sure that messages that are sent in the same event loop run
will be delivered *after* the configuration is already saved, that is a
requirement sometimes. For instance we want to publish a new epoch only
when it is already stored in nodes.conf in order to avoid returning back
in the logical clock when a node is restarted.

This new API provides a big performance advantage compared to saving and
possibly fsyncing the configuration file multiple times in the same
event loop run, especially in the case of big clusters with tens or
hundreds of nodes.
2013-10-03 09:58:06 +02:00