452 Commits

Author SHA1 Message Date
antirez
aa1bc12f39 New commands: BITOP and BITCOUNT.
The motivation for this new commands is to be search in the usage of
Redis for real time statistics. See the article "Fast real time metrics
using Redis".

http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/

In general Redis strings when used as bitmaps using the SETBIT/GETBIT
command provide a very space-efficient and fast way to store statistics.
For instance in a web application with users, every user can be
associated with a key that shows every day in which the user visited the
web service. This information can be really valuable to extract user
behaviour information.

With Redis bitmaps doing this is very simple just saying that a given
day is 0 (the data the service was put online) and all the next days are
1, 2, 3, and so forth. So with SETBIT it is possible to set the bit
corresponding to the current day every time the user visits the site.

It is possible to take the count of the bit sets on the run, this is
extremely easy using a Lua script. However a fast bit count native
operation can be useful, especially if it can operate on ranges, or when
the string is small like in the case of days (even if you consider many
years it is still extremely little data).

For this reason BITOP was introduced. The command counts the number of
bits set to 1 in a string, with optional range:

BITCOUNT key [start end]

The start/end parameters are similar to GETRANGE. If omitted the whole
string is tested.

Population counting is more useful when bit-level operations like AND,
OR and XOR are avaialble. For instance I can test multiple users to see
the number of days three users visited the site at the same time. To do
this we can take the AND of all the bitmaps, and then count the set bits.

For this reason the BITOP command was introduced:

BITOP [AND|OR|XOR|NOT] dest_key src_key1 src_key2 src_key3 ... src_keyN

In the special case of NOT (that inverts the bits) only one source key
can be passed.

The judicious use of BITCOUNT and BITOP combined can lead to interesting
use cases with very space efficient representation of data.

The implementation provided is still not tested and optimized for speed,
next commits will introduce unit tests. Later the implementation will be
profiled to see if it is possible to gain an important amount of speed
without making the code much more complex.
2012-05-24 15:19:43 +02:00
antirez
030406ac3a Add aof_rewrite_buffer_length INFO field.
The INFO output, persistence section, already contained the field
describing the size of the current AOF buffer to flush on disk. However
the other AOF buffer, used to accumulate changes during an AOF rewrite,
was not mentioned in the INFO output.

This commit introduces a new field called aof_rewrite_buffer_length with
the length of the rewrite buffer.
2012-05-24 15:19:18 +02:00
antirez
2a76a8e453 Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.

We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.

In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.

Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().

For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.

The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-24 15:19:15 +02:00
antirez
d33db594a2 activeExpireCycle(): better precision in max time used.
activeExpireCycle() can consume no more than a few milliseconds per
iteration. This commit improves the precision of the check for the time
elapsed in two ways:

1) We check every 16 iterations instead of the main loop instead of 256.
2) We reset iterations at the start of the function and not every time
   we switch to the next database, so the check is correctly performed
   every 16 iterations.
2012-05-14 16:04:41 +02:00
antirez
2b103500e9 Impovements for: Redis timer, hashes rehashing, keys collection.
A previous commit introduced REDIS_HZ define that changes the frequency
of calls to the serverCron() Redis function. This commit improves
different related things:

1) Software watchdog: now the minimal period can be set according to
REDIS_HZ. The minimal period is two times the timer period, that is:

    (1000/REDIS_HZ)*2 milliseconds

2) The incremental rehashing is now performed in the expires dictionary
as well.

3) The activeExpireCycle() function was improved in different ways:

- Now it checks if it already used too much time using microseconds
  instead of milliseconds for better precision.
- The time limit is now calculated correctly, in the previous version
  the division was performed before of the multiplication resulting in
  a timelimit of 0 if HZ was big enough.
- Databases with less than 1% of buckets fill in the hash table are
  skipped, because getting random keys is too expensive in this
  condition.

4) tryResizeHashTables() is now called at every timer call, we need to
   match the number of calls we do to the expired keys colleciton cycle.

5) REDIS_HZ was raised to 100.
2012-05-13 21:52:35 +02:00
antirez
72c8a6e29d Redis timer interrupt frequency configurable as REDIS_HZ.
Redis uses a function called serverCron() that is very similar to the
timer interrupt of an operating system. This function is used to handle
a number of asynchronous things, like active expired keys collection,
clients timeouts, update of statistics, things related to the cluster
and replication, triggering of BGSAVE and AOF rewrite process, and so
forth.

In the past the timer was called 1 time per second. At some point it was
raised to 10 times per second, but it still was fixed and could not be
changed even at compile time, because different functions called from
serverCron() assumed a given fixed frequency.

This commmit makes the frequency configurable, so that it is simpler to
pick a good tradeoff between overhead of this function (that is usually
very small) and the responsiveness of Redis during a few critical
circumstances where a lot of work is done inside the timer.

An example of such a critical condition is mass-expire of a lot of keys
in the same second. Up to a given percentage of CPU time is used to
perform expired keys collection per expire cylce. Now changing the
REDIS_HZ macro it is possible to do less work but more times per second
in order to block the server for less time.

If this patch will work well in our tests it will enter Redis 2.6-final.
2012-05-13 16:40:29 +02:00
antirez
ad56660fa1 Comment improved so that the code goal is more clear. Thx to @agladysh. 2012-05-11 22:33:28 +02:00
antirez
52b9f9f082 More incremental active expired keys collection process.
If a large amonut of keys are all expiring about at the same time, the
"active" expired keys collection cycle used to block as far as the
percentage of already expired keys was >= 25% of the total population of
keys with an expire set.

This could block the server even for many seconds in order to reclaim
memory ASAP. The new algorithm uses at max a small amount of
milliseconds per cycle, even if this means reclaiming the memory less
promptly it also means a more responsive server.
2012-05-11 19:17:31 +02:00
antirez
1221530376 Use specific error if master is down and slave-serve-stale-data is set to no.
We used to reply -ERR ... message ..., now the reply is
instead -MASTERDOWN ... message ... so that it can be distinguished
easily by the other error conditions.
2012-05-02 20:57:55 +02:00
antirez
cbaaa13fe3 Don't use an alternative stack for SIGSEGV & co.
This commit reverts most of e0f4de6aafdd16716a079b2bae6c3842424efc00, in
order to use back main stack for signal handling.

The main reason is that otherwise it is completely pointless that we do
a lot of efforts to print the stack trace on crash, and the content of
the stack and registers as well. Using an alternate stack broken this
feature completely.
2012-04-26 16:21:19 +02:00
antirez
95f3f95fbf SHUTDOWN NOSAVE now can stop a non returning script. Issue #466. 2012-04-19 23:35:15 +02:00
antirez
59aa522200 Two small fixes to maxclients handling.
1) Don't accept maxclients set to < 0
2) Allow maxclients < 1024, it is useful for testing.
2012-04-18 11:31:24 +02:00
antirez
d6ed0f6f00 Print arch bits with redis-server -v 2012-04-12 11:50:18 +02:00
antirez
5a39fc973d Comment typo fixed. Clusetr -> Cluster. 2012-04-11 10:57:02 +02:00
antirez
670b1e8985 Check write(2) return value to avoid warnings, because in this context failing write is not critical. 2012-04-10 16:48:28 +02:00
antirez
372430fcc0 It is now possible to enable/disable RDB checksum computation from redis.conf or via CONFIG SET/GET. Also CONFIG SET support added for rdbcompression as well. 2012-04-10 15:47:10 +02:00
antirez
04c1bc9106 For coverage testing use exit() instead of _exit() when termiating saving children. 2012-04-07 12:11:23 +02:00
antirez
cb455b7258 New INFO field in persistence section: bgrewriteaof_scheduled. 2012-04-06 21:12:50 +02:00
Salvatore Sanfilippo
a8f0a0c982 Merge pull request #431 from anydot/f-signal
allocate alternate signal stack, change of sigaction flags for sigterm
2012-04-05 01:52:40 -07:00
antirez
461ffdabca Structure field controlling the INFO field master_link_down_since_seconds initialized correctly to avoid strange INFO output at startup when a slave has yet to connect to its master. 2012-04-04 18:32:22 +02:00
antirez
82e4f722ee New "os" field in INFO output providing information about the operating system. 2012-04-04 15:38:13 +02:00
antirez
e7d86b04ca SLAVEOF is not a write command. 2012-04-04 15:11:30 +02:00
antirez
540d395377 Print milliseconds of the current second in log lines timestamps. Sometimes precise timing is very important for debugging. 2012-04-04 15:11:17 +02:00
Premysl Hruby
e0f4de6aaf allocate alternate signal stack, change of sigaction flags for sigterm 2012-04-03 17:40:31 +02:00
antirez
974c66a26b When the user-provided 'maxclients' value is too big for the max number of files we can open, at least try to search the max the OS is allowing (in steps of 256 filedes). 2012-04-03 11:53:45 +02:00
Joseph Jang
2e7ff37280 Fixed a memory leak with replication
occurs when two or more dbs are replicated and at least one of them is >db10
2012-03-30 10:34:29 +02:00
antirez
3946c0afe7 Fixes for redisLogFromHandler(). 2012-03-28 13:51:23 +02:00
antirez
740f77af69 Log from signal handlers is now safer. 2012-03-28 13:45:39 +02:00
antirez
8ff012c473 Merge branch 'watchdog' into unstable 2012-03-28 13:16:19 +02:00
Premysl Hruby
f3fa6655c6 use server.unixtime instead of time(NULL) where possible (cluster.c not checked though) 2012-03-27 17:39:58 +02:00
antirez
25d500767a Redis software watchdog. 2012-03-27 11:47:51 +02:00
antirez
6afb293b49 New INFO field aof_delayed_fsync introduced.
This new field counts all the times Redis is configured with AOF enabled and
fsync policy 'everysec', but the previous fsync performed by the
background thread was not able to complete within two seconds, forcing
Redis to perform a write against the AOF file while the fsync is still
in progress (likely a blocking operation).
2012-03-25 11:27:35 +02:00
antirez
ef34aec719 Add used allocator in redis-server -v output. 2012-03-24 11:53:03 +01:00
antirez
118caec1a8 Correctly create shared.oomerr as an sds string. 2012-03-21 12:11:07 +01:00
antirez
3cd475b254 DEBUG should not be flagged as w otherwise we can not call DEBUG DIGEST and other commands against read only slaves. 2012-03-20 17:53:47 +01:00
antirez
a15f004026 Support for read-only slaves. Semantical fixes.
This commit introduces support for read only slaves via redis.conf and CONFIG GET/SET commands. Also various semantical fixes are implemented here:

1) MULTI/EXEC with only read commands now work where the server is into a state where writes (or commands increasing memory usage) are not allowed. Before this patch everything inside a transaction would fail in this conditions.

2) Scripts just calling read-only commands will work against read only
slaves, when the server is out of memory, or when persistence is into an
error condition. Before the patch EVAL always failed in this condition.
2012-03-20 17:32:48 +01:00
antirez
e47e5e87a2 Read-only flag removed from PUBLISH command. 2012-03-19 19:16:41 +01:00
antirez
c577014461 More memory tests implemented. Default number of iterations lowered to a more acceptable value of 50. 2012-03-18 18:03:27 +01:00
antirez
eec3337856 Number of iteration of --test-memory is now 300 (several minutes per gigabyte). Memtest86 and Memtester links are also displayed while running the test. 2012-03-18 17:25:00 +01:00
antirez
26330d4a06 First implementation of --test-memory. Still a work in progress. 2012-03-16 17:17:39 +01:00
antirez
74d1cad274 Fix for issue #391.
Use a simple protocol between clientsCron() and helper functions to
understand if the client is still valind and clientsCron() should
continue processing or if the client was freed and we should continue
with the next one.
2012-03-15 20:55:14 +01:00
antirez
696c28fcf9 Reclaim space from the client querybuf if needed. 2012-03-14 15:32:30 +01:00
antirez
94df4c639f Call all the helper functions needed by clientsCron() as clientsCronSomething() for clarity. 2012-03-14 09:56:22 +01:00
antirez
5bab1baf21 Process async client checks like client timeouts and BLPOP timeouts incrementally using a circular list. 2012-03-13 18:05:11 +01:00
antirez
afadac1728 Merge conflicts resolved. 2012-03-09 22:07:45 +01:00
antirez
4002005dc4 Instantaneous ops/sec figure in INFO output. 2012-03-08 16:15:37 +01:00
antirez
6c939b3752 run_id added to INFO output.
The Run ID is a field that identifies a single execution of the Redis
server. It can be useful for many purposes as it makes easy to detect if
the instance we are talking about is the same, or if it is a different
one or was rebooted. An application of run_id will be in the partial
synchronization of replication, where a slave may request a partial sync
from a given offset only if it is talking with the same master. Another
application is in failover and monitoring scripts.
2012-03-08 10:13:36 +01:00
antirez
bab3ce5a09 By default Redis refuses writes with an error if the latest BGSAVE failed (and at least one save point is configured). However people having good monitoring systems may prefer a server that continues to work, since they are notified that there are problems by their monitoring systems. This commit implements the ability to turn the feature on or off via redis.conf and CONFIG SET. 2012-03-07 18:02:26 +01:00
antirez
a380c4ad9c Refuse writes if can't persist on disk.
Redis now refuses accepting write queries if RDB persistence is
configured, but RDB snapshots can't be generated for some reason.
The status of the latest background save operation is now exposed
in the INFO output as well. This fixes issue #90.
2012-03-07 13:05:53 +01:00
antirez
8c0915285d Better MONITOR output, now includes client ip:port or the lua string if the command was executed by the scripting engine. 2012-03-07 12:12:15 +01:00