4056 Commits

Author SHA1 Message Date
Salvatore Sanfilippo
17e4e34634 Merge pull request #1606 from mattsta/fix-disk-full-dataloss
Fix data loss when save AOF/RDB with no free space
2014-03-24 19:26:25 +01:00
Matt Stancliff
49875adb2b Sentinel: Notify user when config can't be saved 2014-03-24 13:54:14 -04:00
Matt Stancliff
a2c4706fc7 Fix data loss when save AOF/RDB with no free space
Previously, the (!fp) would only catch lack of free space
under OS X.  Linux waits to discover it can't write until
it actually writes contents to disk.

(fwrite() returns success even if the underlying file
has no free space to write into.  All the errors
only show up at flush/sync/close time.)

Fixes antirez/redis#1604
2014-03-24 13:54:14 -04:00
Salvatore Sanfilippo
deb298091c Merge pull request #1617 from mattsta/remove-unused-warning
Cluster: remove variable causing warning
2014-03-24 18:33:22 +01:00
Salvatore Sanfilippo
b8fed78aa5 Merge pull request #1629 from mattsta/fix-trib-master-assignment
Cluster: Restore proper trib master iteration
2014-03-24 18:31:55 +01:00
Salvatore Sanfilippo
8f72f7df40 Merge pull request #1628 from mattsta/fix-trib-create
Cluster: Fix trib create when masters==replicas
2014-03-24 18:26:17 +01:00
Salvatore Sanfilippo
7815daa329 Merge pull request #1627 from badboy/lru-fix
Fixed a few typos.
2014-03-24 18:13:39 +01:00
Salvatore Sanfilippo
2697da6db9 Merge pull request #1609 from badboy/install_server-fix
Finally fix the `install_server.sh` script.
2014-03-24 18:10:50 +01:00
Matt Stancliff
794d3401e9 Cluster: Restore proper trib master iteration
This got removed in 15ef91e during a new feature addition.

The prior commit had "break if masters.length == masters_count"
but we are guaranteed to aready have that condition met since
otherwise we would haven't gotten this far.

Without this break statement, it's possible some masters may
be forgotten and have zero replicas while other masters have
more than their requested number of replicas.

Thanks to carlos for pointing out this regression at:
https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
2014-03-24 10:17:44 -04:00
Matt Stancliff
173f9ae558 Cluster: Fix trib create when masters==replicas
This bug was introduced in 15ef91eb during a refactor.

It took me a while to understand what was going on with
the code, so I've refactored it further by:
  - Replacing boolean values with meaningful symbols
  - Replacing 'i' with a meaningful variable name
  - Adding the proper abort check
  - Factoring out now duplicated conditionals
  - Adding optional verbose logging (we're inside *four*
    different looping constructs, so it takes a while to
    figure out where all the moving parts are)
  - Updating comment for the section

This fixes a problem when the number of master instances
equaled the number of replica instances.  Before, when
there were equal numbers of both, nodes_count would go to
zero, but the while loop would spin in i < @replicas because
i would never be updated (because the nodes_list of each ip
was length == 0, which triggered an endless loop of
next -> i = 0 -> 0 < 1? -> true -> next -> i = 0 ...)

Thanks to carlo who found this problem at:
https://groups.google.com/forum/#!topic/redis-db/_WVVqDw5B7c
2014-03-24 10:17:38 -04:00
antirez
d84c84f268 Sample and cache RSS in serverCron().
Obtaining the RSS (Resident Set Size) info is slow in Linux and OSX.
This slowed down the generation of the INFO 'memory' section.

Since the RSS does not require to be a real-time measurement, we
now sample it with server.hz frequency (10 times per second by default)
and use this value both to show the INFO rss field and to compute the
fragmentation ratio.

Practically this does not make any difference for memory profiling of
Redis but speeds up the INFO call significantly.
2014-03-24 12:00:20 +01:00
antirez
e2c6bb0a2a sdscatvprintf(): Try to use a static buffer.
For small content the function now tries to use a static buffer to avoid
a malloc/free cycle that is too costly when the function is used in the
context of performance critical code path such as INFO output generation.

This change was verified to have positive effects in the execution speed
of the INFO command.
2014-03-24 10:20:33 +01:00
antirez
982ca268a2 Cache uname() output across INFO calls.
Uname was profiled to be a slow syscall. It produces always the same
output in the context of a single execution of Redis, so calling it at
every INFO output generation does not make too much sense.

The uname utsname structure was modified as a static variable. At the
same time a static integer was added to check if we need to call uname
the first time.
2014-03-24 10:00:08 +01:00
antirez
57e4d0ca6b sdscatvprintf(): guess buflen using format length.
sdscatvprintf() uses a loop where it tries to output the formatted
string in a buffer of the initial length, if there was not enough room,
a buffer of doubled size is tried and so forth.

The initial guess for the buffer length was very poor, an hardcoded
"16". This caused the printf to be processed multiple times without a
good reason. Given that printf functions are already not fast, the
overhead was significant.

The new heuristic is to use a buffer 4 times the length of the format
buffer, and 32 as minimal size. This appears to be a good balance for
typical uses of the function inside the Redis code base.

This change improved INFO command performances 3 times.
2014-03-24 09:44:11 +01:00
antirez
1e39792ba7 Add test-lru.rb to utils.
This is a program useful to evaluate the Redis LRU algorithm behavior.
2014-03-21 09:52:05 +01:00
antirez
7b50be26a9 Use getLRUClock() instead of server.lruclock to create objects.
Thanks to Matt Stancliff for noticing this error. It was in the original
code but somehow I managed to remove the change from the commit...
2014-03-21 09:08:20 +01:00
antirez
385bc5807e The default maxmemory policy is now noeviction.
This is safer as by default maxmemory should just set a memory limit
without any key to be deleted, unless the policy is set to something
more relaxed.
2014-03-21 08:03:34 +01:00
Jan-Erik Rediger
1b6ddffbbf Fixed a few typos. 2014-03-20 23:16:38 +01:00
antirez
40dc99ffd2 Use 24 bits for the lru object field and improve resolution.
There were 2 spare bits inside the Redis object structure that are now
used in order to enlarge 4x the range of the LRU field.

At the same time the resolution was improved from 10 to 1 second: this
still provides 194 days before the LRU counter overflows (restarting from
zero).

This is not a problem since it only causes lack of eviction precision for
objects not touched for a very long time, and the lack of precision is
only temporary.
2014-03-20 17:56:27 +01:00
antirez
c1c4fc292b Default LRU samples is now 5. 2014-03-20 17:05:42 +01:00
antirez
ac12b1c1db Use new dictGetRandomKeys() API to get samples for eviction.
The eviction quality degradates a bit in my tests, but since the API is
faster, it allows to raise the number of samples, and overall is a win.
2014-03-20 16:52:12 +01:00
antirez
74f0696064 struct dictEntry -> dictEntry. 2014-03-20 16:20:37 +01:00
antirez
28573f8fe8 Added dictGetRandomKeys() to dict.c: mass get random entries.
This new function is useful to get a number of random entries from an
hash table when we just need to do some sampling without particularly
good distribution.

It just jumps at a random place of the hash table and returns the first
N items encountered by scanning linearly.

The main usefulness of this function is to speedup Redis internal
sampling of the key space, for example for key eviction or expiry.
2014-03-20 15:50:46 +01:00
antirez
1d1db42f20 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
2014-03-20 11:57:29 +01:00
antirez
f73fcf5a18 Fix OBJECT IDLETIME return value converting to seconds.
estimateObjectIdleTime() returns a value in milliseconds now, so we need
to scale the output of OBJECT IDLETIME to seconds.
2014-03-20 11:55:18 +01:00
antirez
d7a2dd9f2e Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
2014-03-20 11:47:12 +01:00
antirez
c847202e8b Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-20 11:37:27 +01:00
antirez
2f5a67dbca Specify LRU resolution in milliseconds. 2014-03-20 11:33:25 +01:00
antirez
a8127d9fcb Set LRU parameters via REDIS_LRU_BITS define. 2014-03-20 11:22:47 +01:00
antirez
624be01ff0 Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-19 12:55:49 +01:00
Matt Stancliff
2b5f60352a Cluster: remove variable causing warning
GCC-4.9 warned about this, but clang didn't.

This commit fixes warning:
sentinel.c: In function 'sentinelReceiveHelloMessages':
sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
     sentinelRedisInstance *ri = c->data, *master;
2014-03-18 15:35:09 -04:00
antirez
4bcedf006b Sentinel: sentinelRefreshInstanceInfo() minor refactoring.
Test sentinel.tilt condition on top and return if it is true.
This allows to remove the check for the tilt condition in the remaining
code paths of the function.
2014-03-18 15:35:47 +01:00
antirez
ebdf5ddfa9 Sentinel test: 02 unit better coverage + refactoring. 2014-03-18 15:18:51 +01:00
antirez
c870142fca Sentinel test: foreach_instance_id implements 'break'. 2014-03-18 15:06:52 +01:00
antirez
d9148d9fde Sentinel: instance_is_killed proc added to sentinel.tcl. 2014-03-18 14:58:27 +01:00
antirez
edd99535a3 Sentinel: propagate down-after-ms changes to slaves and sentinels. 2014-03-18 14:37:44 +01:00
antirez
d2107f42f7 Sentinel: down-after-milliseconds is not master-specific.
addReplySentinelRedisInstance() modified so that this field is displayed
for all the kind of instances: Sentinels, Masters, Slaves.
2014-03-18 11:21:17 +01:00
antirez
832dfff34a Sentinel failure detection implementation improved.
Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current current time.

PINGs were sent at a fixed interval of 1 second.

This works in a decent way, but does not scale well when we want to set
very small values of "down-after-milliseconds" (this is the node
timeout basically).

This commit reiplements the failure detection making a number of
changes. Some changes are inspired to Redis Cluster failure detection
code:

* A new last_ping_time field is added in representation of instances.
  If non zero, we have an active ping that was sent at the specified
  time. When a valid reply to ping is received, the field is zeroed
  again.
* last_ping_time is not reset when we reconnect the link or send a new
  ping, so from our point of view it represents the time we started
  waiting for the instance to reply to our pings without receiving a
  reply.
* last_ping_time is now used in order to check if the instance is
  timed out. This means that we can have a node timeout of 100
  milliseconds and yet the system will work well since the new check is
  not bound to the period used to send pings.
* Pings are now sent every second, or often if the value of
  down-after-milliseconds is less than one second. With a lower limit of
  10 HZ ping frequency.
* Link reconnection code was improved. This is used in order to try to
  reconnect the link when we are at 50% of the node timeout without a
  valid reply received yet. However the old code triggered unnecessary
  reconnections when the node timeout was very small. Now that should be
  ok.

The new code passes the tests but more testing is needed and more unit
tests stressing the failure detector, so currently this is merged only
in the unstable branch.
2014-03-17 18:33:45 +01:00
antirez
becda90be4 Sentinel: use CLIENT SETNAME when connecting to Redis.
This makes debugging / monitoring of Sentinels simpler since you can
identify sentinels in CLIENT LIST output of Redis instances.
2014-03-15 14:59:23 +01:00
Jan-Erik Rediger
8858e0264b Finally fix the install_server.sh script.
Includes changes from a dozen bug reports and pull requests.
Was tested on Ubuntu, Debian and CentOS.
2014-03-15 14:43:50 +01:00
Salvatore Sanfilippo
9e8e818e20 Merge pull request #1608 from mattsta/fix-sentinel-current-epoch-segfault
Fix segfault from accessing array out of bounds
2014-03-14 22:56:24 +01:00
Matt Stancliff
a371b2975b Fix segfault from accessing array out of bounds
argc == 2; argv[2] == crash
2014-03-14 17:38:05 -04:00
antirez
5b52765845 Sentinel: be safe under crash-recovery assumptions.
Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).

For this to be true Sentinels require to be authorized by a majority.
Additionally Sentinels require to do two important things:

* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.

The first prerequisite, in a crash-recovery system model, requires to
persist the master->leader_epoch on durable storage before to reply to
messages. This was not the case.

We also make sure to persist the current epoch in order to never reply
to stale votes requests from other Sentinels, after a recovery.

The configuration is persisted by making use of fsync(), this is
considered in the context of this code a good enough guarantee that
after a restart our durable state is restored, however this may not
always be the case depending on the kind of hardware and operating
system used.
2014-03-14 14:58:44 +01:00
antirez
0e9801990f Sentinel: fake PUBLISH command to receive HELLO messages.
Now the way HELLO messages are received is unified.
Now it is no longer needed for Sentinels to converge to the higher
configuration for a master to be able to chat via some Redis instance,
the are able to directly exchanges configurations.

Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since for an error I
committed the change in the previous commit that refactored hello
messages processing into a separated function.
2014-03-14 11:07:42 +01:00
antirez
1dfeeceb87 Sentinel: HELLO processing refactored into sentinelProcessHelloMessage(). 2014-03-14 11:07:42 +01:00
antirez
b34fea52e2 Cluster: flag the transaction as dirty for the new redirections. 2014-03-13 15:11:53 +01:00
antirez
8954da8136 Linenoise updated, multiline mode enabled in redis-cli. 2014-03-13 15:11:08 +01:00
antirez
18dafd646a redis-trib: call MIGRATE via r.client.call as fix for redis-rb API changes.
See issue #1593.

Thanks to @badboy for suggesting the direct client.call fix.
2014-03-11 16:10:13 +01:00
antirez
5604970426 redis-trib: new subcommand 'call'. Exec command in all nodes.
Example:

./redis-trib.rb call 192.168.1.11:7000 config get cluster-node-timeout
2014-03-11 14:58:55 +01:00
antirez
15ef91eb7b redis-trib: create subcommand is now able to assign spare slaves.
Example: if the user will try to configure a cluster with 9 nodes,
asking for 1 slave for master, redis-trib will configure a 4 masters
cluster with 1 slave each as usually, but this time will assign the
spare node as a slave of one of the masters.
2014-03-11 14:17:28 +01:00