3610 Commits

Author SHA1 Message Date
antirez
3a9bf5e618 Cluster: PFAIL -> FAIL transition allowed for slaves.
First change: now there is no need to be a master in order to detect a
failure, however the majority of masters signaling PFAIL or FAIL is needed.

This change is important because it allows slaves rejoining the cluster
after a partition to sense the FAIL condition so that eventually all the
nodes agree on failures.
2013-09-20 11:26:44 +02:00
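The quorum rule described in 3a9bf5e618 can be sketched as follows. This is a minimal illustration, not the Redis implementation: the names `failure_report_quorum` and `can_mark_as_failed` are hypothetical, and "majority" is assumed to mean strictly more than half of the known masters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: smallest number of masters that forms a majority. */
static int failure_report_quorum(int num_masters) {
    return num_masters / 2 + 1;
}

/* Hypothetical check: any node (master or slave) may promote PFAIL to FAIL,
 * but only when a majority of masters flagged the node as PFAIL or FAIL. */
static bool can_mark_as_failed(int masters_reporting, int num_masters) {
    assert(masters_reporting >= 0);
    if (num_masters <= 0) return false;
    return masters_reporting >= failure_report_quorum(num_masters);
}
```

With three masters, two reports are enough; with four masters, two reports are not, because two is not a strict majority of four.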
antirez
3f5034d1d7 Cluster: added time field in cluster bus messages.
The time is sent in requests, and copied back in reply packets.
This way the node that sent the request can compare the time field in a
reply with its local clock and check the age of the request associated
with this reply.

This is an easy way to discard delayed replies. Note that only one clock
is used here: that of the node sending the request. The node receiving
the request only copies the field back into the reply, so no
synchronization is needed between the clocks of different hosts.
2013-09-20 09:22:21 +02:00
antirez
90e1829ec4 Allow AUTH / PING when disconnected from slave and serve-stale-data is no. 2013-09-17 09:46:06 +02:00
antirez
c7cb80c8bb Cluster: don't add a handshake node for the same ip:port pair multiple times. 2013-09-04 15:52:16 +02:00
antirez
79a1deac28 Cluster: free HANDSHAKE nodes after node_timeout.
Handshake nodes should turn into normal nodes or be freed in a
reasonable amount of time, otherwise they'll keep accumulating if the
address they are associated with is not reachable for some reason.
2013-09-04 12:41:21 +02:00
antirez
ee99df2d59 redis-cli: fix big keys search when the key no longer exists.
The code freed a reply object that was never created, resulting in a
segfault every time randomkey returned a key that was deleted before we
queried it for size.
2013-09-04 10:35:53 +02:00
antirez
232f84ec2d Cluster: CLUSTER SAVECONFIG command added. 2013-09-04 10:33:00 +02:00
antirez
61eb16c4da Cluster: don't save HANDSHAKE nodes in nodes.conf. 2013-09-04 10:25:26 +02:00
antirez
6e460eac58 Cluster: always use safe iterators to iterate server.cluster->nodes. 2013-09-04 10:07:50 +02:00
Maxim Zakharov
ff18243fce mistype fixed 2013-09-03 15:15:51 +02:00
Maxim Zakharov
1885c6bada A mistype fixed 2013-09-03 15:15:48 +02:00
antirez
853defe071 Cluster: clusterReadHandler() reworked to be more correct and simpler to follow. 2013-09-03 11:43:52 +02:00
antirez
e307150c21 Cluster: use non-blocking I/O for the cluster bus. 2013-09-03 11:43:52 +02:00
antirez
d3726385c2 Cluster: fixed a bug in clusterSendPublish() due to inverted statements.
The code used to copy the header *after* the 'hdr' pointer was already
switched to the new buffer. Of course we need to do the reverse.
2013-09-03 11:43:43 +02:00
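The ordering issue fixed in d3726385c2 can be illustrated with a small sketch. Everything here is hypothetical (the `msg_hdr` layout, the `grow_message` name, the signature bytes used in the usage note); it only shows why the header must be copied before the pointer is redirected to the new buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header layout, for illustration only. */
typedef struct {
    char sig[4];
    uint32_t totlen;
} msg_hdr;

/* Moves a header into a larger heap buffer and appends a payload.
 * On success *hdr points into the returned buffer; the caller frees it. */
static unsigned char *grow_message(msg_hdr **hdr, const void *payload,
                                   size_t len) {
    assert(hdr != NULL && *hdr != NULL);
    unsigned char *buf = malloc(sizeof(msg_hdr) + len);
    if (buf == NULL) return NULL;

    /* 1) Copy while *hdr still points at the original header. */
    memcpy(buf, *hdr, sizeof(msg_hdr));
    /* 2) Only now switch the pointer to the new buffer. Doing this first
     *    would copy the uninitialized new buffer onto itself. */
    *hdr = (msg_hdr *)buf;

    memcpy(buf + sizeof(msg_hdr), payload, len);
    (*hdr)->totlen = (uint32_t)(sizeof(msg_hdr) + len);
    return buf;
}
```

Calling `grow_message(&hdr, "hello", 5)` with `hdr` pointing at a stack header leaves `hdr` pointing into the heap buffer with the original signature bytes preserved.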
antirez
ebe91a49c9 Test: Lua stack leak regression test added. 2013-08-30 08:59:11 +02:00
antirez
b3277bab4b Test: added a memory efficiency test. 2013-08-29 16:23:57 +02:00
antirez
47f6823a73 Fixed critical memory leak from EVAL.
Multiple missing calls to lua_pop prevented the error handler function
pushed on the stack for lua_pcall() to be popped before returning,
causing a memory leak in almost all the code paths of EVAL (both
successful calls and calls returning errors).

This caused two issues: Lua leaking memory (and this was very visible
from INFO memory output, as the 'used_memory_lua' field reported an
always increasing amount of memory used), and as a result slower and
slower GC cycles resulting in all the CPU being used.

Thanks to Tanguy Le Barzic for noticing something was wrong with his 2.8
slave, and for creating a testing EC2 environment where I was able to
investigate the issue.
2013-08-29 11:54:03 +02:00
yihuang
a8ec7abb1b fix lua_cmsgpack pack map as array 2013-08-27 15:19:25 +02:00
antirez
82189282e7 Fix a hypothetical issue in processMultibulkBuffer(). 2013-08-27 13:00:06 +02:00
antirez
acbaa37cbf Remove useless check from tryObjectEncoding().
We are sure the string is large, since when the sds optimization branch
is entered it means that it was not possible to encode it as EMBSTR for
size concerns.
2013-08-27 12:36:52 +02:00
antirez
3816606eb2 tryObjectEncoding(): optimize sds strings if possible.
When no encoding is possible, at least try to reallocate the sds string
with one that does not waste memory (with free space at the end of the
buffer) when the string is large enough.
2013-08-27 12:15:13 +02:00
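The idea in 3816606eb2 of trimming unused space at the end of a large string buffer can be sketched with a toy string type. `toy_str` and `toy_str_shrink` are hypothetical and do not reproduce the sds layout; the `min_len` threshold stands in for "large enough", whose actual value the commit message does not state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string header: bytes in use plus spare bytes at the end. */
typedef struct {
    size_t len;   /* bytes in use, excluding the terminating NUL */
    size_t avail; /* unused bytes after the terminating NUL */
    char *buf;    /* heap buffer of len + avail + 1 bytes */
} toy_str;

/* Reallocates the buffer to exactly len + 1 bytes when the string is at
 * least min_len bytes long. Returns 1 if the buffer was shrunk, else 0. */
static int toy_str_shrink(toy_str *s, size_t min_len) {
    assert(s != NULL && s->buf != NULL);
    if (s->len < min_len || s->avail == 0) return 0;
    char *p = realloc(s->buf, s->len + 1);
    if (p == NULL) return 0; /* old buffer is still valid, keep it */
    s->buf = p;
    s->avail = 0;
    return 1;
}
```

Short strings are left alone because the reallocation cost is unlikely to pay off for a few wasted bytes.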
antirez
99ec8f117d tryObjectEncoding(): don't call string2l() for too big strings.
We are sure that a string that is longer than 21 chars cannot be
represented by a 64 bit signed integer, as -(2^64) is 21 chars:

strlen(-18446744073709551616) => 21
2013-08-27 11:56:47 +02:00
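The length guard from 99ec8f117d can be sketched as a cheap check placed in front of the integer parser. `parse_ll_guarded` is a hypothetical name, and the sketch uses the standard `strtoll()` rather than Redis's own parser, so it is more permissive (for example, it accepts leading whitespace).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true and stores the value if s is a decimal integer that fits in
 * a long long. Strings longer than 21 characters are rejected without
 * parsing, since no 64 bit signed integer needs that many characters. */
static bool parse_ll_guarded(const char *s, long long *out) {
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len == 0 || len > 21) return false; /* fast path: skip the parser */

    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (errno == ERANGE || end == s || *end != '\0') return false;
    *out = v;
    return true;
}
```

The bound of 21 is deliberately conservative: it never rejects a representable value, and anything shorter that still overflows is caught by the parser itself.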
antirez
77c3c946a1 Don't over-allocate the sds string for large bulk requests.
The call to sdsMakeRoomFor() did not account for the amount of data
already present in the query buffer, resulting in over-allocation.
2013-08-27 11:54:38 +02:00
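The accounting fix in 77c3c946a1 amounts to reserving only the bytes that are still missing. The sketch below is hypothetical (`extra_room_needed` is not a Redis function) and assumes the bulk payload is followed by a two-byte CRLF terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extra bytes to reserve so the buffer can hold a whole bulk argument,
 * given how many bytes of it are already buffered. */
static size_t extra_room_needed(size_t already_buffered, size_t bulk_len) {
    assert(bulk_len <= SIZE_MAX - 2); /* guard the addition below */
    size_t total = bulk_len + 2;      /* payload plus trailing CRLF */
    return already_buffered < total ? total - already_buffered : 0;
}
```

Reserving `total` unconditionally, as the pre-fix code effectively did, over-allocates by exactly `already_buffered` bytes.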
antirez
2c3fa3e392 DEBUG SDSLEN added.
This command is only useful for low-level debugging of memory issues due
to sds wasting memory as empty buffer at the end of the string.
2013-08-27 11:53:49 +02:00
antirez
2862182e9e Update server.lastbgsave_status when fork() fails. 2013-08-27 10:16:29 +02:00
antirez
a97a8b8954 Only run the fast active expire cycle if master & enabled. 2013-08-27 09:31:55 +02:00
antirez
33b286bf68 Don't update node pong time via gossip.
This feature was implemented in the initial days of the Redis Cluster
implementation but is not a good idea at all.

1) It depends on clocks being synchronized, which is already very bad.
2) Moreover it adds a bug where the pong time is updated via gossip so
no new PING is ever sent by the current node, with the effect of no PONG
received, no update of tables, no clearing of PFAIL flag.

In general, trusting other nodes about the reachability of other nodes is
a broken distributed programming model.
2013-08-26 16:16:25 +02:00
antirez
d7d90442f5 Cluster: set event handler in cluster bus listening socket.
The commit using listenToPort() introduced this bug by no longer
creating the event handler to handle incoming messages from the cluster
bus.
2013-08-22 14:53:53 +02:00
antirez
a8dc4ecd21 Use listenToPort() in cluster.c as well. 2013-08-22 14:05:07 +02:00
antirez
85bfeb8fea Opening TCP listening ports refactored into a function. 2013-08-22 14:01:16 +02:00
antirez
d42715de50 Print error message when can't bind * on any address. 2013-08-22 13:02:59 +02:00
antirez
f45f05531d Cluster: fix CLUSTER MEET ip address validation.
This was broken by the IPv6 support patches.
2013-08-22 11:54:28 +02:00
antirez
77c71b2046 Cluster: process MEET packets as PING packets.
Somehow a previous commit broke this, so CLUSTER MEET was no longer
working.
2013-08-22 11:53:28 +02:00
antirez
487951c9b4 Use a safe dict.c iterator in clusterCron(). 2013-08-21 15:51:15 +02:00
antirez
18317ae6b4 Fix for issue #1214 simplified. 2013-08-21 11:36:09 +02:00
Salvatore Sanfilippo
617c877d29 Merge pull request #1214 from kaoshijuan/unstable
fixed initServer fail problem
2013-08-21 02:18:41 -07:00
antirez
2f47ed9f5f Use printf %zu specifier to print private_dirty. 2013-08-20 12:04:57 +02:00
antirez
13c59cfdc8 dictFingerprint(): cast pointers to integer of same size. 2013-08-20 11:49:55 +02:00
antirez
5824ab54dd Revert "Fixed type in dict.c comment: 265 -> 256."
This reverts commit d22d557e41151c1d716045e0059550e197d6e526.
2013-08-19 17:25:48 +02:00
antirez
d22d557e41 Fixed type in dict.c comment: 265 -> 256. 2013-08-19 15:10:37 +02:00
antirez
ded611636f assert.h replaced with redisassert.h when appropriate.
Also a warning was suppressed by including unistd.h in redisassert.h
(needed for _exit()).
2013-08-19 15:01:21 +02:00
antirez
3a9c595ab5 Added redisassert.h as drop in replacement for assert.h.
By using redisassert.h version of assert() you get stack traces in the
log instead of a process disappearing on assertions.
2013-08-19 15:01:15 +02:00
antirez
ae1bb62f62 dictFingerprint() fingerprinting made more robust.
The previous hashing used the trivial algorithm of xoring the integers
together. This is not optimal as it is very likely that different
hash table setups will hash the same, for instance a hash table at the
start of the rehashing process, and at the end, will have the same
fingerprint.

Now we hash N integers in a smarter way, by summing every integer to the
previous hash, and taking the integer hashing again (see the code for
further details). This way it is a lot less likely that we get a
collision. Moreover this way of hashing explicitly protects against the
same set of integers in a different order hashing to the same number.

This commit is related to issue #1240.
2013-08-19 15:01:12 +02:00
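The difference between the two fingerprinting schemes in ae1bb62f62 can be sketched as below. The chained version follows the description "sum every integer to the previous hash, then hash again"; the mixing step used here is Thomas Wang's 64 bit integer mix, which is an assumption made for illustration since the commit message only points at the code. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Thomas Wang's 64 bit integer mix (assumed here as the hashing step). */
static uint64_t mix64(uint64_t h) {
    h = (~h) + (h << 21);
    h ^= h >> 24;
    h = (h + (h << 3)) + (h << 8); /* h * 265 */
    h ^= h >> 14;
    h = (h + (h << 2)) + (h << 4); /* h * 21 */
    h ^= h >> 28;
    h += h << 31;
    return h;
}

/* Old scheme: plain XOR, blind to the order of the inputs. */
static uint64_t fingerprint_xor(const uint64_t *v, size_t n) {
    assert(v != NULL || n == 0);
    uint64_t h = 0;
    for (size_t i = 0; i < n; i++) h ^= v[i];
    return h;
}

/* New scheme: add each integer to the running hash, then mix again,
 * so the position of each integer influences the result. */
static uint64_t fingerprint_chain(const uint64_t *v, size_t n) {
    assert(v != NULL || n == 0);
    uint64_t h = 0;
    for (size_t i = 0; i < n; i++) h = mix64(h + v[i]);
    return h;
}
```

For the inputs `{1, 2, 3}` and `{3, 2, 1}` the XOR scheme necessarily returns the same value, while the chained scheme is sensitive to the ordering.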
antirez
4bb257b480 Fix comments for correctness in zunionInterGenericCommand().
Related to issue #1240.
2013-08-19 15:01:05 +02:00
antirez
5173de0525 Properly init/release iterators in zunionInterGenericCommand().
This commit does mainly two things:

1) It fixes zunionInterGenericCommand() by removing mass-initialization
of all the iterators used, so that we don't violate the unsafe iterator
API of dictionaries. This fixes issue #1240.

2) Since the zui* APIs required the allocator to be initialized in the
zsetopsrc structure in order to use non-iterator related APIs, this
commit fixes this strict requirement by accessing objects directly via
the op->subject->ptr pointer we have to the object.
2013-08-19 15:01:01 +02:00
antirez
bfaadb0df2 dict.c iterator API misuse protection.
dict.c allows the user to create unsafe iterators, that are iterators
that will not touch the dictionary data structure in any way, preventing
copy on write, but at the same time are limited in their usage.

The limitation is that when iterating with an unsafe iterator, no call
to other dictionary functions must be done inside the iteration loop,
otherwise the dictionary may be incrementally rehashed resulting in
missing elements in the set of the elements returned by the iterator.

However after introducing this kind of iterator a number of bugs were
found due to misuses of the API, and we are still finding
bugs about this issue. The bugs are not trivial to track because the
effect is just missing elements during the iteration.

This commit introduces auto-detection of the API misuse. The idea is
that an unsafe iterator has a contract: from initialization to the
release of the iterator the dictionary should not change.

So we take a fingerprint of the dictionary state, xoring a few important
dict properties when the unsafe iterator is initialized. We later check
when the iterator is released if the fingerprint is still the same. If it
is not, we found a misuse of the iterator, as API calls that are not
allowed changed the internal state of the dictionary.

This code was checked against a real bug, issue #1240.

This is what Redis prints (aborting) when a misuse is detected:

Assertion failed: (iter->fingerprint == dictFingerprint(iter->d)),
function dictReleaseIterator, file dict.c, line 587.
2013-08-19 15:00:57 +02:00
antirez
88f51adf22 Use precomputed objects for bulk and mbulk prefixes. 2013-08-12 12:50:49 +02:00
antirez
a33c9fb250 replicationFeedSlaves() func name typo: feedReplicationBacklogWithObject -> feedReplicationBacklog. 2013-08-12 12:50:45 +02:00
antirez
6268dbdd94 replicationFeedSlave() reworked for correctness and speed.
The previous code using a static buffer as an optimization was lame:

1) Premature optimization, actually it was *slower* than naive code
   because it resulted in the creation / destruction of the object
   encapsulating the output buffer.
2) The code was very hard to test, since it required specific
   tests for command lines exceeding the size of the static buffer.
3) As a result of "2" the code was buggy as the current tests were not
   able to stress specific corner cases.

It was replaced with easy to understand code that is safer and faster.
2013-08-12 12:50:29 +02:00
antirez
21cde6ecb7 Fix a PSYNC bug caused by a variable name typo. 2013-08-12 11:51:35 +02:00