21057 Commits

Author SHA1 Message Date
antirez
6502947a85 redis-check-rdb: initialize entry in case while is never entered. 2015-01-30 15:19:39 +01:00
antirez
834d0eaa37 Cluster: some bias towwards FAIL/PFAIL nodes in gossip sections.
This improves PFAIL -> FAIL switch. Too late at this point in the RC
releases to add proper PFAIL/FAIL separate dictionary to do this in a
less randomized way. Tested in practice with experiments that this
helps. PFAIL -> FAIL average with 20 nodes and node-timeout set to 5
seconds takes 2.5 seconds without this commit, 1 second with this
commit.
2015-01-30 11:55:36 +01:00
antirez
233729fe7f Cluster: some bias towwards FAIL/PFAIL nodes in gossip sections.
This improves PFAIL -> FAIL switch. Too late at this point in the RC
releases to add proper PFAIL/FAIL separate dictionary to do this in a
less randomized way. Tested in practice with experiments that this
helps. PFAIL -> FAIL average with 20 nodes and node-timeout set to 5
seconds takes 2.5 seconds without this commit, 1 second with this
commit.
2015-01-30 11:55:36 +01:00
antirez
bd205275cf More correct wanted / maxiterations values in clusterSendPing(). 2015-01-30 11:23:27 +01:00
antirez
69b4f00d28 More correct wanted / maxiterations values in clusterSendPing(). 2015-01-30 11:23:27 +01:00
antirez
d05f8b1c78 Cluster: magical 10% of nodes explained in comments. 2015-01-29 15:43:35 +01:00
antirez
e5a22064cc Cluster: magical 10% of nodes explained in comments. 2015-01-29 15:43:35 +01:00
antirez
5d0e61b0d3 CLUSTER count-failure-reports command added. 2015-01-29 15:02:10 +01:00
antirez
1efacfe53d CLUSTER count-failure-reports command added. 2015-01-29 15:02:10 +01:00
antirez
27ea79d373 Cluster: use a number of gossip sections proportional to cluster size.
Otherwise it is impossible to receive the majority of failure reports in
the node_timeout*2 window in larger clusters.

Still with a 200 nodes cluster, 20 gossip sections are a very reasonable
amount of bytes to send.

A side effect of this change is also fater cluster nodes joins for large
clusters, because the cluster layout makes less time to propagate.
2015-01-29 14:20:59 +01:00
antirez
3fd43062c8 Cluster: use a number of gossip sections proportional to cluster size.
Otherwise it is impossible to receive the majority of failure reports in
the node_timeout*2 window in larger clusters.

Still with a 200 nodes cluster, 20 gossip sections are a very reasonable
amount of bytes to send.

A side effect of this change is also fater cluster nodes joins for large
clusters, because the cluster layout makes less time to propagate.
2015-01-29 14:20:59 +01:00
Matt Stancliff
8ada516fb6 Improve RDB error-on-load handling
Previouly if we loaded a corrupt RDB, Redis printed an error report
with a big "REPORT ON GITHUB" message at the bottom.  But, we know
RDB load failures are corrupt data, not corrupt code.

Now when RDB failure is detected (duplicate keys or unknown data
types in the file), we run check-rdb against the RDB then exit.  The
automatic check-rdb hopefully gives the user instant feedback
about what is wrong instead of providing a mysterious stack
trace.
2015-01-28 11:19:00 -05:00
Matt Stancliff
d8c7db1bdb Improve RDB error-on-load handling
Previouly if we loaded a corrupt RDB, Redis printed an error report
with a big "REPORT ON GITHUB" message at the bottom.  But, we know
RDB load failures are corrupt data, not corrupt code.

Now when RDB failure is detected (duplicate keys or unknown data
types in the file), we run check-rdb against the RDB then exit.  The
automatic check-rdb hopefully gives the user instant feedback
about what is wrong instead of providing a mysterious stack
trace.
2015-01-28 11:19:00 -05:00
Matt Stancliff
2902fe7597 Remove code duplication from check-rdb
redis-check-rdb (previously redis-check-dump) had every RDB define
copy/pasted from rdb.h and some defines copied from redis.h.  Since
the initial copy, some constants had changed in Redis headers and
check-dump was using incorrect values.

Since check-rdb is now a mode of Redis, the old check-dump code
is cleaned up to:
  - replace all printf with redisLog (and remove \n from all strings)
  - remove all copy/pasted defines to use defines from rdb.h and redis.h
  - replace all malloc/free with zmalloc/zfree
  - remove unnecessary include headers
2015-01-28 11:18:18 -05:00
Matt Stancliff
764b000c3e Remove code duplication from check-rdb
redis-check-rdb (previously redis-check-dump) had every RDB define
copy/pasted from rdb.h and some defines copied from redis.h.  Since
the initial copy, some constants had changed in Redis headers and
check-dump was using incorrect values.

Since check-rdb is now a mode of Redis, the old check-dump code
is cleaned up to:
  - replace all printf with redisLog (and remove \n from all strings)
  - remove all copy/pasted defines to use defines from rdb.h and redis.h
  - replace all malloc/free with zmalloc/zfree
  - remove unnecessary include headers
2015-01-28 11:18:18 -05:00
Matt Stancliff
302a22ad8c Convert check-dump to Redis check-rdb mode
redis-check-dump is now named redis-check-rdb and it runs
as a mode of redis-server instead of an independent binary.

You can now use 'redis-server redis.conf --check-rdb' to check
the RDB defined in redis.conf.  Using argument --check-rdb
checks the RDB and exits.  We could potentially also allow
the server to continue starting if the RDB check succeeds.

This change also enables us to use RDB checking programatically
from inside Redis for certain failure conditions.
2015-01-28 11:18:16 -05:00
Matt Stancliff
145473acc5 Convert check-dump to Redis check-rdb mode
redis-check-dump is now named redis-check-rdb and it runs
as a mode of redis-server instead of an independent binary.

You can now use 'redis-server redis.conf --check-rdb' to check
the RDB defined in redis.conf.  Using argument --check-rdb
checks the RDB and exits.  We could potentially also allow
the server to continue starting if the RDB check succeeds.

This change also enables us to use RDB checking programatically
from inside Redis for certain failure conditions.
2015-01-28 11:18:16 -05:00
mattcollier
3db601e064 Update redis-cli.c
Code was adding '\n'  (line 521) to the end of NIL values exlusively making csv output inconsistent.  Removed '\n'
2015-01-25 14:01:39 -05:00
mattcollier
6ec5f1f780 Update redis-cli.c
Code was adding '\n'  (line 521) to the end of NIL values exlusively making csv output inconsistent.  Removed '\n'
2015-01-25 14:01:39 -05:00
antirez
f46c66ec9a Cluster: initialized not used fileds in gossip section.
Otherwise we risk sending not initialized data to other nodes, that may
contain anything. This was actually not possible only because the
initialization of the buffer where the cluster packets header is created
was larger than the 3 gossip sections we use, so the memory was already
all filled with zeroes by the memset().
2015-01-24 07:52:24 +01:00
antirez
9802ec3c83 Cluster: initialized not used fileds in gossip section.
Otherwise we risk sending not initialized data to other nodes, that may
contain anything. This was actually not possible only because the
initialization of the buffer where the cluster packets header is created
was larger than the 3 gossip sections we use, so the memory was already
all filled with zeroes by the memset().
2015-01-24 07:52:24 +01:00
antirez
848a382619 dict.c: make chaining strategy more clear in dictAddRaw(). 2015-01-23 18:11:05 +01:00
antirez
8aaf5075c5 dict.c: make chaining strategy more clear in dictAddRaw(). 2015-01-23 18:11:05 +01:00
antirez
eed24343df DEBUG structsize
Show sizes of a few important data structures in Redis. More missing.
2015-01-23 18:10:14 +01:00
antirez
7885e1264e DEBUG structsize
Show sizes of a few important data structures in Redis. More missing.
2015-01-23 18:10:14 +01:00
antirez
453ddb1c20 The seed must be static in getRandomHexChars(). 2015-01-22 11:10:50 +01:00
antirez
e4d65e35e6 The seed must be static in getRandomHexChars(). 2015-01-22 11:10:50 +01:00
antirez
9cdd8c5599 counter must be static in getRandomHexChars(). 2015-01-22 11:00:26 +01:00
antirez
9826038f0b counter must be static in getRandomHexChars(). 2015-01-22 11:00:26 +01:00
antirez
0d76782d97 getRandomHexChars(): use /dev/urandom just to seed.
On Darwin /dev/urandom depletes terribly fast. This is not an issue
normally, but with Redis Cluster we generate a lot of unique IDs, for
example during nodes handshakes. Our IDs need just to be unique without
other strong crypto requirements, so this commit turns the function into
something that gets a 20 bytes seed from /dev/urandom, and produces the
rest of the output just using SHA1 in counter mode.
2015-01-21 23:21:55 +01:00
antirez
87301be151 getRandomHexChars(): use /dev/urandom just to seed.
On Darwin /dev/urandom depletes terribly fast. This is not an issue
normally, but with Redis Cluster we generate a lot of unique IDs, for
example during nodes handshakes. Our IDs need just to be unique without
other strong crypto requirements, so this commit turns the function into
something that gets a 20 bytes seed from /dev/urandom, and produces the
rest of the output just using SHA1 in counter mode.
2015-01-21 23:21:55 +01:00
antirez
106c183300 Merge branch 'clusterfixes' into unstable 2015-01-21 19:30:22 +01:00
antirez
af8d1b4bda Merge branch 'clusterfixes' into unstable 2015-01-21 19:30:22 +01:00
Matt Stancliff
ccc4753ae6 Fix cluster migrate memory leak
Fixes valgrind error:
48 bytes in 1 blocks are definitely lost in loss record 196 of 373
   at 0x4910D3: je_malloc (jemalloc.c:944)
   by 0x42807D: zmalloc (zmalloc.c:125)
   by 0x41FA0D: dictGetIterator (dict.c:543)
   by 0x41FA48: dictGetSafeIterator (dict.c:555)
   by 0x459B73: clusterHandleSlaveMigration (cluster.c:2776)
   by 0x45BF27: clusterCron (cluster.c:3123)
   by 0x423344: serverCron (redis.c:1239)
   by 0x41D6CD: aeProcessEvents (ae.c:311)
   by 0x41D8EA: aeMain (ae.c:455)
   by 0x41A84B: main (redis.c:3832)
2015-01-21 18:47:16 +01:00
Matt Stancliff
051a43e03a Fix cluster migrate memory leak
Fixes valgrind error:
48 bytes in 1 blocks are definitely lost in loss record 196 of 373
   at 0x4910D3: je_malloc (jemalloc.c:944)
   by 0x42807D: zmalloc (zmalloc.c:125)
   by 0x41FA0D: dictGetIterator (dict.c:543)
   by 0x41FA48: dictGetSafeIterator (dict.c:555)
   by 0x459B73: clusterHandleSlaveMigration (cluster.c:2776)
   by 0x45BF27: clusterCron (cluster.c:3123)
   by 0x423344: serverCron (redis.c:1239)
   by 0x41D6CD: aeProcessEvents (ae.c:311)
   by 0x41D8EA: aeMain (ae.c:455)
   by 0x41A84B: main (redis.c:3832)
2015-01-21 18:47:16 +01:00
Matt Stancliff
f41f3403c6 Fix potential invalid read past end of array
If array has N elements, we can't read +1 if we are already at N.

Also, we need to move elements by their storage size in the array,
not just by individual bytes.
2015-01-21 18:01:03 +01:00
Matt Stancliff
29049507ec Fix potential invalid read past end of array
If array has N elements, we can't read +1 if we are already at N.

Also, we need to move elements by their storage size in the array,
not just by individual bytes.
2015-01-21 18:01:03 +01:00
Matt Stancliff
ae87d2454b Fix cluster reset memory leak
[maybe] Fixes valgrind errors:
32 bytes in 4 blocks are definitely lost in loss record 107 of 228
   at 0x80EA447: je_malloc (jemalloc.c:944)
   by 0x806E59C: zrealloc (zmalloc.c:125)
   by 0x80A9AFC: clusterSetMaster (cluster.c:801)
   by 0x80AEDC9: clusterCommand (cluster.c:3994)
   by 0x80682A5: call (redis.c:2049)
   by 0x8068A20: processCommand (redis.c:2309)
   by 0x8076497: processInputBuffer (networking.c:1143)
   by 0x8073BAF: readQueryFromClient (networking.c:1208)
   by 0x8060E98: aeProcessEvents (ae.c:412)
   by 0x806123B: aeMain (ae.c:455)
   by 0x806C3DB: main (redis.c:3832)

64 bytes in 8 blocks are definitely lost in loss record 143 of 228
   at 0x80EA447: je_malloc (jemalloc.c:944)
   by 0x806E59C: zrealloc (zmalloc.c:125)
   by 0x80AAB40: clusterProcessPacket (cluster.c:801)
   by 0x80A847F: clusterReadHandler (cluster.c:1975)
   by 0x30000FF: ???

80 bytes in 10 blocks are definitely lost in loss record 148 of 228
   at 0x80EA447: je_malloc (jemalloc.c:944)
   by 0x806E59C: zrealloc (zmalloc.c:125)
   by 0x80AAB40: clusterProcessPacket (cluster.c:801)
   by 0x80A847F: clusterReadHandler (cluster.c:1975)
   by 0x2FFFFFF: ???
2015-01-21 17:51:57 +01:00
Matt Stancliff
30152554ea Fix cluster reset memory leak
[maybe] Fixes valgrind errors:
32 bytes in 4 blocks are definitely lost in loss record 107 of 228
   at 0x80EA447: je_malloc (jemalloc.c:944)
   by 0x806E59C: zrealloc (zmalloc.c:125)
   by 0x80A9AFC: clusterSetMaster (cluster.c:801)
   by 0x80AEDC9: clusterCommand (cluster.c:3994)
   by 0x80682A5: call (redis.c:2049)
   by 0x8068A20: processCommand (redis.c:2309)
   by 0x8076497: processInputBuffer (networking.c:1143)
   by 0x8073BAF: readQueryFromClient (networking.c:1208)
   by 0x8060E98: aeProcessEvents (ae.c:412)
   by 0x806123B: aeMain (ae.c:455)
   by 0x806C3DB: main (redis.c:3832)

64 bytes in 8 blocks are definitely lost in loss record 143 of 228
   at 0x80EA447: je_malloc (jemalloc.c:944)
   by 0x806E59C: zrealloc (zmalloc.c:125)
   by 0x80AAB40: clusterProcessPacket (cluster.c:801)
   by 0x80A847F: clusterReadHandler (cluster.c:1975)
   by 0x30000FF: ???

80 bytes in 10 blocks are definitely lost in loss record 148 of 228
   at 0x80EA447: je_malloc (jemalloc.c:944)
   by 0x806E59C: zrealloc (zmalloc.c:125)
   by 0x80AAB40: clusterProcessPacket (cluster.c:801)
   by 0x80A847F: clusterReadHandler (cluster.c:1975)
   by 0x2FFFFFF: ???
2015-01-21 17:51:57 +01:00
Matt Stancliff
7167c2dc8e Fix sending uninitialized bytes
Fixes valgrind error:
Syscall param write(buf) points to uninitialised byte(s)
   at 0x514C35D: ??? (syscall-template.S:81)
   by 0x456B81: clusterWriteHandler (cluster.c:1907)
   by 0x41D596: aeProcessEvents (ae.c:416)
   by 0x41D8EA: aeMain (ae.c:455)
   by 0x41A84B: main (redis.c:3832)
 Address 0x5f268e2 is 2,274 bytes inside a block of size 8,192 alloc'd
   at 0x4932D1: je_realloc (jemalloc.c:1297)
   by 0x428185: zrealloc (zmalloc.c:162)
   by 0x4269E0: sdsMakeRoomFor.part.0 (sds.c:142)
   by 0x426CD7: sdscatlen (sds.c:251)
   by 0x4579E7: clusterSendMessage (cluster.c:1995)
   by 0x45805A: clusterSendPing (cluster.c:2140)
   by 0x45BB03: clusterCron (cluster.c:2944)
   by 0x423344: serverCron (redis.c:1239)
   by 0x41D6CD: aeProcessEvents (ae.c:311)
   by 0x41D8EA: aeMain (ae.c:455)
   by 0x41A84B: main (redis.c:3832)
 Uninitialised value was created by a stack allocation
   at 0x457810: nodeUpdateAddressIfNeeded (cluster.c:1236)
2015-01-21 17:50:17 +01:00
Matt Stancliff
72b8574cca Fix sending uninitialized bytes
Fixes valgrind error:
Syscall param write(buf) points to uninitialised byte(s)
   at 0x514C35D: ??? (syscall-template.S:81)
   by 0x456B81: clusterWriteHandler (cluster.c:1907)
   by 0x41D596: aeProcessEvents (ae.c:416)
   by 0x41D8EA: aeMain (ae.c:455)
   by 0x41A84B: main (redis.c:3832)
 Address 0x5f268e2 is 2,274 bytes inside a block of size 8,192 alloc'd
   at 0x4932D1: je_realloc (jemalloc.c:1297)
   by 0x428185: zrealloc (zmalloc.c:162)
   by 0x4269E0: sdsMakeRoomFor.part.0 (sds.c:142)
   by 0x426CD7: sdscatlen (sds.c:251)
   by 0x4579E7: clusterSendMessage (cluster.c:1995)
   by 0x45805A: clusterSendPing (cluster.c:2140)
   by 0x45BB03: clusterCron (cluster.c:2944)
   by 0x423344: serverCron (redis.c:1239)
   by 0x41D6CD: aeProcessEvents (ae.c:311)
   by 0x41D8EA: aeMain (ae.c:455)
   by 0x41A84B: main (redis.c:3832)
 Uninitialised value was created by a stack allocation
   at 0x457810: nodeUpdateAddressIfNeeded (cluster.c:1236)
2015-01-21 17:50:17 +01:00
antirez
9a540bf3cb AOF rewrite: set iterator var to NULL when freed.
The cleanup code expects that if 'di' is not NULL, it is a valid
iterator that should be freed.

The result of this bug was a crash of the AOF rewriting process if an
error occurred after the DBs data are written and the iterator is no
longer valid.
2015-01-21 16:42:08 +01:00
antirez
4433f5a7f2 AOF rewrite: set iterator var to NULL when freed.
The cleanup code expects that if 'di' is not NULL, it is a valid
iterator that should be freed.

The result of this bug was a crash of the AOF rewriting process if an
error occurred after the DBs data are written and the iterator is no
longer valid.
2015-01-21 16:42:08 +01:00
antirez
735adaa62d Cluster: node deletion cleanup / centralization. 2015-01-21 16:03:43 +01:00
antirez
2601e3e461 Cluster: node deletion cleanup / centralization. 2015-01-21 16:03:43 +01:00
antirez
c13e0820ad Cluster: set the slaves->slaveof filed to NULL when master is freed.
Related to issue #2289.
2015-01-21 15:55:53 +01:00
antirez
59ad6ac5fe Cluster: set the slaves->slaveof filed to NULL when master is freed.
Related to issue #2289.
2015-01-21 15:55:53 +01:00
antirez
e63ad12b8f Fix gcc warning for lack of casting to char pointer. 2015-01-21 14:51:42 +01:00
antirez
92cfab44b2 Fix gcc warning for lack of casting to char pointer. 2015-01-21 14:51:42 +01:00
antirez
2d7e7141a5 luaRedisGenericCommand(): log error at WARNING level when re-entered.
Rationale is that when re-entering, it is likely due to Lua debugging
hooks. Returning an error will be ignored in most cases, going totally
unnoticed. With the log at least we leave a trace.

Related to issue #2302.
2015-01-20 23:21:21 +01:00