8709 Commits

Author SHA1 Message Date
Oran Agra
806736cdf9 Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
  so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
2018-03-12 15:08:52 +02:00
Oran Agra
d66739931b Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
  so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
2018-03-12 15:08:52 +02:00
Oran Agra
be1b4aa9aa active defrag v2
- big keys are not defragged in one go from within the dict scan
  instead they are scanned in parts after the main dict hash bucket is done.
- add latency monitor sample for defrag
- change default active-defrag-cycle-min to induce lower latency
- make active defrag start a new scan right away if needed, so it's easier
  (for the test suite) to detect when it's done
- make active defrag quick the current cycle after each db / big key
- defrag  some non key long term global allocations
- some refactoring for smaller functions and more reusable code
- during dict rehashing, one scan iteration of the dict, can end up scanning
  one bucket in the smaller dict and many many buckets in the larger dict.
  so waiting for 16 scan iterations before checking the time, may be much too long.
2018-03-12 15:07:43 +02:00
Oran Agra
eb83a56f8e active defrag v2
- big keys are not defragged in one go from within the dict scan
  instead they are scanned in parts after the main dict hash bucket is done.
- add latency monitor sample for defrag
- change default active-defrag-cycle-min to induce lower latency
- make active defrag start a new scan right away if needed, so it's easier
  (for the test suite) to detect when it's done
- make active defrag quick the current cycle after each db / big key
- defrag  some non key long term global allocations
- some refactoring for smaller functions and more reusable code
- during dict rehashing, one scan iteration of the dict, can end up scanning
  one bucket in the smaller dict and many many buckets in the larger dict.
  so waiting for 16 scan iterations before checking the time, may be much too long.
2018-03-12 15:07:43 +02:00
Otmar Ertl
97bde9f623 use all 64 bits of the hash value instead of 63 2018-03-11 09:18:00 +01:00
Otmar Ertl
86ad4e06b8 use all 64 bits of the hash value instead of 63 2018-03-11 09:18:00 +01:00
Otmar Ertl
44698f45e7 made constant static 2018-03-10 20:44:20 +01:00
Otmar Ertl
09f818bc20 made constant static 2018-03-10 20:44:20 +01:00
Otmar Ertl
633983d479 improved definition of HLL_Q 2018-03-10 20:22:42 +01:00
Otmar Ertl
bacac52eec improved definition of HLL_Q 2018-03-10 20:22:42 +01:00
Otmar Ertl
1e9a774871 improved HyperLogLog cardinality estimation
based on method described in https://arxiv.org/abs/1702.01284
that does not rely on any magic constants
2018-03-10 20:13:21 +01:00
Otmar Ertl
47b0cfcbd2 improved HyperLogLog cardinality estimation
based on method described in https://arxiv.org/abs/1702.01284
that does not rely on any magic constants
2018-03-10 20:13:21 +01:00
Otmar Ertl
6470b21f59 replaced tab by spaces 2018-03-10 20:09:41 +01:00
Otmar Ertl
3eaf0e07d2 replaced tab by spaces 2018-03-10 20:09:41 +01:00
Guy Benoish
b660fc2fbe Fix zlexrangespec mem-leak in genericZrangebylexCommand 2018-03-07 10:40:37 +07:00
Guy Benoish
290a63dc54 Don't call sdscmp() with shared.maxstring or shared.minstring 2018-03-06 20:14:35 +07:00
Guy Benoish
0888eb4a00 Don't call sdscmp() with shared.maxstring or shared.minstring 2018-03-06 20:14:35 +07:00
pan.liangp
f4eb64cd35 move get clients max buffer calculate into info clients command 2018-03-02 17:16:00 +08:00
pan.liangp
fb23cd0627 move get clients max buffer calculate into info clients command 2018-03-02 17:16:00 +08:00
antirez
84b281209a Stream: update the listpack pointer in streamTrimByLength(). 2018-03-01 17:26:02 +01:00
antirez
c277aaf3b4 Stream: update the listpack pointer in streamTrimByLength(). 2018-03-01 17:26:02 +01:00
antirez
efcbc01fbd Remove warning from lpGet snprintf(). 2018-03-01 15:26:27 +01:00
antirez
8e0fbea741 Remove warning from lpGet snprintf(). 2018-03-01 15:26:27 +01:00
antirez
d63caaa820 redis-cli: fix missed unit in array. Change define name. 2018-03-01 15:06:41 +01:00
antirez
1db665cda9 redis-cli: fix missed unit in array. Change define name. 2018-03-01 15:06:41 +01:00
charsyam
da7f5700cf refactoring-call-aeDeleteFileEvent-twice-in-freeClusterLink 2018-03-01 22:30:39 +09:00
charsyam
ef132d1337 refactoring-call-aeDeleteFileEvent-twice-in-freeClusterLink 2018-03-01 22:30:39 +09:00
charsyam
51a03f6356 fix dlopen leak 2018-03-01 21:22:42 +09:00
charsyam
063e4b44c0 fix dlopen leak 2018-03-01 21:22:42 +09:00
Salvatore Sanfilippo
83b5b5a476
Merge pull request #4714 from charsyam/feature/fix-out-of-index-range
[BugFix] Fix out of array index range for findBigKeys in redis-cli
2018-03-01 03:39:15 -08:00
Salvatore Sanfilippo
38ecac9dd0 Merge pull request #4714 from charsyam/feature/fix-out-of-index-range
[BugFix] Fix out of array index range for findBigKeys in redis-cli
2018-03-01 03:39:15 -08:00
antirez
3a5bf75ede Actually use ae_flags to add AE_BARRIER if needed.
Many thanks to @Plasma that spotted this problem reviewing the code.
2018-02-28 18:03:51 +01:00
antirez
99f94354a6 Actually use ae_flags to add AE_BARRIER if needed.
Many thanks to @Plasma that spotted this problem reviewing the code.
2018-02-28 18:03:51 +01:00
Salvatore Sanfilippo
7a73db7512
Merge pull request #4715 from charsyam/feature/refactoring-make-condition-clear-for-rdb
[BugFix] fix calculation length in rdbSaveAuxField
2018-02-27 10:15:27 -08:00
Salvatore Sanfilippo
c3934db151 Merge pull request #4715 from charsyam/feature/refactoring-make-condition-clear-for-rdb
[BugFix] fix calculation length in rdbSaveAuxField
2018-02-27 10:15:27 -08:00
antirez
92696e49d2 expireIfNeeded() needed a top comment documenting the behavior. 2018-02-27 16:44:43 +01:00
antirez
550181a96b expireIfNeeded() needed a top comment documenting the behavior. 2018-02-27 16:44:43 +01:00
antirez
b00c4ffab5 expireIfNeeded() comment: claim -> pretend. 2018-02-27 16:37:37 +01:00
antirez
4db08588cc expireIfNeeded() comment: claim -> pretend. 2018-02-27 16:37:37 +01:00
charsyam
76386c48b8 refactoring-make-condition-clear-for-rdb 2018-02-27 21:55:20 +09:00
charsyam
7bf2ef9dba refactoring-make-condition-clear-for-rdb 2018-02-27 21:55:20 +09:00
charsyam
6168d5a1a6 fix-out-of-index-range-for-redis-cli-findbigkey 2018-02-27 21:46:19 +09:00
charsyam
aecbdde3c0 fix-out-of-index-range-for-redis-cli-findbigkey 2018-02-27 21:46:19 +09:00
antirez
956350ef89 ae.c: insetad of not firing, on AE_BARRIER invert the sequence.
AE_BARRIER was implemented like:

    - Fire the readable event.
    - Do not fire the writabel event if the readable fired.

However this may lead to the writable event to never be called if the
readable event is always fired. There is an alterantive, we can just
invert the sequence of the calls in case AE_BARRIER is set. This commit
does that.
2018-02-27 13:06:42 +01:00
antirez
b745b98f3d ae.c: insetad of not firing, on AE_BARRIER invert the sequence.
AE_BARRIER was implemented like:

    - Fire the readable event.
    - Do not fire the writabel event if the readable fired.

However this may lead to the writable event to never be called if the
readable event is always fired. There is an alterantive, we can just
invert the sequence of the calls in case AE_BARRIER is set. This commit
does that.
2018-02-27 13:06:42 +01:00
antirez
75987431f0 AOF: fix a bug that may prevent proper fsyncing when fsync=always.
In case the write handler is already installed, it could happen that we
serve the reply of a query in the same event loop cycle we received it,
preventing beforeSleep() from guaranteeing that we do the AOF fsync
before sending the reply to the client.

The AE_BARRIER mechanism, introduced in a previous commit, prevents this
problem. This commit makes actual use of this new feature to fix the
bug.
2018-02-27 13:06:42 +01:00
antirez
5a57c953c9 AOF: fix a bug that may prevent proper fsyncing when fsync=always.
In case the write handler is already installed, it could happen that we
serve the reply of a query in the same event loop cycle we received it,
preventing beforeSleep() from guaranteeing that we do the AOF fsync
before sending the reply to the client.

The AE_BARRIER mechanism, introduced in a previous commit, prevents this
problem. This commit makes actual use of this new feature to fix the
bug.
2018-02-27 13:06:42 +01:00
antirez
533d0e0375 Cluster: improve crash-recovery safety after failover auth vote.
Add AE_BARRIER to the writable event loop so that slaves requesting
votes can't be served before we re-enter the event loop in the next
iteration, so clusterBeforeSleep() will fsync to disk in time.
Also add the call to explicitly fsync, given that we modified the last
vote epoch variable.
2018-02-27 13:06:42 +01:00
antirez
61da5a9da8 Cluster: improve crash-recovery safety after failover auth vote.
Add AE_BARRIER to the writable event loop so that slaves requesting
votes can't be served before we re-enter the event loop in the next
iteration, so clusterBeforeSleep() will fsync to disk in time.
Also add the call to explicitly fsync, given that we modified the last
vote epoch variable.
2018-02-27 13:06:42 +01:00
antirez
548e478e40 ae.c: introduce the concept of read->write barrier.
AOF fsync=always, and certain Redis Cluster bus operations, require to
fsync data on disk before replying with an acknowledge.
In such case, in order to implement Group Commits, we want to be sure
that queries that are read in a given cycle of the event loop, are never
served to clients in the same event loop iteration. This way, by using
the event loop "before sleep" callback, we can fsync the information
just one time before returning into the event loop for the next cycle.
This is much more efficient compared to calling fsync() multiple times.

Unfortunately because of a bug, this was not always guaranteed: the
actual way the events are installed was the sole thing that could
control. Normally this problem is hard to trigger when AOF is enabled
with fsync=always, because we try to flush the output buffers to the
socekt directly in the beforeSleep() function of Redis. However if the
output buffers are full, we actually install a write event, and in such
a case, this bug could happen.

This change to ae.c modifies the event loop implementation to make this
concept explicit. Write events that are registered with:

    AE_WRITABLE|AE_BARRIER

Are guaranteed to never fire after the readable event was fired for the
same file descriptor. In this way we are sure that data is persisted to
disk before the client performing the operation receives an
acknowledged.

However note that this semantics does not provide all the guarantees
that one may believe are automatically provided. Take the example of the
blocking list operations in Redis.

With AOF and fsync=always we could have:

    Client A doing: BLPOP myqueue 0
    Client B doing: RPUSH myqueue a b c

In this scenario, Client A will get the "a" elements immediately after
the Client B RPUSH will be executed, even before the operation is persisted.
However when Client B will get the acknowledge, it can be sure that
"b,c" are already safe on disk inside the list.

What to note here is that it cannot be assumed that Client A receiving
the element is a guaranteed that the operation succeeded from the point
of view of Client B.

This is due to the fact that the barrier exists within the same socket,
and not between different sockets. However in the case above, the
element "a" was not going to be persisted regardless, so it is a pretty
synthetic argument.
2018-02-27 13:06:42 +01:00