6726 Commits

Author SHA1 Message Date
Salvatore Sanfilippo
c3934db151 Merge pull request #4715 from charsyam/feature/refactoring-make-condition-clear-for-rdb
[BugFix] fix calculation length in rdbSaveAuxField
2018-02-27 10:15:27 -08:00
antirez
550181a96b expireIfNeeded() needed a top comment documenting the behavior. 2018-02-27 16:44:43 +01:00
antirez
4db08588cc expireIfNeeded() comment: claim -> pretend. 2018-02-27 16:37:37 +01:00
charsyam
7bf2ef9dba refactoring-make-condition-clear-for-rdb 2018-02-27 21:55:20 +09:00
charsyam
aecbdde3c0 fix-out-of-index-range-for-redis-cli-findbigkey 2018-02-27 21:46:19 +09:00
antirez
b745b98f3d ae.c: insetad of not firing, on AE_BARRIER invert the sequence.
AE_BARRIER was implemented like:

    - Fire the readable event.
    - Do not fire the writabel event if the readable fired.

However this may lead to the writable event to never be called if the
readable event is always fired. There is an alterantive, we can just
invert the sequence of the calls in case AE_BARRIER is set. This commit
does that.
2018-02-27 13:06:42 +01:00
antirez
5a57c953c9 AOF: fix a bug that may prevent proper fsyncing when fsync=always.
In case the write handler is already installed, it could happen that we
serve the reply of a query in the same event loop cycle we received it,
preventing beforeSleep() from guaranteeing that we do the AOF fsync
before sending the reply to the client.

The AE_BARRIER mechanism, introduced in a previous commit, prevents this
problem. This commit makes actual use of this new feature to fix the
bug.
2018-02-27 13:06:42 +01:00
antirez
61da5a9da8 Cluster: improve crash-recovery safety after failover auth vote.
Add AE_BARRIER to the writable event loop so that slaves requesting
votes can't be served before we re-enter the event loop in the next
iteration, so clusterBeforeSleep() will fsync to disk in time.
Also add the call to explicitly fsync, given that we modified the last
vote epoch variable.
2018-02-27 13:06:42 +01:00
antirez
6c7d27c711 ae.c: introduce the concept of read->write barrier.
AOF fsync=always, and certain Redis Cluster bus operations, require to
fsync data on disk before replying with an acknowledge.
In such case, in order to implement Group Commits, we want to be sure
that queries that are read in a given cycle of the event loop, are never
served to clients in the same event loop iteration. This way, by using
the event loop "before sleep" callback, we can fsync the information
just one time before returning into the event loop for the next cycle.
This is much more efficient compared to calling fsync() multiple times.

Unfortunately because of a bug, this was not always guaranteed: the
actual way the events are installed was the sole thing that could
control. Normally this problem is hard to trigger when AOF is enabled
with fsync=always, because we try to flush the output buffers to the
socekt directly in the beforeSleep() function of Redis. However if the
output buffers are full, we actually install a write event, and in such
a case, this bug could happen.

This change to ae.c modifies the event loop implementation to make this
concept explicit. Write events that are registered with:

    AE_WRITABLE|AE_BARRIER

Are guaranteed to never fire after the readable event was fired for the
same file descriptor. In this way we are sure that data is persisted to
disk before the client performing the operation receives an
acknowledged.

However note that this semantics does not provide all the guarantees
that one may believe are automatically provided. Take the example of the
blocking list operations in Redis.

With AOF and fsync=always we could have:

    Client A doing: BLPOP myqueue 0
    Client B doing: RPUSH myqueue a b c

In this scenario, Client A will get the "a" elements immediately after
the Client B RPUSH will be executed, even before the operation is persisted.
However when Client B will get the acknowledge, it can be sure that
"b,c" are already safe on disk inside the list.

What to note here is that it cannot be assumed that Client A receiving
the element is a guaranteed that the operation succeeded from the point
of view of Client B.

This is due to the fact that the barrier exists within the same socket,
and not between different sockets. However in the case above, the
element "a" was not going to be persisted regardless, so it is a pretty
synthetic argument.
2018-02-27 13:06:42 +01:00
Salvatore Sanfilippo
3f379e3d70 Merge pull request #3828 from oranagra/sdsnewlen_pr
add SDS_NOINIT option to sdsnewlen to avoid unnecessary memsets.
2018-02-27 04:04:32 -08:00
antirez
ac49eb0c8a Fix ziplist prevlen encoding description. See #4705. 2018-02-23 12:19:35 +01:00
gechunlin
c857ac5840 Update object.c 2018-02-22 20:57:54 -06:00
antirez
ee84fc714a Track number of logically expired keys still in memory.
This commit adds two new fields in the INFO output, stats section:

expired_stale_perc:0.34
expired_time_cap_reached_count:58

The first field is an estimate of the number of keys that are yet in
memory but are already logically expired. They reason why those keys are
yet not reclaimed is because the active expire cycle can't spend more
time on the process of reclaiming the keys, and at the same time nobody
is accessing such keys. However as the active expire cycle runs, while
it will eventually have to return to the caller, because of time limit
or because there are less than 25% of keys logically expired in each
given database, it collects the stats in order to populate this INFO
field.

Note that expired_stale_perc is a running average, where the current
sample accounts for 5% and the history for 95%, so you'll see it
changing smoothly over time.

The other field, expired_time_cap_reached_count, counts the number
of times the expire cycle had to stop, even if still it was finding a
sizeable number of keys yet to expire, because of the time limit.
This allows people handling operations to understand if the Redis
server, during mass-expiration events, is able to collect keys fast
enough usually. It is normal for this field to increment during mass
expires, but normally it should very rarely increment. When instead it
constantly increments, it means that the current workloads is using
a very important percentage of CPU time to expire keys.

This feature was created thanks to the hints of Rashmi Ramesh and
Bart Robinson from Twitter. In private email exchanges, they noted how
it was important to improve the observability of this parameter in the
Redis server. Actually in big deployments, the amount of keys that are
yet to expire in each server, even if they are logically expired, may
account for a very big amount of wasted memory.
2018-02-19 11:12:49 +01:00
antirez
f4395e232b Remove non semantical spaces from module.c. 2018-02-15 21:41:03 +01:00
Salvatore Sanfilippo
1d0a91aecb Merge pull request #4479 from dvirsky/notify
Keyspace notifications API for modules
2018-02-15 21:36:32 +01:00
antirez
906b095592 Fix typo in notifyKeyspaceEvent() comment. 2018-02-15 21:33:06 +01:00
Dvir Volk
0690168116 Add doc comment about notification flags 2018-02-14 21:54:00 +02:00
Dvir Volk
d3abc6e3ae Add REDISMODULE_NOTIFY_STREAM flag to support stream notifications 2018-02-14 21:50:42 +02:00
Dvir Volk
4991298fb0 Fix indentation and comment style in testmodule 2018-02-14 21:43:06 +02:00
Dvir Volk
fbd0514a1f Use one static client for all keyspace notification callbacks 2018-02-14 21:40:10 +02:00
Dvir Volk
c54aaca680 Remove the NOTIFY_MODULE flag and simplify the module notification flow if there aren't subscribers 2018-02-14 21:40:10 +02:00
Dvir Volk
fa3b63fe82 Document flags for notifications 2018-02-14 21:38:58 +02:00
Dvir Volk
8e5174caeb removed some trailing whitespaces 2018-02-14 21:38:58 +02:00
Dvir Volk
fa158da45d removed hellonotify.c 2018-02-14 21:38:58 +02:00
Dvir Volk
641fce93e7 fixed test 2018-02-14 21:38:58 +02:00
Dvir Volk
053941b983 finished implementation of notifications. Tests unfinished 2018-02-14 21:38:58 +02:00
Salvatore Sanfilippo
ca9192e2fa Merge pull request #4685 from charsyam/refactoring/set_max_latency
Removing duplicated code to set max latency
2018-02-13 16:20:32 +01:00
charsyam
a7bb8bb27c getting rid of duplicated code 2018-02-14 00:12:13 +09:00
antirez
bc0c7045f4 More verbose logging when slave sends errors to master.
See #3832.
2018-02-13 16:01:31 +01:00
Salvatore Sanfilippo
1296894d25 Merge pull request #3832 from oranagra/slave_reply_to_master_pr
when a slave responds with an error on commands that come from master, log it
2018-02-13 15:55:26 +01:00
Salvatore Sanfilippo
028efd8242 Merge pull request #3745 from guybe7/unstable
enlarged buffer given to ld2string
2018-02-13 15:50:21 +01:00
antirez
d4f1cbbd3a Make it explicit with a comment why we kill the old AOF rewrite.
See #3858.
2018-02-13 15:43:34 +01:00
Guy Benoish
cfa0d361b7 rewriteAppendOnlyFileBackground() failure fix
It is possible to do BGREWRITEAOF even if appendonly=no. This is by design.
stopAppendonly() didn't turn off aof_rewrite_scheduled (it can be turned on
again by BGREWRITEAOF even while appendonly is off anyway).
After configuring `appendonly yes` it will see that the state is AOF_OFF,
there's no RDB fork, so it will do rewriteAppendOnlyFileBackground() which
will fail since the aof_child_pid is set (was scheduled and started by cron).

Solution:
stopAppendonly() will turn off the schedule flag (regardless of who asked for it).
startAppendonly() will terminate any existing fork and start a new one (so it is the most recent).
2018-02-13 15:41:06 +01:00
Salvatore Sanfilippo
3ff53a8228 Merge pull request #4684 from oranagra/latency_monitor_max
fix to latency monitor reporting wrong max latency
2018-02-13 15:31:11 +01:00
Oran Agra
a151e92947 fix to latency monitor reporting wrong max latency
in some cases LATENCY HISTORY reported latency that was
higher than the max latency reported by LATENCY LATEST / DOCTOR
2018-02-13 15:58:40 +02:00
赵磊
04a3125f02 Remove updateLFU() in dbOverwrite(). 2018-02-11 21:02:07 +08:00
antirez
0b58392719 Rax updated to latest antirez/rax commit. 2018-02-02 11:10:18 +01:00
Salvatore Sanfilippo
6314165e4d Merge pull request #4269 from jianqingdu/unstable
fix not call va_end() when syncWrite() failed
2018-01-24 10:55:25 +01:00
Salvatore Sanfilippo
7c41737e39 Merge pull request #4628 from mnunberg/patch-1
redismodule.h: Check ModuleNameBusy before calling it
2018-01-24 10:48:04 +01:00
antirez
a8d678d26d Fix integration test NOREPLICAS error time dependent false positive. 2018-01-24 10:10:48 +01:00
Mark Nunberg
eb80bec044 redismodule.h: Check ModuleNameBusy before calling it
Older versions might not have this function.
2018-01-23 10:49:18 -05:00
antirez
bca90f9ef7 Fix migrateCommand() access of not initialized byte. 2018-01-18 12:41:05 +01:00
Guy Benoish
99e7e46da1 Replication buffer fills up on high rate traffic.
When feeding the master with a high rate traffic the the slave's feed is much slower.
This causes the replication buffer to grow (indefinitely) which leads to slave disconnection.
The problem is that writeToClient() decides to stop writing after NET_MAX_WRITES_PER_EVENT
writes (In order to be fair to clients).
We should ignore this when the client is a slave.
It's better if clients wait longer, the alternative is that the slave has no chance to stay in
sync in this situation.
2018-01-18 12:10:48 +01:00
antirez
0faed417e5 Cluster: improve anti-affinity algo in redis-trib.rb.
See #3462 and related PRs.

We use a simple algorithm to calculate the level of affinity violation,
and then an optimizer that performs random swaps until things improve.
2018-01-18 11:44:19 +01:00
antirez
dbc3625a80 Remove useless comment from serverCron().
The behavior is well specified by the code itself.
2018-01-17 11:23:41 +01:00
Salvatore Sanfilippo
a28f3fc568 Merge pull request #4546 from hqin6/unstable
fixbug for #4545 dead loop aof rewrite
2018-01-17 11:21:55 +01:00
heqin
28405f1f75 fixbug for #4545 dead loop aof rewrite 2018-01-17 18:08:30 +08:00
Salvatore Sanfilippo
d19edaf43f Merge pull request #4609 from Qinch/unstable
fix assert problem in ZIP_DECODE_PREVLENSIZE macro
2018-01-17 10:45:11 +01:00
antirez
299e560a2e Hopefully more clear comment to explain the change in #4607. 2018-01-16 15:52:13 +01:00
qinchao
f2db112964 fix assert problem in ZIP_DECODE_PREVLENSIZE
, see issue: https://github.com/antirez/redis/issues/4587
2018-01-16 22:43:06 +08:00