6608 Commits

Author SHA1 Message Date
antirez
c72389ef0f CG: test XACK ability to remove items from the PELs. 2018-03-15 12:54:10 +01:00
antirez
862c29fc1b CG: test XPENDING ability to return pending items. 2018-03-15 12:54:10 +01:00
antirez
0a5272cea8 CG: test XREADGROUP abilities. 2018-03-15 12:54:10 +01:00
antirez
b3ae33b2ad CG: test group creation. 2018-03-15 12:54:10 +01:00
antirez
8d3c0e4186 CG: More specific duplicated group error. 2018-03-15 12:54:10 +01:00
antirez
bd4dd842ea CG: RDB loading, fix inverted conditional. 2018-03-15 12:54:10 +01:00
antirez
f979275a2a CG: RDB loading first implementation. 2018-03-15 12:54:10 +01:00
antirez
80da510011 CG: RDB saving part 2, consumers. 2018-03-15 12:54:10 +01:00
antirez
9a686b82cd CG: RDB saving part 1, metadata and PEL. 2018-03-15 12:54:10 +01:00
antirez
922b74828b CG: XPENDING should not create consumers and should obey count. 2018-03-15 12:54:10 +01:00
antirez
c139073709 CG: XPENDING with start/stop/count variant implemented. 2018-03-15 12:54:10 +01:00
antirez
a85f0fc571 CG: XPENDING without start/stop variant implemented. 2018-03-15 12:54:10 +01:00
antirez
f49cbaf84c CG: Now XREADGROUP + blocking operations work. 2018-03-15 12:54:10 +01:00
antirez
55b705689f CG: XACK should return zero when nothing is processed. 2018-03-15 12:54:10 +01:00
antirez
9b2223cebd CG: XACK implementation. 2018-03-15 12:54:10 +01:00
antirez
d82968ece2 CG: XREADGROUP can fetch data from the consumer PEL. 2018-03-15 12:54:10 +01:00
antirez
018f8a3c1d CG: first draft of streamReplyWithRangeFromConsumerPEL(). 2018-03-15 12:54:10 +01:00
antirez
391e1082a4 CG: Fix order of calls in streamReplyWithRange().
We need to check if we are going to serve the request via the PEL before
inserting a deferred array len in the client output buffer.
2018-03-15 12:54:10 +01:00
antirez
840ad8cec6 CG: creation of NACK entries in PELs. 2018-03-15 12:54:10 +01:00
antirez
28d36e3aa3 CG: fix XREADGROUP ">" special ID parsing due to missing "continue". 2018-03-15 12:54:10 +01:00
antirez
e0733ff7f8 CG: streamCompareID() + group last_id updating. 2018-03-15 12:54:10 +01:00
antirez
0f43a908f9 CG: consumer lookup + initial streamReplyWithRange() work to support CG. 2018-03-15 12:54:10 +01:00
antirez
23dc98ac52 CG: add & populate group+consumer in the blocking state. 2018-03-15 12:54:10 +01:00
antirez
20bc183ef8 CG: fix parsing in XREADGROUP and streamLookupCG() NULL check. 2018-03-15 12:54:10 +01:00
antirez
ffedba44e5 CG: add XREADGROUP in the command table. 2018-03-15 12:54:10 +01:00
antirez
d293a8a11e CG: XREADGROUP group option parsing and groups lookup. 2018-03-15 12:54:10 +01:00
antirez
7dba47a535 CG: fix raxFind() retval check in streamCreateCG(). 2018-03-15 12:54:10 +01:00
antirez
2c348dafd0 CG: data structures design + XGROUP CREATE implementation. 2018-03-15 12:54:10 +01:00
antirez
054a935d2c Cluster: add test for the nofailover flag. 2018-03-14 16:30:32 +01:00
antirez
b94379e29a Cluster: ability to prevent slaves from failing over their masters.
This commit, in some parts derived from PR #3041 which is no longer
possible to merge (because the user deleted the original branch),
implements the ability of slaves to have a special configuration that
prevents them from trying to start a failover when the master is failing.

There are multiple reasons for wanting this, and the feature was
requested some time ago in issue #3021.

The differences between this patch and the original PR are the
following:

1. The flag is saved/loaded on the nodes configuration.
2. The 'myself' node is now flag-aware, the flag is updated as needed
   when the configuration is changed via CONFIG SET.
3. The flag name uses NOFAILOVER instead of NO_FAILOVER to be consistent
   with existing NOADDR.
4. The redis.conf documentation was rewritten.

Thanks to @deep011 for the original patch.
2018-03-14 14:01:38 +01:00
antirez
4ee8d1377f Stream: update the listpack pointer in streamTrimByLength(). 2018-03-01 17:26:02 +01:00
antirez
b4e6961fec Remove warning from lpGet snprintf(). 2018-03-01 15:26:27 +01:00
antirez
5db00d9547 redis-cli: fix missed unit in array. Change define name. 2018-03-01 15:06:41 +01:00
Salvatore Sanfilippo
aa1871bc5c Merge pull request #4714 from charsyam/feature/fix-out-of-index-range
[BugFix] Fix out of array index range for findBigKeys in redis-cli
2018-03-01 03:39:15 -08:00
antirez
e5eafbc776 Actually use ae_flags to add AE_BARRIER if needed.
Many thanks to @Plasma, who spotted this problem while reviewing the code.
2018-02-28 18:03:51 +01:00
Salvatore Sanfilippo
c2d54c26df Merge pull request #4715 from charsyam/feature/refactoring-make-condition-clear-for-rdb
[BugFix] fix calculation length in rdbSaveAuxField
2018-02-27 10:15:27 -08:00
antirez
7ba754281f expireIfNeeded() needed a top comment documenting the behavior. 2018-02-27 16:44:43 +01:00
antirez
f7f696f4ae expireIfNeeded() comment: claim -> pretend. 2018-02-27 16:37:37 +01:00
charsyam
4529e3dffa refactoring-make-condition-clear-for-rdb 2018-02-27 21:55:20 +09:00
charsyam
fdea7d8cda fix-out-of-index-range-for-redis-cli-findbigkey 2018-02-27 21:46:19 +09:00
antirez
7874da180e ae.c: instead of not firing, on AE_BARRIER invert the sequence.
AE_BARRIER was implemented like:

    - Fire the readable event.
    - Do not fire the writable event if the readable fired.

However, this may lead to the writable event never being called if the
readable event is always fired. There is an alternative: we can just
invert the sequence of the calls in case AE_BARRIER is set. This commit
does that.
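
The inverted call sequence can be illustrated with a small Python model. This is only a sketch of the ordering rule described in this message, not the ae.c code; the function name `dispatch_order` and the numeric flag values are assumptions made for the example.

```python
# Illustrative flag values; treat them as assumptions of this sketch.
AE_READABLE = 1
AE_WRITABLE = 2
AE_BARRIER = 4


def dispatch_order(mask, fired):
    """Return the handler names in the order they would run.

    mask  -- flags the file descriptor was registered with
    fired -- flags reported as ready in this iteration
    """
    order = []
    invert = bool(mask & AE_BARRIER)

    # Normal case: the readable handler runs first.
    if not invert and (mask & fired & AE_READABLE):
        order.append("read")

    # The writable handler runs whenever it is registered and ready.
    if mask & fired & AE_WRITABLE:
        order.append("write")

    # Barrier case: the readable handler runs after the writable one.
    if invert and (mask & fired & AE_READABLE):
        order.append("read")

    return order
```

With this ordering the writable handler is never starved: even if the descriptor is readable on every iteration, the write still runs, and it runs before any new query is read.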
2018-02-27 13:06:42 +01:00
antirez
c724e5dcf6 AOF: fix a bug that may prevent proper fsyncing when fsync=always.
In case the write handler is already installed, it could happen that we
serve the reply of a query in the same event loop cycle we received it,
preventing beforeSleep() from guaranteeing that we do the AOF fsync
before sending the reply to the client.

The AE_BARRIER mechanism, introduced in a previous commit, prevents this
problem. This commit makes actual use of this new feature to fix the
bug.
2018-02-27 13:06:42 +01:00
antirez
17cbb54919 Cluster: improve crash-recovery safety after failover auth vote.
Add AE_BARRIER to the writable event loop so that slaves requesting
votes can't be served before we re-enter the event loop in the next
iteration, so clusterBeforeSleep() will fsync to disk in time.
Also add the call to explicitly fsync, given that we modified the last
vote epoch variable.
2018-02-27 13:06:42 +01:00
antirez
043763d040 ae.c: introduce the concept of read->write barrier.
AOF fsync=always, and certain Redis Cluster bus operations, require
fsyncing data to disk before replying with an acknowledgement.
In such case, in order to implement Group Commits, we want to be sure
that queries that are read in a given cycle of the event loop, are never
served to clients in the same event loop iteration. This way, by using
the event loop "before sleep" callback, we can fsync the information
just one time before returning into the event loop for the next cycle.
This is much more efficient compared to calling fsync() multiple times.

Unfortunately, because of a bug, this was not always guaranteed: the
only thing controlling the behavior was the way the events happened to
be installed. Normally this problem is hard to trigger when AOF is enabled
with fsync=always, because we try to flush the output buffers to the
socket directly in the beforeSleep() function of Redis. However, if the
output buffers are full, we actually install a write event, and in such
a case this bug could happen.

This change to ae.c modifies the event loop implementation to make this
concept explicit. Write events that are registered with:

    AE_WRITABLE|AE_BARRIER

are guaranteed to never fire after the readable event was fired for the
same file descriptor in the same event loop iteration. In this way we are
sure that data is persisted to disk before the client performing the
operation receives an acknowledgement.

However note that this semantics does not provide all the guarantees
that one may believe are automatically provided. Take the example of the
blocking list operations in Redis.

With AOF and fsync=always we could have:

    Client A doing: BLPOP myqueue 0
    Client B doing: RPUSH myqueue a b c

In this scenario, Client A will get the "a" element immediately after
Client B's RPUSH is executed, even before the operation is persisted.
However, when Client B gets the acknowledgement, it can be sure that
"b,c" are already safe on disk inside the list.

What to note here is that it cannot be assumed that Client A receiving
the element is a guarantee that the operation succeeded from the point
of view of Client B.

This is due to the fact that the barrier exists within the same socket,
and not between different sockets. However in the case above, the
element "a" was not going to be persisted regardless, so it is a pretty
synthetic argument.
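
The group commit idea described in this message can be sketched with a toy Python model. It is a simplified illustration, not the Redis event loop: the class `GroupCommitLoop` and all of its method names are hypothetical, and the real loop runs its "before sleep" step at the start of the next iteration rather than at the end of the current one.

```python
class GroupCommitLoop:
    """Toy event loop: one fsync per iteration, replies only after it."""

    def __init__(self):
        self.fsync_calls = 0
        self.unsynced = []   # writes appended to the log but not yet fsynced
        self.queued = []     # replies held back until the data is durable
        self.sent = []       # replies actually delivered to clients

    def on_readable(self, client, command):
        # Execute the query and queue the reply; do not send it yet.
        self.unsynced.append(command)
        self.queued.append((client, "OK"))

    def before_sleep(self):
        # A single fsync covers every write read in this iteration.
        if self.unsynced:
            self.fsync_calls += 1
            self.unsynced.clear()

    def on_writable(self):
        # The barrier: replies leave only once nothing awaits an fsync.
        if not self.unsynced:
            self.sent.extend(self.queued)
            self.queued.clear()

    def run_iteration(self, queries):
        for client, command in queries:
            self.on_readable(client, command)
        self.before_sleep()
        self.on_writable()
```

Running one iteration with three queries performs a single simulated fsync and then delivers all three replies, which is the efficiency gain over calling fsync() once per query.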
2018-02-27 13:06:42 +01:00
Salvatore Sanfilippo
601420cb73 Merge pull request #3828 from oranagra/sdsnewlen_pr
add SDS_NOINIT option to sdsnewlen to avoid unnecessary memsets.
2018-02-27 04:04:32 -08:00
antirez
e024c01a4c Fix ziplist prevlen encoding description. See #4705. 2018-02-23 12:19:35 +01:00
antirez
f00615b4ff Track number of logically expired keys still in memory.
This commit adds two new fields in the INFO output, stats section:

expired_stale_perc:0.34
expired_time_cap_reached_count:58

The first field is an estimate of the percentage of keys that are still in
memory but are already logically expired. The reason why those keys are
not yet reclaimed is that the active expire cycle can't spend more
time on the process of reclaiming the keys, and at the same time nobody
is accessing such keys. However as the active expire cycle runs, while
it will eventually have to return to the caller, because of time limit
or because there are less than 25% of keys logically expired in each
given database, it collects the stats in order to populate this INFO
field.

Note that expired_stale_perc is a running average, where the current
sample accounts for 5% and the history for 95%, so you'll see it
changing smoothly over time.
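
The running average can be sketched in Python as follows. Only the 5% / 95% weighting comes from this message; the function name `update_stale_perc` and its parameter names are hypothetical.

```python
def update_stale_perc(history, sample, sample_weight=0.05):
    """Blend the newest sample into the running average.

    history       -- the previous value of the running average
    sample        -- the estimate measured in the current expire cycle
    sample_weight -- share given to the new sample (5% per the message)
    """
    return sample * sample_weight + history * (1.0 - sample_weight)


# Example: the value drifts smoothly toward a steady sample of 0.5.
value = 0.0
for _ in range(200):
    value = update_stale_perc(value, 0.5)
```

Because each new sample contributes only 5%, a single unusual expire cycle barely moves the reported value, while a sustained change shows up gradually.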

The other field, expired_time_cap_reached_count, counts the number
of times the expire cycle had to stop because of the time limit, even
though it was still finding a sizeable number of keys yet to expire.
This allows people handling operations to understand if the Redis
server, during mass-expiration events, is usually able to collect keys
fast enough. It is normal for this field to increment during mass
expires, but normally it should very rarely increment. When instead it
constantly increments, it means that the current workload is using
a very significant percentage of CPU time to expire keys.

This feature was created thanks to the hints of Rashmi Ramesh and
Bart Robinson from Twitter. In private email exchanges, they noted how
it was important to improve the observability of this parameter in the
Redis server. Actually in big deployments, the number of keys that are
yet to be reclaimed in each server, even if they are logically expired,
may account for a very large amount of wasted memory.
2018-02-19 11:12:49 +01:00
antirez
ab4dddc4eb Remove non semantical spaces from module.c. 2018-02-15 21:41:03 +01:00
Salvatore Sanfilippo
4bfcce959c Merge pull request #4479 from dvirsky/notify
Keyspace notifications API for modules
2018-02-15 21:36:32 +01:00
antirez
85a9a91e56 Fix typo in notifyKeyspaceEvent() comment. 2018-02-15 21:33:06 +01:00