233 Commits

Author SHA1 Message Date
antirez
5957ec011b Modules API: RM_GetRandomBytes() / GetRandomHexChars(). 2018-04-05 13:24:22 +02:00
antirez
055ab3623b Modules Cluster API: make node IDs pointers constant. 2018-03-30 13:16:07 +02:00
antirez
72f11ded18 Modules Cluster API: message bus implementation. 2018-03-29 15:13:31 +02:00
antirez
6abf326d6c AOF: enable RDB-preamble rewriting by default.
There are too many advantages in doing this, RDB is faster to persist,
more compact, much faster to load back. The main issues here are that
the code is less tested because this was not the old default (so we are
enabling it for the new 5.0 release), and that the AOF is no longer a
trivially parsable format from now on. However the non-preamble mode
will be supported in the future as well, if new data types will be
added.
2018-03-25 11:43:30 +02:00
Salvatore Sanfilippo
d799fdf733 Merge pull request #4691 from oranagra/active_defrag_v2
Active defrag v2
2018-03-22 09:16:32 +01:00
antirez
d7fa510612 CG: Replication WIP 1: XREADGROUP and XCLAIM propagated as XCLAIM. 2018-03-19 18:02:19 +01:00
zhaozhao.zz
d7d0493065 rdb: incremental fsync when redis saves rdb 2018-03-16 00:44:50 +08:00
antirez
78bdd568a4 CG: XINFO CONSUMERS implemented. 2018-03-15 12:54:10 +01:00
antirez
0c4a8532c6 CG: XCLAIM now updates the idle time of the message. 2018-03-15 12:54:10 +01:00
antirez
a85f0fc571 CG: XPENDING without start/stop variant implemented. 2018-03-15 12:54:10 +01:00
antirez
9b2223cebd CG: XACK implementation. 2018-03-15 12:54:10 +01:00
antirez
23dc98ac52 CG: add & populate group+consumer in the blocking state. 2018-03-15 12:54:10 +01:00
antirez
2c348dafd0 CG: data structures design + XGROUP CREATE implementation. 2018-03-15 12:54:10 +01:00
antirez
b94379e29a Cluster: ability to prevent slaves from failing over their masters.
This commit, in some parts derived from PR #3041 which is no longer
possible to merge (because the user deleted the original branch),
implements the ability of slaves to have a special configuration
preventing that they try to start a failover when the master is failing.

There are multiple reasons for wanting this, and the feautre was
requested in issue #3021 time ago.

The differences between this patch and the original PR are the
following:

1. The flag is saved/loaded on the nodes configuration.
2. The 'myself' node is now flag-aware, the flag is updated as needed
   when the configuration is changed via CONFIG SET.
3. The flag name uses NOFAILOVER instead of NO_FAILOVER to be consistent
   with existing NOADDR.
4. The redis.conf documentation was rewritten.

Thanks to @deep011 for the original patch.
2018-03-14 14:01:38 +01:00
Oran Agra
5805907825 Adding real allocator fragmentation to INFO and MEMORY command + active defrag test
other fixes / improvements:
- LUA script memory isn't taken from zmalloc (taken from libc malloc)
  so it can cause high fragmentation ratio to be displayed (which is false)
- there was a problem with "fragmentation" info being calculated from
  RSS and used_memory sampled at different times (now sampling them together)

other details:
- adding a few more allocator info fields to INFO and MEMORY commands
- improve defrag test to measure defrag latency of big keys
- increasing the accuracy of the defrag test (by looking at real grag info)
  this way we can use an even lower threshold and still avoid false positives
- keep the old (total) "fragmentation" field unchanged, but add new ones for spcific things
- add these the MEMORY DOCTOR command
- deduct LUA memory from the rss in case of non jemalloc allocator (one for which we don't "allocator active/used")
- reduce sampling rate of the rss and allocator info
2018-03-12 15:08:52 +02:00
Oran Agra
a0ad511d6b active defrag v2
- big keys are not defragged in one go from within the dict scan
  instead they are scanned in parts after the main dict hash bucket is done.
- add latency monitor sample for defrag
- change default active-defrag-cycle-min to induce lower latency
- make active defrag start a new scan right away if needed, so it's easier
  (for the test suite) to detect when it's done
- make active defrag quick the current cycle after each db / big key
- defrag  some non key long term global allocations
- some refactoring for smaller functions and more reusable code
- during dict rehashing, one scan iteration of the dict, can end up scanning
  one bucket in the smaller dict and many many buckets in the larger dict.
  so waiting for 16 scan iterations before checking the time, may be much too long.
2018-03-12 15:07:43 +02:00
antirez
f00615b4ff Track number of logically expired keys still in memory.
This commit adds two new fields in the INFO output, stats section:

expired_stale_perc:0.34
expired_time_cap_reached_count:58

The first field is an estimate of the number of keys that are yet in
memory but are already logically expired. They reason why those keys are
yet not reclaimed is because the active expire cycle can't spend more
time on the process of reclaiming the keys, and at the same time nobody
is accessing such keys. However as the active expire cycle runs, while
it will eventually have to return to the caller, because of time limit
or because there are less than 25% of keys logically expired in each
given database, it collects the stats in order to populate this INFO
field.

Note that expired_stale_perc is a running average, where the current
sample accounts for 5% and the history for 95%, so you'll see it
changing smoothly over time.

The other field, expired_time_cap_reached_count, counts the number
of times the expire cycle had to stop, even if still it was finding a
sizeable number of keys yet to expire, because of the time limit.
This allows people handling operations to understand if the Redis
server, during mass-expiration events, is able to collect keys fast
enough usually. It is normal for this field to increment during mass
expires, but normally it should very rarely increment. When instead it
constantly increments, it means that the current workloads is using
a very important percentage of CPU time to expire keys.

This feature was created thanks to the hints of Rashmi Ramesh and
Bart Robinson from Twitter. In private email exchanges, they noted how
it was important to improve the observability of this parameter in the
Redis server. Actually in big deployments, the amount of keys that are
yet to expire in each server, even if they are logically expired, may
account for a very big amount of wasted memory.
2018-02-19 11:12:49 +01:00
Dvir Volk
383edf2101 Remove the NOTIFY_MODULE flag and simplify the module notification flow if there aren't subscribers 2018-02-14 21:40:10 +02:00
Dvir Volk
20f414af40 finished implementation of notifications. Tests unfinished 2018-02-14 21:38:58 +02:00
antirez
4573d2648d New config options about protocol prefixed with "proto".
Related to #4568.
2018-01-11 11:27:41 +01:00
Oran Agra
41be5ba332 Add config options for max-bulk-len and max-querybuf-len mainly to support RESTORE of large keys 2017-12-29 12:43:48 +02:00
zhaozhao.zz
72d64eb653 zset: change the span of zskiplistNode to unsigned long 2017-12-08 16:09:27 +08:00
zhaozhao.zz
93512e13a0 zset: fix the int problem 2017-12-08 15:37:08 +08:00
Itamar Haber
29f1a3ee20 Merge remote-tracking branch 'upstream/unstable' into help_subcommands 2017-12-05 18:14:59 +02:00
antirez
e0df883f45 add linkClient(): adds the client and caches the list node.
We have this operation in two places: when caching the master and
when linking a new client after the client creation. By having an API
for this we avoid incurring in errors when modifying one of the two
places forgetting the other. The function is also a good place where to
document why we cache the linked list node.

Related to #4497 and #4210.
2017-12-05 16:02:03 +01:00
Salvatore Sanfilippo
1bfd55ac7d Merge pull request #4497 from soloestoy/optimize-unlink-client
networking: optimize unlinkClient() in freeClient()
2017-12-05 15:51:15 +01:00
antirez
0f55991907 Refactoring: improve luaCreateFunction() API.
The function in its initial form, and after the fixes for the PSYNC2
bugs, required code duplication in multiple spots. This commit modifies
it in order to always compute the script name independently, and to
return the SDS of the SHA of the body: this way it can be used in all
the places, including for SCRIPT LOAD, without duplicating the code to
create the Lua function name. Note that this requires to re-compute the
body SHA1 in the case of EVAL seeing a script for the first time, but
this should not change scripting performance in any way because new
scripts definition is a rare event happening the first time a script is
seen, and the SHA1 computation is anyway not a very slow process against
the typical Redis script and compared to the actua Lua byte compiling of
the body.

Note that the function used to assert() if a duplicated script was
loaded, however actually now two times over three, we want the function
to handle duplicated scripts just fine: this happens in SCRIPT LOAD and
in RDB AUX "lua" loading. Moreover the assert was not defending against
some obvious failure mode, so now the function always tests against
already defined functions at start.
2017-12-04 11:25:20 +01:00
antirez
5ac21dab66 Fix loading of RDB files lua AUX fields when the script is defined.
In the case of slaves loading the RDB from master, or in other similar
cases, the script is already defined, and the function registering the
script should not fail in the assert() call.
2017-12-01 16:01:10 +01:00
antirez
8be81febcf Streams: XRANGE REV option -> XREVRANGE command. 2017-12-01 10:24:25 +01:00
antirez
3f8ed9a516 Streams: export iteration API. 2017-12-01 10:24:24 +01:00
antirez
541b1a7cc3 Streams: fix XADD API and keyspace notifications.
XADD was suboptimal in the first incarnation of the command, not being
able to accept an ID (very useufl for replication), nor options for
having capped streams.

The keyspace notification for streams was not implemented.
2017-12-01 10:24:24 +01:00
antirez
799eaab940 Streams: fix XREAD ready-key signaling.
With lists we need to signal only on key creation, but streams can
provide data to clients listening at every new item added.
To make this slightly more efficient we now track different classes of
blocked clients to avoid signaling keys when there is nobody listening.
A typical case is when the stream is used as a time series DB and
accessed only by range with XRANGE.
2017-12-01 10:24:24 +01:00
antirez
3aa11e22a4 Streams: XREAD related code to serve blocked clients. 2017-12-01 10:24:24 +01:00
antirez
d783126316 Streams: XREAD get-keys method. 2017-12-01 10:24:24 +01:00
antirez
a477b95b01 Streams: augment client.bpop with XREAD specific fields. 2017-12-01 10:24:24 +01:00
antirez
53600c7fd5 Streams: more internal preparation for blocking XREAD. 2017-12-01 10:24:24 +01:00
antirez
f6179c38b0 Streams: initial work to use blocking lists logic for streams XREAD. 2017-12-01 10:24:24 +01:00
antirez
8016093ee9 Streams: implement stream object release. 2017-12-01 10:24:24 +01:00
antirez
6a4fd95478 Streams: XLEN command. 2017-12-01 10:24:24 +01:00
antirez
27fd469a8d Streams: assign value of 6 to OBJ_STREAM + some refactoring. 2017-12-01 10:24:24 +01:00
antirez
1850ff43f4 Streams: 12 commits squashed into the initial Streams implementation. 2017-12-01 10:24:24 +01:00
antirez
35773a3295 PSYNC2: Save Lua scripts state into RDB file.
This is currently needed in order to fix #4483, but this can be
useful in other contexts, so maybe later we may want to remove the
conditionals and always save/load scripts.

Note that we are using the "lua" AUX field here, in order to guarantee
backward compatibility of the RDB file. The unknown AUX fields must be
discarded by past versions of Redis.
2017-11-30 18:37:52 +01:00
zhaozhao.zz
2f98aa2d83 networking: optimize unlinkClient() in freeClient() 2017-11-30 18:11:05 +08:00
Itamar Haber
50ce8124a3 Standardizes the 'help' subcommand
This adds a new `addReplyHelp` helper that's used by commands
when returning a help text. The following commands have been
touched: DEBUG, OBJECT, COMMAND, PUBSUB, SCRIPT and SLOWLOG.

WIP

Fix entry command table entry for OBJECT for HELP option.

After #4472 the command may have just 2 arguments.

Improve OBJECT HELP descriptions.

See #4472.

WIP 2

WIP 3
2017-11-28 21:15:45 +02:00
zhaozhao.zz
22950a6284 LFU: do some changes about LFU to find hotkeys
Firstly, use access time to replace the decreas time of LFU.
For function LFUDecrAndReturn,
it should only try to get decremented counter,
not update LFU fields, we will update it in an explicit way.
And we will times halve the counter according to the times of
elapsed time than server.lfu_decay_time.
Everytime a key is accessed, we should update the LFU
including update access time, and increment the counter after
call function LFUDecrAndReturn.
If a key is overwritten, the LFU should be also updated.
Then we can use `OBJECT freq` command to get a key's frequence,
and LFUDecrAndReturn should be called in `OBJECT freq` command
in case of the key has not been accessed for a long time,
because we update the access time only when the key is read or
overwritten.
2017-11-27 18:39:22 +01:00
zhaozhao.zz
ac4614a079 LFU: change lfu* parameters to int 2017-11-27 18:38:55 +01:00
antirez
b30e60d06d Fix replication of SLAVEOF inside transaction.
In Redis 4.0 replication, with the introduction of PSYNC2, masters and
slaves replicate commands to cascading slaves and to the replication
backlog itself in a different way compared to the past.

Masters actually replicate the effects of client commands.
Slaves just propagate what they receive from masters.

This mechanism can cause problems when the configuration of an instance
is changed from master to slave inside a transaction. For instance
we could send to a master instance the following sequence:

    MULTI
    SLAVEOF 127.0.0.1 0
    EXEC
    SLAVEOF NO ONE

Before the fixes in this commit, the MULTI command used to be propagated
into the replication backlog, however after the SLAVEOF command the
instance is a slave, so the EXEC implementation failed to also propagate
the EXEC command. When the slaves of the above instance reconnected,
they were incrementally synchronized just sending a "MULTI". This put
the master client (in the slaves) into MULTI state, breaking the
replication.

Notably even Redis Sentinel uses the above approach in order to guarantee
that configuration changes are always performed together with rewrites
of the configuration and with clients disconnection. Sentiel does:

    MULTI
    SLAVEOF ...
    CONFIG REWRITE
    CLIENT KILL TYPE normal
    EXEC

So this was a really problematic issue. However even with the fix in
this commit, that will add the final EXEC to the replication stream in
case the instance was switched from master to slave during the
transaction, the result would be to increment the slave replication
offset, so a successive reconnection with the new master, will not
permit a successful partial resynchronization: no way the new master can
provide us with the backlog needed, we incremented our offset to a value
that the new master cannot have.

However the EXEC implementation waits to emit the MULTI, so that if the
commands inside the transaction actually do not need to be replicated,
no commands propagation happens at all. From multi.c:

    if (!must_propagate && !(c->cmd->flags & (CMD_READONLY|CMD_ADMIN))) {
	execCommandPropagateMulti(c);
	must_propagate = 1;
    }

The above code is already modified by this commit you are reading.
Now also ADMIN commands do not trigger the emission of MULTI. It is actually
not clear why we do not just check for CMD_WRITE... Probably I wrote it this
way in order to make the code more reliable: better to over-emit MULTI
than not emitting it in time.

So this commit should indeed fix issue #3836 (verified), however it looks
like some reconsideration of this code path is needed in the long term.

BONUS POINT: The reverse bug.

Even in a read only slave "B", in a replication setup like:

	A -> B -> C

There are commands without the READONLY nor the ADMIN flag, that are also
not flagged as WRITE commands. An example is just the PING command.

So if we send B the following sequence:

    MULTI
    PING
    SLAVEOF NO ONE
    EXEC

The result will be the reverse bug, where only EXEC is emitted, but not the
previous MULTI. However this apparently does not create problems in practice
but it is yet another acknowledge of the fact some work is needed here
in order to make this code path less surprising.

Note that there are many different approaches we could follow. For instance
MULTI/EXEC blocks containing administrative commands may be allowed ONLY
if all the commands are administrative ones, otherwise they could be
denined. When allowed, the commands could simply never be replicated at all.
2017-07-12 11:07:28 +02:00
antirez
1aab881fe1 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
antirez
45c2679529 Modules: DEBUG DIGEST interface. 2017-07-06 11:04:46 +02:00
antirez
a9fce9e530 Added GEORADIUS(BYMEMBER)_RO variants for read-only operations.
Issue #4084 shows how for a design error, GEORADIUS is a write command
because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.

To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed into the unstable branch soon if we'll decide to do so in order to
be more consistent, and reease Redis 5.0 with this incompatibility in
the future. This is still unclear.

However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly as the original commands, but they do not
accept the STORE and STOREDIST options.
2017-06-30 10:03:37 +02:00