243 Commits

Author SHA1 Message Date
Guy Benoish
4330bc3e3a Enhance RESTORE with RDBv9 new features
RESTORE now supports:
1. Setting LRU/LFU
2. Absolute-time TTL

Other related changes:
1. RDB loading will not override LRU bits when RDB file
   does not contain the LRU opcode.
2. RDB loading will not set LRU/LFU bits if the server's
   maxmemory-policy does not match.
2018-06-20 15:11:08 +07:00
Salvatore Sanfilippo
6263f40b76 Merge pull request #4758 from soloestoy/rdb-save-incremental-fsync
Rdb save incremental fsync
2018-06-16 10:59:37 +02:00
antirez
dc3ae17744 RDB: Apply fix to rdbLoadMillisecondTime() only for new RDB versions.
This way we let big endian systems to still load old RDB versions.
However newver versions will be saved and loaded in a way that make RDB
expires cross-endian again. Thanks to @oranagra for the reporting and
the discussion about this problem, leading to this fix.
2018-06-12 18:21:39 +02:00
antirez
49ede33ee8 Fix rdbSaveKeyValuePair() integer overflow.
Again thanks to @oranagra. The object idle time does not fit into an int
sometimes: use the native type that the serialization function will get
as argument, which is uint64_t.
2018-06-12 17:31:04 +02:00
antirez
69d72ba2b9 RDB: store times consistently in little endian.
I'm not sure how this escaped the attention of Redis users for years,
but finally @oranagra reported this issue... Thanks to Oran.
2018-06-12 17:22:03 +02:00
Salvatore Sanfilippo
e925f78b95 Merge pull request #4861 from soloestoy/rdb-dict-expand
RDB: expand dict if needed when rdb load object
2018-06-08 12:12:34 +02:00
antirez
96c5bd2043 Don't expire keys while loading RDB from AOF preamble.
The AOF tail of a combined RDB+AOF is based on the premise of applying
the AOF commands to the exact state that there was in the server while
the RDB was persisted. By expiring keys while loading the RDB file, we
change the state, so applying the AOF tail later may change the state.

Test case:

* Time1: SET a 10
* Time2: EXPIREAT a $time5
* Time3: INCR a
* Time4: PERSIT A. Start bgrewiteaof with RDB preamble. The value of a is 11 without expire time.
* Time5: Restart redis from the RDB+AOF: consistency violation.

Thanks to @soloestoy for providing the patch.
Thanks to @trevor211 for the original issue report and the initial fix.

Check issue #4950 for more info.
2018-05-29 12:37:42 +02:00
WuYunlong
4a80f44955 Fix rdb save by allowing dumping of expire keys, so that when
we add a new slave, and do a failover, eighter by manual or
not, other local slaves will delete the expired keys properly.
2018-05-29 12:35:15 +02:00
Salvatore Sanfilippo
10ff9c213b Merge pull request #4908 from soloestoy/aof-rdb-preamble-compatible-checksum-no
AOF & RDB: be compatible with rdbchecksum no
2018-05-23 17:11:00 +02:00
antirez
dabd389e2f Fix rdb.c dictionary iterator release in 2 more places. 2018-05-09 12:06:37 +02:00
antirez
e55e57f132 Fix rdb.c dictionary iterator release.
Some times it was not released on error, sometimes it was released two
times because the error path expected the "di" var to be NULL if the
iterator was already released. Thanks to @oranagra for pinging me about
potential problems of this kind inside rdb.c.
2018-05-09 11:03:27 +02:00
zhaozhao.zz
dc1a6b5759 AOF & RDB: be compatible with rdbchecksum no 2018-05-08 19:22:13 +08:00
zhaozhao.zz
ecee5da9bb RDB: expand dict if needed when rdb load object 2018-04-22 22:30:44 +08:00
antirez
0f94a377b0 RDB: Implement future-proof module AUX data loading. 2018-03-16 13:47:10 +01:00
zhaozhao.zz
d7d0493065 rdb: incremental fsync when redis saves rdb 2018-03-16 00:44:50 +08:00
antirez
7062ca6ff0 RDB: LRU/LFU branches missed continue. 2018-03-15 16:33:18 +01:00
antirez
68b3ee4bc8 RDB: Ability to load LFU/LRU info. 2018-03-15 16:24:53 +01:00
antirez
d4d246f296 RDB: Ability to save LFU/LRU info.
This is a big win for caching use cases, since on reloading Redis will
still have some idea about what is worth to evict and what not.
However this only solves part of the problem because the information is
only partially propagated to slaves (on write operations). Reads will
not affect slaves LFU and LRU counters, so after a failover the eviction
decisions are kinda random until keys start to collect some aging/freq info.

However since new slaves are initially populated via RDB file transfer,
this means that if we spin up a new slave from a master, and perform an
immediate manual failover (for instance in order to upgrade the master),
the slave will have eviction informations to use for some time.

The LFU/LRU info is persisted only if the maxmemory policy is set to one
of the relevant type, even if no actual "maxmemory"  memory limit is
set.
2018-03-15 13:15:55 +01:00
antirez
8f1f95971f CG: fix CG RDB loading not found conditional. 2018-03-15 12:54:10 +01:00
antirez
4c1db9b9e7 Streams: trap more errors in stream loading + RDB check type name. 2018-03-15 12:54:10 +01:00
antirez
3ecf2a1005 CG: fix RDB saving when there are no consumer groups. 2018-03-15 12:54:10 +01:00
antirez
bd4dd842ea CG: RDB loading, fix inverted conditional. 2018-03-15 12:54:10 +01:00
antirez
f979275a2a CG: RDB loading first implementation. 2018-03-15 12:54:10 +01:00
antirez
80da510011 CG: RDB saving part 2, consumers. 2018-03-15 12:54:10 +01:00
antirez
9a686b82cd CG: RDB saving part 1, metadata and PEL. 2018-03-15 12:54:10 +01:00
charsyam
4529e3dffa refactoring-make-condition-clear-for-rdb 2018-02-27 21:55:20 +09:00
Salvatore Sanfilippo
601420cb73 Merge pull request #3828 from oranagra/sdsnewlen_pr
add SDS_NOINIT option to sdsnewlen to avoid unnecessary memsets.
2018-02-27 04:04:32 -08:00
Oran Agra
5821e8cf82 fix processing of large bulks (above 2GB)
- protocol parsing (processMultibulkBuffer) was limitted to 32big positions in the buffer
  readQueryFromClient potential overflow
- rioWriteBulkCount used int, although rioWriteBulkString gave it size_t
- several places in sds.c that used int for string length or index.
- bugfix in RM_SaveAuxField (return was 1 or -1 and not length)
- RM_SaveStringBuffer was limitted to 32bit length
2017-12-29 12:24:19 +02:00
antirez
0f55991907 Refactoring: improve luaCreateFunction() API.
The function in its initial form, and after the fixes for the PSYNC2
bugs, required code duplication in multiple spots. This commit modifies
it in order to always compute the script name independently, and to
return the SDS of the SHA of the body: this way it can be used in all
the places, including for SCRIPT LOAD, without duplicating the code to
create the Lua function name. Note that this requires to re-compute the
body SHA1 in the case of EVAL seeing a script for the first time, but
this should not change scripting performance in any way because new
scripts definition is a rare event happening the first time a script is
seen, and the SHA1 computation is anyway not a very slow process against
the typical Redis script and compared to the actua Lua byte compiling of
the body.

Note that the function used to assert() if a duplicated script was
loaded, however actually now two times over three, we want the function
to handle duplicated scripts just fine: this happens in SCRIPT LOAD and
in RDB AUX "lua" loading. Moreover the assert was not defending against
some obvious failure mode, so now the function always tests against
already defined functions at start.
2017-12-04 11:25:20 +01:00
antirez
5ac21dab66 Fix loading of RDB files lua AUX fields when the script is defined.
In the case of slaves loading the RDB from master, or in other similar
cases, the script is already defined, and the function registering the
script should not fail in the assert() call.
2017-12-01 16:01:10 +01:00
antirez
75fb0da53a Streams: delta encode IDs based on key. Add count + deleted fields.
We used to have the master ID stored at the start of the listpack,
however using the key directly makes more sense in order to create a
space efficient representation: anyway the key at the radix tree is very
unlikely to change because of how the stream is implemented. Moreover on
nodes merging, to rewrite the merged listpacks is anyway the most
sensible operation, and we can use the iterator and the append-to-stream
function in order to avoid re-implementing the code needed for merging.

This commit also adds two items at the start of the listpack: the
number of valid items inside the listpack, and the number of items
marked as deleted. This means that there is no need to scan a listpack
in order to understand if it's a good candidate for garbage collection,
if the ration between valid/deleted items triggers the GC.
2017-12-01 10:24:24 +01:00
antirez
1157127d8e Streams: Save stream->length in RDB. 2017-12-01 10:24:24 +01:00
antirez
5a9ec523f5 Streams: RDB loading. RDB saving modified.
After a few attempts it looked quite saner to just add the last item ID
at the end of the serialized listpacks, instead of scanning the last
listpack loaded from head to tail just to fetch it. It's a disk space VS
CPU-and-simplicity tradeoff basically.
2017-12-01 10:24:24 +01:00
antirez
fac980c6d1 Streams: RDB saving. 2017-12-01 10:24:24 +01:00
antirez
4f1b09e79a PSYNC2: just store script bodies into RDB.
Related to #4483. As suggested by @soloestoy, we can retrieve the SHA1
from the body. Given that in the new implementation using AUX fields we
ended copying around a lot to create new objects and strings, extremize
such concept and trade CPU for space inside the RDB file.
2017-11-30 18:38:26 +01:00
antirez
35773a3295 PSYNC2: Save Lua scripts state into RDB file.
This is currently needed in order to fix #4483, but this can be
useful in other contexts, so maybe later we may want to remove the
conditionals and always save/load scripts.

Note that we are using the "lua" AUX field here, in order to guarantee
backward compatibility of the RDB file. The unknown AUX fields must be
discarded by past versions of Redis.
2017-11-30 18:37:52 +01:00
antirez
2ed2fb7f25 PSYNC2: reorganize comments related to recent fixes.
Related to PR #4412 and issue #4407.
2017-11-24 11:08:29 +01:00
Salvatore Sanfilippo
acfece8854 Merge pull request #4412 from soloestoy/bugfix-psync2
PSYNC2: safe free backlog when reach the time limit and others
2017-11-24 10:56:18 +01:00
zhaozhao.zz
1e4dbdd0a1 PSYNC2: persist cached_master's dbid inside the RDB 2017-11-22 12:11:26 +08:00
zhaozhao.zz
07842aaaab PSYNC2: make repl_stream_db never be -1
it means that after this change all the replication
info in RDB is valid, and it can distinguish us from
the older version.
2017-11-22 12:05:34 +08:00
antirez
fe6dfb3136 Fix saving of zero-length lists.
Normally in modern Redis you can't create zero-len lists, however it's
possible to load them from old RDB files generated, for instance, using
Redis 2.8 (see issue #4409). The "Right Thing" would be not loading such
lists at all, but this requires to hook in rdb.c random places in a not
great way, for a problem that is at this point, at best, minor.

Here in this commit instead I just fix the fact that zero length lists,
materialized as quicklists with the first node set to NULL, were
iterated in the wrong way while they are saved, leading to a crash.

The other parts of the list implementation are apparently able to deal
with empty lists correctly, even if they are no longer a thing.
2017-11-06 12:37:03 +01:00
zhaozhao.zz
98f1bcf1cb PSYNC2: clarify the scenario when repl_stream_db can be -1 2017-11-02 10:45:33 +08:00
zhaozhao.zz
b99ca4211a PSYNC2 & RDB: fix the missing rdbSaveInfo for BGSAVE 2017-11-01 17:52:43 +08:00
antirez
0d68dd2fad PSYNC2: More refinements related to #4316. 2017-09-20 11:28:13 +02:00
zhaozhao.zz
019c7fa546 PSYNC2: make persisiting replication info more solid
This commit is a reinforcement of commit af78ec8.

1. Replication information can be stored when the RDB file is
generated by a mater using server.slaveseldb when server.repl_backlog
is not NULL, or set repl_stream_db be -1. That's safe, because
NULL server.repl_backlog will trigger full synchronization,
then master will send SELECT command to replicaiton stream.
2. Only do rdbSave* when rsiptr is not NULL,
if we do rdbSave* without rdbSaveInfo, slave will miss repl-stream-db.
3. Save the replication informations also in the case of
SAVE command, FLUSHALL command and DEBUG reload.
2017-09-20 11:18:10 +02:00
antirez
af78ec8ccf PSYNC2: Fix the way replication info is saved/loaded from RDB.
This commit attempts to fix a number of bugs reported in #4316.
They are related to the way replication info like replication ID,
offsets, and currently selected DB in the master client, are stored
and loaded by Redis. In order to avoid inconsistencies the changes in
this commit try to enforce that:

1. Replication information are only stored when the RDB file is
generated by a slave that has a valid 'master' client, so that we can
always extract the currently selected DB.
2. When replication informations are persisted in the RDB file, all the
info for a successful PSYNC or nothing is persisted.
3. The RDB replication informations are only loaded if the instance is
configured as a slave, otherwise a master can start with IDs that relate
to a different history of the data set, and stil retain such IDs in the
future while receiving unrelated writes.
2017-09-19 23:03:39 +02:00
antirez
1aab881fe1 AOF check utility: ability to check files with RDB preamble. 2017-07-10 13:38:23 +02:00
antirez
4e0dab2c26 Free IO context if any in RDB loading code.
Thanks to @oranagra for spotting this bug.
2017-07-06 11:20:49 +02:00
antirez
2a26f2a9f6 RDB modules values serialization format version 2.
The original RDB serialization format was not parsable without the
module loaded, becuase the structure was managed only by the module
itself. Moreover RDB is a streaming protocol in the sense that it is
both produce di an append-only fashion, and is also sometimes directly
sent to the socket (in the case of diskless replication).

The fact that modules values cannot be parsed without the relevant
module loaded is a problem in many ways: RDB checking tools must have
loaded modules even for doing things not involving the value at all,
like splitting an RDB into N RDBs by key or alike, or just checking the
RDB for sanity.

In theory module values could be just a blob of data with a prefixed
length in order for us to be able to skip it. However prefixing the values
with a length would mean one of the following:

1. To be able to write some data at a previous offset. This breaks
stremaing.
2. To bufferize values before outputting them. This breaks performances.
3. To have some chunked RDB output format. This breaks simplicity.

Moreover, the above solution, still makes module values a totally opaque
matter, with the fowllowing problems:

1. The RDB check tool can just skip the value without being able to at
least check the general structure. For datasets composed mostly of
modules values this means to just check the outer level of the RDB not
actually doing any checko on most of the data itself.
2. It is not possible to do any recovering or processing of data for which a
module no longer exists in the future, or is unknown.

So this commit implements a different solution. The modules RDB
serialization API is composed if well defined calls to store integers,
floats, doubles or strings. After this commit, the parts generated by
the module API have a one-byte prefix for each of the above emitted
parts, and there is a final EOF byte as well. So even if we don't know
exactly how to interpret a module value, we can always parse it at an
high level, check the overall structure, understand the types used to
store the information, and easily skip the whole value.

The change is backward compatible: older RDB files can be still loaded
since the new encoding has a new RDB type: MODULE_2 (of value 7).
The commit also implements the ability to check RDB files for sanity
taking advantage of the new feature.
2017-06-27 13:19:16 +02:00
antirez
e5ec0a9f4b Collect fork() timing info only if fork succeeded. 2017-05-19 11:10:36 +02:00