188 Commits

Author SHA1 Message Date
antirez
996bc7b3e1 PSYNC2: Save replication ID/offset on RDB file.
This means that stopping a slave and restarting it will still make it
able to PSYNC with the master. Moreover the master itself will retain
its ID/offset, in case it gets turned into a slave, or if a slave
tries to PSYNC with it with an exactly updated offset (otherwise there is
no backlog).

This change was possible thanks to PSYNC v2 that makes saving the current
replication state much simpler.
2016-11-10 12:35:29 +01:00
antirez
f53ed7a969 PSYNC2: different improvements to Redis replication.
The gist of the changes is that now, partial resynchronizations between
slaves and masters (without the need of a full resync with RDB transfer
and so forth), work in a number of cases where they were impossible
in the past. For instance:

1. When a slave is promoted to master, the slaves of the old master can
partially resynchronize with the new master.

2. Chained slaves (slaves of slaves) can be moved to replicate to other
slaves or the master itself, without requiring a full resync.

3. The master itself, after being turned into a slave, is able to
partially resynchronize with the new master, when it joins replication
again.

In order to obtain this, the following main changes were made:

* Slaves also take a replication backlog, not just masters.

* Same stream replication for all the slaves and sub slaves. The
replication stream is identical from the top level master to its slaves
and is also the same from the slaves to their sub-slaves and so forth.
This means that if a slave is later promoted to master, it has the
same replication backlog, and can partially resynchronize with its
slaves (that were previously slaves of the old master).

* A given replication history is no longer identified by the `runid` of
a Redis node. There is instead a `replication ID` which changes every
time the instance has a new history no longer coherent with the past
one. So, for example, slaves publish the same replication history of
their master, however when they are turned into masters, they publish
a new replication ID, but still remember the old ID, so that they are
able to partially resynchronize with slaves of the old master (up to a
given offset).

* The replication protocol was slightly modified so that a new extended
+CONTINUE reply from the master is able to inform the slave of a
replication ID change.

* REPLCONF CAPA is used in order to notify masters that a slave is able
to understand the new +CONTINUE reply.

* The RDB file was extended with an auxiliary field that is able to
select a given DB after loading in the slave, so that the slave can
continue receiving the replication stream from the point it was
disconnected without requiring the master to insert "SELECT" statements.
This is useful in order to guarantee the "same stream" property, because
the slave must be able to accumulate an identical backlog.

* Slave pings to sub-slaves are now sent in a special form, when the
top-level master is disconnected, in order not to interfere with the
replication stream. We just use out of band "\n" bytes as in other parts
of the Redis protocol.

An old design document is available here:

https://gist.github.com/antirez/ae068f95c0d084891305

However the implementation is not identical to the description because
during the work to implement it, different changes were needed in order
to make things work well.
2016-11-09 15:37:15 +01:00
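As a rough illustration of the replication ID scheme described in f53ed7a969, the Python sketch below models how a master might decide whether a PSYNC request can be served with a partial resynchronization. The names `ReplState` and `can_partial_resync`, the field names, and the reduction of the backlog to an offset range are all assumptions made for this example; it is not the code in replication.c.

```python
from dataclasses import dataclass


@dataclass
class ReplState:
    """Hypothetical, simplified replication state of one instance."""
    replid: str                # current replication ID
    replid2: str               # previous ID, remembered after a promotion
    second_replid_offset: int  # replid2 is only valid up to this offset
    backlog_start: int         # first offset still held in the backlog
    backlog_len: int           # number of bytes held in the backlog


def can_partial_resync(state: ReplState, asked_id: str, asked_offset: int) -> bool:
    """Return True if the requested (ID, offset) pair can be served from the backlog."""
    same_history = asked_id == state.replid
    old_history = (asked_id == state.replid2
                   and asked_offset <= state.second_replid_offset)
    if not (same_history or old_history):
        return False  # unrelated history: a full resync is needed
    backlog_end = state.backlog_start + state.backlog_len
    return state.backlog_start <= asked_offset <= backlog_end


# A slave promoted at offset 1000: it publishes a new ID but remembers the old one.
promoted = ReplState("new-id", "old-id", 1000, backlog_start=500, backlog_len=700)
print(can_partial_resync(promoted, "old-id", 900))   # True
print(can_partial_resync(promoted, "old-id", 1100))  # False: past the switch point
print(can_partial_resync(promoted, "new-id", 1100))  # True
print(can_partial_resync(promoted, "other", 900))    # False
```

The "up to a given offset" condition is what keeps a slave of the old master from continuing on the old ID past the point where the two histories diverged.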
antirez
56f2406e09 Module: Ability to get context from IO context.
It was noted by @dvirsky that it is not possible to use string functions
when writing the AOF file. This sometimes is critical since the command
rewriting may need to be built in the context of the AOF callback, and
without access to the context, and the limited types that the AOF
production functions will accept, this can be an issue.

Moreover there are other needs that we can't anticipate regarding the
ability to use Redis Modules APIs using the context in order to build
representations to emit AOF / RDB.

Because of this a new API was added that allows the user to get a
temporary context from the IO context. If obtained, the context is
automatically released when the RDB / AOF callback returns.

Calling the function to get the context multiple times always returns
the same one, since it is invalid to have more than a single context.
2016-10-06 17:09:26 +02:00
antirez
adcafbd283 Modules: API to save/load single precision floating point numbers.
When double precision is not needed, taking 2x the space in the
serialization is wasteful.
2016-10-03 00:08:35 +02:00
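The space saving mentioned in adcafbd283 can be seen with a small Python sketch that packs the same value as an IEEE 754 double and as a single-precision float. The little-endian byte order used here is an assumption for the example, and the sketch does not use the Modules API itself.

```python
import struct

value = 3.14159

as_double = struct.pack("<d", value)  # double precision: 8 bytes
as_float = struct.pack("<f", value)   # single precision: 4 bytes

print(len(as_double), len(as_float))

# The single-precision round trip is close to the original, but not identical.
restored = struct.unpack("<f", as_float)[0]
print(abs(restored - value) < 1e-6)
print(restored == value)
```

The trade-off is precision: the value read back from the 4-byte form is only an approximation of the original double.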
antirez
62e7179de9 Child -> Parent pipe for COW info transferring. 2016-09-19 13:45:20 +02:00
antirez
7e86bce6b0 zmalloc: zmalloc_get_smap_bytes_by_field() modified to work for any PID.
The goal is to get copy-on-write amount of the child from the parent.
2016-09-19 10:28:42 +02:00
antirez
19ba3dbb70 Merge branch 'aofrdb' into unstable 2016-09-09 15:03:21 +02:00
antirez
a7e65c9a19 Fix rdb.c var types when calling rdbLoadLen().
Technically as soon as Redis 64 bit gets proper support for loading
collections and/or DBs with more than 2^32 elements, the 32 bit version
should be modified in order to check if what we read from rdbLoadLen()
overflows. This would only apply to huge RDB files created with a 64 bit
instance and later loaded into a 32 bit instance.
2016-09-01 11:08:44 +02:00
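The overflow check that a7e65c9a19 anticipates for 32 bit builds could look roughly like the Python sketch below. The function name `checked_len_for_32bit` and the choice to raise an exception are hypothetical; the commit only states that such a check will be needed, not how it should be written.

```python
UINT32_MAX = 0xFFFFFFFF  # largest length a 32 bit instance can represent


def checked_len_for_32bit(length: int) -> int:
    """Reject a length read from an RDB file that a 32 bit instance cannot hold."""
    if length > UINT32_MAX:
        raise OverflowError(f"length {length} does not fit in 32 bits")
    return length


print(checked_len_for_32bit(1000))
try:
    checked_len_for_32bit(2**32 + 5)  # e.g. a collection saved by a 64 bit instance
except OverflowError as err:
    print("rejected:", err)
```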
antirez
16214c0658 RDB AOF preamble: WIP 3 (RDB loading refactoring). 2016-08-11 15:27:29 +02:00
antirez
3d6b2655b9 RDB AOF preamble: WIP 2. 2016-08-09 16:41:40 +02:00
antirez
805f8c2c90 RDB AOF preamble: WIP 1. 2016-08-09 11:07:32 +02:00
antirez
276dfeb3d0 Avoid simultaneous RDB and AOF child process.
This patch, written in collaboration with Oran Agra (@oranagra) is a companion
to 03a4432. Together the two patches should prevent the AOF and RDB saving
processes from being spawned at the same time. Previously conditions that
could lead to two saving processes at the same time were:

1. When AOF is enabled via CONFIG SET and an RDB saving process is
   already active.

2. When the SYNC command decides to start an RDB saving process ASAP in
   order to serve a new slave that cannot partially resynchronize (but
   only if we have a disk target for replication, for diskless
   replication there is no such problem).

Condition "1" is not very severe but "2" can happen often and is
definitely good at degrading Redis performance in an unexpected way.

The two commits have the effect of always spawning RDB savings for
replication in replicationCron() instead of attempting to start an RDB
save synchronously. Moreover when a BGSAVE or AOF rewrite must be
performed, they are instead just postponed using flags that will try to
perform such operations ASAP.

Finally the BGSAVE command was modified in order to accept a SCHEDULE
option so that if an AOF rewrite is in progress, when this option is
given, the command no longer returns an error, but instead schedules an
RDB rewrite operation for when it will be possible to start it.
2016-07-21 18:35:01 +02:00
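The postponing behaviour described in 276dfeb3d0 can be pictured with the following Python sketch. The `SaveScheduler` class, its method names and the reply strings are hypothetical stand-ins; the sketch only models the rule that one saving child runs at a time and that BGSAVE with SCHEDULE sets a flag instead of failing.

```python
class SaveScheduler:
    """Toy model: at most one saving child, with a deferred-RDB flag."""

    def __init__(self):
        self.child = None           # "rdb", "aof" or None
        self.rdb_scheduled = False  # set by BGSAVE SCHEDULE during an AOF rewrite

    def start_aof_rewrite(self):
        if self.child is None:
            self.child = "aof"

    def bgsave(self, schedule=False):
        if self.child is None:
            self.child = "rdb"
            return "started"
        if self.child == "aof" and schedule:
            self.rdb_scheduled = True
            return "scheduled"
        return "error"

    def child_done(self):
        self.child = None

    def cron(self):
        """Periodic hook: start the postponed RDB save once no child is active."""
        if self.child is None and self.rdb_scheduled:
            self.rdb_scheduled = False
            self.child = "rdb"


s = SaveScheduler()
s.start_aof_rewrite()
print(s.bgsave())               # error: an AOF rewrite is in progress
print(s.bgsave(schedule=True))  # scheduled
s.cron()                        # nothing happens, the AOF child is still running
s.child_done()
s.cron()
print(s.child)                  # rdb
```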
antirez
bad6463f55 In Redis RDB check: more details in error reportings. 2016-07-01 15:26:55 +02:00
antirez
8b44311d67 In Redis RDB check: log decompression errors. 2016-07-01 11:59:25 +02:00
antirez
a89f5d2bdf In Redis RDB check: better error reporting. 2016-07-01 09:36:52 +02:00
Pierre Chapuis
f0d0231473 fix some compiler warnings 2016-06-05 16:48:45 +02:00
antirez
9deb98167b Modules: support for modules native data types. 2016-06-03 18:14:04 +02:00
antirez
4e37d7d2c8 RDB v8: fix rdbLoadLen() return value. 2016-06-01 20:18:28 +02:00
antirez
fb9173a888 RDB v8: new ZSET storage format with binary doubles. 2016-06-01 12:12:26 +02:00
antirez
8bfdd07667 RDB v8: ability to save uint64_t lengths. 2016-06-01 11:35:47 +02:00
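To make the effect of 8bfdd07667 concrete, here is a Python sketch of a variable-length prefix that gains a 64 bit form. It assumes the layout I understand RDB to use (6 bit and 14 bit forms selected by the two top bits, then marker bytes 0x80 and 0x81 followed by big-endian 32 and 64 bit values); treat these constants as an assumption to verify against rdb.h. The name `encode_len` is hypothetical.

```python
import struct


def encode_len(n: int) -> bytes:
    """Encode a non-negative length using the shortest available form."""
    if n < (1 << 6):
        return bytes([n])                          # 00xxxxxx: 6 bit length
    if n < (1 << 14):
        return bytes([(n >> 8) | 0x40, n & 0xFF])  # 01xxxxxx + 1 byte: 14 bit length
    if n <= 0xFFFFFFFF:
        return b"\x80" + struct.pack(">I", n)      # marker + 32 bit big-endian
    return b"\x81" + struct.pack(">Q", n)          # marker + 64 bit big-endian


for n in (10, 300, 70000, 2**32):
    print(n, encode_len(n).hex())
```

Small lengths keep their compact one or two byte encodings; only values above 2^32 - 1 pay for the 9 byte form.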
Oran Agra
7496636280 various cleanups and minor fixes 2016-04-25 16:49:57 +03:00
Oran Agra
1c396991dc fix small issues in redis 3.2 2016-04-25 14:19:28 +03:00
antirez
26e5f3ff33 Include full paths on RDB/AOF files errors.
Close #3086.
2016-02-15 16:15:01 +01:00
antirez
cd94476e0b Lazyfree: Hash converted to use plain SDS WIP 5. 2015-10-01 13:02:25 +02:00
antirez
c4175281f6 Lazyfree: Sorted sets converted to plain SDS. (several commits squashed) 2015-10-01 13:02:24 +02:00
antirez
062bf5ce19 Lazyfree: Convert Sets to use plain SDS (several commits squashed). 2015-10-01 13:02:24 +02:00
antirez
331835ca06 Undo slaves state change on failed rdbSaveToSlavesSockets().
As Oran Agra suggested, in startBgsaveForReplication() when the BGSAVE
attempt returns an error, we scan the list of slaves in order to remove
them since there is no way to serve them currently.

However we check for the replication state BGSAVE_START, which was
modified by rdbSaveToSlavesSockets() before forking(). So when fork fails
the state of slaves remains BGSAVE_END and no cleanup is performed.

This commit fixes the problem by making rdbSaveToSlavesSockets() able to
undo the state change on fork failure.
2015-09-07 16:09:23 +02:00
antirez
b83005ae8a Remove slave state change handled by replicationSetupSlaveForFullResync(). 2015-08-05 13:58:56 +02:00
antirez
c3f2bcafc8 Make sure we re-emit SELECT after each new slave full sync setup.
In previous commits we moved the FULLRESYNC to the moment we start the
BGSAVE, so that the offset we provide is the right one. However this
also means that we need to re-emit the SELECT statement every time a new
slave starts to accumulate the changes.

To obtain this effect in a cleaner way, the function that sends the
FULLRESYNC reply was overloaded with a more important role of also doing
this and changing the slave state. So it was renamed to
replicationSetupSlaveForFullResync() to better reflect what it does now.
2015-08-05 13:34:46 +02:00
antirez
142611882a PSYNC initial offset fix.
This commit attempts to fix a bug involving PSYNC and diskless
replication (currently experimental) found by Yuval Inbar from Redis Labs
and that was later found to have even more far reaching effects (the bug also
exists when diskstore is off).

The gist of the bug is that, a Redis master replies with +FULLRESYNC to
a PSYNC attempt that fails and requires a full resynchronization.
However, the baseline offset sent along with FULLRESYNC was always the
current master replication offset. This is not ok, because there are
many reasons that may delay the RDB file creation. And... guess what,
the master offset we communicate must be the one of the time the RDB
was created. So for example:

1) When the BGSAVE for replication is delayed since there is one
   already but is not good for replication.
2) When the BGSAVE is not needed as we attach one currently ongoing.
3) When because of diskless replication the BGSAVE is delayed.

In all the above cases the PSYNC reply is wrong and the slave may
reconnect later claiming to need a wrong offset: this may cause
data corruption later.
2015-08-04 17:06:10 +02:00
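The timing issue fixed in 142611882a can be reproduced in a toy model. In the Python sketch below, the `Master` class and its methods are hypothetical; the point is only that the offset reported to a slave is taken when the BGSAVE actually starts, not when the slave first asks for a full resync.

```python
class Master:
    """Toy master that defers the full resync reply until the snapshot starts."""

    def __init__(self):
        self.repl_offset = 0
        self.waiting = []  # slaves waiting for a BGSAVE to start

    def write(self, nbytes):
        self.repl_offset += nbytes

    def request_full_resync(self, slave):
        # Do not reply yet: the RDB creation may be delayed.
        self.waiting.append(slave)

    def start_bgsave(self):
        # The snapshot reflects the dataset at exactly this offset,
        # so this is the baseline every waiting slave must receive.
        replies = {slave: self.repl_offset for slave in self.waiting}
        self.waiting.clear()
        return replies


m = Master()
m.write(100)
m.request_full_resync("slave-1")  # the offset is 100 at request time
m.write(50)                       # the BGSAVE is delayed and more writes arrive
replies = m.start_bgsave()
print(replies)                    # {'slave-1': 150}
```

Replying with 100 instead of 150 would make the slave later ask for bytes that are already contained in the RDB it loaded.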
antirez
c15cac0d77 RDMF: More consistent define names. 2015-07-27 14:37:58 +02:00
antirez
8a893fa4cf RDMF: REDIS_OK REDIS_ERR -> C_OK C_ERR. 2015-07-26 23:17:55 +02:00
antirez
58844a7bfe RDMF: redisAssert -> serverAssert. 2015-07-26 15:29:53 +02:00
antirez
62b27ebc2a RDMF: OBJ_ macros for object related stuff. 2015-07-26 15:28:00 +02:00
antirez
fa26d3dd63 RDMF: use client instead of redisClient, like Disque. 2015-07-26 15:20:52 +02:00
antirez
e2b858a580 RDMF: redisLog -> serverLog. 2015-07-26 15:17:43 +02:00
antirez
6a424b5e36 RDMF (Redis/Disque merge friendliness) refactoring WIP 1. 2015-07-26 15:17:18 +02:00
Yongyue Sun
9d9e0190b5 bugfix: errno might change before logging
Signed-off-by: Yongyue Sun <abioy.sun@gmail.com>
2015-07-17 10:47:32 +02:00
Salvatore Sanfilippo
de4eacd132 Merge pull request #2301 from mattsta/fix/lengths
Improve type correctness
2015-02-24 17:22:53 +01:00
antirez
1d68744ad3 Check RDB automatically in a few more cases. 2015-02-03 10:33:05 +01:00
Matt Stancliff
8ada516fb6 Improve RDB error-on-load handling
Previously if we loaded a corrupt RDB, Redis printed an error report
with a big "REPORT ON GITHUB" message at the bottom.  But, we know
RDB load failures are corrupt data, not corrupt code.

Now when RDB failure is detected (duplicate keys or unknown data
types in the file), we run check-rdb against the RDB then exit.  The
automatic check-rdb hopefully gives the user instant feedback
about what is wrong instead of providing a mysterious stack
trace.
2015-01-28 11:19:00 -05:00
antirez
e63ad12b8f Fix gcc warning for lack of casting to char pointer. 2015-01-21 14:51:42 +01:00
Matt Stancliff
0c611363e5 Improve RDB type correctness
It's possible large objects could be larger than 'int', so let's
upgrade all size counters to ssize_t.

This also fixes rdbSaveObject serialized bytes calculation.
Since entire serializations of data structures can be large,
we don't want to limit their calculated size to a 32 bit signed max.

This commit increases object size calculation and
cascades the change back up to serializedlength printing.

Before:
127.0.0.1:6379> debug object hihihi
... encoding:quicklist serializedlength:-2147483559 ...

After:
127.0.0.1:6379> debug object hihihi
... encoding:quicklist serializedlength:2147483737 ...
2015-01-19 14:10:12 -05:00
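The Before/After values in 0c611363e5 are consistent with a 32 bit signed wraparound, which the following Python sketch checks using `ctypes`. Using `c_int64` as a stand-in for a 64 bit `ssize_t` is an assumption about the platform.

```python
import ctypes

true_size = 2147483737  # the serializedlength shown in the "After" output

# Storing the value in a 32 bit signed int wraps it around to a negative number.
wrapped = ctypes.c_int32(true_size).value
print(wrapped)  # -2147483559, the "Before" output

# A 64 bit signed type holds the value unchanged.
widened = ctypes.c_int64(true_size).value
print(widened)  # 2147483737
```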
Matt Stancliff
c0b0e23100 Remove RDB AUX memory leaks 2015-01-09 15:19:18 -05:00
antirez
83c56336e0 Typo fixed: fiels -> fields in rdbSaveInfoAuxFields().
Thx to @badboy.
2015-01-08 12:06:22 +01:00
antirez
5de189fd79 A few more AUX info fields added to RDB. 2015-01-08 09:52:59 +01:00
antirez
d93e29bea0 RDB AUX fields support.
This commit introduces a new RDB data type called 'aux'. It is used in
order to insert inside an RDB file key-value pairs that may serve
different needs, without breaking backward compatibility when new
information is embedded inside an RDB file. The contract between Redis
versions is to ignore unknown aux fields when encountered.

Aux fields can be used in order to:

1. Augment the RDB file with info like version of Redis that created the
RDB file, creation time, used memory while the RDB was created, and so
forth.
2. Add state about Redis inside the RDB file that we need to reload
later: replication offset, previous master run ID, in order to improve
failovers safety and allow partial resynchronization after a slave
restart.
3. Anything that we may want to add to RDB files without breaking the
ability of past versions of Redis to load the file.
2015-01-08 09:52:55 +01:00
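The "ignore unknown aux fields" contract from d93e29bea0 is what lets older versions load newer files. A minimal Python sketch of that contract follows; the function `load_aux`, the set of known field names and the sample values are illustrative assumptions, and the on-disk encoding of the fields is not modelled.

```python
def load_aux(pairs, known):
    """Keep the aux fields this version understands and skip the rest."""
    loaded, ignored = {}, []
    for key, value in pairs:
        if key in known:
            loaded[key] = value
        else:
            ignored.append(key)  # unknown field: skip it, do not fail the load
    return loaded, ignored


# Fields as they might appear in a file written by a newer version.
file_fields = [
    ("redis-ver", "3.2.0"),
    ("ctime", "1420707175"),
    ("some-future-field", "42"),
]

known_here = {"redis-ver", "ctime", "used-mem"}
loaded, ignored = load_aux(file_fields, known_here)
print(loaded)
print(ignored)
```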
antirez
e2308cf791 rdbLoad() refactoring to make it simpler to follow. 2015-01-08 09:52:51 +01:00
antirez
4a56ebe7dd New RDB v7 opcode: RESIZEDB.
The new opcode is a hint about the size of the dataset (keys and number
of expires) we are going to load for a given Redis database inside the
RDB file. Since hash tables are resized accordingly ASAP, useless
rehashing is avoided, speeding up load times significantly, in the order
of ~ 20% or more for larger data sets.

Related issue: #1719
2015-01-08 09:52:47 +01:00
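Why a size hint such as the one in 4a56ebe7dd saves work can be shown with a toy table that doubles when full. The Python sketch below counts resize steps with and without a hint; `count_rehashes` and the doubling policy are assumptions for illustration and do not reproduce the Redis dict implementation or the measured ~20% figure.

```python
def count_rehashes(num_keys: int, initial_size: int = 4) -> int:
    """Count how many times a doubling table must grow while loading num_keys."""
    size = initial_size
    rehashes = 0
    for used in range(1, num_keys + 1):
        if used > size:
            size *= 2      # growing the table implies rehashing every entry
            rehashes += 1
    return rehashes


keys_in_db = 1000
print(count_rehashes(keys_in_db))                           # 8: no hint, start small
print(count_rehashes(keys_in_db, initial_size=keys_in_db))  # 0: sized from the hint
```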
antirez
30041299ed Use RDB_LOAD_PLAIN to load quicklists and encoded types.
Before we needed to create a string object with an embedded SDS, and
basically duplicate the SDS part into a plain zmalloc() allocation.
2015-01-08 09:52:40 +01:00