243 Commits

Author SHA1 Message Date
antirez
5de189fd79 A few more AUX info fields added to RDB. 2015-01-08 09:52:59 +01:00
antirez
d93e29bea0 RDB AUX fields support.
This commit introduces a new RDB data type called 'aux'. It is used in
order to insert inside an RDB file key-value pairs that may serve
different needs, without breaking backward compatibility when new
information is embedded inside an RDB file. The contract between Redis
versions is to ignore unknown aux fields when encountered.

Aux fields can be used in order to:

1. Augment the RDB file with info like version of Redis that created the
RDB file, creation time, used memory while the RDB was created, and so
forth.
2. Add state about Redis inside the RDB file that we need to reload
later: replication offset, previous master run ID, in order to improve
failover safety and allow partial resynchronization after a slave
restart.
3. Anything that we may want to add to RDB files without breaking the
ability of past versions of Redis to load the file.
2015-01-08 09:52:55 +01:00
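The "ignore unknown aux fields" contract from d93e29bea0 can be illustrated with a minimal sketch. This is not the actual rdb.c loader: the `aux_field` and `load_info` structs, the `apply_aux_field()` helper, and the key names are all hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of one aux pair; the on-disk encoding differs. */
struct aux_field {
    const char *key;
    const char *value;
};

/* Hypothetical state collected while loading. */
struct load_info {
    char redis_ver[32];
    long long created_at;
    int ignored;            /* number of unknown fields that were skipped */
};

/* Apply one aux field. Unknown keys are counted and skipped, never treated
 * as an error, so files written by newer versions still load. */
void apply_aux_field(struct load_info *info, const struct aux_field *f) {
    if (strcmp(f->key, "redis-ver") == 0) {
        snprintf(info->redis_ver, sizeof(info->redis_ver), "%s", f->value);
    } else if (strcmp(f->key, "ctime") == 0) {
        info->created_at = strtoll(f->value, NULL, 10);
    } else {
        info->ignored++;
    }
}
```

The point of the sketch is the final `else` branch: a reader that does not recognize a key moves on instead of aborting the load.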
antirez
e2308cf791 rdbLoad() refactoring to make it simpler to follow. 2015-01-08 09:52:51 +01:00
antirez
4a56ebe7dd New RDB v7 opcode: RESIZEDB.
The new opcode is a hint about the size of the dataset (keys and number
of expires) we are going to load for a given Redis database inside the
RDB file. Since hash tables are resized accordingly ASAP, useless
rehashing is avoided, speeding up load times significantly, in the order
of ~ 20% or more for larger data sets.

Related issue: #1719
2015-01-08 09:52:47 +01:00
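Why a size hint avoids rehashing (4a56ebe7dd) can be seen with a toy model. The sketch below does not use dict.c; `toy_table`, `toy_expand()`, `toy_add()` and `load_keys()` are hypothetical names, and the table only tracks its capacity and how many times existing entries would have had to be moved.

```c
#include <stddef.h>

/* Toy table: tracks only capacity and the number of growth steps that
 * would have forced existing entries to be rehashed. */
struct toy_table {
    size_t used;
    size_t capacity;
    unsigned rehashes;
};

/* Round up to the next power of two, with a minimum of 4. */
size_t next_power(size_t n) {
    size_t p = 4;
    while (p < n) p *= 2;
    return p;
}

/* Grow the table so it can hold at least `hint` entries. */
void toy_expand(struct toy_table *t, size_t hint) {
    size_t want = next_power(hint);
    if (want > t->capacity) {
        if (t->used > 0) t->rehashes++;   /* existing entries must move */
        t->capacity = want;
    }
}

/* Add one entry, doubling the capacity when the table is full. */
void toy_add(struct toy_table *t) {
    if (t->used == t->capacity)
        toy_expand(t, t->capacity ? t->capacity * 2 : 4);
    t->used++;
}

/* Load nkeys entries; a non-zero hint pre-sizes the table first.
 * Returns how many rehashes were needed. */
unsigned load_keys(size_t nkeys, size_t hint) {
    struct toy_table t = {0, 0, 0};
    if (hint) toy_expand(&t, hint);
    for (size_t i = 0; i < nkeys; i++) toy_add(&t);
    return t.rehashes;
}
```

With a hint equal to the number of keys, `load_keys()` reports zero rehashes; without one, the table grows step by step and moves its entries each time.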
antirez
30041299ed Use RDB_LOAD_PLAIN to load quicklists and encoded types.
Before we needed to create a string object with an embedded SDS, and
basically duplicate the SDS part into a plain zmalloc() allocation.
2015-01-08 09:52:40 +01:00
antirez
a07f5e0b14 RDB refactored to load plain strings from RDB. 2015-01-08 09:52:36 +01:00
Matt Stancliff
16cda6f076 Config: Add quicklist, remove old list options
This removes:
  - list-max-ziplist-entries
  - list-max-ziplist-value

This adds:
  - list-max-ziplist-size
  - list-compress-depth

Also updates config file with new sections and updates
tests to use quicklist settings instead of old list settings.
2015-01-02 11:16:10 -05:00
Matt Stancliff
1120f6b855 Allow compression of interior quicklist nodes
Let user set how many nodes to *not* compress.

We can specify a compression "depth" of how many nodes
to leave uncompressed on each end of the quicklist.

Depth 0 = disable compression.
Depth 1 = only leave head/tail uncompressed.
  - (read as: "skip 1 node on each end of the list before compressing")
Depth 2 = leave head, head->next, tail->prev, tail uncompressed.
  - ("skip 2 nodes on each end of the list before compressing")
Depth 3 = Depth 2 + head->next->next + tail->prev->prev
  - ("skip 3 nodes...")
etc.

This also:
  - updates RDB storage to use native quicklist compression (if node is
    already compressed) instead of uncompressing, generating the RDB string,
    then re-compressing the quicklist node.
  - internalizes the "fill" parameter for the quicklist so we don't
    need to pass it to _every_ function.  Now it's just a property of
    the list.
  - allows a runtime-configurable compression option, so we can
    expose a compression parameter in the configuration file if people
    want to trade slight request-per-second performance for up to 90%+
    memory savings in some situations.
  - updates the quicklist tests to do multiple passes: 200k+ tests now.
2015-01-02 11:16:09 -05:00
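The depth rule described in 1120f6b855 reduces to a position check. The following is a sketch of that rule only, not the quicklist code; `node_is_compressed()` is a hypothetical helper that takes a node's zero-based index from the head, the node count, and the configured depth.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the node at `index` (0 = head) in a list of `count`
 * nodes would be compressed for the given depth.
 * Depth 0 disables compression; depth N leaves N nodes on each end
 * of the list uncompressed. */
bool node_is_compressed(size_t index, size_t count, size_t depth) {
    if (depth == 0) return false;          /* compression disabled */
    if (index >= count) return false;      /* no such node */
    size_t from_tail = count - 1 - index;  /* distance from the tail */
    return index >= depth && from_tail >= depth;
}
```

For a five-node list at depth 1, only the head and tail stay uncompressed; at depth 2 only the middle node is compressed; at depth 3 nothing is.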
Matt Stancliff
1dfe1cea49 Convert quicklist RDB to store ziplist nodes
Turns out it's a huge improvement during save/reload/migrate/restore
because, with compression enabled, we're compressing 4k or 8k
chunks of data consisting of multiple elements in one ziplist
instead of compressing series of smaller individual elements.
2015-01-02 11:16:09 -05:00
Matt Stancliff
5257f91390 Convert RDB ziplist loading to sdsnative()
This saves us an unnecessary zmalloc, memcpy, and two frees.
2015-01-02 11:16:09 -05:00
Matt Stancliff
e24ef16446 Add quicklist implementation
This replaces individual ziplist vs. linkedlist representations
for Redis list operations.

Big thanks for all the reviews and feedback from everybody in
https://github.com/antirez/redis/pull/2143
2015-01-02 11:16:08 -05:00
Matt Stancliff
01b7155ff5 Fix three simple clang analyzer warnings 2014-12-23 09:31:04 -05:00
antirez
c406fc16e6 INFO loading stats: three fixes.
1. Server unixtime may remain not updated while loading AOF, so ETA is
not updated correctly.

2. Number of processed bytes was not initialized.

3. Possible division by zero condition (likely cause of issue #1932).
2014-12-23 14:54:34 +01:00
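The division-by-zero hazard mentioned in c406fc16e6 comes from estimating a rate before any time has elapsed or any bytes have been processed. A guarded ETA computation might look like the sketch below; `loading_eta_seconds()` and its -1 "unknown" return value are assumptions made for illustration, not the INFO implementation.

```c
#include <stdint.h>

/* Estimated seconds remaining for a load, or -1 when no estimate is
 * possible yet (nothing loaded, or no time elapsed). The multiplication
 * can overflow for extreme inputs; a real implementation would need to
 * account for that. */
int64_t loading_eta_seconds(uint64_t total_bytes, uint64_t loaded_bytes,
                            int64_t elapsed_seconds) {
    if (elapsed_seconds <= 0 || loaded_bytes == 0) return -1;
    if (loaded_bytes >= total_bytes) return 0;
    uint64_t remaining = total_bytes - loaded_bytes;
    /* eta = remaining / (loaded / elapsed), rearranged so the only
     * divisor is loaded_bytes, which was checked above. */
    return (int64_t)(remaining * (uint64_t)elapsed_seconds / loaded_bytes);
}
```

For example, with 250 of 1000 bytes loaded after 10 seconds the estimate is 30 seconds.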
Alon Diamant
2a52664067 Fixed memory leaks in rdbSaveToSlavesSockets() 2014-12-21 16:13:45 +02:00
antirez
cc1e44966c Use new slave name function for diskless repl reporting. 2014-10-27 12:23:03 +01:00
antirez
7175393b58 Diskless replication: child -> parent communication improved.
Child now reports full info to the parent including IDs of slaves in
failure state and exit code.
2014-10-23 23:10:33 +02:00
antirez
4e8d30fa04 Diskless replication: set / reset socket send timeout.
We need to prevent a child -> slaves transfer from continuing forever.
We use the same timeout used as global replication timeout, which is
documented to also affect I/O operations during bulk transfers.
2014-10-22 15:53:45 +02:00
antirez
ff228efb5c rio fdset target: handle short writes.
While the socket is set in blocking mode, we still can get short writes
writing to a socket.
2014-10-17 16:45:53 +02:00
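Handling short writes, as in ff228efb5c, usually means looping until every byte has been accepted. The sketch below shows the general pattern with POSIX `write(2)`; `write_all()` is a hypothetical helper, not the rio function from the commit.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly `len` bytes to `fd`, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted, try again */
            return -1;
        }
        if (n == 0) return -1;              /* no progress, give up */
        p += n;                             /* skip what was written */
        len -= (size_t)n;
    }
    return 0;
}
```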
antirez
8339957bb2 Diskless replication: rio fdset target now supports buffering.
Performing a socket write() for each RDB rio API write call was
extremely inefficient, so now rio has minimal buffering capabilities.
Writes are accumulated into a buffer and only when a given limit is
reached are they actually written to the N slaves FDs.

Trivia: rio lacked support for buffering since our targets were:

1) Memory buffers.
2) C standard I/O.

Both were buffered already.
2014-10-17 11:36:12 +02:00
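The buffering idea in 8339957bb2 (accumulate small writes, flush to all descriptors once a limit is reached) can be sketched as follows. This is an illustration under stated assumptions, not the rio fdset target: `fdset_writer`, `fdset_write()`, `fdset_flush()`, and the tiny `BUF_LIMIT` are hypothetical, and the sketch aborts on the first failing descriptor and does not retry on EINTR.

```c
#include <string.h>
#include <unistd.h>

#define BUF_LIMIT 16   /* deliberately tiny so flushes are easy to observe */
#define MAX_FDS 4

struct fdset_writer {
    int fds[MAX_FDS];
    int numfds;
    char buf[BUF_LIMIT];
    size_t used;
    unsigned flushes;      /* how many times the buffer was sent out */
};

/* Send the buffered bytes to every descriptor, then empty the buffer. */
int fdset_flush(struct fdset_writer *w) {
    if (w->used == 0) return 0;
    for (int i = 0; i < w->numfds; i++) {
        size_t off = 0;
        while (off < w->used) {            /* loop to handle short writes */
            ssize_t n = write(w->fds[i], w->buf + off, w->used - off);
            if (n <= 0) return -1;
            off += (size_t)n;
        }
    }
    w->used = 0;
    w->flushes++;
    return 0;
}

/* Append data to the buffer, flushing each time the limit is reached. */
int fdset_write(struct fdset_writer *w, const void *data, size_t len) {
    const char *p = data;
    while (len > 0) {
        size_t room = BUF_LIMIT - w->used;
        size_t chunk = len < room ? len : room;
        memcpy(w->buf + w->used, p, chunk);
        w->used += chunk;
        p += chunk;
        len -= chunk;
        if (w->used == BUF_LIMIT && fdset_flush(w) == -1) return -1;
    }
    return 0;
}
```

Five 4-byte writes against the 16-byte limit trigger a single flush and leave 4 bytes pending, so the number of `write()` calls per descriptor drops from five to two once the final flush is issued.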
antirez
849f3fa4df Diskless replication: Various fixes to backgroundSaveDoneHandlerSocket() 2014-10-17 10:43:56 +02:00
antirez
a141f24268 Diskless replication: read report from child. 2014-10-15 11:36:03 +02:00
antirez
c2d1a98b0c Diskless replication: child writes report to parent. 2014-10-15 09:46:49 +02:00
antirez
316c2d0ebc Diskless replication: parent-child pipe and a few TODOs. 2014-10-14 15:29:07 +02:00
antirez
1900d091d7 Diskless replication: RDB -> slaves transfer draft implementation. 2014-10-14 10:11:29 +02:00
antirez
d052e6dbcb Define different types of RDB children.
We need to remember what is the saving strategy of the current RDB child
process, since the configuration may be modified at runtime via CONFIG
SET and still we'll need to understand, when the child exits, what to
do and for what goal the process was initiated: to create an RDB file
on disk or to write stuff directly to slave's sockets.
2014-10-08 09:09:01 +02:00
antirez
7e4728b545 RDB file creation refactored to target non-disk target. 2014-10-07 12:56:23 +02:00
zionwu
cb88673a3f Fix incorrect comments
error != success; and 0 != number of bytes written

Closes #1806
2014-09-29 06:49:06 -04:00
yoav
3fc3157dc2 Add error check for writing RDB checksum
Closes #857
2014-08-18 11:09:06 +02:00
Matt Stancliff
895bbb535d Fix assert technical correctness
dictAdd returns DICT_OK, not REDIS_OK. They both
have the same underlying values, so it works even though
the code is technically wrong.

Fixes #1512
2014-08-08 10:03:22 +02:00
antirez
910499adaa Force quit when receiving a second SIGINT.
Also quit ASAP when we are still loading a DB, since care is not needed
in this special condition, especially for a SIGINT.
2014-08-07 16:39:02 +02:00
Yossi Gottlieb
2c1a1612bd Fail SYNC if background save child aborted due to a signal. 2014-07-28 14:43:30 +03:00
antirez
fc21a596e6 RDB: load string objects directly as EMBSTR objects when possible. 2014-07-16 11:36:22 +02:00
antirez
a519c133a6 LATENCY DOCTOR first implementation complete. 2014-07-08 17:05:56 +02:00
antirez
51116b4638 Latency monitor: more hooks around the code. 2014-07-01 17:19:08 +02:00
antirez
1c94889182 No more trailing spaces in Redis source code. 2014-06-26 18:48:40 +02:00
Akos Vandra
52ede31ac4 Fixed possible buffer overflow bug if RDB file is corrupted.
(Note: commit message modified by @antirez for clarity).
2014-05-12 11:48:14 +02:00
Akos Vandra
367c237194 fixed possible buffer overflow error 2014-05-12 11:19:07 +02:00
antirez
4912f873b4 Process events with processEventsWhileBlocked() when blocked.
When we are blocked and a few events are processed from time to time, it
is smarter to call the event handler a few times in order to handle the
accept, read, write, close cycle of a client in a single pass, otherwise
there is too much latency added for clients to receive a reply while the
server is busy in some way (for example during the DB loading).
2014-04-24 21:44:32 +02:00
Matt Stancliff
a2c4706fc7 Fix data loss when save AOF/RDB with no free space
Previously, the (!fp) would only catch lack of free space
under OS X.  Linux waits to discover it can't write until
it actually writes contents to disk.

(fwrite() returns success even if the underlying file
has no free space to write into.  All the errors
only show up at flush/sync/close time.)

Fixes antirez/redis#1604
2014-03-24 13:54:14 -04:00
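The failure mode described in a2c4706fc7 suggests checking every step of the write path, not only `fopen()`. A minimal sketch of that pattern follows; `save_buffer()` is a hypothetical helper, not the code changed by the commit, and removing the partial file on failure is a design choice of this sketch.

```c
#include <stdio.h>
#include <unistd.h>

/* Write `len` bytes to `path`, checking the steps where an out-of-space
 * condition may first become visible. Returns 0 on success, -1 on error;
 * on error the partially written file is removed. */
int save_buffer(const char *path, const void *data, size_t len) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) return -1;

    if (len > 0 && fwrite(data, len, 1, fp) != 1) goto werr;
    /* fwrite() may succeed while the data is still only in memory, so
     * flush the stdio buffer and ask the kernel to commit it to disk. */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) {
        unlink(path);
        return -1;
    }
    return 0;

werr:
    fclose(fp);
    unlink(path);
    return -1;
}
```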
antirez
bea68cc21f Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not update the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
antirez
ccd6ccc7dd Slaves heartbeats during sync improved.
The previous fix for false positive timeout detected by master was not
complete. There is another blocking stage while loading data for the
first synchronization with the master, that is, flushing away the
current data from the DB memory.

This commit uses the newly introduced dict.c callback in order to make
some incremental work (to send "\n" heartbeats to the master) while
flushing the old data from memory.

It is hard to write a regression test for this issue unfortunately. More
support for debugging in the Redis core would be needed in terms of
functionalities to simulate a slow DB loading / deletion.
2013-12-10 18:47:31 +01:00
antirez
18d92c0836 Don't send more than 1 newline/sec while loading RDB. 2013-12-10 18:43:19 +01:00
antirez
54a526687d Slaves heartbeat while loading RDB files.
Starting with Redis 2.8 masters are able to detect timed out slaves,
while before 2.8 only slaves were able to detect a timed out master.

Now that timeout detection is bi-directional the following problem
happens as described "in the field" by issue #1449:

1) Master and slave setup with big dataset.
2) Slave performs the first synchronization, or a full sync
   after a failed partial resync.
3) Master sends the RDB payload to the slave.
4) Slave loads this payload.
5) Master detects the slave as timed out since it does not receive back
   the REPLCONF ACK acknowledgements.

Here the problem is that the master has no way to know how long the
slave will take to load the RDB file in memory. The obvious solution is
to use a greater replication timeout setting, but this is a shame since
for the 0.1% of operation time we are forced to use a timeout that is
not what is suited for 99.9% of operation time.

This commit tries to fix this problem with a solution that is a bit of
a hack, but that modifies little of the replication internals, in order
to be back ported to 2.8 safely.

During the RDB loading time, we send the master newlines to avoid
being sensed as timed out. This is the same thing the master already does
while saving the RDB file to still signal its presence to the slave.

The single newline is used because:

1) It can't desync the protocol, as it is only transmitted all or
nothing.
2) It can be safely sent while we don't have a client structure for the
master or in similar situations just with write(2).
2013-12-09 20:26:00 +01:00
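The newline heartbeat from 54a526687d, together with the one-per-second limit added later in 18d92c0836, can be sketched as a small rate-limited helper. `heartbeat_due()` and `maybe_send_heartbeat()` are hypothetical names and this is not the replication code; the sketch only assumes a connected descriptor and a caller that passes the current time from its loading callback.

```c
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Returns true at most once per second; updates *last_sent when it does. */
bool heartbeat_due(time_t now, time_t *last_sent) {
    if (now - *last_sent < 1) return false;
    *last_sent = now;
    return true;
}

/* Send a single "\n" to the master if one is due.
 * Returns 1 if sent, 0 if skipped, -1 on write error.
 * A one-byte write is either sent or not, so it cannot leave a
 * partial message in the protocol stream. */
int maybe_send_heartbeat(int fd, time_t now, time_t *last_sent) {
    if (!heartbeat_due(now, last_sent)) return 0;
    return write(fd, "\n", 1) == 1 ? 1 : -1;
}
```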
antirez
7a5a646df9 Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
8c2127f9c9 Fix broken rdbWriteRaw() return value check in rdb.c.
Thanks to @PhoneLi for reporting.
2013-11-07 23:53:18 +01:00
antirez
2862182e9e Update server.lastbgsave_status when fork() fails. 2013-08-27 10:16:29 +02:00
antirez
2f47ed9f5f Use printf %zu specifier to print private_dirty. 2013-08-20 12:04:57 +02:00
antirez
aa32f92338 Introduction of a new string encoding: EMBSTR
Previously two string encodings were used for string objects:

1) REDIS_ENCODING_RAW: a string object with obj->ptr pointing to an sds
string.

2) REDIS_ENCODING_INT: a string object where the obj->ptr void pointer
is cast to a long.

This commit introduces an experimental new encoding called
REDIS_ENCODING_EMBSTR that implements an object represented by an sds
string that is not modifiable but allocated in the same memory chunk as
the robj structure itself.

The chunk looks like the following:

+--------------+-----------+------------+--------+----+
| robj data... | robj->ptr | sds header | string | \0 |
+--------------+-----+-----+------------+--------+----+
                     |                       ^
                     +-----------------------+

The robj->ptr points to the contiguous sds string data, so the object
can be manipulated with the same functions used to manipulate plain
string objects, however we need just one malloc and one free in order to
allocate or release this kind of object. Moreover it has better cache
locality.

This new allocation strategy should benefit both memory usage and
performance. A performance gain between 60 and 70% was observed
during micro-benchmarks, however there is more work to do to evaluate
the performance impact and the memory usage behavior.
2013-07-22 10:31:38 +02:00
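The single-allocation layout described in aa32f92338 can be sketched with a simplified object. This is not the robj or sds code: `toy_obj` and `toy_create_embedded()` are hypothetical, and the sds header that sits between the object and the string in the real layout is omitted to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified object header; `ptr` points into the same allocation. */
struct toy_obj {
    unsigned type;
    char *ptr;
};

/* Allocate header and string bytes in one chunk. The caller releases
 * the whole object with a single free(). Returns NULL on failure. */
struct toy_obj *toy_create_embedded(const char *s, size_t len) {
    struct toy_obj *o = malloc(sizeof(*o) + len + 1);
    if (o == NULL) return NULL;
    o->type = 0;
    o->ptr = (char *)(o + 1);      /* string starts right after the header */
    memcpy(o->ptr, s, len);
    o->ptr[len] = '\0';
    return o;
}
```

Because the string lives directly after the header, creating the object costs one `malloc()` and destroying it costs one `free()`, and both parts tend to share cache lines.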
yoav
dddfb15bc0 Chunked loading of RDB to prevent redis from stalling reading very large keys. 2013-07-16 15:41:24 +02:00
antirez
d3cde09645 Binding multiple IPs done properly with multiple sockets. 2013-07-05 11:47:20 +02:00