2012-11-08 18:25:23 +01:00
/*
* Copyright ( c ) 2009 - 2012 , Salvatore Sanfilippo < antirez at gmail dot com >
* All rights reserved .
*
* Redistribution and use in source and binary forms , with or without
* modification , are permitted provided that the following conditions are met :
*
* * Redistributions of source code must retain the above copyright notice ,
* this list of conditions and the following disclaimer .
* * Redistributions in binary form must reproduce the above copyright
* notice , this list of conditions and the following disclaimer in the
* documentation and / or other materials provided with the distribution .
* * Neither the name of Redis nor the names of its contributors may be used
* to endorse or promote products derived from this software without
* specific prior written permission .
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS " AS IS "
* AND ANY EXPRESS OR IMPLIED WARRANTIES , INCLUDING , BUT NOT LIMITED TO , THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED . IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT , INDIRECT , INCIDENTAL , SPECIAL , EXEMPLARY , OR
* CONSEQUENTIAL DAMAGES ( INCLUDING , BUT NOT LIMITED TO , PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES ; LOSS OF USE , DATA , OR PROFITS ; OR BUSINESS
* INTERRUPTION ) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY , WHETHER IN
* CONTRACT , STRICT LIABILITY , OR TORT ( INCLUDING NEGLIGENCE OR OTHERWISE )
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE , EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE .
*/
2015-07-26 15:14:57 +02:00
# include "server.h"
2011-09-13 18:27:08 +02:00
# include "bio.h"
2011-09-22 15:15:26 +02:00
# include "rio.h"
2021-10-07 14:41:26 +03:00
# include "functions.h"
2010-06-22 00:07:48 +02:00
# include <signal.h>
# include <fcntl.h>
# include <sys/stat.h>
2010-07-01 21:13:38 +02:00
# include <sys/types.h>
# include <sys/time.h>
# include <sys/resource.h>
# include <sys/wait.h>
2016-02-15 16:14:56 +01:00
# include <sys/param.h>
2010-06-22 00:07:48 +02:00
2021-08-09 01:03:59 -07:00
void freeClientArgv ( client * c ) ;
2022-02-22 08:59:23 +02:00
off_t getAppendOnlyFileSize ( sds filename , int * status ) ;
off_t getBaseAndIncrAppendOnlyFilesSize ( aofManifest * am , int * status ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
int getBaseAndIncrAppendOnlyFilesNum ( aofManifest * am ) ;
int aofFileExist ( char * filename ) ;
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
int rewriteAppendOnlyFile ( char * filename ) ;
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
aofManifest * aofLoadManifestFromFile ( sds am_filepath ) ;
void aofManifestFreeAndUpdate ( aofManifest * am ) ;
fsync the old aof file when open a new INCR AOF (#11004)
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.
Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).
When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.
Adding a argument for bioCreateCloseJob, a `need_fsync` flag to
indicate that a fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.
A cleanup, we make union become a union, since the free_* args and
the fd / fsync args are never used together.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-07-25 14:16:35 +08:00
void aof_background_fsync_and_close ( int fd ) ;
2011-06-10 12:39:23 +02:00
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
/* ----------------------------------------------------------------------------
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* AOF Manifest file implementation .
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
*
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* The following code implements the read / write logic of AOF manifest file , which
* is used to track and manage all AOF files .
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
*
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* Append - only files consist of three types :
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
*
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* BASE : Represents a Redis snapshot from the time of last AOF rewrite . The manifest
* file contains at most a single BASE file , which will always be the first file in the
* list .
*
* INCR : Represents all write commands executed by Redis following the last successful
* AOF rewrite . In some cases it is possible to have several ordered INCR files . For
* example :
* - During an on - going AOF rewrite
* - After an AOF rewrite was aborted / failed , and before the next one succeeded .
*
* HISTORY : After a successful rewrite , the previous BASE and INCR become HISTORY files .
* They will be automatically removed unless garbage collection is disabled .
*
* The following is a possible AOF manifest file content :
*
* file appendonly . aof .2 . base . rdb seq 2 type b
* file appendonly . aof .1 . incr . aof seq 1 type h
* file appendonly . aof .2 . incr . aof seq 2 type h
* file appendonly . aof .3 . incr . aof seq 3 type h
* file appendonly . aof .4 . incr . aof seq 4 type i
* file appendonly . aof .5 . incr . aof seq 5 type i
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Naming rules. */
# define BASE_FILE_SUFFIX ".base"
# define INCR_FILE_SUFFIX ".incr"
# define RDB_FORMAT_SUFFIX ".rdb"
# define AOF_FORMAT_SUFFIX ".aof"
# define MANIFEST_NAME_SUFFIX ".manifest"
2022-04-26 21:31:19 +08:00
# define TEMP_FILE_NAME_PREFIX "temp-"
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* AOF manifest key. */
# define AOF_MANIFEST_KEY_FILE_NAME "file"
# define AOF_MANIFEST_KEY_FILE_SEQ "seq"
# define AOF_MANIFEST_KEY_FILE_TYPE "type"
/* Create an empty aofInfo. */
aofInfo * aofInfoCreate ( void ) {
return zcalloc ( sizeof ( aofInfo ) ) ;
}
/* Free the aofInfo structure (pointed to by ai) and its embedded file_name. */
void aofInfoFree ( aofInfo * ai ) {
serverAssert ( ai ! = NULL ) ;
if ( ai - > file_name ) sdsfree ( ai - > file_name ) ;
zfree ( ai ) ;
}
/* Deep copy an aofInfo. */
aofInfo * aofInfoDup ( aofInfo * orig ) {
serverAssert ( orig ! = NULL ) ;
aofInfo * ai = aofInfoCreate ( ) ;
ai - > file_name = sdsdup ( orig - > file_name ) ;
ai - > file_seq = orig - > file_seq ;
ai - > file_type = orig - > file_type ;
return ai ;
}
2022-01-10 15:09:39 +08:00
/* Format aofInfo as a string and it will be a line in the manifest. */
sds aofInfoFormat ( sds buf , aofInfo * ai ) {
2022-01-18 12:52:27 +02:00
sds filename_repr = NULL ;
if ( sdsneedsrepr ( ai - > file_name ) )
filename_repr = sdscatrepr ( sdsempty ( ) , ai - > file_name , sdslen ( ai - > file_name ) ) ;
sds ret = sdscatprintf ( buf , " %s %s %s %lld %s %c \n " ,
AOF_MANIFEST_KEY_FILE_NAME , filename_repr ? filename_repr : ai - > file_name ,
AOF_MANIFEST_KEY_FILE_SEQ , ai - > file_seq ,
AOF_MANIFEST_KEY_FILE_TYPE , ai - > file_type ) ;
sdsfree ( filename_repr ) ;
return ret ;
2022-01-10 15:09:39 +08:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Method to free AOF list elements. */
void aofListFree ( void * item ) {
aofInfo * ai = ( aofInfo * ) item ;
aofInfoFree ( ai ) ;
}
/* Method to duplicate AOF list elements. */
void * aofListDup ( void * item ) {
return aofInfoDup ( item ) ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Create an empty aofManifest, which will be called in `aofLoadManifestFromDisk`. */
aofManifest * aofManifestCreate ( void ) {
aofManifest * am = zcalloc ( sizeof ( aofManifest ) ) ;
am - > incr_aof_list = listCreate ( ) ;
am - > history_aof_list = listCreate ( ) ;
listSetFreeMethod ( am - > incr_aof_list , aofListFree ) ;
listSetDupMethod ( am - > incr_aof_list , aofListDup ) ;
listSetFreeMethod ( am - > history_aof_list , aofListFree ) ;
listSetDupMethod ( am - > history_aof_list , aofListDup ) ;
return am ;
}
/* Free the aofManifest structure (pointed to by am) and its embedded members. */
void aofManifestFree ( aofManifest * am ) {
if ( am - > base_aof_info ) aofInfoFree ( am - > base_aof_info ) ;
if ( am - > incr_aof_list ) listRelease ( am - > incr_aof_list ) ;
if ( am - > history_aof_list ) listRelease ( am - > history_aof_list ) ;
zfree ( am ) ;
}
2023-05-02 17:31:32 -07:00
sds getAofManifestFileName ( void ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
return sdscatprintf ( sdsempty ( ) , " %s%s " , server . aof_filename ,
MANIFEST_NAME_SUFFIX ) ;
}
2023-05-02 17:31:32 -07:00
sds getTempAofManifestFileName ( void ) {
2022-04-26 21:31:19 +08:00
return sdscatprintf ( sdsempty ( ) , " %s%s%s " , TEMP_FILE_NAME_PREFIX ,
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_filename , MANIFEST_NAME_SUFFIX ) ;
}
/* Returns the string representation of aofManifest pointed to by am.
*
* The string is multiple lines separated by ' \n ' , and each line represents
* an AOF file .
*
* Each line is space delimited and contains 6 fields , as follows :
* " file " [ filename ] " seq " [ sequence ] " type " [ type ]
*
* Where " file " , " seq " and " type " are keywords that describe the next value ,
* [ filename ] and [ sequence ] describe file name and order , and [ type ] is one
* of ' b ' ( base ) , ' h ' ( history ) or ' i ' ( incr ) .
*
* The base file , if exists , will always be first , followed by history files ,
* and incremental files .
*/
sds getAofManifestAsString ( aofManifest * am ) {
serverAssert ( am ! = NULL ) ;
sds buf = sdsempty ( ) ;
2014-07-04 15:22:49 +02:00
listNode * ln ;
listIter li ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* 1. Add BASE File information, it is always at the beginning
* of the manifest file . */
if ( am - > base_aof_info ) {
2022-01-10 15:09:39 +08:00
buf = aofInfoFormat ( buf , am - > base_aof_info ) ;
2014-07-04 15:22:49 +02:00
}
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* 2. Add HISTORY type AOF information. */
listRewind ( am - > history_aof_list , & li ) ;
while ( ( ln = listNext ( & li ) ) ! = NULL ) {
aofInfo * ai = ( aofInfo * ) ln - > value ;
2022-01-10 15:09:39 +08:00
buf = aofInfoFormat ( buf , ai ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
2021-05-30 16:57:36 +08:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* 3. Add INCR type AOF information. */
listRewind ( am - > incr_aof_list , & li ) ;
while ( ( ln = listNext ( & li ) ) ! = NULL ) {
aofInfo * ai = ( aofInfo * ) ln - > value ;
2022-01-10 15:09:39 +08:00
buf = aofInfoFormat ( buf , ai ) ;
2021-05-30 16:57:36 +08:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
return buf ;
2021-05-30 16:57:36 +08:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Load the manifest information from the disk to `server.aof_manifest`
* when the Redis server start .
*
* During loading , this function does strict error checking and will abort
* the entire Redis server process on error ( I / O error , invalid format , etc . )
*
* If the AOF directory or manifest file do not exist , this will be ignored
* in order to support seamless upgrades from previous versions which did not
* use them .
*/
void aofLoadManifestFromDisk ( void ) {
server . aof_manifest = aofManifestCreate ( ) ;
if ( ! dirExists ( server . aof_dirname ) ) {
2022-05-26 22:23:05 +08:00
serverLog ( LL_DEBUG , " The AOF directory %s doesn't exist " , server . aof_dirname ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
return ;
}
sds am_name = getAofManifestFileName ( ) ;
sds am_filepath = makePath ( server . aof_dirname , am_name ) ;
if ( ! fileExist ( am_filepath ) ) {
2022-05-26 22:23:05 +08:00
serverLog ( LL_DEBUG , " The AOF manifest file %s doesn't exist " , am_name ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sdsfree ( am_name ) ;
sdsfree ( am_filepath ) ;
return ;
}
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
aofManifest * am = aofLoadManifestFromFile ( am_filepath ) ;
if ( am ) aofManifestFreeAndUpdate ( am ) ;
sdsfree ( am_name ) ;
sdsfree ( am_filepath ) ;
}
2024-04-06 20:33:09 -07:00
/* Generic manifest loading function, used in `aofLoadManifestFromDisk` and valkey-check-aof tool. */
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
# define MANIFEST_MAX_LINE 1024
aofManifest * aofLoadManifestFromFile ( sds am_filepath ) {
const char * err = NULL ;
long long maxseq = 0 ;
aofManifest * am = aofManifestCreate ( ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
FILE * fp = fopen ( am_filepath , " r " ) ;
if ( fp = = NULL ) {
serverLog ( LL_WARNING , " Fatal error: can't open the AOF manifest "
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
" file %s for reading: %s " , am_filepath , strerror ( errno ) ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
exit ( 1 ) ;
}
char buf [ MANIFEST_MAX_LINE + 1 ] ;
sds * argv = NULL ;
int argc ;
aofInfo * ai = NULL ;
sds line = NULL ;
int linenum = 0 ;
while ( 1 ) {
if ( fgets ( buf , MANIFEST_MAX_LINE + 1 , fp ) = = NULL ) {
if ( feof ( fp ) ) {
if ( linenum = = 0 ) {
err = " Found an empty AOF manifest " ;
goto loaderr ;
} else {
break ;
}
} else {
err = " Read AOF manifest failed " ;
goto loaderr ;
}
2014-07-04 15:54:20 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
linenum + + ;
/* Skip comments lines */
if ( buf [ 0 ] = = ' # ' ) continue ;
if ( strchr ( buf , ' \n ' ) = = NULL ) {
err = " The AOF manifest file contains too long line " ;
goto loaderr ;
2014-07-05 01:11:28 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
line = sdstrim ( sdsnew ( buf ) , " \t \r \n " ) ;
2022-01-10 15:09:39 +08:00
if ( ! sdslen ( line ) ) {
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
err = " Invalid AOF manifest file format " ;
2022-01-10 15:09:39 +08:00
goto loaderr ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
argv = sdssplitargs ( line , & argc ) ;
/* 'argc < 6' was done for forward compatibility. */
if ( argv = = NULL | | argc < 6 | | ( argc % 2 ) ) {
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
err = " Invalid AOF manifest file format " ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
goto loaderr ;
}
ai = aofInfoCreate ( ) ;
for ( int i = 0 ; i < argc ; i + = 2 ) {
if ( ! strcasecmp ( argv [ i ] , AOF_MANIFEST_KEY_FILE_NAME ) ) {
ai - > file_name = sdsnew ( argv [ i + 1 ] ) ;
if ( ! pathIsBaseName ( ai - > file_name ) ) {
err = " File can't be a path, just a filename " ;
goto loaderr ;
}
} else if ( ! strcasecmp ( argv [ i ] , AOF_MANIFEST_KEY_FILE_SEQ ) ) {
ai - > file_seq = atoll ( argv [ i + 1 ] ) ;
} else if ( ! strcasecmp ( argv [ i ] , AOF_MANIFEST_KEY_FILE_TYPE ) ) {
ai - > file_type = ( argv [ i + 1 ] ) [ 0 ] ;
}
/* else if (!strcasecmp(argv[i], AOF_MANIFEST_KEY_OTHER)) {} */
}
/* We have to make sure we load all the information. */
if ( ! ai - > file_name | | ! ai - > file_seq | | ! ai - > file_type ) {
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
err = " Invalid AOF manifest file format " ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
goto loaderr ;
}
sdsfreesplitres ( argv , argc ) ;
argv = NULL ;
if ( ai - > file_type = = AOF_FILE_TYPE_BASE ) {
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
if ( am - > base_aof_info ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
err = " Found duplicate base file information " ;
goto loaderr ;
}
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
am - > base_aof_info = ai ;
am - > curr_base_file_seq = ai - > file_seq ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
} else if ( ai - > file_type = = AOF_FILE_TYPE_HIST ) {
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
listAddNodeTail ( am - > history_aof_list , ai ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
} else if ( ai - > file_type = = AOF_FILE_TYPE_INCR ) {
if ( ai - > file_seq < = maxseq ) {
err = " Found a non-monotonic sequence number " ;
goto loaderr ;
}
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
listAddNodeTail ( am - > incr_aof_list , ai ) ;
am - > curr_incr_file_seq = ai - > file_seq ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
maxseq = ai - > file_seq ;
} else {
err = " Unknown AOF file type " ;
goto loaderr ;
}
sdsfree ( line ) ;
line = NULL ;
ai = NULL ;
2014-07-04 15:22:49 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
fclose ( fp ) ;
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
return am ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
loaderr :
/* Sanitizer suppression: may report a false positive if we goto loaderr
* and exit ( 1 ) without freeing these allocations . */
if ( argv ) sdsfreesplitres ( argv , argc ) ;
if ( ai ) aofInfoFree ( ai ) ;
serverLog ( LL_WARNING , " \n *** FATAL AOF MANIFEST FILE ERROR *** \n " ) ;
if ( line ) {
serverLog ( LL_WARNING , " Reading the manifest file, at line %d \n " , linenum ) ;
serverLog ( LL_WARNING , " >>> '%s' \n " , line ) ;
}
serverLog ( LL_WARNING , " %s \n " , err ) ;
exit ( 1 ) ;
}
/* Deep copy an aofManifest from orig.
*
* In ` backgroundRewriteDoneHandler ` and ` openNewIncrAofForAppend ` , we will
* first deep copy a temporary AOF manifest from the ` server . aof_manifest ` and
* try to modify it . Once everything is modified , we will atomically make the
* ` server . aof_manifest ` point to this temporary aof_manifest .
*/
aofManifest * aofManifestDup ( aofManifest * orig ) {
serverAssert ( orig ! = NULL ) ;
aofManifest * am = zcalloc ( sizeof ( aofManifest ) ) ;
am - > curr_base_file_seq = orig - > curr_base_file_seq ;
am - > curr_incr_file_seq = orig - > curr_incr_file_seq ;
am - > dirty = orig - > dirty ;
if ( orig - > base_aof_info ) {
am - > base_aof_info = aofInfoDup ( orig - > base_aof_info ) ;
}
am - > incr_aof_list = listDup ( orig - > incr_aof_list ) ;
am - > history_aof_list = listDup ( orig - > history_aof_list ) ;
serverAssert ( am - > incr_aof_list ! = NULL ) ;
serverAssert ( am - > history_aof_list ! = NULL ) ;
return am ;
}
/* Change the `server.aof_manifest` pointer to 'am' and free the previous
* one if we have . */
void aofManifestFreeAndUpdate ( aofManifest * am ) {
serverAssert ( am ! = NULL ) ;
if ( server . aof_manifest ) aofManifestFree ( server . aof_manifest ) ;
server . aof_manifest = am ;
}
/* Called in `backgroundRewriteDoneHandler` to get a new BASE file
* name , and mark the previous ( if we have ) BASE file as HISTORY type .
*
* BASE file naming rules : ` server . aof_filename ` . seq . base . format
*
* for example :
* appendonly . aof .1 . base . aof ( server . aof_use_rdb_preamble is no )
* appendonly . aof .1 . base . rdb ( server . aof_use_rdb_preamble is yes )
*/
sds getNewBaseFileNameAndMarkPreAsHistory ( aofManifest * am ) {
serverAssert ( am ! = NULL ) ;
if ( am - > base_aof_info ) {
serverAssert ( am - > base_aof_info - > file_type = = AOF_FILE_TYPE_BASE ) ;
am - > base_aof_info - > file_type = AOF_FILE_TYPE_HIST ;
listAddNodeHead ( am - > history_aof_list , am - > base_aof_info ) ;
}
char * format_suffix = server . aof_use_rdb_preamble ?
RDB_FORMAT_SUFFIX : AOF_FORMAT_SUFFIX ;
aofInfo * ai = aofInfoCreate ( ) ;
ai - > file_name = sdscatprintf ( sdsempty ( ) , " %s.%lld%s%s " , server . aof_filename ,
+ + am - > curr_base_file_seq , BASE_FILE_SUFFIX , format_suffix ) ;
ai - > file_seq = am - > curr_base_file_seq ;
ai - > file_type = AOF_FILE_TYPE_BASE ;
am - > base_aof_info = ai ;
am - > dirty = 1 ;
return am - > base_aof_info - > file_name ;
2014-07-04 15:22:49 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Get a new INCR type AOF name.
2021-11-11 14:51:33 +03:00
*
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* INCR AOF naming rules : ` server . aof_filename ` . seq . incr . aof
*
* for example :
* appendonly . aof .1 . incr . aof
*/
sds getNewIncrAofName ( aofManifest * am ) {
aofInfo * ai = aofInfoCreate ( ) ;
ai - > file_type = AOF_FILE_TYPE_INCR ;
ai - > file_name = sdscatprintf ( sdsempty ( ) , " %s.%lld%s%s " , server . aof_filename ,
+ + am - > curr_incr_file_seq , INCR_FILE_SUFFIX , AOF_FORMAT_SUFFIX ) ;
ai - > file_seq = am - > curr_incr_file_seq ;
listAddNodeTail ( am - > incr_aof_list , ai ) ;
am - > dirty = 1 ;
return ai - > file_name ;
}
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
2022-04-26 21:31:19 +08:00
/* Get temp INCR type AOF name. */
2023-05-02 17:31:32 -07:00
sds getTempIncrAofName ( void ) {
2022-04-26 21:31:19 +08:00
return sdscatprintf ( sdsempty ( ) , " %s%s%s " , TEMP_FILE_NAME_PREFIX , server . aof_filename ,
INCR_FILE_SUFFIX ) ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Get the last INCR AOF name or create a new one. */
sds getLastIncrAofName ( aofManifest * am ) {
serverAssert ( am ! = NULL ) ;
/* If 'incr_aof_list' is empty, just create a new one. */
if ( ! listLength ( am - > incr_aof_list ) ) {
return getNewIncrAofName ( am ) ;
}
/* Or return the last one. */
listNode * lastnode = listIndex ( am - > incr_aof_list , - 1 ) ;
aofInfo * ai = listNodeValue ( lastnode ) ;
return ai - > file_name ;
}
/* Called in `backgroundRewriteDoneHandler`. when AOFRW success, This
* function will change the AOF file type in ' incr_aof_list ' from
* AOF_FILE_TYPE_INCR to AOF_FILE_TYPE_HIST , and move them to the
* ' history_aof_list ' .
*/
void markRewrittenIncrAofAsHistory ( aofManifest * am ) {
serverAssert ( am ! = NULL ) ;
if ( ! listLength ( am - > incr_aof_list ) ) {
return ;
}
listNode * ln ;
listIter li ;
listRewindTail ( am - > incr_aof_list , & li ) ;
/* "server.aof_fd != -1" means AOF enabled, then we must skip the
* last AOF , because this file is our currently writing . */
if ( server . aof_fd ! = - 1 ) {
ln = listNext ( & li ) ;
serverAssert ( ln ! = NULL ) ;
}
/* Move aofInfo from 'incr_aof_list' to 'history_aof_list'. */
while ( ( ln = listNext ( & li ) ) ! = NULL ) {
aofInfo * ai = ( aofInfo * ) ln - > value ;
serverAssert ( ai - > file_type = = AOF_FILE_TYPE_INCR ) ;
aofInfo * hai = aofInfoDup ( ai ) ;
hai - > file_type = AOF_FILE_TYPE_HIST ;
listAddNodeHead ( am - > history_aof_list , hai ) ;
listDelNode ( am - > incr_aof_list , ln ) ;
}
am - > dirty = 1 ;
}
/* Write the formatted manifest string to disk. */
int writeAofManifestFile ( sds buf ) {
int ret = C_OK ;
ssize_t nwritten ;
int len ;
sds am_name = getAofManifestFileName ( ) ;
sds am_filepath = makePath ( server . aof_dirname , am_name ) ;
sds tmp_am_name = getTempAofManifestFileName ( ) ;
sds tmp_am_filepath = makePath ( server . aof_dirname , tmp_am_name ) ;
int fd = open ( tmp_am_filepath , O_WRONLY | O_TRUNC | O_CREAT , 0644 ) ;
if ( fd = = - 1 ) {
serverLog ( LL_WARNING , " Can't open the AOF manifest file %s: %s " ,
tmp_am_name , strerror ( errno ) ) ;
ret = C_ERR ;
goto cleanup ;
}
len = sdslen ( buf ) ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
while ( len ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
nwritten = write ( fd , buf , len ) ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( nwritten < 0 ) {
if ( errno = = EINTR ) continue ;
serverLog ( LL_WARNING , " Error trying to write the temporary AOF manifest file %s: %s " ,
tmp_am_name , strerror ( errno ) ) ;
ret = C_ERR ;
goto cleanup ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
len - = nwritten ;
buf + = nwritten ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
}
2014-07-04 15:22:49 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( redis_fsync ( fd ) = = - 1 ) {
serverLog ( LL_WARNING , " Fail to fsync the temp AOF file %s: %s. " ,
tmp_am_name , strerror ( errno ) ) ;
ret = C_ERR ;
goto cleanup ;
}
if ( rename ( tmp_am_filepath , am_filepath ) ! = 0 ) {
serverLog ( LL_WARNING ,
" Error trying to rename the temporary AOF manifest file %s into %s: %s " ,
tmp_am_name , am_name , strerror ( errno ) ) ;
ret = C_ERR ;
2022-06-21 00:17:23 +08:00
goto cleanup ;
}
/* Also sync the AOF directory as new AOF files may be added in the directory */
if ( fsyncFileDir ( am_filepath ) = = - 1 ) {
serverLog ( LL_WARNING , " Fail to fsync AOF directory %s: %s. " ,
am_filepath , strerror ( errno ) ) ;
ret = C_ERR ;
goto cleanup ;
2014-07-04 15:22:49 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
cleanup :
if ( fd ! = - 1 ) close ( fd ) ;
sdsfree ( am_name ) ;
sdsfree ( am_filepath ) ;
sdsfree ( tmp_am_name ) ;
sdsfree ( tmp_am_filepath ) ;
return ret ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Persist the aofManifest information pointed to by am to disk. */
int persistAofManifest ( aofManifest * am ) {
if ( am - > dirty = = 0 ) {
return C_OK ;
}
sds amstr = getAofManifestAsString ( am ) ;
int ret = writeAofManifestFile ( amstr ) ;
sdsfree ( amstr ) ;
if ( ret = = C_OK ) am - > dirty = 0 ;
return ret ;
}
/* Called in `loadAppendOnlyFiles` when we upgrade from a old version redis.
*
* 1 ) Create AOF directory use ' server . aof_dirname ' as the name .
* 2 ) Use ' server . aof_filename ' to construct a BASE type aofInfo and add it to
* aofManifest , then persist the manifest file to AOF directory .
* 3 ) Move the old AOF file ( server . aof_filename ) to AOF directory .
*
* If any of the above steps fails or crash occurs , this will not cause any
* problems , and redis will retry the upgrade process when it restarts .
*/
void aofUpgradePrepare ( aofManifest * am ) {
serverAssert ( ! aofFileExist ( server . aof_filename ) ) ;
/* Create AOF directory use 'server.aof_dirname' as the name. */
if ( dirCreateIfMissing ( server . aof_dirname ) = = - 1 ) {
serverLog ( LL_WARNING , " Can't open or create append-only dir %s: %s " ,
server . aof_dirname , strerror ( errno ) ) ;
exit ( 1 ) ;
}
/* Manually construct a BASE type aofInfo and add it to aofManifest. */
if ( am - > base_aof_info ) aofInfoFree ( am - > base_aof_info ) ;
aofInfo * ai = aofInfoCreate ( ) ;
ai - > file_name = sdsnew ( server . aof_filename ) ;
ai - > file_seq = 1 ;
ai - > file_type = AOF_FILE_TYPE_BASE ;
am - > base_aof_info = ai ;
am - > curr_base_file_seq = 1 ;
am - > dirty = 1 ;
/* Persist the manifest file to AOF directory. */
if ( persistAofManifest ( am ) ! = C_OK ) {
exit ( 1 ) ;
}
/* Move the old AOF file to AOF directory. */
sds aof_filepath = makePath ( server . aof_dirname , server . aof_filename ) ;
if ( rename ( server . aof_filename , aof_filepath ) = = - 1 ) {
serverLog ( LL_WARNING ,
" Error trying to move the old AOF file %s into dir %s: %s " ,
server . aof_filename ,
server . aof_dirname ,
strerror ( errno ) ) ;
sdsfree ( aof_filepath ) ;
2022-02-16 15:42:04 +01:00
exit ( 1 ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
sdsfree ( aof_filepath ) ;
serverLog ( LL_NOTICE , " Successfully migrated an old-style AOF file (%s) into the AOF directory (%s). " ,
server . aof_filename , server . aof_dirname ) ;
}
/* When AOFRW success, the previous BASE and INCR AOFs will
* become HISTORY type and be moved into ' history_aof_list ' .
*
* The function will traverse the ' history_aof_list ' and submit
* the delete task to the bio thread .
*/
int aofDelHistoryFiles ( void ) {
if ( server . aof_manifest = = NULL | |
server . aof_disable_auto_gc = = 1 | |
! listLength ( server . aof_manifest - > history_aof_list ) )
{
return C_OK ;
}
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
listNode * ln ;
listIter li ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
listRewind ( server . aof_manifest - > history_aof_list , & li ) ;
while ( ( ln = listNext ( & li ) ) ! = NULL ) {
aofInfo * ai = ( aofInfo * ) ln - > value ;
serverAssert ( ai - > file_type = = AOF_FILE_TYPE_HIST ) ;
serverLog ( LL_NOTICE , " Removing the history file %s in the background " , ai - > file_name ) ;
sds aof_filepath = makePath ( server . aof_dirname , ai - > file_name ) ;
bg_unlink ( aof_filepath ) ;
sdsfree ( aof_filepath ) ;
listDelNode ( server . aof_manifest - > history_aof_list , ln ) ;
}
server . aof_manifest - > dirty = 1 ;
return persistAofManifest ( server . aof_manifest ) ;
}
2022-04-26 21:31:19 +08:00
/* Used to clean up temp INCR AOF when AOFRW fails. */
2023-05-02 17:31:32 -07:00
void aofDelTempIncrAofFile ( void ) {
2022-04-26 21:31:19 +08:00
sds aof_filename = getTempIncrAofName ( ) ;
sds aof_filepath = makePath ( server . aof_dirname , aof_filename ) ;
serverLog ( LL_NOTICE , " Removing the temp incr aof file %s in the background " , aof_filename ) ;
bg_unlink ( aof_filepath ) ;
sdsfree ( aof_filepath ) ;
sdsfree ( aof_filename ) ;
return ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Called after `loadDataFromDisk` when redis start. If `server.aof_state` is
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
* ' AOF_ON ' , It will do three things :
* 1. Force create a BASE file when redis starts with an empty dataset
* 2. Open the last opened INCR type AOF for writing , If not , create a new one
* 3. Synchronously update the manifest file to the disk
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
*
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
* If any of the above steps fails , the redis process will exit .
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
*/
void aofOpenIfNeededOnServerStart ( void ) {
if ( server . aof_state ! = AOF_ON ) {
return ;
}
serverAssert ( server . aof_manifest ! = NULL ) ;
serverAssert ( server . aof_fd = = - 1 ) ;
if ( dirCreateIfMissing ( server . aof_dirname ) = = - 1 ) {
serverLog ( LL_WARNING , " Can't open or create append-only dir %s: %s " ,
server . aof_dirname , strerror ( errno ) ) ;
exit ( 1 ) ;
}
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
/* If we start with an empty dataset, we will force create a BASE file. */
2022-05-26 22:23:05 +08:00
size_t incr_aof_len = listLength ( server . aof_manifest - > incr_aof_list ) ;
if ( ! server . aof_manifest - > base_aof_info & & ! incr_aof_len ) {
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
sds base_name = getNewBaseFileNameAndMarkPreAsHistory ( server . aof_manifest ) ;
sds base_filepath = makePath ( server . aof_dirname , base_name ) ;
if ( rewriteAppendOnlyFile ( base_filepath ) ! = C_OK ) {
exit ( 1 ) ;
}
sdsfree ( base_filepath ) ;
2022-05-26 22:23:05 +08:00
serverLog ( LL_NOTICE , " Creating AOF base file %s on server start " ,
base_name ) ;
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Because we will 'exit(1)' if open AOF or persistent manifest fails, so
* we don ' t need atomic modification here . */
sds aof_name = getLastIncrAofName ( server . aof_manifest ) ;
/* Here we should use 'O_APPEND' flag. */
sds aof_filepath = makePath ( server . aof_dirname , aof_name ) ;
server . aof_fd = open ( aof_filepath , O_WRONLY | O_APPEND | O_CREAT , 0644 ) ;
sdsfree ( aof_filepath ) ;
if ( server . aof_fd = = - 1 ) {
serverLog ( LL_WARNING , " Can't open the append-only file %s: %s " ,
aof_name , strerror ( errno ) ) ;
exit ( 1 ) ;
}
Always create base AOF file when redis start from empty. (#10102)
Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an
empty data set and `appendonly` is yes.
The reasoning is that normally, after redis is running for some time, and the AOF has gone though
a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is
kinda rare (happens only at empty startup), so this change normalizes it.
But more importantly, there are or could be some complex modules that are started with some
configuration, when they create persistence they write that configuration to RDB AUX fields, so
that can can always know with which configuration the persistence file they're loading was
created (could be critical). there is (was) one scenario in which they could load their persisted data,
and that configuration was missing, and this change fixes it.
Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to
REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-13 14:49:26 +08:00
/* Persist our changes. */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
int ret = persistAofManifest ( server . aof_manifest ) ;
if ( ret ! = C_OK ) {
exit ( 1 ) ;
}
2022-02-22 08:59:23 +02:00
server . aof_last_incr_size = getAppendOnlyFileSize ( aof_name , NULL ) ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
server . aof_last_incr_fsync_offset = server . aof_last_incr_size ;
2022-05-26 22:23:05 +08:00
if ( incr_aof_len ) {
serverLog ( LL_NOTICE , " Opening AOF incr file %s on server start " , aof_name ) ;
} else {
serverLog ( LL_NOTICE , " Creating AOF incr file %s on server start " , aof_name ) ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
int aofFileExist ( char * filename ) {
sds file_path = makePath ( server . aof_dirname , filename ) ;
int ret = fileExist ( file_path ) ;
sdsfree ( file_path ) ;
return ret ;
}
/* Called in `rewriteAppendOnlyFileBackground`. If `server.aof_state`
2022-04-26 21:31:19 +08:00
* is ' AOF_ON ' , It will do two things :
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* 1. Open a new INCR type AOF for writing
* 2. Synchronously update the manifest file to the disk
*
* The above two steps of modification are atomic , that is , if
* any step fails , the entire operation will rollback and returns
* C_ERR , and if all succeeds , it returns C_OK .
2022-04-26 21:31:19 +08:00
*
* If ` server . aof_state ` is ' AOF_WAIT_REWRITE ' , It will open a temporary INCR AOF
* file to accumulate data during AOF_WAIT_REWRITE , and it will eventually be
* renamed in the ` backgroundRewriteDoneHandler ` and written to the manifest file .
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* */
int openNewIncrAofForAppend ( void ) {
serverAssert ( server . aof_manifest ! = NULL ) ;
2022-04-26 21:31:19 +08:00
int newfd = - 1 ;
aofManifest * temp_am = NULL ;
sds new_aof_name = NULL ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Only open new INCR AOF when AOF enabled. */
if ( server . aof_state = = AOF_OFF ) return C_OK ;
/* Open new AOF. */
2022-04-26 21:31:19 +08:00
if ( server . aof_state = = AOF_WAIT_REWRITE ) {
/* Use a temporary INCR AOF file to accumulate data during AOF_WAIT_REWRITE. */
new_aof_name = getTempIncrAofName ( ) ;
} else {
/* Dup a temp aof_manifest to modify. */
temp_am = aofManifestDup ( server . aof_manifest ) ;
new_aof_name = sdsdup ( getNewIncrAofName ( temp_am ) ) ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sds new_aof_filepath = makePath ( server . aof_dirname , new_aof_name ) ;
newfd = open ( new_aof_filepath , O_WRONLY | O_TRUNC | O_CREAT , 0644 ) ;
sdsfree ( new_aof_filepath ) ;
if ( newfd = = - 1 ) {
serverLog ( LL_WARNING , " Can't open the append-only file %s: %s " ,
new_aof_name , strerror ( errno ) ) ;
2022-04-26 21:31:19 +08:00
goto cleanup ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
2022-04-26 21:31:19 +08:00
if ( temp_am ) {
/* Persist AOF Manifest. */
if ( persistAofManifest ( temp_am ) = = C_ERR ) {
goto cleanup ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
2022-05-26 22:23:05 +08:00
serverLog ( LL_NOTICE , " Creating AOF incr file %s on background rewrite " ,
new_aof_name ) ;
2022-04-26 21:31:19 +08:00
sdsfree ( new_aof_name ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* If reaches here, we can safely modify the `server.aof_manifest`
* and ` server . aof_fd ` . */
fsync the old aof file when open a new INCR AOF (#11004)
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.
Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).
When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.
Adding a argument for bioCreateCloseJob, a `need_fsync` flag to
indicate that a fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.
A cleanup, we make union become a union, since the free_* args and
the fd / fsync args are never used together.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-07-25 14:16:35 +08:00
/* fsync and close old aof_fd if needed. In fsync everysec it's ok to delay
* the fsync as long as we grantee it happens , and in fsync always the file
* is already synced at this point so fsync doesn ' t matter . */
if ( server . aof_fd ! = - 1 ) {
aof_background_fsync_and_close ( server . aof_fd ) ;
server . aof_last_fsync = server . unixtime ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_fd = newfd ;
/* Reset the aof_last_incr_size. */
server . aof_last_incr_size = 0 ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
/* Reset the aof_last_incr_fsync_offset. */
server . aof_last_incr_fsync_offset = 0 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Update `server.aof_manifest`. */
2022-04-26 21:31:19 +08:00
if ( temp_am ) aofManifestFreeAndUpdate ( temp_am ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
return C_OK ;
2022-04-26 21:31:19 +08:00
cleanup :
if ( new_aof_name ) sdsfree ( new_aof_name ) ;
if ( newfd ! = - 1 ) close ( newfd ) ;
if ( temp_am ) aofManifestFree ( temp_am ) ;
return C_ERR ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
/* Whether to limit the execution of Background AOF rewrite.
*
* At present , if AOFRW fails , redis will automatically retry . If it continues
* to fail , we may get a lot of very small INCR files . so we need an AOFRW
* limiting measure .
*
* We can ' t directly use ` server . aof_current_size ` and ` server . aof_last_incr_size ` ,
* because there may be no new writes after AOFRW fails .
*
* So , we use time delay to achieve our goal . When AOFRW fails , we delay the execution
* of the next AOFRW by 1 minute . If the next AOFRW also fails , it will be delayed by 2
* minutes . The next is 4 , 8 , 16 , the maximum delay is 60 minutes ( 1 hour ) .
*
* During the limit period , we can still use the ' bgrewriteaof ' command to execute AOFRW
* immediately .
*
* Return 1 means that AOFRW is limited and cannot be executed . 0 means that we can execute
* AOFRW , which may be that we have reached the ' next_rewrite_time ' or the number of INCR
* AOFs has not reached the limit threshold .
* */
# define AOF_REWRITE_LIMITE_THRESHOLD 3
2022-04-04 13:38:41 +08:00
# define AOF_REWRITE_LIMITE_MAX_MINUTES 60 /* 1 hour */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
int aofRewriteLimited ( void ) {
2022-04-19 17:06:39 +08:00
static int next_delay_minutes = 0 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
static time_t next_rewrite_time = 0 ;
2022-04-26 21:31:19 +08:00
if ( server . stat_aofrw_consecutive_failures < AOF_REWRITE_LIMITE_THRESHOLD ) {
2022-04-19 17:06:39 +08:00
/* We may be recovering from limited state, so reset all states. */
next_delay_minutes = 0 ;
next_rewrite_time = 0 ;
return 0 ;
}
2022-04-26 21:31:19 +08:00
2022-04-19 17:06:39 +08:00
/* if it is in the limiting state, then check if the next_rewrite_time is reached */
if ( next_rewrite_time ! = 0 ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( server . unixtime < next_rewrite_time ) {
2022-04-19 17:06:39 +08:00
return 1 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
} else {
2022-04-19 17:06:39 +08:00
next_rewrite_time = 0 ;
return 0 ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
}
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
2022-04-19 17:06:39 +08:00
next_delay_minutes = ( next_delay_minutes = = 0 ) ? 1 : ( next_delay_minutes * 2 ) ;
if ( next_delay_minutes > AOF_REWRITE_LIMITE_MAX_MINUTES ) {
next_delay_minutes = AOF_REWRITE_LIMITE_MAX_MINUTES ;
}
next_rewrite_time = server . unixtime + next_delay_minutes * 60 ;
serverLog ( LL_WARNING ,
" Background AOF rewrite has repeatedly failed and triggered the limit, will retry in %d minutes " , next_delay_minutes ) ;
return 1 ;
Allow an AOF rewrite buffer > 2GB (Fix for issue #504).
During the AOF rewrite process, the parent process needs to accumulate
the new writes in an in-memory buffer: when the child will terminate the
AOF rewriting process this buffer (that ist the difference between the
dataset when the rewrite was started, and the current dataset) is
flushed to the new AOF file.
We used to implement this buffer using an sds.c string, but sds.c has a
2GB limit. Sometimes the dataset can be big enough, the amount of writes
so high, and the rewrite process slow enough that we overflow the 2GB
limit, causing a crash, documented on github by issue #504.
In order to prevent this from happening, this commit introduces a new
system to accumulate writes, implemented by a linked list of blocks of
10 MB each, so that we also avoid paying the reallocation cost.
Note that theoretically modern operating systems may implement realloc()
simply as a remaping of the old pages, thus with very good performances,
see for instance the mremap() syscall on Linux. However this is not
always true, and jemalloc by default avoids doing this because there are
issues with the current implementation of mremap().
For this reason we are using a linked list of blocks instead of a single
block that gets reallocated again and again.
The changes in this commit lacks testing, that will be performed before
merging into the unstable branch. This fix will not enter 2.4 because it
is too invasive. However 2.4 will log a warning when the AOF rewrite
buffer is near to the 2GB limit.
2012-05-22 13:03:41 +02:00
}
2012-05-21 16:50:05 +02:00
/* ----------------------------------------------------------------------------
* AOF file implementation
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
2019-04-29 14:38:28 +08:00
/* Return true if an AOf fsync is currently already in progress in a
* BIO thread . */
int aofFsyncInProgress ( void ) {
fsync the old aof file when open a new INCR AOF (#11004)
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.
Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).
When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.
Adding a argument for bioCreateCloseJob, a `need_fsync` flag to
indicate that a fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.
A cleanup, we make union become a union, since the free_* args and
the fd / fsync args are never used together.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-07-25 14:16:35 +08:00
/* Note that we don't care about aof_background_fsync_and_close because
* server . aof_fd has been replaced by the new INCR AOF file fd ,
* see openNewIncrAofForAppend . */
2019-04-29 14:38:28 +08:00
return bioPendingJobsOfType ( BIO_AOF_FSYNC ) ! = 0 ;
}
2012-05-21 16:50:05 +02:00
/* Starts a background task that performs fsync() against the specified
* file descriptor ( the one of the AOF file ) in another thread . */
2011-09-16 11:08:39 +02:00
void aof_background_fsync ( int fd ) {
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
bioCreateFsyncJob ( fd , server . master_repl_offset , 1 ) ;
2011-09-16 11:08:39 +02:00
}
fsync the old aof file when open a new INCR AOF (#11004)
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.
Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).
When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.
Adding a argument for bioCreateCloseJob, a `need_fsync` flag to
indicate that a fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.
A cleanup, we make union become a union, since the free_* args and
the fd / fsync args are never used together.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-07-25 14:16:35 +08:00
/* Close the fd on the basis of aof_background_fsync. */
void aof_background_fsync_and_close ( int fd ) {
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
bioCreateCloseAofJob ( fd , server . master_repl_offset , 1 ) ;
fsync the old aof file when open a new INCR AOF (#11004)
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.
Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).
When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.
Adding a argument for bioCreateCloseJob, a `need_fsync` flag to
indicate that a fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.
A cleanup, we make union become a union, since the free_* args and
the fd / fsync args are never used together.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-07-25 14:16:35 +08:00
}
2017-03-07 20:58:54 +02:00
/* Kills an AOFRW child process if exists */
2019-01-21 11:28:44 +01:00
void killAppendOnlyChild ( void ) {
2017-03-07 20:58:54 +02:00
int statloc ;
/* No AOFRW child? return. */
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
if ( server . child_type ! = CHILD_TYPE_AOF ) return ;
2017-03-07 20:58:54 +02:00
/* Kill AOFRW child, wait for child exit. */
serverLog ( LL_NOTICE , " Killing running AOF rewrite child: %ld " ,
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
( long ) server . child_pid ) ;
if ( kill ( server . child_pid , SIGUSR1 ) ! = - 1 ) {
2021-03-24 08:41:05 -07:00
while ( waitpid ( - 1 , & statloc , 0 ) ! = server . child_pid ) ;
2017-03-07 20:58:54 +02:00
}
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
aofRemoveTempFile ( server . child_pid ) ;
resetChildState ( ) ;
2017-03-07 20:58:54 +02:00
server . aof_rewrite_time_start = - 1 ;
}
2010-06-22 00:07:48 +02:00
/* Called when the user switches from "appendonly yes" to "appendonly no"
* at runtime using the CONFIG command . */
void stopAppendOnly ( void ) {
2015-07-27 09:41:48 +02:00
serverAssert ( server . aof_state ! = AOF_OFF ) ;
2011-09-16 12:35:12 +02:00
flushAppendOnlyFile ( 1 ) ;
Handle remaining fsync errors (#8419)
In `aof.c`, we call fsync when stop aof, and now print a log to let user know that if fail.
In `cluster.c`, we now return error, the calling function already handles these write errors.
In `redis-cli.c`, users hope to save rdb, we now print a message if fsync failed.
In `rio.c`, we now treat fsync errors like we do for write errors.
In `server.c`, we try to fsync aof file when shutdown redis, we only can print one log if fail.
In `bio.c`, if failing to fsync aof file, we will set `aof_bio_fsync_status` to error , and reject writing just like last writing aof error, moreover also set INFO command field `aof_last_write_status` to error.
2021-04-01 17:45:15 +08:00
if ( redis_fsync ( server . aof_fd ) = = - 1 ) {
serverLog ( LL_WARNING , " Fail to fsync the AOF file: %s " , strerror ( errno ) ) ;
} else {
server . aof_last_fsync = server . unixtime ;
}
2011-12-21 12:17:02 +01:00
close ( server . aof_fd ) ;
2010-06-22 00:07:48 +02:00
2011-12-21 12:17:02 +01:00
server . aof_fd = - 1 ;
server . aof_selected_db = - 1 ;
2015-07-27 09:41:48 +02:00
server . aof_state = AOF_OFF ;
2020-02-06 10:17:34 +02:00
server . aof_rewrite_scheduled = 0 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_last_incr_size = 0 ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
server . aof_last_incr_fsync_offset = 0 ;
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
server . fsynced_reploff = - 1 ;
atomicSet ( server . fsynced_reploff_pending , 0 ) ;
2017-03-07 20:58:54 +02:00
killAppendOnlyChild ( ) ;
AOF: recover from last write error after turn on appendonly again (#8030)
The key point is how to recover from last AOF write error, for example:
1. start redis with appendonly yes, and append some write commands
2. short write or something else error happen, `server.aof_last_write_status` changed to `C_ERR`, now redis doesn't accept write commands
3. execute `CONFIG SET appendonly no` to avoid the above problem, now redis can accept write commands again
4. disk error resolved, and execute `CONFIG SET appendonly yes` to reopen AOF, but `server.aof_last_write_status` cannot be changed to `C_OK` (if background aof rewrite run less then 1 second, it will free `server.aof_buf` and then serverCron cannot fix `aof_last_write_status`), then redis cannot accept write commands forever.
This PR use a simple way to fix it:
1. just free `server.aof_buf` when stop appendonly to save memory, if error happens in `flushAppendOnlyFile(1)`, the `server.aof_buf` may contains some data which has not be written to aof, I think we can ignore it because we turn off the appendonly.
2. reset fsync status after stop appendonly and call `flushAppendOnlyFile` only when `aof_state` is ON
3. reset `server.last_write_status` when reopen aof to accept write commands
2021-01-29 14:35:10 +08:00
sdsfree ( server . aof_buf ) ;
server . aof_buf = sdsempty ( ) ;
2010-06-22 00:07:48 +02:00
}
/* Called when the user switches from "appendonly no" to "appendonly yes"
* at runtime using the CONFIG command . */
int startAppendOnly ( void ) {
2015-07-27 09:41:48 +02:00
serverAssert ( server . aof_state = = AOF_OFF ) ;
2016-02-15 16:14:56 +01:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_state = AOF_WAIT_REWRITE ;
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
if ( hasActiveChildProcess ( ) & & server . child_type ! = CHILD_TYPE_AOF ) {
2016-07-21 18:34:53 +02:00
server . aof_rewrite_scheduled = 1 ;
2023-02-19 22:33:19 +08:00
serverLog ( LL_NOTICE , " AOF was enabled but there is already another background operation. An AOF background was scheduled to start when possible. " ) ;
2022-01-04 12:37:47 +01:00
} else if ( server . in_exec ) {
server . aof_rewrite_scheduled = 1 ;
2023-02-19 22:33:19 +08:00
serverLog ( LL_NOTICE , " AOF was enabled during a transaction. An AOF background was scheduled to start when possible. " ) ;
2017-03-07 20:58:54 +02:00
} else {
2018-02-13 15:43:26 +01:00
/* If there is a pending AOF rewrite, we need to switch it off and
2018-07-01 13:24:50 +08:00
* start a new one : the old one cannot be reused because it is not
2018-02-13 15:43:26 +01:00
* accumulating the AOF buffer . */
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
if ( server . child_type = = CHILD_TYPE_AOF ) {
2023-02-19 22:33:19 +08:00
serverLog ( LL_NOTICE , " AOF was enabled but there is already an AOF rewriting in background. Stopping background AOF and starting a rewrite now. " ) ;
2017-03-07 20:58:54 +02:00
killAppendOnlyChild ( ) ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
2017-03-07 20:58:54 +02:00
if ( rewriteAppendOnlyFileBackground ( ) = = C_ERR ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_state = AOF_OFF ;
2017-03-07 20:58:54 +02:00
serverLog ( LL_WARNING , " Redis needs to enable the AOF but can't trigger a background AOF rewrite operation. Check the above logs for more info about the error. " ) ;
return C_ERR ;
}
2010-06-22 00:07:48 +02:00
}
2017-12-15 11:06:58 +08:00
server . aof_last_fsync = server . unixtime ;
Handle remaining fsync errors (#8419)
In `aof.c`, we call fsync when stop aof, and now print a log to let user know that if fail.
In `cluster.c`, we now return error, the calling function already handles these write errors.
In `redis-cli.c`, users hope to save rdb, we now print a message if fsync failed.
In `rio.c`, we now treat fsync errors like we do for write errors.
In `server.c`, we try to fsync aof file when shutdown redis, we only can print one log if fail.
In `bio.c`, if failing to fsync aof file, we will set `aof_bio_fsync_status` to error , and reject writing just like last writing aof error, moreover also set INFO command field `aof_last_write_status` to error.
2021-04-01 17:45:15 +08:00
/* If AOF fsync error in bio job, we just ignore it and log the event. */
int aof_bio_fsync_status ;
atomicGet ( server . aof_bio_fsync_status , aof_bio_fsync_status ) ;
if ( aof_bio_fsync_status = = C_ERR ) {
serverLog ( LL_WARNING ,
" AOF reopen, just ignore the AOF fsync error in bio job " ) ;
atomicSet ( server . aof_bio_fsync_status , C_OK ) ;
}
AOF: recover from last write error after turn on appendonly again (#8030)
The key point is how to recover from last AOF write error, for example:
1. start redis with appendonly yes, and append some write commands
2. short write or something else error happen, `server.aof_last_write_status` changed to `C_ERR`, now redis doesn't accept write commands
3. execute `CONFIG SET appendonly no` to avoid the above problem, now redis can accept write commands again
4. disk error resolved, and execute `CONFIG SET appendonly yes` to reopen AOF, but `server.aof_last_write_status` cannot be changed to `C_OK` (if background aof rewrite run less then 1 second, it will free `server.aof_buf` and then serverCron cannot fix `aof_last_write_status`), then redis cannot accept write commands forever.
This PR use a simple way to fix it:
1. just free `server.aof_buf` when stop appendonly to save memory, if error happens in `flushAppendOnlyFile(1)`, the `server.aof_buf` may contains some data which has not be written to aof, I think we can ignore it because we turn off the appendonly.
2. reset fsync status after stop appendonly and call `flushAppendOnlyFile` only when `aof_state` is ON
3. reset `server.last_write_status` when reopen aof to accept write commands
2021-01-29 14:35:10 +08:00
/* If AOF was in error state, we just ignore it and log the event. */
if ( server . aof_last_write_status = = C_ERR ) {
serverLog ( LL_WARNING , " AOF reopen, just ignore the last error. " ) ;
server . aof_last_write_status = C_OK ;
}
2015-07-26 23:17:55 +02:00
return C_OK ;
2010-06-22 00:07:48 +02:00
}
2017-12-14 12:19:30 +01:00
/* This is a wrapper to the write syscall in order to retry on short writes
* or if the syscall gets interrupted . It could look strange that we retry
* on short writes given that we are writing to a block device : normally if
* the first call is short , there is a end - of - space condition , so the next
* is likely to fail . However apparently in modern systems this is no longer
* true , and in general it looks just more resilient to retry the write . If
* there is an actual error condition we ' ll get it at the next try . */
ssize_t aofWrite ( int fd , const char * buf , size_t len ) {
2017-11-30 10:22:12 +08:00
ssize_t nwritten = 0 , totwritten = 0 ;
while ( len ) {
nwritten = write ( fd , buf , len ) ;
if ( nwritten < 0 ) {
2019-07-12 12:18:33 +02:00
if ( errno = = EINTR ) continue ;
2017-11-30 10:22:12 +08:00
return totwritten ? totwritten : - 1 ;
}
len - = nwritten ;
buf + = nwritten ;
totwritten + = nwritten ;
}
return totwritten ;
}
2010-06-22 00:07:48 +02:00
/* Write the append only file buffer on disk.
*
* Since we are required to write the AOF before replying to the client ,
2021-06-07 00:12:25 -07:00
* and the only way the client socket can get a write is entering when
2010-06-22 00:07:48 +02:00
* the event loop , we accumulate all the AOF writes in a memory
* buffer and write it on disk using this function just before entering
2011-09-16 12:35:12 +02:00
* the event loop again .
*
* About the ' force ' argument :
*
* When the fsync policy is set to ' everysec ' we may delay the flush if there
* is still an fsync ( ) going on in the background thread , since for instance
* on Linux write ( 2 ) will be blocked by the background fsync anyway .
* When this happens we remember that there is some aof buffer to be
* flushed ASAP , and will try to do that in the serverCron ( ) function .
*
* However if force is set to 1 we ' ll write regardless of the background
* fsync . */
2014-02-12 12:47:10 +01:00
# define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
2011-09-16 12:35:12 +02:00
void flushAppendOnlyFile ( int force ) {
2010-06-22 00:07:48 +02:00
ssize_t nwritten ;
2011-09-16 12:35:12 +02:00
int sync_in_progress = 0 ;
2014-07-01 17:19:08 +02:00
mstime_t latency ;
2010-06-22 00:07:48 +02:00
2019-04-29 14:38:28 +08:00
if ( sdslen ( server . aof_buf ) = = 0 ) {
/* Check if we need to do fsync even the aof buffer is empty,
* because previously in AOF_FSYNC_EVERYSEC mode , fsync is
* called only when aof buffer is not empty , so if users
* stop write commands before fsync called in one second ,
* the data in page cache cannot be flushed in time . */
if ( server . aof_fsync = = AOF_FSYNC_EVERYSEC & &
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
server . aof_last_incr_fsync_offset ! = server . aof_last_incr_size & &
2019-04-29 14:38:28 +08:00
server . unixtime > server . aof_last_fsync & &
! ( sync_in_progress = aofFsyncInProgress ( ) ) ) {
goto try_fsync ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
/* Check if we need to do fsync even the aof buffer is empty,
* the reason is described in the previous AOF_FSYNC_EVERYSEC block ,
* and AOF_FSYNC_ALWAYS is also checked here to handle a case where
* aof_fsync is changed from everysec to always . */
} else if ( server . aof_fsync = = AOF_FSYNC_ALWAYS & &
server . aof_last_incr_fsync_offset ! = server . aof_last_incr_size )
{
goto try_fsync ;
2019-04-29 14:38:28 +08:00
} else {
return ;
}
}
2010-06-22 00:07:48 +02:00
2011-12-21 11:58:42 +01:00
if ( server . aof_fsync = = AOF_FSYNC_EVERYSEC )
2019-04-29 14:38:28 +08:00
sync_in_progress = aofFsyncInProgress ( ) ;
2011-09-16 12:35:12 +02:00
2011-12-21 11:58:42 +01:00
if ( server . aof_fsync = = AOF_FSYNC_EVERYSEC & & ! force ) {
2011-09-16 12:35:12 +02:00
/* With this append fsync policy we do background fsyncing.
* If the fsync is still in progress we can try to delay
* the write for a couple of seconds . */
if ( sync_in_progress ) {
if ( server . aof_flush_postponed_start = = 0 ) {
2014-07-12 18:02:17 +02:00
/* No previous write postponing, remember that we are
2011-09-16 12:35:12 +02:00
* postponing the flush and return . */
server . aof_flush_postponed_start = server . unixtime ;
return ;
} else if ( server . unixtime - server . aof_flush_postponed_start < 2 ) {
2011-09-19 17:49:50 +02:00
/* We were already waiting for fsync to finish, but for less
2011-09-16 12:35:12 +02:00
* than two seconds this is still ok . Postpone again . */
return ;
}
2021-06-10 20:39:33 +08:00
/* Otherwise fall through, and go write since we can't wait
2011-09-16 12:35:12 +02:00
* over two seconds . */
2012-03-25 11:27:35 +02:00
server . aof_delayed_fsync + + ;
2015-07-27 09:41:48 +02:00
serverLog ( LL_NOTICE , " Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis. " ) ;
2011-09-16 12:35:12 +02:00
}
}
2010-06-22 00:07:48 +02:00
/* We want to perform a single write. This should be guaranteed atomic
* at least if the filesystem we are writing is a real physical one .
* While this will save us against the server being killed I don ' t think
* there is much to do about the whole server stopping for power problems
* or alike */
2014-07-02 16:45:45 +02:00
2019-08-19 12:18:25 +03:00
if ( server . aof_flush_sleep & & sdslen ( server . aof_buf ) ) {
usleep ( server . aof_flush_sleep ) ;
}
2014-07-01 17:19:08 +02:00
latencyStartMonitor ( latency ) ;
2017-12-14 12:19:30 +01:00
nwritten = aofWrite ( server . aof_fd , server . aof_buf , sdslen ( server . aof_buf ) ) ;
2014-07-01 17:19:08 +02:00
latencyEndMonitor ( latency ) ;
2014-07-02 16:45:45 +02:00
/* We want to capture different events for delayed writes:
* when the delay happens with a pending fsync , or with a saving child
* active , and when the above two conditions are missing .
* We also use an additional event name to save all samples which is
* useful for graphing / monitoring purposes . */
2014-07-02 17:42:29 +02:00
if ( sync_in_progress ) {
2014-07-02 16:45:45 +02:00
latencyAddSampleIfNeeded ( " aof-write-pending-fsync " , latency ) ;
2019-09-27 12:03:09 +02:00
} else if ( hasActiveChildProcess ( ) ) {
2014-07-02 16:45:45 +02:00
latencyAddSampleIfNeeded ( " aof-write-active-child " , latency ) ;
} else {
latencyAddSampleIfNeeded ( " aof-write-alone " , latency ) ;
}
2014-07-01 17:19:08 +02:00
latencyAddSampleIfNeeded ( " aof-write " , latency ) ;
2014-07-02 16:45:45 +02:00
/* We performed the write so reset the postponed flush sentinel to zero. */
server . aof_flush_postponed_start = 0 ;
2017-11-30 10:27:12 +08:00
if ( nwritten ! = ( ssize_t ) sdslen ( server . aof_buf ) ) {
2014-02-12 12:47:10 +01:00
static time_t last_write_error_log = 0 ;
int can_log = 0 ;
/* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
if ( ( server . unixtime - last_write_error_log ) > AOF_WRITE_LOG_ERROR_RATE ) {
can_log = 1 ;
last_write_error_log = server . unixtime ;
}
2014-07-12 18:02:17 +02:00
/* Log the AOF write error and record the error code. */
2011-08-18 12:27:34 +02:00
if ( nwritten = = - 1 ) {
2014-02-12 12:47:10 +01:00
if ( can_log ) {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Error writing to the AOF file: %s " ,
2014-02-12 12:47:10 +01:00
strerror ( errno ) ) ;
}
2022-07-03 13:34:39 +08:00
server . aof_last_write_errno = errno ;
2011-08-18 12:27:34 +02:00
} else {
2014-02-12 12:47:10 +01:00
if ( can_log ) {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Short write while writing to "
2014-02-12 12:47:10 +01:00
" the AOF file: (nwritten=%lld, "
" expected=%lld) " ,
( long long ) nwritten ,
( long long ) sdslen ( server . aof_buf ) ) ;
}
2012-07-16 15:33:25 +10:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( ftruncate ( server . aof_fd , server . aof_last_incr_size ) = = - 1 ) {
2014-02-12 12:47:10 +01:00
if ( can_log ) {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Could not remove short write "
2014-02-12 12:47:10 +01:00
" from the append-only file. Redis may refuse "
" to load the AOF the next time it starts. "
" ftruncate: %s " , strerror ( errno ) ) ;
}
} else {
2014-07-12 18:02:17 +02:00
/* If the ftruncate() succeeded we can set nwritten to
2014-02-12 12:47:10 +01:00
* - 1 since there is no longer partial data into the AOF . */
nwritten = - 1 ;
2012-07-16 15:33:25 +10:00
}
2014-02-12 12:47:10 +01:00
server . aof_last_write_errno = ENOSPC ;
}
/* Handle the AOF write error. */
if ( server . aof_fsync = = AOF_FSYNC_ALWAYS ) {
2021-01-28 22:25:35 +08:00
/* We can't recover when the fsync policy is ALWAYS since the reply
* for the client is already in the output buffers ( both writes and
* reads ) , and the changes to the db can ' t be rolled back . Since we
* have a contract with the user that on acknowledged or observed
* writes are is synced on disk , we must exit . */
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting... " ) ;
2014-02-12 12:47:10 +01:00
exit ( 1 ) ;
} else {
/* Recover from failed write leaving data into the buffer. However
* set an error to stop accepting writes as long as the error
* condition is not cleared . */
2015-07-26 23:17:55 +02:00
server . aof_last_write_status = C_ERR ;
2014-02-12 12:47:10 +01:00
/* Trim the sds buffer if there was a partial write, and there
* was no way to undo it with ftruncate ( 2 ) . */
if ( nwritten > 0 ) {
server . aof_current_size + = nwritten ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_last_incr_size + = nwritten ;
2014-02-12 12:47:10 +01:00
sdsrange ( server . aof_buf , nwritten , - 1 ) ;
}
return ; /* We'll try again on the next call... */
}
} else {
/* Successful write(2). If AOF was in error state, restore the
* OK state and log the event . */
2015-07-26 23:17:55 +02:00
if ( server . aof_last_write_status = = C_ERR ) {
2023-02-19 22:33:19 +08:00
serverLog ( LL_NOTICE ,
2014-02-12 12:47:10 +01:00
" AOF write error looks solved, Redis can write again. " ) ;
2015-07-26 23:17:55 +02:00
server . aof_last_write_status = C_OK ;
2011-08-18 12:27:34 +02:00
}
2010-06-22 00:07:48 +02:00
}
2011-12-21 11:58:42 +01:00
server . aof_current_size + = nwritten ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_last_incr_size + = nwritten ;
2010-06-22 00:07:48 +02:00
2011-08-18 12:44:30 +02:00
/* Re-use AOF buffer when it is small enough. The maximum comes from the
* arena size of 4 k minus some overhead ( but is otherwise arbitrary ) . */
2011-12-21 12:17:02 +01:00
if ( ( sdslen ( server . aof_buf ) + sdsavail ( server . aof_buf ) ) < 4000 ) {
sdsclear ( server . aof_buf ) ;
2011-08-18 12:44:30 +02:00
} else {
2011-12-21 12:17:02 +01:00
sdsfree ( server . aof_buf ) ;
server . aof_buf = sdsempty ( ) ;
2011-08-18 12:44:30 +02:00
}
2019-04-29 14:38:28 +08:00
try_fsync :
2011-08-18 12:25:59 +02:00
/* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
* children doing I / O in the background . */
2019-09-27 12:03:09 +02:00
if ( server . aof_no_fsync_on_rewrite & & hasActiveChildProcess ( ) )
2019-07-17 08:51:02 +03:00
return ;
2011-08-18 12:25:59 +02:00
/* Perform the fsync if needed. */
2011-12-21 11:58:42 +01:00
if ( server . aof_fsync = = AOF_FSYNC_ALWAYS ) {
2018-03-16 00:44:50 +08:00
/* redis_fsync is defined as fdatasync() for Linux in order to avoid
2010-06-22 00:07:48 +02:00
* flushing metadata . */
2014-07-01 17:19:08 +02:00
latencyStartMonitor ( latency ) ;
2021-01-28 22:25:35 +08:00
/* Let's try to get this data on the disk. To guarantee data safe when
* the AOF fsync policy is ' always ' , we should exit if failed to fsync
* AOF ( see comment next to the exit ( 1 ) after write error above ) . */
if ( redis_fsync ( server . aof_fd ) = = - 1 ) {
serverLog ( LL_WARNING , " Can't persist AOF for fsync error when the "
" AOF fsync policy is 'always': %s. Exiting... " , strerror ( errno ) ) ;
exit ( 1 ) ;
}
2014-07-01 17:19:08 +02:00
latencyEndMonitor ( latency ) ;
latencyAddSampleIfNeeded ( " aof-fsync-always " , latency ) ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
server . aof_last_incr_fsync_offset = server . aof_last_incr_size ;
2011-12-21 12:17:02 +01:00
server . aof_last_fsync = server . unixtime ;
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
atomicSet ( server . fsynced_reploff_pending , server . master_repl_offset ) ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
} else if ( server . aof_fsync = = AOF_FSYNC_EVERYSEC & &
server . unixtime > server . aof_last_fsync ) {
2019-04-29 14:38:28 +08:00
if ( ! sync_in_progress ) {
aof_background_fsync ( server . aof_fd ) ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
server . aof_last_incr_fsync_offset = server . aof_last_incr_size ;
2019-04-29 14:38:28 +08:00
}
2011-12-21 12:17:02 +01:00
server . aof_last_fsync = server . unixtime ;
2010-06-22 00:07:48 +02:00
}
}
2011-08-18 13:03:04 +02:00
sds catAppendOnlyGenericCommand ( sds dst , int argc , robj * * argv ) {
char buf [ 32 ] ;
int len , j ;
robj * o ;
buf [ 0 ] = ' * ' ;
len = 1 + ll2string ( buf + 1 , sizeof ( buf ) - 1 , argc ) ;
buf [ len + + ] = ' \r ' ;
buf [ len + + ] = ' \n ' ;
dst = sdscatlen ( dst , buf , len ) ;
2010-06-22 00:07:48 +02:00
for ( j = 0 ; j < argc ; j + + ) {
2011-08-18 13:03:04 +02:00
o = getDecodedObject ( argv [ j ] ) ;
buf [ 0 ] = ' $ ' ;
len = 1 + ll2string ( buf + 1 , sizeof ( buf ) - 1 , sdslen ( o - > ptr ) ) ;
buf [ len + + ] = ' \r ' ;
buf [ len + + ] = ' \n ' ;
dst = sdscatlen ( dst , buf , len ) ;
dst = sdscatlen ( dst , o - > ptr , sdslen ( o - > ptr ) ) ;
dst = sdscatlen ( dst , " \r \n " , 2 ) ;
2010-06-22 00:07:48 +02:00
decrRefCount ( o ) ;
}
2011-08-18 13:03:04 +02:00
return dst ;
2010-06-22 00:07:48 +02:00
}
2021-10-25 18:08:34 +08:00
/* Generate a piece of timestamp annotation for AOF if current record timestamp
* in AOF is not equal server unix time . If we specify ' force ' argument to 1 ,
* we would generate one without check , currently , it is useful in AOF rewriting
* child process which always needs to record one timestamp at the beginning of
* rewriting AOF .
*
* Timestamp annotation format is " #TS:${timestamp} \r \n " . " TS " is short of
* timestamp and this method could save extra bytes in AOF . */
sds genAofTimestampAnnotationIfNeeded ( int force ) {
sds ts = NULL ;
if ( force | | server . aof_cur_timestamp < server . unixtime ) {
server . aof_cur_timestamp = force ? time ( NULL ) : server . unixtime ;
ts = sdscatfmt ( sdsempty ( ) , " #TS:%I \r \n " , server . aof_cur_timestamp ) ;
serverAssert ( sdslen ( ts ) < = AOF_ANNOTATION_LINE_MAX_LEN ) ;
}
return ts ;
}
Fix replication inconsistency on modules that uses key space notifications (#10969)
Fix replication inconsistency on modules that uses key space notifications.
### The Problem
In general, key space notifications are invoked after the command logic was
executed (this is not always the case, we will discuss later about specific
command that do not follow this rules). For example, the `set x 1` will trigger
a `set` notification that will be invoked after the `set` logic was performed, so
if the notification logic will try to fetch `x`, it will see the new data that was written.
Consider the scenario on which the notification logic performs some write
commands. for example, the notification logic increase some counter,
`incr x{counter}`, indicating how many times `x` was changed.
The logical order by which the logic was executed is has follow:
```
set x 1
incr x{counter}
```
The issue is that the `set x 1` command is added to the replication buffer
at the end of the command invocation (specifically after the key space
notification logic was invoked and performed the `incr` command).
The replication/aof sees the commands in the wrong order:
```
incr x{counter}
set x 1
```
In this specific example the order is less important.
But if, for example, the notification would have deleted `x` then we would
end up with primary-replica inconsistency.
### The Solution
Put the command that cause the notification in its rightful place. In the
above example, the `set x 1` command logic was executed before the
notification logic, so it should be added to the replication buffer before
the commands that is invoked by the notification logic. To achieve this,
without a major code refactoring, we save a placeholder in the replication
buffer, when finishing invoking the command logic we check if the command
need to be replicated, and if it does, we use the placeholder to add it to the
replication buffer instead of appending it to the end.
To be efficient and not allocating memory on each command to save the
placeholder, the replication buffer array was modified to reuse memory
(instead of allocating it each time we want to replicate commands).
Also, to avoid saving a placeholder when not needed, we do it only for
WRITE or MAY_REPLICATE commands.
#### Additional Fixes
* Expire and Eviction notifications:
* Expire/Eviction logical order was to first perform the Expire/Eviction
and then the notification logic. The replication buffer got this in the
other way around (first notification effect and then the `del` command).
The PR fixes this issue.
* The notification effect and the `del` command was not wrap with
`multi-exec` (if needed). The PR also fix this issue.
* SPOP command:
* On spop, the `spop` notification was fired before the command logic
was executed. The change in this PR would have cause the replication
order to be change (first `spop` command and then notification `logic`)
although the logical order is first the notification logic and then the
`spop` logic. The right fix would have been to move the notification to
be fired after the command was executed (like all the other commands),
but this can be considered a breaking change. To overcome this, the PR
keeps the current behavior and changes the `spop` code to keep the right
logical order when pushing commands to the replication buffer. Another PR
will follow to fix the SPOP properly and match it to the other command (we
split it to 2 separate PR's so it will be easy to cherry-pick this PR to 7.0 if
we chose to).
#### Unhanded Known Limitations
* key miss event:
* On key miss event, if a module performed some write command on the
event (using `RM_Call`), the `dirty` counter would increase and the read
command that cause the key miss event would be replicated to the replication
and aof. This problem can also happened on a write command that open
some keys but eventually decides not to perform any action. We decided
not to handle this problem on this PR because the solution is complex
and will cause additional risks in case we will want to cherry-pick this PR.
We should decide if we want to handle it in future PR's. For now, modules
writers is advice not to perform any write commands on key miss event.
#### Testing
* We already have tests to cover cases where a notification is invoking write
commands that are also added to the replication buffer, the tests was modified
to verify that the replica gets the command in the correct logical order.
* Test was added to verify that `spop` behavior was kept unchanged.
* Test was added to verify key miss event behave as expected.
* Test was added to verify the changes do not break lazy expiration.
#### Additional Changes
* `propagateNow` function can accept a special dbid, -1, indicating not
to replicate `select`. We use this to replicate `multi/exec` on `propagatePendingCommands`
function. The side effect of this change is that now the `select` command
will appear inside the `multi/exec` block on the replication stream (instead of
outside of the `multi/exec` block). Tests was modified to match this new behavior.
2022-08-18 10:16:32 +03:00
/* Write the given command to the aof file.
* dictid - dictionary id the command should be applied to ,
* this is used in order to decide if a ` select ` command
* should also be written to the aof . Value of - 1 means
* to avoid writing ` select ` command in any case .
* argv - The command to write to the aof .
* argc - Number of values in argv
*/
2021-05-29 23:20:32 -07:00
void feedAppendOnlyFile ( int dictid , robj * * argv , int argc ) {
2011-12-15 20:03:28 +01:00
sds buf = sdsempty ( ) ;
2021-10-25 18:08:34 +08:00
Fix replication inconsistency on modules that uses key space notifications (#10969)
Fix replication inconsistency on modules that uses key space notifications.
### The Problem
In general, key space notifications are invoked after the command logic was
executed (this is not always the case, we will discuss later about specific
command that do not follow this rules). For example, the `set x 1` will trigger
a `set` notification that will be invoked after the `set` logic was performed, so
if the notification logic will try to fetch `x`, it will see the new data that was written.
Consider the scenario on which the notification logic performs some write
commands. for example, the notification logic increase some counter,
`incr x{counter}`, indicating how many times `x` was changed.
The logical order by which the logic was executed is has follow:
```
set x 1
incr x{counter}
```
The issue is that the `set x 1` command is added to the replication buffer
at the end of the command invocation (specifically after the key space
notification logic was invoked and performed the `incr` command).
The replication/aof sees the commands in the wrong order:
```
incr x{counter}
set x 1
```
In this specific example the order is less important.
But if, for example, the notification would have deleted `x` then we would
end up with primary-replica inconsistency.
### The Solution
Put the command that cause the notification in its rightful place. In the
above example, the `set x 1` command logic was executed before the
notification logic, so it should be added to the replication buffer before
the commands that is invoked by the notification logic. To achieve this,
without a major code refactoring, we save a placeholder in the replication
buffer, when finishing invoking the command logic we check if the command
need to be replicated, and if it does, we use the placeholder to add it to the
replication buffer instead of appending it to the end.
To be efficient and not allocating memory on each command to save the
placeholder, the replication buffer array was modified to reuse memory
(instead of allocating it each time we want to replicate commands).
Also, to avoid saving a placeholder when not needed, we do it only for
WRITE or MAY_REPLICATE commands.
#### Additional Fixes
* Expire and Eviction notifications:
* Expire/Eviction logical order was to first perform the Expire/Eviction
and then the notification logic. The replication buffer got this in the
other way around (first notification effect and then the `del` command).
The PR fixes this issue.
* The notification effect and the `del` command was not wrap with
`multi-exec` (if needed). The PR also fix this issue.
* SPOP command:
* On spop, the `spop` notification was fired before the command logic
was executed. The change in this PR would have cause the replication
order to be change (first `spop` command and then notification `logic`)
although the logical order is first the notification logic and then the
`spop` logic. The right fix would have been to move the notification to
be fired after the command was executed (like all the other commands),
but this can be considered a breaking change. To overcome this, the PR
keeps the current behavior and changes the `spop` code to keep the right
logical order when pushing commands to the replication buffer. Another PR
will follow to fix the SPOP properly and match it to the other command (we
split it to 2 separate PR's so it will be easy to cherry-pick this PR to 7.0 if
we chose to).
#### Unhanded Known Limitations
* key miss event:
* On key miss event, if a module performed some write command on the
event (using `RM_Call`), the `dirty` counter would increase and the read
command that cause the key miss event would be replicated to the replication
and aof. This problem can also happened on a write command that open
some keys but eventually decides not to perform any action. We decided
not to handle this problem on this PR because the solution is complex
and will cause additional risks in case we will want to cherry-pick this PR.
We should decide if we want to handle it in future PR's. For now, modules
writers is advice not to perform any write commands on key miss event.
#### Testing
* We already have tests to cover cases where a notification is invoking write
commands that are also added to the replication buffer, the tests was modified
to verify that the replica gets the command in the correct logical order.
* Test was added to verify that `spop` behavior was kept unchanged.
* Test was added to verify key miss event behave as expected.
* Test was added to verify the changes do not break lazy expiration.
#### Additional Changes
* `propagateNow` function can accept a special dbid, -1, indicating not
to replicate `select`. We use this to replicate `multi/exec` on `propagatePendingCommands`
function. The side effect of this change is that now the `select` command
will appear inside the `multi/exec` block on the replication stream (instead of
outside of the `multi/exec` block). Tests was modified to match this new behavior.
2022-08-18 10:16:32 +03:00
serverAssert ( dictid = = - 1 | | ( dictid > = 0 & & dictid < server . dbnum ) ) ;
Sort out mess around propagation and MULTI/EXEC (#9890)
The mess:
Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()),
causing edge cases, ugly/hacky code, and the tendency for bugs
The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the
top-most call() is responsible for going over that list and actually propagating them (and wrapping
them in MULTI/EXEC if there's more than one command). This is done in the new function,
propagatePendingCommands.
Callers to propagatePendingCommands:
1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most
one to propagate them) - via `afterCommand`
2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`.
3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the
expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate
the deletion explicitly.
4. cron stuff: active-expire and eviction may also propagate stuff
5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications,
threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one
place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module
context may cause propagation.
6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module
must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when
releasing the GIL.
A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl):
When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order:
first all the commands from RM_Call, and then the ones from RM_Replicate
Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one
write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant.
not anymore.
This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs.
propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function.
Optimizations:
1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas
2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove
Bugfixes:
1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules.
we need to prevent it from propagating to AOF/replicas
2. We need to set current_client in RM_Call. buggy scenario:
- CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call
- assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE
3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands
(we always send a notification before propagating the command)
2021-12-22 23:03:48 +01:00
2021-10-25 18:08:34 +08:00
/* Feed timestamp if needed */
if ( server . aof_timestamp_enabled ) {
sds ts = genAofTimestampAnnotationIfNeeded ( 0 ) ;
if ( ts ! = NULL ) {
buf = sdscatsds ( buf , ts ) ;
sdsfree ( ts ) ;
}
}
2013-01-17 01:00:20 +08:00
/* The DB this command was targeting is not the same as the last command
2014-07-12 18:02:17 +02:00
* we appended . To issue a SELECT command is needed . */
Fix replication inconsistency on modules that uses key space notifications (#10969)
Fix replication inconsistency on modules that uses key space notifications.
### The Problem
In general, key space notifications are invoked after the command logic was
executed (this is not always the case, we will discuss later about specific
command that do not follow this rules). For example, the `set x 1` will trigger
a `set` notification that will be invoked after the `set` logic was performed, so
if the notification logic will try to fetch `x`, it will see the new data that was written.
Consider the scenario on which the notification logic performs some write
commands. for example, the notification logic increase some counter,
`incr x{counter}`, indicating how many times `x` was changed.
The logical order by which the logic was executed is has follow:
```
set x 1
incr x{counter}
```
The issue is that the `set x 1` command is added to the replication buffer
at the end of the command invocation (specifically after the key space
notification logic was invoked and performed the `incr` command).
The replication/aof sees the commands in the wrong order:
```
incr x{counter}
set x 1
```
In this specific example the order is less important.
But if, for example, the notification would have deleted `x` then we would
end up with primary-replica inconsistency.
### The Solution
Put the command that cause the notification in its rightful place. In the
above example, the `set x 1` command logic was executed before the
notification logic, so it should be added to the replication buffer before
the commands that is invoked by the notification logic. To achieve this,
without a major code refactoring, we save a placeholder in the replication
buffer, when finishing invoking the command logic we check if the command
need to be replicated, and if it does, we use the placeholder to add it to the
replication buffer instead of appending it to the end.
To be efficient and not allocating memory on each command to save the
placeholder, the replication buffer array was modified to reuse memory
(instead of allocating it each time we want to replicate commands).
Also, to avoid saving a placeholder when not needed, we do it only for
WRITE or MAY_REPLICATE commands.
#### Additional Fixes
* Expire and Eviction notifications:
* Expire/Eviction logical order was to first perform the Expire/Eviction
and then the notification logic. The replication buffer got this in the
other way around (first notification effect and then the `del` command).
The PR fixes this issue.
* The notification effect and the `del` command was not wrap with
`multi-exec` (if needed). The PR also fix this issue.
* SPOP command:
* On spop, the `spop` notification was fired before the command logic
was executed. The change in this PR would have cause the replication
order to be change (first `spop` command and then notification `logic`)
although the logical order is first the notification logic and then the
`spop` logic. The right fix would have been to move the notification to
be fired after the command was executed (like all the other commands),
but this can be considered a breaking change. To overcome this, the PR
keeps the current behavior and changes the `spop` code to keep the right
logical order when pushing commands to the replication buffer. Another PR
will follow to fix the SPOP properly and match it to the other command (we
split it to 2 separate PR's so it will be easy to cherry-pick this PR to 7.0 if
we chose to).
#### Unhanded Known Limitations
* key miss event:
* On key miss event, if a module performed some write command on the
event (using `RM_Call`), the `dirty` counter would increase and the read
command that cause the key miss event would be replicated to the replication
and aof. This problem can also happened on a write command that open
some keys but eventually decides not to perform any action. We decided
not to handle this problem on this PR because the solution is complex
and will cause additional risks in case we will want to cherry-pick this PR.
We should decide if we want to handle it in future PR's. For now, modules
writers is advice not to perform any write commands on key miss event.
#### Testing
* We already have tests to cover cases where a notification is invoking write
commands that are also added to the replication buffer, the tests was modified
to verify that the replica gets the command in the correct logical order.
* Test was added to verify that `spop` behavior was kept unchanged.
* Test was added to verify key miss event behave as expected.
* Test was added to verify the changes do not break lazy expiration.
#### Additional Changes
* `propagateNow` function can accept a special dbid, -1, indicating not
to replicate `select`. We use this to replicate `multi/exec` on `propagatePendingCommands`
function. The side effect of this change is that now the `select` command
will appear inside the `multi/exec` block on the replication stream (instead of
outside of the `multi/exec` block). Tests was modified to match this new behavior.
2022-08-18 10:16:32 +03:00
if ( dictid ! = - 1 & & dictid ! = server . aof_selected_db ) {
2010-06-22 00:07:48 +02:00
char seldb [ 64 ] ;
snprintf ( seldb , sizeof ( seldb ) , " %d " , dictid ) ;
buf = sdscatprintf ( buf , " *2 \r \n $6 \r \n SELECT \r \n $%lu \r \n %s \r \n " ,
( unsigned long ) strlen ( seldb ) , seldb ) ;
2011-12-21 12:17:02 +01:00
server . aof_selected_db = dictid ;
2010-06-22 00:07:48 +02:00
}
2021-05-29 23:20:32 -07:00
/* All commands should be propagated the same way in AOF as in replication.
* No need for AOF - specific translation . */
buf = catAppendOnlyGenericCommand ( buf , argc , argv ) ;
2010-06-22 00:07:48 +02:00
/* Append to the AOF buffer. This will be flushed on disk just before
* of re - entering the event loop , so before the client will get a
2011-12-21 10:31:34 +01:00
* positive reply about the operation performed . */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( server . aof_state = = AOF_ON | |
2022-04-26 21:31:19 +08:00
( server . aof_state = = AOF_WAIT_REWRITE & & server . child_type = = CHILD_TYPE_AOF ) )
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
{
server . aof_buf = sdscatlen ( server . aof_buf , buf , sdslen ( buf ) ) ;
}
2010-06-22 00:07:48 +02:00
sdsfree ( buf ) ;
}
2012-05-21 16:50:05 +02:00
/* ----------------------------------------------------------------------------
* AOF loading
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
2010-06-22 00:07:48 +02:00
/* In Redis commands are always executed in the context of a client, so in
* order to load the append only file we need to create a fake client . */
2019-10-17 18:10:50 +02:00
struct client * createAOFClient ( void ) {
2021-08-09 01:03:59 -07:00
struct client * c = createClient ( NULL ) ;
2010-06-22 00:07:48 +02:00
2019-10-17 18:10:50 +02:00
c - > id = CLIENT_ID_AOF ; /* So modules can identify it's the AOF client. */
Unified MULTI, LUA, and RM_Call with respect to blocking commands (#8025)
Blocking command should not be used with MULTI, LUA, and RM_Call. This is because,
the caller, who executes the command in this context, expects a reply.
Today, LUA and MULTI have a special (and different) treatment to blocking commands:
LUA - Most commands are marked with no-script flag which are checked when executing
and command from LUA, commands that are not marked (like XREAD) verify that their
blocking mode is not used inside LUA (by checking the CLIENT_LUA client flag).
MULTI - Command that is going to block, first verify that the client is not inside
multi (by checking the CLIENT_MULTI client flag). If the client is inside multi, they
return a result which is a match to the empty key with no timeout (for example blpop
inside MULTI will act as lpop)
For modules that perform RM_Call with blocking command, the returned results type is
REDISMODULE_REPLY_UNKNOWN and the caller can not really know what happened.
Disadvantages of the current state are:
No unified approach, LUA, MULTI, and RM_Call, each has a different treatment
Module can not safely execute blocking command (and get reply or error).
Though It is true that modules are not like LUA or MULTI and should be smarter not
to execute blocking commands on RM_Call, sometimes you want to execute a command base
on client input (for example if you create a module that provides a new scripting
language like javascript or python).
While modules (on modules command) can check for REDISMODULE_CTX_FLAGS_LUA or
REDISMODULE_CTX_FLAGS_MULTI to know not to block the client, there is no way to
check if the command came from another module using RM_Call. So there is no way
for a module to know not to block another module RM_Call execution.
This commit adds a way to unify the treatment for blocking clients by introducing
a new CLIENT_DENY_BLOCKING client flag. On LUA, MULTI, and RM_Call the new flag
turned on to signify that the client should not be blocked. A blocking command
verifies that the flag is turned off before blocking. If a blocking command sees
that the CLIENT_DENY_BLOCKING flag is on, it's not blocking and return results
which are matches to empty key with no timeout (as MULTI does today).
The new flag is checked on the following commands:
List blocking commands: BLPOP, BRPOP, BRPOPLPUSH, BLMOVE,
Zset blocking commands: BZPOPMIN, BZPOPMAX
Stream blocking commands: XREAD, XREADGROUP
SUBSCRIBE, PSUBSCRIBE, MONITOR
In addition, the new flag is turned on inside the AOF client, we do not want to
block the AOF client to prevent deadlocks and commands ordering issues (and there
is also an existing assert in the code that verifies it).
To keep backward compatibility on LUA, all the no-script flags on existing commands
were kept untouched. In addition, a LUA special treatment on XREAD and XREADGROUP was kept.
To keep backward compatibility on MULTI (which today allows SUBSCRIBE, and PSUBSCRIBE).
We added a special treatment on those commands to allow executing them on MULTI.
The only backward compatibility issue that this PR introduces is that now MONITOR
is not allowed inside MULTI.
Tests were added to verify blocking commands are not blocking the client on LUA, MULTI,
or RM_Call. Tests were added to verify the module can check for CLIENT_DENY_BLOCKING flag.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Itamar Haber <itamar@redislabs.com>
2020-11-17 18:58:55 +02:00
/*
* The AOF client should never be blocked ( unlike master
* replication connection ) .
* This is because blocking the AOF client might cause
* deadlock ( because potentially no one will unblock it ) .
* Also , if the AOF client will be blocked just for
* background processing there is a chance that the
* command execution order will be violated .
*/
c - > flags = CLIENT_DENY_BLOCKING ;
2010-06-22 00:07:48 +02:00
/* We set the fake client as a slave waiting for the synchronization
* so that Redis will not try to send replies to this client . */
2015-07-27 09:41:48 +02:00
c - > replstate = SLAVE_STATE_WAIT_BGSAVE_START ;
2010-06-22 00:07:48 +02:00
return c ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Replay an append log file. On success AOF_OK or AOF_TRUNCATED is returned,
2021-06-14 10:38:08 +03:00
* otherwise , one of the following is returned :
* AOF_OPEN_ERR : Failed to open the AOF file .
* AOF_NOT_EXIST : AOF file doesn ' t exist .
* AOF_EMPTY : The AOF file is empty ( nothing to load ) .
* AOF_FAILED : Failed to load the AOF file . */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
int loadSingleAppendOnlyFile ( char * filename ) {
2015-07-26 15:20:46 +02:00
struct client * fakeClient ;
2010-06-22 00:07:48 +02:00
struct redis_stat sb ;
2011-12-21 10:31:34 +01:00
int old_aof_state = server . aof_state ;
2010-11-08 11:52:03 +01:00
long loops = 0 ;
2016-08-11 15:42:28 +02:00
off_t valid_up_to = 0 ; /* Offset of latest well-formed command loaded. */
2018-08-02 14:59:28 +08:00
off_t valid_before_multi = 0 ; /* Offset before MULTI command loaded. */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
off_t last_progress_report_size = 0 ;
2022-05-29 18:33:16 +08:00
int ret = AOF_OK ;
2010-06-22 00:07:48 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sds aof_filepath = makePath ( server . aof_dirname , filename ) ;
FILE * fp = fopen ( aof_filepath , " r " ) ;
2016-08-11 15:42:28 +02:00
if ( fp = = NULL ) {
2021-06-14 10:38:08 +03:00
int en = errno ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( redis_stat ( aof_filepath , & sb ) = = 0 | | errno ! = ENOENT ) {
serverLog ( LL_WARNING , " Fatal error: can't open the append log file %s for reading: %s " , filename , strerror ( en ) ) ;
sdsfree ( aof_filepath ) ;
2021-06-14 10:38:08 +03:00
return AOF_OPEN_ERR ;
} else {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverLog ( LL_WARNING , " The append log file %s doesn't exist: %s " , filename , strerror ( errno ) ) ;
sdsfree ( aof_filepath ) ;
2021-06-14 10:38:08 +03:00
return AOF_NOT_EXIST ;
}
2016-08-11 15:42:28 +02:00
}
2011-03-04 16:13:54 +01:00
if ( fp & & redis_fstat ( fileno ( fp ) , & sb ) ! = - 1 & & sb . st_size = = 0 ) {
fclose ( fp ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sdsfree ( aof_filepath ) ;
2021-06-14 10:38:08 +03:00
return AOF_EMPTY ;
2011-03-04 16:13:54 +01:00
}
2010-06-22 00:07:48 +02:00
/* Temporarily disable AOF, to prevent EXEC from feeding a MULTI
* to the same file we ' re about to read . */
2015-07-27 09:41:48 +02:00
server . aof_state = AOF_OFF ;
2010-06-22 00:07:48 +02:00
2023-02-16 08:07:35 +02:00
client * old_cur_client = server . current_client ;
client * old_exec_client = server . executing_client ;
fakeClient = createAOFClient ( ) ;
server . current_client = server . executing_client = fakeClient ;
2010-11-08 11:52:03 +01:00
2022-02-12 00:47:03 +08:00
/* Check if the AOF file is in RDB format (it may be RDB encoded base AOF
* or old style RDB - preamble AOF ) . In that case we need to load the RDB file
* and later continue loading the AOF tail if it is an old style RDB - preamble AOF . */
2016-08-11 15:42:28 +02:00
char sig [ 5 ] ; /* "REDIS" */
if ( fread ( sig , 1 , 5 , fp ) ! = 5 | | memcmp ( sig , " REDIS " , 5 ) ! = 0 ) {
2022-02-12 00:47:03 +08:00
/* Not in RDB format, seek back at 0 offset. */
2016-08-11 15:42:28 +02:00
if ( fseek ( fp , 0 , SEEK_SET ) = = - 1 ) goto readerr ;
} else {
2022-02-12 00:47:03 +08:00
/* RDB format. Pass loading the RDB functions. */
2016-08-11 15:42:28 +02:00
rio rdb ;
2022-02-12 00:47:03 +08:00
int old_style = ! strcmp ( filename , server . aof_filename ) ;
if ( old_style )
serverLog ( LL_NOTICE , " Reading RDB preamble from AOF file... " ) ;
else
serverLog ( LL_NOTICE , " Reading RDB base file on AOF loading... " ) ;
2016-08-11 15:42:28 +02:00
if ( fseek ( fp , 0 , SEEK_SET ) = = - 1 ) goto readerr ;
rioInitWithFile ( & rdb , fp ) ;
2021-10-07 14:41:26 +03:00
if ( rdbLoadRio ( & rdb , RDBFLAGS_AOF_PREAMBLE , NULL ) ! = C_OK ) {
2022-02-12 00:47:03 +08:00
if ( old_style )
serverLog ( LL_WARNING , " Error reading the RDB preamble of the AOF file %s, AOF loading aborted " , filename ) ;
else
serverLog ( LL_WARNING , " Error reading the RDB base file %s, AOF loading aborted " , filename ) ;
2022-08-04 15:47:37 +08:00
ret = AOF_FAILED ;
goto cleanup ;
2016-08-11 15:42:28 +02:00
} else {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
loadingAbsProgress ( ftello ( fp ) ) ;
last_progress_report_size = ftello ( fp ) ;
2022-02-12 00:47:03 +08:00
if ( old_style ) serverLog ( LL_NOTICE , " Reading the remaining AOF tail... " ) ;
2016-08-11 15:42:28 +02:00
}
}
/* Read the actual AOF file, in REPL format, command by command. */
2010-06-22 00:07:48 +02:00
while ( 1 ) {
int argc , j ;
unsigned long len ;
robj * * argv ;
2021-10-25 18:08:34 +08:00
char buf [ AOF_ANNOTATION_LINE_MAX_LEN ] ;
2010-06-22 00:07:48 +02:00
sds argsds ;
struct redisCommand * cmd ;
2010-11-08 11:52:03 +01:00
/* Serve the clients from time to time */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( ! ( loops + + % 1024 ) ) {
off_t progress_delta = ftello ( fp ) - last_progress_report_size ;
loadingIncrProgress ( progress_delta ) ;
last_progress_report_size + = progress_delta ;
2014-04-24 17:36:47 +02:00
processEventsWhileBlocked ( ) ;
2019-10-29 17:59:09 +02:00
processModuleLoadingProgressEvent ( 1 ) ;
2010-11-08 11:52:03 +01:00
}
2010-06-22 00:07:48 +02:00
if ( fgets ( buf , sizeof ( buf ) , fp ) = = NULL ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( feof ( fp ) ) {
2010-06-22 00:07:48 +02:00
break ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
} else {
2010-06-22 00:07:48 +02:00
goto readerr ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
2010-06-22 00:07:48 +02:00
}
2021-10-25 18:08:34 +08:00
if ( buf [ 0 ] = = ' # ' ) continue ; /* Skip annotations */
2010-06-22 00:07:48 +02:00
if ( buf [ 0 ] ! = ' * ' ) goto fmterr ;
2014-09-05 11:48:35 +02:00
if ( buf [ 1 ] = = ' \0 ' ) goto readerr ;
2010-06-22 00:07:48 +02:00
argc = atoi ( buf + 1 ) ;
2011-08-02 17:05:04 +04:00
if ( argc < 1 ) goto fmterr ;
2021-11-02 23:03:07 +08:00
if ( ( size_t ) argc > SIZE_MAX / sizeof ( robj * ) ) goto fmterr ;
2011-08-02 17:05:04 +04:00
2020-01-10 13:02:45 +01:00
/* Load the next command in the AOF as our fake client
* argv . */
2010-06-22 00:07:48 +02:00
argv = zmalloc ( sizeof ( robj * ) * argc ) ;
2014-09-05 15:14:36 +02:00
fakeClient - > argc = argc ;
fakeClient - > argv = argv ;
2021-10-04 12:17:22 +03:00
fakeClient - > argv_len = argc ;
2014-09-05 15:14:36 +02:00
2010-06-22 00:07:48 +02:00
for ( j = 0 ; j < argc ; j + + ) {
2020-01-10 13:02:45 +01:00
/* Parse the argument len. */
2020-01-13 13:16:13 +01:00
char * readres = fgets ( buf , sizeof ( buf ) , fp ) ;
if ( readres = = NULL | | buf [ 0 ] ! = ' $ ' ) {
2014-09-05 15:14:36 +02:00
fakeClient - > argc = j ; /* Free up to j-1. */
2021-08-09 01:03:59 -07:00
freeClientArgv ( fakeClient ) ;
2020-01-13 13:16:13 +01:00
if ( readres = = NULL )
goto readerr ;
else
goto fmterr ;
2014-09-05 15:14:36 +02:00
}
2010-06-22 00:07:48 +02:00
len = strtol ( buf + 1 , NULL , 10 ) ;
2020-01-10 13:02:45 +01:00
/* Read it into a string object. */
2017-02-23 03:04:08 -08:00
argsds = sdsnewlen ( SDS_NOINIT , len ) ;
2014-09-05 15:14:36 +02:00
if ( len & & fread ( argsds , len , 1 , fp ) = = 0 ) {
sdsfree ( argsds ) ;
fakeClient - > argc = j ; /* Free up to j-1. */
2021-08-09 01:03:59 -07:00
freeClientArgv ( fakeClient ) ;
2014-09-05 15:14:36 +02:00
goto readerr ;
}
2015-07-26 15:28:00 +02:00
argv [ j ] = createObject ( OBJ_STRING , argsds ) ;
2020-01-10 13:02:45 +01:00
/* Discard CRLF. */
2014-09-05 15:14:36 +02:00
if ( fread ( buf , 2 , 1 , fp ) = = 0 ) {
fakeClient - > argc = j + 1 ; /* Free up to j. */
2021-08-09 01:03:59 -07:00
freeClientArgv ( fakeClient ) ;
2020-01-10 13:02:45 +01:00
goto readerr ;
2014-09-05 15:14:36 +02:00
}
2010-06-22 00:07:48 +02:00
}
/* Command lookup */
Treat subcommands as commands (#9504)
## Intro
The purpose is to allow having different flags/ACL categories for
subcommands (Example: CONFIG GET is ok-loading but CONFIG SET isn't)
We create a small command table for every command that has subcommands
and each subcommand has its own flags, etc. (same as a "regular" command)
This commit also unites the Redis and the Sentinel command tables
## Affected commands
CONFIG
Used to have "admin ok-loading ok-stale no-script"
Changes:
1. Dropped "ok-loading" in all except GET (this doesn't change behavior since
there were checks in the code doing that)
XINFO
Used to have "read-only random"
Changes:
1. Dropped "random" in all except CONSUMERS
XGROUP
Used to have "write use-memory"
Changes:
1. Dropped "use-memory" in all except CREATE and CREATECONSUMER
COMMAND
No changes.
MEMORY
Used to have "random read-only"
Changes:
1. Dropped "random" in PURGE and USAGE
ACL
Used to have "admin no-script ok-loading ok-stale"
Changes:
1. Dropped "admin" in WHOAMI, GENPASS, and CAT
LATENCY
No changes.
MODULE
No changes.
SLOWLOG
Used to have "admin random ok-loading ok-stale"
Changes:
1. Dropped "random" in RESET
OBJECT
Used to have "read-only random"
Changes:
1. Dropped "random" in ENCODING and REFCOUNT
SCRIPT
Used to have "may-replicate no-script"
Changes:
1. Dropped "may-replicate" in all except FLUSH and LOAD
CLIENT
Used to have "admin no-script random ok-loading ok-stale"
Changes:
1. Dropped "random" in all except INFO and LIST
2. Dropped "admin" in ID, TRACKING, CACHING, GETREDIR, INFO, SETNAME, GETNAME, and REPLY
STRALGO
No changes.
PUBSUB
No changes.
CLUSTER
Changes:
1. Dropped "admin in countkeysinslots, getkeysinslot, info, nodes, keyslot, myid, and slots
SENTINEL
No changes.
(note that DEBUG also fits, but we decided not to convert it since it's for
debugging and anyway undocumented)
## New sub-command
This commit adds another element to the per-command output of COMMAND,
describing the list of subcommands, if any (in the same structure as "regular" commands)
Also, it adds a new subcommand:
```
COMMAND LIST [FILTERBY (MODULE <module-name>|ACLCAT <cat>|PATTERN <pattern>)]
```
which returns a set of all commands (unless filters), but excluding subcommands.
## Module API
A new module API, RM_CreateSubcommand, was added, in order to allow
module writer to define subcommands
## ACL changes:
1. Now, that each subcommand is actually a command, each has its own ACL id.
2. The old mechanism of allowed_subcommands is redundant
(blocking/allowing a subcommand is the same as blocking/allowing a regular command),
but we had to keep it, to support the widespread usage of allowed_subcommands
to block commands with certain args, that aren't subcommands (e.g. "-select +select|0").
3. I have renamed allowed_subcommands to allowed_firstargs to emphasize the difference.
4. Because subcommands are commands in ACL too, you can now use "-" to block subcommands
(e.g. "+client -client|kill"), which wasn't possible in the past.
5. It is also possible to use the allowed_firstargs mechanism with subcommand.
For example: `+config -config|set +config|set|loglevel` will block all CONFIG SET except
for setting the log level.
6. All of the ACL changes above required some amount of refactoring.
## Misc
1. There are two approaches: Either each subcommand has its own function or all
subcommands use the same function, determining what to do according to argv[0].
For now, I took the former approaches only with CONFIG and COMMAND,
while other commands use the latter approach (for smaller blamelog diff).
2. Deleted memoryGetKeys: It is no longer needed because MEMORY USAGE now uses the "range" key spec.
4. Bugfix: GETNAME was missing from CLIENT's help message.
5. Sentinel and Redis now use the same table, with the same function pointer.
Some commands have a different implementation in Sentinel, so we redirect
them (these are ROLE, PUBLISH, and INFO).
6. Command stats now show the stats per subcommand (e.g. instead of stats just
for "config" you will have stats for "config|set", "config|get", etc.)
7. It is now possible to use COMMAND directly on subcommands:
COMMAND INFO CONFIG|GET (The pipeline syntax was inspired from ACL, and
can be used in functions lookupCommandBySds and lookupCommandByCString)
8. STRALGO is now a container command (has "help")
## Breaking changes:
1. Command stats now show the stats per subcommand (see (5) above)
2021-10-20 10:52:57 +02:00
cmd = lookupCommand ( argv , argc ) ;
2010-06-22 00:07:48 +02:00
if ( ! cmd ) {
2018-10-09 11:51:04 +02:00
serverLog ( LL_WARNING ,
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
" Unknown command '%s' reading the append only file %s " ,
( char * ) argv [ 0 ] - > ptr , filename ) ;
2021-08-09 01:03:59 -07:00
freeClientArgv ( fakeClient ) ;
2021-06-14 10:38:08 +03:00
ret = AOF_FAILED ;
goto cleanup ;
2010-06-22 00:07:48 +02:00
}
2014-09-05 15:14:36 +02:00
2021-09-15 11:53:42 +02:00
if ( cmd - > proc = = multiCommand ) valid_before_multi = valid_up_to ;
2018-08-02 14:59:28 +08:00
2010-06-22 00:07:48 +02:00
/* Run the command in the context of a fake client */
2020-03-29 13:08:21 +03:00
fakeClient - > cmd = fakeClient - > lastcmd = cmd ;
2018-10-09 11:51:04 +02:00
if ( fakeClient - > flags & CLIENT_MULTI & &
fakeClient - > cmd - > proc ! = execCommand )
{
Expose script flags to processCommand for better handling (#10744)
The important part is that read-only scripts (not just EVAL_RO
and FCALL_RO, but also ones with `no-writes` executed by normal EVAL or
FCALL), will now be permitted to run during CLIENT PAUSE WRITE (unlike
before where only the _RO commands would be processed).
Other than that, some errors like OOM, READONLY, MASTERDOWN are now
handled by processCommand, rather than the command itself affects the
error string (and even error code in some cases), and command stats.
Besides that, now the `may-replicate` commands, PFCOUNT and PUBLISH, will
be considered `write` commands in scripts and will be blocked in all
read-only scripts just like other write commands.
They'll also be blocked in EVAL_RO (i.e. even for scripts without the
`no-writes` shebang flag.
This commit also hides the `may_replicate` flag from the COMMAND command
output. this is a **breaking change**.
background about may_replicate:
We don't want to expose a no-may-replicate flag or alike to scripts, since we
consider the may-replicate thing an internal concern of redis, that we may
some day get rid of.
In fact, the may-replicate flag was initially introduced to flag EVAL: since
we didn't know what it's gonna do ahead of execution, before function-flags
existed). PUBLISH and PFCOUNT, both of which because they have side effects
which may some day be fixed differently.
code changes:
The changes in eval.c are mostly code re-ordering:
- evalCalcFunctionName is extracted out of evalGenericCommand
- evalExtractShebangFlags is extracted luaCreateFunction
- evalGetCommandFlags is new code
2022-06-01 14:09:40 +03:00
/* Note: we don't have to attempt calling evalGetCommandFlags,
* since this is AOF , the checks in processCommand are not made
* anyway . */
queueMultiCommand ( fakeClient , cmd - > flags ) ;
2018-08-02 14:59:28 +08:00
} else {
cmd - > proc ( fakeClient ) ;
}
2010-08-30 16:51:39 +02:00
/* The fake client should not have a reply */
2018-10-09 11:51:04 +02:00
serverAssert ( fakeClient - > bufpos = = 0 & &
listLength ( fakeClient - > reply ) = = 0 ) ;
2011-06-29 16:10:28 +02:00
/* The fake client should never get blocked */
2015-07-27 09:41:48 +02:00
serverAssert ( ( fakeClient - > flags & CLIENT_BLOCKED ) = = 0 ) ;
2010-08-30 16:51:39 +02:00
2011-04-22 09:44:06 +02:00
/* Clean up. Command code may have changed argv/argc so we use the
* argv / argc of the client instead of the local variables . */
2021-08-09 01:03:59 -07:00
freeClientArgv ( fakeClient ) ;
2014-09-16 10:32:58 +02:00
if ( server . aof_load_truncated ) valid_up_to = ftello ( fp ) ;
2019-08-11 16:07:53 +03:00
if ( server . key_load_delay )
2020-09-03 08:47:29 +03:00
debugDelay ( server . key_load_delay ) ;
2010-06-22 00:07:48 +02:00
}
/* This point can only be reached when EOF is reached without errors.
2018-08-03 12:46:04 +02:00
* If the client is in the middle of a MULTI / EXEC , handle it as it was
* a short read , even if technically the protocol is correct : we want
* to remove the unprocessed tail and continue . */
2018-08-02 14:59:28 +08:00
if ( fakeClient - > flags & CLIENT_MULTI ) {
2018-10-09 11:51:04 +02:00
serverLog ( LL_WARNING ,
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
" Revert incomplete MULTI/EXEC transaction in AOF file %s " , filename ) ;
2018-08-02 14:59:28 +08:00
valid_up_to = valid_before_multi ;
goto uxeof ;
}
2010-06-22 00:07:48 +02:00
2022-05-29 18:33:16 +08:00
loaded_ok : /* DB loaded, cleanup and return success (AOF_OK or AOF_TRUNCATED). */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
loadingIncrProgress ( ftello ( fp ) - last_progress_report_size ) ;
2011-12-21 10:31:34 +01:00
server . aof_state = old_aof_state ;
2021-06-14 10:38:08 +03:00
goto cleanup ;
2010-06-22 00:07:48 +02:00
2014-09-05 09:56:26 +02:00
readerr : /* Read error. If feof(fp) is true, fall through to unexpected EOF. */
if ( ! feof ( fp ) ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverLog ( LL_WARNING , " Unrecoverable error reading the append only file %s: %s " , filename , strerror ( errno ) ) ;
2021-06-14 10:38:08 +03:00
ret = AOF_FAILED ;
goto cleanup ;
2010-06-22 00:07:48 +02:00
}
2014-09-05 09:56:26 +02:00
uxeof : /* Unexpected AOF end of file. */
2014-09-05 11:48:35 +02:00
if ( server . aof_load_truncated ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverLog ( LL_WARNING , " !!! Warning: short read while loading the AOF file %s!!! " , filename ) ;
serverLog ( LL_WARNING , " !!! Truncating the AOF %s at offset %llu !!! " ,
filename , ( unsigned long long ) valid_up_to ) ;
if ( valid_up_to = = - 1 | | truncate ( aof_filepath , valid_up_to ) = = - 1 ) {
2014-09-16 10:32:58 +02:00
if ( valid_up_to = = - 1 ) {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Last valid command offset is invalid " ) ;
2014-09-16 10:32:58 +02:00
} else {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverLog ( LL_WARNING , " Error truncating the AOF file %s: %s " ,
filename , strerror ( errno ) ) ;
2014-09-16 10:32:58 +02:00
}
} else {
2014-09-16 10:57:40 +02:00
/* Make sure the AOF file descriptor points to the end of the
* file after the truncate call . */
if ( server . aof_fd ! = - 1 & & lseek ( server . aof_fd , 0 , SEEK_END ) = = - 1 ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverLog ( LL_WARNING , " Can't seek the end of the AOF file %s: %s " ,
filename , strerror ( errno ) ) ;
2014-09-16 10:57:40 +02:00
} else {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING ,
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
" AOF %s loaded anyway because aof-load-truncated is enabled " , filename ) ;
ret = AOF_TRUNCATED ;
2014-09-16 10:57:40 +02:00
goto loaded_ok ;
}
2014-09-16 10:32:58 +02:00
}
2014-09-05 11:48:35 +02:00
}
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
serverLog ( LL_WARNING , " Unexpected end of file reading the append only file %s. You can: "
2024-04-06 20:33:09 -07:00
" 1) Make a backup of your AOF file, then use ./valkey-check-aof --fix <filename.manifest>. "
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
" 2) Alternatively you can set the 'aof-load-truncated' configuration option to yes and restart the server. " , filename ) ;
2021-06-14 10:38:08 +03:00
ret = AOF_FAILED ;
goto cleanup ;
2014-09-05 09:56:26 +02:00
fmterr : /* Format error. */
Adapt redis-check-aof tool for Multi Part Aof (#10061)
Modifications of this PR:
1. Support the verification of `Multi Part AOF`, while still maintaining support for the
old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which
mode to use according to the incoming file format.
`Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>`
2. Refactor part of the code to make it easier to understand
3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF
file (may be `BASE` or `INCR`)
The reasons for 3 above:
- for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis
- for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files
at most, and `BASE` cannot be truncated(It only contains a timestamp annotation
at the beginning of the file), so only `INCR` can be truncated. If we have a
`BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2`
files can be truncated at this time. If we still insist on truncate `INCR1`, we need to
manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof`
- If we want to support truncate any file, we need to add very complicated code to support
the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
2022-02-17 14:13:28 +08:00
serverLog ( LL_WARNING , " Bad file format reading the append only file %s: "
2024-04-06 20:33:09 -07:00
" make a backup of your AOF file, then use ./valkey-check-aof --fix <filename.manifest> " , filename ) ;
2021-06-14 10:38:08 +03:00
ret = AOF_FAILED ;
/* fall through to cleanup. */
cleanup :
2021-10-04 12:17:22 +03:00
if ( fakeClient ) freeClient ( fakeClient ) ;
2023-02-16 08:07:35 +02:00
server . current_client = old_cur_client ;
server . executing_client = old_exec_client ;
2019-09-04 17:20:37 +02:00
fclose ( fp ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sdsfree ( aof_filepath ) ;
return ret ;
}
/* Load the AOF files according the aofManifest pointed by am. */
int loadAppendOnlyFiles ( aofManifest * am ) {
serverAssert ( am ! = NULL ) ;
2022-05-29 18:33:16 +08:00
int status , ret = AOF_OK ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
long long start ;
Fix auto-aof-rewrite-percentage based AOFRW trigger after restart (#10550)
The `auto-aof-rewrite-percentage` config defines at what growth percentage
an automatic AOF rewrite is triggered.
This normally works OK since the size of the AOF file at the end of a rewrite
is stored in `server.aof_rewrite_base_size`.
However, on startup, redis used to store the entire size of the AOF file into that
variable, resulting in a wrong automatic AOF rewrite trigger (could have been
triggered much later than desired).
This issue would only affect the first AOFRW after startup, after that future AOFRW
would have been triggered correctly.
This bug existed in all previous versions of Redis.
This PR unifies the meaning of `server.aof_rewrite_base_size`, which only represents
the size of BASE AOF.
Note that after an AOFRW this size includes the size of the incremental file (all the
commands that executed during rewrite), so that auto-aof-rewrite-percentage is the
ratio from the size of the AOF after rewrite.
However, on startup, it is complicated to know that size, and we compromised on
taking just the size of the base file, this means that the first rewrite after startup can
happen a little bit too soon.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yoav-steinberg <yoav@redislabs.com>
2022-04-07 19:47:07 +08:00
off_t total_size = 0 , base_size = 0 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sds aof_name ;
int total_num , aof_num = 0 , last_file ;
/* If the 'server.aof_filename' file exists in dir, we may be starting
* from an old redis version . We will use enter upgrade mode in three situations .
*
* 1. If the ' server . aof_dirname ' directory not exist
* 2. If the ' server . aof_dirname ' directory exists but the manifest file is missing
* 3. If the ' server . aof_dirname ' directory exists and the manifest file it contains
* has only one base AOF record , and the file name of this base AOF is ' server . aof_filename ' ,
* and the ' server . aof_filename ' file not exist in ' server . aof_dirname ' directory
* */
if ( fileExist ( server . aof_filename ) ) {
if ( ! dirExists ( server . aof_dirname ) | |
( am - > base_aof_info = = NULL & & listLength ( am - > incr_aof_list ) = = 0 ) | |
( am - > base_aof_info ! = NULL & & listLength ( am - > incr_aof_list ) = = 0 & &
! strcmp ( am - > base_aof_info - > file_name , server . aof_filename ) & & ! aofFileExist ( server . aof_filename ) ) )
{
aofUpgradePrepare ( am ) ;
}
}
if ( am - > base_aof_info = = NULL & & listLength ( am - > incr_aof_list ) = = 0 ) {
return AOF_NOT_EXIST ;
}
total_num = getBaseAndIncrAppendOnlyFilesNum ( am ) ;
serverAssert ( total_num > 0 ) ;
/* Here we calculate the total size of all BASE and INCR files in
* advance , it will be set to ` server . loading_total_bytes ` . */
2022-02-22 08:59:23 +02:00
total_size = getBaseAndIncrAppendOnlyFilesSize ( am , & status ) ;
if ( status ! = AOF_OK ) {
/* If an AOF exists in the manifest but not on the disk, we consider this to be a fatal error. */
if ( status = = AOF_NOT_EXIST ) status = AOF_FAILED ;
return status ;
} else if ( total_size = = 0 ) {
return AOF_EMPTY ;
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
startLoading ( total_size , RDBFLAGS_AOF_PREAMBLE , 0 ) ;
/* Load BASE AOF if needed. */
if ( am - > base_aof_info ) {
serverAssert ( am - > base_aof_info - > file_type = = AOF_FILE_TYPE_BASE ) ;
aof_name = ( char * ) am - > base_aof_info - > file_name ;
updateLoadingFileName ( aof_name ) ;
Fix auto-aof-rewrite-percentage based AOFRW trigger after restart (#10550)
The `auto-aof-rewrite-percentage` config defines at what growth percentage
an automatic AOF rewrite is triggered.
This normally works OK since the size of the AOF file at the end of a rewrite
is stored in `server.aof_rewrite_base_size`.
However, on startup, redis used to store the entire size of the AOF file into that
variable, resulting in a wrong automatic AOF rewrite trigger (could have been
triggered much later than desired).
This issue would only affect the first AOFRW after startup, after that future AOFRW
would have been triggered correctly.
This bug existed in all previous versions of Redis.
This PR unifies the meaning of `server.aof_rewrite_base_size`, which only represents
the size of BASE AOF.
Note that after an AOFRW this size includes the size of the incremental file (all the
commands that executed during rewrite), so that auto-aof-rewrite-percentage is the
ratio from the size of the AOF after rewrite.
However, on startup, it is complicated to know that size, and we compromised on
taking just the size of the base file, this means that the first rewrite after startup can
happen a little bit too soon.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yoav-steinberg <yoav@redislabs.com>
2022-04-07 19:47:07 +08:00
base_size = getAppendOnlyFileSize ( aof_name , NULL ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
last_file = + + aof_num = = total_num ;
start = ustime ( ) ;
ret = loadSingleAppendOnlyFile ( aof_name ) ;
if ( ret = = AOF_OK | | ( ret = = AOF_TRUNCATED & & last_file ) ) {
serverLog ( LL_NOTICE , " DB loaded from base file %s: %.3f seconds " ,
aof_name , ( float ) ( ustime ( ) - start ) / 1000000 ) ;
}
2022-02-22 08:59:23 +02:00
/* If the truncated file is not the last file, we consider this to be a fatal error. */
if ( ret = = AOF_TRUNCATED & & ! last_file ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
ret = AOF_FAILED ;
2022-11-22 09:18:36 -05:00
serverLog ( LL_WARNING , " Fatal error: the truncated file is not the last file " ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
if ( ret = = AOF_OPEN_ERR | | ret = = AOF_FAILED ) {
goto cleanup ;
}
}
/* Load INCR AOFs if needed. */
if ( listLength ( am - > incr_aof_list ) ) {
listNode * ln ;
listIter li ;
listRewind ( am - > incr_aof_list , & li ) ;
while ( ( ln = listNext ( & li ) ) ! = NULL ) {
aofInfo * ai = ( aofInfo * ) ln - > value ;
serverAssert ( ai - > file_type = = AOF_FILE_TYPE_INCR ) ;
aof_name = ( char * ) ai - > file_name ;
updateLoadingFileName ( aof_name ) ;
last_file = + + aof_num = = total_num ;
start = ustime ( ) ;
ret = loadSingleAppendOnlyFile ( aof_name ) ;
if ( ret = = AOF_OK | | ( ret = = AOF_TRUNCATED & & last_file ) ) {
serverLog ( LL_NOTICE , " DB loaded from incr file %s: %.3f seconds " ,
aof_name , ( float ) ( ustime ( ) - start ) / 1000000 ) ;
}
2022-02-22 08:59:23 +02:00
/* We know that (at least) one of the AOF files has data (total_size > 0),
* so empty incr AOF file doesn ' t count as a AOF_EMPTY result */
if ( ret = = AOF_EMPTY ) ret = AOF_OK ;
2022-11-22 09:18:36 -05:00
/* If the truncated file is not the last file, we consider this to be a fatal error. */
2022-02-22 08:59:23 +02:00
if ( ret = = AOF_TRUNCATED & & ! last_file ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
ret = AOF_FAILED ;
2022-11-22 09:18:36 -05:00
serverLog ( LL_WARNING , " Fatal error: the truncated file is not the last file " ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
if ( ret = = AOF_OPEN_ERR | | ret = = AOF_FAILED ) {
goto cleanup ;
}
}
}
server . aof_current_size = total_size ;
Fix auto-aof-rewrite-percentage based AOFRW trigger after restart (#10550)
The `auto-aof-rewrite-percentage` config defines at what growth percentage
an automatic AOF rewrite is triggered.
This normally works OK since the size of the AOF file at the end of a rewrite
is stored in `server.aof_rewrite_base_size`.
However, on startup, redis used to store the entire size of the AOF file into that
variable, resulting in a wrong automatic AOF rewrite trigger (could have been
triggered much later than desired).
This issue would only affect the first AOFRW after startup, after that future AOFRW
would have been triggered correctly.
This bug existed in all previous versions of Redis.
This PR unifies the meaning of `server.aof_rewrite_base_size`, which only represents
the size of BASE AOF.
Note that after an AOFRW this size includes the size of the incremental file (all the
commands that executed during rewrite), so that auto-aof-rewrite-percentage is the
ratio from the size of the AOF after rewrite.
However, on startup, it is complicated to know that size, and we compromised on
taking just the size of the base file, this means that the first rewrite after startup can
happen a little bit too soon.
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: yoav-steinberg <yoav@redislabs.com>
2022-04-07 19:47:07 +08:00
/* Ideally, the aof_rewrite_base_size variable should hold the size of the
* AOF when the last rewrite ended , this should include the size of the
* incremental file that was created during the rewrite since otherwise we
* risk the next automatic rewrite to happen too soon ( or immediately if
* auto - aof - rewrite - percentage is low ) . However , since we do not persist
* aof_rewrite_base_size information anywhere , we initialize it on restart
* to the size of BASE AOF file . This might cause the first AOFRW to be
* executed early , but that shouldn ' t be a problem since everything will be
* fine after the first AOFRW . */
server . aof_rewrite_base_size = base_size ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
cleanup :
2022-02-22 08:59:23 +02:00
stopLoading ( ret = = AOF_OK | | ret = = AOF_TRUNCATED ) ;
2021-06-14 10:38:08 +03:00
return ret ;
2010-06-22 00:07:48 +02:00
}
2012-05-21 16:50:05 +02:00
/* ----------------------------------------------------------------------------
* AOF rewrite
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
2011-05-14 12:36:22 +02:00
/* Delegate writing an object to writing a bulk string or bulk long long.
2016-04-25 16:49:57 +03:00
* This is not placed in rio . c since that adds the server . h dependency . */
2011-05-14 12:36:22 +02:00
int rioWriteBulkObject ( rio * r , robj * obj ) {
/* Avoid using getDecodedObject to help copy-on-write (we are often
* in a child process when this function is called ) . */
2015-07-26 15:28:00 +02:00
if ( obj - > encoding = = OBJ_ENCODING_INT ) {
2011-05-14 12:36:22 +02:00
return rioWriteBulkLongLong ( r , ( long ) obj - > ptr ) ;
2012-06-05 21:50:10 +02:00
} else if ( sdsEncodedObject ( obj ) ) {
2011-05-14 12:36:22 +02:00
return rioWriteBulkString ( r , obj - > ptr , sdslen ( obj - > ptr ) ) ;
} else {
2015-07-27 09:41:48 +02:00
serverPanic ( " Unknown string encoding " ) ;
2011-05-14 12:36:22 +02:00
}
}
2011-12-06 18:22:52 +01:00
/* Emit the commands needed to rebuild a list object.
* The function returns 0 on error , 1 on success . */
int rewriteListObject ( rio * r , robj * key , robj * o ) {
long long count = 0 , items = listTypeLength ( o ) ;
Add listpack encoding for list (#11303)
Improve memory efficiency of list keys
## Description of the feature
The new listpack encoding uses the old `list-max-listpack-size` config
to perform the conversion, which we can think it of as a node inside a
quicklist, but without 80 bytes overhead (internal fragmentation included)
of quicklist and quicklistNode structs.
For example, a list key with 5 items of 10 chars each, now takes 128 bytes
instead of 208 it used to take.
## Conversion rules
* Convert listpack to quicklist
When the listpack length or size reaches the `list-max-listpack-size` limit,
it will be converted to a quicklist.
* Convert quicklist to listpack
When a quicklist has only one node, and its length or size is reduced to half
of the `list-max-listpack-size` limit, it will be converted to a listpack.
This is done to avoid frequent conversions when we add or remove at the bounding size or length.
## Interface changes
1. add list entry param to listTypeSetIteratorDirection
When list encoding is listpack, `listTypeIterator->lpi` points to the next entry of current entry,
so when changing the direction, we need to use the current node (listTypeEntry->p) to
update `listTypeIterator->lpi` to the next node in the reverse direction.
## Benchmark
### Listpack VS Quicklist with one node
* LPUSH - roughly 0.3% improvement
* LRANGE - roughly 13% improvement
### Both are quicklist
* LRANGE - roughly 3% improvement
* LRANGE without pipeline - roughly 3% improvement
From the benchmark, as we can see from the results
1. When list is quicklist encoding, LRANGE improves performance by <5%.
2. When list is listpack encoding, LRANGE improves performance by ~13%,
the main enhancement is brought by `addListListpackRangeReply()`.
## Memory usage
1M lists(key:0~key:1000000) with 5 items of 10 chars ("hellohello") each.
shows memory usage down by 35.49%, from 214MB to 138MB.
## Note
1. Add conversion callback to support doing some work before conversion
Since the quicklist iterator decompresses the current node when it is released, we can
no longer decompress the quicklist after we convert the list.
2022-11-17 02:29:46 +08:00
listTypeIterator * li = listTypeInitIterator ( o , 0 , LIST_TAIL ) ;
listTypeEntry entry ;
while ( listTypeNext ( li , & entry ) ) {
if ( count = = 0 ) {
int cmd_items = ( items > AOF_REWRITE_ITEMS_PER_CMD ) ?
AOF_REWRITE_ITEMS_PER_CMD : items ;
if ( ! rioWriteBulkCount ( r , ' * ' , 2 + cmd_items ) | |
! rioWriteBulkString ( r , " RPUSH " , 5 ) | |
! rioWriteBulkObject ( r , key ) )
{
listTypeReleaseIterator ( li ) ;
return 0 ;
2011-12-06 18:22:52 +01:00
}
Add listpack encoding for list (#11303)
Improve memory efficiency of list keys
## Description of the feature
The new listpack encoding uses the old `list-max-listpack-size` config
to perform the conversion, which we can think it of as a node inside a
quicklist, but without 80 bytes overhead (internal fragmentation included)
of quicklist and quicklistNode structs.
For example, a list key with 5 items of 10 chars each, now takes 128 bytes
instead of 208 it used to take.
## Conversion rules
* Convert listpack to quicklist
When the listpack length or size reaches the `list-max-listpack-size` limit,
it will be converted to a quicklist.
* Convert quicklist to listpack
When a quicklist has only one node, and its length or size is reduced to half
of the `list-max-listpack-size` limit, it will be converted to a listpack.
This is done to avoid frequent conversions when we add or remove at the bounding size or length.
## Interface changes
1. add list entry param to listTypeSetIteratorDirection
When list encoding is listpack, `listTypeIterator->lpi` points to the next entry of current entry,
so when changing the direction, we need to use the current node (listTypeEntry->p) to
update `listTypeIterator->lpi` to the next node in the reverse direction.
## Benchmark
### Listpack VS Quicklist with one node
* LPUSH - roughly 0.3% improvement
* LRANGE - roughly 13% improvement
### Both are quicklist
* LRANGE - roughly 3% improvement
* LRANGE without pipeline - roughly 3% improvement
From the benchmark, as we can see from the results
1. When list is quicklist encoding, LRANGE improves performance by <5%.
2. When list is listpack encoding, LRANGE improves performance by ~13%,
the main enhancement is brought by `addListListpackRangeReply()`.
## Memory usage
1M lists(key:0~key:1000000) with 5 items of 10 chars ("hellohello") each.
shows memory usage down by 35.49%, from 214MB to 138MB.
## Note
1. Add conversion callback to support doing some work before conversion
Since the quicklist iterator decompresses the current node when it is released, we can
no longer decompress the quicklist after we convert the list.
2022-11-17 02:29:46 +08:00
}
2011-12-06 18:22:52 +01:00
Add listpack encoding for list (#11303)
Improve memory efficiency of list keys
## Description of the feature
The new listpack encoding uses the old `list-max-listpack-size` config
to perform the conversion, which we can think it of as a node inside a
quicklist, but without 80 bytes overhead (internal fragmentation included)
of quicklist and quicklistNode structs.
For example, a list key with 5 items of 10 chars each, now takes 128 bytes
instead of 208 it used to take.
## Conversion rules
* Convert listpack to quicklist
When the listpack length or size reaches the `list-max-listpack-size` limit,
it will be converted to a quicklist.
* Convert quicklist to listpack
When a quicklist has only one node, and its length or size is reduced to half
of the `list-max-listpack-size` limit, it will be converted to a listpack.
This is done to avoid frequent conversions when we add or remove at the bounding size or length.
## Interface changes
1. add list entry param to listTypeSetIteratorDirection
When list encoding is listpack, `listTypeIterator->lpi` points to the next entry of current entry,
so when changing the direction, we need to use the current node (listTypeEntry->p) to
update `listTypeIterator->lpi` to the next node in the reverse direction.
## Benchmark
### Listpack VS Quicklist with one node
* LPUSH - roughly 0.3% improvement
* LRANGE - roughly 13% improvement
### Both are quicklist
* LRANGE - roughly 3% improvement
* LRANGE without pipeline - roughly 3% improvement
From the benchmark, as we can see from the results
1. When list is quicklist encoding, LRANGE improves performance by <5%.
2. When list is listpack encoding, LRANGE improves performance by ~13%,
the main enhancement is brought by `addListListpackRangeReply()`.
## Memory usage
1M lists(key:0~key:1000000) with 5 items of 10 chars ("hellohello") each.
shows memory usage down by 35.49%, from 214MB to 138MB.
## Note
1. Add conversion callback to support doing some work before conversion
Since the quicklist iterator decompresses the current node when it is released, we can
no longer decompress the quicklist after we convert the list.
2022-11-17 02:29:46 +08:00
unsigned char * vstr ;
size_t vlen ;
long long lval ;
vstr = listTypeGetValue ( & entry , & vlen , & lval ) ;
if ( vstr ) {
if ( ! rioWriteBulkString ( r , ( char * ) vstr , vlen ) ) {
listTypeReleaseIterator ( li ) ;
return 0 ;
}
} else {
if ( ! rioWriteBulkLongLong ( r , lval ) ) {
listTypeReleaseIterator ( li ) ;
return 0 ;
2011-12-07 11:34:25 +01:00
}
2011-12-06 18:22:52 +01:00
}
Add listpack encoding for list (#11303)
Improve memory efficiency of list keys
## Description of the feature
The new listpack encoding uses the old `list-max-listpack-size` config
to perform the conversion, which we can think it of as a node inside a
quicklist, but without 80 bytes overhead (internal fragmentation included)
of quicklist and quicklistNode structs.
For example, a list key with 5 items of 10 chars each, now takes 128 bytes
instead of 208 it used to take.
## Conversion rules
* Convert listpack to quicklist
When the listpack length or size reaches the `list-max-listpack-size` limit,
it will be converted to a quicklist.
* Convert quicklist to listpack
When a quicklist has only one node, and its length or size is reduced to half
of the `list-max-listpack-size` limit, it will be converted to a listpack.
This is done to avoid frequent conversions when we add or remove at the bounding size or length.
## Interface changes
1. add list entry param to listTypeSetIteratorDirection
When list encoding is listpack, `listTypeIterator->lpi` points to the next entry of current entry,
so when changing the direction, we need to use the current node (listTypeEntry->p) to
update `listTypeIterator->lpi` to the next node in the reverse direction.
## Benchmark
### Listpack VS Quicklist with one node
* LPUSH - roughly 0.3% improvement
* LRANGE - roughly 13% improvement
### Both are quicklist
* LRANGE - roughly 3% improvement
* LRANGE without pipeline - roughly 3% improvement
From the benchmark, as we can see from the results
1. When list is quicklist encoding, LRANGE improves performance by <5%.
2. When list is listpack encoding, LRANGE improves performance by ~13%,
the main enhancement is brought by `addListListpackRangeReply()`.
## Memory usage
1M lists(key:0~key:1000000) with 5 items of 10 chars ("hellohello") each.
shows memory usage down by 35.49%, from 214MB to 138MB.
## Note
1. Add conversion callback to support doing some work before conversion
Since the quicklist iterator decompresses the current node when it is released, we can
no longer decompress the quicklist after we convert the list.
2022-11-17 02:29:46 +08:00
if ( + + count = = AOF_REWRITE_ITEMS_PER_CMD ) count = 0 ;
items - - ;
2011-12-06 18:22:52 +01:00
}
Add listpack encoding for list (#11303)
Improve memory efficiency of list keys
## Description of the feature
The new listpack encoding uses the old `list-max-listpack-size` config
to perform the conversion, which we can think it of as a node inside a
quicklist, but without 80 bytes overhead (internal fragmentation included)
of quicklist and quicklistNode structs.
For example, a list key with 5 items of 10 chars each, now takes 128 bytes
instead of 208 it used to take.
## Conversion rules
* Convert listpack to quicklist
When the listpack length or size reaches the `list-max-listpack-size` limit,
it will be converted to a quicklist.
* Convert quicklist to listpack
When a quicklist has only one node, and its length or size is reduced to half
of the `list-max-listpack-size` limit, it will be converted to a listpack.
This is done to avoid frequent conversions when we add or remove at the bounding size or length.
## Interface changes
1. add list entry param to listTypeSetIteratorDirection
When list encoding is listpack, `listTypeIterator->lpi` points to the next entry of current entry,
so when changing the direction, we need to use the current node (listTypeEntry->p) to
update `listTypeIterator->lpi` to the next node in the reverse direction.
## Benchmark
### Listpack VS Quicklist with one node
* LPUSH - roughly 0.3% improvement
* LRANGE - roughly 13% improvement
### Both are quicklist
* LRANGE - roughly 3% improvement
* LRANGE without pipeline - roughly 3% improvement
From the benchmark, as we can see from the results
1. When list is quicklist encoding, LRANGE improves performance by <5%.
2. When list is listpack encoding, LRANGE improves performance by ~13%,
the main enhancement is brought by `addListListpackRangeReply()`.
## Memory usage
1M lists(key:0~key:1000000) with 5 items of 10 chars ("hellohello") each.
shows memory usage down by 35.49%, from 214MB to 138MB.
## Note
1. Add conversion callback to support doing some work before conversion
Since the quicklist iterator decompresses the current node when it is released, we can
no longer decompress the quicklist after we convert the list.
2022-11-17 02:29:46 +08:00
listTypeReleaseIterator ( li ) ;
2011-12-06 18:22:52 +01:00
return 1 ;
}
2011-12-12 15:57:51 +01:00
/* Emit the commands needed to rebuild a set object.
* The function returns 0 on error , 1 on success . */
int rewriteSetObject ( rio * r , robj * key , robj * o ) {
long long count = 0 , items = setTypeSize ( o ) ;
2022-11-09 18:50:07 +01:00
setTypeIterator * si = setTypeInitIterator ( o ) ;
char * str ;
size_t len ;
int64_t llval ;
while ( setTypeNext ( si , & str , & len , & llval ) ! = - 1 ) {
if ( count = = 0 ) {
int cmd_items = ( items > AOF_REWRITE_ITEMS_PER_CMD ) ?
AOF_REWRITE_ITEMS_PER_CMD : items ;
if ( ! rioWriteBulkCount ( r , ' * ' , 2 + cmd_items ) | |
! rioWriteBulkString ( r , " SADD " , 4 ) | |
! rioWriteBulkObject ( r , key ) )
{
return 0 ;
2011-12-12 15:57:51 +01:00
}
}
2022-11-09 18:50:07 +01:00
size_t written = str ?
rioWriteBulkString ( r , str , len ) : rioWriteBulkLongLong ( r , llval ) ;
if ( ! written ) {
setTypeReleaseIterator ( si ) ;
return 0 ;
2011-12-12 15:57:51 +01:00
}
2022-11-09 18:50:07 +01:00
if ( + + count = = AOF_REWRITE_ITEMS_PER_CMD ) count = 0 ;
items - - ;
2011-12-12 15:57:51 +01:00
}
2022-11-09 18:50:07 +01:00
setTypeReleaseIterator ( si ) ;
2011-12-12 15:57:51 +01:00
return 1 ;
}
2011-12-12 17:27:39 +01:00
/* Emit the commands needed to rebuild a sorted set object.
* The function returns 0 on error , 1 on success . */
int rewriteSortedSetObject ( rio * r , robj * key , robj * o ) {
long long count = 0 , items = zsetLength ( o ) ;
2021-09-09 23:18:53 +08:00
if ( o - > encoding = = OBJ_ENCODING_LISTPACK ) {
2011-12-12 17:27:39 +01:00
unsigned char * zl = o - > ptr ;
unsigned char * eptr , * sptr ;
unsigned char * vstr ;
unsigned int vlen ;
long long vll ;
double score ;
2021-09-09 23:18:53 +08:00
eptr = lpSeek ( zl , 0 ) ;
2015-07-26 15:29:53 +02:00
serverAssert ( eptr ! = NULL ) ;
2021-09-09 23:18:53 +08:00
sptr = lpNext ( zl , eptr ) ;
2015-07-26 15:29:53 +02:00
serverAssert ( sptr ! = NULL ) ;
2011-12-12 17:27:39 +01:00
while ( eptr ! = NULL ) {
2021-09-09 23:18:53 +08:00
vstr = lpGetValue ( eptr , & vlen , & vll ) ;
2011-12-12 17:27:39 +01:00
score = zzlGetScore ( sptr ) ;
if ( count = = 0 ) {
2015-07-27 09:41:48 +02:00
int cmd_items = ( items > AOF_REWRITE_ITEMS_PER_CMD ) ?
AOF_REWRITE_ITEMS_PER_CMD : items ;
2011-12-12 17:27:39 +01:00
2020-10-28 06:35:28 -04:00
if ( ! rioWriteBulkCount ( r , ' * ' , 2 + cmd_items * 2 ) | |
! rioWriteBulkString ( r , " ZADD " , 4 ) | |
! rioWriteBulkObject ( r , key ) )
{
return 0 ;
}
2011-12-12 17:27:39 +01:00
}
2020-10-28 06:35:28 -04:00
if ( ! rioWriteBulkDouble ( r , score ) ) return 0 ;
2011-12-12 17:27:39 +01:00
if ( vstr ! = NULL ) {
2020-10-28 06:35:28 -04:00
if ( ! rioWriteBulkString ( r , ( char * ) vstr , vlen ) ) return 0 ;
2011-12-12 17:27:39 +01:00
} else {
2020-10-28 06:35:28 -04:00
if ( ! rioWriteBulkLongLong ( r , vll ) ) return 0 ;
2011-12-12 17:27:39 +01:00
}
zzlNext ( zl , & eptr , & sptr ) ;
2015-07-27 09:41:48 +02:00
if ( + + count = = AOF_REWRITE_ITEMS_PER_CMD ) count = 0 ;
2011-12-12 17:27:39 +01:00
items - - ;
}
2015-07-26 15:28:00 +02:00
} else if ( o - > encoding = = OBJ_ENCODING_SKIPLIST ) {
2011-12-12 17:27:39 +01:00
zset * zs = o - > ptr ;
dictIterator * di = dictGetIterator ( zs - > dict ) ;
dictEntry * de ;
while ( ( de = dictNext ( di ) ) ! = NULL ) {
2015-08-04 09:20:55 +02:00
sds ele = dictGetKey ( de ) ;
2011-12-12 17:27:39 +01:00
double * score = dictGetVal ( de ) ;
if ( count = = 0 ) {
2015-07-27 09:41:48 +02:00
int cmd_items = ( items > AOF_REWRITE_ITEMS_PER_CMD ) ?
AOF_REWRITE_ITEMS_PER_CMD : items ;
2011-12-12 17:27:39 +01:00
2020-10-28 06:35:28 -04:00
if ( ! rioWriteBulkCount ( r , ' * ' , 2 + cmd_items * 2 ) | |
! rioWriteBulkString ( r , " ZADD " , 4 ) | |
! rioWriteBulkObject ( r , key ) )
{
dictReleaseIterator ( di ) ;
return 0 ;
}
}
if ( ! rioWriteBulkDouble ( r , * score ) | |
! rioWriteBulkString ( r , ele , sdslen ( ele ) ) )
{
dictReleaseIterator ( di ) ;
return 0 ;
2011-12-12 17:27:39 +01:00
}
2015-07-27 09:41:48 +02:00
if ( + + count = = AOF_REWRITE_ITEMS_PER_CMD ) count = 0 ;
2011-12-12 17:27:39 +01:00
items - - ;
}
dictReleaseIterator ( di ) ;
} else {
2015-07-27 09:41:48 +02:00
serverPanic ( " Unknown sorted zset encoding " ) ;
2011-12-12 17:27:39 +01:00
}
return 1 ;
}
2013-12-05 16:35:32 +01:00
/* Write either the key or the value of the currently selected item of a hash.
2012-03-10 10:36:51 +01:00
* The ' hi ' argument passes a valid Redis hash iterator .
* The ' what ' filed specifies if to write a key or a value and can be
2015-07-26 15:28:00 +02:00
* either OBJ_HASH_KEY or OBJ_HASH_VALUE .
2012-03-10 10:36:51 +01:00
*
* The function returns 0 on error , non - zero on success . */
2012-01-02 22:14:10 -08:00
static int rioWriteHashIteratorCursor ( rio * r , hashTypeIterator * hi , int what ) {
2021-08-10 14:18:49 +08:00
if ( hi - > encoding = = OBJ_ENCODING_LISTPACK ) {
2012-01-02 22:14:10 -08:00
unsigned char * vstr = NULL ;
unsigned int vlen = UINT_MAX ;
long long vll = LLONG_MAX ;
2021-08-10 14:18:49 +08:00
hashTypeCurrentFromListpack ( hi , what , & vstr , & vlen , & vll ) ;
2015-09-23 09:33:23 +02:00
if ( vstr )
2012-01-02 22:14:10 -08:00
return rioWriteBulkString ( r , ( char * ) vstr , vlen ) ;
2015-09-23 09:33:23 +02:00
else
2012-01-02 22:14:10 -08:00
return rioWriteBulkLongLong ( r , vll ) ;
2015-07-26 15:28:00 +02:00
} else if ( hi - > encoding = = OBJ_ENCODING_HT ) {
2015-09-23 09:33:23 +02:00
sds value = hashTypeCurrentFromHashTable ( hi , what ) ;
return rioWriteBulkString ( r , value , sdslen ( value ) ) ;
2012-01-02 22:14:10 -08:00
}
2015-07-27 09:41:48 +02:00
serverPanic ( " Unknown hash encoding " ) ;
2012-01-02 22:14:10 -08:00
return 0 ;
}
2011-12-12 17:39:23 +01:00
/* Emit the commands needed to rebuild a hash object.
* The function returns 0 on error , 1 on success . */
int rewriteHashObject ( rio * r , robj * key , robj * o ) {
2012-01-02 22:14:10 -08:00
hashTypeIterator * hi ;
2011-12-12 17:39:23 +01:00
long long count = 0 , items = hashTypeLength ( o ) ;
2012-01-02 22:14:10 -08:00
hi = hashTypeInitIterator ( o ) ;
2015-07-26 23:17:55 +02:00
while ( hashTypeNext ( hi ) ! = C_ERR ) {
2012-01-02 22:14:10 -08:00
if ( count = = 0 ) {
2015-07-27 09:41:48 +02:00
int cmd_items = ( items > AOF_REWRITE_ITEMS_PER_CMD ) ?
AOF_REWRITE_ITEMS_PER_CMD : items ;
2011-12-12 17:39:23 +01:00
2020-10-28 06:35:28 -04:00
if ( ! rioWriteBulkCount ( r , ' * ' , 2 + cmd_items * 2 ) | |
! rioWriteBulkString ( r , " HMSET " , 5 ) | |
! rioWriteBulkObject ( r , key ) )
{
hashTypeReleaseIterator ( hi ) ;
return 0 ;
}
2011-12-12 17:39:23 +01:00
}
2020-10-28 06:35:28 -04:00
if ( ! rioWriteHashIteratorCursor ( r , hi , OBJ_HASH_KEY ) | |
! rioWriteHashIteratorCursor ( r , hi , OBJ_HASH_VALUE ) )
{
hashTypeReleaseIterator ( hi ) ;
return 0 ;
}
2015-07-27 09:41:48 +02:00
if ( + + count = = AOF_REWRITE_ITEMS_PER_CMD ) count = 0 ;
2012-01-02 22:14:10 -08:00
items - - ;
}
2011-12-12 17:39:23 +01:00
2012-01-02 22:14:10 -08:00
hashTypeReleaseIterator ( hi ) ;
2011-12-12 17:39:23 +01:00
return 1 ;
}
2018-03-23 17:21:31 +01:00
/* Helper for rewriteStreamObject() that generates a bulk string into the
* AOF representing the ID ' id ' . */
int rioWriteBulkStreamID ( rio * r , streamID * id ) {
int retval ;
sds replyid = sdscatfmt ( sdsempty ( ) , " %U-%U " , id - > ms , id - > seq ) ;
2020-01-13 13:25:37 +01:00
retval = rioWriteBulkString ( r , replyid , sdslen ( replyid ) ) ;
2018-03-23 17:21:31 +01:00
sdsfree ( replyid ) ;
return retval ;
}
/* Helper for rewriteStreamObject(): emit the XCLAIM needed in order to
* add the message described by ' nack ' having the id ' rawid ' , into the pending
* list of the specified consumer . All this in the context of the specified
* key and group . */
int rioWriteStreamPendingEntry ( rio * r , robj * key , const char * groupname , size_t groupname_len , streamConsumer * consumer , unsigned char * rawid , streamNACK * nack ) {
/* XCLAIM <key> <group> <consumer> 0 <id> TIME <milliseconds-unix-time>
RETRYCOUNT < count > JUSTID FORCE . */
streamID id ;
streamDecodeID ( rawid , & id ) ;
if ( rioWriteBulkCount ( r , ' * ' , 12 ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " XCLAIM " , 6 ) = = 0 ) return 0 ;
if ( rioWriteBulkObject ( r , key ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , groupname , groupname_len ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , consumer - > name , sdslen ( consumer - > name ) ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " 0 " , 1 ) = = 0 ) return 0 ;
if ( rioWriteBulkStreamID ( r , & id ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " TIME " , 4 ) = = 0 ) return 0 ;
if ( rioWriteBulkLongLong ( r , nack - > delivery_time ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " RETRYCOUNT " , 10 ) = = 0 ) return 0 ;
if ( rioWriteBulkLongLong ( r , nack - > delivery_count ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " JUSTID " , 6 ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " FORCE " , 5 ) = = 0 ) return 0 ;
return 1 ;
}
2020-09-24 12:02:40 +03:00
/* Helper for rewriteStreamObject(): emit the XGROUP CREATECONSUMER is
* needed in order to create consumers that do not have any pending entries .
* All this in the context of the specified key and group . */
int rioWriteStreamEmptyConsumer ( rio * r , robj * key , const char * groupname , size_t groupname_len , streamConsumer * consumer ) {
/* XGROUP CREATECONSUMER <key> <group> <consumer> */
if ( rioWriteBulkCount ( r , ' * ' , 5 ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " XGROUP " , 6 ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , " CREATECONSUMER " , 14 ) = = 0 ) return 0 ;
if ( rioWriteBulkObject ( r , key ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , groupname , groupname_len ) = = 0 ) return 0 ;
if ( rioWriteBulkString ( r , consumer - > name , sdslen ( consumer - > name ) ) = = 0 ) return 0 ;
return 1 ;
}
2017-09-15 12:37:04 +02:00
/* Emit the commands needed to rebuild a stream object.
* The function returns 0 on error , 1 on success . */
int rewriteStreamObject ( rio * r , robj * key , robj * o ) {
2018-03-23 17:21:31 +01:00
stream * s = o - > ptr ;
2017-09-15 12:37:04 +02:00
streamIterator si ;
2018-03-23 17:21:31 +01:00
streamIteratorStart ( & si , s , NULL , NULL , 0 ) ;
2017-09-15 12:37:04 +02:00
streamID id ;
int64_t numfields ;
2018-10-08 21:23:38 +08:00
if ( s - > length ) {
/* Reconstruct the stream data using XADD commands. */
while ( streamIteratorGetID ( & si , & id , & numfields ) ) {
/* Emit a two elements array for each item. The first is
* the ID , the second is an array of field - value pairs . */
/* Emit the XADD <key> <id> ...fields... command. */
2020-09-22 02:05:47 -04:00
if ( ! rioWriteBulkCount ( r , ' * ' , 3 + numfields * 2 ) | |
! rioWriteBulkString ( r , " XADD " , 4 ) | |
! rioWriteBulkObject ( r , key ) | |
! rioWriteBulkStreamID ( r , & id ) )
{
streamIteratorStop ( & si ) ;
return 0 ;
}
2018-10-08 21:23:38 +08:00
while ( numfields - - ) {
unsigned char * field , * value ;
int64_t field_len , value_len ;
streamIteratorGetField ( & si , & field , & value , & field_len , & value_len ) ;
2020-09-22 02:05:47 -04:00
if ( ! rioWriteBulkString ( r , ( char * ) field , field_len ) | |
! rioWriteBulkString ( r , ( char * ) value , value_len ) )
{
streamIteratorStop ( & si ) ;
return 0 ;
}
2018-10-08 21:23:38 +08:00
}
2017-09-15 12:37:04 +02:00
}
2018-10-08 21:23:38 +08:00
} else {
2018-10-16 16:48:31 +02:00
/* Use the XADD MAXLEN 0 trick to generate an empty stream if
* the key we are serializing is an empty string , which is possible
* for the Stream type . */
2019-12-04 17:29:29 +02:00
id . ms = 0 ; id . seq = 1 ;
2020-09-22 02:05:47 -04:00
if ( ! rioWriteBulkCount ( r , ' * ' , 7 ) | |
! rioWriteBulkString ( r , " XADD " , 4 ) | |
! rioWriteBulkObject ( r , key ) | |
! rioWriteBulkString ( r , " MAXLEN " , 6 ) | |
! rioWriteBulkString ( r , " 0 " , 1 ) | |
! rioWriteBulkStreamID ( r , & id ) | |
! rioWriteBulkString ( r , " x " , 1 ) | |
! rioWriteBulkString ( r , " y " , 1 ) )
{
streamIteratorStop ( & si ) ;
return 0 ;
}
2017-09-15 12:37:04 +02:00
}
2018-03-23 17:21:31 +01:00
2018-10-16 16:48:31 +02:00
/* Append XSETID after XADD, make sure lastid is correct,
* in case of XDEL lastid . */
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.
The proposed constant-time solution is in the spirit of "best-effort."
Partially addresses #8737.
## Description of approach
We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`. It is essentially an all-time
counter of the entries added to the stream.
Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.
The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.
Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.
## Limitations
There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.
The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.
In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).
## API changes
* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
for propagating the CG's offset and maximal tombstone ID of the stream.
## The generic unsolved problem
The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.
## A refactoring note
The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.
Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
if ( ! rioWriteBulkCount ( r , ' * ' , 7 ) | |
2020-09-22 02:05:47 -04:00
! rioWriteBulkString ( r , " XSETID " , 6 ) | |
! rioWriteBulkObject ( r , key ) | |
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.
The proposed constant-time solution is in the spirit of "best-effort."
Partially addresses #8737.
## Description of approach
We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`. It is essentially an all-time
counter of the entries added to the stream.
Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.
The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.
Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.
## Limitations
There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.
The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.
In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).
## API changes
* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
for propagating the CG's offset and maximal tombstone ID of the stream.
## The generic unsolved problem
The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.
## A refactoring note
The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.
Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
! rioWriteBulkStreamID ( r , & s - > last_id ) | |
! rioWriteBulkString ( r , " ENTRIESADDED " , 12 ) | |
! rioWriteBulkLongLong ( r , s - > entries_added ) | |
! rioWriteBulkString ( r , " MAXDELETEDID " , 12 ) | |
! rioWriteBulkStreamID ( r , & s - > max_deleted_entry_id ) )
2020-09-22 02:05:47 -04:00
{
streamIteratorStop ( & si ) ;
return 0 ;
}
2018-10-16 16:48:31 +02:00
2018-03-23 17:21:31 +01:00
/* Create all the stream consumer groups. */
if ( s - > cgroups ) {
raxIterator ri ;
raxStart ( & ri , s - > cgroups ) ;
raxSeek ( & ri , " ^ " , NULL , 0 ) ;
while ( raxNext ( & ri ) ) {
streamCG * group = ri . data ;
/* Emit the XGROUP CREATE in order to create the group. */
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.
The proposed constant-time solution is in the spirit of "best-effort."
Partially addresses #8737.
## Description of approach
We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`. It is essentially an all-time
counter of the entries added to the stream.
Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.
The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.
Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.
## Limitations
There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.
The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.
In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).
## API changes
* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
for propagating the CG's offset and maximal tombstone ID of the stream.
## The generic unsolved problem
The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.
## A refactoring note
The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.
Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
if ( ! rioWriteBulkCount ( r , ' * ' , 7 ) | |
2020-07-21 01:13:05 -04:00
! rioWriteBulkString ( r , " XGROUP " , 6 ) | |
! rioWriteBulkString ( r , " CREATE " , 6 ) | |
! rioWriteBulkObject ( r , key ) | |
! rioWriteBulkString ( r , ( char * ) ri . key , ri . key_len ) | |
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.
The proposed constant-time solution is in the spirit of "best-effort."
Partially addresses #8737.
## Description of approach
We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`. It is essentially an all-time
counter of the entries added to the stream.
Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.
The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.
Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.
## Limitations
There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.
The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.
In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).
## API changes
* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
for propagating the CG's offset and maximal tombstone ID of the stream.
## The generic unsolved problem
The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.
## A refactoring note
The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.
Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
! rioWriteBulkStreamID ( r , & group - > last_id ) | |
! rioWriteBulkString ( r , " ENTRIESREAD " , 11 ) | |
! rioWriteBulkLongLong ( r , group - > entries_read ) )
2020-07-21 01:13:05 -04:00
{
raxStop ( & ri ) ;
2020-09-22 02:05:47 -04:00
streamIteratorStop ( & si ) ;
2020-07-21 01:13:05 -04:00
return 0 ;
}
2018-03-23 17:21:31 +01:00
/* Generate XCLAIMs for each consumer that happens to
2020-09-24 12:02:40 +03:00
* have pending entries . Empty consumers would be generated with
* XGROUP CREATECONSUMER . */
2018-03-23 17:21:31 +01:00
raxIterator ri_cons ;
raxStart ( & ri_cons , group - > consumers ) ;
raxSeek ( & ri_cons , " ^ " , NULL , 0 ) ;
while ( raxNext ( & ri_cons ) ) {
streamConsumer * consumer = ri_cons . data ;
2020-09-24 12:02:40 +03:00
/* If there are no pending entries, just emit XGROUP CREATECONSUMER */
if ( raxSize ( consumer - > pel ) = = 0 ) {
if ( rioWriteStreamEmptyConsumer ( r , key , ( char * ) ri . key ,
ri . key_len , consumer ) = = 0 )
{
2020-12-13 04:47:24 -05:00
raxStop ( & ri_cons ) ;
raxStop ( & ri ) ;
streamIteratorStop ( & si ) ;
2020-09-24 12:02:40 +03:00
return 0 ;
}
continue ;
}
2018-03-23 17:21:31 +01:00
/* For the current consumer, iterate all the PEL entries
* to emit the XCLAIM protocol . */
raxIterator ri_pel ;
raxStart ( & ri_pel , consumer - > pel ) ;
raxSeek ( & ri_pel , " ^ " , NULL , 0 ) ;
while ( raxNext ( & ri_pel ) ) {
streamNACK * nack = ri_pel . data ;
if ( rioWriteStreamPendingEntry ( r , key , ( char * ) ri . key ,
ri . key_len , consumer ,
ri_pel . key , nack ) = = 0 )
{
2020-07-21 01:13:05 -04:00
raxStop ( & ri_pel ) ;
raxStop ( & ri_cons ) ;
raxStop ( & ri ) ;
2020-09-22 02:05:47 -04:00
streamIteratorStop ( & si ) ;
2018-03-23 17:21:31 +01:00
return 0 ;
}
}
raxStop ( & ri_pel ) ;
}
raxStop ( & ri_cons ) ;
}
raxStop ( & ri ) ;
}
2017-09-15 12:37:04 +02:00
streamIteratorStop ( & si ) ;
return 1 ;
}
2016-05-18 11:45:40 +02:00
/* Call the module type callback in order to rewrite a data type
2016-08-09 11:07:32 +02:00
* that is exported by a module and is not handled by Redis itself .
2016-05-18 11:45:40 +02:00
* The function returns 0 on error , 1 on success . */
2021-06-16 14:45:49 +08:00
int rewriteModuleObject ( rio * r , robj * key , robj * o , int dbid ) {
2024-04-05 16:59:55 -07:00
ValkeyModuleIO io ;
2016-05-18 11:45:40 +02:00
moduleValue * mv = o - > ptr ;
moduleType * mt = mv - > type ;
2021-06-16 14:45:49 +08:00
moduleInitIOContext ( io , mt , r , key , dbid ) ;
2016-05-18 11:45:40 +02:00
mt - > aof_rewrite ( & io , key , mv - > value ) ;
2016-10-06 17:05:38 +02:00
if ( io . ctx ) {
moduleFreeContext ( io . ctx ) ;
zfree ( io . ctx ) ;
}
2016-05-18 11:45:40 +02:00
return io . error ? 0 : 1 ;
}
2022-01-19 21:21:42 +02:00
static int rewriteFunctions ( rio * aof ) {
dict * functions = functionsLibGet ( ) ;
dictIterator * iter = dictGetIterator ( functions ) ;
dictEntry * entry = NULL ;
while ( ( entry = dictNext ( iter ) ) ) {
functionLibInfo * li = dictGetVal ( entry ) ;
2022-04-05 10:27:24 +03:00
if ( rioWrite ( aof , " *3 \r \n " , 4 ) = = 0 ) goto werr ;
2022-01-30 13:38:58 +01:00
char function_load [ ] = " $8 \r \n FUNCTION \r \n $4 \r \n LOAD \r \n " ;
if ( rioWrite ( aof , function_load , sizeof ( function_load ) - 1 ) = = 0 ) goto werr ;
2022-01-19 21:21:42 +02:00
if ( rioWriteBulkString ( aof , li - > code , sdslen ( li - > code ) ) = = 0 ) goto werr ;
}
dictReleaseIterator ( iter ) ;
return 1 ;
werr :
dictReleaseIterator ( iter ) ;
return 0 ;
}
2016-08-09 16:41:40 +02:00
int rewriteAppendOnlyFileRio ( rio * aof ) {
2010-06-22 00:07:48 +02:00
dictIterator * di = NULL ;
dictEntry * de ;
2016-08-09 16:41:40 +02:00
int j ;
2020-12-20 20:23:20 +02:00
long key_count = 0 ;
2021-02-16 16:06:51 +02:00
long long updated_time = 0 ;
2010-06-22 00:07:48 +02:00
2021-10-25 18:08:34 +08:00
/* Record timestamp at the beginning of rewriting AOF. */
if ( server . aof_timestamp_enabled ) {
sds ts = genAofTimestampAnnotationIfNeeded ( 1 ) ;
if ( rioWrite ( aof , ts , sdslen ( ts ) ) = = 0 ) { sdsfree ( ts ) ; goto werr ; }
sdsfree ( ts ) ;
}
2022-01-19 21:21:42 +02:00
if ( rewriteFunctions ( aof ) = = 0 ) goto werr ;
2010-06-22 00:07:48 +02:00
for ( j = 0 ; j < server . dbnum ; j + + ) {
char selectcmd [ ] = " *2 \r \n $6 \r \n SELECT \r \n " ;
redisDb * db = server . db + j ;
dict * d = db - > dict ;
if ( dictSize ( d ) = = 0 ) continue ;
2011-06-17 15:40:55 +02:00
di = dictGetSafeIterator ( d ) ;
2010-06-22 00:07:48 +02:00
/* SELECT the new DB */
2016-08-09 16:41:40 +02:00
if ( rioWrite ( aof , selectcmd , sizeof ( selectcmd ) - 1 ) = = 0 ) goto werr ;
if ( rioWriteBulkLongLong ( aof , j ) = = 0 ) goto werr ;
2010-06-22 00:07:48 +02:00
/* Iterate this DB writing every entry */
while ( ( de = dictNext ( di ) ) ! = NULL ) {
2011-05-10 10:07:04 +02:00
sds keystr ;
2010-06-22 00:07:48 +02:00
robj key , * o ;
2011-11-12 01:04:27 +01:00
long long expiretime ;
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.
But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.
## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.
Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.
There are some buffers we can release in the fork child process:
- **Serialized key-values**
the fork child process never access serialized key-values, so we try to free them.
Because we only can release big bulk memory, and it is time consumed to iterate all
items/members/fields/entries of complex data type. So we decide to iterate them and
try to release them only when their average size of item/member/field/entry is more
than page size of OS.
- **Replication backlog**
Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
If clients have requests during having the fork child process, clients' buffer also be changed
frequently. The memory includes client query buffer, output buffer, and client struct used memory.
To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 04:01:46 +08:00
size_t aof_bytes_before_key = aof - > processed_bytes ;
2010-06-22 00:07:48 +02:00
2011-11-08 17:07:55 +01:00
keystr = dictGetKey ( de ) ;
o = dictGetVal ( de ) ;
2010-06-22 00:07:48 +02:00
initStaticStringObject ( key , keystr ) ;
2010-12-28 18:06:40 +01:00
2010-06-22 00:07:48 +02:00
expiretime = getExpire ( db , & key ) ;
/* Save the key and associated value */
2015-07-26 15:28:00 +02:00
if ( o - > type = = OBJ_STRING ) {
2010-06-22 00:07:48 +02:00
/* Emit a SET command */
char cmd [ ] = " *3 \r \n $3 \r \n SET \r \n " ;
2016-08-09 16:41:40 +02:00
if ( rioWrite ( aof , cmd , sizeof ( cmd ) - 1 ) = = 0 ) goto werr ;
2010-06-22 00:07:48 +02:00
/* Key and value */
2016-08-09 16:41:40 +02:00
if ( rioWriteBulkObject ( aof , & key ) = = 0 ) goto werr ;
if ( rioWriteBulkObject ( aof , o ) = = 0 ) goto werr ;
2015-07-26 15:28:00 +02:00
} else if ( o - > type = = OBJ_LIST ) {
2016-08-09 16:41:40 +02:00
if ( rewriteListObject ( aof , & key , o ) = = 0 ) goto werr ;
2015-07-26 15:28:00 +02:00
} else if ( o - > type = = OBJ_SET ) {
2016-08-09 16:41:40 +02:00
if ( rewriteSetObject ( aof , & key , o ) = = 0 ) goto werr ;
2015-07-26 15:28:00 +02:00
} else if ( o - > type = = OBJ_ZSET ) {
2016-08-09 16:41:40 +02:00
if ( rewriteSortedSetObject ( aof , & key , o ) = = 0 ) goto werr ;
2015-07-26 15:28:00 +02:00
} else if ( o - > type = = OBJ_HASH ) {
2016-08-09 16:41:40 +02:00
if ( rewriteHashObject ( aof , & key , o ) = = 0 ) goto werr ;
2017-09-15 12:37:04 +02:00
} else if ( o - > type = = OBJ_STREAM ) {
if ( rewriteStreamObject ( aof , & key , o ) = = 0 ) goto werr ;
2016-05-18 11:45:40 +02:00
} else if ( o - > type = = OBJ_MODULE ) {
2021-06-16 14:45:49 +08:00
if ( rewriteModuleObject ( aof , & key , o , j ) = = 0 ) goto werr ;
2010-06-22 00:07:48 +02:00
} else {
2015-07-27 09:41:48 +02:00
serverPanic ( " Unknown object type " ) ;
2010-06-22 00:07:48 +02:00
}
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.
But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.
## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.
Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.
There are some buffers we can release in the fork child process:
- **Serialized key-values**
the fork child process never access serialized key-values, so we try to free them.
Because we only can release big bulk memory, and it is time consumed to iterate all
items/members/fields/entries of complex data type. So we decide to iterate them and
try to release them only when their average size of item/member/field/entry is more
than page size of OS.
- **Replication backlog**
Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
If clients have requests during having the fork child process, clients' buffer also be changed
frequently. The memory includes client query buffer, output buffer, and client struct used memory.
To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 04:01:46 +08:00
/* In fork child process, we can try to release memory back to the
2022-05-29 18:33:16 +08:00
* OS and possibly avoid or decrease COW . We give the dismiss
Use madvise(MADV_DONTNEED) to release memory to reduce COW (#8974)
## Backgroud
As we know, after `fork`, one process will copy pages when writing data to these
pages(CoW), and another process still keep old pages, they totally cost more memory.
For redis, we suffered that redis consumed much memory when the fork child is serializing
key/values, even that maybe cause OOM.
But actually we find, in redis fork child process, the child process don't need to keep some
memory and parent process may write or update that, for example, child process will never
access the key-value that is serialized but users may update it in parent process.
So we think it may reduce COW if the child process release memory that it is not needed.
## Implementation
For releasing key value in child process, we may think we call `decrRefCount` to free memory,
but i find the fork child process still use much memory when we don't write any data to redis,
and it costs much more time that slows down bgsave. Maybe because memory allocator doesn't
really release memory to OS, and it may modify some inner data for this free operation, especially
when we free small objects.
Moreover, CoW is based on pages, so it is a easy way that we only free the memory bulk that is
not less than kernel page size. madvise(MADV_DONTNEED) can quickly release specified region
pages to OS bypassing memory allocator, and allocator still consider that this memory still is used
and don't change its inner data.
There are some buffers we can release in the fork child process:
- **Serialized key-values**
the fork child process never access serialized key-values, so we try to free them.
Because we only can release big bulk memory, and it is time consumed to iterate all
items/members/fields/entries of complex data type. So we decide to iterate them and
try to release them only when their average size of item/member/field/entry is more
than page size of OS.
- **Replication backlog**
Because replication backlog is a cycle buffer, it will be changed quickly if redis has heavy
write traffic, but in fork child process, we don't need to access that.
- **Client buffers**
If clients have requests during having the fork child process, clients' buffer also be changed
frequently. The memory includes client query buffer, output buffer, and client struct used memory.
To get child process peak private dirty memory, we need to count peak memory instead
of last used memory, because the child process may continue to release memory (since
COW used to only grow till now, the last was equivalent to the peak).
Also we're adding a new `current_cow_peak` info variable (to complement the existing
`current_cow_size`)
Co-authored-by: Oran Agra <oran@redislabs.com>
2021-08-05 04:01:46 +08:00
* mechanism a hint about an estimated size of the object we stored . */
size_t dump_size = aof - > processed_bytes - aof_bytes_before_key ;
if ( server . in_fork_child ) dismissObject ( o , dump_size ) ;
2010-06-22 00:07:48 +02:00
/* Save the expire time */
if ( expiretime ! = - 1 ) {
2011-11-10 17:52:02 +01:00
char cmd [ ] = " *3 \r \n $9 \r \n PEXPIREAT \r \n " ;
2016-08-09 16:41:40 +02:00
if ( rioWrite ( aof , cmd , sizeof ( cmd ) - 1 ) = = 0 ) goto werr ;
if ( rioWriteBulkObject ( aof , & key ) = = 0 ) goto werr ;
if ( rioWriteBulkLongLong ( aof , expiretime ) = = 0 ) goto werr ;
2010-06-22 00:07:48 +02:00
}
2020-12-20 20:23:20 +02:00
2021-02-16 16:06:51 +02:00
/* Update info every 1 second (approximately).
2020-12-20 20:23:20 +02:00
* in order to avoid calling mstime ( ) on each iteration , we will
* check the diff every 1024 keys */
2021-02-16 16:06:51 +02:00
if ( ( key_count + + & 1023 ) = = 0 ) {
2020-12-20 20:23:20 +02:00
long long now = mstime ( ) ;
2021-02-16 16:06:51 +02:00
if ( now - updated_time > = 1000 ) {
sendChildInfo ( CHILD_INFO_TYPE_CURRENT_INFO , key_count , " AOF rewrite " ) ;
updated_time = now ;
2020-12-20 20:23:20 +02:00
}
}
2022-01-02 09:39:01 +02:00
/* Delay before next key if required (for testing) */
if ( server . rdb_key_save_delay )
debugDelay ( server . rdb_key_save_delay ) ;
2010-06-22 00:07:48 +02:00
}
dictReleaseIterator ( di ) ;
2015-01-21 16:39:38 +01:00
di = NULL ;
2010-06-22 00:07:48 +02:00
}
2016-08-09 11:07:32 +02:00
return C_OK ;
werr :
if ( di ) dictReleaseIterator ( di ) ;
return C_ERR ;
}
/* Write a sequence of commands able to fully rebuild the dataset into
* " filename " . Used both by REWRITEAOF and BGREWRITEAOF .
*
* In order to minimize the number of commands needed in the rewritten
* log Redis uses variadic commands when possible , such as RPUSH , SADD
* and ZADD . However at max AOF_REWRITE_ITEMS_PER_CMD items per time
* are inserted using a single command . */
int rewriteAppendOnlyFile ( char * filename ) {
rio aof ;
2020-10-19 08:32:18 -04:00
FILE * fp = NULL ;
2016-08-09 11:07:32 +02:00
char tmpfile [ 256 ] ;
/* Note that we have to use a different temp name here compared to the
* one used by rewriteAppendOnlyFileBackground ( ) function . */
snprintf ( tmpfile , 256 , " temp-rewriteaof-%d.aof " , ( int ) getpid ( ) ) ;
fp = fopen ( tmpfile , " w " ) ;
if ( ! fp ) {
serverLog ( LL_WARNING , " Opening the temp file for AOF rewrite in rewriteAppendOnlyFile(): %s " , strerror ( errno ) ) ;
return C_ERR ;
}
rioInitWithFile ( & aof , fp ) ;
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service.
Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation.
# What the PR does
The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way.
Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false.
# Something deserve noting
1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0.
2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache.
# About test
A unit test is added to verify the effect of `posix_fadvise`.
In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
2023-02-12 15:23:29 +08:00
if ( server . aof_rewrite_incremental_fsync ) {
2018-03-16 00:44:50 +08:00
rioSetAutoSync ( & aof , REDIS_AUTOSYNC_BYTES ) ;
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service.
Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation.
# What the PR does
The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way.
Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false.
# Something deserve noting
1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0.
2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache.
# About test
A unit test is added to verify the effect of `posix_fadvise`.
In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
2023-02-12 15:23:29 +08:00
rioSetReclaimCache ( & aof , 1 ) ;
}
2016-08-09 11:07:32 +02:00
2019-10-29 17:59:09 +02:00
startSaving ( RDBFLAGS_AOF_PREAMBLE ) ;
2016-08-09 16:41:40 +02:00
if ( server . aof_use_rdb_preamble ) {
2016-08-09 11:07:32 +02:00
int error ;
2022-01-02 09:39:01 +02:00
if ( rdbSaveRio ( SLAVE_REQ_NONE , & aof , & error , RDBFLAGS_AOF_PREAMBLE , NULL ) = = C_ERR ) {
2016-08-09 11:07:32 +02:00
errno = error ;
goto werr ;
}
} else {
2016-08-09 16:41:40 +02:00
if ( rewriteAppendOnlyFileRio ( & aof ) = = C_ERR ) goto werr ;
2016-08-09 11:07:32 +02:00
}
2010-06-22 00:07:48 +02:00
/* Make sure data will not remain on the OS's output buffers */
2020-10-19 08:32:18 -04:00
if ( fflush ( fp ) ) goto werr ;
if ( fsync ( fileno ( fp ) ) ) goto werr ;
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service.
Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation.
# What the PR does
The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way.
Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false.
# Something deserve noting
1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0.
2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache.
# About test
A unit test is added to verify the effect of `posix_fadvise`.
In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
2023-02-12 15:23:29 +08:00
if ( reclaimFilePageCache ( fileno ( fp ) , 0 , 0 ) = = - 1 ) {
/* A minor error. Just log to know what happens */
serverLog ( LL_NOTICE , " Unable to reclaim page cache: %s " , strerror ( errno ) ) ;
}
2020-10-19 08:32:18 -04:00
if ( fclose ( fp ) ) { fp = NULL ; goto werr ; }
fp = NULL ;
2010-06-22 00:07:48 +02:00
/* Use RENAME to make sure the DB file is changed atomically only
* if the generate DB file is ok . */
if ( rename ( tmpfile , filename ) = = - 1 ) {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Error moving temp append only file on the final destination: %s " , strerror ( errno ) ) ;
2010-06-22 00:07:48 +02:00
unlink ( tmpfile ) ;
2019-10-29 17:59:09 +02:00
stopSaving ( 0 ) ;
2015-07-26 23:17:55 +02:00
return C_ERR ;
2010-06-22 00:07:48 +02:00
}
2019-10-29 17:59:09 +02:00
stopSaving ( 1 ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
2015-07-26 23:17:55 +02:00
return C_OK ;
2010-06-22 00:07:48 +02:00
werr :
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING , " Write error writing append only file on disk: %s " , strerror ( errno ) ) ;
2020-10-19 08:32:18 -04:00
if ( fp ) fclose ( fp ) ;
2010-06-22 00:07:48 +02:00
unlink ( tmpfile ) ;
2019-10-29 17:59:09 +02:00
stopSaving ( 0 ) ;
2015-07-26 23:17:55 +02:00
return C_ERR ;
2010-06-22 00:07:48 +02:00
}
2014-07-04 15:22:49 +02:00
/* ----------------------------------------------------------------------------
2014-07-12 18:02:17 +02:00
* AOF background rewrite
2014-07-04 15:22:49 +02:00
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
2010-06-22 00:07:48 +02:00
/* This is how rewriting of the append only file in background works:
*
* 1 ) The user calls BGREWRITEAOF
* 2 ) Redis calls this function , that forks ( ) :
* 2 a ) the child rewrite the append only file in a temp file .
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* 2 b ) the parent open a new INCR AOF file to continue writing .
2010-06-22 00:07:48 +02:00
* 3 ) When the child finished ' 2 a ' exists .
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
* 4 ) The parent will trap the exit code , if it ' s OK , it will :
* 4 a ) get a new BASE file name and mark the previous ( if we have ) as the HISTORY type
* 4 b ) rename ( 2 ) the temp file in new BASE file name
* 4 c ) mark the rewritten INCR AOFs as history type
* 4 d ) persist AOF manifest file
* 4 e ) Delete the history files use bio
2010-06-22 00:07:48 +02:00
*/
int rewriteAppendOnlyFileBackground ( void ) {
pid_t childpid ;
2019-09-27 12:03:09 +02:00
if ( hasActiveChildProcess ( ) ) return C_ERR ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( dirCreateIfMissing ( server . aof_dirname ) = = - 1 ) {
serverLog ( LL_WARNING , " Can't open or create append-only dir %s: %s " ,
server . aof_dirname , strerror ( errno ) ) ;
2022-06-23 23:40:20 +08:00
server . aof_lastbgrewrite_status = C_ERR ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
return C_ERR ;
}
/* We set aof_selected_db to -1 in order to force the next call to the
2022-01-25 17:37:18 +08:00
* feedAppendOnlyFile ( ) to issue a SELECT command . */
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
server . aof_selected_db = - 1 ;
flushAppendOnlyFile ( 1 ) ;
2022-06-23 23:40:20 +08:00
if ( openNewIncrAofForAppend ( ) ! = C_OK ) {
server . aof_lastbgrewrite_status = C_ERR ;
return C_ERR ;
}
2023-09-28 16:05:53 +02:00
if ( server . aof_state = = AOF_WAIT_REWRITE ) {
/* Wait for all bio jobs related to AOF to drain. This prevents a race
* between updates to ` fsynced_reploff_pending ` of the worker thread , belonging
* to the previous AOF , and the new one . This concern is specific for a full
* sync scenario where we don ' t wanna risk the ACKed replication offset
* jumping backwards or forward when switching to a different master . */
bioDrainWorker ( BIO_AOF_FSYNC ) ;
/* Set the initial repl_offset, which will be applied to fsynced_reploff
* when AOFRW finishes ( after possibly being updated by a bio thread ) */
atomicSet ( server . fsynced_reploff_pending , server . master_repl_offset ) ;
server . fsynced_reploff = 0 ;
}
2022-02-17 14:32:48 +02:00
server . stat_aof_rewrites + + ;
2023-09-28 16:05:53 +02:00
2020-09-20 13:43:28 +03:00
if ( ( childpid = redisFork ( CHILD_TYPE_AOF ) ) = = 0 ) {
2010-06-22 00:07:48 +02:00
char tmpfile [ 256 ] ;
2011-05-29 15:17:29 +02:00
/* Child */
2013-02-26 11:52:12 +01:00
redisSetProcTitle ( " redis-aof-rewrite " ) ;
Support setcpuaffinity on linux/bsd
Currently, there are several types of threads/child processes of a
redis server. Sometimes we need deeply optimise the performance of
redis, so we would like to isolate threads/processes.
There were some discussion about cpu affinity cases in the issue:
https://github.com/antirez/redis/issues/2863
So implement cpu affinity setting by redis.conf in this patch, then
we can config server_cpulist/bio_cpulist/aof_rewrite_cpulist/
bgsave_cpulist by cpu list.
Examples of cpulist in redis.conf:
server_cpulist 0-7:2 means cpu affinity 0,2,4,6
bio_cpulist 1,3 means cpu affinity 1,3
aof_rewrite_cpulist 8-11 means cpu affinity 8,9,10,11
bgsave_cpulist 1,10-11 means cpu affinity 1,10,11
Test on linux/freebsd, both work fine.
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2020-05-02 20:05:39 +08:00
redisSetCpuAffinity ( server . aof_rewrite_cpulist ) ;
2010-06-22 00:07:48 +02:00
snprintf ( tmpfile , 256 , " temp-rewriteaof-bg-%d.aof " , ( int ) getpid ( ) ) ;
2015-07-26 23:17:55 +02:00
if ( rewriteAppendOnlyFile ( tmpfile ) = = C_OK ) {
2022-05-26 22:23:05 +08:00
serverLog ( LL_NOTICE ,
" Successfully created the temporary AOF base file %s " , tmpfile ) ;
2021-02-16 16:06:51 +02:00
sendChildCowInfo ( CHILD_INFO_TYPE_AOF_COW_SIZE , " AOF rewrite " ) ;
2012-04-07 12:11:23 +02:00
exitFromChild ( 0 ) ;
2010-06-22 00:07:48 +02:00
} else {
2012-04-07 12:11:23 +02:00
exitFromChild ( 1 ) ;
2010-06-22 00:07:48 +02:00
}
} else {
/* Parent */
if ( childpid = = - 1 ) {
2021-11-16 19:59:03 +08:00
server . aof_lastbgrewrite_status = C_ERR ;
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING ,
2010-06-22 00:07:48 +02:00
" Can't rewrite append only file in background: fork: %s " ,
strerror ( errno ) ) ;
2015-07-26 23:17:55 +02:00
return C_ERR ;
2010-06-22 00:07:48 +02:00
}
2015-07-27 09:41:48 +02:00
serverLog ( LL_NOTICE ,
2020-12-13 17:09:54 +02:00
" Background append only file rewriting started by pid %ld " , ( long ) childpid ) ;
2011-12-21 11:58:42 +01:00
server . aof_rewrite_scheduled = 0 ;
2012-05-25 12:11:30 +02:00
server . aof_rewrite_time_start = time ( NULL ) ;
2015-07-26 23:17:55 +02:00
return C_OK ;
2010-06-22 00:07:48 +02:00
}
2015-07-26 23:17:55 +02:00
return C_OK ; /* unreached */
2010-06-22 00:07:48 +02:00
}
2015-07-26 15:20:46 +02:00
void bgrewriteaofCommand ( client * c ) {
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
if ( server . child_type = = CHILD_TYPE_AOF ) {
2010-09-02 19:52:24 +02:00
addReplyError ( c , " Background append only file rewriting already in progress " ) ;
2022-01-04 12:37:47 +01:00
} else if ( hasActiveChildProcess ( ) | | server . in_exec ) {
2011-12-21 11:58:42 +01:00
server . aof_rewrite_scheduled = 1 ;
2022-04-26 21:31:19 +08:00
/* When manually triggering AOFRW we reset the count
* so that it can be executed immediately . */
server . stat_aofrw_consecutive_failures = 0 ;
2011-06-10 18:35:16 +02:00
addReplyStatus ( c , " Background append only file rewriting scheduled " ) ;
2015-07-26 23:17:55 +02:00
} else if ( rewriteAppendOnlyFileBackground ( ) = = C_OK ) {
2010-09-02 19:52:24 +02:00
addReplyStatus ( c , " Background append only file rewriting started " ) ;
2010-06-22 00:07:48 +02:00
} else {
2019-09-26 16:14:21 +02:00
addReplyError ( c , " Can't execute an AOF background rewriting. "
" Please check the server logs for more information. " ) ;
2010-06-22 00:07:48 +02:00
}
}
void aofRemoveTempFile ( pid_t childpid ) {
char tmpfile [ 256 ] ;
snprintf ( tmpfile , 256 , " temp-rewriteaof-bg-%d.aof " , ( int ) childpid ) ;
2020-10-13 14:34:07 +08:00
bg_unlink ( tmpfile ) ;
2019-03-12 20:46:40 +08:00
snprintf ( tmpfile , 256 , " temp-rewriteaof-%d.aof " , ( int ) childpid ) ;
2020-10-13 14:34:07 +08:00
bg_unlink ( tmpfile ) ;
2010-06-22 00:07:48 +02:00
}
2022-02-22 08:59:23 +02:00
/* Get size of an AOF file.
* The status argument is an optional output argument to be filled with
* one of the AOF_ status values . */
off_t getAppendOnlyFileSize ( sds filename , int * status ) {
2011-06-10 12:39:23 +02:00
struct redis_stat sb ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
off_t size ;
2014-07-01 17:19:08 +02:00
mstime_t latency ;
2011-06-10 12:39:23 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
sds aof_filepath = makePath ( server . aof_dirname , filename ) ;
2014-07-01 17:19:08 +02:00
latencyStartMonitor ( latency ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( redis_stat ( aof_filepath , & sb ) = = - 1 ) {
2022-02-22 08:59:23 +02:00
if ( status ) * status = errno = = ENOENT ? AOF_NOT_EXIST : AOF_OPEN_ERR ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverLog ( LL_WARNING , " Unable to obtain the AOF file %s length. stat: %s " ,
filename , strerror ( errno ) ) ;
size = 0 ;
2011-06-10 12:39:23 +02:00
} else {
2022-02-22 08:59:23 +02:00
if ( status ) * status = AOF_OK ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
size = sb . st_size ;
2011-06-10 12:39:23 +02:00
}
2014-07-01 17:19:08 +02:00
latencyEndMonitor ( latency ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
latencyAddSampleIfNeeded ( " aof-fstat " , latency ) ;
sdsfree ( aof_filepath ) ;
return size ;
}
2022-02-22 08:59:23 +02:00
/* Get size of all AOF files referred by the manifest (excluding history).
* The status argument is an output argument to be filled with
* one of the AOF_ status values . */
off_t getBaseAndIncrAppendOnlyFilesSize ( aofManifest * am , int * status ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
off_t size = 0 ;
listNode * ln ;
listIter li ;
if ( am - > base_aof_info ) {
serverAssert ( am - > base_aof_info - > file_type = = AOF_FILE_TYPE_BASE ) ;
2022-02-22 08:59:23 +02:00
size + = getAppendOnlyFileSize ( am - > base_aof_info - > file_name , status ) ;
if ( * status ! = AOF_OK ) return 0 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
listRewind ( am - > incr_aof_list , & li ) ;
while ( ( ln = listNext ( & li ) ) ! = NULL ) {
aofInfo * ai = ( aofInfo * ) ln - > value ;
serverAssert ( ai - > file_type = = AOF_FILE_TYPE_INCR ) ;
2022-02-22 08:59:23 +02:00
size + = getAppendOnlyFileSize ( ai - > file_name , status ) ;
if ( * status ! = AOF_OK ) return 0 ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
}
return size ;
}
int getBaseAndIncrAppendOnlyFilesNum ( aofManifest * am ) {
int num = 0 ;
if ( am - > base_aof_info ) num + + ;
if ( am - > incr_aof_list ) num + = listLength ( am - > incr_aof_list ) ;
return num ;
2011-06-10 12:39:23 +02:00
}
2010-06-22 00:07:48 +02:00
/* A background append only file rewriting (BGREWRITEAOF) terminated its work.
* Handle this . */
2011-01-07 18:15:14 +01:00
void backgroundRewriteDoneHandler ( int exitcode , int bysignal ) {
2010-06-22 00:07:48 +02:00
if ( ! bysignal & & exitcode = = 0 ) {
char tmpfile [ 256 ] ;
2011-08-18 15:49:06 +02:00
long long now = ustime ( ) ;
2022-04-27 13:07:52 +08:00
sds new_base_filepath = NULL ;
sds new_incr_filepath = NULL ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
aofManifest * temp_am ;
2014-07-01 17:19:08 +02:00
mstime_t latency ;
2010-06-22 00:07:48 +02:00
2015-07-27 09:41:48 +02:00
serverLog ( LL_NOTICE ,
2011-08-18 15:49:06 +02:00
" Background AOF rewrite terminated with success " ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
snprintf ( tmpfile , 256 , " temp-rewriteaof-bg-%d.aof " ,
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
( int ) server . child_pid ) ;
2011-08-18 15:49:06 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverAssert ( server . aof_manifest ! = NULL ) ;
2011-08-18 15:49:06 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Dup a temporary aof_manifest for subsequent modifications. */
temp_am = aofManifestDup ( server . aof_manifest ) ;
/* Get a new BASE file name and mark the previous (if we have)
* as the HISTORY type . */
2022-04-27 13:07:52 +08:00
sds new_base_filename = getNewBaseFileNameAndMarkPreAsHistory ( temp_am ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
serverAssert ( new_base_filename ! = NULL ) ;
2022-04-27 13:07:52 +08:00
new_base_filepath = makePath ( server . aof_dirname , new_base_filename ) ;
2011-08-18 15:49:06 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Rename the temporary aof file to 'new_base_filename'. */
2014-07-01 17:19:08 +02:00
latencyStartMonitor ( latency ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
if ( rename ( tmpfile , new_base_filepath ) = = - 1 ) {
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING ,
2022-05-26 22:23:05 +08:00
" Error trying to rename the temporary AOF base file %s into %s: %s " ,
2016-02-15 16:14:56 +01:00
tmpfile ,
2022-04-26 21:31:19 +08:00
new_base_filepath ,
2016-02-15 16:14:56 +01:00
strerror ( errno ) ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
aofManifestFree ( temp_am ) ;
sdsfree ( new_base_filepath ) ;
2022-07-03 20:36:42 +08:00
server . aof_lastbgrewrite_status = C_ERR ;
server . stat_aofrw_consecutive_failures + + ;
2010-06-22 00:07:48 +02:00
goto cleanup ;
}
2014-07-01 17:19:08 +02:00
latencyEndMonitor ( latency ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
latencyAddSampleIfNeeded ( " aof-rename " , latency ) ;
2022-05-26 22:23:05 +08:00
serverLog ( LL_NOTICE ,
" Successfully renamed the temporary AOF base file %s into %s " , tmpfile , new_base_filename ) ;
2011-08-18 15:49:06 +02:00
2022-04-26 21:31:19 +08:00
/* Rename the temporary incr aof file to 'new_incr_filename'. */
if ( server . aof_state = = AOF_WAIT_REWRITE ) {
/* Get temporary incr aof name. */
sds temp_incr_aof_name = getTempIncrAofName ( ) ;
sds temp_incr_filepath = makePath ( server . aof_dirname , temp_incr_aof_name ) ;
/* Get next new incr aof name. */
sds new_incr_filename = getNewIncrAofName ( temp_am ) ;
2022-04-27 13:07:52 +08:00
new_incr_filepath = makePath ( server . aof_dirname , new_incr_filename ) ;
2022-04-26 21:31:19 +08:00
latencyStartMonitor ( latency ) ;
if ( rename ( temp_incr_filepath , new_incr_filepath ) = = - 1 ) {
serverLog ( LL_WARNING ,
2022-05-26 22:23:05 +08:00
" Error trying to rename the temporary AOF incr file %s into %s: %s " ,
2022-04-26 21:31:19 +08:00
temp_incr_filepath ,
new_incr_filepath ,
strerror ( errno ) ) ;
bg_unlink ( new_base_filepath ) ;
sdsfree ( new_base_filepath ) ;
aofManifestFree ( temp_am ) ;
sdsfree ( temp_incr_filepath ) ;
sdsfree ( new_incr_filepath ) ;
2022-05-26 22:23:05 +08:00
sdsfree ( temp_incr_aof_name ) ;
2022-07-03 20:36:42 +08:00
server . aof_lastbgrewrite_status = C_ERR ;
server . stat_aofrw_consecutive_failures + + ;
2022-04-26 21:31:19 +08:00
goto cleanup ;
}
latencyEndMonitor ( latency ) ;
latencyAddSampleIfNeeded ( " aof-rename " , latency ) ;
2022-05-26 22:23:05 +08:00
serverLog ( LL_NOTICE ,
" Successfully renamed the temporary AOF incr file %s into %s " , temp_incr_aof_name , new_incr_filename ) ;
2022-04-26 21:31:19 +08:00
sdsfree ( temp_incr_filepath ) ;
2022-05-26 22:23:05 +08:00
sdsfree ( temp_incr_aof_name ) ;
2022-04-26 21:31:19 +08:00
}
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* Change the AOF file type in 'incr_aof_list' from AOF_FILE_TYPE_INCR
* to AOF_FILE_TYPE_HIST , and move them to the ' history_aof_list ' . */
markRewrittenIncrAofAsHistory ( temp_am ) ;
/* Persist our modifications. */
if ( persistAofManifest ( temp_am ) = = C_ERR ) {
bg_unlink ( new_base_filepath ) ;
aofManifestFree ( temp_am ) ;
sdsfree ( new_base_filepath ) ;
2022-04-27 13:07:52 +08:00
if ( new_incr_filepath ) {
bg_unlink ( new_incr_filepath ) ;
sdsfree ( new_incr_filepath ) ;
}
2022-07-03 20:36:42 +08:00
server . aof_lastbgrewrite_status = C_ERR ;
server . stat_aofrw_consecutive_failures + + ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
goto cleanup ;
}
sdsfree ( new_base_filepath ) ;
2022-04-27 13:07:52 +08:00
if ( new_incr_filepath ) sdsfree ( new_incr_filepath ) ;
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* We can safely let `server.aof_manifest` point to 'temp_am' and free the previous one. */
aofManifestFreeAndUpdate ( temp_am ) ;
Fix fork done handler wrongly update fsync metrics and enhance AOF_ FSYNC_ALWAYS (#11973)
This PR fix several unrelated bugs that were discovered by the same set of tests
(WAITAOF tests in #11713), could make the `WAITAOF` test hang.
The change in `backgroundRewriteDoneHandler` is about MP-AOF.
That leftover / old code assumes that we started a new AOF file just now
(when we have a new base into which we're gonna incrementally write), but
the fact is that with MP-AOF, the fork done handler doesn't really affect the
incremental file being maintained by the parent process, there's no reason to
re-issue `SELECT`, and no reason to update any of the fsync variables in that flow.
This should have been deleted with MP-AOF (introduced in #9788, 7.0).
The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync
in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC`
while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever.
Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset`
and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared
`aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be
smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced.
The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053
(the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where
appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
2023-03-29 20:17:05 +08:00
if ( server . aof_state ! = AOF_OFF ) {
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* AOF enabled. */
2022-02-22 08:59:23 +02:00
server . aof_current_size = getAppendOnlyFileSize ( new_base_filename , NULL ) + server . aof_last_incr_size ;
2011-12-21 11:58:42 +01:00
server . aof_rewrite_base_size = server . aof_current_size ;
2010-06-22 00:07:48 +02:00
}
2011-08-18 15:49:06 +02:00
Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)
Implement Multi-Part AOF mechanism to avoid overheads during AOFRW.
Introducing a folder with multiple AOF files tracked by a manifest file.
The main issues with the the original AOFRW mechanism are:
* buffering of commands that are processed during rewrite (consuming a lot of RAM)
* freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it.
* double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files)
The main modifications of this PR:
1. Remove the AOF rewrite buffer and related code.
2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type,
it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only
one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the
incremental commands since the last AOFRW.
3. Use a AOF manifest file to record and manage these AOF files mentioned above.
4. The original configuration of `appendfilename` will be the base part of the new file name, for example:
`appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof`
5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename`
6. Remove the `aof_rewrite_buffer_length` field in info.
7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs.
It also gives users the opportunity to preserve the history AOFs. just for testing use now.
8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now),
we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be
delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit
period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately.
9. Support upgrade (load) data from old version redis.
10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and
manifest file will be placed in this directory.
11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if
`aof-load-truncated` is enabled.
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-01-04 01:14:13 +08:00
/* We don't care about the return value of `aofDelHistoryFiles`, because the history
* deletion failure will not cause any problems . */
aofDelHistoryFiles ( ) ;
2015-07-26 23:17:55 +02:00
server . aof_lastbgrewrite_status = C_OK ;
2022-04-26 21:31:19 +08:00
server . stat_aofrw_consecutive_failures = 0 ;
2012-07-17 12:06:53 +10:00
2015-07-27 09:41:48 +02:00
serverLog ( LL_NOTICE , " Background AOF rewrite finished successfully " ) ;
2011-12-21 10:31:34 +01:00
/* Change state from WAIT_REWRITE to ON if needed */
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
if ( server . aof_state = = AOF_WAIT_REWRITE ) {
2015-07-27 09:41:48 +02:00
server . aof_state = AOF_ON ;
2011-08-18 15:49:06 +02:00
Implementing the WAITAOF command (issue #10505) (#11713)
Implementing the WAITAOF functionality which would allow the user to
block until a specified number of Redises have fsynced all previous write
commands to the AOF.
Syntax: `WAITAOF <num_local> <num_replicas> <timeout>`
Response: Array containing two elements: num_local, num_replicas
num_local is always either 0 or 1 representing the local AOF on the master.
num_replicas is the number of replicas that acknowledged the a replication
offset of the last write being fsynced to the AOF.
Returns an error when called on replicas, or when called with non-zero
num_local on a master with AOF disabled, in all other cases the response
just contains number of fsync copies.
Main changes:
* Added code to keep track of replication offsets that are confirmed to have
been fsynced to disk.
* Keep advancing master_repl_offset even when replication is disabled (and
there's no replication backlog, only if there's an AOF enabled).
This way we can use this command and it's mechanisms even when replication
is disabled.
* Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK
will be appended only if there's an AOF on the replica, and already ignored on
old masters (thus backwards compatible)
* WAIT now no longer wait for the replication offset after your last command, but
rather the replication offset after your last write (or read command that caused
propagation, e.g. lazy expiry).
Unrelated changes:
* WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI)
Implementation details:
* Add an atomic var named `fsynced_reploff_pending` that's updated
(usually by the bio thread) and later copied to the main `fsynced_reploff`
variable (only if the AOF base file exists).
I.e. during the initial AOF rewrite it will not be used as the fsynced offset
since the AOF base is still missing.
* Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific)
job that will also update fsync offset the field.
* Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio
worker thread, to impose ordering on their execution. This solves a
race condition where a job could set `fsynced_reploff_pending` to a higher
value than another pending fsync job, resulting in indicating an offset
for which parts of the data have not yet actually been fsynced.
Imposing an ordering on the jobs guarantees that fsync jobs are executed
in increasing order of replication offset.
* Drain bio jobs when switching `appendfsync` to "always"
This should prevent a write race between updates to `fsynced_reploff_pending`
in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and
those done in the bio thread.
* Drain the pending fsync when starting over a new AOF to avoid race conditions
with the previous AOF offsets overriding the new one (e.g. after switching to
replicate from a new master).
* Make sure to update the fsynced offset at the end of the initial AOF rewrite.
a must in case there are no additional writes that trigger a periodic fsync,
specifically for a replica that does a full sync.
Limitations:
It is possible to write a module and a Lua script that propagate to the AOF and doesn't
propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand.
These features are incompatible with the WAITAOF command, and can result
in two bad cases. The scenario is that the user executes command that only
propagates to AOF, and then immediately
issues a WAITAOF, and there's no further writes on the replication stream after that.
1. if the the last thing that happened on the replication stream is a PING
(which increased the replication offset but won't trigger an fsync on the replica),
then the client would hang forever (will wait for an fack that the replica will never
send sine it doesn't trigger any fsyncs).
2. if the last thing that happened is a write command that got propagated properly,
then WAITAOF will be released immediately, without waiting for an fsync (since
the offset didn't change)
Refactoring:
* Plumbing to allow bio worker to handle multiple job types
This introduces infrastructure necessary to allow BIO workers to
not have a 1-1 mapping of worker to job-type. This allows in the
future to assign multiple job types to a single worker, either as
a performance/resource optimization, or as a way of enforcing
ordering between specific classes of jobs.
Co-authored-by: Oran Agra <oran@redislabs.com>
2023-03-14 20:26:21 +02:00
/* Update the fsynced replication offset that just now become valid.
* This could either be the one we took in startAppendOnly , or a
* newer one set by the bio thread . */
long long fsynced_reploff_pending ;
atomicGet ( server . fsynced_reploff_pending , fsynced_reploff_pending ) ;
server . fsynced_reploff = fsynced_reploff_pending ;
}
2015-07-27 09:41:48 +02:00
serverLog ( LL_VERBOSE ,
2011-08-18 15:49:06 +02:00
" Background AOF rewrite signal handler took %lldus " , ustime ( ) - now ) ;
2010-06-22 00:07:48 +02:00
} else if ( ! bysignal & & exitcode ! = 0 ) {
2020-01-17 11:46:19 +08:00
server . aof_lastbgrewrite_status = C_ERR ;
2022-04-26 21:31:19 +08:00
server . stat_aofrw_consecutive_failures + + ;
2020-01-17 11:46:19 +08:00
serverLog ( LL_WARNING ,
" Background AOF rewrite terminated with error " ) ;
} else {
2015-11-13 09:32:07 +01:00
/* SIGUSR1 is whitelisted, so we have a way to kill a child without
Squash merging 125 typo/grammar/comment/doc PRs (#7773)
List of squashed commits or PRs
===============================
commit 66801ea
Author: hwware <wen.hui.ware@gmail.com>
Date: Mon Jan 13 00:54:31 2020 -0500
typo fix in acl.c
commit 46f55db
Author: Itamar Haber <itamar@redislabs.com>
Date: Sun Sep 6 18:24:11 2020 +0300
Updates a couple of comments
Specifically:
* RM_AutoMemory completed instead of pointing to docs
* Updated link to custom type doc
commit 61a2aa0
Author: xindoo <xindoo@qq.com>
Date: Tue Sep 1 19:24:59 2020 +0800
Correct errors in code comments
commit a5871d1
Author: yz1509 <pro-756@qq.com>
Date: Tue Sep 1 18:36:06 2020 +0800
fix typos in module.c
commit 41eede7
Author: bookug <bookug@qq.com>
Date: Sat Aug 15 01:11:33 2020 +0800
docs: fix typos in comments
commit c303c84
Author: lazy-snail <ws.niu@outlook.com>
Date: Fri Aug 7 11:15:44 2020 +0800
fix spelling in redis.conf
commit 1eb76bf
Author: zhujian <zhujianxyz@gmail.com>
Date: Thu Aug 6 15:22:10 2020 +0800
add a missing 'n' in comment
commit 1530ec2
Author: Daniel Dai <764122422@qq.com>
Date: Mon Jul 27 00:46:35 2020 -0400
fix spelling in tracking.c
commit e517b31
Author: Hunter-Chen <huntcool001@gmail.com>
Date: Fri Jul 17 22:33:32 2020 +0800
Update redis.conf
Co-authored-by: Itamar Haber <itamar@redislabs.com>
commit c300eff
Author: Hunter-Chen <huntcool001@gmail.com>
Date: Fri Jul 17 22:33:23 2020 +0800
Update redis.conf
Co-authored-by: Itamar Haber <itamar@redislabs.com>
commit 4c058a8
Author: 陈浩鹏 <chenhaopeng@heytea.com>
Date: Thu Jun 25 19:00:56 2020 +0800
Grammar fix and clarification
commit 5fcaa81
Author: bodong.ybd <bodong.ybd@alibaba-inc.com>
Date: Fri Jun 19 10:09:00 2020 +0800
Fix typos
commit 4caca9a
Author: Pruthvi P <pruthvi@ixigo.com>
Date: Fri May 22 00:33:22 2020 +0530
Fix typo eviciton => eviction
commit b2a25f6
Author: Brad Dunbar <dunbarb2@gmail.com>
Date: Sun May 17 12:39:59 2020 -0400
Fix a typo.
commit 12842ae
Author: hwware <wen.hui.ware@gmail.com>
Date: Sun May 3 17:16:59 2020 -0400
fix spelling in redis conf
commit ddba07c
Author: Chris Lamb <chris@chris-lamb.co.uk>
Date: Sat May 2 23:25:34 2020 +0100
Correct a "conflicts" spelling error.
commit 8fc7bf2
Author: Nao YONASHIRO <yonashiro@r.recruit.co.jp>
Date: Thu Apr 30 10:25:27 2020 +0900
docs: fix EXPIRE_FAST_CYCLE_DURATION to ACTIVE_EXPIRE_CYCLE_FAST_DURATION
commit 9b2b67a
Author: Brad Dunbar <dunbarb2@gmail.com>
Date: Fri Apr 24 11:46:22 2020 -0400
Fix a typo.
commit 0746f10
Author: devilinrust <63737265+devilinrust@users.noreply.github.com>
Date: Thu Apr 16 00:17:53 2020 +0200
Fix typos in server.c
commit 92b588d
Author: benjessop12 <56115861+benjessop12@users.noreply.github.com>
Date: Mon Apr 13 13:43:55 2020 +0100
Fix spelling mistake in lazyfree.c
commit 1da37aa
Merge: 2d4ba28 af347a8
Author: hwware <wen.hui.ware@gmail.com>
Date: Thu Mar 5 22:41:31 2020 -0500
Merge remote-tracking branch 'upstream/unstable' into expiretypofix
commit 2d4ba28
Author: hwware <wen.hui.ware@gmail.com>
Date: Mon Mar 2 00:09:40 2020 -0500
fix typo in expire.c
commit 1a746f7
Author: SennoYuki <minakami1yuki@gmail.com>
Date: Thu Feb 27 16:54:32 2020 +0800
fix typo
commit 8599b1a
Author: dongheejeong <donghee950403@gmail.com>
Date: Sun Feb 16 20:31:43 2020 +0000
Fix typo in server.c
commit f38d4e8
Author: hwware <wen.hui.ware@gmail.com>
Date: Sun Feb 2 22:58:38 2020 -0500
fix typo in evict.c
commit fe143fc
Author: Leo Murillo <leonardo.murillo@gmail.com>
Date: Sun Feb 2 01:57:22 2020 -0600
Fix a few typos in redis.conf
commit 1ab4d21
Author: viraja1 <anchan.viraj@gmail.com>
Date: Fri Dec 27 17:15:58 2019 +0530
Fix typo in Latency API docstring
commit ca1f70e
Author: gosth <danxuedexing@qq.com>
Date: Wed Dec 18 15:18:02 2019 +0800
fix typo in sort.c
commit a57c06b
Author: ZYunH <zyunhjob@163.com>
Date: Mon Dec 16 22:28:46 2019 +0800
fix-zset-typo
commit b8c92b5
Author: git-hulk <hulk.website@gmail.com>
Date: Mon Dec 16 15:51:42 2019 +0800
FIX: typo in cluster.c, onformation->information
commit 9dd981c
Author: wujm2007 <jim.wujm@gmail.com>
Date: Mon Dec 16 09:37:52 2019 +0800
Fix typo
commit e132d7a
Author: Sebastien Williams-Wynn <s.williamswynn.mail@gmail.com>
Date: Fri Nov 15 00:14:07 2019 +0000
Minor typo change
commit 47f44d5
Author: happynote3966 <01ssrmikururudevice01@gmail.com>
Date: Mon Nov 11 22:08:48 2019 +0900
fix comment typo in redis-cli.c
commit b8bdb0d
Author: fulei <fulei@kuaishou.com>
Date: Wed Oct 16 18:00:17 2019 +0800
Fix a spelling mistake of comments in defragDictBucketCallback
commit 0def46a
Author: fulei <fulei@kuaishou.com>
Date: Wed Oct 16 13:09:27 2019 +0800
fix some spelling mistakes of comments in defrag.c
commit f3596fd
Author: Phil Rajchgot <tophil@outlook.com>
Date: Sun Oct 13 02:02:32 2019 -0400
Typo and grammar fixes
Redis and its documentation are great -- just wanted to submit a few corrections in the spirit of Hacktoberfest. Thanks for all your work on this project. I use it all the time and it works beautifully.
commit 2b928cd
Author: KangZhiDong <worldkzd@gmail.com>
Date: Sun Sep 1 07:03:11 2019 +0800
fix typos
commit 33aea14
Author: Axlgrep <axlgrep@gmail.com>
Date: Tue Aug 27 11:02:18 2019 +0800
Fixed eviction spelling issues
commit e282a80
Author: Simen Flatby <simen@oms.no>
Date: Tue Aug 20 15:25:51 2019 +0200
Update comments to reflect prop name
In the comments the prop is referenced as replica-validity-factor,
but it is really named cluster-replica-validity-factor.
commit 74d1f9a
Author: Jim Green <jimgreen2013@qq.com>
Date: Tue Aug 20 20:00:31 2019 +0800
fix comment error, the code is ok
commit eea1407
Author: Liao Tonglang <liaotonglang@gmail.com>
Date: Fri May 31 10:16:18 2019 +0800
typo fix
fix cna't to can't
commit 0da553c
Author: KAWACHI Takashi <tkawachi@gmail.com>
Date: Wed Jul 17 00:38:16 2019 +0900
Fix typo
commit 7fc8fb6
Author: Michael Prokop <mika@grml.org>
Date: Tue May 28 17:58:42 2019 +0200
Typo fixes
s/familar/familiar/
s/compatiblity/compatibility/
s/ ot / to /
s/itsef/itself/
commit 5f46c9d
Author: zhumoing <34539422+zhumoing@users.noreply.github.com>
Date: Tue May 21 21:16:50 2019 +0800
typo-fixes
typo-fixes
commit 321dfe1
Author: wxisme <850885154@qq.com>
Date: Sat Mar 16 15:10:55 2019 +0800
typo fix
commit b4fb131
Merge: 267e0e6 3df1eb8
Author: Nikitas Bastas <nikitasbst@gmail.com>
Date: Fri Feb 8 22:55:45 2019 +0200
Merge branch 'unstable' of antirez/redis into unstable
commit 267e0e6
Author: Nikitas Bastas <nikitasbst@gmail.com>
Date: Wed Jan 30 21:26:04 2019 +0200
Minor typo fix
commit 30544e7
Author: inshal96 <39904558+inshal96@users.noreply.github.com>
Date: Fri Jan 4 16:54:50 2019 +0500
remove an extra 'a' in the comments
commit 337969d
Author: BrotherGao <yangdongheng11@gmail.com>
Date: Sat Dec 29 12:37:29 2018 +0800
fix typo in redis.conf
commit 9f4b121
Merge: 423a030 e504583
Author: BrotherGao <yangdongheng@xiaomi.com>
Date: Sat Dec 29 11:41:12 2018 +0800
Merge branch 'unstable' of antirez/redis into unstable
commit 423a030
Merge: 42b02b7 46a51cd
Author: 杨东衡 <yangdongheng@xiaomi.com>
Date: Tue Dec 4 23:56:11 2018 +0800
Merge branch 'unstable' of antirez/redis into unstable
commit 42b02b7
Merge: 68c0e6e b8febe6
Author: Dongheng Yang <yangdongheng11@gmail.com>
Date: Sun Oct 28 15:54:23 2018 +0800
Merge pull request #1 from antirez/unstable
update local data
commit 714b589
Author: Christian <crifei93@gmail.com>
Date: Fri Dec 28 01:17:26 2018 +0100
fix typo "resulution"
commit e23259d
Author: garenchan <1412950785@qq.com>
Date: Wed Dec 26 09:58:35 2018 +0800
fix typo: segfauls -> segfault
commit a9359f8
Author: xjp <jianping_xie@aliyun.com>
Date: Tue Dec 18 17:31:44 2018 +0800
Fixed REDISMODULE_H spell bug
commit a12c3e4
Author: jdiaz <jrd.palacios@gmail.com>
Date: Sat Dec 15 23:39:52 2018 -0600
Fixes hyperloglog hash function comment block description
commit 770eb11
Author: 林上耀 <1210tom@163.com>
Date: Sun Nov 25 17:16:10 2018 +0800
fix typo
commit fd97fbb
Author: Chris Lamb <chris@chris-lamb.co.uk>
Date: Fri Nov 23 17:14:01 2018 +0100
Correct "unsupported" typo.
commit a85522d
Author: Jungnam Lee <jungnam.lee@oracle.com>
Date: Thu Nov 8 23:01:29 2018 +0900
fix typo in test comments
commit ade8007
Author: Arun Kumar <palerdot@users.noreply.github.com>
Date: Tue Oct 23 16:56:35 2018 +0530
Fixed grammatical typo
Fixed typo for word 'dictionary'
commit 869ee39
Author: Hamid Alaei <hamid.a85@gmail.com>
Date: Sun Aug 12 16:40:02 2018 +0430
fix documentations: (ThreadSafeContextStart/Stop -> ThreadSafeContextLock/Unlock), minor typo
commit f89d158
Author: Mayank Jain <mayankjain255@gmail.com>
Date: Tue Jul 31 23:01:21 2018 +0530
Updated README.md with some spelling corrections.
Made correction in spelling of some misspelled words.
commit 892198e
Author: dsomeshwar <someshwar.dhayalan@gmail.com>
Date: Sat Jul 21 23:23:04 2018 +0530
typo fix
commit 8a4d780
Author: Itamar Haber <itamar@redislabs.com>
Date: Mon Apr 30 02:06:52 2018 +0300
Fixes some typos
commit e3acef6
Author: Noah Rosamilia <ivoahivoah@gmail.com>
Date: Sat Mar 3 23:41:21 2018 -0500
Fix typo in /deps/README.md
commit 04442fb
Author: WuYunlong <xzsyeb@126.com>
Date: Sat Mar 3 10:32:42 2018 +0800
Fix typo in readSyncBulkPayload() comment.
commit 9f36880
Author: WuYunlong <xzsyeb@126.com>
Date: Sat Mar 3 10:20:37 2018 +0800
replication.c comment: run_id -> replid.
commit f866b4a
Author: Francesco 'makevoid' Canessa <makevoid@gmail.com>
Date: Thu Feb 22 22:01:56 2018 +0000
fix comment typo in server.c
commit 0ebc69b
Author: 줍 <jubee0124@gmail.com>
Date: Mon Feb 12 16:38:48 2018 +0900
Fix typo in redis.conf
Fix `five behaviors` to `eight behaviors` in [this sentence ](antirez/redis@unstable/redis.conf#L564)
commit b50a620
Author: martinbroadhurst <martinbroadhurst@users.noreply.github.com>
Date: Thu Dec 28 12:07:30 2017 +0000
Fix typo in valgrind.sup
commit 7d8f349
Author: Peter Boughton <peter@sorcerersisle.com>
Date: Mon Nov 27 19:52:19 2017 +0000
Update CONTRIBUTING; refer doc updates to redis-doc repo.
commit 02dec7e
Author: Klauswk <klauswk1@hotmail.com>
Date: Tue Oct 24 16:18:38 2017 -0200
Fix typo in comment
commit e1efbc8
Author: chenshi <baiwfg2@gmail.com>
Date: Tue Oct 3 18:26:30 2017 +0800
Correct two spelling errors of comments
commit 93327d8
Author: spacewander <spacewanderlzx@gmail.com>
Date: Wed Sep 13 16:47:24 2017 +0800
Update the comment for OBJ_ENCODING_EMBSTR_SIZE_LIMIT's value
The value of OBJ_ENCODING_EMBSTR_SIZE_LIMIT is 44 now instead of 39.
commit 63d361f
Author: spacewander <spacewanderlzx@gmail.com>
Date: Tue Sep 12 15:06:42 2017 +0800
Fix <prevlen> related doc in ziplist.c
According to the definition of ZIP_BIG_PREVLEN and other related code,
the guard of single byte <prevlen> should be 254 instead of 255.
commit ebe228d
Author: hanael80 <hanael80@gmail.com>
Date: Tue Aug 15 09:09:40 2017 +0900
Fix typo
commit 6b696e6
Author: Matt Robenolt <matt@ydekproductions.com>
Date: Mon Aug 14 14:50:47 2017 -0700
Fix typo in LATENCY DOCTOR output
commit a2ec6ae
Author: caosiyang <caosiyang@qiyi.com>
Date: Tue Aug 15 14:15:16 2017 +0800
Fix a typo: form => from
commit 3ab7699
Author: caosiyang <caosiyang@qiyi.com>
Date: Thu Aug 10 18:40:33 2017 +0800
Fix a typo: replicationFeedSlavesFromMaster() => replicationFeedSlavesFromMasterStream()
commit 72d43ef
Author: caosiyang <caosiyang@qiyi.com>
Date: Tue Aug 8 15:57:25 2017 +0800
fix a typo: servewr => server
commit 707c958
Author: Bo Cai <charpty@gmail.com>
Date: Wed Jul 26 21:49:42 2017 +0800
redis-cli.c typo: conut -> count.
Signed-off-by: Bo Cai <charpty@gmail.com>
commit b9385b2
Author: JackDrogon <jack.xsuperman@gmail.com>
Date: Fri Jun 30 14:22:31 2017 +0800
Fix some spell problems
commit 20d9230
Author: akosel <aaronjkosel@gmail.com>
Date: Sun Jun 4 19:35:13 2017 -0500
Fix typo
commit b167bfc
Author: Krzysiek Witkowicz <krzysiekwitkowicz@gmail.com>
Date: Mon May 22 21:32:27 2017 +0100
Fix #4008 small typo in comment
commit 2b78ac8
Author: Jake Clarkson <jacobwclarkson@gmail.com>
Date: Wed Apr 26 15:49:50 2017 +0100
Correct typo in tests/unit/hyperloglog.tcl
commit b0f1cdb
Author: Qi Luo <qiluo-msft@users.noreply.github.com>
Date: Wed Apr 19 14:25:18 2017 -0700
Fix typo
commit a90b0f9
Author: charsyam <charsyam@naver.com>
Date: Thu Mar 16 18:19:53 2017 +0900
fix typos
fix typos
fix typos
commit 8430a79
Author: Richard Hart <richardhart92@gmail.com>
Date: Mon Mar 13 22:17:41 2017 -0400
Fixed log message typo in listenToPort.
commit 481a1c2
Author: Vinod Kumar <kumar003vinod@gmail.com>
Date: Sun Jan 15 23:04:51 2017 +0530
src/db.c: Correct "save" -> "safe" typo
commit 586b4d3
Author: wangshaonan <wshn13@gmail.com>
Date: Wed Dec 21 20:28:27 2016 +0800
Fix typo they->the in helloworld.c
commit c1c4b5e
Author: Jenner <hypxm@qq.com>
Date: Mon Dec 19 16:39:46 2016 +0800
typo error
commit 1ee1a3f
Author: tielei <43289893@qq.com>
Date: Mon Jul 18 13:52:25 2016 +0800
fix some comments
commit 11a41fb
Author: Otto Kekäläinen <otto@seravo.fi>
Date: Sun Jul 3 10:23:55 2016 +0100
Fix spelling in documentation and comments
commit 5fb5d82
Author: francischan <f1ancis621@gmail.com>
Date: Tue Jun 28 00:19:33 2016 +0800
Fix outdated comments about redis.c file.
It should now refer to server.c file.
commit 6b254bc
Author: lmatt-bit <lmatt123n@gmail.com>
Date: Thu Apr 21 21:45:58 2016 +0800
Refine the comment of dictRehashMilliseconds func
SLAVECONF->REPLCONF in comment - by andyli029
commit ee9869f
Author: clark.kang <charsyam@naver.com>
Date: Tue Mar 22 11:09:51 2016 +0900
fix typos
commit f7b3b11
Author: Harisankar H <harisankarh@gmail.com>
Date: Wed Mar 9 11:49:42 2016 +0530
Typo correction: "faield" --> "failed"
Typo correction: "faield" --> "failed"
commit 3fd40fc
Author: Itamar Haber <itamar@redislabs.com>
Date: Thu Feb 25 10:31:51 2016 +0200
Fixes a typo in comments
commit 621c160
Author: Prayag Verma <prayag.verma@gmail.com>
Date: Mon Feb 1 12:36:20 2016 +0530
Fix typo in Readme.md
Spelling mistakes -
`eviciton` > `eviction`
`familar` > `familiar`
commit d7d07d6
Author: WonCheol Lee <toctoc21c@gmail.com>
Date: Wed Dec 30 15:11:34 2015 +0900
Typo fixed
commit a4dade7
Author: Felix Bünemann <buenemann@louis.info>
Date: Mon Dec 28 11:02:55 2015 +0100
[ci skip] Improve supervised upstart config docs
This mentions that "expect stop" is required for supervised upstart
to work correctly. See http://upstart.ubuntu.com/cookbook/#expect-stop
for an explanation.
commit d9caba9
Author: daurnimator <quae@daurnimator.com>
Date: Mon Dec 21 18:30:03 2015 +1100
README: Remove trailing whitespace
commit 72d42e5
Author: daurnimator <quae@daurnimator.com>
Date: Mon Dec 21 18:29:32 2015 +1100
README: Fix typo. th => the
commit dd6e957
Author: daurnimator <quae@daurnimator.com>
Date: Mon Dec 21 18:29:20 2015 +1100
README: Fix typo. familar => familiar
commit 3a12b23
Author: daurnimator <quae@daurnimator.com>
Date: Mon Dec 21 18:28:54 2015 +1100
README: Fix typo. eviciton => eviction
commit 2d1d03b
Author: daurnimator <quae@daurnimator.com>
Date: Mon Dec 21 18:21:45 2015 +1100
README: Fix typo. sever => server
commit 3973b06
Author: Itamar Haber <itamar@garantiadata.com>
Date: Sat Dec 19 17:01:20 2015 +0200
Typo fix
commit 4f2e460
Author: Steve Gao <fu@2token.com>
Date: Fri Dec 4 10:22:05 2015 +0800
Update README - fix typos
commit b21667c
Author: binyan <binbin.yan@nokia.com>
Date: Wed Dec 2 22:48:37 2015 +0800
delete redundancy color judge in sdscatcolor
commit 88894c7
Author: binyan <binbin.yan@nokia.com>
Date: Wed Dec 2 22:14:42 2015 +0800
the example output shoule be HelloWorld
commit 2763470
Author: binyan <binbin.yan@nokia.com>
Date: Wed Dec 2 17:41:39 2015 +0800
modify error word keyevente
Signed-off-by: binyan <binbin.yan@nokia.com>
commit 0847b3d
Author: Bruno Martins <bscmartins@gmail.com>
Date: Wed Nov 4 11:37:01 2015 +0000
typo
commit bbb9e9e
Author: dawedawe <dawedawe@gmx.de>
Date: Fri Mar 27 00:46:41 2015 +0100
typo: zimap -> zipmap
commit 5ed297e
Author: Axel Advento <badwolf.bloodseeker.rev@gmail.com>
Date: Tue Mar 3 15:58:29 2015 +0800
Fix 'salve' typos to 'slave'
commit edec9d6
Author: LudwikJaniuk <ludvig.janiuk@gmail.com>
Date: Wed Jun 12 14:12:47 2019 +0200
Update README.md
Co-Authored-By: Qix <Qix-@users.noreply.github.com>
commit 692a7af
Author: LudwikJaniuk <ludvig.janiuk@gmail.com>
Date: Tue May 28 14:32:04 2019 +0200
grammar
commit d962b0a
Author: Nick Frost <nickfrostatx@gmail.com>
Date: Wed Jul 20 15:17:12 2016 -0700
Minor grammar fix
commit 24fff01aaccaf5956973ada8c50ceb1462e211c6 (typos)
Author: Chad Miller <chadm@squareup.com>
Date: Tue Sep 8 13:46:11 2020 -0400
Fix faulty comment about operation of unlink()
commit 3cd5c1f3326c52aa552ada7ec797c6bb16452355
Author: Kevin <kevin.xgr@gmail.com>
Date: Wed Nov 20 00:13:50 2019 +0800
Fix typo in server.c.
From a83af59 Mon Sep 17 00:00:00 2001
From: wuwo <wuwo@wacai.com>
Date: Fri, 17 Mar 2017 20:37:45 +0800
Subject: [PATCH] falure to failure
From c961896 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=B7=A6=E6=87=B6?= <veficos@gmail.com>
Date: Sat, 27 May 2017 15:33:04 +0800
Subject: [PATCH] fix typo
From e600ef2 Mon Sep 17 00:00:00 2001
From: "rui.zou" <rui.zou@yunify.com>
Date: Sat, 30 Sep 2017 12:38:15 +0800
Subject: [PATCH] fix a typo
From c7d07fa Mon Sep 17 00:00:00 2001
From: Alexandre Perrin <alex@kaworu.ch>
Date: Thu, 16 Aug 2018 10:35:31 +0200
Subject: [PATCH] deps README.md typo
From b25cb67 Mon Sep 17 00:00:00 2001
From: Guy Korland <gkorland@gmail.com>
Date: Wed, 26 Sep 2018 10:55:37 +0300
Subject: [PATCH 1/2] fix typos in header
From ad28ca6 Mon Sep 17 00:00:00 2001
From: Guy Korland <gkorland@gmail.com>
Date: Wed, 26 Sep 2018 11:02:36 +0300
Subject: [PATCH 2/2] fix typos
commit 34924cdedd8552466fc22c1168d49236cb7ee915
Author: Adrian Lynch <adi_ady_ade@hotmail.com>
Date: Sat Apr 4 21:59:15 2015 +0100
Typos fixed
commit fd2a1e7
Author: Jan <jsteemann@users.noreply.github.com>
Date: Sat Oct 27 19:13:01 2018 +0200
Fix typos
Fix typos
commit e14e47c1a234b53b0e103c5f6a1c61481cbcbb02
Author: Andy Lester <andy@petdance.com>
Date: Fri Aug 2 22:30:07 2019 -0500
Fix multiple misspellings of "following"
commit 79b948ce2dac6b453fe80995abbcaac04c213d5a
Author: Andy Lester <andy@petdance.com>
Date: Fri Aug 2 22:24:28 2019 -0500
Fix misspelling of create-cluster
commit 1fffde52666dc99ab35efbd31071a4c008cb5a71
Author: Andy Lester <andy@petdance.com>
Date: Wed Jul 31 17:57:56 2019 -0500
Fix typos
commit 204c9ba9651e9e05fd73936b452b9a30be456cfe
Author: Xiaobo Zhu <xiaobo.zhu@shopee.com>
Date: Tue Aug 13 22:19:25 2019 +0800
fix typos
Squashed commit of the following:
commit 1d9aaf8
Author: danmedani <danmedani@gmail.com>
Date: Sun Aug 2 11:40:26 2015 -0700
README typo fix.
Squashed commit of the following:
commit 32bfa7c
Author: Erik Dubbelboer <erik@dubbelboer.com>
Date: Mon Jul 6 21:15:08 2015 +0200
Fixed grammer
Squashed commit of the following:
commit b24f69c
Author: Sisir Koppaka <sisir.koppaka@gmail.com>
Date: Mon Mar 2 22:38:45 2015 -0500
utils/hashtable/rehashing.c: Fix typos
Squashed commit of the following:
commit 4e04082
Author: Erik Dubbelboer <erik@dubbelboer.com>
Date: Mon Mar 23 08:22:21 2015 +0000
Small config file documentation improvements
Squashed commit of the following:
commit acb8773
Author: ctd1500 <ctd1500@gmail.com>
Date: Fri May 8 01:52:48 2015 -0700
Typo and grammar fixes in readme
commit 2eb75b6
Author: ctd1500 <ctd1500@gmail.com>
Date: Fri May 8 01:36:18 2015 -0700
fixed redis.conf comment
Squashed commit of the following:
commit a8249a2
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri Dec 11 11:39:52 2015 +0530
Revise correction of typos.
Squashed commit of the following:
commit 3c02028
Author: zhaojun11 <zhaojun11@jd.com>
Date: Wed Jan 17 19:05:28 2018 +0800
Fix typos include two code typos in cluster.c and latency.c
Squashed commit of the following:
commit 9dba47c
Author: q191201771 <191201771@qq.com>
Date: Sat Jan 4 11:31:04 2020 +0800
fix function listCreate comment in adlist.c
Update src/server.c
commit 2c7c2cb536e78dd211b1ac6f7bda00f0f54faaeb
Author: charpty <charpty@gmail.com>
Date: Tue May 1 23:16:59 2018 +0800
server.c typo: modules system dictionary type comment
Signed-off-by: charpty <charpty@gmail.com>
commit a8395323fb63cb59cb3591cb0f0c8edb7c29a680
Author: Itamar Haber <itamar@redislabs.com>
Date: Sun May 6 00:25:18 2018 +0300
Updates test_helper.tcl's help with undocumented options
Specifically:
* Host
* Port
* Client
commit bde6f9ced15755cd6407b4af7d601b030f36d60b
Author: wxisme <850885154@qq.com>
Date: Wed Aug 8 15:19:19 2018 +0800
fix comments in deps files
commit 3172474ba991532ab799ee1873439f3402412331
Author: wxisme <850885154@qq.com>
Date: Wed Aug 8 14:33:49 2018 +0800
fix some comments
commit 01b6f2b6858b5cf2ce4ad5092d2c746e755f53f0
Author: Thor Juhasz <thor@juhasz.pro>
Date: Sun Nov 18 14:37:41 2018 +0100
Minor fixes to comments
Found some parts a little unclear on a first read, which prompted me to have a better look at the file and fix some minor things I noticed.
Fixing minor typos and grammar. There are no changes to configuration options.
These changes are only meant to help the user better understand the explanations to the various configuration options
2020-09-10 13:43:38 +03:00
* triggering an error condition . */
2022-04-26 21:31:19 +08:00
if ( bysignal ! = SIGUSR1 ) {
2015-11-13 09:32:07 +01:00
server . aof_lastbgrewrite_status = C_ERR ;
2022-04-26 21:31:19 +08:00
server . stat_aofrw_consecutive_failures + + ;
}
2012-07-17 12:06:53 +10:00
2015-07-27 09:41:48 +02:00
serverLog ( LL_WARNING ,
2011-08-18 15:49:06 +02:00
" Background AOF rewrite terminated by signal %d " , bysignal ) ;
2010-06-22 00:07:48 +02:00
}
2011-08-18 15:49:06 +02:00
2010-06-22 00:07:48 +02:00
cleanup :
Refactory fork child related infra, Unify child pid
This is a refactory commit, isn't suppose to have any actual impact.
it does the following:
- keep just one server struct fork child pid variable instead of 3
- have one server struct variable indicating the purpose of the current fork
child.
- redisFork is now responsible of updating the server struct with the pid,
which means it can be the one that calls updateDictResizePolicy
- move child info pipe handling into redisFork instead of having them
repeated outside
- there are two classes of fork purposes, mutually exclusive group (AOF, RDB,
Module), and one that can create several forks to coexist in parallel (LDB,
but maybe Modules some day too, Module API allows for that).
- minor fix to killRDBChild:
unlike killAppendOnlyChild and TerminateModuleForkChild, the killRDBChild
doesn't clear the pid variable or call wait4, so checkChildrenDone does
the cleanup for it.
This commit removes the explicit calls to rdbRemoveTempFile, closeChildInfoPipe,
updateDictResizePolicy, which didn't do any harm, but where unnecessary.
2020-12-16 15:14:04 +02:00
aofRemoveTempFile ( server . child_pid ) ;
2022-04-26 21:31:19 +08:00
/* Clear AOF buffer and delete temp incr aof for next rewrite. */
if ( server . aof_state = = AOF_WAIT_REWRITE ) {
sdsfree ( server . aof_buf ) ;
server . aof_buf = sdsempty ( ) ;
aofDelTempIncrAofFile ( ) ;
}
2012-05-25 12:11:30 +02:00
server . aof_rewrite_time_last = time ( NULL ) - server . aof_rewrite_time_start ;
server . aof_rewrite_time_start = - 1 ;
2011-12-21 10:31:34 +01:00
/* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
2015-07-27 09:41:48 +02:00
if ( server . aof_state = = AOF_WAIT_REWRITE )
2011-12-21 11:58:42 +01:00
server . aof_rewrite_scheduled = 1 ;
2010-06-22 00:07:48 +02:00
}