futriix

Author	SHA1	Message	Date
John Sully	2ab2078085	Fix failure to count keys in cluster slots when reloading a FLASH database Former-commit-id: f6dd863e51f91620f184ff80f08cfe518d29c87f	2020-04-28 20:48:46 -04:00
Oran Agra	58619c1286	Keep track of meaningful replication offset in replicas too Now both master and replicas keep track of the last replication offset that contains meaningful data (ignoring the trailing pings), and both trim that tail from the replication backlog, and the offset with which they try to use for psync. the implication is that if someone missed some pings, or even have excessive pings that the promoted replica has, it'll still be able to psync (avoid full sync). the downside (which was already committed) is that replicas running old code may fail to psync, since the promoted replica trims pings from its backlog. This commit adds a test that reproduces several cases of promotions and demotions with stale and non-stale pings Background: The meaningful offset on the master was added recently to solve a problem where the master is left all alone, injecting PINGs into its backlog when no one is listening and then gets demoted and tries to replicate from a replica that didn't have any of the PINGs (or at least not the last ones). however, consider this case: master A has two replicas (B and C) replicating directly from it. there's no traffic at all, and also no network issues, just many pings in the tail of the backlog. now B gets promoted, A becomes a replica of B, and C remains a replica of A. when A gets demoted, it trims the pings from its backlog, and successfully replicates from B. however, C is still aware of these PINGs, when it'll disconnect and re-connect to A, it'll ask for something that's not in the backlog anymore (since A trimmed the tail of its backlog), and be forced to do a full sync (something it didn't have to do before the meaningful offset fix). Besides that, the psync2 test was always failing randomly here and there, it turns out the reason was PINGs. 
Investigating it shows the following scenario: cycle 1: redis #1 is master, and all the rest are direct replicas of #1 cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1 now we see that when #1 is demoted it prints: 17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference) 17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964). 17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master. and when #3 connects to the demoted #1, #1 says: 17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964 so the issue here is that the meaningful offset feature saved the day for the demoted master (since it needs to sync from a replica that didn't get the last ping), but it didn't help one of the other replicas which did get the last ping.	2020-04-27 15:52:49 +02:00
Oran Agra	5633862924	Keep track of meaningful replication offset in replicas too Now both master and replicas keep track of the last replication offset that contains meaningful data (ignoring the trailing pings), and both trim that tail from the replication backlog, and the offset with which they try to use for psync. the implication is that if someone missed some pings, or even have excessive pings that the promoted replica has, it'll still be able to psync (avoid full sync). the downside (which was already committed) is that replicas running old code may fail to psync, since the promoted replica trims pings from its backlog. This commit adds a test that reproduces several cases of promotions and demotions with stale and non-stale pings Background: The meaningful offset on the master was added recently to solve a problem where the master is left all alone, injecting PINGs into its backlog when no one is listening and then gets demoted and tries to replicate from a replica that didn't have any of the PINGs (or at least not the last ones). however, consider this case: master A has two replicas (B and C) replicating directly from it. there's no traffic at all, and also no network issues, just many pings in the tail of the backlog. now B gets promoted, A becomes a replica of B, and C remains a replica of A. when A gets demoted, it trims the pings from its backlog, and successfully replicates from B. however, C is still aware of these PINGs, when it'll disconnect and re-connect to A, it'll ask for something that's not in the backlog anymore (since A trimmed the tail of its backlog), and be forced to do a full sync (something it didn't have to do before the meaningful offset fix). Besides that, the psync2 test was always failing randomly here and there, it turns out the reason was PINGs. 
Investigating it shows the following scenario: cycle 1: redis #1 is master, and all the rest are direct replicas of #1 cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is replica of #1 now we see that when #1 is demoted it prints: 17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference) 17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964). 17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master. and when #3 connects to the demoted #1, #1 says: 17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964 so the issue here is that the meaningful offset feature saved the day for the demoted master (since it needs to sync from a replica that didn't get the last ping), but it didn't help one of the other replicas which did get the last ping.	2020-04-27 15:52:23 +02:00
John Sully	16fd1ed48e	shared pointer comparisons with other pointers Former-commit-id: d5ede50b040c82e02eb2b82982091bdd0fb7da12	2020-04-24 22:20:26 -04:00
antirez	ae3cf7911a	LCS -> STRALGO LCS. STRALGO should be a container for mostly read-only string algorithms in Redis. The algorithms should have two main characteristics: 1. They should be non trivial to compute, and often not part of programming language standard libraries. 2. They should be fast enough that it is a good idea to have optimized C implementations. Next thing I would love to see? A small strings compression algorithm.	2020-04-24 16:54:32 +02:00
antirez	262da0ba78	LCS -> STRALGO LCS. STRALGO should be a container for mostly read-only string algorithms in Redis. The algorithms should have two main characteristics: 1. They should be non trivial to compute, and often not part of programming language standard libraries. 2. They should be fast enough that it is a good idea to have optimized C implementations. Next thing I would love to see? A small strings compression algorithm.	2020-04-24 16:49:27 +02:00
antirez	f066273907	Tracking: NOLOOP internals implementation.	2020-04-24 10:14:48 +02:00
John Sully	9d407dd520	RDB load performance, eliminate useless reads Former-commit-id: 68e5d1850dbba89a87710968d314cb8c0d3cb562	2020-04-22 00:47:49 -04:00
antirez	d3d5108c7d	Tracking: NOLOOP internals implementation.	2020-04-21 10:51:46 +02:00
John Sully	f627dd8cbe	Initial merge of unstable 6 Former-commit-id: aac140de199646914cc02997a45111c9c695e55d	2020-04-16 16:36:16 -04:00
John Sully	c001ea5b41	Merge branch 'unstable' into redis_6_merge Former-commit-id: cc9924ffa606200f331b3bf5e1e1a4aa3f2702fa	2020-04-15 23:00:13 -04:00
John Sully	822f64ed2f	During AOF reload we can erroneously read incorrect aof_state values, so this variable must be read with the global lock acquired Former-commit-id: 6ff9d23fd4541a011d754209d9fda3ef3af4a7f9	2020-04-15 22:30:19 -04:00
John Sully	e8270a2f0b	Convert variables accessed outside lock to atomics Former-commit-id: b0796ff5fd7e069a2fadbfd968f7bbb2020edd2d	2020-04-15 22:25:17 -04:00
antirez	97e58ee026	Use the special static refcount for stack objects.	2020-04-15 16:03:16 +02:00
antirez	ec24d65c06	RDB: refactor some RDB loading code into dbAddRDBLoad().	2020-04-15 16:03:16 +02:00
antirez	a889c94447	incrRefCount(): abort on statically allocated object.	2020-04-15 16:03:16 +02:00
antirez	533fd1fe7b	RDB: load files faster avoiding useless free+realloc. Reloading of the RDB generated by DEBUG POPULATE 5000000 SAVE is now 25% faster. This commit also prepares the ability to have more flexibility when loading stuff from the RDB, since we no longer use dbAdd() but can control exactly how things are added in the database.	2020-04-15 16:03:16 +02:00
John Sully	f69c169c04	Merge tag '6.0-rc3' into redis_6_merge Redis 6.0 RC3. Former-commit-id: b2cb10de5f39b4d8e1ee19877c2bdaf19eefd9db	2020-04-14 22:56:19 -04:00
John Sully	3e656bcb3f	Merge commit '2a820251c8ffba152e00ef7b1ca5e7a477087d33' into redis_6_merge Former-commit-id: 50768cd242c0360c6e943c57f866789280d30dc0	2020-04-14 22:25:44 -04:00
John Sully	ce54857237	Merge commit '454e12cb8961f21c9dd8502dc82ae6ffd7e22fe0' into redis_6_merge Former-commit-id: cc3ebbe5194e9744fb84ce490e90ac5fbe7f8716	2020-04-14 22:19:29 -04:00
John Sully	90cbdd5573	Merge commit 'c208956fbe077e8249d2965dec2ea1b9b7588d6d' into redis_6_merge Former-commit-id: 2825e515504cffcf6000be2e547ab1cbd86441bc	2020-04-14 20:55:29 -04:00
John Sully	68c50ae876	Merge commit '6718d5d37517bd927635649708913affb98f67c9' into redis_6_merge Former-commit-id: ef1236b6009ebd7b00f6dd2f43df57ad95e51253	2020-04-14 20:19:48 -04:00
John Sully	2c049c16a2	Merge commit '79e8b17d7b44c793d8b22668b8583a297ee1b387' into redis_6_merge Former-commit-id: 28cbed1d13961c5568f2bdc50c6a23107d3434d0	2020-04-14 20:09:53 -04:00
John Sully	d48ea996e7	Merge commit '13fbdf970660b15011c4312f31137e58bbda5b2c' into redis_6_merge Former-commit-id: cde199a7973ad63317b68f581df607321e12bf46	2020-04-14 19:43:04 -04:00
John Sully	6df76a3bfa	Merge commit 'a8e2bbe8f6a3f4833f286cc5049e6b52c87de1a9' into redis_6_merge Former-commit-id: 5589a0a69ca6f5798b750a6a79f7e9b44d20e136	2020-04-14 19:22:44 -04:00
John Sully	2038fa270a	Merge commit '491949ee5bf4ffbfc746ea4ed5a6d673b0e2fb81' into redis_6_merge Former-commit-id: 09e8fb17cd0889ad17461e48446221b3955f5a8f	2020-04-14 18:44:42 -04:00
antirez	c38bc83f47	PSYNC2: meaningful offset implemented. A very commonly signaled operational problem with Redis master-replicas sets is that, once the master becomes unavailable for some reason, especially because of network problems, many times it won't be able to perform a partial resynchronization with the new master, once it rejoins the partition, for the following reason: 1. The master becomes isolated, however it keeps sending PINGs to the replicas. Such PINGs will never be received since the link connection is actually already severed. 2. On the other side, one of the replicas will turn into the new master, setting its secondary replication ID offset to the one of the last command received from the old master: this offset will not include the PINGs sent by the master once the link was already disconnected. 3. When the master rejoins the partition and is turned into a replica, its offset will be too advanced because of the PINGs, so a PSYNC will fail, and a full synchronization will be required. Related to issue #7002 and other discussion we had in the past around this problem. Former-commit-id: 5d6e8fe3e3e43162f0c57f580b6e8432274fca30	2020-04-14 17:56:09 -04:00
antirez	f7e8240900	Remove RDB files used for replication in persistence-less instances. Former-commit-id: b323645227a3e2cc5928e649586221aba508b10d	2020-04-14 17:27:05 -04:00
Guy Benoish	49dda32114	Diskless-load emptyDb-related fixes 1. Call emptyDb even in case of diskless-load: We want modules to get the same FLUSHDB event as disk-based replication. 2. Do not fire any module events when flushing the backups array. 3. Delete redundant call to signalFlushedDb (Called from emptyDb). Former-commit-id: aa8a3077a2d20e66e34f72f2860d0cc3daad496e	2020-04-14 17:21:10 -04:00
antirez	ce7bfb275d	Use the special static refcount for stack objects.	2020-04-09 16:25:30 +02:00
antirez	9d24b6f810	RDB: refactor some RDB loading code into dbAddRDBLoad().	2020-04-09 16:21:48 +02:00
antirez	df90feb894	incrRefCount(): abort on statically allocated object.	2020-04-09 16:20:41 +02:00
antirez	f9c0953fdd	RDB: load files faster avoiding useless free+realloc. Reloading of the RDB generated by DEBUG POPULATE 5000000 SAVE is now 25% faster. This commit also prepares the ability to have more flexibility when loading stuff from the RDB, since we no longer use dbAdd() but can control exactly how things are added in the database.	2020-04-09 10:24:46 +02:00
antirez	fdd6da647d	Speedup INFO by counting client memory incrementally. Related to #5145. Design note: clients may change type when they turn into replicas or are moved into the Pub/Sub category and so forth. Moreover the recomputation of the bytes used is problematic for obvious reasons: it changes continuously, so as a conservative way to avoid accumulating errors, each client remembers the contribution it gave to the sum, and removes it when it is freed or before updating it with the new memory usage.	2020-04-07 16:53:13 +02:00
zhaozhao.zz	73e7bcc598	lazyfree: add a new configuration lazyfree-lazy-user-del Delete keys in async way when executing DEL command, if lazyfree-lazy-user-del is yes.	2020-04-07 16:53:13 +02:00
antirez	087ed099d1	LCS: initial functionality implemented.	2020-04-07 16:52:57 +02:00
srzhao	c5d805f877	Check OOM at script start to get stable lua OOM state. Checking OOM by `getMaxMemoryState` inside script might get different result with `freeMemoryIfNeededAndSafe` at script start, because lua stack and arguments also consume memory. This leads to memory `borderline` when memory grows near server.maxmemory: - `freeMemoryIfNeededAndSafe` at script start detects no OOM, no memory freed - `getMaxMemoryState` inside script detects OOM, script aborted We solve this 'borderline' issue by saving OOM state at script start to get stable lua OOM state. related to issue #6565 and #5250.	2020-04-07 16:52:28 +02:00
Oran Agra	6f54629071	modules don't signalModifiedKey in setKey() since that's done (optionally) in RM_CloseKey	2020-04-07 16:52:04 +02:00
antirez	45d7fcec22	Speedup INFO by counting client memory incrementally. Related to #5145. Design note: clients may change type when they turn into replicas or are moved into the Pub/Sub category and so forth. Moreover the recomputation of the bytes used is problematic for obvious reasons: it changes continuously, so as a conservative way to avoid accumulating errors, each client remembers the contribution it gave to the sum, and removes it when it is freed or before updating it with the new memory usage.	2020-04-07 12:07:54 +02:00
Salvatore Sanfilippo	a58de4cff0	Merge pull request #6243 from soloestoy/expand-lazy-free-server-del lazyfree: add a new configuration lazyfree-lazy-user-del	2020-04-06 17:27:39 +02:00
antirez	f543150d0b	Merge branch 'lcs' into unstable	2020-04-06 13:51:55 +02:00
Salvatore Sanfilippo	60251ac868	Merge pull request #6797 from patpatbear/issue_#6565_memory_borderline Check OOM at script start to get stable lua OOM state.	2020-04-06 11:59:01 +02:00
John Sully	9b2392107c	Add the ability to set a starting core # when setting thread affinity Former-commit-id: 9e2e2067c6df5919f1c6b8b9e6e3457c7edc0755	2020-04-04 22:58:17 -04:00
John Sully	2684a266c8	Fix subkey expires not replicating correctly, and AOF issues Former-commit-id: bd183cdee13081a02efef5df75edf2292b872a16	2020-04-04 21:52:27 -04:00
Salvatore Sanfilippo	9a64bcf30a	Merge pull request #6694 from oranagra/signal_modified_key modules don't signalModifiedKey in setKey() since that's done (optionally) in RM_CloseKey	2020-04-02 19:00:20 +02:00
antirez	d77fd23ae2	LCS: initial functionality implemented.	2020-04-01 16:13:18 +02:00
Guy Benoish	5c23cd55d4	Modules: Test MULTI/EXEC replication of RM_Replicate Makes sure call() doesn't wrap replicated commands with a redundant MULTI/EXEC Other, unrelated changes: 1. Formatting compiler warning in INFO CLIENTS 2. Use CLIENT_ID_AOF instead of UINT64_MAX	2020-03-31 17:12:19 +02:00
Salvatore Sanfilippo	2399ed885b	Merge pull request #7037 from guybe7/fix_module_replicate_multi Modules: Test MULTI/EXEC replication of RM_Replicate	2020-03-31 17:00:57 +02:00
antirez	2a820251c8	timeout.c created: move client timeouts code there.	2020-03-31 16:57:20 +02:00
antirez	514ca204bb	Fix module commands propagation double MULTI bug. b512cb40 introduced automatic wrapping of MULTI/EXEC for the alsoPropagate API. However this collides with the built-in mechanism already present in module.c. To avoid complex changes near Redis 6 GA this commit introduces the ability to exclude call() MULTI/EXEC wrapping for also propagate in order to continue to use the old code paths in module.c.	2020-03-31 16:57:20 +02:00


1052 Commits