futriix

Author	SHA1	Message	Date
antirez	83bf475551	Added GEORADIUS(BYMEMBER)_RO variants for read-only operations. Issue #4084 shows how for a design error, GEORADIUS is a write command because of the STORE option. Because of this it does not work on readonly slaves, gets redirected to masters in Redis Cluster even when the connection is in READONLY mode and so forth. To break backward compatibility at this stage, with Redis 4.0 to be in advanced RC state, is problematic for the user base. The API can be fixed into the unstable branch soon if we'll decide to do so in order to be more consistent, and reease Redis 5.0 with this incompatibility in the future. This is still unclear. However, the ability to scale GEO queries in slaves easily is too important so this commit adds two read-only variants to the GEORADIUS and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO. The commands are exactly as the original commands, but they do not accept the STORE and STOREDIST options.	2017-06-30 10:03:37 +02:00
antirez	f8547e53f0	Added GEORADIUS(BYMEMBER)_RO variants for read-only operations. Issue #4084 shows how for a design error, GEORADIUS is a write command because of the STORE option. Because of this it does not work on readonly slaves, gets redirected to masters in Redis Cluster even when the connection is in READONLY mode and so forth. To break backward compatibility at this stage, with Redis 4.0 to be in advanced RC state, is problematic for the user base. The API can be fixed into the unstable branch soon if we'll decide to do so in order to be more consistent, and reease Redis 5.0 with this incompatibility in the future. This is still unclear. However, the ability to scale GEO queries in slaves easily is too important so this commit adds two read-only variants to the GEORADIUS and GEORADIUSBYMEMBER command: GEORADIUS_RO and GEORADIUSBYMEMBER_RO. The commands are exactly as the original commands, but they do not accept the STORE and STOREDIST options.	2017-06-30 10:03:37 +02:00
antirez	627713f627	RDB modules values serialization format version 2. The original RDB serialization format was not parsable without the module loaded, becuase the structure was managed only by the module itself. Moreover RDB is a streaming protocol in the sense that it is both produce di an append-only fashion, and is also sometimes directly sent to the socket (in the case of diskless replication). The fact that modules values cannot be parsed without the relevant module loaded is a problem in many ways: RDB checking tools must have loaded modules even for doing things not involving the value at all, like splitting an RDB into N RDBs by key or alike, or just checking the RDB for sanity. In theory module values could be just a blob of data with a prefixed length in order for us to be able to skip it. However prefixing the values with a length would mean one of the following: 1. To be able to write some data at a previous offset. This breaks stremaing. 2. To bufferize values before outputting them. This breaks performances. 3. To have some chunked RDB output format. This breaks simplicity. Moreover, the above solution, still makes module values a totally opaque matter, with the fowllowing problems: 1. The RDB check tool can just skip the value without being able to at least check the general structure. For datasets composed mostly of modules values this means to just check the outer level of the RDB not actually doing any checko on most of the data itself. 2. It is not possible to do any recovering or processing of data for which a module no longer exists in the future, or is unknown. So this commit implements a different solution. The modules RDB serialization API is composed if well defined calls to store integers, floats, doubles or strings. After this commit, the parts generated by the module API have a one-byte prefix for each of the above emitted parts, and there is a final EOF byte as well. So even if we don't know exactly how to interpret a module value, we can always parse it at an high level, check the overall structure, understand the types used to store the information, and easily skip the whole value. The change is backward compatible: older RDB files can be still loaded since the new encoding has a new RDB type: MODULE_2 (of value 7). The commit also implements the ability to check RDB files for sanity taking advantage of the new feature.	2017-06-27 13:19:16 +02:00
antirez	365dd037dc	RDB modules values serialization format version 2. The original RDB serialization format was not parsable without the module loaded, becuase the structure was managed only by the module itself. Moreover RDB is a streaming protocol in the sense that it is both produce di an append-only fashion, and is also sometimes directly sent to the socket (in the case of diskless replication). The fact that modules values cannot be parsed without the relevant module loaded is a problem in many ways: RDB checking tools must have loaded modules even for doing things not involving the value at all, like splitting an RDB into N RDBs by key or alike, or just checking the RDB for sanity. In theory module values could be just a blob of data with a prefixed length in order for us to be able to skip it. However prefixing the values with a length would mean one of the following: 1. To be able to write some data at a previous offset. This breaks stremaing. 2. To bufferize values before outputting them. This breaks performances. 3. To have some chunked RDB output format. This breaks simplicity. Moreover, the above solution, still makes module values a totally opaque matter, with the fowllowing problems: 1. The RDB check tool can just skip the value without being able to at least check the general structure. For datasets composed mostly of modules values this means to just check the outer level of the RDB not actually doing any checko on most of the data itself. 2. It is not possible to do any recovering or processing of data for which a module no longer exists in the future, or is unknown. So this commit implements a different solution. The modules RDB serialization API is composed if well defined calls to store integers, floats, doubles or strings. After this commit, the parts generated by the module API have a one-byte prefix for each of the above emitted parts, and there is a final EOF byte as well. So even if we don't know exactly how to interpret a module value, we can always parse it at an high level, check the overall structure, understand the types used to store the information, and easily skip the whole value. The change is backward compatible: older RDB files can be still loaded since the new encoding has a new RDB type: MODULE_2 (of value 7). The commit also implements the ability to check RDB files for sanity taking advantage of the new feature.	2017-06-27 13:19:16 +02:00
xuzhou	809a73be97	Fix set with ex/px option when propagated to aof	2017-06-16 17:51:38 +08:00
xuzhou	530fcf8687	Fix set with ex/px option when propagated to aof	2017-06-16 17:51:38 +08:00
Qu Chen	29122cfa05	Implement getKeys procedure for georadius and georadiusbymember commands.	2017-06-14 18:15:48 +02:00
Qu Chen	4740424049	Implement getKeys procedure for georadius and georadiusbymember commands.	2017-06-14 18:15:48 +02:00
antirez	e6ae9c9bab	Modules TSC: use atomic var for server.unixtime. This avoids Helgrind complaining, but we are actually not using atomicGet() to get the unixtime value for now: too many places where it is used and given tha time_t is word-sized it should be safe in all the archs we support as it is. On the other hand, Helgrind, when Redis is compiled with "make helgrind" in order to force the __sync macros, will detect the write in updateCachedTime() as a read (because atomic functions are used) and will not complain about races. This commit also includes minor refactoring of mutex initializations and a "helgrind" target in the Makefile.	2017-05-10 10:04:16 +02:00
antirez	1f598fc2bb	Modules TSC: use atomic var for server.unixtime. This avoids Helgrind complaining, but we are actually not using atomicGet() to get the unixtime value for now: too many places where it is used and given tha time_t is word-sized it should be safe in all the archs we support as it is. On the other hand, Helgrind, when Redis is compiled with "make helgrind" in order to force the __sync macros, will detect the write in updateCachedTime() as a read (because atomic functions are used) and will not complain about races. This commit also includes minor refactoring of mutex initializations and a "helgrind" target in the Makefile.	2017-05-10 10:04:16 +02:00
antirez	61eb08813b	Modules TSC: Add mutex for server.lruclock. Only useful for when no atomic builtins are available.	2017-05-09 16:32:49 +02:00
antirez	9390c384b8	Modules TSC: Add mutex for server.lruclock. Only useful for when no atomic builtins are available.	2017-05-09 16:32:49 +02:00
antirez	42948bc052	Modules TSC: Improve inter-thread synchronization. More work to do with server.unixtime and similar. Need to write Helgrind suppression file in order to suppress the valse positives.	2017-05-09 11:57:09 +02:00
antirez	ece658713b	Modules TSC: Improve inter-thread synchronization. More work to do with server.unixtime and similar. Need to write Helgrind suppression file in order to suppress the valse positives.	2017-05-09 11:57:09 +02:00
antirez	441c323498	Modules TSC: Release the GIL for all the time we are blocked. Instead of giving the module background operations just a small time to run in the beforeSleep() function, we can have the lock released for all the time we are blocked in the multiplexing syscall.	2017-05-03 11:26:21 +02:00
antirez	3fcf959e60	Modules TSC: Release the GIL for all the time we are blocked. Instead of giving the module background operations just a small time to run in the beforeSleep() function, we can have the lock released for all the time we are blocked in the multiplexing syscall.	2017-05-03 11:26:21 +02:00
antirez	2d1ae6f06d	Modules TSC: GIL and cooperative multi tasking setup.	2017-04-28 18:41:10 +02:00
antirez	59b06b14c9	Modules TSC: GIL and cooperative multi tasking setup.	2017-04-28 18:41:10 +02:00
antirez	ad56461850	Fix PSYNC2 incomplete command bug as described in #3899 . This bug was discovered by @kevinmcgehee and constituted a major hidden bug in the PSYNC2 implementation, caused by the propagation from the master of incomplete commands to slaves. The bug had several results: 1. Borrowing from Kevin text in the issue: "Given that slaves blindly copy over their master's input into their own replication backlog over successive read syscalls, it's possible that with large commands or small TCP buffers, partial commands are present in this buffer. If the master were to fail before successfully propagating the entire command to a slave, the slaves will never execute the partial command (since the client is invalidated) but will copy it to replication backlog which may relay those invalid bytes to its slaves on PSYNC2, corrupting the backlog and possibly other valid commands that follow the failover. Simple command boundaries aren't sufficient to capture this, either, because in the case of a MULTI/EXEC block, if the master successfully propagates a subset of the commands but not the EXEC, then the transaction in the backlog becomes corrupt and could corrupt other slaves that consume this data." 2. As identified by @yangsiran later, there is another effect of the bug. For the same mechanism of the first problem, a slave having another slave, could receive a full resynchronization request with an already half-applied command in the backlog. Once the RDB is ready, it will be sent to the slave, and the replication will continue sending to the sub-slave the other half of the command, which is not valid. The fix, designed by @yangsiran and @antirez, and implemented by @antirez, uses a secondary buffer in order to feed the sub-masters and update the replication backlog and offsets, only when a given part of the query buffer is actually applied to the state of the instance, that is, when the command gets processed and the command is not pending in the Redis transaction buffer because of CLIENT_MULTI state. Given that now the backlog and offsets representation are in agreement with the actual processed commands, both issue 1 and 2 should no longer be possible. Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in identifying and designing a fix for this problem.	2017-04-19 10:25:45 +02:00
antirez	22be435efe	Fix PSYNC2 incomplete command bug as described in #3899 . This bug was discovered by @kevinmcgehee and constituted a major hidden bug in the PSYNC2 implementation, caused by the propagation from the master of incomplete commands to slaves. The bug had several results: 1. Borrowing from Kevin text in the issue: "Given that slaves blindly copy over their master's input into their own replication backlog over successive read syscalls, it's possible that with large commands or small TCP buffers, partial commands are present in this buffer. If the master were to fail before successfully propagating the entire command to a slave, the slaves will never execute the partial command (since the client is invalidated) but will copy it to replication backlog which may relay those invalid bytes to its slaves on PSYNC2, corrupting the backlog and possibly other valid commands that follow the failover. Simple command boundaries aren't sufficient to capture this, either, because in the case of a MULTI/EXEC block, if the master successfully propagates a subset of the commands but not the EXEC, then the transaction in the backlog becomes corrupt and could corrupt other slaves that consume this data." 2. As identified by @yangsiran later, there is another effect of the bug. For the same mechanism of the first problem, a slave having another slave, could receive a full resynchronization request with an already half-applied command in the backlog. Once the RDB is ready, it will be sent to the slave, and the replication will continue sending to the sub-slave the other half of the command, which is not valid. The fix, designed by @yangsiran and @antirez, and implemented by @antirez, uses a secondary buffer in order to feed the sub-masters and update the replication backlog and offsets, only when a given part of the query buffer is actually applied to the state of the instance, that is, when the command gets processed and the command is not pending in the Redis transaction buffer because of CLIENT_MULTI state. Given that now the backlog and offsets representation are in agreement with the actual processed commands, both issue 1 and 2 should no longer be possible. Thanks to @kevinmcgehee, @yangsiran and @oranagra for their work in identifying and designing a fix for this problem.	2017-04-19 10:25:45 +02:00
antirez	60ffbb72b4	Fix modules blocking commands awake delay. If a thread unblocks a client blocked in a module command, by using the RedisMdoule_UnblockClient() API, the event loop may not be awaken until the next timeout of the multiplexing API or the next unrelated I/O operation on other clients. We actually want the client to be served ASAP, so a mechanism is needed in order for the unblocking API to inform Redis that there is a client to serve ASAP. This commit fixes the issue using the old trick of the pipe: when a client needs to be unblocked, a byte is written in a pipe. When we run the list of clients blocked in modules, we consume all the bytes written in the pipe. Writes and reads are performed inside the context of the mutex, so no race is possible in which we consume the bytes that are actually related to an awake request for a client that should still be put into the list of clients to unblock. It was verified that after the fix the server handles the blocked clients with the expected short delay. Thanks to @dvirsky for understanding there was such a problem and reporting it.	2017-04-10 09:33:21 +02:00
antirez	ffefc9f92d	Fix modules blocking commands awake delay. If a thread unblocks a client blocked in a module command, by using the RedisMdoule_UnblockClient() API, the event loop may not be awaken until the next timeout of the multiplexing API or the next unrelated I/O operation on other clients. We actually want the client to be served ASAP, so a mechanism is needed in order for the unblocking API to inform Redis that there is a client to serve ASAP. This commit fixes the issue using the old trick of the pipe: when a client needs to be unblocked, a byte is written in a pipe. When we run the list of clients blocked in modules, we consume all the bytes written in the pipe. Writes and reads are performed inside the context of the mutex, so no race is possible in which we consume the bytes that are actually related to an awake request for a client that should still be put into the list of clients to unblock. It was verified that after the fix the server handles the blocked clients with the expected short delay. Thanks to @dvirsky for understanding there was such a problem and reporting it.	2017-04-10 09:33:21 +02:00
antirez	de52b6375b	Cluster: hash slots tracking using a radix tree.	2017-03-27 16:37:22 +02:00
antirez	1409c545da	Cluster: hash slots tracking using a radix tree.	2017-03-27 16:37:22 +02:00
antirez	b49721d57d	Use SipHash hash function to mitigate HashDos attempts. This change attempts to switch to an hash function which mitigates the effects of the HashDoS attack (denial of service attack trying to force data structures to worst case behavior) while at the same time providing Redis with an hash function that does not expect the input data to be word aligned, a condition no longer true now that sds.c strings have a varialbe length header. Note that it is possible sometimes that even using an hash function for which collisions cannot be generated without knowing the seed, special implementation details or the exposure of the seed in an indirect way (for example the ability to add elements to a Set and check the return in which Redis returns them with SMEMBERS) may make the attacker's life simpler in the process of trying to guess the correct seed, however the next step would be to switch to a log(N) data structure when too many items in a single bucket are detected: this seems like an overkill in the case of Redis. SPEED REGRESION TESTS: In order to verify that switching from MurmurHash to SipHash had no impact on speed, a set of benchmarks involving fast insertion of 5 million of keys were performed. The result shows Redis with SipHash in high pipelining conditions to be about 4% slower compared to using the previous hash function. However this could partially be related to the fact that the current implementation does not attempt to hash whole words at a time but reads single bytes, in order to have an output which is endian-netural and at the same time working on systems where unaligned memory accesses are a problem. Further X86 specific optimizations should be tested, the function may easily get at the same level of MurMurHash2 if a few optimizations are performed.	2017-02-20 17:29:17 +01:00
antirez	adeed29a99	Use SipHash hash function to mitigate HashDos attempts. This change attempts to switch to an hash function which mitigates the effects of the HashDoS attack (denial of service attack trying to force data structures to worst case behavior) while at the same time providing Redis with an hash function that does not expect the input data to be word aligned, a condition no longer true now that sds.c strings have a varialbe length header. Note that it is possible sometimes that even using an hash function for which collisions cannot be generated without knowing the seed, special implementation details or the exposure of the seed in an indirect way (for example the ability to add elements to a Set and check the return in which Redis returns them with SMEMBERS) may make the attacker's life simpler in the process of trying to guess the correct seed, however the next step would be to switch to a log(N) data structure when too many items in a single bucket are detected: this seems like an overkill in the case of Redis. SPEED REGRESION TESTS: In order to verify that switching from MurmurHash to SipHash had no impact on speed, a set of benchmarks involving fast insertion of 5 million of keys were performed. The result shows Redis with SipHash in high pipelining conditions to be about 4% slower compared to using the previous hash function. However this could partially be related to the fact that the current implementation does not attempt to hash whole words at a time but reads single bytes, in order to have an output which is endian-netural and at the same time working on systems where unaligned memory accesses are a problem. Further X86 specific optimizations should be tested, the function may easily get at the same level of MurMurHash2 if a few optimizations are performed.	2017-02-20 17:29:17 +01:00
antirez	c6dfff5b61	serverPanic(): allow printf() alike formatting. This is of great interest because allows us to print debugging informations that could be of useful when debugging, like in the following example: serverPanic("Unexpected encoding for object %d, %d", obj->type, obj->encoding);	2017-01-18 17:05:10 +01:00
antirez	53b8bf2c89	serverPanic(): allow printf() alike formatting. This is of great interest because allows us to print debugging informations that could be of useful when debugging, like in the following example: serverPanic("Unexpected encoding for object %d, %d", obj->type, obj->encoding);	2017-01-18 17:05:10 +01:00
antirez	c0837ddbcc	Use const in modules types mem_usage method. As suggested by @itamarhaber.	2017-01-12 12:47:46 +01:00
antirez	636c693f44	Use const in modules types mem_usage method. As suggested by @itamarhaber.	2017-01-12 12:47:46 +01:00
antirez	eabcf0fed8	Defrag: not enabled by default. Error on CONFIG SET if not available.	2017-01-11 15:43:08 +01:00
antirez	6ad34a4b78	Defrag: not enabled by default. Error on CONFIG SET if not available.	2017-01-11 15:43:08 +01:00
oranagra	53511a429c	active memory defragmentation	2016-12-30 03:37:52 +02:00
oranagra	7aa9e6d2ae	active memory defragmentation	2016-12-30 03:37:52 +02:00
antirez	03f1cc2023	Only show Redis logo if logging to stdout / TTY. You can still force the logo in the normal logs. For motivations, check issue #3112. For me the reason is that actually the logo is nice to have in interactive sessions, but inside the logs kinda loses its usefulness, but for the ability of users to recognize restarts easily: for this reason the new startup sequence shows a one liner ASCII "wave" so that there is still a bit of visual clue. Startup logging was modified in order to log events in more obvious ways, and to log more events. Also certain important informations are now more easy to parse/grep since they are printed in field=value style. The option --always-show-logo in redis.conf was added, defaulting to no.	2016-12-19 16:41:47 +01:00
antirez	06bfeb482d	Only show Redis logo if logging to stdout / TTY. You can still force the logo in the normal logs. For motivations, check issue #3112. For me the reason is that actually the logo is nice to have in interactive sessions, but inside the logs kinda loses its usefulness, but for the ability of users to recognize restarts easily: for this reason the new startup sequence shows a one liner ASCII "wave" so that there is still a bit of visual clue. Startup logging was modified in order to log events in more obvious ways, and to log more events. Also certain important informations are now more easy to parse/grep since they are printed in field=value style. The option --always-show-logo in redis.conf was added, defaulting to no.	2016-12-19 16:41:47 +01:00
antirez	6820b5cae1	Switch PFCOUNT to LogLog-Beta algorithm. The new algorithm provides the same speed with a smaller error for cardinalities in the range 0-100k. Before switching, the new and old algorithm behavior was studied in details in the context of issue #3677. You can find a few graphs and motivations there.	2016-12-16 11:07:30 +01:00
antirez	87538cb7fe	Switch PFCOUNT to LogLog-Beta algorithm. The new algorithm provides the same speed with a smaller error for cardinalities in the range 0-100k. Before switching, the new and old algorithm behavior was studied in details in the context of issue #3677. You can find a few graphs and motivations there.	2016-12-16 11:07:30 +01:00
Harish Murthy	c1545d2f23	LogLog-Beta Algorithm support within HLL Config option to use LogLog-Beta Algorithm for Cardinality	2016-12-16 11:07:30 +01:00
Harish Murthy	c55e3fbae5	LogLog-Beta Algorithm support within HLL Config option to use LogLog-Beta Algorithm for Cardinality	2016-12-16 11:07:30 +01:00
antirez	d02963eec9	DEBUG: new "ziplist" subcommand added. Dumps a ziplist on stdout. The commit improves ziplistRepr() and adds a new debugging subcommand so that we can trigger the dump directly from the Redis API. This command capability was used while investigating issue #3684.	2016-12-16 09:02:50 +01:00
antirez	ac61f90625	DEBUG: new "ziplist" subcommand added. Dumps a ziplist on stdout. The commit improves ziplistRepr() and adds a new debugging subcommand so that we can trigger the dump directly from the Redis API. This command capability was used while investigating issue #3684.	2016-12-16 09:02:50 +01:00
antirez	03119a0831	Writable slaves expires: fix leak in key tracking. We need to use a dictionary type that frees the key, since we copy the keys in the dictionary we use to track expires created in the slave side.	2016-12-13 16:27:13 +01:00
antirez	b6f871cf42	Writable slaves expires: fix leak in key tracking. We need to use a dictionary type that frees the key, since we copy the keys in the dictionary we use to track expires created in the slave side.	2016-12-13 16:27:13 +01:00
antirez	3f302b9ecf	INFO: show num of slave-expires keys tracked.	2016-12-13 16:02:29 +01:00
antirez	d1adc85aa6	INFO: show num of slave-expires keys tracked.	2016-12-13 16:02:29 +01:00
antirez	a8a74bb8a5	Replication: fix the infamous key leakage of writable slaves + EXPIRE. BACKGROUND AND USE CASEj Redis slaves are normally write only, however the supprot a "writable" mode which is very handy when scaling reads on slaves, that actually need write operations in order to access data. For instance imagine having slaves replicating certain Sets keys from the master. When accessing the data on the slave, we want to peform intersections between such Sets values. However we don't want to intersect each time: to cache the intersection for some time often is a good idea. To do so, it is possible to setup a slave as a writable slave, and perform the intersection on the slave side, perhaps setting a TTL on the resulting key so that it will expire after some time. THE BUG Problem: in order to have a consistent replication, expiring of keys in Redis replication is up to the master, that synthesize DEL operations to send in the replication stream. However slaves logically expire keys by hiding them from read attempts from clients so that if the master did not promptly sent a DEL, the client still see logically expired keys as non existing. Because slaves don't actively expire keys by actually evicting them but just masking from the POV of read operations, if a key is created in a writable slave, and an expire is set, the key will be leaked forever: 1. No DEL will be received from the master, which does not know about such a key at all. 2. No eviction will be performed by the slave, since it needs to disable eviction because it's up to masters, otherwise consistency of data is lost. THE FIX In order to fix the problem, the slave should be able to tag keys that were created in the slave side and have an expire set in some way. My solution involved using an unique additional dictionary created by the writable slave only if needed. The dictionary is obviously keyed by the key name that we need to track: all the keys that are set with an expire directly by a client writing to the slave are tracked. The value in the dictionary is a bitmap of all the DBs where such a key name need to be tracked, so that we can use a single dictionary to track keys in all the DBs used by the slave (actually this limits the solution to the first 64 DBs, but the default with Redis is to use 16 DBs). This solution allows to pay both a small complexity and CPU penalty, which is zero when the feature is not used, actually. The slave-side eviction is encapsulated in code which is not coupled with the rest of the Redis core, if not for the hook to track the keys. TODO I'm doing the first smoke tests to see if the feature works as expected: so far so good. Unit tests should be added before merging into the 4.0 branch.	2016-12-13 10:59:54 +01:00
antirez	04542cff92	Replication: fix the infamous key leakage of writable slaves + EXPIRE. BACKGROUND AND USE CASEj Redis slaves are normally write only, however the supprot a "writable" mode which is very handy when scaling reads on slaves, that actually need write operations in order to access data. For instance imagine having slaves replicating certain Sets keys from the master. When accessing the data on the slave, we want to peform intersections between such Sets values. However we don't want to intersect each time: to cache the intersection for some time often is a good idea. To do so, it is possible to setup a slave as a writable slave, and perform the intersection on the slave side, perhaps setting a TTL on the resulting key so that it will expire after some time. THE BUG Problem: in order to have a consistent replication, expiring of keys in Redis replication is up to the master, that synthesize DEL operations to send in the replication stream. However slaves logically expire keys by hiding them from read attempts from clients so that if the master did not promptly sent a DEL, the client still see logically expired keys as non existing. Because slaves don't actively expire keys by actually evicting them but just masking from the POV of read operations, if a key is created in a writable slave, and an expire is set, the key will be leaked forever: 1. No DEL will be received from the master, which does not know about such a key at all. 2. No eviction will be performed by the slave, since it needs to disable eviction because it's up to masters, otherwise consistency of data is lost. THE FIX In order to fix the problem, the slave should be able to tag keys that were created in the slave side and have an expire set in some way. My solution involved using an unique additional dictionary created by the writable slave only if needed. The dictionary is obviously keyed by the key name that we need to track: all the keys that are set with an expire directly by a client writing to the slave are tracked. The value in the dictionary is a bitmap of all the DBs where such a key name need to be tracked, so that we can use a single dictionary to track keys in all the DBs used by the slave (actually this limits the solution to the first 64 DBs, but the default with Redis is to use 16 DBs). This solution allows to pay both a small complexity and CPU penalty, which is zero when the feature is not used, actually. The slave-side eviction is encapsulated in code which is not coupled with the rest of the Redis core, if not for the hook to track the keys. TODO I'm doing the first smoke tests to see if the feature works as expected: so far so good. Unit tests should be added before merging into the 4.0 branch.	2016-12-13 10:59:54 +01:00
antirez	a52b715835	Modules: change type registration API to use a struct of methods.	2016-11-30 11:14:01 +01:00
antirez	71e8d15e49	Modules: change type registration API to use a struct of methods.	2016-11-30 11:14:01 +01:00

... 7 8 9 10 11 ...

590 Commits