futriix

Author	SHA1	Message	Date
Oran Agra	3ca451c46f	Make a light weight version (default) of DEBUG HTSTATS (#12212 ) The light version only shows the table sizes, while the pre-existing version that shows chain length stats is reachable with the `full` argument. This should allow looking into rehashing state, even on huge dicts, on which we're afraid to run the command for fear of causing a server freeze. Also, fix a possible overflow in dictGetStats.	2023-05-24 16:27:44 +03:00
Viktor Söderqvist	f3f6f7c0d6	Key as dict entry - memory optimization for sets (#11595 ) If a dict has only keys, and no use of values, then a key can be stored directly in a dict's hashtable. The key replaces the dictEntry. To distinguish between a key and a dictEntry, we only use this optimization if the key is odd, i.e. if the key has the least significant bit set. This is true for sds strings, since the sds header is always an odd number of bytes. Dict entries are used as a fallback when there is a hash collision. A special dict entry without a value (only key and next) is used so we save one word in this case too. This saves 24 bytes per set element for large sets, and also gains some speed improvement as a side effect (fewer allocations and cache misses). A quick test adding 1M elements to a set using the command below resulted in memory usage of 28.83M, compared to 46.29M on unstable. That's 18 bytes per set element on average. eval 'for i=1,1000000,1 do redis.call("sadd", "myset", "x"..i) end' 0 Other changes: Allocations are ensured to have at least 8-byte alignment on all systems. This affects 32-bit builds compiled without HAVE_MALLOC_SIZE (not jemalloc or glibc), in which Redis stores the size of each allocation; after this change the size takes 8 bytes per allocation instead of the previous 4 bytes. This is done so we can reliably use the 3 least significant bits in a pointer to encode stuff.	2023-01-20 18:45:29 +02:00
Viktor Söderqvist	b60d33c91e	Remove the bucket-cb from dictScan and move dictEntry defrag to dictScanDefrag This change deletes the dictGetNext and dictGetNextRef functions, so the dict API doesn't expose the next field at all. The bucket function in dictScan is deleted. A separate dictScanDefrag function is added which takes a defrag alloc function to defrag-reallocate the dict entries. "Dirty" code accessing the dict internals in active defrag is removed. An 'afterReplaceEntry' is added to dictType, which allows the dict user to keep the dictEntry metadata up to date after reallocation/defrag/move. Additionally, for updating the cluster slot-to-key mapping, after a dictEntry has been reallocated, we need to know which db a dict belongs to, so we store a pointer to the db in a new metadata section in the dict struct, which is a new mechanism similar to dictEntry metadata. This adds some complexity but provides better isolation.	2023-01-11 10:25:20 +01:00
Viktor Söderqvist	d4e9e0aebd	activeDefragSdsDict use scan instead of iterator and drop dictSetNext Also delete unused function activeDefragSdsListAndDict	2023-01-11 10:25:01 +01:00
Viktor Söderqvist	c84248b5d2	Make dictEntry opaque Use functions for all accesses to dictEntry (except in dict.c). Dict abuses e.g. in defrag.c have been replaced by support functions provided by dict.	2023-01-11 09:59:24 +01:00
Oran Agra	2bec254d89	Make sure that fork child doesn't do incremental rehashing (#11692 ) Turns out that a fork child calling getExpire while persisting keys (and possibly also a result of some module fork tasks) could cause dictFind to do incremental rehashing in the child process, which is both a waste of time, and also causes COW harm.	2023-01-10 08:40:40 +02:00
ranshid	383d902ce6	reprocess command when client is unblocked on keys (#11012 ) TL;DR --------------------------------------- Following the discussion over the issue [#7551](https://github.com/redis/redis/issues/7551) We decided to refactor the client blocking code to eliminate some of the code duplications and to rebuild the infrastructure better for future key blocking cases. In this PR --------------------------------------- 1. reprocess the command once a client becomes unblocked on key (instead of running custom code for the unblocked path that's different than the one that would have run if blocking wasn't needed) 2. eliminate some (now) irrelevant code for handling unblocking lists/zsets/streams etc... 3. modify some tests to intercept the error in cases of error on reprocess after unblock (see details in the notes section below) 4. replace '$' on the client argv with current stream id. Since once we reprocess the stream XREAD we need to read from the last msg and not wait for new msg in order to prevent endless block loop. 5. Added statistics to the info "Clients" section to report the: * `total_blocking_keys` - number of blocking keys * `total_blocking_keys_on_nokey` - number of blocking keys which have at least 1 client which would like to be unblocked on when the key is deleted. 6. Avoid expiring unblocked key during unblock. Previously we used to lookup the unblocked key which might have been expired during the lookup. Now we lookup the key using NOTOUCH and NOEXPIRE to avoid deleting it at this point, so propagating commands in blocked.c is no longer needed. 7. deprecated command flags. We decided to remove the CMD_CALL_STATS and CMD_CALL_SLOWLOG and make an explicit verification in the call() function in order to decide if stats update should take place. 
This should simplify the logic and also mitigate existing issues: for example module calls which are triggered as part of AOF loading might still report stats even though they are called during AOF loading. Behavior changes --------------------------------------------------- 1. As this implementation prevents writing dedicated code handling unblocked streams/lists/zsets, since we now re-process the command once the client is unblocked some errors will be reported differently. The old implementation used to issue ``UNBLOCKED the stream key no longer exists`` in the following cases: - The stream key has been deleted (i.e. calling DEL) - The stream and group existed but the key type was changed by overriding it (i.e. with the SET command) - The key no longer exists after we swapdb with a db which does not contain this key - After swapdb when the new db has this key but with a different type. In the new implementation the reported errors will be the same as if the command was processed after effect: NOGROUP - in case the key no longer exists, or WRONGTYPE in case the key was overridden with a different type. 2. Reprocessing the command means that some checks will be reevaluated once the client is unblocked. For example, ACL rules might change after the command was originally executed, so the command will fail once the client is unblocked. Another example is OOM condition checks which might enable the command to run and block but fail the command reprocess once the client is unblocked. 3. One of the changes in this PR is that no command stats are being updated once the command is blocked (all stats will be updated once the client is unblocked). This implies that when we have many clients blocked, users will no longer be able to get that information from the command stats. However the information can still be gathered from the client list. Client blocking --------------------------------------------------- The blocking on key will still be triggered the same way as it is done today. 
In order to block the current client on a list of keys, the call to blockForKeys will still need to be made, which will perform the same as it does today: * add the client to the list of blocked clients on each key * keep the key with a matching list node (position in the global blocking clients list for that key) in the client private blocking key dict. * flag the client with CLIENT_BLOCKED * update blocking statistics * register the client on the timeout table Key Unblock --------------------------------------------------- Unblocking a specific key will be triggered (same as today) by calling signalKeyAsReady. The implementation in that part will stay the same as today - adding the key to the global readyList. The reason to maintain the readyList (as opposed to iterating over all clients blocked on the specific key) is in order to keep the signal operation as short as possible, since it is called during the command processing. The main change is that instead of going through a dedicated code path that operates the blocked command we will just call processPendingCommandsAndResetClient. ClientUnblock (keys) --------------------------------------------------- 1. Unblocking clients on keys will be triggered after command is processed and during the beforeSleep 2. the general schema is: For each key k in the readyList: ``` For each client c which is blocked on k: in case either: 1. k exists AND the k type matches the current client blocking type OR 2. k exists and c is blocked on module command OR 3. k does not exist and c was blocked with the flag unblock_on_deleted_key do: 1. remove the client from the list of clients blocked on this key 2. remove the blocking list node from the client blocking key dict 3. remove the client from the timeout list 4. queue the client on the unblocked_clients list 5. 
NEW: call processCommandAndResetClient(c); ``` NOTE: for module blocked clients we will still call the moduleUnblockClientByHandle which will queue the client for processing in moduleUnblockedClients list. Process Unblocked clients --------------------------------------------------- The process of all unblocked clients is done in the beforeSleep and no change is planned in that part. The general schema will be: For each client c in server.unblocked_clients: * remove client from the server.unblocked_clients * set back the client readHandler * continue processing the pending command and input buffer. Some notes regarding the new implementation --------------------------------------------------- 1. Although it was proposed, it is currently difficult to remove the read handler from the client while it is blocked. The reason is that a blocked client should be unblocked when it is disconnected, or we might consume data into void. 2. While this PR mainly keep the current blocking logic as-is, there might be some future additions to the infrastructure that we would like to have: - allow non-preemptive blocking of client - sometimes we can think that a new kind of blocking can be expected to not be preempt. for example lets imagine we hold some keys on disk and when a command needs to process them it will block until the keys are uploaded. in this case we will want the client to not disconnect or be unblocked until the process is completed (remove the client read handler, prevent client timeout, disable unblock via debug command etc...). - allow generic blocking based on command declared keys - we might want to add a hook before command processing to check if any of the declared keys require the command to block. this way it would be easier to add new kinds of key-based blocking mechanisms. Co-authored-by: Oran Agra <oran@redislabs.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>	2023-01-01 23:35:42 +02:00
Huang Zhw	c81813148b	Add a special notification unlink available only for modules (#9406 ) Add a new module event `RedisModule_Event_Key`, this event is fired when a key is removed from the keyspace. The event includes an open key that can be used for reading the key before it is removed. Modules can also extract the key-name, and use RM_Open or RM_Call to access key from within that event, but shouldn't modify anything from within this event. The following sub events are available: - `REDISMODULE_SUBEVENT_KEY_DELETED` - `REDISMODULE_SUBEVENT_KEY_EXPIRED` - `REDISMODULE_SUBEVENT_KEY_EVICTED` - `REDISMODULE_SUBEVENT_KEY_OVERWRITE` The data pointer can be cast to a RedisModuleKeyInfo structure with the following fields: ``` RedisModuleKey key; // Opened Key ``` ### internals We also add two dict functions: `dictTwoPhaseUnlinkFind` finds an element in the table and also gets the plink of the entry. The entry is returned if the element is found. The user should later call `dictTwoPhaseUnlinkFree` with it in order to unlink and release it. Otherwise, if the key is not found, NULL is returned. These two functions should be used as a pair. `dictTwoPhaseUnlinkFind` pauses rehash and `dictTwoPhaseUnlinkFree` resumes rehash. * We change `dbOverwrite` to `dbReplaceValue` which just replaces the value of the key and doesn't fire any events. The "overwrite" part (which emits events) is just when called from `setKey`, the other places that called dbOverwrite were ones that just update the value in-place (INCR, SPOP, and dbUnshareStringValue). This should not have any real impact since `moduleNotifyKeyUnlink` and `signalDeletedKeyAsReady` wouldn't have mattered in these cases anyway (i.e. module keys and stream keys didn't have direct calls to dbOverwrite). * Since we allow doing RM_OpenKey from within these callbacks, we temporarily disable lazy expiry. * We also temporarily disable lazy expiry when we are in unlink/unlink2 callback and keyspace notification callback. 
* Move special definitions to the top of redismodule.h This is needed to resolve compilation errors with RedisModuleKeyInfoV1 that carries a RedisModuleKey member. Co-authored-by: Oran Agra <oran@redislabs.com>	2022-11-30 11:56:36 +02:00
tmoshaiov	fb1d56bc2a	Added API to initialize dictionary iterators without memory allocation (#11245 ) * Added api to use dictionary iterators without calling malloc.	2022-09-07 20:57:43 -05:00
Moti Cohen	1aa6c4ab92	Adding parentheses and do-while(0) to macros (#11080 ) Fixing a few macros that don't follow the most basic safety conventions: wrap every use of a passed argument in parentheses and, if the macro consists of more than one statement, wrap it in do-while(0) (or parentheses).	2022-08-03 19:38:08 +03:00
yoav-steinberg	c7dc17fc0f	Fix possible int overflow when hashing an sds. (#9916 ) This caused a crash when adding elements larger than 2GB to a set (same goes for hash keys). See #8455. Details: * The fix makes the dict hash functions receive a `size_t` instead of an `int`. In practice the dict hash functions call siphash which receives a `size_t` and the callers to the hash function pass a `size_t` to it so the fix is trivial. * The issue was recreated by attempting to add a >2gb value to a set. Appropriate tests were added where I create a set with large elements and check basic functionality on it (SADD, SCARD, SPOP, etc...). * When I added the tests I also refactored a bit all the tests code which is run under the `--large-memory` flag. This removed code duplication for the test framework's `write_big_bulk` and `write_big_bulk` code and also takes care of not allocating the test framework's helper huge string used by these tests when not run under `--large-memory`. * I also added the _violations.tcl_ unit tests to be part of the entire test suite and cleaned up non relevant list related tests that were in there. This was done in this PR because most of the _violations_ tests are "large memory" tests.	2021-12-13 21:16:25 +02:00
sundb	e725d737fb	Add --large-memory flag for REDIS_TEST to enable tests that consume more than 100mb (#9784 ) This is a preparation step in order to add a new test in quicklist.c see #9776	2021-11-16 08:55:10 +02:00
Ozan Tezcan	b91d8b289b	Add sanitizer support and clean up sanitizer findings (#9601 ) - Added sanitizer support. `address`, `undefined` and `thread` sanitizers are available. - To build Redis with desired sanitizer : `make SANITIZER=undefined` - There were some sanitizer findings, cleaned up codebase - Added tests with address and undefined behavior sanitizers to daily CI. - Added tests with address sanitizer to the per-PR CI (smoke out mem leaks sooner). Basically, there are three types of issues : 1- Unaligned load/store : Most probably, this issue may cause a crash on a platform that does not support unaligned access. Redis does unaligned access only on supported platforms. 2- Signed integer overflow. Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org). 3 -Minor leak (redis-cli), use-after-free(just before calling exit()); UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit.	2021-11-11 13:51:33 +02:00
Viktor Söderqvist	f24c63a292	Slot-to-keys using dict entry metadata (#9356 ) * Enhance dict to support arbitrary metadata carried in dictEntry Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> * Rewrite slot-to-keys mapping to linked lists using dict entry metadata This is a memory enhancement for Redis Cluster. The radix tree slots_to_keys (which duplicates all key names prefixed with their slot number) is replaced with a linked list for each slot. The dict entries of the same cluster slot form a linked list and the pointers are stored as metadata in each dict entry of the main DB dict. This commit also moves the slot-to-key API from db.c to cluster.c. Co-authored-by: Jim Brunner <brunnerj@amazon.com>	2021-08-30 23:25:36 -07:00
Yossi Gottlieb	fe359cbfc2	Fix: don't assume char is unsigned. (#9375 ) On systems that have unsigned char by default (s390x, arm), redis-server could crash as soon as it populates the command table.	2021-08-15 21:37:44 +03:00
yoav-steinberg	5e908a290c	dict struct memory optimizations (#9228 ) Reduce dict struct memory overhead on 64bit dict size goes down from jemalloc's 96 byte bin to its 56 byte bin. summary of changes: - Remove `privdata` from callbacks and dict creation. (this affects many files, see "Interface change" below). - Meld `dictht` struct into the `dict` struct to eliminate struct padding. (this affects just dict.c and defrag.c) - Eliminate the `sizemask` field, can be calculated from size when needed. - Convert the `size` field into `size_exp` (exponent), utilizes one byte instead of 8. Interface change: pass dict pointer to dict type callback functions. This is instead of passing the removed privdata field. In the future if we'd like to have private data in the callbacks we can extract it from the dict type. We can extend dictType to include a custom dict struct allocator and use it to allocate more data at the end of the dict struct. This data can then be used to store private data later accessed by the callbacks.	2021-08-05 08:25:58 +03:00
Binbin	56976ffb49	Use function instead of code in dict.c and delete dead code in dict.h (#8878 ) Use function instead of code in dict.c and delete dead code in dict.h	2021-05-09 15:21:18 +03:00
sundb	95d6297db8	Add run all test support with define REDIS_TEST (#8570 ) 1. Add `redis-server test all` support to run all tests. 2. Add redis test to daily ci. 3. Add `--accurate` option to run slow tests for more iterations (so that by default we run less cycles (shorter time, and less prints). 4. Move dict benchmark to REDIS_TEST. 5. fix some leaks in tests 6. make quicklist tests run on a specific fill set of options rather than huge ranges 7. move some prints in quicklist test outside their loops to reduce prints 8. removing sds.h from dict.c since it is now used in both redis-server and redis-cli (uses hiredis sds)	2021-03-10 09:13:11 +02:00
Jim Brunner	06966d2a0e	dict: pause rehash, minor readability refactor (#8515 ) The `dict` field `iterators` is misleading and incorrect. This variable is used for 1 purpose - to pause rehashing. The current `iterators` field doesn't actually count "iterators". It counts "safe iterators". But - it doesn't actually count safe iterators either. For one, it's only incremented once the safe iterator begins to iterate, not when it's created. It's also incremented in `dictScan` to prevent rehashing (and commented to make it clear why `iterators` is being incremented during a scan). This update renames the field as `pauserehash` and creates 2 helper macros `dictPauseRehashing(d)` and `dictResumeRehashing(d)`	2021-02-20 12:56:30 +02:00
Greg Femec	266949c7fc	Fix random element selection for large hash tables. (#8133 ) When a database on a 64 bit build grows past 2^31 keys, the underlying hash table expands to 2^32 buckets. After this point, the algorithms for selecting random elements only return elements from half of the available buckets because they use random() which has a range of 0 to 2^31 - 1. This causes problems for eviction policies which use dictGetSomeKeys or dictGetRandomKey. Over time they cause the hash table to become unbalanced because, while new keys are spread out evenly across all buckets, evictions come from only half of the available buckets. Eventually this half of the table starts to run out of keys and it takes longer and longer to find candidates for eviction. This continues until no more evictions can happen. This solution addresses this by using a 64 bit PRNG instead of libc random(). Co-authored-by: Greg Femec <gfemec@google.com>	2020-12-23 15:52:07 +02:00
Oran Agra	7ca00d694d	Sanitize dump payload: fail RESTORE if memory allocation fails When RDB input attempts to make a huge memory allocation that fails, RESTORE should fail gracefully rather than die with panic	2020-12-06 14:54:34 +02:00
Wang Yuan	75f9dec644	Limit the main db and expires dictionaries to expand (#7954 ) As we know, redis may reject user's requests or evict some keys if used memory is over maxmemory. Dictionaries expanding may make things worse, some big dictionaries, such as main db and expires dict, may eat huge memory at once for allocating a new big hash table and be far more than maxmemory after expanding. There are related issues: #4213 #4583 More details, when expand dict in redis, we will allocate a new big ht[1] that generally is double of ht[0], The size of ht[1] will be very big if ht[0] already is big. For db dict, if we have more than 64 million keys, we need to cost 1GB for ht[1] when dict expands. If the sum of used memory and new hash table of dict needed exceeds maxmemory, we shouldn't allow the dict to expand. Because, if we enable keys eviction, we still couldn't add much more keys after eviction and rehashing, what's worse, redis will keep less keys when redis only remains a little memory for storing new hash table instead of users' data. Moreover users can't write data in redis if disable keys eviction. What this commit changed ? Add a new member function expandAllowed for dict type, it provide a way for caller to allow expand or not. We expose two parameters for this function: more memory needed for expanding and dict current load factor, users can implement a function to make a decision by them. For main db dict and expires dict type, these dictionaries may be very big and cost huge memory for expanding, so we implement a judgement function: we can stop dict to expand provisionally if used memory will be over maxmemory after dict expands, but to guarantee the performance of redis, we still allow dict to expand if dict load factor exceeds the safe load factor. Add test cases to verify we don't allow main db to expand when left memory is not enough, so that avoid keys eviction. Other changes: For new hash table size when expand. 
Before this commit, the new size was double the dict's used count, later rounded up by _dictNextPower. Actually we aim to control a dict load factor between 0.5 and 1.0. Now we replace *2 with +1; since the first check is that used >= size, the outcome of the old computation will usually be the same as _dictNextPower(used+1). The only case where it'll differ is when dict_can_resize is false during fork, so that later the _dictNextPower(used*2) will cause the dict to jump to *4 (i.e. _dictNextPower(1025*2) will return 4096). Fix rehash test cases due to changing algorithm of new hash table size when expand.	2020-12-06 11:53:04 +02:00
antirez	61a01793ed	Better distribution for set get-random-element operations.	2019-02-18 18:27:18 +01:00
zhaozhao.zz	7c6ddbc37d	dict: fix the int problem for defrag	2017-12-05 15:38:03 +01:00
antirez	adeed29a99	Use SipHash hash function to mitigate HashDos attempts. This change attempts to switch to a hash function which mitigates the effects of the HashDoS attack (denial of service attack trying to force data structures to worst case behavior) while at the same time providing Redis with a hash function that does not expect the input data to be word aligned, a condition no longer true now that sds.c strings have a variable length header. Note that it is possible sometimes that even using a hash function for which collisions cannot be generated without knowing the seed, special implementation details or the exposure of the seed in an indirect way (for example the ability to add elements to a Set and check the return in which Redis returns them with SMEMBERS) may make the attacker's life simpler in the process of trying to guess the correct seed, however the next step would be to switch to a log(N) data structure when too many items in a single bucket are detected: this seems like an overkill in the case of Redis. SPEED REGRESSION TESTS: In order to verify that switching from MurmurHash to SipHash had no impact on speed, a set of benchmarks involving fast insertion of 5 million keys were performed. The result shows Redis with SipHash in high pipelining conditions to be about 4% slower compared to using the previous hash function. However this could partially be related to the fact that the current implementation does not attempt to hash whole words at a time but reads single bytes, in order to have an output which is endian-neutral and at the same time working on systems where unaligned memory accesses are a problem. Further X86 specific optimizations should be tested, the function may easily get at the same level of MurMurHash2 if a few optimizations are performed.	2017-02-20 17:29:17 +01:00
oranagra	5ab6a54cc6	active defrag improvements	2017-01-02 09:42:32 +02:00
oranagra	7aa9e6d2ae	active memory defragmentation	2016-12-30 03:37:52 +02:00
antirez	09a50d34a2	dict.c: dictReplaceRaw() -> dictAddOrFind(). What they say about "naming things" in programming?	2016-09-14 16:43:38 +02:00
oranagra	afcbcc0e58	dict.c: introduce dictUnlink(). Notes by @antirez: This patch was picked from a larger commit by Oran and adapted to change the API a bit. The basic idea is to avoid double lookups when the value of the deleted entry needs to be used. BEFORE: entry = dictFind( ... ); /* 1st lookup. */ /* Do something with the entry. */ dictDelete(...); /* 2nd lookup. */ AFTER: entry = dictUnlink( ... ); /* 1st lookup. */ /* Do something with the entry. */ dictFreeUnlinkedEntry(entry); /* No lookups!. */	2016-09-14 12:18:59 +02:00
oranagra	68bf45fa1e	Optimize repeated keyname hashing. (Change cherry-picked and modified by @antirez from a larger commit provided by @oranagra in PR #3223).	2016-09-12 13:19:05 +02:00
antirez	0c05436cef	Lazyfree: a first implementation of non blocking DEL.	2015-10-01 13:00:19 +02:00
antirez	0f64080dcb	DEBUG HTSTATS <dbid> added. The command reports information about the hash table internal state representing the specified database ID. This can be used in order to investigate rehashings, memory usage issues and for other debugging purposes.	2015-07-14 17:15:37 +02:00
antirez	9feee428f2	SPOP: reimplemented for speed and better distribution. The old version of SPOP with "count" argument used an API call of dict.c which was actually designed for a different goal, and was not capable of good distribution. We follow a different three-cases approach optimized for different ratios between sets and requested number of elements. The implementation is simpler and allowed the removal of a large amount of code.	2015-02-11 10:52:28 +01:00
antirez	5792a217f8	dict.c: add dictGetSomeKeys(), specialized for eviction.	2015-02-11 10:52:27 +01:00
antirez	064d5c96ac	Use long for rehash and iterator index in dict.h. This allows to support datasets with more than 2 billion of keys (possible in very large memory instances, this bug was actually reported). Closes issue #1814.	2014-08-26 10:18:56 +02:00
xiaoyu	d786fb6e94	Clarify argument to dict macro d is more clear because the type of argument is dict not dictht Closes #513	2014-08-18 10:59:01 +02:00
antirez	edca2b14d2	Remove warnings and improve integer sign correctness.	2014-08-13 11:44:38 +02:00
antirez	d1cb6a0fc4	Add double field in dict.c entry value union.	2014-07-22 17:38:22 +02:00
antirez	5317f5e99a	Added dictGetRandomKeys() to dict.c: mass get random entries. This new function is useful to get a number of random entries from an hash table when we just need to do some sampling without particularly good distribution. It just jumps at a random place of the hash table and returns the first N items encountered by scanning linearly. The main usefulness of this function is to speedup Redis internal sampling of the key space, for example for key eviction or expiry.	2014-03-20 15:50:46 +01:00
antirez	2eb781b35b	dict.c: added optional callback to dictEmpty(). Redis hash table implementation has many non-blocking features like incremental rehashing, however while deleting a large hash table there was no way to have a callback called to do some incremental work. This commit adds this support, as an optional callback argument to dictEmpty() that is currently called at a fixed interval (one time every 65k deletions).	2013-12-10 18:46:24 +01:00
Pieter Noordhuis	7f490b197f	Add SCAN command	2013-10-25 10:49:48 +02:00
antirez	48cde3fe47	dict.c iterator API misuse protection. dict.c allows the user to create unsafe iterators, that are iterators that will not touch the dictionary data structure in any way, preventing copy on write, but at the same time are limited in their usage. The limitation is that when iterating with an unsafe iterator, no call to other dictionary functions must be done inside the iteration loop, otherwise the dictionary may be incrementally rehashed resulting into missing elements in the set of the elements returned by the iterator. However after introducing this kind of iterators a number of bugs were found due to misuses of the API, and we are still finding bugs about this issue. The bugs are not trivial to track because the effect is just missing elements during the iteration. This commit introduces auto-detection of the API misuse. The idea is that an unsafe iterator has a contract: from initialization to the release of the iterator the dictionary should not change. So we take a fingerprint of the dictionary state, xoring a few important dict properties when the unsafe iterator is initialized. We later check when the iterator is released if the fingerprint is still the same. If it is not, we found a misuse of the iterator, as not allowed API calls changed the internal state of the dictionary. This code was checked against a real bug, issue #1240. This is what Redis prints (aborting) when a misuse is detected: Assertion failed: (iter->fingerprint == dictFingerprint(iter->d)), function dictReleaseIterator, file dict.c, line 587.	2013-08-19 15:00:57 +02:00
Salvatore Sanfilippo	ecd82f59fe	Merge pull request #693 from ghurrell/dict-h-typos Fix (cosmetic) typos in dict.h	2012-10-22 02:55:23 -07:00
antirez	da920e75d4	Hash function switched to murmurhash2. The previously used hash function, djbhash, is not secure against collision attacks even when the seed is randomized as there are simple ways to find seed-independent collisions. The new hash function appears to be safe (or much harder to exploit at least) in this case, and has better distribution. Better distribution does not always mean that it's better. For instance in a fast benchmark with "DEBUG POPULATE 1000000" I obtained the following results: 1.6 seconds with djbhash 2.0 seconds with murmurhash2 This is due to the fact that djbhash will hash objects that follow the pattern `prefix:<id>` and where the id is numerically near, to near buckets. This improves the locality. However in other access patterns with keys that have no relation murmurhash2 has some (apparently minimal) speed advantage. On the other hand a better distribution should significantly improve the quality of the distribution of elements returned with dictGetRandomKey() that is used in SPOP, SRANDMEMBER, RANDOMKEY, and other commands. Everything considered, and under the suspicion that this commit fixes a security issue in Redis, we are switching to the new hash function. If some serious speed regression is found in the future we'll be able to step back easily. This commit fixes issue #663.	2012-10-05 11:20:13 +02:00
Greg Hurrell	4b1f6ad3e7	Fix (cosmetic) typos in dict.h	2012-10-02 22:01:26 -07:00
antirez	a48c8d873b	Fix for hash table collision attack. We simply randomize hash table initialization value at startup time.	2012-01-21 23:30:13 +01:00
antirez	14ed10d957	dict set/get macros for integers fixed.	2011-11-09 13:39:59 +01:00
antirez	6c578b764a	dict.c: added macros to get signed/unsigned integer values from hash entry. Field name of hash entry union modified for clarity.	2011-11-08 23:59:53 +01:00
antirez	aa9a61ccd7	dict.c: added macros in dict.h to set signed and unsigned 64 bit values directly inside the hash entry without using additional memory.	2011-11-08 19:41:29 +01:00
antirez	c0ba9ebe13	dict.c API names modified to be more concise and consistent.	2011-11-08 17:07:55 +01:00
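
The sizing change described in the 75f9dec644 message above (rounding up `used+1` instead of `used*2` when choosing the new table size) can be illustrated with a small C sketch. This is not the dict.c code: the names `next_power`, `expand_size_old`, `expand_size_new`, `check_sizing_examples` and the constant `INITIAL_SIZE` are hypothetical, and only the power-of-two rounding is modeled.

```c
#include <assert.h>
#include <limits.h>

/* Assumed minimum table size; chosen here only for illustration. */
#define INITIAL_SIZE 4UL

/* Smallest power of two >= size (the role _dictNextPower plays in the message). */
unsigned long next_power(unsigned long size) {
    unsigned long i = INITIAL_SIZE;
    if (size >= LONG_MAX) return (unsigned long)LONG_MAX + 1UL;
    while (i < size) i *= 2;
    return i;
}

/* Old policy as described in the message: round up used*2. */
unsigned long expand_size_old(unsigned long used) {
    return next_power(used * 2);
}

/* New policy as described in the message: round up used+1. */
unsigned long expand_size_new(unsigned long used) {
    return next_power(used + 1);
}

/* Checks the two situations discussed in the commit message. */
void check_sizing_examples(void) {
    /* Normal case: expansion triggers as soon as used reaches size (1024). */
    assert(expand_size_old(1024) == 2048);
    assert(expand_size_new(1024) == 2048);
    /* Resizing was blocked (e.g. during fork), so used grew past size. */
    assert(expand_size_old(1025) == 4096); /* table jumps to 4x of 1024 */
    assert(expand_size_new(1025) == 2048); /* table only doubles */
}
```

Calling `check_sizing_examples()` from a `main` exercises both cases: the two policies agree when expansion happens right at `used == size`, and differ only once `used` has been allowed to exceed the table size.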


55 Commits