63 Commits

Author SHA1 Message Date
antirez
1b16400551 DEBUG HTSTATS <dbid> added.
The command reports information about the hash table internal state
representing the specified database ID.

This can be used in order to investigate rehashings, memory usage issues
and for other debugging purposes.
2015-07-14 17:15:37 +02:00
antirez
df2a1e0901 dict.c: convert types to unsigned long where appropriate.
No semantical changes since to make dict.c truly able to scale over the
32 bit table size limit, the hash function shoulds and other internals
related to hash function output should be 64 bit ready.
2015-03-27 10:14:52 +01:00
antirez
ce5d516e75 dict.c: add casting to avoid compilation warning.
rehashidx is always positive in the two code paths, since the only
negative value it could have is -1 when there is no rehashing in
progress, and the condition is explicitly checked.
2015-03-27 10:12:25 +01:00
antirez
87a6696d89 SPOP: reimplemented for speed and better distribution.
The old version of SPOP with "count" argument used an API call of dict.c
which was actually designed for a different goal, and was not capable of
good distribution. We follow a different three-cases approach optimized
for different ratiion between sets and requested number of elements.

The implementation is simpler and allowed the removal of a large amount
of code.
2015-02-11 10:52:28 +01:00
antirez
e1da371706 dict.c: reset emptylen when bucket is not empty.
Fixed by @oranagra, thank you.
2015-02-11 10:52:27 +01:00
antirez
2bb647770d dict.c: add dictGetSomeKeys(), specialized for eviction. 2015-02-11 10:52:27 +01:00
antirez
6086a3656a dict.c: avoid code repetition in dictRehash().
Avoid code repetition introduced with PR #2367, also fixes the return
value to always return 0 if there is nothing more to rehash.
2015-02-11 10:52:27 +01:00
Sun He
9c047bbfec dict.c/dictRehash: check again to update 2015-02-11 10:52:26 +01:00
antirez
15940c1988 dict.c: don't try buckets that are empty for sure in dictGetRandomKey().
This is very similar to the optimization applied to dictGetRandomKeys,
but applied to the single key variant.

Related to issue #2306.
2015-02-11 10:52:26 +01:00
antirez
5d106de0c9 dict.c: dictGetRandomKeys() optimization for big->small table case.
Related to issue #2306.
2015-02-11 10:52:26 +01:00
antirez
71b990a7ab dict.c: dictGetRandomKeys() visit pattern optimization.
We use the invariant that the original table ht[0] is never populated up
to the index before the current rehashing index.

Related to issue #2306.
2015-02-11 10:52:26 +01:00
antirez
19d51b181b dict.c: put a bound to max work dictRehash() call can do.
Related to issue #2306.
2015-02-11 10:52:26 +01:00
antirez
1115984e9f dict.c: prevent useless resize to same size.
Related to issue #2306.
2015-02-11 10:52:26 +01:00
antirez
4c149c5ab9 Less blocking dictGetRandomKeys().
Related to issue #2306.
2015-02-11 10:52:26 +01:00
antirez
848a382619 dict.c: make chaining strategy more clear in dictAddRaw(). 2015-01-23 18:11:05 +01:00
Matt Stancliff
0cd666c85e Cleanup wording of dictScan() comment
Some language in the comment was difficult
to understand, so this commit: clarifies wording, removes
unnecessary words, and relocates some dependent clauses
closer to what they actually describe.

I also tried to break up longer chains of thought
(if X, then Y, and Q, and also F, so obviously M)
into more manageable chunks for ease of understanding.
2014-09-29 06:49:08 -04:00
Michael Parker
cb4ce4ab18 Fix hash table size in comment for dictScan
Closes #1351
2014-09-29 06:49:07 -04:00
antirez
4db8a7396c Fix dictRehash assert casting type.
Also related to #1929.
2014-08-26 10:32:44 +02:00
antirez
4937702744 Cast to right type in dictNext().
This closes issue #1929, the other part was fixed in the context of issue
2014-08-26 10:26:36 +02:00
Cong Ding
ca13fbca08 Remove unused function
Closes #878
2014-08-18 11:12:26 +02:00
antirez
2e94ffb1d1 Remove warnings and improve integer sign correctness. 2014-08-13 11:44:38 +02:00
antirez
28573f8fe8 Added dictGetRandomKeys() to dict.c: mass get random entries.
This new function is useful to get a number of random entries from an
hash table when we just need to do some sampling without particularly
good distribution.

It just jumps at a random place of the hash table and returns the first
N items encountered by scanning linearly.

The main usefulness of this function is to speedup Redis internal
sampling of the key space, for example for key eviction or expiry.
2014-03-20 15:50:46 +01:00
zhanghailei
37bd3f04bb FIXED a typo more thank should be more than 2014-03-04 11:21:34 +08:00
zhanghailei
1ec4a421e3 According to context,the size should be 16 rather than 64 2014-03-04 11:21:34 +08:00
antirez
247a311317 dict.c: added optional callback to dictEmpty().
Redis hash table implementation has many non-blocking features like
incremental rehashing, however while deleting a large hash table there
was no way to have a callback called to do some incremental work.

This commit adds this support, as an optiona callback argument to
dictEmpty() that is currently called at a fixed interval (one time every
65k deletions).
2013-12-10 18:46:24 +01:00
antirez
7a5a646df9 Fixed grammar: before H the article is a, not an. 2013-12-05 16:35:32 +01:00
antirez
865d3b0f33 removed not used vars in dictScan(). 2013-11-05 11:56:11 +01:00
antirez
99efa37a6b dictScan(): empty hash table requires special handling. 2013-10-28 11:17:18 +01:00
antirez
f86c07df30 Fixed typos in dictScan() comment. 2013-10-25 17:05:55 +02:00
antirez
817e6766aa dictScan() algorithm documented. 2013-10-25 17:01:30 +02:00
Pieter Noordhuis
f18269d1ef Fix error in scan algorithm
The irrelevant bits shouldn't be masked to 1. This can result in slots being
skipped when the hash table is resized between calls to the iterator.
2013-10-25 10:50:03 +02:00
Pieter Noordhuis
956c0ed927 Add SCAN command 2013-10-25 10:49:48 +02:00
antirez
13c59cfdc8 dictFingerprint(): cast pointers to integer of same size. 2013-08-20 11:49:55 +02:00
antirez
5824ab54dd Revert "Fixed type in dict.c comment: 265 -> 256."
This reverts commit d22d557e41151c1d716045e0059550e197d6e526.
2013-08-19 17:25:48 +02:00
antirez
d22d557e41 Fixed type in dict.c comment: 265 -> 256. 2013-08-19 15:10:37 +02:00
antirez
ded611636f assert.h replaced with redisassert.h when appropriate.
Also a warning was suppressed by including unistd.h in redisassert.h
(needed for _exit()).
2013-08-19 15:01:21 +02:00
antirez
ae1bb62f62 dictFingerprint() fingerprinting made more robust.
The previous hashing used the trivial algorithm of xoring the integers
together. This is not optimal as it is very likely that different
hash table setups will hash the same, for instance an hash table at the
start of the rehashing process, and at the end, will have the same
fingerprint.

Now we hash N integers in a smarter way, by summing every integer to the
previous hash, and taking the integer hashing again (see the code for
further details). This way it is a lot less likely that we get a
collision. Moreover this way of hashing explicitly protects from the
same set of integers in a different order to hash to the same number.

This commit is related to issue #1240.
2013-08-19 15:01:12 +02:00
antirez
bfaadb0df2 dict.c iterator API misuse protection.
dict.c allows the user to create unsafe iterators, that are iterators
that will not touch the dictionary data structure in any way, preventing
copy on write, but at the same time are limited in their usage.

The limitation is that when itearting with an unsafe iterator, no call
to other dictionary functions must be done inside the iteration loop,
otherwise the dictionary may be incrementally rehashed resulting into
missing elements in the set of the elements returned by the iterator.

However after introducing this kind of iterators a number of bugs were
found due to misuses of the API, and we are still finding
bugs about this issue. The bugs are not trivial to track because the
effect is just missing elements during the iteartion.

This commit introduces auto-detection of the API misuse. The idea is
that an unsafe iterator has a contract: from initialization to the
release of the iterator the dictionary should not change.

So we take a fingerprint of the dictionary state, xoring a few important
dict properties when the unsafe iteartor is initialized. We later check
when the iterator is released if the fingerprint is still the same. If it
is not, we found a misuse of the iterator, as not allowed API calls
changed the internal state of the dictionary.

This code was checked against a real bug, issue #1240.

This is what Redis prints (aborting) when a misuse is detected:

Assertion failed: (iter->fingerprint == dictFingerprint(iter->d)),
function dictReleaseIterator, file dict.c, line 587.
2013-08-19 15:00:57 +02:00
guiquanz
df7a5b7157 Fixed many typos. 2013-01-19 10:59:44 +01:00
charsyam
08402aee73 Remove unnecessary condition in _dictExpandIfNeeded (dict.c) 2012-11-28 11:44:39 +01:00
antirez
a32d1ddff6 BSD license added to every C source and header file. 2012-11-08 18:31:32 +01:00
antirez
0c7d3bef67 Hash function switched to murmurhash2.
The previously used hash function, djbhash, is not secure against
collision attacks even when the seed is randomized as there are simple
ways to find seed-independent collisions.

The new hash function appears to be safe (or much harder to exploit at
least) in this case, and has better distribution.

Better distribution does not always means that's better. For instance in
a fast benchmark with "DEBUG POPULATE 1000000" I obtained the following
results:

    1.6 seconds with djbhash
    2.0 seconds with murmurhash2

This is due to the fact that djbhash will hash objects that follow the
pattern `prefix:<id>` and where the id is numerically near, to near
buckets. This improves the locality.

However in other access patterns with keys that have no relation
murmurhash2 has some (apparently minimal) speed advantage.

On the other hand a better distribution should significantly
improve the quality of the distribution of elements returned with
dictGetRandomKey() that is used in SPOP, SRANDMEMBER, RANDOMKEY, and
other commands.

Everything considered, and under the suspect that this commit fixes a
security issue in Redis, we are switching to the new hash function.
If some serious speed regression will be found in the future we'll be able
to step back easiliy.

This commit fixes issue #663.
2012-10-05 11:20:13 +02:00
antirez
0f3705f3c8 Even inside #if 0 comments are comments. 2012-04-21 21:49:21 +02:00
Salvatore Sanfilippo
cc080d1fc2 Merge pull request #440 from ErikDubbelboer/spelling
Fixed some spelling errors in comments
2012-04-21 03:31:06 -07:00
antirez
8fc5a95344 Currenly not used code in dict.c commented out. 2012-04-18 23:56:07 +02:00
Erik Dubbelboer
358745fcc2 Update src/dict.c 2012-04-07 15:45:53 +03:00
Erik Dubbelboer
1c82a561f1 Fixed some spelling errors in the comments 2012-04-07 14:40:29 +02:00
huangz1990
f7192a2ce1 fix typo 2012-03-15 14:27:14 +08:00
antirez
36d5d67ed7 fixed typo in hahs function seed default value. It is no longer used but fixed to retain the old constant as default anyway. 2012-01-22 01:40:23 +01:00
antirez
fff238e507 Fix for hash table collision attack. We simply randomize hash table initialization value at startup time. 2012-01-21 23:30:13 +01:00