I think the log of pfail status changes will be very useful.
The other parts were scanned and found that it can be modified.
Changes:
1. Changing pfail status releated logs from VERBOSE to NOTICE.
2. Changing configEpoch collision log from VERBOSE(warning) to NOTICE.
3. Changing some logs from DEBUG to NOTICE.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
In markNodeAsFailingIfNeeded we will count needed_quorum and failures,
needed_quorum is the half the cluster->size and plus one, and
cluster-size
is the size of primary node which contain slots, but when counting
failures, we dit not check if primary has slots.
Only the primary has slots that has the rights to vote, adding a new
clusterNodeIsVotingPrimary to formalize this concept.
Release notes:
bugfix where nodes not in the quorum group might spuriously mark nodes
as failed
---------
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
When there is a link failure while an ongoing MEET request is sent the
sending node stops sending anymore MEET and starts sending PINGs. Since
every node responds to PINGs from unknown nodes with a PONG, the
receiving node never adds the sending node. But the sending node adds
the receiving node when it sees a PONG. This can lead to asymmetry in
cluster membership. This changes makes the sender keep sending MEET
until it sees a PONG, avoiding the asymmetry.
---------
Signed-off-by: Sankar <1890648+srgsanky@users.noreply.github.com>
Safe iterators must call resetIterators when they are done being used.
Fix one issue where a safe iterator was not correctly calling reset
during cluster slot caching and fixed a second issue where reset
iterator was being called twice. For the double release case,
kvstoreIteratorNextDict is responsible for patching up the iterator, but
we were calling it a second time in kvstoreIteratorNext.
In addition, I added some documentation around initializing iterators,
added an assert to prevent double initialization, and remove a function
from the public interface which isn't needed and might lead to incorrect
usage of the safe iterators.
Bumping srgsanky for finding it here:
c4782066e7 (r142867004).
---------
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
More rebranding of
* Log messages (#252)
* The DENIED error reply
* Internal function names and comments, mainly Lua API
---------
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Remove the unused value duplicate API from dict. It's unused in the codebase and introduces unnecessary overhead.
---------
Signed-off-by: Eran Liberty <eran.liberty@gmail.com>
There are currently three block types: BLOCKED_WAIT, BLOCKED_WAITAOF,
and BLOCKED_WAIT_PREREPL, used to block clients executing `WAIT`,
`WAITAOF`, and `CLUSTER SETSLOT`, respectively. They share the same
workflow: the client is blocked until replication to the expected number
of replicas completes. However, they provide different responses
depending on the commands involved. Using distinct block types leads to
code duplication and reduced readability. This PR consolidates the three
types into a single WAIT type, differentiating them using the pending
command to ensure the appropriate response is returned.
Fix#427
---------
Signed-off-by: Ping Xie <pingxie@google.com>
In #53, we will cache the CLUSTER SLOTS response to improve the
throughput and reduct the latency.
In the code snippet below, the second cluster slots will use the
old hostname:
```
config set cluster-preferred-endpoint-type hostname
config set cluster-announce-hostname old-hostname.com
multi
cluster slots
config set cluster-announce-hostname new-hostname.com
cluster slots
exec
```
When updating the hostname, in updateAnnouncedHostname, we will set
CLUSTER_TODO_SAVE_CONFIG and we will do a clearCachedClusterSlotsResponse
in clusterSaveConfigOrDie, so harmless in most cases.
Move the clearCachedClusterSlotsResponse call to clusterDoBeforeSleep
instead of scheduling it to be called in clusterSaveConfigOrDie.
Signed-off-by: Binbin <binloveplay1314@qq.com>
This aligns the behaviour with established Valkey commands with a
TIMEOUT argument, such as BLPOP.
Fix#422
Signed-off-by: Ping Xie <pingxie@google.com>
I have validated that these settings closely match the existing coding
style with one major exception on `BreakBeforeBraces`, which will be
`Attach` going forward. The mixed `BreakBeforeBraces` styles in the
current codebase are hard to imitate and also very odd IMHO - see below
```
if (a == 1) { /*Attach */
}
```
```
if (a == 1 ||
b == 2)
{ /* Why? */
}
```
Please do NOT merge just yet. Will add the github action next once the
style is reviewed/approved.
---------
Signed-off-by: Ping Xie <pingxie@google.com>
This commit adds a logic to cache `CLUSTER SLOTS` response for reduced
latency and also updates the cache when a change in the cluster is
detected.
Historically, `CLUSTER SLOTS` command was deprecated, however all the
server clients have been using `CLUSTER SLOTS` and have not migrated to
`CLUSTER SHARDS`. In future this logic can be added to any other
commands to improve the performance of the engine.
---------
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Cleaned up the minor cluster refactoring notes that were intended to be
follow ups that never happened. Basically:
1. Minor style nitpicks
2. Generalized clusterNodeIsMyself so that it wasn't implementation
dependent.
3. Removed getMyClusterId, and just make it an explicit call to myself's
name, which seems more straightforward and removes unnecessary
abstraction.
4. Remove clusterNodeGetSlaveof infavor of clusterNodeGetMaster. We
already do a check if it's a replica, and if it wasn't working it would
have been crashing.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Renamed redis to valkey/server in aof.c serverlogs.
The AOF rewrite child process title is set to "redis-aof-rewrite" if
Valkey was started from a redis-server symlink, otherwise to
"valkey-aof-rewrite".
This is a breaking changes since logs are changed.
Part of #207.
---------
Signed-off-by: Shivshankar-Reddy <shiva.sheri.github@gmail.com>
This patch try to do following things:
1. Rename `redis_*` and `REDIS_*` macros defined in config.h to
`valkey_*`, `VALKEY_*` and update associated used files. (`redis_fstat`,
`redis_fsync`, `REDIS_THREAD_STACK_SIZE`, etc.)
2. Remove the leading double underscore for guard macro in config.h.
---------
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Log messages containing "Redis" in some files are changed.
Add macro SERVER_TITLE defined to "Valkey" (uppercase V) is introduced
and used in log messages, so at least it will be easy to patch this
definition to get Redis or any other name in the logs instead of Valkey.
Change "Redis" in some log messages to use %s and SERVER_TITLE.
This is a partial implementation of
https://github.com/valkey-io/valkey/issues/207
---------
Signed-off-by: 0del <bany.y0599@gmail.com>
This includes comments used for module API documentation.
* Strategy for replacement: Regex search: `(//|/\*| \*|#).* ("|\()?(r|R)edis( |\.
|'|\n|,|-|\)|")(?!nor the names of its contributors)(?!Ltd.)(?!Labs)(?!Contributors.)`
* Don't edit copyright comments
* Replace "Redis version X.X" -> "Redis OSS version X.X" to distinguish
from newly licensed repository
* Replace "Redis Object" -> "Object"
* Exclude markdown for now
* Don't edit Lua scripting comments referring to redis.X API
* Replace "Redis Protocol" -> "RESP"
* Replace redis-benchmark, -cli, -server, -check-aof/rdb with "valkey-"
prefix
* Most other places, I use best judgement to either remove "Redis", or
replace with "the server" or "server"
Fixes#148
---------
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Ref: https://github.com/redis/redis/pull/12760
### Description
#### Fixes compatibilty of PlaceholderKV cluster (7.2 - extensions
enabled by default) with older Redis cluster (< 7.0 - extensions not
handled) .
With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.
This fix introduces a flag `extensions_supported` on the clusterMsg
indicating the sender node can handle extensions parsing. Once, a
receiver nodes receives a message with this flag set to 1, it would
update clusterNode new field extensions_supported and start sending out
extensions if it has any.
This PR also introduces a DEBUG sub command to enable/disable cluster
message extensions `process-clustermsg-extensions` feature.
Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `extensions_supported` and then extensions message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).
### Testing
TCL test verifying the cluster state is healthy irrespective of
enabling/disabling cluster message extensions feature.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
This commit updates the processing of PONG gossip messages in the
cluster. When a node (B) becomes a replica due to a failover, its PONG
messages include its new primary node's (A) information and B's
configuration epoch is aligned with A's. This allows observer nodes to
identify changes in primary-ship, addressing issues of intermediate
states and enhancing cluster state consistency during topology changes.
Fix#13018
The receiver does not update any of its cluster state based on gossip
about itself. This commit explicitly avoids sending or processing gossip
about the receiver.
Currently cluster bus gossips include 10% of nodes in the cluster with a
minimum of 3 nodes. For up to 30 node clusters, this commit makes sure
that 1/3 of the gossip (1 out of 3 gossips) is never discarded. This
should help with relatively faster convergence of cluster state in
general.
After fix for #13033, address sanitizer reports this heap-use-after-free
error. When the pubsubshard_channels dict becomes empty, we will delete
the dict, and the dictReleaseIterator will call dictResetIterator, it
will use the dict so we will trigger the error.
This PR introduced a new struct kvstoreDictIterator to wrap
dictIterator.
Replace the original dict iterator with the new kvstore dict iterator.
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
# Description
Gather most of the scattered `redisDb`-related code from the per-slot
dict PR (#11695) and turn it to a new data structure, `kvstore`. i.e.
it's a class that represents an array of dictionaries.
# Motivation
The main motivation is code cleanliness, the idea of using an array of
dictionaries is very well-suited to becoming a self-contained data
structure.
This allowed cleaning some ugly code, among others: loops that run twice
on the main dict and expires dict, and duplicate code for allocating and
releasing this data structure.
# Notes
1. This PR reverts the part of https://github.com/redis/redis/pull/12848
where the `rehashing` list is global (handling rehashing `dict`s is
under the responsibility of `kvstore`, and should not be managed by the
server)
2. This PR also replaces the type of `server.pubsubshard_channels` from
`dict**` to `kvstore` (original PR:
https://github.com/redis/redis/pull/12804). After that was done,
server.pubsub_channels was also chosen to be a `kvstore` (with only one
`dict`, which seems odd) just to make the code cleaner by making it the
same type as `server.pubsubshard_channels`, see
`pubsubtype.serverPubSubChannels`
3. the keys and expires kvstores are currenlty configured to allocate
the individual dicts only when the first key is added (unlike before, in
which they allocated them in advance), but they won't release them when
the last key is deleted.
Worth mentioning that due to the recent change the reply of DEBUG
HTSTATS changed, in case no keys were ever added to the db.
before:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
[Expires HT]
Hash table 0 stats (main hash table):
No stats available for empty dictionaries
```
after:
```
127.0.0.1:6379> DEBUG htstats 9
[Dictionary HT]
[Expires HT]
```
In the following case sender may be unknown, so we need to set up a
NULL check for sender:
```
/* If this is a MEET packet from an unknown node, we still process
* the gossip section here since we have to trust the sender because
* of the message type. */
if (!sender && type == CLUSTERMSG_TYPE_MEET)
clusterProcessGossipSection(hdr,link);
```
There have been occasional instances of memory corruption (though code bugs or bit flips) leading to invalid node information being gossiped around. To prevent this invalid information spreading, we verify the node IDs in received gossip are in an acceptable format, and disregard any gossiped nodes with invalid IDs. This PR uses the existing verifyClusterNodeId function to check the validity of the gossiped node IDs and if an invalid one is encountered, logs raw byte information to help debug the corruption.
---------
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Crash reported in #12695. In the process of upgrading the cluster from
7.0 to 7.2, because the 7.0 nodes will not gossip shard id, in 7.2 we
will rely on shard id to build the server.cluster->shards dict.
In some cases, for example, the 7.0 master node and the 7.2 replica node.
From the view of 7.2 replica node, the cluster->shards dictionary does not
have its master node. In this case calling CLUSTER SHARDS on the 7.2 replica
node may crash.
We should fix the underlying assumption of updateShardId, which is that the
shard dict should be always in sync with the node's shard_id. The fix was
suggested by PingXie, see more details in #12695.
If there are nodes in the cluster that do not support shard-id, they
will gossip shard-id. From the perspective of nodes that support shard-id,
their shard-id is meaningless (since shard-id is randomly generated when
we create a node.)
Nodes that support shard-id will save the shard-id information in nodes.conf.
If the node is restarted according to nodes.conf, the server will report a
corrupted cluster config file error. Because auxShardIdSetter will reject
configurations with inconsistent master-replica shard-ids.
A cluster-wide consensus for the node's shard_id is not necessary. The key
is maintaining consistency of the shard_id on each individual 7.2 node.
As the cluster progressively upgrades to version 7.2, we can expect the
shard_ids across all nodes to naturally converge and align.
In this PR, when processing the gossip, if sender is a replica and does not
support shard-id, set the shard_id to the shard_id of its master.
We have achieved replacing `slots_to_keys` radix tree with key->slot
linked list (#9356), and then replacing the list with slot specific
dictionaries for keys (#11695).
Shard channels behave just like keys in many ways, and we also need a
slots->channels mapping. Currently this is still done by using a radix
tree. So we should split `server.pubsubshard_channels` into 16384 dicts
and drop the radix tree, just like what we did to DBs.
Some benefits (basically the benefits of what we've done to DBs):
1. Optimize counting channels in a slot. This is currently used only in
removing channels in a slot. But this is potentially more useful:
sometimes we need to know how many channels there are in a specific slot
when doing slot migration. Counting is now implemented by traversing the
radix tree, and with this PR it will be as simple as calling `dictSize`,
from O(n) to O(1).
2. The radix tree in the cluster has been removed. The shard channel
names no longer require additional storage, which can save memory.
3. Potentially useful in slot migration, as shard channels are logically
split by slots, thus making it easier to migrate, remove or add as a
whole.
4. Avoid rehashing a big dict when there is a large number of channels.
Drawbacks:
1. Takes more memory than using radix tree when there are relatively few
shard channels.
What this PR does:
1. in cluster mode, split `server.pubsubshard_channels` into 16384
dicts, in standalone mode, still use only one dict.
2. drop the `slots_to_channels` radix tree.
3. to save memory (to solve the drawback above), all 16384 dicts are
created lazily, which means only when a channel is about to be inserted
to the dict will the dict be initialized, and when all channels are
deleted, the dict would delete itself.
5. use `server.shard_channel_count` to keep track of the number of all
shard channels.
---------
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This is a follow-up fix to #12733. We need to apply the same changes to
delKeysInSlot. Refer to #12733 for more details.
This PR contains some other minor cleanups / improvements to the test
suite and docs.
It uses the postnotifications test module in a cluster mode test which
revealed a leak in the test module (fixed).
When loading RDB on cluster nodes, it is necessary to consider the
scenario where a node is a replica.
For example, during a rolling upgrade, new version instances are often
mounted as replicas on old version instances. In this case, the full
synchronization legacy RDB does not contain slot information, and the
new version instance, acting as a replica, should be able to handle the
legacy RDB correctly for `dbExpand`.
Additionally, renaming `getMyClusterSlotCount` to `getMyShardSlotCount`
would be appropriate.
Introduced in #11695
A followup PR for #12742
Add some brief comments explaining the purpose of the file to the head
of cluster_legacy.c and cluster.c.
Add copyright notice to cluster.c
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
Co-authored-by: Josh Hershberg <yehoshua@redis.com>
The failover command is up until now not supported
in cluster mode. This commit allows a cluster
implementation to support the command. The legacy
clustering implementation still does not support
this command.
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
Move primary functions used to implement datapath
clustering into cluster.c, making them shared. This
required adding "accessor" and other functions to
abstract access to node details and cluster state.
Signed-off-by: Josh Hershberg <yehoshua@redis.com>
Divide up clusterCommand into clusterCommand for shared
sub-commands and clusterCommandSpecial for implementation
specific sub-commands. So to, the cluster command help
sub-command has been divided into two implementations,
clusterCommandHelp and clusterCommandHelpSpecial. Some
common sub-subcommand implementations have been extracted
and their implemenations either made shared or else
implementation specific.
Signed-off-by: Josh Hershberg <yehoshua@redis.com>