791 Commits

Author SHA1 Message Date
Ping Xie
203b12e41f
Introduce Shard IDs to logically group nodes in cluster mode (#10536)
Introduce Shard IDs to logically group nodes in cluster mode.
1. Added a new "shard_id" field to "cluster nodes" output and nodes.conf after "hostname"
2. Added a new PING extension to propagate "shard_id"
3. Handled upgrade from pre-7.2 releases automatically
4. Refactored PING extension assembling/parsing logic

Behavior of Shard IDs:

Replicas will always follow the shards of their reported primaries. If a primary updates its shard ID, the replica will follow. (This need not follow for cluster v2) This is not an expected use case.
2022-11-16 19:24:18 -08:00
Binbin
e632e62e68
Print IP and port on cluster bus message sanity check (#11443)
* Print IP and port on cluster bus message sanity check

Add a print statement to indicate which IP/port is sending
the error messages. That way we can at least check to see
if it is a node in the cluster or some other nefarious nodes.

It is proposed in #11339.

Unrelated changes: the return check for connAddrPeerName should
be -1 instead of C_ERR, although the value of C_ERR is also -1.

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
2022-11-01 19:27:30 -07:00
Brennan
47c493e070
Re-design cluster link send buffer to improve memory management (#11343)
Re-design cluster link send queue to improve memory management
2022-11-01 19:26:44 -07:00
Moti Cohen
c0d7226274
Refactor and (internally) rebrand from pause-clients to pause-actions (#11098)
Renamed from "Pause Clients" to "Pause Actions" since the mechanism can pause
several actions in redis, not just clients (e.g. eviction, expiration).

Previously each pause purpose (which has a timeout that's tracked separately from others purposes),
also implicitly dictated what it pauses (reads, writes, eviction, etc). Now it is explicit, and
the actions that are paused (bit flags) are defined separately from the purpose.

- Previously, when using feature pause-client it also implicitly means to make the server static:
  - Pause replica traffic
  - Pauses eviction processing
  - Pauses expire processing

Making the server static is used also for failover and shutdown. This PR internally rebrand
pause-client API to become pause-action API. It also Simplifies pauseClients structure
by replacing pointers array with static array.

The context of this PR is to add another trigger to pause-client which will activated in case
of OOM as throttling mechanism ([see here](https://github.com/redis/redis/issues/10907)).
In this case we want only to pause client, and eviction actions.
2022-10-27 11:57:04 +03:00
Meir Shpilraien (Spielrein)
56f97bfa5f
Fix wrong replication on cluster slotmap changes with module KSN propagation (#11377)
As discussed on #11084, `propagatePendingCommands` should happened after the del
notification is fired so that the notification effect and the `del` will be replicated inside MULTI EXEC.

Test was added to verify the fix.
2022-10-16 08:30:01 +03:00
Meir Shpilraien (Spielrein)
eb6accad40
Fix crash on RM_Call inside module load (#11346)
PR #9320 introduces initialization order changes. Now cluster is initialized after modules.
This changes causes a crash if the module uses RM_Call inside the load function
on cluster mode (the code will try to access `server.cluster` which at this point is NULL).

To solve it, separate cluster initialization into 2 phases:
1. Structure initialization that happened before the modules initialization
2. Listener initialization that happened after.

Test was added to verify the fix.
2022-10-12 13:09:51 +03:00
Binbin
35b3fbd90c
Freeze time sampling during command execution, and scripts (#10300)
Freeze time during execution of scripts and all other commands.
This means that a key is either expired or not, and doesn't change
state during a script execution. resolves #10182

This PR try to add a new `commandTimeSnapshot` function.
The function logic is extracted from `keyIsExpired`, but the related
calls to `fixed_time_expire` and `mstime()` are removed, see below.

In commands, we will avoid calling `mstime()` multiple times
and just use the one that sampled in call. The background is,
e.g. using `PEXPIRE 1` with valgrind sometimes result in the key
being deleted rather than expired. The reason is that both `PEXPIRE`
command and `checkAlreadyExpired` call `mstime()` separately.

There are other more important changes in this PR:
1. Eliminate `fixed_time_expire`, it is no longer needed. 
   When we want to sample time we should always use a time snapshot. 
   We will use `in_nested_call` instead to update the cached time in `call`.
2. Move the call for `updateCachedTime` from `serverCron` to `afterSleep`.
    Now `commandTimeSnapshot` will always return the sample time, the
    `lookupKeyReadWithFlags` call in `getNodeByQuery` will get a outdated
    cached time (because `processCommand` is out of the `call` context).
    We put the call to `updateCachedTime` in `aftersleep`.
3. Cache the time each time the module lock Redis.
    Call `updateCachedTime` in `moduleGILAfterLock`, affecting `RM_ThreadSafeContextLock`
    and `RM_ThreadSafeContextTryLock`

Currently the commandTimeSnapshot change affects the following TTL commands:
- SET EX / SET PX
- EXPIRE / PEXPIRE
- SETEX / PSETEX
- GETEX EX / GETEX PX
- TTL / PTTL
- EXPIRETIME / PEXPIRETIME
- RESTORE key TTL

And other commands just use the cached mstime (including TIME).

This is considered to be a breaking change since it can break a script
that uses a loop to wait for a key to expire.
2022-10-09 08:18:34 +03:00
Madelyn Olson
663fbd3459
Stabilize cluster hostnames tests (#11307)
This PR introduces a couple of changes to improve cluster test stability:
1. Increase the cluster node timeout to 3 seconds, which is similar to the
   normal cluster tests, but introduce a new mechanism to increase the ping
   period so that the tests are still fast. This new config is a debug config.
2. Set `cluster-replica-no-failover yes` on a wider array of tests which are
   sensitive to failovers. This was occurring on the ARM CI.
2022-10-03 09:25:16 +03:00
Binbin
3c02d1acc4
code, typo and comment cleanups (#11280)
- fix `the the` typo
- `LPOPRPUSH` does not exist, should be `RPOPLPUSH`
- `CLUSTER GETKEYINSLOT` 's time complexity should be O(N)
- `there bytes` should be `three bytes`, this closes #11266
- `slave` word to `replica` in log, modified the front and missed the back
- remove useless aofReadDiffFromParent in server.h
- `trackingHandlePendingKeyInvalidations` method adds a void parameter
2022-10-02 13:56:45 +03:00
Binbin
ed4c432ec5
Update CLUSTER NODES help message (#11341)
We will always show the bus-port, and if the node hostname exists, it will also show it.
2022-09-30 06:24:44 -07:00
Binbin
1de675b3d5
Fix CLUSTER SHARDS showing empty hostname (#11297)
* Fix CLUSTER SHARDS showing empty hostname

In #10290, we changed clusterNode hostname from `char*`
to `sds`, and the old `node->hostname` was changed to
`sdslen(node->hostname)!=0`.

But in `addNodeDetailsToShardReply` it is missing.
It results in the return of an empty string hostname
in CLUSTER SHARDS command if it unavailable.

Like this (note that we listed it as optional in the doc):
```
 9) "hostname"
10) ""
```
2022-09-22 11:39:34 -07:00
Viktor Söderqvist
42e4241ece
Avoid crash when a cluster node is a replica of a replica of itself (#11263) 2022-09-13 17:48:48 -07:00
Madelyn Olson
6c03786b66
Prevent use after free for inbound cluster link (#11255) 2022-09-13 16:19:29 -05:00
chendianqiang
e42d98ed27
Correctly handle scripts with shebang (not read-only) on a cluster replica (#11223)
EVAL scripts are by default not considered `write` commands, so they were allowed on a replica.
But when adding a shebang, they become `write` command (unless the `no-writes` flag is added).
With this change we'll handle them as write commands, and reply with MOVED instead of
READONLY when executed on a redis cluster replica.

Co-authored-by: chendianqiang <chendianqiang@meituan.com>
2022-09-05 16:59:14 +03:00
weimeng
8945067544
bugfix:del keys in slot replicate to replica, and trigger other invalidations (#11084)
Bugfix:
with the scenario if we force assigned a slot to other master,
old master will lose the slot ownership, then old master will
call the function delKeysInSlot() to delete all keys which in
the slot. These delete operations should replicate to replicas,
avoid the data divergence issue in master and replicas.

Additionally, in this case, we now call:
* signalModifiedKey (to invalidate WATCH)
* moduleNotifyKeyspaceEvent (key space notification for modules)
* dirty++ (to signal that the persistence file may be outdated)

Co-authored-by: weimeng <weimeng@didiglobal.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-08-28 11:37:26 +03:00
Oran Agra
c789fb0aa7
Fix assertion when a key is lazy expired during cluster key migration (#11176)
Redis 7.0 has #9890 which added an assertion when the propagation queue
was not flushed and we got to beforeSleep.
But it turns out that when processCommands calls getNodeByQuery and
decides to reject the command, it can lead to a key that was lazy
expired and is deleted without later flushing the propagation queue.

This change prevents lazy expiry from deleting the key at this stage
(not as part of a command being processed in `call`)
2022-08-24 19:39:15 +03:00
zhenwei pi
0b27cfe37d Introduce .listen into connection type
Introduce listen method into connection type, this allows no hard code
of listen logic. Originally, we initialize server during startup like
this:
    if (server.port)
        listenToPort(server.port,&server.ipfd);
    if (server.tls_port)
        listenToPort(server.port,&server.tlsfd);
    if (server.unixsocket)
        anetUnixServer(...server.unixsocket...);

    ...
    if (createSocketAcceptHandler(&server.ipfd, acceptTcpHandler) != C_OK)
    if (createSocketAcceptHandler(&server.tlsfd, acceptTcpHandler) != C_OK)
    if (createSocketAcceptHandler(&server.sofd, acceptTcpHandler) != C_OK)
    ...

If a new connection type gets supported, we have to add more hard code
to setup listener.

Introduce .listen and refactor listener, and Unix socket supports this.
this allows to setup listener arguments and create listener in a loop.

What's more, '.listen' is defined in connection.h, so we should include
server.h to import 'struct socketFds', but server.h has already include
'connection.h'. To avoid including loop(also to make code reasonable),
define 'struct connListener' in connection.h instead of 'struct socketFds'
in server.h. This leads this commit to get more changes.

There are more fields in 'struct connListener', hence it's possible to
simplify changeBindAddr & applyTLSPort() & updatePort() into a single
logic: update the listener config from the server.xxx, and re-create
the listener.

Because of the new field 'priv' in struct connListener, we expect to pass
this to the accept handler(even it's not used currently), this may be used
in the future.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2022-08-22 15:16:08 +08:00
zhenwei pi
45617385e7 Use connection name of string
Suggested by Oran, use an array to store all the connection types
instead of a linked list, and use connection name of string. The index
of a connection is dynamically allocated.

Currently we support max 8 connection types, include:
- tcp
- unix socket
- tls

and RDMA is in the plan, then we have another 4 types to support, it
should be enough in a long time.

Introduce 3 functions to get connection type by a fast path:
- connectionTypeTcp()
- connectionTypeTls()
- connectionTypeUnix()

Note that connectionByType() is designed to use only in unlikely code path.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2022-08-22 15:15:37 +08:00
zhenwei pi
1234e3a562 Fully abstract connection type
Abstract common interface of connection type, so Redis can hide the
implementation and uplayer only calls connection API without macro.

               uplayer
                  |
           connection layer
             /          \
          socket        TLS

Currently, for both socket and TLS, all the methods of connection type
are declared as static functions.

It's possible to build TLS(even socket) as a shared library, and Redis
loads it dynamically in the next step.

Also add helper function connTypeOfCluster() and
connTypeOfReplication() to simplify the code:
link->conn = server.tls_cluster ? connCreateTLS() : connCreateSocket();
-> link->conn = connCreate(connTypeOfCluster());

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2022-08-22 15:11:44 +08:00
zhenwei pi
bff7ecc786 Introduce connAddr
Originally, connPeerToString is designed to get the address info from
socket only(for both TCP & TLS), and the API 'connPeerToString' is
oriented to operate a FD like:
int connPeerToString(connection *conn, char *ip, size_t ip_len, int *port) {
    return anetFdToString(conn ? conn->fd : -1, ip, ip_len, port, FD_TO_PEER_NAME);
}

Introduce connAddr and implement .addr method for socket and TLS,
thus the API 'connAddr' and 'connFormatAddr' become oriented to a
connection like:
static inline int connAddr(connection *conn, char *ip, size_t ip_len, int *port, int remote) {
    if (conn && conn->type->addr) {
        return conn->type->addr(conn, ip, ip_len, port, remote);
    }

    return -1;
}

Also remove 'FD_TO_PEER_NAME' & 'FD_TO_SOCK_NAME', use a boolean type
'remote' to get local/remote address of a connection.

With these changes, it's possible to support the other connection
types which does not use socket(Ex, RDMA).

Thanks to Oran for suggestions!

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
2022-08-22 15:01:40 +08:00
Binbin
3a16ad30b7
Fix CLUSTERDOWN issue in cluster reshard unblock test (#11139)
change the cluster-node-timeout from 1 to 1000
2022-08-18 09:18:18 -07:00
judeng
7d8911d22a
Optimize the performance of multi-key commands in cluster mode (#11044)
* Optimize the performance of multi-key commands in cluster mode

* add note
2022-08-04 20:42:56 -07:00
Binbin
90f35cea81
Avoid false positive out-of-bounds in writeForgottenNodePingExt (#11053)
In clusterMsgPingExtForgottenNode, sizeof(name) is CLUSTER_NAMELEN,
and sizeof(clusterMsgPingExtForgottenNode) is > CLUSTER_NAMELEN.
Doing a (name + sizeof(clusterMsgPingExtForgottenNode)) sanitizer
generates an out-of-bounds error which is a false positive in here
2022-07-28 15:14:18 -07:00
Viktor Söderqvist
5032de50f2
Gossip forgotten nodes on CLUSTER FORGET (#10869)
Gossip the cluster node blacklist in ping and pong messages.
This means that CLUSTER FORGET doesn't need to be sent to all nodes in a cluster.
It can be sent to one or more nodes and then be propagated to the rest of them.

For each blacklisted node, its node id and its remaining blacklist TTL is gossiped in a
cluster bus ping extension (introduced in #9530).
2022-07-26 10:28:13 +03:00
Tian
d00b8af892
Don't update node ip when peer fd is closed (#10696) 2022-07-20 16:59:27 -07:00
Tian
cc2848132f
Make cluster config file saving atomic and fsync acl (#10924)
As an outstanding part mentioned in #10737, we could just make the cluster config file and
ACL file saving done with a more safe and atomic pattern (write to temp file, fsync, rename, fsync dir).

The cluster config file uses an in-place overwrite and truncation (which was also used by the
main config file before #7824).
The ACL file is using the temp file and rename approach, but was missing an fsync.

Co-authored-by: 朱天 <zhutian03@meituan.com>
2022-07-20 09:11:01 +03:00
ranshid
eacca729a5
Avoid using unsafe C functions (#10932)
replace use of:
sprintf --> snprintf
strcpy/strncpy  --> redis_strlcpy
strcat/strncat  --> redis_strlcat

**why are we making this change?**
Much of the code uses some unsafe variants or deprecated buffer handling
functions.
While most cases are probably not presenting any issue on the known path
programming errors and unterminated strings might lead to potential
buffer overflows which are not covered by tests.

**As part of this PR we change**
1. added implementation for redis_strlcpy and redis_strlcat based on the strl implementation: https://linux.die.net/man/3/strl
2. change all occurrences of use of sprintf with use of snprintf
3. change occurrences of use of  strcpy/strncpy with redis_strlcpy
4. change occurrences of use of strcat/strncat with redis_strlcat
5. change the behavior of ll2string/ull2string/ld2string so that it will always place null
  termination ('\0') on the output buffer in the first index. this was done in order to make
  the use of these functions more safe in cases were the user will not check the output
  returned by them (for example in rdbRemoveTempFile)
6. we added a compiler directive to issue a deprecation error in case a use of
  sprintf/strcpy/strcat is found during compilation which will result in error during compile time.
  However keep in mind that since the deprecation attribute is not supported on all compilers,
  this is expected to fail during push workflows.


**NOTE:** while this is only an initial milestone. We might also consider
using the *_s implementation provided by the C11 Extensions (however not
yet widly supported). I would also suggest to start
looking at static code analyzers to track unsafe use cases.
For example LLVM clang checker supports security.insecureAPI.DeprecatedOrUnsafeBufferHandling
which can help locate unsafe function usage.
https://clang.llvm.org/docs/analyzer/checkers.html#security-insecureapi-deprecatedorunsafebufferhandling-c
The main reason not to onboard it at this stage is that the alternative
excepted by clang is to use the C11 extensions which are not always
supported by stdlib.
2022-07-18 10:56:26 +03:00
Madelyn Olson
e6a1b2ea95
Fix crash during handshake and cluster shards call (#10942)
* Fix an engine crash when there are nodes in handshaking and a user calls cluster shards
2022-07-10 22:00:44 -07:00
Qu Chen
33b7ff387c
Unlock cluster config file upon server shutdown. (#10912)
Currently in cluster mode, Redis process locks the cluster config file when
starting up and holds the lock for the entire lifetime of the process.
When the server shuts down, it doesn't explicitly release the lock on the
cluster config file. We noticed a problem with restart testing that if you shut down
a very large redis-server process (i.e. with several hundred GB of data stored),
it takes the OS a while to free the resources and unlock the cluster config file.
So if we immediately try to restart the redis server process, it might fail to acquire
the lock on the cluster config file and fail to come up.

This fix explicitly releases the lock on the cluster config file upon a shutdown rather
than relying on the OS to release the lock, which is a cleaner and safer approach to
free up resources acquired.
2022-07-04 09:38:19 +03:00
Tian
069b30a2b3
A minor refinement to clusterbus extension estlen (#10902) 2022-06-27 20:42:55 -07:00
WuYunlong
64205345bc
migrateGetSocket() cleanup.. (#5546)
I think parameter c is only useful to get client reply.
Besides, other commands' host and port parameters may not be the at index 1 and 2.
2022-06-23 18:41:32 +03:00
Bar Shaul
091701f363
Set replicas' configEpoch to 0 when loaded from cluster configuration file (#10798)
* Changed clusterLoadConfig to set the config epoch of replica nodes to 0 when loaded.
2022-06-20 21:40:48 -07:00
judeng
ff6419658b
Optimize the performance of clusterSendPing for large clusters (#10624)
Optimize the performance of clusterSendPing by improving speed of checking for duplicate items in gossip.
2022-06-20 21:02:22 -07:00
Huang Zhw
78960ad57b
Throw -TRYAGAIN instead of -ASK on migrating nodes for multi-key commands when the node only has some of the keys (#9526)
* In cluster getNodeByQuery when target slot is in migrating state and
the slot lack some keys but have at least one key, should return TRYAGAIN.

Before this commit, when a node is in migrating state and recevies
multiple keys command, if some keys don't exist, the command emits
an `ASK` redirection.

After this commit, if some keys exist and some keys don't exist, the
command emits a TRYAGAIN error. If all keys don't exist, the command
emits an `ASK` redirection.
2022-06-13 21:32:43 -07:00
Mixficsol
c751d8a686
Update cluster.c (#10773)
On line 4068, redis has a logical nodeIsSlave(myself) on the outer if layer,
which you can delete without having to repeat the decision
2022-06-06 08:17:18 +03:00
Binbin
2a1ea8c7d8
CLUSTER SHARDS should returns slots as integers, not strings (#10683)
It used to returns slots as strings, like:
```
redis> cluster shards
1) 1) "slots"
   2) 1) "10923"
      2) "16383"
```

CLUSTER SHARDS docs and the top comment of #10293 says that it returns integers.
Note other commands like CLUSTER SLOTS, it returns slots as integers.
Use addReplyLongLong instead of addReplyBulkLongLong, now it returns slots as integers:
```
redis> cluster shards
1) 1) "slots"
   2) 1) (integer) 10923
      2) (integer) 16383
```

This is a small breaking change, introduced in 7.0.0 (7.0 RC3, #10293)

Fixes #10680
2022-05-10 14:22:01 +03:00
guybe7
f49ff156ec
Add RM_PublishMessageShard (#10543)
since PUBLISH and SPUBLISH use different dictionaries for channels and clients,
and we already have an API for PUBLISH, it only makes sense to have one for SPUBLISH

Add test coverage and unifying some test infrastructure.
2022-04-17 15:43:22 +03:00
王恒
ee17e7af8d
improve malloc efficiency: reduce call times of zrealloc (#10533)
* improve malloc efficiency: reduce call times of zrealloc

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-04-09 19:52:36 -07:00
bugwz
2db0d898f8
Cluster node name sanity check (#10391)
* Limit cluster node id length for CLUSTER commands loading
* Cluster node name sanity check for length and values

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-04-04 22:51:51 -07:00
Viktor Söderqvist
b53c7f2c0b
Turn into replica on SETSLOT (#10489)
* Fix race condition where node loses its last slot and turns into replica

When a node has lost its last slot and finds out from the SETSLOT command
before the cluster bus PONG from the new owner arrives. In this case, the
node didn't turn itself into a replica of the new slot owner.

This commit adds the same logic to the SETSLOT command as already exists
for the cluster bus PONG processing.

* Revert "Fix new / failing cluster slot migration test (#10482)"

This reverts commit 0b21ef8d49c47a5dd47a0bcc0cf1b2c235f24fd0.

In this test, the old slot owner finds out that it has lost its last
slot in a nondeterministic way. Either the cluster bus PONG from the
new slot owner and sometimes in a SETSLOT command from redis-cli. In
both cases, the result should be the same and the old owner should
turn itself into a replica of the new slot owner.
2022-04-02 14:58:07 -07:00
Oran Agra
3b1e65a32b
improve malloc efficiency for cluster slots_info_pairs (#10488)
This commit improve malloc efficiency of the slots_info_pairs mechanism in cluster.c
by changing adlist into an array being realloced with greedy growth mechanism

Recently the cluster tests are consistently failing when executed with ASAN in the CI.
I tried to track down the commit that started it, and it appears to be #10293.
Looking at the commit, i realize it didn't affect this test / flow, other than the
replacement of the slots_info_pairs from sds to list.

I concluded that what could be happening is that the slot range is very fragmented,
and that results in many allocations.
with sds, it results in one allocation and also, we have a greedy growth mechanism,
but with adlist, we just have many many small allocations.
this probably causes stress on ASAN, and causes it to be slow at termination.
2022-03-29 10:05:06 +03:00
Madelyn Olson
557222d1e0
Fix timing issue in shards test and fix displayed TLS port (#10450) 2022-03-20 22:08:40 -07:00
Madelyn Olson
e8771efda9
Fixed incorrect parsing of hostname information from nodes.conf (#10435) 2022-03-16 14:07:24 -07:00
Harkrishn Patro
45ccae89bb
Add new cluster shards command (#10293)
Implement a new cluster shards command, which provides a flexible and extensible API for topology discovery.

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
2022-03-15 18:24:40 -07:00
Binbin
af6d5c5932
Constrain cluster node name logging (#10376)
Cluster node name is not null terminated, so need to be constrained.
2022-03-02 22:41:31 -08:00
Binbin
ad7a6275ff
Fix node-id type in cluster-setslot (#10348)
* The type of node-id should be string, not integer.
* Also improve the CLUSTER SETSLOT help message.
2022-02-27 23:46:57 -08:00
Itamar Haber
c81c7f51c3
Add stream consumer group lag tracking and reporting (#9127)
Adds the ability to track the lag of a consumer group (CG), that is, the number
of entries yet-to-be-delivered from the stream.

The proposed constant-time solution is in the spirit of "best-effort."

Partially addresses #8737.

## Description of approach

We add a new "entries_added" property to the stream. This starts at 0 for a new
stream and is incremented by 1 with every `XADD`.  It is essentially an all-time
counter of the entries added to the stream.

Given the stream's length and this counter value, we can trivially find the logical
"entries_added" counter of the first ID if and only if the stream is contiguous.
A fragmented stream contains one or more tombstones generated by `XDEL`s.
The new "xdel_max_id" stream property tracks the latest tombstone.

The CG also tracks its last delivered ID's as an "entries_read" counter and
increments it independently when delivering new messages, unless the this
read counter is invalid (-1 means invalid offset). When the CG's counter is
available, the reported lag is the difference between added and read counters.

Lastly, this also adds a "first_id" field to the stream structure in order to make
looking it up cheaper in most cases.

## Limitations

There are two cases in which the mechanism isn't able to track the lag.
In these cases, `XINFO` replies with `null` in the "lag" field.

The first case is when a CG is created with an arbitrary last delivered ID,
that isn't "0-0", nor the first or the last entries of the stream. In this case,
it is impossible to obtain a valid read counter (short of an O(N) operation).
The second case is when there are one or more tombstones fragmenting
the stream's entries range.

In both cases, given enough time and assuming that the consumers are
active (reading and lacking) and advancing, the CG should be able to
catch up with the tip of the stream and report zero lag.
Once that's achieved, lag tracking would resume as normal (until the
next tombstone is set).

## API changes

* `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]`
  for explicitly specifying the new CG's counter.
* `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]`
  for specifying the CG's counter.
* `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total
  number of entries added to the stream.
* `XINFO` reports the current lag and logical read counter of CGs.
* `XSETID` is an internal command that's used in replication/aof. It has been added with
  the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]`
  for propagating the CG's offset and maximal tombstone ID of the stream.

## The generic unsolved problem

The current stream implementation doesn't provide an efficient way to obtain the
approximate/exact size of a range of entries. While it could've been nice to have
that ability (#5813) in general, let alone specifically in the context of CGs, the risk
and complexities involved in such implementation are in all likelihood prohibitive.

## A refactoring note

The `streamGetEdgeID` has been refactored to accommodate both the existing seek
of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones`
argument). Furthermore, this refactoring also migrated the seek logic to use the
`streamIterator` (rather than `raxIterator`) that was, in turn, extended with the
`skip_tombstones` Boolean struct field to control the emission of these.

Co-authored-by: Guy Benoish <guy.benoish@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
2022-02-23 22:34:58 +02:00
Binbin
c0ea77f0e1
Show publishshard_sent stat in cluster info (#10314)
publishshard was added in #8621 (7.0 RC1), but the publishshard_sent
stat is not shown in CLUSTER INFO command.

Other changes:
1. Remove useless `needhelp` statements, it was removed in 3dad819.
2. Use `LL_WARNING` log level for some error logs (I/O error, Connection failed).
3. Fix typos that saw by the way.
2022-02-19 21:11:20 -08:00
Ping Xie
f7f68c654a
Use sds for clusterNode.hostname (#10290)
* Provide a fallback static_assert implementation
* Use sds for clusterNode.hostname
2022-02-16 13:35:49 -08:00
Harkrishn Patro
a5d17f0b6c
Check target node is a primary during cluster setslot. (#10277) 2022-02-10 23:14:27 -08:00