From 0992ada2fe1cbc8f5f25c0cdec67cb69bb8d3810 Mon Sep 17 00:00:00 2001
From: WuYunlong
Date: Tue, 31 Dec 2019 18:15:21 +0800
Subject: [PATCH] Fix potential cluster link error.

The function adjustOpenFilesLimit() has an implicit parameter,
server.maxclients. It adjusts the maximum number of file descriptors
to match server.maxclients on a best-effort basis: the resulting
"bestlimit" may be lower than "maxfiles" but greater than "oldlimit".

When we increase "maxclients" with the CONFIG SET command, we can raise
the maximum file descriptor number without calling aeResizeSetSize at
the same time. As more and more clients connect to the server, the
allocated fds grow larger and can eventually exceed the size of
aeEventLoop.events.

When a new node joins the cluster, a new link is created together with
a new fd, but the return value of aeCreateFileEvent is not checked. In
that case we end up with a non-null "link" whose fd is not registered.

So setting "maxclients" dynamically can leave the process's maximum
file descriptor number inconsistent with server.maxclients, which can
later lead to an inconsistency between a cluster link and its fd.

When setting "maxclients" dynamically, we treat the operation as failed
if the resulting "maxclients" differs from the requested value. In that
case we now try to restore the maximum file descriptor number, so that
server.maxclients can act as a guard as before.
---
 src/config.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/config.c b/src/config.c
index 9aa847183..ed19d336b 100644
--- a/src/config.c
+++ b/src/config.c
@@ -2098,6 +2098,10 @@ static int updateMaxclients(long long val, long long prev, char **err) {
             static char msg[128];
             sprintf(msg, "The operating system is not able to handle the specified number of clients, try with %d", server.maxclients);
             *err = msg;
+            if (server.maxclients > prev) {
+                server.maxclients = prev;
+                adjustOpenFilesLimit();
+            }
             return 0;
         }
         if ((unsigned int) aeGetSetSize(server.el) <
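
The failure mode described in the commit message can be sketched with a small toy model. This is not Redis code: `toy_server`, `TOY_RESERVED_FDS`, and all `toy_*` functions are hypothetical stand-ins, and `toy_adjust_open_files_limit` is a deliberate simplification that just recomputes the limit against an assumed OS cap. The only behavior borrowed from the real code is that registering an fd at or beyond the event loop's set size fails, and that the patched path resets `maxclients` to `prev` before re-adjusting.

```c
#include <assert.h>

/* Toy stand-ins for server state; every name here is hypothetical. */
typedef struct {
    int maxclients; /* models server.maxclients */
    int fd_limit;   /* models the process open-files limit */
    int os_cap;     /* assumed highest limit the OS will grant */
    int setsize;    /* models the event loop's set size */
} toy_server;

#define TOY_RESERVED_FDS 32 /* assumed reserve for non-client fds */

/* Simplified best-effort adjustment: raise the limit toward
 * maxclients + reserve; if the OS cap is hit, shrink maxclients. */
static void toy_adjust_open_files_limit(toy_server *s) {
    int wanted = s->maxclients + TOY_RESERVED_FDS;
    s->fd_limit = wanted <= s->os_cap ? wanted : s->os_cap;
    if (s->fd_limit < wanted)
        s->maxclients = s->fd_limit - TOY_RESERVED_FDS;
}

/* Models the range check on event registration:
 * returns 0 on success, -1 if fd does not fit in the set size. */
static int toy_create_file_event(const toy_server *s, int fd) {
    assert(fd >= 0);
    return fd < s->setsize ? 0 : -1;
}

/* Returns 1 on success, 0 on failure. A non-zero `rollback`
 * applies the behavior added by the patch. */
static int toy_update_maxclients(toy_server *s, int val, int rollback) {
    int prev = s->maxclients;
    s->maxclients = val;
    if (val > prev) {
        toy_adjust_open_files_limit(s);
        if (s->maxclients != val) {
            if (rollback && s->maxclients > prev) {
                /* Patched path: restore the old guard value. */
                s->maxclients = prev;
                toy_adjust_open_files_limit(s);
            }
            return 0; /* set size is NOT resized on failure */
        }
        if (s->setsize < s->maxclients + TOY_RESERVED_FDS)
            s->setsize = s->maxclients + TOY_RESERVED_FDS;
    }
    return 1;
}
```

In this model, starting from `{maxclients = 100, fd_limit = 132, os_cap = 200, setsize = 132}` and requesting 1000 clients fails in both variants. Without the rollback, `maxclients` is left above `prev` while `setsize` is unchanged, so an fd above 131 can be handed out and then fail to register. With the rollback, `maxclients` returns to 100 and keeps fds within the set size.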