265 Commits

Author SHA1 Message Date
antirez
69fa4d0233 Sentinel: clarify effect of resetting failover_start_time. 2015-05-25 10:32:28 +02:00
antirez
33c2d05783 Sentinel: help subcommand in simulate-failure command 2015-05-25 10:24:27 +02:00
antirez
10b5f1ace6 Sentinel: initial failure simulator implemented
This commit adds the SENTINEL simulate-failure command, which sets
specific hooks inside the state machine that will crash Sentinel, for
testing purposes.
2015-05-22 11:49:11 +02:00
antirez
28664641f4 Sentinel: fix sentinelTryConnectionSharing() by checking for no match
Trivial omission of the obvious no-match case.
2015-05-20 09:59:55 +02:00
antirez
275f5db399 Sentinel: SENTINEL CKQUORUM command
A way for monitoring systems to check that Sentinel is technically able
to reach the quorum and failover, using the currently visible Sentinels.
2015-05-18 12:57:47 +02:00
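The check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in sentinel.c: the function name `ck_quorum`, the result codes, and the parameters are all hypothetical. It assumes that a failover needs both the configured quorum and a majority of all known Sentinels, as the vote-counting fix elsewhere in this log suggests.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch (not taken from sentinel.c). */
typedef enum {
    CKQ_OK,           /* quorum and majority both reachable */
    CKQ_NO_QUORUM,    /* too few Sentinels to agree the master is down */
    CKQ_NO_MAJORITY,  /* too few Sentinels to authorize a failover */
    CKQ_NEITHER
} ckq_result;

/* voters: every known Sentinel for this master, including ourselves.
 * usable: how many of those are currently reachable, including ourselves.
 * quorum: the quorum configured for the master. */
ckq_result ck_quorum(int voters, int usable, int quorum) {
    int majority = voters / 2 + 1;
    bool quorum_ok = usable >= quorum;
    bool majority_ok = usable >= majority;

    if (quorum_ok && majority_ok) return CKQ_OK;
    if (!quorum_ok && !majority_ok) return CKQ_NEITHER;
    return quorum_ok ? CKQ_NO_MAJORITY : CKQ_NO_QUORUM;
}
```

With 5 known Sentinels and a quorum of 2, seeing only 2 of them satisfies the quorum but not the majority, so a monitoring system would be told that a failover cannot be authorized.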
antirez
a93a48b2cb Sentinel: port address update code to shared links logic 2015-05-15 09:47:05 +02:00
antirez
6034acbfe5 Sentinel: config-rewrite unique ID just one time 2015-05-14 17:45:09 +02:00
antirez
9ab415cca5 Sentinel: remove debugging message from releaseInstanceLink() 2015-05-14 14:12:45 +02:00
antirez
5672f4c368 Sentinel: fix access to NULL link->cc in releaseInstanceLink() 2015-05-14 14:08:23 +02:00
antirez
da81f5b648 Sentinel: remove SHARED! debugging printf 2015-05-14 13:40:23 +02:00
antirez
5aa783eac9 Sentinel: rewrite callback chain removing instances with shared links
Otherwise pending commands callbacks will fire with a reference that no
longer exists.
2015-05-14 13:39:26 +02:00
antirez
be62919a68 Sentinel: debugging code removed from sentinelSendPing() 2015-05-14 10:52:32 +02:00
antirez
689afe98ee Sentinel: use active/last time for ping logic
The PING trigger was improved again by using two fields instead of a
single one to remember when the last ping was sent:

1. The "active" ping is the time at which we sent the last ping that
still received no reply. However we continue to ping non-replying
instances even if they have an old active ping: the link may be
disconnected and reconnected in the meantime, so the older pings may get
lost even if it's a TCP socket.

2. The "last" ping is the time at which we really sent the last ping
on the wire, and this is used in order to throttle the amount of pings
we send during failures (when no pong is received).

All in all the failure detector effectiveness should be identical, but we
avoid flooding instances with pings during failures or when they are
slow.
2015-05-14 09:56:23 +02:00
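The two-timestamp scheme described in this commit can be sketched as below. The struct, the function names, and in particular the half-period throttle are assumptions made for illustration; the commit only says that the "last" ping time is used to throttle pings, not what the threshold is.

```c
/* Hypothetical per-link timestamps in milliseconds; 0 means "none". */
typedef struct {
    long long act_ping_time;   /* oldest ping still waiting for a reply */
    long long last_ping_time;  /* last ping actually written to the wire */
    long long last_pong_time;  /* last reply received */
} ping_link;

/* Record that a ping was sent at time `now`. The active ping time is only
 * set if no earlier ping is still pending, so it keeps pointing at the
 * oldest unanswered ping. */
void note_ping_sent(ping_link *l, long long now) {
    l->last_ping_time = now;
    if (l->act_ping_time == 0) l->act_ping_time = now;
}

/* A reply clears the pending ping. */
void note_pong_received(ping_link *l, long long now) {
    l->last_pong_time = now;
    l->act_ping_time = 0;
}

/* Assumed rule: ping when a full period passed without a pong, but never
 * more often than once per half period. */
int should_ping(const ping_link *l, long long now, long long period) {
    return (now - l->last_pong_time) > period &&
           (now - l->last_ping_time) > period / 2;
}
```

Keeping the oldest pending ping separate from the last ping sent lets a failure detector measure how long an instance has been silent, while the send rate stays bounded.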
antirez
773a7fe5c4 Sentinel: limit reconnection frequency to the ping period 2015-05-13 14:23:57 +02:00
antirez
1a7c6f5e04 Sentinel: PING trigger improved
It's ok to ping as soon as the ping period has elapsed since we received
the last PONG, but it's not good that we ping again if there is a
pending ping... With this change we'll send a new ping if there is one
pending only if two times the ping period elapsed since the ping which
is still pending was sent.
2015-05-12 17:03:53 +02:00
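The trigger rule stated in this commit can be written as a small predicate. This is a sketch under assumptions: the function name and parameters are hypothetical, times are in milliseconds, and a pending-ping time of 0 is taken to mean that no ping is pending.

```c
#include <stdbool.h>

/* Returns true if a new ping should be sent at time `now`.
 * last_pong:         when the last PONG was received.
 * pending_ping_sent: when the still-unanswered ping was sent, 0 if none.
 * period:            the ping period. */
bool ping_due(long long now, long long last_pong,
              long long pending_ping_sent, long long period) {
    /* Not yet a full period since the last PONG: nothing to do. */
    if (now - last_pong < period) return false;
    /* No ping pending: ping right away. */
    if (pending_ping_sent == 0) return true;
    /* A ping is pending: only send another one after two periods. */
    return now - pending_ping_sent >= 2 * period;
}
```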
antirez
f54299a9e3 Sentinel: same-Sentinel link sharing across masters 2015-05-12 17:03:00 +02:00
antirez
67d19e865e Sentinel: add sentinelGetInstanceTypeString() function
This is useful for debugging and logging activities: given a
sentinelRedisInstance object, it returns a C string representing the
instance type: master, slave, sentinel.
2015-05-12 12:12:25 +02:00
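A helper of this kind might look like the sketch below. It is not the function from sentinel.c: the flag names and values are made up for the example, and it takes the flags directly instead of a sentinelRedisInstance pointer so that it compiles on its own.

```c
/* Illustrative flag bits; the real definitions live in sentinel.c. */
#define DEMO_MASTER   (1 << 0)
#define DEMO_SLAVE    (1 << 1)
#define DEMO_SENTINEL (1 << 2)

/* Map the instance type flags to a constant C string for log messages. */
const char *demo_instance_type_string(int flags) {
    if (flags & DEMO_MASTER) return "master";
    if (flags & DEMO_SLAVE) return "slave";
    if (flags & DEMO_SENTINEL) return "sentinel";
    return "unknown";
}
```

Returning string literals means the caller never has to free the result, which suits a function used in logging paths.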
antirez
7346c06fb0 Sentinel: add link refcount to instance description 2015-05-11 23:49:19 +02:00
therealbill
e19a09c411 adding a sentinel command: "flushconfig"
This new command triggers a config flush to save the in-memory config to
disk. This is useful for cases of a configuration management system or a
package manager wiping out your sentinel config while the process is
still running - and has not yet been restarted. It can also be useful
for scripting a backup and migrate or clone of a running sentinel.
2015-05-11 14:08:57 -05:00
antirez
c257c0b1c1 Sentinel: connection sharing WIP #1 2015-05-11 13:15:26 +02:00
antirez
fa9695e2d9 Sentinel: suppress warnings for not used args. 2015-05-08 17:17:59 +02:00
antirez
3eb45318d6 Sentinel: generate +sentinel again, removed in prev commit. 2015-05-08 17:16:48 +02:00
antirez
3cc4c341e7 Sentinel: Use privdata instead of c->data in sentinelReceiveHelloMessages()
This way we may later share the hiredis link "c" among the same Sentinel
instance referenced multiple times for multiple masters.
2015-05-08 17:16:39 +02:00
antirez
8b0615a04f Sentinel: clarify arguments of SENTINEL IS-MASTER-DOWN-BY-ADDR 2015-05-08 17:16:00 +02:00
antirez
038a39c0f8 Sentinel: don't detect duplicated Sentinels, just address switch
Since with a previous commit Sentinels now persist their unique ID, we
no longer need to detect duplicated Sentinels and re-add them. We remove
and re-add back using different events only in the case of address
switch of the same Sentinel, without generating a new +sentinel event.
2015-05-07 10:07:47 +02:00
antirez
686b84abde Sentinel: persist its unique ID across restarts.
Previously Sentinels always changed unique ID across restarts, relying
on the server.runid field. This is not a good idea, and forced Sentinel
to rely on detection of duplicated Sentinels and a potentially dangerous
clean-up and re-add operation of the Sentinel instance that was
rebooted.

Now the ID is generated at the first start and persisted in the
configuration file, so that a given Sentinel will have its unique
ID forever (unless the configuration is manually deleted or there is a
filesystem corruption).
2015-05-06 16:19:14 +02:00
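The "generate once, then persist" behaviour can be sketched as follows. Everything here is hypothetical: the struct, the function name, the 40-character hex format, and the use of rand(), which is only a stand-in for a proper random source. Writing the config to disk is reduced to a `dirty` flag.

```c
#include <stdlib.h>

#define ID_LEN 40

typedef struct {
    char myid[ID_LEN + 1];  /* empty string means "no ID stored yet" */
    int dirty;              /* set when the config must be rewritten to disk */
} demo_config;

/* Returns 1 if a new ID was generated, 0 if the stored one was kept. */
int ensure_unique_id(demo_config *cfg) {
    static const char hex[] = "0123456789abcdef";

    /* An ID loaded from the configuration file is never replaced. */
    if (cfg->myid[0] != '\0') return 0;

    for (int i = 0; i < ID_LEN; i++)
        cfg->myid[i] = hex[rand() % 16];
    cfg->myid[ID_LEN] = '\0';
    cfg->dirty = 1;  /* caller should persist the config now */
    return 1;
}
```

On the first start the ID is created and the config is marked for rewriting; on later starts the ID read back from the config is kept unchanged.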
therealbill
b04184f34c Making sentinel flush config on +slave
Originally, only the +slave event that occurs when a slave is
reconfigured during sentinelResetMasterAndChangeAddress triggered a flush
of the config to disk.  However, newly discovered slaves apparently
don't trigger this flush, although they do trigger the +slave event.

So if you start up a sentinel, add a master, then add a slave to the
master (as a way to reproduce it) you'll see the +slave event issued,
but the sentinel config won't be updated with the known-slave entry.

This change makes sentinel do the flush of the config if a new slave is
detected in sentinelRefreshInstanceInfo.
2015-05-04 12:54:13 +02:00
antirez
b2d4dddaf9 Sentinel: remove useless sentinelFlushConfig() call
Rewriting the config in the loop that adds slaves back after a master
reset (done in order to handle switching to another master) is useless:
it just adds latency, since there is an fsync call in the inner loop,
without providing any additional guarantee. On the contrary, if the
server crashes after the first loop iteration we end up with just a
single slave entry, losing all the other information.

It is wiser to rewrite the config at the end when the full new
state is configured.
2015-05-04 12:50:44 +02:00
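The shape of the change can be illustrated with a small sketch. The types and function names are hypothetical, and the config rewrite is reduced to a counter so the number of flushes (each one implying an fsync) is visible.

```c
#include <stddef.h>

#define MAX_SLAVES 16

typedef struct {
    const char *slaves[MAX_SLAVES];
    size_t num_slaves;
    int flushes;  /* how many times the config was written */
} demo_master;

/* Stand-in for the real config rewrite, which ends with an fsync. */
static void demo_flush_config(demo_master *m) { m->flushes++; }

/* Re-add every slave first, then persist once when the full new state
 * is in place, instead of flushing inside the loop. */
void demo_readd_slaves(demo_master *m, const char **addrs, size_t n) {
    m->num_slaves = 0;
    for (size_t i = 0; i < n && i < MAX_SLAVES; i++)
        m->slaves[m->num_slaves++] = addrs[i];
    demo_flush_config(m);
}
```

With the flush outside the loop, re-adding three slaves costs one write, and the file on disk never holds a partially rebuilt slave list.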
clark.kang
0040a15d13 fix sentinel memory leak 2015-04-29 00:05:26 +09:00
Salvatore Sanfilippo
4aa4365112 Merge pull request #2386 from inkel/sentinel-add-client-command
Support CLIENT commands in Redis Sentinel
2015-03-13 18:23:36 +01:00
Salvatore Sanfilippo
b0eb128ac7 Merge pull request #2054 from mattsta/fix-set-sentinel-quorum
Sentinel: Add initial quorum bounds check
2015-02-25 10:09:40 +01:00
Salvatore Sanfilippo
b2ffd67c91 Merge pull request #1966 from mattsta/fix-sentinel-info
Sentinel: Improve INFO command behavior
2015-02-24 17:20:09 +01:00
Leandro López (inkel)
b5e5db9b86 Support CLIENT commands in Redis Sentinel
When trying to debug sentinel connections or max connections errors it
would be very useful to have the ability to see the list of connected
clients to a running sentinel. At the same time it would be very helpful
to be able to name each sentinel connection or kill offending clients.

This commit adds the already defined CLIENT commands back to Redis
Sentinel.
2015-02-02 18:16:18 -03:00
Matt Stancliff
01b7155ff5 Fix three simple clang analyzer warnings 2014-12-23 09:31:04 -05:00
Matt Stancliff
87d6324607 Add addReplyBulkSds() function
Refactor a common pattern into one function so we don't
end up with copy/paste programming.
2014-12-23 09:31:02 -05:00
Matt Stancliff
1ad036ab8f Add 'age' value to SENTINEL INFO-CACHE 2014-12-22 21:17:04 -05:00
antirez
070ec599ba sdsformatip() removed.
Specialized single-use function. Not the best match for sds.c btw.
Also genClientPeerId() is no longer static: we need symbols.
2014-12-11 18:29:04 +01:00
antirez
3d476bf2b6 AnetFormatIP(): renamed, commented, now sticks to IP:port format.
A few code style changes + consistent format: not nice for humans but
better for parsers.
2014-12-11 18:20:30 +01:00
Matt Stancliff
aca61af174 Sentinel: Improve INFO command behavior
Improvements:
  - Return empty string if asking for non-existing section (INFO foo)
  - Fix potential memory leak (caused by sdsempty() then returned if >2 args)
  - Clean up argument parsing
  - Allow "all" as valid section (same as "default" or zero args currently)
  - Move strcasecmp to end of evaluation chain in conditionals

Also, since we're C99, I moved some variable declarations to be closer
to where they are actually used (saves us from needing to free an empty info
if we detect argument errors up front).

Closes #1915
Closes #1966
2014-12-11 10:49:16 -05:00
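The section selection rules listed in this commit can be sketched as a predicate. The function name is hypothetical and this is not the code from the commit; it assumes a POSIX system for strcasecmp. A NULL `requested` stands for INFO called with no section argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Should `section` be included in the reply for the requested section? */
bool info_section_selected(const char *requested, const char *section) {
    /* No argument, "all" and "default" all select every section. */
    if (requested == NULL) return true;
    if (strcasecmp(requested, "all") == 0 ||
        strcasecmp(requested, "default") == 0) return true;
    /* Otherwise only an exact, case-insensitive match is selected. */
    return strcasecmp(requested, section) == 0;
}
```

If the requested name matches no section, the predicate is false for every section, so a reply built this way stays an empty string, which matches the first item in the list above.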
Matt Stancliff
f7a98bdf4d Cleanup all IP formatting code
Instead of manually checking for strchr(n,':') everywhere,
we can use our new centralized IP formatting functions.
2014-12-11 10:12:18 -05:00
antirez
8a03ffa160 Sentinel: INFO-CACHE comments reworked a bit.
Changed in order to make them more review friendly, based on the
experience of reviewing the code myself.
2014-12-10 11:15:13 +01:00
antirez
3f2975ad12 Sentinel: INFO-CACHE GCC minor code cleanup.
I guess the initial goal of the initialization was to suppress a GCC
warning, but if we have to initialize, we can do it with the base-case
value instead of NULL which is never retained.
2014-12-10 11:12:26 +01:00
antirez
043ae412ca Sentinel: removed useless flag var from INFO-CACHE. 2014-12-10 11:05:37 +01:00
antirez
71e4635ae9 Sentinel: INFO-CACHE reply format command shortened. 2014-12-10 11:04:24 +01:00
Matt Stancliff
c25d7ceaee Add SENTINEL INFO-CACHE [masters...]
Sentinel queries the INFO from every master and from every replica of
every master.

We can cache the INFO results in Sentinel so Sentinel can be a single
place to quickly get all INFO output for an entire Sentinel monitoring
group.

This commit gives us SENTINEL INFO-CACHE in two forms:
  - SENTINEL INFO-CACHE — returns all masters and all replicas
  - SENTINEL INFO-CACHE master0 master1 ... masterN — vararg specify masters

Results are returned as a multibulk reply with two top-level entries
for each master.  The first entry for each master is the name of the master.
The second entry is a nested multibulk reply with the contents of INFO,
first for the master, then an additional entry for each of the
replicas.
2014-11-20 16:56:30 -05:00
Matt Stancliff
2da508abce Sentinel: Add initial quorum bounds check
Fixes #2054
2014-11-20 16:30:17 -05:00
Matt Stancliff
299c667adb Clean up text throughout project
  - Remove trailing newlines from redis.conf
  - Fix comment misspelling
  - Clarifies zipEncodeLength usage and a C API mention (#1243, #1242)
  - Fix cluster typos (inspired by @papanikge #1507)
  - Fix rewite -> rewrite in a few places (inspired by #682)

Closes #1243, #1242, #1507
2014-09-29 06:49:07 -04:00
antirez
5142037af2 Sentinel sentinelGetLeader() top comment improved. 2014-09-11 19:27:45 +02:00
antirez
bdf2ab1891 Sentinel: fix computation of total number of votes.
The code to check the number of voters was never updated to follow the new
Sentinel specification, so the number of voters was computed using only
the set of Sentinels that provided a vote.

This means that there is a changing majority on partitions, even if
usually the issue is not triggered because of the configured quorum
check (what was broken was the other implicit check that requires anyway
half of the known sentinels to agree in order to start a failover).
2014-09-11 18:53:31 +02:00
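The difference this fix makes can be sketched with a small predicate. The function name and parameters are hypothetical; the point is that the majority is computed over all known voters, not only over the Sentinels that actually returned a vote.

```c
#include <stdbool.h>

/* winner_votes: votes received by the most voted candidate.
 * known_voters: every known Sentinel for the master, including ourselves,
 *               whether or not it replied with a vote.
 * quorum:       the quorum configured for the master. */
bool demo_leader_elected(int winner_votes, int known_voters, int quorum) {
    int majority = known_voters / 2 + 1;
    return winner_votes >= majority && winner_votes >= quorum;
}
```

For example, with 5 known Sentinels of which only 2 are reachable and both vote for the same candidate, a majority computed over the 2 replies would be satisfied, while the majority over all 5 voters (3) is not, so no leader is elected on the minority side of a partition.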
antirez
da1b1e246a Sentinel: don't set announce-ip if is empty. 2014-09-04 11:45:58 +02:00