futriix

Author	SHA1	Message	Date
antirez	ebd8d56194	Sentinel: added missing exit(1) after checking for config file.	2014-02-24 16:22:52 +01:00
antirez	2361550063	Sentinel test: tmp dir and gitignore added.	2014-02-24 11:51:31 +01:00
antirez	26b08f160a	Sentinel test: minor fixes to --pause-on-error.	2014-02-23 18:02:52 +01:00
antirez	0506a4918f	Sentinel test: --pause-on-error option added. Pause the test with running instances available for state inspection on error.	2014-02-23 17:57:56 +01:00
antirez	d97b0090a9	Sentinel test: added empty units to fill later.	2014-02-23 17:50:59 +01:00
Salvatore Sanfilippo	10dce8abc3	Merge pull request #1545 from mattsta/fix-redis-cli-sync Deny SYNC and PSYNC in redis-cli	2014-02-23 17:47:28 +01:00
antirez	742ca7898b	Sentinel: IDONTKNOW error removed. This error was conceived for the older version of Sentinel that worked via master redirection and that was not able to get configuration updates from other Sentinels via the Pub/Sub channel of masters or slaves. This reply does not make sense today, every Sentinel should reply with the best information it has currently. The error will make even more sense in the future since the plan is to allow Sentinels to update the configuration of other Sentinels via gossip with a direct chat without the prerequisite that they have at least a monitored instance in common.	2014-02-22 17:34:46 +01:00
antirez	6f02cfaeac	Sentinel test: framework improved and conf-update unit added. It is now possible to kill and restart sentinel or redis instances for more real-world testing. The 01 unit tests the capability of Sentinel to update the configuration of Sentinels rejoining the cluster, however the test is pretty trivial and more tests should be added.	2014-02-22 17:27:49 +01:00
Salvatore Sanfilippo	1afaf50dd6	Merge pull request #1559 from mattsta/more-detailed-process-title Add cluster or sentinel to proc title	2014-02-21 09:32:13 +01:00
Matt Stancliff	b3d3db2d7c	Add cluster or sentinel to proc title If you launch redis with `redis-server --sentinel` then in a ps, your output only says "redis-server IP:Port" — this patch changes the proc title to include [sentinel] or [cluster] depending on the current server mode: e.g. "redis-server IP:Port [sentinel]" "redis-server IP:Port [cluster]"	2014-02-20 23:58:54 -05:00
antirez	8def3a143e	Sentinel test: move init tests as includes. Most units will start with these two basic tests to create an environment where the real tests are run.	2014-02-20 16:58:23 +01:00
antirez	8c378427b2	Sentinel test: ability to run just a subset of test files.	2014-02-20 16:28:41 +01:00
antirez	d7326887e5	Sentinel: report instances role switch events. This is useful mostly for debugging of issues.	2014-02-20 12:13:52 +01:00
antirez	0cf419af0f	Sentinel test: some reliability fixes to 00-base tests.	2014-02-19 10:26:23 +01:00
antirez	6f298002de	Sentinel test: check that role matches at end of 00-base.	2014-02-19 10:08:49 +01:00
antirez	693385e2dc	Sentinel test: ODOWN and agreement.	2014-02-19 09:44:38 +01:00
antirez	4a22c51c48	Sentinel test: check reconfig of slaves and old master.	2014-02-18 17:03:56 +01:00
antirez	213cc0fe88	Sentinel test: basic failover tested. Framework improvements.	2014-02-18 16:31:52 +01:00
antirez	251d4f754d	Sentinel test: basic tests for MONITOR and auto-discovery.	2014-02-18 11:53:54 +01:00
antirez	bbd8a80fd4	Sentinel test: info fields, master-slave setup, fixes.	2014-02-18 11:38:49 +01:00
antirez	ad1e66fe4e	Prefix test file names with numbers to force exec order.	2014-02-18 11:07:42 +01:00
antirez	1310d19402	Sentinel test: provide basic commands to access instances.	2014-02-18 11:04:55 +01:00
antirez	33831e4eca	Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT Rename define to match the new meaning.	2014-02-18 10:27:38 +01:00
antirez	3d25fb1f79	Sentinel: fix slave promotion timeout. If we can't reconfigure a slave in time during failover, go forward anyway, as the slave will be fixed by Sentinels in the future, once they detect it is misconfigured. Otherwise a failover in progress may never terminate if for some reason the slave is unable to sync with the master while at the same time it is not disconnected.	2014-02-18 08:50:57 +01:00
antirez	576e405634	Sentinel: initial testing framework. Nothing tested at all so far... Just the infrastructure spawning N Sentinels and N Redis instances that the test will use again and again.	2014-02-17 17:38:04 +01:00
antirez	e3c7c1ebd5	Test: colorstr moved to util.tcl.	2014-02-17 17:36:50 +01:00
antirez	5917f2c5c5	Test: code to test server availability refactored. Some inline test moved into server_is_up procedure. Also find_available_port was moved into util since it is going to be used for the Sentinel test as well.	2014-02-17 16:44:57 +01:00
antirez	f36ff37d76	Get absolute config file path before processing 'dir'. The code tried to obtain the configuration file absolute path after processing the configuration file. However if config file was a relative path and a "dir" statement was processed reading the config, the absolute path obtained was wrong. With this fix the absolute path is obtained before processing the configuration while the server is still in the original directory where it was executed.	2014-02-17 16:44:53 +01:00
antirez	dc67d9f64c	Sentinel: better specify startup errors due to config file. Now it logs the file name if it is not accessible. Also there is a different error for the missing config file case, and for the non writable file case.	2014-02-17 16:44:49 +01:00
antirez	bea68cc21f	Update cached time in rdbLoad() callback. server.unixtime and server.mstime are cached less precise timestamps that we use every time we don't need an accurate time representation and a syscall would be too slow for the number of calls we require. Such an example is the initialization and update process of the last interaction time with the client, that is used for timeouts. However rdbLoad() can take some time to load the DB, but at the same time it did not update the time during DB loading. This resulted in the bug described in issue #1535, where in the replication process the slave loads the DB, creates the redisClient representation of its master, but the timestamp is so old that the master, under certain conditions, is sensed as already "timed out". Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and analysis.	2014-02-13 15:13:26 +01:00
antirez	dda8bc1eff	Log when CONFIG REWRITE goes bad.	2014-02-13 14:32:44 +01:00
antirez	c0e261e37c	Test: regression for issue #1549 . It was verified that reverting the commit that fixes the bug, the test no longer passes.	2014-02-13 12:26:38 +01:00
antirez	d6b34bea6b	Fix script cache bug in the scripting engine. This commit fixes a serious Lua scripting replication issue, described by Github issue #1549. The root cause of the problem is that scripts were put inside the script cache, assuming that slaves and AOF already contained it, even if the scripts sometimes produced no changes in the data set, and were not actually propagated to AOF/slaves. Example: eval "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end" 1 0 Then: evalsha <sha1 step 1 script> 1 0 At this step sha1 of the script is added to the replication script cache (the script is marked as known to the slaves) and EVALSHA command is transformed to EVAL. However it is not dirty (there are no changes to db), so it is not propagated to the slaves. Then the script is called again: evalsha <sha1 step 1 script> 1 1 At this step master checks that the script already exists in the replication script cache and doesn't transform it to EVAL command. It is dirty and propagated to the slaves, but they fail to evaluate the script as they don't have it in the script cache. The fix is trivial and just uses the new API to force the propagation of the executed command regardless of the dirty state of the data set. Thank you to @minus-infinity on Github for finding the issue, understanding the root cause, and fixing it.	2014-02-13 12:10:43 +01:00
antirez	055d761c7f	AOF write error: retry with a frequency of 1 hz.	2014-02-12 16:27:59 +01:00
antirez	84152ddd22	AOF: don't abort on write errors unless fsync is 'always'. A system similar to the RDB write error handling is used, in which when we can't write to the AOF file, writes are no longer accepted until we are able to write again. For fsync == always we still abort on errors since there is currently no easy way to avoid replying with success to the user otherwise, and this would violate the contract with the user of only acknowledging data already secured on disk.	2014-02-12 16:11:36 +01:00
antirez	283a633f98	Cluster: clusterDelNode(): remove node from master's slaves.	2014-02-11 10:34:25 +01:00
antirez	cfc5f8f67c	Cluster: UPDATE messages are the norm and verbose. Logging them at WARNING level was of little utility and of sure disturb.	2014-02-11 10:18:24 +01:00
antirez	234fafca84	Cluster: redis-trib fix: handling of another trivial case.	2014-02-11 10:13:18 +01:00
antirez	6d1d5542fc	Cluster: configEpoch assignment in SETNODE improved. Avoid to trash a configEpoch for every slot migrated if this node has already the max configEpoch across the cluster. Still work to do in this area but this avoids both ending with a very high configEpoch without any reason and to flood the system with fsyncs.	2014-02-11 10:09:17 +01:00
antirez	9b8e0c972a	Cluster: clusterSetStartupEpoch() made more generally useful. The actual goal of the function was to get the max configEpoch found in the cluster, so make it general by removing the assignment of the max epoch to currentEpoch that is useful only at startup.	2014-02-11 10:00:14 +01:00
antirez	e200c6dd00	Cluster: always increment the configEpoch in SETNODE after import. Removed a stale conditional preventing the configEpoch from incrementing after the import in certain conditions. Since the master got a new slot it should always claim a new configuration.	2014-02-11 09:50:37 +01:00
antirez	b60d185126	Cluster: on resharding upgrade version of receiving node. The node receiving the hash slot needs to have a version that wins over the other versions in order to force the ownership of the slot. However the current code is far from perfect since a failover can happen during the manual resharding. The fix is a work in progress but the bottom line is that the new version must either be voted as usual, set by redis-trib manually after it makes sure it can't be used by other nodes, or reserved configEpochs could be used for manual operations (for example odd versions could be never used by slaves and are always used by CLUSTER SETSLOT NODE).	2014-02-11 00:36:05 +01:00
antirez	a1d0249297	Cluster: fsync at every SETSLOT command puts too much pressure on disks. During slots migration redis-trib can send a number of SETSLOT commands. Fsyncing every time is a bit too much in production as verified empirically. To make sure configs are fsynced on all nodes after a resharding redis-trib may send something like CLUSTER CONFSYNC. In this case fsyncs were not providing too much value since anyway processes can crash in the middle of the resharding of a hash slot, and redis-trib should be able to recover from this condition anyway.	2014-02-10 23:54:08 +01:00
antirez	435af98eb8	Cluster: conditions to clear "migrating" on slot for SETSLOT ... NODE changed. If the slot is manually assigned to another node, clear the migrating status regardless of the fact it was previously assigned to us or not, as long as we no longer have keys for this slot. This avoids a race during slots migration that may leave the slot in migrating status in the source node, since it received an update message from the destination node that is already claiming the slot. This way we are sure that redis-trib at the end of the slot migration is always able to close the slot correctly.	2014-02-10 23:51:47 +01:00
antirez	5a79453abf	Cluster: remove debugging xputs from redis-trib.	2014-02-10 19:14:05 +01:00
antirez	e4732138b0	Cluster: redis-trib fix: cover new case of open slot. The case is the trivial one a single node claiming the slot as migrating, without nodes claiming it as importing.	2014-02-10 19:10:23 +01:00
antirez	2411d8fd94	redis-trib: log event after we have reference to 'master'.	2014-02-10 18:48:40 +01:00
antirez	e4a6144fc5	Cluster: don't update slave's master if we don't know it. There is no way we can update the slave's node->slaveof pointer if we don't know the master (no node with such an ID in our tables).	2014-02-10 18:33:34 +01:00
antirez	f31a53678a	Cluster: ignore slot config changes if we are importing it.	2014-02-10 18:04:43 +01:00
antirez	5c022633a2	Cluster: update configEpoch after manually messing with slots.	2014-02-10 18:01:58 +01:00
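
The script cache bug described in commit d6b34bea6b is easier to follow as a toy model. The Python sketch below is not Redis code: the Master and Replica classes, the force_propagation flag, and run_scenario are hypothetical names invented for illustration, and the sketch assumes the fix forces propagation at the point where EVALSHA is rewritten to EVAL. It replays the two EVALSHA calls from the commit message and reports whether the replica could run the last one.

```python
import hashlib


def sha1_of(body: str) -> str:
    """Return the hex SHA1 that identifies a script body."""
    return hashlib.sha1(body.encode()).hexdigest()


class Replica:
    """Toy replica: remembers scripts it received through EVAL."""

    def __init__(self):
        self.scripts = {}  # sha1 -> script body

    def apply(self, cmd):
        kind, payload = cmd
        if kind == "EVAL":
            # EVAL carries the full body, so the replica learns the script.
            self.scripts[sha1_of(payload)] = payload
            return True
        # EVALSHA only works if the script is already known here.
        return payload in self.scripts


class Master:
    """Toy master with a replication script cache (hypothetical model)."""

    def __init__(self, replica, force_propagation):
        self.replica = replica
        self.force_propagation = force_propagation  # models the fix
        self.scripts = {}               # scripts known to the master
        self.repl_script_cache = set()  # scripts assumed known to replicas

    def load(self, body):
        sha = sha1_of(body)
        self.scripts[sha] = body
        return sha

    def evalsha(self, sha, dirty):
        """Run a script; `dirty` says whether it changed the data set."""
        if sha in self.repl_script_cache:
            cmd, forced = ("EVALSHA", sha), False
        else:
            # First time: mark as known and rewrite EVALSHA as EVAL.
            self.repl_script_cache.add(sha)
            cmd, forced = ("EVAL", self.scripts[sha]), self.force_propagation
        if dirty or forced:
            return self.replica.apply(cmd)
        return True  # nothing propagated, nothing can fail


def run_scenario(force_propagation):
    """Replay the commit message example; True if the replica stays in sync."""
    master = Master(Replica(), force_propagation)
    sha = master.load("if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end")
    master.evalsha(sha, dirty=False)        # evalsha <sha1> 1 0: no changes
    return master.evalsha(sha, dirty=True)  # evalsha <sha1> 1 1: incr runs


if __name__ == "__main__":
    print("without forced propagation:", run_scenario(False))
    print("with forced propagation:", run_scenario(True))
```

Without forced propagation the first call marks the script as known but sends nothing, so the later EVALSHA reaches a replica that has never seen the script. With forced propagation the rewritten EVAL is always sent, which keeps the cache's assumption true.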

3910 Commits