futriix

Author	SHA1	Message	Date
antirez	5765444454	Sentinel test: ability to run just a subset of test files.	2014-02-20 16:28:41 +01:00
antirez	d7326887e5	Sentinel: report instances role switch events. This is useful mostly for debugging of issues.	2014-02-20 12:13:52 +01:00
antirez	7d7b3810e7	Sentinel: report instances role switch events. This is useful mostly for debugging of issues.	2014-02-20 12:13:52 +01:00
Matt Stancliff	888041d194	Cluster: error out quicker if port is unusable The default cluster control port is 10,000 ports higher than the base Redis port. If Redis is started on a too-high port, Cluster can't start and everything will exit later anyway.	2014-02-19 17:30:07 -05:00
Matt Stancliff	ce68caea37	Cluster: error out quicker if port is unusable The default cluster control port is 10,000 ports higher than the base Redis port. If Redis is started on a too-high port, Cluster can't start and everything will exit later anyway.	2014-02-19 17:30:07 -05:00
Matt Stancliff	cde22bf623	Fix "can't bind to address" error reporting. Report the actual port used for the listening attempt instead of server.port. Originally, Redis would just listen on server.port. But, with clustering, Redis uses a Cluster Port too, so we can't say server.port is always where we are listening. If you tried to launch Redis with a too-high port number (any port where Port+10000 > 65535), Redis would refuse to start, but only print an error saying it can't connect to the Redis port. This patch fixes much confusion.	2014-02-19 17:26:33 -05:00
Matt Stancliff	b20ae393f1	Fix "can't bind to address" error reporting. Report the actual port used for the listening attempt instead of server.port. Originally, Redis would just listen on server.port. But, with clustering, Redis uses a Cluster Port too, so we can't say server.port is always where we are listening. If you tried to launch Redis with a too-high port number (any port where Port+10000 > 65535), Redis would refuse to start, but only print an error saying it can't connect to the Redis port. This patch fixes much confusion.	2014-02-19 17:26:33 -05:00
antirez	0cf419af0f	Sentinel test: some reliability fixes to 00-base tests.	2014-02-19 10:26:23 +01:00
antirez	e087d8a20d	Sentinel test: some reliability fixes to 00-base tests.	2014-02-19 10:26:23 +01:00
antirez	6f298002de	Sentinel test: check that role matches at end of 00-base.	2014-02-19 10:08:49 +01:00
antirez	a88a057a1f	Sentinel test: check that role matches at end of 00-base.	2014-02-19 10:08:49 +01:00
antirez	693385e2dc	Sentinel test: ODOWN and agreement.	2014-02-19 09:44:38 +01:00
antirez	2a08c7e5ac	Sentinel test: ODOWN and agreement.	2014-02-19 09:44:38 +01:00
antirez	4a22c51c48	Sentinel test: check reconfig of slaves and old master.	2014-02-18 17:03:56 +01:00
antirez	136537dcb0	Sentinel test: check reconfig of slaves and old master.	2014-02-18 17:03:56 +01:00
antirez	213cc0fe88	Sentinel test: basic failover tested. Framework improvements.	2014-02-18 16:31:52 +01:00
antirez	8e553ec67c	Sentinel test: basic failover tested. Framework improvements.	2014-02-18 16:31:52 +01:00
antirez	251d4f754d	Sentinel test: basic tests for MONITOR and auto-discovery.	2014-02-18 11:53:54 +01:00
antirez	c7b7439528	Sentinel test: basic tests for MONITOR and auto-discovery.	2014-02-18 11:53:54 +01:00
antirez	bbd8a80fd4	Sentinel test: info fields, master-slave setup, fixes.	2014-02-18 11:38:49 +01:00
antirez	c4fbc1d336	Sentinel test: info fields, master-slave setup, fixes.	2014-02-18 11:38:49 +01:00
antirez	ad1e66fe4e	Prefix test file names with numbers to force exec order.	2014-02-18 11:07:42 +01:00
antirez	19b863c7fa	Prefix test file names with numbers to force exec order.	2014-02-18 11:07:42 +01:00
antirez	1310d19402	Sentinel test: provide basic commands to access instances.	2014-02-18 11:04:55 +01:00
antirez	141bac4c79	Sentinel test: provide basic commands to access instances.	2014-02-18 11:04:55 +01:00
antirez	33831e4eca	Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT Rename define to match the new meaning.	2014-02-18 10:27:38 +01:00
antirez	7cec9e48ce	Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT Rename define to match the new meaning.	2014-02-18 10:27:38 +01:00
antirez	3d25fb1f79	Sentinel: fix slave promotion timeout. If we can't reconfigure a slave in time during failover, go forward as anyway the slave will be fixed by Sentinels in the future, once they detect it is misconfigured. Otherwise a failover in progress may never terminate if for some reason the slave is unable to sync with the master while at the same time it is not disconnected.	2014-02-18 08:50:57 +01:00
antirez	18b8bad53c	Sentinel: fix slave promotion timeout. If we can't reconfigure a slave in time during failover, go forward as anyway the slave will be fixed by Sentinels in the future, once they detect it is misconfigured. Otherwise a failover in progress may never terminate if for some reason the slave is unable to sync with the master while at the same time it is not disconnected.	2014-02-18 08:50:57 +01:00
antirez	576e405634	Sentinel: initial testing framework. Nothing tested at all so far... Just the infrastructure spawning N Sentinels and N Redis instances that the test will use again and again.	2014-02-17 17:38:04 +01:00
antirez	af788b5852	Sentinel: initial testing framework. Nothing tested at all so far... Just the infrastructure spawning N Sentinels and N Redis instances that the test will use again and again.	2014-02-17 17:38:04 +01:00
antirez	e3c7c1ebd5	Test: colorstr moved to util.tcl.	2014-02-17 17:36:50 +01:00
antirez	34c404e069	Test: colorstr moved to util.tcl.	2014-02-17 17:36:50 +01:00
antirez	5917f2c5c5	Test: code to test server availability refactored. Some inline test moved into server_is_up procedure. Also find_available_port was moved into util since it is going to be used for the Sentinel test as well.	2014-02-17 16:44:57 +01:00
antirez	a1dca2efab	Test: code to test server availability refactored. Some inline test moved into server_is_up procedure. Also find_available_port was moved into util since it is going to be used for the Sentinel test as well.	2014-02-17 16:44:57 +01:00
antirez	f36ff37d76	Get absolute config file path before processing 'dir'. The code tried to obtain the configuration file absolute path after processing the configuration file. However if config file was a relative path and a "dir" statement was processed reading the config, the absolute path obtained was wrong. With this fix the absolute path is obtained before processing the configuration while the server is still in the original directory where it was executed.	2014-02-17 16:44:53 +01:00
antirez	ede33fb912	Get absolute config file path before processing 'dir'. The code tried to obtain the configuration file absolute path after processing the configuration file. However if config file was a relative path and a "dir" statement was processed reading the config, the absolute path obtained was wrong. With this fix the absolute path is obtained before processing the configuration while the server is still in the original directory where it was executed.	2014-02-17 16:44:53 +01:00
antirez	dc67d9f64c	Sentinel: better specify startup errors due to config file. Now it logs the file name if it is not accessible. Also there is a different error for the missing config file case, and for the non writable file case.	2014-02-17 16:44:49 +01:00
antirez	e1b77b61f3	Sentinel: better specify startup errors due to config file. Now it logs the file name if it is not accessible. Also there is a different error for the missing config file case, and for the non writable file case.	2014-02-17 16:44:49 +01:00
antirez	bea68cc21f	Update cached time in rdbLoad() callback. server.unixtime and server.mstime are cached less precise timestamps that we use every time we don't need an accurate time representation and a syscall would be too slow for the number of calls we require. Such an example is the initialization and update process of the last interaction time with the client, that is used for timeouts. However rdbLoad() can take some time to load the DB, but at the same time it did not update the time during DB loading. This resulted in the bug described in issue #1535, where in the replication process the slave loads the DB, creates the redisClient representation of its master, but the timestamp is so old that the master, under certain conditions, is sensed as already "timed out". Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and analysis.	2014-02-13 15:13:26 +01:00
antirez	51bd9da1fd	Update cached time in rdbLoad() callback. server.unixtime and server.mstime are cached less precise timestamps that we use every time we don't need an accurate time representation and a syscall would be too slow for the number of calls we require. Such an example is the initialization and update process of the last interaction time with the client, that is used for timeouts. However rdbLoad() can take some time to load the DB, but at the same time it did not update the time during DB loading. This resulted in the bug described in issue #1535, where in the replication process the slave loads the DB, creates the redisClient representation of its master, but the timestamp is so old that the master, under certain conditions, is sensed as already "timed out". Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and analysis.	2014-02-13 15:13:26 +01:00
antirez	dda8bc1eff	Log when CONFIG REWRITE goes bad.	2014-02-13 14:32:44 +01:00
antirez	7e8abcf693	Log when CONFIG REWRITE goes bad.	2014-02-13 14:32:44 +01:00
antirez	c0e261e37c	Test: regression for issue #1549 . It was verified that reverting the commit that fixes the bug, the test no longer passes.	2014-02-13 12:26:38 +01:00
antirez	f2bdf601be	Test: regression for issue #1549 . It was verified that reverting the commit that fixes the bug, the test no longer passes.	2014-02-13 12:26:38 +01:00
antirez	d6b34bea6b	Fix script cache bug in the scripting engine. This commit fixes a serious Lua scripting replication issue, described by Github issue #1549. The root cause of the problem is that scripts were put inside the script cache, assuming that slaves and AOF already contained it, even if the scripts sometimes produced no changes in the data set, and were not actually propagated to AOF/slaves. Example: eval "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end" 1 0 Then: evalsha <sha1 step 1 script> 1 0 At this step sha1 of the script is added to the replication script cache (the script is marked as known to the slaves) and EVALSHA command is transformed to EVAL. However it is not dirty (there are no changes to db), so it is not propagated to the slaves. Then the script is called again: evalsha <sha1 step 1 script> 1 1 At this step master checks that the script already exists in the replication script cache and doesn't transform it to EVAL command. It is dirty and propagated to the slaves, but they fail to evaluate the script as they don't have it in the script cache. The fix is trivial and just uses the new API to force the propagation of the executed command regardless of the dirty state of the data set. Thank you to @minus-infinity on Github for finding the issue, understanding the root cause, and fixing it.	2014-02-13 12:10:43 +01:00
antirez	21e6b0fbe9	Fix script cache bug in the scripting engine. This commit fixes a serious Lua scripting replication issue, described by Github issue #1549. The root cause of the problem is that scripts were put inside the script cache, assuming that slaves and AOF already contained it, even if the scripts sometimes produced no changes in the data set, and were not actually propagated to AOF/slaves. Example: eval "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end" 1 0 Then: evalsha <sha1 step 1 script> 1 0 At this step sha1 of the script is added to the replication script cache (the script is marked as known to the slaves) and EVALSHA command is transformed to EVAL. However it is not dirty (there are no changes to db), so it is not propagated to the slaves. Then the script is called again: evalsha <sha1 step 1 script> 1 1 At this step master checks that the script already exists in the replication script cache and doesn't transform it to EVAL command. It is dirty and propagated to the slaves, but they fail to evaluate the script as they don't have it in the script cache. The fix is trivial and just uses the new API to force the propagation of the executed command regardless of the dirty state of the data set. Thank you to @minus-infinity on Github for finding the issue, understanding the root cause, and fixing it.	2014-02-13 12:10:43 +01:00
antirez	055d761c7f	AOF write error: retry with a frequency of 1 hz.	2014-02-12 16:27:59 +01:00
antirez	fc08c8599f	AOF write error: retry with a frequency of 1 hz.	2014-02-12 16:27:59 +01:00
antirez	84152ddd22	AOF: don't abort on write errors unless fsync is 'always'. A system similar to the RDB write error handling is used, in which when we can't write to the AOF file, writes are no longer accepted until we are able to write again. For fsync == always we still abort on errors since there is currently no easy way to avoid replying with success to the user otherwise, and this would violate the contract with the user of only acknowledging data already secured on disk.	2014-02-12 16:11:36 +01:00

27398 Commits