futriix

Author	SHA1	Message	Date
antirez	6e9ffa93eb	Diskless replica: a few aesthetic changes to replication.c.	2019-07-08 18:32:47 +02:00
Oran Agra	29754ebe22	diskless replication on slave side (don't store rdb to file), plus some other related fixes The implementation of the diskless replication was currently diskless only on the master side. The slave side was still storing the received rdb file to the disk before loading it back in and parsing it. This commit adds two modes to load rdb directly from socket: 1) when-empty 2) using "swapdb" the third mode of using diskless slave by flushdb is risky and currently not included. other changes: -------------- distinguish between aof configuration and state so that we can re-enable aof only when sync eventually succeeds (and not when exiting from readSyncBulkPayload after a failed attempt) also a CONFIG GET and INFO during rdb loading would have lied When loading rdb from the network, don't kill the server on short read (that can be a network error) Fix rdb check when performed on preamble AOF tests: run replication tests for diskless slave too make replication test a bit more aggressive Add test for diskless load swapdb	2019-07-08 15:37:48 +03:00
antirez	9eea57cc31	Narrow the effects of PR #6029 to the exact state. CLIENT PAUSE may be used, in other contexts, for a long time making all the slaves time out. Better for now to be more specific about what should disable sending PINGs. An alternative to that would be to virtually refresh the slave interactions when clients are paused, however for now I went for this more conservative solution.	2019-05-15 12:16:43 +02:00
Salvatore Sanfilippo	3f4f7aff1a	Merge pull request #6029 from chendq8/clientpause fix cluster failover time out	2019-05-15 12:03:19 +02:00
chendianqiang	1e29403134	stop ping when client pause	2019-04-17 21:20:10 +08:00
Salvatore Sanfilippo	641359787c	Merge pull request #3830 from oranagra/diskless_capa_pr several bugfixes to diskless replication	2019-03-22 17:41:40 +01:00
antirez	7e191d3ea3	More sensible name for function: restartAOFAfterSYNC(). Related to #3829.	2019-03-21 17:21:29 +01:00
antirez	4c49e7ad6f	Mostly aesthetic changes to restartAOF(). See #3829.	2019-03-21 17:18:24 +01:00
Oran Agra	544b9b0826	diskless replication - notify slave when rdb transfer failed in diskless replication - master was not notifying the slave that rdb transfer terminated on error, and lets slave wait for replication timeout	2019-03-20 17:46:19 +02:00
oranagra	4355e29749	bugfix to restartAOF, exit will never happen since retry will get negative. also reduce an excess sleep	2019-03-20 17:20:07 +02:00
antirez	340d03b64b	replicaofCommand() refactoring: stay into 80 cols.	2019-03-18 11:34:40 +01:00
antirez	beaacf96cd	Make comment in #5911 stay inside 80 cols.	2019-03-10 09:48:06 +01:00
John Sully	a3e3a24057	Replicas aren't allowed to run the replicaof command	2019-03-09 11:04:48 -05:00
zhaozhao.zz	de0f42bff3	ACL: add masteruser configuration for replication In mostly production environment, normal user's behavior should be limited. Now in redis ACL mechanism we can do it like that: user default on +@all ~* -@dangerous nopass user admin on +@all ~* >someSeriousPassword Then the default normal user can not execute dangerous commands like FLUSHALL/KEYS. But some admin commands are in dangerous category too like PSYNC, and the configurations above will forbid replica from sync with master. Finally I think we could add a new configuration for replication, it is masteruser option, like this: masteruser admin masterauth someSeriousPassword Then replica will try AUTH admin someSeriousPassword and get privilege to execute PSYNC. If masteruser is NULL, replica would AUTH with only masterauth like before.	2019-02-12 17:12:37 +08:00
ArkayZheng	1282b83681	Fix the output bug in rename exceptions.	2019-01-25 21:48:23 +08:00
antirez	da54f1fd3f	Refactoring: always kill AOF/RDB child via helper functions.	2019-01-21 11:28:44 +01:00
Salvatore Sanfilippo	081ef93364	Merge branch 'unstable' into fixChildInfoPipeFdLeak	2019-01-21 11:20:56 +01:00
Salvatore Sanfilippo	3faf82d14b	Merge pull request #5797 from trevor211/fixUpdateDictResizePolicy Fix update dict resize policy	2019-01-21 11:14:48 +01:00
WuYunlong	1e1a5ceb65	Fix child info pipe fd leak when child process gets killed.	2019-01-21 17:48:45 +08:00
WuYunlong	f02ae4788b	Update dict resize policy when rdb child process gets killed.	2019-01-21 17:33:18 +08:00
antirez	e018c7c83f	ACL: configure the master connection without user.	2019-01-17 18:33:36 +01:00
antirez	89b7b6a917	RESP3: addReplyString() -> addReplyProto(). The function naming was totally nuts. Let's fix it as we break PRs anyway with RESP3 refactoring and changes.	2019-01-09 17:00:30 +01:00
antirez	166cdd3583	RESP3: Use new deferred len API in replication.c.	2019-01-09 17:00:29 +01:00
antirez	2e14b6bbca	When replica kills a pending RDB save during SYNC, log it. This logs what happens in the context of the fix in PR #5367.	2018-10-31 11:47:10 +01:00
Salvatore Sanfilippo	2a5ac7ad88	Merge pull request #5367 from nUl1/fullresync-stopbgsave Prevent RDB autosave from overwriting full resync results	2018-10-31 11:42:04 +01:00
antirez	a6e595c778	Fix typo in replicationCron() comment.	2018-10-05 18:30:45 +02:00
Andrey Bugaevskiy	3249c4dc79	Move child termination to readSyncBulkPayload	2018-09-27 19:38:58 +03:00
Andrey Bugaevskiy	8b8c4ab5aa	Prevent RDB autosave from overwriting full resync results During the full database resync we may still have unsaved changes on the receiving side. This causes a race condition between synced data rename/load and the rename of rdbSave tempfile.	2018-09-19 19:58:39 +03:00
antirez	d6d9b778ac	Slave removal: replication.c logs fixed.	2018-09-11 15:32:28 +02:00
antirez	bdca5b9be5	Slave removal: SLAVEOF -> REPLICAOF. SLAVEOF is now an alias.	2018-09-11 15:32:28 +02:00
Oran Agra	c9ce6aa4a8	fix rare replication stream corruption with disk-based replication The slave sends \n keepalive messages to the master while parsing the rdb, and later sends REPLCONF ACK once a second. rarely, the master receives both a linefeed char and a REPLCONF in the same read, \n*3\r\n$8\r\nREPLCONF\r\n... and it tries to trim two chars (\r\n) from the query buffer, trimming the '*' from *3\r\n$8\r\nREPLCONF\r\n... then the master tries to process a command starting with '3' and replies to the slave a bunch of -ERR and one +OK. although the slave silently ignores these (prints a log message), this corrupts the replication offset at the slave since the slave increases the replication offset, and the master did not. other than the fix in processInlineBuffer, i did several other improvements while hunting this very rare bug. - when redis replies with "unknown command" it includes a portion of the arguments, not just the command name. so it would be easier to understand what was received, in my case, on the slave side, it was -ERR, but the "arguments" were the interesting part (containing info on the error). - about a year ago i added code in addReplyErrorLength to print the error to the log in case of a reply to master (since this string isn't actually transmitted to the master), now changed that block to print a similar log message to indicate an error being sent from the master to the slave. note that the slave is marked as CLIENT_SLAVE only after PSYNC was received, so this will not cause any harm for REPLCONF, and will only indicate problems that are gonna corrupt the replication stream anyway. - two places where c->reply was emptied, and i wanted to reset sentlen this is a precaution (i did not actually see such a problem), since a non-zero sentlen will cause corruption to be transmitted on the socket.	2018-07-17 12:51:49 +03:00
Oran Agra	28cf208bf3	slave buffers were wasteful and incorrectly counted causing eviction A) slave buffers didn't count internal fragmentation and sds unused space, this caused them to induce eviction although we didn't mean for it. B) slave buffers were consuming about twice the memory of what they actually needed. - this was mainly due to sdsMakeRoomFor growing to twice as much as needed each time but networking.c not storing more than 16k (partially fixed recently in 237a38737). - besides it wasn't able to store half of the new string into one buffer and the other half into the next (so the above mentioned fix helped mainly for small items). - lastly, the sds buffers had up to 30% internal fragmentation that was wasted, consumed but not used. C) inefficient performance due to starting from a small string and reallocing many times. what i changed: - creating dedicated buffers for reply list, counting their size with zmalloc_size - when creating a new reply node from, preallocate it to at least 16k. - when appending a new reply to the buffer, first fill all the unused space of the previous node before starting a new one. other changes: - expose mem_not_counted_for_evict info field for the benefit of the test suite - add a test to make sure slave buffers are counted correctly and that they don't cause eviction	2018-07-16 16:43:42 +03:00
Jack Drogon	bae1d36e5d	Fix typo	2018-07-03 18:19:46 +02:00
antirez	574b79fea6	Set repl_down_since to zero on state change. PR #5081 fixes an "interesting" bug about Redis Cluster failover but in general about the updating of repl_down_since, that is used in order to count the time a slave was left disconnected from its master. While the fix provided resolves the specific issue, in general the validity of repl_down_since is limited to states that are different than the state CONNECTED, and the disconnected time is set when the state is DISCONNECTED. However from CONNECTED to other states, the state machine must always go to DISCONNECTED first. So it makes sense to set the field to zero (since it is meaningless in that context) when the state is set to CONNECTED.	2018-07-03 12:42:14 +02:00
WuYunlong	59dd29bb35	fix server.repl_down_since resetting, so that slaves could failover automatically as expected.	2018-06-30 09:39:08 +08:00
antirez	f2dfdae6a6	Fix type of argslen in sendSynchronousCommand(). Related to #5037.	2018-06-26 14:38:35 +02:00
antirez	bcbb9b71f8	Remove black space.	2018-06-26 14:37:22 +02:00
Madelyn Olson	755109f784	Addressed comments	2018-06-26 00:57:35 +00:00
Madelyn Olson	317e235b8d	Fixed replication authentication with whitespace in password	2018-06-26 00:48:37 +00:00
shenlongxing	a35bf3f130	Fix write() errno error	2018-06-06 13:06:42 +02:00
Salvatore Sanfilippo	ebf7964c62	Merge pull request #4269 from jianqingdu/unstable fix not call va_end() when syncWrite() failed	2018-01-24 10:55:25 +01:00
antirez	3d176c3833	Hopefully more clear comment to explain the change in #4607 .	2018-01-16 15:52:13 +01:00
Oran Agra	12f7ed3bb5	PSYNC2 fix - promoted slave should hold on to its backlog after a slave is promoted (assuming it has no slaves and it booted over an hour ago), it will lose its replication backlog at the next replication cron, rather than waiting for slaves to connect to it. so on a simple master/slave failover, if the new slave doesn't connect immediately, it may be too late and PSYNC2 will fail.	2018-01-16 10:10:42 +02:00
antirez	e0df883f45	add linkClient(): adds the client and caches the list node. We have this operation in two places: when caching the master and when linking a new client after the client creation. By having an API for this we avoid incurring in errors when modifying one of the two places forgetting the other. The function is also a good place where to document why we cache the linked list node. Related to #4497 and #4210.	2017-12-05 16:02:03 +01:00
zhaozhao.zz	2f98aa2d83	networking: optimize unlinkClient() in freeClient()	2017-11-30 18:11:05 +08:00
antirez	2ed2fb7f25	PSYNC2: reorganize comments related to recent fixes. Related to PR #4412 and issue #4407.	2017-11-24 11:08:29 +01:00
zhaozhao.zz	32d2ec3cd9	PSYNC2: safe free backlog when reach the time limit When we free the backlog, we should use a new replication ID and clear the ID2. Since without backlog we can not increment master_repl_offset even do write commands, that may lead to inconsistency when we try to connect a "slave-before" master (if this master is our slave before, our replid equals the master's replid2). As the master have our history, so we can match the master's replid2 and second_replid_offset, that make partial sync work, but the data is inconsistent.	2017-11-01 17:32:27 +08:00
antirez	0d68dd2fad	PSYNC2: More refinements related to #4316 .	2017-09-20 11:28:13 +02:00
zhaozhao.zz	019c7fa546	PSYNC2: make persisting replication info more solid This commit is a reinforcement of commit af78ec8. 1. Replication information can be stored when the RDB file is generated by a master using server.slaveseldb when server.repl_backlog is not NULL, or set repl_stream_db be -1. That's safe, because NULL server.repl_backlog will trigger full synchronization, then master will send SELECT command to replication stream. 2. Only do rdbSave* when rsiptr is not NULL, if we do rdbSave* without rdbSaveInfo, slave will miss repl-stream-db. 3. Save the replication informations also in the case of SAVE command, FLUSHALL command and DEBUG reload.	2017-09-20 11:18:10 +02:00
antirez	af78ec8ccf	PSYNC2: Fix the way replication info is saved/loaded from RDB. This commit attempts to fix a number of bugs reported in #4316. They are related to the way replication info like replication ID, offsets, and currently selected DB in the master client, are stored and loaded by Redis. In order to avoid inconsistencies the changes in this commit try to enforce that: 1. Replication information are only stored when the RDB file is generated by a slave that has a valid 'master' client, so that we can always extract the currently selected DB. 2. When replication informations are persisted in the RDB file, all the info for a successful PSYNC or nothing is persisted. 3. The RDB replication informations are only loaded if the instance is configured as a slave, otherwise a master can start with IDs that relate to a different history of the data set, and still retain such IDs in the future while receiving unrelated writes.	2017-09-19 23:03:39 +02:00

250 Commits