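Some notes on my understanding of Redis Sentinel.
Any middleware deployed as a single instance carries the risk of a single point of failure, so the first architecture that comes to mind is master-replica replication.
Replication comes in two forms, full synchronization and partial synchronization: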
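Full synchronization: after a replica connects to the master, it sends the SYNC command (PSYNC in newer versions). The master runs BGSAVE to generate an RDB file and at the same time starts buffering the write commands it receives. Once the RDB file is ready, the master sends it to the replica, which saves it to local disk and then loads it into memory. The master then sends the buffered write commands to the replica as a stream of commands in the same format as the Redis protocol itself.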
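Partial synchronization: a replica can send PSYNC master_run_id offset to request a partial synchronization. Both the master and its replicas keep track of the replication offset. If the data starting at the requested offset is still available on the master (in its replication backlog), the master sends just that part to the replica; if it is not, a full synchronization is performed instead.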
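To make the partial-versus-full decision concrete, here is a minimal Python sketch of the check as I understand it. It is not Redis source code: the MasterState class and the can_partial_resync function are hypothetical names, and the two backlog fields simply mirror repl_backlog_first_byte_offset and repl_backlog_histlen from the INFO replication output.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Hypothetical snapshot of the master's replication state."""
    replid: str                 # ID the replica must present in PSYNC
    backlog_first_offset: int   # like repl_backlog_first_byte_offset
    backlog_histlen: int        # like repl_backlog_histlen


def can_partial_resync(master: MasterState, req_id: str, req_offset: int) -> bool:
    """Return True if the requested offset is still covered by the backlog.

    Assumption: a partial resync is possible only when the ID matches and
    the offset lies between the first backlog byte and the next byte the
    master would write.
    """
    if req_id != master.replid:
        return False
    first = master.backlog_first_offset
    next_byte = first + master.backlog_histlen
    return first <= req_offset <= next_byte


if __name__ == "__main__":
    master = MasterState("d349582dc829f56d1da32e2d2f1434c6f2c44802", 631, 78924)
    print(can_partial_resync(master, master.replid, 70000))  # offset in backlog
    print(can_partial_resync(master, master.replid, 100))    # offset too old
    print(can_partial_resync(master, "?", -1))               # unknown master ID
```

When the check fails, the master falls back to the full synchronization described above.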
A replica asking the master for a full synchronization:
Master log:
38233:M 01 Sep 2020 16:55:02.260 * Replica 127.0.0.1:6380 asks for synchronization
38233:M 01 Sep 2020 16:55:02.260 * Full resync requested by replica 127.0.0.1:6380
38233:M 01 Sep 2020 16:55:02.260 * Starting BGSAVE for SYNC with target: disk
38233:M 01 Sep 2020 16:55:02.261 * Background saving started by pid 38299
38299:C 01 Sep 2020 16:55:02.323 * DB saved on disk
38299:C 01 Sep 2020 16:55:02.323 * RDB: 4 MB of memory used by copy-on-write
38233:M 01 Sep 2020 16:55:02.378 * Background saving terminated with success
38233:M 01 Sep 2020 16:55:02.378 * Synchronization with replica 127.0.0.1:6380 succeeded
Replica (6380) log:
$ tail -f 6380.log
38295:S 01 Sep 2020 16:55:02.259 * Connecting to MASTER 127.0.0.1:6379
38295:S 01 Sep 2020 16:55:02.259 * MASTER <-> REPLICA sync started
38295:S 01 Sep 2020 16:55:02.259 * Non blocking connect for SYNC fired the event.
38295:S 01 Sep 2020 16:55:02.260 * Master replied to PING, replication can continue...
38295:S 01 Sep 2020 16:55:02.260 * Partial resynchronization not possible (no cached master)
38295:S 01 Sep 2020 16:55:02.262 * Full resync from master: 46ef90de89e6771b67bc2b43371da2f97a03b4d1:0
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: receiving 175 bytes from master
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Flushing old data
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Loading DB in memory
38295:S 01 Sep 2020 16:55:02.378 * MASTER <-> REPLICA sync: Finished with success
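Every log line above has the same shape: pid, a role letter, a timestamp, a level character, and the message. The role letter is M for master, S for replica, C for a forked child, and X for sentinel. A small Python sketch for splitting such lines follows; the parse_log_line helper and the LogLine tuple are names I made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Role letters that appear after the pid in Redis log lines.
ROLES = {"M": "master", "S": "replica", "C": "child", "X": "sentinel"}

# pid:role, "DD Mon YYYY HH:MM:SS.mmm", level character, message
LOG_LINE = re.compile(
    r"^(\d+):([MSCX]) "
    r"(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"([.\-*#]) (.*)$"
)


class LogLine(NamedTuple):
    pid: int
    role: str
    timestamp: str
    level: str
    message: str


def parse_log_line(line: str) -> Optional[LogLine]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    pid, role, timestamp, level, message = match.groups()
    return LogLine(int(pid), ROLES[role], timestamp, level, message)


if __name__ == "__main__":
    sample = "38233:M 01 Sep 2020 16:55:02.260 * Full resync requested by replica 127.0.0.1:6380"
    print(parse_log_line(sample))
```

This makes it easy to filter a mixed log by role, for example to keep only the lines written by the forked BGSAVE child.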
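That takes care of the single point of failure, but switching from the master to a replica still has to be done by hand. Can the switch be automated? Of course it can!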
Master: 6379
Slaves: 6380, 6381
Sentinels: 26379, 26380, 26381
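__sentinel__:hello
Sentinels discover each other automatically by publishing messages to the __sentinel__:hello channel.
Messages published by the sentinels, i.e. the content of the __sentinel__:hello channel: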
127.0.0.1:6381> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
"psubscribe"
"*"
(integer) 1
"pmessage"
"*"
"__sentinel__:hello"
"127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
"pmessage"
"*"
"__sentinel__:hello"
"127.0.0.1,26379,b60bd3e15db23a9862d213e7703001c72d48dc73,1,mymaster,127.0.0.1,6381,1"
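Each hello payload above is a comma-separated record with eight fields. As far as I can tell they are: the sentinel's IP, port and run ID, the sentinel's current epoch, then the master's name, IP, port and config epoch. Here is a minimal Python sketch that splits a payload under that assumption; the Hello tuple and the parse_hello function are illustrative names, not part of any Redis client library.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Parse one __sentinel__:hello payload (assumed to have 8 comma-separated fields)."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch), name, m_ip, int(m_port), int(m_epoch))


if __name__ == "__main__":
    msg = "127.0.0.1,26380,fc976b271914f43a4a318dfe8c1f41a2e747f8d8,1,mymaster,127.0.0.1,6381,1"
    hello = parse_hello(msg)
    print(hello.sentinel_port, hello.master_name, hello.master_port)
```

A sentinel that receives such a message from a run ID it has not seen before can add the sender to its list of known sentinels for that master.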
Pub/Sub events on a sentinel node, showing that another Sentinel was discovered automatically:
$ src/redis-cli -p 26379
127.0.0.1:26379> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
"psubscribe"
"*"
(integer) 1
"pmessage"
"*"
"+sentinel"
"sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379"
Sentinel learns which slaves exist through the master: it sends the INFO command to the master to discover the replicas attached to it.
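The replica list sits in the slaveN lines of the master's INFO replication output, for example slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0. Below is a rough Python sketch of pulling those lines out of the raw INFO text. The parse_replicas function is a hypothetical helper of mine, and the sketch only illustrates the idea rather than how Sentinel itself parses the reply.

```python
import re
from typing import Dict, List

# Matches lines such as "slave0:ip=127.0.0.1,port=6380,state=online,..."
REPLICA_LINE = re.compile(r"^slave\d+:(.+)$")


def parse_replicas(info_text: str) -> List[Dict[str, str]]:
    """Return one dict of key=value fields per slaveN line in INFO output."""
    replicas = []
    for line in info_text.splitlines():
        match = REPLICA_LINE.match(line.strip())
        if match is None:
            continue
        fields = dict(pair.split("=", 1) for pair in match.group(1).split(","))
        replicas.append(fields)
    return replicas


if __name__ == "__main__":
    info = (
        "# Replication\n"
        "role:master\n"
        "connected_slaves:1\n"
        "slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0\n"
    )
    for replica in parse_replicas(info):
        print(replica["ip"], replica["port"], replica["state"])
```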
Manually kill the master node's process
1 Check the sentinel log:
$ tail -f 26379.log
(master 6379 was shut down manually)
38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
2 Check the Pub/Sub channel between the sentinels:
"+sdown"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+odown"
"master mymaster 127.0.0.1 6379 #quorum 2/2"
"pmessage"
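The jump from +sdown (one sentinel thinks the master is down) to +odown (enough sentinels agree) is a count against the configured quorum, which is what "#quorum 2/2" reports. A tiny Python sketch of that rule, with a hypothetical check_odown helper and a plain dict standing in for the answers collected from each sentinel:

```python
from typing import Dict, Tuple


def check_odown(down_reports: Dict[str, bool], quorum: int) -> Tuple[bool, int]:
    """Return (objectively_down, agreeing) for one monitored master.

    down_reports maps a sentinel ID to whether that sentinel currently
    considers the master down (the local sentinel included).
    """
    agreeing = sum(1 for is_down in down_reports.values() if is_down)
    return agreeing >= quorum, agreeing


if __name__ == "__main__":
    reports = {
        "b60bd3e15db23a9862d213e7703001c72d48dc73": True,   # local sentinel: +sdown
        "fc976b271914f43a4a318dfe8c1f41a2e747f8d8": True,   # peer agrees
    }
    odown, agreeing = check_odown(reports, quorum=2)
    print(odown, f"#quorum {agreeing}/2")
```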
One of the three sentinels has to be chosen to carry out this automatic failover, so the sentinels first hold a vote.
1 Check the sentinel log:
$ tail -f 26379.log
38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this sentinel, b60bd, votes for itself in epoch 1)
38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (fc976b gives its vote to the sentinel with ID b60bd)
38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
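Here is how I picture the vote counting, as a hedged Python sketch. My understanding is that a candidate becomes leader only if it collects votes from a majority of all known sentinels and also at least quorum votes; treat that threshold as my assumption rather than a statement of the exact Sentinel implementation. The elect_leader function is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], total_sentinels: int, quorum: int) -> Optional[str]:
    """Return the winning sentinel ID for one epoch, or None if nobody wins.

    votes maps voter ID -> the sentinel ID it voted for in this epoch.
    Assumption: the winner needs a majority of all sentinels AND >= quorum.
    """
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    majority = total_sentinels // 2 + 1
    if count >= majority and count >= quorum:
        return candidate
    return None


if __name__ == "__main__":
    b60bd = "b60bd3e15db23a9862d213e7703001c72d48dc73"
    fc976b = "fc976b271914f43a4a318dfe8c1f41a2e747f8d8"
    # Both votes seen in the log above go to b60bd.
    print(elect_leader({b60bd: b60bd, fc976b: b60bd}, total_sentinels=3, quorum=2))
```

With three sentinels and quorum 2, the two votes in the log are enough, which matches the +elected-leader line.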
$ tail -f 26379.log
38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is selected to become the new master)
38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (SLAVEOF NO ONE is sent to 6381 so that it becomes the new master)
38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
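How the slave to promote is selected:
- it must be a slave that is still alive
- the one with the largest replication offset is preferred
- on a tie, the one with the smallest Run Id wins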
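The three rules above can be sketched as a filter plus a sort key. This is only my simplified model in Python: the Replica class and the select_new_master function are made-up names, and real Sentinel also looks at the configured replica priority before it compares offsets, which I leave out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical view of what a sentinel knows about one replica."""
    addr: str
    alive: bool
    repl_offset: int
    run_id: str


def select_new_master(replicas: List[Replica]) -> Optional[Replica]:
    """Pick a live replica: largest offset first, then smallest run ID."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        return None
    # Negate the offset so that min() prefers the most up-to-date replica.
    return min(candidates, key=lambda r: (-r.repl_offset, r.run_id))


if __name__ == "__main__":
    replicas = [
        Replica("127.0.0.1:6380", True, 65000, "bbb"),
        Replica("127.0.0.1:6381", True, 65160, "ccc"),
        Replica("127.0.0.1:6382", False, 70000, "aaa"),  # dead, never picked
    ]
    chosen = select_new_master(replicas)
    print(chosen.addr if chosen else None)
```

The offsets and run IDs in the usage example are invented values, not numbers taken from my test run.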
$ src/redis-cli -p 6381
127.0.0.1:6381> info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=79554,lag=0
master_replid:d349582dc829f56d1da32e2d2f1434c6f2c44802
master_replid2:46ef90de89e6771b67bc2b43371da2f97a03b4d1
master_repl_offset:79554
second_repl_offset:65161
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:631
repl_backlog_histlen:78924
127.0.0.1:6381>
$ src/redis-cli -p 26379
127.0.0.1:26379> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
"psubscribe"
"*"
(integer) 1
"pmessage"
"*"
"+sdown"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+odown"
"master mymaster 127.0.0.1 6379 #quorum 2/2"
"pmessage"
"*"
"+new-epoch"
"1"
"pmessage"
"*"
"+try-failover"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+vote-for-leader"
"b60bd3e15db23a9862d213e7703001c72d48dc73 1"
"pmessage"
"*"
"+elected-leader"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-select-slave"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+selected-slave"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-send-slaveof-noone"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-wait-promotion"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"-role-change"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 new reported role is master"
"pmessage"
"*"
"+promoted-slave"
"slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-state-reconf-slaves"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+slave-reconf-sent"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"-odown"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+slave-reconf-inprog"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+slave-reconf-done"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+failover-end"
"master mymaster 127.0.0.1 6379"
"pmessage"
"*"
"+switch-master"
"mymaster 127.0.0.1 6379 127.0.0.1 6381"
"pmessage"
"*"
"+slave"
"slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381"
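Of all the events above, +switch-master is the one a client cares about most, because its payload carries the master name followed by the old and the new address. A short Python sketch of following that event over a list of (channel, payload) pairs like the ones printed by PSUBSCRIBE; parse_switch_master and follow_master are hypothetical helper names.

```python
from typing import Iterable, Tuple

Address = Tuple[str, int]


def parse_switch_master(payload: str) -> Tuple[str, Address, Address]:
    """Split "<name> <old ip> <old port> <new ip> <new port>"."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


def follow_master(events: Iterable[Tuple[str, str]], name: str, current: Address) -> Address:
    """Replay events and return the address the named master ends up at."""
    for channel, payload in events:
        if channel != "+switch-master":
            continue
        event_name, _old, new = parse_switch_master(payload)
        if event_name == name:
            current = new
    return current


if __name__ == "__main__":
    events = [
        ("+failover-end", "master mymaster 127.0.0.1 6379"),
        ("+switch-master", "mymaster 127.0.0.1 6379 127.0.0.1 6381"),
    ]
    print(follow_master(events, "mymaster", ("127.0.0.1", 6379)))
```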
$ tail -f 26379.log
38501:X 01 Sep 2020 17:15:48.851 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
38501:X 01 Sep 2020 17:15:48.851 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=38501, just started
38501:X 01 Sep 2020 17:15:48.851 # Configuration loaded
38502:X 01 Sep 2020 17:15:48.854 * Running mode=sentinel, port=26379.
38502:X 01 Sep 2020 17:15:48.855 # Sentinel ID is b60bd3e15db23a9862d213e7703001c72d48dc73
38502:X 01 Sep 2020 17:15:48.855 # +monitor master mymaster 127.0.0.1 6379 quorum 2
38502:X 01 Sep 2020 17:15:48.855 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (a slave was discovered)
38502:X 01 Sep 2020 17:17:59.296 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379 (a slave was discovered)
38502:X 01 Sep 2020 17:20:31.119 * +sentinel sentinel fc976b271914f43a4a318dfe8c1f41a2e747f8d8 127.0.0.1 26380 @ mymaster 127.0.0.1 6379 (another sentinel was discovered)
38502:X 01 Sep 2020 17:22:42.715 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:23:42.558 * +reboot slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:23:42.659 # -sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
.............
............(master 6379 was shut down manually)
38502:X 01 Sep 2020 17:26:38.311 # +sdown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.395 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
38502:X 01 Sep 2020 17:26:38.395 # +new-epoch 1
38502:X 01 Sep 2020 17:26:38.395 # +try-failover master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.396 # +vote-for-leader b60bd3e15db23a9862d213e7703001c72d48dc73 1 (this sentinel, b60bd, votes for itself in epoch 1)
38502:X 01 Sep 2020 17:26:38.397 # fc976b271914f43a4a318dfe8c1f41a2e747f8d8 voted for b60bd3e15db23a9862d213e7703001c72d48dc73 1 (fc976b gives its vote to the sentinel with ID b60bd)
38502:X 01 Sep 2020 17:26:38.454 # +elected-leader master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.454 # +failover-state-select-slave master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:38.545 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (6381 is selected to become the new master)
38502:X 01 Sep 2020 17:26:38.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379 (SLAVEOF NO ONE is sent to 6381 so that it becomes the new master)
38502:X 01 Sep 2020 17:26:38.646 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.282 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.346 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:39.480 # -odown master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.131 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +failover-end master mymaster 127.0.0.1 6379
38502:X 01 Sep 2020 17:26:40.186 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
38502:X 01 Sep 2020 17:26:40.186 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
38502:X 01 Sep 2020 17:27:10.258 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
Original post: https://www.cnblogs.com/yangweiqiang/p/13627041.html