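1. Overview

Setting up Redis Sentinel in Docker is straightforward. This example shows how to build a Redis Sentinel environment with Docker Compose. Before starting, make sure Docker and Docker Compose are installed locally.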
Sample code repository
2. Set up master-replica replication

2.1. Create the folders

```shell
mkdir -p redis-sentinel-replication/redis/.data
mkdir -p redis-sentinel-replication/redis/redis-server1
mkdir -p redis-sentinel-replication/redis/redis-server2
mkdir -p redis-sentinel-replication/redis/redis-server3
```
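2.2. Create the configuration files

The samples below are intended only for local testing with Docker on Windows WSL. They are not a reference for production deployment.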
2.2.1. redis-server1/redis.conf

```conf
bind 0.0.0.0
loglevel debug
logfile "/data/redis-6379.log"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data/
requirepass 123456
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
masterauth 123456
replica-announce-ip 172.200.0.2
replica-announce-port 6379
# Without this setting, replicas may fail to sync. Required for Docker on Windows WSL.
repl-diskless-load on-empty-db
```
2.2.2. redis-server2/redis.conf

```conf
bind 0.0.0.0
loglevel debug
logfile "/data/redis-6380.log"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data/
requirepass 123456
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
masterauth 123456
replica-announce-ip 172.200.0.3
replica-announce-port 6379
repl-diskless-load on-empty-db

slaveof 172.200.0.2 6379
slave-read-only yes
slave-serve-stale-data yes
```
2.2.3. redis-server3/redis.conf

```conf
bind 0.0.0.0
loglevel debug
logfile "/data/redis-6381.log"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data/
requirepass 123456
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
masterauth 123456
replica-announce-ip 172.200.0.4
replica-announce-port 6379
repl-diskless-load on-empty-db

slaveof 172.200.0.2 6379
slave-read-only yes
slave-serve-stale-data yes
```
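3. Set up the Sentinel cluster

A Redis master-replica setup must already exist before Sentinel can be added. See Section 2 ("Set up master-replica replication") for details.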
3.1. Create the folders

```shell
mkdir -p redis-sentinel-replication/sentinel/.data
mkdir -p redis-sentinel-replication/sentinel/conf/redis-sentinel1
mkdir -p redis-sentinel-replication/sentinel/conf/redis-sentinel2
mkdir -p redis-sentinel-replication/sentinel/conf/redis-sentinel3
```
3.2. Create the configuration files

3.2.1. redis-sentinel1/sentinel.conf

```conf
protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine1.pid"
logfile "/data/sentinel-1.log"
sentinel announce-ip "172.200.0.5"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes
```
3.2.2. redis-sentinel2/sentinel.conf

```conf
protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine2.pid"
logfile "/data/sentinel-2.log"
sentinel announce-ip "172.200.0.6"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes
```
3.2.3. redis-sentinel3/sentinel.conf

```conf
protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine3.pid"
logfile "/data/sentinel-3.log"
sentinel announce-ip "172.200.0.7"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes
```
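The three sentinel.conf files differ only in the instance number (pidfile and logfile names) and the announce IP, so they could also be generated from one template. The following is a minimal sketch, assuming it is run from the parent directory of redis-sentinel-replication and that sentinels 1 to 3 use the addresses 172.200.0.5 to 172.200.0.7 as above. The loop and its variable names are illustrative and not part of the original setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base directory created in section 3.1.
base="redis-sentinel-replication/sentinel/conf"

for i in 1 2 3; do
  dir="$base/redis-sentinel$i"
  ip="172.200.0.$((i + 4))"   # sentinel1 -> .5, sentinel2 -> .6, sentinel3 -> .7
  mkdir -p "$dir"
  # Unquoted heredoc delimiter, so $i and $ip are expanded in the template.
  cat > "$dir/sentinel.conf" <<EOF
protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine$i.pid"
logfile "/data/sentinel-$i.log"
sentinel announce-ip "$ip"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes
EOF
  echo "wrote $dir/sentinel.conf"
done
```

Generating the files this way keeps the shared settings (monitored master, quorum, password) identical across all three Sentinels, which is easy to get wrong when editing three copies by hand.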
4. Create the docker-compose.yml file

```yaml
services:
  redis-master:
    image: redis:7.4.5
    container_name: redis-master
    restart: always
    ports:
      - "6379:6379"
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      # redis.conf created in section 2.2.1
      - ./redis/redis-server1/redis.conf:/usr/local/etc/redis/redis.conf
      - ./redis/.data/redis-master/:/data:Z
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      redis-network:
        ipv4_address: 172.200.0.2

  redis-slave1:
    image: redis:7.4.5
    container_name: redis-slave1
    restart: always
    ports:
      - "6380:6379"
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      # redis.conf created in section 2.2.2
      - ./redis/redis-server2/redis.conf:/usr/local/etc/redis/redis.conf
      - ./redis/.data/redis-slave1/:/data:Z
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      redis-network:
        ipv4_address: 172.200.0.3

  redis-slave2:
    image: redis:7.4.5
    container_name: redis-slave2
    restart: always
    ports:
      - "6381:6379"
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      # redis.conf created in section 2.2.3
      - ./redis/redis-server3/redis.conf:/usr/local/etc/redis/redis.conf
      - ./redis/.data/redis-slave2/:/data:Z
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      redis-network:
        ipv4_address: 172.200.0.4

  redis-sentinel1:
    image: redis:7.4.5
    container_name: redis-sentinel1
    restart: always
    environment:
      TZ: "Asia/Shanghai"
    ports:
      - "26379:26379"
    volumes:
      - ./sentinel/conf/redis-sentinel1:/usr/local/etc/redis/conf
      - ./sentinel/.data:/data:Z
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    networks:
      redis-network:
        ipv4_address: 172.200.0.5

  redis-sentinel2:
    image: redis:7.4.5
    container_name: redis-sentinel2
    restart: always
    environment:
      TZ: "Asia/Shanghai"
    ports:
      - "26380:26379"
    volumes:
      - ./sentinel/conf/redis-sentinel2:/usr/local/etc/redis/conf
      - ./sentinel/.data:/data:Z
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    networks:
      redis-network:
        ipv4_address: 172.200.0.6

  redis-sentinel3:
    image: redis:7.4.5
    container_name: redis-sentinel3
    restart: always
    environment:
      TZ: "Asia/Shanghai"
    ports:
      - "26381:26379"
    volumes:
      - ./sentinel/conf/redis-sentinel3:/usr/local/etc/redis/conf
      - ./sentinel/.data:/data:Z
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    networks:
      redis-network:
        ipv4_address: 172.200.0.7

networks:
  redis-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.200.0.0/24
```
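4.1. Start

Start the containers with Docker Compose from the directory that contains docker-compose.yml.

Then check the container status.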
4.2. Quick test

Check that the Sentinel cluster is working

1. Enter a Sentinel container and use the Sentinel API to inspect the monitoring state:
```shell
docker exec -it redis-sentinel1 /bin/bash
redis-cli -p 26379
# show information about the master
sentinel master mymaster
# show information about the replicas
sentinel slaves mymaster
```
2. After running the commands above, output like the following indicates that the cluster is working:
```text
......
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
......
```
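The reply is a flat list of alternating field names and values. When redis-cli writes to a pipe rather than a terminal it prints one element per line without the `31)` style numbering, so a single field can be picked out with awk. Sentinel needs `quorum` agreeing instances to mark the master objectively down, and a majority of all Sentinels to authorize the failover, so both numbers are worth checking. The sketch below assumes that piped output format; `sentinel_field` and `majority` are hypothetical helper names, and the sample reply is built from the values shown above rather than read from a live container.

```shell
#!/usr/bin/env bash
set -eu

# Print the line that follows the given field name in a flat
# name/value listing (one element per line) read from stdin.
sentinel_field() {
  awk -v key="$1" 'prev == key { print } { prev = $0 }'
}

# Smallest number of Sentinels that forms a majority of n.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Sample reply using the values shown above. Against the running setup the
# same text would come from:
#   docker exec redis-sentinel1 redis-cli -p 26379 sentinel master mymaster
reply=$(printf '%s\n' num-slaves 2 num-other-sentinels 2 quorum 2)

others=$(printf '%s\n' "$reply" | sentinel_field num-other-sentinels)
quorum=$(printf '%s\n' "$reply" | sentinel_field quorum)
total=$(( others + 1 ))   # the Sentinel that answered is not counted in "other"

echo "sentinels=$total quorum=$quorum majority=$(majority "$total")"
```

With three Sentinels and a quorum of 2, the quorum and the majority coincide, which is why the loss of a single Sentinel still leaves the cluster able to fail over.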
Now stop redis-master manually to observe the failover process.

```shell
docker stop redis-master
```

redis-sentinel1 log analysis
```text
# This Sentinel instance can no longer reach the master through its own heartbeat checks, so it first marks the master as "subjectively down" (sdown)
1:X 24 Aug 2025 06:46:59.289 # +sdown master mymaster 172.200.0.2 6379
1:X 24 Aug 2025 06:46:59.302 * Sentinel new configuration saved on disk

# The Sentinel cluster enters a new epoch, number 1. Every failover gets a unique, larger epoch number. All configuration updates and leader elections are based on the epoch, which keeps the cluster state consistent and lets every Sentinel know which failover is the most recent
1:X 24 Aug 2025 06:46:59.304 # +new-epoch 1

# The new epoch is also persisted to the on-disk configuration
1:X 24 Aug 2025 06:46:59.312 * Sentinel new configuration saved on disk

# This Sentinel takes part in a vote. It votes for Sentinel f4a85f4091..., supporting it as the leader that will carry out the failover for epoch 1
1:X 24 Aug 2025 06:46:59.313 # +vote-for-leader f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# The key trigger for the failover. Enough Sentinels now agree that the master is down (3 against a quorum of 2), so this Sentinel declares mymaster "objectively down" (odown)
1:X 24 Aug 2025 06:46:59.343 # +odown master mymaster 172.200.0.2 6379 #quorum 3/2

# This Sentinel will not start a failover itself right away. It sets a delay (6 minutes) before which it will not try to become the leader and run the failover
1:X 24 Aug 2025 06:46:59.344 * Next failover delay: I will not start a failover before Sun Aug 24 06:52:59 2025

# This Sentinel receives a broadcast from the failover leader (f4a85f4091..., running at 172.200.0.6:26379). The message carries the new, already completed configuration for mymaster
1:X 24 Aug 2025 06:46:59.669 # +config-update-from sentinel f4a85f409178a652d59e669e127bb144dbaeb5a3 172.200.0.6 26379 @ mymaster 172.200.0.2 6379

# The most important log entry. Sentinel announces that the master mymaster has failed over from 172.200.0.2:6379 (the old master, redis-master) to 172.200.0.3:6379 (the new master, redis-slave1)
1:X 24 Aug 2025 06:46:59.671 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379

# The leader Sentinel has reconfigured the topology and updated every Sentinel's view.
# Replica 172.200.0.4 (redis-slave2) is now seen replicating from the new master (172.200.0.3)
1:X 24 Aug 2025 06:46:59.673 * +slave slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.3 6379

# The old, failed master (172.200.0.2) is also registered as a replica of the new master
1:X 24 Aug 2025 06:46:59.675 * +slave slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379

# This Sentinel persists the new cluster configuration (new master and all replicas) to local disk again. Its configuration file is now fully up to date
1:X 24 Aug 2025 06:46:59.684 * Sentinel new configuration saved on disk

# About 30 seconds later, this Sentinel finds that the old master, now demoted to a replica (172.200.0.2), is still unreachable
1:X 24 Aug 2025 06:47:29.752 # +sdown slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379
```
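The timestamps in this log also show how long the switch took from this Sentinel's point of view: the interval between `+sdown master` and `+switch-master`. A small sketch of that arithmetic follows, assuming the log line layout shown above (time in the fifth field, event name in the seventh) and that both events fall on the same day. `failover_ms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -eu

# Milliseconds between "+sdown master" and "+switch-master" in a Sentinel
# log read from stdin. Assumes both events fall on the same calendar day.
failover_ms() {
  awk '
    # Convert HH:MM:SS.mmm to milliseconds since midnight.
    function to_ms(t,   p) {
      split(t, p, /[:.]/)
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    $7 == "+sdown" && $8 == "master" { start = to_ms($5) }
    $7 == "+switch-master"           { print to_ms($5) - start; exit }
  '
}

# Two lines copied from the redis-sentinel1 log above.
failover_ms <<'EOF'
1:X 24 Aug 2025 06:46:59.289 # +sdown master mymaster 172.200.0.2 6379
1:X 24 Aug 2025 06:46:59.671 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379
EOF
```

For the two lines above this prints 382, that is, roughly 0.4 seconds between this Sentinel noticing the outage and learning about the new master. Note that the interval does not include the time the master was already unreachable before `+sdown` was logged.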
redis-sentinel2 log analysis
```text
# This Sentinel subjectively judges that the master mymaster (172.200.0.2:6379) is down
1:X 24 Aug 2025 06:46:59.227 # +sdown master mymaster 172.200.0.2 6379

# The Sentinels reach agreement and the master is confirmed objectively down (quorum 2/2: 2 Sentinels agree, meeting the required quorum of 2)
1:X 24 Aug 2025 06:46:59.282 # +odown master mymaster 172.200.0.2 6379 #quorum 2/2

# A new epoch, number 1, is started to identify this failover
1:X 24 Aug 2025 06:46:59.284 # +new-epoch 1

# This Sentinel starts trying to perform the failover
1:X 24 Aug 2025 06:46:59.285 # +try-failover master mymaster 172.200.0.2 6379

# The new configuration is saved to disk
1:X 24 Aug 2025 06:46:59.293 * Sentinel new configuration saved on disk

# This Sentinel votes for the Sentinel with ID f4a85f409178a652d59e669e127bb144dbaeb5a3 to lead the failover for epoch 1
1:X 24 Aug 2025 06:46:59.295 # +vote-for-leader f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# Sentinel e94cf26d... votes for f4a85f4091... as leader
1:X 24 Aug 2025 06:46:59.314 * e94cf26d20f1d29e0f771f7af25a68a2df463e85 voted for f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# Sentinel 69d264a9... votes for f4a85f4091... as leader
1:X 24 Aug 2025 06:46:59.315 * 69d264a94c3ebf062e09d15580fb9f737888deac voted for f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# f4a85f4091... is elected leader of the failover
1:X 24 Aug 2025 06:46:59.387 # +elected-leader master mymaster 172.200.0.2 6379

# The failover enters the replica selection stage: the leader evaluates which replica is best suited to become the new master
1:X 24 Aug 2025 06:46:59.388 # +failover-state-select-slave master mymaster 172.200.0.2 6379

# Replica 172.200.0.3:6379 is selected as the new master
1:X 24 Aug 2025 06:46:59.456 # +selected-slave slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# SLAVEOF NO ONE is sent to the selected replica so that it stops replicating and becomes the new master
1:X 24 Aug 2025 06:46:59.457 * +failover-state-send-slaveof-noone slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# Waiting for confirmation that the replica has been promoted to master
1:X 24 Aug 2025 06:46:59.549 * +failover-state-wait-promotion slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# The new configuration is saved to disk
1:X 24 Aug 2025 06:46:59.618 * Sentinel new configuration saved on disk

# Replica 172.200.0.3:6379 has been promoted to master
1:X 24 Aug 2025 06:46:59.619 # +promoted-slave slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# The failover enters the replica reconfiguration stage: the remaining replicas are pointed at the new master
1:X 24 Aug 2025 06:46:59.620 # +failover-state-reconf-slaves master mymaster 172.200.0.2 6379

# A reconfiguration command has been sent to replica 172.200.0.4:6379 so that it replicates from the new master
1:X 24 Aug 2025 06:46:59.668 * +slave-reconf-sent slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.2 6379

# The master is no longer in the objectively down state (possibly because the failover is already being handled)
1:X 24 Aug 2025 06:47:00.408 # -odown master mymaster 172.200.0.2 6379

# Reconfiguration of replica 172.200.0.4:6379 is in progress
1:X 24 Aug 2025 06:47:00.647 * +slave-reconf-inprog slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.2 6379

# Reconfiguration of replica 172.200.0.4:6379 is complete
1:X 24 Aug 2025 06:47:00.649 * +slave-reconf-done slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.2 6379

# The failover is finished
1:X 24 Aug 2025 06:47:00.699 # +failover-end master mymaster 172.200.0.2 6379

# The master has switched from 172.200.0.2:6379 to 172.200.0.3:6379
1:X 24 Aug 2025 06:47:00.701 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379

# Replica 172.200.0.4:6379 is now seen replicating from the new master
1:X 24 Aug 2025 06:47:00.703 * +slave slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.3 6379

# The old master 172.200.0.2:6379 is registered as a replica of the new master
1:X 24 Aug 2025 06:47:00.705 * +slave slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379

# The new configuration is saved to disk
1:X 24 Aug 2025 06:47:00.716 * Sentinel new configuration saved on disk

# About 30 seconds later, Sentinel finds that the old master (now a replica) is still unreachable and marks it subjectively down
1:X 24 Aug 2025 06:47:30.747 # +sdown slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379
```
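A full Sentinel log is noisy, so it can help to reduce it to the milestone events discussed above. The following sketch filters a log on stdin down to a short timeline. It assumes the line layout shown in the logs (time in the fifth field, event name in the seventh); `failover_milestones` and the choice of events are illustrative only.

```shell
#!/usr/bin/env bash
set -eu

# Keep only the milestone events of a failover from a Sentinel log on stdin
# and print them as "time event".
failover_milestones() {
  awk '$7 ~ /^[+-](sdown|odown|try-failover|elected-leader|selected-slave|promoted-slave|failover-end|switch-master)$/ { print $5, $7 }'
}

# A few lines copied from the redis-sentinel2 log above.
failover_milestones <<'EOF'
1:X 24 Aug 2025 06:46:59.227 # +sdown master mymaster 172.200.0.2 6379
1:X 24 Aug 2025 06:46:59.293 * Sentinel new configuration saved on disk
1:X 24 Aug 2025 06:46:59.387 # +elected-leader master mymaster 172.200.0.2 6379
1:X 24 Aug 2025 06:47:00.701 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379
EOF
```

Since the Sentinel log files are written to /data inside the containers, and that directory is mapped to ./sentinel/.data on the host, the same function could be fed a real file, for example `failover_milestones < sentinel/.data/sentinel-2.log`.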
99. FAQ
Failed trying to load the MASTER synchronization DB from disk: No such file or directory
The official Redis redis.conf contains the following passage:
```conf
# -----------------------------------------------------------------------------
#
# Replica can load the RDB it reads from the replication link directly from the
# socket, or store the RDB to a file and read that file after it was completely
# received from the master.
#
# In many cases the disk is slower than the network, and storing and loading
# the RDB file may increase replication time (and even increase the master's
# Copy on Write memory and replica buffers).
# However, parsing the RDB file directly from the socket may mean that we have
# to flush the contents of the current database before the full rdb was
# received. For this reason we have the following options:
#
# "disabled"    - Don't use diskless load (store the rdb file to the disk first)
# "on-empty-db" - Use diskless load only when it is completely safe.
# "swapdb"      - Keep current db contents in RAM while parsing the data directly
#                 from the socket. Replicas in this mode can keep serving current
#                 data set while replication is in progress, except for cases where
#                 they can't recognize master as having a data set from same
#                 replication history.
#                 Note that this requires sufficient memory, if you don't have it,
#                 you risk an OOM kill.
repl-diskless-load disabled
```
Setting this option to on-empty-db resolves the error.
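For an existing redis.conf, the change can be scripted. The sketch below is one possible way to do it, not the only one: it replaces an existing `repl-diskless-load` line or appends the directive if none is present. `set_diskless_load` is a hypothetical helper name, and the demonstration runs on a scratch file rather than on the real configuration files.

```shell
#!/usr/bin/env bash
set -eu

# Set repl-diskless-load to on-empty-db in the given redis.conf,
# replacing an existing directive or appending one if none is present.
set_diskless_load() {
  conf="$1"
  if grep -q '^repl-diskless-load' "$conf"; then
    # Write to a temporary file and move it into place (portable, no sed -i).
    sed 's/^repl-diskless-load.*/repl-diskless-load on-empty-db/' "$conf" > "$conf.tmp"
    mv "$conf.tmp" "$conf"
  else
    printf '%s\n' 'repl-diskless-load on-empty-db' >> "$conf"
  fi
}

# Demonstration on a scratch file that still carries the default value.
demo=$(mktemp)
printf '%s\n' 'appendonly yes' 'repl-diskless-load disabled' > "$demo"
set_diskless_load "$demo"
cat "$demo"
rm -f "$demo"
```

Applied to this tutorial's layout, the function would be called once per replica configuration, for example `set_diskless_load redis-sentinel-replication/redis/redis-server2/redis.conf`, followed by a restart of the affected container.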
WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
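If sentinel.conf is mapped into the container as a single file, Sentinel may be unable to write its updated configuration back to that file. A likely cause is that the file is replaced via a rename, which does not work on a file that is itself a mount point.

Solution: map the containing folder instead of the file, as the Sentinel services in docker-compose.yml above do.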