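1. Overview

Setting up Redis Sentinel with Docker is straightforward. This walkthrough uses Docker Compose to build a Redis Sentinel environment with one master, two replicas, and three Sentinels. Before starting, make sure Docker and Docker Compose are installed locally.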

Sample code repository

2. Set up master-replica replication

2.1. Create the folders

mkdir -p redis-sentinel-replication/redis/.data
mkdir -p redis-sentinel-replication/redis/redis-server1
mkdir -p redis-sentinel-replication/redis/redis-server2
mkdir -p redis-sentinel-replication/redis/redis-server3

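2.2. Create the configuration files

The samples below are intended only for local testing with Docker on Windows WSL. They are not a reference for production deployment.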

2.2.1.redis-server1/redis.conf

bind 0.0.0.0
loglevel debug
logfile "/data/redis-6379.log"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data/
requirepass 123456
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
masterauth 123456
replica-announce-ip 172.200.0.2
replica-announce-port 6379
# Without this setting the replica may fail to sync; required on Windows WSL Docker
repl-diskless-load on-empty-db

2.2.2.redis-server2/redis.conf

bind 0.0.0.0
loglevel debug
logfile "/data/redis-6380.log"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data/
requirepass 123456
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
masterauth 123456
replica-announce-ip 172.200.0.3
replica-announce-port 6379
# Without this setting the replica may fail to sync; required on Windows WSL Docker
repl-diskless-load on-empty-db

slaveof 172.200.0.2 6379
slave-read-only yes
slave-serve-stale-data yes

2.2.3.redis-server3/redis.conf

bind 0.0.0.0
loglevel debug
logfile "/data/redis-6381.log"
save 3600 1
save 300 100
save 60 10000
stop-writes-on-bgsave-error no
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir /data/
requirepass 123456
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
masterauth 123456
replica-announce-ip 172.200.0.4
replica-announce-port 6379
# Without this setting the replica may fail to sync; required on Windows WSL Docker
repl-diskless-load on-empty-db

slaveof 172.200.0.2 6379
slave-read-only yes
slave-serve-stale-data yes
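
The two replica files differ from redis-server1/redis.conf only in the log file name, the announced IP, and the trailing replication directives. As a minimal sketch, the helper below prints those node-specific lines so they can be compared against each file. The name replica_lines is hypothetical (it is not part of Redis or Docker); the values are the ones used in the configurations above.

```shell
#!/bin/sh
# Hypothetical helper: print the node-specific lines of one replica's redis.conf.
# $1 = port label used in the log file name, $2 = this node's IP, $3 = master IP
replica_lines() {
  cat <<EOF
logfile "/data/redis-$1.log"
replica-announce-ip $2
replica-announce-port 6379
slaveof $3 6379
slave-read-only yes
slave-serve-stale-data yes
EOF
}

# Lines specific to redis-server2 (172.200.0.3), replicating from 172.200.0.2
replica_lines 6380 172.200.0.3 172.200.0.2
```

Calling replica_lines 6381 172.200.0.4 172.200.0.2 prints the corresponding lines for redis-server3.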

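3. Set up the Sentinel cluster

Sentinel needs a working Redis master-replica setup to monitor, so build that first. See Section 2, "Set up master-replica replication".

3.1. Create the folders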

mkdir -p redis-sentinel-replication/sentinel/.data
mkdir -p redis-sentinel-replication/sentinel/conf/redis-sentinel1
mkdir -p redis-sentinel-replication/sentinel/conf/redis-sentinel2
mkdir -p redis-sentinel-replication/sentinel/conf/redis-sentinel3

3.2. Create the configuration files

3.2.1.redis-sentinel1/sentinel.conf

protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine1.pid"
logfile "/data/sentinel-1.log"
sentinel announce-ip "172.200.0.5"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes

3.2.2.redis-sentinel2/sentinel.conf

protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine2.pid"
logfile "/data/sentinel-2.log"
sentinel announce-ip "172.200.0.6"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes

3.2.3.redis-sentinel3/sentinel.conf

protected-mode no
port 26379
daemonize no
pidfile "/var/run/redis-sentine3.pid"
logfile "/data/sentinel-3.log"
sentinel announce-ip "172.200.0.7"
sentinel announce-port 26379
dir "/data"
sentinel monitor mymaster 172.200.0.2 6379 2
sentinel auth-pass mymaster 123456
acllog-max-len 128
sentinel deny-scripts-reconfig yes
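
Each file ends its sentinel monitor line with a quorum of 2: two Sentinels must agree that the master is unreachable before it is flagged as objectively down. Separately, a Sentinel needs votes from a majority of all Sentinels before it may lead a failover. The sketch below works out that majority for this setup; majority is a hypothetical helper name, not a Redis command.

```shell
#!/bin/sh
# Hypothetical helper: smallest number of Sentinels that forms a majority.
majority() {
  echo $(( $1 / 2 + 1 ))
}

sentinels=3   # number of Sentinels in this setup
quorum=2      # last argument of the "sentinel monitor" lines above
needed=$(majority "$sentinels")

echo "quorum to flag the master as objectively down: $quorum"
echo "votes needed to authorize a failover: $needed"
echo "Sentinel failures tolerated: $(( sentinels - needed ))"
```

With three Sentinels the majority is also 2, so the setup keeps working with one Sentinel down.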

4. Create the docker-compose.yml file

services:
  # redis-master
  redis-master:
    image: redis:7.4.5
    container_name: redis-master
    restart: always
    ports:
      - 6379:6379
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      - ./redis/redis-server1/redis.conf:/usr/local/etc/redis/redis.conf
      - ./redis/.data/redis-master/:/data:Z
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      redis-network:
        ipv4_address: 172.200.0.2

  # redis slave 1
  redis-slave1:
    image: redis:7.4.5
    container_name: redis-slave1
    restart: always
    ports:
      - 6380:6379
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      - ./redis/redis-server2/redis.conf:/usr/local/etc/redis/redis.conf
      - ./redis/.data/redis-slave1/:/data:Z
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      redis-network:
        ipv4_address: 172.200.0.3

  # redis slave 2
  redis-slave2:
    image: redis:7.4.5
    container_name: redis-slave2
    restart: always
    ports:
      - 6381:6379
    environment:
      TZ: "Asia/Shanghai"
    volumes:
      - ./redis/redis-server3/redis.conf:/usr/local/etc/redis/redis.conf
      - ./redis/.data/redis-slave2/:/data:Z
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
    networks:
      redis-network:
        ipv4_address: 172.200.0.4

  # sentinel 1
  redis-sentinel1:
    image: redis:7.4.5
    container_name: redis-sentinel1
    restart: always
    environment:
      TZ: "Asia/Shanghai"
    ports:
      - 26379:26379
    volumes:
      - ./sentinel/conf/redis-sentinel1:/usr/local/etc/redis/conf
      - ./sentinel/.data:/data:Z
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    networks:
      redis-network:
        ipv4_address: 172.200.0.5

  # sentinel 2
  redis-sentinel2:
    image: redis:7.4.5
    container_name: redis-sentinel2
    restart: always
    environment:
      TZ: "Asia/Shanghai"
    ports:
      - 26380:26379
    volumes:
      - ./sentinel/conf/redis-sentinel2:/usr/local/etc/redis/conf
      - ./sentinel/.data:/data:Z
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    networks:
      redis-network:
        ipv4_address: 172.200.0.6

  # sentinel 3
  redis-sentinel3:
    image: redis:7.4.5
    container_name: redis-sentinel3
    restart: always
    environment:
      TZ: "Asia/Shanghai"
    ports:
      - 26381:26379
    volumes:
      - ./sentinel/conf/redis-sentinel3:/usr/local/etc/redis/conf
      - ./sentinel/.data:/data:Z
    command: redis-sentinel /usr/local/etc/redis/conf/sentinel.conf
    networks:
      redis-network:
        ipv4_address: 172.200.0.7

networks:
  redis-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.200.0.0/24

4.1. Start

  1. Start the containers
docker compose up -d

  2. Check the container status
docker ps
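
docker ps only shows that the containers are running. To confirm that replication itself is up, the INFO replication output of the master can be inspected. The sketch below defines a hypothetical helper, info_field, that pulls one field out of that output. The sample input is illustrative and was not captured from a real run.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "key:value" field from INFO output.
# redis-cli ends INFO lines with CRLF, so carriage returns are stripped first.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Illustrative sample of two INFO replication fields (not a captured log).
printf 'role:master\r\nconnected_slaves:2\r\n' | info_field connected_slaves

# Against the running stack (password taken from the redis.conf files above):
#   docker exec redis-master redis-cli -a 123456 info replication | info_field connected_slaves
```

With one master and two replicas, a healthy setup should report connected_slaves as 2.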

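4.2. Quick test

  1. Check that the Sentinel cluster is working
    1. Enter a Sentinel container and query the Sentinel API. At the redis-cli prompt, sentinel master mymaster shows the monitored master and sentinel slaves mymaster lists its replicas:
docker exec -it redis-sentinel1 /bin/bash
redis-cli -p 26379
sentinel master mymaster
sentinel slaves mymaster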

    2. After running the commands above, output like the following means the cluster is working:
......
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
......
  2. Manually stop redis-master to observe the failover process
docker stop redis-master
  3. redis-sentinel1 log analysis
# This Sentinel instance can no longer reach the master through its heartbeat checks, so it first marks the master as subjectively down (sdown)
1:X 24 Aug 2025 06:46:59.289 # +sdown master mymaster 172.200.0.2 6379
1:X 24 Aug 2025 06:46:59.302 * Sentinel new configuration saved on disk
# The Sentinel cluster enters a new epoch, number 1. Every failover gets a unique, larger epoch number. All configuration updates and leader elections are tied to this epoch, which keeps the cluster state consistent and lets every Sentinel tell which failover is the most recent
1:X 24 Aug 2025 06:46:59.304 # +new-epoch 1
# The new epoch is also persisted to the on-disk configuration
1:X 24 Aug 2025 06:46:59.312 * Sentinel new configuration saved on disk
# This Sentinel takes part in the leader election: it votes for Sentinel f4a85f4091... to lead the failover for epoch 1
1:X 24 Aug 2025 06:46:59.313 # +vote-for-leader f4a85f409178a652d59e669e127bb144dbaeb5a3 1
# The key trigger for the failover. Enough Sentinels now agree that the master is down (3 agree, the quorum is 2), so this Sentinel marks mymaster as objectively down (odown)
1:X 24 Aug 2025 06:46:59.343 # +odown master mymaster 172.200.0.2 6379 #quorum 3/2
# This Sentinel will not start a failover of its own right away. It sets a delay of 6 minutes (twice the default failover-timeout of 180 seconds) before it may try to lead a failover itself
1:X 24 Aug 2025 06:46:59.344 * Next failover delay: I will not start a failover before Sun Aug 24 06:52:59 2025
# This Sentinel receives the new configuration for mymaster, broadcast by the failover leader (f4a85f4091..., running at 172.200.0.6:26379), which has already completed the failover
1:X 24 Aug 2025 06:46:59.669 # +config-update-from sentinel f4a85f409178a652d59e669e127bb144dbaeb5a3 172.200.0.6 26379 @ mymaster 172.200.0.2 6379
# The most important log line: mymaster has switched from 172.200.0.2:6379 (the old master, redis-master) to 172.200.0.3:6379 (the new master, redis-slave1)
1:X 24 Aug 2025 06:46:59.671 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379
# The leader has reconfigured the topology, and this Sentinel updates its own view to match.
# It registers 172.200.0.4 (redis-slave2) as a replica of the new master (172.200.0.3)
1:X 24 Aug 2025 06:46:59.673 * +slave slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.3 6379
# It also registers the old, failed master (172.200.0.2) as a replica of the new master
1:X 24 Aug 2025 06:46:59.675 * +slave slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379
# This Sentinel persists the new configuration (the new master and all replicas) to disk again. Its configuration file is now fully up to date
1:X 24 Aug 2025 06:46:59.684 * Sentinel new configuration saved on disk
# About 30 seconds later, this Sentinel finds that the old master, now demoted to a replica (172.200.0.2), is still unreachable and marks it as subjectively down
1:X 24 Aug 2025 06:47:29.752 # +sdown slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379
  4. redis-sentinel2 log analysis
# This Sentinel subjectively judges the master mymaster (172.200.0.2:6379) to be down
1:X 24 Aug 2025 06:46:59.227 # +sdown master mymaster 172.200.0.2 6379

# The Sentinels reach agreement and the master is confirmed objectively down (quorum 2/2 means 2 Sentinels agree, meeting the quorum of 2)
1:X 24 Aug 2025 06:46:59.282 # +odown master mymaster 172.200.0.2 6379 #quorum 2/2

# A new epoch, number 1, is opened to identify this failover
1:X 24 Aug 2025 06:46:59.284 # +new-epoch 1

# This Sentinel starts trying to perform the failover
1:X 24 Aug 2025 06:46:59.285 # +try-failover master mymaster 172.200.0.2 6379

# The new configuration is saved to disk
1:X 24 Aug 2025 06:46:59.293 * Sentinel new configuration saved on disk

# This Sentinel votes for the Sentinel with ID f4a85f409178a652d59e669e127bb144dbaeb5a3 to lead the failover for epoch 1
1:X 24 Aug 2025 06:46:59.295 # +vote-for-leader f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# Sentinel e94cf26d... votes for f4a85f4091... as leader
1:X 24 Aug 2025 06:46:59.314 * e94cf26d20f1d29e0f771f7af25a68a2df463e85 voted for f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# Sentinel 69d264a9... votes for f4a85f4091... as leader
1:X 24 Aug 2025 06:46:59.315 * 69d264a94c3ebf062e09d15580fb9f737888deac voted for f4a85f409178a652d59e669e127bb144dbaeb5a3 1

# f4a85f4091... is elected leader of the failover
1:X 24 Aug 2025 06:46:59.387 # +elected-leader master mymaster 172.200.0.2 6379

# The failover enters the replica selection phase: the leader evaluates which replica is best suited for promotion to master
1:X 24 Aug 2025 06:46:59.388 # +failover-state-select-slave master mymaster 172.200.0.2 6379

# Replica 172.200.0.3:6379 is selected as the new master
1:X 24 Aug 2025 06:46:59.456 # +selected-slave slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# SLAVEOF NO ONE is sent to the selected replica so that it stops replicating and becomes the new master
1:X 24 Aug 2025 06:46:59.457 * +failover-state-send-slaveof-noone slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# Waiting for confirmation that the replica has been promoted to master
1:X 24 Aug 2025 06:46:59.549 * +failover-state-wait-promotion slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# The new configuration is saved to disk
1:X 24 Aug 2025 06:46:59.618 * Sentinel new configuration saved on disk

# Replica 172.200.0.3:6379 has been promoted to master
1:X 24 Aug 2025 06:46:59.619 # +promoted-slave slave 172.200.0.3:6379 172.200.0.3 6379 @ mymaster 172.200.0.2 6379

# The failover enters the replica reconfiguration phase: the remaining replicas are pointed at the new master
1:X 24 Aug 2025 06:46:59.620 # +failover-state-reconf-slaves master mymaster 172.200.0.2 6379

# The reconfiguration command has been sent to replica 172.200.0.4:6379 so that it replicates from the new master
1:X 24 Aug 2025 06:46:59.668 * +slave-reconf-sent slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.2 6379

# The master is no longer flagged as objectively down (possibly because the failover is already under way)
1:X 24 Aug 2025 06:47:00.408 # -odown master mymaster 172.200.0.2 6379

# Reconfiguration of replica 172.200.0.4:6379 is in progress
1:X 24 Aug 2025 06:47:00.647 * +slave-reconf-inprog slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.2 6379

# Reconfiguration of replica 172.200.0.4:6379 is done
1:X 24 Aug 2025 06:47:00.649 * +slave-reconf-done slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.2 6379

# The failover is complete
1:X 24 Aug 2025 06:47:00.699 # +failover-end master mymaster 172.200.0.2 6379

# The master has switched from 172.200.0.2:6379 to 172.200.0.3:6379
1:X 24 Aug 2025 06:47:00.701 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379

# Replica 172.200.0.4:6379 is registered as a replica of the new master
1:X 24 Aug 2025 06:47:00.703 * +slave slave 172.200.0.4:6379 172.200.0.4 6379 @ mymaster 172.200.0.3 6379

# The old master 172.200.0.2:6379 is registered as a replica of the new master
1:X 24 Aug 2025 06:47:00.705 * +slave slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379

# The new configuration is saved to disk
1:X 24 Aug 2025 06:47:00.716 * Sentinel new configuration saved on disk

# About 30 seconds later, the Sentinel finds that the old master (now registered as a replica) is still unreachable and marks it as subjectively down
1:X 24 Aug 2025 06:47:30.747 # +sdown slave 172.200.0.2:6379 172.200.0.2 6379 @ mymaster 172.200.0.3 6379
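
The timestamps in these logs make it possible to measure how long the failover took from one Sentinel's point of view. As a minimal sketch, the hypothetical helper failover_ms below reads a Sentinel log and prints the milliseconds between the first +sdown master event and the +switch-master event. It assumes the log line layout shown above and that both events fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper: milliseconds between the first "+sdown master" event
# and the "+switch-master" event in a Sentinel log read from stdin.
# Assumes both events happen on the same day (no midnight rollover handling).
failover_ms() {
  awk '
    # Convert an HH:MM:SS.mmm timestamp to milliseconds since midnight.
    function to_ms(ts,   p) {
      split(ts, p, /[:.]/)
      return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    # Field 5 is the time, field 7 the event name, field 8 the instance type.
    $7 == "+sdown" && $8 == "master" && start == "" { start = to_ms($5) }
    $7 == "+switch-master" && start != ""           { print to_ms($5) - start; exit }
  '
}

# Two lines taken from the redis-sentinel2 log above.
failover_ms <<'EOF'
1:X 24 Aug 2025 06:46:59.227 # +sdown master mymaster 172.200.0.2 6379
1:X 24 Aug 2025 06:47:00.701 # +switch-master mymaster 172.200.0.2 6379 172.200.0.3 6379
EOF
```

For the two lines above this prints 1474, so redis-sentinel2 saw the switch about 1.5 seconds after it first marked the master as down. A full log can be fed in the same way, for example from docker logs or from the log file under the mounted data folder.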

99. Common problems

  1. Failed trying to load the MASTER synchronization DB from disk: No such file or directory

The official Redis configuration file, redis.conf, contains the following section:

# -----------------------------------------------------------------------------
#
# Replica can load the RDB it reads from the replication link directly from the
# socket, or store the RDB to a file and read that file after it was completely
# received from the master.
#
# In many cases the disk is slower than the network, and storing and loading
# the RDB file may increase replication time (and even increase the master's
# Copy on Write memory and replica buffers).
# However, parsing the RDB file directly from the socket may mean that we have
# to flush the contents of the current database before the full rdb was
# received. For this reason we have the following options:
#
# "disabled" - Don't use diskless load (store the rdb file to the disk first)
# "on-empty-db" - Use diskless load only when it is completely safe.
# "swapdb" - Keep current db contents in RAM while parsing the data directly
# from the socket. Replicas in this mode can keep serving current
# data set while replication is in progress, except for cases where
# they can't recognize master as having a data set from same
# replication history.
# Note that this requires sufficient memory, if you don't have it,
# you risk an OOM kill.
repl-diskless-load disabled

Setting this option to on-empty-db is enough to fix the error.

  2. WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy

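If sentinel.conf is bind-mounted into the container as a single file, Sentinel may be unable to write its updated configuration back to that file.

Solution: mount the containing folder instead, as the docker-compose.yml above does.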
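
To check an existing Compose file for this pitfall, a simple text search is usually enough. The sketch below lists volume entries whose host side points directly at a sentinel.conf file. The helper name file_mounts is hypothetical, and the pattern is a heuristic text match rather than a YAML parser.

```shell
#!/bin/sh
# Hypothetical helper: list volume entries that bind-mount sentinel.conf as a
# single file. Heuristic match on "- <host path ending in sentinel.conf>:<target>".
file_mounts() {
  grep -nE '^[[:space:]]*-[[:space:]]+[^:]*sentinel\.conf:' "$1"
}

# Demonstration on a throwaway file with one file mount and one folder mount.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
    volumes:
      - ./sentinel/conf/redis-sentinel1/sentinel.conf:/usr/local/etc/redis/conf/sentinel.conf
      - ./sentinel/conf/redis-sentinel2:/usr/local/etc/redis/conf
EOF
file_mounts "$tmp" || echo "no single-file sentinel.conf mounts found"
rm -f "$tmp"
```

Only the first volume entry is reported. The folder mount on the following line is the form used in the docker-compose.yml above.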