部署
本次共3台机器,设置主机名及hosts | 配置每台服务 my.cnf 中report_host 字段,为自己的 hostname
192.168.10.123 db1
192.168.10.124 db2
192.168.10.125 db3
安装 mysql5.7.20 ,可以参考下面的安装基本,我搭建的时候就是用的这个。
http://blog.51cto.com/hequan/1982428
安装mysql-shell mysql-route
wget https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
yum install mysql57-community-release-el7-11.noarch.rpm
yum install mysql-shell -y
yum install mysql-router -y
设置相关用户的权限,生产环境 可以不是 root用户
grant all privileges on . to ‘root‘@‘%‘ identified by ‘123456‘;
GRANT ALL PRIVILEGES ON mysql_innodb_cluster_metadata. TO root@‘%‘ WITH GRANT OPTION;
GRANT RELOAD, SHUTDOWN, PROCESS, FILE, SUPER, REPLICATION SLAVE, REPLICATION CLIENT, \
CREATE USER ON . TO root@‘%‘ WITH GRANT OPTION;
GRANT SELECT ON .* TO root@‘%‘ WITH GRANT OPTION;
flush privileges;
mysqlsh
[root@db1 ~]# mysqlsh
dba.checkInstanceConfiguration(‘root@db1:3306‘)
+----------------------------------+---------------+----------------+--------------------------------------------------+
| Variable | Current Value | Required Value | Note |
+----------------------------------+---------------+----------------+--------------------------------------------------+
| binlog_checksum | CRC32 | NONE | Update the server variable or restart the server |
| binlog_format | MIXED | ROW | Update the server variable or restart the server |
| enforce_gtid_consistency | OFF | ON | Restart the server |
| gtid_mode | OFF | ON | Restart the server |
| log_slave_updates | 0 | ON | Restart the server |
| master_info_repository | FILE | TABLE | Restart the server |
| relay_log_info_repository | FILE | TABLE | Restart the server |
| transaction_write_set_extraction | OFF | XXHASH64 | Restart the server |
+----------------------------------+---------------+----------------+--------------------------------------------------+
dba.configureLocalInstance(‘root@db1:3306‘)
Please provide the password for ‘root@db1:3306‘:
Detecting the configuration file...
Found configuration file at standard location: /etc/my.cnf
Do you want to modify this file? [Y|n]: [Y|n]: Y
dba.checkInstanceConfiguration(‘root@db1:3306‘)
Please provide the password for ‘root@db1:3306‘:
Validating instance...
The instance ‘db1:3306‘ is valid for Cluster usage
{
"status": "ok"
}
mysqlsh --uri root@db1:3306
mysql-js> var cluster = dba.createCluster(‘main‘)
A new InnoDB cluster will be created on instance ‘hequan@db1:3306‘.
Creating InnoDB cluster ‘main‘ on ‘hequan@db1:3306‘...
Adding Seed Instance...
Cluster successfully created. Use Cluster.addInstance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.
mysql-js> cluster.addInstance(‘root@db2:3306‘)
mysql-js> cluster.addInstance(‘root@db3:3306‘)
mysql-js> cluster.status()
mysql-js> \connect db1
mysql-js> dba.configureLocalInstance(‘db1:3306‘)
mysql-js> cluster.describe();
var cluster = dba.getCluster();
cluster.status();
Mysql-route 设置
[root@db2 ~]# mysqlrouter --bootstrap root@db1:3306 --user mysqlrouter
Please enter MySQL password for root:
WARNING: The MySQL server does not have SSL configured and metadata used by the router may be transmitted unencrypted.
Bootstrapping system MySQL Router instance...
MySQL Router has now been configured for the InnoDB cluster ‘main‘.
The following connection information can be used to connect to the cluster.
Classic MySQL protocol connections to cluster ‘main‘:
X protocol connections to cluster ‘main‘:
Existing configurations backed up to /etc/mysqlrouter/mysqlrouter.conf.bak
[root@db2 ~]# systemctl start mysqlrouter
systemctl start mysqlrouter
systemctl enable mysqlrouter
[root@db2 ~]# netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:64460 0.0.0.0: LISTEN 2958/mysqlrouter
tcp 0 0 0.0.0.0:6446 0.0.0.0: LISTEN 2958/mysqlrouter
tcp 0 0 0.0.0.0:6447 0.0.0.0: LISTEN 2958/mysqlrouter
tcp 0 0 0.0.0.0:64470 0.0.0.0: LISTEN 2958/mysqlrouter
mysql -u root -h 127.0.0.1 -P 6446 -p
select @@port;
select @@hostname;
故障模拟
##关闭 db1 数据库,自动切换如下:
"topology": {
"db1:3306": {
"address": "db1:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"db2:3306": {
"address": "db2:3306",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"db3:3306": {
"address": "db3:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
##重启db2 ,执行命令
mysql> show databases;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> show databases;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 20
Current database: NONE
mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| db1 |
+------------+
##重启节点后,需要手动加入
"db2:3306": {
"address": "db2:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
cluster.rejoinInstance(‘root@db2:3306‘)
The instance ‘db2:3306‘ was successfully added to the MySQL Cluster.
mysqlsh --uri root@db1:3306
mysql-js> var cluster = dba.rebootClusterFromCompleteOutage();
Reconfiguring the default cluster from complete outage...
The instance ‘db2:3306‘ was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y|N]: y
The instance ‘db3:3306‘ was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y|N]: y
The cluster was successfully rebooted.
报错总结:
##如果节点在加入集群前,执行了写操作,加入集群时会报错
ERROR: Error joining instance to cluster: ‘db2:3306‘ - Query failed. MySQL Error (3092): The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication (RuntimeError)
##登陆 db2 数据库 执行 reset master;
cluster.forceQuorumUsingPartitionOf("db1:3306")
mysql-js> cluster.rejoinInstance(‘root@db2:3306‘)
mysql-js> cluster.rejoinInstance(‘root@db3:3306‘)
附言:
官方文档: https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-userguide.html
节点有哪状态
* ONLINE - 节点状态正常。
* OFFLINE - 实例在运行,但没有加入任何Cluster。
* RECOVERING - 实例已加入Cluster,正在同步数据。
* ERROR - 同步数据发生异常。
* UNREACHABLE - 与其他节点通讯中断,可能是网络问题,可能是节点crash。
* MISSING 节点已加入集群,但未启动group replication
集群有哪些状态
* OK – 所有节点处于online状态,有冗余节点。
* OK_PARTIAL – 有节点不可用,但仍有冗余节点。
* OK_NO_TOLERANCE – 有足够的online节点,但没有冗余,例如:两个节点的Cluster,其中一个挂了,集群就不可用了。
* NO_QUORUM – 有节点处于online状态,但达不到法定节点数,此状态下Cluster无法写入,只能读取。
* UNKNOWN – 不是online或recovering状态,尝试连接其他实例查看状态。
* UNAVAILABLE – 组内节点全是offline状态,但实例在运行,可能实例刚重启还没加入Cluster。
原文:http://blog.51cto.com/13480211/2060178