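The master in a MySQL replication setup lost power unexpectedly. After a restart the master ran normally, but the Error Log on the slaves in that setup reported the following errors: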
Version: '5.6.17-log' socket: '' port: 3306 MySQL Community Server (GPL)
2014-09-26 18:30:19 5940 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2014-09-26 18:30:19 5940 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log '.\Q12MB1DR67JGLJT-relay-bin.000001' position: 4
2014-09-26 18:30:19 5940 [Note] Slave I/O thread: connected to master 'c@',replication started in log 'FIRST' at position 4
2014-09-26 18:30:19 5940 [Warning] Slave I/O: Notifying master by SET @master_binlog_checksum= @@global.binlog_checksum failed with error: Unknown system variable 'binlog_checksum', Error_code: 1193
2014-09-26 18:30:19 5940 [Warning] Slave I/O: Unknown system variable 'SERVER_UUID' on master. A probable cause is that the variable is not supported on the master (version: 5.5.35-log), even though it is on the slave (version: 5.6.17-log), Error_code: 1193
2014-09-26 18:31:14 5940 [Note] Error reading relay log event: slave SQL thread was killed
2014-09-26 18:31:15 5940 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-09-26 18:31:15 5940 [Note] Slave I/O thread killed while reading event
2014-09-26 18:31:15 5940 [Note] Slave I/O thread exiting, read up to log 'binLog.000001', position 278
2014-09-26 18:31:20 5940 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='binLog.000001', master_log_pos= 278, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2014-09-26 18:35:27 5940 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
110110 15:21:25 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
110110 15:21:25 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'forummysql01-bin.002937' position 243387731
110110 15:21:25 [Note] Slave: connected to master 'repl@',replication resumed in log 'forummysql01-bin.002937' at position 243387731
110110 15:21:25 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
110110 15:21:25 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
110110 15:21:25 [Note] Slave I/O thread exiting, read up to log 'forummysql01-bin.002937', position 243387731
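The binlog file and position the slave last asked for can be picked out of the Error Log mechanically instead of by eye. The following is a minimal sketch, not part of the original procedure: the function name last_requested_position is hypothetical, the log path is whatever your server uses, and the pattern assumes the log quotes the file name with plain ASCII single quotes.

```shell
# Hypothetical helper: print "<binlog file> <position>" taken from the last
# "Slave I/O thread exiting" line found in a slave error log.
# Assumes plain ASCII single quotes around the file name.
last_requested_position() {
    # $1: path to the slave's error log (pass your own location)
    sed -n "s/.*Slave I\/O thread exiting, read up to log '\([^']*\)', position \([0-9][0-9]*\).*/\1 \2/p" "$1" | tail -n 1
}

# Example call (the path is an assumption):
#   last_requested_position /var/lib/mysql/slave-host.err
```

Only the last matching line is kept, since the slave writes one such line every time the I/O thread stops.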
Connect to the slave with the mysql command-line client and run show slave status to check the replication state:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host:
Master_User: c
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: Q12MB1DR67JGLJT-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID:
Master_Info_File: F:\db-data\mysql\master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 140926 19:17:52
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)
The slave's I/O thread is not running, so it looks as though receiving the log has failed. Try starting the thread with start slave io_thread:
mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)
Run show slave status again to check the replication state:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host:
Master_User: c
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: Q12MB1DR67JGLJT-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID:
Master_Info_File: F:\db-data\mysql\master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 140926 19:25:12
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)
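It looks as if nothing happened, but something did: after the command to start the I/O thread ran, the Error Log again reported that the log file position was invalid. So we have to go to the master and check what operation sits at the reported position in the binary log file, and whether that position exists at all.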
[root@forummysql01 data]# mysqlbinlog --start-position=243387732 forummysql01-bin.002937
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
[root@forummysql01 data]# mysqlbinlog ./forummysql01-bin.002937 > /home/jss/bin-002937.log
[root@forummysql01 data]# tail -50 /home/jss/bin-002937.log
.............................
# at 243297123
#110110 15:02:19 server id 1 end_log_pos 243297459 Query thread_id=1773644066 exec_time=0 error_code=0
SET TIMESTAMP=1294642939/*!*/;
INSERT INTO cdb_sessions (sid, ip1, ip2, ip3, ip4, uid, username, groupid, styleid, invisible, action, lastactivity, lastolupdate, seccode, fid, tid) VALUES ('HQFzjy', '202', '160', '180', '187', '0', '', '7', '1', '0', '3', '1294642939', '0', '232485', '27', '4583')
/*!*/;
................
................
................
# at 243308840
#110110 15:02:20 server id 1 end_log_pos 243315309 Query thread_id=1773638971 exec_time=0 error_code=0
SET TIMESTAMP=1294642940/*!*/;
update group_topic set TOPIC_TIT.............................
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
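Reading the end of the dump by eye works, but the same comparison can be scripted. The sketch below is an illustration under assumptions, not the author's method: the function names last_end_log_pos and position_within_file are hypothetical, and the parsing assumes the textual event headers that mysqlbinlog prints (comment lines containing "end_log_pos N"). The second function is only a necessary check, because a usable position must also fall on an event boundary.

```shell
# Hypothetical helper: print the end_log_pos of the last event header in a
# text dump produced by mysqlbinlog (for example /home/jss/bin-002937.log).
last_end_log_pos() {
    # $1: path to the mysqlbinlog text dump
    awk '/^#/ {
        # Event header lines look like: "#<date> <time> server id N end_log_pos P ..."
        for (i = 1; i < NF; i++)
            if ($i == "end_log_pos") last = $(i + 1)
    } END { if (last != "") print last }' "$1"
}

# Hypothetical helper: succeed (exit status 0) only if the position the slave
# requested does not lie beyond the last event in the dump.
position_within_file() {
    # $1: dump path, $2: position requested by the slave
    end=$(last_end_log_pos "$1")
    [ -n "$end" ] && [ "$2" -le "$end" ]
}

# Example call with the values from this incident:
#   position_within_file /home/jss/bin-002937.log 243387731 || echo "position is past the end of the file"
```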
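The last position in this binlog file is 243315309, far short of the "'forummysql01-bin.002937', position 243387731" reported in the error log, so the position named in the error really does not exist in the binary log file. I take this to be a logical error: most likely the master flushed its binlog automatically when it restarted after the power loss, and the slave never learned of it. The fix is therefore fairly simple: resetting the master position the slave synchronizes from should be enough. Here I chose to move on to the next log file number (moving the position back to an earlier one is also an option). The commands are as follows: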
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> CHANGE MASTER TO MASTER_HOST='',
    -> MASTER_PORT=3306,
    -> MASTER_USER='repl',
    -> MASTER_PASSWORD='******',
    -> MASTER_LOG_FILE='forummysql01-bin.002938',
    -> MASTER_LOG_POS=0;
Query OK, 0 rows affected (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host:
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: forummysql01-bin.002938
Read_Master_Log_Pos: 35910271
Relay_Log_File: phpmysql02-relay-bin.000003
Relay_Log_Pos: 21407790
Relay_Master_Log_File: forummysql01-bin.002938
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 21407646
Relay_Log_Space: 35910415
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 2215
1 row in set (0.00 sec)
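The slave threads are now running and the Error Log reports no further errors. Wait a while for the slave to catch up with the master, repeat the same steps on the other slaves, and the whole replication setup is restored.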
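When several slaves have to be checked while they catch up, the three fields that matter can be reduced to one line per server. This is a rough sketch under assumptions: slave_health is a hypothetical name, and it only reads the text that show slave status\G prints, relying on the field names Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master shown in the output above.

```shell
# Hypothetical helper: read `show slave status\G` output on stdin and print
# "OK <seconds behind master>" when both threads run, otherwise "BROKEN".
slave_health() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }                # drop the alignment padding
        $1 == "Slave_IO_Running"      { io  = $2 }
        $1 == "Slave_SQL_Running"     { sql = $2 }
        $1 == "Seconds_Behind_Master" { lag = $2 }
        END {
            if (io == "Yes" && sql == "Yes") print "OK " lag
            else print "BROKEN"
        }'
}

# Example call (connection options are omitted and depend on your setup):
#   mysql -e 'show slave status\G' | slave_health
```

A value that keeps shrinking toward 0 across repeated calls indicates the slave is catching up.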
Source: http://blog.itpub.net/7607759/viewspace-683607/