Hi All,
I have been playing with mysqlfailover, but one problem stumps me every time. It has to do with the slave that has been promoted to master: if it is then shut down, its data files can become corrupt, causing a crash dump, and the server won't start again until it is rebuilt. Let me explain:
The set-up:
2 x MySQL 5.6.10 on Redhat 6.3 x86_64.
IP addresses: 192.168.25.161, .162 respectively
Initially 1 x slave and 1 x master, set up with the following commands:
# mysql_install_db --datadir=/data
# /etc/init.d/mysql start
mysql> create user 'repl'@'%' identified by 'password';
mysql> grant all on *.* to 'repl'@'%';
mysql> RESET MASTER; RESET SLAVE;
Then on 1st server:
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.25.162',MASTER_USER = 'repl', MASTER_PASSWORD = 'password', MASTER_AUTO_POSITION = 1;
And 2nd server:
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.25.161',MASTER_USER = 'repl', MASTER_PASSWORD = 'password', MASTER_AUTO_POSITION = 1;
/etc/my.cnf (similar to the sample in the PDF, with the IP address adjusted):
[mysqld]
binlog-format=ROW
log-slave-updates=true
gtid-mode=on # GTID only
enforce-gtid-consistency=true # GTID only
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
slave-parallel-workers=2
binlog-checksum=CRC32
master-verify-checksum=1
slave-sql-verify-checksum=1
binlog-rows-query-log_events=1
server-id=1
report-port=3306
port=3306
log-bin=black-bin.log
datadir=/data
socket=/data/mysql.sock
report-host=192.168.25.161
relay-log = /data/relay-bin
general-log = 1
skip-slave-start = 1
What I did:
I started mysqlfailover like this:
# mysqlfailover --master=repl:password@192.168.25.161 --discover-slaves-login=repl:password --exec-before=/root/pre-failover.sh --exec-after=/root/post-failover.sh --rediscover
mysqlfailover starts with 192.168.25.161 as master and 192.168.25.162 as slave.
The first failover went smoothly: .162 became master. I restarted MySQL on .161 successfully and resumed replication there, and mysqlfailover recognised .161 as a slave (good!).
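Resuming replication on a restarted server can be sketched roughly as follows. This is only an illustration: the helper name resume_replication and its arguments are placeholders, not the exact commands used, and the credentials are whatever account you administer the server with.

```shell
# Hypothetical helper: start the slave threads on a rejoined server and show its state.
# $1 = host, $2 = user, $3 = password (all placeholders supplied by the caller).
resume_replication() {
    # skip-slave-start=1 in my.cnf means the slave threads do not start on their own,
    # so START SLAVE has to be issued by hand after mysqld comes back up.
    mysql -h "$1" -u "$2" -p"$3" -e 'START SLAVE; SHOW SLAVE STATUS\G'
}
```

For example, resume_replication 192.168.25.161 root secret, then check that Slave_IO_Running and Slave_SQL_Running both say Yes in the output.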
Then I tried to fail back so that .161 becomes the master again. When I stop the MySQL process (I tried both /etc/init.d/mysql stop and kill), .162 writes this to its log (full log attached):
-------------
2013-04-11 01:56:20 25747 [Note] InnoDB: Starting shutdown...
2013-04-11 01:56:21 7fbe34e69700 InnoDB: Assertion failure in thread 140454908040960 in file trx0rseg.cc line 125
InnoDB: Failing assertion: UT_LIST_GET_LEN(rseg->update_undo_list) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
17:56:21 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=4
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68216 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8bfd45]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x65b074]
/lib64/libpthread.so.0(+0xf500)[0x7fbe4d5b7500]
/lib64/libc.so.6(gsignal+0x35)[0x7fbe4c2648a5]
/lib64/libc.so.6(abort+0x175)[0x7fbe4c266085]
/usr/sbin/mysqld[0x9a5cf8]
/usr/sbin/mysqld[0x9a8139]
/usr/sbin/mysqld[0x98f2bf]
/usr/sbin/mysqld[0x8dad94]
/usr/sbin/mysqld(_Z22ha_finalize_handlertonP13st_plugin_int+0x2e)[0x5a000e]
/usr/sbin/mysqld[0x6e05ee]
/usr/sbin/mysqld(_Z15plugin_shutdownv+0x233)[0x6e0ff3]
/usr/sbin/mysqld[0x594638]
/usr/sbin/mysqld(_Z10unireg_endv+0xe)[0x5949de]
/usr/sbin/mysqld[0x5978c8]
/usr/sbin/mysqld(kill_server_thread+0xe)[0x597a7e]
/usr/sbin/mysqld(pfs_spawn_thread+0x139)[0xadc7d9]
/lib64/libpthread.so.0(+0x7851)[0x7fbe4d5af851]
/lib64/libc.so.6(clone+0x6d)[0x7fbe4c31967d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
130411 01:56:21 mysqld_safe Number of processes running now: 0
------------
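To pull just the assertion lines out of a long error log, something like the sketch below should do. The helper name innodb_assertions is made up for illustration, and the log file path is an assumption you need to adjust for your own server.

```shell
# Hypothetical helper: print only the InnoDB assertion lines from a mysqld error log.
# $1 = path to the error log (an assumption; pass whatever path your server uses).
innodb_assertions() {
    grep -E 'InnoDB: (Assertion failure|Failing assertion)' "$1"
}
```

For example, innodb_assertions /data/$(hostname).err, assuming the error log sits in the data directory under the default name.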
mysqlfailover looks really promising, and we would like to use it in production. Any help is appreciated.
Regards,
Rayson Chan