

Galera Cluster MariaDB error tips


(1) Setting wsrep_sst_method=xtrabackup without having Percona XtraBackup installed (lol)

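I wanted to build a Galera Cluster test environment, so I reused a my.cnf copied wholesale from an existing environment. After starting the initial node, the following error occurred when I tried to start the second and later Galera Cluster nodes.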

  • Error output


141230 21:41:42 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '' --auth 'root' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '4906''
which: no innobackupex in (/usr/sbin:/sbin:/usr//bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/bin)
WSREP_SST: [ERROR] innobackupex not in path: /usr/sbin:/sbin:/usr//bin:/sbin:/usr/sbin:/bin:/usr/bin:/usr/bin (20141230 21:41:42.953)
141230 21:41:42 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup --role 'joiner' --address '' --auth 'root' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '4906'
        Read: '(null)'
141230 21:41:42 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '' --auth 'root' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '4906': 2 (No such file or directory)
141230 21:41:42 [ERROR] WSREP: Failed to prepare for 'xtrabackup' SST. Unrecoverable.
141230 21:41:42 [ERROR] Aborting
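  • Cause

wsrep_sst_method=xtrabackup was set.

wsrep_sst_method is the parameter that specifies how a node syncs its data from the master database server (the donor, in Galera terms) when it starts up in the Galera Cluster. The default is rsync, but the existing environment had it set to xtrabackup.

The freshly built server did not have Percona XtraBackup installed, so the sync at Galera Cluster startup apparently could not run and the node failed to start.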
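
A pre-flight check along these lines could catch the problem before mysqld is started. This is only a minimal sketch: the function names sst_method_from_cnf and check_sst_tool are hypothetical, and the only method-to-tool pairing it knows is xtrabackup -> innobackupex, taken from the error log above.

```shell
#!/bin/sh
# Hypothetical helper: print the wsrep_sst_method set in a my.cnf-style file.
# Falls back to "rsync" (the default) when the option is absent.
sst_method_from_cnf() {
    cnf="$1"
    # Accept both wsrep_sst_method and wsrep-sst-method; the last match wins.
    method=$(sed -n -E 's/^[[:space:]]*wsrep[_-]sst[_-]method[[:space:]]*=[[:space:]]*([^[:space:]#]+).*$/\1/p' "$cnf" | tail -n 1)
    echo "${method:-rsync}"
}

# Hypothetical helper: succeed only if the tool the SST method needs is on PATH.
check_sst_tool() {
    case "$1" in
        xtrabackup) tool=innobackupex ;;  # pairing taken from the log above
        *)          return 0 ;;           # other methods are not checked in this sketch
    esac
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    echo "wsrep_sst_method=$1 needs $tool, but it is not in PATH" >&2
    return 1
}
```

For example, check_sst_tool "$(sst_method_from_cnf /etc/my.cnf)" would return non-zero on the freshly built server described above.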

  • Installing Percona XtraBackup
$ yum install http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
$ yum install xtrabackup
  • Starting Galera Cluster

$ service mysql start

Also, when you set wsrep_sst_method=xtrabackup, don't forget to set wsrep_sst_auth=<auth user>:<password> as well.
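
As a rough sanity check for that setting, something like the following could confirm that my.cnf carries wsrep_sst_auth in user:password form. The function name has_sst_auth is hypothetical, and the sketch assumes the option is written with underscores on a single line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file sets wsrep_sst_auth as <user>:<password>.
has_sst_auth() {
    grep -Eq '^[[:space:]]*wsrep_sst_auth[[:space:]]*=[[:space:]]*[^[:space:]:]+:[^[:space:]]+' "$1"
}
```

Calling has_sst_auth /etc/my.cnf before starting a joiner would flag a missing or password-less entry.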

(2) After joining the cluster, the joining node's sequence number is higher than the cluster's, so the sync fails and the node does not start

  • Log excerpt
150108  0:45:22 [ERROR] WSREP: Local state seqno (12) is greater than group seqno (10): states diverged. Aborting to avoid potential data loss. Remove '/storage/data/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
         at galera/src/replicator_str.cpp:state_transfer_required():33
  • Full log
150108  0:45:22 [Note] WSREP: STATE EXCHANGE: sent state msg: a097fc2f-96cf-11e4-9c92-e32d0822faad
150108  0:45:22 [Note] WSREP: STATE EXCHANGE: got state msg: a097fc2f-96cf-11e4-9c92-e32d0822faad from 0 (
150108  0:45:22 [Note] WSREP: STATE EXCHANGE: got state msg: a097fc2f-96cf-11e4-9c92-e32d0822faad from 2 (
150108  0:45:22 [Note] WSREP: STATE EXCHANGE: got state msg: a097fc2f-96cf-11e4-9c92-e32d0822faad from 1 (
150108  0:45:22 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 14,
        members    = 2/3 (joined/total),
        act_id     = 10,
        last_appl. = -1,
        protocols  = 0/5/3 (gcs/repl/appl),
        group UUID = 2a245897-968f-11e4-bc67-13c84f4a5ebb
150108  0:45:22 [Note] WSREP: Flow-control interval: [28, 28]
150108  0:45:22 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 10)
150108  0:45:22 [Note] WSREP: Closing send monitor...
150108  0:45:22 [Note] WSREP: Closed send monitor.
150108  0:45:22 [Note] WSREP: gcomm: terminating thread
150108  0:45:22 [Note] WSREP: gcomm: joining thread
150108  0:45:22 [Note] WSREP: gcomm: closing backend
150108  0:45:22 [Note] WSREP: view(view_id(NON_PRIM,213ef9d9-96ce-11e4-b33f-0756a3a1e93e,15) memb {
} joined {
} left {
} partitioned {
150108  0:45:22 [Note] WSREP: view((empty))
150108  0:45:22 [Note] WSREP: gcomm: closed
150108  0:45:22 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150108  0:45:22 [Note] WSREP: Flow-control interval: [16, 16]
150108  0:45:22 [Note] WSREP: Received NON-PRIMARY.
150108  0:45:22 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 10)
150108  0:45:22 [Note] WSREP: Received self-leave message.
150108  0:45:22 [Note] WSREP: Flow-control interval: [0, 0]
150108  0:45:22 [Note] WSREP: Received SELF-LEAVE. Closing connection.
150108  0:45:22 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 10)
150108  0:45:22 [Note] WSREP: RECV thread exiting 0: Success
150108  0:45:22 [Note] WSREP: recv_thread() joined.
150108  0:45:22 [Note] WSREP: Closing replication queue.
150108  0:45:22 [Note] WSREP: Closing slave action queue.
☆☆150108  0:45:22 [ERROR] WSREP: Local state seqno (12) is greater than group seqno (10): states diverged. Aborting to avoid potential data loss. Remove '/storage/data/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
         at galera/src/replicator_str.cpp:state_transfer_required():33
150108  0:45:22 [Note] WSREP: applier thread exiting (code:8)
150108  0:45:22 [ERROR] Aborting

150108  0:45:24 [Note] WSREP: Closing send monitor...
150108  0:45:24 [Note] WSREP: Closed send monitor.
150108  0:45:24 [Note] WSREP: Service disconnected.
150108  0:45:24 [Note] WSREP: rollbacker thread exiting
150108  0:45:25 [Note] WSREP: Some threads may fail to exit.
150108  0:45:25 [Note] /usr/sbin/mysqld: Shutdown complete

Error in my_thread_global_end(): 1 threads didn't exit
150108 00:45:30 mysqld_safe mysqld from pid file /storage/data/mysql//ip-10-0-0-120.pid ended
150108 00:46:50 mysqld_safe Starting mysqld daemon with databases from /storage/data/mysql/
150108 00:46:50 mysqld_safe WSREP: Running position recovery with --log_error='/storage/data/mysql//wsrep_recovery.uw41rR' --pid-file='/storage/data/mysql//ip-10-0-0-120-recover.pid'
150108 00:46:53 mysqld_safe WSREP: Recovered position 2a245897-968f-11e4-bc67-13c84f4a5ebb:12
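
To see at a glance how far the local node has run ahead of the group, the two seqnos can be pulled out of the error line. The sketch below is only an illustration: seqno_gap and report_seqno_gap are hypothetical names, and the pattern is written against the exact wording of the log above.

```shell
#!/bin/sh
# Hypothetical helper: print "<local> <group>" from the last divergence error in a log file.
seqno_gap() {
    sed -n -E 's/.*Local state seqno \((-?[0-9]+)\) is greater than group seqno \((-?[0-9]+)\).*/\1 \2/p' "$1" | tail -n 1
}

# Hypothetical helper: summarize the gap between the local and group seqno.
report_seqno_gap() {
    # Split the two numbers into the positional parameters $1 and $2.
    set -- $(seqno_gap "$1")
    if [ $# -ne 2 ]; then
        echo "no seqno divergence found"
        return 1
    fi
    echo "local=$1 group=$2 ahead_by=$(( $1 - $2 ))"
}
```

The log itself names the way forward: remove the grastate.dat file it points at and restart, if you accept continuing from the group's state.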

(3) With wsrep_sst_method=xtrabackup set, starting the second and later Galera Cluster nodes logs [[Warning] WSREP: Gap in state sequence. Need state transfer.] and the cluster does not start

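The one case I have run into so far: on a DB set up as galera cluster -> replication slave, restoring a backup file taken with innobackupex on the replication slave triggers this when mysql is started on the second and later nodes.

It is a brute-force fix, but setting wsrep_sst_method=rsync in my.cnf on the second and later nodes and then starting them gets around it.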

