[Pacemaker] MySQL, Percona replication manager - split brain

Andrew nitr0 at seti.kr.ua
Sat Oct 25 16:34:54 EDT 2014


On 25.10.2014 22:34, Digimer wrote:
> On 25/10/14 03:32 PM, Andrew wrote:
>> Hi all.
>>
>> I use the Percona resource agent (RA) on a cluster (nothing
>> mission-critical currently - just Zabbix data); today, after restarting
>> the MySQL resource (crm resource restart p_mysql), I got a split-brain
>> state - for some reason MySQL started first on the ex-slave node, and
>> the ex-master started later (possibly I set the shutdown timeout too
>> low - only 120s, but I'm not sure).
>>
>> After restarting the resource on both nodes, MySQL replication seemed
>> OK - but then, after ~50 min, it fell into split brain again for an
>> unknown reason (no resource restart was noticed).
>>
>> In 'SHOW SLAVE STATUS' there is an error on a table caused by a
>> duplicate unique-index entry.
>>
>> So I have some questions:
>> 1) What causes the split brain, and how can I avoid it in the future?
>
> Cause:
>
> Logs?
Oct 25 13:54:13 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:13 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3459: /var/lib/pacemaker/pengine/pe-input-254.bz2
Oct 25 13:54:13 node2 crmd[29248]:   notice: run_graph: Transition 3459 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-254.bz2): Complete
Oct 25 13:54:13 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:14 node2 pacemakerd[11626]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:16 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:16 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:54:16 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:16 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:16 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3460: /var/lib/pacemaker/pengine/pe-input-255.bz2
Oct 25 13:54:16 node2 crmd[29248]:   notice: run_graph: Transition 3460 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-255.bz2): Complete
Oct 25 13:54:16 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:18 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:18 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:54:18 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:18 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:18 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3461: /var/lib/pacemaker/pengine/pe-input-256.bz2
Oct 25 13:54:18 node2 crmd[29248]:   notice: run_graph: Transition 3461 
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-256.bz2): Complete
Oct 25 13:54:18 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:19 node2 pacemaker: Signaling Pacemaker Cluster Manager to 
terminate
Oct 25 13:54:19 node2 pacemakerd[29237]:   notice: pcmk_shutdown_worker: 
Shuting down Pacemaker
Oct 25 13:54:19 node2 pacemakerd[29237]:   notice: stop_child: Stopping 
crmd: Sent -15 to process 29248
Oct 25 13:54:19 node2 crmd[29248]:   notice: crm_shutdown: Requesting 
shutdown, upper limit is 1200000ms
Oct 25 13:54:19 node2 crmd[29248]:   notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN 
origin=crm_shutdown ]
Oct 25 13:54:19 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: shutdown (1414234459)
Oct 25 13:54:19 node2 pacemaker: Waiting for cluster services to unload
Oct 25 13:54:19 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7041: shutdown=1414234459
Oct 25 13:54:19 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:54:19 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:19 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:19 node2 pengine[29247]:   notice: stage6: Scheduling Node 
node2.cluster for shutdown
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Move 
ClusterIP#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Move 
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Move 
pgsql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Promote 
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Demote 
p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Stop 
p_pgsql:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Stop 
p_nginx:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Stop 
p_perl-fpm:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: LogActions: Stop 
p_radiusd:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3462: /var/lib/pacemaker/pengine/pe-input-257.bz2
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 24: stop ClusterIP_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 32: stop pgsql_reader_vip_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 2: cancel p_mysql_cancel_2000 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 143: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 145: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 149: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 151: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 105: stop p_perl-fpm_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 120: stop p_radiusd_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 lrmd[29245]:   notice: operation_finished: 
p_perl-fpm_stop_0:12107:stderr [ grep: /var/run/perl-fpm/perl-fpm.pid: 
No such file or directory ]
Oct 25 13:54:19 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_perl-fpm_stop_0 (call=2537, rc=0, cib-update=4282, 
confirmed=true) ok
Oct 25 13:54:19 node2 pacemakerd[12173]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node2 mysql(p_mysql)[12101]: INFO: post-demote 
notification for node2.cluster
Oct 25 13:54:19 node2 IPaddr2(pgsql_reader_vip)[12100]: INFO: IP status 
= ok, IP_CIP=
Oct 25 13:54:19 node2 IPaddr(ClusterIP)[12099]: INFO: IP status = ok, 
IP_CIP=
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for 
192.168.253.31 on br0.
Oct 25 13:54:19 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_pgsql_notify_0 (call=2534, rc=0, cib-update=0, 
confirmed=true) ok
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for 
192.168.253.254 on br0.
Oct 25 13:54:19 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation pgsql_reader_vip_stop_0 (call=2528, rc=0, cib-update=4283, 
confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation ClusterIP_stop_0 (call=2525, rc=0, cib-update=4284, 
confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_mysql_notify_0 (call=2532, rc=0, cib-update=0, 
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_radiusd_stop_0 (call=2540, rc=0, cib-update=4285, 
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]:   notice: run_graph: Transition 3462 
(Complete=15, Pending=0, Fired=0, Skipped=29, Incomplete=18, 
Source=/var/lib/pacemaker/pengine/pe-input-257.bz2): Stopped
Oct 25 13:54:20 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:54:20 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:20 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:20 node2 pengine[29247]:   notice: stage6: Scheduling Node 
node2.cluster for shutdown
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Start 
ClusterIP#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Move 
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Start 
pgsql_reader_vip#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Promote 
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Demote 
p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Stop 
p_pgsql:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: LogActions: Stop 
p_nginx:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3463: /var/lib/pacemaker/pengine/pe-input-258.bz2
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 19: start ClusterIP_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 26: start pgsql_reader_vip_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 134: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 136: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 140: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 142: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 97: stop p_nginx_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 pacemakerd[12402]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 mysql(p_mysql)[12371]: INFO: post-demote 
notification for node2.cluster
Oct 25 13:54:20 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_pgsql_notify_0 (call=2552, rc=0, cib-update=0, 
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 64: stop p_pgsql_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_mysql_notify_0 (call=2550, rc=0, cib-update=0, 
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 34: demote p_mysql_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 20: monitor ClusterIP_monitor_2000 on node1.cluster
Oct 25 13:54:20 node2 pacemakerd[12542]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 27: monitor pgsql_reader_vip_monitor_10000 on node1.cluster
Oct 25 13:54:20 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:54:20 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7043: master-p_mysql=1
Oct 25 13:54:20 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_mysql_demote_0 (call=2567, rc=0, cib-update=4287, 
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 135: notify p_mysql_post_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 137: notify p_mysql_post_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:54:20 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7045: master-p_pgsql=-INFINITY
Oct 25 13:54:20 node2 mysql(p_mysql)[12605]: INFO: Ignoring post-demote 
notification for my own demotion.
Oct 25 13:54:20 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_mysql_notify_0 (call=2571, rc=0, cib-update=0, 
confirmed=true) ok
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #110 br0:0, 
192.168.253.254#123, interface stats: received=0, sent=0, dropped=0, 
active_time=47343 secs
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #172 br0, 
192.168.253.31#123, interface stats: received=0, sent=0, dropped=0, 
active_time=45246 secs
Oct 25 13:54:20 node2 cib[29243]:   notice: cib:diff: Diff: --- 0.320.2419
Oct 25 13:54:20 node2 cib[29243]:   notice: cib:diff: Diff: +++ 0.321.1 
17dd8c556d2c592e066c52d67b1a5b87
Oct 25 13:54:20 node2 cib[29243]:   notice: cib:diff: -- <nvpair 
value="STREAMING|SYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:54:20 node2 cib[29243]:   notice: cib:diff: ++ <nvpair 
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status" 
value="DISCONNECT"/>
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: Killing nginx PID 29997
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: nginx stopped.
Oct 25 13:54:21 node2 lrmd[29245]:   notice: operation_finished: 
p_nginx_stop_0:12373:stderr [ /usr/lib/ocf/resource.d/heartbeat/nginx: 
line 467: kill: (29997) - No such process ]
Oct 25 13:54:21 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_nginx_stop_0 (call=2556, rc=0, cib-update=4288, 
confirmed=true) ok
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: waiting for server to 
shut down.... done server stopped
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: PostgreSQL is down
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Changing 
p_pgsql-status on node2.cluster : HS:sync->STOP.
Oct 25 13:54:21 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:54:21 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7047: p_pgsql-status=STOP
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Set all nodes into 
async mode.
Oct 25 13:54:21 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_pgsql_stop_0 (call=2562, rc=0, cib-update=4289, 
confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 141: notify p_pgsql_post_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]:   notice: run_graph: Transition 3463 
(Complete=28, Pending=0, Fired=0, Skipped=18, Incomplete=9, 
Source=/var/lib/pacemaker/pengine/pe-input-258.bz2): Stopped
Oct 25 13:54:21 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:54:21 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:21 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:21 node2 pengine[29247]:   notice: stage6: Scheduling Node 
node2.cluster for shutdown
Oct 25 13:54:21 node2 pengine[29247]:   notice: LogActions: Stop 
mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]:   notice: LogActions: Promote 
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]:   notice: LogActions: Stop 
p_mysql:1#011(node2.cluster)
Oct 25 13:54:21 node2 pengine[29247]:     crit: stage8: Cannot shut down 
node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:54:21 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3464: /var/lib/pacemaker/pengine/pe-input-259.bz2
Oct 25 13:54:21 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 125: notify p_mysql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 127: notify p_mysql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_mysql_notify_0 (call=2576, rc=0, cib-update=0, 
confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 32: stop p_mysql_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (<null>)
Oct 25 13:54:21 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
delete 7049: node=node2.cluster, attr=master-p_mysql, id=<n/a>, 
set=(null), section=status
Oct 25 13:54:21 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: readable (0)
Oct 25 13:54:21 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7052: readable=0
Oct 25 13:54:25 node2 nginx(p_nginx)[12778]: INFO: nginx not running
Oct 25 13:54:25 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_nginx_monitor_10000 (call=2442, rc=7, cib-update=4291, 
confirmed=false) not running
Oct 25 13:54:25 node2 crmd[29248]:  warning: update_failcount: Updating 
failcount for p_nginx on node2.cluster after failed monitor: rc=7 
(update=value++, time=1414234465)
Oct 25 13:54:25 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: fail-count-p_nginx (1)
Oct 25 13:54:25 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7055: fail-count-p_nginx=1
Oct 25 13:54:25 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: last-failure-p_nginx (1414234465)
Oct 25 13:54:25 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7057: last-failure-p_nginx=1414234465
Oct 25 13:54:26 node2 nginx(p_nginx)[12844]: INFO: nginx not running
Oct 25 13:54:26 node2 nginx(p_nginx)[12845]: INFO: nginx not running
Oct 25 13:54:26 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_nginx_monitor_10000 (call=333, rc=7, cib-update=4292, 
confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_nginx_monitor_10000 (call=2469, rc=7, cib-update=4293, 
confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]:  warning: update_failcount: Updating 
failcount for p_nginx on node2.cluster after failed monitor: rc=7 
(update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: fail-count-p_nginx (2)
Oct 25 13:54:26 node2 crmd[29248]:  warning: update_failcount: Updating 
failcount for p_nginx on node2.cluster after failed monitor: rc=7 
(update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7059: fail-count-p_nginx=2
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7061: last-failure-p_nginx=1414234466
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: fail-count-p_nginx (3)
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7063: fail-count-p_nginx=3
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]:   notice: attrd_perform_update: Sent 
update 7065: last-failure-p_nginx=1414234466
Oct 25 13:54:35 node2 nginx(p_nginx)[13037]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13099]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13098]: INFO: nginx not running
Oct 25 13:54:45 node2 nginx(p_nginx)[13291]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13353]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13352]: INFO: nginx not running
Oct 25 13:54:55 node2 nginx(p_nginx)[13549]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13628]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13627]: INFO: nginx not running
Oct 25 13:55:05 node2 nginx(p_nginx)[13870]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13931]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13932]: INFO: nginx not running
Oct 25 13:55:15 node2 nginx(p_nginx)[14151]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14214]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14213]: INFO: nginx not running
Oct 25 13:55:25 node2 nginx(p_nginx)[14405]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14469]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14468]: INFO: nginx not running
Oct 25 13:55:35 node2 nginx(p_nginx)[14659]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14723]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14724]: INFO: nginx not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL stopped
Oct 25 13:55:38 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation p_mysql_stop_0 (call=2579, rc=0, cib-update=4294, 
confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 126: notify p_mysql_post_notify_stop_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]:   notice: run_graph: Transition 3464 
(Complete=10, Pending=0, Fired=0, Skipped=6, Incomplete=4, 
Source=/var/lib/pacemaker/pengine/pe-input-259.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]:  warning: unpack_rsc_op: 
Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]:   notice: stage6: Scheduling Node 
node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]:   notice: LogActions: Stop 
mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]:   notice: LogActions: Promote 
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]:     crit: stage8: Cannot shut down 
node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:55:38 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3465: /var/lib/pacemaker/pengine/pe-input-260.bz2
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 126: notify p_mysql_pre_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 30: promote p_mysql_promote_0 on node1.cluster
Oct 25 13:55:38 node2 cib[29243]:   notice: cib:diff: Diff: --- 0.321.15
Oct 25 13:55:38 node2 cib[29243]:   notice: cib:diff: Diff: +++ 0.322.1 
92b30c7922072633f9dbc336e4f1a066
Oct 25 13:55:38 node2 cib[29243]:   notice: cib:diff: -- <nvpair 
value="192.168.253.4|mysqld-bin.000409|106" 
id="mysql_replication-p_mysql_REPL_INFO"/>
Oct 25 13:55:38 node2 cib[29243]:   notice: cib:diff: ++ <nvpair 
id="mysql_replication-p_mysql_REPL_INFO" name="p_mysql_REPL_INFO" 
value="192.168.253.5|mysqld-bin.000471|106"/>
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 127: notify p_mysql_post_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]:   notice: run_graph: Transition 3465 
(Complete=9, Pending=0, Fired=0, Skipped=1, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-260.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]:   notice: unpack_config: On loss 
of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]:   notice: unpack_rsc_op: Operation 
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]:  warning: unpack_rsc_op: 
Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]:   notice: stage6: Scheduling Node 
node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]:   notice: LogActions: Move 
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]:   notice: process_pe_message: 
Calculated Transition 3466: /var/lib/pacemaker/pengine/pe-input-261.bz2
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 21: stop mysql_writer_vip_stop_0 on node2.cluster (local)
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 32: monitor p_mysql_monitor_5000 on node1.cluster
Oct 25 13:55:38 node2 IPaddr2(mysql_writer_vip)[14851]: INFO: IP status 
= ok, IP_CIP=
Oct 25 13:55:38 node2 avahi-daemon[6932]: Withdrawing address record for 
192.168.253.64 on br0.
Oct 25 13:55:38 node2 crmd[29248]:   notice: process_lrm_event: LRM 
operation mysql_writer_vip_stop_0 (call=2586, rc=0, cib-update=4300, 
confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 22: start mysql_writer_vip_start_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]:   notice: te_rsc_command: Initiating 
action 23: monitor mysql_writer_vip_monitor_10000 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]:   notice: run_graph: Transition 3466 
(Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-261.bz2): Complete
Oct 25 13:55:38 node2 crmd[29248]:   notice: lrm_state_verify_stopped: 
Stopped 1 recurring operations at shutdown... waiting (0 ops remaining)
Oct 25 13:55:38 node2 lrmd[29245]:  warning: qb_ipcs_event_sendv: 
new_event_notification (29245-29248-6): Bad file descriptor (9)
Oct 25 13:55:38 node2 lrmd[29245]:  warning: send_client_notify: 
Notification of client crmd/d068e1e6-7c6e-48b0-bb86-49d35fe11673 failed
Oct 25 13:55:38 node2 crmd[29248]:   notice: do_lrm_control: 
Disconnected from the LRM
Oct 25 13:55:38 node2 crmd[29248]:   notice: terminate_cs_connection: 
Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc_exit: 
Client crmd (conn=0x1afff00, async-conn=0x1afff00) left
Oct 25 13:55:38 node2 cib[29243]:  warning: qb_ipcs_event_sendv: 
new_event_notification (29243-29248-13): Broken pipe (32)
Oct 25 13:55:38 node2 cib[29243]:  warning: do_local_notify: A-Sync 
reply to crmd failed: No message of desired type
Oct 25 13:55:38 node2 pacemakerd[29237]:  warning: qb_ipcs_event_sendv: 
new_event_notification (29237-29248-14): Broken pipe (32)
Oct 25 13:55:38 node2 pacemakerd[29237]:   notice: stop_child: Stopping 
pengine: Sent -15 to process 29247
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.crmd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:38 node2 pacemakerd[29237]:   notice: stop_child: Stopping 
attrd: Sent -15 to process 29246
Oct 25 13:55:38 node2 attrd[29246]:   notice: main: Exiting...
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc_exit: 
Client attrd (conn=0x1af3480, async-conn=0x1af3480) left
Oct 25 13:55:38 node2 pacemakerd[29237]:   notice: stop_child: Stopping 
lrmd: Sent -15 to process 29245
Oct 25 13:55:38 node2 pacemakerd[29237]:   notice: stop_child: Stopping 
stonith-ng: Sent -15 to process 29244
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc_exit: 
Client stonith-ng (conn=0x1af7800, async-conn=0x1af7800) left
Oct 25 13:55:38 node2 pacemakerd[29237]:   notice: stop_child: Stopping 
cib: Sent -15 to process 29243
Oct 25 13:55:38 node2 cib[29243]:   notice: terminate_cs_connection: 
Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc_exit: 
Client cib (conn=0x1afbb80, async-conn=0x1afbb80) left
Oct 25 13:55:38 node2 pacemakerd[29237]:   notice: pcmk_shutdown_worker: 
Shutdown complete
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.cib failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.cib failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.cib failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.cib failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.cib failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.cib failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.attrd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.attrd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.attrd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.attrd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.attrd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.attrd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:39 node2 pacemaker: Starting Pacemaker Cluster Manager
Oct 25 13:55:39 node2 pacemakerd[14916]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 pacemakerd[14916]:   notice: main: Starting 
Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726):  generated-manpages 
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc 
nagios  corosync-plugin cman
Oct 25 13:55:39 node2 pacemakerd[14916]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.stonith-ng failed: ipc 
delivery failed (rc=-2)
Oct 25 13:55:39 node2 cib[14922]:   notice: crm_add_logfile: Additional 
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 attrd[14925]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 cib[14922]:   notice: main: Using legacy config 
location: /var/lib/heartbeat/crm
Oct 25 13:55:39 node2 attrd[14925]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 pengine[14926]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 lrmd[14924]:   notice: crm_add_logfile: Additional 
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]:   notice: crm_add_logfile: Additional 
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]:   notice: main: CRM Git Version: 368c726
Oct 25 13:55:39 node2 attrd[14925]:   notice: get_node_name: Defaulting 
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc: 
Recorded connection 0x1af31c0 for attrd/0
Oct 25 13:55:39 node2 stonith-ng[14923]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc: 
Recorded connection 0x1af7540 for stonith-ng/0
Oct 25 13:55:39 node2 attrd[14925]:   notice: get_node_name: Defaulting 
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 stonith-ng[14923]:   notice: get_node_name: 
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 attrd[14925]:   notice: main: Starting mainloop...
Oct 25 13:55:39 node2 cib[14922]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc: 
Recorded connection 0x1afb8c0 for cib/0
Oct 25 13:55:39 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc: 
Sending membership update 1608 to cib
Oct 25 13:55:39 node2 cib[14922]:   notice: get_node_name: Defaulting to 
uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]:   notice: plugin_handle_membership: 
Membership 1608: quorum acquired
Oct 25 13:55:39 node2 cib[14922]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node node2.cluster[83732672] - state is now 
member (was (null))
Oct 25 13:55:39 node2 cib[14922]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node node1.cluster[100509888] - state is now 
member (was (null))
Oct 25 13:55:40 node2 corosync[29219]:   [pcmk  ] WARN: 
route_ais_message: Sending message to local.crmd failed: ipc delivery 
failed (rc=-2)
Oct 25 13:55:40 node2 cib[14922]:   notice: crm_ipc_prepare: Message 
exceeds the configured ipc limit (51200 bytes), consider configuring 
PCMK_ipc_buffer to 161796 or higher to avoid compression overheads
Oct 25 13:55:40 node2 crmd[14927]:   notice: crm_cluster_connect: 
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:40 node2 crmd[14927]:   notice: get_node_name: Defaulting 
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc: 
Recorded connection 0x1affc40 for crmd/0
Oct 25 13:55:40 node2 corosync[29219]:   [pcmk  ] info: pcmk_ipc: 
Sending membership update 1608 to crmd
Oct 25 13:55:40 node2 stonith-ng[14923]:   notice: setup_cib: Watching 
for stonith topology changes
Oct 25 13:55:40 node2 crmd[14927]:   notice: get_node_name: Defaulting 
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 stonith-ng[14923]:   notice: unpack_config: On 
loss of CCM Quorum: Ignore
Oct 25 13:55:40 node2 crmd[14927]:   notice: plugin_handle_membership: 
Membership 1608: quorum acquired
Oct 25 13:55:40 node2 crmd[14927]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node node2.cluster[83732672] - state is now 
member (was (null))
Oct 25 13:55:40 node2 crmd[14927]:   notice: crm_update_peer_state: 
plugin_handle_membership: Node node1.cluster[100509888] - state is now 
member (was (null))
Oct 25 13:55:40 node2 crmd[14927]:   notice: do_started: The local CRM 
is operational
Oct 25 13:55:40 node2 crmd[14927]:   notice: do_state_transition: State 
transition S_STARTING -> S_PENDING [ input=I_PENDING 
cause=C_FSA_INTERNAL origin=do_started ]
Oct 25 13:55:40 node2 ntpd[7193]: Deleting interface #76 br0, 
192.168.253.64#123, interface stats: received=0, sent=0, dropped=0, 
active_time=3555898 secs
Oct 25 13:55:42 node2 crmd[14927]:   notice: do_state_transition: State 
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE 
origin=do_cl_join_finalize_respond ]
Oct 25 13:55:42 node2 attrd[14925]:   notice: attrd_local_callback: 
Sending full refresh (origin=crmd)
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_perl-fpm_monitor_0 (call=41, rc=7, cib-update=8, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_radiusd_monitor_0 (call=54, rc=7, cib-update=9, 
confirmed=true) not running
Oct 25 13:55:43 node2 pacemakerd[15105]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 mysql(p_mysql)[14949]: INFO: MySQL is not running
Oct 25 13:55:43 node2 crmd[14927]:    error: generic_get_metadata: 
Failed to retrieve meta-data for ocf:percona:mysql
Oct 25 13:55:43 node2 crmd[14927]:  warning: get_rsc_metadata: No 
metadata found for mysql::ocf:percona: Input/output error (-5)
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_mysql_monitor_0 (call=26, rc=7, cib-update=10, 
confirmed=true) not running
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: Don't check 
/var/lib/pgsql/9.1/data/ during probe
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: PostgreSQL is down
Oct 25 13:55:43 node2 nginx(p_nginx)[14956]: INFO: nginx not running
Oct 25 13:55:43 node2 pacemakerd[15368]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_pgsql_monitor_0 (call=31, rc=7, cib-update=11, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_nginx_monitor_0 (call=36, rc=7, cib-update=12, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_web_ip_monitor_0 (call=45, rc=7, cib-update=13, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation pgsql_writer_vip_monitor_0 (call=21, rc=7, cib-update=14, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation ClusterIP_monitor_0 (call=5, rc=7, cib-update=15, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_radius_ip_monitor_0 (call=49, rc=7, cib-update=16, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation mysql_writer_vip_monitor_0 (call=13, rc=7, cib-update=17, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation pgsql_reader_vip_monitor_0 (call=17, rc=7, cib-update=18, 
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation mysql_reader_vip_monitor_0 (call=9, rc=7, cib-update=19, 
confirmed=true) not running
Oct 25 13:55:43 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node2 pacemakerd[15481]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: readable (0)
Oct 25 13:55:43 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:43 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: nginx: the 
configuration file /etc/nginx/nginx.conf syntax is ok nginx: 
configuration file /etc/nginx/nginx.conf test is successful
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: Starting 
/usr/sbin/nginx - nginx version: nginx/1.0.15
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: /usr/sbin/nginx build 
configuration: configure arguments: --prefix=/usr/share/nginx 
--sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf 
--error-log-path=/var/log/nginx/error.log 
--http-log-path=/var/log/nginx/access.log 
--http-client-body-temp-path=/var/lib/nginx/tmp/client_body 
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy 
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi 
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi 
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi 
--pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx 
--user=nginx --group=nginx --with-file-aio --with-ipv6 
--with-http_ssl_module --with-http_realip_module 
--with-http_addition_module --with-http_xslt_module 
--with-http_image_filter_module --with-http_geoip_module 
--with-http_sub_module --with-http_dav_module --with-http_flv_module 
--with-http_mp4_module --with-http_gzip_static_module 
--with-http_random_index_module --with-http_secure_link_module 
--with-http_degradation_module --with-http_stub_status_module 
--with-http_perl_module --with-mail --with-mail_ssl_module 
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 
--with-ld-opt=-Wl,-E
Oct 25 13:55:43 node2 pgsql(p_pgsql)[15439]: INFO: Changing 
p_pgsql-status on node2.cluster : ->STOP.
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_nginx_start_0 (call=72, rc=0, cib-update=20, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Adding inet 
address 10.255.0.33/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Adding inet 
address 10.255.0.32/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Bringing device 
lo up
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_radius_ip_start_0 (call=76, rc=0, cib-update=21, 
confirmed=true) ok
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_web_ip_start_0 (call=78, rc=0, cib-update=22, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_nginx_monitor_30000 (call=81, rc=0, cib-update=23, 
confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: Set all nodes into 
async mode.
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_radius_ip_monitor_10000 (call=89, rc=0, cib-update=24, 
confirmed=false) ok
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_web_ip_monitor_10000 (call=91, rc=0, cib-update=25, 
confirmed=false) ok
Oct 25 13:55:44 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:44 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_nginx_monitor_10000 (call=83, rc=0, cib-update=26, 
confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: server starting
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL start 
command sent.
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Can't get 
PostgreSQL recovery status. rc=2
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Connection error 
(connection to the server went bad and the session was not interactive) 
occurred while executing the psql command.
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 4: master-p_mysql=0
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 7: master-p_pgsql=-INFINITY
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 11: readable=0
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 17: p_pgsql-status=STOP
Oct 25 13:55:44 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 20: probe_complete=true
Oct 25 13:55:45 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_perl-fpm_start_0 (call=85, rc=0, cib-update=27, 
confirmed=true) ok
Oct 25 13:55:45 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_radiusd_start_0 (call=74, rc=0, cib-update=28, 
confirmed=true) ok
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL is started.
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: Changing 
p_pgsql-status on node2.cluster : STOP->HS:alone.
Oct 25 13:55:45 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: p_pgsql-status (HS:alone)
Oct 25 13:55:45 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 22: p_pgsql-status=HS:alone
Oct 25 13:55:45 node2 lrmd[14924]:   notice: operation_finished: 
p_pgsql_start_0:15439:stderr [ psql: could not connect to server: No 
such file or directory ]
Oct 25 13:55:45 node2 lrmd[14924]:   notice: operation_finished: 
p_pgsql_start_0:15439:stderr [ #011Is the server running locally and 
accepting ]
Oct 25 13:55:45 node2 lrmd[14924]:   notice: operation_finished: 
p_pgsql_start_0:15439:stderr [ #011connections on Unix domain socket 
"/tmp/.s.PGSQL.5432"? ]
Oct 25 13:55:45 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_pgsql_start_0 (call=70, rc=0, cib-update=29, confirmed=true) ok
Oct 25 13:55:45 node2 pacemakerd[16158]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_pgsql_notify_0 (call=100, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #174 lo, 
10.255.0.33#123 Enabled
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #175 lo, 
10.255.0.32#123 Enabled
Oct 25 13:55:45 node2 attrd[14925]:   notice: attrd_cs_dispatch: Update 
relayed from node1.cluster
Oct 25 13:55:45 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: p_pgsql-status (HS:async)
Oct 25 13:55:45 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 24: p_pgsql-status=HS:async
Oct 25 13:55:46 node2 attrd[14925]:   notice: attrd_cs_dispatch: Update 
relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_pgsql (100)
Oct 25 13:55:46 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 26: master-p_pgsql=100
Oct 25 13:55:46 node2 attrd[14925]:   notice: attrd_cs_dispatch: Update 
relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: p_pgsql-status (HS:sync)
Oct 25 13:55:46 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 28: p_pgsql-status=HS:sync
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Changing MySQL 
configuration to replicate from node1.cluster.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: ERROR: check_slave invoked 
on an instance that is not a replication slave.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Restored master pos 
for 192.168.253.5 : mysqld-bin.000471:106
Oct 25 13:55:48 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:55:48 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 30: master-p_mysql=1
Oct 25 13:55:48 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:55:48 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 32: master-p_mysql=60
Oct 25 13:55:49 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:49 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 34: readable=1
Oct 25 13:55:49 node2 mysql(p_mysql)[15438]: INFO: MySQL started
Oct 25 13:55:49 node2 lrmd[14924]:   notice: operation_finished: 
p_mysql_start_0:15438:stderr [ Error performing operation: No such 
device or address ]
Oct 25 13:55:49 node2 lrmd[14924]:   notice: operation_finished: 
p_mysql_start_0:15438:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_mysql_start_0 (call=68, rc=0, cib-update=30, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_mysql_notify_0 (call=104, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_perl-fpm_monitor_10000 (call=111, rc=0, cib-update=31, 
confirmed=false) ok
Oct 25 13:55:49 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_radiusd_monitor_10000 (call=113, rc=0, cib-update=32, 
confirmed=false) ok
Oct 25 13:55:49 node2 pacemakerd[16380]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_pgsql_monitor_7000 (call=109, rc=0, cib-update=33, 
confirmed=false) ok
Oct 25 13:55:49 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Adding 
inet address 192.168.253.63/24 with broadcast address 192.168.253.255 to 
device br0
Oct 25 13:55:49 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Adding 
inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to 
device br0
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record 
for 192.168.253.63 on br0.IPv4.
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record 
for 192.168.253.31 on br0.IPv4.
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Bringing 
device br0 up
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Bringing 
device br0 up
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: 
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto 
not_used not_used
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: 
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto 
not_used not_used
Oct 25 13:55:50 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation mysql_reader_vip_start_0 (call=117, rc=0, cib-update=34, 
confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation pgsql_reader_vip_start_0 (call=119, rc=0, cib-update=35, 
confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation mysql_reader_vip_monitor_10000 (call=124, rc=0, cib-update=36, 
confirmed=false) ok
Oct 25 13:55:50 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (52)
Oct 25 13:55:50 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 36: master-p_mysql=52
Oct 25 13:55:50 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation pgsql_reader_vip_monitor_10000 (call=126, rc=0, cib-update=37, 
confirmed=false) ok
Oct 25 13:55:50 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation p_mysql_monitor_2000 (call=107, rc=0, cib-update=38, 
confirmed=false) ok
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #176 br0, 
192.168.253.63#123 Enabled
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #177 br0, 
192.168.253.31#123 Enabled
Oct 25 13:55:52 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (50)
Oct 25 13:55:52 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 38: master-p_mysql=50
Oct 25 13:55:54 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (51)
Oct 25 13:55:54 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 40: master-p_mysql=51
Oct 25 13:55:56 node2 mysql(p_mysql)[17074]: ERROR: MySQL instance 
configured for replication, but replication has failed.
Oct 25 13:55:56 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: readable (0)
Oct 25 13:55:56 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 42: readable=0
Oct 25 13:55:56 node2 attrd[14925]:   notice: attrd_trigger_update: 
Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:56 node2 attrd[14925]:   notice: attrd_perform_update: Sent 
update 44: master-p_mysql=0
Oct 25 13:55:56 node2 IPaddr2(mysql_reader_vip)[17141]: INFO: IP status 
= ok, IP_CIP=
Oct 25 13:55:56 node2 avahi-daemon[6932]: Withdrawing address record for 
192.168.253.63 on br0.
Oct 25 13:55:56 node2 crmd[14927]:   notice: process_lrm_event: LRM 
operation mysql_reader_vip_stop_0 (call=132, rc=0, cib-update=39, 
confirmed=true) ok
Oct 25 13:55:57 node2 pacemakerd[17206]:   notice: crm_add_logfile: 
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node2 ntpd[7193]: Deleting interface #176 br0, 
192.168.253.63#123, interface stats: received=0, sent=0, dropped=0, 
active_time=7 secs
Oct 25 13:55:58 node2 mysql(p_mysql)[17267]: ERROR: MySQL instance 
configured for replication, but replication has failed.
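
The part that looks suspicious to me is the cib:diff at 13:55:38 above:
p_mysql_REPL_INFO flips from 192.168.253.4|mysqld-bin.000409|106 to
192.168.253.5|mysqld-bin.000471|106 while node2 is still being demoted. A
quick way to check what position the cluster has recorded (just a sketch):

  # dump the CIB and pull out the stored replication position
  cibadmin -Q | grep REPL_INFO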

>
> Prevent:
>
> Fencing (aka stonith). This is why fencing is required.
There was no node failure; the daemon was just restarted.
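
That said, if we did add fencing here, I guess the crmsh side would look
roughly like this (a sketch only - the external/ipmi agent and every
parameter value below are placeholders, not our real hardware):

  # enable fencing cluster-wide
  crm configure property stonith-enabled=true
  # one stonith resource per node; agent/params are hypothetical
  crm configure primitive st-node1 stonith:external/ipmi \
      params hostname=node1.cluster ipaddr=192.0.2.1 \
      userid=admin passwd=secret interface=lan
  # keep a node from running its own fencing device
  crm configure location l-st-node1 st-node1 -inf: node1.cluster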

>
>> 2) How do I resolve the split-brain state? Is it enough just to wait
>> for the failure, then restart MySQL by hand and clean the row with the
>> duplicate index in the slave DB, and then run the resource again? Or is
>> there some automation for such cases?
>
> How are you sharing data? Can you give us a better understanding of 
> your setup?
>
Semi-synchronous MySQL replication, if you mean how the DB log is shared
between nodes.
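
For the record, the by-hand fix I meant in question 2 is roughly the
following (a sketch only; the master host, binlog file and position are
just the values from the p_mysql_REPL_INFO line in the log above, and
skipping a duplicate-key event is only safe after the conflicting row has
been checked and cleaned manually):

  # on the node going back to slave, stop replication first
  mysql -e "STOP SLAVE"
  # re-point it at the surviving master (values from REPL_INFO above)
  mysql -e "CHANGE MASTER TO MASTER_HOST='192.168.253.5', \
            MASTER_LOG_FILE='mysqld-bin.000471', MASTER_LOG_POS=106"
  # skip the single duplicate-key event after fixing the row
  mysql -e "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1"
  mysql -e "START SLAVE"
  # verify Slave_IO_Running / Slave_SQL_Running are both 'Yes'
  mysql -e "SHOW SLAVE STATUS\G"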



