<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body style='font-family: Verdana,Geneva,sans-serif'>
<p>Hi</p>
<pre>>Check the pacemaker logs on both nodes around the time it happens.<br /><br />This scenario happens when one node is starting and the other does not have the corosync and pacemaker services started, so there is only one node whose logs can be checked.<br /><br />
>One of the nodes will be the DC, and will have "pengine:" logs with
>"saving inputs".<br /><br />There is no "saving inputs" message in the logs at startup.<br />
>The first thing I'd look for is who requested fencing. The DC will have
>stonith logs with "Client ... wants to fence ...". The client will
>either be crmd (i.e. the cluster itself) or some external program.<br /><br />It's crmd<br /><br />Aug 31 10:59:20 [30612] node1 stonith-ng: notice: handle_request: Client crmd.30616.aa4a8de3 wants to fence (reboot) 'node2' with device '(any)'<br />Aug 31 10:59:37 [30612] node1 stonith-ng: notice: handle_request: Client crmd.30616.aa4a8de3 wants to fence (reboot) 'node2' with device '(any)'<br />Aug 31 10:59:53 [30612] node1 stonith-ng: notice: handle_request: Client crmd.30616.aa4a8de3 wants to fence (reboot) 'node2' with device '(any)'<br />
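
A minimal sketch for pulling these requests out of a log, assuming the lines follow the stonith-ng format shown above. The names fence_requests and FENCE_RE are illustrative only and are not part of Pacemaker.

```python
import re

# Matches stonith-ng request lines of the form quoted above.
# Groups: 1 = timestamp, 2 = client id, 3 = action, 4 = target node.
FENCE_RE = re.compile(
    r"^(\w{3}\s+\d+ [\d:]{8}) \[\d+\] \S+\s+stonith-ng:\s+notice:\s+"
    r"handle_request:\s+Client (\S+) wants to fence \((\w+)\) '([^']+)'"
)

def fence_requests(lines):
    """Yield (time, client, action, target) for each fencing request line."""
    for line in lines:
        m = FENCE_RE.match(line.strip())
        if m:
            # Client ids look like "crmd.30616.aa4a8de3"; keep the daemon name.
            client = m.group(2).split(".")[0]
            yield m.group(1), client, m.group(3), m.group(4)

sample = [
    "Aug 31 10:59:20 [30612] node1 stonith-ng: notice: handle_request: "
    "Client crmd.30616.aa4a8de3 wants to fence (reboot) 'node2' with device '(any)'",
    "Aug 31 10:59:20 [30615] node1 pengine: warning: stage6: "
    "Scheduling Node node2 for STONITH",
]

for request in fence_requests(sample):
    print(request)
```

A client of "crmd" means the cluster itself asked for the fencing. Any other name points to an external program.
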
>If it's the cluster, I'd look at the "pengine:" logs on the DC before
>that, to see if there are any hints (node unclean, etc.). Then keep
>going backward until the ultimate cause is found.<br /><br />These are the pengine logs from before the first fencing:<br /><br />Aug 31 10:58:58 [30615] node1 pengine: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores/hacluster<br />Aug 31 10:58:58 [30615] node1 pengine: info: qb_ipcs_us_publish: server name: pengine<br />Aug 31 10:58:58 [30615] node1 pengine: info: main: Starting pengine<br />Aug 31 10:59:20 [30615] node1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore<br />Aug 31 10:59:20 [30615] node1 pengine: info: determine_online_status_fencing: Node node1 is active<br />Aug 31 10:59:20 [30615] node1 pengine: info: determine_online_status: Node node1 is online<br />Aug 31 10:59:20 [30615] node1 pengine: info: clone_print: Clone Set: fencing [st-fence_propio]<br />Aug 31 10:59:20 [30615] node1 pengine: info: short_print: Stopped: [ node1 node2 ]<br />Aug 31 10:59:20 [30615] node1 pengine: info: clone_print: Master/Slave Set: ms_drbd_databasestorage [p_drbd_databasestorage]<br />Aug 31 10:59:20 [30615] node1 pengine: info: short_print: Stopped: [ node1 node2 ]<br />Aug 31 10:59:20 [30615] node1 pengine: info: clone_print: Master/Slave Set: ms_drbd_datoswebstorage [p_drbd_datoswebstorage]<br />Aug 31 10:59:20 [30615] node1 pengine: info: short_print: Stopped: [ node1 node2 ]<br />Aug 31 10:59:20 [30615] node1 pengine: info: group_print: Resource Group: rg_database<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_fs_database (ocf::heartbeat:Filesystem): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_ip_databasestorageip (ocf::heartbeat:IPaddr): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_ip_pub_database (ocf::heartbeat:IPaddr): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_moverip_database (lsb:moverip_database): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: servicio_enviamailpacemakerdatabase (lsb:enviamailpacemakerdatabase): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: group_print: Resource Group: rg_datosweb<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_fs_datosweb (ocf::heartbeat:Filesystem): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_ip_datoswebstorageip (ocf::heartbeat:IPaddr): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_ip_pub_datosweb (ocf::heartbeat:IPaddr): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: p_moverip_datosweb (lsb:moverip_datosweb): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_print: servicio_enviamailpacemakerdatosweb (lsb:enviamailpacemakerdatosweb): Stopped<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource st-fence_propio:1 cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_drbd_databasestorage:1 cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: master_color: ms_drbd_databasestorage: Promoted 0 instances of a possible 1 to master<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_drbd_datoswebstorage:1 cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: master_color: ms_drbd_datoswebstorage: Promoted 0 instances of a possible 1 to master<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_fs_database: Rolling back scores from p_ip_databasestorageip<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_fs_database cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_ip_databasestorageip: Rolling back scores from p_ip_pub_database<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_ip_databasestorageip cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_ip_pub_database: Rolling back scores from p_moverip_database<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_ip_pub_database cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_moverip_database: Rolling back scores from servicio_enviamailpacemakerdatabase<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_moverip_database cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource servicio_enviamailpacemakerdatabase cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_fs_datosweb: Rolling back scores from p_ip_datoswebstorageip<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_fs_datosweb cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_ip_datoswebstorageip: Rolling back scores from p_ip_pub_datosweb<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_ip_datoswebstorageip cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_ip_pub_datosweb: Rolling back scores from p_moverip_datosweb<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_ip_pub_datosweb cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: rsc_merge_weights: p_moverip_datosweb: Rolling back scores from servicio_enviamailpacemakerdatosweb<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource p_moverip_datosweb cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: native_color: Resource servicio_enviamailpacemakerdatosweb cannot run anywhere<br />Aug 31 10:59:20 [30615] node1 pengine: info: RecurringOp: Start recurring monitor (31s) for p_drbd_databasestorage:0 on node1<br />Aug 31 10:59:20 [30615] node1 pengine: info: RecurringOp: Start recurring monitor (31s) for p_drbd_databasestorage:0 on node1<br />Aug 31 10:59:20 [30615] node1 pengine: info: RecurringOp: Start recurring monitor (31s) for p_drbd_datoswebstorage:0 on node1<br />Aug 31 10:59:20 [30615] node1 pengine: info: RecurringOp: Start recurring monitor (31s) for p_drbd_datoswebstorage:0 on node1<br />Aug 31 10:59:20 [30615] node1 pengine: warning: stage6: Scheduling Node node2 for STONITH<br /><br /><br /><br />Any more clues?<br />Thanks<br />Cesar<br /><br /><br /></pre>
</body></html>