[ClusterLabs] Pacemaker shutting down peer node

Thu Jun 15 14:53:00 EDT 2017

On 06/15/2017 12:38 AM, Jaz Khan wrote:
> Hi,
> 
> I have been encountering this serious issue for the past couple of months.
> I really have no idea why Pacemaker sends a shutdown signal to the peer
> node and it goes down. This is very strange and I am very worried.
> 
> This does not happen daily, but the same behavior does recur every
> few days.
> 
> Version:
> Pacemaker 1.1.16
> Corosync 2.4.2
> 
> Please help me out with this bug! Below is the log message.
> 
> 
> 
> Jun 14 15:52:23 apex1 crmd[18733]:  notice: State transition S_IDLE ->
> S_POLICY_ENGINE
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: On loss of CCM Quorum: Ignore
> 
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: Scheduling Node ha-apex2
> for shutdown

This is a clean shutdown, not fencing. Normally this only happens
in response to a user request.

Check the logs on both nodes before this point, to try to see what was
the first indication that it would shut down.
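
One way to do that search is sketched below. It is only a rough filter,
not a Pacemaker tool: the function name shutdown_hints, the keyword list,
and the "Scheduling Node" stop marker are all assumptions chosen for this
example, and the keywords may not match what your versions actually log.
It prints keyword-matching lines that appear before the first line
containing the stop marker.

```python
import re

# Keywords are guesses at what a shutdown request might look like in syslog;
# adjust them to the messages your Pacemaker/Corosync versions really emit.
HINT_PATTERN = re.compile(r"shutdown|shutting down|terminat|signal", re.IGNORECASE)


def shutdown_hints(lines, stop_marker="Scheduling Node"):
    """Return keyword-matching lines seen before the first line containing stop_marker.

    If stop_marker never appears (for example on the node that was shut down,
    which did not log the scheduling decision), every line is scanned.
    """
    hints = []
    for line in lines:
        if stop_marker in line:
            break  # stop at the scheduling decision; later lines are not "before this point"
        if HINT_PATTERN.search(line):
            hints.append(line.rstrip("\n"))
    return hints
```

To use it, open the system log on each node (the path depends on your
distribution) and pass the file object to shutdown_hints(), then compare
the earliest matches from the two nodes by timestamp.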

> 
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move    vip#011(Started
> ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move  
>  filesystem#011(Started ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move    samba#011(Started
> ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: Move  
>  database#011(Started ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]:  notice: Calculated transition
> 1744, saving inputs in /var/lib/pacemaker/pengine/pe-input-123.bz2
> Jun 14 15:52:23 apex1 crmd[18733]:  notice: Initiating stop operation
> vip_stop_0 on ha-apex2
> Jun 14 15:52:23 apex1 crmd[18733]:  notice: Initiating stop operation
> samba_stop_0 on ha-apex2
> Jun 14 15:52:23 apex1 crmd[18733]:  notice: Initiating stop operation
> database_stop_0 on ha-apex2
> Jun 14 15:52:26 apex1 crmd[18733]:  notice: Initiating stop operation
> filesystem_stop_0 on ha-apex2
> Jun 14 15:52:27 apex1 kernel: drbd apexdata apex2.br:
> peer( Primary -> Secondary )
> Jun 14 15:52:27 apex1 crmd[18733]:  notice: Initiating start operation
> filesystem_start_0 locally on ha-apex1
> 
> Jun 14 15:52:27 apex1 crmd[18733]:  notice: do_shutdown of peer ha-apex2
> is complete
> 
> Jun 14 15:52:27 apex1 attrd[18731]:  notice: Node ha-apex2 state is now lost
> Jun 14 15:52:27 apex1 attrd[18731]:  notice: Removing all ha-apex2
> attributes for peer loss
> Jun 14 15:52:27 apex1 attrd[18731]:  notice: Lost attribute writer ha-apex2
> Jun 14 15:52:27 apex1 attrd[18731]:  notice: Purged 1 peers with id=2
> and/or uname=ha-apex2 from the membership cache
> Jun 14 15:52:27 apex1 stonith-ng[18729]:  notice: Node ha-apex2 state is
> now lost
> Jun 14 15:52:27 apex1 stonith-ng[18729]:  notice: Purged 1 peers with
> id=2 and/or uname=ha-apex2 from the membership cache
> Jun 14 15:52:27 apex1 cib[18728]:  notice: Node ha-apex2 state is now lost
> Jun 14 15:52:27 apex1 cib[18728]:  notice: Purged 1 peers with id=2
> and/or uname=ha-apex2 from the membership cache
> 
> 
> 
> Best regards,
> Jaz. K