[ClusterLabs] [Question] About the behavior of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Thu Mar 12 06:00:21 UTC 2015
Hi All,
We have been testing the behavior of pacemaker_remote (with stonith disabled).
We have two questions.
* Question 1: The pacemaker_remote connection does not recover after a failure. Is this the correct behavior?
- Step1 - Start the cluster.
-----------------------
[root at sl7-01 ~]# crm_mon -1 -Af
Last updated: Thu Mar 12 14:25:05 2015
Last change: Thu Mar 12 14:24:31 2015
Stack: corosync
Current DC: sl7-01 (2130706433) - partition WITHOUT quorum
Version: 1.1.12-ce09802
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
Remote-rsc2 (ocf::heartbeat:Dummy): Started snmp2
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): Started sl7-01
-----------------------
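(The configuration behind this output is not shown above; a minimal sketch of what we are using is roughly the following, in crm shell syntax. Stonith disabled, migration-threshold=1 and the 3-second monitor match the status above; the constraint IDs and the server= parameters are filled in for illustration and the real definitions may differ in detail.)
-----------------------
# Minimal sketch of the assumed configuration (crm shell syntax).
crm configure property stonith-enabled=false
crm configure rsc_defaults migration-threshold=1
crm configure primitive snmp1 ocf:pacemaker:remote params server=snmp1 op monitor interval=3s
crm configure primitive snmp2 ocf:pacemaker:remote params server=snmp2 op monitor interval=3s
crm configure primitive Host-rsc1 ocf:heartbeat:Dummy
crm configure primitive Remote-rsc1 ocf:heartbeat:Dummy
crm configure primitive Remote-rsc2 ocf:heartbeat:Dummy
# Illustrative placement constraints so the resources land as shown above.
crm configure location loc-host-rsc1 Host-rsc1 inf: sl7-01
crm configure location loc-remote-rsc1 Remote-rsc1 inf: snmp1
crm configure location loc-remote-rsc2 Remote-rsc2 inf: snmp2
-----------------------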
- Step2 - Make pacemaker_remote fail.
-----------------------
[root at snmp2 ~]# /usr/sbin/pacemaker_remoted &
[1] 24202
[root at snmp2 ~]# kill -TERM 24202
[root at sl7-01 ~]# crm_mon -1 -Af
Last updated: Thu Mar 12 14:25:55 2015
Last change: Thu Mar 12 14:24:31 2015
Stack: corosync
Current DC: sl7-01 (2130706433) - partition WITHOUT quorum
Version: 1.1.12-ce09802
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 ]
RemoteOFFLINE: [ snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): FAILED sl7-01
Migration summary:
* Node sl7-01:
snmp2: migration-threshold=1 fail-count=1 last-failure='Thu Mar 12 14:25:40 2015'
* Node snmp1:
Failed actions:
snmp2_monitor_3000 on sl7-01 'unknown error' (1): call=6, status=Error, exit-reason='none', last-rc-change='Thu Mar 12 14:25:40 2015', queued=0ms, exec=0ms
-----------------------
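(Since pacemaker_remoted was started by hand here rather than via systemd, a quick check like the following, run on the remote host, confirms that no daemon process is left before looking at crm_mon. This is only an illustrative check, not part of the procedure.)
-----------------------
# On snmp2: confirm that pacemaker_remoted is no longer running.
ps -ef | grep pacemaker_remoted | grep -v grep
# (no output means the daemon is gone)
-----------------------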
- Step3 - Restart pacemaker_remote and clean up the connection resource (crm_resource -C), but the node remains offline.
-----------------------
[root at snmp2 ~]# /usr/sbin/pacemaker_remoted &
[2] 24248
[root at sl7-01 ~]# crm_resource -C -r snmp2
Cleaning up snmp2 on sl7-01
Cleaning up snmp2 on snmp1
Waiting for 1 replies from the CRMd. OK
[root at sl7-01 ~]# crm_mon -1 -Af
Last updated: Thu Mar 12 14:26:46 2015
Last change: Thu Mar 12 14:26:26 2015
Stack: corosync
Current DC: sl7-01 (2130706433) - partition WITHOUT quorum
Version: 1.1.12-ce09802
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 ]
RemoteOFFLINE: [ snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): FAILED sl7-01
Migration summary:
* Node sl7-01:
snmp2: migration-threshold=1 fail-count=1000000 last-failure='Thu Mar 12 14:26:44 2015'
* Node snmp1:
Failed actions:
snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out, exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015', queued=0ms, exec=0ms
snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out, exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015', queued=0ms, exec=0ms
-----------------------
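(For reference, the additional recovery steps we have in mind after this point are roughly the following. We are not sure whether this is the intended procedure, and the reconnect_interval parameter is only usable if the ocf:pacemaker:remote agent in this build provides it.)
-----------------------
# Sketch only - not confirmed in this environment.
# 1. Clear the INFINITY fail-count and retry the connection start.
crm resource failcount snmp2 delete sl7-01
crm resource cleanup snmp2
# 2. Check whether the agent offers automatic reconnection, and if so
#    add it to the connection resource (the value is only an example).
crm_resource --show-metadata ocf:pacemaker:remote | grep reconnect_interval
crm configure edit snmp2    # add: params reconnect_interval=60s
-----------------------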
* Question 2: When pacemaker_remote fails in a configuration where stonith is disabled, is the following procedure the right way to move the resource? Is there a procedure that moves it without deleting the node?
- Step1 - Start the cluster.
-----------------------
[root at sl7-01 ~]# crm_mon -1 -Af
Last updated: Thu Mar 12 14:30:27 2015
Last change: Thu Mar 12 14:29:14 2015
Stack: corosync
Current DC: sl7-01 (2130706433) - partition WITHOUT quorum
Version: 1.1.12-ce09802
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
Remote-rsc2 (ocf::heartbeat:Dummy): Started snmp2
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): Started sl7-01
-----------------------
- Step2 - Make pacemaker_remote fail.
-----------------------
[root at snmp2 ~]# kill -TERM 24248
[root at sl7-01 ~]# crm_mon -1 -Af
Last updated: Thu Mar 12 14:31:59 2015
Last change: Thu Mar 12 14:29:14 2015
Stack: corosync
Current DC: sl7-01 (2130706433) - partition WITHOUT quorum
Version: 1.1.12-ce09802
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 ]
RemoteOFFLINE: [ snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): FAILED sl7-01
Migration summary:
* Node sl7-01:
snmp2: migration-threshold=1 fail-count=1 last-failure='Thu Mar 12 14:31:42 2015'
* Node snmp1:
Failed actions:
snmp2_monitor_3000 on sl7-01 'unknown error' (1): call=6, status=Error, exit-reason='none', last-rc-change='Thu Mar 12 14:31:42 2015', queued=0ms, exec=0ms
-----------------------
- Step3 - Delete the failed node. The resource then moves.
-----------------------
[root at sl7-01 ~]# crm
crm(live)# node
crm(live)node# delete snmp2
INFO: node snmp2 deleted
[root at sl7-01 ~]# crm_mon -1 -Af
Last updated: Thu Mar 12 14:35:00 2015
Last change: Thu Mar 12 14:34:20 2015
Stack: corosync
Current DC: sl7-01 (2130706433) - partition WITHOUT quorum
Version: 1.1.12-ce09802
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 ]
RemoteOFFLINE: [ snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
Remote-rsc2 (ocf::heartbeat:Dummy): Started snmp1
snmp1 (ocf::pacemaker:remote): Started sl7-01
Migration summary:
* Node sl7-01:
snmp2: migration-threshold=1 fail-count=1 last-failure='Thu Mar 12 14:51:44 2015'
* Node snmp1:
Failed actions:
snmp2_monitor_3000 on sl7-01 'unknown error' (1): call=6, status=Error, exit-reason='none', last-rc-change='Thu Mar 12 14:51:44 2015', queued=0ms, exec=0ms
-----------------------
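(For reference, the kind of non-destructive alternative we are looking for is something like the following, i.e. recovering Remote-rsc2 without removing the snmp2 node definition. We have not been able to confirm whether either variant actually releases the resource while stonith is disabled, and the constraint created by --ban would later need to be removed with crm_resource --clear.)
-----------------------
# Sketch only - alternatives to deleting the node, not confirmed to
# release Remote-rsc2 while stonith is disabled.
# a) Clean up and explicitly stop the failed connection resource:
crm resource cleanup snmp2
crm resource stop snmp2
# b) Or keep the node defined and ban the dependent resource from it:
crm_resource --ban -r Remote-rsc2 --node snmp2
-----------------------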
Best Regards,
Hideo Yamauchi.