[ClusterLabs] Antw: Re: Problem with stonith and starting services
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Jul 17 02:02:24 EDT 2017
Hi!
Could this mean the stonith-timeout is significantly larger than the time for a complete reboot? If so, the fenced node would already be up again by the time the cluster thinks the fencing has just completed.
Regards,
Ulrich
P.S.: Sorry for the late reply; I was offline for a while...
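One way to check this hypothesis is to compare the configured timeout against the node's actual reboot duration. A minimal sketch, assuming a crmsh-based setup (the 60s value below is only illustrative, not taken from the poster's configuration):

```shell
# Show the cluster-wide stonith-timeout property (Pacemaker's default is 60s)
crm configure show | grep -i stonith-timeout

# Compare that against how long node2 actually takes to reboot, e.g. from
# the syslog timestamps between shutdown and "Corosync Cluster Engine ...
# started". If the reboot is well below stonith-timeout, the node can be
# back and rejoining before the fencing result is processed.

# Illustrative only: set the property explicitly, keeping it above the
# fence device's worst-case completion time
crm configure property stonith-timeout=60s
```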
>>> Cesar Hernandez <c.hernandez at medlabmg.com> wrote on 06.07.2017 at 16:20 in
message <0674AEED-8FD2-4DAB-A27F-498DB0F36C6D at medlabmg.com>:
>>
>> If node2 is getting the notification of its own fencing, it wasn't
>> successfully fenced. Successful fencing would render it incapacitated
>> (powered down, or at least cut off from the network and any shared
>> resources).
>
>
> Maybe I don't understand you, or maybe you don't understand me... ;)
> This is the syslog of the machine, where you can see that it has
> rebooted successfully, and as I said, it has rebooted successfully every
> time:
>
> Jul 5 10:41:54 node2 kernel: [ 0.000000] Initializing cgroup subsys
> cpuset
> Jul 5 10:41:54 node2 kernel: [ 0.000000] Initializing cgroup subsys cpu
> Jul 5 10:41:54 node2 kernel: [ 0.000000] Initializing cgroup subsys
> cpuacct
> Jul 5 10:41:54 node2 kernel: [ 0.000000] Linux version 3.16.0-4-amd64
> (debian-kernel at lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP
> Debian 3.16.39-1 (2016-12-30)
> Jul 5 10:41:54 node2 kernel: [ 0.000000] Command line:
> BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64
> root=UUID=711e1ec2-2a36-4405-bf46-44b43cfee42e ro init=/bin/systemd
> console=ttyS0 console=hvc0
> Jul 5 10:41:54 node2 kernel: [ 0.000000] e820: BIOS-provided physical RAM
> map:
> Jul 5 10:41:54 node2 kernel: [ 0.000000] BIOS-e820: [mem
> 0x0000000000000000-0x000000000009dfff] usable
> Jul 5 10:41:54 node2 kernel: [ 0.000000] BIOS-e820: [mem
> 0x000000000009e000-0x000000000009ffff] reserved
> Jul 5 10:41:54 node2 kernel: [ 0.000000] BIOS-e820: [mem
> 0x00000000000e0000-0x00000000000fffff] reserved
> Jul 5 10:41:54 node2 kernel: [ 0.000000] BIOS-e820: [mem
> 0x0000000000100000-0x000000003fffffff] usable
> Jul 5 10:41:54 node2 kernel: [ 0.000000] BIOS-e820: [mem
> 0x00000000fc000000-0x00000000ffffffff] reserved
> Jul 5 10:41:54 node2 kernel: [ 0.000000] NX (Execute Disable)
> protection: active
> Jul 5 10:41:54 node2 kernel: [ 0.000000] SMBIOS 2.4 present.
>
> ...
>
> Jul 5 10:41:54 node2 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port
> 67
>
> ...
>
> Jul 5 10:41:54 node2 corosync[585]: [MAIN ] Corosync Cluster Engine
> ('UNKNOWN'): started and ready to provide service.
> Jul 5 10:41:54 node2 corosync[585]: [MAIN ] Corosync built-in features:
> nss
> Jul 5 10:41:54 node2 corosync[585]: [MAIN ] Successfully read main
> configuration file '/etc/corosync/corosync.conf'.
>
> ...
>
> Jul 5 10:41:57 node2 crmd[608]: notice: Defaulting to uname -n for the
> local classic openais (with plugin) node name
> Jul 5 10:41:57 node2 crmd[608]: notice: Membership 4308: quorum acquired
> Jul 5 10:41:57 node2 crmd[608]: notice: plugin_handle_membership: Node
> node2[1108352940] - state is now member (was (null))
> Jul 5 10:41:57 node2 crmd[608]: notice: plugin_handle_membership: Node
> node11[794540] - state is now member (was (null))
> Jul 5 10:41:57 node2 crmd[608]: notice: The local CRM is operational
> Jul 5 10:41:57 node2 crmd[608]: notice: State transition S_STARTING ->
> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: Watching for stonith
> topology changes
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: Membership 4308: quorum
> acquired
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: plugin_handle_membership:
> Node node11[794540] - state is now member (was (null))
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: On loss of CCM Quorum:
> Ignore
> Jul 5 10:41:58 node2 stonith-ng[604]: notice: Added 'st-fence_propio:0' to
> the device list (1 active devices)
> Jul 5 10:41:59 node2 stonith-ng[604]: notice: Operation reboot of node2 by
> node11 for crmd.2141 at node11.61c3e613: OK
> Jul 5 10:41:59 node2 crmd[608]: crit: We were allegedly just fenced by
> node11 for node11!
> Jul 5 10:41:59 node2 corosync[585]: [pcmk ] info: pcmk_ipc_exit: Client
> crmd (conn=0x228d970, async-conn=0x228d970) left
> Jul 5 10:41:59 node2 pacemakerd[597]: warning: The crmd process (608) can
> no longer be respawned, shutting the cluster down.
> Jul 5 10:41:59 node2 pacemakerd[597]: notice: Shutting down Pacemaker
> Jul 5 10:41:59 node2 pacemakerd[597]: notice: Stopping pengine: Sent -15
> to process 607
> Jul 5 10:41:59 node2 pengine[607]: notice: Invoking handler for signal
> 15: Terminated
> Jul 5 10:41:59 node2 pacemakerd[597]: notice: Stopping attrd: Sent -15 to
> process 606
> Jul 5 10:41:59 node2 attrd[606]: notice: Invoking handler for signal 15:
> Terminated
> Jul 5 10:41:59 node2 attrd[606]: notice: Exiting...
> Jul 5 10:41:59 node2 corosync[585]: [pcmk ] info: pcmk_ipc_exit: Client
> attrd (conn=0x2280ef0, async-conn=0x2280ef0) left
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org