[ClusterLabs] corosync doesn't start any resource

Ken Gaillot kgaillot at redhat.com
Tue Jun 19 10:39:27 EDT 2018


On Tue, 2018-06-19 at 16:17 +0200, Stefan Krueger wrote:
> Hi Ken,
> 
> thanks for help!
> I created a stonith device and deleted the no-quorum-policy.
> 
> That didn't change anything, so I deleted the orders, (co)locations and
> one resource (nfs-server). At first it worked fine, but when I stop the
> cluster via 'pcs cluster stop' it takes forever; it looks like it has a
> problem with the nfs server, so I tried to stop it manually via
> systemctl stop nfs-server, but that didn't change anything - the
> nfs-server won't stop. So I reset the server; now everything should
> move to the other node, but that also didn't happen :(
> 
> Manually I can start/stop the nfs-server without any problems (nobody
> has mounted the nfs-share yet):
> systemctl start nfs-server.service ; sleep 5; systemctl status nfs-
> server.service ; sleep 5; systemctl stop nfs-server
> 
> So, again, my resources won't start:
> pcs status
> Cluster name: zfs-vmstorage
> Stack: corosync
> Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Tue Jun 19 16:15:37 2018
> Last change: Tue Jun 19 15:41:24 2018 by hacluster via crmd on zfs-serv4
> 
> 2 nodes configured
> 5 resources configured
> 
> Online: [ zfs-serv3 zfs-serv4 ]
> 
> Full list of resources:
> 
>  vm_storage     (ocf::heartbeat:ZFS):   Stopped
>  ha-ip  (ocf::heartbeat:IPaddr2):       Stopped
>  resIPMI-zfs4   (stonith:external/ipmi):        Started zfs-serv3
>  resIPMI-zfs3   (stonith:external/ipmi):        Started zfs-serv4
>  nfs-server     (systemd:nfs-server):   Stopped

I'd check the logs for more information. It's odd that status doesn't
show any failures, which suggests the cluster didn't schedule any
actions.

The system log will have the most essential information. The detail log
(usually /var/log/pacemaker.log or /var/log/cluster/corosync.log) will
have extended information. The most interesting will be messages from
the pengine with actions to be scheduled ("Start", etc.). Then there
should be messages from the crmd about "Initiating" the command and
obtaining its "Result".
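
For example, something like this should pull the relevant messages out of
the detail log (adjust the path to whichever of the two files you have):

  # actions the policy engine decided to schedule
  grep pengine /var/log/cluster/corosync.log | grep -E 'Start|Stop|Move|Recover'

  # crmd initiating those actions and getting their results
  grep crmd /var/log/cluster/corosync.log | grep -E 'Initiating|Result'

You can also ask the scheduler directly what it would do right now with
crm_simulate -sL (not something mentioned above, just a standard Pacemaker
tool).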

> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> 
> 
> pcs config
> Cluster Name: zfs-vmstorage
> Corosync Nodes:
>  zfs-serv3 zfs-serv4
> Pacemaker Nodes:
>  zfs-serv3 zfs-serv4
> 
> Resources:
>  Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
>   Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
>   Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
>               start interval=0s timeout=90 (vm_storage-start-interval-0s)
>               stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
>  Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=172.16.101.73 cidr_netmask=16
>   Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
>               stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
>               monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
>  Resource: nfs-server (class=systemd type=nfs-server)
>   Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
>               stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
>               monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)
> 
> Stonith Devices:
>  Resource: resIPMI-zfs4 (class=stonith type=external/ipmi)
>   Attributes: hostname=ipmi-zfs-serv4 ipaddr=172.xx.xx.17 userid=USER passwd=GEHEIM interface=lan
>   Operations: monitor interval=60s (resIPMI-zfs4-monitor-interval-60s)
>  Resource: resIPMI-zfs3 (class=stonith type=external/ipmi)
>   Attributes: hostname=ipmi-zfs-serv3 ipaddr=172.xx.xx.16 userid=USER passwd=GEHEIM interface=lan
>   Operations: monitor interval=60s (resIPMI-zfs3-monitor-interval-60s)
> Fencing Levels:
> 
> Location Constraints:
>   Resource: resIPMI-zfs3
>     Disabled on: zfs-serv3 (score:-INFINITY) (id:location-resIPMI-zfs3-zfs-serv3--INFINITY)
>   Resource: resIPMI-zfs4
>     Disabled on: zfs-serv4 (score:-INFINITY) (id:location-resIPMI-zfs4-zfs-serv4--INFINITY)
> Ordering Constraints:
>   Resource Sets:
>     set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
>     set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
> Colocation Constraints:
>   Resource Sets:
>     set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)

I don't think your constraints are causing problems, but sets can be
difficult to follow. Your ordering/colocation constraints could be more
simply expressed as a group of nfs-server vm_storage ha-ip. With a
group, the cluster will do both ordering and colocation, in forward
order for start, and reverse order for stop.
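
A sketch of what that could look like with pcs (the constraint IDs are the
ones shown in your config above; the group name "ha-nfs" is just an
example):

  # drop the set-based order and colocation constraints
  pcs constraint remove pcs_rsc_order_set_nfs-server_vm_storage_ha-ip
  pcs constraint remove pcs_rsc_order_set_ha-ip_nfs-server_vm_storage
  pcs constraint remove colocation-ha-ip-nfs-server-INFINITY
  # replace them with a group: members start in this order, stop in reverse
  pcs resource group add ha-nfs nfs-server vm_storage ha-ip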

> Ticket Constraints:
> 
> Alerts:
>  No alerts defined
> 
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: zfs-vmstorage
>  dc-version: 1.1.16-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1528814481
>  no-quorum-policy: stop
>  stonith-enabled: false

^^^ You have to explicitly set stonith-enabled to true since it was set
to false earlier
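
Something like this should do it:

  pcs property set stonith-enabled=true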

BTW, IPMI is a good fencing method, but it has a problem if it's
on-board: if the host loses power entirely, IPMI will not respond, the
fencing will fail, and the cluster will be unable to recover. On-board
IPMI requires a back-up method such as an intelligent power switch or
sbd.
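
If you do add a second device later, fencing levels can chain the two,
e.g. (a sketch; "fence-backup-serv3" stands in for whatever backup device
you configure):

  pcs stonith level add 1 zfs-serv3 resIPMI-zfs3
  pcs stonith level add 2 zfs-serv3 fence-backup-serv3

The cluster then tries IPMI first and only falls back to level 2 if that
fails.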

> 
> Quorum:
>   Options:
> 
> 
> 
> thanks for help!
> best regards
> Stefan
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.
> pdf
> Bugs: http://bugs.clusterlabs.org
-- 
Ken Gaillot <kgaillot at redhat.com>


