[Pacemaker] Very strange behavior on asymmetric cluster

Wed Mar 16 07:29:49 UTC 2011

2011/3/15 Arthur B. Olsen <ABO at ft.fo>:
>
>
> On 3/14/11 8:33 PM, "Pavel Levshin" <pavel at levshin.spb.ru> wrote:
>
>>
>>14.03.2011 23:07, Arthur B. Olsen:
>>> If a mysql server is running on a cluster node which is not defined to
>>> run the mysql resource, pacemaker will mark it unmanaged and will not
>>> start it on the node which it is supposed to run on. The same goes for
>>> nfs-common. On my nfs servers the nfs-common and nfs-kernel-server
>>> resources should be running, and all other nodes have nfs-common
>>> installed. So pacemaker will just pick one random node, mark the
>>> nfs-common resource as running unmanaged, and will not start it where I
>>> specifically told it to run.
>>>
>>> Likewise, I cannot have two drbd raids on different pairs of nodes with
>>> the same name. My two nfs servers have a drbd raid between them and my
>>> mysql servers have a drbd raid running between them. Both had their
>>> resources called r0, and pacemaker picked one to be slave and one to be
>>> master, completely disregarding my location rules. Same with the mount
>>> point. I can't use the same folder name on both the sql and nfs servers
>>> to mount the drbd0 disk in, because pacemaker will consider it mounted
>>> and not try to mount the second. Changing the names of the resources
>>> and the mount point solved the drbd issues.
>>>
>>> Right now a mysql process is running as a test on a web server in the
>>> cluster, and pacemaker will not start it on my sql servers; same for
>>> my nfs servers.
>>>
>>> What I don't understand is why pacemaker is trying to monitor services
>>> on nodes that are not supposed to run them, and why it stops the
>>> service on the nodes that are supposed to run it.
>>
>>Resources become unmanaged because you have fencing disabled and the
>>resource agent fails to "monitor" and "stop" on some node where it is not
>>needed at all.
>>
>>You have no way to tell the cluster that it is not supposed to run a
>>service on some nodes. I believe this is a deficiency in pacemaker.
>
> I thought that symmetric-cluster=false was exactly that.

symmetric-cluster=false means it can't run anywhere unless you
explicitly say it can.
However, Pacemaker will still check the status of _all_ resources on
_all_ nodes to make sure this is true.
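
For readers following along, an opt-in (asymmetric) setup might look
roughly like the crm shell commands below. This is only a sketch: the
resource name p_mysql and the node names sql1 and sql2 are hypothetical
and do not come from the original poster's configuration.

```shell
# Nothing may run anywhere unless a location constraint allows it.
crm configure property symmetric-cluster=false

# Hypothetical names: allow p_mysql on sql1 (preferred) and sql2.
crm configure location loc-mysql-sql1 p_mysql 200: sql1
crm configure location loc-mysql-sql2 p_mysql 100: sql2
```

Even with these constraints in place, the resource is still probed on
every node, as described above.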

>
>>
>>Currently, you have two ways:
>>
>>1. Delete resource agents from those servers which are not supposed to
>>run it.
>>
>>2. Or make sure those "unused" resource agents return 5 "not installed"
>>for monitor action. If they return anything else, you have your trouble.
>>
>>You may also divide your cluster into two or three independent clusters,
>>one per resource group.
>
> And so I did. I just made it two independent clusters, one for mysql and
> one for nfs.
>
> Somewhat disappointed that it didn't work. It seems like a better design
> to have everything in one cluster, for quorum and to share failover for
> some resources like dns.
>
> I still don't understand why it monitors services by default on all nodes
> when the cluster is asymmetric.

Because we don't just make assumptions about what the state of the
cluster is - we verify them.
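
To illustrate the return code 5 ("not installed") that Pavel mentions
earlier in the thread, here is a minimal sketch of a monitor function in
the style of an OCF resource agent. It is not the real mysql agent: the
function name and the MYSQLD_BINARY and MYSQLD_PIDFILE variables are
assumptions made for this example, and a real agent would take its
settings from OCF_RESKEY_* parameters.

```shell
#!/bin/sh
# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical defaults for this sketch only.
: "${MYSQLD_BINARY:=/usr/sbin/mysqld}"
: "${MYSQLD_PIDFILE:=/var/run/mysqld/mysqld.pid}"

mysql_monitor() {
    # Software is absent on this node: report "not installed" (5).
    if [ ! -x "$MYSQLD_BINARY" ]; then
        return $OCF_ERR_INSTALLED
    fi
    # Installed, but no pid file or no live process: "not running" (7).
    if [ ! -f "$MYSQLD_PIDFILE" ] ||
       ! kill -0 "$(cat "$MYSQLD_PIDFILE")" 2>/dev/null; then
        return $OCF_NOT_RUNNING
    fi
    # A process with the recorded pid exists: report success (0).
    return $OCF_SUCCESS
}
```

Note that if the service really is running on a node where it should not
be (such as the test mysql on the web server), a monitor like this
returns 0 there, and the cluster will see the resource as active on that
node.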