[Pacemaker] Reason for cluster resource migration

Fri Dec 7 02:36:27 UTC 2012

On Wed, Dec 5, 2012 at 8:29 AM, Andrew Martin <amartin at xes-inc.com> wrote:
> Hello,
>
> I am running a 3-node Pacemaker cluster (2 "real" nodes and 1 quorum node in
> standby) on Ubuntu 12.04 server (amd64) with Pacemaker 1.1.8 and Corosync
> 2.1.0. My cluster configuration is:
> http://pastebin.com/6TPkWtbt
>
> Recently, pengine died on storage0 (where the resources were running) which
> also happened to be the DC at the time. Consequently, Pacemaker went into
> recovery mode and released its role as DC, at which point storage1 took over
> the DC role and migrated the resources away from storage0 and onto storage1.
> Looking through the logs, it seems like storage0 came back into the cluster
> before the migration of the resources began:
> Dec 03 08:31:20 [3165] storage1       crmd:     info: peer_update_callback:
> Client storage0/peer now has status [online] (DC=true)
> ...
> Dec 03 08:31:20 [3164] storage1    pengine:   notice: LogActions:
> Start   rscXXX    (storage1)
>
> Thus, why did the migration occur, rather than aborting and having the
> resources simply remain running on storage0? Here are the logs from each of
> the nodes:
> storage0: http://pastebin.com/ZqqnH9uf
> storage1: http://pastebin.com/rvSLVcZs

Hmm, that's an interesting one.
Can you provide this file? It will hold the answer:

Dec 03 08:31:31 [3164] storage1    pengine:   notice:
process_pe_message: 	Calculated Transition 1:
/var/lib/pacemaker/pengine/pe-input-28.bz2
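
In case it helps with locating and opening that file, here is a minimal sketch in Python. It assumes the pengine message always has the form "Calculated Transition N: <path>", which is inferred only from the one log line above. The function names find_pe_inputs and read_pe_input are hypothetical helpers and not part of Pacemaker.

```python
import bz2
import re

# Pattern for the "Calculated Transition N: <path>" message logged by pengine.
# Assumption: the format is inferred from the single log line quoted above.
TRANSITION_RE = re.compile(r"Calculated Transition (\d+):\s*(\S+)")


def find_pe_inputs(log_text):
    """Return (transition_number, path) pairs found in the log text."""
    # The log lines in this thread are wrapped by the mail client, so
    # collapse all runs of whitespace (including newlines) before matching.
    flat = " ".join(log_text.split())
    return [(int(num), path) for num, path in TRANSITION_RE.findall(flat)]


def read_pe_input(path):
    """Decompress a pe-input .bz2 file and return its contents as text."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    sample = (
        "Dec 03 08:31:31 [3164] storage1    pengine:   notice:\n"
        "process_pe_message: \tCalculated Transition 1:\n"
        "/var/lib/pacemaker/pengine/pe-input-28.bz2\n"
    )
    for number, path in find_pe_inputs(sample):
        print(number, path)
```

Calling read_pe_input on the reported path (on storage1, where the file was written) should give the XML that the policy engine used as its input. The same file can also be replayed with crm_simulate, for example with its --xml-file and --simulate options, to see which actions were scheduled.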

>
> Thanks,
>
> Andrew