[Pacemaker] Resource Group Scoring - failover node showing -1000000

Thu Aug 11 08:43:24 EDT 2011

I'm having a hard time trying to understand the scoring that is displayed
for a Resource group.

I'm trying to set up a resource group with two resources (an LVM and a
LUN) that runs in Active/Passive mode and only ever attempts to run on two
nodes: the primary, s02ns070, and the secondary, s02ns090.

Everything appears to work correctly on s02ns070 (the primary node), except
that the group_color and native_color scores for the LUN (resMDT0000) are
not what I expect; see the snip below. The resource does run there, though.
On the backup node, s02ns090, the native_color score looks wrong for both
the LVM and the LUN.  It currently shows -1000000, which is why the group is
not failing over.

I'm trying to figure out where the -1000000 score is coming from.

Here are the relevant portions of my config file:

<snip>
primitive resMDT0000 ocf:heartbeat:Filesystem \
    meta target-role="Started" \
    operations $id="resMDT0000-operations" \
    op monitor interval="120" timeout="60" \
    op start interval="0" timeout="300" \
    op stop interval="0" timeout="300" \
    params device="/dev/mapper/dsdw_mdt_vg-dsdw_mdt_vol" \
        directory="/lustre/dsdw-MDT0000" fstype="lustre"
primitive resMDTLVM ocf:heartbeat:LVM \
    params volgrpname="dsdw_mdt_vg"
group MDSgroup resMDTLVM resMDT0000
location locMDSprimary MDSgroup inf: s02ns070
location locMDSsecondary MDSgroup 5000: s02ns090
colocation colocMDSOSS1 -inf: anchorOSS1 MDSgroup
colocation colocMDSOSS2 -inf: anchorOSS2 MDSgroup
colocation colocMDSOSS3 -inf: anchorOSS3 MDSgroup
colocation colocMDSOSS4 -inf: anchorOSS4 MDSgroup
<snip>

On first startup of the cluster, ptest -Ls reports the following scores for
the relevant nodes:

<snip>
group_color: MDSgroup allocation score on s02ns070: 1000000
group_color: MDSgroup allocation score on s02ns090: 5000
group_color: resMDTLVM allocation score on s02ns070: 1000000
group_color: resMDTLVM allocation score on s02ns090: 5000

group_color: resMDT0000 allocation score on s02ns070: 0
group_color: resMDT0000 allocation score on s02ns090: 0

native_color: resMDTLVM allocation score on s02ns070: 1000000
native_color: resMDTLVM allocation score on s02ns090: -1000000

native_color: resMDT0000 allocation score on s02ns070: 0
native_color: resMDT0000 allocation score on s02ns090: -1000000
<snip>
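
The full ptest -Ls listing is long, so a small filter can help isolate the
entries that block placement. The following is only a sketch: the function
name neg_inf_scores is made up for illustration, and it assumes every score
line has the same "<phase>: <resource> allocation score on <node>: <score>"
shape as the snip above. Pacemaker defines INFINITY as 1000000, so -1000000
is -INFINITY; the filter also accepts the literal string in case a version
prints it that way.

```shell
# Hypothetical helper: print phase, resource, node and score for every
# allocation score that is -INFINITY (shown as -1000000 in the output above).
neg_inf_scores() {
    awk '/allocation score on/ {
        phase = $1; sub(/:$/, "", phase)      # e.g. native_color
        node = $(NF-1); sub(/:$/, "", node)   # e.g. s02ns090
        score = $NF
        if (score == "-INFINITY" || score + 0 <= -1000000)
            print phase, $2, node, score
    }'
}

# Usage on a live cluster (same ptest invocation as above):
#   ptest -Ls | neg_inf_scores
```

Run against the snip above, this would leave only the two native_color lines
for s02ns090.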

On top of this, the secondary node is trying to start resources that it
shouldn't have access to (according to how I think I have colocation set up).

I have attached an hb_report from the time I start both nodes until it
settles in the odd configuration of primary node holding the resource and
the secondary node trying to start other seemingly random resources.

I have looked into the asymmetrical "opt-in" clusters described at
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch06s02s02.html
and I am wondering if this will fix some (if not all) of my issues with the
secondary node.  I have also checked out the Master/Slave configuration, but
I'm not sure that's what I am looking for, since the LVMs and the LUN cannot
(and should not) be started in more than one place.
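
For reference, an opt-in arrangement for the group above might look roughly
like the following. This is a sketch under assumptions, not a tested fix:
the file name opt-in-sketch.crm is made up, the two location lines are simply
the ones from my config, and with symmetric-cluster set to false every other
resource in the cluster would also need its own location constraints before
it could run anywhere.

```shell
# Hypothetical file name; the constraint lines are copied from the config above.
cat > opt-in-sketch.crm <<'EOF'
property symmetric-cluster="false"
location locMDSprimary MDSgroup inf: s02ns070
location locMDSsecondary MDSgroup 5000: s02ns090
EOF

# Review the file, then load it on a test cluster (not run here):
#   crm configure load update opt-in-sketch.crm
```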

My questions are:
1) Why does the resource resMDT0000 not seem to pick up the proper score in
either group_color or native_color? And what did I set up wrong in my
configuration to make this happen?
2) Is there a way to 'reset' scoring or force a score recalculation?
3) What would be the proper debug tool to use to find out where and what is
changing/affecting the scores?

Any help would be greatly appreciated.

Bobbie Lind
Systems Engineer
*Solutions Made Simple, Inc (SMSi)*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_report.tar.bz2
Type: application/x-bzip2
Size: 191715 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20110811/27c1d4c3/attachment-0002.bz2>