[Pacemaker] [BUG] Clone + group = orphan(s) ?

Thomas Guthmann tguthmann at iseek.com.au
Mon Jan 4 01:25:53 EST 2010


Hi,

I noticed some odd behaviour in pacemaker when I ask it to clone a group. 
Let's say I have a group containing 4 primitives (1 named process + 
3 IPaddr2 addresses to bring up on lo). I want two clone instances of the 
group tom-DNS.

     crm configure clone tom-DNS-clone tom-DNS meta clone-max=2
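
For readers who want to reproduce the setup, a configuration of the shape described above could look like the following sketch in crm configure syntax. Only tom-DNS, tom-DNS-clone and clone-max=2 come from this post; the primitive names, the lsb:named resource type, the addresses (documentation range 192.0.2.0/24) and the order inside the group are assumptions made for illustration.

```
# Hypothetical primitives: names, addresses and lsb:named are assumptions
primitive dns-named lsb:named
primitive dns-ip1 ocf:heartbeat:IPaddr2 params ip=192.0.2.1 cidr_netmask=32 nic=lo
primitive dns-ip2 ocf:heartbeat:IPaddr2 params ip=192.0.2.2 cidr_netmask=32 nic=lo
primitive dns-ip3 ocf:heartbeat:IPaddr2 params ip=192.0.2.3 cidr_netmask=32 nic=lo
# Group and clone names as used in this post
group tom-DNS dns-named dns-ip1 dns-ip2 dns-ip3
clone tom-DNS-clone tom-DNS meta clone-max=2
```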

If the group is already running and I add the clone on the fly, I very 
often end up with tom-DNS:0, tom-DNS:1 and tom-DNS:2, one of them being 
an orphan (sometimes there are more orphans).
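
A quick way to see how many orphans exist at a given moment is to count them in a one-shot status dump. The sketch below is not from the original report: count_orphans is a hypothetical helper, and it assumes that orphaned resources carry the literal marker "ORPHANED" in the crm_mon output, which may differ between versions.

```shell
#!/bin/sh
# Count the input lines that flag a resource as orphaned.
# Assumption: orphans are marked with the literal word "ORPHANED".
count_orphans() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c 'ORPHANED' || true
}

# Hypothetical usage on a cluster node (one-shot status, inactive
# resources included):
#   crm_mon -1 -r | count_orphans
```

Running it once before and once after a cleanup would show whether the number of orphans went up or down.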

Then, to get rid of the orphan, I can try:

     crm_resource -r tom-DNS-clone -C

That "cleanup" usually just makes things worse. It generates new 
orphans, and pacemaker moves (stops, then starts) the 2 running cloned 
groups to one of the 4 groups I now have in the clone resource. I don't 
really follow the logic, and the log is so verbose that I don't know 
where to start or what to look for. In the end I usually have a broken 
state: half of the group is running and the other half is not (for 
example, only 2 IPs, without the third one or named).

You will find a log attached. Tags in the log:
- GO  : I have just committed the clone line (above)
- DONE: everything seems to be stable now
- EOF : we have 9 tom-DNS instances, without having touched anything since GO :)
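
To read only the part of the attached log between two of these tags, something like the sketch below could help. between_tags is a hypothetical helper, and it assumes each tag appears literally somewhere on a line of the log; since the match is a plain substring match, a more specific pattern may be needed if "GO" or "DONE" also occur in ordinary log messages.

```shell
#!/bin/sh
# Print every range of input lines that starts at a line matching the
# first pattern and ends at the next line matching the second pattern.
# Assumption: the tags appear literally in the log text.
between_tags() {
    sed -n "/$1/,/$2/p"
}

# Hypothetical usage with the attachment from this post:
#   zcat clone-commit-after.log.gz | between_tags GO DONE
```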

So cloned groups are not fun and the side effects are random :) I will 
run more tests without IPaddr2, which seems a bit fancy and dodgy.

My question is: is this a bug, or could it be a constraints issue?

I also noticed an increased load (from 0.5 to 3) when pacemaker has 
orphans, but I don't know what it is doing. CPU usage is not very high: 
the cib process uses 15% of CPU and there is no disk I/O. I didn't check, 
but maybe it is using the network a lot... All nodes in the ring are 
affected.

Tell me if you need any additional information.

Cheers,
Thomas

-- 
* pacemaker 1.0.6
* corosync 1.1.2
* centos 5.4
* 4 nodes (2 of them are DNS nodes)
* asymmetrical cluster
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clone-commit-after.log.gz
Type: application/x-gzip
Size: 21471 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100104/8397f51d/attachment.bin>

