[ClusterLabs] Forcing rgmanager to re-read the config

Digimer lists at alteeve.ca
Tue Jul 14 13:58:40 EDT 2015


On 14/07/15 10:34 AM, Jan Pokorný wrote:
> On 08/07/15 17:41 -0700, Digimer wrote:
>> Second time in a couple of years I hit a weird bug... I have no idea how
>> to reproduce it, sadly.
>>
>> I added a VM to cluster.conf via ccs and it showed up on node 1 but not
>> node 2. I confirmed that node 2 had the updated cluster.conf, but the
>> running process wouldn't see it.
> 
> Could be a weird synchronization problem, as touched on at the bottom of
> https://bugzilla.redhat.com/show_bug.cgi?id=1157951#c2
> 
> Just to limit the already limited reproducibility (as I blindly expect those
> two cases to be related), I would highly recommend using the equivalent of
> ccs-0.16.2-75.el6 or newer (from 6.7 Beta perhaps?).
> 
>> I tried up'ing the version and running 'cman_tool version -r', no bueno
>> (tried on both nodes). I tried deleting the VM, pushing the new
>> cluster.conf and it disappeared from node 1 as expected. Added it back,
>> pushed again and again, only showed up on node 1's 'clustat'. I tried
>> again on the other node. No bueno...
>>
>> So I tried freezing all of the services on both nodes and restarting
>> rgmanager on both nodes. The stop didn't touch the services, but
>> starting it back up tore down and restarted the services, messing up the
>> VMs. =/
> 
> I would expect there is a lot of undefined behavior when (un)freezing
> under uneven circumstances.
> 
>> I noticed on node 2 the following during shutdown:
>>
>> ========
>> Jul  8 20:24:57 rm-a01n02 rgmanager[3971]: Shutting down
>> Jul  8 20:25:07 rm-a01n02 rgmanager[3971]: Shutting down
>> Jul  8 20:25:08 rm-a01n02 rgmanager[3971]: Member 1 shutting down
>> Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: Initializing vm:vm07-rhel6-temp
>> Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: vm:vm07-rhel6-temp was added to the config, but I am not initializing it.
>> Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: Reconfiguring
>> Jul  8 20:25:16 rm-a01n02 rgmanager[3971]: Reconfiguring
>> Jul  8 20:25:17 rm-a01n02 rgmanager[3971]: Disconnecting from CMAN
>> Jul  8 20:25:18 rm-a01n02 rgmanager[3971]: Reconfiguring
>> Jul  8 20:25:21 rm-a01n02 rgmanager[3971]: Reconfiguring
>> Jul  8 20:25:24 rm-a01n02 rgmanager[3971]: Reconfiguring
>> Jul  8 20:25:27 rm-a01n02 rgmanager[3971]: Reconfiguring
>> Jul  8 20:25:32 rm-a01n02 rgmanager[3971]: Exiting
>> ========
>>
>> Notice that at this point, the VM suddenly was found.
>>
>> So I am wondering: is it possible to force the process above without
>> restarting rgmanager?
> 
> You can try substituting -HUP for -USR1 in the command stated at
> https://bugzilla.redhat.com/show_bug.cgi?id=1157951#c4
> but I am not really sure it will force re-reading the config.
> 
> Anyway, the original (dumping) form might also provide some insights.

I will try to find a reproducer when I get home. As I mentioned in that
ticket, I hit it again on Sunday, but this time without using ccs, so I
think we can rule ccs out as the cause.
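
While hunting for a reproducer, it may help to scan syslog for the message
rgmanager logged when it saw the new VM but did not start managing it. Below
is a minimal sketch, assuming the log lines keep the format quoted above. The
names find_uninitialized and SAMPLE are made up for illustration, and the
sample lines are copied from the excerpt with the wrapped line rejoined.

```python
import re

# Syslog lines in the form seen in the excerpt above, e.g.
# "Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+rgmanager\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)

# The message logged for a service that was seen in the config but skipped.
SKIPPED_RE = re.compile(
    r"^(?P<svc>\S+) was added to the config, but I am not initializing it"
)


def find_uninitialized(lines):
    """Return (timestamp, host, service) for each 'not initializing' message."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an rgmanager line in the assumed format
        s = SKIPPED_RE.match(m.group("msg"))
        if s:
            hits.append((m.group("stamp"), m.group("host"), s.group("svc")))
    return hits


# Sample input copied from the log excerpt in this thread.
SAMPLE = [
    "Jul  8 20:25:08 rm-a01n02 rgmanager[3971]: Member 1 shutting down",
    "Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: Initializing vm:vm07-rhel6-temp",
    "Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: vm:vm07-rhel6-temp was added "
    "to the config, but I am not initializing it.",
    "Jul  8 20:25:13 rm-a01n02 rgmanager[3971]: Reconfiguring",
]

if __name__ == "__main__":
    for stamp, host, svc in find_uninitialized(SAMPLE):
        print(f"{stamp} {host}: {svc} seen in config but not initialized")
```

Running this against the real log on each node (reading the file line by line
instead of using SAMPLE) would show when, and on which node, the running
rgmanager finally noticed the new service.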

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



