[ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

Ken Gaillot kgaillot at redhat.com
Fri Jul 29 22:32:14 UTC 2016


I finally had time to investigate this, and it definitely is broken.

The only existing heartbeat RA to use the *_notify_active_* variables is
Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
ship pacemaker, so I'm guessing it's been broken from the beginning of
pacemaker.

The fix looks straightforward, so I should be able to take care of it soon.

Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295

On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
> On Fri, 6 May 2016 15:41:11 -0500,
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
>>> On Tue, 3 May 2016 21:10:12 +0200,
>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
>>>
>>>> On Mon, 2 May 2016 17:59:55 -0500,
>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>
>>>>> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> While testing and experimenting with our RA for PostgreSQL, I found that
>>>>>> the meta_notify_active_* variables always seem to be empty. Here is an
>>>>>> example of these variables as they are seen from our RA during a
>>>>>> migration/switchover:
>>>>>>
>>>>>>
>>>>>>   {
>>>>>>     'type' => 'pre',
>>>>>>     'operation' => 'demote',
>>>>>>     'active' => [],
>>>>>>     'inactive' => [],
>>>>>>     'start' => [],
>>>>>>     'stop' => [],
>>>>>>     'demote' => [
>>>>>>                   {
>>>>>>                     'rsc' => 'pgsqld:1',
>>>>>>                     'uname' => 'hanode1'
>>>>>>                   }
>>>>>>                 ],
>>>>>>     
>>>>>>     'master' => [
>>>>>>                   {
>>>>>>                     'rsc' => 'pgsqld:1',
>>>>>>                     'uname' => 'hanode1'
>>>>>>                   }
>>>>>>                 ],
>>>>>>     
>>>>>>     'promote' => [
>>>>>>                    {
>>>>>>                      'rsc' => 'pgsqld:0',
>>>>>>                      'uname' => 'hanode3'
>>>>>>                    }
>>>>>>                  ],
>>>>>>     'slave' => [
>>>>>>                  {
>>>>>>                    'rsc' => 'pgsqld:0',
>>>>>>                    'uname' => 'hanode3'
>>>>>>                  },
>>>>>>                  {
>>>>>>                    'rsc' => 'pgsqld:2',
>>>>>>                    'uname' => 'hanode2'
>>>>>>                  }
>>>>>>                ],
>>>>>>     
>>>>>>   }
>>>>>>
>>>>>> In case this comes from our side, here is the code building this:
>>>>>>
>>>>>>   https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
>>>>>>
>>>>>> But looking at the variable itself in the debug logs, I always find it
>>>>>> empty in various situations (switchover, recover, failover).
>>>>>>
>>>>>> If I understand the documentation correctly, I would expect 'active' to
>>>>>> list all three resources, shouldn't it? Currently, to work around this, we
>>>>>> consider: active == master + slave
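>>>>>> For reference, a rough sketch of that workaround (purely illustrative, not
>>>>>> the exact PAF code; it assumes the notify vars hold the usual
>>>>>> whitespace-separated lists):
>>>>>>
>>>>>>   # Rebuild the "active" node list from the master and slave notification
>>>>>>   # variables, since *_notify_active_uname arrives empty.
>>>>>>   my @active_unames = grep { length } split /\s+/,
>>>>>>       join ' ',
>>>>>>         $ENV{'OCF_RESKEY_CRM_meta_notify_master_uname'} // '',
>>>>>>         $ENV{'OCF_RESKEY_CRM_meta_notify_slave_uname'}  // '';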
>>>>>
>>>>> You're right, it should. The pacemaker code that generates the "active"
>>>>> variables is the same used for "demote" etc., so it seems unlikely the
>>>>> issue is on pacemaker's side. Especially since your code treats active
>>>>> etc. differently from demote etc., it seems like it must be in there
>>>>> somewhere, but I don't see where.
>>>>
>>>> The code treats active, inactive, start and stop all together, for any
>>>> cloned resource. If the resource is a multistate one, it adds promote,
>>>> demote, slave and master.
>>>>
>>>> Note that from this piece of code, the 7 other notify vars are set
>>>> correctly: start, stop, inactive, promote, demote, slave and master. Only
>>>> active is always missing.
>>>>
>>>> I'll investigate and try to find where the bug is hiding.
>>>
>>> So I added a piece of code to the RA to dump **all** the environment
>>> variables to a temp file as early as possible, **to avoid any interaction
>>> with our Perl module**, i.e.:
>>>
>>>   BEGIN {
>>>     use Time::HiRes qw(time);
>>>
>>>     # Dump the whole environment to a uniquely named temp file before
>>>     # anything else in the RA runs.
>>>     my $now = time;
>>>     open my $fh, ">", "/tmp/test-$now.env.txt";
>>>     printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;
>>>     close $fh;
>>>   }
>>>
>>> Then I started my cluster and set maintenance-mode=false while no resources
>>> were running. So the debug files contain the probe action, the start on all
>>> nodes, one promote on the master and the first monitors. The "*active"
>>> variables are always empty anywhere in the cluster. Find attached the result
>>> of the following command run on the master node:
>>>
>>>   for i in test-*; do echo "===== $i ====="; grep OCF_ $i; done > debug-env.txt
>>>
>>> I'm using Pacemaker 1.1.13-10.el7_2.2-44eb2dd under CentOS 7.2.1511.
>>>
>>> For completeness, I added the Pacemaker configuration I use for my 3-node
>>> dev/test cluster.
>>>
>>> Let me know if you think of more investigations and tests I could run on
>>> this issue. I'm out of ideas for tonight (and I really would prefer this
>>> bug to be on my side).
>>
>> From your environment dumps, what I think is happening is that you are
>> getting multiple notifications (start, pre-promote, post-promote) in a
>> single cluster transition. So the variables reflect the initial state of
>> that transition -- none of the instances are active, all three are being
>> started (so the nodes are in the "*_start_*" variables), and one is
>> being promoted.
> 
> 
> Yes, this is what's happening here. It's embarrassing I didn't think of
> that :)
> 
>> The starts will be done before the promote. If one of the starts fails,
>> the transition will be aborted, and a new one will be calculated. So, if
>> you get to the promote, you can assume anything in "*_start_*" is now
>> active.
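>>
>> In other words, a notification-aware RA could conservatively merge the two
>> lists once it reaches the promote step. A minimal sketch (assuming the usual
>> whitespace-separated uname lists; not code from any shipped agent):
>>
>>   # By the time the pre-promote notification fires, the starts scheduled in
>>   # this transition have already succeeded, so treat the started instances
>>   # as active too (deduplicating the merged list).
>>   my %seen;
>>   my @effectively_active = grep { length && !$seen{$_}++ } split /\s+/,
>>       join ' ',
>>         $ENV{'OCF_RESKEY_CRM_meta_notify_active_uname'} // '',
>>         $ENV{'OCF_RESKEY_CRM_meta_notify_start_uname'}  // '';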
> 
> I did another simple test:
> 
>   * 3 ms clones are running on hanode1, hanode2 and hanode3
>   * master role is on hanode1
>   * I move the master role to hanode2 using:
>     "pcs resource move pgsql-ha hanode2 --master"
> 
> The transition gives us:
> 
>   * demote on hanode1
>   * promote on hanode2
> 
> I suppose all three clones on hanode1, hanode2 and hanode3 should appear in
> the active env variable in this context, shouldn't they?
> 
> Please find attached the environment dumps of this transition from hanode1.
> You'll see that both "OCF_RESKEY_CRM_meta_notify_active_resource" and
> "OCF_RESKEY_CRM_meta_notify_active_uname" contain only one character: a space.
> 
> I started looking at the Pacemaker code, at least to get a better
> understanding of where the environment variables are set and when they are
> available. I've had no luck so far, but I lack the time. Any pointers would
> be appreciated :)
> 
>>> On a side note, I noticed in these debug files that the notify
>>> variables were also available outside of notify actions (start and notify
>>> here). Are they always available during "transition actions" (start, stop,
>>> promote, demote)? Looking at the mysql RA, it uses
>>> OCF_RESKEY_CRM_meta_notify_master_uname during the start action. So I
>>> suppose it's safe?
>>
>> Good question, I've never tried that before. I'm reluctant to say it's
>> guaranteed; it's possible seeing them in the start action is a side
>> effect of the current implementation and could theoretically change in
>> the future. But if mysql is relying on it, I suppose it's
>> well-established already, making changing it unlikely ...
> 
> Thank you very much for this clarification. Presently, we keep in a private
> attribute what we //think// are the active unames for the ms resource (we
> cannot rely on active_uname :/). As the notify vars appearing outside of
> notify actions seems to be just a side effect of the current implementation,
> I prefer to stay away from them when we are not in a notify action and keep
> our current implementation.
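> 
> To make that explicit, the guard in our RA boils down to something like the
> sketch below (assuming the action name arrives as the first command-line
> argument, per the OCF convention; the fallback lookup is only a placeholder):
> 
>   my $action = $ARGV[0] // '';
>   my @master_unames;
> 
>   if ( $action eq 'notify' ) {
>       # Inside a notify action, the CRM_meta_notify_* variables are set by
>       # contract, so they can be trusted.
>       @master_unames = grep { length } split /\s+/,
>           $ENV{'OCF_RESKEY_CRM_meta_notify_master_uname'} // '';
>   }
>   else {
>       # Outside notify actions, their presence is only an implementation
>       # detail: fall back on the state the RA tracks itself (for us, a
>       # private node attribute updated from the notify actions).
>       @master_unames = ();    # placeholder for the real lookup
>   }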
> 
> Thank you,
> 




