[ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

Tue Aug 2 11:02:36 EDT 2016

On 08/02/2016 03:10 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 1 Aug 2016 12:00:24 -0500,
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>> On 08/01/2016 11:18 AM, Jehan-Guillaume de Rorthais wrote:
>>> On Mon, 1 Aug 2016 10:27:53 -0500,
>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>
>>>> On 07/29/2016 06:19 PM, Andrew Beekhof wrote:
>>>>> Urgh. I must be confused with sles11. 
>>>>> In any case, the first version of pacemaker was identical to the last
>>>>> heartbeat crm. 
>>>>>
>>>>> I don't recall the ocfs2 agent changing design while I was there, so 11
>>>>> may be broken too
>>>>
>>>> I just realized *_active_* is only broken for master/slave clones.
>>>> Filesystem is not master/slave, so it wouldn't have any issue.
>>>
>>> Well, I'm glad we are the first RA using it :)
>>>
>>> I wonder how other m/s RAs are doing without it. We are using it (actually
>>> "master + slave + start - stop" because of the bug) to check, during a
>>> promotion after a failover, whether the resource being promoted is the best
>>> one among the known ones.
>>
>> Yes, the very simple workaround is simply to set active = master +
>> slave; that's all the pacemaker fix will do. You'll still need the "+
>> start - stop" to get the situation after the action.
> 
> Ok, thank you for the confirmation.
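
For anyone reading this in the archive, the workaround above can be sketched
in shell as below. This is only an illustration, not code from any shipped
agent: the helper names (norm_list, list_minus, active_after_action) are made
up, it assumes the *_uname notify variables hold space-separated node names,
and it assumes an empty list may arrive as a single space, as reported earlier
in this thread.

```shell
#!/bin/sh
# Sketch only: rebuild the "active" node list from the other notify variables.
# All function names here are hypothetical helpers, not part of ocf-shellfuncs.

# Print the distinct words of all arguments, sorted, on one line.
# A value made only of spaces yields an empty result.
norm_list() {
    printf '%s ' "$@" | tr -s ' ' '\n' | sed '/^$/d' | sort -u \
        | tr '\n' ' ' | sed 's/ $//'
}

# Print the words of list $1 that are not in list $2 (space-separated lists).
list_minus() {
    out=''
    for w in $1; do
        case " $2 " in
            *" $w "*) ;;                 # word is in $2: drop it
            *) out="${out:+$out }$w" ;;  # otherwise keep it
        esac
    done
    printf '%s' "$out"
}

# active = master + slave, then "+ start - stop" for the state after the action.
active_after_action() {
    before=$(norm_list "$OCF_RESKEY_CRM_meta_notify_master_uname" \
                       "$OCF_RESKEY_CRM_meta_notify_slave_uname")
    with_started=$(norm_list "$before" "$OCF_RESKEY_CRM_meta_notify_start_uname")
    list_minus "$with_started" \
               "$(norm_list "$OCF_RESKEY_CRM_meta_notify_stop_uname")"
}

# Demo values modelled on the switchover dump quoted in this thread.
OCF_RESKEY_CRM_meta_notify_master_uname='hanode1'
OCF_RESKEY_CRM_meta_notify_slave_uname='hanode3 hanode2'
OCF_RESKEY_CRM_meta_notify_start_uname=' '
OCF_RESKEY_CRM_meta_notify_stop_uname=' '
echo "active after action: $(active_after_action)"
```

With these demo values nothing is starting or stopping, so the result is
simply master + slave. The same approach should work on the *_resource
variables if instance names are needed instead of node names.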
> 
>> Coincidentally, we need to bump crm_feature_set to 3.0.11 anyway, so
>> you'll be able to test that to tell whether *_active_* is correct, if
>> desired.
> 
> I will test it.

The fix is merged in the master branch.

> 
>> There is an ocf_version_cmp function in ocf-shellfuncs.
> 
> Our RA is written in perl...but we have ported most of ocf-shellfuncs in a perl
> module, including this function :)
> 
> cf. [ClusterLabs Developers] Perl Modules for resource agents
>     Thu, 26 Nov 2015 01:13:36 +0100
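
A rough shell sketch of such a feature-set test follows. It deliberately does
not call ocf_version_cmp, so that it can run on its own; version_ge and
notify_active_is_reliable are hypothetical helper names, and it assumes
Pacemaker exports the feature set to the agent as OCF_RESKEY_crm_feature_set.
The 3.0.11 threshold is the one mentioned above.

```shell
#!/bin/sh
# Sketch only: decide whether *_notify_active_* can be trusted, based on
# crm_feature_set. Helper names are made up for this example.

# Return 0 (true) when dotted version $1 is >= $2. Numeric fields only;
# a missing field counts as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}                      # leading field of each version
        fb=${b%%.*}
        [ "${fa:-0}" -gt "${fb:-0}" ] && return 0
        [ "${fa:-0}" -lt "${fb:-0}" ] && return 1
        # Drop the field just compared; empty the string when no dot remains.
        case $a in *.*) a=${a#*.} ;; *) a='' ;; esac
        case $b in *.*) b=${b#*.} ;; *) b='' ;; esac
    done
    return 0                             # all fields were equal
}

# Assumption: the variable is unset on very old versions, so default to 0.
notify_active_is_reliable() {
    version_ge "${OCF_RESKEY_crm_feature_set:-0}" "3.0.11"
}

if notify_active_is_reliable; then
    echo "feature set is recent enough: use the active variables"
else
    echo "older feature set: compute active as master + slave"
fi
```

Comparing field by field as integers matters here, because a plain string
comparison would rank 3.0.9 above 3.0.11.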
> 
>>>>>> On 30 Jul 2016, at 8:51 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>
>>>>>>> On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
>>>>>>>
>>>>>>>> On 30 Jul 2016, at 8:32 AM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>>>
>>>>>>>> I finally had time to investigate this, and it definitely is broken.
>>>>>>>>
>>>>>>>> The only existing heartbeat RA to use the *_notify_active_* variables
>>>>>>>> is Filesystem, and it only does so for OCFS2 on SLES10, which didn't
>>>>>>>> even ship pacemaker,
>>>>>>>
>>>>>>> I'm pretty sure it did
>>>>>>
>>>>>> All I could find was:
>>>>>>
>>>>>> "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"
>>>>>>
>>>>>> http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html
>>>>>>
>>>>>> I'm sure people were compiling it, and ClusterLabs probably even
>>>>>> provided a repo, but it looks like sles didn't ship it.
>>>>>>
>>>>>> The issue is that the code that builds the active list checks for role
>>>>>> RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
>>>>>> don't think it ever would have worked.
>>>>>>
>>>>>>>
>>>>>>>> so I'm guessing it's been broken from the beginning of
>>>>>>>> pacemaker.
>>>>>>>>
>>>>>>>> The fix looks straightforward, so I should be able to take care of it
>>>>>>>> soon.
>>>>>>>>
>>>>>>>> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
>>>>>>>>
>>>>>>>>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
>>>>>>>>> On Fri, 6 May 2016 15:41:11 -0500,
>>>>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
>>>>>>>>>>> On Tue, 3 May 2016 21:10:12 +0200,
>>>>>>>>>>> Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 2 May 2016 17:59:55 -0500,
>>>>>>>>>>>> Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While testing and experimenting with our RA for PostgreSQL, I
>>>>>>>>>>>>>> found the meta_notify_active_* variables always seem to be empty.
>>>>>>>>>>>>>> Here is an example of these variables as they are seen from our
>>>>>>>>>>>>>> RA during a migration/switchover:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   'type' => 'pre',
>>>>>>>>>>>>>>   'operation' => 'demote',
>>>>>>>>>>>>>>   'active' => [],
>>>>>>>>>>>>>>   'inactive' => [],
>>>>>>>>>>>>>>   'start' => [],
>>>>>>>>>>>>>>   'stop' => [],
>>>>>>>>>>>>>>   'demote' => [
>>>>>>>>>>>>>>                 {
>>>>>>>>>>>>>>                   'rsc' => 'pgsqld:1',
>>>>>>>>>>>>>>                   'uname' => 'hanode1'
>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>               ],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   'master' => [
>>>>>>>>>>>>>>                 {
>>>>>>>>>>>>>>                   'rsc' => 'pgsqld:1',
>>>>>>>>>>>>>>                   'uname' => 'hanode1'
>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>               ],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   'promote' => [
>>>>>>>>>>>>>>                  {
>>>>>>>>>>>>>>                    'rsc' => 'pgsqld:0',
>>>>>>>>>>>>>>                    'uname' => 'hanode3'
>>>>>>>>>>>>>>                  }
>>>>>>>>>>>>>>                ],
>>>>>>>>>>>>>>   'slave' => [
>>>>>>>>>>>>>>                {
>>>>>>>>>>>>>>                  'rsc' => 'pgsqld:0',
>>>>>>>>>>>>>>                  'uname' => 'hanode3'
>>>>>>>>>>>>>>                },
>>>>>>>>>>>>>>                {
>>>>>>>>>>>>>>                  'rsc' => 'pgsqld:2',
>>>>>>>>>>>>>>                  'uname' => 'hanode2'
>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>              ],
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In case this comes from our side, here is the code building this:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But looking at the variable itself in debug logs, I always find
>>>>>>>>>>>>>> it empty, in various situations (switchover, recover, failover).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If I understand the documentation correctly, I would expect
>>>>>>>>>>>>>> 'active' to list all three resources, shouldn't it?
>>>>>>>>>>>>>> Currently, to work around this, we assume: active == master + slave
>>>>>>>>>>>>>
>>>>>>>>>>>>> You're right, it should. The pacemaker code that generates the
>>>>>>>>>>>>> "active" variables is the same used for "demote" etc., so it seems
>>>>>>>>>>>>> unlikely the issue is on pacemaker's side. Especially since your
>>>>>>>>>>>>> code treats active etc. differently from demote etc., it seems
>>>>>>>>>>>>> like it must be in there somewhere, but I don't see where.
>>>>>>>>>>>>
>>>>>>>>>>>> The code treats active, inactive, start and stop all together, for
>>>>>>>>>>>> any cloned resource. If the resource is a multistate, it adds
>>>>>>>>>>>> promote, demote, slave and master.
>>>>>>>>>>>>
>>>>>>>>>>>> Note that from this piece of code, the 7 other notify vars are set
>>>>>>>>>>>> correctly: start, stop, inactive, promote, demote, slave, master.
>>>>>>>>>>>> Only active is always missing.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll investigate and try to find where the bug is hiding.
>>>>>>>>>>>
>>>>>>>>>>> So I added a piece of code to dump **all** the environment
>>>>>>>>>>> variables to a temp file as early as possible **to avoid any
>>>>>>>>>>> interaction with our perl module** in the code of the RA, i.e.:
>>>>>>>>>>>
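>>>>>>>>>>> BEGIN {
>>>>>>>>>>>   use Time::HiRes qw(time);
>>>>>>>>>>>   my $now = time;  # fractional seconds keep file names distinct
>>>>>>>>>>>   open my $fh, ">", "/tmp/test-$now.env.txt"
>>>>>>>>>>>     or die "cannot open dump file: $!";
>>>>>>>>>>>   # one "NAME = ''value''" line per environment variable
>>>>>>>>>>>   printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;
>>>>>>>>>>>   close $fh;
>>>>>>>>>>> }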
>>>>>>>>>>>
>>>>>>>>>>> Then I started my cluster and set maintenance-mode=false while no
>>>>>>>>>>> resources were running. So the debug files contain the probe
>>>>>>>>>>> action, the start on all nodes, one promote on the master and the
>>>>>>>>>>> first monitors. The "*active" variables are always empty everywhere
>>>>>>>>>>> in the cluster. Attached is the result of the following command on
>>>>>>>>>>> the master node:
>>>>>>>>>>>
>>>>>>>>>>> for i in test-*; do echo "===== $i ====="; grep OCF_ "$i"; done > debug-env.txt
>>>>>>>>>>>
>>>>>>>>>>> I'm using Pacemaker 1.1.13-10.el7_2.2-44eb2dd under CentOS 7.2.1511.
>>>>>>>>>>>
>>>>>>>>>>> For completeness, I added the Pacemaker configuration I use for my 3
>>>>>>>>>>> node dev/test cluster.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you think of more investigations and tests I could
>>>>>>>>>>> run on this issue. I'm out of ideas for tonight (and I would really
>>>>>>>>>>> prefer this bug to be on my side).
>>>>>>>>>>
>>>>>>>>>> From your environment dumps, what I think is happening is that you
>>>>>>>>>> are getting multiple notifications (start, pre-promote,
>>>>>>>>>> post-promote) in a single cluster transition. So the variables
>>>>>>>>>> reflect the initial state of that transition -- none of the
>>>>>>>>>> instances are active, all three are being started (so the nodes are
>>>>>>>>>> in the "*_start_*" variables), and one is being promoted.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, this is what is happening here. It's embarrassing I didn't think
>>>>>>>>> of that :)
>>>>>>>>>
>>>>>>>>>> The starts will be done before the promote. If one of the starts
>>>>>>>>>> fails, the transition will be aborted, and a new one will be
>>>>>>>>>> calculated. So, if you get to the promote, you can assume anything
>>>>>>>>>> in "*_start_*" is now active.
>>>>>>>>>
>>>>>>>>> I did another simple test:
>>>>>>>>>
>>>>>>>>> * 3 ms clones are running on hanode1, hanode2 and hanode3
>>>>>>>>> * master role is on hanode1
>>>>>>>>> * I move the master role to hanode2 using:
>>>>>>>>>   "pcs resource move pgsql-ha hanode2 --master"
>>>>>>>>>
>>>>>>>>> The transition gives us:
>>>>>>>>>
>>>>>>>>> * demote on hanode1
>>>>>>>>> * promote on hanode2
>>>>>>>>>
>>>>>>>>> I suppose all three clones on hanode1, hanode2 and hanode3 should
>>>>>>>>> appear in the active env variables in this context, shouldn't they?
>>>>>>>>>
>>>>>>>>> Please find attached the environment dumps of this transition
>>>>>>>>> from hanode1. You'll see that both
>>>>>>>>> "OCF_RESKEY_CRM_meta_notify_active_resource" and
>>>>>>>>> "OCF_RESKEY_CRM_meta_notify_active_uname" contain only one character:
>>>>>>>>> a space.
>>>>>>>>>
>>>>>>>>> I started looking at the Pacemaker code, at least to get a better
>>>>>>>>> understanding of where the environment variables are set and when they
>>>>>>>>> are available. No luck so far, but I am short on time. Any pointers
>>>>>>>>> would be appreciated :)
>>>>>>>>>
>>>>>>>>>>> On a side note, I noticed with these debug files that the notify
>>>>>>>>>>> variables were also available outside of notify actions (start and
>>>>>>>>>>> notify here). Are they always available during "transition
>>>>>>>>>>> actions" (start, stop, promote, demote)? Looking at the mysql RA,
>>>>>>>>>>> it uses OCF_RESKEY_CRM_meta_notify_master_uname during the
>>>>>>>>>>> start action. So I suppose it's safe?
>>>>>>>>>>
>>>>>>>>>> Good question, I've never tried that before. I'm reluctant to say
>>>>>>>>>> it's guaranteed; it's possible seeing them in the start action is a
>>>>>>>>>> side effect of the current implementation and could theoretically
>>>>>>>>>> change in the future. But if mysql is relying on it, I suppose it's
>>>>>>>>>> well-established already, making changing it unlikely ...
>>>>>>>>>
>>>>>>>>> Thank you very much for this clarification. Presently we keep in a
>>>>>>>>> private attribute what we //think// (we cannot rely on
>>>>>>>>> active_uname :/) are the active unames for the ms resource. Since the
>>>>>>>>> notify vars appearing outside of notify actions seems to be just a side
>>>>>>>>> effect of the current implementation, I prefer to stay away from them
>>>>>>>>> when we are not in a notify action and keep our current implementation.
>>>>>>>>>
>>>>>>>>> Thank you,