[ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

Jehan-Guillaume de Rorthais jgdr at dalibo.com
Tue May 3 18:30:34 EDT 2016


On Tue, 3 May 2016 21:10:12 +0200,
Jehan-Guillaume de Rorthais <jgdr at dalibo.com> wrote:

> On Mon, 2 May 2016 17:59:55 -0500,
> Ken Gaillot <kgaillot at redhat.com> wrote:
> 
> > On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
> > > Hello all,
> > > 
> > > While testing and experimenting with our RA for PostgreSQL, I found that
> > > the meta_notify_active_* variables always seem to be empty. Here is an
> > > example of these variables as seen from our RA during a
> > > migration/switchover:
> > > 
> > > 
> > >   {
> > >     'type' => 'pre',
> > >     'operation' => 'demote',
> > >     'active' => [],
> > >     'inactive' => [],
> > >     'start' => [],
> > >     'stop' => [],
> > >     'demote' => [
> > >                   {
> > >                     'rsc' => 'pgsqld:1',
> > >                     'uname' => 'hanode1'
> > >                   }
> > >                 ],
> > >     
> > >     'master' => [
> > >                   {
> > >                     'rsc' => 'pgsqld:1',
> > >                     'uname' => 'hanode1'
> > >                   }
> > >                 ],
> > >     
> > >     'promote' => [
> > >                    {
> > >                      'rsc' => 'pgsqld:0',
> > >                      'uname' => 'hanode3'
> > >                    }
> > >                  ],
> > >     'slave' => [
> > >                  {
> > >                    'rsc' => 'pgsqld:0',
> > >                    'uname' => 'hanode3'
> > >                  },
> > >                  {
> > >                    'rsc' => 'pgsqld:2',
> > >                    'uname' => 'hanode2'
> > >                  }
> > >                ],
> > >     
> > >   }
> > > 
> > > In case this comes from our side, here is the code building this:
> > > 
> > >   https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
> > > 
> > > But looking at the variable itself in debug logs, I always find it empty,
> > > in various situations (switchover, recover, failover).
> > > 
> > > If I understand the documentation correctly, I would expect 'active' to
> > > list all three resources, shouldn't it? Currently, to work around this,
> > > we consider: active == master + slave
> > 
> > You're right, it should. The pacemaker code that generates the "active"
> > variables is the same used for "demote" etc., so it seems unlikely the
> > issue is on pacemaker's side. Especially since your code treats active
> > etc. differently from demote etc., it seems like it must be in there
> > somewhere, but I don't see where.
> 
> The code treats active, inactive, start and stop the same way, for any cloned
> resource. If the resource is multistate, it adds promote, demote, slave and
> master.
> 
> Note that from this piece of code, the 7 other notify vars are set
> correctly: start, stop, inactive, promote, demote, slave, master. Only active
> is always missing.
> 
> I'll investigate and try to find where the bug is hiding.

So I added a piece of code to the RA that dumps **all** the environment
variables to a temp file as early as possible, **to avoid any interaction with
our Perl module**, i.e.:

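  BEGIN {
    use Time::HiRes qw(time);
    my $now = time;  # sub-second timestamp: one dump file per action
    # Dump the whole environment before any other RA code runs.
    if ( open my $fh, ">", "/tmp/test-$now.env.txt" ) {
      printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;
      close $fh;
    }
  }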

Then I started my cluster and set maintenance-mode=false while no resources
were running. So the debug files contain the probe action, the start on all
nodes, one promote on the master and the first monitors. The "*active"
variables are always empty, everywhere in the cluster. Attached is the result of
the following command on the master node:

  for i in test-*; do echo "===== $i ====="; grep OCF_ "$i"; done > debug-env.txt
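
To narrow the dumps down to the variables in question, a filter along the
following lines could be used. This is only a sketch: it assumes the dump
files were written by the BEGIN block above (one "NAME = ''VALUE''" line per
variable), and the function name list_empty_notify_vars is made up for the
example.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the notify variables whose dumped
# value is empty or whitespace-only in one dump file.
list_empty_notify_vars() {
  # Dump lines look like: NAME = ''VALUE''
  grep -E "^OCF_RESKEY_CRM_meta_notify_[A-Za-z0-9_]+ += ''[[:space:]]*''\$" "$1" |
    awk '{ print $1 }'
}

# Report the empty notify variables for every dump file, if any exist.
for f in /tmp/test-*.env.txt; do
  [ -e "$f" ] || continue
  echo "===== $f ====="
  list_empty_notify_vars "$f"
done
```

A variable that is legitimately empty for a given action (for example "stop"
during a start) shows up as well, so the output still has to be read against
the action each file belongs to.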

I'm using Pacemaker 1.1.13-10.el7_2.2-44eb2dd under CentOS 7.2.1511.

For completeness, I have also attached the Pacemaker configuration I use for my
3-node dev/test cluster.

Let me know if you can think of further investigations or tests I could run on
this issue. I'm out of ideas for tonight (and I really would prefer this bug to
be on my side).
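
For reference, the "active == master + slave" workaround mentioned earlier in
the thread amounts to something like the sketch below. It is written in shell
for illustration only (our RA is in Perl). The function name
notify_active_unames is made up, and the *_active_uname and *_slave_uname
variable names are assumed to follow the same pattern as
OCF_RESKEY_CRM_meta_notify_master_uname, with space-separated values.

```shell
#!/bin/sh
# Hypothetical helper: print the node names considered "active".
# Uses the active list when it is set, otherwise falls back to
# master + slave (the workaround).
notify_active_unames() {
  # Unquoted on purpose: word splitting drops a whitespace-only value.
  # Assumes node names contain no glob characters.
  set -- ${OCF_RESKEY_CRM_meta_notify_active_uname:-}
  if [ "$#" -eq 0 ]; then
    set -- ${OCF_RESKEY_CRM_meta_notify_master_uname:-} \
           ${OCF_RESKEY_CRM_meta_notify_slave_uname:-}
  fi
  echo "$*"
}
```

With the pre-demote example above (master on hanode1, slaves on hanode3 and
hanode2, empty active list), this would print the three node names.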


On a side note, I noticed in these debug files that the notify variables
were also available outside of notify actions (during start as well as notify
here). Are they always available during "transition actions" (start, stop,
promote, demote)? Looking at the mysql RA, it uses
OCF_RESKEY_CRM_meta_notify_master_uname during the start action. So I suppose
it's safe?

Regards,
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: debug-env.txt
URL: <https://lists.clusterlabs.org/pipermail/developers/attachments/20160504/c642d94f/attachment-0006.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cluster-setup.txt
URL: <https://lists.clusterlabs.org/pipermail/developers/attachments/20160504/c642d94f/attachment-0007.txt>

