[ClusterLabs] FYI: regression using 2.0.0 / 1.1.19 Pacemaker Remote node with older cluster nodes

Tue Jul 17 20:57:06 UTC 2018

Upon further investigation, there is no problem when resource agents
are called by the cluster, which thankfully makes this issue less
significant.

The problem occurs when "crm_node -n" is called on the command line or
by a script on a Pacemaker Remote node running 1.1.19, 2.0.0, or later
while the cluster nodes are running 1.1.18 or earlier. Upgrading
cluster nodes before Pacemaker Remote nodes avoids the issue.
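
As a rough illustration of the affected combination, the sketch below
checks a pair of version strings against the rule just described. The
function names are made up for this example, and it assumes plain
dotted numeric versions such as 1.1.18. It does not query Pacemaker;
you would supply the versions yourself.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Pacemaker.

# Return 0 if version $1 is strictly older than version $2.
# Assumes plain dotted numeric versions (for example 1.1.18).
version_lt() {
    if [ "$1" = "$2" ]; then
        return 1
    fi
    oldest=$(printf '%s\n%s\n' "$1" "$2" \
        | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$oldest" = "$1" ]
}

# Return 0 if a Pacemaker Remote node at version $1, used with cluster
# nodes whose oldest version is $2, matches the known-issue combination:
# remote at 1.1.19 or later, cluster at 1.1.18 or earlier.
remote_hits_known_issue() {
    remote_version=$1
    oldest_cluster_version=$2
    if version_lt "$remote_version" 1.1.19; then
        return 1
    fi
    version_lt "$oldest_cluster_version" 1.1.19
}
```

For example, "remote_hits_known_issue 2.0.0 1.1.18" succeeds (affected),
while "remote_hits_known_issue 2.0.0 2.0.0" does not.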

If you have any custom resource agents, a good practice is to make sure
that they do not call any unnecessary commands (including "crm_node -n"
or "ocf_local_nodename") for meta-data actions. This will not only be
more efficient, but also make command-line meta-data calls immune to
issues like this.
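
A minimal sketch of that practice, for a shell-based agent, might look
like the following. The agent name, the function names, and the
placeholder start/stop/monitor behavior are all hypothetical; the only
point is that the meta-data action returns before anything calls
"crm_node -n".

```shell
#!/bin/sh
# Hypothetical skeleton for a custom OCF resource agent; every name here
# (example-agent, agent_main, local_node_name) is illustrative.

# Look up the node name. Kept in a function so that it runs only for
# the actions that actually need it.
local_node_name() {
    crm_node -n
}

# Print static meta-data. No cluster commands are needed for this.
print_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="example-agent" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Example agent skeleton.</longdesc>
  <shortdesc lang="en">Example agent</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
}

agent_main() {
    action=$1

    # Handle meta-data first, before any call that talks to the cluster.
    if [ "$action" = "meta-data" ]; then
        print_meta_data
        return 0    # OCF_SUCCESS
    fi

    case "$action" in
        start|stop|monitor)
            node=$(local_node_name) || return 1    # OCF_ERR_GENERIC
            # Placeholder: a real agent would manage its service here.
            echo "$action requested on node $node"
            return 0
            ;;
        *)
            return 3    # OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}

# Dispatch only when invoked with an action argument.
if [ "$#" -gt 0 ]; then
    agent_main "$@"
    exit $?
fi
```

With this layout, running the agent by hand with "meta-data" never
touches crm_node, so it is unaffected by the issue above.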

A complete solution would make every command-line "crm_node -n" call
take longer and have more chances to fail, so I'm inclined to leave
this as a known issue, and rely on the workarounds.

On Mon, 2018-07-16 at 09:21 -0500, Ken Gaillot wrote:
> Hi all,
> 
> The just-released Pacemaker 2.0.0 and 1.1.19 releases have an issue
> when a Pacemaker Remote node is upgraded before the cluster nodes.
> 
> Pacemaker 2.0.0 contains a fix (also backported to 1.1.19) for the
> longstanding issue of "crm_node -n" getting the wrong name when run
> on the command line of a Pacemaker Remote node whose node name is
> different from its local hostname.
> 
> However, the fix can cause resource agents running on a Pacemaker
> Remote node to hang when used with a cluster node older than 2.0.0 /
> 1.1.19.
> 
> The only workaround is to upgrade all cluster nodes before upgrading
> any Pacemaker Remote nodes (which is the recommended practice
> anyway).
-- 
Ken Gaillot <kgaillot at redhat.com>