[Pacemaker] crm resource move doesn't move the resource

Andrew Beekhof andrew at beekhof.net
Sun Oct 10 12:13:03 EDT 2010


On Fri, Oct 8, 2010 at 10:05 PM, Pavlos Parissis
<pavlos.parissis at gmail.com> wrote:
> On 8 October 2010 09:29, Andrew Beekhof <andrew at beekhof.net> wrote:
>> On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis
>> <pavlos.parissis at gmail.com> wrote:
>>> On 8 October 2010 08:29, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>> On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis
>>>> <pavlos.parissis at gmail.com> wrote:
>>>>>
>>>>>
>>>>> On 7 October 2010 09:01, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>>>>
>>>>>> On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis
>>>>>> <pavlos.parissis at gmail.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I am having the same issue again, on a different set of 3 nodes.
>>>>>> > When I try to manually fail over the resource group to the standby
>>>>>> > node, the ms-drbd resource is not moved as well, and as a result the
>>>>>> > resource group is not fully started: only the ip resource is started.
>>>>>> > Any ideas why I am having this issue?
>>>>>>
>>>>>> I think it's a bug that was fixed recently.  Could you try the latest
>>>>>> code from Mercurial?
>>>>>
>>>>> 1.1 or 1.2 branch?
>>>>
>>>> 1.1
>>>>
>>> To save time compiling, I want to use the available 1.1.3 rpms from
>>> the rpm-next repo.
>>> But before I go and recreate the scenario, which means rebuilding 3
>>> nodes, I would like to know if this bug is fixed in 1.1.3.
>>
>> As I said, I believe so.
>>
>
> I've just upgraded[1] my Pacemaker to 1.1.3 and stonithd cannot be
> started; am I missing something?

Heartbeat-based clusters need the following added to ha.cf:

apiauth stonith-ng	uid=root

>
> Oct 08 21:08:01 node-02 heartbeat: [14192]: info: Starting
> "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 14192)
> Oct 08 21:08:01 node-02 heartbeat: [14193]: info: Starting
> "/usr/lib/heartbeat/attrd" as uid 101  gid 103 (pid 14193)
> Oct 08 21:08:01 node-02 heartbeat: [14194]: info: Starting
> "/usr/lib/heartbeat/crmd" as uid 101  gid 103 (pid 14194)
> Oct 08 21:08:01 node-02 ccm: [14189]: info: Hostname: node-02
> Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
> Oct 08 21:08:01 node-02 cib: [14190]: WARN: ccm_connect: CCM
> Connection failed 1 times (30 max)
> Oct 08 21:08:01 node-02 attrd: [14193]: info: Invoked: /usr/lib/heartbeat/attrd
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: info: Invoked:
> /usr/lib/heartbeat/stonithd
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: info:
> G_main_add_SignalHandler: Added signal handler for signal 17
> Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Client [stonith-ng]
> pid 14192 failed authorization [no default client auth]
> Oct 08 21:08:01 node-02 heartbeat: [14158]: ERROR:
> api_process_registration_msg: cannot add client(stonith-ng)
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: ERROR:
> register_heartbeat_conn: Cannot sign on with heartbeat:
> Oct 08 21:08:01 node-02 stonith-ng: [14192]: CRIT: main: Cannot sign
> in to the cluster... terminating
> Oct 08 21:08:01 node-02 heartbeat: [14158]: WARN: Managed
> /usr/lib/heartbeat/stonithd process 14192 exited with return code 100.
> Oct 08 21:08:01 node-02 crmd: [14194]: info: Invoked: /usr/lib/heartbeat/crmd
> Oct 08 21:08:01 node-02 crmd: [14194]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Oct 08 21:08:02 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
> Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM Activation failed
> Oct 08 21:08:04 node-02 cib: [14190]: WARN: ccm_connect: CCM
> Connection failed 2 times (30 max)
> Oct 08 21:08:05 node-02 crmd: [14194]: WARN: do_cib_control: Couldn't
> complete CIB registration 2 times... pause and retry
> [..snip...]
> Oct 08 21:08:33 node-02 crmd: [14194]: ERROR: te_connect_stonith:
> Sign-in failed: triggered a retry
>
>
> [1] I use CentOS 5.4, and for the installation I used the following
> repository:
> [root at node-02 ~]# cat /etc/yum.repos.d/pacemaker.repo
> [clusterlabs]
> name=High Availability/Clustering server technologies (epel-5)
> baseurl=http://www.clusterlabs.org/rpm/epel-5
> type=rpm-md
> gpgcheck=0
> enabled=1
>
> and in order to perform the upgrade I added the following repo:
>
> [clusterlabs-next]
> name=High Availability/Clustering server technologies (epel-5-next)
> baseurl=http://www.clusterlabs.org/rpm-next/epel-5
> metadata_expire=45m
> type=rpm-md
> gpgcheck=0
> enabled=1
>
> and here is the installation/upgrade log, where you can see that only
> pacemaker-libs and pacemaker were upgraded:
> Oct 03 21:06:20 Installed: libibverbs-1.1.3-2.el5.i386
> Oct 03 21:06:25 Installed: lm_sensors-2.10.7-9.el5.i386
> Oct 03 21:06:31 Installed: 1:net-snmp-5.3.2.2-9.el5_5.1.i386
> Oct 03 21:06:31 Installed: librdmacm-1.0.10-1.el5.i386
> Oct 03 21:06:32 Installed: openhpi-libs-2.14.0-5.el5.i386
> Oct 03 21:06:33 Installed: OpenIPMI-libs-2.0.16-7.el5.i386
> Oct 03 21:06:35 Installed: libesmtp-1.0.4-5.el5.i386
> Oct 03 21:06:36 Installed: cluster-glue-libs-1.0.6-1.6.el5.i386
> Oct 03 21:06:37 Installed: heartbeat-libs-3.0.3-2.3.el5.i386
> Oct 03 21:06:39 Installed: corosynclib-1.2.7-1.1.el5.i386
> Oct 03 21:06:42 Installed: cluster-glue-1.0.6-1.6.el5.i386
> Oct 03 21:06:45 Installed: resource-agents-1.0.3-2.6.el5.i386
> Oct 03 21:06:46 Installed: heartbeat-3.0.3-2.3.el5.i386
> Oct 03 21:06:47 Installed: pacemaker-libs-1.0.9.1-1.15.el5.i386
> Oct 03 21:06:49 Installed: pacemaker-1.0.9.1-1.15.el5.i386
> Oct 03 21:06:50 Installed: corosync-1.2.7-1.1.el5.i386
> Oct 08 21:06:37 Updated: pacemaker-libs-1.1.3-1.el5.i386
> Oct 08 21:06:43 Updated: pacemaker-1.1.3-1.el5.i386
>
> and my configuration:
> [root at node-02 log]# cibadmin -Ql|grep vali
> <cib validate-with="pacemaker-1.0" crm_feature_set="3.0.2"
> have-quorum="1" dc-uuid="b7764e7b-0a00-4745-8d9e-6911271eefb2"
> admin_epoch="0" epoch="319" num_updates="60">
> [root at node-02 log]# crm configure show
> node $id="80275014-5efe-4825-a29c-d42610f08cd1" node-02
> node $id="b7764e7b-0a00-4745-8d9e-6911271eefb2" node-03
> node $id="c7459ab3-55b6-4155-946d-5c1ba783507f" node-01
> primitive drbd_01 ocf:linbit:drbd \
>        params drbd_resource="drbd_pbx_service_1" \
>        op monitor interval="30s" \
>        op start interval="0" timeout="240s" \
>        op stop interval="0" timeout="120s"
> primitive drbd_02 ocf:linbit:drbd \
>        params drbd_resource="drbd_pbx_service_2" \
>        op monitor interval="30s" \
>        op start interval="0" timeout="240s" \
>        op stop interval="0" timeout="120s"
> primitive fs_01 ocf:heartbeat:Filesystem \
>        params device="/dev/drbd1" directory="/pbx_service_01" fstype="ext3" \
>        meta migration-threshold="3" failure-timeout="60" \
>        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive fs_02 ocf:heartbeat:Filesystem \
>        params device="/dev/drbd2" directory="/pbx_service_02" fstype="ext3" \
>        meta migration-threshold="3" failure-timeout="60" \
>        op monitor interval="20s" timeout="40s" OCF_CHECK_LEVEL="20" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive ip_01 ocf:heartbeat:IPaddr2 \
>        params ip="192.168.78.10" cidr_netmask="24" broadcast="192.168.78.255" \
>        meta failure-timeout="120" migration-threshold="3" \
>        op monitor interval="5s"
> primitive ip_02 ocf:heartbeat:IPaddr2 \
>        params ip="192.168.78.20" cidr_netmask="24" broadcast="192.168.78.255" \
>        meta failure-timeout="120" migration-threshold="3" \
>        op monitor interval="5s"
> primitive pbx_01 lsb:znd-pbx_01 \
>        meta failure-timeout="120" migration-threshold="3" target-role="Started" \
>        op monitor interval="20s" timeout="40s" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive pbx_02 ocf:heartbeat:Dummy \
>        params state="/pbx_service_02/Dummy.state" \
>        meta failure-timeout="120" migration-threshold="3" \
>        op monitor interval="20s" timeout="40s"
> primitive sshd-pbx_01 lsb:sshd-pbx_01 \
>        meta target-role="Started" \
>        op monitor interval="10m" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive sshd-pbx_02 lsb:sshd-pbx_02 \
>        meta target-role="Started" \
>        op monitor interval="10m" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> primitive stonith-meatware stonith:meatware \
>        params hostlist="node-01 node-02 node-03" stonith-timeout="60" \
>        op start interval="0" timeout="60s" \
>        op stop interval="0" timeout="60s"
> group pbx_service_01 ip_01 fs_01 pbx_01 sshd-pbx_01 \
>        meta target-role="Started"
> group pbx_service_02 ip_02 fs_02 pbx_02 sshd-pbx_02 \
>        meta target-role="Started"
> ms ms-drbd_01 drbd_01 \
>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> ms ms-drbd_02 drbd_02 \
>        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone stonith-clone stonith-meatware \
>        meta clone-max="3" clone-node-max="1" target-role="Started" globally_unique="false"
> location PrimaryNode-drbd_01 ms-drbd_01 100: node-01
> location PrimaryNode-drbd_02 ms-drbd_02 100: node-02
> location PrimaryNode-pbx_service_01 pbx_service_01 200: node-01
> location PrimaryNode-pbx_service_02 pbx_service_02 200: node-02
> location SecondaryNode-drbd_01 ms-drbd_01 0: node-03
> location SecondaryNode-drbd_02 ms-drbd_02 0: node-03
> location SecondaryNode-pbx_service_01 pbx_service_01 10: node-03
> location SecondaryNode-pbx_service_02 pbx_service_02 10: node-03
> location stonith-node-01 stonith-clone 100: node-01
> location stonith-node-02 stonith-clone 100: node-02
> location stonith-node-03 stonith-clone 100: node-03
> colocation fs_01-on-drbd_01 inf: fs_01 ms-drbd_01:Master
> colocation fs_02-on-drbd_02 inf: fs_02 ms-drbd_02:Master
> order pbx_service_01-after-drbd_01 inf: ms-drbd_01:promote pbx_service_01:start
> order pbx_service_02-after-drbd_02 inf: ms-drbd_02:promote pbx_service_02:start
> property $id="cib-bootstrap-options" \
>        stonith-enabled="true" \
>        symmetric-cluster="false" \
>        dc-version="1.1.3-9c2342c0378140df9bed7d192f2b9ed157908007" \
>        cluster-infrastructure="Heartbeat" \
>        last-lrm-refresh="1286195722"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="1000"
> [root at node-02 log]#
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
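On the original question in this thread: `crm resource move` only inserts a location constraint preferring the target node; dependent resources such as an ms-drbd master follow only if the colocation and order constraints allow promotion there. A hedged test sequence with the crm shell, using the resource and node names from the configuration above (requires a live cluster):

```
# Ask the cluster to prefer node-03 for the group; with the inf:
# colocation on ms-drbd_01:Master, the master role should follow.
crm resource move pbx_service_01 node-03

# Check whether ms-drbd_01 was actually promoted on node-03.
crm_mon -1

# "move" only adds a constraint; drop it again so normal placement
# (stickiness and location scores) takes over.
crm resource unmove pbx_service_01
```

If the master cannot be promoted on the target node, the group stays partially started (only the ip resource), as described at the start of the thread.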



