[Pacemaker] Duplicate node after corosync / pacemaker upgrade

Mistina Michal Michal.Mistina at virte.sk
Tue Aug 13 03:42:15 EDT 2013


Hi.

I tried to set things up according to Andrew's 3rd suggestion in
http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/ :

 

Everyone Talks to Corosync 2.0

Requirements:

- Filesystems supported: GFS2
- Corosync: 2.x
- Pacemaker: 1.1.7 or later
- Other: none

 

I'm running RHEL 6.3. I have a 2-node cluster without fencing; the nodes are
VMware virtual machines.

I had previously installed corosync and pacemaker with yum, so the old
versions were in place (corosync 1.4.x, pacemaker 1.1.7). Everything worked
after I created the config files. I used crmsh.

Then I compiled the new versions of the following from the latest git
sources:

- libqb
- resource-agents
- corosync
- pacemaker

Pacemaker was configured with ./configure --without-cman
--without-heartbeat.
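
For each component the procedure was roughly the following (reconstructed
from memory, so the exact steps and the make target may differ; I built el6
RPMs from the git checkouts and installed those with rpm/yum):

[root@tjtcaps01 ~]# git clone https://github.com/ClusterLabs/pacemaker.git
[root@tjtcaps01 ~]# cd pacemaker
[root@tjtcaps01 pacemaker]# ./autogen.sh
[root@tjtcaps01 pacemaker]# ./configure --without-cman --without-heartbeat
[root@tjtcaps01 pacemaker]# make rpm    # or make && make install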

Now all of the new versions are installed.

 

[root@tjtcaps01 ~]# rpm -qa | grep libqb
libqb-devel-0.16.0-1.el6.x86_64
libqb-0.16.0-1.el6.x86_64

[root@tjtcaps01 ~]# rpm -qa | grep resource-agents
resource-agents-3.9.5-1.158.1a87e.el6.x86_64

[root@tjtcaps01 ~]# rpm -qa | grep corosync
corosync-2.3.1-1.el6.x86_64
corosynclib-2.3.1-1.el6.x86_64
corosynclib-devel-2.3.1-1.el6.x86_64

[root@tjtcaps01 ~]# rpm -qa | grep pacemaker
pacemaker-cts-1.1.10-1.el6.x86_64
pacemaker-remote-1.1.10-1.el6.x86_64
pacemaker-cli-1.1.10-1.el6.x86_64
pacemaker-1.1.10-1.el6.x86_64
drbd-pacemaker-8.4.3-2.el6.x86_64
pacemaker-libs-1.1.10-1.el6.x86_64
pacemaker-libs-devel-1.1.10-1.el6.x86_64
pacemaker-doc-1.1.10-1.el6.x86_64
pacemaker-cluster-libs-1.1.10-1.el6.x86_64

 

What was left were the configuration files. The CIB was not adjusted. The
corosync config (/etc/corosync/corosync.conf) was altered like this (the
commented-out lines are from the previous version):

[root@tjtcaps01 ~]# cat /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.105.0
                mcastaddr: 226.95.1.1
                mcastport: 4000
                ttl: 1
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 1
        two_node: 1
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

#amf {
#       mode: disabled
#}
#aisexec {
#       user: root
#       group: root
#}
#service {
#       # Load the Pacemaker Cluster Resource Manager
#       name: pacemaker
#       ver: 0
#}
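
One thing I was not sure about is the missing nodelist section. If I read
the corosync 2.x documentation correctly, without a nodelist corosync
derives a numeric nodeid from the ring0 address. If a nodelist is needed, I
would add something like this (my guess at the syntax, using our
addresses):

nodelist {
        node {
                ring0_addr: 192.168.105.248
                nodeid: 1
        }
        node {
                ring0_addr: 192.168.105.249
                nodeid: 2
        }
}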

 

I started the services on both nodes in the following order:

1. service corosync start
2. service pacemaker start
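
In between the two steps I believe the corosync 2.x membership can be
checked with corosync-cmapctl before pacemaker is started, e.g.:

[root@tjtcaps01 ~]# corosync-cmapctl | grep members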

 

Then I checked the status with pcs:

[root@tjtcaps01 ~]# pcs status
Cluster name:
Last updated: Tue Aug 13 09:24:24 2013
Last change: Tue Aug 13 09:24:21 2013 via cibadmin on tjtcaps01
Stack: corosync
Current DC: tjtcaps01 (3232262648) - partition with quorum
Version: 1.1.10-1.el6-368c726
4 Nodes configured
6 Resources configured

Online: [ tjtcaps01 tjtcaps02 ]
OFFLINE: [ tjtcaps01 tjtcaps02 ]

Full list of resources:

 Resource Group: PGServer
     pg_lvm     (ocf::heartbeat:LVM):   Started tjtcaps01
     pg_fs      (ocf::heartbeat:Filesystem):    Started tjtcaps01
     pg_lsb     (lsb:postgresql-9.2):   Started tjtcaps01
     pg_vip     (ocf::heartbeat:IPaddr2):       Started tjtcaps01
 Master/Slave Set: ms_drbd_pg [drbd_pg]
     Masters: [ tjtcaps01 ]
     Slaves: [ tjtcaps02 ]

PCSD Status:
  192.168.105.248: Offline
  192.168.105.249: Offline
 

Here are my questions:

1. Was installing Corosync 2.x + Pacemaker 1.1.10 from source the right
path to take even though I am using RHEL? Suggestion 3 on the
aforementioned blog seemed nicer to me than option 2 (Everyone Talks to
CMAN).

2. Why did the node duplication occur? Did corosync import the two
additional nodes into the CIB, or did pacemaker automatically add a new
definition of the same nodes to the CIB?

3. Which node entries should I delete from the CIB? (Please see the CIB
query later in this mail; a sketch of what I would try is right after
these questions.)

   a. This definition? <node id="tjtcaps01" type="normal" uname="tjtcaps01"/>

   b. Or this definition? <node id="3232262648" uname="tjtcaps01"/>

4. Why does pcs status show the PCSD Status as Offline? (If this is not
the right place to ask about this, I will contact the corosync mailing
list.)
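
To make question 3 concrete, here is what I would try once I know which
pair of entries is stale; the syntax is from my reading of the cibadmin man
page, so please correct me if it is wrong (this example removes the entries
with the plain-text ids):

[root@tjtcaps01 ~]# cibadmin --delete --xml-text '<node id="tjtcaps01"/>'
[root@tjtcaps01 ~]# cibadmin --delete --xml-text '<node id="tjtcaps02"/>'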

 

[root@tjtcaps01 ~]# cibadmin -Ql
<cib epoch="26" num_updates="11" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" update-origin="tjtcaps01" update-client="cibadmin" cib-last-written="Tue Aug 13 09:24:21 2013" have-quorum="1" dc-uuid="3232262648">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-1.el6-368c726"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-maintenance-mode" name="maintenance-mode" value="false"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="tjtcaps01" type="normal" uname="tjtcaps01"/>
      <node id="tjtcaps02" type="normal" uname="tjtcaps02"/>
      <node id="3232262648" uname="tjtcaps01"/>
      <node id="3232262649" uname="tjtcaps02"/>
    </nodes>
    <resources>
      <group id="PGServer">
        <primitive class="ocf" id="pg_lvm" provider="heartbeat" type="LVM">
          <instance_attributes id="pg_lvm-instance_attributes">
            <nvpair id="pg_lvm-instance_attributes-volgrpname" name="volgrpname" value="vg_drbd"/>
          </instance_attributes>
          <operations>
            <op id="pg_lvm-start-0" interval="0" name="start" timeout="30"/>
            <op id="pg_lvm-stop-0" interval="0" name="stop" timeout="30"/>
          </operations>
        </primitive>
        <primitive class="ocf" id="pg_fs" provider="heartbeat" type="Filesystem">
          <instance_attributes id="pg_fs-instance_attributes">
            <nvpair id="pg_fs-instance_attributes-device" name="device" value="/dev/vg_drbd/lv_pgsql"/>
            <nvpair id="pg_fs-instance_attributes-directory" name="directory" value="/var/lib/pgsql/9.2/data"/>
            <nvpair id="pg_fs-instance_attributes-options" name="options" value="noatime,nodiratime"/>
            <nvpair id="pg_fs-instance_attributes-fstype" name="fstype" value="xfs"/>
          </instance_attributes>
          <operations>
            <op id="pg_fs-start-0" interval="0" name="start" timeout="60"/>
            <op id="pg_fs-stop-0" interval="0" name="stop" timeout="120"/>
          </operations>
        </primitive>
        <primitive class="lsb" id="pg_lsb" type="postgresql-9.2">
          <operations>
            <op id="pg_lsb-monitor-30" interval="30" name="monitor" timeout="60"/>
            <op id="pg_lsb-start-0" interval="0" name="start" timeout="60"/>
            <op id="pg_lsb-stop-0" interval="0" name="stop" timeout="60"/>
          </operations>
        </primitive>
        <primitive class="ocf" id="pg_vip" provider="heartbeat" type="IPaddr2">
          <instance_attributes id="pg_vip-instance_attributes">
            <nvpair id="pg_vip-instance_attributes-ip" name="ip" value="192.168.105.252"/>
            <nvpair id="pg_vip-instance_attributes-iflabel" name="iflabel" value="tjtcapsvip"/>
          </instance_attributes>
          <operations>
            <op id="pg_vip-monitor-5" interval="5" name="monitor"/>
          </operations>
        </primitive>
      </group>
      <master id="ms_drbd_pg">
        <meta_attributes id="ms_drbd_pg-meta_attributes">
          <nvpair id="ms_drbd_pg-meta_attributes-master-max" name="master-max" value="1"/>
          <nvpair id="ms_drbd_pg-meta_attributes-master-node-max" name="master-node-max" value="1"/>
          <nvpair id="ms_drbd_pg-meta_attributes-clone-max" name="clone-max" value="2"/>
          <nvpair id="ms_drbd_pg-meta_attributes-clone-node-max" name="clone-node-max" value="1"/>
          <nvpair id="ms_drbd_pg-meta_attributes-notify" name="notify" value="true"/>
        </meta_attributes>
        <primitive class="ocf" id="drbd_pg" provider="linbit" type="drbd">
          <instance_attributes id="drbd_pg-instance_attributes">
            <nvpair id="drbd_pg-instance_attributes-drbd_resource" name="drbd_resource" value="postgres"/>
          </instance_attributes>
          <operations>
            <op id="drbd_pg-monitor-15" interval="15" name="monitor" role="Master"/>
            <op id="drbd_pg-monitor-16" interval="16" name="monitor" role="Slave"/>
            <op id="drbd_pg-start-0" interval="0" name="start" timeout="240"/>
            <op id="drbd_pg-stop-0" interval="0" name="stop" timeout="120"/>
          </operations>
        </primitive>
      </master>
    </resources>
    <constraints>
      <rsc_location id="master-prefer-node1" node="tjtcaps01" rsc="pg_vip" score="50"/>
      <rsc_colocation id="col_pg_drbd" rsc="PGServer" score="INFINITY" with-rsc="ms_drbd_pg" with-rsc-role="Master"/>
      <rsc_order first="ms_drbd_pg" first-action="promote" id="ord_pg" score="INFINITY" then="PGServer" then-action="start"/>
      <rsc_location id="cli-prefer-PGServer" rsc="PGServer" node="tjtcaps01" score="INFINITY"/>
    </constraints>
    <rsc_defaults>
      <meta_attributes id="rsc-options">
        <nvpair id="rsc-options-resource-stickiness" name="resource-stickiness" value="100"/>
      </meta_attributes>
    </rsc_defaults>
  </configuration>
  <status>
    <node_state id="3232262648" uname="tjtcaps01" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="3232262648">
        <instance_attributes id="status-3232262648">
          <nvpair id="status-3232262648-master-drbd_pg" name="master-drbd_pg" value="10000"/>
          <nvpair id="status-3232262648-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="3232262648">
        <lrm_resources>
          <lrm_resource id="pg_vip" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="pg_vip_last_failure_0" operation_key="pg_vip_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="6:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;6:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="17" rc-code="0" op-status="0" interval="0" last-run="1376320020" last-rc-change="1376320020" exec-time="257" queue-time="6" op-digest="a0e257157c7f43b5dbaea731697d31ca"/>
            <lrm_rsc_op id="pg_vip_monitor_5000" operation_key="pg_vip_monitor_5000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="14:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;14:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="31" rc-code="0" op-status="0" interval="5000" last-rc-change="1376378662" exec-time="145" queue-time="0" op-digest="ae3a464c19aa8cd5b27dfe56422f45f1"/>
          </lrm_resource>
          <lrm_resource id="pg_lvm" type="LVM" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="pg_lvm_last_failure_0" operation_key="pg_lvm_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="3:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;3:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="5" rc-code="0" op-status="0" interval="0" last-run="1376320020" last-rc-change="1376320020" exec-time="147" queue-time="0" op-digest="8d4be0b9171df5ac3b4484891fe0f160"/>
          </lrm_resource>
          <lrm_resource id="pg_lsb" type="postgresql-9.2" class="lsb">
            <lrm_rsc_op id="pg_lsb_last_failure_0" operation_key="pg_lsb_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="5:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;5:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="13" rc-code="0" op-status="0" interval="0" last-run="1376320020" last-rc-change="1376320020" exec-time="92" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
            <lrm_rsc_op id="pg_lsb_monitor_30000" operation_key="pg_lsb_monitor_30000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="11:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;11:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="29" rc-code="0" op-status="0" interval="30000" last-rc-change="1376378662" exec-time="66" queue-time="0" op-digest="873ed4f07792aa8ff18f3254244675ea"/>
          </lrm_resource>
          <lrm_resource id="pg_fs" type="Filesystem" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="pg_fs_last_failure_0" operation_key="pg_fs_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="4:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;4:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="9" rc-code="0" op-status="0" interval="0" last-run="1376320020" last-rc-change="1376320020" exec-time="256" queue-time="0" op-digest="8d338a386e15f64e7389c329553bbead"/>
          </lrm_resource>
          <lrm_resource id="drbd_pg" type="drbd" class="ocf" provider="linbit">
            <lrm_rsc_op id="drbd_pg_last_failure_0" operation_key="drbd_pg_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="7:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:8;7:0:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="22" rc-code="8" op-status="0" interval="0" last-run="1376320020" last-rc-change="1376320020" exec-time="301" queue-time="1" op-digest="aced06114de28a9ed9baeef6ca82fda7"/>
            <lrm_rsc_op id="drbd_pg_monitor_15000" operation_key="drbd_pg_monitor_15000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="23:69:8:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:8;23:69:8:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="33" rc-code="8" op-status="0" interval="15000" last-rc-change="1376378662" exec-time="160" queue-time="0" op-digest="1b574855f35af4f42926160a697d4dac"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="3232262649" in_ccm="true" crmd="online" join="member" crm-debug-origin="do_update_resource" uname="tjtcaps02" expected="member">
      <transient_attributes id="3232262649">
        <instance_attributes id="status-3232262649">
          <nvpair id="status-3232262649-master-drbd_pg" name="master-drbd_pg" value="10000"/>
          <nvpair id="status-3232262649-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="3232262649">
        <lrm_resources>
          <lrm_resource id="pg_vip" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="pg_vip_last_0" operation_key="pg_vip_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="7:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:7;7:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="17" rc-code="7" op-status="0" interval="0" last-run="1376320287" last-rc-change="1376320287" exec-time="254" queue-time="0" op-digest="a0e257157c7f43b5dbaea731697d31ca"/>
          </lrm_resource>
          <lrm_resource id="pg_lvm" type="LVM" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="pg_lvm_last_0" operation_key="pg_lvm_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="4:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:7;4:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1376320287" last-rc-change="1376320287" exec-time="162" queue-time="0" op-digest="8d4be0b9171df5ac3b4484891fe0f160"/>
          </lrm_resource>
          <lrm_resource id="pg_lsb" type="postgresql-9.2" class="lsb">
            <lrm_rsc_op id="pg_lsb_last_0" operation_key="pg_lsb_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="6:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:7;6:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="13" rc-code="7" op-status="0" interval="0" last-run="1376320287" last-rc-change="1376320287" exec-time="116" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
          </lrm_resource>
          <lrm_resource id="pg_fs" type="Filesystem" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="pg_fs_last_0" operation_key="pg_fs_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="5:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:7;5:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1376320287" last-rc-change="1376320287" exec-time="285" queue-time="0" op-digest="8d338a386e15f64e7389c329553bbead"/>
          </lrm_resource>
          <lrm_resource id="drbd_pg" type="drbd" class="ocf" provider="linbit">
            <lrm_rsc_op id="drbd_pg_last_failure_0" operation_key="drbd_pg_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="8:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;8:2:7:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="22" rc-code="0" op-status="0" interval="0" last-run="1376320287" last-rc-change="1376320287" exec-time="283" queue-time="1" op-digest="aced06114de28a9ed9baeef6ca82fda7"/>
            <lrm_rsc_op id="drbd_pg_monitor_16000" operation_key="drbd_pg_monitor_16000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="26:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" transition-magic="0:0;26:69:0:4ea258f0-f3cb-46d8-8ead-7e722aa9bf99" call-id="29" rc-code="0" op-status="0" interval="16000" last-rc-change="1376378662" exec-time="115" queue-time="0" op-digest="1b574855f35af4f42926160a697d4dac"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>
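
In case it is useful, I assume the membership as the cluster stack itself
sees it can be listed with crm_node (shipped with pacemaker), and I can
compare that against the <nodes> section above:

[root@tjtcaps01 ~]# crm_node -l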

 

 

Best regards,

Michal Mistina

 
