[ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

Hui Xiang xianghuir at gmail.com
Thu Nov 30 03:14:15 EST 2017


Thanks, Ken, very much for your helpful information.

I am now blocked: I can't see the Pacemaker DC take any further
start/promote/etc. action on my resource agents, and I have found no
helpful logs.

So my first question is: in what kind of situation will the DC decide to
call the start action? Does the monitor operation need to return
OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and after that the
monitor operation is never called again, which seems wrong, since I
expected it to be called at a regular interval.
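
My understanding from the Pacemaker documentation is that a multi-state
agent's monitor should map states to return codes roughly as below
(please correct me if I am wrong):

    # Expected monitor return codes for a master/slave agent:
    #   OCF_SUCCESS (0)        - running as a slave
    #   OCF_RUNNING_MASTER (8) - running as the master
    #   OCF_NOT_RUNNING (7)    - cleanly stopped (this is what probes
    #                            should see before the first start)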

The resource agent's monitor logic: the xx_monitor function calls
xx_update, and it always hits the "$CRM_MASTER -D;;" branch. What does
that usually mean? Will it stop the start operation from being called?

ovsdb_server_master_update() {
    ocf_log info "ovsdb_server_master_update: $1"

    case $1 in
        $OCF_SUCCESS)
            # running as a slave: set the slave promotion score
            $CRM_MASTER -v ${slave_score};;
        $OCF_RUNNING_MASTER)
            # running as the master: set the master promotion score
            $CRM_MASTER -v ${master_score};;
        # any other status (e.g. OCF_NOT_RUNNING) currently falls
        # through; the original default branch deleted the score:
        #*) $CRM_MASTER -D;;
    esac
    ocf_log info "ovsdb_server_master_update end"
}
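
For context, my understanding of $CRM_MASTER: it is the usual crm_master
wrapper, and I assume it is defined along these lines (this matches the
/usr/sbin/crm_master -l reboot call visible in the debug-promote trace
quoted further below):

    # assumed definition inside the agent
    CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
    # "$CRM_MASTER -v <score>" sets this node's promotion score;
    # "$CRM_MASTER -D" deletes it, so the node cannot be promoted
    # until a score is set again.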

ovsdb_server_monitor() {
    ocf_log info "ovsdb_server_monitor"
    ovsdb_server_check_status
    rc=$?

    # translate the status into a promotion score, then hand the
    # status code back to Pacemaker unchanged
    ovsdb_server_master_update $rc
    ocf_log info "monitor is going to return $rc"
    return $rc
}
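
If it is useful, the agent can also be exercised outside the cluster with
ocf-tester from the resource-agents package (a sketch; the agent's install
path is assumed):

    # run the agent's actions by hand, outside Pacemaker
    ocf-tester -n tst-ovndb \
        -o master_ip=192.168.0.2 -o manage_northd=yes \
        /usr/lib/ocf/resource.d/ovn/ovndb-servers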


Below is my cluster configuration:

1. First I have a VIP set up.
[root at node-1 ~]# pcs resource show
 vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld

2. Use pcs to create ovndb-servers and constraint
[root at node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers \
    manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641 \
    sb_master_port=6642 master
     ([root at node-1 ~]# pcs resource meta tst-ovndb-master notify=true
      Error: unable to find a resource/clone/master/group: tst-ovndb-master
      ## This returned an error, so I used the command below instead.)
[root at node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb notify=true
[root at node-1 ~]# pcs constraint colocation add master tst-ovndb-master with \
    vip__management_old

3. pcs status
[root at node-1 ~]# pcs status
 vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
 Master/Slave Set: tst-ovndb-master [tst-ovndb]
     Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

4. pcs resource show XXX
[root at node-1 ~]# pcs resource show  vip__management_old
 Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
  Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
iptables_comment=default-comment
  Meta Attrs: migration-threshold=3 failure-timeout=60
resource-stickiness=1
  Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3)
              start interval=0 timeout=30 (vip__management_old-start-0)
              stop interval=0 timeout=30 (vip__management_old-stop-0)
[root at node-1 ~]# pcs resource show tst-ovndb-master
 Master: tst-ovndb-master
  Meta Attrs: notify=true
  Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
   Attributes: manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
sb_master_port=6642
   Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
               stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
               promote interval=0s timeout=50s (tst-ovndb-promote-timeout-50s)
               demote interval=0s timeout=50s (tst-ovndb-demote-timeout-50s)
               monitor interval=30s timeout=20s (tst-ovndb-monitor-interval-30s)
               monitor interval=10s role=Master timeout=20s (tst-ovndb-monitor-interval-10s-role-Master)
               monitor interval=30s role=Slave timeout=20s (tst-ovndb-monitor-interval-30s-role-Slave)


colocation colocation-tst-ovndb-master-vip__management_old-INFINITY inf:
tst-ovndb-master:Master vip__management_old:Started
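
For completeness, the ordering constraint I would pair with that
colocation, mirroring the "order promote ... then VirtualIP" constraint
from my earlier test configuration (a sketch; resource names taken from
above):

    # promote the ovndb master before starting the VIP
    pcs constraint order promote tst-ovndb-master then start vip__management_old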

5. I have put logging in every ovndb-servers op; it seems only the monitor
op is being called, and nothing is promoted by the Pacemaker DC:
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_monitor
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_check_status
<30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: return
OCF_NOT_RUNNING
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_master_update: 7
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
ovsdb_server_master_update end
<30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: monitor
is going to return 7
<30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata
exit OCF_SUCCESS

6. The cluster properties:
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.12-a14efad \
        cluster-infrastructure=corosync \
        no-quorum-policy=ignore \
        stonith-enabled=false \
        symmetric-cluster=false \
        last-lrm-refresh=1511802933
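
One thing I am unsure about: with symmetric-cluster=false the cluster is
opt-in, so as I understand it a resource may not run anywhere until a
location constraint enables it. A sketch of what I think would be needed
(scores assumed):

    # opt-in cluster: explicitly allow the master/slave set on each node
    pcs constraint location tst-ovndb-master prefers node-1.domain.tld=0
    pcs constraint location tst-ovndb-master prefers node-2.domain.tld=0
    pcs constraint location tst-ovndb-master prefers node-3.domain.tld=0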



Thank you very much for any help.
Hui.


Date: Mon, 27 Nov 2017 12:07:57 -0600
From: Ken Gaillot <kgaillot at redhat.com>
To: Cluster Labs - All topics related to open-source clustering
        welcomed        <users at clusterlabs.org>, jpokorny at redhat.com
Subject: Re: [ClusterLabs] pcs create master/slave resource doesn't
        work
Message-ID: <1511806077.5194.6.camel at redhat.com>
Content-Type: text/plain; charset="UTF-8"

On Fri, 2017-11-24 at 18:00 +0800, Hui Xiang wrote:
> Jan,
>
> Very much appreciated for your help. I am getting further, but it
> still looks very strange.
>
> 1. To use "debug-promote", I upgraded Pacemaker from 1.1.12 to 1.1.16,
> and pcs to 0.9.160.
>
> 2. Recreate the resource with the commands below
> pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
>   master_ip=192.168.0.99 \
>   op monitor interval="10s" \
>   op monitor interval="11s" role=Master
> pcs resource master ovndb_servers-master ovndb_servers \
>   meta notify="true" master-max="1" master-node-max="1" clone-max="3" \
>   clone-node-max="1"
> pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 \
>   op monitor interval=10s
> pcs constraint colocation add VirtualIP with master ovndb_servers-master \
>   score=INFINITY
>
> 3. pcs status
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>  VirtualIP    (ocf::heartbeat:IPaddr2):       Stopped
>
> 4. Manually ran 'debug-start' on all 3 nodes and 'debug-promote' on one
> of the nodes.
> Run below on [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]:
> # pcs resource debug-start ovndb_servers --full
> Run below on [ node-1.domain.tld ]:
> # pcs resource debug-promote ovndb_servers --full

Before running debug-* commands, I'd unmanage the resource or put the
cluster in maintenance mode, so Pacemaker doesn't try to "correct" your
actions.
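
For example, something like either of these before the debug-* calls:

    # take just this resource out of cluster management ...
    pcs resource unmanage ovndb_servers-master
    # ... or put the entire cluster in maintenance mode:
    pcs property set maintenance-mode=true

(and "pcs resource manage" / "pcs property set maintenance-mode=false" to
undo afterward).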

>
> 5. pcs status
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>  VirtualIP    (ocf::heartbeat:IPaddr2):       Stopped
>
> 6. However, I have seen that one of the ovndb_servers instances has
> indeed been promoted to master, but pcs status still shows everything
> 'Stopped'. What am I missing?

It's hard to tell from these logs. It's possible the resource agent's
monitor command is not exiting with the expected status values:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_requirements_for_multi_state_resource_agents

One of the nodes will be elected the DC, meaning it coordinates the
cluster's actions. The DC's logs will have more "pengine:" messages,
with each action that needs to be taken (e.g. "* Start <rsc> <node>").

You can look through those actions to see what the cluster decided to
do -- whether the resources were ever started, whether any was
promoted, and whether any were explicitly stopped.
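
For example, something along these lines on the DC (the log file location
varies by distribution, so the path here is an assumption):

    # pull the scheduler's decisions out of the DC's log
    grep "pengine:" /var/log/cluster/corosync.log | \
        grep -e "Start" -e "Promote" -e "Stop"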


>  stderr: + 17:45:59: ocf_log:327: __OCF_MSG='ovndb_servers:
> Promoting node-1.domain.tld as the master'
>  stderr: + 17:45:59: ocf_log:329: case "${__OCF_PRIO}" in
>  stderr: + 17:45:59: ocf_log:333: __OCF_PRIO=INFO
>  stderr: + 17:45:59: ocf_log:338: '[' INFO = DEBUG ']'
>  stderr: + 17:45:59: ocf_log:341: ha_log 'INFO: ovndb_servers:
> Promoting node-1.domain.tld as the master'
>  stderr: + 17:45:59: ha_log:253: __ha_log 'INFO: ovndb_servers:
> Promoting node-1.domain.tld as the master'
>  stderr: + 17:45:59: __ha_log:185: local ignore_stderr=false
>  stderr: + 17:45:59: __ha_log:186: local loglevel
>  stderr: + 17:45:59: __ha_log:188: '[' 'xINFO: ovndb_servers:
> Promoting node-1.domain.tld as the master' = x--ignore-stderr ']'
>  stderr: + 17:45:59: __ha_log:190: '[' none = '' ']'
>  stderr: + 17:45:59: __ha_log:192: tty
>  stderr: + 17:45:59: __ha_log:193: '[' x = x0 -a x = xdebug ']'
>  stderr: + 17:45:59: __ha_log:195: '[' false = true ']'
>  stderr: + 17:45:59: __ha_log:199: '[' '' ']'
>  stderr: + 17:45:59: __ha_log:202: echo 'INFO: ovndb_servers:
> Promoting node-1.domain.tld as the master'
>  stderr: INFO: ovndb_servers: Promoting node-1.domain.tld as the
> master
>  stderr: + 17:45:59: __ha_log:204: return 0
>  stderr: + 17:45:59: ovsdb_server_promote:378:
> /usr/sbin/crm_attribute --type crm_config --name OVN_REPL_INFO -s
> ovn_ovsdb_master_server -v node-1.domain.tld
>  stderr: + 17:45:59: ovsdb_server_promote:379:
> ovsdb_server_master_update 8
>  stderr: + 17:45:59: ovsdb_server_master_update:214: case $1 in
>  stderr: + 17:45:59: ovsdb_server_master_update:218:
> /usr/sbin/crm_master -l reboot -v 10
>  stderr: + 17:45:59: ovsdb_server_promote:380: return 0
>  stderr: + 17:45:59: 458: rc=0
>  stderr: + 17:45:59: 459: exit 0
>
>
> On 23/11/17 23:52 +0800, Hui Xiang wrote:
> > I am working on HA with 3 nodes, with the below configuration:
> >
> > """
> > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> >   master_ip=168.254.101.2 \
> >   op monitor interval="10s" \
> >   op monitor interval="11s" role=Master
> > pcs resource master ovndb_servers-master ovndb_servers \
> >   meta notify="true" master-max="1" master-node-max="1" clone-max="3" \
> >   clone-node-max="1"
> > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=168.254.101.2 \
> >   op monitor interval=10s
> > pcs constraint order promote ovndb_servers-master then VirtualIP
> > pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> >   score=INFINITY
> > """
>
> (Out of curiosity, this looks like a mix of output from
> pcs config export pcs-commands [or clufter cib2pcscmd -s]
> and manual editing.  Is this a good guess?)
> It's the output of "pcs status".
>
> > However, after setting it up as above, the master is not being
> > selected and all instances are stopped. From the pacemaker log,
> > node-1 has been chosen as the master. I am confused about where it
> > goes wrong; any help would be very much appreciated.
> >
> >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> >  VirtualIP (ocf::heartbeat:IPaddr2): Stopped
> >
> > # pacemaker log
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++ /cib/configuration/resources:  <primitive class="ocf"
> > id="ovndb_servers" provider="ovn" type="ovndb-servers"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++   <instance_attributes id="ovndb_servers-instance_attributes">
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <nvpair id="ovndb_servers-instance_attributes-master_ip"
> > name="master_ip" value="168.254.101.2"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++   </instance_attributes>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++   <operations>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <op id="ovndb_servers-start-timeout-30s" interval="0s"
> > name="start" timeout="30s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <op id="ovndb_servers-stop-timeout-20s" interval="0s"
> > name="stop" timeout="20s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <op id="ovndb_servers-promote-timeout-50s" interval="0s"
> > name="promote" timeout="50s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <op id="ovndb_servers-demote-timeout-50s" interval="0s"
> > name="demote" timeout="50s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <op id="ovndb_servers-monitor-interval-10s" interval="10s"
> > name="monitor"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++     <op id="ovndb_servers-monitor-interval-11s-role-Master"
> > interval="11s" name="monitor" role="Master"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++   </operations>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++ </primitive>
> >
> > Nov 23 23:06:03 [665249] node-1.domain.tld  attrd: info:
> > attrd_peer_update: Setting master-ovndb_servers[node-1.domain.tld]:
> > (null) -> 5 from node-1.domain.tld
>
> If it's probable your ocf:ovn:ovndb-servers agent in master mode can
> run something like "attrd_updater -n master-ovndb_servers -U 5", then
> it was indeed launched OK, and if it does not continue to run as
> expected, there may be a problem with the agent itself.
>
> No change.
> You can try running "pcs resource debug-promote ovndb_servers --full"
> to examine the execution details (assuming the agent responds to the
> OCF_TRACE_RA=1 environment variable, which is what shell-based agents
> built on top of the ocf-shellfuncs sourcable shell library from the
> resource-agents project, hence incl. also the agents it ships,
> customarily do).
> Yes, thanks, it's helpful.
>
> > Nov 23 23:06:03 [665251] node-1.domain.tld  crmd: notice:
> > process_lrm_event: Operation ovndb_servers_monitor_0: ok
> > (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)
> > <29>Nov 23 23:06:03 node-1 crmd[665251]: notice: process_lrm_event:
> > Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld,
> > call=185, rc=0, cib-update=88, confirmed=true)
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > Diff: --- 0.630.2 2
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > Diff: +++ 0.630.3 (null)
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > +  /cib:  @num_updates=3
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info: cib_perform_op:
> > ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
> > <nvpair id="status-1-master-ovndb_servers" name="master-ovndb_servers"
> > value="5"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld  cib: info:
> > cib_process_request: Completed cib_modify operation for section
> > status: OK (rc=0, origin=node-3.domain.tld/attrd/80, version=0.630.3)
>
> Also depends if there's anything interesting after this point...
>
--
Ken Gaillot <kgaillot at redhat.com>

