[ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

Hui Xiang xianghuir at gmail.com
Thu Nov 30 05:39:22 EST 2017


The really weird thing is that the monitor is only called once rather than
repeatedly as expected. Where should I check for this?
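
(For reference, a minimal sketch of where one might check, assuming the standard
pacemaker CLI tools; the log file location may differ per distribution:)

# one-shot status including node attributes (-A), inactive resources (-r)
# and fail counts (-f), to see whether any master-tst-ovndb score was ever set
crm_mon -1 -A -r -f

# on the DC, look for scheduler decisions that mention the resource
grep pengine /var/log/messages | grep tst-ovndb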

On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang <xianghuir at gmail.com> wrote:

> Thanks Ken very much for your helpful information.
>
> I am now blocked on the fact that I can't see the pacemaker DC take any further
> start/promote etc. action on my resource agents, and I have found no helpful logs.
>
> So my first question is: in what situation will the DC decide to
> call the start action? Does the monitor operation need to return
> OCF_SUCCESS? In my case it returns OCF_NOT_RUNNING, and the monitor
> operation is not called any more, which seems wrong, as I expected
> it to be called at the configured interval.
>
> The resource agent monitor logic:
> In the xx_monitor function it calls xx_update, and it always hits "$CRM_MASTER
> -D;;". What does that usually mean? Will it stop the start operation from
> being called? (See the sketch after the functions below.)
>
> ovsdb_server_master_update() {
>     ocf_log info "ovsdb_server_master_update: $1}"
>
>     case $1 in
>         $OCF_SUCCESS)
>             $CRM_MASTER -v ${slave_score};;
>         $OCF_RUNNING_MASTER)
>             $CRM_MASTER -v ${master_score};;
>         #*) $CRM_MASTER -D;;
>     esac
>     ocf_log info "ovsdb_server_master_update end}"
> }
>
> ovsdb_server_monitor() {
>     ocf_log info "ovsdb_server_monitor"
>     ovsdb_server_check_status
>     rc=$?
>
>     ovsdb_server_master_update $rc
>     ocf_log info "monitor is going to return $rc"
>     return $rc
> }
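>
> (For illustration only: a minimal sketch of what $CRM_MASTER usually expands to
> and what -v / -D mean. This is an assumption based on the debug-promote trace
> further down, not something verified against the shipped agent:)
>
> # typically defined near the top of the agent
> CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
> $CRM_MASTER -v 10   # advertise this node as promotable, promotion score 10
> $CRM_MASTER -D      # delete the score; the cluster will not consider this
>                     # node for promotion without one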
>
>
> Below is my cluster configuration:
>
> 1. First I have a VIP set up.
> [root at node-1 ~]# pcs resource show
>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>
> 2. Use pcs to create ovndb-servers and constraint
> [root at node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
> manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
> sb_master_port=6642 master
>      ([root at node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>       Error: unable to find a resource/clone/master/group:
> tst-ovndb-master) ## this returned an error, so I changed to the command below.
> [root at node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
> notify=true
> [root at node-1 ~]# pcs constraint colocation add master tst-ovndb-master
> with vip__management_old
>
> 3. pcs status
> [root at node-1 ~]# pcs status
>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>  Master/Slave Set: tst-ovndb-master [tst-ovndb]
>      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>
> 4. pcs resource show XXX
> [root at node-1 ~]# pcs resource show  vip__management_old
>  Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>   Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
> ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
> gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
> iptables_comment=default-comment
>   Meta Attrs: migration-threshold=3 failure-timeout=60
> resource-stickiness=1
>   Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3
> )
>               start interval=0 timeout=30 (vip__management_old-start-0)
>               stop interval=0 timeout=30 (vip__management_old-stop-0)
> [root at node-1 ~]# pcs resource show tst-ovndb-master
>  Master: tst-ovndb-master
>   Meta Attrs: notify=true
>   Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>    Attributes: manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
> sb_master_port=6642
>    Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
>                stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>                promote interval=0s timeout=50s
> (tst-ovndb-promote-timeout-50s)
>                demote interval=0s timeout=50s
> (tst-ovndb-demote-timeout-50s)
>                monitor interval=30s timeout=20s
> (tst-ovndb-monitor-interval-30s)
>                monitor interval=10s role=Master timeout=20s
> (tst-ovndb-monitor-interval-10s-role-Master)
>                monitor interval=30s role=Slave timeout=20s
> (tst-ovndb-monitor-interval-30s-role-Slave)
>
>
> colocation colocation-tst-ovndb-master-vip__management_old-INFINITY inf:
> tst-ovndb-master:Master vip__management_old:Started
>
> 5. I have put logging in every ovndb-servers op; it seems only the monitor op is
> being called, and nothing is promoted by the pacemaker DC:
> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_monitor
> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_check_status
> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> return OCFOCF_NOT_RUNNINGG
> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_master_update: 7}
> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> ovsdb_server_master_update end}
> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
> monitor is going to return 7
> <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata
> exit OCF_SUCCESS}
>
> 6. The cluster property:
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.12-a14efad \
>         cluster-infrastructure=corosync \
>         no-quorum-policy=ignore \
>         stonith-enabled=false \
>         symmetric-cluster=false \
>         last-lrm-refresh=1511802933
>
>
>
> Thank you very much for any help.
> Hui.
>
>
> Date: Mon, 27 Nov 2017 12:07:57 -0600
> From: Ken Gaillot <kgaillot at redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>         welcomed        <users at clusterlabs.org>, jpokorny at redhat.com
> Subject: Re: [ClusterLabs] pcs create master/slave resource doesn't
>         work
> Message-ID: <1511806077.5194.6.camel at redhat.com>
> Content-Type: text/plain; charset="UTF-8"
>
> On Fri, 2017-11-24 at 18:00 +0800, Hui Xiang wrote:
> > Jan,
> >
> > Much appreciated for your help; I am getting further, but it still
> > looks very strange.
> >
> > 1. To use "debug-promote", I upgraded pacemaker from 1.12 to 1.16 and pcs
> > to 0.9.160.
> >
> > 2. Recreate resource with below commands
> > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> >   master_ip=192.168.0.99 \
> >   op monitor interval="10s" \
> >   op monitor interval="11s" role=Master
> > pcs resource master ovndb_servers-master ovndb_servers \
> >   meta notify="true" master-max="1" master-node-max="1" clone-max="3" clone-node-max="1"
> > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 \
> >     op monitor interval=10s
> > pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> >   score=INFINITY
> >
> > 3. pcs status
> >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> >  VirtualIP    (ocf::heartbeat:IPaddr2):       Stopped
> >
> > 4. Manually run 'debug-start' on all 3 nodes and 'debug-promote' on one
> > of the nodes
> > run the following on [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > # pcs resource debug-start ovndb_servers --full
> > run below on [ node-1.domain.tld ]
> > # pcs resource debug-promote ovndb_servers --full
>
> Before running debug-* commands, I'd unmanage the resource or put the
> cluster in maintenance mode, so Pacemaker doesn't try to "correct" your
> actions.
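>
> (A minimal sketch of both options, using the resource name from this thread:
>
> pcs resource unmanage ovndb_servers        # stop managing just this resource
> pcs property set maintenance-mode=true     # or put the whole cluster in maintenance
>
> and undo afterwards with "pcs resource manage ovndb_servers" /
> "pcs property set maintenance-mode=false".)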
>
> >
> > 5. pcs status
> >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> >  VirtualIP    (ocf::heartbeat:IPaddr2):       Stopped
> >
> > 6. However, I have seen that one of the ovndb_servers instances has indeed
> > been promoted to master, but pcs status still shows everything 'Stopped'.
> > What am I missing?
>
> It's hard to tell from these logs. It's possible the resource agent's
> monitor command is not exiting with the expected status values:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_requirements_for_multi_state_resource_agents
>
> One of the nodes will be elected the DC, meaning it coordinates the
> cluster's actions. The DC's logs will have more "pengine:" messages,
> with each action that needs to be taken (e.g. "* Start <rsc> <node>").
>
> You can look through those actions to see what the cluster decided to
> do -- whether the resources were ever started, whether any was
> promoted, and whether any were explicitly stopped.
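>
> (As an illustrative sketch: on any node, the scheduler's view can also be
> reproduced from the live CIB with
>
> crm_simulate -L -s
>
> which prints the allocation/promotion scores and the actions the DC would
> schedule, so you can see whether a start or promote is ever planned.)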
>
>
> >  >  stderr: + 17:45:59: ocf_log:327: __OCF_MSG='ovndb_servers:
> > Promoting node-1.domain.tld as the master'
> >  >  stderr: + 17:45:59: ocf_log:329: case "${__OCF_PRIO}" in
> >  >  stderr: + 17:45:59: ocf_log:333: __OCF_PRIO=INFO
> >  >  stderr: + 17:45:59: ocf_log:338: '[' INFO = DEBUG ']'
> >  >  stderr: + 17:45:59: ocf_log:341: ha_log 'INFO: ovndb_servers:
> > Promoting node-1.domain.tld as the master'
> >  >  stderr: + 17:45:59: ha_log:253: __ha_log 'INFO: ovndb_servers:
> > Promoting node-1.domain.tld as the master'
> >  >  stderr: + 17:45:59: __ha_log:185: local ignore_stderr=false
> >  >  stderr: + 17:45:59: __ha_log:186: local loglevel
> >  >  stderr: + 17:45:59: __ha_log:188: '[' 'xINFO: ovndb_servers:
> > Promoting node-1.domain.tld as the master' = x--ignore-stderr ']'
> >  >  stderr: + 17:45:59: __ha_log:190: '[' none = '' ']'
> >  >  stderr: + 17:45:59: __ha_log:192: tty
> >  >  stderr: + 17:45:59: __ha_log:193: '[' x = x0 -a x = xdebug ']'
> >  >  stderr: + 17:45:59: __ha_log:195: '[' false = true ']'
> >  >  stderr: + 17:45:59: __ha_log:199: '[' '' ']'
> >  >  stderr: + 17:45:59: __ha_log:202: echo 'INFO: ovndb_servers:
> > Promoting node-1.domain.tld as the master'
> >  >  stderr: INFO: ovndb_servers: Promoting node-1.domain.tld as the
> > master
> >  >  stderr: + 17:45:59: __ha_log:204: return 0
> >  >  stderr: + 17:45:59: ovsdb_server_promote:378:
> > /usr/sbin/crm_attribute --type crm_config --name OVN_REPL_INFO -s
> > ovn_ovsdb_master_server -v node-1.domain.tld
> >  >  stderr: + 17:45:59: ovsdb_server_promote:379:
> > ovsdb_server_master_update 8
> >  >  stderr: + 17:45:59: ovsdb_server_master_update:214: case $1 in
> >  >  stderr: + 17:45:59: ovsdb_server_master_update:218:
> > /usr/sbin/crm_master -l reboot -v 10
> >  >  stderr: + 17:45:59: ovsdb_server_promote:380: return 0
> >  >  stderr: + 17:45:59: 458: rc=0
> >  >  stderr: + 17:45:59: 459: exit 0
> >
> >
> > On 23/11/17 23:52 +0800, Hui Xiang wrote:
> > > I am working on HA with 3-nodes, which has below configurations:
> > >
> > > """
> > > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> > >   master_ip=168.254.101.2 \
> > >   op monitor interval="10s" \
> > >   op monitor interval="11s" role=Master
> > > pcs resource master ovndb_servers-master ovndb_servers \
> > >   meta notify="true" master-max="1" master-node-max="1" clone-max="3" clone-node-max="1"
> > > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=168.254.101.2 \
> > >     op monitor interval=10s
> > > pcs constraint order promote ovndb_servers-master then VirtualIP
> > > pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> > >   score=INFINITY
> > > """
> >
> > (Out of curiosity, this looks like a mix of output from
> > pcs config export pcs-commands [or clufter cib2pcscmd -s]
> > and manual editing.  Is this a good guess?)
> > It's the output of "pcs status".
> >
> > > However, after setting it up as above, the master is not being
> > > selected and all are stopped. From the pacemaker log, node-1 has been
> > > chosen as the master. I am confused about where it went wrong; can
> > > anybody help? It would be very much appreciated.
> > >
> > >
> > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > >  VirtualIP (ocf::heartbeat:IPaddr2): Stopped
> > >
> > >
> > > # pacemaker log
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++ /cib/configuration/resources:  <primitive class="ocf" id="ovndb_servers" provider="ovn" type="ovndb-servers"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++   <instance_attributes id="ovndb_servers-instance_attributes">
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <nvpair id="ovndb_servers-instance_attributes-master_ip" name="master_ip" value="168.254.101.2"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++   </instance_attributes>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++   <operations>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <op id="ovndb_servers-start-timeout-30s" interval="0s" name="start" timeout="30s"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <op id="ovndb_servers-stop-timeout-20s" interval="0s" name="stop" timeout="20s"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <op id="ovndb_servers-promote-timeout-50s" interval="0s" name="promote" timeout="50s"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <op id="ovndb_servers-demote-timeout-50s" interval="0s" name="demote" timeout="50s"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <op id="ovndb_servers-monitor-interval-10s" interval="10s" name="monitor"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++     <op id="ovndb_servers-monitor-interval-11s-role-Master" interval="11s" name="monitor" role="Master"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++   </operations>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op:
> > > ++ </primitive>
> > >
> > > Nov 23 23:06:03 [665249] node-1.domain.tld      attrd:     info: attrd_peer_update:
> > > Setting master-ovndb_servers[node-1.domain.tld]: (null) -> 5 from node-1.domain.tld
> >
> > If it's probable your ocf:ovn:ovndb-servers agent in master mode can
> > run something like "attrd_updater -n master-ovndb_servers -U 5", then
> > it was indeed launched OK, and if it does not continue to run as
> > expected, there may be a problem with the agent itself.
> >
> > no change.
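> >
> > (Sketch, assuming the node names used in this thread: the attribute can also be
> > queried directly to confirm whether the agent really set it, e.g.
> >
> > attrd_updater -Q -n master-ovndb_servers -N node-1.domain.tld
> >
> > should print the current promotion score for that node, if any.)
> >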
> > You can try running "pcs resource debug-promote ovndb_servers --full"
> > to examine the execution details (assuming the agent responds to the
> > OCF_TRACE_RA=1 environment variable, which is what shell-based
> > agents built on top of the ocf-shellfuncs sourceable shell library from the
> > resource-agents project, hence including the agents it ships,
> > customarily do).
> > Yes, thanks, it's helpful.
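> >
> > (For illustration, the same tracing can be done by hand outside pacemaker;
> > this is an untested sketch assuming the conventional OCF install path and the
> > master_ip parameter from this thread:
> >
> > OCF_ROOT=/usr/lib/ocf OCF_TRACE_RA=1 \
> >   OCF_RESKEY_master_ip=192.168.0.99 \
> >   /usr/lib/ocf/resource.d/ovn/ovndb-servers monitor; echo "rc=$?"
> > )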
> >
> > > Nov 23 23:06:03 [665251] node-1.domain.tld       crmd:   notice: process_lrm_event:
> > > Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185,
> > > rc=0, cib-update=88, confirmed=true)
> > > <29>Nov 23 23:06:03 node-1 crmd[665251]:   notice: process_lrm_event:
> > > Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185,
> > > rc=0, cib-update=88, confirmed=true)
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: Diff: --- 0.630.2 2
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: Diff: +++ 0.630.3 (null)
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: +  /cib:  @num_updates=3
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++
> > > /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:
> > > <nvpair id="status-1-master-ovndb_servers" name="master-ovndb_servers" value="5"/>
> > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_process_request:
> > > Completed cib_modify operation for section status: OK (rc=0, origin=node-3.domain.tld/attrd/80, version=0.630.3)
> >
> > Also depends if there's anything interesting after this point...
> >
> > _______________________________________________
> > Users mailing list: Users at clusterlabs.org
> > http://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.
> > pdf
> > Bugs: http://bugs.clusterlabs.org
> --
> Ken Gaillot <kgaillot at redhat.com>
>