<div dir="ltr">Thank you very much Ken!! You nailed it, now it's working :-)</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Dec 5, 2017 at 5:29 AM, Ken Gaillot <span dir="ltr"><<a href="mailto:kgaillot@redhat.com" target="_blank">kgaillot@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Mon, 2017-12-04 at 23:15 +0800, Hui Xiang wrote:<br>
> Thanks Ken very much for the helpful information. It indeed helped a<br>
> lot with debugging.<br>
><br>
> " Each time the DC decides what to do, there will be a line like<br>
> "...<br>
> saving inputs in ..." with a file name. The log messages just before<br>
> that may give some useful information."<br>
> - I am unable to find such information in the logs; it only prints<br>
> paths like /var/lib/pacemaker/pengine/pe-input-xx<br>
<br>
</span>If the cluster has nothing to do, it won't show anything, but if<br>
actions are needed, it should show them, like<br>
"Start myrsc ( node1 )".<br>
<br>
Are there any messages with "error" or "warning" in the log?<br>
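A quick way to do that scan (the log location varies by distribution; /var/log/pacemaker.log and /var/log/messages are common — this self-contained sketch uses a temporary file in place of the real log):<br>
<br>

```shell
# Case-insensitive scan for error/warning lines, the same way you would run
# it against /var/log/pacemaker.log; a temp file stands in for the real log.
log=$(mktemp)
printf 'info: probe complete\nwarning: quorum lost\nerror: start failed\n' > "$log"
count=$(grep -icE 'error|warning' "$log")
echo "$count matching lines"
rm -f "$log"
```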
<span class=""><br>
> When I compare the cib.xml file of the good node with the bad one, the<br>
> only difference is the order of the "name" and "id" attributes, as shown<br>
> below. Does it matter for the cib to function normally?<br>
<br>
</span>No, the XML attributes can be any order.<br>
<br>
I just noticed that your cluster has symmetric-cluster=false. That<br>
means that resources can't run anywhere by default; in order for a<br>
resource to run, there must be a location constraint allowing it to run<br>
on a node. Have you added such constraints?<br>
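<br>
For example (using the master resource name that appears later in this thread; the node name and score are hypothetical), an opt-in cluster needs something like:<br>
<br>

```shell
# symmetric-cluster=false makes this an "opt-in" cluster: a resource may
# only run on nodes that a location constraint explicitly allows.
pcs constraint location tst-ovndb-master prefers node-1.domain.tld=100
```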
<div class="HOEnZb"><div class="h5"><br>
><br>
> <operations><br>
> <op id="ovndb-servers-monitor-20" interval="20"<br>
> name="monitor" timeout="30"/><br>
> <op id="ovndb-servers-start-0" interval="0" name="start"<br>
> timeout="60"/><br>
> <op id="ovndb-servers-stop-0" interval="0" name="stop"<br>
> timeout="60"/><br>
> <op id="ovndb-servers-promote-0" interval="0"<br>
> name="promote" timeout="60"/><br>
> <op id="ovndb-servers-demote-0" interval="0"<br>
> name="demote" timeout="60"/><br>
> </operations><br>
><br>
> <operations><br>
> <op name="monitor" interval="20" <br>
> timeout="30" id="ovndb-servers-monitor-20"/><br>
> <op name="start" interval="0" timeout="60" id="ovndb-<br>
> servers-start-0" /><br>
> <op name="stop" interval="0" timeout="60" id="ovndb-<br>
> servers-stop-0" /><br>
> </operations><br>
><br>
><br>
> Thanks.<br>
> Hui.<br>
><br>
><br>
> On Sat, Dec 2, 2017 at 5:07 AM, Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>
> wrote:<br>
> > On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:<br>
> > > Hi all,<br>
> > ><br>
> > > I am using the ovndb-servers ocf agent[1], which is a kind of multi-<br>
> > > state resource. When I create it (please see my previous email),<br>
> > > the monitor is called only once, and the start operation is never<br>
> > > called. According to the description below, that one monitor call<br>
> > > returned OCF_NOT_RUNNING; should pacemaker decide to execute the<br>
> > > start action based on this return code? Is there any way<br>
> > to<br>
> ><br>
> > Before Pacemaker does anything with a resource, it first calls a<br>
> > one-<br>
> > time monitor (called a "probe") to find out the current status of<br>
> > the<br>
> > resource across the cluster. This allows it to discover if the<br>
> > service<br>
> > is already running somewhere.<br>
> ><br>
> > So, you will see those probes for every resource when the cluster<br>
> > starts, or when the resource is added to the configuration, or when<br>
> > the<br>
> > resource is cleaned up.<br>
> ><br>
> > > check out what the next action is? Currently in my environment<br>
> > > nothing happens, and I have tried almost all the ways I know to debug<br>
> > > it, with no luck. Could anyone help? Thank you very much.<br>
> > ><br>
> > > Monitor Return Code Description<br>
> > > OCF_NOT_RUNNING Stopped<br>
> > > OCF_SUCCESS Running (Slave)<br>
> > > OCF_RUNNING_MASTER Running (Master)<br>
> > > OCF_FAILED_MASTER Failed (Master)<br>
> > > Other Failed (Slave)<br>
> > ><br>
> > ><br>
> > > [1] <a href="https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf" rel="noreferrer" target="_blank">https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf</a><br>
> > > Hui.<br>
> > ><br>
> > ><br>
> > ><br>
> > > On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang <<a href="mailto:xianghuir@gmail.com">xianghuir@gmail.com</a>><br>
> > > wrote:<br>
> > > > The really weird thing is that the monitor is only called once,<br>
> > > > rather than repeatedly as expected. Where should I check for it?<br>
> > > ><br>
> > > > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang <<a href="mailto:xianghuir@gmail.com">xianghuir@gmail.com</a><br>
> > ><br>
> > > > wrote:<br>
> > > > > Thanks Ken very much for your helpful information.<br>
> > > > ><br>
> > > > > I am now blocked: I can't see the pacemaker DC take any further<br>
> > > > > start/promote etc. action on my resource agents, and I have found<br>
> > > > > no helpful logs.<br>
> ><br>
> > Each time the DC decides what to do, there will be a line like "...<br>
> > saving inputs in ..." with a file name. The log messages just<br>
> > before<br>
> > that may give some useful information.<br>
> ><br>
> > Otherwise, you can take that file, and simulate what the cluster<br>
> > decided at that point:<br>
> ><br>
> > crm_simulate -Sx $FILENAME<br>
> ><br>
> > It will first show the status of the cluster at the start of the<br>
> > decision-making, then a "Transition Summary" with the actions that<br>
> > are<br>
> > required, then a simulated execution of those actions, and then<br>
> > what<br>
> > the resulting status would be if those actions succeeded.<br>
> ><br>
> > That may give you some more information. You can make it more<br>
> > verbose<br>
> > by using "-Ssx", or by adding "-VVVV", but it's not very user-<br>
> > friendly<br>
> > output.<br>
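> ><br>
> > For example (the file name here is hypothetical; substitute one from<br>
> > your own "saving inputs in ..." log lines):<br>
> ><br>

```shell
# Replay a saved transition; -S shows status, the transition summary,
# the simulated execution, and the resulting status.
crm_simulate -Sx /var/lib/pacemaker/pengine/pe-input-100.bz2
# More verbose variants mentioned above:
crm_simulate -Ssx /var/lib/pacemaker/pengine/pe-input-100.bz2
crm_simulate -Sx /var/lib/pacemaker/pengine/pe-input-100.bz2 -VVVV
```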
> ><br>
> > > > ><br>
> > > > > So my first question is: in what kind of situation will the DC<br>
> > > > > decide to call the start action? Does the monitor operation need<br>
> > > > > to return OCF_SUCCESS? In my case, it returns<br>
> > > > > OCF_NOT_RUNNING, and the monitor operation is not called<br>
> > > > > any more, which seems wrong, as I felt it should be<br>
> > > > > called at intervals. <br>
> ><br>
> > The DC will ask for a start if the configuration and current status<br>
> > require it. For example, if the resource's current status is<br>
> > stopped,<br>
> > and the configuration calls for a target role of started (the<br>
> > default),<br>
> > then it will start it. On the other hand, if the current status is<br>
> > started, then it doesn't need to do anything -- or, if location<br>
> > constraints ban all the nodes from running the resource, then it<br>
> > can't<br>
> > do anything.<br>
> ><br>
> > So, it's all based on what the current status is (based on the last<br>
> > monitor result), and what the configuration requires.<br>
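> ><br>
> > A toy sketch of that decision rule (this is not Pacemaker code, just<br>
> > the status-plus-configuration idea expressed in shell):<br>
> ><br>

```shell
# The DC compares the current status (from the last monitor/probe result)
# with the configured target role; "stopped" plus a target role of
# "Started" (the default) means a start gets scheduled.
current=stopped        # probe returned OCF_NOT_RUNNING
target=Started         # default target-role
if [ "$current" = stopped ] && [ "$target" = Started ]; then
  action="Start tst-ovndb ( node-1.domain.tld )"
else
  action="(nothing to do)"
fi
echo "$action"
```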
> ><br>
> > > > ><br>
> > > > > The resource agent's monitor logic:<br>
> > > > > In the xx_monitor function it calls xx_update, and it<br>
> > > > > always hits "$CRM_MASTER -D;;". What does that usually mean? Will<br>
> > > > > it stop the start operation from being called? <br>
> ><br>
> > Each master/slave resource has a special node attribute with a<br>
> > "master<br>
> > score" for that node. The node with the highest master score will<br>
> > be<br>
> > promoted to master. It's up to the resource agent to set this<br>
> > attribute. The "-D" call you see deletes that attribute (presumably<br>
> > before updating it later).<br>
> ><br>
> > The master score has no effect on starting/stopping.<br>
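> ><br>
> > A runnable sketch of that update logic (crm_master is mocked as a<br>
> > shell function here; the real agent invokes the crm_master CLI, and<br>
> > the score values stand in for the agent's slave_score/master_score):<br>
> ><br>

```shell
# Mirror of the agent's master-score update: success -> slave score,
# running-master -> master score, anything else -> delete the attribute.
OCF_SUCCESS=0; OCF_RUNNING_MASTER=8; OCF_NOT_RUNNING=7
slave_score=5; master_score=10
crm_master() { result="crm_master $*"; }   # mock of the crm_master CLI
master_update() {
  case $1 in
    "$OCF_SUCCESS")        crm_master -v "$slave_score" ;;
    "$OCF_RUNNING_MASTER") crm_master -v "$master_score" ;;
    *)                     crm_master -D ;;   # -D deletes the master score
  esac
}
master_update "$OCF_NOT_RUNNING"   # a probe that found the service stopped
echo "$result"
```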
> ><br>
> > > > ><br>
> > > > > ovsdb_server_master_update() {<br>
> > > > > ocf_log info "ovsdb_server_master_update: $1}"<br>
> > > > ><br>
> > > > > case $1 in<br>
> > > > > $OCF_SUCCESS)<br>
> > > > > $CRM_MASTER -v ${slave_score};;<br>
> > > > > $OCF_RUNNING_MASTER)<br>
> > > > > $CRM_MASTER -v ${master_score};;<br>
> > > > > #*) $CRM_MASTER -D;;<br>
> > > > > esac<br>
> > > > > ocf_log info "ovsdb_server_master_update end}"<br>
> > > > > }<br>
> > > > ><br>
> > > > > ovsdb_server_monitor() {<br>
> > > > > ocf_log info "ovsdb_server_monitor"<br>
> > > > > ovsdb_server_check_status<br>
> > > > > rc=$?<br>
> > > > ><br>
> > > > > ovsdb_server_master_update $rc<br>
> > > > > ocf_log info "monitor is going to return $rc"<br>
> > > > > return $rc<br>
> > > > > }<br>
> > > > ><br>
> > > > ><br>
> > > > > Below is my cluster configuration:<br>
> > > > ><br>
> > > > > 1. First I have an vip set.<br>
> > > > > [root@node-1 ~]# pcs resource show<br>
> > > > > vip__management_old (ocf::es:ns_IPaddr2): Started<br>
> > > > > node-1.domain.tld<br>
> > > > ><br>
> > > > > 2. Use pcs to create ovndb-servers and constraint <br>
> > > > > [root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-<br>
> > > > > servers manage_northd=yes master_ip=192.168.0.2<br>
> > > > > nb_master_port=6641 sb_master_port=6642 master<br>
> > > > > ([root@node-1 ~]# pcs resource meta tst-ovndb-master<br>
> > > > > notify=true<br>
> > > > > Error: unable to find a resource/clone/master/group: tst-<br>
> > > > > ovndb-master) ## returned an error, so I changed to the command<br>
> > > > > below.<br>
> > > > > [root@node-1 ~]# pcs resource master tst-ovndb-master tst-<br>
> > ovndb<br>
> > > > > notify=true<br>
> > > > > [root@node-1 ~]# pcs constraint colocation add master tst-<br>
> > ovndb-<br>
> > > > > master with vip__management_old<br>
> > > > ><br>
> > > > > 3. pcs status<br>
> > > > > [root@node-1 ~]# pcs status<br>
> > > > > vip__management_old (ocf::es:ns_IPaddr2): Started<br>
> > > > > node-1.domain.tld<br>
> > > > > Master/Slave Set: tst-ovndb-master [tst-ovndb]<br>
> > > > > Stopped: [ node-1.domain.tld node-2.domain.tld node-<br>
> > > > > 3.domain.tld ]<br>
> > > > ><br>
> > > > > 4. pcs resource show XXX<br>
> > > > > [root@node-1 ~]# pcs resource show vip__management_old<br>
> > > > > Resource: vip__management_old (class=ocf provider=es<br>
> > > > > type=ns_IPaddr2)<br>
> > > > > Attributes: nic=br-mgmt base_veth=br-mgmt-hapr<br>
> > ns_veth=hapr-m<br>
> > > > > ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy<br>
> > gateway=none<br>
> > > > > gateway_metric=0 iptables_start_rules=false<br>
> > > > > iptables_stop_rules=false iptables_comment=default-comment <br>
> > > > > Meta Attrs: migration-threshold=3 failure-timeout=60<br>
> > resource-<br>
> > > > > stickiness=1 <br>
> > > > > Operations: monitor interval=3 timeout=30<br>
> > (vip__management_old-<br>
> > > > > monitor-3)<br>
> > > > > start interval=0 timeout=30<br>
> > (vip__management_old-<br>
> > > > > start-0)<br>
> > > > > stop interval=0 timeout=30<br>
> > (vip__management_old-<br>
> > > > > stop-0)<br>
> > > > > [root@node-1 ~]# pcs resource show tst-ovndb-master<br>
> > > > > Master: tst-ovndb-master<br>
> > > > > Meta Attrs: notify=true <br>
> > > > > Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-<br>
> > servers)<br>
> > > > > Attributes: manage_northd=yes master_ip=192.168.0.2<br>
> > > > > nb_master_port=6641 sb_master_port=6642 <br>
> > > > > Operations: start interval=0s timeout=30s (tst-ovndb-<br>
> > start-<br>
> > > > > timeout-30s)<br>
> > > > > stop interval=0s timeout=20s (tst-ovndb-stop-<br>
> > > > > timeout-20s)<br>
> > > > > promote interval=0s timeout=50s (tst-ovndb-<br>
> > > > > promote-timeout-50s)<br>
> > > > > demote interval=0s timeout=50s (tst-ovndb-<br>
> > demote-<br>
> > > > > timeout-50s)<br>
> > > > > monitor interval=30s timeout=20s (tst-ovndb-<br>
> > > > > monitor-interval-30s)<br>
> > > > > monitor interval=10s role=Master timeout=20s<br>
> > (tst-<br>
> > > > > ovndb-monitor-interval-10s-role-Master)<br>
> > > > > monitor interval=30s role=Slave timeout=20s<br>
> > (tst-<br>
> > > > > ovndb-monitor-interval-30s-role-Slave)<br>
> > > > ><br>
> > > > ><br>
> > > > > colocation colocation-tst-ovndb-master-vip__management_old-<br>
> > > > > INFINITY inf: tst-ovndb-master:Master<br>
> > vip__management_old:Started<br>
> > > > ><br>
> > > > > 5. I have put logging in every ovndb-servers op; it seems only the<br>
> > > > > monitor op is called, and nothing is promoted by the pacemaker DC:<br>
> > > > > <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]:<br>
> > > > > INFO: ovsdb_server_monitor<br>
> > > > > <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]:<br>
> > > > > INFO: ovsdb_server_check_status<br>
> > > > > <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]:<br>
> > > > > INFO: return OCFOCF_NOT_RUNNINGG<br>
> > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]:<br>
> > > > > INFO: ovsdb_server_master_update: 7}<br>
> > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]:<br>
> > > > > INFO: ovsdb_server_master_update end}<br>
> > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]:<br>
> > > > > INFO: monitor is going to return 7<br>
> > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO:<br>
> > > > > metadata exit OCF_SUCCESS}<br>
> > > > ><br>
> > > > > 6. The cluster property:<br>
> > > > > property cib-bootstrap-options: \<br>
> > > > > have-watchdog=false \<br>
> > > > > dc-version=1.1.12-a14efad \<br>
> > > > > cluster-infrastructure=corosync \<br>
> > > > > no-quorum-policy=ignore \<br>
> > > > > stonith-enabled=false \<br>
> > > > > symmetric-cluster=false \<br>
> > > > > last-lrm-refresh=1511802933<br>
> > > > ><br>
> > > > ><br>
> > > > ><br>
> > > > > Thank you very much for any help.<br>
> > > > > Hui.<br>
> > > > ><br>
> > > > ><br>
> > > > > Date: Mon, 27 Nov 2017 12:07:57 -0600<br>
> > > > > From: Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>
> > > > > To: Cluster Labs - All topics related to open-source<br>
> > clustering<br>
> > > > > welcomed <<a href="mailto:users@clusterlabs.org">users@clusterlabs.org</a>>, jpokorny@redhat.com<br>
> > > > > Subject: Re: [ClusterLabs] pcs create master/slave resource<br>
> > > > > doesn't<br>
> > > > > work<br>
> > > > > Message-ID: <<a href="mailto:1511806077.5194.6.camel@redhat.com">1511806077.5194.6.camel@redhat.com</a>><br>
> > > > > Content-Type: text/plain; charset="UTF-8"<br>
> > > > ><br>
> > > > > On Fri, 2017-11-24 at 18:00 +0800, Hui Xiang wrote:<br>
> > > > > > Jan,<br>
> > > > > ><br>
> > > > > > Very much appreciated for your help. I am getting further, but<br>
> > > > > > it still looks very strange.<br>
> > > > > ><br>
> > > > > > 1. To use "debug-promote", I upgraded pacemaker from 1.12 to<br>
> > > > > > 1.16, and pcs to 0.9.160.<br>
> > > > > ><br>
> > > > > > 2. Recreate resource with below commands<br>
> > > > > > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \<br>
> > > > > >   master_ip=192.168.0.99 \<br>
> > > > > >   op monitor interval="10s" \<br>
> > > > > >   op monitor interval="11s" role=Master<br>
> > > > > > pcs resource master ovndb_servers-master ovndb_servers \<br>
> > > > > >   meta notify="true" master-max="1" master-node-max="1" clone-max="3" clone-node-max="1"<br>
> > > > > > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 \<br>
> > > > > >     op monitor interval=10s<br>
> > > > > > pcs constraint colocation add VirtualIP with master ovndb_servers-master \<br>
> > > > > >   score=INFINITY<br>
> > > > > ><br>
> > > > > > 3. pcs status<br>
> > > > > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]<br>
> > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]<br>
> > > > > >  VirtualIP (ocf::heartbeat:IPaddr2): Stopped<br>
> > > > > ><br>
> > > > > > 4. Manually run 'debug-start' on 3 nodes and 'debug-<br>
> > promote' on<br>
> > > > > one<br>
> > > > > > of nodes<br>
> > > > > > run below on [ node-1.domain.tld node-2.domain.tld node-<br>
> > > > > 3.domain.tld<br>
> > > > > > ]<br>
> > > > > > # pcs resource debug-start ovndb_servers --full<br>
> > > > > > run below on [ node-1.domain.tld ]<br>
> > > > > > # pcs resource debug-promote ovndb_servers --full<br>
> > > > ><br>
> > > > > Before running debug-* commands, I'd unmanage the resource or<br>
> > put<br>
> > > > > the<br>
> > > > > cluster in maintenance mode, so Pacemaker doesn't try to<br>
> > > > > "correct" your<br>
> > > > > actions.<br>
> > > > ><br>
> > > > > ><br>
> > > > > > 5. pcs status<br>
> > > > > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]<br>
> > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]<br>
> > > > > >  VirtualIP (ocf::heartbeat:IPaddr2): Stopped<br>
> > > > > ><br>
> > > > > > 6. However, I have seen that one of the ovndb_servers was indeed<br>
> > > > > > promoted to master, but pcs status still shows them all as<br>
> > > > > > 'Stopped'. What am I missing?<br>
> > > > ><br>
> > > > > It's hard to tell from these logs. It's possible the resource<br>
> > > > > agent's<br>
> > > > > monitor command is not exiting with the expected status<br>
> > values:<br>
> > > > ><br>
> > > > > <a href="http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_requirements_for_multi_state_resource_agents" rel="noreferrer" target="_blank">http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_requirements_for_multi_state_resource_agents</a><br>
> > > > ><br>
> > > > > One of the nodes will be elected the DC, meaning it<br>
> > coordinates<br>
> > > > > the<br>
> > > > > cluster's actions. The DC's logs will have more "pengine:"<br>
> > > > > messages,<br>
> > > > > with each action that needs to be taken (e.g. "* Start <rsc><br>
> > > > > <node>").<br>
> > > > ><br>
> > > > > You can look through those actions to see what the cluster<br>
> > > > > decided to<br>
> > > > > do -- whether the resources were ever started, whether any<br>
> > was<br>
> > > > > promoted, and whether any were explicitly stopped.<br>
> > > > ><br>
> > > > ><br>
> > > > > > >  stderr: + 17:45:59: ocf_log:327: __OCF_MSG='ovndb_servers: Promoting node-1.domain.tld as the master'<br>
> > > > > > >  stderr: + 17:45:59: ocf_log:329: case "${__OCF_PRIO}" in<br>
> > > > > > >  stderr: + 17:45:59: ocf_log:333: __OCF_PRIO=INFO<br>
> > > > > > >  stderr: + 17:45:59: ocf_log:338: '[' INFO = DEBUG ']'<br>
> > > > > > >  stderr: + 17:45:59: ocf_log:341: ha_log 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'<br>
> > > > > > >  stderr: + 17:45:59: ha_log:253: __ha_log 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:185: local ignore_stderr=false<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:186: local loglevel<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:188: '[' 'xINFO: ovndb_servers: Promoting node-1.domain.tld as the master' = x--ignore-stderr ']'<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:190: '[' none = '' ']'<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:192: tty<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:193: '[' x = x0 -a x = xdebug ']'<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:195: '[' false = true ']'<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:199: '[' '' ']'<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:202: echo 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'<br>
> > > > > > >  stderr: INFO: ovndb_servers: Promoting node-1.domain.tld as the master<br>
> > > > > > >  stderr: + 17:45:59: __ha_log:204: return 0<br>
> > > > > > >  stderr: + 17:45:59: ovsdb_server_promote:378: /usr/sbin/crm_attribute --type crm_config --name OVN_REPL_INFO -s ovn_ovsdb_master_server -v node-1.domain.tld<br>
> > > > > > >  stderr: + 17:45:59: ovsdb_server_promote:379: ovsdb_server_master_update 8<br>
> > > > > > >  stderr: + 17:45:59: ovsdb_server_master_update:214: case $1 in<br>
> > > > > > >  stderr: + 17:45:59: ovsdb_server_master_update:218: /usr/sbin/crm_master -l reboot -v 10<br>
> > > > > > >  stderr: + 17:45:59: ovsdb_server_promote:380: return 0<br>
> > > > > > >  stderr: + 17:45:59: 458: rc=0<br>
> > > > > > >  stderr: + 17:45:59: 459: exit 0<br>
> > > > > ><br>
> > > > > ><br>
> > > > > > On 23/11/17 23:52 +0800, Hui Xiang wrote:<br>
> > > > > > > I am working on HA with 3 nodes, which has the below<br>
> > > > > configuration:<br>
> > > > > > ><br>
> > > > > > > """<br>
> > > > > > > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \<br>
> > > > > > >    master_ip=168.254.101.2 \<br>
> > > > > > >    op monitor interval="10s" \<br>
> > > > > > >    op monitor interval="11s" role=Master<br>
> > > > > > > pcs resource master ovndb_servers-master ovndb_servers \<br>
> > > > > > >    meta notify="true" master-max="1" master-node-max="1" clone-max="3" clone-node-max="1"<br>
> > > > > > > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=168.254.101.2 \<br>
> > > > > > >      op monitor interval=10s<br>
> > > > > > > pcs constraint order promote ovndb_servers-master then VirtualIP<br>
> > > > > > > pcs constraint colocation add VirtualIP with master ovndb_servers-master \<br>
> > > > > > >    score=INFINITY<br>
> > > > > > > """<br>
> > > > > ><br>
> > > > > > (Out of curiosity, this looks like a mix of output from<br>
> > > > > > pcs config export pcs-commands [or clufter cib2pcscmd -s]<br>
> > > > > > and manual editing. Is this a good guess?)<br>
> > > > > > It's the output of "pcs status".<br>
> > > > > ><br>
> > > > > > > However, after setting it up as above, the master is not being<br>
> > > > > > > selected; all are stopped. From the pacemaker log, node-1 has<br>
> > > > > > > been chosen as the master. I am confused about what is wrong;<br>
> > > > > > > can anybody help? It would be very appreciated.<br>
> > > > > > ><br>
> > > > > > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]<br>
> > > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]<br>
> > > > > > >  VirtualIP (ocf::heartbeat:IPaddr2): Stopped<br>
> > > > > > ><br>
> > > > > > > # pacemaker log<br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++ /cib/configuration/resources:  <primitive class="ocf" id="ovndb_servers" provider="ovn" type="ovndb-servers"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++       <instance_attributes id="ovndb_servers-instance_attributes"><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <nvpair id="ovndb_servers-instance_attributes-master_ip" name="master_ip" value="168.254.101.2"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++       </instance_attributes><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++       <operations><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <op id="ovndb_servers-start-timeout-30s" interval="0s" name="start" timeout="30s"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <op id="ovndb_servers-stop-timeout-20s" interval="0s" name="stop" timeout="20s"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <op id="ovndb_servers-promote-timeout-50s" interval="0s" name="promote" timeout="50s"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <op id="ovndb_servers-demote-timeout-50s" interval="0s" name="demote" timeout="50s"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <op id="ovndb_servers-monitor-interval-10s" interval="10s" name="monitor"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++         <op id="ovndb_servers-monitor-interval-11s-role-Master" interval="11s" name="monitor" role="Master"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++       </operations><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++     </primitive><br>
> > > > > > ><br>
> > > > > > > Nov 23 23:06:03 [665249] node-1.domain.tld      attrd:     info:<br>
> > > > > > > attrd_peer_update: Setting master-ovndb_servers[node-1.domain.tld]: (null) -> 5 from node-1.domain.tld<br>
> > > > > ><br>
> > > > > > If it's probable your ocf:ovn:ovndb-servers agent in master<br>
> > > > > mode can<br>
> > > > > > run something like "attrd_updater -n master-ovndb_servers<br>
> > -U<br>
> > > > > 5", then<br>
> > > > > > it was indeed launched OK, and if it does not continue to<br>
> > run<br>
> > > > > as<br>
> > > > > > expected, there may be a problem with the agent itself.<br>
> > > > > ><br>
> > > > > > no change.<br>
> > > > > > You can try running "pcs resource debug-promote<br>
> > ovndb_servers<br>
> > > > > --full"<br>
> > > > > > to examine the execution details (assuming the agent responds to<br>
> > > > > > the OCF_TRACE_RA=1 environment variable, which is what shell-based<br>
> > > > > > agents built on top of the ocf-shellfuncs sourcable shell library<br>
> > > > > > from the resource-agents project, hence including the agents it<br>
> > > > > > ships, customarily do).<br>
> > > > > > Yes, thanks, it's helpful.<br>
> > > > > ><br>
> > > > > > > Nov 23 23:06:03 [665251] node-1.domain.tld       crmd:   notice:<br>
> > > > > > > process_lrm_event: Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)<br>
> > > > > > > <29>Nov 23 23:06:03 node-1 crmd[665251]:   notice: process_lrm_event:<br>
> > > > > > > Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)<br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: Diff: --- 0.630.2 2<br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: Diff: +++ 0.630.3 (null)<br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: +  /cib:  @num_updates=3<br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-master-ovndb_servers" name="master-ovndb_servers" value="5"/><br>
> > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info:<br>
> > > > > > > cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node-3.domain.tld/attrd/80, version=0.630.3)<br>
> > > > > ><br>
> > > > > > Also depends if there's anything interesting after this<br>
> > > > > point...<br>
> > > > > ><br>
> > > > > > _______________________________________________<br>
> > > > > > Users mailing list: <a href="mailto:Users@clusterlabs.org">Users@clusterlabs.org</a><br>
> > > > > > <a href="http://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">http://lists.clusterlabs.org/mailman/listinfo/users</a><br>
> > > > > ><br>
> > > > > > Project Home: <a href="http://www.clusterlabs.org" rel="noreferrer" target="_blank">http://www.clusterlabs.org</a><br>
> > > > > > Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" rel="noreferrer" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
> > > > > > Bugs: <a href="http://bugs.clusterlabs.org" rel="noreferrer" target="_blank">http://bugs.clusterlabs.org</a><br>
> > > > > --<br>
> > > > > Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>
> > > > ><br>
> > > ><br>
> > > ><br>
> > ><br>
> > ><br>
> > --<br>
> > Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>
> ><br>
><br>
><br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
Ken Gaillot <<a href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a>><br>
</font></span></blockquote></div><br></div>