[ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

Tue Dec 5 04:31:39 EST 2017

Thank you very much Ken!! You nailed it, now it's working :-)

On Tue, Dec 5, 2017 at 5:29 AM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On Mon, 2017-12-04 at 23:15 +0800, Hui Xiang wrote:
> > Thanks Ken very much for the helpful information. It indeed helps a
> > lot with debugging.
> >
> >  " Each time the DC decides what to do, there will be a line like
> > "...
> > saving inputs in ..." with a file name. The log messages just before
> > that may give some useful information."
> >   - I am unable to find such information in the logs; they only print
> > something like /var/lib/pacemaker/pengine/pe-input-xx
>
> If the cluster had nothing to do, it won't show anything, but if
> actions were needed, it should show them, like
> "Start      myrsc         ( node1 )".
>
> Are there any messages with "error" or "warning" in the log?
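One way to answer the question above is to filter the Pacemaker log for those two words. This is only a minimal sketch: the helper name scan_pacemaker_log is made up for illustration, and the log location is an assumption that varies by distribution and configuration.

```shell
# Hypothetical helper: print log lines that mention "error" or "warning".
# The log path is an assumption; pass whichever file your cluster logs to.
scan_pacemaker_log() {
    logfile=$1
    # -E: extended regular expression, -i: case-insensitive match
    grep -Ei 'error|warning' "$logfile"
}
```

For example, `scan_pacemaker_log /var/log/pacemaker.log` on the DC node, assuming that is where the log lives on your system.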
>
> > When I compare the cib.xml file of the good cluster with the bad one,
> > they differ in the order of "name" and "id" as shown below. Does that
> > matter for the CIB to function normally?
>
> No, the XML attributes can be any order.
>
> I just noticed that your cluster has symmetric-cluster=false. That
> means that resources can't run anywhere by default; in order for a
> resource to run, there must be a location constraint allowing it to run
> on a node. Have you added such constraints?
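With symmetric-cluster=false the cluster is opt-in, so each node needs a location constraint before the resource may run there. A hedged sketch using the resource and node names from this thread; the helper name allow_on_nodes, the score of 100 and the PCS dry-run variable are assumptions for illustration only.

```shell
# Hypothetical helper: allow a resource on each listed node (opt-in cluster).
# PCS defaults to "echo pcs" so the commands are only printed (dry run);
# set PCS=pcs to actually apply them.
allow_on_nodes() {
    rsc=$1
    shift
    for node in "$@"; do
        # "prefers <node>=<score>" creates a positive location constraint
        ${PCS:-echo pcs} constraint location "$rsc" prefers "$node=100"
    done
}

allow_on_nodes tst-ovndb-master \
    node-1.domain.tld node-2.domain.tld node-3.domain.tld
```

Run as written, this prints one "pcs constraint location ... prefers ..." command per node so they can be reviewed before being applied.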
>
> >
> >           <operations>
> >             <op id="ovndb-servers-monitor-20" interval="20" name="monitor" timeout="30"/>
> >             <op id="ovndb-servers-start-0" interval="0" name="start" timeout="60"/>
> >             <op id="ovndb-servers-stop-0" interval="0" name="stop" timeout="60"/>
> >             <op id="ovndb-servers-promote-0" interval="0" name="promote" timeout="60"/>
> >             <op id="ovndb-servers-demote-0" interval="0" name="demote" timeout="60"/>
> >           </operations>
> >
> >           <operations>
> >             <op name="monitor" interval="20" timeout="30" id="ovndb-servers-monitor-20"/>
> >             <op name="start" interval="0" timeout="60" id="ovndb-servers-start-0"/>
> >             <op name="stop" interval="0" timeout="60" id="ovndb-servers-stop-0"/>
> >           </operations>
> >
> >
> > Thanks.
> > Hui.
> >
> >
> > On Sat, Dec 2, 2017 at 5:07 AM, Ken Gaillot <kgaillot at redhat.com>
> > wrote:
> > > On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> > > > Hi all,
> > > >
> > > >   I am using the ovndb-servers OCF agent [1], which is a kind of
> > > > multi-state resource. When I create it (please see my previous
> > > > email), the monitor is called only once and the start operation is
> > > > never called. According to the description below, the single
> > > > monitor call returned OCF_NOT_RUNNING; should Pacemaker decide to
> > > > execute the start action based on this return code? Is there any
> > > > way to
> > >
> > > Before Pacemaker does anything with a resource, it first calls a
> > > one-time monitor (called a "probe") to find out the current status
> > > of the resource across the cluster. This allows it to discover if
> > > the service is already running somewhere.
> > >
> > > So, you will see those probes for every resource when the cluster
> > > starts, or when the resource is added to the configuration, or when
> > > the resource is cleaned up.
> > >
> > > > check what the next action is? Currently in my environment nothing
> > > > happens, and I have tried almost every way I know to debug it, with
> > > > no luck. Could anyone help? Thank you very much.
> > > >
> > > > Monitor Return Code   Description
> > > > OCF_NOT_RUNNING       Stopped
> > > > OCF_SUCCESS           Running (Slave)
> > > > OCF_RUNNING_MASTER    Running (Master)
> > > > OCF_FAILED_MASTER     Failed (Master)
> > > > Other                 Failed (Slave)
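When reading agent logs that print only the number (such as "monitor is going to return 7"), it can help to map the code back to the table above. A small sketch using the standard OCF numeric values (0, 7, 8, 9); the function name describe_monitor_rc is made up for illustration.

```shell
# Hypothetical helper: translate a monitor exit code into the status
# from the table above, using the standard OCF numeric values.
describe_monitor_rc() {
    case $1 in
        7) echo "Stopped" ;;           # OCF_NOT_RUNNING
        0) echo "Running (Slave)" ;;   # OCF_SUCCESS
        8) echo "Running (Master)" ;;  # OCF_RUNNING_MASTER
        9) echo "Failed (Master)" ;;   # OCF_FAILED_MASTER
        *) echo "Failed (Slave)" ;;    # any other code
    esac
}

describe_monitor_rc 7
```

The final line prints "Stopped", which is how the cluster interprets a probe that returns 7.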
> > > >
> > > >
> > > > [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovndb-servers.ocf
> > > > Hui.
> > > >
> > > >
> > > >
> > > > On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang <xianghuir at gmail.com>
> > > > wrote:
> > > > > The really weird thing is that the monitor is called only once
> > > > > rather than repeatedly as expected. Where should I check for it?
> > > > >
> > > > > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang <xianghuir at gmail.com>
> > > > > wrote:
> > > > > > Thanks Ken very much for your helpful information.
> > > > > >
> > > > > > I am now blocked because I can't see the Pacemaker DC take any
> > > > > > further start/promote etc. action on my resource agent, and I
> > > > > > found no helpful logs.
> > >
> > > Each time the DC decides what to do, there will be a line like "...
> > > saving inputs in ..." with a file name. The log messages just
> > > before that may give some useful information.
> > >
> > > Otherwise, you can take that file, and simulate what the cluster
> > > decided at that point:
> > >
> > >   crm_simulate -Sx $FILENAME
> > >
> > > It will first show the status of the cluster at the start of the
> > > decision-making, then a "Transition Summary" with the actions that
> > > are required, then a simulated execution of those actions, and then
> > > what the resulting status would be if those actions succeeded.
> > >
> > > That may give you some more information. You can make it more
> > > verbose by using "-Ssx", or by adding "-VVVV", but it's not very
> > > user-friendly output.
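To pick the file to feed to crm_simulate, one option is to take the newest pe-input file on the DC. A rough sketch, assuming the files live in /var/lib/pacemaker/pengine as mentioned earlier in this thread; the helper name latest_pe_input is hypothetical.

```shell
# Hypothetical helper: print the most recently modified pe-input file.
# The default directory is the one mentioned earlier in this thread.
latest_pe_input() {
    dir=${1:-/var/lib/pacemaker/pengine}
    newest=
    for f in "$dir"/pe-input-*; do
        [ -e "$f" ] || continue          # skip the unexpanded glob
        # -nt is true when $f was modified more recently than $newest
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -n "$newest" ]; then
        echo "$newest"
    fi
}
```

The result can then be replayed with `crm_simulate -Sx "$(latest_pe_input)"`. Note that the newest file is not necessarily the transition you care about, so checking the "saving inputs in" log line remains the more reliable way to find it.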
> > >
> > > > > >
> > > > > > So my first question is: in what kind of situation will the DC
> > > > > > decide to call the start action? Does the monitor operation
> > > > > > need to return OCF_SUCCESS? In my case it returns
> > > > > > OCF_NOT_RUNNING, and the monitor operation is not called any
> > > > > > more, which seems wrong since I expected it to be called at
> > > > > > regular intervals.
> > >
> > > The DC will ask for a start if the configuration and current status
> > > require it. For example, if the resource's current status is
> > > stopped, and the configuration calls for a target role of started
> > > (the default), then it will start it. On the other hand, if the
> > > current status is started, then it doesn't need to do anything -- or,
> > > if location constraints ban all the nodes from running the resource,
> > > then it can't do anything.
> > >
> > > So, it's all based on what the current status is (based on the last
> > > monitor result), and what the configuration requires.
> > >
> > > > > >
> > > > > > The resource agent monitor logic:
> > > > > > In the xx_monitor function it calls xx_update, which always
> > > > > > hits "$CRM_MASTER -D;;". What does that usually mean? Will it
> > > > > > stop the start operation from being called?
> > >
> > > Each master/slave resource has a special node attribute with a
> > > "master score" for that node. The node with the highest master score
> > > will be promoted to master. It's up to the resource agent to set
> > > this attribute. The "-D" call you see deletes that attribute
> > > (presumably before updating it later).
> > >
> > > The master score has no effect on starting/stopping.
> > >
> > > > > >
> > > > > > ovsdb_server_master_update() {
> > > > > >     ocf_log info "ovsdb_server_master_update: $1}"
> > > > > >
> > > > > >     case $1 in
> > > > > >         $OCF_SUCCESS)
> > > > > >         $CRM_MASTER -v ${slave_score};;
> > > > > >         $OCF_RUNNING_MASTER)
> > > > > >             $CRM_MASTER -v ${master_score};;
> > > > > >         #*) $CRM_MASTER -D;;
> > > > > >     esac
> > > > > >     ocf_log info "ovsdb_server_master_update end}"
> > > > > > }
> > > > > >
> > > > > > ovsdb_server_monitor() {
> > > > > >     ocf_log info "ovsdb_server_monitor"
> > > > > >     ovsdb_server_check_status
> > > > > >     rc=$?
> > > > > >
> > > > > >     ovsdb_server_master_update $rc
> > > > > >     ocf_log info "monitor is going to return $rc"
> > > > > >     return $rc
> > > > > > }
> > > > > >
> > > > > >
> > > > > > Below is my cluster configuration:
> > > > > >
> > > > > > 1. First I have a VIP set.
> > > > > > [root at node-1 ~]# pcs resource show
> > > > > >  vip__management_old      (ocf::es:ns_IPaddr2):   Started node-1.domain.tld
> > > > > >
> > > > > > 2. Use pcs to create ovndb-servers and the constraint
> > > > > > [root at node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers \
> > > > > >     manage_northd=yes master_ip=192.168.0.2 \
> > > > > >     nb_master_port=6641 sb_master_port=6642 master
> > > > > >      ([root at node-1 ~]# pcs resource meta tst-ovndb-master notify=true
> > > > > >       Error: unable to find a resource/clone/master/group: tst-ovndb-master)
> > > > > >      ## This returned an error, so I changed to the command below.
> > > > > > [root at node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb notify=true
> > > > > > [root at node-1 ~]# pcs constraint colocation add master tst-ovndb-master \
> > > > > >     with vip__management_old
> > > > > >
> > > > > > 3. pcs status
> > > > > > [root at node-1 ~]# pcs status
> > > > > >  vip__management_old      (ocf::es:ns_IPaddr2):   Started node-1.domain.tld
> > > > > >  Master/Slave Set: tst-ovndb-master [tst-ovndb]
> > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > > > > >
> > > > > > 4. pcs resource show XXX
> > > > > > [root at node-1 ~]# pcs resource show  vip__management_old
> > > > > >  Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
> > > > > >   Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false iptables_comment=default-comment
> > > > > >   Meta Attrs: migration-threshold=3 failure-timeout=60 resource-stickiness=1
> > > > > >   Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3)
> > > > > >               start interval=0 timeout=30 (vip__management_old-start-0)
> > > > > >               stop interval=0 timeout=30 (vip__management_old-stop-0)
> > > > > > [root at node-1 ~]# pcs resource show tst-ovndb-master
> > > > > >  Master: tst-ovndb-master
> > > > > >   Meta Attrs: notify=true
> > > > > >   Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
> > > > > >    Attributes: manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641 sb_master_port=6642
> > > > > >    Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
> > > > > >                stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
> > > > > >                promote interval=0s timeout=50s (tst-ovndb-promote-timeout-50s)
> > > > > >                demote interval=0s timeout=50s (tst-ovndb-demote-timeout-50s)
> > > > > >                monitor interval=30s timeout=20s (tst-ovndb-monitor-interval-30s)
> > > > > >                monitor interval=10s role=Master timeout=20s (tst-ovndb-monitor-interval-10s-role-Master)
> > > > > >                monitor interval=30s role=Slave timeout=20s (tst-ovndb-monitor-interval-30s-role-Slave)
> > > > > >
> > > > > >
> > > > > > colocation colocation-tst-ovndb-master-vip__management_old-INFINITY inf: tst-ovndb-master:Master vip__management_old:Started
> > > > > >
> > > > > > 5. I have added logging to every ovndb-servers op; it seems only the
> > > > > > monitor op is called, and nothing is promoted by the Pacemaker DC:
> > > > > > <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_monitor
> > > > > > <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_check_status
> > > > > > <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: return OCFOCF_NOT_RUNNINGG
> > > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_master_update: 7}
> > > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: ovsdb_server_master_update end}
> > > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO: monitor is going to return 7
> > > > > > <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata exit OCF_SUCCESS}
> > > > > >
> > > > > > 6. The cluster property:
> > > > > > property cib-bootstrap-options: \
> > > > > >         have-watchdog=false \
> > > > > >         dc-version=1.1.12-a14efad \
> > > > > >         cluster-infrastructure=corosync \
> > > > > >         no-quorum-policy=ignore \
> > > > > >         stonith-enabled=false \
> > > > > >         symmetric-cluster=false \
> > > > > >         last-lrm-refresh=1511802933
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thank you very much for any help.
> > > > > > Hui.
> > > > > >
> > > > > >
> > > > > > Date: Mon, 27 Nov 2017 12:07:57 -0600
> > > > > > From: Ken Gaillot <kgaillot at redhat.com>
> > > > > > To: Cluster Labs - All topics related to open-source clustering
> > > > > >         welcomed <users at clusterlabs.org>, jpokorny at redhat.com
> > > > > > Subject: Re: [ClusterLabs] pcs create master/slave resource doesn't work
> > > > > > Message-ID: <1511806077.5194.6.camel at redhat.com>
> > > > > > Content-Type: text/plain; charset="UTF-8"
> > > > > >
> > > > > > On Fri, 2017-11-24 at 18:00 +0800, Hui Xiang wrote:
> > > > > > > Jan,
> > > > > > >
> > > > > > >   I very much appreciate your help. I am getting further, but it
> > > > > > > still looks very strange.
> > > > > > >
> > > > > > > 1. To use "debug-promote", I upgraded pacemaker from 1.12 to 1.16
> > > > > > > and pcs to 0.9.160.
> > > > > > >
> > > > > > > 2. Recreate the resource with the commands below
> > > > > > > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> > > > > > >   master_ip=192.168.0.99 \
> > > > > > >   op monitor interval="10s" \
> > > > > > >   op monitor interval="11s" role=Master
> > > > > > > pcs resource master ovndb_servers-master ovndb_servers \
> > > > > > >   meta notify="true" master-max="1" master-node-max="1" \
> > > > > > >   clone-max="3" clone-node-max="1"
> > > > > > > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 \
> > > > > > >   op monitor interval=10s
> > > > > > > pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> > > > > > >   score=INFINITY
> > > > > > >
> > > > > > > 3. pcs status
> > > > > > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> > > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > > > > > >  VirtualIP    (ocf::heartbeat:IPaddr2):       Stopped
> > > > > > >
> > > > > > > 4. Manually run 'debug-start' on all 3 nodes and 'debug-promote'
> > > > > > > on one of the nodes.
> > > > > > > Run the command below on [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]:
> > > > > > > # pcs resource debug-start ovndb_servers --full
> > > > > > > Run the command below on [ node-1.domain.tld ]:
> > > > > > > # pcs resource debug-promote ovndb_servers --full
> > > > > >
> > > > > > Before running debug-* commands, I'd unmanage the resource or put
> > > > > > the cluster in maintenance mode, so Pacemaker doesn't try to
> > > > > > "correct" your actions.
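The advice above could be wrapped up roughly as follows. This is a sketch only: the wrapper name debug_unmanaged and the PCS dry-run variable are assumptions, and the cluster-wide alternative would be "pcs property set maintenance-mode=true" before and "maintenance-mode=false" after.

```shell
# Hypothetical wrapper: unmanage a resource, run one debug-* action,
# then hand the resource back to the cluster.
# PCS defaults to "echo pcs" (dry run); set PCS=pcs to execute for real.
debug_unmanaged() {
    rsc=$1
    action=$2    # e.g. debug-start or debug-promote
    ${PCS:-echo pcs} resource unmanage "$rsc"
    ${PCS:-echo pcs} resource "$action" "$rsc" --full
    ${PCS:-echo pcs} resource manage "$rsc"
}

debug_unmanaged ovndb_servers debug-promote
```

As written this only prints the three pcs commands in order, which makes it easy to check them before running anything against a live cluster.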
> > > > > >
> > > > > > >
> > > > > > > 5. pcs status
> > > > > > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> > > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > > > > > >  VirtualIP    (ocf::heartbeat:IPaddr2):       Stopped
> > > > > > >
> > > > > > > 6. However, I have seen that one of the ovndb_servers instances
> > > > > > > has indeed been promoted to master, but pcs status still shows
> > > > > > > all of them as 'Stopped'. What am I missing?
> > > > > >
> > > > > > It's hard to tell from these logs. It's possible the resource
> > > > > > agent's monitor command is not exiting with the expected status
> > > > > > values:
> > > > > >
> > > > > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_requirements_for_multi_state_resource_agents
> > > > > >
> > > > > > One of the nodes will be elected the DC, meaning it coordinates
> > > > > > the cluster's actions. The DC's logs will have more "pengine:"
> > > > > > messages, with each action that needs to be taken (e.g. "* Start
> > > > > > <rsc> <node>").
> > > > > >
> > > > > > You can look through those actions to see what the cluster
> > > > > > decided to do -- whether the resources were ever started, whether
> > > > > > any was promoted, and whether any were explicitly stopped.
> > > > > >
> > > > > >
> > > > > > >  >  stderr: + 17:45:59: ocf_log:327: __OCF_MSG='ovndb_servers: Promoting node-1.domain.tld as the master'
> > > > > > >  >  stderr: + 17:45:59: ocf_log:329: case "${__OCF_PRIO}" in
> > > > > > >  >  stderr: + 17:45:59: ocf_log:333: __OCF_PRIO=INFO
> > > > > > >  >  stderr: + 17:45:59: ocf_log:338: '[' INFO = DEBUG ']'
> > > > > > >  >  stderr: + 17:45:59: ocf_log:341: ha_log 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'
> > > > > > >  >  stderr: + 17:45:59: ha_log:253: __ha_log 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'
> > > > > > >  >  stderr: + 17:45:59: __ha_log:185: local ignore_stderr=false
> > > > > > >  >  stderr: + 17:45:59: __ha_log:186: local loglevel
> > > > > > >  >  stderr: + 17:45:59: __ha_log:188: '[' 'xINFO: ovndb_servers: Promoting node-1.domain.tld as the master' = x--ignore-stderr ']'
> > > > > > >  >  stderr: + 17:45:59: __ha_log:190: '[' none = '' ']'
> > > > > > >  >  stderr: + 17:45:59: __ha_log:192: tty
> > > > > > >  >  stderr: + 17:45:59: __ha_log:193: '[' x = x0 -a x = xdebug ']'
> > > > > > >  >  stderr: + 17:45:59: __ha_log:195: '[' false = true ']'
> > > > > > >  >  stderr: + 17:45:59: __ha_log:199: '[' '' ']'
> > > > > > >  >  stderr: + 17:45:59: __ha_log:202: echo 'INFO: ovndb_servers: Promoting node-1.domain.tld as the master'
> > > > > > >  >  stderr: INFO: ovndb_servers: Promoting node-1.domain.tld as the master
> > > > > > >  >  stderr: + 17:45:59: __ha_log:204: return 0
> > > > > > >  >  stderr: + 17:45:59: ovsdb_server_promote:378: /usr/sbin/crm_attribute --type crm_config --name OVN_REPL_INFO -s ovn_ovsdb_master_server -v node-1.domain.tld
> > > > > > >  >  stderr: + 17:45:59: ovsdb_server_promote:379: ovsdb_server_master_update 8
> > > > > > >  >  stderr: + 17:45:59: ovsdb_server_master_update:214: case $1 in
> > > > > > >  >  stderr: + 17:45:59: ovsdb_server_master_update:218: /usr/sbin/crm_master -l reboot -v 10
> > > > > > >  >  stderr: + 17:45:59: ovsdb_server_promote:380: return 0
> > > > > > >  >  stderr: + 17:45:59: 458: rc=0
> > > > > > >  >  stderr: + 17:45:59: 459: exit 0
> > > > > > >
> > > > > > >
> > > > > > > On 23/11/17 23:52 +0800, Hui Xiang wrote:
> > > > > > > > I am working on HA with 3 nodes, with the configuration below:
> > > > > > > >
> > > > > > > > """
> > > > > > > > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> > > > > > > >   master_ip=168.254.101.2 \
> > > > > > > >   op monitor interval="10s" \
> > > > > > > >   op monitor interval="11s" role=Master
> > > > > > > > pcs resource master ovndb_servers-master ovndb_servers \
> > > > > > > >   meta notify="true" master-max="1" master-node-max="1" \
> > > > > > > >   clone-max="3" clone-node-max="1"
> > > > > > > > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=168.254.101.2 \
> > > > > > > >   op monitor interval=10s
> > > > > > > > pcs constraint order promote ovndb_servers-master then VirtualIP
> > > > > > > > pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> > > > > > > >   score=INFINITY
> > > > > > > > """
> > > > > > >
> > > > > > > (Out of curiosity, this looks like a mix of output from
> > > > > > > pcs config export pcs-commands [or clufter cib2pcscmd -s]
> > > > > > > and manual editing. Is this a good guess?)
> > > > > > > It's the output of "pcs status".
> > > > > > >
> > > > > > > >   However, after setting it up as above, no master is selected
> > > > > > > > and all instances are stopped. According to the pacemaker log,
> > > > > > > > node-1 has been chosen as the master. I am confused about what
> > > > > > > > is wrong. Can anybody help? It would be very much appreciated.
> > > > > > > >
> > > > > > > >
> > > > > > > >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> > > > > > > >      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > > > > > > >  VirtualIP (ocf::heartbeat:IPaddr2): Stopped
> > > > > > > >
> > > > > > > >
> > > > > > > > # pacemaker log
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++ /cib/configuration/resources:  <primitive class="ocf" id="ovndb_servers" provider="ovn" type="ovndb-servers"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  <instance_attributes id="ovndb_servers-instance_attributes">
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <nvpair id="ovndb_servers-instance_attributes-master_ip" name="master_ip" value="168.254.101.2"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  </instance_attributes>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  <operations>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-start-timeout-30s" interval="0s" name="start" timeout="30s"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-stop-timeout-20s" interval="0s" name="stop" timeout="20s"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-promote-timeout-50s" interval="0s" name="promote" timeout="50s"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-demote-timeout-50s" interval="0s" name="demote" timeout="50s"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-monitor-interval-10s" interval="10s" name="monitor"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                    <op id="ovndb_servers-monitor-interval-11s-role-Master" interval="11s" name="monitor" role="Master"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                  </operations>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++                                </primitive>
> > > > > > > >
> > > > > > > > Nov 23 23:06:03 [665249] node-1.domain.tld      attrd:     info: attrd_peer_update: Setting master-ovndb_servers[node-1.domain.tld]: (null) -> 5 from node-1.domain.tld
> > > > > > >
> > > > > > > If it's probable your ocf:ovn:ovndb-servers agent in master mode
> > > > > > > can run something like
> > > > > > > "attrd_updater -n master-ovndb_servers -U 5", then it was indeed
> > > > > > > launched OK, and if it does not continue to run as expected,
> > > > > > > there may be a problem with the agent itself.
> > > > > > >
> > > > > > > no change.
> > > > > > > You can try running "pcs resource debug-promote ovndb_servers
> > > > > > > --full" to examine the execution details (assuming the agent
> > > > > > > responds to the OCF_TRACE_RA=1 environment variable, which is
> > > > > > > what shell-based agents built on top of the ocf-shellfuncs
> > > > > > > sourceable shell library from the resource-agents project,
> > > > > > > including the agents it ships, customarily do).
> > > > > > > Yes, thanks, it's helpful.
> > > > > > >
> > > > > > > > Nov 23 23:06:03 [665251] node-1.domain.tld       crmd:   notice: process_lrm_event: Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)
> > > > > > > > <29>Nov 23 23:06:03 node-1 crmd[665251]:   notice: process_lrm_event: Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld, call=185, rc=0, cib-update=88, confirmed=true)
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: Diff: --- 0.630.2 2
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: Diff: +++ 0.630.3 (null)
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: +  /cib:  @num_updates=3
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']:  <nvpair id="status-1-master-ovndb_servers" name="master-ovndb_servers" value="5"/>
> > > > > > > > Nov 23 23:06:03 [665246] node-1.domain.tld        cib:     info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node-3.domain.tld/attrd/80, version=0.630.3)
> > > > > > >
> > > > > > > Also depends if there's anything interesting after this
> > > > > > point...
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Users mailing list: Users at clusterlabs.org
> > > > > > > http://lists.clusterlabs.org/mailman/listinfo/users
> > > > > > >
> > > > > > > Project Home: http://www.clusterlabs.org
> > > > > > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > > > > > Bugs: http://bugs.clusterlabs.org
> > > > > > --
> > > > > > Ken Gaillot <kgaillot at redhat.com>
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > --
> > > Ken Gaillot <kgaillot at redhat.com>
> > >
> >
> >
> --
> Ken Gaillot <kgaillot at redhat.com>
>