[ClusterLabs] pcs create master/slave resource doesn't work
Ken Gaillot
kgaillot at redhat.com
Mon Nov 27 13:07:57 EST 2017
On Fri, 2017-11-24 at 18:00 +0800, Hui Xiang wrote:
> Jan,
>
> I very much appreciate your help. I am getting further, but it still
> looks very strange.
>
> 1. To use "debug-promote", I upgraded pacemaker from 1.1.12 to
> 1.1.16, and pcs to 0.9.160.
>
> 2. Recreate resource with below commands
> pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> master_ip=192.168.0.99 \
> op monitor interval="10s" \
> op monitor interval="11s" role=Master
> pcs resource master ovndb_servers-master ovndb_servers \
> meta notify="true" master-max="1" master-node-max="1" clone-max="3" \
> clone-node-max="1"
> pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 \
> op monitor interval=10s
> pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> score=INFINITY
>
> 3. pcs status
> Master/Slave Set: ovndb_servers-master [ovndb_servers]
> Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> VirtualIP (ocf::heartbeat:IPaddr2): Stopped
>
> 4. Manually run 'debug-start' on all 3 nodes and 'debug-promote' on
> one of the nodes.
> run below on [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> # pcs resource debug-start ovndb_servers --full
> run below on [ node-1.domain.tld ]
> # pcs resource debug-promote ovndb_servers --full
Before running debug-* commands, I'd unmanage the resource or put the
cluster in maintenance mode, so Pacemaker doesn't try to "correct" your
actions.
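For example (these are standard pcs commands; adjust the resource and
property names to your cluster):

```shell
# Tell Pacemaker to stop managing all resources while you debug by hand:
pcs property set maintenance-mode=true

# ... run the debug-start / debug-promote commands here ...

# Alternatively, unmanage only the one resource:
pcs resource unmanage ovndb_servers

# Undo when finished:
pcs resource manage ovndb_servers
pcs property set maintenance-mode=false
```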
>
> 5. pcs status
> Master/Slave Set: ovndb_servers-master [ovndb_servers]
> Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> VirtualIP (ocf::heartbeat:IPaddr2): Stopped
>
> 6. However, I can see that one of the ovndb_servers instances has
> indeed been promoted to master, yet pcs status still shows them all
> as 'Stopped'. What am I missing?
It's hard to tell from these logs. It's possible the resource agent's
monitor command is not exiting with the expected status values:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_requirements_for_multi_state_resource_agents
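In short, a multi-state monitor must distinguish the roles by exit
code. A minimal sketch of the expected mapping (numeric values from
the OCF return-code convention; monitor_rc is a hypothetical helper,
not part of any agent):

```shell
# OCF return codes a multi-state monitor must use:
OCF_SUCCESS=0          # running as slave
OCF_NOT_RUNNING=7      # cleanly stopped
OCF_RUNNING_MASTER=8   # running as master
OCF_FAILED_MASTER=9    # failed while master

# Hypothetical helper mapping a detected state to the monitor's exit
# code; if monitor returns 0 for a promoted instance, Pacemaker
# believes it is only a slave and "pcs status" never shows a master.
monitor_rc() {
    case "$1" in
        master)  return $OCF_RUNNING_MASTER ;;
        slave)   return $OCF_SUCCESS ;;
        stopped) return $OCF_NOT_RUNNING ;;
        *)       return $OCF_FAILED_MASTER ;;
    esac
}

monitor_rc master; echo "master  -> $?"   # -> 8
monitor_rc slave;  echo "slave   -> $?"   # -> 0
monitor_rc stopped; echo "stopped -> $?"  # -> 7
```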
One of the nodes will be elected the DC, meaning it coordinates the
cluster's actions. The DC's logs will have more "pengine:" messages,
with each action that needs to be taken (e.g. "* Start <rsc> <node>").
You can look through those actions to see what the cluster decided to
do -- whether the resources were ever started, whether any were
promoted, and whether any were explicitly stopped.
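For example (the log path is an assumption for a typical CentOS/RHEL
setup; adjust to wherever your cluster logs):

```shell
# Ask which node is the current DC:
crmadmin -D

# On the DC, pull out the actions the policy engine scheduled:
grep -E 'pengine.*(Start|Promote|Demote|Stop|Recover)' \
    /var/log/cluster/corosync.log | tail -n 40
```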
> > stderr: + 17:45:59: ocf_log:327: __OCF_MSG='ovndb_servers:
> Promoting node-1.domain.tld as the master'
> > stderr: + 17:45:59: ocf_log:329: case "${__OCF_PRIO}" in
> > stderr: + 17:45:59: ocf_log:333: __OCF_PRIO=INFO
> > stderr: + 17:45:59: ocf_log:338: '[' INFO = DEBUG ']'
> > stderr: + 17:45:59: ocf_log:341: ha_log 'INFO: ovndb_servers:
> Promoting node-1.domain.tld as the master'
> > stderr: + 17:45:59: ha_log:253: __ha_log 'INFO: ovndb_servers:
> Promoting node-1.domain.tld as the master'
> > stderr: + 17:45:59: __ha_log:185: local ignore_stderr=false
> > stderr: + 17:45:59: __ha_log:186: local loglevel
> > stderr: + 17:45:59: __ha_log:188: '[' 'xINFO: ovndb_servers:
> Promoting node-1.domain.tld as the master' = x--ignore-stderr ']'
> > stderr: + 17:45:59: __ha_log:190: '[' none = '' ']'
> > stderr: + 17:45:59: __ha_log:192: tty
> > stderr: + 17:45:59: __ha_log:193: '[' x = x0 -a x = xdebug ']'
> > stderr: + 17:45:59: __ha_log:195: '[' false = true ']'
> > stderr: + 17:45:59: __ha_log:199: '[' '' ']'
> > stderr: + 17:45:59: __ha_log:202: echo 'INFO: ovndb_servers:
> Promoting node-1.domain.tld as the master'
> > stderr: INFO: ovndb_servers: Promoting node-1.domain.tld as the
> master
> > stderr: + 17:45:59: __ha_log:204: return 0
> > stderr: + 17:45:59: ovsdb_server_promote:378:
> /usr/sbin/crm_attribute --type crm_config --name OVN_REPL_INFO -s
> ovn_ovsdb_master_server -v node-1.domain.tld
> > stderr: + 17:45:59: ovsdb_server_promote:379:
> ovsdb_server_master_update 8
> > stderr: + 17:45:59: ovsdb_server_master_update:214: case $1 in
> > stderr: + 17:45:59: ovsdb_server_master_update:218:
> /usr/sbin/crm_master -l reboot -v 10
> > stderr: + 17:45:59: ovsdb_server_promote:380: return 0
> > stderr: + 17:45:59: 458: rc=0
> > stderr: + 17:45:59: 459: exit 0
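The trace above shows the agent setting a promotion score of 10 via
crm_master, which stores a transient ("reboot" lifetime) node
attribute named master-ovndb_servers. You can check whether it
actually reached the cluster with something like (node name taken
from this thread):

```shell
# Query the transient promotion score the agent just set:
crm_attribute -N node-1.domain.tld -l reboot \
    -n master-ovndb_servers -G
```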
>
>
> On 23/11/17 23:52 +0800, Hui Xiang wrote:
> > I am working on HA with 3-nodes, which has below configurations:
> >
> > """
> > pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
> > master_ip=168.254.101.2 \
> > op monitor interval="10s" \
> > op monitor interval="11s" role=Master
> > pcs resource master ovndb_servers-master ovndb_servers \
> > meta notify="true" master-max="1" master-node-max="1" clone-max="3" \
> > clone-node-max="1"
> > pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=168.254.101.2 \
> > op monitor interval=10s
> > pcs constraint order promote ovndb_servers-master then VirtualIP
> > pcs constraint colocation add VirtualIP with master ovndb_servers-master \
> > score=INFINITY
> > """
>
> (Out of curiosity, this looks like a mix of output from
> pcs config export pcs-commands [or clufter cib2pcscmd -s]
> and manual editing. Is this a good guess?)
> It's the output of "pcs status".
>
> > However, after setting it up as above, no master is being selected
> > and all instances are stopped, even though the Pacemaker log shows
> > node-1 was chosen as master. I am confused about what is wrong;
> > any help would be much appreciated.
> >
> >
> > Master/Slave Set: ovndb_servers-master [ovndb_servers]
> > Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
> > VirtualIP (ocf::heartbeat:IPaddr2): Stopped
> >
> >
> > # pacemaker log
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ /cib/configuration/resources: <primitive
> class="ocf"
> > id="ovndb_servers" provider="ovn" type="ovndb-servers"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op:
> ++ <instance_attributes
> > id="ovndb_servers-instance_attributes">
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <nvpair
> > id="ovndb_servers-instance_attributes-master_ip" name="master_ip"
> > value="168.254.101.2"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op:
> ++ </instance_attributes>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <operations>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <op
> > id="ovndb_servers-start-timeout-30s" interval="0s" name="start"
> > timeout="30s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <op
> > id="ovndb_servers-stop-timeout-20s" interval="0s" name="stop"
> > timeout="20s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <op
> > id="ovndb_servers-promote-timeout-50s" interval="0s" name="promote"
> > timeout="50s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <op
> > id="ovndb_servers-demote-timeout-50s" interval="0s" name="demote"
> > timeout="50s"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <op
> > id="ovndb_servers-monitor-interval-10s" interval="10s"
> name="monitor"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ <op
> > id="ovndb_servers-monitor-interval-11s-role-Master" interval="11s"
> > name="monitor" role="Master"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ </operations>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++ </primitive>
> >
> > Nov 23 23:06:03 [665249] node-1.domain.tld attrd: info:
> > attrd_peer_update: Setting master-ovndb_servers[node-1.domain.tld]:
> (null)
> > -> 5 from node-1.domain.tld
>
> If your ocf:ovn:ovndb-servers agent, in master mode, really ran
> something like "attrd_updater -n master-ovndb_servers -U 5", then it
> was indeed launched OK; if it does not continue to run as expected,
> there may be a problem with the agent itself.
>
> no change.
> You can try running "pcs resource debug-promote ovndb_servers --full"
> to examine the execution details (assuming the agent responds to the
> OCF_TRACE_RA=1 environment variable, which shell-based agents built
> on top of the ocf-shellfuncs library from the resource-agents
> project, including the agents it ships, customarily do).
> Yes, thanks, that's helpful.
>
> > Nov 23 23:06:03 [665251] node-1.domain.tld crmd: notice:
> > process_lrm_event: Operation ovndb_servers_monitor_0: ok
> > (node=node-1.domain.tld, call=185, rc=0, cib-update=88,
> confirmed=true)
> > <29>Nov 23 23:06:03 node-1 crmd[665251]: notice:
> process_lrm_event:
> > Operation ovndb_servers_monitor_0: ok (node=node-1.domain.tld,
> call=185,
> > rc=0, cib-update=88, confirmed=true)
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: Diff: --- 0.630.2 2
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: Diff: +++ 0.630.3 (null)
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: + /cib: @num_updates=3
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_perform_op: ++
> >
> /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instanc
> e_attributes[@id='status-1']:
> > <nvpair id="status-1-master-ovndb_servers" name="master-
> ovndb_servers"
> > value="5"/>
> > Nov 23 23:06:03 [665246] node-1.domain.tld cib: info:
> > cib_process_request: Completed cib_modify operation for section
> status: OK
> > (rc=0, origin=node-3.domain.tld/attrd/80, version=0.630.3)
>
> It also depends on whether there's anything interesting after this
> point...
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
--
Ken Gaillot <kgaillot at redhat.com>