[ClusterLabs] ovndb-servers resource agent doesn't work on pcs 0.9.164

Thu Sep 3 21:11:10 EDT 2020

It seems that I didn't uninstall my OVN environment properly. The resource
started after I uninstalled OVN and deleted the /var/lib/ovn directory.

But now I have another issue: Pacemaker reports the OVN resource as started,
but when I check via systemd, no ovn/ovsdb service is running.
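
One possible explanation, stated as an assumption rather than a confirmed diagnosis: the ovndb-servers agent is given an ovn-ctl script through its ovn_ctl parameter, so the databases may be launched by that script and not by a systemd unit. If so, systemd would show nothing even while the ovsdb-server processes are running. A minimal sketch for checking the processes directly follows. The helper name `ovn_db_procs` is hypothetical, and it assumes the default database names ovnnb_db and ovnsb_db appear on the ovsdb-server command line.

```shell
# Read a process listing on stdin and report which OVN databases have a
# running ovsdb-server. Hypothetical helper; assumes the default database
# names ovnnb_db and ovnsb_db show up in the process arguments.
ovn_db_procs() {
    # The [o] bracket keeps awk from matching its own command line
    # when the listing comes from a live "ps".
    awk '
        /[o]vsdb-server/ && /ovnnb_db/ { nb = 1 }
        /[o]vsdb-server/ && /ovnsb_db/ { sb = 1 }
        END {
            print "nb=" (nb ? "running" : "absent")
            print "sb=" (sb ? "running" : "absent")
        }'
}

# Usage on a cluster node:
#   ps -eo args | ovn_db_procs
```

Reading the listing from stdin keeps the check independent of how the daemons were started, which is the point when Pacemaker and systemd disagree.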

1. Pacemaker status
```
Full list of resources:

 internal_vip (ocf::heartbeat:IPaddr2): Started ag-controller2
 public_vip (ocf::heartbeat:IPaddr2): Started ag-controller2
 Clone Set: lb-haproxy-clone [lb-haproxy]
     Started: [ ag-controller2 ]
     Stopped: [ ag-controller0 ag-controller1 ]
 Clone Set: wsgi-keystone-clone [wsgi-keystone]
     Started: [ ag-controller0 ag-controller1 ag-controller2 ]
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ ag-controller2 ]
     Slaves: [ ag-controller0 ag-controller1 ]
```

2. Debug
```
instructor@ag-controller2:~$ sudo crm_resource --why -r ovndb_servers
Resource ovndb_servers:0 is running

instructor@ag-controller2:~$ sudo pcs resource unmanage ovndb_servers-master
instructor@ag-controller2:~$ sudo pcs resource debug-start ovndb_servers --full
Operation start for ovndb_servers:0 (ocf:ovn:ovndb-servers) returned: 'master' (8)
```
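
For reading the debug-start result: among the OCF return codes that Pacemaker uses, 8 is OCF_RUNNING_MASTER, meaning the agent reports the instance as running in the master role. A small lookup sketch, where the function name `ocf_rc_name` is hypothetical and only the standard codes 0 through 9 are covered:

```shell
# Map an OCF resource agent exit code to its symbolic name.
# Hypothetical helper for reading debug-start / debug-monitor output.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Usage:
#   ocf_rc_name 8
```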

On Fri, Sep 4, 2020 at 12:08 AM Ken Gaillot <kgaillot at redhat.com> wrote:

> On Thu, 2020-09-03 at 23:10 +0700, Popoi Zen wrote:
> > Hi, I am trying to create an OVN cluster using the ovndb-servers resource
> > agent with Pacemaker, but it gets an error and fails.
> >
> > ```
> > instructor@ag-controller0:~$ sudo pcs status
> > Cluster name: os-ha
> > Stack: corosync
> > Current DC: ag-controller2 (version 1.1.18-2b07d5c5a9) - partition with quorum
> > Last updated: Thu Sep  3 23:01:03 2020
> > Last change: Thu Sep  3 22:58:13 2020 by root via cibadmin on ag-controller0
> >
> > 3 nodes configured
> > 8 resources configured
> >
> > Online: [ ag-controller0 ag-controller1 ag-controller2 ]
> >
> > Full list of resources:
> >
> >  internal_vip (ocf::heartbeat:IPaddr2):       Started ag-controller0
> >  public_vip   (ocf::heartbeat:IPaddr2):       Started ag-controller0
> >  Clone Set: lb-haproxy-clone [lb-haproxy]
> >      Started: [ ag-controller0 ]
> >      Stopped: [ ag-controller1 ag-controller2 ]
> >  Clone Set: wsgi-keystone-clone [wsgi-keystone]
> >      Started: [ ag-controller0 ag-controller1 ag-controller2 ]
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> > ```
> >
> > I am using this guide
> > https://docs.openvswitch.org/en/latest/topics/integration/.
> >
> > 1. Modify the resource agent symlink to point to the right path.
> > instructor@ag-controller0:~$ ll /usr/lib/ocf/resource.d/ovn/ovndb-servers
> > lrwxrwxrwx 1 root root 40 Sep  3 22:26 /usr/lib/ocf/resource.d/ovn/ovndb-servers -> /usr/share/ovn/scripts/ovndb-servers.ocf*
> >
> > 2. Create ovndb_servers resource
> > instructor@ag-controller0:~$ sudo pcs resource create ovndb_servers ocf:ovn:ovndb-servers master_ip=10.50.50.100 ovn_ctl=/usr/share/ovn/scripts/ovn-ctl op monitor interval="10s" op monitor role=Master interval="15s"
> > instructor@ag-controller0:~$ sudo pcs resource master ovndb_servers-master ovndb_servers meta notify="true"
> >
> > 3. Create constraint
> > sudo pcs constraint order promote ovndb_servers-master then internal_vip
> >
> > 4. Check status; ovndb_servers is still in the stopped state
> > ```
> > Online: [ ag-controller0 ag-controller1 ag-controller2 ]
> >
> > Full list of resources:
> >
> >  internal_vip (ocf::heartbeat:IPaddr2):       Started ag-controller0
> >  public_vip   (ocf::heartbeat:IPaddr2):       Started ag-controller0
> >  Clone Set: lb-haproxy-clone [lb-haproxy]
> >      Started: [ ag-controller0 ]
> >      Stopped: [ ag-controller1 ag-controller2 ]
> >  Clone Set: wsgi-keystone-clone [wsgi-keystone]
> >      Started: [ ag-controller0 ag-controller1 ag-controller2 ]
> >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> >      Stopped: [ ag-controller0 ag-controller1 ag-controller2 ]
> >
> > Daemon Status:
> >   corosync: active/enabled
> >   pacemaker: active/enabled
> >   pcsd: active/enabled
> > ```
> > 5. Create colocation
> > sudo pcs constraint colocation add internal_vip with master ovndb_servers-master score=INFINITY
> >
> > 6. Check my Pacemaker status again; my VIP resources and ovndb_servers
> > are stopped.
> > ```
> > Online: [ ag-controller0 ag-controller1 ag-controller2 ]
> >
> > Full list of resources:
> >
> >  internal_vip (ocf::heartbeat:IPaddr2):       Stopped
> >  public_vip   (ocf::heartbeat:IPaddr2):       Stopped
> >  Clone Set: lb-haproxy-clone [lb-haproxy]
> >      Stopped: [ ag-controller0 ag-controller1 ag-controller2 ]
> >  Clone Set: wsgi-keystone-clone [wsgi-keystone]
> >      Started: [ ag-controller0 ag-controller1 ag-controller2 ]
> >  Master/Slave Set: ovndb_servers-master [ovndb_servers]
> >      Stopped: [ ag-controller0 ag-controller1 ag-controller2 ]
> > ```
> >
> >
> > Is there a working guide for this? Or am I missing something in my
> > configuration?
> >
> >
> > Regards
>
> Check the system log and pacemaker detail log for errors. You can also
> try "crm_resource --why -r ovndb_servers" to see if there's an obvious
> reason it's stopped. If none of that helps, try
> "pcs resource debug-start ovndb_servers --full" on one node to see if
> that gives additional info (that will launch the resource outside
> pacemaker's control, so it's a good idea to unmanage it in pacemaker
> first).
> --
> Ken Gaillot <kgaillot at redhat.com>