[ClusterLabs] Trouble with drbd/pacemaker: switch to secondary/secondary

Jay Scott bigcrater at gmail.com
Sun Oct 16 06:33:32 UTC 2016


Yikes.  I don't have any suggestions.  This is beyond me.
Sorry.

J.

On Sat, Oct 15, 2016 at 4:48 AM, Anne Nicolas <ennael1 at gmail.com> wrote:

> Anne
> http://mageia.org
>
> On 15 Oct 2016 at 9:02 AM, "Jay Scott" <bigcrater at gmail.com> wrote:
> >
> >
> > Well, I'm a newbie myself, but this:
> > drbdadm primary --force ___the name of the drbd res___
> > has worked for me.  Then again, I'm having lots of trouble myself,
> > so...
> > Then there's this:
> > drbdadm -- --overwrite-data-of-peer primary bravo
> > (bravo happens to be my drbd res), and that should also
> > strong-arm one machine or the other into being the primary.
> >
>
> Well, I used those commands and the resource goes primary, but I can then
> see Pacemaker switching it back to secondary after a few seconds.
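>
> For what it's worth, these are the standard commands I've been using to
> watch what Pacemaker decides (nothing specific to my setup):
>
> crm_mon -1fA        # one-shot status with fail counts and node attributes
> crm_simulate -sL    # allocation scores from the live cluster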
>
> > j.
> >
> > On Fri, Oct 14, 2016 at 3:22 PM, Anne Nicolas <ennael1 at gmail.com> wrote:
> >>
> >> Hi!
> >>
> >> I'm having trouble with a 2-node cluster used for DRBD / Apache / Samba
> >> and some other services.
> >>
> >> Whatever I do, it always goes to the following state:
> >>
> >> Last updated: Fri Oct 14 17:41:38 2016
> >> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
> >> Stack: corosync
> >> Current DC: bzvairsvr (168430081) - partition with quorum
> >> Version: 1.1.8-9.mga5-394e906
> >> 2 Nodes configured, unknown expected votes
> >> 13 Resources configured.
> >>
> >>
> >> Online: [ bzvairsvr bzvairsvr2 ]
> >>
> >>  Master/Slave Set: drbdservClone [drbdserv]
> >>      Slaves: [ bzvairsvr bzvairsvr2 ]
> >>  Clone Set: fencing [st-ssh]
> >>      Started: [ bzvairsvr bzvairsvr2 ]
> >>
> >> When I reboot bzvairsvr2, that node goes primary again, but after a
> >> while it becomes secondary as well.
> >> I use a very basic fencing system based on ssh. It's not optimal, but
> >> it's enough for the current tests.
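> >>
> >> (When I want to exercise the fencing by hand I just ask the fencer to
> >> reboot the peer -- only a quick sketch, and external/ssh is of course
> >> test-only:)
> >>
> >> stonith_admin --reboot bzvairsvr2   # request a fencing reboot of the other node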
> >>
> >> Here is some information about the configuration:
> >>
> >> node 168430081: bzvairsvr
> >> node 168430082: bzvairsvr2
> >> primitive apache apache \
> >>         params configfile="/etc/httpd/conf/httpd.conf" \
> >>         op start interval=0 timeout=120s \
> >>         op stop interval=0 timeout=120s
> >> primitive clusterip IPaddr2 \
> >>         params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
> >>         meta target-role=Started
> >> primitive clusterroute Route \
> >>         params destination="0.0.0.0/0" gateway=192.168.100.254
> >> primitive drbdserv ocf:linbit:drbd \
> >>         params drbd_resource=server \
> >>         op monitor interval=30s role=Slave \
> >>         op monitor interval=29s role=Master start-delay=30s
> >> primitive fsserv Filesystem \
> >>         params device="/dev/drbd/by-res/server" directory="/Server" \
> >>         fstype=ext4 \
> >>         op start interval=0 timeout=60s \
> >>         op stop interval=0 timeout=60s \
> >>         meta target-role=Started
> >> primitive libvirt-guests systemd:libvirt-guests
> >> primitive libvirtd systemd:libvirtd
> >> primitive mysql systemd:mysqld
> >> primitive named systemd:named
> >> primitive samba systemd:smb
> >> primitive st-ssh stonith:external/ssh \
> >>         params hostlist="bzvairsvr bzvairsvr2"
> >> group iphd clusterip clusterroute \
> >>         meta target-role=Started
> >> group services libvirtd libvirt-guests apache named mysql samba \
> >>         meta target-role=Started
> >> ms drbdservClone drbdserv \
> >>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
> >>         notify=true target-role=Started
> >> clone fencing st-ssh
> >> colocation fs_on_drbd inf: fsserv drbdservClone:Master
> >> colocation iphd_on_services inf: iphd services
> >> colocation services_on_fsserv inf: services fsserv
> >> order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
> >> order services_after_fsserv inf: fsserv services
> >> property cib-bootstrap-options: \
> >>         dc-version=1.1.8-9.mga5-394e906 \
> >>         cluster-infrastructure=corosync \
> >>         no-quorum-policy=ignore \
> >>         stonith-enabled=true
> >>
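> >> (Before loading changes I run the usual sanity checks on the CIB --
> >> just the standard tools, nothing specific to this config:)
> >>
> >> crm_verify -LV           # validate the live CIB and report problems
> >> crm configure verify     # same kind of check from the crmsh shell
> >>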
> >> The cluster logs are flooded with:
> >> Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice:
> >> attrd_trigger_update:    Sending flush op to all hosts for:
> >> master-drbdserv (10000)
> >> Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice:
> >> attrd_perform_update:    Sent update master-drbdserv=10000 failed:
> >> Transport endpoint is not connected
> >> Oct 14 17:42:28 [3445] bzvairsvr      attrd:   notice:
> >> attrd_perform_update:    Sent update -107: master-drbdserv=10000
> >> Oct 14 17:42:28 [3445] bzvairsvr      attrd:  warning:
> >> attrd_cib_callback:      Update master-drbdserv=10000 failed: Transport
> >> endpoint is not connected
> >> Oct 14 17:42:59 [3445] bzvairsvr      attrd:   notice:
> >> attrd_trigger_update:    Sending flush op to all hosts for:
> >> master-drbdserv (10000)
> >> Oct 14 17:42:59 [3445] bzvairsvr      attrd:   notice:
> >> attrd_perform_update:    Sent update master-drbdserv=10000 failed:
> >> Transport endpoint is not connected
> >> Oct 14 17:42:59 [3445] bzvairsvr      attrd:   notice:
> >> attrd_perform_update:    Sent update -107: master-drbdserv=10000
> >> Oct 14 17:42:59 [3445] bzvairsvr      attrd:  warning:
> >> attrd_cib_callback:      Update master-drbdserv=10000 failed: Transport
> >> endpoint is not connected
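> >>
> >> (To see the promotion score those messages refer to, I query the
> >> transient attribute directly -- standard crm_attribute usage, sketch
> >> only:)
> >>
> >> crm_attribute -t status -N bzvairsvr -n master-drbdserv -G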
> >>
> >>
> >> And here is the dmesg output:
> >>
> >> [34067.547147] block drbd0: peer( Secondary -> Primary )
> >> [34091.023206] block drbd0: peer( Primary -> Secondary )
> >> [34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected
> >> -> TearDown ) pdsk( UpToDate -> DUnknown )
> >> [34096.616353] drbd server: asender terminated
> >> [34096.616358] drbd server: Terminating drbd_a_server
> >> [34096.682874] drbd server: Connection closed
> >> [34096.682894] drbd server: conn( TearDown -> Unconnected )
> >> [34096.682897] drbd server: receiver terminated
> >> [34096.682900] drbd server: Restarting receiver thread
> >> [34096.682902] drbd server: receiver (re)started
> >> [34096.682915] drbd server: conn( Unconnected -> WFConnection )
> >> [34103.311898] drbd server: Handshake successful: Agreed network
> >> protocol version 101
> >> [34103.311903] drbd server: Agreed to support TRIM on protocol level
> >> [34103.311997] drbd server: Peer authenticated using 20 bytes HMAC
> >> [34103.312046] drbd server: conn( WFConnection -> WFReportParams )
> >> [34103.312062] drbd server: Starting asender thread (from drbd_r_server
> >> [4344])
> >> [34103.380311] block drbd0: drbd_sync_handshake:
> >> [34103.380318] block drbd0: self
> >> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
> >> bits:0 flags:0
> >> [34103.380323] block drbd0: peer
> >> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
> >> bits:0 flags:0
> >> [34103.380327] block drbd0: uuid_compare()=0 by rule 40
> >> [34103.380335] block drbd0: peer( Unknown -> Secondary ) conn(
> >> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
> >> [34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
> >> [34123.802580] drbd server: PingAck did not arrive in time.
> >> [34123.802617] drbd server: peer( Secondary -> Unknown ) conn( Connected
> >> -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> >> [34123.802773] drbd server: asender terminated
> >> [34123.802777] drbd server: Terminating drbd_a_server
> >> [34123.932565] drbd server: Connection closed
> >> [34123.932585] drbd server: conn( NetworkFailure -> Unconnected )
> >> [34123.932588] drbd server: receiver terminated
> >> [34123.932590] drbd server: Restarting receiver thread
> >> [34123.932592] drbd server: receiver (re)started
> >> [34123.932605] drbd server: conn( Unconnected -> WFConnection )
> >> [34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps
> >> full duplex, Flow control: ON - receive & transmit
> >> [34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
> >> [34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps
> >> full duplex, Flow control: ON - receive & transmit
> >> [34318.675122] drbd server: Handshake successful: Agreed network
> >> protocol version 101
> >> [34318.675128] drbd server: Agreed to support TRIM on protocol level
> >> [34318.675218] drbd server: Peer authenticated using 20 bytes HMAC
> >> [34318.675258] drbd server: conn( WFConnection -> WFReportParams )
> >> [34318.675276] drbd server: Starting asender thread (from drbd_r_server
> >> [4344])
> >> [34318.738909] block drbd0: drbd_sync_handshake:
> >> [34318.738916] block drbd0: self
> >> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
> >> bits:0 flags:0
> >> [34318.738921] block drbd0: peer
> >> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
> >> bits:0 flags:0
> >> [34318.738924] block drbd0: uuid_compare()=0 by rule 40
> >> [34318.738933] block drbd0: peer( Unknown -> Secondary ) conn(
> >> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
> >> [34328.812317] block drbd0: peer( Secondary -> Primary )
> >> [37316.065793] usb 3-11: USB disconnect, device number 3
> >> [52246.642265] block drbd0: peer( Primary -> Secondary )
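> >>
> >> (After the link flaps I check DRBD itself with plain drbdadm, using the
> >> resource name "server" from the config above:)
> >>
> >> drbdadm cstate server    # connection state (Connected, WFConnection, ...)
> >> drbdadm dstate server    # disk states, local/peer
> >> drbdadm role server      # roles, local/peer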
> >>
> >> Any help would be appreciated
> >>
> >> Cheers
> >>
> >> --
> >> Anne Nicolas
> >> http://mageia.org
> >>
> >
> >
> >
> >
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>