[ClusterLabs] Unable to restart resources

JCA 1.41421 at gmail.com
Tue Mar 26 11:33:39 EDT 2019


Making some progress with Pacemaker/DRBD, but still trying to grasp some of
the basics of this framework. Here is my current situation:

I have a two-node cluster, pmk1 and pmk2, with resources ClusterIP and
DrbdFS. In what follows, commands preceded by '[pmk1] #' are to be
understood as commands issued by the superuser in pmk1, whereas those
preceded by '[pmk2] #' are issued by the superuser in pmk2 (pretty obvious,
but better make it crystal clear).

[pmk1] # pcs status resources
 ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ pmk1 ]
     Slaves: [ pmk2 ]
 DrbdFS (ocf::heartbeat:Filesystem): Started pmk1

[pmk2] # pcs status resources
 ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ pmk1 ]
     Slaves: [ pmk2 ]
 DrbdFS (ocf::heartbeat:Filesystem): Started pmk2

There is an ext4 filesystem on the DRBD device, mounted at /var/lib/pmk.
In the state shown above, this directory on pmk1 contains the data with
which I populated the DRBD filesystem on pmk1, whereas on pmk2 it is
empty. I.e. everything is as expected.

Then I did

[pmk1] # pcs cluster stop pmk1
pmk1: Stopping Cluster (pacemaker)...
pmk1: Stopping Cluster (corosync)...

[pmk2] # pcs status resources
 ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ pmk2 ]
     Stopped: [ pmk2 ]
 DrbdFS (ocf::heartbeat:Filesystem): Started pmk2

After this, the contents of /var/lib/pmk on pmk2 are those that were used to
populate the DRBD filesystem on pmk1 (plus any changes made on pmk1 before
I stopped it), whereas /var/lib/pmk on pmk1 is now empty. So things seem to
be behaving OK - or, at least, the way I expected them to behave.

Next I tried to bring pmk1 back online:

[pmk1] # pcs cluster start pmk1
pmk1: Starting Cluster (corosync)...
pmk1: Starting Cluster (pacemaker)...

[pmk1] # pcs status resources
ClusterIP (ocf::heartbeat:IPaddr2): Stopped
 Master/Slave Set: DrbdDataClone [DrbdData]
     Stopped: [ pmk1 pmk2 ]
 DrbdFS (ocf::heartbeat:Filesystem): Stopped

[pmk2] # pcs status resources
 ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
 Master/Slave Set: DrbdDataClone [DrbdData]
     Masters: [ pmk2 ]
     Stopped: [ pmk2 ]
 DrbdFS (ocf::heartbeat:Filesystem): Started pmk2

Node pmk1 is back up, but ClusterIP and DrbdFS are not, at least on pmk1.
And pmk2 remains in charge. I clumsily tried to restart those resources by
hand in pmk1, to no avail:

[pmk1] # pcs resource restart ClusterIP
Error: Error performing operation: No such device or address
ClusterIP is not running anywhere and so cannot be restarted

I also tried stopping and starting the pmk1 node from pmk1, and also from
pmk2, several times, to no avail.
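
As an aside, a small helper along these lines could pull just the stopped
resources out of the 'pcs status resources' output on each node, which makes
the two views easier to compare. This is only a sketch: the function name
stopped_resources is made up, and it assumes the output layout shown in the
listings above (primitive lines ending in "Stopped", and a "Stopped: [ ... ]"
line under each Master/Slave set).

```shell
# Hypothetical helper (not part of pcs): reads `pcs status resources`
# output on stdin and prints the resources reported as Stopped.
# Intended usage on a cluster node:  pcs status resources | stopped_resources
stopped_resources() {
    awk '
        # " Master/Slave Set: DrbdDataClone [DrbdData]" -> remember the set name
        /Master\/Slave Set:/ { set = $3; next }

        # "     Stopped: [ pmk1 pmk2 ]" -> print the set and the nodes listed
        /^ *Stopped: *\[/ {
            nodes = ""
            for (i = 3; i < NF; i++) nodes = nodes (nodes ? " " : "") $i
            print set ": " nodes
            next
        }

        # " ClusterIP (ocf::heartbeat:IPaddr2): Stopped" -> print the resource name
        /\): *Stopped/ { print $1 }
    '
}
```

Fed the output that pmk1 reported after the restart, this would list
ClusterIP, DrbdDataClone (with pmk1 and pmk2) and DrbdFS.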

How can I bring the pmk1 node back online correctly, so that everything is
as it originally was - i.e. with pmk1 up and running, and with the resources
also up and running on pmk1?