[ClusterLabs] Unable to restart resources

Tue Mar 26 13:20:27 EDT 2019

26.03.2019 18:33, JCA wrote:
> Making some progress with Pacemaker/DRBD, but still trying to grasp some of
> the basics of this framework. Here is my current situation:
> 
> I have a two-node cluster, pmk1 and pmk2, with resources ClusterIP and
> DrbdFS. In what follows, commands preceded by '[pmk1] #' are to be
> understood as commands issued by the superuser in pmk1, whereas those
> preceded by '[pmk2] #' are issued by the superuser in pmk2 (pretty obvious,
> but better make it crystal clear).
> 
> [pmk1] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk1 ]
>      Slaves: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk1
> 
> [pmk2] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk1 ]
>      Slaves: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
> 
> There is an ext4 filesystem on the DRBD device, mounted at /var/lib/pmk.
> When things are as described above, in pmk1 this directory contains the
> data that I used when I populated the DRBD filesystem in pmk1, whereas in
> pmk2 it contains nothing. I.e. everything is as expected.
> 
> Then I did
> 
> [pmk1] # pcs cluster stop pmk1
> pmk1: Stopping Cluster (pacemaker)...
> pmk1: Stopping Cluster (corosync)...
> 
> [pmk2] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk2 ]
>      Stopped: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
> 
> After this the contents of /var/lib/pmk in pmk2 are those that were used to
> populate the DRBD filesystem in pmk1 (plus any changes introduced by pmk1
> before I stopped it), whereas /var/lib/pmk in pmk1 is now empty. Which
> implies that things seem to be behaving OK - or, at least, the way I
> expected them to behave.
> 
> Next I tried to bring pmk1 back online:
> 
> [pmk1] # pcs cluster start pmk1
> pmk1: Starting Cluster (corosync)...
> pmk1: Starting Cluster (pacemaker)...
> 
> [pmk1] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Stopped
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Stopped: [ pmk1 pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Stopped
> 
> [pmk2] # pcs status resources
>  ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
>  Master/Slave Set: DrbdDataClone [DrbdData]
>      Masters: [ pmk2 ]
>      Stopped: [ pmk2 ]
>  DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
> 
> Node pmk1 is back up, but ClusterIP and DrbdFS are not, at least on pmk1.
> And pmk2 remains in charge. I clumsily tried to restart those resources by
> hand in pmk1, to no avail:
> 
> [pmk1] # pcs resource restart ClusterIP
> Error: Error performing operation: No such device or address
> ClusterIP is not running anywhere and so cannot be restarted
> 

This sounds like pmk1 did not actually join the cluster. You need to
check the logs on pmk1 to see what happened when pacemaker was restarted
there.
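
As a rough sketch of that check, something like the following could be
run as root on pmk1. The choice of commands is an assumption, not
something taken from the original report: it presumes a systemd host
where the daemons log to the journal and where the usual corosync and
pacemaker command-line tools are installed. The helper names
run_if_present and diagnose_join are made up for this example.

```shell
#!/bin/sh
# Sketch: collect evidence on whether this node actually joined the cluster.
# Assumes a systemd host; run as root on the restarted node (pmk1 here).

# Run a command only if it is installed; otherwise say so and carry on.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "skip: $1 not found"
    fi
}

# Hypothetical wrapper that gathers the three views in one go.
diagnose_join() {
    # Membership as corosync sees it: both nodes should be listed.
    run_if_present corosync-quorumtool -l
    # Nodes and resources as pacemaker sees them (one-shot output).
    run_if_present crm_mon -1
    # Messages from both daemons since the current boot.
    run_if_present journalctl -b --no-pager -u corosync -u pacemaker
}
```

Calling diagnose_join on pmk1 and again on pmk2 lets you compare the two
views. If each node lists only itself as a member, the problem is at
the corosync level rather than with the individual resources, and the
journal output around the restart time is the place to look next.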

> I also tried stopping and starting the pmk1 node from pmk1, and also from
> pmk2, several times, to no avail.
> 
> How can I bring the pmk1 node back online correctly, so that everything is
> as it originally was - i.e. with pmk1 up and running, and with the resources
> also up and running in pmk1?
> 