[ClusterLabs] Unable to restart resources
Andrei Borzenkov
arvidjaar at gmail.com
Tue Mar 26 13:20:27 EDT 2019
26.03.2019 18:33, JCA wrote:
> Making some progress with Pacemaker/DRBD, but still trying to grasp some of
> the basics of this framework. Here is my current situation:
>
> I have a two-node cluster, pmk1 and pmk2, with resources ClusterIP and
> DrbdFS. In what follows, commands preceded by '[pmk1] #' are to be
> understood as commands issued by the superuser in pmk1, whereas those
> preceded by '[pmk2] #' are issued by the superuser in pmk2 (pretty obvious,
> but better make it crystal clear).
>
> [pmk1] # pcs status resources
> ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
> Master/Slave Set: DrbdDataClone [DrbdData]
> Masters: [ pmk1 ]
> Slaves: [ pmk2 ]
> DrbdFS (ocf::heartbeat:Filesystem): Started pmk1
>
> [pmk2] # pcs status resources
> ClusterIP (ocf::heartbeat:IPaddr2): Started pmk1
> Master/Slave Set: DrbdDataClone [DrbdData]
> Masters: [ pmk1 ]
> Slaves: [ pmk2 ]
> DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
>
> There is an ext4 filesystem on the DRBD device, mounted at /var/lib/pmk.
> When things are as described above, in pmk1 this directory contains the
> data that I used when I populated the DRBD filesystem in pmk1, whereas in
> pmk2 it contains nothing. I.e. everything is as expected.
>
> Then I did
>
> [pmk1] # pcs cluster stop pmk1
> pmk1: Stopping Cluster (pacemaker)...
> pmk1: Stopping Cluster (corosync)...
>
> [pmk2] # pcs status resources
> ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
> Master/Slave Set: DrbdDataClone [DrbdData]
> Masters: [ pmk2 ]
> Stopped: [ pmk2 ]
> DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
>
> After this, the contents of /var/lib/pmk in pmk2 are those that were used to
> populate the DRBD filesystem in pmk1 (plus any changes introduced by pmk1
> before I stopped it), whereas /var/lib/pmk in pmk1 is now empty. This
> implies that things seem to be behaving OK, or at least the way I was
> expecting them to behave.
>
> Next I tried to bring pmk1 back online:
>
> [pmk1] # pcs cluster start pmk1
> pmk1: Starting Cluster (corosync)...
> pmk1: Starting Cluster (pacemaker)...
>
> [pmk1] # pcs status resources
> ClusterIP (ocf::heartbeat:IPaddr2): Stopped
> Master/Slave Set: DrbdDataClone [DrbdData]
> Stopped: [ pmk1 pmk2 ]
> DrbdFS (ocf::heartbeat:Filesystem): Stopped
>
> [pmk2] # pcs status resources
> ClusterIP (ocf::heartbeat:IPaddr2): Started pmk2
> Master/Slave Set: DrbdDataClone [DrbdData]
> Masters: [ pmk2 ]
> Stopped: [ pmk2 ]
> DrbdFS (ocf::heartbeat:Filesystem): Started pmk2
>
> Node pmk1 is back up, but ClusterIP and DrbdFS are not, at least on pmk1.
> And pmk2 remains in charge. I clumsily tried to restart those resources by
> hand in pmk1, to no avail:
>
> [pmk1] # pcs resource restart ClusterIP
> Error: Error performing operation: No such device or address
> ClusterIP is not running anywhere and so cannot be restarted
>
This sounds like pmk1 did not actually rejoin the cluster. You need to
check the logs on pmk1 to see what happened when pacemaker was restarted.
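
A rough sketch of where to look, not a definitive procedure. The commands
are the usual corosync/pacemaker tools; the helper names run_if_present and
check_membership are made up for illustration, and the journalctl line
assumes a systemd-based distribution. Define the functions on pmk1, then
call check_membership right after `pcs cluster start pmk1`.

```shell
#!/bin/sh
# Hypothetical helper: run a diagnostic command if it is installed,
# otherwise say so and carry on, so the snippet can be pasted on any node.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "skipped: $1 (not installed)"
    fi
}

# Hypothetical wrapper: collect the local node's view of the cluster.
check_membership() {
    # Does corosync see both nodes, and is this partition quorate?
    run_if_present corosync-quorumtool -s
    # One-shot cluster status as pacemaker on this node sees it.
    run_if_present crm_mon -1
    # Cluster messages from the current boot (assumes systemd/journald).
    run_if_present journalctl -b -u corosync -u pacemaker --no-pager
}
```

Running check_membership on both pmk1 and pmk2 and comparing the output
shows whether the two nodes agree on membership. If they do not, they are
not seeing each other, and the log output should indicate why.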
> I also tried stopping and starting the pmk1 node from pmk1, and also from
> pmk2, several times, to no avail.
>
> How can I bring the pmk1 node back correctly, so that everything is as
> it originally was, i.e. with pmk1 up and running, and with the resources
> also up and running in pmk1?