[ClusterLabs] 2 node cluster dlm/clvm trouble
Andrei Borzenkov
arvidjaar at gmail.com
Thu Sep 6 13:48:44 EDT 2018
06.09.2018 17:36, Patrick Whitney wrote:
> Good Morning Everyone,
>
> I'm hoping someone with more experience with corosync and pacemaker can see
> what I am doing wrong.
>
> I've got a test setup of 2 nodes, with dlm and clvm set up as clones, and
> using fence_scsi as my fencing agent.
>
> I've got it to the point where the cluster is up, and reports it is happy.
> I then began testing fencing. When issuing 'pcs stonith fence' it appears
> to work; that is, the scsi reservation is pulled and the output of 'pcs
> status' looks sane, and I'm able to access resources on the un-fenced node.
>
> Things go awry when I shut down (init 0) the fenced node... my unfenced node
> decides to fence itself (which looks like it was initiated by dlm due to an
> abandoned lockspace).
>
> I suspect this is due to misconfiguration, since I'm new to the toolset,
> but I'm not quite sure what I need to change.
>
> Any and all input is appreciated!
>
> Below is a chronology of events; my corosync config and cib.xml; command
> output; and annotated logs.
>
> Again, any hints, suggestions, wild guesses, or premonitions are welcomed
> -- I'm stuck! Please let me know if there is additional information which
> would be helpful.
>
> Many thanks,
> -Patrick W.
>
> Sep 6 08:54:14 -- Cluster is up and running; UI reports everything
> healthy.
>
> Sep 6 08:55:44 -- 'pcs stonith fence' called against node 1 (coro-test-1).
>   UI reports everything as expected -- that is, resources show as running
>   only on the unfenced node, and they're available.
>   Oddly, although the UI says dlm is stopped on the fenced node,
>   dlm_controld is still running.
>
> Sep 6 09:03:38 -- node 1 is shut down, and node 2 falls to pieces.
>   - First, corosync sees a lost member -- this seems appropriate to me.
>   - Next, dlm_controld calls to fence everything.
>   - stonith-ng tries to fence node 1 (but it's already fenced!).
>   - dlm closes the connection to "node 2" (do dlm "nodes" map to cluster
>     nodes? I'm not sure they do).
>   - The clvmd dlm lockspace is now abandoned; the cluster attempts to fence
>     the remaining node (but can't, because fence_scsi doesn't work like
>     that).
>
> ***
> ****** -- Configuration --
> ***
> root@coro-test-2:~# pcs --version
> 0.9.149
> root@coro-test-2:~# pacemakerd --version
> Pacemaker 1.1.14
I wonder if https://github.com/ClusterLabs/pacemaker/pull/839 is
relevant here.