[ClusterLabs] Could not start only one node in pacemaker

Ken Gaillot kgaillot at redhat.com
Wed May 2 11:51:23 EDT 2018


On Wed, 2018-05-02 at 02:52 +0000, 范国腾 wrote:
> Hi,
> The cluster has three nodes: one is master and two are slave. Now we
> run “pcs cluster stop --all” to stop all of the nodes. Then we run
> “pcs cluster start” on the master node. We find it is not able to
> start. The cause is that the stonith resource could not be started,
> so none of the other resources could be started either.

This is how quorum works. Only a cluster partition with quorum (at
least 2 nodes in your case) can run resources or fence other nodes.
That way, if there is a split when all nodes are live, the part of the
split with the most nodes wins.
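As a quick way to see this in practice (a sketch, assuming corosync is the membership layer, as in a standard pcs-managed cluster), you can check whether the current partition is quorate before expecting resources to start:

```shell
# Show quorum status of the partition this node belongs to.
# "Quorate: Yes" means the partition may run resources and fence peers;
# a single node out of three will show "Quorate: No".
corosync-quorumtool -s

# For testing only (never in production -- it invites split-brain),
# the cluster's reaction to quorum loss can be relaxed:
pcs property set no-quorum-policy=ignore
```

With the default no-quorum-policy=stop, a lone node will sit idle until at least one more node joins and quorum is regained.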

> We tested this case in two cluster systems and the result is the same:
> * If we start all three nodes, the stonith resource can be
> started. If we stop one node after it starts, the stonith resource
> can be migrated to another node and the cluster still works.
> * If we start only one or only two nodes, the stonith resource
> cannot be started.

If you start two nodes, they should fence the third, then proceed to
run resources. However:
 
> (1)   We create the stonith resource using this method in one system:
> pcs stonith create ipmi_node1 fence_ipmilan ipaddr="192.168.100.202"
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node1"
> pcs stonith create ipmi_node2 fence_ipmilan ipaddr="192.168.100.203"
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node2"
> pcs stonith create ipmi_node3 fence_ipmilan ipaddr="192.168.100.204"
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node3"

IPMI fencing requires that the IPMI device be responding to requests.
If the third node does not have power, the IPMI won't respond, so the
fencing will fail, and the cluster will be unable to proceed. Perhaps
that is what happened when you tried a two-node test?

The customary way around this is to use either sbd or power fencing as
a fallback when IPMI fails.
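As a sketch of the fallback setup (reusing the node and device names from your first example; the sbd disk path is a placeholder and the sbd watchdog/disk configuration is assumed to already be in place), you would register the IPMI device at fencing level 1 and a disk-based sbd device at level 2:

```shell
# Level 1: try the node's IPMI device first.
pcs stonith level add 1 node1 ipmi_node1

# Level 2: fall back to disk-based (poison-pill) sbd fencing if IPMI
# does not respond, e.g. because the node has lost power entirely.
pcs stonith create sbd_fence fence_sbd devices=/dev/mapper/sbd-disk
pcs stonith level add 2 node1 sbd_fence
```

The cluster tries level 1 first and only escalates to level 2 if every device at level 1 fails, so the healthy nodes can still make progress when a peer's IPMI is dead.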
 
> (2)   We create the stonith resource using this method in another
> system:
> pcs stonith create scsi-stonith-device fence_scsi
> devices=/dev/mapper/fence pcmk_monitor_action=metadata
> pcmk_reboot_action=off pcmk_host_list="node1 node2 node3 node4" meta
> provides=unfencing;

It's better to set the stonith-action cluster property to off than to
set pcmk_reboot_action=off. The reason is that the cluster may remap
some reboots to an off followed by an on, in which case
pcmk_reboot_action would not get used. pcmk_reboot_action is intended
for when the fence agent has a reboot command by some other name.

An exception is that stonith-action applies only to fencing initiated
by the cluster. If some external software (e.g. stonith_admin)
initiates a reboot explicitly, it will still be a reboot.
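In pcs terms, the cluster-wide setting is a single property:

```shell
# Make cluster-initiated fencing power nodes off instead of rebooting them.
pcs property set stonith-action=off
```

This replaces pcmk_reboot_action=off on the individual stonith devices, and applies uniformly to all fencing the cluster itself initiates.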
 
> The log is in the attachment.
> What prevents the stonith resource from being started if we only
> start some of the nodes?
-- 
Ken Gaillot <kgaillot at redhat.com>
