[Pacemaker] some questions about STONITH
Andrey Groshev
greenx at yandex.ru
Tue Nov 19 13:10:29 EST 2013
Hi again, everyone.
I have started experimenting with STONITH.
I wrote a small external STONITH script.
Its main points:
* It sends the "reboot" command over SSH, authenticating with a key.
* The script takes a single argument: the path to the private key.
* Any node can send a reboot to any node (even itself).
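To make the points above concrete, here is a rough sketch of what such a plugin could look like. It is not the actual sshbykey script: the function names, the SSH options, the SSH_BIN override and the polling limits are all assumptions made for illustration, and a complete external plugin would also have to answer gethosts and the getinfo-* actions, which are left out here.

```shell
#!/bin/sh
# Sketch of an external STONITH plugin that reboots a node over SSH.
# Assumptions (not taken from the real script): function names, SSH
# options, the SSH_BIN override and the polling limits below.

SSH_BIN="${SSH_BIN:-ssh}"        # hypothetical override, handy for dry runs
POLL_TRIES="${POLL_TRIES:-60}"   # hypothetical: how many times to poll
POLL_SLEEP="${POLL_SLEEP:-5}"    # hypothetical: seconds between polls

# Run a command on the target as root, using the private key whose path
# arrives in the path2key environment variable.
run_remote() {
    host="$1"; shift
    "$SSH_BIN" -i "$path2key" -o BatchMode=yes -o ConnectTimeout=5 \
        "root@$host" "$@"
}

# Boot time of the target in seconds since the epoch (Linux /proc/stat).
boot_time() {
    run_remote "$1" "awk '/^btime/ {print \$2}' /proc/stat"
}

# Send the reboot, then wait until the reported boot time changes.
reset_node() {
    host="$1"
    before=$(boot_time "$host") || return 1
    echo "Now boot time $before, send reboot"
    # The connection usually drops here, so the exit code is ignored.
    run_remote "$host" reboot >/dev/null 2>&1
    i=0
    while [ "$i" -lt "$POLL_TRIES" ]; do
        sleep "$POLL_SLEEP"
        after=$(boot_time "$host" 2>/dev/null)
        if [ -n "$after" ] && [ "$after" != "$before" ]; then
            echo "$host booted at $after"
            return 0
        fi
        i=$((i + 1))
    done
    return 1    # the node never came back with a new boot time
}

main() {
    case "$1" in
        reset)          reset_node "$2" ;;
        status)         [ -r "$path2key" ] ;;
        getconfignames) echo "path2key" ;;
        *)              return 1 ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then main "$@"; fi
```

The plugin receives the action (reset, status, ...) as its first argument and its parameters through the environment, which is why path2key is read as a variable rather than parsed from the command line. Comparing the boot time before and after is only one way to confirm that the node really restarted.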
In the crm config it looks like this:
property $id="cib-bootstrap-options" \
stonith-enabled="true"
primitive st1 stonith:external/sshbykey \
params path2key="/opt/cluster_tools_2/keys/root at dev-cluster2-master" pcmk_host_check="none"
clone cloneStonith st1
I ran the first test and it was OK: the node was rebooted and the resources were started.
#export path2key=/opt/cluster_tools_2/keys/root at dev-cluster2-master.unix.tensor.ru
# stonith -t external/sshbykey -E dev-cluster2-node1
info: external_run_cmd: '/usr/lib64/stonith/plugins/external/sshbykey reset dev-cluster2-node1' output: Now boot time 1384850888, send reboot
info: external_run_cmd: '/usr/lib64/stonith/plugins/external/sshbykey reset dev-cluster2-node1' output: Daration: 1340 sec.
info: external_run_cmd: '/usr/lib64/stonith/plugins/external/sshbykey reset dev-cluster2-node1' output: GOOD NEWS: dev-cluster2-node1 booted in 1384864288
Do not pay attention to the "Duration" value: it is caused by a clock jump before the time is synchronized between the virtual machine and the server. What matters here is that the boot time changed, not the specific number of seconds. On later runs the reboot takes 10 - 20 sec.
But further on, there are problems and questions. :)
1.
I ran the next test:
#stonith_admin --reboot=dev-cluster2-node2
The node reboots, but the resources don't start.
crm_mon shows the status: Node dev-cluster2-node2 (172793105): pending.
And it stays hung in that state.
After that, if I reboot this node from the console, or with stonith, or with stonith_admin (the same command!), the resources start.
Portions of the logs:
trace: unpack_status: Processing node id=172793105, uname=dev-cluster2-node2
trace: find_xml_node: Could not find transient_attributes in node_state.
trace: unpack_instance_attributes: No instance attributes
trace: unpack_status: determining node state
trace: determine_online_status_fencing: dev-cluster2-node2: in_cluster=false, is_peer=online, join=down, expected=down, term=0
info: determine_online_status_fencing: - Node dev-cluster2-node2 is not ready to run resources
trace: determine_online_status: Node dev-cluster2-node2 is offline
........
trace: unpack_status: Processing lrm resource entries on healthy node: dev-cluster2-node2
trace: find_xml_node: Could not find lrm in node_state.
trace: find_xml_node: Could not find lrm_resources in <NULL>.
trace: unpack_lrm_resources: Unpacking resources on dev-cluster2-node2
..............
trace: can_run_resources: dev-cluster2-node2: online=0, unclean=0, standby=1, maintenance=0
trace: check_actions: Skipping param check for dev-cluster2-node2: cant run resources
.......
trace: native_color: Pre-allloc: VirtualIP allocation score on dev-cluster2-node2: 0
...........
<node id="172793105" uname="dev-cluster2-node2">
<instance_attributes id="nodes-172793105">
<nvpair id="nodes-172793105-pgsql-data-status" name="pgsql-data-status" value="DISCONNECT"/>
<nvpair id="nodes-172793105-standby" name="standby" value="false"/>
<nvpair id="nodes-172793105-thisquorumnode" name="thisquorumnode" value="no"/>
</instance_attributes>
</node>
Why does it behave this way?
2.
There is a slight discrepancy between Pacemaker Explained and stonith_admin --help.
stonith_admin --reboot nodename.
In one case the equals sign is present, in the other it is not.
It is not very important, because both forms work.
But when you start working and something goes wrong, you begin to suspect everything. :)
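For what it is worth, accepting both spellings is the usual GNU long-option convention. The small check below uses the getopt utility from util-linux, not stonith_admin itself, so it only illustrates the convention under the assumption that stonith_admin parses its options the same way; the node name is just the one from the test above.

```shell
#!/bin/sh
# Illustration only: parse a long option that requires a value, once
# written with "=" and once with a space, using util-linux getopt.
with_equals=$(getopt -o '' -l reboot: -- --reboot=dev-cluster2-node2)
with_space=$(getopt -o '' -l reboot: -- --reboot dev-cluster2-node2)

if [ "$with_equals" = "$with_space" ]; then
    echo "both forms parse to the same option and value"
else
    echo "the two forms parse differently"
fi
```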
3.
Andrew! You promised a post about debugging STONITH.
4. (to ALL)
Also, please tell me the real arguments against using SSH for STONITH.
I have my own guesses and thoughts, but I would like to know your experience.
My environment:
corosync-2.3.2
resource-agents-3.9.5
pacemaker 1.1.11
----
Thanks in advance,
Andrey Groshev