[ClusterLabs] Fencing on iSCSI device not working

Ramprasad ramprasad.neethiraj at zoologi.su.se
Wed Oct 30 08:13:13 EDT 2019


Hi everyone,

I am trying to set up a storage cluster with two nodes, both running
Debian buster. The two nodes, called duke and miles, use a LUN on a SAN
box as their shared storage device. As you can see in the output of
pcs status, all the daemons are active and I can get both nodes online
without any issues. However, I cannot get the fencing resources to start.

These two nodes previously ran Debian jessie and had access to the same
LUN in a storage cluster configuration. I am now trying to recreate a
similar setup with both nodes running the latest Debian release.


####### pcs status
Cluster name: jazz
Stack: corosync
Current DC: duke (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Wed Oct 30 11:58:19 2019
Last change: Wed Oct 30 11:28:28 2019 by root via cibadmin on duke

2 nodes configured
2 resources configured

Online: [ duke miles ]

Full list of resources:

  fence_duke    (stonith:fence_scsi):    Stopped
  fence_miles    (stonith:fence_scsi):    Stopped

Failed Fencing Actions:
* unfencing of duke failed: delegate=, client=pacemaker-controld.1703, origin=duke,
     last-failed='Wed Oct 30 11:43:29 2019'
* unfencing of miles failed: delegate=, client=pacemaker-controld.1703, origin=duke,
     last-failed='Wed Oct 30 11:43:29 2019'

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled
#######

I used the following commands to add the two fencing devices and set
their location constraints:

#######
sudo pcs cluster cib test_cib_cfg
pcs -f test_cib_cfg stonith create fence_duke fence_scsi \
    pcmk_host_list=duke pcmk_reboot_action="off" \
    devices="/dev/disk/by-id/wwn-0x600c0ff0001e8e3c89601b5801000000" \
    meta provides="unfencing"
pcs -f test_cib_cfg stonith create fence_miles fence_scsi \
    pcmk_host_list=miles pcmk_reboot_action="off" \
    devices="/dev/disk/by-id/wwn-0x600c0ff0001e8e3c89601b5801000000" \
    delay=15 meta provides="unfencing"
pcs -f test_cib_cfg constraint location fence_duke avoids duke=INFINITY
pcs -f test_cib_cfg constraint location fence_miles avoids miles=INFINITY
pcs cluster cib-push test_cib_cfg
#######
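The two device definitions above follow one per-node pattern: each fence_scsi device names a single target in pcmk_host_list, points at the same shared LUN, and is kept off the node it fences by an "avoids" constraint; only fence_miles adds delay=15. As an illustration only, the hypothetical Python helper fence_commands below regenerates the same "pcs -f" commands from a node name. It just prints them and never calls pcs, and the initial "pcs cluster cib" and final "cib-push" steps are left out.

```python
import shlex

# Shared LUN and CIB file name, copied from the commands above.
DEVICE = "/dev/disk/by-id/wwn-0x600c0ff0001e8e3c89601b5801000000"
CIB_FILE = "test_cib_cfg"


def fence_commands(node, delay=None):
    """Hypothetical helper: return (create, constraint) argv lists for the
    fence_scsi device that fences `node` and is banned from running on it."""
    name = f"fence_{node}"
    create = ["pcs", "-f", CIB_FILE, "stonith", "create", name, "fence_scsi",
              f"pcmk_host_list={node}", "pcmk_reboot_action=off",
              f"devices={DEVICE}"]
    if delay is not None:
        # Only one side gets a delay, as in the original fence_miles command.
        create.append(f"delay={delay}")
    create += ["meta", "provides=unfencing"]
    constraint = ["pcs", "-f", CIB_FILE, "constraint", "location", name,
                  "avoids", f"{node}=INFINITY"]
    return create, constraint


if __name__ == "__main__":
    duke = fence_commands("duke")
    miles = fence_commands("miles", delay=15)
    # Same order as the manual session: both devices first, then both constraints.
    for argv in (duke[0], miles[0], duke[1], miles[1]):
        print(shlex.join(argv))
```

shlex.join (Python 3.8+) only quotes arguments that need it, so the printed lines can be compared directly with the commands typed by hand.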

Here is the output in /var/log/pacemaker/pacemaker.log after adding the
fencing resources:

Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status_fencing)       info: Node miles is active
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status)       info: Node miles is online
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status_fencing)       info: Node duke is active
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (determine_online_status)       info: Node duke is online
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 2 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 1 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 2 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 1 is already processed
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (common_print) info: fence_duke        (stonith:fence_scsi):   Stopped
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (common_print) info: fence_miles       (stonith:fence_scsi):   Stopped
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (RecurringOp) info:  Start recurring monitor (60s) for fence_duke on miles
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (RecurringOp) info:  Start recurring monitor (60s) for fence_miles on duke
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogNodeActions)        notice:  * Fence (on) miles 'required by fence_duke monitor'
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogNodeActions)        notice:  * Fence (on) duke 'required by fence_duke monitor'
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogAction) notice:  * Start      fence_duke     ( miles )
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (LogAction) notice:  * Start      fence_miles    (  duke )
Oct 30 12:06:02 duke pacemaker-schedulerd[1702] (process_pe_message)    notice: Calculated transition 63, saving inputs in /var/lib/pacemaker/pengine/pe-input-23.bz2
Oct 30 12:06:02 duke pacemaker-controld  [1703] (do_state_transition)   info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Oct 30 12:06:02 duke pacemaker-controld  [1703] (do_te_invoke) info: Processing graph 63 (ref=pe_calc-dc-1572433562-101) derived from /var/lib/pacemaker/pengine/pe-input-23.bz2
Oct 30 12:06:02 duke pacemaker-controld  [1703] (te_fence_node)         notice: Requesting fencing (on) of node miles | action=5 timeout=60000
Oct 30 12:06:02 duke pacemaker-controld  [1703] (te_fence_node)         notice: Requesting fencing (on) of node duke | action=2 timeout=60000
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (handle_request)        notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'miles' with device '(any)'
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (initiate_remote_stonith_op)    notice: Requesting peer fencing (on) of miles | id=a0ac6e3a-0296-4aff-85e3-c591f75f38d3 state=0
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (handle_request)        notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'duke' with device '(any)'
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (initiate_remote_stonith_op)    notice: Requesting peer fencing (on) of duke | id=261d9311-0553-48ff-864f-41d53d12b152 state=0
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (can_fence_host_with_device)    notice: fence_miles can not fence (on) duke: static-list
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 1 of 2 from duke for miles/on (0 devices) a0ac6e3a-0296-4aff-85e3-c591f75f38d3
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 1 of 2 from duke for duke/on (0 devices) 261d9311-0553-48ff-864f-41d53d12b152
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 2 of 2 from miles for miles/on (0 devices) a0ac6e3a-0296-4aff-85e3-c591f75f38d3
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (stonith_choose_peer)   notice: Couldn't find anyone to fence (on) miles with any device
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: Total timeout set to 60 for peer's fencing of miles for pacemaker-controld.1703|id=a0ac6e3a-0296-4aff-85e3-c591f75f38d3
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: No peers (out of 2) have devices capable of fencing (on) miles for pacemaker-controld.1703 (0)
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 2 of 2 from miles for duke/on (0 devices) 261d9311-0553-48ff-864f-41d53d12b152
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (stonith_choose_peer)   notice: Couldn't find anyone to fence (on) duke with any device
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: Total timeout set to 60 for peer's fencing of duke for pacemaker-controld.1703|id=261d9311-0553-48ff-864f-41d53d12b152
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: No peers (out of 2) have devices capable of fencing (on) duke for pacemaker-controld.1703 (0)
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (remote_op_done)        error: Operation on of miles by <no-one> for pacemaker-controld.1703@duke.a0ac6e3a: No such device
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 15/5:63:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 15 for miles failed (No such device): aborting transition.
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_callback)      warning: No devices found in cluster to fence miles, giving up
Oct 30 12:06:02 duke pacemaker-controld  [1703] (abort_transition_graph)        notice: Transition 63 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:06:02 duke pacemaker-fenced    [1699] (remote_op_done)        error: Operation on of duke by <no-one> for pacemaker-controld.1703@duke.261d9311: No such device
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_notify)        error: Unfencing of miles by <anyone> failed: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 16/2:63:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 16 for duke failed (No such device): aborting transition.
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_callback)      warning: No devices found in cluster to fence duke, giving up
Oct 30 12:06:02 duke pacemaker-controld  [1703] (abort_transition_graph)        info: Transition 63 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:06:02 duke pacemaker-controld  [1703] (tengine_stonith_notify)        error: Unfencing of duke by <anyone> failed: No such device (-19)
Oct 30 12:06:02 duke pacemaker-controld  [1703] (run_graph) notice: Transition 63 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-23.bz2): Complete
Oct 30 12:06:02 duke pacemaker-controld  [1703] (do_log) info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Oct 30 12:06:02 duke pacemaker-controld  [1703] (do_state_transition)   notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Oct 30 12:06:06 duke pacemaker-based     [1698] (cib_process_ping)      info: Reporting our current digest to duke: c75a23192109201a5ceaa896d6c313cc for 0.28.6 (0x55a5ab8ff1f0 0)

#######
When I left out the devices= parameter in the stonith commands, this is
what I ended up with in pacemaker.log:

Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status_fencing)       info: Node miles is active
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status)       info: Node miles is online
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status_fencing)       info: Node duke is active
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (determine_online_status)       info: Node duke is online
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 2 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 1 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 2 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (unpack_node_loop)      info: Node 1 is already processed
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (common_print) info: fence_duke        (stonith:fence_scsi):   Stopped
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (common_print) info: fence_miles       (stonith:fence_scsi):   Stopped
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (RecurringOp) info:  Start recurring monitor (60s) for fence_duke on miles
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (RecurringOp) info:  Start recurring monitor (60s) for fence_miles on duke
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogNodeActions)        notice:  * Fence (on) miles 'required by fence_duke monitor'
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogNodeActions)        notice:  * Fence (on) duke 'required by fence_duke monitor'
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogAction) notice:  * Start      fence_duke     ( miles )
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (LogAction) notice:  * Start      fence_miles    (  duke )
Oct 30 12:22:34 duke pacemaker-schedulerd[1702] (process_pe_message)    notice: Calculated transition 69, saving inputs in /var/lib/pacemaker/pengine/pe-input-28.bz2
Oct 30 12:22:34 duke pacemaker-controld  [1703] (do_state_transition)   info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Oct 30 12:22:34 duke pacemaker-controld  [1703] (do_te_invoke) info: Processing graph 69 (ref=pe_calc-dc-1572434554-114) derived from /var/lib/pacemaker/pengine/pe-input-28.bz2
Oct 30 12:22:34 duke pacemaker-controld  [1703] (te_fence_node)         notice: Requesting fencing (on) of node miles | action=5 timeout=60000
Oct 30 12:22:34 duke pacemaker-controld  [1703] (te_fence_node)         notice: Requesting fencing (on) of node duke | action=2 timeout=60000
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (handle_request)        notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'miles' with device '(any)'
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (initiate_remote_stonith_op)    notice: Requesting peer fencing (on) of miles | id=4d360268-d290-42e6-b28f-fd4d7649613b state=0
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (handle_request)        notice: Client pacemaker-controld.1703.470f8b4e wants to fence (on) 'duke' with device '(any)'
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (initiate_remote_stonith_op)    notice: Requesting peer fencing (on) of duke | id=90ca3294-5eb5-4c66-a298-cd5afcbbbd77 state=0
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (can_fence_host_with_device)    notice: fence_miles can not fence (on) duke: static-list
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 1 of 2 from duke for miles/on (0 devices) 4d360268-d290-42e6-b28f-fd4d7649613b
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 2 of 2 from miles for miles/on (0 devices) 4d360268-d290-42e6-b28f-fd4d7649613b
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (stonith_choose_peer)   notice: Couldn't find anyone to fence (on) miles with any device
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: Total timeout set to 60 for peer's fencing of miles for pacemaker-controld.1703|id=4d360268-d290-42e6-b28f-fd4d7649613b
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: No peers (out of 2) have devices capable of fencing (on) miles for pacemaker-controld.1703 (0)
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 1 of 2 from miles for duke/on (0 devices) 90ca3294-5eb5-4c66-a298-cd5afcbbbd77
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: Query result 2 of 2 from duke for duke/on (0 devices) 90ca3294-5eb5-4c66-a298-cd5afcbbbd77
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (process_remote_stonith_query)  info: All query replies have arrived, continuing (2 expected/2 received)
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (stonith_choose_peer)   notice: Couldn't find anyone to fence (on) duke with any device
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: Total timeout set to 60 for peer's fencing of duke for pacemaker-controld.1703|id=90ca3294-5eb5-4c66-a298-cd5afcbbbd77
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (call_remote_stonith)   info: No peers (out of 2) have devices capable of fencing (on) duke for pacemaker-controld.1703 (0)
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (remote_op_done)        error: Operation on of miles by <no-one> for pacemaker-controld.1703@duke.4d360268: No such device
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 25/5:69:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 25 for miles failed (No such device): aborting transition.
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_callback)      warning: No devices found in cluster to fence miles, giving up
Oct 30 12:22:34 duke pacemaker-controld  [1703] (abort_transition_graph)        notice: Transition 69 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:22:34 duke pacemaker-fenced    [1699] (remote_op_done)        error: Operation on of duke by <no-one> for pacemaker-controld.1703@duke.90ca3294: No such device
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_notify)        error: Unfencing of miles by <anyone> failed: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 26/2:69:0:5e3e0ef6-02a5-4f9a-b999-806413a3da12: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_callback)      notice: Stonith operation 26 for duke failed (No such device): aborting transition.
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_callback)      warning: No devices found in cluster to fence duke, giving up
Oct 30 12:22:34 duke pacemaker-controld  [1703] (abort_transition_graph)        info: Transition 69 aborted: Stonith failed | source=abort_for_stonith_failure:776 complete=false
Oct 30 12:22:34 duke pacemaker-controld  [1703] (tengine_stonith_notify)        error: Unfencing of duke by <anyone> failed: No such device (-19)
Oct 30 12:22:34 duke pacemaker-controld  [1703] (run_graph) notice: Transition 69 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-28.bz2): Complete
Oct 30 12:22:34 duke pacemaker-controld  [1703] (do_log) info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Oct 30 12:22:34 duke pacemaker-controld  [1703] (do_state_transition)   notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Oct 30 12:22:37 duke pacemaker-based     [1698] (cib_process_ping)      info: Reporting our current digest to duke: 2eb5c8ee7e7df17c5737befc7d93de76 for 0.37.6 (0x55a5ab900f70 0)

#######
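In both logs, every peer answers the fencer's device query with "(0 devices)" for each node, which is why the "on" (unfencing) requests end with "No peers (out of 2) have devices capable of fencing" and "No such device". To make that pattern easier to spot in a longer pacemaker.log, here is a minimal Python sketch with a hypothetical helper, summarize_queries, that tallies the query replies per target. The regular expression is an assumption inferred only from the log lines quoted here, and it expects one entry per line, as in the log file itself rather than wrapped as in this email.

```python
import re
from collections import defaultdict

# Matches fencer query replies such as
#   "... Query result 1 of 2 from duke for miles/on (0 devices) a0ac6e3a-..."
# Pattern inferred from the excerpts above, not from Pacemaker documentation.
QUERY_RE = re.compile(
    r"Query result \d+ of \d+ from (?P<peer>\S+) for "
    r"(?P<target>[^/\s]+)/(?P<action>\w+) \((?P<count>\d+) devices?\)"
)


def summarize_queries(lines):
    """Map (target, action) -> {peer: number of devices that peer offered}."""
    summary = defaultdict(dict)
    for line in lines:
        m = QUERY_RE.search(line)
        if m:
            key = (m["target"], m["action"])
            summary[key][m["peer"]] = int(m["count"])
    return dict(summary)


if __name__ == "__main__":
    sample = [
        "Oct 30 12:06:02 duke pacemaker-fenced    [1699] "
        "(process_remote_stonith_query)  info: Query result 1 of 2 from duke "
        "for miles/on (0 devices) a0ac6e3a-0296-4aff-85e3-c591f75f38d3",
        "Oct 30 12:06:02 duke pacemaker-fenced    [1699] "
        "(process_remote_stonith_query)  info: Query result 2 of 2 from miles "
        "for miles/on (0 devices) a0ac6e3a-0296-4aff-85e3-c591f75f38d3",
    ]
    for (target, action), peers in summarize_queries(sample).items():
        usable = [peer for peer, n in peers.items() if n > 0]
        print(f"{action} {target}: replies={peers} usable={usable or 'none'}")
```

To run it against the actual file, passing an open file object should work, e.g. opening /var/log/pacemaker/pacemaker.log and calling summarize_queries on it, assuming read access to the log.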

Here is my corosync config for reference:

# Please read the corosync.conf.5 manual page
totem {
        version: 2
        cluster_name: debian
        token: 3000
        token_retransmits_before_loss_const: 10
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 130.237.191.255
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        debug: off
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        provider: corosync_votequorum
        two_node: 1
}

nodelist {
        node {
                name: duke
                nodeid: 1
                ring0_addr: XXXXXXXXXX
        }
        node {
                name: miles
                nodeid: 2
                ring0_addr: XXXXXXXXXX
        }
}
#######

I am completely out of ideas about what to try next and would appreciate
any help. Let me know if you need more details.

Thanks in advance!
Ram

