<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial">Hi all,<br>I am using Pacemaker/Corosync together with iSCSI to build a highly available server.<br>It worked well at first, but two days ago errors started to appear.<br><br>When I start one of the nodes, it always shows as offline:<br><br>Last updated: Mon Aug 15 17:31:54 2016<br>Last change: Mon Aug 15 16:34:30 2016 via crmd on node0<br>Current DC: NONE<br>1 Nodes configured<br>0 Resources configured<br><br>Node node0 (1): UNCLEAN (offline)<br><br>In /var/log/messages I see:<br><br>Aug 15 09:25:04 node0 kernel: connection1:0: detected conn error (1020)<br>Aug 15 09:25:04 node0 iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)<br>Aug 15 09:25:07 node0 iscsid: connection1:0 is operational after recovery (1 attempts)<br>Aug 15 09:25:09 node0 kernel: connection1:0: detected conn error (1020)<br>Aug 15 09:25:10 node0 iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)<br>Aug 15 09:25:12 node0 iscsid: connection1:0 is operational after recovery (1 attempts)<br>Aug 15 09:25:15 node0 kernel: connection1:0: detected conn error (1020)<br>Aug 15 09:25:15 node0 iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)<br>Aug 15 09:25:18 node0 iscsid: connection1:0 is operational after recovery (1 attempts)<br>Aug 15 09:25:20 node0 kernel: connection1:0: detected conn error (1020)<br>Aug 15 09:25:20 node0 iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)<br>Aug 15 09:25:23 node0 iscsid: connection1:0 is operational after recovery (1 attempts)<br><br>That looks like an iSCSI error, so I stopped iSCSI and restarted Corosync. The node is still offline as before, and the log now shows:<br><br>Aug 15 17:32:04 node0 crmd[7208]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown (0 ops remaining)<br>Aug 15 17:32:04 node0 crmd[7208]: notice: do_lrm_control: Disconnected from the LRM<br>Aug 15 17:32:04 node0 crmd[7208]: notice: terminate_cs_connection: Disconnecting from Corosync<br>Aug 15 17:32:04 node0 crmd[7208]: error: crmd_fast_exit: Could not recover from internal error<br>Aug 15 17:32:04 node0 pacemakerd[7100]: error: pcmk_child_exit: Child process crmd (7208) exited: Generic Pacemaker error (201)<br>Aug 15 17:32:04 node0 pacemakerd[7100]: notice: pcmk_process_exit: Respawning failed child process: crmd<br>Aug 15 17:32:04 node0 crmd[7209]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log<br>Aug 15 17:32:04 node0 crmd[7209]: notice: main: CRM Git Version: 368c726<br>Aug 15 17:32:05 node0 crmd[7209]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync<br>Aug 15 17:32:05 node0 crmd[7209]: notice: cluster_connect_quorum: Quorum acquired<br>Aug 15 17:32:05 node0 crmd[7209]: notice: crm_update_peer_state: pcmk_quorum_notification: Node node0[1] - state is now member (was (null))<br>Aug 15 17:32:05 node0 crmd[7209]: notice: crm_update_peer_state: pcmk_quorum_notification: Node node0[1] - state is now lost (was member)<br>Aug 15 17:32:05 node0 crmd[7209]: error: reap_dead_nodes: We're not part of the cluster anymore<br>Aug 15 17:32:05 node0 crmd[7209]: error: do_log: FSA: Input I_ERROR from reap_dead_nodes() received in state S_STARTING<br>Aug 15 17:32:05 node0 crmd[7209]: notice: do_state_transition: State transition S_STARTING -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=reap_dead_nodes ]<br>Aug 15 17:32:05 node0 crmd[7209]: warning: do_recover: Fast-tracking shutdown in response to errors<br>Aug 15 17:32:05 node0 crmd[7209]: error: do_started: Start cancelled... S_RECOVERY<br>Aug 15 17:32:05 node0 crmd[7209]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY<br>Aug 15 17:32:05 node0 crmd[7209]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown (0 ops remaining)<br>Aug 15 17:32:05 node0 crmd[7209]: notice: do_lrm_control: Disconnected from the LRM<br>Aug 15 17:32:05 node0 crmd[7209]: notice: terminate_cs_connection: Disconnecting from Corosync<br>Aug 15 17:32:05 node0 crmd[7209]: error: crmd_fast_exit: Could not recover from internal error<br>Aug 15 17:32:05 node0 pacemakerd[7100]: error: pcmk_child_exit: Child process crmd (7209) exited: Generic Pacemaker error (201)<br>Aug 15 17:32:05 node0 pacemakerd[7100]: error: pcmk_process_exit: Child respawn count exceeded by crmd<br><br>Any idea what is going wrong here?</div></div>