[Pacemaker] pcmk_shutdown: Still waiting for crmd

Wed Dec 7 04:27:31 EST 2011

Hi, 

I built a test cluster with 2 nodes. 
Ubuntu 10.04.3 LTS with ppa:ubuntu-ha-maintainers/ppa 

corosync 1.4.2 
pacemaker 1.1.6 

primitive clvm ocf:lvm2:clvmd \ 
params daemon_timeout="30" \ 
operations $id="clvm-operations" \ 
op start interval="0" timeout="90" \ 
op stop interval="0" timeout="100" \ 
op monitor interval="0" timeout="20" start-delay="0" \ 
meta target-role="started" 
primitive data ocf:heartbeat:LVM \ 
params volgrpname="data" \ 
operations $id="data-operations" \ 
op start interval="0" timeout="30" \ 
op stop interval="0" timeout="30" \ 
op monitor interval="10" timeout="120" start-delay="0" \ 
op methods interval="0" timeout="5" \ 
meta target-role="started" 
primitive dlm ocf:pacemaker:controld \ 
operations $id="dlm-operations" \ 
op start interval="0" timeout="90" \ 
op stop interval="0" timeout="100" \ 
op monitor interval="10" timeout="20" start-delay="0" \ 
meta target-role="started" 
primitive fs ocf:heartbeat:Filesystem \ 
params device="/dev/data/test" directory="/data/test" fstype="ocfs2" \ 
operations $id="fs-operations" \ 
op start interval="0" timeout="60" \ 
op stop interval="0" timeout="60" \ 
op monitor interval="120" timeout="40" start-delay="0" \ 
op notify interval="0" timeout="60" \ 
meta target-role="started" 
primitive o2cb ocf:pacemaker:o2cb \ 
operations $id="o2cb-operations" \ 
op start interval="0" timeout="90" \ 
op stop interval="0" timeout="100" \ 
op monitor interval="0" timeout="20" start-delay="0" \ 
meta target-role="started" 
primitive res_DRBD ocf:linbit:drbd \ 
params drbd_resource="r0" \ 
operations $id="res_DRBD-operations" \ 
op start interval="0" timeout="240" \ 
op promote interval="0" timeout="90" \ 
op demote interval="0" timeout="90" \ 
op stop interval="0" timeout="100" \ 
op monitor interval="30" timeout="20" start-delay="1min" \ 
op notify interval="0" timeout="90" \ 
meta target-role="started" 
group dlm-clvm dlm clvm 
ms ms_DRBD res_DRBD \ 
meta master-max="2" clone-max="2" notify="true" interleave="true" 
clone clone_data data \ 
meta clone-max="2" ordered="true" interleave="true" 
clone dlm-clvm-clone dlm-clvm \ 
meta interleave="true" ordered="true" 
clone fs-clone fs \ 
meta clone-max="2" ordered="true" interleave="true" 
clone o2cb-clone o2cb \ 
meta clone-max="2" interleave="true" 
colocation col_data_clvm-dlm-clone inf: clone_data dlm-clvm-clone 
colocation col_fs_o2cb inf: fs-clone o2cb-clone 
colocation col_ms_DRBD_dlm-clvm-clone inf: dlm-clvm-clone ms_DRBD:Master 
colocation col_o2cb_dlm-clvm inf: o2cb-clone dlm-clvm-clone 
order ord_data_after_clvm-dlm-clone inf: dlm-clvm-clone clone_data 
order ord_ms_DRBD_dlm-clvm-clone inf: ms_DRBD:promote dlm-clvm-clone:start 
order ord_o2cb_after_dlm-clvm 0: dlm-clvm-clone o2cb-clone 
order ord_o2cb_fs inf: o2cb-clone fs-clone 
property $id="cib-bootstrap-options" \ 
dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \ 
cluster-infrastructure="openais" \ 
expected-quorum-votes="2" \ 
stonith-enabled="false" \ 
no-quorum-policy="ignore" \ 
last-lrm-refresh="1323246238" \ 
default-resource-stickiness="1000" 

The problem occurs when I restart corosync or reboot a cluster node. All resources are stopped except for the DRBD resource. Then the system hangs for a long time. 
corosync.log: 

ubuntu0 crmd: [926]: info: do_state_transition: (Re)Issuing shutdown request now that we are the DC 
ubuntu0 crmd: [926]: info: do_state_transition: Starting PEngine Recheck Timer 
ubuntu0 crmd: [926]: info: do_shutdown_req: Sending shutdown request to DC: ubuntu0 
ubuntu0 crmd: [926]: info: handle_shutdown_request: Creating shutdown request for ubuntu0 (state=S_IDLE) 
corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate... 
corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate... 
corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate... 
corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate... 
corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate... 
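
To check another log for the same symptom, the repeated wait notices can be counted per daemon. This is only a minimal sketch: the function name count_shutdown_waits and the sample list are made up for illustration, and the pattern assumes the notices look exactly like the lines quoted above.

```python
import re
from collections import Counter

# Matches notices of the form quoted above:
#   pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...
WAIT_RE = re.compile(
    r"pcmk_shutdown: Still waiting for (?P<daemon>\S+) "
    r"\(pid=(?P<pid>\d+), seq=(?P<seq>\d+)\)"
)


def count_shutdown_waits(lines):
    """Count wait notices per (daemon, pid). Hypothetical helper, not part of Pacemaker."""
    waits = Counter()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            waits[(match.group("daemon"), int(match.group("pid")))] += 1
    return waits


# Sample input copied from the log excerpt above; in practice the lines
# would be read from corosync.log (its path depends on the installation).
sample = [
    "ubuntu0 crmd: [926]: info: do_shutdown_req: Sending shutdown request to DC: ubuntu0",
    "corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...",
    "corosync [pcmk ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...",
]

for (daemon, pid), count in count_shutdown_waits(sample).items():
    print(f"{daemon} (pid {pid}): {count} wait notices")
```

A count that keeps growing for the same pid means the daemon has not terminated, which is the situation shown in the log above.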

I tested the same config on Debian 6.0.3, and there the reboot works. On Debian, the DRBD resource is first demoted to Secondary and then stopped. 

Is this a known problem? 

Thank you for your help. 

Regards, 
Erik 
