[Pacemaker] pcmk_shutdown: Still waiting for crmd

Andrew Beekhof andrew at beekhof.net
Wed Dec 7 22:15:46 EST 2011


On Wed, Dec 7, 2011 at 9:44 PM, Erik Schwalbe <erik.schwalbe at canoo.com> wrote:
> Hi,
>
> I think there is never an attempt to demote it.
> kern.log:
>
> 11:35:58 ubuntu1 kernel: Kernel logging (proc) stopped.
>
> corosync.log:
> 11:35:59 ubuntu1 crmd: [839]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> 11:35:59 ubuntu1 crmd: [839]: info: crm_shutdown: Requesting shutdown
> 11:35:59 ubuntu1 crmd: [839]: notice: crm_shutdown: Forcing shutdown in: 1200000ms
>
> After that, there are no further entries in kern.log.

Sounds strange.
Can you file a bug and attach a crm_report dating from the time you
initiated shutdown please?
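
As an aside, the "Forcing shutdown in: 1200000ms" line quoted above means
crmd will wait up to 20 minutes before it forces its own exit, which would
presumably account for the long hang. A minimal Python sketch of that
conversion follows. The helper name is made up for illustration and is not
part of Pacemaker, and the log line is copied from the quoted corosync.log.

```python
import re

# Log line copied from the quoted corosync.log above.
LOG_LINE = ("11:35:59 ubuntu1 crmd: [839]: notice: crm_shutdown: "
            "Forcing shutdown in: 1200000ms")


def forced_shutdown_minutes(line):
    """Hypothetical helper: return the forced-shutdown delay in minutes.

    Returns None if the line is not a "Forcing shutdown in" message.
    """
    match = re.search(r"Forcing shutdown in: (\d+)ms", line)
    if match is None:
        return None
    # The log reports milliseconds; 60000 ms make one minute.
    return int(match.group(1)) / 60000


print(forced_shutdown_minutes(LOG_LINE))  # 20.0
```

Running the same helper over the whole log would show whether the delay is
the same on every shutdown attempt.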

>
> Regards,
> Erik
>
>
> ----- Original Message -----
> From: "Andreas Kurz" <andreas at hastexo.com>
> To: pacemaker at oss.clusterlabs.org
> Sent: Wednesday, 7 December 2011 11:06:14
> Subject: Re: [Pacemaker] pcmk_shutdown: Still waiting for crmd
>
> On 12/07/2011 10:27 AM, Erik Schwalbe wrote:
>> Hi,
>>
>> I built a test cluster with 2 nodes.
>> Ubuntu 10.04.3 LTS with *ppa:ubuntu-ha-maintainers/ppa*
>>
>> corosync 1.4.2
>> pacemaker 1.1.6
>>
>> primitive clvm ocf:lvm2:clvmd \
>>         params daemon_timeout="30" \
>>         operations $id="clvm-operations" \
>>         op start interval="0" timeout="90" \
>>         op stop interval="0" timeout="100" \
>>         op monitor interval="0" timeout="20" start-delay="0" \
>>         meta target-role="started"
>> primitive data ocf:heartbeat:LVM \
>>         params volgrpname="data" \
>>         operations $id="data-operations" \
>>         op start interval="0" timeout="30" \
>>         op stop interval="0" timeout="30" \
>>         op monitor interval="10" timeout="120" start-delay="0" \
>>         op methods interval="0" timeout="5" \
>>         meta target-role="started"
>> primitive dlm ocf:pacemaker:controld \
>>         operations $id="dlm-operations" \
>>         op start interval="0" timeout="90" \
>>         op stop interval="0" timeout="100" \
>>         op monitor interval="10" timeout="20" start-delay="0" \
>>         meta target-role="started"
>> primitive fs ocf:heartbeat:Filesystem \
>>         params device="/dev/data/test" directory="/data/test" fstype="ocfs2" \
>>         operations $id="fs-operations" \
>>         op start interval="0" timeout="60" \
>>         op stop interval="0" timeout="60" \
>>         op monitor interval="120" timeout="40" start-delay="0" \
>>         op notify interval="0" timeout="60" \
>>         meta target-role="started"
>> primitive o2cb ocf:pacemaker:o2cb \
>>         operations $id="o2cb-operations" \
>>         op start interval="0" timeout="90" \
>>         op stop interval="0" timeout="100" \
>>         op monitor interval="0" timeout="20" start-delay="0" \
>>         meta target-role="started"
>> primitive res_DRBD ocf:linbit:drbd \
>>         params drbd_resource="r0" \
>>         operations $id="res_DRBD-operations" \
>>         op start interval="0" timeout="240" \
>>         op promote interval="0" timeout="90" \
>>         op demote interval="0" timeout="90" \
>>         op stop interval="0" timeout="100" \
>>         op monitor interval="30" timeout="20" start-delay="1min" \
>>         op notify interval="0" timeout="90" \
>>         meta target-role="started"
>> group dlm-clvm dlm clvm
>> ms ms_DRBD res_DRBD \
>>         meta master-max="2" clone-max="2" notify="true" interleave="true"
>> clone clone_data data \
>>         meta clone-max="2" ordered="true" interleave="true"
>> clone dlm-clvm-clone dlm-clvm \
>>         meta interleave="true" ordered="true"
>> clone fs-clone fs \
>>         meta clone-max="2" ordered="true" interleave="true"
>> clone o2cb-clone o2cb \
>>         meta clone-max="2" interleave="true"
>> colocation col_data_clvm-dlm-clone inf: clone_data dlm-clvm-clone
>> colocation col_fs_o2cb inf: fs-clone o2cb-clone
>> colocation col_ms_DRBD_dlm-clvm-clone inf: dlm-clvm-clone ms_DRBD:Master
>> colocation col_o2cb_dlm-clvm inf: o2cb-clone dlm-clvm-clone
>> order ord_data_after_clvm-dlm-clone inf: dlm-clvm-clone clone_data
>> order ord_ms_DRBD_dlm-clvm-clone inf: ms_DRBD:promote dlm-clvm-clone:start
>> order ord_o2cb_after_dlm-clvm 0: dlm-clvm-clone o2cb-clone
>> order ord_o2cb_fs inf: o2cb-clone fs-clone
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         last-lrm-refresh="1323246238" \
>>         default-resource-stickiness="1000"
>>
>> The problem occurs when I restart corosync or reboot a cluster node. All
>> resources are stopped except for the drbd resource. Then the system hangs
>> for a long time.
>
> Is there a timeout on stopping/demoting DRBD, and do you see kernel
> messages from DRBD about being unable to demote because it is in use ... or
> is there never an attempt to demote it?
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>> corosync.log:
>>
>> ubuntu0 crmd: [926]: info: do_state_transition: (Re)Issuing shutdown request now that we are the DC
>> ubuntu0 crmd: [926]: info: do_state_transition: Starting PEngine Recheck Timer
>> ubuntu0 crmd: [926]: info: do_shutdown_req: Sending shutdown request to DC: ubuntu0
>> ubuntu0 crmd: [926]: info: handle_shutdown_request: Creating shutdown request for ubuntu0 (state=S_IDLE)
>> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...
>> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...
>> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...
>> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...
>> corosync [pcmk  ] notice: pcmk_shutdown: Still waiting for crmd (pid=926, seq=6) to terminate...
>>
>> I tested the same config on Debian 6.0.3. The reboot works there: in the
>> first step the drbd resource is demoted to secondary, and then it goes
>> down.
>>
>> Is this a known problem?
>>
>> Thank you for your help.
>>
>> Regards,
>> Erik
>>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org