[Pacemaker] resources do not start on surviving node after reboot

Thu Oct 31 13:23:06 EDT 2013

On 2013-10-29T18:12:51, Саша Александров <shurrman at gmail.com> wrote:

> Oct 29 13:04:21 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1
> for STONITH
> Oct 29 13:04:21 wcs2 crmd[2363]:   notice: te_fence_node: Executing reboot
> fencing operation (53) on wcs1 (timeout=60000)
> Oct 29 13:05:33 wcs2 stonith-ng[2359]:    error: remote_op_done: Operation
> reboot of wcs1 by wcs2 for crmd.2363 at wcs2.4a3b045d: Timer expired
> Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_callback:
> Stonith operation 2/53:0:0:f56c4538-1ad8-4871-825e-167eb9304677: Timer

> The node wcs1 is off. Shouldn't SBD determine that, and shouldn't the
> cluster then start the resources?

The operation times out after about 10s here, and the logs show no sign
of sbd actually being called.

The most common cause is stonith-timeout in the CIB being set too short
for the configured "msgwait" timeout in sbd.
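
As a rough illustration of that check, here is a minimal Python sketch that
compares a stonith-timeout value against the msgwait value found in
"sbd dump"-style text. The 1.2 margin (stonith-timeout at least msgwait
plus 20%) is a commonly cited rule of thumb and an assumption here, not
something stated in this thread. The sample dump text, the 60s value, and
the function names are hypothetical and exist only for the example.

```python
import re

# Assumption: stonith-timeout should be at least msgwait * 1.2.
# This margin is a rule of thumb, not a value taken from this thread.
MARGIN = 1.2


def parse_msgwait(dump_text: str) -> int:
    """Extract the msgwait timeout (in seconds) from 'sbd dump'-style text."""
    match = re.search(r"Timeout \(msgwait\)\s*:\s*(\d+)", dump_text)
    if match is None:
        raise ValueError("no 'Timeout (msgwait)' line found")
    return int(match.group(1))


def stonith_timeout_ok(stonith_timeout_s: float, msgwait_s: float,
                       margin: float = MARGIN) -> bool:
    """Return True if stonith-timeout leaves the assumed margin over msgwait."""
    return stonith_timeout_s >= msgwait_s * margin


# Hypothetical sample text, shaped like part of an 'sbd dump' header.
SAMPLE_DUMP = """\
Timeout (watchdog) : 5
Timeout (msgwait)  : 10
"""

if __name__ == "__main__":
    msgwait = parse_msgwait(SAMPLE_DUMP)
    # 60 is a hypothetical stonith-timeout in seconds.
    print("msgwait:", msgwait)
    print("stonith-timeout sufficient:", stonith_timeout_ok(60, msgwait))
```

The real values would come from the stonith-timeout cluster property in
the CIB and from the dump of each configured sbd device.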

It may be easier to help you if you share your configuration, versions,
and the "sbd dump" output for each configured device.

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde