[ClusterLabs] Unexpected behaviour of PEngine Recheck Timer while in maintenance mode

Rolf Weber Rolf.Weber at asamnet.de
Tue Apr 28 19:40:57 UTC 2015


Hi!

On 07:27 Mon 27 Apr     , Andrew Beekhof wrote:
> What exactly were you doing at this point?

resizing a filesystem.
fs was unexported and unmounted.
as I understand maintenance mode, this should have worked (and there was no
problem until the recheck timer triggered).

> I ask because:
> 
> Apr 19 23:16:51 astorage1 crmd: [4150]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
> Apr 19 23:16:51 astorage1 crmd: [4150]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
> Apr 19 23:16:51 astorage1 crmd: [4150]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
> Apr 19 23:16:51 astorage1 crmd: [4150]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
> 
> and 
> 
> Apr 19 23:16:52 astorage2 crmd: [4018]: CRIT: lrm_connection_destroy: LRM Connection failed
> Apr 19 23:16:52 astorage2 crmd: [4018]: info: lrm_connection_destroy: LRM Connection disconnected
> 
> suggest that the lrmd processes on both machines crashed or failed.

I know. I didn't find any core files or other indications that it actually
crashed.
the errors appeared after the failure message for op 43 (which is ipmi fencing).
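
For comparing the two nodes, here is a minimal sketch of how the ERROR/CRIT
lines quoted above could be tallied per host. It assumes the syslog layout
seen in those lines; the regex and the function name summarize are
illustrative only and not part of pacemaker or any of its tools.

```python
import re
from collections import Counter

# Assumed syslog layout, inferred from the quoted lines:
# "<Mon> <day> <HH:MM:SS> <host> <daemon>: [<pid>]: <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>\w+): \[(?P<pid>\d+)\]: (?P<level>\w+): (?P<msg>.*)$"
)


def summarize(lines):
    """Count ERROR/CRIT entries per (host, daemon); skip lines that do not parse."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in ("ERROR", "CRIT"):
            counts[(m.group("host"), m.group("daemon"))] += 1
    return counts


# Sample input copied from the log excerpts quoted in this thread.
sample = [
    "Apr 19 23:16:51 astorage1 crmd: [4150]: ERROR: lrm_get_rsc(666): "
    "failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Apr 19 23:16:52 astorage2 crmd: [4018]: CRIT: lrm_connection_destroy: "
    "LRM Connection failed",
    "Apr 19 23:16:52 astorage2 crmd: [4018]: info: lrm_connection_destroy: "
    "LRM Connection disconnected",
]

for (host, daemon), n in sorted(summarize(sample).items()):
    print(host, daemon, n)
```

Feeding it the full syslog from each node instead of the sample list would
show whether the lrmd-related errors start on both hosts at the same time.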

> I would definitely recommend an upgrade from 1.1.7

will be done, now that jessie is stable. don't quite know when yet.

cheers,
rolf weber




