[Pacemaker] stonithd process can restart automatically but stonith plugins can't

Fri Jun 13 06:01:59 EDT 2008

Hi,

On Fri, Jun 13, 2008 at 10:56:30AM +0200, Andrew Beekhof wrote:
> Looks like a job for Dejan :-)
>
> On Jun 13, 2008, at 7:54 AM, Junko IKEDA wrote:
>
>> Hi,
>>
>> I set stonith=enable with this combination.
>> Heartbeat Devel : b6de0d1458c0
>> Pacemaker Devel : 32a830e35466
>>
>> When I killed stonithd process (kill -9 PID),
>> heartbeat could restart it automatically in 1 second.
>> But stonith plugins which are set as clone went to "monitor FAILED" and
>> stop.

A stonith resource is started only within the current stonithd
instance. If the stonithd process dies, the status of all its
stonith resources is lost along with it. A "started" stonith
resource would more properly be called "enabled", and that state
is valid only within the current stonithd process.

In other words, there's no use trying a monitor operation with a
new stonithd instance: it is "empty" and will always return "not
running". The only way to proceed, once crmd realises that the
stonithd process has died, is to consider all stonith resources
which were "started" on that node as stopped and to start them
again. It should probably also not update the fail_count, since
the resources themselves didn't fail, just the stonithd process.
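
To illustrate the point, here is a toy model in Python. It is only a
sketch of the reasoning above, not how stonithd or crmd are actually
implemented; all class, method, and resource names are made up for
illustration.

```python
# Toy model only: names here are hypothetical, not Pacemaker/heartbeat APIs.

class StonithDaemon:
    """Stands in for one stonithd process; its state lives only in memory."""

    def __init__(self, pid):
        self.pid = pid
        self.enabled = set()  # stonith resources "started" in this instance

    def start(self, rsc):
        self.enabled.add(rsc)

    def monitor(self, rsc):
        return "running" if rsc in self.enabled else "not running"


class Node:
    """Stands in for the cluster's view of one node."""

    def __init__(self):
        self.daemon = StonithDaemon(pid=1000)
        self.started = set()   # what the cluster believes is started
        self.fail_count = {}

    def start(self, rsc):
        self.daemon.start(rsc)
        self.started.add(rsc)
        self.fail_count.setdefault(rsc, 0)  # never reset an existing count

    def daemon_restarted(self, new_pid):
        """Proposed recovery: treat resources as stopped and start them
        again, leaving fail_count untouched."""
        previously_started = sorted(self.started)
        self.daemon = StonithDaemon(pid=new_pid)  # a new instance is "empty"
        self.started.clear()
        for rsc in previously_started:
            self.start(rsc)


node = Node()
node.start("stonith-clone:0")

# Simulate kill -9 followed by an automatic restart: new PID, empty state.
node.daemon = StonithDaemon(pid=1001)
after_kill = node.daemon.monitor("stonith-clone:0")

# Recovery once the death of the old process has been noticed.
node.daemon_restarted(new_pid=1002)
after_recovery = node.daemon.monitor("stonith-clone:0")

print(after_kill, "->", after_recovery, "fail_count:", node.fail_count)
```

In this model the monitor on the fresh instance reports "not running"
even though nothing is wrong with the resource itself, which is why
counting it as a resource failure would be misleading.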

>> It seems that the change of PID causes this.
>> Is it expected?
>>
>> If clone (stonith plugins) has the following parameters,
>> * globally_unique=false
>> * migration-threshold=0
>> plugins would restart again.
>> Is this the suggested configuration?

I don't think that those parameters should influence this.

Thanks,

Dejan

>> Best Regards,
>> Junko Ikeda
>>
>> NTT DATA INTELLILINK CORPORATION
>> <hb_report.tar.gz>