[Pacemaker] Java application failover problem

Andrew Beekhof andrew at beekhof.net
Wed Jul 10 00:25:16 UTC 2013


On 09/07/2013, at 10:29 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:

> On 7/9/2013 12:56 PM, Andrew Beekhof wrote:
>> 
>> On 09/07/2013, at 8:49 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
>> 
>>> On 7/9/2013 12:42 PM, Andrew Beekhof wrote:
>>>> 
>>>> On 09/07/2013, at 5:05 PM, Martin Gazak <martin.gazak at microstep-mis.sk> wrote:
>> 
>> It looks to be a bug in 1.1.7; you'll want to contact SUSE so they can get the fix from upstream.
> 
> Dear Andrew,
> thanks for your effort.
> 
> May I ask three questions?
> 
> - Which version did you use to detect the bug? You labeled it just
> "current version".

1.1.10-rc6

> 
> - We have downloaded the corosync SuSE packages 1.1.8 and 1.1.9. Could you
> please confirm whether one (or both) of these SuSE versions has this bug fixed?

I have no idea.

If you install them and run:
    crm_simulate -Sx /var/lib/pengine/pe-input-2819.bz2
and it returns the same as what I got with the current version, then it's fixed.
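
A minimal sketch of how that comparison could be scripted, assuming the action lines look like the ones quoted further down in this thread. The function name looks_fixed is hypothetical, and the patterns are taken only from those quoted logs; whether crm_simulate prints them on stdout or stderr may differ between versions, hence the 2>&1 in the usage comment.

```shell
#!/bin/sh
# Hypothetical helper: reads crm_simulate output on stdin and reports
# whether it resembles the transition seen with 1.1.10-rc6
# (ims:1 promoted) rather than the 1.1.7 one (ims:0 recovered in place).
looks_fixed() {
    out=$(cat)
    if printf '%s\n' "$out" | grep -Eq 'Promote[[:space:]]+ims:1' &&
       ! printf '%s\n' "$out" | grep -Eq 'Recover[[:space:]]+ims:0'; then
        echo "fixed"
    else
        echo "not fixed"
    fi
}

# Usage (file names taken from the logs quoted in this thread):
# for f in /var/lib/pengine/pe-input-2819.bz2 /var/lib/pengine/pe-input-2820.bz2; do
#     crm_simulate -Sx "$f" 2>&1 | looks_fixed
# done
```

This only checks for two action names, so it is a quick filter and not a substitute for reading the full output.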

> Or do you need the package itself as an attachment to inspect it?
> Or is there a way to check whether our package has the bug fixed?
> 
> - We are going to test package 1.1.9 with the stress tests anyway.
> As I wrote to you, this situation happened extremely rarely on the testing
> cluster (but often enough to cause trouble in a production environment).
> Do you have any idea how to reproduce this situation deterministically?

It might be a timing issue.

> Just blindly killing the master instance of the application from cron does
> not help: the system survived 70+ correct failovers over the weekend.
> 
> Best regards
> 
> Martin Gazak
> 
> 
>> 
>> Your version:
>> 
>> Jul 04 23:45:02 ims0 pengine: [3933]: WARN: unpack_rsc_op: Processing failed op ims:0_last_failure_0 on ims0: not running (7)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0	(Master ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip	(Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip-src	(Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4036: PEngine Input stored in: /var/lib/pengine/pe-input-2819.bz2
>> 
>> vs. the current version:
>> 
>>  notice: LogActions: 	Demote  ims:0	(Master -> Stopped ims0)
>>  notice: LogActions: 	Promote ims:1	(Slave -> Master ims1)
>>  notice: LogActions: 	Start   ims-ip	(ims1)
>>  notice: LogActions: 	Start   ims-ip-src	(ims1)
>> 
>> and
>> 
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0	(Master ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip	(Started ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start   ims-ip-src	(ims0)
>> Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4037: PEngine Input stored in: /var/lib/pengine/pe-input-2820.bz2
>> 
>> 
>> vs. the current version:
>> 
>>  notice: LogActions: 	Demote  ims:0	(Master -> Stopped ims0)
>>  notice: LogActions: 	Promote ims:1	(Slave -> Master ims1)
>>  notice: LogActions: 	Start   ims-ip	(ims1)
>>  notice: LogActions: 	Start   ims-ip-src	(ims1)
>> 
> 
> 
> -- 
> 
> Regards,
> 
> Martin Gazak
> MicroStep-MIS, spol. s r.o.
> System Development Manager
> Tel.: +421 2 602 00 128
> Fax: +421 2 602 00 180
> martin.gazak at microstep-mis.sk
> http://www.microstep-mis.com