[ClusterLabs] Issue with DB2 HADR cluster

Tue Apr 2 13:49:41 EDT 2019

On 2019-04-02 1:32 p.m., Andrei Borzenkov wrote:
> 02.04.2019 19:32, Dileep V Nair wrote:
>>
>>
>> Hi,
>>
>> I have a two-node DB2 cluster with Pacemaker and HADR. When I issue a
>> reboot -f on the node where the primary database is running, I expect the
>> standby database to be promoted to primary. What actually happens is that
>> Pacemaker waits for 180 seconds (I guess that is the SBD timeout), and by the
>> time the second node takes action, the DB is already in
>> STANDBY/REMOTE_CATCHUP_PENDING/DISCONNECTED state and cannot be promoted
>> anymore. If that is the expected behaviour, I believe the cluster does not
>> work in a node crash situation. Can someone guide me on what could be
>> wrong here?
>>
> 
> Is stonith enabled? Did you configure correct timeouts? A very cursory
> look at the db2 agent turns up this:
> 
> In case of HADR be very deliberate in specifying intervals/timeouts. The
> detection of a failure including promote must complete within
> HADR_PEER_WINDOW.

It's worth noting that SBD fencing is "better than nothing", but slow.
IPMI and/or PDU fencing completes a lot faster.
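
To make the timing constraint quoted above concrete, here is a minimal
Python sketch of the budget, assuming that failure detection, fencing and
the promote all have to finish inside HADR_PEER_WINDOW. The 180 seconds is
the delay reported in this thread; the peer window, detection and promote
values are made-up placeholders, and failover_fits is a hypothetical
helper, not part of Pacemaker or DB2.

```python
# Rough timing budget for an HADR failover.
# Only the 180 s fencing delay comes from this thread; every other
# number below is a hypothetical placeholder.

def failover_fits(peer_window_s, detect_s, fence_s, promote_s):
    """Return True if detection + fencing + promote fit in the peer window."""
    return detect_s + fence_s + promote_s <= peer_window_s


# A 180 s fencing delay cannot fit inside an assumed 120 s peer window.
print(failover_fits(peer_window_s=120, detect_s=10, fence_s=180, promote_s=20))

# The same delay fits if the peer window is assumed to be 300 s.
print(failover_fits(peer_window_s=300, detect_s=10, fence_s=180, promote_s=20))
```

Either a larger peer window or a faster fence device brings the total
back under the limit; which one is appropriate depends on the setup.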

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould