[ClusterLabs] Pacemaker log showing time mismatch after

Mon Jan 28 10:47:54 EST 2019

On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
> Hi,
> 
> I am seeing that there is a log entry showing Recheck Timer popped
> and the time in pacemaker.log went back in time. After sometime, the
> time issue Around the same time the resources also failed over (Slave
> became master). Do anyone know why this behavior ? 
> 
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_20000:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_20000:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: crm_timer_popped:
> PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)

Pacemaker can handle the clock jumping forward, but not backward. The
recheck timer here is unrelated to the clock jump, it's just the first
log message to appear since it jumped.

You definitely want to find out what's changing the clock.

If this is at system boot, likely the hardware clock is wrong and some
time manager (ntp, etc.) is adjusting it. Pacemaker's systemd unit file
has "After=time-sync.target" to try to ensure that it doesn't start
until after this has happened, but unfortunately you often have to take
extra steps to make time managers use that target (e.g. enable chronyd-
wait.service if you're using chronyd), and of course if you're not
using systemd it's not any help. But the basic idea is you want to
ensure pacemaker starts after the time has been adjusted at boot.

If this isn't at boot, then your host has something weird going on.
Check the system log around the time of the jump, etc.

> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: notice:
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE |
> input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: do_state_transition:
> Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> process_pe_message: Input has not changed since last time, not saving
> to disk
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: notice: unpack_config:
> Relying on watchdog integration for fencing
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status_fencing: Node pn4us7leccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status: Node pn4us7leccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status_fencing: Node pn4ushleccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status: Node pn4ushleccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource db_cp1:0 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource ip_cp1 active
> on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource db_cp1:1 active
> in master mode on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2log
> active on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource KUD_DB2 active
> on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> stonith-sbd (stonith:external/sbd): Started pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> ip_cp1 (ocf::heartbeat:IPaddr2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: clone_print:
> Master/Slave Set: ms_db2_cp1 [db_cp1]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print:
> Masters: [ pn4us7leccp1 ]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print:
> Slaves: [ pn4ushleccp1 ]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> TSM_DB2 (systemd:dsmcad_db2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> TSM_DB2log (systemd:dsmcad_db2log): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> KUD_DB2 (systemd:kuddb2_db2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: rsc_merge_weights:
> ms_db2_cp1: Breaking dependency loop at ms_db2_cp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: master_color:
> Promoting db_cp1:0 (Master pn4us7leccp1)
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: master_color:
> ms_db2_cp1: Promoted 1 instances of a possible 1 to master
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: LogActions: Leave
> ip_cp1 (Started pn4us7leccp1)
> 
> 
> After the transition, the date was shifted back to normal
> 
> Jan 22 20:47:03 [9386] pn4ushleccp1 crmd: info: do_log: Input
> I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> Jan 22 20:47:03 [9386] pn4ushleccp1 crmd: notice:
> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE |
> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> Jan 23 01:47:22 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_20000:19518:stderr [ /usr/bin/.: Permission denied. ]
> Jan 23 01:47:22 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_20000:19518:stderr [ /usr/bin/.: Permission denied. ]
> 
> 
> 
> Thanks & Regards
> 
> Dileep Nair
> Squad Lead - SAP Base 
> Togaf Certified Enterprise Architect
> IBM Services for Managed Applications
> +91 98450 22258 Mobile
> dilenair at in.ibm.com
> 
> IBM Services
-- 
Ken Gaillot <kgaillot at redhat.com>