[Pacemaker] WARN: mysql:monitor process (PID 21347) timed out (try 2)

Michael Schwartzkopff ms at sys4.de
Mon Jan 20 05:20:16 EST 2014

On Monday, 20 January 2014, 10:54:18, Stefan Bauer wrote:
> Hi Folks,
> we're running a pacemaker/openais cluster for a webserver with databases in
> an active/passive setup with 2 nodes on debian 6.0
> Unfortunately this is a "closed environment" so we are quite limited in
> regard to version updates.
> Close to midnight, a backup-script is running - this seems to be the reason
> for the timeouts due to high load.
> Even though the load is relaxed a few minutes later, some of the resources
> are marked as failed and hence are not running:
> postgresql (lsb:postgresql): Started host41.my.network FAILED
> Here is the log-file:
> http://cubewerk.de/syslog.1
> Maybe somebody sees the obvious and can shed some light on this.
> thank you
> stefan


I doubt that the backup process disturbs your setup. At least not at first
glance:

Jan 19 23:23:07 host41 logger: backup.me start by host41.my.network
Jan 19 23:24:02 host41 logger: backup.me is running (...) - job ended

No errors in between. Everything seems to be fine. But I don't know your
backup process. Perhaps the script only triggers the start of the backup and
the load continues after it returns. Please check that.

Jan 19 23:32:44 host41 crmd: [2108]: info: crm_timer_popped:
Jan 19 23:32:44 host41 crmd: [2108]: info: notify_crmd: Transition 381 status: 
done - <null>

Ages (8 minutes) after the backup, the pengine's timer pops, checks the
cluster status, and does not find any problems.

Your problems start here:
Jan 19 23:40:24 host41 lrmd: [2105]: WARN: cluster_ip:monitor process (PID 
21345) timed out
Jan 19 23:40:24 host41 lrmd: [2105]: WARN: mysql:monitor process (PID 21347) 
timed out
Jan 19 23:40:30 host41 crmd: [2108]: ERROR: process_lrm_event: LRM operation 
mysql_monitor_20000 (122) Timed Out

16 minutes (!) after the backup your problems start. From the logs you cannot
see why. First the monitoring of the IP address fails, then that of the MySQL
DB. Perhaps your backup script is still running, or you have some other problem.
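
The gaps between the quoted log timestamps can be checked with a short
script. This is only a sketch: the three timestamps are copied from the log
excerpts above, the year 2014 is an assumption (syslog lines carry no year),
and the names parse() and gap() are made up for illustration. It also assumes
an English or C locale so that "Jan" parses.

```python
from datetime import datetime

# Timestamps copied from the log excerpts quoted in this message.
# Syslog lines carry no year; 2014 is assumed from the date of the thread.
FMT = "%Y %b %d %H:%M:%S"


def parse(stamp, year=2014):
    """Parse a syslog-style timestamp such as 'Jan 19 23:24:02'."""
    return datetime.strptime(f"{year} {stamp}", FMT)


def gap(start, end):
    """Return the time between two timestamps as (minutes, seconds)."""
    return divmod(int((end - start).total_seconds()), 60)


backup_end = parse("Jan 19 23:24:02")     # "backup.me ... job ended"
timer_popped = parse("Jan 19 23:32:44")   # crmd: crm_timer_popped
first_timeout = parse("Jan 19 23:40:24")  # lrmd: monitor process timed out

print("backup end -> timer popped:  %d min %d s" % gap(backup_end, timer_popped))
print("backup end -> first timeout: %d min %d s" % gap(backup_end, first_timeout))
```

This prints 8 min 42 s for the first gap and 16 min 22 s for the second.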

Please check your script: How long does it run? What load does it cause? Does
it block something, so that the monitoring fails?
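
One possible way to answer the first two questions is to wrap the script and
record its wall-clock time and the 1-minute load average before and after.
This is a minimal sketch, not a statement about how backup.me works:
run_and_measure() and settle_seconds are hypothetical names, and the command
in the example is a no-op placeholder that you would replace with your real
backup command. os.getloadavg() is only available on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def run_and_measure(cmd, settle_seconds=0.0):
    """Run cmd, then report its runtime and the 1-minute load average.

    settle_seconds delays the second load sample, which helps if the
    script only triggers work that keeps running after it returns.
    """
    load_before = os.getloadavg()[0]
    start = time.monotonic()
    completed = subprocess.run(cmd)
    elapsed = time.monotonic() - start
    time.sleep(settle_seconds)
    load_after = os.getloadavg()[0]
    return {
        "returncode": completed.returncode,
        "elapsed_s": elapsed,
        "load1_before": load_before,
        "load1_after": load_after,
    }


if __name__ == "__main__":
    # Placeholder command (a Python no-op); substitute the backup script.
    print(run_and_measure([sys.executable, "-c", "pass"]))
```

If the script returns quickly but the load stays high for minutes afterwards,
that would support the idea that it only starts the backup and the real work
goes on in the background.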


Michael Schwartzkopff

[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, Amtsgericht München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein