[ClusterLabs] corosync SCHED_RR stuck at 100% cpu usage with kernel 4.19, priority inversion/livelock?

Edwin Török edvin.torok at citrix.com
Tue Feb 19 12:21:44 EST 2019



On 19/02/2019 17:02, Klaus Wenninger wrote:
> On 02/19/2019 05:41 PM, Edwin Török wrote:
>> On 19/02/2019 16:26, Edwin Török wrote:
>>> On 18/02/2019 18:27, Edwin Török wrote:
>>>> Did a test today with CentOS 7.6 with upstream kernel and with
>>>> 4.20.10-1.el7.elrepo.x86_64 (tested both with upstream SBD, and our
>>>> patched [1] SBD) and was not able to reproduce the issue yet.
>>> I was able to finally reproduce this using only upstream components
>>> (although it seems to be easier to reproduce if we use our patched SBD,
>>> I was able to reproduce this by using only upstream packages unpatched
>>> by us):
> 
> Just out of curiosity: What did you patch in SBD?
> Sorry if I missed the answer in the previous communication.

It is mostly this PR, which calls getquorate quite often (a more
efficient impl. would be to use the quorum notification API like
dlm/pacemaker do, although see concerns in
https://lists.clusterlabs.org/pipermail/users/2019-February/016249.html):
https://github.com/ClusterLabs/sbd/pull/27

We have also added our own servant for watching the health of our
control plane, but that is not relevant to this bug (it reproduces with
that watcher turned off too).

> 
>> I was also able to get a corosync blackbox from one of the stuck VMs
>> that showed something interesting:
>> https://clbin.com/d76Ha
>>
>> It is looping on:
>> debug   Feb 19 16:37:24 mcast_sendmsg(408):12: sendmsg(mcast) failed
>> (non-critical): Resource temporarily unavailable (11)
> 
> Hmm ... something like tx-queue of the device full, or no buffers
> available anymore and kernel-thread doing the cleanup isn't
> scheduled ...

Yes that is very plausible. Perhaps it'd be nicer if corosync went back
to the epoll_wait loop when it gets too many EAGAINs from sendmsg.
(although this seems different from the original bug where it got stuck
in epoll_wait)

> Does the kernel log anything in that situation?

Other than the crmd segfault no.
>From previous observations on xenserver the softirqs were all stuck on
the CPU that corosync hogged 100% (I'll check this on upstream, but I'm
fairly sure it'll be the same). softirqs do not run at realtime priority
(if we increase the priority of ksoftirqd to realtime then it all gets
unstuck), but seem to be essential for whatever corosync is stuck
waiting on, in this case likely the sending/receiving of network packets.

I'm trying to narrow down the kernel between 4.19.16 and 4.20.10 to see
why this was only reproducible on 4.19 so far.

Best regards,
--Edwin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-sbd-cluster-only-report-good-health-if-quorate-or-no.patch
Type: text/x-patch
Size: 4532 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190219/d4c13805/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Tweak-sbd-inquisitor.c-to-retain-the-same-behaviour-.patch
Type: text/x-patch
Size: 803 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190219/d4c13805/attachment-0007.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-increase_logging_level_to_CRIT_for_quorum_loss.patch
Type: text/x-patch
Size: 1283 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190219/d4c13805/attachment-0008.bin>
-------------- next part --------------
From 8f09fe89f452ace696b97019fb59a60db69b5965 Mon Sep 17 00:00:00 2001
From: Mark Syms <mark.syms at citrix.com>
Date: Thu, 11 Oct 2018 12:17:53 +0800
Subject: [PATCH] CA-290057: barebones Xapi sbd servant

Signed-off-by: Mark Syms <mark.syms at citrix.com>
Signed-off-by: Liang Dai <liang.dai1 at citrix.com>
---
 src/Makefile.am      |  2 +-
 src/sbd-common.c     | 12 ++++++++
 src/sbd-inquisitor.c | 16 +++++++++-
 src/sbd-xapi.c       | 73 ++++++++++++++++++++++++++++++++++++++++++++
 src/sbd.h            |  1 +
 5 files changed, 102 insertions(+), 2 deletions(-)
 create mode 100644 src/sbd-xapi.c

diff --git a/src/Makefile.am b/src/Makefile.am
index 4d509c2..35c19a5 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -5,7 +5,7 @@ AM_CPPFLAGS =  -I$(includedir)/pacemaker  \
 
 sbin_PROGRAMS = sbd
 
-sbd_SOURCES = sbd-common.c sbd-inquisitor.c sbd-pacemaker.c sbd-cluster.c setproctitle.c
+sbd_SOURCES = sbd-common.c sbd-inquisitor.c sbd-pacemaker.c sbd-cluster.c sbd-xapi.c setproctitle.c
 
 if SUPPORT_SHARED_DISK
 sbd_SOURCES += sbd-md.c
diff --git a/src/sbd-common.c b/src/sbd-common.c
index 803bc3a..15154d0 100644
--- a/src/sbd-common.c
+++ b/src/sbd-common.c
@@ -966,3 +966,15 @@ sbd_is_pcmk(struct servants_list_item *servant)
     }
     return false;
 }
+
+bool
+sbd_is_xapi(struct servants_list_item *servant)
+{
+    if ((servant != NULL) &&
+        (servant->devname != NULL) &&
+        (strcmp("xapi", servant->devname) == 0)) {
+        return true;
+    }
+    return false;
+}
+
diff --git a/src/sbd-inquisitor.c b/src/sbd-inquisitor.c
index 1955892..9de48ee 100644
--- a/src/sbd-inquisitor.c
+++ b/src/sbd-inquisitor.c
@@ -25,6 +25,7 @@ static struct servants_list_item *servants_leader = NULL;
 int     disk_priority = 1;
 int	check_pcmk = 0;
 int	check_cluster = 0;
+int     check_xapi = 0;
 int	disk_count	= 0;
 int	servant_count	= 0;
 int	servant_restart_interval = 5;
@@ -159,6 +160,10 @@ void servant_start(struct servants_list_item *s)
 		DBGLOG(LOG_INFO, "Starting Cluster servant");
 		s->pid = assign_servant(s->devname, servant_cluster, start_mode, NULL);
 
+        } else if (sbd_is_xapi(s)) {
+		DBGLOG(LOG_INFO, "Starting Xapi servant");
+		s->pid = assign_servant(s->devname, servant_xapi, start_mode, NULL);
+
         } else {
             cl_log(LOG_ERR, "Unrecognized servant: %s", s->devname);
         }        
@@ -903,7 +908,7 @@ int main(int argc, char **argv, char **envp)
         }
         cl_log(LOG_DEBUG, "Start delay: %d (%s)", (int)start_delay, value?value:"default");
 
-	while ((c = getopt(argc, argv, "czC:DPRTWZhvw:d:n:p:1:2:3:4:5:t:I:F:S:s:")) != -1) {
+	while ((c = getopt(argc, argv, "czC:DPRTWZhvxw:d:n:p:1:2:3:4:5:t:I:F:S:s:")) != -1) {
 		switch (c) {
 		case 'D':
 			break;
@@ -965,6 +970,9 @@ int main(int argc, char **argv, char **envp)
 		case 'P':
 			P_count++;
 			break;
+		case 'x':
+			check_xapi = 1;
+			break;
 		case 'z':
 			disk_priority = 0;
 			break;
@@ -1020,6 +1028,7 @@ int main(int argc, char **argv, char **envp)
 			usage();
 			return (0);
 		default:
+			cl_log(LOG_ERR, "Unhandled option %c", c);
 			exit_status = -2;
 			goto out;
 			break;
@@ -1132,6 +1141,11 @@ int main(int argc, char **argv, char **envp)
                         recruit_servant("cluster", 0);
                 }
 
+		cl_log(LOG_INFO, "Turning on Xapi checks: %d", check_xapi);
+		if (check_xapi) {
+			recruit_servant("xapi", 0);
+		}
+
                 exit_status = inquisitor();
         }
         
diff --git a/src/sbd-xapi.c b/src/sbd-xapi.c
new file mode 100644
index 0000000..4998afb
--- /dev/null
+++ b/src/sbd-xapi.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (C) Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This software is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+/*
+ * SBD monitor for xapi health
+ */
+
+#include <glib-unix.h>
+
+
+#include "sbd.h"
+
+
+static GMainLoop *mainloop = NULL;
+static guint notify_timer = 0;
+
+
+static gboolean
+notify_timer_cb(gpointer data)
+{
+    set_servant_health(pcmk_health_online, LOG_INFO, "Node state: online");
+
+    notify_parent();
+
+    return TRUE;
+}
+
+static void
+clean_up(int rc)
+{
+    return;
+}
+
+static void
+xapi_shutdown(int nsig)
+{
+    clean_up(0);
+}
+
+int
+servant_xapi(const char *diskname, int mode, const void* argp)
+{
+    cl_log(LOG_INFO, "Monitoring xapi health");
+
+    set_proc_title("sbd: watcher: Xapi");
+
+    mainloop = g_main_new(FALSE);
+    notify_timer = g_timeout_add(timeout_loop * 1000, notify_timer_cb, NULL);
+
+    mainloop_add_signal(SIGTERM, xapi_shutdown);
+    mainloop_add_signal(SIGINT, xapi_shutdown);
+
+    g_main_run(mainloop);
+    g_main_destroy(mainloop);
+
+    clean_up(0);
+    return 0;                   /* never reached */
+}
diff --git a/src/sbd.h b/src/sbd.h
index 0f8847a..315a0bf 100644
--- a/src/sbd.h
+++ b/src/sbd.h
@@ -177,6 +177,7 @@ int servant(const char *diskname, int mode, const void* argp);
 
 int servant_pcmk(const char *diskname, int mode, const void* argp);
 int servant_cluster(const char *diskname, int mode, const void* argp);
+int servant_xapi(const char *diskname, int mode, const void* argp);
 
 struct servants_list_item *lookup_servant_by_dev(const char *devname);
 struct servants_list_item *lookup_servant_by_pid(pid_t pid);
-- 
2.17.1

-------------- next part --------------
CA-290057: call xapi-health-check from thread and set servant status appropriately

From: Mark Syms <mark.syms at citrix.com>

Signed-off-by: Mark Syms <mark.syms at citrix.com>

diff --git a/src/sbd-inquisitor.c b/src/sbd-inquisitor.c
index d0632e2..0e7ef9f 100644
--- a/src/sbd-inquisitor.c
+++ b/src/sbd-inquisitor.c
@@ -395,7 +395,7 @@ int cluster_alive(bool all)
     }
 
     for (s = servants_leader; s; s = s->next) {
-        if (sbd_is_cluster(s) || sbd_is_pcmk(s)) {
+        if (sbd_is_cluster(s) || sbd_is_pcmk(s) || sbd_is_xapi(s)) {
             if(s->outdated) {
                 alive = 0;
             } else if(all == false) {
@@ -509,7 +509,7 @@ void inquisitor_child(void)
 			}
 		} else if (sig == SIG_PCMK_UNHEALTHY) {
 			s = lookup_servant_by_pid(sinfo.si_pid);
-			if (sbd_is_cluster(s) || sbd_is_pcmk(s)) {
+			if (sbd_is_cluster(s) || sbd_is_pcmk(s) || sbd_is_xapi(s)) {
                 if (s->outdated == 0) {
                     cl_log(LOG_WARNING, "%s health check: UNHEALTHY", s->devname);
                 }
@@ -915,8 +915,8 @@ int main(int argc, char **argv, char **envp)
 		case 'v':
                     debug++;
                     if(debug == 1) {
-                        qb_log_filter_ctl(QB_LOG_SYSLOG, QB_LOG_FILTER_ADD, QB_LOG_FILTER_FILE, "sbd-common.c,sbd-inquisitor.c,sbd-md.c,sbd-pacemaker.c", LOG_DEBUG);
-                        qb_log_filter_ctl(QB_LOG_STDERR, QB_LOG_FILTER_ADD, QB_LOG_FILTER_FILE, "sbd-common.c,sbd-inquisitor.c,sbd-md.c,sbd-pacemaker.c", LOG_DEBUG);
+                        qb_log_filter_ctl(QB_LOG_SYSLOG, QB_LOG_FILTER_ADD, QB_LOG_FILTER_FILE, "sbd-common.c,sbd-inquisitor.c,sbd-md.c,sbd-pacemaker.c,sbd-cluster.c,sbd-xapi.c", LOG_DEBUG);
+                        qb_log_filter_ctl(QB_LOG_STDERR, QB_LOG_FILTER_ADD, QB_LOG_FILTER_FILE, "sbd-common.c,sbd-inquisitor.c,sbd-md.c,sbd-pacemaker.c,sbd-cluster.c,sbd-xapi.c", LOG_DEBUG);
 			cl_log(LOG_INFO, "Verbose mode enabled.");
 
                     } else if(debug == 2) {
diff --git a/src/sbd-xapi.c b/src/sbd-xapi.c
index 4998afb..6cee57e 100644
--- a/src/sbd-xapi.c
+++ b/src/sbd-xapi.c
@@ -20,54 +20,146 @@
  * SBD monitor for xapi health
  */
 
+#include <unistd.h>
 #include <glib-unix.h>
 
-
 #include "sbd.h"
 
 
-static GMainLoop *mainloop = NULL;
-static guint notify_timer = 0;
+#define XAPI_HEALTHCHECKER_PATH "/opt/xensource/libexec/"
+#define XAPI_HEALTHCHECKER      "xapi-health-check"
+#define HA_ENABLE_PATH		"/var/run/cluster/"
+#define HA_ENABLE_FILE		"ha_enabled"
+
+static GMainLoop	*mainloop     = NULL;
+static guint		 notify_timer = 0;
+static guint		 health_timer = 0;
+static guint             health_blocked_timer = 100;
+
+int		timeout_health	    	 = 60;
+int             health_execution_timeout = 100;
+
+static GPid       checker_pid = 0;
+
+
+static void
+checker_watch_cb (GPid     pid,
+		  gint     status,
+		  gpointer user_data)
+{
+	cl_log(LOG_INFO, "Checker process watch cb");
+
+	if (g_spawn_check_exit_status(status, NULL)) {
+		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
+	} else {
+		set_servant_health(pcmk_health_transient, LOG_INFO, "Node state: xapi health check failed");
+	}
+
+	g_spawn_close_pid (pid);
+	checker_pid = 0;
+}
+
+static gboolean
+health_timer_abort_cb(gpointer data)
+{
+	if (checker_pid != 0) {
+		cl_log(LOG_DEBUG, "Checker process timed out killing");
+		kill(checker_pid, SIGKILL);
+	}
 
+	return FALSE;
+}
+
+static gboolean
+health_timer_cb(gpointer data)
+{
+	const gchar * const argv[] = {XAPI_HEALTHCHECKER_PATH XAPI_HEALTHCHECKER, NULL };
+	GError *error = NULL;
+	guint watch_source;
+
+	cl_log(LOG_DEBUG, "Health timer cb");
+
+	if (access(HA_ENABLE_PATH HA_ENABLE_FILE, F_OK) != -1) {
+		if (checker_pid == 0) {
+			cl_log(LOG_DEBUG, "Starting checker subprocess");
+			g_spawn_async(
+				NULL,
+				(gchar **)argv,
+				NULL,
+				G_SPAWN_DO_NOT_REAP_CHILD | G_SPAWN_STDOUT_TO_DEV_NULL | G_SPAWN_STDERR_TO_DEV_NULL,
+				NULL,
+				NULL,
+				&checker_pid,
+				&error);
+
+			if (error != NULL) {
+				cl_log(LOG_ERR, "Failed to spawn health check %d", error->code);
+				set_servant_health(pcmk_health_transient, LOG_INFO, "Node state: xapi health check failed");
+				g_error_free(error);
+				return FALSE;
+			}
+
+			watch_source = g_child_watch_add (checker_pid, checker_watch_cb, NULL);
+			health_blocked_timer = g_timeout_add(health_execution_timeout * 1000, health_timer_abort_cb, NULL);
+		}
+	} else {
+		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
+		checker_pid = 0;
+	}
+
+	return TRUE;
+}
 
 static gboolean
 notify_timer_cb(gpointer data)
 {
-    set_servant_health(pcmk_health_online, LOG_INFO, "Node state: online");
+	cl_log(LOG_DEBUG, "Refreshing state");
+
+	if (access(HA_ENABLE_PATH HA_ENABLE_FILE, F_OK) == -1) {
+		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
+	}
 
-    notify_parent();
+	notify_parent();
 
-    return TRUE;
+	return TRUE;
 }
 
 static void
 clean_up(int rc)
 {
-    return;
+	return;
 }
 
 static void
 xapi_shutdown(int nsig)
 {
-    clean_up(0);
+	clean_up(0);
 }
 
 int
 servant_xapi(const char *diskname, int mode, const void* argp)
 {
-    cl_log(LOG_INFO, "Monitoring xapi health");
+	sigset_t procmask;
+
+	cl_log(LOG_INFO, "Monitoring xapi health");
+
+	set_proc_title("sbd: watcher: Xapi");
 
-    set_proc_title("sbd: watcher: Xapi");
+	/* As we use subprocess unblock SIGCHILD */
+	sigemptyset(&procmask);
+	sigaddset(&procmask, SIGCHLD);
+	sigprocmask(SIG_UNBLOCK, &procmask, NULL);
 
-    mainloop = g_main_new(FALSE);
-    notify_timer = g_timeout_add(timeout_loop * 1000, notify_timer_cb, NULL);
+	mainloop = g_main_loop_new(NULL, FALSE);
+	notify_timer = g_timeout_add_seconds(timeout_loop, notify_timer_cb, NULL);
+	health_timer = g_timeout_add_seconds(timeout_health, health_timer_cb, NULL);
 
-    mainloop_add_signal(SIGTERM, xapi_shutdown);
-    mainloop_add_signal(SIGINT, xapi_shutdown);
+	mainloop_add_signal(SIGTERM, xapi_shutdown);
+	mainloop_add_signal(SIGINT, xapi_shutdown);
 
-    g_main_run(mainloop);
-    g_main_destroy(mainloop);
+	g_main_loop_run(mainloop);
+	g_main_loop_unref(mainloop);
 
-    clean_up(0);
-    return 0;                   /* never reached */
+	clean_up(0);
+	return 0;                   /* never reached */
 }
-------------- next part --------------
CA-290057: fix compilation warnings

From: Mark Syms <mark.syms at citrix.com>

Signed-off-by: Mark Syms <mark.syms at citrix.com>

diff --git a/src/sbd-xapi.c b/src/sbd-xapi.c
index 6cee57e..c21dd0f 100644
--- a/src/sbd-xapi.c
+++ b/src/sbd-xapi.c
@@ -21,6 +21,10 @@
  */
 
 #include <unistd.h>
+
+#include <crm/cluster.h>
+#include <crm/common/mainloop.h>
+
 #include <glib-unix.h>
 
 #include "sbd.h"
@@ -75,7 +79,6 @@ health_timer_cb(gpointer data)
 {
 	const gchar * const argv[] = {XAPI_HEALTHCHECKER_PATH XAPI_HEALTHCHECKER, NULL };
 	GError *error = NULL;
-	guint watch_source;
 
 	cl_log(LOG_DEBUG, "Health timer cb");
 
@@ -99,7 +102,7 @@ health_timer_cb(gpointer data)
 				return FALSE;
 			}
 
-			watch_source = g_child_watch_add (checker_pid, checker_watch_cb, NULL);
+			g_child_watch_add (checker_pid, checker_watch_cb, NULL);
 			health_blocked_timer = g_timeout_add(health_execution_timeout * 1000, health_timer_abort_cb, NULL);
 		}
 	} else {
diff --git a/src/sbd.h b/src/sbd.h
index 9c62ea0..1635ca5 100644
--- a/src/sbd.h
+++ b/src/sbd.h
@@ -200,3 +200,4 @@ void set_servant_health(enum pcmk_health state, int level, char const *format, .
 bool sbd_is_disk(struct servants_list_item *servant);
 bool sbd_is_pcmk(struct servants_list_item *servant);
 bool sbd_is_cluster(struct servants_list_item *servant);
+bool sbd_is_xapi(struct servants_list_item *servant);
-------------- next part --------------
CA-290600: Xapi SBD watcher segfault.

Must initialise GError to NULL otherwise g_set_error doesn't fill
it in and then we get a free error.

From: Mark Syms <mark.syms at citrix.com>

Signed-off-by: Mark Syms <mark.syms at citrix.com>

diff --git a/src/sbd-xapi.c b/src/sbd-xapi.c
index 0f65a24..afbd0b8 100644
--- a/src/sbd-xapi.c
+++ b/src/sbd-xapi.c
@@ -89,7 +89,7 @@ checker_watch_cb (GPid     pid,
 		  gint     status,
 		  gpointer user_data)
 {
-	GError *error;
+	GError *error = NULL;
 	cl_log(LOG_INFO, "Checker process watch cb");
 
 	// Remove the abort timer as the process has terminated
@@ -98,10 +98,9 @@ checker_watch_cb (GPid     pid,
 	if (g_spawn_check_exit_status(status, &error)) {
 		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
 	} else {
-		cl_log(LOG_ERR, "Xapi health check %s, %d",
-		       error->domain == G_SPAWN_EXIT_ERROR ? "failed" : "signalled",
-		       status);
+		cl_log(LOG_ERR, "Xapi health check '%s', %d", error->message, status);
 		set_servant_health(pcmk_health_transient, LOG_ERR, "Node state: xapi health check failed");
+		notify_parent();
 		g_error_free(error);
 	}
 
@@ -200,6 +199,8 @@ servant_xapi(const char *diskname, int mode, const void* argp)
 	g_main_loop_run(mainloop);
 	g_main_loop_unref(mainloop);
 
+	cl_log(LOG_CRIT, "Unexpected termination of Xapi servant main loop");
+
 	clean_up(0);
 	return 0;                   /* never reached */
 }
-------------- next part --------------
CA-292257: log xapi checker failures at LOG_ERR

From: Mark Syms <mark.syms at citrix.com>

Signed-off-by: Mark Syms <mark.syms at citrix.com>

diff --git a/src/sbd-xapi.c b/src/sbd-xapi.c
index c21dd0f..b2a4652 100644
--- a/src/sbd-xapi.c
+++ b/src/sbd-xapi.c
@@ -56,7 +56,8 @@ checker_watch_cb (GPid     pid,
 	if (g_spawn_check_exit_status(status, NULL)) {
 		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
 	} else {
-		set_servant_health(pcmk_health_transient, LOG_INFO, "Node state: xapi health check failed");
+		cl_log(LOG_ERR, "Xapi health check failed, %d", status);
+		set_servant_health(pcmk_health_transient, LOG_ERR, "Node state: xapi health check failed");
 	}
 
 	g_spawn_close_pid (pid);
-------------- next part --------------
CA-292257: log some messages at higher priority.

From: Mark Syms <mark.syms at citrix.com>

Allow the timeout for the health checker to be passed in as an
argument. Set default checker timout to match what Xapi configures
XenHA with (120 instead of 100 seconds).

Refactor some code to make it cleaner.

Also fix a possible race in the checker abort timeout, make sure
it's stopped when the process actually manages to terminate.

Signed-off-by: Mark Syms <mark.syms at citrix.com>

diff --git a/src/sbd-inquisitor.c b/src/sbd-inquisitor.c
index 0e7ef9f..2c572bf 100644
--- a/src/sbd-inquisitor.c
+++ b/src/sbd-inquisitor.c
@@ -26,6 +26,7 @@ int     disk_priority = 1;
 int	check_pcmk = 0;
 int	check_cluster = 0;
 int     check_xapi = 0;
+int     xapi_health_timeout = 0;
 int	disk_count	= 0;
 int	servant_count	= 0;
 int	servant_restart_interval = 5;
@@ -138,6 +139,8 @@ void servant_start(struct servants_list_item *s)
 	int r = 0;
 	union sigval svalue;
 
+	struct xapi_servant_params *xapi_params;
+
 	if (s->pid != 0) {
 		r = sigqueue(s->pid, 0, svalue);
 		if ((r != -1 || errno != ESRCH))
@@ -162,7 +165,9 @@ void servant_start(struct servants_list_item *s)
 
         } else if (sbd_is_xapi(s)) {
 		DBGLOG(LOG_INFO, "Starting Xapi servant");
-		s->pid = assign_servant(s->devname, servant_xapi, start_mode, NULL);
+		xapi_params = malloc(sizeof(struct xapi_servant_params));
+		xapi_params->health_check_timeout = xapi_health_timeout;
+		s->pid = assign_servant(s->devname, servant_xapi, start_mode, xapi_params);
 
         } else {
             cl_log(LOG_ERR, "Unrecognized servant: %s", s->devname);
@@ -892,7 +897,7 @@ int main(int argc, char **argv, char **envp)
         }
         cl_log(LOG_DEBUG, "Start delay: %d (%s)", (int)start_delay, value?value:"default");
 
-	while ((c = getopt(argc, argv, "czC:DPRTWZhvxw:d:n:p:1:2:3:4:5:t:I:F:S:s:")) != -1) {
+	while ((c = getopt(argc, argv, "czC:DPRTWZhvxX:w:d:n:p:1:2:3:4:5:t:I:F:S:s:")) != -1) {
 		switch (c) {
 		case 'D':
 			break;
@@ -956,6 +961,9 @@ int main(int argc, char **argv, char **envp)
 		case 'x':
 			check_xapi = 1;
 			break;
+		case 'X':
+			xapi_health_timeout = atoi(optarg);
+			break;
 		case 'z':
 			disk_priority = 0;
 			break;
diff --git a/src/sbd-xapi.c b/src/sbd-xapi.c
index b2a4652..0f65a24 100644
--- a/src/sbd-xapi.c
+++ b/src/sbd-xapi.c
@@ -38,10 +38,11 @@
 static GMainLoop	*mainloop     = NULL;
 static guint		 notify_timer = 0;
 static guint		 health_timer = 0;
-static guint             health_blocked_timer = 100;
+static guint             health_blocked_timer = 0;
+static struct xapi_servant_params *params;
 
-int		timeout_health	    	 = 60;
-int             health_execution_timeout = 100;
+#define		HEALTH_CHECK_PERIOD   	60
+#define         DEFAULT_HEALTH_TIMEOUT  120
 
 static GPid       checker_pid = 0;
 
@@ -49,15 +50,59 @@ static GPid       checker_pid = 0;
 static void
 checker_watch_cb (GPid     pid,
 		  gint     status,
+		  gpointer user_data);
+static gboolean
+health_timer_abort_cb(gpointer data);
+
+static gboolean
+start_health_checker()
+{
+	const gchar * const argv[] = {XAPI_HEALTHCHECKER_PATH XAPI_HEALTHCHECKER, NULL };
+	GError *error = NULL;
+
+	cl_log(LOG_DEBUG, "Starting checker subprocess, task timeout %d", params->health_check_timeout);
+	g_spawn_async(
+		NULL,
+		(gchar **)argv,
+		NULL,
+		G_SPAWN_DO_NOT_REAP_CHILD | G_SPAWN_STDOUT_TO_DEV_NULL | G_SPAWN_STDERR_TO_DEV_NULL,
+		NULL,
+		NULL,
+		&checker_pid,
+		&error);
+
+	if (error != NULL) {
+		cl_log(LOG_ERR, "Failed to spawn health check %d", error->code);
+		set_servant_health(pcmk_health_transient, LOG_INFO, "Node state: xapi health check failed");
+		g_error_free(error);
+		return FALSE;
+	}
+
+	g_child_watch_add(checker_pid, checker_watch_cb, NULL);
+	health_blocked_timer = g_timeout_add(params->health_check_timeout * 1000, health_timer_abort_cb, NULL);
+
+	return TRUE;
+}
+
+static void
+checker_watch_cb (GPid     pid,
+		  gint     status,
 		  gpointer user_data)
 {
+	GError *error;
 	cl_log(LOG_INFO, "Checker process watch cb");
 
-	if (g_spawn_check_exit_status(status, NULL)) {
+	// Remove the abort timer as the process has terminated
+	g_source_destroy(g_main_context_find_source_by_id(NULL, health_blocked_timer));
+
+	if (g_spawn_check_exit_status(status, &error)) {
 		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
 	} else {
-		cl_log(LOG_ERR, "Xapi health check failed, %d", status);
+		cl_log(LOG_ERR, "Xapi health check %s, %d",
+		       error->domain == G_SPAWN_EXIT_ERROR ? "failed" : "signalled",
+		       status);
 		set_servant_health(pcmk_health_transient, LOG_ERR, "Node state: xapi health check failed");
+		g_error_free(error);
 	}
 
 	g_spawn_close_pid (pid);
@@ -68,7 +113,7 @@ static gboolean
 health_timer_abort_cb(gpointer data)
 {
 	if (checker_pid != 0) {
-		cl_log(LOG_DEBUG, "Checker process timed out killing");
+		cl_log(LOG_CRIT, "Checker process timed out killing");
 		kill(checker_pid, SIGKILL);
 	}
 
@@ -78,40 +123,20 @@ health_timer_abort_cb(gpointer data)
 static gboolean
 health_timer_cb(gpointer data)
 {
-	const gchar * const argv[] = {XAPI_HEALTHCHECKER_PATH XAPI_HEALTHCHECKER, NULL };
-	GError *error = NULL;
+	gboolean result = TRUE;
 
 	cl_log(LOG_DEBUG, "Health timer cb");
 
 	if (access(HA_ENABLE_PATH HA_ENABLE_FILE, F_OK) != -1) {
 		if (checker_pid == 0) {
-			cl_log(LOG_DEBUG, "Starting checker subprocess");
-			g_spawn_async(
-				NULL,
-				(gchar **)argv,
-				NULL,
-				G_SPAWN_DO_NOT_REAP_CHILD | G_SPAWN_STDOUT_TO_DEV_NULL | G_SPAWN_STDERR_TO_DEV_NULL,
-				NULL,
-				NULL,
-				&checker_pid,
-				&error);
-
-			if (error != NULL) {
-				cl_log(LOG_ERR, "Failed to spawn health check %d", error->code);
-				set_servant_health(pcmk_health_transient, LOG_INFO, "Node state: xapi health check failed");
-				g_error_free(error);
-				return FALSE;
-			}
-
-			g_child_watch_add (checker_pid, checker_watch_cb, NULL);
-			health_blocked_timer = g_timeout_add(health_execution_timeout * 1000, health_timer_abort_cb, NULL);
+			result = start_health_checker();
 		}
 	} else {
 		set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
 		checker_pid = 0;
 	}
 
-	return TRUE;
+	return result;
 }
 
 static gboolean
@@ -145,10 +170,18 @@ servant_xapi(const char *diskname, int mode, const void* argp)
 {
 	sigset_t procmask;
 
+	params = (struct xapi_servant_params*)argp;
+
 	cl_log(LOG_INFO, "Monitoring xapi health");
+	cl_log(LOG_INFO, "Monitoring xapi health, checker timeout %d",
+	       params->health_check_timeout);
 
 	set_proc_title("sbd: watcher: Xapi");
 
+	if (params->health_check_timeout == 0) {
+		params->health_check_timeout = DEFAULT_HEALTH_TIMEOUT;
+	}
+
 	/* As we use subprocess unblock SIGCHILD */
 	sigemptyset(&procmask);
 	sigaddset(&procmask, SIGCHLD);
@@ -156,11 +189,14 @@ servant_xapi(const char *diskname, int mode, const void* argp)
 
 	mainloop = g_main_loop_new(NULL, FALSE);
 	notify_timer = g_timeout_add_seconds(timeout_loop, notify_timer_cb, NULL);
-	health_timer = g_timeout_add_seconds(timeout_health, health_timer_cb, NULL);
+	health_timer = g_timeout_add_seconds(HEALTH_CHECK_PERIOD, health_timer_cb, NULL);
 
 	mainloop_add_signal(SIGTERM, xapi_shutdown);
 	mainloop_add_signal(SIGINT, xapi_shutdown);
 
+	/* Start in healthy state */
+	set_servant_health(pcmk_health_online, LOG_INFO, "Node state: xapi healthy");
+
 	g_main_loop_run(mainloop);
 	g_main_loop_unref(mainloop);
 
diff --git a/src/sbd.h b/src/sbd.h
index 1635ca5..4cc9fca 100644
--- a/src/sbd.h
+++ b/src/sbd.h
@@ -108,6 +108,11 @@ struct sbd_context {
 	struct iocb	io;
 };
 
+struct xapi_servant_params
+{
+	int     health_check_timeout;
+};
+
 enum pcmk_health 
 {
     pcmk_health_unknown,
-------------- next part --------------
0001-Tweak-sbd-inquisitor.c-to-retain-the-same-behaviour-.patch
0001-sbd-cluster-only-report-good-health-if-quorate-or-no.patch
0002-increase_logging_level_to_CRIT_for_quorum_loss.patch
CA-290057__barebones_Xapi_sbd_servant
CA-290057__call_xapi-health-check_from_thread_and_set_servant_status_appropriately
CA-290057__fix_compilation_warnings
CA-292257__log_xapi_checker_failures_at_LOG_ERR
CA-292257__sbd_xapi_servant_outdated
CA-290600__Xapi_SBD_watcher_segfault


More information about the Users mailing list