[ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.

David Vossel dvossel at redhat.com
Sun Apr 26 21:55:39 UTC 2015



----- Original Message -----
> Hi David,
> 
> Thank you for comments.
> 
> > please turn on debug logging in /etc/sysconfig/pacemaker for both the
> > pacemaker
> > nodes and the nodes running pacemaker remote.
> > 
> > set the following
> > 
> > PCMK_logfile=/var/log/pacemaker.log
> > PCMK_debug=yes
> > PCMK_trace_files=lrmd_client.c,lrmd.c,tls_backend.c,remote.c
> > 
> > Provide the logs with the new debug settings enabled during the time period
> > that pacemaker is unable to reconnect to pacemaker_remote.
> 
> 
> I put the logs (log_zip.zip) for the two nodes (sl7-01 and snmp2) at the following location:
>  * https://onedrive.live.com/?cid=3A14D57622C66876&id=3A14D57622C66876%21117
> I restarted pacemaker_remote on snmp2.
> Afterwards I ran crm_resource -C snmp2.


At first glance this looks gnutls-related.  GNUTLS is returning -50 during receive
on the client side (pacemaker's side). -50 maps to 'invalid request'
(GNUTLS_E_INVALID_REQUEST).

debug: crm_remote_recv_once: 	TLS receive failed: The request is invalid.

We treat this error as fatal and destroy the connection. I've never encountered
this error and I don't know what causes it. It's possible there's a bug in our
gnutls usage... it's also possible there's a bug in the version of gnutls in use.
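For what it's worth, that string is just gnutls' own description of the return
code. A stand-alone snippet like the one below (an illustration only, not
pacemaker source; the file name is arbitrary) prints what the installed library
says about -50, which is a quick way to confirm the mapping on the affected
machines:

    /* gnutls_err_demo.c - illustration only, not part of pacemaker.
     * Prints gnutls' description of return code -50 and whether the
     * library itself classifies that code as fatal.
     * Build: cc gnutls_err_demo.c $(pkg-config --cflags --libs gnutls)
     */
    #include <stdio.h>
    #include <gnutls/gnutls.h>

    int main(void)
    {
        int rc = GNUTLS_E_INVALID_REQUEST;   /* defined as -50 in gnutls.h */

        printf("rc=%d: %s (fatal according to gnutls_error_is_fatal: %s)\n",
               rc, gnutls_strerror(rc),
               gnutls_error_is_fatal(rc) ? "yes" : "no");
        return 0;
    }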

-- David


> 
> -------------------------------------------------
> [root at sl7-01 ~]# crm_mon -1 -Af
> Last updated: Wed Apr 15 09:54:16 2015
> Last change: Wed Apr 15 09:46:17 2015
> Stack: corosync
> Current DC: sl7-01 - partition WITHOUT quorum
> Version: 1.1.12-3e93bc1
> 3 Nodes configured
> 5 Resources configured
> 
> 
> Online: [ sl7-01 ]
> RemoteOnline: [ snmp1 ]
> RemoteOFFLINE: [ snmp2 ]
> 
>  Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
>  Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
>  Remote-rsc2    (ocf::heartbeat:Dummy): Started snmp1 (failure ignored)
>  snmp1  (ocf::pacemaker:remote):        Started sl7-01
> 
> Node Attributes:
> * Node sl7-01:
> * Node snmp1:
> 
> Migration summary:
> * Node sl7-01:
>    snmp2: migration-threshold=1 fail-count=1000000 last-failure='Wed Apr 15
>    09:47:16 2015'
> * Node snmp1:
> 
> Failed actions:
>     snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
>     exit-reason='none', last-rc-change='Wed Apr 15 09:46:18 2015',
>     queued=0ms, exec=0ms
> -------------------------------------------------
> 
> Best Regards,
> 
> Hideo Yamauchi.
> 
> 
> ----- Original Message -----
> > From: David Vossel <dvossel at redhat.com>
> > To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
> > open-source clustering welcomed <users at clusterlabs.org>
> > Cc:
> > Date: 2015/4/15, Wed 07:22
> > Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
> > pacemaker_remote.
> > 
> > 
> > 
> > ----- Original Message -----
> >>  Hi Andrew,
> >> 
> >>  Thank you for comments.
> >> 
> >>  >> Step 4) We clear snmp2 of remote by crm_resource command,
> >>  > 
> >>  > Was pacemaker_remoted running at this point?
> > 
> > please turn on debug logging in /etc/sysconfig/pacemaker for both the
> > pacemaker
> > nodes and the nodes running pacemaker remote.
> > 
> > set the following
> > 
> > PCMK_logfile=/var/log/pacemaker.log
> > PCMK_debug=yes
> > PCMK_trace_files=lrmd_client.c,lrmd.c,tls_backend.c,remote.c
> > 
> > Provide the logs with the new debug settings enabled during the time period
> > that pacemaker is unable to reconnect to pacemaker_remote.
> > 
> > Thanks,
> > --David
> > 
> >> 
> >> 
> >>  Yes.
> >> 
> >>  On the node where pacemaker_remote was restarted, the log looks like this.
> >> 
> >> 
> >>  ------------------------------
> >>  Apr 13 15:47:29 snmp2 pacemaker_remoted[1494]:     info: main: Starting  ----> #### RESTARTED pacemaker_remote.
> >>  Apr 13 15:47:42 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_listen: LRMD client connection established. 0x24f4ca0 id: 5b56e54e-b9da-4804-afda-5c72038d089c
> >>  Apr 13 15:47:43 snmp2 pacemaker_remoted[1494]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
> >>  Apr 13 15:47:43 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 5b56e54e-b9da-4804-afda-5c72038d089c
> >>  Apr 13 15:47:44 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_listen: LRMD client connection established. 0x24f4ca0 id: 907cd1fc-6c1d-40f1-8c60-34bc8b66715f
> >>  Apr 13 15:47:44 snmp2 pacemaker_remoted[1494]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
> >>  Apr 13 15:47:44 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 907cd1fc-6c1d-40f1-8c60-34bc8b66715f
> >>  Apr 13 15:47:45 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_listen: LRMD client connection established. 0x24f4ca0 id: 8b38c0dd-9338-478a-8f23-523aee4cc1a6
> >>  Apr 13 15:47:46 snmp2 pacemaker_remoted[1494]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
> >>  (snip)
> >>  After that the log is repeated.
> >> 
> >> 
> >>  ------------------------------
> >> 
> >> 
> >>  > I mentioned this earlier today, we need to improve the experience in this
> >>  > area.
> >>  > 
> >>  > Probably a good excuse to fix on-fail=ignore for start actions.
> >>  > 
> >>  >> but remote cannot participate in a cluster.
> >> 
> >> 
> >> 
> >>  I changed the crm file as follows (on-fail=ignore for start).
> >> 
> >> 
> >>  (snip)
> >>  primitive snmp1 ocf:pacemaker:remote \
> >>          params \
> >>                  server="snmp1" \
> >>          op start interval="0s" timeout="60s" on-fail="ignore" \
> >>          op monitor interval="3s" timeout="15s" \
> >>          op stop interval="0s" timeout="60s" on-fail="ignore"
> >> 
> >>  primitive snmp2 ocf:pacemaker:remote \
> >>          params \
> >>                  server="snmp2" \
> >>          op start interval="0s" timeout="60s" on-fail="ignore" \
> >>          op monitor interval="3s" timeout="15s" \
> >>          op stop interval="0s" timeout="60s" on-fail="stop"
> >> 
> >>  (snip)
> >> 
> >>  However, the result was the same.
> >>  Even after running crm_resource -C for the restarted pacemaker_remote node,
> >>  the node does not rejoin the cluster.
> >> 
> >>  [root at sl7-01 ~]# crm_mon -1 -Af
> >>  Last updated: Mon Apr 13 15:51:58 2015
> >>  Last change: Mon Apr 13 15:47:41 2015
> >>  Stack: corosync
> >>  Current DC: sl7-01 - partition WITHOUT quorum
> >>  Version: 1.1.12-3e93bc1
> >>  3 Nodes configured
> >>  5 Resources configured
> >> 
> >> 
> >>  Online: [ sl7-01 ]
> >>  RemoteOnline: [ snmp1 ]
> >>  RemoteOFFLINE: [ snmp2 ]
> >> 
> >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
> >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
> >>   Remote-rsc2    (ocf::heartbeat:Dummy): Started snmp1 (failure ignored)
> >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
> >> 
> >>  Node Attributes:
> >>  * Node sl7-01:
> >>  * Node snmp1:
> >> 
> >>  Migration summary:
> >>  * Node sl7-01:
> >>     snmp2: migration-threshold=1 fail-count=1000000 last-failure='Mon Apr 13 15:48:40 2015'
> >>  * Node snmp1:
> >> 
> >>  Failed actions:
> >>      snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
> >>      exit-reason='none', last-rc-change='Mon Apr 13 15:47:42 2015',
> >>      queued=0ms, exec=0ms
> >> 
> >> 
> >> 
> >>  Best Regards,
> >>  Hideo Yamauchi.
> >> 
> >> 
> >> 
> >>  ----- Original Message -----
> >>  > From: Andrew Beekhof <andrew at beekhof.net>
> >>  > To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
> >>  > open-source clustering welcomed <users at clusterlabs.org>
> >>  > Cc:
> >>  > Date: 2015/4/13, Mon 14:11
> >>  > Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
> >>  > pacemaker_remote.
> >>  > 
> >>  > 
> >>  >>  On 8 Apr 2015, at 12:27 pm, renayama19661014 at ybb.ne.jp wrote:
> >>  >> 
> >>  >>  Hi All,
> >>  >> 
> >>  >>  Let me confirm the first question once again.
> >>  >> 
> >>  >>  I confirmed the following behavior with Pacemaker 1.1.13-rc1.
> >>  >>  Stonith is not configured.
> >>  >> 
> >>  >>  -------------------------------------------------------------
> >>  >>  property no-quorum-policy="ignore" \
> >>  >>          stonith-enabled="false" \
> >>  >>          startup-fencing="false" \
> >>  >> 
> >>  >>  rsc_defaults resource-stickiness="INFINITY" \
> >>  >>          migration-threshold="1"
> >>  >> 
> >>  >>  primitive snmp1 ocf:pacemaker:remote \
> >>  >>          params \
> >>  >>                  server="snmp1" \
> >>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
> >>  >>          op monitor interval="3s" timeout="15s" \
> >>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
> >>  >> 
> >>  >>  primitive snmp2 ocf:pacemaker:remote \
> >>  >>          params \
> >>  >>                  server="snmp2" \
> >>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
> >>  >>          op monitor interval="3s" timeout="15s" \
> >>  >>          op stop interval="0s" timeout="60s" on-fail="stop"
> >>  >> 
> >>  >>  primitive Host-rsc1 ocf:heartbeat:Dummy \
> >>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
> >>  >>          op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
> >>  >> 
> >>  >>  primitive Remote-rsc1 ocf:heartbeat:Dummy \
> >>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
> >>  >>          op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
> >>  >> 
> >>  >>  primitive Remote-rsc2 ocf:heartbeat:Dummy \
> >>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
> >>  >>          op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
> >>  >> 
> >>  >>  location loc1 Remote-rsc1 \
> >>  >>          rule 200: #uname eq snmp1 \
> >>  >>          rule 100: #uname eq snmp2
> >>  >>  location loc2 Remote-rsc2 \
> >>  >>          rule 200: #uname eq snmp2 \
> >>  >>          rule 100: #uname eq snmp1
> >>  >>  location loc3 Host-rsc1 \
> >>  >>          rule 200: #uname eq sl7-01
> >>  >> 
> >>  >>  -------------------------------------------------------------
> >>  >> 
> >>  >>  Step 1) We set up a cluster using two remote nodes.
> >>  >>  -------------------------------------------------------------
> >>  >>  Version: 1.1.12-3e93bc1
> >>  >>  3 Nodes configured
> >>  >>  5 Resources configured
> >>  >> 
> >>  >> 
> >>  >>  Online: [ sl7-01 ]
> >>  >>  RemoteOnline: [ snmp1 snmp2 ]
> >>  >> 
> >>  >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
> >>  >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
> >>  >>   Remote-rsc2    (ocf::heartbeat:Dummy): Started snmp2
> >>  >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
> >>  >>   snmp2  (ocf::pacemaker:remote):        Started sl7-01
> >>  >> 
> >>  >>  Node Attributes:
> >>  >>  * Node sl7-01:
> >>  >>  * Node snmp1:
> >>  >>  * Node snmp2:
> >>  >> 
> >>  >>  Migration summary:
> >>  >>  * Node sl7-01:
> >>  >>  * Node snmp1:
> >>  >>  * Node snmp2:
> >>  >>  -------------------------------------------------------------
> >>  >> 
> >>  >>  Step 2) We stop pacemaker_remoted on one remote node.
> >>  >>  -------------------------------------------------------------
> >>  >>  Current DC: sl7-01 - partition WITHOUT quorum
> >>  >>  Version: 1.1.12-3e93bc1
> >>  >>  3 Nodes configured
> >>  >>  5 Resources configured
> >>  >> 
> >>  >> 
> >>  >>  Online: [ sl7-01 ]
> >>  >>  RemoteOnline: [ snmp1 ]
> >>  >>  RemoteOFFLINE: [ snmp2 ]
> >>  >> 
> >>  >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
> >>  >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
> >>  >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
> >>  >>   snmp2  (ocf::pacemaker:remote):        FAILED sl7-01
> >>  >> 
> >>  >>  Node Attributes:
> >>  >>  * Node sl7-01:
> >>  >>  * Node snmp1:
> >>  >> 
> >>  >>  Migration summary:
> >>  >>  * Node sl7-01:
> >>  >>     snmp2: migration-threshold=1 fail-count=1 last-failure='Fri Apr  3 12:56:12 2015'
> >>  >>  * Node snmp1:
> >>  >> 
> >>  >>  Failed actions:
> >>  >>      snmp2_monitor_3000 on sl7-01 'unknown error' (1): call=6, status=Error,
> >>  >>      exit-reason='none', last-rc-change='Fri Apr  3 12:56:12 2015',
> >>  >>      queued=0ms, exec=0ms
> >>  > 
> >>  > Ideally we’d have fencing configured and reboot the remote node here.
> >>  > But for the sake of argument, ok :)
> >>  > 
> >>  > 
> >>  >>  -------------------------------------------------------------
> >>  >> 
> >>  >>  Step 3) We reboot pacemaker_remoted which stopped.
> >>  > 
> >>  > As in you reboot the node on which pacemaker_remoted is stopped and
> >>  > pacemaker_remoted is configured to start at boot?
> >>  > 
> >>  >> 
> >>  >>  Step 4) We clear snmp2 of remote by crm_resource command,
> >>  > 
> >>  > Was pacemaker_remoted running at this point?
> >>  > I mentioned this earlier today, we need to improve the experience in this
> >>  > area.
> >>  > 
> >>  > Probably a good excuse to fix on-fail=ignore for start actions.
> >>  > 
> >>  >>  but remote cannot participate in a cluster.
> >>  >>  -------------------------------------------------------------
> >>  >>  Version: 1.1.12-3e93bc1
> >>  >>  3 Nodes configured
> >>  >>  5 Resources configured
> >>  >> 
> >>  >> 
> >>  >>  Online: [ sl7-01 ]
> >>  >>  RemoteOnline: [ snmp1 ]
> >>  >>  RemoteOFFLINE: [ snmp2 ]
> >>  >> 
> >>  >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
> >>  >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
> >>  >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
> >>  >>   snmp2  (ocf::pacemaker:remote):        FAILED sl7-01
> >>  >> 
> >>  >>  Node Attributes:
> >>  >>  * Node sl7-01:
> >>  >>  * Node snmp1:
> >>  >> 
> >>  >>  Migration summary:
> >>  >>  * Node sl7-01:
> >>  >>     snmp2: migration-threshold=1 fail-count=1000000 last-failure='Wed Apr  8 11:21:09 2015'
> >>  >>  * Node snmp1:
> >>  >> 
> >>  >>  Failed actions:
> >>  >>      snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
> >>  >>      exit-reason='none', last-rc-change='Wed Apr  8 11:20:11 2015',
> >>  >>      queued=0ms, exec=0ms
> >>  >>  -------------------------------------------------------------
> >>  >> 
> >>  >> 
> >>  >>  The pacemaker node and the remote node output the following log
> >>  >>  repeatedly.
> >>  >> 
> >>  >>  -------------------------------------------------------------
> >>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: crm_remote_tcp_connect_async: Attempting to connect to remote server at 192.168.40.110:3121
> >>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: lrmd_tcp_connect_cb: Remote lrmd client TLS connection established with server snmp2:3121
> >>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: error: lrmd_tls_recv_reply: Unable to receive expected reply, disconnecting.
> >>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 101.
> >>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: lrmd_tls_connection_destroy: TLS connection destroyed
> >>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: lrmd_api_disconnect: Disconnecting from lrmd service
> >>  >>  -------------------------------------------------------------
> >>  >>  Apr  8 11:20:36 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 8fbbc3cd-daa5-406b-942d-21be868cfc62
> >>  >>  Apr  8 11:20:37 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: a59392c9-6575-40ed-9b53-98a68de00409
> >>  >>  Apr  8 11:20:38 snmp2 pacemaker_remoted[1502]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
> >>  >>  Apr  8 11:20:38 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: a59392c9-6575-40ed-9b53-98a68de00409
> >>  >>  Apr  8 11:20:39 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: 0e58614c-b1c5-4e37-a917-1f8e3de5de24
> >>  >>  Apr  8 11:20:39 snmp2 pacemaker_remoted[1502]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
> >>  >>  Apr  8 11:20:39 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 0e58614c-b1c5-4e37-a917-1f8e3de5de24
> >>  >>  Apr  8 11:20:40 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: 518bcca5-5f83-47fb-93ea-2ece33690111
> >>  >>  -------------------------------------------------------------
> >>  >> 
> >>  >>  Is this behavior correct?
> >>  >> 
> >>  >>  Best Regards,
> >>  >>  Hideo Yamauchi.
> >>  >> 
> >>  >> 
> >>  >> 
> >>  >>  ----- Original Message -----
> >>  >>>  From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
> >>  >>>  To: users at clusterlabs.org
> >>  >>>  Cc:
> >>  >>>  Date: 2015/4/2, Thu 22:30
> >>  >>>  Subject: [ClusterLabs] Antw: Re: [Question] About movement of
> >>  > pacemaker_remote.
> >>  >>> 
> >>  >>>>>>  David Vossel <dvossel at redhat.com> wrote on 02.04.2015 at 14:58
> >>  >>>  in message
> >>  >>>  <796820123.6644200.1427979523554.JavaMail.zimbra at redhat.com>:
> >>  >>> 
> >>  >>>> 
> >>  >>>>  ----- Original Message -----
> >>  >>>>> 
> >>  >>>>>>  On 14 Mar 2015, at 10:14 am, David Vossel
> >>  >>>  <dvossel at redhat.com> wrote:
> >>  >>>>>> 
> >>  >>>>>> 
> >>  >>>>>> 
> >>  >>>>>>  ----- Original Message -----
> >>  >>>>>>> 
> >>  >>>>>>>  Failed actions:
> >>  >>>>>>>       snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
> >>  >>>>>>>       exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015',
> >>  >>>>>>>       queued=0ms, exec=0ms
> >>  >>>>>>>       snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
> >>  >>>>>>>       exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015',
> >>  >>>>>>>       queued=0ms, exec=0ms
> >>  >>>>>>>  -----------------------
> >>  >>>>>> 
> >>  >>>>>>  Pacemaker is attempting to restore connection to the remote node
> >>  >>>>>>  here, are you sure the remote is accessible? The "Timed Out" error
> >>  >>>>>>  means that pacemaker was unable to establish the connection during
> >>  >>>>>>  the timeout period.
> >>  >>>>> 
> >>  >>>>>  Random question: Are we smart enough not to try to start
> >>  >>>>>  pacemaker-remote resources for nodes we've just fenced?
> >>  >>>> 
> >>  >>>>  We try to re-connect to remote nodes after fencing. If the fence
> >>  >>>>  operation was 'off' instead of 'reboot', this would make no sense.
> >>  >>>>  I'm not entirely sure how to handle this. We want the remote-node
> >>  >>>>  re-integrated into the cluster, but I'd like to optimize the case
> >>  >>>>  where we know the node will not be coming back online.
> >>  >>> 
> >>  >>>  Beware: Even if the fencing action is "off" (for software), a human
> >>  >>>  may decide to boot the node anyway, also starting the cluster
> >>  >>>  software.
> >>  >>> 
> >>  >>>> 
> >>  >>>>> 
> >>  >>>>> 
> >>  >>>>> 
> >> 
> > 
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



