[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Wed Apr 15 02:51:08 EDT 2015


Hi!

With the message "snmp2_start_0 on sl7-01 'unknown error' (1): call=8,
status=Timed Out, exit-reason='none', last-rc-change='Wed Apr 15 09:46:18
2015'" you know where (which server) and when to look in syslog for more
details. Usually you'll find them there. Try it! Maybe you can fix the
problem yourself.
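
For example, something like this on sl7-01 (just a sketch; the exact log
location depends on your syslog setup, /var/log/messages and journalctl are
assumptions here):

    # syslog entries around the time of the failed start
    grep 'Apr 15 09:4' /var/log/messages | grep -iE 'crmd|pacemaker'

    # or, on a systemd host
    journalctl --since '2015-04-15 09:45' --until '2015-04-15 09:50'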

Regards,
Ulrich

>>> <renayama19661014 at ybb.ne.jp> wrote on 15.04.2015 at 03:37 in message
<355704.17914.qm at web200004.mail.kks.yahoo.co.jp>:
> Hi David,
> 
> Thank you for comments.
> 
>> please turn on debug logging in /etc/sysconfig/pacemaker for both the pacemaker
>> nodes and the nodes running pacemaker remote.
>> 
>> set the following
>> 
>> PCMK_logfile=/var/log/pacemaker.log
>> PCMK_debug=yes
>> PCMK_trace_files=lrmd_client.c,lrmd.c,tls_backend.c,remote.c
>> 
>> Provide the logs with the new debug settings enabled during the time period
>> that pacemaker is unable to reconnect to pacemaker_remote.
> 
> 
> I put the logs (log_zip.zip) of the two nodes (sl7-01 and snmp2) at the
> following place:
>  * https://onedrive.live.com/?cid=3A14D57622C66876&id=3A14D57622C66876%21117

> I restarted pacemaker_remote on snmp2.
> Afterwards I ran crm_resource -C snmp2.
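> 
> Concretely, something like the following (a sketch only; the service name
> assumes the systemd unit pacemaker_remote, and --cleanup/--resource is the
> usual crm_resource cleanup form):
> 
>   [root at snmp2 ~]# systemctl restart pacemaker_remote
>   [root at sl7-01 ~]# crm_resource --cleanup --resource snmp2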
> 
> -------------------------------------------------
> [root at sl7-01 ~]# crm_mon -1 -Af
> Last updated: Wed Apr 15 09:54:16 2015
> Last change: Wed Apr 15 09:46:17 2015
> Stack: corosync
> Current DC: sl7-01 - partition WITHOUT quorum
> Version: 1.1.12-3e93bc1
> 3 Nodes configured
> 5 Resources configured
> 
> 
> Online: [ sl7-01 ]
> RemoteOnline: [ snmp1 ]
> RemoteOFFLINE: [ snmp2 ]
> 
>  Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01 
>  Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1 
>  Remote-rsc2    (ocf::heartbeat:Dummy): Started snmp1 (failure ignored)
>  snmp1  (ocf::pacemaker:remote):        Started sl7-01 
> 
> Node Attributes:
> * Node sl7-01:
> * Node snmp1:
> 
> Migration summary:
> * Node sl7-01: 
>    snmp2: migration-threshold=1 fail-count=1000000 last-failure='Wed Apr 15 09:47:16 2015'
> * Node snmp1: 
> 
> Failed actions:
>     snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out, 
> exit-reason='none', last-rc-change='Wed Apr 15 09:46:18 2015', queued=0ms, 
> exec=0ms
> -------------------------------------------------
> 
> Best Regards,
> 
> Hideo Yamauchi.
> 
> 
> ----- Original Message -----
>> From: David Vossel <dvossel at redhat.com>
>> To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
>> open-source clustering welcomed <users at clusterlabs.org>
>> Cc: 
>> Date: 2015/4/15, Wed 07:22
>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>> 
>> 
>> 
>> ----- Original Message -----
>>>  Hi Andrew,
>>> 
>>>  Thank you for comments.
>>> 
>>>  >> Step 4) We clear the snmp2 remote resource with the crm_resource command,
>>>  > 
>>>  > Was pacemaker_remoted running at this point?
>> 
>> please turn on debug logging in /etc/sysconfig/pacemaker for both the pacemaker
>> nodes and the nodes running pacemaker remote.
>> 
>> set the following
>> 
>> PCMK_logfile=/var/log/pacemaker.log
>> PCMK_debug=yes
>> PCMK_trace_files=lrmd_client.c,lrmd.c,tls_backend.c,remote.c
>> 
>> Provide the logs with the new debug settings enabled during the time period
>> that pacemaker is unable to reconnect to pacemaker_remote.
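>> 
>> For example (a sketch only; the sysconfig path and the systemd unit names
>> pacemaker / pacemaker_remote are assumptions for a typical RHEL/CentOS-style
>> install, adjust to your environment):
>> 
>>   # append to /etc/sysconfig/pacemaker on each node
>>   PCMK_logfile=/var/log/pacemaker.log
>>   PCMK_debug=yes
>>   PCMK_trace_files=lrmd_client.c,lrmd.c,tls_backend.c,remote.c
>> 
>>   # then restart the daemons so the new settings take effect
>>   systemctl restart pacemaker           # on the cluster node
>>   systemctl restart pacemaker_remote    # on the remote nodes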
>> 
>> Thanks,
>> --David
>> 
>>> 
>>> 
>>>  Yes.
>>> 
>>>  On the node where pacemaker_remote was restarted, the following log appears.
>>> 
>>> 
>>>  ------------------------------
>>>  Apr 13 15:47:29 snmp2 pacemaker_remoted[1494]:     info: main: Starting ----> #### RESTARTED pacemaker_remote.
>>>  Apr 13 15:47:42 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_listen: LRMD client connection established. 0x24f4ca0 id: 5b56e54e-b9da-4804-afda-5c72038d089c
>>>  Apr 13 15:47:43 snmp2 pacemaker_remoted[1494]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
>>>  Apr 13 15:47:43 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 5b56e54e-b9da-4804-afda-5c72038d089c
>>>  Apr 13 15:47:44 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_listen: LRMD client connection established. 0x24f4ca0 id: 907cd1fc-6c1d-40f1-8c60-34bc8b66715f
>>>  Apr 13 15:47:44 snmp2 pacemaker_remoted[1494]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
>>>  Apr 13 15:47:44 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 907cd1fc-6c1d-40f1-8c60-34bc8b66715f
>>>  Apr 13 15:47:45 snmp2 pacemaker_remoted[1494]:   notice: lrmd_remote_listen: LRMD client connection established. 0x24f4ca0 id: 8b38c0dd-9338-478a-8f23-523aee4cc1a6
>>>  Apr 13 15:47:46 snmp2 pacemaker_remoted[1494]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
>>>  (snip)
>>>  After that, the log repeats.
>>> 
>>> 
>>>  ------------------------------
>>> 
>>> 
>>>  > I mentioned this earlier today, we need to improve the experience in this
>>>  > area.
>>>  > 
>>>  > Probably a good excuse to fix on-fail=ignore for start actions.
>>>  > 
>>>  >> but remote cannot participate in a cluster.
>>> 
>>> 
>>> 
>>>  I changed the crm file as follows (on-fail=ignore for start).
>>> 
>>> 
>>>  (snip)
>>>  primitive snmp1 ocf:pacemaker:remote \
>>>          params \
>>>                  server="snmp1" \
>>>          op start interval="0s" timeout="60s" on-fail="ignore" \
>>>          op monitor interval="3s" timeout="15s" \
>>>          op stop interval="0s" timeout="60s" on-fail="ignore"
>>> 
>>>  primitive snmp2 ocf:pacemaker:remote \
>>>          params \
>>>                  server="snmp2" \
>>>          op start interval="0s" timeout="60s" on-fail="ignore" \
>>>          op monitor interval="3s" timeout="15s" \
>>>          op stop interval="0s" timeout="60s" on-fail="stop"
>>> 
>>>  (snip)
>>> 
>>>  However, the result was the same.
>>>  Even after running crm_resource -C for the restarted pacemaker_remote node,
>>>  the node does not rejoin the cluster.
>>> 
>>>  [root at sl7-01 ~]# crm_mon -1 -Af
>>>  Last updated: Mon Apr 13 15:51:58 2015
>>>  Last change: Mon Apr 13 15:47:41 2015
>>>  Stack: corosync
>>>  Current DC: sl7-01 - partition WITHOUT quorum
>>>  Version: 1.1.12-3e93bc1
>>>  3 Nodes configured
>>>  5 Resources configured
>>> 
>>> 
>>>  Online: [ sl7-01 ]
>>>  RemoteOnline: [ snmp1 ]
>>>  RemoteOFFLINE: [ snmp2 ]
>>> 
>>>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
>>>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
>>>   Remote-rsc2    (ocf::heartbeat:Dummy): Started snmp1 (failure ignored)
>>>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
>>> 
>>>  Node Attributes:
>>>  * Node sl7-01:
>>>  * Node snmp1:
>>> 
>>>  Migration summary:
>>>  * Node sl7-01:
>>>     snmp2: migration-threshold=1 fail-count=1000000 last-failure='Mon Apr 13 15:48:40 2015'
>>>  * Node snmp1:
>>> 
>>>  Failed actions:
>>>      snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
>>>      exit-reason='none', last-rc-change='Mon Apr 13 15:47:42 2015',
>>>      queued=0ms, exec=0ms
>>> 
>>> 
>>> 
>>>  Best Regards,
>>>  Hideo Yamauchi.
>>> 
>>> 
>>> 
>>>  ----- Original Message -----
>>>  > From: Andrew Beekhof <andrew at beekhof.net>
>>>  > To: renayama19661014 at ybb.ne.jp; Cluster Labs - All topics related to
>>>  > open-source clustering welcomed <users at clusterlabs.org>
>>>  > Cc:
>>>  > Date: 2015/4/13, Mon 14:11
>>>  > Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>>  > pacemaker_remote.
>>>  > 
>>>  > 
>>>  >>  On 8 Apr 2015, at 12:27 pm, renayama19661014 at ybb.ne.jp wrote:
>>>  >> 
>>>  >>  Hi All,
>>>  >> 
>>>  >>  Let me confirm the first question once again.
>>>  >> 
>>>  >>  I confirmed the next movement in Pacemaker1.1.13-rc1.
>>>  >>  Stonith does not set it.
>>>  >> 
>>>  >>  -------------------------------------------------------------
>>>  >>  property no-quorum-policy="ignore" \
>>>  >>          stonith-enabled="false" \
>>>  >>          startup-fencing="false" \
>>>  >> 
>>>  >>  rsc_defaults resource-stickiness="INFINITY" \
>>>  >>          migration-threshold="1"
>>>  >> 
>>>  >>  primitive snmp1 ocf:pacemaker:remote \
>>>  >>          params \
>>>  >>                  server="snmp1" \
>>>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
>>>  >>          op monitor interval="3s" timeout="15s" \
>>>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
>>>  >> 
>>>  >>  primitive snmp2 ocf:pacemaker:remote \
>>>  >>          params \
>>>  >>                  server="snmp2" \
>>>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
>>>  >>          op monitor interval="3s" timeout="15s" \
>>>  >>          op stop interval="0s" timeout="60s" on-fail="stop"
>>>  >> 
>>>  >>  primitive Host-rsc1 ocf:heartbeat:Dummy \
>>>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
>>>  >>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
>>>  >> 
>>>  >>  primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
>>>  >>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
>>>  >> 
>>>  >>  primitive Remote-rsc2 ocf:heartbeat:Dummy \
>>>  >>          op start interval="0s" timeout="60s" on-fail="restart" \
>>>  >>          op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>  >>          op stop interval="0s" timeout="60s" on-fail="ignore"
>>>  >> 
>>>  >>  location loc1 Remote-rsc1 \
>>>  >>          rule 200: #uname eq snmp1 \
>>>  >>          rule 100: #uname eq snmp2
>>>  >>  location loc2 Remote-rsc2 \
>>>  >>          rule 200: #uname eq snmp2 \
>>>  >>          rule 100: #uname eq snmp1
>>>  >>  location loc3 Host-rsc1 \
>>>  >>          rule 200: #uname eq sl7-01
>>>  >> 
>>>  >>  -------------------------------------------------------------
>>>  >> 
>>>  >>  Step 1) We use two remote nodes and build a cluster.
>>>  >>  -------------------------------------------------------------
>>>  >>  Version: 1.1.12-3e93bc1
>>>  >>  3 Nodes configured
>>>  >>  5 Resources configured
>>>  >> 
>>>  >> 
>>>  >>  Online: [ sl7-01 ]
>>>  >>  RemoteOnline: [ snmp1 snmp2 ]
>>>  >> 
>>>  >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
>>>  >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
>>>  >>   Remote-rsc2    (ocf::heartbeat:Dummy): Started snmp2
>>>  >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
>>>  >>   snmp2  (ocf::pacemaker:remote):        Started sl7-01
>>>  >> 
>>>  >>  Node Attributes:
>>>  >>  * Node sl7-01:
>>>  >>  * Node snmp1:
>>>  >>  * Node snmp2:
>>>  >> 
>>>  >>  Migration summary:
>>>  >>  * Node sl7-01:
>>>  >>  * Node snmp1:
>>>  >>  * Node snmp2:
>>>  >>  -------------------------------------------------------------
>>>  >> 
>>>  >>  Step 2) We stop pacemaker_remoted on one remote node.
>>>  >>  -------------------------------------------------------------
>>>  >>  Current DC: sl7-01 - partition WITHOUT quorum
>>>  >>  Version: 1.1.12-3e93bc1
>>>  >>  3 Nodes configured
>>>  >>  5 Resources configured
>>>  >> 
>>>  >> 
>>>  >>  Online: [ sl7-01 ]
>>>  >>  RemoteOnline: [ snmp1 ]
>>>  >>  RemoteOFFLINE: [ snmp2 ]
>>>  >> 
>>>  >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
>>>  >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
>>>  >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
>>>  >>   snmp2  (ocf::pacemaker:remote):        FAILED sl7-01
>>>  >> 
>>>  >>  Node Attributes:
>>>  >>  * Node sl7-01:
>>>  >>  * Node snmp1:
>>>  >> 
>>>  >>  Migration summary:
>>>  >>  * Node sl7-01:
>>>  >>     snmp2: migration-threshold=1 fail-count=1 last-failure='Fri Apr  3 12:56:12 2015'
>>>  >>  * Node snmp1:
>>>  >> 
>>>  >>  Failed actions:
>>>  >>      snmp2_monitor_3000 on sl7-01 'unknown error' (1): call=6, status=Error,
>>>  >>      exit-reason='none', last-rc-change='Fri Apr  3 12:56:12 2015', queued=0ms, exec=0ms
>>>  > 
>>>  > Ideally we’d have fencing configured and reboot the remote node here.
>>>  > But for the sake of argument, ok :)
>>>  > 
>>>  > 
>>>  >>  -------------------------------------------------------------
>>>  >> 
>>>  >>  Step 3) We reboot pacemaker_remoted which stopped.
>>>  > 
>>>  > As in you reboot the node on which pacemaker_remoted is stopped and
>>>  > pacemaker_remoted is configured to start at boot?
>>>  > 
>>>  >> 
>>>  >>  Step 4) We clear the snmp2 remote resource with the crm_resource command,
>>>  > 
>>>  > Was pacemaker_remoted running at this point?
>>>  > I mentioned this earlier today, we need to improve the experience in this
>>>  > area.
>>>  > 
>>>  > Probably a good excuse to fix on-fail=ignore for start actions.
>>>  > 
>>>  >>  but remote cannot participate in a cluster.
>>>  >>  -------------------------------------------------------------
>>>  >>  Version: 1.1.12-3e93bc1
>>>  >>  3 Nodes configured
>>>  >>  5 Resources configured
>>>  >> 
>>>  >> 
>>>  >>  Online: [ sl7-01 ]
>>>  >>  RemoteOnline: [ snmp1 ]
>>>  >>  RemoteOFFLINE: [ snmp2 ]
>>>  >> 
>>>  >>   Host-rsc1      (ocf::heartbeat:Dummy): Started sl7-01
>>>  >>   Remote-rsc1    (ocf::heartbeat:Dummy): Started snmp1
>>>  >>   snmp1  (ocf::pacemaker:remote):        Started sl7-01
>>>  >>   snmp2  (ocf::pacemaker:remote):        FAILED sl7-01
>>>  >> 
>>>  >>  Node Attributes:
>>>  >>  * Node sl7-01:
>>>  >>  * Node snmp1:
>>>  >> 
>>>  >>  Migration summary:
>>>  >>  * Node sl7-01:
>>>  >>     snmp2: migration-threshold=1 fail-count=1000000 last-failure='Wed Apr  8 11:21:09 2015'
>>>  >>  * Node snmp1:
>>>  >> 
>>>  >>  Failed actions:
>>>  >>      snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
>>>  >>      exit-reason='none', last-rc-change='Wed Apr  8 11:20:11 2015', queued=0ms, exec=0ms
>>>  >>  -------------------------------------------------------------
>>>  >> 
>>>  >> 
>>>  >>  The pacemaker node and the remote node output the following log
>>>  >>  repeatedly.
>>>  >> 
>>>  >>  -------------------------------------------------------------
>>>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: crm_remote_tcp_connect_async: Attempting to connect to remote server at 192.168.40.110:3121
>>>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: lrmd_tcp_connect_cb: Remote lrmd client TLS connection established with server snmp2:3121
>>>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: error: lrmd_tls_recv_reply: Unable to receive expected reply, disconnecting.
>>>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 101.
>>>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: lrmd_tls_connection_destroy: TLS connection destroyed
>>>  >>  Apr  8 11:20:38 sl7-01 crmd[17101]: info: lrmd_api_disconnect: Disconnecting from lrmd service
>>>  >>  -------------------------------------------------------------
>>>  >>  Apr  8 11:20:36 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 8fbbc3cd-daa5-406b-942d-21be868cfc62
>>>  >>  Apr  8 11:20:37 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: a59392c9-6575-40ed-9b53-98a68de00409
>>>  >>  Apr  8 11:20:38 snmp2 pacemaker_remoted[1502]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
>>>  >>  Apr  8 11:20:38 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: a59392c9-6575-40ed-9b53-98a68de00409
>>>  >>  Apr  8 11:20:39 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: 0e58614c-b1c5-4e37-a917-1f8e3de5de24
>>>  >>  Apr  8 11:20:39 snmp2 pacemaker_remoted[1502]:     info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
>>>  >>  Apr  8 11:20:39 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 0e58614c-b1c5-4e37-a917-1f8e3de5de24
>>>  >>  Apr  8 11:20:40 snmp2 pacemaker_remoted[1502]:   notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: 518bcca5-5f83-47fb-93ea-2ece33690111
>>>  >>  -------------------------------------------------------------
>>>  >> 
>>>  >>  Is this movement right?
>>>  >> 
>>>  >>  Best Regards,
>>>  >>  Hideo Yamauchi.
>>>  >> 
>>>  >> 
>>>  >> 
>>>  >>  ----- Original Message -----
>>>  >>>  From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>>>  >>>  To: users at clusterlabs.org 
>>>  >>>  Cc:
>>>  >>>  Date: 2015/4/2, Thu 22:30
>>>  >>>  Subject: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>>>  >>> 
>>>  >>>>>>  David Vossel <dvossel at redhat.com> wrote on 02.04.2015 at 14:58 in message
>>>  >>>  <796820123.6644200.1427979523554.JavaMail.zimbra at redhat.com>:
>>>  >>> 
>>>  >>>> 
>>>  >>>>  ----- Original Message -----
>>>  >>>>> 
>>>  >>>>>>  On 14 Mar 2015, at 10:14 am, David Vossel <dvossel at redhat.com> wrote:
>>>  >>>>>> 
>>>  >>>>>> 
>>>  >>>>>> 
>>>  >>>>>>  ----- Original Message -----
>>>  >>>>>>> 
>>>  >>>>>>>  Failed actions:
>>>  >>>>>>>       snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
>>>  >>>>>>>       exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015',
>>>  >>>>>>>       queued=0ms, exec=0ms
>>>  >>>>>>>       snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out,
>>>  >>>>>>>       exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015',
>>>  >>>>>>>       queued=0ms, exec=0ms
>>>  >>>>>>>  -----------------------
>>>  >>>>>> 
>>>  >>>>>>  Pacemaker is attempting to restore connection to the remote node here, are
>>>  >>>>>>  you sure the remote is accessible? The "Timed Out" error means that pacemaker
>>>  >>>>>>  was unable to establish the connection during the timeout period.
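>>>  >>>>>> 
>>>  >>>>>>  A quick way to check reachability from the cluster node (a rough sketch,
>>>  >>>>>>  assuming bash and coreutils timeout are available):
>>>  >>>>>> 
>>>  >>>>>>      timeout 3 bash -c 'cat </dev/null >/dev/tcp/snmp2/3121' && echo "port 3121 reachable"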
>>>  >>>>> 
>>>  >>>>>  Random question: Are we smart enough not to try and start pacemaker-remote
>>>  >>>>>  resources for nodes we've just fenced?
>>>  >>>> 
>>>  >>>>  we try and re-connect to remote nodes after fencing. if the fence operation
>>>  >>>>  was 'off' instead of 'reboot', this would make no sense. I'm not entirely
>>>  >>>>  sure how to handle this. We want the remote-node re-integrated into the
>>>  >>>>  cluster, but i'd like to optimize the case where we know the node will not
>>>  >>>>  be coming back online.
>>>  >>> 
>>>  >>>  Beware: Even if the fencing action is "off" (for software), a human
>>>  >>>  may decide to boot the node anyway, also starting the cluster software.
>>>  >>> 
>>>  >>>> 
>>>  >>>>> 
>>>  >>>>> 
>>>  >>>>> 
>>>  >>>> 
>>>  >>> 
>>>  >>> 
>>>  >>> 
>>>  >>> 
>>>  >>> 
>>>  >>> 
>>>  >> 
>>>  > 
>>> 
>>> 
>> 
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> http://clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 






