[ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Wed Apr 8 02:27:49 UTC 2015
Hi All,
Let me confirm my first question once again.
I observed the following behavior in Pacemaker 1.1.13-rc1.
STONITH is not configured.
-------------------------------------------------------------
property no-quorum-policy="ignore" \
stonith-enabled="false" \
startup-fencing="false"
rsc_defaults resource-stickiness="INFINITY" \
migration-threshold="1"
primitive snmp1 ocf:pacemaker:remote \
params \
server="snmp1" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="3s" timeout="15s" \
op stop interval="0s" timeout="60s" on-fail="ignore"
primitive snmp2 ocf:pacemaker:remote \
params \
server="snmp2" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="3s" timeout="15s" \
op stop interval="0s" timeout="60s" on-fail="stop"
primitive Host-rsc1 ocf:heartbeat:Dummy \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="ignore"
primitive Remote-rsc1 ocf:heartbeat:Dummy \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="ignore"
primitive Remote-rsc2 ocf:heartbeat:Dummy \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0s" timeout="60s" on-fail="ignore"
location loc1 Remote-rsc1 \
rule 200: #uname eq snmp1 \
rule 100: #uname eq snmp2
location loc2 Remote-rsc2 \
rule 200: #uname eq snmp2 \
rule 100: #uname eq snmp1
location loc3 Host-rsc1 \
rule 200: #uname eq sl7-01
-------------------------------------------------------------
Step 1) We build a cluster consisting of one cluster node (sl7-01) and two remote nodes (snmp1 and snmp2).
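For reference, a rough sketch of how this step might be performed, assuming crmsh and systemd-based hosts (file and unit names here are illustrative); pacemaker_remote also requires the shared authentication key /etc/pacemaker/authkey on every host. The crm_mon status output follows the sketch.
-------------------------------------------------------------
# On the cluster node sl7-01: start the cluster and load the configuration above
systemctl start pacemaker
crm configure load update remote-test.crm   # "remote-test.crm" is a hypothetical file holding the config

# On the remote hosts snmp1 and snmp2: start the remote daemon
systemctl start pacemaker_remote
-------------------------------------------------------------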
-------------------------------------------------------------
Version: 1.1.12-3e93bc1
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
Remote-rsc2 (ocf::heartbeat:Dummy): Started snmp2
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): Started sl7-01
Node Attributes:
* Node sl7-01:
* Node snmp1:
* Node snmp2:
Migration summary:
* Node sl7-01:
* Node snmp1:
* Node snmp2:
-------------------------------------------------------------
Step 2) We stop pacemaker_remoted on one remote node (snmp2).
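For reference, this step on the remote host is roughly the following, assuming a systemd-based system (the unit name may differ by distribution):
-------------------------------------------------------------
# On snmp2: stop the remote daemon so the cluster loses the connection
systemctl stop pacemaker_remote
-------------------------------------------------------------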
-------------------------------------------------------------
Current DC: sl7-01 - partition WITHOUT quorum
Version: 1.1.12-3e93bc1
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 ]
RemoteOFFLINE: [ snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): FAILED sl7-01
Node Attributes:
* Node sl7-01:
* Node snmp1:
Migration summary:
* Node sl7-01:
snmp2: migration-threshold=1 fail-count=1 last-failure='Fri Apr 3 12:56:12 2015'
* Node snmp1:
Failed actions:
snmp2_monitor_3000 on sl7-01 'unknown error' (1): call=6, status=Error, exit-reason='none', last-rc-change='Fri Apr 3 12:56:12 2015', queued=0ms, exec=0ms
-------------------------------------------------------------
Step 3) We restart the pacemaker_remoted that was stopped.
Step 4) We clean up the snmp2 remote resource with the crm_resource command, but the remote node cannot rejoin the cluster.
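For reference, rough sketches of these two steps, again assuming a systemd-based remote host; the cleanup uses the standard crm_resource options:
-------------------------------------------------------------
# Step 3, on snmp2: start the remote daemon again
systemctl start pacemaker_remote

# Step 4, on the cluster node: clear the failure of the snmp2 connection resource
crm_resource --cleanup --resource snmp2
-------------------------------------------------------------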
-------------------------------------------------------------
Version: 1.1.12-3e93bc1
3 Nodes configured
5 Resources configured
Online: [ sl7-01 ]
RemoteOnline: [ snmp1 ]
RemoteOFFLINE: [ snmp2 ]
Host-rsc1 (ocf::heartbeat:Dummy): Started sl7-01
Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
snmp1 (ocf::pacemaker:remote): Started sl7-01
snmp2 (ocf::pacemaker:remote): FAILED sl7-01
Node Attributes:
* Node sl7-01:
* Node snmp1:
Migration summary:
* Node sl7-01:
snmp2: migration-threshold=1 fail-count=1000000 last-failure='Wed Apr 8 11:21:09 2015'
* Node snmp1:
Failed actions:
snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out, exit-reason='none', last-rc-change='Wed Apr 8 11:20:11 2015', queued=0ms, exec=0ms
-------------------------------------------------------------
The cluster node and the remote node repeatedly output the following logs.
-------------------------------------------------------------
Apr 8 11:20:38 sl7-01 crmd[17101]: info: crm_remote_tcp_connect_async: Attempting to connect to remote server at 192.168.40.110:3121
Apr 8 11:20:38 sl7-01 crmd[17101]: info: lrmd_tcp_connect_cb: Remote lrmd client TLS connection established with server snmp2:3121
Apr 8 11:20:38 sl7-01 crmd[17101]: error: lrmd_tls_recv_reply: Unable to receive expected reply, disconnecting.
Apr 8 11:20:38 sl7-01 crmd[17101]: error: lrmd_tls_send_recv: Remote lrmd server disconnected while waiting for reply with id 101.
Apr 8 11:20:38 sl7-01 crmd[17101]: info: lrmd_tls_connection_destroy: TLS connection destroyed
Apr 8 11:20:38 sl7-01 crmd[17101]: info: lrmd_api_disconnect: Disconnecting from lrmd service
-------------------------------------------------------------
Apr 8 11:20:36 snmp2 pacemaker_remoted[1502]: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 8fbbc3cd-daa5-406b-942d-21be868cfc62
Apr 8 11:20:37 snmp2 pacemaker_remoted[1502]: notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: a59392c9-6575-40ed-9b53-98a68de00409
Apr 8 11:20:38 snmp2 pacemaker_remoted[1502]: info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
Apr 8 11:20:38 snmp2 pacemaker_remoted[1502]: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: a59392c9-6575-40ed-9b53-98a68de00409
Apr 8 11:20:39 snmp2 pacemaker_remoted[1502]: notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: 0e58614c-b1c5-4e37-a917-1f8e3de5de24
Apr 8 11:20:39 snmp2 pacemaker_remoted[1502]: info: lrmd_remote_client_msg: Client disconnect detected in tls msg dispatcher.
Apr 8 11:20:39 snmp2 pacemaker_remoted[1502]: notice: lrmd_remote_client_destroy: LRMD client disconnecting remote client - name: remote-lrmd-snmp2:3121 id: 0e58614c-b1c5-4e37-a917-1f8e3de5de24
Apr 8 11:20:40 snmp2 pacemaker_remoted[1502]: notice: lrmd_remote_listen: LRMD client connection established. 0xbb7ca0 id: 518bcca5-5f83-47fb-93ea-2ece33690111
-------------------------------------------------------------
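Since the logs show the TCP connection to 192.168.40.110:3121 succeeding but the TLS session being torn down immediately afterwards, a plain reachability check such as the following (an illustrative command, not part of the original report) would report success and not reveal the problem; the TLS handshake itself has to be examined:
-------------------------------------------------------------
# Check only TCP reachability of pacemaker_remoted on snmp2 from the cluster node
nc -zv 192.168.40.110 3121
-------------------------------------------------------------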
Is this behavior correct?
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
> To: users at clusterlabs.org
> Cc:
> Date: 2015/4/2, Thu 22:30
> Subject: [ClusterLabs] Antw: Re: [Question] About movement of pacemaker_remote.
>
>>>> David Vossel <dvossel at redhat.com> wrote on 02.04.2015 at 14:58 in message
> <796820123.6644200.1427979523554.JavaMail.zimbra at redhat.com>:
>
>>
>> ----- Original Message -----
>>>
>>> > On 14 Mar 2015, at 10:14 am, David Vossel <dvossel at redhat.com> wrote:
>>> >
>>> >
>>> >
>>> > ----- Original Message -----
>>> >>
>>> >> Failed actions:
>>> >> snmp2_start_0 on sl7-01 'unknown error' (1): call=8, status=Timed Out, exit-reason='none', last-rc-change='Thu Mar 12 14:26:26 2015', queued=0ms, exec=0ms
>>> >> -----------------------
>>> >
>>> > Pacemaker is attempting to restore the connection to the remote node
>>> > here. Are you sure the remote is accessible? The "Timed Out" error means
>>> > that pacemaker was unable to establish the connection during the timeout
>>> > period.
>>>
>>> Random question: Are we smart enough not to try and start pacemaker-remote
>>> resources for nodes we've just fenced?
>>
>> We try and re-connect to remote nodes after fencing. If the fence operation
>> was 'off' instead of 'reboot', this would make no sense. I'm not entirely
>> sure how to handle this. We want the remote node re-integrated into the
>> cluster, but I'd like to optimize the case where we know the node will not
>> be coming back online.
>
> Beware: Even if the fencing action is "off" (for software), a human
> may decide to boot the node anyway, also starting the cluster software.
>