[ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
renayama19661014 at ybb.ne.jp
Mon May 11 22:12:55 EDT 2015
Hi All,
The problem seems to be that the buffer somehow becomes NULL when crm_resource -C is executed after the remote node has been rebooted.
I added a log message to the source code and confirmed it.
------------------------------------------------
crm_remote_recv_once(crm_remote_t * remote)
{
(snip)
    /* automatically grow the buffer when needed */
    if (remote->buffer_size < read_len) {
        remote->buffer_size = 2 * read_len;
        crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
        remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
        CRM_ASSERT(remote->buffer != NULL);
    }

#ifdef HAVE_GNUTLS_GNUTLS_H
    if (remote->tls_session) {
        if (remote->buffer == NULL) {
            crm_info("### YAMAUCHI buffer is NULL [buffer_zie[%d] readlen[%d]",
                     remote->buffer_size, read_len);
        }
        rc = gnutls_record_recv(*(remote->tls_session),
                                remote->buffer + remote->buffer_offset,
                                remote->buffer_size - remote->buffer_offset);
(snip)
------------------------------------------------
May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_zie[1326] readlen[40]
------------------------------------------------
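In the log above, buffer_size (1326) is already larger than read_len (40), so the reallocation branch (and its CRM_ASSERT) is never entered; if the earlier disconnect freed remote->buffer without resetting buffer_size, the pointer stays NULL here. Perhaps a check like the following would avoid it (only a rough sketch of my guess, not a tested patch):
------------------------------------------------
    /* Sketch only: also (re)allocate when the buffer pointer itself is
     * NULL -- my guess is that the disconnect frees remote->buffer but
     * leaves buffer_size set, so the size test alone never fires again. */
    if (remote->buffer == NULL || remote->buffer_size < read_len) {
        remote->buffer_size = 2 * read_len;
        crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
        remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
        CRM_ASSERT(remote->buffer != NULL);
    }
------------------------------------------------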
gnutls_record_recv() is then called with the NULL buffer and returns the error:
------------------------------------------------
(snip)
ssize_t
_gnutls_recv_int(gnutls_session_t session, content_type_t type,
                 gnutls_handshake_description_t htype,
                 gnutls_packet_t *packet,
                 uint8_t * data, size_t data_size, void *seq,
                 unsigned int ms)
{
    int ret;

    if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
        && (data_size == 0 || data == NULL))
        return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
(snip)
ssize_t
gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
{
    return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
                            data, data_size, NULL,
                            session->internals.record_timeout_ms);
}
(snip)
------------------------------------------------
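For reference, GNUTLS_E_INVALID_REQUEST is -50, the value that David mentioned. This small standalone program (just a sketch; it only needs the gnutls development headers) prints the same message that appeared in the crmd debug log:
------------------------------------------------
#include <stdio.h>
#include <gnutls/gnutls.h>

/* Prints "-50: The request is invalid." -- the same text as the
 * "TLS receive failed" debug message from crm_remote_recv_once(). */
int main(void)
{
    printf("%d: %s\n", GNUTLS_E_INVALID_REQUEST,
           gnutls_strerror(GNUTLS_E_INVALID_REQUEST));
    return 0;
}
------------------------------------------------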
Best Regards,
Hideo Yamauchi.
----- Original Message -----
> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
> To: "users at clusterlabs.org" <users at clusterlabs.org>
> Cc:
> Date: 2015/5/11, Mon 16:45
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement of pacemaker_remote.
>
> Hi Ulrich,
>
> Thank you for your comments.
>
>> So your host and your resource are both named "snmp1"? I also don't
>> have much experience with cleaning up resources for a node that is
>> offline. What change should it make (while the node is offline)?
>
>
> The name of the remote resource and the name of the remote node are the
> same: "snmp1".
>
>
> (snip)
> primitive snmp1 ocf:pacemaker:remote \
>         params server="snmp1" \
>         op start interval="0s" timeout="60s" on-fail="ignore" \
>         op monitor interval="3s" timeout="15s" \
>         op stop interval="0s" timeout="60s" on-fail="ignore"
>
> primitive Host-rsc1 ocf:heartbeat:Dummy \
>         op start interval="0s" timeout="60s" on-fail="restart" \
>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>         op stop interval="0s" timeout="60s" on-fail="ignore"
>
> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>         op start interval="0s" timeout="60s" on-fail="restart" \
>         op monitor interval="10s" timeout="60s" on-fail="restart" \
>         op stop interval="0s" timeout="60s" on-fail="ignore"
>
> location loc1 Remote-rsc1 \
>         rule 200: #uname eq snmp1
> location loc3 Host-rsc1 \
>         rule 200: #uname eq bl460g8n1
> (snip)
>
> pacemaker_remoted on the snmp1 node is stopped with SIGTERM.
> Afterwards I restart pacemaker_remoted on the snmp1 node.
> Then I execute the crm_resource command, but the snmp1 node remains offline.
>
> After crm_resource has been executed, I think the correct behavior would be
> for the snmp1 node to come back online.
>
>
>
> Best Regards,
> Hideo Yamauchi.
>
>
>
>
>
> ----- Original Message -----
>> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
>> To: users at clusterlabs.org; renayama19661014 at ybb.ne.jp
>> Cc:
>> Date: 2015/5/11, Mon 15:39
>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>> pacemaker_remote.
>>
>>>>> <renayama19661014 at ybb.ne.jp> wrote on 11.05.2015 at 06:22
>> in message
>> <361916.15877.qm at web200006.mail.kks.yahoo.co.jp>:
>>> Hi All,
>>>
>>> I matched the OS version of the remote node with the host once again and
>>> confirmed it in Pacemaker 1.1.13-rc2.
>>>
>>> It was the same even when I made the host RHEL7.1 (bl460g8n1).
>>> I made the remote host RHEL7.1 (snmp1).
>>>
>>> The first crm_resource -C fails.
>>> --------------------------------
>>> [root at bl460g8n1 ~]# crm_resource -C -r snmp1
>>> Cleaning up snmp1 on bl460g8n1
>>> Waiting for 1 replies from the CRMd. OK
>>>
>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>> Last updated: Mon May 11 12:44:31 2015
>>> Last change: Mon May 11 12:43:30 2015
>>> Stack: corosync
>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>> Version: 1.1.12-7a2e3ae
>>> 2 Nodes configured
>>> 3 Resources configured
>>>
>>>
>>> Online: [ bl460g8n1 ]
>>> RemoteOFFLINE: [ snmp1 ]
>>
>> So your host and your resource are both named "snmp1"? I also don't
>> have much experience with cleaning up resources for a node that is
>> offline. What change should it make (while the node is offline)?
>>
>>>
>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>
>>> Node Attributes:
>>> * Node bl460g8n1:
>>> + ringnumber_0 : 192.168.101.21 is UP
>>> + ringnumber_1 : 192.168.102.21 is UP
>>>
>>> Migration summary:
>>> * Node bl460g8n1:
>>> snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon May 11 12:44:28 2015'
>>>
>>> Failed actions:
>>> snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5, status=Timed Out,
>>> exit-reason='none', last-rc-change='Mon May 11 12:43:31 2015', queued=0ms,
>>> exec=0ms
>>> --------------------------------
>>>
>>>
>>> The second crm_resource -C succeeded, and the connection to the remote
>>> host was established.
>>
>> Then the node was online, it seems.
>>
>> Regards,
>> Ulrich
>>
>>> --------------------------------
>>> [root at bl460g8n1 ~]# crm_mon -1 -Af
>>> Last updated: Mon May 11 12:44:54 2015
>>> Last change: Mon May 11 12:44:48 2015
>>> Stack: corosync
>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>> Version: 1.1.12-7a2e3ae
>>> 2 Nodes configured
>>> 3 Resources configured
>>>
>>>
>>> Online: [ bl460g8n1 ]
>>> RemoteOnline: [ snmp1 ]
>>>
>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>
>>> Node Attributes:
>>> * Node bl460g8n1:
>>> + ringnumber_0 : 192.168.101.21 is UP
>>> + ringnumber_1 : 192.168.102.21 is UP
>>> * Node snmp1:
>>>
>>> Migration summary:
>>> * Node bl460g8n1:
>>> * Node snmp1:
>>> --------------------------------
>>>
>>> The gnutls on the host and the remote node was the following version:
>>>
>>> gnutls-devel-3.3.8-12.el7.x86_64
>>> gnutls-dane-3.3.8-12.el7.x86_64
>>> gnutls-c++-3.3.8-12.el7.x86_64
>>> gnutls-3.3.8-12.el7.x86_64
>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>> To: Cluster Labs - All topics related to open-source clustering welcomed
>>>> <users at clusterlabs.org>
>>>> Cc:
>>>> Date: 2015/4/28, Tue 14:06
>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>>> pacemaker_remote.
>>>>
>>>> Hi David,
>>>>
>>>> The result was the same even after I changed the remote node to RHEL7.1.
>>>>
>>>>
>>>> This time I will try it with the pacemaker host node on RHEL7.1.
>>>>
>>>>
>>>> I noticed an interesting phenomenon.
>>>> The remote node fails to reconnect on the first crm_resource.
>>>> However, the remote node succeeds in reconnecting on the second
>>>> crm_resource.
>>>>
>>>> I think there is some problem at the point where the connection with
>>>> the remote node is first cut.
>>>>
>>>> Best Regards,
>>>> Hideo Yamauchi.
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "renayama19661014 at ybb.ne.jp" <renayama19661014 at ybb.ne.jp>
>>>>> To: Cluster Labs - All topics related to open-source clustering welcomed
>>>>> <users at clusterlabs.org>
>>>>> Cc:
>>>>> Date: 2015/4/28, Tue 11:52
>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>>>> pacemaker_remote.
>>>>>
>>>>> Hi David,
>>>>> Thank you for your comments.
>>>>>> At first glance this looks gnutls related. GNUTLS is returning -50
>>>>>> during receive on the client side (pacemaker's side). -50 maps to
>>>>>> 'invalid request'.
>>>>>>
>>>>>>> debug: crm_remote_recv_once: TLS receive failed: The request is invalid.
>>>>>>
>>>>>> We treat this error as fatal and destroy the connection. I've never
>>>>>> encountered this error and I don't know what causes it. It's possible
>>>>>> there's a bug in our gnutls usage... it's also possible there's a bug
>>>>>> in the version of gnutls that is in use as well.
>>>>> We built the remote node on RHEL6.5.
>>>>> Because it may be a problem in gnutls, I will confirm it on RHEL7.1.
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
>>>>>