[ClusterLabs] pacemaker remote configuration on ubuntu 14.04

Ken Gaillot kgaillot at redhat.com
Tue Mar 22 15:55:42 UTC 2016


On 03/19/2016 10:40 PM, Сергей Филатов wrote:
> I’m fairly new to pacemaker; could you tell me what the blocker could be?

It's not clear from this information. There do not appear to be any
constraints related to compute-1, so that doesn't seem to be an issue.

Make sure it is enabled (pcs resource enable compute-1), and make sure
it is not blocked by previous errors (pcs resource cleanup compute-1).
Then see if it comes up or reports a new error (pcs status).
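
For example, a minimal recovery sequence, assuming pcs and the resource name
compute-1 from your configuration:

  pcs resource enable compute-1    # clears target-role=Stopped if it was set
  pcs resource cleanup compute-1   # clears any remembered failures
  pcs status                       # see whether it starts or reports a new error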

FYI, your colocation constraints mean that the specified resources will
be placed together (on the same node). However, you have no ordering
constraints, so they may be started in any order. That might be OK in
your situation, or not.
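
If start order does matter, ordering constraints can be added alongside the
colocations. A hypothetical sketch with pcs (whether these particular
orderings are appropriate depends on your deployment):

  # e.g. bring up the haproxy clone before the VIPs colocated with it
  pcs constraint order start clone_p_haproxy then vip__management
  pcs constraint order start clone_p_haproxy then vip__public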

> root at controller-1:~# pcs constraint
> Location Constraints:
>   Resource: clone_p_dns
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_haproxy
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_heat-engine
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_mysql
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_neutron-dhcp-agent
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_neutron-l3-agent
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_neutron-metadata-agent
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_neutron-plugin-openvswitch-agent
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_ntp
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_p_vrouter
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: clone_ping_vip__public
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: master_p_conntrackd
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: master_p_rabbitmq-server
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: vip__management
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: vip__public
>     Enabled on: controller-1.domain.com (score:100)
>     Constraint: loc_ping_vip__public
>       Rule: score=-INFINITY boolean-op=or
>         Expression: not_defined pingd
>         Expression: pingd lte 0
>   Resource: vip__vrouter
>     Enabled on: controller-1.domain.com (score:100)
>   Resource: vip__vrouter_pub
>     Enabled on: controller-1.domain.com (score:100)
> Ordering Constraints:
> Colocation Constraints:
>   vip__vrouter with vip__vrouter_pub
>   vip__management with clone_p_haproxy
>   vip__public with clone_p_haproxy
>   clone_p_dns with clone_p_vrouter
>   vip__vrouter_pub with master_p_conntrackd (rsc-role:Started) (with-rsc-role:Master)
> 
> 
> crm configure show:
> 
> node 14: controller-1.domain.com
> primitive compute-1 ocf:pacemaker:remote \
>         op monitor interval=60
> primitive p_conntrackd ocf:fuel:ns_conntrackd \
>         op monitor interval=30 timeout=60 \
>         op monitor interval=27 role=Master timeout=60 \
>         meta migration-threshold=INFINITY failure-timeout=180s
> primitive p_dns ocf:fuel:ns_dns \
>         op monitor interval=20 timeout=10 \
>         op start interval=0 timeout=30 \
>         op stop interval=0 timeout=30 \
>         params ns=vrouter \
>         meta migration-threshold=3 failure-timeout=120
> primitive p_haproxy ocf:fuel:ns_haproxy \
>         op monitor interval=30 timeout=60 \
>         op start interval=0 timeout=60 \
>         op stop interval=0 timeout=60 \
>         params ns=haproxy debug=false other_networks="172.21.1.0/24 192.168.33.0/24 192.168.31.0/24 192.168.32.0/24 10.2.55.0/24" \
>         meta migration-threshold=3 failure-timeout=120
> primitive p_heat-engine ocf:fuel:heat-engine \
>         op monitor interval=20 timeout=30 \
>         op start interval=0 timeout=60 \
>         op stop interval=0 timeout=60 \
>         meta resource-stickiness=1 migration-threshold=3
> primitive p_mysql ocf:fuel:mysql-wss \
>         op monitor interval=60 timeout=55 \
>         op start interval=0 timeout=300 \
>         op stop interval=0 timeout=120 \
>         params test_user=wsrep_sst test_passwd=mlNsGR89 socket="/var/run/mysqld/mysqld.sock"
> primitive p_neutron-dhcp-agent ocf:fuel:ocf-neutron-dhcp-agent \
>         op monitor interval=20 timeout=10 \
>         op start interval=0 timeout=60 \
>         op stop interval=0 timeout=60 \
>         params plugin_config="/etc/neutron/dhcp_agent.ini" remove_artifacts_on_stop_start=true
> primitive p_neutron-l3-agent ocf:fuel:ocf-neutron-l3-agent \
>         op monitor interval=20 timeout=10 \
>         op start interval=0 timeout=60 \
>         op stop interval=0 timeout=60 \
>         params plugin_config="/etc/neutron/l3_agent.ini" remove_artifacts_on_stop_start=true
> primitive p_neutron-metadata-agent ocf:fuel:ocf-neutron-metadata-agent \
>         op monitor interval=60 timeout=10 \
>         op start interval=0 timeout=30 \
>         op stop interval=0 timeout=30
> primitive p_neutron-plugin-openvswitch-agent ocf:fuel:ocf-neutron-ovs-agent \
>         op monitor interval=20 timeout=10 \
> 
>> On 11 Mar 2016, at 14:11, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>> On 03/10/2016 11:36 PM, Сергей Филатов wrote:
>>> This one is the right log
>>
>> Something in the cluster configuration and state (for example, an
>> unsatisfied constraint) is preventing the cluster from starting the
>> resource:
>>
>> Mar 10 04:00:53 [11785] controller-1.domain.com    pengine:     info: native_print:     compute-1       (ocf::pacemaker:remote):        Stopped
>> Mar 10 04:00:53 [11785] controller-1.domain.com    pengine:     info: native_color:     Resource compute-1 cannot run anywhere
>>
>>
>>>
>>>
>>>
>>>> On 10 Mar 2016, at 08:17, Сергей Филатов <filatecs at gmail.com> wrote:
>>>>
>>>> pcs resource show compute-1
>>>>
>>>> Resource: compute-1 (class=ocf provider=pacemaker type=remote)
>>>> Operations: monitor interval=60s (compute-1-monitor-interval-60s)
>>>>
>>>> I can’t find a _start_0 entry in the pacemaker logs.
>>>> I don’t have an IPv6 address for the remote node, but I guess it should be
>>>> listening on both.
>>>>
>>>> attached pacemaker.log for cluster node
>>>> <pacemaker.log.tar.gz>
>>>>
>>>>
>>>>> On 09 Mar 2016, at 10:23, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>
>>>>> On 03/08/2016 11:38 PM, Сергей Филатов wrote:
>>>>>> ssh -p 3121 compute-1
>>>>>> ssh_exchange_identification: read: Connection reset by peer
>>>>>>
>>>>>> That’s what I get in /var/log/pacemaker.log after restarting pacemaker_remote:
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com       lrmd:     info: crm_signal_dispatch:  Invoking handler for signal 15: Terminated
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com       lrmd:     info: lrmd_shutdown:        Terminating with  0 clients
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com       lrmd:     info: qb_ipcs_us_withdraw:  withdrawing server sockets
>>>>>> Mar 09 05:30:27 [28031] compute-1.domain.com       lrmd:     info: crm_xml_cleanup:      Cleaning up memory from libxml2
>>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com       lrmd:     info: crm_log_init:         Changed active directory to /var/lib/heartbeat/cores/root
>>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: lrmd
>>>>>> Mar 09 05:30:27 [28193] compute-1.domain.com       lrmd:   notice: lrmd_init_remote_tls_server:  Starting a tls listener on port 3121.
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:   notice: bind_and_listen:      Listening on address ::
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: cib_ro
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: cib_rw
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: cib_shm
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: attrd
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: stonith-ng
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: qb_ipcs_us_publish:   server name: crmd
>>>>>> Mar 09 05:30:28 [28193] compute-1.domain.com       lrmd:     info: main:         Starting
>>>>>
>>>>> It looks like the cluster is not even trying to connect to the remote
>>>>> node. pacemaker_remote here is binding only to IPv6, so the cluster will
>>>>> need to contact it on that address.
>>>>>
>>>>> What is your ocf:pacemaker:remote resource configuration?
>>>>>
>>>>> Check your cluster node logs for the start action -- if your resource is
>>>>> named R, the start action will be R_start_0. There will be two nodes of
>>>>> interest: the node assigned the remote node resource, and the DC.
>>>>>
>>>>>> I have only pacemaker-remote, resource-agents, and pcs installed, so there is
>>>>>> no /etc/default/pacemaker file on the remote node.
>>>>>> SELinux is disabled, and I specifically opened the firewall for TCP ports 2224,
>>>>>> 3121, and 21064 and UDP port 5405.
>>>>>>
>>>>>>> On 08 Mar 2016, at 08:51, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>>
>>>>>>> On 03/07/2016 09:10 PM, Сергей Филатов wrote:
>>>>>>>> Thanks for the answer. It turned out the problem was not IPv6.
>>>>>>>> The remote node is listening on port 3121 and its name resolves fine.
>>>>>>>> I have the authkey file at /etc/pacemaker on both the remote and cluster nodes.
>>>>>>>> What else can I check? Is there any walkthrough for Ubuntu?
>>>>>>>
>>>>>>> Nothing specific to ubuntu, but there's not much distro-specific to it.
>>>>>>>
>>>>>>> If you "ssh -p 3121" to the remote node from a cluster node, what do you
>>>>>>> get?
>>>>>>>
>>>>>>> pacemaker_remote will use the usual log settings for pacemaker (probably
>>>>>>> /var/log/pacemaker.log, probably configured in /etc/default/pacemaker on
>>>>>>> ubuntu). You should see "New remote connection" in the remote node's log
>>>>>>> when the cluster tries to connect, and "LRMD client connection
>>>>>>> established" if it's successful.
>>>>>>>
>>>>>>> As always, check for firewall and SELinux issues.
>>>>>>>
>>>>>>>>
>>>>>>>>> On 07 Mar 2016, at 09:40, Ken Gaillot <kgaillot at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 03/06/2016 07:43 PM, Сергей Филатов wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> I’m trying to set up a pacemaker_remote resource on Ubuntu 14.04.
>>>>>>>>>> I followed the “remote node walkthrough” guide
>>>>>>>>>> (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Remote/#idm140473081667280).
>>>>>>>>>> After creating the ocf:pacemaker:remote resource on the cluster node, the
>>>>>>>>>> remote node doesn’t show up as online.
>>>>>>>>>> I guess I need to configure the remote agent to listen on IPv4; where can I
>>>>>>>>>> configure that?
>>>>>>>>>> Or are there any other steps to set up the remote node besides the ones
>>>>>>>>>> mentioned in the guide?
>>>>>>>>>> tcp6       0      0 :::3121                 :::*                    LISTEN      21620/pacemaker_rem off (0.00/0/0)
>>>>>>>>>>
>>>>>>>>>> pacemaker and pacemaker_remote are version 1.12
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> pacemaker_remote will try to bind to IPv6 addresses first, and only if
>>>>>>>>> that fails will it bind to IPv4. There is no way to configure this
>>>>>>>>> behavior currently, though it obviously would be nice to have.
>>>>>>>>>
>>>>>>>>> The only workarounds I can think of are to make IPv6 connections work
>>>>>>>>> between the cluster and the remote node, or disable IPv6 on the remote
>>>>>>>>> node. Using IPv6, there could be an issue if your name resolution
>>>>>>>>> returns both IPv4 and IPv6 addresses for the remote host; you could
>>>>>>>>> potentially work around that by adding an IPv6-only name for it, and
>>>>>>>>> using that as the server option to the remote resource.
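
For example, something along these lines; compute-1-v6.domain.com is just an
illustrative name that would resolve only to the remote node's IPv6 address:

  # point the existing remote resource at the IPv6-only name
  pcs resource update compute-1 server=compute-1-v6.domain.com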
