[Pacemaker] Failed-over incomplete

Fri Dec 5 03:17:42 EST 2014

On Thu, 4 Dec 2014 17:41:39 +0700
Teerapatr Kittiratanachai <maillist.tk at gmail.com> wrote:

> Sorry for my typo;
> it's res.vBKN6.
> 

Pacemaker tried to stop res.vBKN6, but the resource agent failed to do so:

Dec 03 03:35:57 [2027] node0.ntt.co.th       crmd:   notice: process_lrm_event: 	LRM operation res.vBKN6_stop_0 (call=97, rc=1, cib-update=34, confirmed=true) unknown error

A failed stop leaves the state of the resource on that node unknown, so
Pacemaker cannot start res.vBKN6 anywhere else, at least not without
fencing the node via STONITH.
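
As an illustration only, here is a minimal Python sketch for picking the
failed operation out of a log line like the one above. The function name
parse_lrm_event and the regular expression are assumptions of mine about
the crmd message layout shown here, not anything shipped with Pacemaker.
The return code names are the standard OCF resource agent codes; rc=1 is
OCF_ERR_GENERIC, which Pacemaker reports as "unknown error".

```python
import re

# Standard OCF resource agent return codes (0-7).
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Assumed layout: "LRM operation <resource>_<action>_<interval> (call=N, rc=N, ..."
LRM_EVENT = re.compile(
    r"LRM operation (?P<resource>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+),"
)


def parse_lrm_event(line):
    """Return a dict describing the LRM operation in a crmd log line, or None."""
    match = LRM_EVENT.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    action = match.group("action")
    return {
        "resource": match.group("resource"),
        "action": action,
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "unknown"),
        # A stop that returns non-zero is what blocks recovery elsewhere.
        "failed_stop": action == "stop" and rc != 0,
    }


if __name__ == "__main__":
    sample = (
        "Dec 03 03:35:57 [2027] node0.ntt.co.th crmd: notice: "
        "process_lrm_event: LRM operation res.vBKN6_stop_0 "
        "(call=97, rc=1, cib-update=34, confirmed=true) unknown error"
    )
    print(parse_lrm_event(sample))
```

Any line for which failed_stop is true points at a resource that
Pacemaker could not confirm as stopped.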

> --teenigma
> 
> On Thu, Dec 4, 2014 at 4:23 PM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> > On Thu, Dec 4, 2014 at 9:52 AM, Teerapatr Kittiratanachai
> > <maillist.tk at gmail.com> wrote:
> >> Dear Andrei,
> >>
> >> Since the failover did not complete, not all of the resources were
> >> failed over to the other node.
> >>
> >> I think this happened because res.vBKN went into the unmanaged state.
> >>
> >
> > There is no resource res.vBKN in your logs or configuration snippet
> > you have shown.
> >
> >> But why? No configuration was changed.
> >>
> >> --teenigma
> >>
> >> On Thu, Dec 4, 2014 at 1:41 PM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> >>> On Thu, Dec 4, 2014 at 4:56 AM, Teerapatr Kittiratanachai
> >>> <maillist.tk at gmail.com> wrote:
> >>>> Dear List,
> >>>>
> >>>> We are using Pacemaker and Corosync with CMAN as our HA software, in
> >>>> the versions below.
> >>>>
> >>>>     OS:              CentOS release 6.5 (Final) 64-bit
> >>>>     Pacemaker:       pacemaker.x86_64        1.1.10-14.el6_5.3
> >>>>     Corosync:        corosync.x86_64         1.4.1-17.el6_5.1
> >>>>     CMAN:            cman.x86_64             3.0.12.1-59.el6_5.2
> >>>>     Resource-Agent:  resource-agents.x86_64  3.9.5-3.12
> >>>>
> >>>>     Topology:        2 nodes with Active/Standby model (MySQL is
> >>>>                      Active/Active by clone)
> >>>>
> >>>> All packages are installed from the official CentOS repository, except
> >>>> the Resource-Agent package, which is installed from the openSUSE repository
> >>>> (http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/).
> >>>>
> >>>> The system worked normally for a few months until yesterday morning,
> >>>> around 03:35 UTC+0700, when we found that one of the resources had gone
> >>>> into the UNMANAGED state without any configuration change. After another
> >>>> resource failed, Pacemaker tried to fail the resources over to the
> >>>> other node, but the failover stopped, incomplete, when it reached this
> >>>> resource.
> >>>>
> >>>> The configuration of some of the resources is below, and the log from
> >>>> the event is in the attached file.
> >>>>
> >>>
> >>> The log covers only the resource monitor failure and the stopping of
> >>> resources. It does not contain any event related to starting resources
> >>> on another node.
> >>>
> >>> You would need to collect a crm_report with a start time before the
> >>> resource failed and an end time after the resources were started on
> >>> another node.
> >>>
> >>>> primitive res.vBKN6 IPv6addr \
> >>>>         params ipv6addr="2001:db8:0:f::61a" cidr_netmask=64 nic=eth0 \
> >>>>         op monitor interval=10s
> >>>>
> >>>> primitive res.vDMZ6 IPv6addr \
> >>>>         params ipv6addr="2001:db8:0:9::61a" cidr_netmask=64 nic=eth1 \
> >>>>         op monitor interval=10s
> >>>>
> >>>> group gr.mainService res.vDMZ4 res.vDMZ6 res.vBKN4 res.vBKN6 res.http res.ftp
> >>>>
> >>>> rsc_defaults rsc_defaults-options: \
> >>>>         migration-threshold=1
> >>>>
> >>>> Please help me to solve this problem.
> >>>>
> >>>> --teenigma
> >>>>
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://bugs.clusterlabs.org
> >>>>