[Pacemaker] "stonith_admin -F node" results in a pair of reboots
emmanuel segura
emi2fast at gmail.com
Wed Jan  1 05:04:13 UTC 2014

Maybe you are missing the logs from the time the node was fenced? I think
clvmd hung because your node is in an unclean state. Use "dlm_tool ls" to see
whether you have any pending fencing operations.
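
As a rough illustration of what to look for, the sketch below scans crm_mon
text (such as the status output quoted further down) for nodes marked UNCLEAN
and for resources marked FAILED. The function names and regular expressions
are assumptions based only on the layout of that quoted output; this is not a
Pacemaker API and it does not replace checking "dlm_tool ls" on the node.

```python
import re


def _unquote(line):
    # Drop mail-quote markers ("> ") and surrounding whitespace so quoted
    # crm_mon output can be pasted in unchanged.
    return line.lstrip("> ").strip()


def find_unclean_nodes(crm_mon_text):
    """Return names of nodes reported as 'Node <name>: UNCLEAN ...'."""
    nodes = []
    for raw in crm_mon_text.splitlines():
        match = re.match(r"Node\s+(\S+):\s+UNCLEAN\b", _unquote(raw))
        if match:
            nodes.append(match.group(1))
    return nodes


def find_failed_resources(crm_mon_text):
    """Return (resource, node) pairs for lines such as
    'clvmd (lsb:clvmd): FAILED mici-admin'."""
    failed = []
    for raw in crm_mon_text.splitlines():
        match = re.match(r"(\S+)\s+\([^)]+\):\s+FAILED\s+(\S+)", _unquote(raw))
        if match:
            failed.append((match.group(1), match.group(2)))
    return failed
```

If either function returns a non-empty list for the surviving node, that is a
hint to look at the fencing and dlm logs from that moment.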
2014/1/1 Bob Haxo <bhaxo at sgi.com>
>  Greetings ... Happy New Year!
>
> I am testing a configuration created from the example in "Chapter 6.
> Configuring a GFS2 File System in a Cluster" of the "Red Hat Enterprise
> Linux 7.0 Beta Global File System 2" document.  The only addition is
> stonith:fence_ipmilan.  After encountering this issue when I configured
> with "crm", I re-configured using "pcs". I've included the configuration
> below.
>
> I'm thinking that, in a 2-node cluster, if I run "stonith_admin -F
> <peer-node>", then <peer-node> should reboot and cleanly rejoin the
> cluster.  This is not happening.
>
> What ultimately happens is that after the initially fenced node reboots,
> the system from which the stonith_admin -F command was run is fenced and
> reboots. The fencing stops there, leaving the cluster in an appropriate
> state.
>
> The issue seems to reside with clvmd/lvm.  With the reboot of the
> initially fenced node, the clvmd resource fails on the surviving node, with
> a maximum of errors.  I hypothesize there is an issue with locks, but have
> insufficient knowledge of clvmd/lvm locks to prove or disprove this
> hypothesis.
>
> Have I missed something ...
>
> 1) Is this expected behavior, and does the reboot of the fencing node
> always happen?
>
> 2) Or, maybe I didn't correctly duplicate the Chapter 6 example?
>
> 3) Or, perhaps something is wrong or omitted from the Chapter 6 example?
>
> Suggestions will be much appreciated.
>
> Thanks,
> Bob Haxo
>
> RHEL6.5
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> cman-3.0.12.1-59.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> corosynclib-1.4.1-17.el6.x86_64
> corosync-1.4.1-17.el6.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>
> Cluster Name: mici
> Corosync Nodes:
>
> Pacemaker Nodes:
> mici-admin mici-admin2
>
> Resources:
> Clone: clusterfs-clone
>   Meta Attrs: interleave=true target-role=Started
>   Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
>    Attributes: device=/dev/vgha2/lv_clust2 directory=/images fstype=gfs2
> options=defaults,noatime,nodiratime
>    Operations: monitor on-fail=fence interval=30s
> (clusterfs-monitor-interval-30s)
> Clone: clvmd-clone
>   Meta Attrs: interleave=true ordered=true target-role=Started
>   Resource: clvmd (class=lsb type=clvmd)
>    Operations: monitor on-fail=fence interval=30s
> (clvmd-monitor-interval-30s)
> Clone: dlm-clone
>   Meta Attrs: interleave=true ordered=true
>   Resource: dlm (class=ocf provider=pacemaker type=controld)
>    Operations: monitor on-fail=fence interval=30s
> (dlm-monitor-interval-30s)
>
> Stonith Devices:
> Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
>   Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX lanplus=1
> action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin
>   Meta Attrs: target-role=Started
>   Operations: monitor start-delay=30 interval=60s timeout=30
> (p_ipmi_fencing_1-monitor-60s)
> Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
>   Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX lanplus=1
> action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin2
>   Meta Attrs: target-role=Started
>   Operations: monitor start-delay=30 interval=60s timeout=30
> (p_ipmi_fencing_2-monitor-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: p_ipmi_fencing_1
>     Disabled on: mici-admin (score:-INFINITY)
> (id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
>   Resource: p_ipmi_fencing_2
>     Disabled on: mici-admin2 (score:-INFINITY)
> (id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
> Ordering Constraints:
>   start dlm-clone then start clvmd-clone (Mandatory)
> (id:order-dlm-clone-clvmd-clone-mandatory)
>   start clvmd-clone then start clusterfs-clone (Mandatory)
> (id:order-clvmd-clone-clusterfs-clone-mandatory)
> Colocation Constraints:
>   clusterfs-clone with clvmd-clone (INFINITY)
> (id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
>   clvmd-clone with dlm-clone (INFINITY)
> (id:colocation-clvmd-clone-dlm-clone-INFINITY)
>
> Cluster Properties:
> cluster-infrastructure: cman
> dc-version: 1.1.10-14.el6_5.1-368c726
> last-lrm-refresh: 1388530552
> no-quorum-policy: ignore
> stonith-enabled: true
> Node Attributes:
> mici-admin: standby=off
> mici-admin2: standby=off
>
>
> Last updated: Tue Dec 31 17:15:55 2013
> Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
> Stack: cman
> Current DC: mici-admin2 - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 8 Resources configured
>
> Online: [ mici-admin mici-admin2 ]
>
> Full list of resources:
>
> p_ipmi_fencing_1        (stonith:fence_ipmilan):        Started mici-admin2
> p_ipmi_fencing_2        (stonith:fence_ipmilan):        Started mici-admin
> Clone Set: clusterfs-clone [clusterfs]
>      Started: [ mici-admin mici-admin2 ]
> Clone Set: clvmd-clone [clvmd]
>      Started: [ mici-admin mici-admin2 ]
> Clone Set: dlm-clone [dlm]
>      Started: [ mici-admin mici-admin2 ]
>
> Migration summary:
> * Node mici-admin:
> * Node mici-admin2:
>
> =====================================================
> crm_mon after the fenced node reboots.  Shows the failure of clvmd that
> then occurs, which in turn triggers a fencing of that node.
>
> Last updated: Tue Dec 31 17:06:55 2013
> Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
> Stack: cman
> Current DC: mici-admin - partition with quorum
> Version: 1.1.10-14.el6_5.1-368c726
> 2 Nodes configured
> 8 Resources configured
>
> Node mici-admin: UNCLEAN (online)
> Online: [ mici-admin2 ]
>
> Full list of resources:
>
> p_ipmi_fencing_1        (stonith:fence_ipmilan):        Stopped
> p_ipmi_fencing_2        (stonith:fence_ipmilan):        Started mici-admin
> Clone Set: clusterfs-clone [clusterfs]
>      Started: [ mici-admin ]
>      Stopped: [ mici-admin2 ]
> Clone Set: clvmd-clone [clvmd]
>      clvmd      (lsb:clvmd):    FAILED mici-admin
>      Stopped: [ mici-admin2 ]
> Clone Set: dlm-clone [dlm]
>      Started: [ mici-admin mici-admin2 ]
>
> Migration summary:
> * Node mici-admin:
>    clvmd: migration-threshold=1000000 fail-count=1 last-failure='Tue Dec 31 17:04:29 2013'
> * Node mici-admin2:
>
> Failed actions:
>     clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60, status=Timed Out,
>     last-rc-change='Tue Dec 31 17:04:29 2013', queued=0ms, exec=0ms
>
-- 
this is my life and I live it for as long as God wills