[ClusterLabs] ringid interface FAULTY no resource move

Wed May 4 10:19:59 EDT 2016

On 05/04/2016 07:14 AM, Rafał Sanocki wrote:
> Hello,
> I can't find what I did wrong. I have a 2-node cluster: Corosync, Pacemaker,
> DRBD. When I unplug the cable, nothing happens.
> 
> Corosync.conf
> 
> # Please read the corosync.conf.5 manual page
> totem {
>         version: 2
>         crypto_cipher: none
>         crypto_hash: none
>         rrp_mode: passive
> 
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 172.17.10.0
>                 mcastport: 5401
>                 ttl: 1
>         }
>         interface {
>                 ringnumber: 1
>                 bindnetaddr: 255.255.255.0
>                 mcastport: 5409
>                 ttl: 1
>         }

255.255.255.0 is not a valid bindnetaddr. bindnetaddr should be the IP
network address (not netmask) of the desired interface.

Also, the point of rrp is to have two redundant network links, so
unplugging one shouldn't cause problems (or move any resources) as long as
the other is still up.
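
As a rough illustration of what "network address" means here, the Python
sketch below derives it from an interface address plus prefix using the
standard ipaddress module. The helper name bind_net_addr is made up for this
example, and the /24 prefix for ring 0 is an assumption based on the netmask
used elsewhere in the quoted configuration.

```python
import ipaddress


def bind_net_addr(interface_cidr: str) -> str:
    """Return the network address for an interface given as 'address/prefix'.

    Hypothetical helper for illustration; not part of corosync.
    """
    iface = ipaddress.ip_interface(interface_cidr)
    # .network is the enclosing subnet; .network_address is its first address,
    # which is the kind of value bindnetaddr expects.
    return str(iface.network.network_address)


# Ring 0 address from the quoted nodelist, assuming a /24 prefix.
print(bind_net_addr("172.17.10.81/24"))
```

For the ring 0 address this gives 172.17.10.0, which matches the bindnetaddr
already used for ring 0 above; the same calculation applied to the real
address of the second interface would give the value to use for ring 1.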

> 
>         transport: udpu
> }
> 
> logging {
>         fileline: off
>         to_logfile: yes
>         to_syslog: yes
>         logfile: /var/log/cluster/corosync.log
>         debug: off
>         timestamp: on
>         logger_subsys {
>                 subsys: QUORUM
>                 debug: off
>         }
> }
> 
> nodelist {
>         node {
>                 ring0_addr: 172.17.10.81
>                 ring1_addr: 255.255.255.1
>                 nodeid: 1
>         }
>         node {
>                 ring0_addr: 172.17.10.89
>                 ring1_addr: 255.255.255.9
>                 nodeid: 2
>         }
> 
> }
> quorum {
>         # Enable and configure quorum subsystem (default: off)
>         # see also corosync.conf.5 and votequorum.5
>         provider: corosync_votequorum
> }
> 
> crm config
> 
> crm(live)configure# show
> node 1: cs01A
> node 2: cs01B
> primitive p_drbd2dev ocf:linbit:drbd \
>         params drbd_resource=b1 \
>         op monitor interval=29s role=Master \
>         op monitor interval=31s role=Slave \
>         meta target-role=Started
> primitive p_exportfs_fs2 exportfs \
>         params fsid=101 directory="/data1/b1" \
>         options="rw,sync,no_root_squash,insecure,anonuid=100,anongid=101,nohide" \
>         clientspec="172.17.10.0/255.255.255.0" wait_for_leasetime_on_stop=false \
>         op monitor interval=30s \
>         op start interval=0 timeout=240s \
>         op stop interval=0 timeout=100s \
>         meta target-role=Started
> primitive p_ip_2 IPaddr2 \
>         params ip=172.17.10.97 nic=neteth0 cidr_netmask=24 \
>         op monitor interval=30s timeout=5s \
>         meta target-role=Started
> primitive p_mount_fs2 Filesystem \
>         params fstype=reiserfs options="noatime,nodiratime,notail" \
>         directory="/data1" device="/dev/drbd2" \
>         op start interval=0 timeout=400s \
>         op stop interval=0 timeout=100s \
>         op monitor interval=30s \
>         meta target-role=Started
> group g_nfs2 p_ip_2 p_mount_fs2 p_exportfs_fs2
> ms ms_drbd2 p_drbd2dev \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
>         notify=true is-managed=true target-role=Slave
> colocation co_drbd2 inf: g_nfs2 ms_drbd2:Master
> order ms_drbd2_order Mandatory: ms_drbd2:promote g_nfs2:start
> property cib-bootstrap-options: \
>         stonith-enabled=false \
>         have-watchdog=true \
>         dc-version=1.1.14-535193a \
>         cluster-infrastructure=corosync \
>         maintenance-mode=false \
>         no-quorum-policy=ignore \
>         last-lrm-refresh=1460627538
> 
> 
> # ip addr show
> neteth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq portid
> d8d385bda90c state DOWN group default qlen 1000
>     link/ether d8:d3:85:aa:aa:aa brd ff:ff:ff:ff:ff:ff
>     inet 255.255.255.1/24 brd 255.255.255.255 scope global neteth1
>        valid_lft forever preferred_lft forever
> 
> # corosync-cfgtool -s
> Printing ring status.
> Local node ID 1
> RING ID 0
>         id      = 172.17.10.81
>         status  = ring 0 active with no faults
> RING ID 1
>         id      = 255.255.255.1
>         status  = Marking ringid 1 interface 255.255.255.1 FAULTY
> 
> # crm_mon -A
> 
> Stack: corosync
> Current DC: csb01A (version 1.1.14-535193a) - partition with quorum
> Last updated: Wed May  4 14:11:34 2016          Last change: Thu Apr 14
> 13:06:15 2016 by root via crm_resource on csb01B
> 
> 2 nodes and 5 resources configured: 2 resources DISABLED and 0 BLOCKED
> from being started due to failures
> 
> Online: [ cs01A cs01B ]
> 
>  Resource Group: g_nfs2
>      p_ip_2     (ocf::heartbeat:IPaddr2):       Started csb01A
>      p_mount_fs2        (ocf::heartbeat:Filesystem):    Started csb01A
>      p_exportfs_fs2     (ocf::heartbeat:exportfs):      Started csb01A
>  Master/Slave Set: ms_drbd2 [p_drbd2dev]
>      Masters: [ csb01A ]
>      Slaves (target-role): [ csb01B ]
> 
> Node Attributes:
> * Node csb01A:
>     + master-p_drbd2dev                 : 10000
> * Node csb01B:
>     + master-p_drbd2dev                 : 1000
>
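
To check ring health from a script, the corosync-cfgtool -s output quoted
above could be scanned for rings marked FAULTY. This is only a sketch: it
assumes the output looks exactly like the sample in this thread (a "RING ID n"
line followed by "id" and "status" lines), and faulty_rings is a hypothetical
helper, not part of any corosync tooling.

```python
import re


def faulty_rings(cfgtool_output: str) -> list:
    """Return the ring numbers whose status line contains FAULTY."""
    faulty = []
    current = None  # ring number of the block being read
    for raw in cfgtool_output.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            current = int(match.group(1))
        elif line.startswith("status") and "FAULTY" in line and current is not None:
            faulty.append(current)
    return faulty


# Sample copied from the output quoted in this thread.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
        id      = 172.17.10.81
        status  = ring 0 active with no faults
RING ID 1
        id      = 255.255.255.1
        status  = Marking ringid 1 interface 255.255.255.1 FAULTY
"""

print(faulty_rings(SAMPLE))
```

With the sample above, only ring 1 is reported. In practice the text would
come from running the command on each node rather than from a hard-coded
string.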