[Pacemaker] crm_mon on Node-2 shows both Node-1 & Node-2 as online but crm_mon on Node-1 shows Node-2 as offline

Parshvi parshvi.17 at gmail.com
Thu Apr 19 08:56:11 EDT 2012


1) What is the use of ssh without a pass key between cluster nodes in Pacemaker?
  a. Use case:
    i. Two nodes in a cluster (Call them Node-1 and Node-2)
    ii. One interface configured in corosync.conf for its heartbeat or 
messaging. E.g. bindnetaddr: 192.168.10.0
    iii. Another interface configured in /etc/hosts for hostname resolution. 
    Eg. IP: 192.168.129.10 Hostname: Node-1
    Eg. IP: 192.168.129.11 Hostname: Node-2
    iv. Hence, for all ssh communication between the two nodes, the hostname 
resolves to an address on the 192.168.129.x subnet.
    v. 12 services configured in active/passive mode
    vi. 1 service configured in master/slave mode
    vii. 8 services are non-sticky (they failback) in active/passive
    viii. 4 services are sticky (do not failback) in active/passive
    ix. Distribution: Node-1 is preferred for 8 services (of which 4 are 
non-sticky); Node-2 is preferred for the remaining 4 of the 12 services (all 4 
non-sticky).
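
The address layout in the use case above can be sketched as follows. This is a minimal illustration, not part of the cluster configuration: the /24 netmask and the helper name on_corosync_net are assumptions, since the post gives only the network address and the two host addresses.

```python
import ipaddress

# Assumption: a /24 netmask. The post gives only the bind network address.
COROSYNC_NET = ipaddress.ip_network("192.168.10.0/24")

# Hostname entries as described for /etc/hosts in the post.
HOSTS = {"Node-1": "192.168.129.10", "Node-2": "192.168.129.11"}


def on_corosync_net(ip, net=COROSYNC_NET):
    """Return True if ip lies inside the corosync bind network."""
    return ipaddress.ip_address(ip) in net


for name, ip in HOSTS.items():
    # Both hostnames resolve to addresses outside the corosync network.
    print(name, ip, "on corosync network:", on_corosync_net(ip))
```

With these values, neither hostname resolves to an address on the network that corosync is bound to, which is the situation items ii to iv describe.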

  b. Observations:
    i. On Node-2, the interface carrying IP 192.168.129.11 (hostname Node-2) 
was down.
    ii. On Node-1 all interfaces were up.
    iii. The interface used by corosync for heartbeat/messaging was up at all 
times (bindnetaddr: 192.168.10.0)
    iv. In crm_mon: Node-1 sees Node-2 as offline
        cibadmin --query fails to work (remote node did not respond)
    v. In crm_mon: Node-2 sees Node-1 as online
    vi. All the services were seen active on Node-1 (including those that were 
preferred for Node-2). Observed in crm_mon output.
    vii. The 4 services for which Node-2 was preferred were also seen active on 
Node-2 (hence 4 services were active on both nodes).
    Observed in crm_mon output: only these 4 services were shown as active. The 
status of the rest of the services, active on Node-1, was not reflected in 
crm_mon, even though crm_mon on Node-2 sees Node-1 as “online”.
  c. Errors in log file:
    i. On Node-2:
      1. Resource ocf::RscRA:rsc appears to be active on 2 nodes
      2. The above error appears for all the resources configured in pacemaker.


Query:
1) For what purpose does Pacemaker require “ssh without a pass key” to be 
enabled between the nodes in a cluster ?
2) For what purpose does Pacemaker use the node “hostname”? How does the node 
“hostname” come into the picture?
3) Let’s say in a two node cluster two communication paths are available between 
the two nodes. 
  a. eth1 and eth2.
  b. The hostname of the node resolves to the IP address on eth1.
  c. Suppose eth1 goes down (network cable disconnected).
  d. eth2 is up, but the hostname does not resolve to the IP on eth2 (it 
resolves to the eth1 address).
  e. Will this hostname resolution cause any issue?
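
The scenario in query 3 can be modelled with a short sketch that reports which interface network a hostname resolves into. Everything here is illustrative: the interface-to-network mapping, the /24 netmasks, and the names resolved_network and stub_resolve are assumptions, and the stub resolver stands in for the /etc/hosts entries described earlier.

```python
import ipaddress
import socket

# Hypothetical mapping of interfaces to networks; /24 netmasks are assumed.
NETWORKS = {"eth1": "192.168.129.0/24", "eth2": "192.168.10.0/24"}


def resolved_network(hostname, networks, resolve=socket.gethostbyname):
    """Return the label of the network that hostname resolves into, or None."""
    ip = ipaddress.ip_address(resolve(hostname))
    for label, net in networks.items():
        if ip in ipaddress.ip_network(net):
            return label
    return None


def stub_resolve(hostname):
    # Stand-in for /etc/hosts lookups, using the addresses from the post.
    return {"Node-1": "192.168.129.10", "Node-2": "192.168.129.11"}[hostname]


# The hostname still resolves to the eth1 network, whether or not eth1 is up.
print(resolved_network("Node-2", NETWORKS, resolve=stub_resolve))  # prints eth1
```

On a real node, omitting the resolve argument uses the system resolver, so the result shows whether the hostname points at the interface that went down.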






