[Pacemaker] host came online, but is ignored

Andrew Beekhof beekhof at gmail.com
Wed Mar 18 11:42:17 EDT 2009


On Wed, Mar 18, 2009 at 13:29, Juha Heinanen <jh at tutpro.com> wrote:
> i kept on testing the example configuration and found a failure situation
> when i rebooted the host (lenny1) that was online, but was not master.
>
> starting situation:
>
> root at lenny2:~# crm_mon -1
>
> ============
> Last updated: Wed Mar 18 14:12:09 2009
> Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325)
> Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
> 2 Nodes configured.
> 2 Resources configured.
> ============
>
> Node: lenny1 (8df8447f-6ecf-41a7-a131-c89fd59a120d): OFFLINE
> Node: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325): online
>
> Master/Slave Set: ms-drbd0
>    drbd0:0     (ocf::heartbeat:drbd):  Stopped
>    drbd0:1     (ocf::heartbeat:drbd):  Master lenny2
> Resource Group: mysql-group
>    fs0 (ocf::heartbeat:Filesystem):    Started lenny2
>    mysql-server        (lsb:mysql):    Started lenny2
>    virtual-ip  (ocf::heartbeat:IPaddr2):       Started lenny2
>
> lenny1 is powered off and resources are running ok on lenny2.
>
> then i power on lenny1 and expect the node to show up as "online".
> but it doesn't.  below is what came to the log.  any ideas why lenny1 is
> ignored?  this is with version 2.99.2 of heartbeat.

crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring
HA message (op=join_announce) from lenny1: not in our membership list
(size=1)

apparently some part of the cluster is under the impression that lenny1 is
not a member.
there's not enough information to decide which part, but from what i
see above, i suspect it's a CCM (membership layer) issue.
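One way to narrow down "which part" is to line up, for the rejoining peer, the components that report it as up (heartbeat link/status, the `crm_update_peer_proc ... is now online` notices) against the components that drop its messages with "not in our membership". In the log below, heartbeat and the process notices all say lenny1 is up while crmd and cib keep rejecting it with a membership list of size 1, which is consistent with the membership layer, rather than the messaging layer, being the odd one out. The following is a minimal Python sketch of that comparison, assuming the `daemon[pid]: YYYY/MM/DD_HH:MM:SS level: message` line format shown in this thread; `summarize_peer` is a hypothetical helper, not part of heartbeat or Pacemaker.

```python
import re
from collections import Counter

# Assumed format of heartbeat/Pacemaker 1.0-era log lines, e.g.
#   crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring ...
LINE_RE = re.compile(
    r"^(?P<daemon>\w+)\[(?P<pid>\d+)\]: (?P<ts>\S+) (?P<level>\w+): (?P<msg>.*)$"
)

# Messages that indicate some component considers the peer up.
UP_RE = re.compile(r"status \[?(up|active)\]?|is now online|Link .* up")


def summarize_peer(lines, peer):
    """Return (up_signals, rejections) for `peer`.

    up_signals: list of (timestamp, daemon, message) saying the peer is up.
    rejections: Counter of "not in our membership" drops, keyed by daemon.
    Note: the peer match is a plain substring test, so "lenny1" would also
    match a hypothetical "lenny10".
    """
    up_signals = []
    rejections = Counter()
    for raw in lines:
        # Strip mail-quote markers ("> ") and trailing newlines.
        line = raw.lstrip("> ").rstrip()
        m = LINE_RE.match(line)
        if not m or peer not in m.group("msg"):
            continue
        msg = m.group("msg")
        if "not in our membership" in msg:
            rejections[m.group("daemon")] += 1
        elif UP_RE.search(msg):
            up_signals.append((m.group("ts"), m.group("daemon"), msg))
    return up_signals, rejections
```

Called as `summarize_peer(open(logfile), "lenny1")`, where `logfile` is whatever file heartbeat is configured to log to, it would list heartbeat's "status up/active" and the ais/cib/crmd "now online" notices alongside a count of dropped messages per daemon (crmd and cib here). A non-empty set on both sides for the same peer is the disagreement described above.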


>
> -- juha
>
> this is when lenny1 was powered off:
>
> root at lenny2:~# heartbeat[1831]: 2009/03/18_14:12:32 WARN: node lenny1: is dead
> heartbeat[1831]: 2009/03/18_14:12:32 info: Link lenny1:eth1 dead.
> crmd[1923]: 2009/03/18_14:12:32 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [dead] (DC=true)
> crmd[1923]: 2009/03/18_14:12:32 info: crm_update_peer_proc: lenny1.ais is now offline
> crmd[1923]: 2009/03/18_14:12:32 info: te_graph_trigger: Transition 12 is now complete
> crmd[1923]: 2009/03/18_14:12:32 info: notify_crmd: Transition 12 status: done - <null>
>
> and now it is powered on again:
>
> heartbeat[1831]: 2009/03/18_14:12:56 info: Heartbeat restart on node lenny1
> heartbeat[1831]: 2009/03/18_14:12:56 info: Link lenny1:eth1 up.
> heartbeat[1831]: 2009/03/18_14:12:56 info: Status update for node lenny1: status init
> heartbeat[1831]: 2009/03/18_14:12:56 info: Status update for node lenny1: status up
> crmd[1923]: 2009/03/18_14:12:56 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [init] (DC=true)
> crmd[1923]: 2009/03/18_14:12:56 info: crm_update_peer_proc: lenny1.ais is now online
> crmd[1923]: 2009/03/18_14:12:56 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [up] (DC=true)
> heartbeat[1831]: 2009/03/18_14:13:26 info: Status update for node lenny1: status active
> crmd[1923]: 2009/03/18_14:13:26 notice: crmd_ha_status_callback: Status update: Node lenny1 now has status [active] (DC=true)
> cib[1919]: 2009/03/18_14:13:26 info: cib_client_status_callback: Status update: Client lenny1/cib now has status [join]
> cib[1919]: 2009/03/18_14:13:26 info: crm_update_peer_proc: lenny1.cib is now online
> heartbeat[1831]: 2009/03/18_14:13:30 WARN: 1 lost packet(s) for [lenny1] [55:57]
> heartbeat[1831]: 2009/03/18_14:13:30 info: No pkts missing from lenny1!
> crmd[1923]: 2009/03/18_14:13:30 notice: crmd_client_status_callback: Status update: Client lenny1/crmd now has status [online] (DC=true)
> crmd[1923]: 2009/03/18_14:13:30 info: crm_update_peer_proc: lenny1.crmd is now online
> heartbeat[1831]: 2009/03/18_14:13:31 WARN: 1 lost packet(s) for [lenny1] [59:61]
> heartbeat[1831]: 2009/03/18_14:13:31 info: No pkts missing from lenny1!
> crmd[1923]: 2009/03/18_14:13:33 WARN: crmd_ha_msg_callback: Ignoring HA message (op=join_announce) from lenny1: not in our membership list (size=1)
> crmd[1923]: 2009/03/18_14:13:43 WARN: crmd_ha_msg_callback: Ignoring HA message (op=vote) from lenny1: not in our membership list (size=1)
> cib[1919]: 2009/03/18_14:13:46 WARN: cib_peer_callback: Discarding cib_slave_all message (50) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:47 WARN: cib_peer_callback: Discarding cib_replace message (54) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:48 WARN: cib_peer_callback: Discarding cib_apply_diff message (58) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:50 WARN: cib_peer_callback: Discarding cib_apply_diff message (5c) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (5e) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (5f) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (60) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (61) from lenny1: not in our membership
> cib[1919]: 2009/03/18_14:13:51 WARN: cib_peer_callback: Discarding cib_apply_diff message (62) from lenny1: not in our membership
> heartbeat[1831]: 2009/03/18_14:16:01 info: all clients are now paused
> cib[1919]: 2009/03/18_14:16:27 info: cib_stats: Processed 32 operations (19062.00us average, 0% utilization) in the last 10min
> heartbeat[1831]: 2009/03/18_14:18:02 WARN: Message hist queue is filling up (376 messages in queue)
> heartbeat[1831]: 2009/03/18_14:18:03 WARN: Message hist queue is filling up (377 messages in queue)
> ...
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
