[Pacemaker] crmd - set dc unset dc loop

Andreas Kurz andreas at hastexo.com
Thu Jan 19 08:38:55 EST 2012


Hello,

On 01/19/2012 12:21 PM, Philippe Carbonnier wrote:
> Hello Andreas, and others,
> 
> I'm sorry not having well explain my problem (I'm french and don't speak
> english fluently...).

np .. I'm also not a native speaker ;-)

> The loop has arrived after putting ujoss1 (not ujboss2) standy, this is
> why I put more log for ujboss1 and not ujboss2.
> 
> I'll try to answer your questions :
> the network setup is simple : ujboss1 ans ujboss2 are on the same lan,
> without firewall between. Perhaps it's not what you was looking for?

hmm ... looks really like a short split brain ... default timeouts in
corosync are quite short at 1000ms for token lost ... try e.g. "token:
5000" in the "totem" section of your corosync config.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> corosync.conf :
> aisexec {
>         user:   root
>         group:  root
> }
> service {
>         name: pacemaker
>         ver:  0
> }
> compatibility: whitetank
> totem {
>         version: 2
>         secauth: on
>         threads: 0
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 128.1.0.0
>                 mcastaddr: 226.94.1.1
>                 mcastport: 5405
>         }
> }
> logging {
>         fileline: off
>         to_stderr: no
>         to_logfile: yes
>         to_syslog: no
>         logfile: /var/log/cluster/corosync.log
>         debug: off
>         timestamp: on
>         logger_subsys {
>                 subsys: AMF
>                 debug: off
>         }
> }
> amf {
>         mode: disabled
> }
> 
> and crm configure show :
> node ujbossterrena1 \
>         attributes standby="off"
> node ujbossterrena2 \
>         attributes standby="off"
> primitive clusterIP ocf:heartbeat:IPaddr2 \
>         params ip="128.1.13.9" cidr_netmask="32" iflabel="jbossfailover" \
>         op monitor interval="30s"
> primitive routing-jboss lsb:routing-jboss \
>         op monitor interval="30s"
> group vifGroup clusterIP routing-jboss \
>         meta target-role="Started"
> location prefer-clusterIP clusterIP 50: ujboss1
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="20"
> 
> 
> and the previous log for ujboss2:
> Jan 17 12:51:46 corosync [TOTEM ] A processor failed, forming new
> configuration.
> Jan 17 12:51:47 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 564: memb=1, new=0, lost=1
> Jan 17 12:51:47 corosync [pcmk  ] info: pcmk_peer_update: memb:
> ujbossterrena2 51184000
> Jan 17 12:51:47 corosync [pcmk  ] info: pcmk_peer_update: lost:
> ujbossterrena1 34406784
> Jan 17 12:51:47 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 564: memb=1, new=0, lost=0
> Jan 17 12:51:47 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> ujbossterrena2 51184000
> Jan 17 12:51:47 corosync [pcmk  ] info: ais_mark_unseen_peer_dead: Node
> ujbossterrena1 was not seen in the previous transition
> Jan 17 12:51:47 corosync [pcmk  ] info: update_member: Node
> 34406784/ujbossterrena1 is now: lost
> Jan 17 12:51:47 corosync [pcmk  ] info: send_member_notification:
> Sending membership update 564 to 2 children
> Jan 17 12:51:47 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jan 17 12:51:47 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: notice: ais_dispatch:
> Membership 564: quorum lost
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: crm_update_peer:
> Node ujbossterrena1: id=34406784 state=lost (new) addr=r(
> 0) ip(128.1.13.2)  votes=1 born=488 seen=556
> proc=00000000000000000000000000013312
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: WARN: check_dead_member:
> Our DC node (ujbossterrena1) left the cluster
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: notice: ais_dispatch:
> Membership 564: quorum lost
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: crm_update_peer: Node
> ujbossterrena1: id=34406784 state=lost (new) addr=r(0
> ) ip(128.1.13.2)  votes=1 born=488 seen=556
> proc=00000000000000000000000000013312
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition:
> State transition S_NOT_DC -> S_ELECTION [ input=I_ELE
> CTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC
> ujbossterrena1
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition:
> State transition S_ELECTION -> S_INTEGRATION [ input=
> I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_te_control:
> Registering TE UUID: 9a4e45a9-9436-4a23-b4e9-9875ba334906
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: WARN:
> cib_client_add_notify_callback: Callback already present
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: set_graph_functions:
> Setting custom graph functions
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: unpack_graph:
> Unpacked transition -1: 0 actions in 0 synapses
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_dc_takeover:
> Taking over DC status for this partition
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info:
> cib_process_readwrite: We are now in R/W mode
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_master for section 'all' (o
> rigin=local/crmd/73, version=0.233.5): ok (rc=0)
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (ori
> gin=local/crmd/74, version=0.233.5): ok (rc=0)
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_conf
> ig (origin=local/crmd/76, version=0.233.5): ok (rc=0)
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: join_make_offer:
> Making join offers based on membership 564
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_conf
> ig (origin=local/crmd/78, version=0.233.5): ok (rc=0)
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info:
> do_dc_join_offer_all: join-3: Waiting on 1 outstanding join acks
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: ais_dispatch:
> Membership 564: quorum still lost
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: crm_ais_dispatch:
> Setting expected votes to 2
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info:
> config_query_callback: Checking for expired actions every 900000ms
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info:
> config_query_callback: Sending expected-votes=2 to corosync
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_conf
> ig (origin=local/crmd/81, version=0.233.5): ok (rc=0)
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to
> ujbossterrena2 (3.0.1)
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: ais_dispatch:
> Membership 564: quorum still lost
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: crm_ais_dispatch:
> Setting expected votes to 2
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_conf
> ig (origin=local/crmd/84, version=0.233.5): ok (rc=0)
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition:
> State transition S_INTEGRATION -> S_FINALIZE_JOIN [ i
> nput=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_state_transition:
> All 1 cluster nodes responded to the join offer.
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: do_dc_join_finalize:
> join-3: Syncing the CIB from ujbossterrena2 to the re
> st of the cluster
> Jan 17 12:51:47 ujbossterrena2 crmd: [18796]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jan 17 12:51:47 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_sync for section 'all' (ori
> gin=local/crmd/85, version=0.233.5): ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: te_connect_stonith:
> Connected
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes (o
> rigin=local/crmd/86, version=0.233.5): ok (rc=0)
> Jan 17 12:51:48 corosync [pcmk  ] notice: pcmk_peer_update: Transitional
> membership event on ring 568: memb=1, new=0, lost=0
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: memb:
> ujbossterrena2 51184000
> Jan 17 12:51:48 corosync [pcmk  ] notice: pcmk_peer_update: Stable
> membership event on ring 568: memb=2, new=1, lost=0
> Jan 17 12:51:48 corosync [pcmk  ] info: update_member: Node
> 34406784/ujbossterrena1 is now: member
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: NEW: 
> ujbossterrena1 34406784
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> ujbossterrena1 34406784
> Jan 17 12:51:48 corosync [pcmk  ] info: pcmk_peer_update: MEMB:
> ujbossterrena2 51184000
> Jan 17 12:51:48 corosync [pcmk  ] info: send_member_notification:
> Sending membership update 568 to 2 children
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: notice: ais_dispatch:
> Membership 568: quorum acquired
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: ais_status_callback:
> status: ujbossterrena1 is now member (was lost)
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: crm_update_peer:
> Node ujbossterrena1: id=34406784 state=member (new) addr=
> r(0) ip(128.1.13.2)  votes=1 born=488 seen=568
> proc=00000000000000000000000000013312
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: notice: ais_dispatch:
> Membership 568: quorum acquired
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: crm_update_peer: Node
> ujbossterrena1: id=34406784 state=member (new) addr=r
> (0) ip(128.1.13.2)  votes=1 born=488 seen=568
> proc=00000000000000000000000000013312
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_delete for section //node_s
> tate[@uname='ujbossterrena1']/lrm (origin=local/crmd/87,
> version=0.233.6): ok (rc=0)
> Jan 17 12:51:48 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: crm_update_quorum:
> Updating quorum status to true (call=91)
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_delete for section //node_s
> tate[@uname='ujbossterrena1']/transient_attributes
> (origin=local/crmd/88, version=0.233.7): ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes (o
> rigin=local/crmd/89, version=0.233.7): ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: log_data_element:
> cib:diff: - <cib dc-uuid="ujbossterrena1" admin_epoch="0"
>  epoch="233" num_updates="7" />
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: log_data_element:
> cib:diff: + <cib dc-uuid="ujbossterrena2" admin_epoch="0"
>  epoch="234" num_updates="1" />
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (ori
> gin=local/crmd/91, version=0.234.1): ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: crm_ais_dispatch:
> Setting expected votes to 2
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info:
> erase_xpath_callback: Deletion of "//node_state[@uname='ujbossterrena1']/l
> rm": ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info:
> erase_xpath_callback: Deletion of "//node_state[@uname='ujbossterrena1']/t
> ransient_attributes": ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_conf
> ig (origin=local/crmd/93, version=0.234.1): ok (rc=0)
> Jan 17 12:51:48 ujbossterrena2 cib: [20766]: info: write_cib_contents:
> Archived previous version as /var/lib/heartbeat/crm/cib
> -88.raw
> Jan 17 12:51:48 ujbossterrena2 cib: [20766]: info: write_cib_contents:
> Wrote version 0.234.0 of the CIB to disk (digest: 62f91
> 5518fc318be93098daf9a5fecca)
> Jan 17 12:51:48 ujbossterrena2 cib: [20766]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib
> .4oDZVT (digest: /var/lib/heartbeat/crm/cib.i5iINe)
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: WARN: cib_process_diff:
> Diff 0.233.5 -> 0.234.1 not applied to 0.234.1: current "
> epoch" is greater than required
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_lrm_rsc_op:
> Performing key=8:8777:0:39671e48-9519-4b61-b781-2efcd379df7
> a op=clusterIP_start_0 )
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_lrm_rsc_op:
> Discarding attempt to perform action start on clusterIP in
> state S_FINALIZE_JOIN
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: send_direct_ack:
> ACK'ing resource op clusterIP_start_0 from 8:8777:0:39671
> e48-9519-4b61-b781-2efcd379df7a: lrm_invoke-lrmd-1326801108-25
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: process_te_message:
> Processing (N)ACK lrm_invoke-lrmd-1326801108-25 from u
> jbossterrena2
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: process_graph_event:
> Action clusterIP_start_0/8 (4:99;8:8777:0:39671e48-95
> 19-4b61-b781-2efcd379df7a) initiated by a different transitioner
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info:
> abort_transition_graph: process_graph_event:456 - Triggered transition abo
> rt (complete=1, tag=lrm_rsc_op, id=clusterIP_start_0,
> magic=4:99;8:8777:0:39671e48-9519-4b61-b781-2efcd379df7a) : Foreign even
> t
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC
> ujbossterrena2
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: WARN: do_log: FSA: Input
> I_RELEASE_DC from do_election_count_vote() received in
> state S_FINALIZE_JOIN
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_state_transition:
> State transition S_FINALIZE_JOIN -> S_RELEASE_DC [ in
> put=I_RELEASE_DC cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_dc_release: DC
> role released
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info:
> pe_connection_destroy: Connection to the Policy Engine released
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_te_control:
> Transitioner is now inactive
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_te_control:
> Disconnecting STONITH...
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info:
> tengine_stonith_connection_destroy: Fencing daemon disconnected
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: notice: Not currently
> connected.
> Jan 17 12:51:48 ujbossterrena2 crmd: [18796]: info: do_state_transition:
> State transition S_RELEASE_DC -> S_PENDING [ input=I_
> RELEASE_SUCCESS cause=C_FSA_INTERNAL origin=do_dc_release ]
> Jan 17 12:51:48 ujbossterrena2 cib: [18792]: info:
> cib_process_readwrite: We are now in R/O mode
> Jan 17 12:51:49 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to
> ujbossterrena1 (3.0.1)
> Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC
> ujbossterrena1
> Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to
> ujbossterrena1 (3.0.1)
> Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Unset DC
> ujbossterrena1
> Jan 17 12:51:49 ujbossterrena2 crmd: [18796]: info: update_dc: Set DC to
> ujbossterrena1 (3.0.1)
> 
> Best regards,
> Philippe
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 286 bytes
Desc: OpenPGP digital signature
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20120119/29b3a552/attachment-0003.sig>


More information about the Pacemaker mailing list