[Pacemaker] different behavior cibadmin -Ql with cman and corosync2

Andrey Groshev greenx at yandex.ru
Mon Sep 2 03:27:14 EDT 2013



30.08.2013, 07:18, "Andrew Beekhof" <andrew at beekhof.net>:
> On 29/08/2013, at 7:31 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>
>>  29.08.2013, 12:25, "Andrey Groshev" <greenx at yandex.ru>:
>>>  29.08.2013, 02:55, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>   On 28/08/2013, at 5:38 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>    28.08.2013, 04:06, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>    On 27/08/2013, at 1:13 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>     27.08.2013, 05:39, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>     On 26/08/2013, at 3:09 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>      26.08.2013, 03:34, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>      On 23/08/2013, at 9:39 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>       Hello,
>>>>>>>>>>>
>>>>>>>>>>>       Today I tried to remake my test cluster from cman to corosync2.
>>>>>>>>>>>       I noticed the following:
>>>>>>>>>>>       if I reset a cluster running cman with cibadmin --erase --force,
>>>>>>>>>>>       the node names still remain in the CIB.
>>>>>>>>>>      Yes, the cluster automatically puts back entries for all the nodes it knows about.
>>>>>>>>>>>       cibadmin -Ql
>>>>>>>>>>>       .....
>>>>>>>>>>>          <nodes>
>>>>>>>>>>>            <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>            <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>            <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>          </nodes>
>>>>>>>>>>>       ....
>>>>>>>>>>>
>>>>>>>>>>>       Even if cman and pacemaker are running on only one node.
>>>>>>>>>>      I'm assuming all three are configured in cluster.conf?
>>>>>>>>>      Yes, the list of nodes is there.
>>>>>>>>>>>       But if I do the same on a cluster with corosync2,
>>>>>>>>>>>       I see only the names of the nodes on which corosync and pacemaker are running.
>>>>>>>>>>      Since you've not included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>      If it did, you should get the same behaviour.
>>>>>>>>>      I tried both expected_node and a nodelist.
>>>>>>>>     And it didn't work? What version of pacemaker?
>>>>>>>     It does not work as I expected.
>>>>>>    That's because you've used IP addresses in the node list.
>>>>>>    ie.
>>>>>>
>>>>>>    node {
>>>>>>      ring0_addr: 10.76.157.17
>>>>>>    }
>>>>>>
>>>>>>    try including the node name as well, eg.
>>>>>>
>>>>>>    node {
>>>>>>      name: dev-cluster2-node2
>>>>>>      ring0_addr: 10.76.157.17
>>>>>>    }
>>>>>    The same thing.
>>>>   I don't know what to say.  I tested it here yesterday and it worked as expected.
>>>  I found the reason that you and I get different results: I did not have a reverse DNS zone for these nodes.
>>>  I know it should exist, but (PACEMAKER + CMAN) worked without a reverse zone!
>>  I was hasty. Deleted everything. Reinstalled. Reconfigured. It is not working again. Damn!
>
> It would have surprised me... pacemaker 1.1.11 doesn't do any dns lookups - reverse or otherwise.
> Can you set
>
>  PCMK_trace_files=corosync.c
>
> in your environment and retest?
>
> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>   export PCMK_trace_files=corosync.c
>
> It should produce additional logging[1] that will help diagnose the issue.
>
> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>

Hello, Andrew.

You misunderstood me a little.
I wrote that I had rushed to judgment.
After I created the reverse DNS zone, the cluster behaved correctly.
BUT after I took the cluster apart, dropped the configs, and brought up a new cluster,
the cluster again did not show all the nodes in <nodes> (only the node with pacemaker running).

A small portion of the log, the part in which (I thought) there is something interesting, follows.
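
Trace logging for corosync.c was enabled as suggested above. A minimal sketch of how, assuming the RHEL6 layout and the stock init script:

    # /etc/sysconfig/pacemaker
    export PCMK_trace_files=corosync.c

    # restart pacemaker so the setting takes effect
    service pacemaker restart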

Aug 30 12:31:11 [9986] dev-cluster2-node4        cib: (  corosync.c:423   )   trace: check_message_sanity:      Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:96    )   trace: corosync_node_name:        Checking 172793107 vs 0 from nodelist.node.0.nodeid
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (      ipcc.c:378   )   debug: qb_ipcc_disconnect:        qb_ipcc_disconnect()
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:134   )  notice: corosync_node_name:        Unable to get node name for nodeid 172793107
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (   cluster.c:338   )  notice: get_node_name:     Defaulting to uname -n for the local corosync node name
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (     attrd.c:651   )   debug: attrd_cib_callback:        Update 4 for probe_complete=true passed
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] HUP conn (9616-9989-27)
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] qb_ipcs_disconnect(9616-9989-27) state:2
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] epoll_ctl(del): Bad file descriptor (9)
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [MAIN  ] cs_ipcs_connection_closed()
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [CMAP  ] exit_fn for conn=0x7fa96bcb31b0
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [MAIN  ] cs_ipcs_connection_destroyed()
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:423   )   trace: check_message_sanity:      Verfied message 1: (dest=<all>:attrd, from=dev-cluster2-node4:attrd.9989, compressed=0, size=181, total=773)
Aug 30 12:31:42 [9984] dev-cluster2-node4 pacemakerd: (  mainloop.c:270   )    info: crm_signal_dispatch:       Invoking handler for signal 10: User defined signal 1
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (       ipc.c:307   )    info: crm_client_new:    Connecting 0x16c98e0 for uid=0 gid=0 pid=10007 id=f2f15044-8f76-4ea7-a714-984660619ae7
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: ( ipc_setup.c:476   )   debug: handle_new_connection:     IPC credentials authenticated (9986-10007-13)
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (   ipc_shm.c:294   )   debug: qb_ipcs_shm_connect:       connecting to client [10007]
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (ringbuffer.c:227   )   debug: qb_rb_open_2:      shm size:524288; real_size:524288; rb->word_size:131072
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (ringbuffer.c:227   )   debug: qb_rb_open_2:      shm size:524288; real_size:524288; rb->word_size:131072
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (ringbuffer.c:227   )   debug: qb_rb_open_2:      shm size:524288; real_size:524288; rb->word_size:131072
Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (        io.c:579   )   debug: activateCibXml:    Triggering CIB write for cib_erase op
Aug 30 12:31:59 [9991] dev-cluster2-node4       crmd: (te_callbacks:122   )   debug: te_update_diff:    Processing diff (cib_erase): 0.9.3 -> 0.11.1 (S_IDLE)
Aug 30 12:31:59 [9991] dev-cluster2-node4       crmd: (  te_utils.c:423   )    info: abort_transition_graph:    te_update_diff:126 - Triggered transition abort (complete=1, node=, tag=diff, id=(null), magic=NA, cib=0.11.1) : Non-status change

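Judging by the trace line "Checking 172793107 vs 0 from nodelist.node.0.nodeid", pacemaker compares the runtime nodeid against a nodeid that was never set in the nodelist (so it reads back as 0), and therefore cannot map the id to a name for any node that is not local. A guess rather than a confirmed fix: declaring an explicit name and nodeid for each node in corosync.conf might let that lookup succeed (the nodeid values below are illustrative):

    nodelist {
            node {
                    name: dev-cluster2-node2
                    nodeid: 1
                    ring0_addr: 10.76.157.17
            }
            node {
                    name: dev-cluster2-node3
                    nodeid: 2
                    ring0_addr: 10.76.157.18
            }
            node {
                    name: dev-cluster2-node4
                    nodeid: 3
                    ring0_addr: 10.76.157.19
            }
    }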


>>>>>    # corosync-cmapctl |grep nodelist
>>>>>    nodelist.local_node_pos (u32) = 2
>>>>>    nodelist.node.0.name (str) = dev-cluster2-node2
>>>>>    nodelist.node.0.ring0_addr (str) = 10.76.157.17
>>>>>    nodelist.node.1.name (str) = dev-cluster2-node3
>>>>>    nodelist.node.1.ring0_addr (str) = 10.76.157.18
>>>>>    nodelist.node.2.name (str) = dev-cluster2-node4
>>>>>    nodelist.node.2.ring0_addr (str) = 10.76.157.19
>>>>>
>>>>>    # corosync-quorumtool -s
>>>>>    Quorum information
>>>>>    ------------------
>>>>>    Date:             Wed Aug 28 11:29:49 2013
>>>>>    Quorum provider:  corosync_votequorum
>>>>>    Nodes:            1
>>>>>    Node ID:          172793107
>>>>>    Ring ID:          52
>>>>>    Quorate:          No
>>>>>
>>>>>    Votequorum information
>>>>>    ----------------------
>>>>>    Expected votes:   3
>>>>>    Highest expected: 3
>>>>>    Total votes:      1
>>>>>    Quorum:           2 Activity blocked
>>>>>    Flags:
>>>>>
>>>>>    Membership information
>>>>>    ----------------------
>>>>>       Nodeid      Votes Name
>>>>>    172793107          1 dev-cluster2-node4 (local)
>>>>>
>>>>>    # cibadmin -Q
>>>>>    <cib epoch="25" num_updates="3" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Wed Aug 28 11:24:06 2013" update-origin="dev-cluster2-node4" update-client="crmd" have-quorum="0" dc-uuid="172793107">
>>>>>     <configuration>
>>>>>       <crm_config>
>>>>>         <cluster_property_set id="cib-bootstrap-options">
>>>>>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-1.el6-4f672bc"/>
>>>>>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>>>>>         </cluster_property_set>
>>>>>       </crm_config>
>>>>>       <nodes>
>>>>>         <node id="172793107" uname="dev-cluster2-node4"/>
>>>>>       </nodes>
>>>>>       <resources/>
>>>>>       <constraints/>
>>>>>     </configuration>
>>>>>     <status>
>>>>>       <node_state id="172793107" uname="dev-cluster2-node4" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
>>>>>         <lrm id="172793107">
>>>>>           <lrm_resources/>
>>>>>         </lrm>
>>>>>         <transient_attributes id="172793107">
>>>>>           <instance_attributes id="status-172793107">
>>>>>             <nvpair id="status-172793107-probe_complete" name="probe_complete" value="true"/>
>>>>>           </instance_attributes>
>>>>>         </transient_attributes>
>>>>>       </node_state>
>>>>>     </status>
>>>>>    </cib>
>>>>>>>     I figured out a way to get around this, but it would be easier if the CIB behaved the same way as with CMAN.
>>>>>>>     I simply do not start the main resource if the attribute is not defined or is not true.
>>>>>>>     This slightly changes the logic of the cluster,
>>>>>>>     but I'm not sure which behavior is correct.
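>>>>>>>     As a sketch of such a guard (the resource and attribute names here are hypothetical), a location rule can keep the resource stopped unless the attribute is defined and true:
>>>>>>>
>>>>>>>     <rsc_location id="main-requires-flag" rsc="main">
>>>>>>>       <rule id="main-requires-flag-rule" score="-INFINITY" boolean-op="or">
>>>>>>>         <expression id="flag-not-defined" attribute="my_flag" operation="not_defined"/>
>>>>>>>         <expression id="flag-not-true" attribute="my_flag" operation="ne" value="true"/>
>>>>>>>       </rule>
>>>>>>>     </rsc_location>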
>>>>>>>
>>>>>>>     libqb 0.14.4
>>>>>>>     corosync 2.3.1
>>>>>>>     pacemaker 1.1.11
>>>>>>>
>>>>>>>     All built from source within the previous week.
>>>>>>>>>      Now in corosync.conf:
>>>>>>>>>
>>>>>>>>>      totem {
>>>>>>>>>             version: 2
>>>>>>>>>             crypto_cipher: none
>>>>>>>>>             crypto_hash: none
>>>>>>>>>             interface {
>>>>>>>>>                     ringnumber: 0
>>>>>>>>>                     bindnetaddr: 10.76.157.18
>>>>>>>>>                     mcastaddr: 239.94.1.56
>>>>>>>>>                     mcastport: 5405
>>>>>>>>>                     ttl: 1
>>>>>>>>>             }
>>>>>>>>>      }
>>>>>>>>>      logging {
>>>>>>>>>             fileline: off
>>>>>>>>>             to_stderr: no
>>>>>>>>>             to_logfile: yes
>>>>>>>>>             logfile: /var/log/cluster/corosync.log
>>>>>>>>>             to_syslog: yes
>>>>>>>>>             debug: on
>>>>>>>>>             timestamp: on
>>>>>>>>>             logger_subsys {
>>>>>>>>>                     subsys: QUORUM
>>>>>>>>>                     debug: on
>>>>>>>>>             }
>>>>>>>>>      }
>>>>>>>>>      quorum {
>>>>>>>>>             provider: corosync_votequorum
>>>>>>>>>      }
>>>>>>>>>      nodelist {
>>>>>>>>>             node {
>>>>>>>>>                     ring0_addr: 10.76.157.17
>>>>>>>>>             }
>>>>>>>>>             node {
>>>>>>>>>                     ring0_addr: 10.76.157.18
>>>>>>>>>             }
>>>>>>>>>             node {
>>>>>>>>>                     ring0_addr: 10.76.157.19
>>>>>>>>>             }
>>>>>>>>>      }
>>>>>>>>>