[Pacemaker] different behavior cibadmin -Ql with cman and corosync2

Andrew Beekhof andrew at beekhof.net
Tue Sep 3 00:20:30 EDT 2013


On 02/09/2013, at 5:27 PM, Andrey Groshev <greenx at yandex.ru> wrote:

> 
> 
> 30.08.2013, 07:18, "Andrew Beekhof" <andrew at beekhof.net>:
>> On 29/08/2013, at 7:31 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>> 
>>>  29.08.2013, 12:25, "Andrey Groshev" <greenx at yandex.ru>:
>>>>  29.08.2013, 02:55, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>   On 28/08/2013, at 5:38 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>    28.08.2013, 04:06, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>    On 27/08/2013, at 1:13 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>     27.08.2013, 05:39, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>     On 26/08/2013, at 3:09 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>      26.08.2013, 03:34, "Andrew Beekhof" <andrew at beekhof.net>:
>>>>>>>>>>>      On 23/08/2013, at 9:39 PM, Andrey Groshev <greenx at yandex.ru> wrote:
>>>>>>>>>>>>       Hello,
>>>>>>>>>>>> 
>>>>>>>>>>>>       Today I tried to rebuild my test cluster from cman to corosync2,
>>>>>>>>>>>>       and I noticed the following:
>>>>>>>>>>>>       if I reset the cluster with cman via cibadmin --erase --force,
>>>>>>>>>>>>       the node names still exist in the CIB.
>>>>>>>>>>>      Yes, the cluster puts back entries for all the nodes it knows about automagically.
>>>>>>>>>>>>       cibadmin -Ql
>>>>>>>>>>>>       .....
>>>>>>>>>>>>          <nodes>
>>>>>>>>>>>>            <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>>            <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>>            <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>>          </nodes>
>>>>>>>>>>>>       ....
>>>>>>>>>>>> 
>>>>>>>>>>>>       Even if cman and pacemaker are running on only one node.
>>>>>>>>>>>      I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>      Yes, the list of nodes is there.
>>>>>>>>>>>>       And if I do the same on a cluster with corosync2,
>>>>>>>>>>>>       I see only the names of the nodes which are running corosync and pacemaker.
>>>>>>>>>>>      Since you've not included your config, I can only guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>      If it did, you should get the same behaviour.
>>>>>>>>>>      I tried both expected_votes and a nodelist.
>>>>>>>>>     And it didn't work? What version of pacemaker?
>>>>>>>>     It does not work as I expected.
>>>>>>>    That's because you've used IP addresses in the node list,
>>>>>>>    i.e.
>>>>>>> 
>>>>>>>    node {
>>>>>>>      ring0_addr: 10.76.157.17
>>>>>>>    }
>>>>>>> 
>>>>>>>    try including the node name as well, e.g.
>>>>>>> 
>>>>>>>    node {
>>>>>>>      name: dev-cluster2-node2
>>>>>>>      ring0_addr: 10.76.157.17
>>>>>>>    }
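>>>>>>> 
>>>>>>>    Once corosync has re-read the config, something like this should
>>>>>>>    confirm the name made it into cmap (a hypothetical session, your
>>>>>>>    values will differ):
>>>>>>> 
>>>>>>>    # corosync-cmapctl | grep nodelist.node.0
>>>>>>>    nodelist.node.0.name (str) = dev-cluster2-node2
>>>>>>>    nodelist.node.0.ring0_addr (str) = 10.76.157.17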
>>>>>>    The same thing.
>>>>>   I don't know what to say.  I tested it here yesterday and it worked as expected.
>>>>  I found the reason you and I get different results: I did not have a reverse DNS zone for these nodes.
>>>>  I know there should be one, but (PACEMAKER + CMAN) worked without a reverse zone!
>>>  I spoke too soon. Deleted everything. Reinstalled. Reconfigured. Not working again. Damn!
>> 
>> It would have surprised me... pacemaker 1.1.11 doesn't do any dns lookups - reverse or otherwise.
>> Can you set
>> 
>>  PCMK_trace_files=corosync.c
>> 
>> in your environment and retest?
>> 
>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>>   export PCMK_trace_files=corosync.c
>> 
>> It should produce additional logging[1] that will help diagnose the issue.
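>> 
>> For example, something like this should pull out the new trace entries
>> afterwards (assuming the logfile path from your corosync.conf):
>> 
>>   grep -E 'corosync_node_name|get_node_name' /var/log/cluster/corosync.log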
>> 
>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>> 
> 
> Hello, Andrew.
> 
> You misunderstood me a little.

No, I understood you fine.

> I wrote that I rushed to judgment.
> After I set up the reverse DNS zone, the cluster behaved correctly.
> BUT after I took the cluster apart, dropped the configs and restarted it as a new cluster,
> the cluster again didn't show all the nodes in <nodes> (only the node with pacemaker running).
> 
> A small portion of the log, in which (I thought) there is something interesting.
> Full log: 
> 
> Aug 30 12:31:11 [9986] dev-cluster2-node4        cib: (  corosync.c:423   )   trace: check_message_sanity:      Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:96    )   trace: corosync_node_name:        Checking 172793107 vs 0 from nodelist.node.0.nodeid
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (      ipcc.c:378   )   debug: qb_ipcc_disconnect:        qb_ipcc_disconnect()
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:134   )  notice: corosync_node_name:        Unable to get node name for nodeid 172793107

I wonder if you need to include the nodeid too, i.e.

node {
 name: dev-cluster2-node2
 ring0_addr: 10.76.157.17
 nodeid: 2
}

I _thought_ that was implicit.  
Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only if explicitly defined in the config?
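
(As an aside: 172793107 looks like the nodeid corosync auto-derives from
the ring0 IPv4 address when none is configured. Treating 10.76.157.19 as
a 32-bit integer gives 10*2^24 + 76*2^16 + 157*2^8 + 19 = 172793107,
i.e. dev-cluster2-node4. Note also that the corosync-cmapctl output
below shows no nodelist.node.X.nodeid keys, which fits the
"Checking 172793107 vs 0 from nodelist.node.0.nodeid" trace.)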


> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (   cluster.c:338   )  notice: get_node_name:     Defaulting to uname -n for the local corosync node name
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (     attrd.c:651   )   debug: attrd_cib_callback:        Update 4 for probe_complete=true passed
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] HUP conn (9616-9989-27)
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] qb_ipcs_disconnect(9616-9989-27) state:2
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] epoll_ctl(del): Bad file descriptor (9)
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [MAIN  ] cs_ipcs_connection_closed()
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [CMAP  ] exit_fn for conn=0x7fa96bcb31b0
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [MAIN  ] cs_ipcs_connection_destroyed()
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
> Aug 30 12:31:11 [9615] dev-cluster2-node4 corosync debug   [QB    ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
> Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:423   )   trace: check_message_sanity:      Verfied message 1: (dest=<all>:attrd, from=dev-cluster2-node4:attrd.9989, compressed=0, size=181, total=773)
> Aug 30 12:31:42 [9984] dev-cluster2-node4 pacemakerd: (  mainloop.c:270   )    info: crm_signal_dispatch:       Invoking handler for signal 10: User defined signal 1
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (       ipc.c:307   )    info: crm_client_new:    Connecting 0x16c98e0 for uid=0 gid=0 pid=10007 id=f2f15044-8f76-4ea7-a714-984660619ae7
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: ( ipc_setup.c:476   )   debug: handle_new_connection:     IPC credentials authenticated (9986-10007-13)
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (   ipc_shm.c:294   )   debug: qb_ipcs_shm_connect:       connecting to client [10007]
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (ringbuffer.c:227   )   debug: qb_rb_open_2:      shm size:524288; real_size:524288; rb->word_size:131072
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (ringbuffer.c:227   )   debug: qb_rb_open_2:      shm size:524288; real_size:524288; rb->word_size:131072
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (ringbuffer.c:227   )   debug: qb_rb_open_2:      shm size:524288; real_size:524288; rb->word_size:131072
> Aug 30 12:31:59 [9986] dev-cluster2-node4        cib: (        io.c:579   )   debug: activateCibXml:    Triggering CIB write for cib_erase op
> Aug 30 12:31:59 [9991] dev-cluster2-node4       crmd: (te_callbacks:122   )   debug: te_update_diff:    Processing diff (cib_erase): 0.9.3 -> 0.11.1 (S_IDLE)
> Aug 30 12:31:59 [9991] dev-cluster2-node4       crmd: (  te_utils.c:423   )    info: abort_transition_graph:    te_update_diff:126 - Triggered transition abort (complete=1, node=, tag=diff, id=(null), magic=NA, cib=0.11.1) : Non-status change
> 
> 
> 
>>>>>>    # corosync-cmapctl |grep nodelist
>>>>>>    nodelist.local_node_pos (u32) = 2
>>>>>>    nodelist.node.0.name (str) = dev-cluster2-node2
>>>>>>    nodelist.node.0.ring0_addr (str) = 10.76.157.17
>>>>>>    nodelist.node.1.name (str) = dev-cluster2-node3
>>>>>>    nodelist.node.1.ring0_addr (str) = 10.76.157.18
>>>>>>    nodelist.node.2.name (str) = dev-cluster2-node4
>>>>>>    nodelist.node.2.ring0_addr (str) = 10.76.157.19
>>>>>> 
>>>>>>    # corosync-quorumtool -s
>>>>>>    Quorum information
>>>>>>    ------------------
>>>>>>    Date:             Wed Aug 28 11:29:49 2013
>>>>>>    Quorum provider:  corosync_votequorum
>>>>>>    Nodes:            1
>>>>>>    Node ID:          172793107
>>>>>>    Ring ID:          52
>>>>>>    Quorate:          No
>>>>>> 
>>>>>>    Votequorum information
>>>>>>    ----------------------
>>>>>>    Expected votes:   3
>>>>>>    Highest expected: 3
>>>>>>    Total votes:      1
>>>>>>    Quorum:           2 Activity blocked
>>>>>>    Flags:
>>>>>> 
>>>>>>    Membership information
>>>>>>    ----------------------
>>>>>>       Nodeid      Votes Name
>>>>>>    172793107          1 dev-cluster2-node4 (local)
>>>>>> 
>>>>>>    # cibadmin -Q
>>>>>>    <cib epoch="25" num_updates="3" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.7" cib-last-written="Wed Aug 28 11:24:06 2013" update-origin="dev-cluster2-node4" update-client="crmd" have-quorum="0" dc-uuid="172793107">
>>>>>>     <configuration>
>>>>>>       <crm_config>
>>>>>>         <cluster_property_set id="cib-bootstrap-options">
>>>>>>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-1.el6-4f672bc"/>
>>>>>>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
>>>>>>         </cluster_property_set>
>>>>>>       </crm_config>
>>>>>>       <nodes>
>>>>>>         <node id="172793107" uname="dev-cluster2-node4"/>
>>>>>>       </nodes>
>>>>>>       <resources/>
>>>>>>       <constraints/>
>>>>>>     </configuration>
>>>>>>     <status>
>>>>>>       <node_state id="172793107" uname="dev-cluster2-node4" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
>>>>>>         <lrm id="172793107">
>>>>>>           <lrm_resources/>
>>>>>>         </lrm>
>>>>>>         <transient_attributes id="172793107">
>>>>>>           <instance_attributes id="status-172793107">
>>>>>>             <nvpair id="status-172793107-probe_complete" name="probe_complete" value="true"/>
>>>>>>           </instance_attributes>
>>>>>>         </transient_attributes>
>>>>>>       </node_state>
>>>>>>     </status>
>>>>>>    </cib>
>>>>>>>>     I figured out a way to get around this, but it would be easier if the CIB worked as it does with CMAN.
>>>>>>>>     I just do not start the main resource if the attribute is not defined or is not true.
>>>>>>>>     This slightly changes the logic of the cluster,
>>>>>>>>     but I'm not sure which behaviour is correct.
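>>>>>>>> 
>>>>>>>>     For reference, a sketch of the kind of constraint I mean (the
>>>>>>>>     attribute name "probe_complete" and resource name "main" are
>>>>>>>>     placeholders here):
>>>>>>>> 
>>>>>>>>     <rsc_location id="main-requires-attr" rsc="main">
>>>>>>>>       <!-- keep the resource off any node where the attribute is
>>>>>>>>            undefined or not "true" -->
>>>>>>>>       <rule id="main-requires-attr-rule" score="-INFINITY" boolean-op="or">
>>>>>>>>         <expression id="e1" attribute="probe_complete" operation="not_defined"/>
>>>>>>>>         <expression id="e2" attribute="probe_complete" operation="ne" value="true"/>
>>>>>>>>       </rule>
>>>>>>>>     </rsc_location>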
>>>>>>>> 
>>>>>>>>     libqb 0.14.4
>>>>>>>>     corosync 2.3.1
>>>>>>>>     pacemaker 1.1.11
>>>>>>>> 
>>>>>>>>     All built from source in the previous week.
>>>>>>>>>>      Now in corosync.conf:
>>>>>>>>>> 
>>>>>>>>>>      totem {
>>>>>>>>>>             version: 2
>>>>>>>>>>             crypto_cipher: none
>>>>>>>>>>             crypto_hash: none
>>>>>>>>>>             interface {
>>>>>>>>>>                     ringnumber: 0
>>>>>>>>>>                     bindnetaddr: 10.76.157.18
>>>>>>>>>>                     mcastaddr: 239.94.1.56
>>>>>>>>>>                     mcastport: 5405
>>>>>>>>>>                     ttl: 1
>>>>>>>>>>             }
>>>>>>>>>>      }
>>>>>>>>>>      logging {
>>>>>>>>>>             fileline: off
>>>>>>>>>>             to_stderr: no
>>>>>>>>>>             to_logfile: yes
>>>>>>>>>>             logfile: /var/log/cluster/corosync.log
>>>>>>>>>>             to_syslog: yes
>>>>>>>>>>             debug: on
>>>>>>>>>>             timestamp: on
>>>>>>>>>>             logger_subsys {
>>>>>>>>>>                     subsys: QUORUM
>>>>>>>>>>                     debug: on
>>>>>>>>>>             }
>>>>>>>>>>      }
>>>>>>>>>>      quorum {
>>>>>>>>>>             provider: corosync_votequorum
>>>>>>>>>>      }
>>>>>>>>>>      nodelist {
>>>>>>>>>>             node {
>>>>>>>>>>                     ring0_addr: 10.76.157.17
>>>>>>>>>>             }
>>>>>>>>>>             node {
>>>>>>>>>>                     ring0_addr: 10.76.157.18
>>>>>>>>>>             }
>>>>>>>>>>             node {
>>>>>>>>>>                     ring0_addr: 10.76.157.19
>>>>>>>>>>             }
>>>>>>>>>>      }
