[Pacemaker] [Openais] Linux HA on debian sparc
Steven Dake
sdake at redhat.com
Tue Jun 7 09:08:43 EDT 2011
On 06/07/2011 04:44 AM, william felipe_welter wrote:
> Two more questions. Will the patch for the mmap calls go into mainline
> development for all architectures?
> Is there any problem if I send these patches to the Debian project?
>
These patches will go into the maintenance branches.
You can send them to whoever you like ;)
Regards
-steve
> 2011/6/3 Steven Dake <sdake at redhat.com>:
>> On 06/02/2011 08:16 PM, william felipe_welter wrote:
>>> Well,
>>>
>>> Now, with this patch, the pacemakerd process starts and brings up its
>>> other processes (crmd, lrmd, pengine, ...), but after pacemakerd
>>> forks, the forked pacemakerd process dies with "signal 10, Bus
>>> error". In the log, the Pacemaker processes (crmd, lrmd,
>>> pengine, ...) can't connect to the openais plugin (possibly because of
>>> the death of the pacemakerd process).
>>> This time, though, the forked pacemakerd generates a core dump when it dies.
>>>
>>> gdb -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986" -se
>>> /usr/sbin/pacemakerd :
>>> GNU gdb (GDB) 7.0.1-debian
>>> Copyright (C) 2009 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "sparc-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from /usr/sbin/pacemakerd...done.
>>> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib64/libuuid.so.1
>>> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
>>> Loaded symbols for /usr/lib/libcoroipcc.so.4
>>> Reading symbols from /usr/lib/libcpg.so.4...done.
>>> Loaded symbols for /usr/lib/libcpg.so.4
>>> Reading symbols from /usr/lib/libquorum.so.4...done.
>>> Loaded symbols for /usr/lib/libquorum.so.4
>>> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
>>> Loaded symbols for /usr/lib64/libcrmcommon.so.2
>>> Reading symbols from /usr/lib/libcfg.so.4...done.
>>> Loaded symbols for /usr/lib/libcfg.so.4
>>> Reading symbols from /usr/lib/libconfdb.so.4...done.
>>> Loaded symbols for /usr/lib/libconfdb.so.4
>>> Reading symbols from /usr/lib64/libplumb.so.2...done.
>>> Loaded symbols for /usr/lib64/libplumb.so.2
>>> Reading symbols from /usr/lib64/libpils.so.2...done.
>>> Loaded symbols for /usr/lib64/libpils.so.2
>>> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libbz2.so.1.0
>>> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib/libxslt.so.1
>>> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib/libxml2.so.2
>>> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libc.so.6
>>> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/librt.so.1
>>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libdl.so.2
>>> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib/libglib-2.0.so.0
>>> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib/libltdl.so.7
>>> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/ld-linux.so.2
>>> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libpthread.so.0
>>> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libm.so.6
>>> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
>>> Loaded symbols for /usr/lib/libz.so.1
>>> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libpcre.so.3
>>> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib/libnss_compat.so.2
>>> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libnsl.so.1
>>> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
>>> Loaded symbols for /lib/libnss_nis.so.2
>>> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib/libnss_files.so.2
>>> Core was generated by `pacemakerd'.
>>> Program terminated with signal 10, Bus error.
>>> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>>> 339 switch (dispatch_data->id) {
>>> (gdb) bt
>>> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>>> #1 0xf6f100f0 in ?? ()
>>> #2 0xf6f100f4 in ?? ()
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>
>>>
>>>
>>> I took a look at cpg.c and saw that dispatch_data is acquired
>>> by the coroipcc_dispatch_get function (defined in lib/coroipcc.c):
>>>
>>> do {
>>> error = coroipcc_dispatch_get (
>>> cpg_inst->handle,
>>> (void **)&dispatch_data,
>>> timeout);
>>>
>>>
>>>
>>
>> Try the recent patch sent to fix alignment.
>>
>> Regards
>> -steve
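A note on the alignment remark above: on strict-alignment CPUs such as SPARC, loading a multi-byte value from an address that is not suitably aligned typically raises SIGBUS, which matches the "signal 10, Bus error" in the backtrace. The actual patch is not part of this message, so the following is only a minimal sketch of the general technique, not the corosync fix. The struct msg_header type and the read_header and align_up helpers are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header; NOT the corosync wire format. */
struct msg_header {
    int32_t id;
    int32_t size;
};

/*
 * Copy a header out of a byte buffer at an arbitrary offset.
 * memcpy has no alignment requirement, so this is safe even when
 * buf + offset is not aligned for int32_t. Casting buf + offset to
 * a struct pointer and dereferencing it could fault instead.
 */
struct msg_header read_header(const unsigned char *buf, size_t offset)
{
    struct msg_header h;
    memcpy(&h, buf + offset, sizeof h);
    return h;
}

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

A writer can use align_up to pad each record to an 8-byte boundary so that readers may access fields in place, or a reader can use read_header to copy the header out first. Either approach avoids the misaligned load.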
>>
>>>
>>> Abridged log:
>>> ...
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7991 for process lrmd
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000100112 (was
>>> 00000000000000000000000000100102)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7992 for process attrd
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000101112 (was
>>> 00000000000000000000000000100112)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7993 for process pengine
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000111112 (was
>>> 00000000000000000000000000101112)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
>>> Forked child 7994 for process crmd
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
>>> update_node_processes: Node xxxxxxxxxx now has process list:
>>> 00000000000000000000000000111312 (was
>>> 00000000000000000000000000111112)
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
>>> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
>>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
>>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15
>>> to pending delivery queue
>>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked:
>>> /usr/lib64/heartbeat/stonithd
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> crm_log_init_worker: Changed active directory to
>>> /usr/var/lib/heartbeat/cores/root
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type:
>>> Cluster type is: 'openais'.
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> crm_cluster_connect: Connecting to cluster infrastructure: classic
>>> openais (with plugin)
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> init_ais_connection_classic: Creating connection to our Corosync
>>> plugin
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading
>>> cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml
>>> (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster
>>> configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary
>>> configuration corrupt or unusable, trying backup...
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence:
>>> Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup
>>> file /usr/var/lib/heartbeat/crm/cib-99.raw not found
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile:
>>> Continuing with an empty configuration.
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <cib epoch="0" num_updates="0" admin_epoch="0"
>>> validate-with="pacemaker-1.2" >
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <configuration >
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <crm_config />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <nodes />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <resources />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <constraints />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> </configuration>
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>>> <status />
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng:
>>> Creating RNG parser context
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
>>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>>> Doesn't exist (12)
>>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign
>>> in to the cluster... terminating
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked:
>>> /usr/lib64/heartbeat/crmd
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked:
>>> /usr/lib64/heartbeat/pengine
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version:
>>> e872eeb39a5f6e1fdb57c3108551a5353648c4f4
>>>
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for
>>> old instances of pengine
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/pengine
>>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/pengine
>>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
>>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing
>>> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>>> actions:trace: // A_LOG
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>>> actions:trace: // A_STARTUP
>>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup:
>>> Registering Signal Handlers
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating
>>> CIB and LRM objects
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
>>> actions:trace: // A_CIB_START
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to command channel failed
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_callback
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/cib_callback
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to callback channel failed
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to CIB failed: connection failed
>>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff:
>>> Signing out of the CIB Service
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml:
>>> Triggering CIB write for start op
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB
>>> Initialization completed successfully
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type:
>>> Cluster type is: 'openais'.
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
>>> init_ais_connection_classic: Creating connection to our Corosync
>>> plugin
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
>>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>>> Doesn't exist (12)
>>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in
>>> to the cluster... terminating
>>> Jun 02 23:12:21 corosync [CPG ] exit_fn for conn=0x62500
>>> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
>>> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
>>> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16
>>> to pending delivery queue
>>> Jun 02 23:12:21 corosync [CPG ] got procleave message from cluster
>>> node 1377289226
>>> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked:
>>> /usr/lib64/heartbeat/attrd
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker:
>>> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type:
>>> Cluster type is: 'openais'.
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect:
>>> Connecting to cluster infrastructure: classic openais (with plugin)
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
>>> init_ais_connection_classic: Creating connection to our Corosync
>>> plugin
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
>>> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
>>> Doesn't exist (12)
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting
>>> attribute updates
>>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Could not init comms on:
>>> /usr/var/run/crm/cib_rw
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
>>> Connection to command channel failed
>>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
>>> init_client_ipc_comms_nodispatch: Attempting to talk on:
>>> /usr/var/run/crm/cib_callback
>>> ...
>>>
>>>
>>> 2011/6/2 Steven Dake <sdake at redhat.com>:
>>>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>>>
>>>>> My test program still outputs:
>>>>> PATH=/dev/shm/teste123XXXXXX
>>>>> page size=20000
>>>>> fd=3
>>>>> ADDR_ORIG:0xe000a000 ADDR:0xffffffff
>>>>> Erro
>>>>>
>>>>> And Pacemaker still fails because of the mmap error:
>>>>> Could not initialize Cluster Configuration Database API instance error 2
>>>>>
>>>>
>>>> Give the patch I posted recently a spin - corosync WFM with this patch
>>>> on sparc64 with hugetlb set. Please report back results.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>>> To confirm that hugetlb is disabled, here is my /proc/meminfo:
>>>>> MemTotal: 33093488 kB
>>>>> MemFree: 32855616 kB
>>>>> Buffers: 5600 kB
>>>>> Cached: 53480 kB
>>>>> SwapCached: 0 kB
>>>>> Active: 45768 kB
>>>>> Inactive: 28104 kB
>>>>> Active(anon): 18024 kB
>>>>> Inactive(anon): 1560 kB
>>>>> Active(file): 27744 kB
>>>>> Inactive(file): 26544 kB
>>>>> Unevictable: 0 kB
>>>>> Mlocked: 0 kB
>>>>> SwapTotal: 6104680 kB
>>>>> SwapFree: 6104680 kB
>>>>> Dirty: 0 kB
>>>>> Writeback: 0 kB
>>>>> AnonPages: 14936 kB
>>>>> Mapped: 7736 kB
>>>>> Shmem: 4624 kB
>>>>> Slab: 39184 kB
>>>>> SReclaimable: 10088 kB
>>>>> SUnreclaim: 29096 kB
>>>>> KernelStack: 7088 kB
>>>>> PageTables: 1160 kB
>>>>> Quicklists: 17664 kB
>>>>> NFS_Unstable: 0 kB
>>>>> Bounce: 0 kB
>>>>> WritebackTmp: 0 kB
>>>>> CommitLimit: 22651424 kB
>>>>> Committed_AS: 519368 kB
>>>>> VmallocTotal: 1069547520 kB
>>>>> VmallocUsed: 11064 kB
>>>>> VmallocChunk: 1069529616 kB
>>>>>
>>>>>
>>>>> 2011/6/1 Steven Dake <sdake at redhat.com>:
>>>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>>>> Steven,
>>>>>>>
>>>>>>> cat /proc/meminfo
>>>>>>> ...
>>>>>>> HugePages_Total: 0
>>>>>>> HugePages_Free: 0
>>>>>>> HugePages_Rsvd: 0
>>>>>>> HugePages_Surp: 0
>>>>>>> Hugepagesize: 4096 kB
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> It definitely requires a kernel compile and setting the config option to
>>>>>> off. I don't know the debian way of doing this.
>>>>>>
>>>>>> The only reason you may need this option is if you have very large
>>>>>> memory sizes, such as 48GB or more.
>>>>>>
>>>>>> Regards
>>>>>> -steve
>>>>>>
>>>>>>> It's 4 MB.
>>>>>>>
>>>>>>> How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the
>>>>>>> kernel at boot?)
>>>>>>>
>>>>>>> 2011/6/1 Steven Dake <sdake at redhat.com <mailto:sdake at redhat.com>>
>>>>>>>
>>>>>>> On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>>>> > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>>>> >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter
>>>>>>> wrote:
>>>>>>> >>> Angus,
>>>>>>> >>>
>>>>>>> >>> I wrote a test program (based on the code in coroipcc.c), and I am
>>>>>>> >>> now sure that there is a problem with the mmap system call on sparc.
>>>>>>> >>>
>>>>>>> >>> Source code of my test program:
>>>>>>> >>>
>>>>>>> >>> #include <stdint.h>
>>>>>>> >>> #include <stdio.h>
>>>>>>> >>> #include <stdlib.h>
>>>>>>> >>> #include <sys/mman.h>
>>>>>>> >>>
>>>>>>> >>> #define PATH_MAX 36
>>>>>>> >>>
>>>>>>> >>> int main()
>>>>>>> >>> {
>>>>>>> >>>
>>>>>>> >>> int32_t fd;
>>>>>>> >>> void *addr_orig;
>>>>>>> >>> void *addr;
>>>>>>> >>> char path[PATH_MAX];
>>>>>>> >>> const char *file = "teste123XXXXXX";
>>>>>>> >>> size_t bytes=10024;
>>>>>>> >>>
>>>>>>> >>> snprintf (path, PATH_MAX, "/dev/shm/%s", file);
>>>>>>> >>> printf("PATH=%s\n",path);
>>>>>>> >>>
>>>>>>> >>> fd = mkstemp (path);
>>>>>>> >>> printf("fd=%d \n",fd);
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> addr_orig = mmap (NULL, bytes, PROT_NONE,
>>>>>>> >>> MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>>>> >>> MAP_FIXED | MAP_SHARED, fd, 0);
>>>>>>> >>>
>>>>>>> >>> printf("ADDR_ORIG:%p ADDR:%p\n",addr_orig,addr);
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> if (addr != addr_orig) {
>>>>>>> >>> printf("Erro");
>>>>>>> >>> }
>>>>>>> >>> }
>>>>>>> >>>
>>>>>>> >>> Results on x86:
>>>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>>>> >>> fd=3
>>>>>>> >>> ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000
>>>>>>> >>>
>>>>>>> >>> Results on sparc:
>>>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>>>> >>> fd=3
>>>>>>> >>> ADDR_ORIG:0xf7f72000 ADDR:0xffffffff
>>>>>>> >>
>>>>>>> >> Note: 0xffffffff == MAP_FAILED
>>>>>>> >>
>>>>>>> >> (from man mmap)
>>>>>>> >> RETURN VALUE
>>>>>>> >> On success, mmap() returns a pointer to the mapped area. On
>>>>>>> >> error, the value MAP_FAILED (that is, (void *) -1) is
>>>>>>> returned,
>>>>>>> >> and errno is set appropriately.
>>>>>>> >>
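Following the man page excerpt above, the test program would be more informative if it compared the result against MAP_FAILED and printed errno instead of only comparing the two addresses. A minimal sketch, assuming a hypothetical wrapper named checked_mmap that is not part of corosync or the original test program:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper around mmap(): returns the mapping on success,
 * or NULL after printing the reason for the failure. errno is preserved
 * so the caller can still inspect it.
 */
void *checked_mmap(void *hint, size_t len, int prot, int flags, int fd)
{
    void *p;

    assert(len > 0);
    p = mmap(hint, len, prot, flags, fd, 0);
    if (p == MAP_FAILED) {
        int saved = errno;
        fprintf(stderr, "mmap(%p, %zu) failed: %s\n",
                hint, len, strerror(saved));
        errno = saved;
        return NULL;
    }
    return p;
}
```

With this in place, a failing call reports an error name such as EINVAL or EBADF rather than the bare 0xffffffff shown in the sparc output, which narrows down why the kernel rejected the request.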
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> But I'm wondering whether it is really necessary to call mmap twice.
>>>>>>> >>> What is the reason for calling mmap a second time with the address
>>>>>>> >>> returned by the first call?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >> Well there are 3 calls to mmap()
>>>>>>> >> 1) one to allocate 2 * what you need (in pages)
>>>>>>> >> 2) maps the first half of the mem to a real file
>>>>>>> >> 3) maps the second half of the mem to the same file
>>>>>>> >>
>>>>>>> >> The point is that when you write to an address past the end of the
>>>>>>> >> first half of memory, it is taken care of by the third mmap, which
>>>>>>> >> maps the address back to the top of the file for you. This means you
>>>>>>> >> don't have to worry about ringbuffer wrapping, which can be a
>>>>>>> >> headache.
>>>>>>> >>
>>>>>>> >> -Angus
>>>>>>> >>
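The three mmap() calls described above can be sketched as follows. This is an illustration under stated assumptions, not the corosync or libqb code: ring_map is a hypothetical helper, the backing file lives in /tmp rather than /dev/shm, and the size must be a multiple of the page size. Some architectures may require stricter alignment than the page size for a fixed shared mapping; this sketch does not handle that case and simply returns NULL if any mapping fails.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of a temporary file twice,
 * back to back, so that base[i] and base[i + size] refer to the same
 * byte. Returns NULL on any failure.
 */
char *ring_map(size_t size)
{
    char path[] = "/tmp/ring-sketch-XXXXXX";   /* illustrative name */
    long page = sysconf(_SC_PAGESIZE);
    char *base;
    void *lo, *hi;
    int fd;

    assert(page > 0);
    if (size == 0 || size % (size_t)page != 0)
        return NULL;

    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                      /* file goes away with the last mapping */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    /* 1) Reserve 2 * size bytes of address space, not yet accessible. */
    base = mmap(NULL, 2 * size, PROT_NONE,
                MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* 2) Map the file over the first half of the reservation. */
    lo = mmap(base, size, PROT_READ | PROT_WRITE,
              MAP_FIXED | MAP_SHARED, fd, 0);
    /* 3) Map the same file again over the second half. */
    hi = mmap(base + size, size, PROT_READ | PROT_WRITE,
              MAP_FIXED | MAP_SHARED, fd, 0);
    close(fd);

    if (lo != (void *)base || hi != (void *)(base + size)) {
        munmap(base, 2 * size);
        return NULL;
    }
    return base;
}
```

With a mapping r = ring_map(sz), a memcpy of four bytes to r + sz - 2 places the last two bytes at r[0] and r[1], so the writer never has to split a record at the wrap point.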
>>>>>>> >
>>>>>>> > Interesting that this mmap operation doesn't work on sparc linux.
>>>>>>> >
>>>>>>> > Not sure how I can help here. The next step would be to follow up with
>>>>>>> > the sparc linux mailing list. I'll do that and cc you on the message to
>>>>>>> > see if we get any response.
>>>>>>> >
>>>>>>> > http://vger.kernel.org/vger-lists.html
>>>>>>> >
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> 2011/5/31 Angus Salkeld <asalkeld at redhat.com
>>>>>>> <mailto:asalkeld at redhat.com>>
>>>>>>> >>>
>>>>>>> >>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter
>>>>>>> wrote:
>>>>>>> >>>>> Thanks Steven,
>>>>>>> >>>>>
>>>>>>> >>>>> Now I'm trying to run on the MCP:
>>>>>>> >>>>> - Uninstalled Pacemaker 1.0
>>>>>>> >>>>> - Compiled and installed 1.1
>>>>>>> >>>>>
>>>>>>> >>>>> But now I have problems initializing pacemakerd: Could not
>>>>>>> >>>>> initialize Cluster Configuration Database API instance error 2
>>>>>>> >>>>> Debugging with gdb, I see that the error is in confdb; more
>>>>>>> >>>>> specifically, the errors start in coroipcc.c at these lines:
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>> 448 if (addr != addr_orig) {
>>>>>>> >>>>> 449 goto error_close_unlink; <- enter here
>>>>>>> >>>>> 450 }
>>>>>>> >>>>>
>>>>>>> >>>>> Any idea what could cause this?
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>> >>>> I tried porting a ringbuffer (www.libqb.org) to sparc and had the
>>>>>>> >>>> same failure.
>>>>>>> >>>> There are 3 mmap() calls and on sparc the third one keeps failing.
>>>>>>> >>>>
>>>>>>> >>>> This is a common way of creating a ring buffer, see:
>>>>>>> >>>>
>>>>>>> http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>>>> >>>>
>>>>>>> >>>> I couldn't get it working in the short time I tried. It's probably
>>>>>>> >>>> worth looking at the clib implementation to see why it's failing
>>>>>>> >>>> (I didn't get to that).
>>>>>>> >>>>
>>>>>>> >>>> -Angus
>>>>>>> >>>>
>>>>>>>
>>>>>>> Note: we believe we have sorted this out. Your kernel has hugetlb enabled,
>>>>>>> probably with 4MB pages. This requires corosync to allocate 4MB pages.
>>>>>>>
>>>>>>> Can you verify your hugetlb settings?
>>>>>>>
>>>>>>> If you can turn this option off, you should have at least a working
>>>>>>> corosync.
>>>>>>>
>>>>>>> Regards
>>>>>>> -steve
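One way to check the hugetlb setting programmatically is to look for the Hugepagesize line that /proc/meminfo shows elsewhere in this thread. A minimal sketch, assuming a hypothetical helper named hugepage_kb that takes an already opened stream so it can be pointed at any file:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: scan a /proc/meminfo-style stream for the
 * "Hugepagesize:" line and return its value in kB, or -1 if the
 * line is absent.
 */
long hugepage_kb(FILE *fp)
{
    char line[256];
    long kb;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Whitespace in the format matches any run of blanks. */
        if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
            return kb;
    }
    return -1;
}
```

A caller would open /proc/meminfo with fopen and pass the stream in. For the first meminfo listing in this thread the helper would return 4096. For the listing taken after the kernel was rebuilt without hugetlb it would return -1, since no Hugepagesize line appears there.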
>>>>>>> >>>>
>>>>>>> >>>> _______________________________________________
>>>>>>> >>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>>>> <mailto:Pacemaker at oss.clusterlabs.org>
>>>>>>> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>> >>>>
>>>>>>> >>>> Project Home: http://www.clusterlabs.org
>>>>>>> >>>> Getting started:
>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> >>>> Bugs:
>>>>>>> >>>>
>>>>>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>>>> >>>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> William Felipe Welter
>>>>>>> >>> ------------------------------
>>>>>>> >>> Consultor em Tecnologias Livres
>>>>>>> >>> william.welter at 4linux.com.br <mailto:william.welter at 4linux.com.br>
>>>>>>> >>> www.4linux.com.br <http://www.4linux.com.br>
>>>>>>> >>
>>>>>>> >>> _______________________________________________
>>>>>>> >>> Openais mailing list
>>>>>>> >>> Openais at lists.linux-foundation.org
>>>>>>> <mailto:Openais at lists.linux-foundation.org>
>>>>>>> >>> https://lists.linux-foundation.org/mailman/listinfo/openais
>>>>>>> >>
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>>