[Pacemaker] [Openais] Linux HA on debian sparc

Steven Dake sdake at redhat.com
Fri Jun 3 12:37:10 EDT 2011


On 06/02/2011 08:16 PM, william felipe_welter wrote:
> Well,
> 
> Now, with this patch, the pacemakerd process starts and brings up its other
> processes (crmd, lrmd, pengine, ...), but after pacemakerd forks, the forked
> pacemakerd process dies with "signal 10, Bus error". In the log, the other
> Pacemaker processes (crmd, lrmd, pengine, ...) cannot connect to the OpenAIS
> plugin (probably because pacemakerd died). This time, though, the forked
> pacemakerd generates a core dump when it dies.
> 
> gdb  -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986"  -se
> /usr/sbin/pacemakerd :
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/pacemakerd...done.
> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libuuid.so.1
> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
> Loaded symbols for /usr/lib/libcoroipcc.so.4
> Reading symbols from /usr/lib/libcpg.so.4...done.
> Loaded symbols for /usr/lib/libcpg.so.4
> Reading symbols from /usr/lib/libquorum.so.4...done.
> Loaded symbols for /usr/lib/libquorum.so.4
> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
> Loaded symbols for /usr/lib64/libcrmcommon.so.2
> Reading symbols from /usr/lib/libcfg.so.4...done.
> Loaded symbols for /usr/lib/libcfg.so.4
> Reading symbols from /usr/lib/libconfdb.so.4...done.
> Loaded symbols for /usr/lib/libconfdb.so.4
> Reading symbols from /usr/lib64/libplumb.so.2...done.
> Loaded symbols for /usr/lib64/libplumb.so.2
> Reading symbols from /usr/lib64/libpils.so.2...done.
> Loaded symbols for /usr/lib64/libpils.so.2
> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libbz2.so.1.0
> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libxslt.so.1
> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libxml2.so.2
> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libglib-2.0.so.0
> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libltdl.so.7
> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpcre.so.3
> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnss_compat.so.2
> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnss_nis.so.2
> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnss_files.so.2
> Core was generated by `pacemakerd'.
> Program terminated with signal 10, Bus error.
> #0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
> 339			switch (dispatch_data->id) {
> (gdb) bt
> #0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
> #1  0xf6f100f0 in ?? ()
> #2  0xf6f100f4 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> 
> 
> I took a look at cpg.c and saw that dispatch_data is acquired by the
> coroipcc_dispatch_get() function (defined in lib/coroipcc.c):
> 
>        do {
>                 error = coroipcc_dispatch_get (
>                         cpg_inst->handle,
>                         (void **)&dispatch_data,
>                         timeout);
> 
> 
> 

Try the recent patch sent to fix alignment.

Regards
-steve
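
For readers hitting the same backtrace: the fault is on the read of
dispatch_data->id, which is consistent with the alignment problem that patch
addresses.  Unlike x86, sparc delivers SIGBUS ("signal 10, Bus error") when a
32-bit field is loaded from an address that is not 4-byte aligned, which can
happen when a message header sits at an arbitrary offset inside a shared-memory
ring buffer.  The toy program below is only an illustration of that failure
mode, not the corosync patch itself; struct header and its fields are invented
for the example.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message header, standing in for the dispatch header whose
   "id" field the backtrace above shows being read. */
struct header {
    uint32_t id;
    uint32_t size;
};

int main(void)
{
    /* 8-byte-aligned scratch buffer, as a shared-memory segment would be. */
    static char buf[64] __attribute__((aligned(8)));
    struct header tmp;

    /* Pretend a message starts 2 bytes into the buffer, so the header
       pointer is only 2-byte aligned. */
    struct header *h = (struct header *)(buf + 2);
    (void)h;

    /* Direct dereference: fine on x86, which tolerates unaligned loads,
       but a bus error on sparc, which requires a 4-byte-aligned address
       for a 4-byte load.  Uncomment to try it:
     *
     * printf("id = %u\n", h->id);
     */

    /* Safe on both: copy the bytes into an aligned object first. */
    memcpy(&tmp, buf + 2, sizeof tmp);
    printf("id = %u\n", tmp.id);
    return 0;
}

Built with plain gcc, enabling the commented-out line should reproduce a bus
error on sparc, while the memcpy path works on both architectures.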

> 
> Summarized log:
> ...
> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7991 for process lrmd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000100112 (was
> 00000000000000000000000000100102)
> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7992 for process attrd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000101112 (was
> 00000000000000000000000000100112)
> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7993 for process pengine
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000111112 (was
> 00000000000000000000000000101112)
> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child:
> Forked child 7994 for process crmd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info:
> update_node_processes: Node xxxxxxxxxx now has process list:
> 00000000000000000000000000111312 (was
> 00000000000000000000000000111112)
> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15
> to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked:
> /usr/lib64/heartbeat/stonithd
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> crm_log_init_worker: Changed active directory to
> /usr/var/lib/heartbeat/cores/root
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type:
> Cluster type is: 'openais'.
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> crm_cluster_connect: Connecting to cluster infrastructure: classic
> openais (with plugin)
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> init_ais_connection_classic: Creating connection to our Corosync
> plugin
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading
> cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml
> (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster
> configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary
> configuration corrupt or unusable, trying backup...
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence:
> Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup
> file /usr/var/lib/heartbeat/crm/cib-99.raw not found
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile:
> Continuing with an empty configuration.
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
> <cib epoch="0" num_updates="0" admin_epoch="0"
> validate-with="pacemaker-1.2" >
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>   <configuration >
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>     <crm_config />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>     <nodes />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>     <resources />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>     <constraints />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>   </configuration>
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk]
>   <status />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng:
> Creating RNG parser context
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info:
> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
> Doesn't exist (12)
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign
> in to the cluster... terminating
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked:
> /usr/lib64/heartbeat/crmd
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked:
> /usr/lib64/heartbeat/pengine
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version:
> e872eeb39a5f6e1fdb57c3108551a5353648c4f4
> 
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for
> old instances of pengine
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/pengine
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/pengine
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing
> I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
> actions:trace: 	// A_LOG
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
> actions:trace: 	// A_STARTUP
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup:
> Registering Signal Handlers
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating
> CIB and LRM objects
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action:
> actions:trace: 	// A_CIB_START
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to command channel failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_callback
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/cib_callback
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to callback channel failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to CIB failed: connection failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff:
> Signing out of the CIB Service
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml:
> Triggering CIB write for start op
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB
> Initialization completed successfully
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type:
> Cluster type is: 'openais'.
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect:
> Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
> init_ais_connection_classic: Creating connection to our Corosync
> plugin
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info:
> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
> Doesn't exist (12)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in
> to the cluster... terminating
> Jun 02 23:12:21 corosync [CPG   ] exit_fn for conn=0x62500
> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16
> to pending delivery queue
> Jun 02 23:12:21 corosync [CPG   ] got procleave message from cluster
> node 1377289226
> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked:
> /usr/lib64/heartbeat/attrd
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker:
> Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type:
> Cluster type is: 'openais'.
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect:
> Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
> init_ais_connection_classic: Creating connection to our Corosync
> plugin
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info:
> init_ais_connection_classic: Connection to our AIS plugin (9) failed:
> Doesn't exist (12)
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting
> attribute updates
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Could not init comms on:
> /usr/var/run/crm/cib_rw
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw:
> Connection to command channel failed
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug:
> init_client_ipc_comms_nodispatch: Attempting to talk on:
> /usr/var/run/crm/cib_callback
> ...
> 
> 
> 2011/6/2 Steven Dake <sdake at redhat.com>:
>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>
>>> My test program still outputs:
>>> PATH=/dev/shm/teste123XXXXXX
>>> page size=20000
>>> fd=3
>>> ADDR_ORIG:0xe000a000  ADDR:0xffffffff
>>> Erro
>>>
>>> And Pacemaker still fails because of the mmap error:
>>> Could not initialize Cluster Configuration Database API instance error 2
>>>
>>
>> Give the patch I posted recently a spin - corosync WFM with this patch
>> on sparc64 with hugetlb set.  Please report back results.
>>
>> Regards
>> -steve
>>
>>> To make sure that I have disabled hugetlb, here is my /proc/meminfo:
>>> MemTotal:       33093488 kB
>>> MemFree:        32855616 kB
>>> Buffers:            5600 kB
>>> Cached:            53480 kB
>>> SwapCached:            0 kB
>>> Active:            45768 kB
>>> Inactive:          28104 kB
>>> Active(anon):      18024 kB
>>> Inactive(anon):     1560 kB
>>> Active(file):      27744 kB
>>> Inactive(file):    26544 kB
>>> Unevictable:           0 kB
>>> Mlocked:               0 kB
>>> SwapTotal:       6104680 kB
>>> SwapFree:        6104680 kB
>>> Dirty:                 0 kB
>>> Writeback:             0 kB
>>> AnonPages:         14936 kB
>>> Mapped:             7736 kB
>>> Shmem:              4624 kB
>>> Slab:              39184 kB
>>> SReclaimable:      10088 kB
>>> SUnreclaim:        29096 kB
>>> KernelStack:        7088 kB
>>> PageTables:         1160 kB
>>> Quicklists:        17664 kB
>>> NFS_Unstable:          0 kB
>>> Bounce:                0 kB
>>> WritebackTmp:          0 kB
>>> CommitLimit:    22651424 kB
>>> Committed_AS:     519368 kB
>>> VmallocTotal:   1069547520 kB
>>> VmallocUsed:       11064 kB
>>> VmallocChunk:   1069529616 kB
>>>
>>>
>>> 2011/6/1 Steven Dake <sdake at redhat.com>:
>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>> Steven,
>>>>>
>>>>> cat /proc/meminfo
>>>>> ...
>>>>> HugePages_Total:       0
>>>>> HugePages_Free:        0
>>>>> HugePages_Rsvd:        0
>>>>> HugePages_Surp:        0
>>>>> Hugepagesize:       4096 kB
>>>>> ...
>>>>>
>>>>
>>>> It definitely requires a kernel compile and setting the config option to
>>>> off.  I don't know the Debian way of doing this.
>>>>
>>>> The only reason you may need this option is if you have very large
>>>> memory sizes, such as 48GB or more.
>>>>
>>>> Regards
>>>> -steve
>>>>
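
As a quick illustration (not from the corosync or pacemaker sources): the
hugetlb state being discussed here can be read straight from /proc/meminfo,
the file william pastes just above.  This small helper is equivalent to
grepping that file for the huge-page lines:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];

    if (f == NULL) {
        perror("/proc/meminfo");
        return 1;
    }
    /* Print only the huge-page related lines. */
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "HugePages_", 10) == 0 ||
            strncmp(line, "Hugepagesize", 12) == 0) {
            fputs(line, stdout);
        }
    }
    fclose(f);
    return 0;
}

A Hugepagesize line together with HugePages_Total: 0, as in the output quoted
above, generally means huge-page support is compiled into the kernel
(CONFIG_HUGETLBFS / CONFIG_HUGETLB_PAGE) even though no huge pages are
currently reserved, so turning it off really does mean a kernel rebuild, as
Steve describes above.
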
>>>>> It's 4 MB.
>>>>>
>>>>> How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the
>>>>> kernel at boot?)
>>>>>
>>>>> 2011/6/1 Steven Dake <sdake at redhat.com>:
>>>>>
>>>>>     On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>>     > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>>     >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter
>>>>>     >> wrote:
>>>>>     >>> Angus,
>>>>>     >>>
>>>>>     >>> I wrote a small test program (based on the code in coroipcc.c),
>>>>>     >>> and I am now sure there is a problem with the mmap system call
>>>>>     >>> on sparc.
>>>>>     >>>
>>>>>     >>> Source code of my test program:
>>>>>     >>>
>>>>>     >>> /* based on the mapping pattern in coroipcc.c */
>>>>>     >>> #include <stdlib.h>
>>>>>     >>> #include <stdio.h>
>>>>>     >>> #include <stdint.h>
>>>>>     >>> #include <unistd.h>
>>>>>     >>> #include <sys/mman.h>
>>>>>     >>>
>>>>>     >>> #define PATH_MAX  36
>>>>>     >>>
>>>>>     >>> int main()
>>>>>     >>> {
>>>>>     >>>
>>>>>     >>> int32_t fd;
>>>>>     >>> void *addr_orig;
>>>>>     >>> void *addr;
>>>>>     >>> char path[PATH_MAX];
>>>>>     >>> const char *file = "teste123XXXXXX";
>>>>>     >>> size_t bytes=10024;
>>>>>     >>>
>>>>>     >>> snprintf (path, PATH_MAX, "/dev/shm/%s", file);
>>>>>     >>> printf("PATH=%s\n",path);
>>>>>     >>>
>>>>>     >>> fd = mkstemp (path);
>>>>>     >>> printf("fd=%d \n",fd);
>>>>>     >>> if (fd < 0) {
>>>>>     >>>         perror ("mkstemp");
>>>>>     >>>         return 1;
>>>>>     >>> }
>>>>>     >>>
>>>>>     >>> /* size the file so the shared mapping has real backing */
>>>>>     >>> if (ftruncate (fd, bytes) < 0) {
>>>>>     >>>         perror ("ftruncate");
>>>>>     >>>         return 1;
>>>>>     >>> }
>>>>>     >>>
>>>>>     >>> /* reserve an address range, without any backing yet */
>>>>>     >>> addr_orig = mmap (NULL, bytes, PROT_NONE,
>>>>>     >>>               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>     >>> if (addr_orig == MAP_FAILED) {
>>>>>     >>>         perror ("mmap (reserve)");
>>>>>     >>>         return 1;
>>>>>     >>> }
>>>>>     >>>
>>>>>     >>> /* remap that same range onto the file at a fixed address */
>>>>>     >>> addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>>     >>>               MAP_FIXED | MAP_SHARED, fd, 0);
>>>>>     >>>
>>>>>     >>> printf("ADDR_ORIG:%p  ADDR:%p\n",addr_orig,addr);
>>>>>     >>>
>>>>>     >>> if (addr == MAP_FAILED || addr != addr_orig) {
>>>>>     >>>         perror ("mmap (MAP_FIXED)");
>>>>>     >>>         printf("Erro\n");
>>>>>     >>>         return 1;
>>>>>     >>> }
>>>>>     >>> return 0;
>>>>>     >>> }
>>>>>     >>>
>>>>>     >>> Results on x86:
>>>>>     >>> PATH=/dev/shm/teste123XXXXXX
>>>>>     >>> fd=3
>>>>>     >>> ADDR_ORIG:0x7f867d8e6000  ADDR:0x7f867d8e6000
>>>>>     >>>
>>>>>     >>> Results on sparc:
>>>>>     >>> PATH=/dev/shm/teste123XXXXXX
>>>>>     >>> fd=3
>>>>>     >>> ADDR_ORIG:0xf7f72000  ADDR:0xffffffff
>>>>>     >>
>>>>>     >> Note: 0xffffffff == MAP_FAILED
>>>>>     >>
>>>>>     >> (from man mmap)
>>>>>     >> RETURN VALUE
>>>>>     >>        On success, mmap() returns a pointer to the mapped area.  On
>>>>>     >>        error, the value MAP_FAILED (that is, (void *) -1) is
>>>>>     returned,
>>>>>     >>        and errno is  set appropriately.
>>>>>     >>
>>>>>     >>>
>>>>>     >>>
>>>>>     >>> But I am wondering: is it really necessary to call mmap twice?
>>>>>     >>> What is the reason for calling mmap twice, the second time using
>>>>>     >>> the address returned by the first call?
>>>>>     >>>
>>>>>     >>>
>>>>>     >> Well, there are 3 calls to mmap():
>>>>>     >> 1) one to allocate 2 * what you need (in pages)
>>>>>     >> 2) one to map the first half of that memory to a real file
>>>>>     >> 3) one to map the second half of that memory to the same file
>>>>>     >>
>>>>>     >> The point is that when you write to an address past the end of the
>>>>>     >> first half of the memory, it is taken care of by the third mmap,
>>>>>     >> which maps the address back to the top of the file for you. This
>>>>>     >> means you don't have to worry about ring-buffer wrapping, which can
>>>>>     >> be a headache.
>>>>>     >>
>>>>>     >> -Angus
>>>>>     >>
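
To make the double-mapping Angus describes concrete, here is a minimal,
self-contained sketch of the same trick (the file name, size, and the little
wrap test are invented for the example; the real code lives in coroipcc.c and
libqb).  On sparc, the MAP_FIXED remapping step is the one this thread reports
failing:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    size_t half = (size_t)sysconf(_SC_PAGESIZE);   /* one page per half */
    char path[] = "/dev/shm/ring-demo-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0 || ftruncate(fd, half) < 0) {
        perror("setup");
        return 1;
    }

    /* 1) reserve 2 * half of address space, with no backing yet */
    char *base = mmap(NULL, half * 2, PROT_NONE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap (reserve)");
        return 1;
    }

    /* 2) map the file over the first half ... */
    void *lo = mmap(base, half, PROT_READ | PROT_WRITE,
                    MAP_FIXED | MAP_SHARED, fd, 0);
    /* 3) ... and the same file again over the second half */
    void *hi = mmap(base + half, half, PROT_READ | PROT_WRITE,
                    MAP_FIXED | MAP_SHARED, fd, 0);
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        perror("mmap (MAP_FIXED)");
        return 1;
    }

    /* A write that runs past the end of the first half lands back at the
       start of the file, so ring-buffer code never has to split a write. */
    strcpy(base + half - 2, "wrap");
    printf("start of the buffer now holds \"%s\"\n", base);   /* "ap" */

    unlink(path);
    return 0;
}

The final printf shows "ap": the tail of the string wrapped into the second
mapping and reappeared at offset 0 of the file, and therefore at the start of
the first mapping.
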
>>>>>     >
>>>>>     > Interesting - this mmap operation doesn't work on sparc linux.
>>>>>     >
>>>>>     > Not sure how I can help here - the next step would be a follow-up
>>>>>     > with the sparc linux mailing list.  I'll do that and cc you on the
>>>>>     > message - see if we get any response.
>>>>>     >
>>>>>     > http://vger.kernel.org/vger-lists.html
>>>>>     >
>>>>>     >>>
>>>>>     >>>
>>>>>     >>>
>>>>>     >>>
>>>>>     >>> 2011/5/31 Angus Salkeld <asalkeld at redhat.com>:
>>>>>     >>>
>>>>>     >>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter
>>>>>     wrote:
>>>>>     >>>>> Thanks Steven,
>>>>>     >>>>>
>>>>>     >>>>> Now I am trying to run on the MCP:
>>>>>     >>>>> - Uninstall Pacemaker 1.0
>>>>>     >>>>> - Compile and install 1.1
>>>>>     >>>>>
>>>>>     >>>>> But now I have a problem initializing pacemakerd: Could not
>>>>>     >>>>> initialize Cluster Configuration Database API instance error 2
>>>>>     >>>>> Debugging with gdb, I see that the error is in the confdb; more
>>>>>     >>>>> specifically, the errors start in coroipcc.c at line:
>>>>>     >>>>>
>>>>>     >>>>>
>>>>>     >>>>> 448        if (addr != addr_orig) {
>>>>>     >>>>> 449                goto error_close_unlink;  <- enters here
>>>>>     >>>>> 450        }
>>>>>     >>>>>
>>>>>     >>>>> Any idea what could cause this?
>>>>>     >>>>>
>>>>>     >>>>
>>>>>     >>>> I tried porting a ringbuffer (www.libqb.org) to sparc and had the same
>>>>>     >>>> failure.
>>>>>     >>>> There are 3 mmap() calls and on sparc the third one keeps failing.
>>>>>     >>>>
>>>>>     >>>> This is a common way of creating a ring buffer, see:
>>>>>     >>>>
>>>>>     http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>>     >>>>
>>>>>     >>>> I couldn't get it working in the short time I tried. It's probably
>>>>>     >>>> worth looking at the clib implementation to see why it's failing
>>>>>     >>>> (I didn't get to that).
>>>>>     >>>>
>>>>>     >>>> -Angus
>>>>>     >>>>
>>>>>
>>>>>     Note: we believe we have sorted this out.  Your kernel has hugetlb enabled,
>>>>>     probably with 4MB pages.  This requires corosync to allocate 4MB pages.
>>>>>
>>>>>     Can you verify your hugetlb settings?
>>>>>
>>>>>     If you can turn this option off, you should have at least a working
>>>>>     corosync.
>>>>>
>>>>>     Regards
>>>>>     -steve
>>>>>     >>>>
>>>>>     >>>>
>>>>>     >>>
>>>>>     >>>
>>>>>     >>>
>>>>>     >>> --
>>>>>     >>> William Felipe Welter
>>>>>     >>> ------------------------------
>>>>>     >>> Consultor em Tecnologias Livres
>>>>>     >>> william.welter at 4linux.com.br
>>>>>     >>> www.4linux.com.br
>>>>>     >>
>>>>>     >>
>>>>>     >>
>>>>>     >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> William Felipe Welter
>>>>> ------------------------------
>>>>> Consultor em Tecnologias Livres
>>>>> william.welter at 4linux.com.br
>>>>> www.4linux.com.br
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
> 
> 
> 




