[Pacemaker] pacemaker unable to start

Wed Oct 21 12:28:55 EDT 2009

Steve, this is what my installation shows--

ls -l /usr/libexec/lcrso

-rwxr-xr-x  1 root root  101243 Jul 29 11:21 coroparse.lcrso
-rwxr-xr-x  1 root root  117688 Jul 29 11:21 objdb.lcrso
-rwxr-xr-x  1 root root   92702 Jul 29 11:54 openaisserviceenable.lcrso
-rwxr-xr-x  1 root root  110808 Jul 29 11:21 quorum_testquorum.lcrso
-rwxr-xr-x  1 root root  159057 Jul 29 11:21 quorum_votequorum.lcrso
-rwxr-xr-x  1 root root 1175430 Jul 29 11:54 service_amf.lcrso
-rwxr-xr-x  1 root root  133976 Jul 29 11:21 service_cfg.lcrso
-rwxr-xr-x  1 root root  218374 Jul 29 11:54 service_ckpt.lcrso
-rwxr-xr-x  1 root root  139029 Jul 29 11:54 service_clm.lcrso
-rwxr-xr-x  1 root root  122668 Jul 29 11:21 service_confdb.lcrso
-rwxr-xr-x  1 root root  138412 Jul 29 11:21 service_cpg.lcrso
-rwxr-xr-x  1 root root  125638 Jul 29 11:21 service_evs.lcrso
-rwxr-xr-x  1 root root  196443 Jul 29 11:54 service_evt.lcrso
-rwxr-xr-x  1 root root  194885 Jul 29 11:54 service_lck.lcrso
-rwxr-xr-x  1 root root  235168 Jul 29 11:54 service_msg.lcrso
-rwxr-xr-x  1 root root  120445 Jul 29 11:21 service_pload.lcrso
-rwxr-xr-x  1 root root  135340 Jul 29 11:54 service_tmr.lcrso
-rwxr-xr-x  1 root root  124092 Jul 29 11:21 vsf_quorum.lcrso
-rwxr-xr-x  1 root root  121298 Jul 29 11:21 vsf_ykd.lcrso
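
Steve's note (quoted below) says corosync loads pacemaker by searching for a
pacemaker lcrso file, so a listing like the one above can be checked for such
a file directly. This is only a minimal sketch: find_pacemaker_plugin is a
hypothetical helper, and the *pacemaker*.lcrso filename pattern is an
assumption, since the thread does not say what the file is actually called.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync): print any plugin files in a
# directory whose name mentions "pacemaker". The *pacemaker*.lcrso pattern
# is an assumption about how such a file might be named.
find_pacemaker_plugin() {
    dir="${1:-/usr/libexec/lcrso}"
    for f in "$dir"/*pacemaker*.lcrso; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

plugins=$(find_pacemaker_plugin /usr/libexec/lcrso)
if [ -n "$plugins" ]; then
    printf 'found:\n%s\n' "$plugins"
else
    echo "no pacemaker plugin in /usr/libexec/lcrso"
fi
```

If the plugin directory differs on a given distribution, the directory can be
passed as the first argument instead.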

I also did

export COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenable:openaisparser"

In place of openaisparser I also tried corosyncparse and
corosyncparser, but to no avail.
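
To compare attempts like these, it may help to check after each restart which
services the log reports as initialized or failed. The following is a rough
sketch, assuming the logfile path from the quoted corosync.conf
(/tmp/corosync.log) and the [SERV  ] line format seen in the quoted log;
show_service_status is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the [SERV ] lines that report a service as
# initialized or as failed to load. The default path is the logfile set in
# the quoted corosync.conf.
show_service_status() {
    log="${1:-/tmp/corosync.log}"
    grep -E '\[SERV *] Service (initialized|failed to load)' "$log"
}

# Example: after restarting corosync, look for a pacemaker failure line.
if show_service_status /tmp/corosync.log 2>/dev/null \
        | grep -q "failed to load 'pacemaker'"; then
    echo "pacemaker service still fails to load"
fi
```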

-sincerely
Shravan

On Wed, Oct 21, 2009 at 11:49 AM, Steven Dake <sdake at redhat.com> wrote:
> I recommend using corosync 1.1.1.  It has several bug fixes, one of
> them critical for proper pacemaker operation.  It won't fix this
> particular problem, however.
>
> Corosync loads pacemaker by searching for a pacemaker lcrso file.  These
> files are installed by default in /usr/libexec/lcrso but may be in a
> different location depending on your distribution.
>
> Regards
> -steve
>
> On Wed, 2009-10-21 at 11:13 -0400, Shravan Mishra wrote:
>> Hello guys,
>>
>> We are running
>>
>> corosync-1.0.0
>> heartbeat-2.99.1
>> pacemaker-1.0.4
>>
>> the corosync.conf  under /etc/corosync/ is
>>
>> ============
>> # Please read the corosync.conf.5 manual page
>> compatibility: whitetank
>>
>> aisexec {
>>        user: root
>>        group: root
>> }
>> totem {
>>        version: 2
>>        secauth: off
>>        threads: 0
>>        interface {
>>                ringnumber: 0
>>                bindnetaddr: 172.30.0.0
>>                mcastaddr: 226.94.1.1
>>                mcastport: 5406
>>        }
>> }
>>
>> logging {
>>        fileline: off
>>        to_stderr: yes
>>        to_logfile: yes
>>        to_syslog: yes
>>        logfile: /tmp/corosync.log
>>        debug: on
>>        timestamp: on
>>        logger_subsys {
>>                subsys: pacemaker
>>                debug: on
>>                tags: enter|leave|trace1|trace2| trace3|trace4|trace6
>>        }
>> }
>>
>>
>> service {
>>        name: pacemaker
>>        ver: 0
>>     #   use_mgmtd: yes
>>      #  use_logd:yes
>> }
>>
>>
>> corosync {
>>        user: root
>>        group: root
>> }
>>
>>
>> amf {
>>        mode: disabled
>> }
>> ============
>>
>>
>> #service corosync start
>>
>> starts the messaging but fails to load pacemaker,
>>
>> /tmp/corosync.log  ---
>>
>> ==================
>>
>> Oct 21 11:05:43 corosync [MAIN  ] Corosync Cluster Engine ('trunk'):
>> started and ready to provide service.
>> Oct 21 11:05:43 corosync [MAIN  ] Successfully read main configuration
>> file '/etc/corosync/corosync.conf'.
>> Oct 21 11:05:43 corosync [TOTEM ] Token Timeout (1000 ms) retransmit
>> timeout (238 ms)
>> Oct 21 11:05:43 corosync [TOTEM ] token hold (180 ms) retransmits
>> before loss (4 retrans)
>> Oct 21 11:05:43 corosync [TOTEM ] join (50 ms) send_join (0 ms)
>> consensus (800 ms) merge (200 ms)
>> Oct 21 11:05:43 corosync [TOTEM ] downcheck (1000 ms) fail to recv
>> const (50 msgs)
>> Oct 21 11:05:43 corosync [TOTEM ] seqno unchanged const (30 rotations)
>> Maximum network MTU 1500
>> Oct 21 11:05:43 corosync [TOTEM ] window size per rotation (50
>> messages) maximum messages per rotation (17 messages)
>> Oct 21 11:05:43 corosync [TOTEM ] send threads (0 threads)
>> Oct 21 11:05:43 corosync [TOTEM ] RRP token expired timeout (238 ms)
>> Oct 21 11:05:43 corosync [TOTEM ] RRP token problem counter (2000 ms)
>> Oct 21 11:05:43 corosync [TOTEM ] RRP threshold (10 problem count)
>> Oct 21 11:05:43 corosync [TOTEM ] RRP mode set to none.
>> Oct 21 11:05:43 corosync [TOTEM ] heartbeat_failures_allowed (0)
>> Oct 21 11:05:43 corosync [TOTEM ] max_network_delay (50 ms)
>> Oct 21 11:05:43 corosync [TOTEM ] HeartBeat is Disabled. To enable set
>> heartbeat_failures_allowed > 0
>> Oct 21 11:05:43 corosync [TOTEM ] Initializing transmit/receive
>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>> Oct 21 11:05:43 corosync [TOTEM ] Receive multicast socket recv buffer
>> size (262142 bytes).
>> Oct 21 11:05:43 corosync [TOTEM ] Transmit multicast socket send
>> buffer size (262142 bytes).
>> Oct 21 11:05:43 corosync [TOTEM ] The network interface [172.30.0.145]
>> is now up.
>> Oct 21 11:05:43 corosync [TOTEM ] Created or loaded sequence id
>> 184.172.30.0.145 for this ring.
>> Oct 21 11:05:43 corosync [TOTEM ] entering GATHER state from 15.
>> Oct 21 11:05:43 corosync [SERV  ] Service failed to load 'pacemaker'.
>> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
>> extended virtual synchrony service'
>> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
>> configuration service'
>> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
>> cluster closed process group service v1.01'
>> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
>> cluster config database access v1.01'
>> Oct 21 11:05:43 corosync [SERV  ] Service initialized 'corosync
>> profile loading service'
>> Oct 21 11:05:43 corosync [MAIN  ] Compatibility mode set to
>> whitetank.  Using V1 and V2 of the synchronization engine.
>> Oct 21 11:05:43 corosync [TOTEM ] Creating commit token because I am
>> the rep.
>> Oct 21 11:05:43 corosync [TOTEM ] Saving state aru 0 high seq received
>> 0
>> Oct 21 11:05:43 corosync [TOTEM ] Storing new sequence id for ring bc
>> Oct 21 11:05:43 corosync [TOTEM ] entering COMMIT state.
>> Oct 21 11:05:43 corosync [TOTEM ] got commit token
>> Oct 21 11:05:43 corosync [TOTEM ] entering RECOVERY state.
>> Oct 21 11:05:43 corosync [TOTEM ] position [0] member 172.30.0.145:
>> Oct 21 11:05:43 corosync [TOTEM ] previous ring seq 184 rep
>> 172.30.0.145
>> Oct 21 11:05:43 corosync [TOTEM ] aru 0 high delivered 0 received flag
>> 1
>> Oct 21 11:05:43 corosync [TOTEM ] Did not need to originate any
>> messages in recovery.
>> Oct 21 11:05:43 corosync [TOTEM ] got commit token
>> Oct 21 11:05:43 corosync [TOTEM ] Sending initial ORF token
>> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
>> retrans flag0 retrans queue empty 1 count 0, aru 0
>> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
>> received 0
>> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
>> retrans flag0 retrans queue empty 1 count 1, aru 0
>> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
>> received 0
>> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
>> retrans flag0 retrans queue empty 1 count 2, aru 0
>> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
>> received 0
>> Oct 21 11:05:43 corosync [TOTEM ] token retrans flag is 0 my set
>> retrans flag0 retrans queue empty 1 count 3, aru 0
>> Oct 21 11:05:43 corosync [TOTEM ] install seq 0 aru 0 high seq
>> received 0
>> Oct 21 11:05:43 corosync [TOTEM ] retrans flag count 4 token aru 0
>> install seq 0 aru 0 0
>> Oct 21 11:05:43 corosync [TOTEM ] recovery to regular 1-0
>> Oct 21 11:05:43 corosync [TOTEM ] Delivering to app 1 to 0
>> Oct 21 11:05:43 corosync [SYNC  ] This node is within the primary
>> component and will provide service.
>> Oct 21 11:05:43 corosync [TOTEM ] entering OPERATIONAL state.
>> Oct 21 11:05:43 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Oct 21 11:05:43 corosync [TOTEM ] mcasted message added to pending
>> queue
>> Oct 21 11:05:43 corosync [TOTEM ] Delivering 0 to 1
>> Oct 21 11:05:43 corosync [TOTEM ] Delivering MCAST message with seq 1
>> to pending delivery queue
>> Oct 21 11:05:43 corosync [SYNC  ] confchg entries 1
>> Oct 21 11:05:43 corosync [SYNC  ] Barrier Start Received From
>> -1862263124
>> Oct 21 11:05:43 corosync [SYNC  ] Barrier completion status for nodeid
>> -1862263124 = 1.
>> ==================
>>
>>
>>
>>
>> I'm curious to know how corosync/openais actually loads pacemaker. The
>> config directive seems to do the magic, but apparently not in my
>> case.
>> What should I be looking for? The log message gives hardly any
>> information.
>>
>>
>> Pacemaker comprises a bunch of daemons such as crmd and stonithd. I
>> ran them individually to look for permission problems, for example
>> with /var/lib/heartbeat and /var/run/heartbeat, which should be
>> chowned to hacluster:haclient.
>>
>> Even after fixing those, it fails to load.
>>
>> Please advise me on what I should do.
>>
>> Thanks
>> Shravan
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker