[Pacemaker] ccm returning with exit code 100 and system rebooting

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Jan 18 10:35:14 EST 2011


On Tue, Jan 18, 2011 at 04:55:38PM +0530, akshay punja wrote:
> Hi,
> 
> Thanks for the help,
> 
> As suggested, I have changed "crm on" to "crm respawn". After the
> configuration change, the rebooting has stopped.
> 
> I am using tomcat, apache httpd and mysql master-slave replication. I
> have set this up in multiple environments and it is working fine. I am seeing
> this issue on only one of the nodes, so I isolated that node to find a
> solution. The log file only prints info and warn messages. Is there a way to
> enable debug logging too?

Use the debug directive in ha.cf.
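
For example, a minimal sketch of the relevant ha.cf lines. The debug
level, the debugfile path and the /etc/ha.d/ha.cf location are
assumptions, not values taken from this cluster:

```
# /etc/ha.d/ha.cf (usual default location; assumed here)
# Turn on debug output; higher values are more verbose (level 1 is an assumption).
debug 1
# Write debug messages to a separate file (path is an assumption).
debugfile /var/log/ha-debug
```

heartbeat has to be restarted to pick up the change. These settings
apply while the logging daemon is disabled, as the WARN line in the
log below indicates it is.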

Also, try 'rpm -q --verify' with all pacemaker/heartbeat
packages.
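
A minimal sketch of that check, assuming the relevant packages all
have "pacemaker" or "heartbeat" somewhere in their names. The function
names and the name pattern are hypothetical; extend the pattern if the
stack includes other packages:

```shell
# Keep only package names that look like part of the cluster stack.
# The pattern is an assumption; adjust it to the packages actually installed.
filter_cluster_packages() {
    grep -E -i 'pacemaker|heartbeat'
}

# Verify every matching installed package against the rpm database.
verify_cluster_packages() {
    rpm -qa | filter_cluster_packages | while read -r pkg; do
        echo "== $pkg"
        rpm -q --verify "$pkg"
    done
}
```

Run verify_cluster_packages as root. rpm prints nothing for a package
whose files all match the database; any output lines flag files that
are missing or whose size, digest, permissions or similar attributes
have changed.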

Thanks,

Dejan

> *ERROR: Unable to set scheduler parameters.: Operation not permitted* - we
> are seeing this message in healthy clusters in other environments too, so I
> do not think it is the root cause.
> 
> 
> bash-3.2# ps -ef
> UID        PID  PPID  C STIME TTY          TIME CMD
> root      2662 21268  0 11:17 pts/0    00:00:00 ps -ef
> root         1     0  0 11:01 ?        00:00:00 init [3]
> root     20382     1  0 11:01 ?        00:00:00 /usr/sbin/syslog_adapter
> root     20389     1  0 11:01 ?        00:00:00 syslogd -m 0 -p
> /dev/log_adapted
> root     20400     1  0 11:01 ?        00:00:00 logmgr
> root     21236  2953  0 11:02 pts/0    00:00:00 /bin/sh /bin/console
> root     21268 21236  0 11:02 pts/0    00:00:00 /bin/bash --login
> root     30746     1  0 11:12 ?        00:00:00 heartbeat: master control
> proces
> root     30748 30746  0 11:12 ?        00:00:00 heartbeat: FIFO reader
> root     30749 30746  0 11:12 ?        00:00:00 heartbeat: write: ucast eth0
> root     30750 30746  0 11:12 ?        00:00:00 heartbeat: read: ucast eth0
> root     32731 30746  0 11:14 ?        00:00:00 /usr/lib/heartbeat/lrmd -r
> root     32732 30746  0 11:14 ?        00:00:00 /usr/lib/heartbeat/stonithd
> 
> Log file after the configuration change:
> Jan 18 11:12:47 mysqlis1 heartbeat: [30745]: info: Pacemaker support:
> respawn
> Jan 18 11:12:47 mysqlis1 heartbeat: [30745]: info: Pacemaker support: false
> Jan 18 11:12:47 mysqlis1 heartbeat: [30745]: WARN: Logging daemon is
> disabled --enabling logging daemon is recommended
> Jan 18 11:12:47 mysqlis1 heartbeat: [30745]: info:
> **************************
> Jan 18 11:12:47 mysqlis1 heartbeat: [30745]: info: Configuration validated.
> Starting heartbeat 3.0.2
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: heartbeat: version 3.0.2
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: Heartbeat generation:
> 1293183350
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: glib: ucast: write socket
> priority set to IPTOS_LOWDELAY on eth0
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: glib: ucast: bound send
> socket to device: eth0
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: glib: ucast: bound
> receive socket to device: eth0
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: glib: ucast: started on
> port 694 interface eth0 to 172.21.52.135
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info:
> G_main_add_TriggerHandler: Added signal manual handler
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: ERROR: Unable to set scheduler
> parameters.: Operation not permitted
> Jan 18 11:12:47 mysqlis1 heartbeat: [30746]: info: Local status now set to:
> 'up'
> Jan 18 11:12:48 mysqlis1 heartbeat: [30748]: ERROR: Unable to set scheduler
> parameters.: Operation not permitted
> Jan 18 11:12:48 mysqlis1 heartbeat: [30749]: ERROR: Unable to set scheduler
> parameters.: Operation not permitted
> Jan 18 11:12:48 mysqlis1 heartbeat: [30750]: ERROR: Unable to set scheduler
> parameters.: Operation not permitted
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: WARN: node mysql3: is dead
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Comm_now_up(): updating
> status to active
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Local status now set to:
> 'active'
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Starting child client
> "/usr/lib/heartbeat/ccm" (100,101)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Starting child client
> "/usr/lib/heartbeat/cib" (100,101)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Starting child client
> "/usr/lib/heartbeat/lrmd -r" (0,0)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Starting child client
> "/usr/lib/heartbeat/stonithd" (0,0)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Starting child client
> "/usr/lib/heartbeat/attrd" (100,101)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: info: Starting child client
> "/usr/lib/heartbeat/crmd" (100,101)
> Jan 18 11:14:48 mysqlis1 heartbeat: [32729]: info: Starting
> "/usr/lib/heartbeat/ccm" as uid 100  gid 101 (pid 32729)
> Jan 18 11:14:48 mysqlis1 heartbeat: [32730]: info: Starting
> "/usr/lib/heartbeat/cib" as uid 100  gid 101 (pid 32730)
> Jan 18 11:14:48 mysqlis1 heartbeat: [32731]: info: Starting
> "/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 32731)
> Jan 18 11:14:48 mysqlis1 lrmd: [32731]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Jan 18 11:14:48 mysqlis1 lrmd: [32731]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Jan 18 11:14:48 mysqlis1 lrmd: [32731]: info: enabling coredumps
> Jan 18 11:14:48 mysqlis1 lrmd: [32731]: info: G_main_add_SignalHandler:
> Added signal handler for signal 10
> Jan 18 11:14:48 mysqlis1 lrmd: [32731]: info: G_main_add_SignalHandler:
> Added signal handler for signal 12
> Jan 18 11:14:48 mysqlis1 lrmd: [32731]: info: Started.
> Jan 18 11:14:48 mysqlis1 heartbeat: [32732]: info: Starting
> "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 32732)
> Jan 18 11:14:48 mysqlis1 heartbeat: [32733]: info: Starting
> "/usr/lib/heartbeat/attrd" as uid 100  gid 101 (pid 32733)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: WARN: Managed
> /usr/lib/heartbeat/ccm process 32729 exited with return code 100.
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: WARN: Managed
> /usr/lib/heartbeat/cib process 32730 exited with return code 100.
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: WARN: Managed
> /usr/lib/heartbeat/attrd process 32733 exited with return code 100.
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: info: G_main_add_SignalHandler:
> Added signal handler for signal 10
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: info: G_main_add_SignalHandler:
> Added signal handler for signal 12
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: info: crm_cluster_connect:
> Connecting to Heartbeat
> Jan 18 11:14:48 mysqlis1 heartbeat: [32734]: info: Starting
> "/usr/lib/heartbeat/crmd" as uid 100  gid 101 (pid 32734)
> Jan 18 11:14:48 mysqlis1 heartbeat: [30746]: WARN: Managed
> /usr/lib/heartbeat/crmd process 32734 exited with return code 100.
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: info: register_heartbeat_conn:
> Hostname: mysqlis1
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: info: register_heartbeat_conn:
> UUID: d26dfd2b-5412-42b5-84d2-86567676c849
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: notice:
> /usr/lib/heartbeat/stonithd start up successfully.
> Jan 18 11:14:48 mysqlis1 stonithd: [32732]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> 
> Regards,
> Akshay
> 
> On Tue, Jan 18, 2011 at 2:25 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> 
> > On Tue, Jan 18, 2011 at 4:04 AM, akshay punja <akshay.punja at gmail.com>
> > wrote:
> > > Please let me know if anyone has solved this issue.
> >
> > Can you try "crm respawn" instead of "crm on" so the node stays up
> > long enough to see why the ccm is unhappy.
> >
> > Lars, you really ought to think about changing the default behavior
> > and adding "crm fatal" or something.
> >
> > > CCM exiting with return code 100 and system rebooting
> > >
> > > On Mon, Jan 17, 2011 at 1:29 PM, akshay punja <akshay.punja at gmail.com>
> > > wrote:
> > >>
> > >> Hi All,
> > >>
> > >> We are using pacemaker (pacemaker-1.0.9.1-1.15.el5.i386.rpm) with
> > >> heartbeat (heartbeat-3.0.3-2.3.el5.i386.rpm) for a production deployment.
> > >>
> > >> Note: we are using two nodes in a cluster and hosting a bunch of
> > >> applications on the HA setup.
> > >>
> > >> We are seeing a strange reboot of one of the nodes, with the message
> > >> "Managed /usr/lib/heartbeat/ccm process 22115 exited with return code
> > >> 100." What could be the possible issue, and how can we fix it?
> > >>
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17619]: info: Pacemaker support:
> > yes
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17619]: info: Pacemaker support:
> > >> false
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17619]: WARN: Logging daemon is
> > >> disabled --enabling logging daemon is recommended
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17619]: info:
> > >> **************************
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17619]: info: Configuration
> > >> validated. Starting heartbeat 3.0.2
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: heartbeat: version
> > >> 3.0.2
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: Heartbeat generation:
> > >> 1293182645
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: glib: ucast: write
> > >> socket priority set to IPTOS_LOWDELAY on eth0
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: glib: ucast: bound
> > send
> > >> socket to device: eth0
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: glib: ucast: bound
> > >> receive socket to device: eth0
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: glib: ucast: started
> > on
> > >> port 694 interface eth0 to 172.21.52.135
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info:
> > >> G_main_add_TriggerHandler: Added signal manual handler
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info:
> > >> G_main_add_TriggerHandler: Added signal manual handler
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info:
> > >> G_main_add_SignalHandler: Added signal handler for signal 17
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: ERROR: Unable to set
> > >> scheduler parameters.: Operation not permitted
> > >> Jan 17 07:50:38 mysqlis1 heartbeat: [17620]: info: Local status now set
> > >> to: 'up'
> > >> Jan 17 07:50:39 mysqlis1 heartbeat: [17627]: ERROR: Unable to set
> > >> scheduler parameters.: Operation not permitted
> > >> Jan 17 07:50:39 mysqlis1 heartbeat: [17629]: ERROR: Unable to set
> > >> scheduler parameters.: Operation not permitted
> > >> Jan 17 07:50:39 mysqlis1 heartbeat: [17628]: ERROR: Unable to set
> > >> scheduler parameters.: Operation not permitted
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: WARN: node mysql3: is dead
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Comm_now_up():
> > updating
> > >> status to active
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Local status now set
> > >> to: 'active'
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Starting child client
> > >> "/usr/lib/heartbeat/ccm" (100,101)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Starting child client
> > >> "/usr/lib/heartbeat/cib" (100,101)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Starting child client
> > >> "/usr/lib/heartbeat/lrmd -r" (0,0)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Starting child client
> > >> "/usr/lib/heartbeat/stonithd" (0,0)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Starting child client
> > >> "/usr/lib/heartbeat/attrd" (100,101)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: info: Starting child client
> > >> "/usr/lib/heartbeat/crmd" (100,101)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [19576]: info: Starting
> > >> "/usr/lib/heartbeat/ccm" as uid 100  gid 101 (pid 19576)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [19577]: info: Starting
> > >> "/usr/lib/heartbeat/cib" as uid 100  gid 101 (pid 19577)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [19578]: info: Starting
> > >> "/usr/lib/heartbeat/lrmd -r" as uid 0  gid 0 (pid 19578)
> > >> Jan 17 07:52:39 mysqlis1 lrmd: [19578]: info: G_main_add_SignalHandler:
> > >> Added signal handler for signal 15
> > >> Jan 17 07:52:39 mysqlis1 lrmd: [19578]: info: G_main_add_SignalHandler:
> > >> Added signal handler for signal 17
> > >> Jan 17 07:52:39 mysqlis1 lrmd: [19578]: info: enabling coredumps
> > >> Jan 17 07:52:39 mysqlis1 lrmd: [19578]: info: G_main_add_SignalHandler:
> > >> Added signal handler for signal 10
> > >> Jan 17 07:52:39 mysqlis1 lrmd: [19578]: info: G_main_add_SignalHandler:
> > >> Added signal handler for signal 12
> > >> Jan 17 07:52:39 mysqlis1 lrmd: [19578]: info: Started.
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [19579]: info: Starting
> > >> "/usr/lib/heartbeat/stonithd" as uid 0  gid 0 (pid 19579)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [19580]: info: Starting
> > >> "/usr/lib/heartbeat/attrd" as uid 100  gid 101 (pid 19580)
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: WARN: Managed
> > >> /usr/lib/heartbeat/ccm process 19576 exited with return code 100.
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [17620]: EMERG: Rebooting system.
> > >> Reason: /usr/lib/heartbeat/ccm
> > >> Jan 17 07:52:39 mysqlis1 stonithd: [19579]: info:
> > >> G_main_add_SignalHandler: Added signal handler for signal 10
> > >> Jan 17 07:52:39 mysqlis1 stonithd: [19579]: info:
> > >> G_main_add_SignalHandler: Added signal handler for signal 12
> > >> Jan 17 07:52:39 mysqlis1 stonithd: [19579]: info: crm_cluster_connect:
> > >> Connecting to Heartbeat
> > >> Jan 17 07:52:39 mysqlis1 heartbeat: [19581]: info: Starting
> > >> "/usr/lib/heartbeat/crmd" as uid 100  gid 101 (pid 19581)
> > >> Jan 17 07:52:41 mysqlis1 heartbeat: [17620]: EMERG: ALL REBOOT OPTIONS
> > >> FAILED: /sbin/reboot -nf returned 0
> > >> Jan 17 07:52:41 mysqlis1 stonithd: [19579]: ERROR:
> > >> register_heartbeat_conn: Cannot sign on with heartbeat:
> > >> Jan 17 07:52:41 mysqlis1 stonithd: [19579]: ERROR: failed to connect to
> > >> cluster
> > >> Jan 17 07:52:41 mysqlis1 stonithd: [19579]: ERROR:
> > >> /usr/lib/heartbeat/stonithd abnormally abort.
> > >> Jan 17 07:52:42 mysqlis1 heartbeat: [17627]: CRIT: Emergency Shutdown:
> > >> Master Control process died.
> > >> Jan 17 07:52:42 mysqlis1 heartbeat: [17627]: CRIT: Killing pid 17620
> > with
> > >> SIGTERM
> > >> Jan 17 07:52:42 mysqlis1 heartbeat: [17627]: CRIT: Killing pid 17628
> > with
> > >> SIGTERM
> > >> Jan 17 07:52:42 mysqlis1 heartbeat: [17627]: CRIT: Killing pid 17629
> > with
> > >> SIGTERM
> > >> Jan 17 07:52:42 mysqlis1 heartbeat: [17627]: CRIT: Emergency
> > Shutdown(MCP
> > >> dead): Killing ourselves.
> > >>
> > >> Regards,
> > >> Akshay
> > >>
> > >>
> > >
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs:
> > >
> > http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> > >
> > >
> >
