[Pacemaker] Problems with Pacemaker 1.1.8 on F17

Fri Feb 22 00:11:11 EST 2013

>
> > You'd think that would help, but
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=880035 suggests otherwise.
> > I have one remaining fedora machine where KVM clusters still work, I
> > don't think I'll ever update it now.
> >

Well, that was a fascinating read.

Using the udpu transport seems to have stabilized corosync. If I
understand that bug report correctly, I should also see better multicast
behavior if I enable the multicast_querier, but I'm happy with udpu for
now. That lets me focus on the other things that are acting oddly:
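For anyone who wants to try the multicast_querier route instead of udpu: as far as I know it is a per-bridge sysfs attribute on the KVM host (on reasonably recent kernels), not something set inside the guests. A minimal Python sketch for reading and toggling it, assuming the host bridge is named `virbr0` (hypothetical; substitute your own) and that it runs as root on the host:

```python
from pathlib import Path


def multicast_querier_path(bridge: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs file that controls the IGMP querier of a Linux bridge."""
    return Path(sysfs_root) / "class" / "net" / bridge / "bridge" / "multicast_querier"


def get_multicast_querier(bridge: str, sysfs_root: str = "/sys") -> bool:
    """True if the bridge currently acts as its own multicast querier."""
    return multicast_querier_path(bridge, sysfs_root).read_text().strip() == "1"


def set_multicast_querier(bridge: str, enabled: bool, sysfs_root: str = "/sys") -> None:
    """Write 1 (enable) or 0 (disable) to the bridge's multicast_querier file.

    Writing to the real /sys tree needs root. sysfs_root is a parameter
    only so the functions can be exercised against a fake directory tree.
    """
    path = multicast_querier_path(bridge, sysfs_root)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; is {bridge!r} a bridge on this kernel?")
    path.write_text("1\n" if enabled else "0\n")
```

On the host this would be used as `set_multicast_querier("virbr0", True)`. Note that a sysfs write does not survive a reboot; making it persistent is left to whatever network configuration the host uses.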

Adding a monitor to a systemd: resource, like this:

pcs resource create httpd systemd:httpd op monitor interval=30s

which generates this in the CIB:

-- <cib admin_epoch="0" epoch="7" num_updates="25" />
++ <primitive class="systemd" id="httpd" type="httpd" >
++   <instance_attributes id="httpd-instance_attributes" />
++   <operations >
++     <op id="httpd-monitor-interval-30s" interval="30s" name="monitor" />
++   </operations>
++ </primitive>

results in the service never starting successfully:

notice: process_lrm_event: LRM operation httpd_monitor_0 (call=10, rc=7, cib-update=30, confirmed=true) not running
notice: process_lrm_event: LRM operation httpd_start_0 (call=13, rc=0, cib-update=31, confirmed=true) ok
notice: process_lrm_event: LRM operation httpd_monitor_30000 (call=16, rc=7, cib-update=32, confirmed=false) not running
warning: status_from_rc: Action 11 (httpd_monitor_30000) on puppet0 failed (target: 0 vs. rc: 7): Error
warning: update_failcount: Updating failcount for httpd on puppet0 after failed monitor: rc=7 (update=value++, time=1361503742)
notice: run_graph: Transition 2 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2278.bz2): Complete
notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-httpd (1)
notice: attrd_perform_update: Sent update 11: fail-count-httpd=1
notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-httpd (1361503742)
notice: attrd_perform_update: Sent update 14: last-failure-httpd=1361503742
warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
notice: LogActions: Recover httpd   (Started puppet0)
notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-2279.bz2
warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
notice: LogActions: Recover httpd   (Started puppet0)
notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-2280.bz2
warning: unpack_rsc_op: Processing failed op monitor for httpd on puppet0: not running (7)
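The log reads more easily with two conventions in mind: the number at the end of an operation key is the interval in milliseconds (so httpd_monitor_0 is the one-shot probe and httpd_monitor_30000 is the 30s recurring monitor), and rc is the standard OCF exit code, where 7 is OCF_NOT_RUNNING. A rc=7 probe before the start is normal; a rc=7 recurring monitor right after a successful start is what triggers the failcount and recovery loop above. A small Python sketch that classifies the process_lrm_event lines under those assumptions (the regex and verdict wording are my own, not anything Pacemaker ships):

```python
import re
from typing import NamedTuple, Optional

# Standard OCF resource-agent exit codes; Pacemaker reports these as rc=.
OCF_RC_NAMES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Operation keys look like <resource>_<action>_<interval-in-ms>.
# Limitation: actions containing an underscore (e.g. migrate_to) are not handled.
LRM_RE = re.compile(
    r"LRM operation (?P<rsc>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+) "
    r"\(call=\d+, rc=(?P<rc>\d+)"
)


class LrmEvent(NamedTuple):
    resource: str
    action: str
    interval_ms: int
    rc: int

    @property
    def is_probe(self) -> bool:
        # A monitor with interval 0 is the one-shot probe run before starting.
        return self.action == "monitor" and self.interval_ms == 0

    def verdict(self) -> str:
        name = OCF_RC_NAMES.get(self.rc, f"unknown rc {self.rc}")
        if self.rc == 0:
            return f"{name}: ok"
        if self.is_probe and self.rc == 7:
            return f"{name}: expected, resource was stopped before start"
        # Simplification: master/slave results (rc 8) are not modelled.
        return f"{name}: failure"


def parse_lrm_event(line: str) -> Optional[LrmEvent]:
    """Extract resource, action, interval and rc from a process_lrm_event line."""
    m = LRM_RE.search(line)
    if m is None:
        return None
    return LrmEvent(m["rsc"], m["action"], int(m["interval"]), int(m["rc"]))


if __name__ == "__main__":
    log = [
        "LRM operation httpd_monitor_0 (call=10, rc=7, cib-update=30, confirmed=true) not running",
        "LRM operation httpd_start_0 (call=13, rc=0, cib-update=31, confirmed=true) ok",
        "LRM operation httpd_monitor_30000 (call=16, rc=7, cib-update=32, confirmed=false) not running",
    ]
    for line in log:
        ev = parse_lrm_event(line)
        print(ev.action, ev.interval_ms, ev.verdict())
```

Run against the three process_lrm_event lines above, only the 30000 ms monitor comes out as a failure, which matches what Pacemaker does with it.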

This continues until Pacemaker declares the service FAILED, even though
httpd (in this example) starts manually with "systemctl start httpd"
without any problem. For what it's worth, the D-Bus method call to get the
ActiveState property appears to work:

# systemctl start httpd
# gdbus call --system --dest org.freedesktop.systemd1 \
    --object-path /org/freedesktop/systemd1/unit/httpd_2eservice \
    -m org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Unit ActiveState
(<'active'>,)
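The object path in that gdbus call embeds an escaped unit name (httpd.service becomes httpd_2eservice). To repeat the check for other units, here is a Python sketch of that escaping, assuming it follows systemd's object-path rule as I understand it: ASCII letters are kept, digits are kept except in the first position, every other byte becomes "_" plus two lowercase hex digits, and an empty name maps to "_":

```python
def escape_unit_for_object_path(unit: str) -> str:
    """Escape a unit name for use in a systemd D-Bus object path.

    Assumption: this mirrors systemd's own escaping rule; it is not taken
    from systemd's source or exposed by any library.
    """
    if not unit:
        return "_"
    out = []
    for i, byte in enumerate(unit.encode("utf-8")):
        ch = chr(byte)
        keep = ("A" <= ch <= "Z") or ("a" <= ch <= "z") or (i > 0 and "0" <= ch <= "9")
        # Anything not kept is written as "_" followed by two lowercase hex digits.
        out.append(ch if keep else f"_{byte:02x}")
    return "".join(out)


def unit_object_path(unit: str) -> str:
    """Build the full object path systemd uses for a loaded unit."""
    return "/org/freedesktop/systemd1/unit/" + escape_unit_for_object_path(unit)
```

Asking systemd for the path with org.freedesktop.systemd1.Manager.GetUnit avoids hand-escaping entirely, at the cost of an extra D-Bus round trip.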