[ClusterLabs] A processor failed, forming new configuration very often and without reason

Philippe Carbonnier philippe.carbonnier at vif.fr
Fri Apr 10 13:37:02 UTC 2015


Hello,

The context :
  Red Hat Enterprise Linux Server release 5.7
  corosynclib-1.2.7-1.1.el5.x86_64
  corosync-1.2.7-1.1.el5.x86_64
  pacemaker-1.0.10-1.4.el5.x86_64
  pacemaker-libs-1.0.10-1.4.el5.x86_64
  2 nodes, both on same ESX server

I see lots of "processor joined or left the membership" messages but can't
understand why: both hosts are up and running, and when corosync tries to
start the cluster resources on the second node it fails because they are
already running on the first node.
We can see "Another DC detected", so communication between the two VMs is
OK.

I've tried raising the totem token timeout, without success.

Here are some log extracts:

After `service corosync restart` at 11:35:

grep TOTEM corosync.log
Apr 10 11:35:56 corosync [TOTEM ] Initializing transport (UDP/IP).
Apr 10 11:35:56 corosync [TOTEM ] Initializing transmit/receive security:
libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 10 11:35:56 corosync [TOTEM ] The network interface [10.10.72.7] is now
up.
Apr 10 11:35:56 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 11:35:56 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:13:07 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:13:08 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:13:09 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:13:09 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:31:39 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:31:40 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:31:41 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:34:53 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:34:54 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:34:55 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:34:56 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:47:59 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:48:00 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:48:01 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:48:01 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:55:35 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:55:36 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:55:37 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:55:38 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:55:39 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:55:42 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:57:54 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 13:57:55 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 13:57:56 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 14:01:03 corosync [TOTEM ] A processor failed, forming new
configuration.
Apr 10 14:01:04 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 14:01:05 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 14:01:06 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
Apr 10 14:01:06 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.

This happens very often!
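To quantify the churn, here is a quick sketch (my own illustration, using the "A processor failed" timestamps from the extract above) that measures the gaps between failure events:

```python
from datetime import datetime

# Timestamps of the "A processor failed, forming new configuration"
# events from the TOTEM extract above (the log format carries no year,
# so intervals only make sense within the same day).
failure_times = [
    "Apr 10 13:13:07", "Apr 10 13:31:39", "Apr 10 13:34:53",
    "Apr 10 13:47:59", "Apr 10 13:55:35", "Apr 10 13:55:38",
    "Apr 10 13:57:54", "Apr 10 14:01:03",
]

times = [datetime.strptime(t, "%b %d %H:%M:%S") for t in failure_times]
gaps_min = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
print(gaps_min)
```

The shortest gap is only a few seconds (13:55:35 to 13:55:38), which fits a transient token loss rather than a real node outage.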


grep ERROR corosync.log
Apr 10 13:13:10 host2.exemple.com crmd: [26530]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 10 13:31:42 host2.exemple.com crmd: [26530]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 10 13:34:55 host2.exemple.com pengine: [26529]: ERROR: unpack_rsc_op:
Hard error - routing-jboss_stop_0 failed with rc=2: Preventing
routing-jboss from re-starting on host2.exemple.com
Apr 10 13:34:55 host2.exemple.com crmd: [26530]: ERROR: te_graph_trigger:
Transition failed: terminated
Apr 10 13:34:56 host2.exemple.com crmd: [26530]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 10 13:48:01 host2.exemple.com pengine: [26529]: ERROR: unpack_rsc_op:
Hard error - routing-jboss_stop_0 failed with rc=2: Preventing
routing-jboss from re-starting on host2.exemple.com
Apr 10 13:48:01 host2.exemple.com crmd: [26530]: ERROR: te_graph_trigger:
Transition failed: terminated
Apr 10 13:48:01 host2.exemple.com crmd: [26530]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 10 13:55:39 host2.exemple.com pengine: [26529]: ERROR: unpack_rsc_op:
Hard error - routing-jboss_stop_0 failed with rc=2: Preventing
routing-jboss from re-starting on host2.exemple.com
Apr 10 13:55:39 host2.exemple.com crmd: [26530]: ERROR: te_graph_trigger:
Transition failed: terminated
Apr 10 13:57:56 host2.exemple.com crmd: [26530]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)
Apr 10 14:01:05 host2.exemple.com pengine: [26529]: ERROR: unpack_rsc_op:
Hard error - routing-jboss_stop_0 failed with rc=2: Preventing
routing-jboss from re-starting on host2.exemple.com
Apr 10 14:01:05 host2.exemple.com crmd: [26530]: ERROR: te_graph_trigger:
Transition failed: terminated
Apr 10 14:01:06 host2.exemple.com crmd: [26530]: ERROR: crmd_ha_msg_filter:
Another DC detected: vif5_7 (op=noop)

grep WARN corosync.log
Apr 10 13:13:08 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 13:13:08 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2812.8 -> 0.2812.9 not applied to 0.2813.1: current "epoch" is
greater than required
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2812.9 -> 0.2813.1 not applied to 0.2813.1: current "epoch" is
greater than required
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN: cib_diff_notify:
Local-only Change (client:crmd, call: 139): -1.-1.-1 (Application of an
update diff failed)
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:13:09 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:14:29 host2.example.com lrmd: [27113]: WARN: For LSB init script,
no additional parameters are needed.
Apr 10 13:31:40 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 13:31:40 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 13:31:41 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:31:41 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:31:41 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:31:41 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:34:54 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 13:34:54 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: match_down_event: No
match for shutdown action on vif5_7
Apr 10 13:34:55 host2.example.com pengine: [26529]: WARN: unpack_rsc_op:
Processing failed op routing-jboss_stop_0 on tango2.luxlait.lan: invalid
parameter (2)
Apr 10 13:34:55 host2.example.com pengine: [26529]: WARN:
common_apply_stickiness: Forcing routing-jboss away from tango2.luxlait.lan
after 1000000 failures (max=1000000)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: run_graph:
Transition 0 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=4,
Source=/var/lib/pengine/pe-input-87782.bz2): Terminated
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_graph: Graph 0
(5 actions in 5 synapses): batch-limit=30 jobs, network-delay=60000ms
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
0 is pending (priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
1 was confirmed (priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
2 is pending (priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 6]: Pending (id: routing-jboss_stop_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
3 is pending (priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
4 is pending (priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 5]: Pending (id: clusterIP_monitor_30000, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:34:55 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:34:56 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2817.1 -> 0.2817.2 not applied to 0.2819.1: current "epoch" is
greater than required
Apr 10 13:34:56 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2817.2 -> 0.2817.3 not applied to 0.2819.1: current "epoch" is
greater than required
Apr 10 13:34:56 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2817.3 -> 0.2817.4 not applied to 0.2819.1: current "epoch" is
greater than required
Apr 10 13:34:56 host2.example.com crmd: [26530]: WARN: do_log: FSA: Input
I_JOIN_OFFER from route_message() received in state S_ELECTION
Apr 10 13:34:56 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2817.4 -> 0.2818.1 not applied to 0.2819.1: current "epoch" is
greater than required
Apr 10 13:48:00 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 13:48:00 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: match_down_event: No
match for shutdown action on vif5_7
Apr 10 13:48:01 host2.example.com pengine: [26529]: WARN: unpack_rsc_op:
Processing failed op routing-jboss_stop_0 on tango2.luxlait.lan: invalid
parameter (2)
Apr 10 13:48:01 host2.example.com pengine: [26529]: WARN:
common_apply_stickiness: Forcing routing-jboss away from tango2.luxlait.lan
after 1000000 failures (max=1000000)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: run_graph:
Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=4,
Source=/var/lib/pengine/pe-input-87783.bz2): Terminated
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_graph: Graph 1
(5 actions in 5 synapses): batch-limit=30 jobs, network-delay=60000ms
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
0 is pending (priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
1 was confirmed (priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
2 is pending (priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 6]: Pending (id: routing-jboss_stop_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
3 is pending (priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
4 is pending (priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 5]: Pending (id: clusterIP_monitor_30000, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:48:01 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2821.1 -> 0.2821.2 not applied to 0.2822.1: current "epoch" is
greater than required
Apr 10 13:48:01 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2821.2 -> 0.2821.3 not applied to 0.2822.1: current "epoch" is
greater than required
Apr 10 13:48:01 host2.example.com crmd: [26530]: WARN: do_log: FSA: Input
I_JOIN_OFFER from route_message() received in state S_ELECTION
Apr 10 13:48:01 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2821.3 -> 0.2821.4 not applied to 0.2822.1: current "epoch" is
greater than required
Apr 10 13:48:01 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2821.4 -> 0.2822.1 not applied to 0.2822.1: current "epoch" is
greater than required
Apr 10 13:48:01 host2.example.com cib: [26526]: WARN: cib_diff_notify:
Local-only Change (client:crmd, call: 283): -1.-1.-1 (Application of an
update diff failed, requesting a full refresh)
Apr 10 13:55:36 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 13:55:36 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 13:55:39 host2.example.com pengine: [26529]: WARN: unpack_rsc_op:
Processing failed op routing-jboss_stop_0 on tango2.luxlait.lan: invalid
parameter (2)
Apr 10 13:55:39 host2.example.com pengine: [26529]: WARN:
common_apply_stickiness: Forcing routing-jboss away from tango2.luxlait.lan
after 1000000 failures (max=1000000)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: run_graph:
Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=4,
Source=/var/lib/pengine/pe-input-87784.bz2): Terminated
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_graph: Graph 2
(5 actions in 5 synapses): batch-limit=30 jobs, network-delay=60000ms
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
0 is pending (priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
1 was confirmed (priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
2 is pending (priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 6]: Pending (id: routing-jboss_stop_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
3 is pending (priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
4 is pending (priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 5]: Pending (id: clusterIP_monitor_30000, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:55:39 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 13:55:42 host2.example.com crmd: [26530]: WARN: do_log: FSA: Input
I_RELEASE_DC from do_election_count_vote() received in state S_INTEGRATION
Apr 10 13:55:42 host2.example.com crmd: [26530]: WARN: update_dc: New DC
vif5_7 is not tango2.luxlait.lan
Apr 10 13:55:42 host2.example.com crmd: [26530]: WARN:
do_cl_join_offer_respond: Discarding offer from vif5_7 (expected
tango2.luxlait.lan)
Apr 10 13:57:55 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 13:57:55 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 13:57:56 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:57:56 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:57:56 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:57:56 host2.example.com cib: [26526]: WARN:
cib_server_process_diff: Not requesting full refresh in slave mode.
Apr 10 13:57:56 host2.example.com cib: [26526]: WARN: cib_diff_notify:
Local-only Change (client:crmd, call: 356): -1.-1.-1 (Application of an
update diff failed, requesting a full refresh)
Apr 10 14:01:04 host2.example.com crmd: [26530]: WARN: check_dead_member:
Our DC node (vif5_7) left the cluster
Apr 10 14:01:04 host2.example.com crmd: [26530]: WARN:
cib_client_add_notify_callback: Callback already present
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: match_down_event: No
match for shutdown action on vif5_7
Apr 10 14:01:05 host2.example.com pengine: [26529]: WARN: unpack_rsc_op:
Processing failed op routing-jboss_stop_0 on tango2.luxlait.lan: invalid
parameter (2)
Apr 10 14:01:05 host2.example.com pengine: [26529]: WARN:
common_apply_stickiness: Forcing routing-jboss away from tango2.luxlait.lan
after 1000000 failures (max=1000000)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: run_graph:
Transition 3 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=4,
Source=/var/lib/pengine/pe-input-87785.bz2): Terminated
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_graph: Graph 3
(5 actions in 5 synapses): batch-limit=30 jobs, network-delay=60000ms
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
0 is pending (priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
1 was confirmed (priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
2 is pending (priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 12]: Pending (id: vifGroup_stopped_0, type: pseduo, priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 6]: Pending (id: routing-jboss_stop_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 11]: Completed (id: vifGroup_stop_0, type: pseduo, priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
3 is pending (priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 9]: Pending (id: vifGroup_start_0, type: pseduo, priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_graph: Synapse
4 is pending (priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:
[Action 5]: Pending (id: clusterIP_monitor_30000, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 14:01:05 host2.example.com crmd: [26530]: WARN: print_elem:      *
[Input 4]: Pending (id: clusterIP_start_0, loc: tango2.luxlait.lan,
priority: 0)
Apr 10 14:01:06 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2832.1 -> 0.2832.2 not applied to 0.2834.1: current "epoch" is
greater than required
Apr 10 14:01:06 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2832.2 -> 0.2832.3 not applied to 0.2834.1: current "epoch" is
greater than required
Apr 10 14:01:06 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2832.3 -> 0.2832.4 not applied to 0.2834.1: current "epoch" is
greater than required
Apr 10 14:01:06 host2.example.com cib: [26526]: WARN: cib_process_diff:
Diff 0.2832.4 -> 0.2833.1 not applied to 0.2834.1: current "epoch" is
greater than required
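For what it's worth, the cib_process_diff warnings above are about CIB version tuples (admin_epoch.epoch.num_updates): a diff is rejected when the local CIB is already past the diff's starting version, which is what happens after both nodes briefly act as DC. A minimal sketch of that comparison (my own illustration, not Pacemaker's actual code):

```python
def parse_cib_version(v):
    """Split 'admin_epoch.epoch.num_updates' into a comparable int tuple."""
    return tuple(int(p) for p in v.split("."))

def diff_applies(diff_from, local):
    """A diff only applies when the local CIB matches its source version."""
    return parse_cib_version(diff_from) == parse_cib_version(local)

# The first rejected diff from the log: 0.2812.8 -> 0.2812.9,
# while the local CIB is already at 0.2813.1 (greater epoch).
print(diff_applies("0.2812.8", "0.2813.1"))
```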


We no longer use multicast because it caused problems at other customer sites.


corosync.conf :
compatibility: whitetank

aisexec {
        user:   root
        group:  root
}
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  0
}
totem {
        version: 2
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.72.0
                #mcastaddr: 226.94.1.1
                mcastport: 5405
                broadcast: yes
        token: 10000
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}
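In case it helps others reading along, this is the kind of totem timeout tuning often suggested for VMs sharing an ESX host (values are illustrative, not tested here); note that token and consensus are direct children of the totem block, not of interface:

```
totem {
        version: 2
        secauth: on
        threads: 0
        # Timeouts raised to ride out VM scheduling stalls (illustrative values)
        token: 10000
        token_retransmits_before_loss_const: 10
        consensus: 12000
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.72.0
                mcastport: 5405
                broadcast: yes
        }
}
```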



crm configuration:
node host2.exemple.com \
        attributes standby="off"
node vif5_7 \
        attributes standby="off"
primitive clusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.10.72.3" cidr_netmask="32" iflabel="jbossfailover" \
        op monitor interval="30s"
primitive routing-jboss lsb:routing-jboss \
        op monitor interval="30s"
group vifGroup clusterIP routing-jboss
location prefer-clusterIP clusterIP 50: vif5_7
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="20"


Best regards,
Philippe
