<div dir="ltr"><span style="font-size:13px">Hello list,</span><div style="font-size:13px"><br></div><div style="font-size:13px">Could you please help me debug why one resource is not started after a node failover?</div><div style="font-size:13px"><br></div><div style="font-size:13px">Here is the configuration I'm testing:</div><div style="font-size:13px">a 3-node cluster (KVM VMs) with:</div><div style="font-size:13px"><br></div><div style="font-size:13px"><div><div>node 10: aic-controller-58055.test.<wbr>domain.local</div><div>node 6: aic-controller-50186.test.<wbr>domain.local</div><div>node 9: aic-controller-12993.test.<wbr>domain.local</div><div>primitive cmha cmha \</div><div>        params conffile="/etc/cmha/cmha.conf" daemon="/usr/bin/cmhad" pidfile="/var/run/cmha/cmha.<wbr>pid" user=cmha \</div><div>        meta failure-timeout=30 resource-stickiness=1 target-role=Started migration-threshold=3 \</div><div>        op monitor interval=10 on-fail=restart timeout=20 \</div><div>        op start interval=0 on-fail=restart timeout=60 \</div><div>        op stop interval=0 on-fail=block timeout=90</div></div><div><div>primitive sysinfo_aic-controller-12993.<wbr>test.domain.local ocf:pacemaker:SysInfo \</div><div>        params disk_unit=M disks="/ /var/log" min_disk_free=512M \</div><div>        op monitor interval=15s</div><div>primitive sysinfo_aic-controller-50186.<wbr>test.domain.local ocf:pacemaker:SysInfo \</div><div>        params disk_unit=M disks="/ /var/log" min_disk_free=512M \</div><div>        op monitor interval=15s</div><div>primitive sysinfo_aic-controller-58055.<wbr>test.domain.local ocf:pacemaker:SysInfo \</div><div>        params disk_unit=M disks="/ /var/log" min_disk_free=512M \</div><div>        op monitor interval=15s</div></div><div><br></div><div>location cmha-on-aic-controller-12993.<wbr>test.domain.local cmha 100: aic-controller-12993.test.<wbr>domain.local</div><div>location cmha-on-aic-controller-50186.<wbr>test.domain.local 
cmha 100: aic-controller-50186.test.<wbr>domain.local</div><div>location cmha-on-aic-controller-58055.<wbr>test.domain.local cmha 100: aic-controller-58055.test.<wbr>domain.local</div><div>location sysinfo-on-aic-controller-<wbr>12993.test.domain.local sysinfo_aic-controller-12993.<wbr>test.domain.local inf: aic-controller-12993.test.<wbr>domain.local</div><div>location sysinfo-on-aic-controller-<wbr>50186.test.domain.local sysinfo_aic-controller-50186.<wbr>test.domain.local inf: aic-controller-50186.test.<wbr>domain.local</div><div>location sysinfo-on-aic-controller-<wbr>58055.test.domain.local sysinfo_aic-controller-58055.<wbr>test.domain.local inf: aic-controller-58055.test.<wbr>domain.local</div><div>property cib-bootstrap-options: \</div><div>        have-watchdog=false \</div><div>        dc-version=1.1.14-70404b0 \</div><div>        cluster-infrastructure=<wbr>corosync \</div><div>        cluster-recheck-interval=15s \</div><div>        no-quorum-policy=stop \</div><div>        stonith-enabled=false \</div><div>        start-failure-is-fatal=false \</div><div>        symmetric-cluster=false \</div><div>        node-health-strategy=migrate-<wbr>on-red \</div><div>        last-lrm-refresh=1470334410</div></div><div style="font-size:13px"><br></div><div style="font-size:13px">With all 3 nodes online, everything seems OK. This is the output of showscores.sh:</div><div style="font-size:13px"><div>Resource                                                Score     Node                                   Stickiness #Fail    Migration-Threshold</div><div>cmha                                                    -INFINITY aic-controller-12993.test.<wbr>domain.local 1          0</div><div>cmha                                                    101       aic-controller-50186.test.<wbr>domain.local 1          0</div><div>cmha                                                    -INFINITY aic-controller-58055.test.<wbr>domain.local 1          0</div></div><div 
style="font-size:13px"><div>sysinfo_aic-controller-12993.<wbr>test.domain.local          INFINITY  aic-controller-12993.test.<wbr>domain.local 0          0</div><div>sysinfo_aic-controller-50186.<wbr>test.domain.local          -INFINITY aic-controller-50186.test.<wbr>domain.local 0          0</div><div>sysinfo_aic-controller-58055.<wbr>test.domain.local          INFINITY  aic-controller-58055.test.<wbr>domain.local 0          0</div></div><div style="font-size:13px"><br></div><div style="font-size:13px">The problem starts when one node (aic-controller-50186) goes offline: the cmha resource gets stuck in the Stopped state.</div><div style="font-size:13px">Here is the showscores.sh output:</div><div style="font-size:13px"><div>Resource                                                Score     Node                                   Stickiness #Fail    Migration-Threshold</div><div>cmha                                                    -INFINITY aic-controller-12993.test.<wbr>domain.local 1          0</div><div>cmha                                                    -INFINITY aic-controller-50186.test.<wbr>domain.local 1          0</div><div>cmha                                                    -INFINITY aic-controller-58055.test.<wbr>domain.local 1          0</div></div><div style="font-size:13px"><br></div><div style="font-size:13px">Even though it has target-role=Started, Pacemaker skips this resource. 
And in the logs I see:</div><div style="font-size:13px">pengine:     info: native_print:      cmha    (ocf::heartbeat:cmha):  Stopped<br></div><div style="font-size:13px">pengine:     info: native_color:      Resource cmha cannot run anywhere<br></div><div style="font-size:13px">pengine:     info: LogActions:        Leave   cmha    (Stopped)<br></div><div style="font-size:13px"><br></div><div style="font-size:13px">To recover the cmha resource, I have to run one of:</div><div style="font-size:13px">1) crm resource cleanup cmha</div><div style="font-size:13px">2) crm resource reprobe</div><div style="font-size:13px"><br></div><div style="font-size:13px">After either of the above commands, Pacemaker picks up the resource again and I see valid scores:</div><div style="font-size:13px"><div>Resource                                                Score     Node                                   Stickiness #Fail    Migration-Threshold</div><div>cmha                                                    100       aic-controller-58055.test.<wbr>domain.local 1          0        3</div><div>cmha                                                    101       aic-controller-12993.test.<wbr>domain.local 1          0        3</div><div>cmha                                                    -INFINITY aic-controller-50186.test.<wbr>domain.local 1          0        3</div></div><div style="font-size:13px"><br></div><div style="font-size:13px">So my questions are: why doesn't the periodic cluster recheck (cluster-recheck-interval=15s) recover the resource, and should it trigger a reprobe?</div><div style="font-size:13px">How can I make the migration work, or what am I missing in the configuration that prevents it?</div><div style="font-size:13px"><br></div><div style="font-size:13px">corosync  2.3.4<br></div><div style="font-size:13px">pacemaker 1.1.14</div></div>
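<div style="font-size:13px">For reference, here is a minimal sketch of how Pacemaker combines the per-node scores in the tables above. It assumes the documented "infinity math" rules: INFINITY is 1000000, any value plus -INFINITY is -INFINITY, any other value plus INFINITY is INFINITY, and INFINITY plus -INFINITY is -INFINITY. The helper names score_add and show_score are hypothetical and are not part of Pacemaker or crmsh. The 101 entries are consistent with the 100-point location constraint plus resource-stickiness=1 on the node where cmha is active. A single -INFINITY contribution on a node overrides the 100-point location preference there. The sketch does not show where the -INFINITY scores in the failed state come from.</div>

```shell
#!/usr/bin/env bash
# Illustrative sketch of Pacemaker's score ("infinity math") addition.
# Assumption: INFINITY is 1000000, as in Pacemaker. The helper names
# score_add and show_score are hypothetical, for illustration only.

INFINITY=1000000

# Add two numeric scores and print the numeric result:
# -INFINITY wins over everything, then +INFINITY,
# otherwise a plain sum, capped at +/-INFINITY.
score_add() {
    local a=$1 b=$2 sum
    if [ "$a" -le "-$INFINITY" ] || [ "$b" -le "-$INFINITY" ]; then
        sum=-$INFINITY
    elif [ "$a" -ge "$INFINITY" ] || [ "$b" -ge "$INFINITY" ]; then
        sum=$INFINITY
    else
        sum=$((a + b))
        if [ "$sum" -ge "$INFINITY" ]; then sum=$INFINITY; fi
        if [ "$sum" -le "-$INFINITY" ]; then sum=-$INFINITY; fi
    fi
    echo "$sum"
}

# Print a numeric score the way showscores.sh / crm output spells it.
show_score() {
    if [ "$1" -eq "$INFINITY" ]; then
        echo "INFINITY"
    elif [ "$1" -eq "-$INFINITY" ]; then
        echo "-INFINITY"
    else
        echo "$1"
    fi
}

# Location constraint (100) plus resource-stickiness (1) on the active node:
show_score "$(score_add 100 1)"                       # prints 101
# The same location score combined with any -INFINITY contribution:
show_score "$(score_add 100 "-$INFINITY")"            # prints -INFINITY
# Even an "inf:" location constraint loses to -INFINITY:
show_score "$(score_add "$INFINITY" "-$INFINITY")"    # prints -INFINITY
```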