[Pacemaker] the behavior of clone resource

Junko IKEDA ikedaj at intellilink.co.jp
Tue Mar 16 09:20:16 EDT 2010


Hi,

I have noticed some slightly strange clone behavior.
Here is what I found:

(1) Start the group, which contains three primitive resources,
           and the clone set.

# crm_mon -1

============
Last updated: Tue Mar 16 21:39:10 2010
Stack: openais
Current DC: cspm01 - partition with quorum
Version: 1.0.8-a77303a7adce stable-1.0 tip
4 Nodes configured, 4 expected votes
2 Resources configured.
============

Online: [ cspm01 cspm02 cspm03 cspm04 ]

        Resource Group: UMgroup01
            UmDummy01  (ocf::heartbeat:Dummy): Started cspm01
            UmDummy02  (ocf::heartbeat:Dummy): Started cspm01
            UmDummy03  (ocf::heartbeat:Dummy): Started cspm01
        Clone Set: clnUMgroup01
            Started: [ cspm01 cspm04 ]

(2) Edit the Dummy RA so that the stop operation of clnUMgroup01 fails.

# vim /usr/lib/ocf/resource.d/heartbeat/Dummy01
-----------------------------------------------
dummy_stop() {
           exit $OCF_ERR_GENERIC # intentional error

           dummy_monitor
           if [ $? =  $OCF_SUCCESS ]; then
               rm ${OCF_RESKEY_state}
           fi
           return $OCF_SUCCESS
}
-----------------------------------------------

(on cspm01, remove the state file so that the monitor fails and a stop is attempted)
# rm -f /var/run/heartbeat/rsctmp/Dummy-clnUMdummy01:0.state
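As a side note, the same stop failure could be made switchable instead of unconditional, so the agent does not have to be edited back afterwards. This is only a minimal sketch: the flag file path in FAIL_STOP_FLAG, the fallback state file path, and the fallback values for the OCF variables are assumptions for illustration, not part of the setup described above.

```shell
#!/bin/sh
# Sketch only: a stop function whose failure is toggled by a flag file.
# FAIL_STOP_FLAG is a hypothetical path; the Dummy RA does not define it.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"
: "${OCF_RESKEY_state:=/tmp/Dummy-sketch.state}"
: "${FAIL_STOP_FLAG:=/tmp/Dummy01.failstop}"

dummy_monitor() {
    # The resource counts as running while its state file exists.
    if [ -f "${OCF_RESKEY_state}" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

dummy_stop() {
    # Intentional error, but only while the flag file is present.
    if [ -e "${FAIL_STOP_FLAG}" ]; then
        return $OCF_ERR_GENERIC
    fi
    dummy_monitor
    if [ $? -eq $OCF_SUCCESS ]; then
        rm -f "${OCF_RESKEY_state}"
    fi
    return $OCF_SUCCESS
}
```

With this variant, creating the flag file turns the stop failure on and removing it turns the failure off again.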

(3) Check the status of each resource.

# crm_mon -1

============
Last updated: Tue Mar 16 21:40:11 2010
Stack: openais
Current DC: cspm01 - partition with quorum
Version: 1.0.8-a77303a7adce stable-1.0 tip
4 Nodes configured, 4 expected votes
2 Resources configured.
============

Online: [ cspm01 cspm02 cspm03 cspm04 ]

        Clone Set: clnUMgroup01
            Resource Group: clnUmResource:0
                clnUMdummy01:0 (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
                clnUMdummy02:0 (ocf::heartbeat:Dummy02):       Stopped
            Started: [ cspm04 ]

Failed actions:
           clnUMdummy01:0_monitor_10000 (node=cspm01, call=8, rc=7, status=complete): not running
           clnUMdummy01:0_stop_0 (node=cspm01, call=18, rc=1, status=complete): unknown error
           UmDummy03_monitor_10000 (node=cspm01, call=16, rc=7, status=complete): not running
           UmDummy01_monitor_10000 (node=cspm01, call=12, rc=7, status=complete): not running
           clnUMdummy02:0_monitor_10000 (node=cspm01, call=10, rc=7, status=complete): not running
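
For reading the rc values in the failed actions above: they are the standard OCF return codes, so rc=7 corresponds to OCF_NOT_RUNNING and rc=1 to OCF_ERR_GENERIC. A small helper to translate them could look like the following; the function name ocf_rc_name is made up for illustration and is not part of Pacemaker.

```shell
#!/bin/sh
# Sketch only: map an OCF return code to its symbolic name.
# ocf_rc_name is a hypothetical helper, not a Pacemaker command.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown rc $1" ;;
    esac
}
```

For example, `ocf_rc_name 7` prints OCF_NOT_RUNNING, which matches the "not running" text crm_mon shows for the monitor failures.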


In this case, the clone instance on cspm04 keeps running.

But when I added another resource to the group, like this:

============
Last updated: Tue Mar 16 21:53:26 2010
Stack: openais
Current DC: cspm01 - partition with quorum
Version: 1.0.8-a77303a7adce stable-1.0 tip
4 Nodes configured, 4 expected votes
2 Resources configured.
============

Online: [ cspm01 cspm02 cspm03 cspm04 ]

        Resource Group: UMgroup01
            UmDummy01  (ocf::heartbeat:Dummy): Started cspm01
            UmDummy02  (ocf::heartbeat:Dummy): Started cspm01
            UmDummy03  (ocf::heartbeat:Dummy): Started cspm01
            UmDummy04  (ocf::heartbeat:Dummy): Started cspm01
        Clone Set: clnUMgroup01
            Started: [ cspm01 cspm04 ]


After the same error as above,
the crm_mon output was strange.

============
Last updated: Tue Mar 16 21:54:46 2010
Stack: openais
Current DC: cspm01 - partition with quorum
Version: 1.0.8-a77303a7adce stable-1.0 tip
4 Nodes configured, 4 expected votes
2 Resources configured.
============

Online: [ cspm01 cspm02 cspm03 cspm04 ]

        Clone Set: clnUMgroup01
            Resource Group: clnUmResource:0
                clnUMdummy01:0 (ocf::heartbeat:Dummy01):       Started cspm01 (unmanaged) FAILED
                clnUMdummy02:0 (ocf::heartbeat:Dummy02):       Stopped
            Stopped: [ clnUmResource:1 ]

Failed actions:
           clnUMdummy01:0_monitor_10000 (node=cspm01, call=9, rc=7, status=complete): not running
           clnUMdummy01:0_stop_0 (node=cspm01, call=21, rc=1, status=complete): unknown error


In this case, the clone instance on cspm04 was stopped.
I didn't change the rsc_colocation or order settings.
Which behavior is the expected one?

By the way, I tried to collect the logs with hb_report,
but it failed to gather ha_log.txt;
the file's size was 0 bytes.
Anyway, I have attached the hb_report output.
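
To see which files in an extracted report came out empty, something like the following could be used. It is only a sketch: the function name list_empty_files and the directory name in the usage note are assumptions, and the code relies on nothing more than the standard find utility.

```shell
#!/bin/sh
# Sketch only: print every zero-length regular file under the given directory.
# list_empty_files is a hypothetical helper name.
list_empty_files() {
    find "$1" -type f -size 0 -print
}
```

After unpacking a report into a directory (for example one named Dummy_x4, which is an assumed name), `list_empty_files Dummy_x4` lists the files that were collected with no content.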

Thanks,
Junko
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Dummy_x4.tar.bz2
Type: application/octet-stream
Size: 38804 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100316/8fe892e7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Dummy_x3.tar.bz2
Type: application/octet-stream
Size: 66242 bytes
Desc: not available
URL: <http://lists.clusterlabs.org/pipermail/pacemaker/attachments/20100316/8fe892e7/attachment-0001.obj>

