[ClusterLabs] Resource ocf:heartbeat:asterisk fails to start

Mon Aug 3 21:00:36 EDT 2015

> On 25 Jul 2015, at 2:38 am, Danilo Malcangio <d.malcangio at eletech.it> wrote:
> 
> Hello everyone,
> I have a cluster with the following configuration:
> 
> 
> node 1: pc-1
> node 2: pc-2
> 
> primitive asterisk asterisk \
>         params user=root group=root maxfiles=65536 monitor_sipuri="sip:10.2.31.240"
> primitive pingGW PingOnFailOver
> primitive tftp lsb:tftpd-hpa \
>         op monitor interval=30s \
>         op start interval=0 timeout=120s \
>         op stop interval=0 timeout=120s
> primitive virtual-ip IPaddr2 \
>         params ip=10.2.31.240 cidr_netmask=20
> colocation et-cluster-dependency inf: virtual-ip asterisk pingGW tftp
> order et-cluster-order inf: virtual-ip asterisk pingGW tftp
> 
> 
> I have installed sipsak so that I can use the ocf:heartbeat:asterisk RA.
> Asterisk is bound to the virtual IP (bindnetaddr).
> The asterisk resource doesn't start, and crm_mon shows the following errors:
> 
> 
> Online: [ pc-1 pc-2 ]
> 
> virtual-ip      (ocf::heartbeat:IPaddr2):       Started pc-2
> 
> Failed actions:
>     asterisk_start_0 on pc-1 'unknown error' (1): call=20, status=complete, last-rc-change='Fri Jul 24 18:09:05 2015', queued=0ms, exec=2131ms
>     asterisk_start_0 on pc-2 'unknown error' (1): call=25, status=complete, last-rc-change='Fri Jul 24 18:09:21 2015', queued=0ms, exec=2123ms
> 
> 
> I tried to debug the RA as described at http://clusterlabs.org/wiki/Debugging_Resource_Failures, with the cluster configured with only the virtual IP (10.2.31.240):
> 
> root@pc-1:~# echo $OCF_ROOT
> /usr/lib/ocf
> root@pc-1:~# export OCF_RESKEY_user=root
> root@pc-1:~# export OCF_RESKEY_group=root
> root@pc-1:~# export OCF_RESKEY_maxfiles=65536
> root@pc-1:~# export OCF_RESKEY_monitor_sipuri=sip:10.2.31.240
> 
> root@pc-1:~# /usr/lib/ocf/resource.d/heartbeat/asterisk start ; echo $?
> ERROR: /usr/lib/ocf/resource.d/heartbeat/asterisk: 1: kill: No such process
> INFO: Asterisk PBX not running: removing old PID file
> ERROR: Unable to connect to remote asterisk (does /var/run/asterisk/asterisk.ctl exist?)
> INFO: Asterisk PBX not running yet
> INFO: 0 active channels 0 active calls 0 calls processed
> ERROR: command failed: sipsak -s sip:10.2.31.240
> ERROR: Asterisk PBX start failed
> 1
> 
> root@pc-1:~# /usr/lib/ocf/resource.d/heartbeat/asterisk start ; echo $?
> INFO: Asterisk PBX already running
> 0
> 
> 
> Running the script from the shell, I get those errors the first time, but asterisk does start; in fact, if I run the script again it reports that asterisk is already running.
> On two occasions the resource started correctly along with the others, but after a failover it did not start on the other node.
> 
> What am I missing?

Unless you think the agent is running the wrong command, the asterisk logs are where I would look next.
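
One rough way to narrow this down while reading the logs is to check whether the SIP probe eventually succeeds after a manual start, or never does. The helper below is only a sketch: the function name retry_until is made up for illustration, and the log path in the comments is an assumption about a default install, so adjust it to your setup. The sipsak command is the one shown in your failed start.

```shell
#!/bin/sh
# Hypothetical helper (not part of the resource agent): run a command
# repeatedly until it succeeds or the attempts are used up.
# Usage: retry_until MAX_TRIES DELAY_SECONDS command [args...]
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Discard the probed command's own output; only its exit code matters.
        if "$@" >/dev/null 2>&1; then
            echo "succeeded on attempt $try"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    echo "still failing after $max_tries attempts"
    return 1
}

# Example (left commented out; run by hand on a cluster node right after
# a manual start of the agent):
#   retry_until 10 1 sipsak -s sip:10.2.31.240
#   tail -n 50 /var/log/asterisk/messages   # assumed log location
```

If the probe succeeds after a few attempts, that would suggest asterisk is just not answering SIP yet when the agent checks it; if it never succeeds, the asterisk logs should say why.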

> 
> Thanks
> 
> 
> P.S.
> 
> My setup is:
> Debian Jessie 8.1
> Pacemaker 1.1.12
> Corosync 2.3.4
> crmsh 2.1.0
> asterisk 13.4.0