[ClusterLabs] Oracle and pcs resources problem

Tue Apr 4 04:30:54 EDT 2023

Hi Dejan Muhamedagic

We are using pacemaker with your pcs resource oracle and oralsnr.
When we sometimes execute DELETE (thousands rows and transaction time
mostly > 10s Oracle will usually stop and it is very bad.
These servers are mission critical.
Indexes is rebuilded and statistics is analyzed every day.
Structure of table is

CREATE TABLE TDC.VD_ARC10
(
    IDST     INTEGER,
    IDTP     INTEGER,
    IDLN     INTEGER,
    DTIME    DATE,
    FLAG     CHAR (1 BYTE) NOT NULL,
    VALID    CHAR (1 BYTE) NOT NULL,
    VALUE    NUMBER NOT NULL
)
TABLESPACE USERS
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (INITIAL 64 K
         NEXT 1 M
         MINEXTENTS 1
         MAXEXTENTS UNLIMITED
         PCTINCREASE 0
         BUFFER_POOL DEFAULT
         FLASH_CACHE DEFAULT
         CELL_FLASH_CACHE DEFAULT)
NOLOGGING
NOCOMPRESS
NOCACHE
RESULT_CACHE (MODE DEFAULT)
NOPARALLEL;

indexes are

--
-- VD_ARC10_I1  (Index) 
--

CREATE UNIQUE INDEX TDC.VD_ARC10_I1
    ON TDC.VD_ARC10 (DTIME,
                     IDST,
                     IDTP,
                     IDLN)
    NOLOGGING
    TABLESPACE USERS
    PCTFREE 10
    INITRANS 2
    MAXTRANS 255
    STORAGE (INITIAL 50 M
             NEXT 100 M
             MINEXTENTS 1
             MAXEXTENTS UNLIMITED
             PCTINCREASE 0
             BUFFER_POOL DEFAULT
             FLASH_CACHE DEFAULT
             CELL_FLASH_CACHE DEFAULT)
    NOPARALLEL;

--
-- VD_ARC10_PK  (Index) 
--

CREATE UNIQUE INDEX TDC.VD_ARC10_PK
    ON TDC.VD_ARC10 (IDST,
                     IDTP,
                     IDLN,
                     DTIME)
    NOLOGGING
    TABLESPACE USERS
    PCTFREE 10
    INITRANS 2
    MAXTRANS 255
    STORAGE (INITIAL 50 M
             NEXT 100 M
             MINEXTENTS 1
             MAXEXTENTS UNLIMITED
             PCTINCREASE 0
             BUFFER_POOL DEFAULT
             FLASH_CACHE DEFAULT
             CELL_FLASH_CACHE DEFAULT)
    NOPARALLEL;

ALTER TABLE TDC.VD_ARC10
    ADD (CONSTRAINT VD_ARC10_PK PRIMARY KEY (IDST,
                                             IDTP,
                                             IDLN,
                                             DTIME)
             USING INDEX TDC.VD_ARC10_I1 ENABLE VALIDATE);

Number of records are over 18 million rows. Your
script tries to connect to database and checks if
it is succesfull or not.
But Oracle instance could be living if it is in NOMOUNT
or MOUNT state and not in the state OPEN. Time between
NOMOUNT and OPEN state could sometimes spend minutes if
Oracle do consistency from redo files. Cluster resource
still shutdown database or disconnects cluster and I must
reboot all servers more than once.

If is instance living, we could get an information by

sqlplus / as sysdba
select status from v$instance

STATUS return STARTED, MOUNT or OPEN but if it is
down and it is not possible to connect as sysdba Oracle
shows error message:

ORA-01034: ORACLE not available

Static queries like view v$instance work also in not
opened database.

We need not to shutdown database or not to disconect cluster
when long time queries is running. What to do? I am beginner
in administration of pcs resources. Timouts of resources?

[root at vdpr2 ~]# pcs resource show
 Master/Slave Set: cluster_drbd_clone [cluster_drbd]
     Masters: [ vdpr2a ]
     Slaves: [ vdpr2b ]
 cluster_www    (ocf::heartbeat:apache):        Started vdpr2a
 cluster_cron   (systemd:crond):        Started vdpr2a
 Resource Group: telemat_group
     cluster_ip (ocf::heartbeat:IPaddr2):       Started vdpr2a
     cluster_ip_gprs    (ocf::heartbeat:IPaddr2):       Started vdpr2a
     cluster_fs (ocf::heartbeat:Filesystem):    Started vdpr2a
     cluster_hostname   (lsb:hostname): Started vdpr2a
     cluster_oracle_listener    (ocf::heartbeat:oralsnr):       Started 
vdpr2a
     cluster_oracle     (ocf::heartbeat:oracle):        Started vdpr2a
     cluster_delay_telemat      (ocf::heartbeat:Delay): Started vdpr2a
     cluster_scada      (systemd:monit-node):   Started vdpr2a
     cluster_delay_telemat2     (ocf::heartbeat:Delay): Started vdpr2a
     cluster_centrala   (systemd:monit-node2):  Started vdpr2a
[root at vdpr2 ~]#

[root at vdpr2 ~]# pcs resource show cluster_oracle
 Resource: cluster_oracle (class=ocf provider=heartbeat type=oracle)
  Attributes: sid=ORAC user=oracle
  Operations: methods interval=0s timeout=5s 
(cluster_oracle-methods-interval-0s)
              monitor interval=120s timeout=30s 
(cluster_oracle-monitor-interval-120s)
              start interval=0s timeout=300s 
(cluster_oracle-start-interval-0s)
              stop interval=0s timeout=120s 
(cluster_oracle-stop-interval-0s)
[root at vdpr2 ~]#

[root at vdpr2 ~]# pcs resource show cluster_oracle_listener
 Resource: cluster_oracle_listener (class=ocf provider=heartbeat 
type=oralsnr)
  Attributes: sid=ORAC user=oracle
  Meta Attrs: migration-threshold=3
  Operations: methods interval=0s timeout=5s 
(cluster_oracle_listener-methods-interval-0s)
              monitor interval=10s timeout=30s 
(cluster_oracle_listener-monitor-interval-10s)
              start interval=0s timeout=120s 
(cluster_oracle_listener-start-interval-0s)
              stop interval=0s timeout=120s 
(cluster_oracle_listener-stop-interval-0s)
[root at vdpr2 ~]#

How to test status of Oracle during startup or shutdown (do nothing yet):

[oracle at vdpr2 ~]$ sqlplus / as sysdba << EOF
> SELECT status FROM v\$instance;
> exit;
> EOF

Output:
SQL*Plus: Release 12.2.0.1.0 Production on Tue Apr 4 10:04:54 2023

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Standard Edition Release 12.2.0.1.0 - 64bit Production

SQL>
STATUS
------------
OPEN

SQL> Disconnected from Oracle Database 12c Standard Edition Release 
12.2.0.1.0 -   64bit Production
[oracle at vdpr2 ~]$

If STATUS is STARTED then it is in NOMOUNT state but instance is started
If STATUS is MOUNT then Oracle opens the control files and find names of
datafiles and online redo files.
If STATUS is OPEN then Oracle is ready to connect by another user then
/ as sysdba

The best is test status if it is in STATUS STARTED or MOUNT and wait
until Oracle will open. It can spend minutes.

If listener is running and working properly we could test it by

[oracle at vdpr2 ~]$ lsnrctl status | grep "READY"
  Instance "ORAC", status READY, has 1 handler(s) for this service...
  Instance "ORAC", status READY, has 1 handler(s) for this service...

Do you have any idea why DELETE with transaction time more then 10
seconds shutdowns database? Thanks for advice in advance.

S pozdravem 

Robert Reich, informatik
_______________________________

Povodí Ohře, státní podnik
Bezručova 4219, 430 03 Chomutov
tel.  : +420 474 636 352
mobil : +420 606 757 459
e-mail: reich at poh.cz
http://www.poh.cz/
_______________________________

Kontrolováno antivirovým systémem SOPHOS

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20230404/db2fb907/attachment-0001.htm>