Monday, August 3, 2020

ORA-16198: Received timed out error from KSR & LAD:2 network reconnect

I have setup Data Guard and log synchronization happening without any issue but recently I found below errors, which are happening rarely in alert.log file

GWR (PID:13557): ORA-16198: Received timed out error from KSR
LGWR (PID:13557): Attempting LAD:2 network reconnect (16198)
LGWR (PID:13557): LAD:2 n
etwork reconnect abandoned
2020-07-13T15:32:25.865034-07:00
Errors in file /oracle/diag/rdbms/orcl/ORCL/trace/orcl_lgwr_13557.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR (PID:13557): Error 16198 for LNO:2 to 'orcl_stdby'
2020-07-13T15:32:25.865253-07:00
LGWR (PID:13557): LAD:2 is UNSYNCHRONIZED
LGWR (PID:13557): Failed to archive LNO:2 T-1.S-5003, error=16198
LGWR (PID:13557): Error 16198 disconnecting from LAD:2 standby host 'orcl_stdby'


Redo Transport Services failed with ORA-16198 error from primary database standby database using LGWR SYNC mode.

Data Guard Broker log shows below error messages…

Data Guard Broker Status Summary:
  Type                        Name                             Severity  Status
  Configuration               orcl_dg_fsfconf                   Warning  ORA-16608: one or more members have warnings
  Primary Database            orcl                              Success  ORA-0: normal, successful completion
  Physical Standby Database   orcl_stdby                        Warning  ORA-16853: apply lag has exceeded specified threshold

When I very configuration everything looks good and no issues.

DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Aug 3 15:19:29 2020
Version 19.5.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "ORCL"
Connected as SYSDBA.
DGMGRL> show configuration;
Configuration - orcl_dg_fsfconf
Protection Mode: MaxAvailability
Members:
orcl - Primary database
orcl_stdby - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 34 seconds ago)

If this is happening frequently then it is critical and must take action immediately. In my case it is happening rarely which lean towards time out setting. I see that my current DB has net_timeout as 30.
After research, I have increased the net_timeout to 60 and issue has been resolved.

In case if you are not using Data Guard broker then you can run below command from SQL*Plus
SQL> alter system set log_archive_dest_2 service="orcl_stdby", SYNC AFFIRM delay=0 optional compression=disable max_failure=0 reopen=300 db_unique_name="orcl_stdby" net_timeout=30, valid_for=(online_logfile,all_roles)

In case if you are using DGMGRL command-line interface, then you can run below command

DGMGRL> EDIT DATABASE 'ORCL' SET PROPERTY NetTimeout = 30;

Thanks,
Satishbabu G, Oracle ACE
http://oracleracepxert.com

ORA-00700: soft internal error, arguments

I have come across ORA-00700 error recently and found below error in the logs. This

ORA-00700: soft internal error, arguments: [dbgrfafr_1], [60], [60], [1], [0x7FB5DD11BC70], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/orcl/ORCL/incident/incdir_75765/ORCL_ora_8274_i75765.trc


Incident details in the trace will have more detailed error. In my case below connections are coming from foglight monitoring tool . After research found that this error can occur due failing statement called from monitoring tools like TOAD,foglight..etc

Dump file /oracle/diag/rdbms/orcl/ORCL/incident/incdir_75765/ORCL_ora_8274_i75765.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.5.0.0.0
Build label: RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME: /oracle/product/19.0.0.0/dbhome_1
System name: Linux
Node name: server1
Release: 3.10.0-1062.12.1.el7.x86_64
Version: #1 SMP Thu Dec 12 06:44:49 EST 2019
Machine: x86_64
Instance name: ORCL
Redo thread mounted by this instance: 1
Oracle process number: 130
Unix process pid: 29807, image: oracle@server1

*** 2020-05-20T00:45:56.870884-07:00
*** SESSION ID:(204.25936) 2020-05-20T00:45:56.870895-07:00
*** CLIENT ID:() 2020-05-20T00:45:56.870900-07:00
*** SERVICE NAME:(ORCL) 2020-05-20T00:45:56.870904-07:00
*** MODULE NAME:(Foglight for Oracle <5.9.5.20>) 2020-05-20T00:45:56.870909-07:00
*** ACTION NAME:(isrgorcl-db-ORCL-DBO_Instance_General) 2020-05-20T00:45:56.870914-07:00
*** CLIENT DRIVER:(jdbcthin : 12.2.0.1.0) 2020-05-20T00:45:56.870919-07:00

[TOC00000]
Jump to table of contents
Dump continued from file: /oracle/diag/rdbms/orcl/ORCL/trace/ORCL_ora_29807.trc
[TOC00001]
ORA-00700: soft internal error, arguments: [dbgrfafr_1], [60], [60], [0], [0x7FFC2A3E6488], [], [], [], [], [], [], []

[TOC00001-END]
[TOC00002]
========= Dump for incident 201097 (ORA 700 [dbgrfafr_1]) ========

*** 2020-05-20T00:45:56.871792-07:00
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
[TOC00003]

----- Current SQL Statement for this session (sql_id=gxpn94tgbwz4d) -----
select /*+ RULE */ version "db_version",
decode(log_mode,'NOARCHIVELOG','NO','YES') "is_archived",
decode(a.asm_files,0,'NO','YES') "is_asm",
case when dg.dataguard > 0 THEN 'YES'
when controlfile_type = 'STANDBY' THEN 'YES'
else 'NO'
end as "is_dataguard",
database_role,
decode(r.rman_conf,0,'NO','YES') "is_rman",
flashback_on "is_flashback",
instance_name,
instance_number,
open_mode,
decode(c.cell_count,0,'NO','YES') "is_exadata",
decode(al.al_count,0,'NO','YES') "is_alertlog",
'NO' "is_rds",
decode(cdb,'YES','YES','NO') is_pluggable
from sys.v$instance, sys.v$database,
(select count(*) asm_files from v$datafile where name like '+%' and rownum<2) a,
(select count(*) dataguard from sys.v$archive_dest where target = 'STANDBY' and rownum<2) dg,
(select count(*) rman_conf from sys.v$RMAN_STATUS where rownum<2) r,
(select count(*) cell_count from sys.v$cell where rownum<2) c,
(select count(*) al_count from sys.x$DBGALERTEXT where rownum<2) al
[TOC00003-END]

Also few other users encountered HIGH CPU consumption issue. This can happen when $DBGALERTEXT is populated from the XML alert log file situated in the ADR location. Note that this is an undocumented fixed table and when XML alert log is very large it takes time to access and also it can cause ORA-700 error.

You can query on X$DBGALERTEXT table to see high CPU taking a long time to complete.

SELECT count(*)
FROM X$DBGALERTEXT
WHERE to_date(to_char(originating_timestamp, 'dd-mon-yyyy hh24:mi'), 'dd-mon-yyyy hh24:mi') > to_date(to_char(systimestamp - .00694, 'dd-mon-yyyy hh24:mi'), 'dd-mon-yyyy hh24:mi') /* last 10 minutes */
AND (message_text = 'ORA-00600'
OR message_text LIKE 'Ăștal%'
OR message_text LIKE '%error%'
OR message_text LIKE '%ORA-%'
OR message_text LIKE '%terminating the instance%');

It’s a good practice to purge the table regularly and it can be purged using ADRCI utility.

--First make sure you check the count
SQL> select count(*) from X$DBGALERTEXT;

--Connect to ADRCI utility

$adrci> show home
ADR Homes:
diag/rdbms/orcl/ORCL
diag/tnslsnr/server1/listener_orcl
$adrci> SET HOMEPATH /oracle/diag/rdbms/orcl/ORCL

--Run below command to purse logs older than 1 day.
adrci> purge -age 1440 -type alert
adrci> exit

If you still see the error from Monitoring tools like TOAD, FOGLIGHT contact the vendor. If the issue is not from monitoring tools then contact Oracle

Refer  Metalink note 2056666.1  for more info.

Regards
Satishbabu G, Oracle ACE

Tuesday, July 28, 2020

RENAME Table or Column or Index in Oracle

RENAME Table
You can rename an existing table in any schema except the schema SYS. To rename a table you must be either DB owner or Table owner.

Use RENAME TABLE to rename a table.

Syntax: RENAME TABLE table-Name TO new-Table-Name
Ex:- RENAME TABLE EMPLOYEE TO EMPLOYEE_ACT

If you have a view or foreign key that references the table, attempts to rename it will generate an error. Also if there are any check constraints or triggers on the table, attempts to rename it will also generate an error.

RENAME Column: Use the RENAME COLUMN to rename a column in a table.
ex: RENAME COLUMN EMPLOYEE.Employee_ID TO EMP_ID;

You can use ALTER TABLE and RENAME COLUMN to modify Column data type
ALTER TABLE EMPLOYEE ADD COLUMN EMP_ID NEWTYPE
UPDATE EMPLOYEE SET EMP_ID = Employee_ID
ALTER TABLE EMPLOYEE DROP COLUMN Employee_ID
RENAME COLUMN t. Employee_ID TO EMP_ID

If a view, trigger, check constraint, foreign key constraint then attempt to rename it will generate an error. Also RENAME COLUMN is not allowed if you have any open cursors that reference the column that is being altered.

If there is an index defined on the column then you can still rename. The index will update automatically to refer by its new name

RENAME Index: Use the RENAME Index to rename an index, you cannot rename indexes in SYS schema

Ex:- RENAME INDEX EMPLOYEE_ID_INDEX TO EMP_ID_INDEX

In case if there are any open cursors that reference the index being renamed then RENAME INDEX is not allowed .

Thanks
Satishbabu G, Oracle ACE

How to enable Oracle SQL*Net Client, Server and Listener Tracing

To enable tracing make sure TNS_ADMIN environment variable should be set and add below tracing parameters in sqlnet.ora. Pls note that any trace settings apply to all server processes so you need to careful when making changes on Prod, I would always suggest to test before applying changes on PRD.
Enable SQL*Net server and Client side tracing.
TRACE_LEVEL_CLIENT = SUPPORT
TRACE_UNIQUE_CLIENT = on
TRACE_LEVEL_SERVER = SUPPORT
TRACE_DIRECTORY_CLIENT = \tmp
TRACE_FILE_CLIENT = Client_Trace.log
TRACE_DIRECTORY_SERVER = \tmp
TRACE_FILE_SERVER = Server_Trace.log
TRACE_TIMESTAMP_ CLIENT = ON
DIAG_ADR_ENABLED= OFF

Starting from 11g and higher to enable client, server trace you need to add DIAG_ADR_ENABLED= OFF

SQL*Net Listener tracing
TRACE_FILE_LISTENER = Listener_Trace.log
TRACE_DIRECTORY_LISTENER = \tmp
TRACE_LEVEL_LISTENER = SUPPORT
DIAG_ADR_ENABLED_LISTENER_<LISTENER> = OFF

To start Listener Trace you need to set DIAG_ADR_ENABLED_<listener name> = OFF (must set to OFF)

To enable Tnsping Trace
TNSPING.TRACE_LEVEL = SUPPORT
TNSPING.TRACE_DIRECTORY = <Trace file path>

Enabling Dynamic Listener Tracing
LSNRCTL> set current_listener listener_ORCL
Current Listener is listener_ORCL
LSNRCTL> set trc_level 16
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER parameter "trc_level" set to support
The command completed successfully
LSNRCTL> set trc_directory /oracle/diag/tnslsnr/host1/listener_orcl/trace
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
LISTENER parameter "trc_directory" set to /oracle/diag/tnslsnr/host1/listener_orcl/trace

Turn off dynamic tracing using:
LSNRCTL>set current_listener listener
LSNRCTL>set trc_level OFF


Thanks
Satishbabu Gunukula, Oracle ACE
http://www.oracleracexpert.com

Wednesday, June 24, 2020

java.net.SocketException: No buffer space available (maximum connections reached?)

I have come across an issue that Tableau server degraded suddenly. We had to restart the services to bring Tableau server online.

We see below errors from the log file and after research, we found that Tableau Server Port Exhaustion caused the issue

Errors in Server event log
TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint. This error typically occurs when outgoing connections are opened and closed at a high rate, causing all available local ports to be used and forcing TCP/IP to reuse a local port for an outgoing connection. To minimize the risk of data corruption, the TCP/IP standard requires a minimum time period to elapse between successive connections from a given local endpoint to a given remote endpoint.

Error in Tableau log files
2020-07-05 07:10:42.800 -0700 (,,,) StatusServerThread_dataserver_2 : INFO com.tableausoftware.service.thrift.win32.NamedPipeServerTransport - Awaiting client connection on \\.\pipe\dataserver_2-status, handle native@0x1a3c.
2020-07-05 07:10:49.990 -0700 (,,,) FilestoreHostLocatorThread : ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.io.IOException: Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNIO
at org.apache.zookeeper.ZooKeeper.getClientCnxnSocket(ZooKeeper.java:1845)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:452)
at org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
at org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
at org.apache.curator.ConnectionState.start(ConnectionState.java:109)
at org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
at com.tableausoftware.tdfs.common.ZkConnection.<init>(ZkConnection.java:67)
at com.tableausoftware.tdfs.common.ZkConnectionProvider.getConnection(ZkConnectionProvider.java:50)
at com.tableausoftware.tdfs.common.HostLocator.lambda$initialize$1(HostLocator.java:146)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor105.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.zookeeper.ZooKeeper.getClientCnxnSocket(ZooKeeper.java:1843)
... 19 more
Caused by: java.io.IOException: Unable to establish loopback connection at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:94)
at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:171)
at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:50)
at java.nio.channels.Pipe.open(Pipe.java:155)
at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:127)
at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:44)
at java.nio.channels.Selector.open(Selector.java:227)
at org.apache.zookeeper.ClientCnxnSocketNIO.<init>(ClientCnxnSocketNIO.java:43)
... 23 more

Caused by: java.net.SocketException: No buffer space available (maximum connections reached?): bind
1. To Identify the dynamic port range

Open a command prompt as an administrator, and run the following command to determine the dynamic port range:

netsh int ipv4 show dynamicportrange tcp
or
netsh int ipv6 show dynamicportrange tcp

The command will display the port number of the first dynamic port, and the number of ports. You need to add the number of ports to the start port to determine the port range.

2. Run below command to view ports currently in use
netstat –n 

3. Run the following command to increase the dynamic port range:

netsh int <ipv4|ipv6> set dynamicportrange <tcp|udp> start=number num=range store=<active/persistent>

Note that port 65535 is the maximum port number . You can start by setting the range to include ports 32767 through 65535

Also refer below article To determine port exhaustion caused the issue.
https://kb.tableau.com/articles/issue/tableau-server-port-exhaustion-problems#:~:text=Increase%20the%20dynamic%20port%20range,is%20the%20maximum%20port%20number.

Refer below article identify port-exhaustion at server side.
https://support.microsoft.com/en-us/help/3123245/update-improves-port-exhaustion-identification-in-windows-server-2012
https://social.technet.microsoft.com/Forums/ie/en-US/b632acdc-a546-4014-a299-4c27781e6c5a/tcpip-failed-to-establish-an-outgoing-connection-event-id-4227?forum=winserverPN
https://technet.microsoft.com/en-us/library/cc938217.aspx 

Thanks & Regards,
Satishbabu Gunukula, Oracle ACE
http://www.oracleracexpert.com