Monday, December 11, 2017

HANA Index Server is not coming up

We have seen very high memory usage and user queries are taking longer and memory is not releasing. To release memory we have restarted HANA using below commands

sapcontrol -nr 00 -function StopSystem
sapcontrol -nr 00 -function StartSystem

HANA Database is not coming online and we see that Index Server is Initializing and we waited for 25 mins and no luck

$ sapcontrol -nr 00 -function GetProcessList

11.12.2017 08:29:31
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2017 12 04 21:58:59, 154:30:32, 5259
hdbcompileserver, HDB Compileserver, GREEN, Running, 2017 12 04 21:59:31, 154:30:00, 5490
hdbindexserver, HDB Indexserver, YELLOW, Initializing, 2017 12 04 21:59:33, 154:29:58, 5528
hdbnameserver, HDB Nameserver, GREEN, Running, 2017 12 04 21:59:03, 154:30:28, 5283
hdbpreprocessor, HDB Preprocessor, YELLOW, Initializing, 2017 12 04 21:59:31, 154:30:00, 5492
hdbwebdispatcher, HDB Web Dispatcher, YELLOW, Running, 2017 12 04 22:37:54, 153:51:37, 11186
hdbxsengine, HDB XSEngine, YELLOW, Initializing, 2017 12 04 21:59:33, 154:29:58, 5530
In meanwhile when I look into the logs I see below errors

1. HANA OUT OF MEMORY, that is expected

indexserver_scllpa0000012.30003.rtedump.20171204-173900.014195.compositelimit_oom.trc
Composite limit violation (OUT OF MEMORY) occurred.

2. This below error is new and not seen in HANA logs before

Unknown pending exception
Throw location of the exception unknown
[4216]{-1}[-1/-1] 2017-12-04 19:58:44.751308 d Crypto           RootKeyStoreAccessorBase.cpp(00093) : Caught RootKeyStoreRecordNotFoundException reading SSFS version /usr/sap/HANADBX/SYS/global/hdb/security/ssfs:HDB_SERVER/PERSISTENCE/ROOTKEY/VERSION
[4216]{-1}[-1/-1] 2017-12-04 19:58:44.751320 a Crypto           RootKeyStoreAccessorBase.cpp(00072) : ExcExit time_t Crypto::SecureStore::RootKeyAccess::RootKeyStoreAccessorBase::readValueForContentKey(const string&, Crypto::Buffer&)(1596usec)
exception  1: no.301103  (Crypto/Ssfs/RootKeyAccess/RawAccess/RawRootKeyStoreReaderWriter.cpp:52)
    RawRootKeyStoreReader::read: SSFS-4218: Record with key "HDB_SERVER/PERSISTENCE/ROOTKEY/VERSION" not found in secure storage <-- SSFS-4215: Data file "/usr/sap/HANADBX/SYS/global/hdb/security/ssfs/SSFS_HANADBX.DAT" exists, but does not contain the requested entry
exception throw location:
 1: 0x00007fd869fe4721 in Crypto::SecureStore::RootKeyAccess::RawAccess::RawRootKeyStoreReader::read(Crypto::Buffer&)+0x40 at RawRootKeyStoreReaderWriter.cpp:86 (libhdbbasement.so)
 2: 0x00007fd869fe2f06 in Crypto::SecureStore::RootKeyAccess::RawAccess::RawRootKeyStore<false, false>::read(ltt::basic_string<char, ltt::char_traits<char> > const&, ltt::basic_string<char,

3. We see that HANA trying to recovery and but not coming online.

[4216]{-1}[-1/-1] 2017-12-04 19:59:59.100341 i Logger           PersistenceManagerImpl.cpp(06361) : Starting log replay at position 0x2b6f7dd9d02
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.100362 i Logger           RecoveryHandlerImpl.cpp(01681) : Triggering recovery of remaining log
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.100399 i Logger           RecoveryHandlerImpl.cpp(01741) : Finishing log recovery, waiting for in-process and in-load segments
[4607]{-1}[-1/-1] 2017-12-04 19:59:59.100685 i Logger           PersistenceManagerImpl.cpp(05084) : Found savepoint 253169 log record
[4607]{-1}[-1/-1] 2017-12-04 19:59:59.143111 i Logger           RecoveryHandlerImpl.cpp(02271) : Log recovery cannot expect any more data at position 0x2b6f7decec0, load states:
  - LogPartition[/hana/log/HANADBX/mnt00001/hdb00004/:0]: #segments 0/1 processed, 1/1 loaded, 1 total, last loaded LogSegment[0:0x2b6f7dd6d80/Writing,ts=2017-12-04 19:55:56.333631][GUID=5575D4BD-004F-20171119-045408-3E00001BA3/PrevGUID=5575D4BD-004F-20171119-045408-3E00001BA2/PersGUID=47A64522-004F-20150528-232756-3C50000001/RestoreGUID=74736948-6469-7547-0000-002000000000]
[4607]{-1}[-1/-1] 2017-12-04 19:59:59.143164 i Logger           RecoveryHandlerImpl.cpp(02553) : Recovery finished at log position 0x2b6f7decec0
[4597]{-1}[-1/-1] 2017-12-04 19:59:59.582532 i Logger           RecoveryHandlerImpl.cpp(03921) : Signaling finish wait barrier
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.586365 i Logger           LogSegment.cpp(00672) : Closing open log segments after log recovery completed at position 0x2b6f7decec0
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.586411 i Logger           LogSegment.cpp(00760) : Closing open on-disk segment LogSegment[0/18:0x2b6f7dd6d80-0x2b6f7decec0(0x585000B)/GUID=5575D4BD-004F-20171119-045408-3E00001BA3/PrevGUID=5575D4BD-004F-20171119-045408-3E00001BA2,TS=2017-12-04 19:55:56.333631,Hole=0x2b6f7dd6d80/Writing/0x0]@0x00007f43f1a385c0 after log recovery completed at position 0x2b6f7decec0
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.587204 i Logger           LoggerImpl.cpp(01468) : Replayed 5009280B (4MB) of log in 1.522 seconds; 3.13877MB/s; max known TID=379629377
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.587299 i Logger           LoggerImpl.cpp(01064) : Starting logger with 8 log buffers per partition, 1024KB each, segment size 1024MB, log mode normal
[4511]{-1}[-1/-1] 2017-12-04 20:00:39.628427 i Logger           RecoveryHandlerImpl.cpp(00701) : Termination of indoubt transactions. (raw page size in bytes to be still rolled back 136717467648 or garbage collected 0)
[4512]{-1}[-1/-1] 2017-12-04 20:01:29.664585 i Logger           RecoveryHandlerImpl.cpp(00701) : Termination of indoubt transactions. (raw page size in bytes to be still rolled back 133028642816 or garbage collected 0)

We have seen this type of behavior earlier but not same errors, restarting HANA helped to resolve the issue. We restarted HANA again and to see where it helps, but still “HANA Index Server is Initializing” and the log stopped at below error and no further update..

[4216]{-1}[-1/-1] 2017-12-04 19:59:05.121382 i Logger           LoggerImpl.cpp(00525) : Setting log segment backup timeout to 900s, original value was 0s
[4216]{-1}[-1/-1] 2017-12-04 19:59:05.190851 i PersistenceManag PersistenceManagerImpl.cpp(05984) : BACKUP DATA seems to have been executed, database should be recoverable from backup
[4216]{-1}[-1/-1] 2017-12-04 19:59:05.190879 i Logger           PersistenceManagerImpl.cpp(06226) : Deactivating REDO logging for log replay
[4216]{-1}[-1/-1] 2017-12-04 19:59:05.190881 i Logger           PersistenceManagerImpl.cpp(06237) : Start termination of rollback(s) open in restart/backup savepoint
[4216]{-1}[-1/-1] 2017-12-04 19:59:32.523954 i Logger           PersistenceManagerImpl.cpp(06245) : Termination of rollback(s) open in restart/backup savepoint finished in 27.2676 seconds;

[4607]{-1}[-1/-1] 2017-12-04 19:59:59.143164 i Logger           RecoveryHandlerImpl.cpp(02553) : Recovery finished at log position 0x2b6f7decec0
[4597]{-1}[-1/-1] 2017-12-04 19:59:59.582532 i Logger           RecoveryHandlerImpl.cpp(03921) : Signaling finish wait barrier
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.586365 i Logger           LogSegment.cpp(00672) : Closing open log segments after log recovery completed at position 0x2b6f7decec0
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.586411 i Logger           LogSegment.cpp(00760) : Closing open on-disk segment LogSegment[0/18:0x2b6f7dd6d80-0x2b6f7decec0(0x585000B)/GUID=5575D4BD-004F-20171119-045408-3E00001BA3/PrevGUID=5575D4BD-004F-20171119-045408-3E00001BA2,TS=2017-12-04 19:55:56.333631,Hole=0x2b6f7dd6d80/Writing/0x0]@0x00007f43f1a385c0 after log recovery completed at position 0x2b6f7decec0
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.587204 i Logger           LoggerImpl.cpp(01468) : Replayed 5009280B (4MB) of log in 1.522 seconds; 3.13877MB/s; max known TID=379629377
[4216]{-1}[-1/-1] 2017-12-04 19:59:59.587299 i Logger           LoggerImpl.cpp(01064) : Starting logger with 8 log buffers per partition, 1024KB each, segment size 1024MB, log mode normal
[4511]{-1}[-1/-1] 2017-12-04 20:00:39.628427 i Logger           RecoveryHandlerImpl.cpp(00701) : Termination of indoubt transactions. (raw page size in bytes to be still rolled back 136717467648 or garbage collected 0)

[4607]{-1}[-1/-1] 2017-12-04 19:59:59.143164 i Logger           RecoveryHandlerImpl.cpp(02553) : Recovery finished at log position 0x2b6f7decec0
[4597]{-1}[-1/-1] 2017-12-04 19:59:59.582532 i Logger           RecoveryHandlerImpl.cpp(03921) : Signaling finish wait barrier

After reviewing the logs I see that many errors are DEVIATING the problem and it looks like issue with log segment. Normally you will see this behavior when your log file system is full or backup backup file system is full.

I have verified all locations and everything looks good except log backup location. When I am trying to list or view the log backups it just hung, I see that there is some issue with backup location.

We have unmounted and remounted back and also renamed old backup location and created new. When I look into the logs I see that HANA coming online..

As you see below log HANA was just hanging and after fixing issue immediately HANA coming online.

[5699]{-1}[-1/-1] 2017-12-04 22:07:28.699548 i Basis            Helper.cpp(00514) : Using 'x64_64 ABI unwind' for stack tracing
[5576]{-1}[-1/-1] 2017-12-04 22:37:20.216535 i Logger           LogSegment.cpp(00672) : Closing open log segments after log recovery completed at position 0x2b6f7decf00
[5576]{-1}[-1/-1] 2017-12-04 22:37:20.216830 i Logger           LogSegment.cpp(00760) : Closing open on-disk segment LogSegment[0/32:0x2b6f7decec0-

Note that during HANA index server initialization, it needs all required resources for rollback,recovery,backup..etc. But log backup mount point was timing out and not accessible from HANA side which is causing the issue.

1 comment:

  1. Good Post! The term paper writer of buyanessay.co I had was fantastic and it was delivered sooner than my deadline! I plan on using him again for any papers I need written in the future.

    ReplyDelete