Tuesday, April 27, 2010

Overview of Transparent Application Failover (TAF) in Oracle

Transparent Application Failover (TAF) is a client-side feature that allows clients to reconnect to a surviving node when an instance fails. The reconnect happens automatically from within the OCI (Oracle Call Interface) library. Any uncommitted transactions are rolled back, and server-side program variables and session properties are lost. In some cases, SELECT statements are automatically re-executed on the new connection, with the cursor positioned on the row it was on prior to the failover.

For high availability and scalability, Oracle provides the Transparent Application Failover feature as part of Oracle Real Application Clusters (RAC).

Failover is configured in the tnsnames.ora file. The TAF settings are placed in the CONNECT_DATA section of tnsnames.ora using the FAILOVER_MODE parameter.

FAILOVER_MODE contains the following subparameters
----------------------------------------------------------
BACKUP: Specify a different net service name for backup instance connections. A backup should be specified when using PRECONNECT to pre-establish connections.

A sample configuration would look like

TESTDB1 =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = testdb.oracleracexpert.com)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = PRECONNECT)(BACKUP=TESTDB2))
)
)

TYPE: TAF supports three types of failover

1. SESSION failover - If a user's connection is lost, a new session is automatically created for the user on the backup node. This type of failover does not attempt to recover selects. It is ideal for OLTP (online transaction processing) systems, where transactions are small.

2. SELECT failover - If the connection is lost, Oracle Net establishes a connection to another node and re-executes the SELECT statements, with the cursor positioned on the row it was on prior to the failover. This mode involves overhead on the client side, because Oracle Net has to keep track of SELECT statements. This approach is best for data warehouse systems, where the transactions are big and complex.

3. NONE - This setting is the default, and no failover functionality is provided. Use this setting to prevent failover.

A sample configuration would look like

TESTDB1 =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = testdb.oracleracexpert.com)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 10)(DELAY = 5))
)
)

METHOD: This parameter determines how failover occurs from the primary node to the backup node
BASIC: Use this mode to establish connections at failover time. No work is done on the backup server until failover time.
PRECONNECT: Use this mode to pre-establish connections. PRECONNECT provides faster failover but requires that the backup instance be capable of supporting all connections from every supported instance.

RETRIES: Use this parameter to specify the number of times to attempt to connect after a failover. If DELAY is specified but RETRIES is not, RETRIES defaults to five retry attempts.

DELAY: Use this parameter to specify the amount of time in seconds to wait between connect attempts. If RETRIES is specified but DELAY is not, DELAY defaults to one second.
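To make the subparameter rules above easier to check before editing tnsnames.ora, here is a minimal Python sketch that assembles a FAILOVER_MODE clause. The helper name build_failover_mode and its arguments are hypothetical and are not part of any Oracle tool. It only encodes what is described in this post: the three TYPE values, the two METHOD values, and the advice that PRECONNECT should be paired with a BACKUP.

```python
# Hypothetical helper (not an Oracle utility): builds a FAILOVER_MODE clause
# from the subparameters described in this post.
VALID_TYPES = {"SESSION", "SELECT", "NONE"}
VALID_METHODS = {"BASIC", "PRECONNECT"}


def build_failover_mode(type_, method=None, backup=None, retries=None, delay=None):
    """Return a '(FAILOVER_MODE = ...)' string for a tnsnames.ora entry."""
    type_ = type_.upper()
    if type_ not in VALID_TYPES:
        raise ValueError(f"TYPE must be one of {sorted(VALID_TYPES)}")
    parts = [f"(TYPE = {type_})"]

    if method is not None:
        method = method.upper()
        if method not in VALID_METHODS:
            raise ValueError(f"METHOD must be one of {sorted(VALID_METHODS)}")
        # The post advises naming a BACKUP when PRECONNECT is used;
        # this sketch treats that advice as a hard rule.
        if method == "PRECONNECT" and not backup:
            raise ValueError("PRECONNECT should be used with a BACKUP net service name")
        parts.append(f"(METHOD = {method})")

    if backup:
        parts.append(f"(BACKUP = {backup})")
    if retries is not None:
        parts.append(f"(RETRIES = {int(retries)})")
    if delay is not None:
        parts.append(f"(DELAY = {int(delay)})")

    return "(FAILOVER_MODE = " + "".join(parts) + ")"


if __name__ == "__main__":
    print(build_failover_mode("select", "basic", retries=10, delay=5))
```

Called with TYPE SELECT, METHOD BASIC, RETRIES 10 and DELAY 5, the sketch produces the same FAILOVER_MODE line used in the sample configurations in this post.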

A sample configuration would look like

TESTDB1 =
(DESCRIPTION =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC1-VIP)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = TESTRAC2-VIP)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = testdb.oracleracexpert.com)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 10)(DELAY = 5))
)
)
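As a rough way to reason about the RETRIES and DELAY values in the sample above, the following Python sketch applies the defaulting rules described in this post and estimates how long a client could spend waiting between reconnect attempts. The function names are made up for illustration, and the estimate ignores the time each connect attempt itself takes, so treat it as an approximation only.

```python
# Illustrative only: function names are hypothetical, and the estimate
# counts just the DELAY waits, not the duration of each connect attempt.

def effective_retry_settings(retries=None, delay=None):
    """Apply the defaults described in the post to RETRIES and DELAY."""
    if retries is None and delay is not None:
        retries = 5   # DELAY given without RETRIES: RETRIES defaults to five
    if delay is None and retries is not None:
        delay = 1     # RETRIES given without DELAY: DELAY defaults to one second
    return retries, delay


def approx_wait_seconds(retries, delay):
    """Approximate total seconds spent waiting between connect attempts."""
    return retries * delay


if __name__ == "__main__":
    retries, delay = effective_retry_settings(retries=10, delay=5)
    print(approx_wait_seconds(retries, delay))
```

For the sample entry (RETRIES = 10, DELAY = 5) this gives roughly 50 seconds of waiting before the client gives up on reconnecting.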

Please note that you can pre-establish a connection to reduce the failover time by using the METHOD=PRECONNECT option.

To verify that TAF is correctly configured, query the FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns in the V$SESSION view.
SQL> SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*) FROM V$SESSION GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;
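If you would rather run this check from a script, the sketch below wraps the same V$SESSION query in Python. It assumes only a cursor that follows the Python DB-API 2.0 interface (execute and fetchall); which Oracle driver supplies that cursor is left open, and the function names are hypothetical. The account used also needs permission to read V$SESSION.

```python
# A sketch, assuming a DB-API 2.0 cursor obtained from whatever Oracle driver
# you use. Function names are illustrative, not part of any library.
TAF_CHECK_SQL = """
SELECT machine, failover_type, failover_method, failed_over, COUNT(*)
FROM v$session
GROUP BY machine, failover_type, failover_method, failed_over
"""


def taf_summary(cursor):
    """Run the verification query and return one dict per group of sessions."""
    cursor.execute(TAF_CHECK_SQL)
    return [
        {
            "machine": machine,
            "failover_type": failover_type,
            "failover_method": failover_method,
            "failed_over": failed_over,
            "sessions": count,
        }
        for machine, failover_type, failover_method, failed_over, count
        in cursor.fetchall()
    ]


def sessions_without_taf(summary):
    """Return the groups whose FAILOVER_TYPE is NONE (no TAF in effect)."""
    return [row for row in summary if row["failover_type"] == "NONE"]
```

Rows that show FAILOVER_TYPE as NONE are sessions that connected without TAF, which usually means they used a connect string without a FAILOVER_MODE clause.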

Regards,
Satishbabu Gunukula
http://www.oracleracexpert.com

5 comments:

  1. Thanks for writing this article! However, I do have one question: my application consists of several multi-threaded programs with each thread having its own connection to the Oracle database. There are 10 processes running with each process having 24 threads so there are 240 threads and 240 connections that are always open with each connection running 1 to 3 database transactions per second. If TAF Failover is configured as "Type=Select", "Method=Basic", 200 Retries and the default Delay of 1 second and the failover connection fails, will each one of the 240 connections begin to retry to reconnect at the same time? My guess is that the answer is "Yes" and wonder whether all of these reconnects (almost 50,000 of them) could affect performance on the entire machine?

    Again, thanks for your posting and thanks for any help that you can give me.

    Monty

  2. If your application is multi-threaded, you are presumably using some form of connection pooling, so you should have only a few physical connections to the database. TAF works with connection pooling and applies equally well to connections in the pool, except that the BACKUP and PRECONNECT clauses should not be used in the connect string, because they do not work with connection pooling and TAF.

    For example: thread1...thread200 --> pool1 --> 4 physical connections --> database server

    When a connection in the connection pool fails over, it uses the primary connect string itself to connect. Sessions fail over when they use the pool for a database round trip after their instance has failed. The listener would be configured to route the connection to a good instance if one is available, as is typical with service-based connect strings.

    You should not see any performance effect on the machine.

    Hope this helps,

    Regards
    Satish

  3. Thanks Satish, Excellent details about TAF

  4. Hi

    This was my exam question last week and still not sure about the answer.
    Any help is much appreciated.


    Which three statements are true about services and transparent application failover (TAF)

    A-)TAF has been configured for a service, sessions using that service fail over to a surviving instance when an outage occurs
    B-)The TAF setting on a service can be none, basic, preconnect or postconnect and overrides any TAF setting in the client connection definition.
    C-)TAF can restart a query after failover has completed but for other statements such as insert, update or delete, the application must resubmit the transaction
    D-)The TAF setting for a client connection overrides any TAF setting in the service definition
    E-)Services simplify the deployment of TAF because by defining a TAF policy for a service, all connections using this service will automatically have TAF enabled
