Monday, April 14, 2008

Data Guard and forced logging

When you start with Oracle Data Guard you have three sources of useful information: The Data Guard Concepts and Administration manual, which is quite good, other peoples experience, and after some time your own. The manual does not include all the mistakes one can do, nor what many really don't use (like the data guard broker; not everybody would leave switchover, or worse failover to a robot).

After a planned switchover to the physical standby the following ora-600 error message was received on our logical standby:

ORA-00600: internal error code, arguments: [krvxbpx20], [1],
[34735], [60], [16], [], [],[]

SQL Apply terminated after this error and neither restart of sql apply or bouncing the instance helped. On Metalink the only relevant information I found was Bug 6022014, which has status "Closed, not a bug". An ora-600 error that is not a bug...well, if it's my fault a decent error message would help. An SR was opened with Oracle Support and it was suspected quite right that supplemental logging was not enabled on primary and asked me to run the following query on the new primary:

select FORCE_LOGGING, SUPPLEMENTAL_LOG_DATA_PK,
SUPPLEMENTAL_LOG_DATA_UI, SUPPLEMENTAL_LOG_DATA_FK,
SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_ALL
from v$database;

Which returned:

FOR SUP SUP SUP SUPPLEME SUP
--- --- --- --- -------- ---
NO NO NO NO NO NO

Suddenly our logical standby was not my biggest concern. How come the primary does not run with forced logging? Following the manual this is one of the first step when preparing for data guard, mentioned in section 3.1.1 (10g) and from table 3-1 looks like a requirement. The manual doesn't tell you to execute it again for the new physical standby, neither before or after the first switchover. It surprised me that it is possible to successfully execute a switchover without having the new primary running with forced logging. Though I have not tested it, I imagine that the clone for the first physical standby was made before forced logging was enabled in the production database, and if I had enabled it in the order given in the manual the setting would follow the copy of the controlfile (in standby format) used to make the physical standby. At least that is the only explanation I have.

The solution was simple, just turn on forced logging with:

alter database force logging;

This is not recommended in an OLTP database with many ongoing transactions and can take some time to complete, but will finish soon when activity is low.

Just in case you don't know, forced logging makes sure nobody can create a table, index or other object without logging everything to the redo log, i.e. the NOLOGGING option in create statements is ignored. Checking that every node runs with forced logging should be part of a regular review of your data guard configuration.

Our logical standby must be recreated since many transactions have been lost, but that's OK now that a potential bigger problem have been avoided. Why do we have a physical and a logical standby ? The former serves the fastest and safest switchover or failover if necessary. The latter is used for off-site recovery options and as a source for our data warehouse. Remarkably stable and easy to setup once you have data guard with physical standby in place.