Graceful Failover and Failback Procedures in Manual Data Guard Environments

Introduction

This paper describes the procedures to perform “graceful” failovers and failbacks of standby databases in Oracle environments that are not using Automatic Data Guard.
A “graceful” failover is one that does not require databases to be opened with the RESETLOGS option—and as a result, graceful failbacks do not require the primary database to be rebuilt. (That is, they do not require the standby database’s datafiles to be copied to the primary server.) Graceful failovers and failbacks are also known as “switchovers” and “switchbacks”.

In order to avoid confusion, the original primary database will be referred to as database “ABC”, and the original standby database will be referred to as database “XYZ”. In other words, before any of these procedures are run, the primary database is “ABC” and the standby database is “XYZ”..

The procedures in this document are generally based on the information contained in Metalink document 90817.1.

Graceful Failover or SwitchOver Procedure

  1. Shut down database ABC and database XYZ with SHUTDOWN IMMEDIATE.

  2. Copy the unsent archived redo logs, all of the online redo logs, and all of the control files from “ABC” host to “XYZ” host. Make sure that you copy all of these files into their pre-existing locations on “XYZ” host. For example, if a control file called “control01.ctl” exists in the /u01 directory on "ABC" host; but it exists in the /u04 directory on “XYZ” host, then copy that file from /u01 on abc’s host to /u04 on xyz’s host.

  3. Start up database “XYZ” with STARTUP MOUNT.

  4. If necessary, rename the datafiles and online redo logs in database “XYZ” to reflect those files’ locations on that host with ALTER DATABASE RENAME FILE commands. This will be necessary if any datafiles or online redo logs have different filenames or locations on “XYZ” host than on "ABC" host.

  5. In database “XYZ”, execute the command RECOVER DATABASE. If any archived redo logs are needed for recovery, then you will be prompted for the next log. In that case, type in AUTO at that prompt. When the instance finishes applying all of the changes contained in the archived redo logs and the online redo logs, the message “Media recovery complete” will be displayed.

  6. Open database “XYZby executing the ALTER DATABASE OPEN; command. The “XYZ” database is now open in read-write mode; so at this point the “XYZ” database is ready to be used as the NEW primary database.

  7. In database “XYZ” execute the following command, substituting an appropriate filename with full path:
            ALTER DATABASE CREATE STANDBY CONTROLFILE AS 'filename';
  8. Copy the standby controlfile that you just created to "ABC" host. Then, on that host, overwrite all of the existing controlfiles for the "ABC" database with this standby controlfile. In other words, copy the standby controlfile into each of the existing controlfile’s directories; and then rename the standby controlfiles to match the name of the existing controlfiles.

  9. Start up the "ABC" database with the following commands:
            STARTUP NOMOUNT
    ALTER DATABASE MOUNT STANDBY DATABASE;
  10. If necessary, rename the datafiles and online redo logs in database "ABC" to reflect those files’ locations on abc’s host with ALTER DATABASE RENAME FILE commands. This will be necessary if any datafiles or online redo logs have different filenames or locations on abc’s host than on xyz’s host. At this point, the "ABC" database is configured as the new standby database, and is ready to apply archived redo logs from the "XYZ" database.

Graceful Failback or SwitchBack Procedure

(These steps are basically identical to the failover procedure above—the only difference is that you are reversing the roles of the "ABC" and "XYZ" databases.)

  1. Shut down database "ABC" and database "XYZ" with SHUTDOWN IMMEDIATE.

  2. Copy the unsent archived redo logs, all of the online redo logs, and all of the control files from "XYZ" host to "ABC" host. Make sure that you copy all of these files into their pre-existing locations on "ABC" host. For example, if a control file called “control01.ctl” exists in the /u01 directory on "ABC" host; but it exists in the /u04 directory on “XYZ” host, then copy that file from /u04 on xyz’s host to /u01 on abc’s host

  3. Start up database "ABC" with STARTUP MOUNT.

  4. If necessary, rename the datafiles and online redo logs in database "ABC" to reflect those files’ locations on that host with ALTER DATABASE RENAME FILE commands. This will be necessary if any datafiles or online redo logs have different filenames or locations on "ABC" host than on "XYZ" host.

  5. In database "ABC", execute the command RECOVER DATABASE. If any archived redo logs are needed for recovery, then you will be prompted for the next log. In that case, type in AUTO at that prompt. When the instance finishes applying all of the changes contained in the archived redo logs and the online redo logs, the message “Media recovery complete” will be displayed.

  6. Open database "ABC" by executing the ALTER DATABASE OPEN; command. The "ABC" database is now open in read-write mode; so at this point the "ABC" database is ready to be used as the NEW primary database.

  7. In database "ABC" execute the following command, substituting an appropriate filename with full path:
            ALTER DATABASE CREATE STANDBY CONTROLFILE AS 'filename';
  8. Copy the standby controlfile that you just created to "XYZ" host. Then, on that host, overwrite all of the existing controlfiles for the "XYZ" database with this standby controlfile. In other words, copy the standby controlfile into each of the existing controlfile’s directories; and then rename the standby controlfiles to match the name of the existing controlfiles.

  9. Start up the "XYZ" database with the following commands:
            STARTUP NOMOUNT
    ALTER DATABASE MOUNT STANDBY DATABASE;
  10. If necessary, rename the datafiles and online redo logs in database "XYZ" to reflect those files’ locations on xyz’s host with ALTER DATABASE RENAME FILE commands. This will be necessary if any datafiles or online redo logs have different filenames or locations on xyz’s host than on abc’s host. At this point, the "XYZ" database is configured as the new standby database, and is ready to apply archived redo logs from the "ABC" database.

Requirements and Notes

This section describes the requirements that must be met in order to use these procedures; along with some notes.

First, the primary database’s host must be accessible. In order to use these procedures, the control files and online redo logs from the primary database need to be copied from the primary database’s host to the standby database’s host. Therefore, these procedures can not be used if the primary database’s host is inaccessible—because that would prevent those files from being copied to the standby server. (It may be possible to configure a geographic disk mirroring utility between the primary and standby hosts in order to be able to eliminate this requirement; but I have not seen any successful implementation of geographic disk mirroring in Oracle environments.)

Next, the primary and standby databases must be shut down with normal or immediate priority. If the databases (particularly the primary database) are shut down with SHUTDOWN ABORT, then it might not be possible to fully recover the standby database—and that would prevent these procedures from being usable.

These procedures include converting the original primary database into a standby database during graceful failover (after the original standby database has been converted into a primary database). It should be pointed out that it is not required to convert the primary into a standby—it is still possible to fail back to the original primary, even if that database was not converted into a standby. However, if you do not convert the original primary database into a standby, then you must ensure that that database never gets opened until you are performing the failback procedure. This is because the act of opening a primary database generates some redo in that database—and that redo will cause the primary database’s datafiles to become permanently out of sync with the redo in the standby database. So, if you open up the original primary database without converting it to a standby, you will not be able to use these procedures; you will have to completely rebuild the primary database from a copy of the standby database’s datafiles.

Thirdly, these procedures specify to copy the control files from the primary database to the standby database. An alternative to doing this is to rebuild the standby’s control files from the output of an ALTER DATABASE BACKUP CONTROLFILE TO TRACE command. Note, however, that if you wish to rebuild the standby’s control files, then you must ensure that the CREATE CONTROLFILE statement contains the NORESETLOGS option. If you change that option to specify RESETLOGS, then you will not be able to open the database with a regular ALTER DATABASE OPEN. Instead, you will be forced to specify RESETLOGS when opening the database, and, of course, the whole purpose of these procedures is to avoid opening the database with RESETLOGS.

Finally, there is an “idiosyncrasy” with these procedures when they are used in conjunction with locally-managed temporary tablespaces. To be specific, if these procedures are used with a database that contains a locally-managed temporary tablespace, and if the tempfiles of that tablespace were not added to the hot standby database before the standby was converted into the new primary database, then that locally managed temporary tablespace will have to be completely dropped and re-created in the new primary database in order to use the tempfiles in question.

The reason for this is that these procedures specify to copy the control files from the primary database to the standby database. Therefore, if you try to add the tempfiles to the standby database’s temporary tablespace after converting it to the new primary, you will receive a “file is already part of database” error message – because the control files (which were copied from the original primary) will already contain the information about those tempfiles. Also, if the new primary instance tries to use any of the tempfiles (such as for a sorting operation) then a “cannot identify/lock data file xxx” error message will be displayed – because the tempfile in question does not actually exist on the new primary’s host. As mentioned above, one solution to this issue is to simply drop and re-create the locally managed temporary tablespace completely, after converting the standby database into a primary database. An alternate solution is to manually ensure that all tempfiles have been added to the standby before converting the standby into the new primary.