This post is important for my own record and shows how to salvage data from a large table with several corrupt blocks. I also had a learning experience I wanted to remember, explaining the redundancy of information here. Previous post was about how to map out tables and the corrupt blocks in a temporary table (that is, not a table created as a global temporary table, but a table you'll drop when you're finished.)
Note 61685.1 on My Oracle Support (MOS) shows how to extract data from a corrupt table. This post builds on some of that. Before you begin; data in corrupt blocks are gone. Unless you have special tools and start reading the data files with a tool like DUDE you won't recover those data. This is about how to recover the data that reside in blocks that have not been marked as corrupt.
One essential function gives you the ROWID from the block number and its file number: DBMS_ROWID.ROWID_CREATE:
select DBMS_ROWID.ROWID_CREATE(1,253071,89,660864,0) from DUAL;
In the example above 1 means extended rowid; no need to change this, 253071 is the DATA_OBJECT_ID from DBA_OBJECTS, 89 is FILE# (file number) from V$DATABASE_BLOCK_CORRUPTION , and 660864 is BLOCK# (block number) from the same view. The last parameter is for the row number in that block, in this case we want the whole block and therefore starts with row 0.
This is an example of the content of V$DATABASE_BLOCK_CORRUPTION:
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTION_TYPE 89 661112 8 0 CORRUPT 89 661048 16 0 CORRUPT 89 660984 16 0 CORRUPT 89 660920 16 0 CORRUPT 89 660864 8 0 CORRUPT
In previous post I created a table SEGMENTS_WITH_CORRUPTION. The following query gives a report of corrupt segments with an interval by rowid using the function above:
select O.OWNER,O.OBJECT_NAME,O.OBJECT_TYPE, DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) ROWID_OF_FIRST_CORR, DBMS_ROWID.ROWID_CREATE(1,o.DATA_OBJECT_ID,c.file_id,c.start_corrupt+c.corrupt_blocks,0) ROWID_OF_NEXT_CLEAN from segments_with_corruption c,DBA_OBJECTS O where C.OWNER=O.OWNER and c.segment_name=o.object_name;An example of output (slightly redacted so I don't reveal the source)
OWNER OBJECT_NAME OBJECT_TYPE ROWID_OF_FIRST_CORR ROWID_OF_NEXT_CLEAN LUSER FUBAR TABLE AAA9yPABZAAChZ4AAA AAA9yPABZAAChaAAAA LUSER FUBAR TABLE AAA9yPABZAAChY4AAA AAA9yPABZAAChZIAAA LUSER FUBAR TABLE AAA9yPABZAAChX4AAA AAA9yPABZAAChYIAAA LUSER FUBAR TABLE AAA9yPABZAAChW4AAA AAA9yPABZAAChXIAAA LUSER FUBAR TABLE AAA9yPABZAAChWAAAA AAA9yPABZAAChWIAAA
Now, create a new table to hold the good data. Using LUSER.FUBAR as the owner.table_name for the original table:
Create table luser.fubar_recover as Select * from luser.fubar Where 1=0;This means we create an empty table with the same structure, we will fill it with data in a few steps. If you had only a few intervals of corrupt blocks, you can create insert statements like the following in order to retrieve the good data:
Insert into luser.fubar_recover select * from luser.fubar where rowid < ‘AAA9yPABZAAChZ4AAA’; Insert into luser.fubar_recover select * from luser.fubar where rowid >= ‘AAA9yPABZAAChaAAAA’ and rowid < ‘AAA9yPABZAAChY4AAA’;
In the first example you get all the rows in blocks up to the corrupt one. In the second you get all the rows between two corrupt blocks. In my case I had a large table with several stretches of corrupt and good blocks. I did not want to handcraft all these insert statements, I wanted one statement that could retrieve all the good data from the table.
To achieve that I started with a query that generates interval of corrupt blocks in the order as they are laid out in the table. At this point I was only working on one table and one data file as the query below shows.
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT, LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn from( select DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT, DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN, C.START_CORRUPT from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O where C.OWNER=O.OWNER and C.SEGMENT_NAME=O.OBJECT_NAME and FILE_ID=89 and C.OWNER=’LUSER’ and C.SEGMENT_NAME='FUBAR’ order by c.start_corrupt ) ;
The output looked like this:
FIRST_CORRUPT | FIRST_CLEAN | PREVIOUS_CORRUPT | NEXT_CORRUPT | RN |
---|---|---|---|---|
AAA9yPABZAAChWAAAA | AAA9yPABZAAChWIAAA | AAA9yPABZAAChW4AAA | 1 | |
AAA9yPABZAAChW4AAA | AAA9yPABZAAChXIAAA | AAA9yPABZAAChWAAAA | AAA9yPABZAAChX4AAA | 2 |
AAA9yPABZAAChX4AAA | AAA9yPABZAAChYIAAA | AAA9yPABZAAChW4AAA | AAA9yPABZAAChY4AAA | 3 |
AAA9yPABZAAChY4AAA | AAA9yPABZAAChZIAAA | AAA9yPABZAAChX4AAA | AAA9yPABZAAChZ4AAA | 4 |
AAA9yPABZAAChZ4AAA | AAA9yPABZAAChaAAAA | AAA9yPABZAAChY4AAA | 5 |
with INTERVALS as ( select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT, LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn from( select DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT, DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN, C.START_CORRUPT from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O where C.OWNER=O.OWNER and C.SEGMENT_NAME=O.OBJECT_NAME and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR' order by c.start_corrupt ) ) select 'where rowid < ''' || FIRST_CORRUPT || '''' LINE from INTERVALS where RN=1 union select 'or (rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || ''')' LINE from INTERVALS where RN>1 and NEXT_CORRUPT is not null union select 'or (rowid >= ''' || FIRST_CLEAN || ''')' LINE from INTERVALS where NEXT_CORRUPT is null order by rn ;
This gives the following output:
LINE where rowid < 'AAA9yPABZAAChWAAAA' or (rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA') or (rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA') or (rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA') or (rowid >= 'AAA9yPABZAAChaAAAA)
I can now amend the insert statement by using these where clauses at the end of the select statement:
insert into LUSER.FUBAR_RECOVER select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid < 'AAA9yPABZAAChWAAAA' or (rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA') or (rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA') or (rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA') or (rowid >= 'AAA9yPABZAAChaAAAA');
Also note the ROWID hint that tells Oracle how to scan the table. The biggest problem with this query is that it does not work(!) Yes, I'm going to drag you through the learning experience I had when making things automated and slightly more advanced. When executing the SQL above Oracle performs a normal table scan and aborts when it reads the first corrupt block. This can be seen if you look at the execution plan:
explain plan for select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid < 'AAA9yPABZAAChWAAAA' or (rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA') or (rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA') or (rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA') or (rowid >= 'AAA9yPABZAAChaAAAA'); select * from table(dbms_xplan.display);
Plan hash value: 3975471795 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11M| 2111M| 838K (1)| 02:47:37 | |* 1 | TABLE ACCESS FULL| FUBAR | 11M| 2111M| 838K (1)| 02:47:37 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWID<'AAA9yPABZAAChWAAAA' OR ROWID>='AAA9yPABZAAChaAAAA' OR ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA' OR ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA' OR ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA')Compare this to the plan for the SQL that was used in the beginning:
explain plan for select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChaAAAA'; select * from table(dbms_xplan.display);
Plan hash value: 3328636578 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5445K| 1012M| 836K (1)| 02:47:18 | |* 1 | TABLE ACCESS BY ROWID RANGE| FUBAR | 5445K| 1012M| 836K (1)| 02:47:18 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access(ROWID>='AAA9yPABZAAChaAAAA')
This one uses TABLE ACCESS BY ROWID, but the one that fails uses normal TABLE SCAN. I therefore changed slightly the SQL to generate where clauses:
with INTERVALS as ( select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT, LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn from( select DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT, DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN, C.START_CORRUPT from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O where C.OWNER=O.OWNER and C.SEGMENT_NAME=O.OBJECT_NAME and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR' order by c.start_corrupt ) ) select 'select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid < ''' || FIRST_CORRUPT || '''' LINE from INTERVALS where RN=1 union select 'union select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || '''' LINE from INTERVALS where RN>1 and NEXT_CORRUPT is not null union select 'union select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''';' LINE from INTERVALS where NEXT_CORRUPT is null order by RN ;
The resulting SQL used to retrieve good data looks like this:
select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid < 'AAA9yPABZAAChWAAAA' union select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA' union select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA' union select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA' union select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChaAAAA';
This one actually runs, but it runs for a long time and it eventually failed later because TEMP ran out of space. The reason for this can also be seen in the execution plan:
Plan hash value: 3469639691 ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 11M| 2177M| | 4680K (78)| 15:36:06 | | 1 | SORT UNIQUE | | 11M| 2177M| 2858M| 4680K (78)| 15:36:06 | | 2 | UNION-ALL | | | | | | | |* 3 | TABLE ACCESS BY ROWID RANGE| FUBAR | 5445K| 1012M| | 836K (1)| 02:47:18 | |* 4 | TABLE ACCESS BY ROWID RANGE| FUBAR | 272K| 50M| | 836K (1)| 02:47:16 | |* 5 | TABLE ACCESS BY ROWID RANGE| FUBAR | 272K| 50M| | 836K (1)| 02:47:16 | |* 6 | TABLE ACCESS BY ROWID RANGE| FUBAR | 272K| 50M| | 836K (1)| 02:47:16 | |* 7 | TABLE ACCESS BY ROWID RANGE| FUBAR | 5445K| 1012M| | 836K (1)| 02:47:18 | ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(ROWID<'AAA9yPABZAAChWAAAA') 4 - access(ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA') 5 - access(ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA') 6 - access(ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA') 7 - access(ROWID>='AAA9yPABZAAChaAAAA')
Look at line 1 and 2 in the plan. I forgot to use UNION ALL instead of UNION and Oracle then goes on to sort this stuff in order to return distinct rows. The table had over 100M rows and sorting this filled the TEMP tablespace completely. Since I know I want all rows, I changed it to UNION ALL (this is finally a good one you can use):
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT, LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn from( select DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT, DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN, C.START_CORRUPT from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O where C.OWNER=O.OWNER and C.SEGMENT_NAME=O.OBJECT_NAME and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR' order by c.start_corrupt ) ) select 'select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid < ''' || FIRST_CORRUPT || '''' LINE from INTERVALS where RN=1 union select 'union all select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || '''' LINE from INTERVALS where RN>1 and NEXT_CORRUPT is not null union all select 'union all select /*+ ROWID(T) */ * from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''';' LINE from INTERVALS where NEXT_CORRUPT is null order by RN ;The new execution plan looks like this:
Plan hash value: 439408122 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11M| 2177M| 4181K (81)| 13:56:23 | | 1 | UNION-ALL | | | | | | |* 2 | TABLE ACCESS BY ROWID RANGE| FUBAR | 5445K| 1012M| 836K (1)| 02:47:18 | |* 3 | TABLE ACCESS BY ROWID RANGE| FUBAR | 272K| 50M| 836K (1)| 02:47:16 | |* 4 | TABLE ACCESS BY ROWID RANGE| FUBAR | 272K| 50M| 836K (1)| 02:47:16 | |* 5 | TABLE ACCESS BY ROWID RANGE| FUBAR | 272K| 50M| 836K (1)| 02:47:16 | |* 6 | TABLE ACCESS BY ROWID RANGE| FUBAR | 5445K| 1012M| 836K (1)| 02:47:18 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(ROWID<'AAA9yPABZAAChWAAAA') 3 - access(ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA') 4 - access(ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA') 5 - access(ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA') 6 - access(ROWID>='AAA9yPABZAAChaAAAA')To wrap it up:
If you have a large table with several corrupt blocks and many good ones, you can save the good data by creating a table that maps the bad blocks and then create a select statements that retrieve the rows from all the good blocks.
Best is of course to get your backup and recover routines in order, combine it with a decent fault management so you'll discover the corrupt blocks at an early stage, thereby making it possible to recover the blocks before they have been aged out of your backup system.
If corrupt blocks happens again and again you should see if it happens only with very few tables or many. If the corruption is restricted to few out of many tables it may indicate that it has something to do with how these tables are created and used (thinking about NOLOGGING operations here). In case of frequent and seemingly random corruption I would start thinking about the underlying storage.