Øyvind Isene's Blog: 2013

Friday, September 20, 2013

Upcoming conferences

OSL - CPH - SFO. Starting on my route in a few hours. Really look forward to see tweeps and a bunch of people from the worldwide Oracle community. Lots of good presentations of course, but also bloggers' meetup, Oaktable world, RAC Attack, Anchor Steam, and more.

I'll speak at the Bulgarian Oracle User Group conference 22nd to 24th of November. I have heard from many that this is a good conference. First time in Bulgaria means this will be a triple excitement; giving a presentation about something I have thought a lot about, seeing a new country and meeting a possibly different culture.

The UKOUG Tech conference starts the week after. That is another conference I've never been to (strangely enough since it is rather close compared to OOW).

If you happen to be at one of these conferences, please say Hi!

Sunday, September 15, 2013

Finding corrupt XML files on Linux

Here is an easy command to find corrupt XML-files on a Linux box.

A couple of days ago I had a problem with OBIEE and weblogic. Googling and search on My Oracle Support indicated that the failure may be caused by a corrupt XML file. And you know that there are many of them in an OBIEE installation. In fact MOS in Doc 1475146.1 blindly suggests to move a bunch of them to a backup location, because in some cases they had become corrupt for some strange reason. I was not satisfied with just deleting a few random files that have caused problems for others earlier, I wanted to find the file.

I then came up with the idea that I could find the corrupt file by parsing every XML file below a point in the file hierarchy. On Linux this is quite easy. The command xmlint is possibly not standard on every system, but should be easy to get with yum or similar. In my case the file I was looking for belonged to user weblogic, meaning the corrupt file would reside in a directory called weblogic, hence the command. But you can of course tweak the arguments to the find command to search the files you want:

find . -wholename \*weblogic\*.xml -ls -exec xmlint {} > /dev/null \;

Any normal output will be discarded and only failures sent to stderr (on your screen). A similar command to parse every file with extension xml below $ORACLE_BASE:

find $ORACLE_BASE -name \*.xml -exec xmlint {} > /dev/null \;

If you (re)move the file you usually have to restart stuff with opmnctl.

Sunday, July 7, 2013

Corrupt database blocks III

If you don't know what procrastination is you could start your own blog. Start a post with something like "I worked with ... and decided to write three posts about it". Procrastination is what you experience before the last post is ready; seemingly simple things take forever to get out the door.

This post is important for my own record and shows how to salvage data from a large table with several corrupt blocks. I also had a learning experience I wanted to remember, explaining the redundancy of information here. Previous post was about how to map out tables and the corrupt blocks in a temporary table (that is, not a table created as a global temporary table, but a table you'll drop when you're finished.)

Note 61685.1 on My Oracle Support (MOS) shows how to extract data from a corrupt table. This post builds on some of that. Before you begin; data in corrupt blocks are gone. Unless you have special tools and start reading the data files with a tool like DUDE you won't recover those data. This is about how to recover the data that reside in blocks that have not been marked as corrupt.

One essential function gives you the ROWID from the block number and its file number: DBMS_ROWID.ROWID_CREATE:

select DBMS_ROWID.ROWID_CREATE(1,253071,89,660864,0) from DUAL;

In the example above 1 means extended rowid; no need to change this, 253071 is the DATA_OBJECT_ID from DBA_OBJECTS, 89 is FILE# (file number) from V$DATABASE_BLOCK_CORRUPTION , and 660864 is BLOCK# (block number) from the same view. The last parameter is for the row number in that block, in this case we want the whole block and therefore starts with row 0.

This is an example of the content of V$DATABASE_BLOCK_CORRUPTION:

FILE#     BLOCK#    BLOCKS  CORRUPTION_CHANGE#  CORRUPTION_TYPE
89        661112    8       0                   CORRUPT
89        661048    16      0                   CORRUPT
89        660984    16      0                   CORRUPT
89        660920    16      0                   CORRUPT
89        660864    8       0                   CORRUPT

In previous post I created a table SEGMENTS_WITH_CORRUPTION. The following query gives a report of corrupt segments with an interval by rowid using the function above:

select O.OWNER,O.OBJECT_NAME,O.OBJECT_TYPE,
DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) ROWID_OF_FIRST_CORR,
  DBMS_ROWID.ROWID_CREATE(1,o.DATA_OBJECT_ID,c.file_id,c.start_corrupt+c.corrupt_blocks,0) ROWID_OF_NEXT_CLEAN
from segments_with_corruption c,DBA_OBJECTS O
where C.OWNER=O.OWNER
and c.segment_name=o.object_name;

An example of output (slightly redacted so I don't reveal the source)

OWNER  OBJECT_NAME  OBJECT_TYPE    ROWID_OF_FIRST_CORR   ROWID_OF_NEXT_CLEAN
LUSER  FUBAR        TABLE          AAA9yPABZAAChZ4AAA    AAA9yPABZAAChaAAAA
LUSER  FUBAR        TABLE          AAA9yPABZAAChY4AAA    AAA9yPABZAAChZIAAA
LUSER  FUBAR        TABLE          AAA9yPABZAAChX4AAA    AAA9yPABZAAChYIAAA
LUSER  FUBAR        TABLE          AAA9yPABZAAChW4AAA    AAA9yPABZAAChXIAAA
LUSER  FUBAR        TABLE          AAA9yPABZAAChWAAAA    AAA9yPABZAAChWIAAA

Now, create a new table to hold the good data. Using LUSER.FUBAR as the owner.table_name for the original table:

Create table luser.fubar_recover as
Select * from luser.fubar
Where 1=0;

This means we create an empty table with the same structure, we will fill it with data in a few steps. If you had only a few intervals of corrupt blocks, you can create insert statements like the following in order to retrieve the good data:

Insert into luser.fubar_recover
  select *
  from luser.fubar
  where rowid < ‘AAA9yPABZAAChZ4AAA’;

Insert into luser.fubar_recover
  select *
  from luser.fubar
  where rowid >= ‘AAA9yPABZAAChaAAAA’  
  and rowid < ‘AAA9yPABZAAChY4AAA’;

In the first example you get all the rows in blocks up to the corrupt one. In the second you get all the rows between two corrupt blocks. In my case I had a large table with several stretches of corrupt and good blocks. I did not want to handcraft all these insert statements, I wanted one statement that could retrieve all the good data from the table.

To achieve that I started with a query that generates interval of corrupt blocks in the order as they are laid out in the table. At this point I was only working on one table and one data file as the query below shows.

select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,
  LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn
  from(
    select 
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,
    C.START_CORRUPT
    from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O
    where C.OWNER=O.OWNER
    and C.SEGMENT_NAME=O.OBJECT_NAME
    and FILE_ID=89 and C.OWNER=’LUSER’ and C.SEGMENT_NAME='FUBAR’
    order by c.start_corrupt
  )
;

The output looked like this:

FIRST_CORRUPT	FIRST_CLEAN	PREVIOUS_CORRUPT	NEXT_CORRUPT	RN
AAA9yPABZAAChWAAAA	AAA9yPABZAAChWIAAA		AAA9yPABZAAChW4AAA	1
AAA9yPABZAAChW4AAA	AAA9yPABZAAChXIAAA	AAA9yPABZAAChWAAAA	AAA9yPABZAAChX4AAA	2
AAA9yPABZAAChX4AAA	AAA9yPABZAAChYIAAA	AAA9yPABZAAChW4AAA	AAA9yPABZAAChY4AAA	3
AAA9yPABZAAChY4AAA	AAA9yPABZAAChZIAAA	AAA9yPABZAAChX4AAA	AAA9yPABZAAChZ4AAA	4
AAA9yPABZAAChZ4AAA	AAA9yPABZAAChaAAAA	AAA9yPABZAAChY4AAA		5

Oh, well. Layout of the last table could have been better, I experimented with export html-code from SQL Developer that can be pasted into the HTML-tab. Anyway, it shows stretches of blocks that are corrupt for this file. Now I want to generate WHERE clauses that are used in the select statement that retrieves the good data:

with INTERVALS as (
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,
  LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn
  from(
    select 
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,
    C.START_CORRUPT
    from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O
    where C.OWNER=O.OWNER
    and C.SEGMENT_NAME=O.OBJECT_NAME
    and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR'
    order by c.start_corrupt
  )
)  select 'where rowid < ''' || FIRST_CORRUPT || '''' LINE
   from INTERVALS
   where RN=1
   union
   select 'or (rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || ''')' LINE
   from INTERVALS
   where RN>1 and NEXT_CORRUPT is not null
   union
   select 'or (rowid >= ''' || FIRST_CLEAN || ''')' LINE
   from INTERVALS
   where NEXT_CORRUPT is null
   order by rn
  ;

This gives the following output:

LINE
where rowid < 'AAA9yPABZAAChWAAAA'
or (rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA')
or (rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA')
or (rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA')
or (rowid >= 'AAA9yPABZAAChaAAAA)

I can now amend the insert statement by using these where clauses at the end of the select statement:

insert   into LUSER.FUBAR_RECOVER
select /*+ ROWID(T) */ *
from LUSER.FUBAR T
where rowid < 'AAA9yPABZAAChWAAAA'
or (rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA')
or (rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA')
or (rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA')
or (rowid >= 'AAA9yPABZAAChaAAAA');

Also note the ROWID hint that tells Oracle how to scan the table. The biggest problem with this query is that it does not work(!) Yes, I'm going to drag you through the learning experience I had when making things automated and slightly more advanced. When executing the SQL above Oracle performs a normal table scan and aborts when it reads the first corrupt block. This can be seen if you look at the execution plan:

explain plan for
select /*+ ROWID(T) */ *
from LUSER.FUBAR T
where rowid < 'AAA9yPABZAAChWAAAA'
or (rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA')
or (rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA')
or (rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA')
or (rowid >= 'AAA9yPABZAAChaAAAA');
select * from table(dbms_xplan.display);

Plan hash value: 3975471795
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    11M|  2111M|   838K  (1)| 02:47:37 |
|*  1 |  TABLE ACCESS FULL| FUBAR        |    11M|  2111M|   838K  (1)| 02:47:37 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWID<'AAA9yPABZAAChWAAAA' OR ROWID>='AAA9yPABZAAChaAAAA'
              OR ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA' OR
              ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA' OR
              ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA')

Compare this to the plan for the SQL that was used in the beginning:

explain plan for
select /*+ ROWID(T) */ *
from LUSER.FUBAR T
where rowid >= 'AAA9yPABZAAChaAAAA';
select * from table(dbms_xplan.display);

Plan hash value: 3328636578
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  5445K|  1012M|   836K  (1)| 02:47:18 |
|*  1 |  TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|   836K  (1)| 02:47:18 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access(ROWID>='AAA9yPABZAAChaAAAA')

This one uses TABLE ACCESS BY ROWID, but the one that fails uses normal TABLE SCAN. I therefore changed slightly the SQL to generate where clauses:

with INTERVALS as (
select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,
  LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn
  from(
    select 
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,
    C.START_CORRUPT
    from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O
    where C.OWNER=O.OWNER
    and C.SEGMENT_NAME=O.OBJECT_NAME
    and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR'
    order by c.start_corrupt
  )
)  select 'select /*+ ROWID(T) */ *
from LUSER.FUBAR T where rowid < ''' || FIRST_CORRUPT || '''' LINE 
   from INTERVALS
   where RN=1
   union
   select 'union select /*+ ROWID(T) */ *
from LUSER.FUBAR T  where rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || '''' LINE
   from INTERVALS
   where RN>1 and NEXT_CORRUPT is not null
   union
   select 'union select /*+ ROWID(T) */ *
from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''';' LINE
   from INTERVALS
   where NEXT_CORRUPT is null
   order by RN
  ;

The resulting SQL used to retrieve good data looks like this:

select /*+ ROWID(T) */ *
from LUSER.FUBAR T where rowid < 'AAA9yPABZAAChWAAAA'
union select /*+ ROWID(T) */ *
from LUSER.FUBAR T  where rowid >= 'AAA9yPABZAAChXIAAA' and rowid < 'AAA9yPABZAAChX4AAA'
union select /*+ ROWID(T) */ *
from LUSER.FUBAR T  where rowid >= 'AAA9yPABZAAChYIAAA' and rowid < 'AAA9yPABZAAChY4AAA'
union select /*+ ROWID(T) */ *
from LUSER.FUBAR T  where rowid >= 'AAA9yPABZAAChZIAAA' and rowid < 'AAA9yPABZAAChZ4AAA'
union select /*+ ROWID(T) */ *
from LUSER.FUBAR T where rowid >= 'AAA9yPABZAAChaAAAA';

This one actually runs, but it runs for a long time and it eventually failed later because TEMP ran out of space. The reason for this can also be seen in the execution plan:

Plan hash value: 3469639691
------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |    11M|  2177M|       |  4680K (78)| 15:36:06 |
|   1 |  SORT UNIQUE                  |              |    11M|  2177M|  2858M|  4680K (78)| 15:36:06 |
|   2 |   UNION-ALL                   |              |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|       |   836K  (1)| 02:47:18 |
|*  4 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|       |   836K  (1)| 02:47:16 |
|*  5 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|       |   836K  (1)| 02:47:16 |
|*  6 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|       |   836K  (1)| 02:47:16 |
|*  7 |    TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|       |   836K  (1)| 02:47:18 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(ROWID<'AAA9yPABZAAChWAAAA')
   4 - access(ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA')
   5 - access(ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA')
   6 - access(ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA')
   7 - access(ROWID>='AAA9yPABZAAChaAAAA')

Look at line 1 and 2 in the plan. I forgot to use UNION ALL instead of UNION and Oracle then goes on to sort this stuff in order to return distinct rows. The table had over 100M rows and sorting this filled the TEMP tablespace completely. Since I know I want all rows, I changed it to UNION ALL (this is finally a good one you can use):

select FIRST_CORRUPT,FIRST_CLEAN,LAG(FIRST_CORRUPT) over (order by start_corrupt) PREVIOUS_CORRUPT,
  LEAD(FIRST_CORRUPT) over (order by start_corrupt) NEXT_CORRUPT, rownum rn
  from(
    select 
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT,0) FIRST_CORRUPT,
    DBMS_ROWID.ROWID_CREATE(1,O.DATA_OBJECT_ID,C.FILE_ID,C.START_CORRUPT+C.CORRUPT_BLOCKS,0) FIRST_CLEAN,
    C.START_CORRUPT
    from SEGMENTS_WITH_CORRUPTION C, DBA_OBJECTS O
    where C.OWNER=O.OWNER
    and C.SEGMENT_NAME=O.OBJECT_NAME
    and FILE_ID=89 and C.OWNER='LUSER' and C.SEGMENT_NAME='FUBAR'
    order by c.start_corrupt
  )
)  select 'select /*+ ROWID(T) */ *
from LUSER.FUBAR T where rowid < ''' || FIRST_CORRUPT || '''' LINE 
   from INTERVALS
   where RN=1
   union 
   select 'union all select /*+ ROWID(T) */ *
from LUSER.FUBAR T  where rowid >= ''' || FIRST_CLEAN || ''' and rowid < ''' || NEXT_CORRUPT || '''' LINE
   from INTERVALS
   where RN>1 and NEXT_CORRUPT is not null
   union all
   select 'union all select /*+ ROWID(T) */ *
from LUSER.FUBAR T where rowid >= ''' || FIRST_CLEAN || ''';' LINE
   from INTERVALS
   where NEXT_CORRUPT is null
   order by RN
  ;

The new execution plan looks like this:

Plan hash value: 439408122
---------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |    11M|  2177M|  4181K (81)| 13:56:23 |
|   1 |  UNION-ALL                   |              |       |       |            |          |
|*  2 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|   836K  (1)| 02:47:18 |
|*  3 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|   836K  (1)| 02:47:16 |
|*  4 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|   836K  (1)| 02:47:16 |
|*  5 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |   272K|    50M|   836K  (1)| 02:47:16 |
|*  6 |   TABLE ACCESS BY ROWID RANGE| FUBAR        |  5445K|  1012M|   836K  (1)| 02:47:18 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(ROWID<'AAA9yPABZAAChWAAAA')
   3 - access(ROWID>='AAA9yPABZAAChXIAAA' AND ROWID<'AAA9yPABZAAChX4AAA')
   4 - access(ROWID>='AAA9yPABZAAChYIAAA' AND ROWID<'AAA9yPABZAAChY4AAA')
   5 - access(ROWID>='AAA9yPABZAAChZIAAA' AND ROWID<'AAA9yPABZAAChZ4AAA')
   6 - access(ROWID>='AAA9yPABZAAChaAAAA')

To wrap it up:

If you have a large table with several corrupt blocks and many good ones, you can save the good data by creating a table that maps the bad blocks and then create a select statements that retrieve the rows from all the good blocks.

Best is of course to get your backup and recover routines in order, combine it with a decent fault management so you'll discover the corrupt blocks at an early stage, thereby making it possible to recover the blocks before they have been aged out of your backup system.

If corrupt blocks happens again and again you should see if it happens only with very few tables or many. If the corruption is restricted to few out of many tables it may indicate that it has something to do with how these tables are created and used (thinking about NOLOGGING operations here). In case of frequent and seemingly random corruption I would start thinking about the underlying storage.

Tuesday, May 21, 2013

Download from OTN slow? Check DNS

Sometimes download from Oracle Technology Network (OTN) is really slow, but it may not be Oracle's fault. Oracle is using Akamai Technologies to distribute the software. You can see this if you do an nslookup on download.oracle.com and see it resolves (finally) to a host under akamai.net.

They have servers all over the world and when things work as they are supposed to the box on your end should connect to a host physically near you and not on another continent. If your download is really slow find the IP address that your DNS resolves for download.oracle.com (or the host you are downloading from) and then use a geo-lookup site to see where it is located.

Just now the DNS server for my local ISP (get.no) resolved download.oracle.com to two servers in California:

[root@rio ~]# nslookup download.oracle.com
Server:         84.208.20.110
Address:        84.208.20.110#53

Non-authoritative answer:
download.oracle.com     canonical name = download.oracle.com.edgesuite.net.
download.oracle.com.edgesuite.net       canonical name = a1961.d.akamai.net.
Name:   a1961.d.akamai.net
Address: 64.209.77.10
Name:   a1961.d.akamai.net
Address: 64.209.77.18

I used www.infosniper.net to find where this host was located. Me being in Oslo this was obviously not optimal. The only DNS server I can remember is 8.8.8.8 which is one of Google's public DNS servers. I decided to change /etc/resolv.conf to use 8.8.8.8 as the primary DNS server, and download.oracle.com now resolves to two servers in Germany:

[root@rio ~]# nslookup download.oracle.com
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
download.oracle.com     canonical name = download.oracle.com.edgesuite.net.
download.oracle.com.edgesuite.net       canonical name = a1961.d.akamai.net.
a1961.d.akamai.net      canonical name = a1961.d.akamai.net.0.1.cn.akamaitech.net.
Name:   a1961.d.akamai.net.0.1.cn.akamaitech.net
Address: 46.33.72.121
Name:   a1961.d.akamai.net.0.1.cn.akamaitech.net
Address: 46.33.72.129

Here is the link if you like maps... I then flushed DNS in Firefox, restarted the download, and confirmed with netstat -an that the connection now went to a server in Germany. The download went much faster, well, at least for a while... YMMV.

Actually I thought that using Google's DNS server could give a suboptimal response if Google's DNS server was located somewhere far away (like in USA), but clearly they have some magic here that returns an IP address better than the DNS server of my local ISP.

Wednesday, April 24, 2013

OUGN Spring Seminar 2013

Last week's conference is the second since I joined the board of our user group. Never been more worried, but it turned out better than ever, in my not so humble opinion.

I was really proud of our agenda, we had several new speakers with top content.

Cary Millsap from Method R gave a keynote and two excellent presentations. This was his first time in Norway, but judging by the feedback we got he will be back soon. Actually having him here at our conference was a goal I committed to last year. Thanks, Cary, for excellent content and presentation!

Last OOW I met Connor McDonald at the blogger's meetup that Pythian organizes at OOW every year and I thought it would be cool if we could have him at our conference. After some severe lobbying with the rest of the board we decided on a plan to make it happen. Actually I had never seen Connor present before, but when he did his Odds and Ends presentation I understood what everybody was talking about. I think it is awesome that we can get the best, even if they come from the other side of the earth. We have not processed the feedback forms yet, but the impression so far is that Connor and Cary is sharing first place on content and presentation.

Tom Kyte, Bryn Llewellyn and Joel Goodman were back too. Always popular, each has his public (overlapping I assume) and genuinely dedicated. I am thankful for all the good publicity they are giving our conference.

There was also time for some psycho analysis and after a few beers Doug Burns and me discovered we are more or less soul mates. Always good to see the man and his presentations, he was mildly shocked when I showed up on one of his presos in stead of the one Cary was giving at the same time (confession: I had already seen the one Cary was giving). I learned a lot this time too. Good stuff! And thanks for the feedback, motivation and interests for our user group. You will always be on our MRU list.

We have two areas where we have to improve; Java and MySQL. Markus Eisele helped us to get speakers for the Development/Java track in addition to speak himself. We have work to do on Java, and the same goes for MySQL, it is a different group of people. I know Oracle and many in the community have a lot of good stuff on Java and MySQL, but first we need to appear on those users' radar and have them come to our table.

I spend much of my digital social life on Twitter. When I go to OOW or MOW I get to meet people whose Tweets and blogs I've been reading for a time. It is always a nice experience. This year two Twitter friends came to our conference; Philippe Fierens from Belgium (check out his nice post about the conference) and Jan Karremans from Netherlands. In fact the whole group from Netherlands was quite impressive.

It is a lot of hard work to organize a user group conference, the agenda requires a lot to get in place. For me it is the best networking opportunity, it is the place where I get to know what is not written in any manual or book, and meet friends IRL. Here is one, Martin Nash, and the whole Martin cluster.

The cruise part of Spring Seminar 2014 will start 2014-04-03, with a possible conference day in Oslo the day before. But first I'll take a short vacation next week, going to MOW2013.

Sunday, April 14, 2013

Removing obsolete parameters from spfile

This is a common error message after upgrade of the database:

ORA-32004: obsolete or deprecated parameter(s) 
specified for RDBMS instance

In 11g (last tested with 11.2.0.3) even if your start up the database from an old type pfile without any obsolete parameters and then create an spfile from memory like this:

create spfile from memory;

Oracle will insert two obsolete parameters into the spfile. This query will list obsolete parameters in your spfile:

select p.name,p.value
from v$parameter p, v$spparameter s
where s.name=p.name
and p.isdeprecated='TRUE'
and s.isspecified='TRUE';

NAME                  VALUE
--------------------- ----------------------------------------------
background_dump_dest  /u01/app/oracle/diag/rdbms/hfmtst/HFMTST/trace
user_dump_dest        /u01/app/oracle/diag/rdbms/hfmtst/HFMTST/trace

They can be cleared with:

alter system reset background_dump_dest;
alter system reset user_dump_dest;

Sunday, April 7, 2013

Corrupt database blocks II

After you have validated database as shown in previous post any corrupt database blocks will be reported in the view V$DATABASE_BLOCK_CORRUPTION. Now you want to find out the tables that used to have data in these blocks; those data have typically been lost if you have no backup of them or do not posses special tools like DUDE.

Before you start I recommend to purge the DBA Recycle Bin if it is in use, as reported here certain queries seem to take longer time when it is not empty. Execute as SYSDBA:

purge dba_recyclebin;

Then create a temporary table with information on the corrupt blocks:

create table segments_with_corruption
as select FILE_ID,BLOCK_ID,blocks,OWNER,SEGMENT_NAME,
  start_corrupt,corrupt_blocks
from (
  select e.FILE_ID,e.BLOCK_ID,e.blocks,e.OWNER,e.SEGMENT_NAME,
    c.block# start_corrupt,c.blocks corrupt_blocks
  from DBA_EXTENTS E, V$DATABASE_BLOCK_CORRUPTION C
  where E.BLOCK_ID <= C.block# 
    and E.BLOCK_ID + E.blocks - 1 >= C.block#
  and e.file_id=c.file#
  union all
  select e.FILE_ID,e.BLOCK_ID,e.blocks,e.OWNER,e.SEGMENT_NAME,
    c.block# start_corrupt,c.blocks corrupt_blocks
  from DBA_EXTENTS E, V$DATABASE_BLOCK_CORRUPTION C
  where E.BLOCK_ID >= C.block# 
    and E.BLOCK_ID <= C.block# + C.blocks -1
  and e.file_id=c.file#
);

This query may take a long time to run, which is a reason to materialize it in a table while you are working on this. You can find a more advanced query on MOS in article 472231.1 To create this query I made a drawing for myself with all different scenarios on how interval of corrupt blocks could possible overlap with extents belonging to some object. Then I wrote down the criteria and merged them together.

Use this query to find owner, name and type of the segments you have to work with:

select distinct c.owner,c.segment_name, c.segment_type
from segments_with_corruption c, dba_segments s
where c.segment_name=c.segment_name and c.owner=s.owner;

For segment_type=INDEX you can fix it by simply rebuilding the index (hopefully none of the rows it indexes are in corrupt blocks).

The next post will be on how to salvage as much as possibly from the good blocks belonging to a table that has some corrupt blocks. I'll show a query you will run to generate other statements that will be used to extract data from blocks around the corrupt blocks.

Sunday, March 31, 2013

Corrupt database blocks I

I worked with corrupt blocks in a database a few weeks ago and decided to write three posts about it so I don't have to rethink all of it next time it happens. Hardware and the database have improved a lot, because I rarely run into this problem.

The first and obvious thing to state is that corrupt blocks in the database should be detected early by proper fault management (if you use Enterprise Manager you are good). When detected it should be fixed by using RMAN, either by command:

recover corruption list;

or by recovering individual blocks, e.g.

recover datafile 42 block 5 to 6;

In case you don't have any backup or the error went undetected for a long time and you decide that block recovery is not feasible you need to clean it up some other way. If the RMAN backup detects the corrupt block it will abort at first corrupt block (unless you change the SET MAXCORRUPT FOR DATAFILE). This means that there may be more corrupt blocks in the database and you may just as well find them all while you are working on it.

The Database Verification tool is supposed to do find all corrupt blocks, but last time (11.2.0.2 on Linux x64) it failed to find new ones. This tool is documented in the manual Utilities. I looped over all the data files with command dbv, but it was a waste of time. This may have something to do that there are different types of corruptions. Instead I used VALIDATE in RMAN. You can validate individual data files or the entire database:

VALIDATE DATAFILE 42;
VALIDATE DATABASE;

This will not consider MAXCORRUPTIONS, but scan the data file / entire database and update the view v$database_block_corruption with info on corrupt blocks found.

Next post will be on how to find objects that have, or rather, used to have data in these blocks.

Thursday, February 28, 2013

Throw out the garbage

The garbage in this case is the recycle bin in the database. Just as in any tidy house you should not wait too long before you empty the garbage bin. I've come across two reasons to empty it.

One is the ORA-3297 that is thrown when you try to downsize a data file you think have lots of free space at the end. I once wrote this SQL to find the lower limit that I can reduce the data file to:

Select File_Id,ceil(Hb.Min_Red*8192/1024/1024/1024) Minimal_Size_Gb ,round(Df.Blocks*8/1024/1024,2) Total_Gb,
Round((Df.Blocks-Hb.Min_Red)*8/1024/1024) higher_free_GB
from dba_data_files df join
(Select File_Id, Max(Block_Id+Blocks) Min_Red
From Dba_Extents
Group By File_Id) Hb Using(File_Id)
Order By  Higher_Free_Gb Desc;

On one occasion when I tried to reduce the size of the data file an ORA-3297 was thrown. Of course I could have checked DBA_RECYCLEBIN, but I was checking DBA_SEGMENTS and noticed several segments with names that started with BIN which immediately reminded me about the recycle bin. I purged the recycle bin and the resize operation went through.

The other reason is that certain queries against DBA_FREE_SPACE may actually take much longer when there are segments (tables) in the recycle bin. One third-party monitoring application had a query that took more time to complete than anything else, after a purge dba_recyclebin the query had a reasonable response time.

Some argue that one should not use the recycle bin in production. I think it depends; if your application creates and drops tables frequently, see if you can add PURGE to the DROP TABLE statement, if not turn the recycle bin off by setting the system parameter recyclebin to off. Also after a successful release of a new version empty the recycle bin when everything is verified OK. Accidents do happen, therefore I like to have the recycle bin enabled just in case. Except for the case already mentioned I think that dropping tables should not happen so often in a production environment that it becomes a problem.

Friday, January 11, 2013

Flashback database fails with read-only tablespace

Flashback database, (active) duplicate with RMAN, and tablespaces that are read-only. You can only have two out of three from my recent testing. One database with a few read-only tablespaces was cloned by using active duplicate in RMAN. Before the testing (using Real Application Testing replay) a restore point with guarantee flashback database was created. After the testing I tried to perform a flashback database, but it failed with:

SQL> flashback database to restore point b4_test;
flashback database to restore point b4_test
*
ERROR at line 1:
ORA-01122: database file 38 failed verification check
ORA-01110: data file 38: '+DATA/clonedb/datafile/foo_bar002.443.804434293'
ORA-01207: file is more recent than control file - old control file

The file in question belongs to a read-only tablespace and is certainly not newer than the control file. It looks to be a bug on 11.2.0.3 and I found a way to work around it if you take some actions before you start testing. I did not find the same error stack mentioned together with flashback, or anything directly relevant on My Oracle Support, but similar symptoms made me think the problem would go away if I set the read-only tablespaces read-write. I generated SQL with this:

select 'alter tablespace ' || tablespace_name || ' read write;'
from dba_tablespaces
where status='READ ONLY';

After I executed the generated statements, I created a new restore point with:

drop restore point b4_test;
create restore point b4_test guarantee flashback database;

For testing databases it is probably no big deal if you set all the tablespaces read only. If you try to set the tablespaces read-write after you have done some changes to the database you want to revert (i.e. testing) the flashback operation will probably fail, possibly with an ORA-600 error, but I have not tested it. In other words, the workaround must be in place before you create the restore point.

I don't know why this happens for databases that are duplicated with RMAN. Flashback database on the source database works OK even with read-only tablespaces. The source database was created from a restore of an RMAN backup taken on a physical standby database. Also the duplication itself works OK with the read-only tablespace. Two out of three, then.

Subscribe to: Posts ( Atom )