Sunday, September 15, 2013

Finding corrupt XML files on Linux

Here is an easy command to find corrupt XML files on a Linux box.

A couple of days ago I had a problem with OBIEE and WebLogic. Googling and searching My Oracle Support indicated that the failure might be caused by a corrupt XML file, and as you know there are many of them in an OBIEE installation. In fact, MOS Doc 1475146.1 blindly suggests moving a bunch of them to a backup location, because in some cases they had become corrupt for some strange reason. I was not satisfied with just removing a few random files that had caused problems for others earlier; I wanted to find the actual corrupt file.

I then came up with the idea that I could find the corrupt file by parsing every XML file below a point in the file hierarchy. On Linux this is quite easy. The command xmllint (part of libxml2) is possibly not installed on every system, but should be easy to get with yum or similar. In my case the file I was looking for belonged to the user weblogic, meaning the corrupt file would reside in a directory called weblogic, hence the -wholename pattern in the command. But you can of course tweak the arguments to the find command to search the files you want:


find . -wholename \*weblogic\*.xml -exec xmllint --noout {} \;


Any normal output is discarded, so only the parse errors, which xmllint writes to stderr together with the file name, show up on your screen. A similar command to parse every file with the extension xml below $ORACLE_BASE:

find "$ORACLE_BASE" -name \*.xml -exec xmllint --noout {} \;
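
If the parser messages get too noisy, a variant that prints only the names of the files that fail to parse can be handy. The following is a minimal sketch, not something from the MOS note: it assumes xmllint is on the PATH, and list_corrupt_xml is just a hypothetical helper name chosen for this example. It relies on find printing a file only when the -exec command returns a non-zero exit status.

```shell
#!/bin/sh
# Sketch: list only the XML files that xmllint cannot parse.
# list_corrupt_xml is a hypothetical helper name, not part of any tool.
list_corrupt_xml() {
    # $1: directory to search (defaults to the current directory)
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found; install libxml2 first" >&2
        return 1
    fi
    # "! -exec ... \;" is true when xmllint exits non-zero, so -print
    # only fires for files that failed to parse. 2>/dev/null hides the
    # parser messages (and any find warnings, e.g. permission errors).
    find "${1:-.}" -type f -name '*.xml' \
        ! -exec xmllint --noout {} \; -print 2>/dev/null
}

# Example call: list_corrupt_xml "$ORACLE_BASE"
```

Once you have the list, running xmllint --noout on a single file from it shows the actual parse error again.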


If you (re)move the corrupt file, you usually have to restart the affected components with opmnctl.