Saturday, March 31, 2012

How to find all UTF-8 files with BOM

Stray whitespace sometimes shows up on a web page even though the HTML looks perfect, and you can spend hours editing HTML/CSS files without finding the cause. One possible cause is that some of the UTF-8 files your application uses to generate the HTML start with a byte order mark (BOM). Most editors do not display the BOM, which makes it hard to spot and remove.
Here is a simple way to list such files:
find . -type f | while IFS= read -r file; do [ "$(head -c3 -- "$file")" = $'\xef\xbb\xbf' ] && echo "found BOM in: $file"; done
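You can also write it as a shell script if you prefer:
# List every regular file under the current directory.
find . -type f |
while IFS= read -r file; do
    # Compare the first three bytes with the UTF-8 BOM (EF BB BF).
    if [ "$(head -c 3 -- "$file")" = $'\xef\xbb\xbf' ]; then
        echo "found BOM in: $file"
    fi
done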
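The same check can also be wrapped in reusable functions. The following is only a sketch: the function names has_bom and find_bom_files are made up for this example, and it compares a hex dump of the first three bytes instead of using the $'...' quoting, so it should also run in shells that lack that syntax. It assumes head supports the -c option (GNU and BSD versions do).

```shell
#!/bin/sh
# has_bom FILE (hypothetical helper): succeeds if FILE starts with
# the UTF-8 BOM bytes EF BB BF.
has_bom() {
    # Dump the first three bytes as hex and strip spaces and newlines.
    [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# find_bom_files DIR (hypothetical helper): prints every regular file
# under DIR (default: current directory) that starts with a BOM.
find_bom_files() {
    find "${1:-.}" -type f | while IFS= read -r file; do
        if has_bom "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

After sourcing the snippet, calling find_bom_files with a directory of your choice prints one matching path per line. File names containing newlines are not handled, the same as in the one-liner above.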
If you are a Windows developer, consider Cygwin :)
