I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿
PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.
I googled the problem, and there is clearly something wrong with the file encoding, which makes sense given that I've been moving the files between different Linux/Windows servers via ftp and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.
If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.
Three words for you: Byte Order Mark (BOM)
That's the representation for the UTF-8 BOM in ISO-8859-1. You have to tell your editor to not use BOMs or use a different editor to strip them out.
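As a quick check of where those three characters come from, here is a small illustrative sketch in Python (not part of the original answer). It decodes the UTF-8 BOM bytes once as ISO-8859-1 and once as UTF-8:

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom_bytes = codecs.BOM_UTF8

# Decoded as ISO-8859-1, each byte becomes one visible character.
mojibake = bom_bytes.decode("iso-8859-1")

# Decoded as UTF-8, the same bytes are the single code point U+FEFF.
as_utf8 = bom_bytes.decode("utf-8")

print(bom_bytes.hex(), repr(mojibake), repr(as_utf8))
```

So the stray characters are a sign that UTF-8 bytes are being displayed or processed as a single-byte encoding somewhere along the way.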
To automate the BOM's removal you can use awk, as shown in this question.
As another answer says, the best solution would be for PHP to actually interpret the BOM correctly; for that you can use mb_internal_encoding(), like this:
<?php
//Storing the previous encoding in case you have some other piece
//of code sensitive to encoding and counting on the default value.
$previous_encoding = mb_internal_encoding();
//Set the encoding to UTF-8, so when reading files it ignores the BOM
mb_internal_encoding('UTF-8');
//Process the CSS files...
//Finally, return to the previous encoding
mb_internal_encoding($previous_encoding);
//Rest of the code...
?>
Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, replace the old file with this new file. And it will work, damn sure.
In PHP, you can do the following to remove all control characters and non-ASCII bytes, including the characters in question.
$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);
If control characters such as newlines should be kept, it should be: $response = preg_replace('/[\x80-\xFF]/', '', $response);
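One thing to keep in mind with this approach: a byte range like \x80-\xFF matches every non-ASCII byte, not just the BOM. The following is a rough Python equivalent of the second pattern, given only as an illustration (the function name strip_high_bytes and the sample CSS are made up for this sketch):

```python
import re

def strip_high_bytes(data):
    """Drop every byte in the range 0x80-0xFF from a bytes object."""
    return re.sub(rb"[\x80-\xff]", b"", data)

# A BOM followed by CSS that contains one legitimate non-ASCII character.
sample = "\ufeffa{content:'\u00e9'}".encode("utf-8")
cleaned = strip_high_bytes(sample)
print(cleaned)
```

The BOM is gone, but so is the accented character inside the quotes, so this is only safe for files that are otherwise pure ASCII.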
For those with shell access here is a little command to find all files with the BOM set in the public_html directory - be sure to change it to what your correct path on your server is
Code:
grep -rl $'\xEF\xBB\xBF' /home/username/public_html
and if you are comfortable with the vi editor, open the file in vi:
vi /path-to-file-name/file.php
And enter the command to remove the BOM:
:set nobomb
Save the file:
:wq
Use grep -rlI $'\xEF\xBB\xBF' . to ignore binary files.
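For those without grep at hand, the same kind of search can be sketched in Python. This is an illustration only: the function name find_bom_files is hypothetical, and unlike the grep commands it checks only the first three bytes of each file rather than matching the sequence anywhere.

```python
import os

UTF8_BOM = b"\xef\xbb\xbf"

def find_bom_files(root):
    """Return sorted paths under root whose first three bytes are the UTF-8 BOM."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if handle.read(3) == UTF8_BOM:
                        matches.append(path)
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
    return sorted(matches)
```

Calling find_bom_files on your web root gives the list of files to fix with whichever method you prefer.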
The BOM is just a sequence of bytes (0xEF 0xBB 0xBF for UTF-8), so just remove it using a script or configure the editor so that it's not added.
From Removing BOM from UTF-8:
#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);
I am sure it translates to PHP easily.
$string = preg_replace('/\x{EF}\x{BB}\x{BF}/','',$string);
Before you use this, reconsider whether you can fix the problem at the source instead.
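The same idea as the Perl script, sketched in Python for a file that should be fixed in place. The names strip_bom and strip_bom_in_place are made up for this example, and the sketch assumes the file is small enough to read into memory:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def strip_bom(data):
    """Return data without a leading UTF-8 BOM, if there is one."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data

def strip_bom_in_place(path):
    """Rewrite the file at path without its leading BOM; return True if it changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    cleaned = strip_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as handle:
        handle.write(cleaned)
    return True
```

Working on raw bytes means nothing else in the file is touched, whatever encoding the rest of the content uses.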
I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO-BREAK SPACE (U+FEFF). This is whitespace, so if the file were being read in the correct encoding (UTF-8), then the BOM would be interpreted as whitespace and it would be ignored in the resulting CSS file.
Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't do all the characters that you need. If PHP is then reading the file in the incorrect encoding, then it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems disappear.
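A minimal sketch of reading in the right encoding, written in Python rather than PHP purely as an illustration. It assumes the files are valid UTF-8, and the function names read_css and merge_css are hypothetical. Python's "utf-8-sig" codec is used because it drops a leading BOM while decoding, so the result does not depend on how a later whitespace-stripping step treats U+FEFF:

```python
def read_css(path):
    """Read a CSS file as UTF-8, dropping a leading BOM if present."""
    # "utf-8-sig" skips one leading BOM; files without a BOM decode normally.
    with open(path, "r", encoding="utf-8-sig") as handle:
        return handle.read()

def merge_css(paths):
    """Concatenate several CSS files into one string, one file per line group."""
    return "\n".join(read_css(path) for path in paths)
```

Because every file is decoded as UTF-8, other non-ASCII characters in the stylesheets also survive the merge intact.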
For me, this worked:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If I remove this meta, the ï»¿ appears again. Hope this helps someone...
You can use
vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'
Replacing with awk seems to work, but it is not in place.
grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'
Use grep -rlI $'\xEF\xBB\xBF' . to ignore binary files. Also, . is better than * here.
I had the same problem with the BOM appearing in some of my PHP files (ï»¿).
If you use PhpStorm you can set a hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu -> File -> Remove BOM.
In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.
See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.
Open the PHP file under question, in Notepad++.
Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.
Same problem, different solution.
One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:
# Original
$xml_string = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";
If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.
I personally use E Text Editor.
In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.
E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).
Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.
One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty well and is easy to use.
Just create a .vbs file, and paste the following code in it.
You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.
' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Small utility that is meant to find the BOM
'
Const UTF8_BOM = "ï»¿"
Const UTF16BE_BOM = "þÿ"
Const UTF16LE_BOM = "ÿþ"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
MsgBox "UTF-8-BOM detected!"
ElseIf Left(t, 2) = UTF16BE_BOM Then
MsgBox "UTF-16-BOM (Big Endian) detected!"
ElseIf Left(t, 2) = UTF16LE_BOM Then
MsgBox "UTF-16-BOM (Little Endian) detected!"
Else
MsgBox "No BOM detected!"
End If
If it tells you there is a BOM, go and create the second .vbs file with the following code and drag the suspicious file onto the .vbs file.
' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Small utility that is meant to delete the BOM that was found
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
MsgBox "BOM deleted!"
Else
MsgBox "No UTF-8 BOM present!"
End If
The code is from Heiko Jendreck.
In PHPStorm, for multiple files where the BOM is not necessarily at the beginning of the file, you can search for \x{FEFF} (Regular Expression) and replace it with nothing.
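Outside the IDE, the same search-and-replace can be sketched in a few lines of Python. This is only an illustration and assumes the text has already been decoded as UTF-8, so that each BOM shows up as the single character U+FEFF; the name remove_all_boms is made up:

```python
BOM_CHAR = "\ufeff"

def remove_all_boms(text):
    """Remove every U+FEFF character from text, wherever it occurs."""
    return text.replace(BOM_CHAR, "")

merged = "\ufeffa{color:red}\ufeffb{color:blue}"
print(remove_all_boms(merged))
```

This covers the case where several files were concatenated and each one brought its own BOM into the middle of the result.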
Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.
Use Total Commander to search for all BOMed files:
Elegant way to search for UTF-8 files with BOM?
Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8
...and do not even think about using n...d again!
I had the same problem. The problem was because one of my PHP files was in UTF-8 (the most important one, the configuration file which is included in all PHP files).
In my case, I had two different solutions which worked for me:
First, I changed the Apache configuration by using the AddDefaultCharset directive in the configuration files (or in .htaccess). This solution forces Apache to use the correct encoding.
AddDefaultCharset ISO-8859-1
The second solution was to change the bad encoding of the php file.
Copy the text of your filename.css file. Close your css file. Rename it filename2.css to avoid a filename clash. In MS Notepad or Wordpad, create a new file. Paste the text into it. Save it as filename.css, selecting UTF-8 from the encoding options. Upload filename.css.
This works for me!
def removeBOMs(fileName):
    BOMs = ['ï»¿',  # Bytes as CP1252 characters
            'þÿ',
            'ÿþ',
            '\x00\x00þÿ',
            'ÿþ\x00\x00',
            '+/v',
            '÷dL',
            'Ýsfs',
            'Ýsfs',
            '\x0eþÿ',
            'ûî(',
            '„1•3']
    inputFile = open(fileName, 'r')
    contents = inputFile.read()
    inputFile.close()
    for BOM in BOMs:
        if not BOM in contents:  # no BOM in the file...
            pass
        else:
            newContents = contents.replace(BOM, '', 1)
            newFile = open(fileName, 'w')
            newFile.write(newContents)
            newFile.close()
    return None
Check your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".
Maybe it'll work.